Python Regular Expressions (RegEx) Tutorial
Learn how to use Python's built-in re module for working with Regular Expressions (RegEx). Discover how to define and search for patterns in strings with this comprehensive guide.
Python RegEx
A Regular Expression, or RegEx, is a sequence of characters that defines a search pattern. It is commonly used to check if a string contains a specific pattern.
RegEx Module
Python includes a built-in package called re, which you can use to work with Regular Expressions.
Importing the re Module
First, you need to import the re module:
Syntax
import re
Using RegEx in Python
Once you've imported the re module, you can start using regular expressions. Here’s an example:
Example
import re
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
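The snippet above stores the result in x but never inspects it. A minimal sketch of checking the result, relying on the fact that re.search returns a Match object on success and None otherwise (the printed messages are illustrative only):

```python
import re

txt = "The rain in Spain"

# ^The anchors the start, .* spans the middle, Spain$ anchors the end
x = re.search(r"^The.*Spain$", txt)

# A Match object is truthy; None (no match) is falsy
if x:
    print("YES! We have a match!")
else:
    print("No match")
```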
RegEx Functions
The re module provides a set of functions to search a string for a match:
- findall: Returns a list of all matches
- search: Returns a Match object if there's a match
- split: Splits the string at each match and returns a list
- sub: Replaces one or many matches with a string
Metacharacters
Metacharacters are characters with special meanings in RegEx:
| Character | Description | Example |
|---|---|---|
| [] | A set of characters | "[a-m]" |
| \ | Signals a special sequence (or escapes special characters) | r"\d" |
| . | Any character (except newline) | "he..o" |
| ^ | Starts with | "^hello" |
| $ | Ends with | "planet$" |
| * | Zero or more occurrences | "he.*o" |
| + | One or more occurrences | "he.+o" |
| ? | Zero or one occurrence | "he.?o" |
| {} | Exactly the specified number of occurrences | "he.{2}o" |
| \| | Either or | "falls\|stays" |
| () | Capture and group | |
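
To make a few rows of the table concrete, here is a small sketch that tries some of the example patterns against sample strings. The sample strings and variable names are made up for illustration.

```python
import re

samples = ["hello", "heo", "helo", "he--o planet"]

# "he.{2}o" needs exactly two characters between "he" and "o"
exactly_two = [s for s in samples if re.search(r"he.{2}o", s)]
print(exactly_two)  # ['hello', 'he--o planet']

# "he.?o" allows zero or one character in between
zero_or_one = [s for s in samples if re.search(r"he.?o", s)]
print(zero_or_one)  # ['heo', 'helo']

# "|" picks either alternative; "()" captures the part that matched
m = re.search(r"(falls|stays)", "The rain in Spain stays mainly in the plain")
chosen = m.group(1)
print(chosen)  # stays

# "planet$" only matches at the very end of the string
ends_with_planet = bool(re.search(r"planet$", "he--o planet"))
print(ends_with_planet)  # True
```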
Special Sequences
Special sequences are denoted by a backslash (\) followed by a character and have a special meaning:
| Character | Description | Example |
|---|---|---|
| \A | Matches if the specified characters are at the beginning of the string | r"\AThe" |
| \b | Matches if the specified characters are at the beginning or end of a word | r"\bain", r"ain\b" |
| \B | Matches if the specified characters are not at the beginning or end of a word | r"\Bain", r"ain\B" |
| \d | Matches any decimal digit (0-9; for str patterns this also includes other Unicode digits unless re.ASCII is set) | r"\d" |
| \D | Matches any non-digit | r"\D" |
| \s | Matches any whitespace character | r"\s" |
| \S | Matches any non-whitespace character | r"\S" |
| \w | Matches any word character (letters, digits, and underscore) | r"\w" |
| \W | Matches any non-word character | r"\W" |
| \Z | Matches if the specified characters are at the end of the string | r"Spain\Z" |
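
The sketch below runs a handful of these sequences against the tutorial's sample sentence. The patterns are written as raw strings (the r prefix) because in an ordinary Python string "\b" is the backspace character, not a word boundary. The variable names and the "Room 101" string are illustrative only.

```python
import re

txt = "The rain in Spain"

# \b requires a word boundary; no word in txt starts with "ain"
starts = re.findall(r"\bain", txt)
print(starts)  # []

# "ain" sits at the end of both "rain" and "Spain"
ends = re.findall(r"ain\b", txt)
print(ends)  # ['ain', 'ain']

# \B is the opposite: "ain" must NOT be at the start of a word
inner = re.findall(r"\Bain", txt)
print(inner)  # ['ain', 'ain']

# \A anchors to the start of the string, \Z to the end
at_start = bool(re.search(r"\AThe", txt))
at_end = bool(re.search(r"Spain\Z", txt))
print(at_start, at_end)  # True True

# \d+ collects runs of digits
numbers = re.findall(r"\d+", "Room 101, floor 3")
print(numbers)  # ['101', '3']
```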
Sets
A set is a group of characters inside square brackets [] with a special meaning:
| Set | Description |
|---|---|
| [arn] | Matches any one of the specified characters (a, r, or n) |
| [a-n] | Matches any lowercase character alphabetically between a and n |
| [^arn] | Matches any character except a, r, and n |
| [0123] | Matches any of the specified digits (0, 1, 2, or 3) |
| [0-9] | Matches any digit between 0 and 9 |
| [0-5][0-9] | Matches any two-digit number from 00 to 59 |
| [a-zA-Z] | Matches any letter from a to z, lowercase or uppercase |
| [+] | Matches any + character (in sets, +, *, ., \|, (), $, and {} have no special meaning) |
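
A short sketch of how some of these sets behave with re.findall. The input strings and variable names are invented for the example.

```python
import re

# Any one of a, r, or n
listed = re.findall(r"[arn]", "rain")
print(listed)  # ['r', 'a', 'n']

# ^ as the first character inside the brackets negates the set
negated = re.findall(r"[^arn]", "rain")
print(negated)  # ['i']

# Two sets side by side: a digit 0-5 followed by a digit 0-9
two_digit = re.findall(r"[0-5][0-9]", "8:45 and 11:72")
print(two_digit)  # ['45', '11']

# Upper- and lowercase letters only
letters = re.findall(r"[a-zA-Z]", "Py3!")
print(letters)  # ['P', 'y']

# Inside a set, + is a literal plus sign
plus = re.findall(r"[+]", "1+1=2")
print(plus)  # ['+']
```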
The findall() Function
The findall() function returns a list of all matches:
Example
import re
txt = "The rain in Spain"
matches = re.findall("ai", txt)
print(matches)
Output
['ai', 'ai']
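Two further cases are worth seeing: what findall() returns when nothing matches, and how it behaves with a pattern rather than a literal. A minimal sketch, with the search term "Portugal" chosen only for illustration:

```python
import re

txt = "The rain in Spain"

# No occurrence of "Portugal": the result is an empty list, not None
no_hits = re.findall("Portugal", txt)
print(no_hits)  # []

# A pattern instead of a literal: every word that contains "ai"
words = re.findall(r"\w*ai\w*", txt)
print(words)  # ['rain', 'Spain']
```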
The search() Function
The search() function searches the string for a match and returns a Match object if there's a match. Only the first occurrence is returned:
Example
import re
txt = "The rain in Spain"
match = re.search(r"\s", txt)
print("The first white-space character is located in position:", match.start())
Output
The first white-space character is located in position: 3
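Because search() returns None when nothing matches, calling .start() on the result without a check raises an AttributeError. A minimal sketch of guarding against that, using a search term assumed not to appear in the text:

```python
import re

txt = "The rain in Spain"

# "Portugal" does not occur in txt, so search() returns None
match = re.search("Portugal", txt)

if match is None:
    print("No match found")
else:
    print("Found at position:", match.start())
```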
The split() Function
The split() function returns a list where the string has been split at each match:
Example
import re
txt = "The rain in Spain"
split_txt = re.split(r"\s", txt)
print(split_txt)
Output
['The', 'rain', 'in', 'Spain']
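re.split() also accepts a maxsplit argument that limits how many splits are made. A minimal sketch that splits only at the first whitespace character (the variable name is illustrative):

```python
import re

txt = "The rain in Spain"

# maxsplit=1 stops after the first split; the rest of the string stays intact
first_split = re.split(r"\s", txt, maxsplit=1)
print(first_split)  # ['The', 'rain in Spain']
```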
The sub() Function
The sub() function replaces the matches with the text of your choice:
Example
import re
txt = "The rain in Spain"
result = re.sub(r"\s", "9", txt)
print(result)
Output
The9rain9in9Spain
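re.sub() takes an optional count argument that caps the number of replacements. A minimal sketch replacing only the first two whitespace characters (the variable name is illustrative):

```python
import re

txt = "The rain in Spain"

# count=2 replaces only the first two matches and leaves the third space alone
limited = re.sub(r"\s", "9", txt, count=2)
print(limited)  # The9rain9in Spain
```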
Match Object
A Match Object contains information about the search and the result. If there is no match, None is returned.
Example
import re
txt = "The rain in Spain"
match = re.search("ai", txt)
print(match)
Output
<re.Match object; span=(5, 7), match='ai'>
The Match object has methods and properties to extract details about the search result:
- .span(): Returns a tuple with the start and end positions of the match
- .string: Returns the string passed into the function
- .group(): Returns the part of the string where there was a match
Example
import re
txt = "The rain in Spain"
match = re.search(r"\bS\w+", txt)
print(match.span())
Output
(12, 17)
Example
import re
txt = "The rain in Spain"
match = re.search(r"\bS\w+", txt)
print(match.string)
Output
The rain in Spain
Example
import re
txt = "The rain in Spain"
match = re.search(r"\bS\w+", txt)
print(match.group())
Output
Spain
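When the pattern contains capturing parentheses, group() can also return the individual groups. The sketch below is a variation on the pattern used above, with two groups added purely for illustration:

```python
import re

txt = "The rain in Spain"

# Two capturing groups: the leading "S" and the rest of the word
match = re.search(r"\b(S)(\w+)", txt)

print(match.group())    # Spain  (the whole match, same as group(0))
print(match.group(1))   # S
print(match.group(2))   # pain
print(match.groups())   # ('S', 'pain')
print(match.start(), match.end())  # 12 17
```

Here start() and end() give the same two numbers as span(), just as separate values.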