Regular Expressions
A Regular Expression, also called RegEx, is a sequence of characters that forms a search pattern for text.
The basics about Regular Expressions
Python supports regular expressions through the built-in `re` module. The example below uses the `re` module to search for a pattern in a string and print the result.
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    x = re.search("^Hello.*world$", txt)
    if x:
        print("YES! We have a match!")
    else:
        print("No match")

if __name__ == "__main__":
    main()
Output:
YES! We have a match!
Metacharacters
The previous example used the `^` and `$` metacharacters to anchor the pattern to the beginning and end of the string. Besides these two, the table below lists the other metacharacters used in the following examples.
Character | Description | Example |
---|---|---|
`[]` | A set of characters | `[a-m]` |
`\` | Signals a special sequence (can also be used to escape special characters) | `\d` |
`.` | Any character (except newline character) | `he..o` |
`^` | Starts with | `^Hello` |
`$` | Ends with | `world$` |
`*` | Zero or more occurrences | `he.*o` |
`+` | One or more occurrences | `he.+o` |
`{}` | Exactly the specified number of occurrences | `he.{2}o` |
`\|` | Either or | `great\|small` |
`()` | Capture and group | |
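To make the metacharacter descriptions concrete, here is a minimal sketch that runs one pattern per metacharacter against the sample string used in this section. The patterns and the helper name `find_all` are illustrative choices, not part of the `re` module.

```python
import re

# Sample text reused from the examples in this section.
txt = "Hello great world"

# (pattern, what it demonstrates); the patterns are illustrative choices.
PATTERNS = [
    ("[a-e]", "set: any one character from a to e"),
    (r"\s", "special sequence: a whitespace character"),
    ("gr..t", "dot: any two characters between gr and t"),
    ("^Hello", "caret: string starts with Hello"),
    ("world$", "dollar: string ends with world"),
    ("l*o", "star: zero or more l, then o"),
    ("l+o", "plus: one or more l, then o"),
    ("l{2}", "braces: exactly two l"),
    ("great|small", "pipe: either great or small"),
    ("(wor)ld", "parentheses: capture wor when followed by ld"),
]

def find_all(text):
    """Return a dict mapping each pattern to the list re.findall() gives."""
    return {pattern: re.findall(pattern, text) for pattern, _ in PATTERNS}

results = find_all(txt)
for pattern, meaning in PATTERNS:
    print(f"{pattern:12} {meaning:45} {results[pattern]}")
```

When a pattern contains a capture group, `re.findall()` returns the captured text (`wor` here) rather than the whole match.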
Special Sequences
A special sequence is a `\` followed by one of the characters in the list below, and has a special meaning:
Character | Description | Example |
---|---|---|
`\A` | Returns a match if the specified characters are at the beginning of the string | `r"\AThe"` |
`\b` | Returns a match where the specified characters are at the beginning or at the end of a word (the `r` prefix makes the pattern a raw string, so Python does not read `\b` as a backspace) | `r"\bain"` `r"ain\b"` |
`\B` | Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word | `r"\Bain"` `r"ain\B"` |
`\d` | Returns a match where the string contains digits (numbers from 0-9) | `r"\d"` |
`\D` | Returns a match where the string contains a non-digit character | `r"\D"` |
`\s` | Returns a match where the string contains a white space character | `r"\s"` |
`\S` | Returns a match where the string contains a non-white-space character | `r"\S"` |
`\w` | Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore `_` character) | `r"\w"` |
`\W` | Returns a match where the string contains a non-word character | `r"\W"` |
`\Z` | Returns a match if the specified characters are at the end of the string | `r"Spain\Z"` |
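A few of the special sequences above can be tried out with a short sketch. The sample string with a trailing number and the helper name `demo_special_sequences` are assumptions made for illustration.

```python
import re

# Hypothetical sample text; the trailing number is there so \d has something to find.
txt = "Hello great world 42"

def demo_special_sequences(text):
    """Return the findall() result for a handful of special sequences."""
    patterns = [
        r"\AHello",  # "Hello" at the very start of the string
        r"\bgr",     # "gr" at the start of a word
        r"\Bor",     # "or" that is NOT at the start of a word
        r"\d",       # each digit
        r"\s",       # each whitespace character
        r"\w+",      # each run of word characters
        r"42\Z",     # "42" at the very end of the string
    ]
    return {p: re.findall(p, text) for p in patterns}

sequences = demo_special_sequences(txt)
for pattern, matches in sequences.items():
    print(pattern, matches)
```

All patterns are written as raw strings so that Python passes the backslash through to the `re` module unchanged.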
Sets
A set is a group of characters inside a pair of square brackets `[]` with a special meaning:
Set | Description |
---|---|
`[arn]` | Returns a match where one of the specified characters (a, r, or n) is present |
`[a-n]` | Returns a match for any lower case character, alphabetically between a and n |
`[^arn]` | Returns a match for any character EXCEPT a, r, and n |
`[0123]` | Returns a match where any of the specified digits (0, 1, 2, or 3) is present |
`[0-9]` | Returns a match for any digit between 0 and 9 |
`[0-5][0-9]` | Returns a match for any two-digit number from 00 to 59 |
`[a-zA-Z]` | Returns a match for any character alphabetically between a and z, lower case OR upper case |
`[+]` | In sets, `+`, `*`, `.`, `\|`, `()`, `$`, and `{}` have no special meaning, so `[+]` matches a literal `+` character |
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    x = re.findall("[arn]", txt)
    print(x)

if __name__ == "__main__":
    main()
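The other kinds of sets (ranges, negation, and literal metacharacters) behave the same way. The following is a small sketch; the sample strings are assumptions chosen so that each set finds something.

```python
import re

# Hypothetical sample text containing letters, digits and punctuation.
txt = "great at 08:45"

one_of = re.findall("[arn]", txt)          # any single a, r or n
negated = re.findall("[^a-z ]", txt)       # anything except lower case letters and spaces
two_digits = re.findall("[0-5][0-9]", txt) # two-digit numbers from 00 to 59
literal_plus = re.findall("[+]", "2+2")    # + has no special meaning inside a set

print(one_of)
print(negated)
print(two_digits)
print(literal_plus)
```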
Regular Expression functions
findall()
Returns a list containing all matches
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    x = re.findall("ea", txt)
    print(x)

if __name__ == "__main__":
    main()
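When nothing matches, `re.findall()` returns an empty list rather than `None`, so the result can be tested directly in an `if`. A minimal sketch, where the pattern `xyz` is just an arbitrary string that does not occur in the text:

```python
import re

txt = "Hello great world"

matches = re.findall("ea", txt)      # one occurrence, inside "great"
no_matches = re.findall("xyz", txt)  # pattern does not occur in txt

print(matches)
print(no_matches)

# An empty list is falsy, so this branch runs.
if not no_matches:
    print("No match")
```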
search()
Returns a Match object if there is a match anywhere in the string
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    # Raw string so the backslash reaches the re module unchanged
    x = re.search(r"\s", txt)
    print("The first white-space character is located in position:", x.start())

if __name__ == "__main__":
    main()
split()
Returns a list where the string has been split at each match
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    # Raw string so the backslash reaches the re module unchanged
    x = re.split(r"\s", txt)
    print(x)

if __name__ == "__main__":
    main()
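`re.split()` also accepts a `maxsplit` argument that limits how many splits are made. A short sketch using the same sample string:

```python
import re

txt = "Hello great world"

every_space = re.split(r"\s", txt)              # split at every whitespace character
first_space = re.split(r"\s", txt, maxsplit=1)  # split at the first one only

print(every_space)
print(first_space)
```

With `maxsplit=1`, the remainder of the string is returned as the last list element.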
sub()
Replaces one or many matches with a string
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    # Replace every whitespace character with the digit 9
    x = re.sub(r"\s", "9", txt)
    print(x)

if __name__ == "__main__":
    main()
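The optional count argument limits the number of replacements. Here at most the first two matches are replaced:
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    # count is passed by keyword; passing it positionally is deprecated in recent Python versions
    x = re.sub(r"\s", "9", txt, count=2)
    print(x)

if __name__ == "__main__":
    main()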
Match Object
A Match Object is an object containing information about the search and the result.
Note
If there is no match, the value None will be returned, instead of the Match Object.
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    x = re.search("ea", txt)
    print(x)

if __name__ == "__main__":
    main()
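Because `re.search()` returns `None` when nothing matches, calling a method on the result without checking it first raises an `AttributeError`. One way to guard against that is sketched below; the helper name `describe_match` is hypothetical.

```python
import re

def describe_match(pattern, text):
    """Return a short description of the first match, or a fallback message."""
    m = re.search(pattern, text)
    if m is None:
        # No Match object was created, so there is nothing to inspect.
        return "No match"
    return f"Matched {m.group()!r} at {m.span()}"

found = describe_match("ea", "Hello great world")
missing = describe_match("xyz", "Hello great world")
print(found)
print(missing)
```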
The Match object has attributes and methods used to retrieve information about the search and the result:
- `.span()` returns a tuple containing the start and end positions of the match
- `.string` returns the string passed into the function (an attribute, not a method)
- `.group()` returns the part of the string where there was a match
span()
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    # Find the first word that starts with a lower case "g"
    x = re.search(r"\bg\w+", txt)
    print(x.span())  # (6, 11)

if __name__ == "__main__":
    main()
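string
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    # Find the first word that starts with a lower case "g"
    x = re.search(r"\bg\w+", txt)
    # string is an attribute, so it is read without parentheses
    print(x.string)  # Hello great world

if __name__ == "__main__":
    main()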
group()
#!/usr/bin/env python3

import re

def main():
    txt = "Hello great world"
    # Find the first word that starts with a lower case "g"
    x = re.search(r"\bg\w+", txt)
    print(x.group())  # great

if __name__ == "__main__":
    main()
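When the pattern contains capture groups (the `()` metacharacter), `group()` and `span()` also accept a group number, and `groups()` returns all captured parts at once. A minimal sketch, assuming a three-word pattern chosen only for illustration:

```python
import re

txt = "Hello great world"

# Three capture groups, one per word, separated by single spaces.
m = re.search(r"(\w+) (\w+) (\w+)", txt)

whole = m.group(0)       # group 0 is always the entire match
second = m.group(2)      # text captured by the second pair of parentheses
all_groups = m.groups()  # tuple with every captured group
second_span = m.span(2)  # start and end positions of the second group

print(whole)
print(second)
print(all_groups)
print(second_span)
```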