Replace matched text with a constant value or the value a function returns.
subn()
template()
_MAXCACHE
_cache
_compile()
_compile_repl
_expand()
_locale()
_pickle()
_special_chars_map
_subx()
flags
Some functions in the re module (such as re.compile()) have a flags parameter that specifies further characteristics of the regular expression. Its value is formed by combining one or more of the following values with the bitwise OR operator (|):
Short name  Long name      Inline  Value  Meaning
re.A        re.ASCII       (?a)    256    \w etc. match ASCII characters only (by default, \w etc. match Unicode characters)
-           re.DEBUG       -       128    Print debug information about the compiled regexp
re.I        re.IGNORECASE  (?i)    2      Match case-insensitively
re.L        re.LOCALE      (?L)    4      \w etc. and case-insensitive matching depend on the current locale
re.M        re.MULTILINE   (?m)    8      ^ matches at the start of the string and at the character after each newline
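The flags listed above can be combined, either through the flags parameter or with the inline notation at the start of the pattern. The following is a minimal sketch; the pattern and the sample text are made up for illustration only:

```python
import re

sample = 'Foo\nbar\nFOO'   # illustrative text, not from the original notes

# Combine two flags with the bitwise OR operator
pattern = re.compile(r'^foo', re.IGNORECASE | re.MULTILINE)
with_parameter = pattern.findall(sample)
print(with_parameter)

# The same two flags written inline at the start of the pattern
with_inline = re.findall(r'(?im)^foo', sample)
print(with_inline)

# The combined value is the sum of the individual values (2 + 8)
combined_value = int(re.I | re.M)
print(combined_value)
```

Both calls should find the same two matches, because (?im) is simply another spelling of re.I | re.M.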
#!/usr/bin/python
import re

re_number = re.compile(r'\d+')

for i in ['foo', 'bar 42 baz', 'hello', 'etc', '20']:
    # match() succeeds only if the pattern matches at the start of the string
    if re_number.match(i):
        print(i + " is a number")
    # search() succeeds if the pattern matches anywhere in the string
    if re_number.search(i):
        print(i + " contains a number")
# bar 42 baz contains a number
# 20 is a number
# 20 contains a number

print("---")

# With two groups in the pattern, findall() returns a list of 2-tuples
for found in re.findall(r'(\w+)\s+(\d+)', 'foo 42 bar 18 baz 19 x'):
    print(found[0] + ': ' + found[1])
# foo: 42
# bar: 18
# baz: 19

print("---")

print(re.sub(r'\d+', 'XX', 'foo 42 bar 18 baz 19 x'))
# foo XX bar XX baz XX x
#!/usr/bin/python
import re

# The leftmost match wins: 'one' matches \w\w\w before '234' is reached
match = re.search(r'(\d\d\d|\w\w\w)', 'one 234 five s')

if match:
    print(match.group())
    # one
    print(match.group(1))
    # one
else:
    print("didn't match")
In the following example, the pattern contains parentheses. Each match is returned as a tuple whose elements hold the text captured by each parenthesized group.
import re

for pair in re.findall(r'(\w+): (\d+)', 'foo: 42; bar: 99; baz: 0'):
    print(pair[0] + ' = ' + pair[1])
# foo = 42
# bar = 99
# baz = 0
search() vs match()
re.search() looks for a match anywhere in the text, while re.match() only matches at the start of the text.
Both re.search() and re.match() return a re.Match object, or None if there is no match.
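The difference in return values can be seen in a short sketch. The sample text is an assumption chosen for illustration:

```python
import re

text = 'bar 42 baz'   # illustrative text

# match() only tries the start of the text, where there is no digit
at_start = re.match(r'\d+', text)
print(at_start)

# search() scans the text and returns a re.Match object for the first hit
anywhere = re.search(r'\d+', text)
print(type(anywhere).__name__, anywhere.group(), anywhere.span())
```

Because a failed match is None, the result is usually tested in an if statement before group() is called.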
sub
Replace a range
Replace every character between g and p with an asterisk.
Note the unintuitive order of parameters: First the pattern, then the replacement and only then the text on which the replacement is to take place.
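A range replacement of this kind might look like the following sketch. The character class [g-p] expresses the range; the input text 'hello world' is an assumption chosen for illustration:

```python
import re

# Parameter order: pattern, replacement, then the text to work on
masked = re.sub(r'[g-p]', '*', 'hello world')
print(masked)
# *e*** w*r*d
```

Every letter from g through p (here h, l and o) is replaced, while all other characters are left untouched.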
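import re

def double(m):
    # m is a re.Match object; m.group(0) is the matched text (a str)
    print(type(m.group(0)))
    return str(2 * int(m.group(0)))

# The function is called once per match and its return value replaces the match
print(re.sub(r'\d+', double, 'foo 42 bar 99 baz'))
# <class 'str'>
# <class 'str'>
# foo 84 bar 198 baz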
Iterate over words in a text
The following example iterates over the words in a piece of text and skips punctuation:
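import re

txt = """\
Foo, bar and baz. Those three words! Do
new lines work, too? Yes: they do.\
"""

# Split on runs of spaces, punctuation and newlines
words = re.split('[ .,?;:!\n]+', txt)

for word in words:
    # The text ends with a separator, so split() returns a trailing empty string
    if word:
        print(word)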
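import re

text = """\
This is the first line.
The second one.
The final one."""

# '.' does not match a newline by default, so '.*' stops at the end of the first line
re_1st_line = re.compile('.*')
first_line = re_1st_line.match(text)

print(first_line[0])
# This is the first line.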
Using the returned search() object in an if statement
With the walrus operator (:=, available since Python 3.8), the object returned by search() can be assigned and tested in the same if statement:
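import re

reNumbers = re.compile(r'(\d+)')

def getNumber(txt):
    # := assigns the result of search() and tests it in the same expression
    if m := reNumbers.search(txt):
        print('The extracted number is ' + m.group(1))
    else:
        print('No number found in ' + txt)

getNumber('hello world')
# No number found in hello world
getNumber('the number is 42, what else?')
# The extracted number is 42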