
9.3.1 Pattern matching with re module

Pattern matching lets you search, match, and manipulate strings based on specific patterns. The patterns are written as regular expressions (regex), and Python’s built-in re module provides the functions that apply them to strings.

Key Functions in the re Module

The re module provides several functions to work with regular expressions and perform pattern matching. Here’s a summary of the most commonly used functions for pattern matching:

1. re.match()

The match() function checks for a match only at the beginning of a string. If the pattern matches there, it returns a match object; otherwise it returns None, even if the pattern occurs later in the string.

import re

pattern = r"hello"
text = "hello world"
result = re.match(pattern, text)

if result:
    print("Match found:", result.group())  # Returns 'hello'
else:
    print("No match")

Output:

Match found: hello
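Because match() only looks at the start of the string, a pattern that appears later in the text still fails. The short sketch below shows this, and also contrasts it with re.fullmatch() (a standard re function not covered above), which is stricter still: the pattern must cover the entire string.

```python
import re

text = "hello world"

# "world" occurs in text, but not at position 0, so match() returns None
print(re.match(r"world", text))                     # None

# match() succeeds as long as the start of the string matches
print(re.match(r"hello", text) is not None)         # True

# fullmatch() requires the pattern to match the whole string
print(re.fullmatch(r"hello", text))                 # None
print(re.fullmatch(r"hello world", text).group())   # hello world
```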

2. re.search()

The search() function scans the entire string and returns a match object for the first location where the pattern matches, or None if it matches nowhere. Unlike match(), it doesn’t require the match to be at the start of the string.

import re

pattern = r"world"
text = "hello world"
result = re.search(pattern, text)

if result:
    print("Search found:", result.group())  # Returns 'world'
else:
    print("No match")

Output:

Search found: world
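Note that search() stops at the first occurrence. In this small example the letter "o" appears twice, and the returned match object reports the position of the first one through start() and span().

```python
import re

text = "hello world"
result = re.search(r"o", text)

# search() stops at the first match: the "o" in "hello", not the one in "world"
print(result.group())   # o
print(result.start())   # 4
print(result.span())    # (4, 5)

# If the pattern does not occur anywhere, search() returns None
print(re.search(r"xyz", text))  # None
```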

3. re.findall()

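The findall() function returns all non-overlapping matches of the pattern in the string as a list of strings. If the pattern contains capturing groups, the list holds the group contents instead (one tuple per match when there is more than one group).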

import re

pattern = r"\d+"  # Matches all numbers
text = "The price is 100 dollars, and the discount is 20."
matches = re.findall(pattern, text)

print("Numbers found:", matches)  # Returns ['100', '20']

Output:

Numbers found: ['100', '20']
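The shape of the list returned by findall() depends on whether the pattern contains capturing groups (parentheses). The example text below is made up for illustration; it compares the same pattern with zero, one, and two groups.

```python
import re

text = "width=100, height=20"

# No groups: each item is the full matched text
print(re.findall(r"\w+=\d+", text))      # ['width=100', 'height=20']

# One group: each item is only that group's text
print(re.findall(r"\w+=(\d+)", text))    # ['100', '20']

# Several groups: each item is a tuple of all the groups
print(re.findall(r"(\w+)=(\d+)", text))  # [('width', '100'), ('height', '20')]
```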

4. re.finditer()

The finditer() function returns an iterator that yields a match object for each non-overlapping match. Unlike findall(), which returns plain strings, each match object carries extra information about the match, such as its start and end positions.

import re

pattern = r"\d+"
text = "There are 100 apples and 50 bananas."

matches = re.finditer(pattern, text)
for match in matches:
    print(f"Match: {match.group()}, Start: {match.start()}, End: {match.end()}")

Output:

Match: 100, Start: 10, End: 13
Match: 50, Start: 25, End: 27
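Match objects from finditer() also expose capturing groups, which makes it convenient to turn text into structured data. A minimal sketch, reusing the sentence above and assuming each number is followed by the item it counts:

```python
import re

text = "There are 100 apples and 50 bananas."
pattern = r"(\d+) (\w+)"  # group 1: the number, group 2: the word after it

for m in re.finditer(pattern, text):
    print(f"{m.group(2)}: {m.group(1)} at span {m.span()}")
# apples: 100 at span (10, 20)
# bananas: 50 at span (25, 35)

# Build a dictionary from the groups of every match
inventory = {m.group(2): int(m.group(1)) for m in re.finditer(pattern, text)}
print(inventory)  # {'apples': 100, 'bananas': 50}
```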

5. re.sub()

The sub() function replaces every non-overlapping occurrence of the pattern with a replacement string and returns the resulting new string; the original string is left unchanged.

import re

pattern = r"\d+"  # Matches numbers
text = "The total cost is 100 dollars."
new_text = re.sub(pattern, "X", text)

print("Updated text:", new_text)  # Returns 'The total cost is X dollars.'

Output:

Updated text: The total cost is X dollars.
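sub() has a few more capabilities worth knowing: the count parameter limits the number of replacements, the replacement can be a function that receives each match object, and a replacement string can refer to captured groups with backreferences such as \1. The sample text is invented for illustration.

```python
import re

text = "3 apples and 10 pears"

# count limits how many replacements are made (the default 0 means all)
print(re.sub(r"\d+", "X", text, count=1))         # X apples and 10 pears

# A function as the replacement: it is called with each match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
print(doubled)                                    # 6 apples and 20 pears

# \1 and \2 insert the text captured by groups 1 and 2
print(re.sub(r"(\d+) (\w+)", r"\2 (\1)", text))   # apples (3) and pears (10)
```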

6. re.split()

The split() function splits the string by the occurrences of the pattern, returning a list of substrings.

import re

pattern = r"\s+"  # Split by whitespace
text = "This is a test string."
result = re.split(pattern, text)

print("Split text:", result)  # Returns ['This', 'is', 'a', 'test', 'string.']

Output:

Split text: ['This', 'is', 'a', 'test', 'string.']
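Unlike str.split(), re.split() can split on several different separators with one pattern. The sketch below also shows the maxsplit parameter, how a capturing group keeps the separators in the result, and the empty string produced when the text starts with a separator.

```python
import re

# A character set covers several separators at once
print(re.split(r"[,;]\s*", "red, green;blue"))   # ['red', 'green', 'blue']

# maxsplit stops after the given number of splits
print(re.split(r"\s+", "a b c d", maxsplit=2))   # ['a', 'b', 'c d']

# A capturing group keeps the separators in the result
print(re.split(r"([,;])", "red,green;blue"))     # ['red', ',', 'green', ';', 'blue']

# A separator at the very start produces an empty first element
print(re.split(r"\s+", "  leading spaces"))      # ['', 'leading', 'spaces']
```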

Special Characters in Regular Expressions

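  • . (Dot): Matches any character except a newline (\n).
  • \d: Matches any decimal digit such as 0-9 (with str patterns, other Unicode digits also match).
  • \D: Matches any character that is not a digit.
  • \w: Matches any "word" character: letters, digits, and the underscore.
  • \W: Matches any character that \w does not match.
  • \s: Matches any whitespace character (space, tab, newline, and others such as \r).
  • \S: Matches any non-whitespace character.
  • ^: Matches the start of the string.
  • $: Matches the end of the string (or just before a newline at the very end).
  • [ ]: A character set; matches any single character listed inside the brackets, such as [aeiou]. Ranges like [a-z] are allowed, and a leading ^ (as in [^0-9]) negates the set.
  • |: Alternation, matches either the pattern on the left or the pattern on the right (similar to "OR").
  • *: Matches zero or more repetitions of the preceding pattern.
  • +: Matches one or more repetitions of the preceding pattern.
  • ?: Matches zero or one occurrence of the preceding pattern.
  • {m} and {m,n}: Match exactly m repetitions, or between m and n repetitions, of the preceding pattern; {m,} means m or more.
  • \: Escapes a special character so it matches literally, as in \. or \(.
  • ( ): Groups part of a pattern and captures the text it matches.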
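Each of the special characters listed above can be checked with a one-line experiment. The strings below are arbitrary examples chosen to make each behavior visible.

```python
import re

# . matches any character except a newline
print(re.findall(r"c.t", "cat cot c\nt"))        # ['cat', 'cot']

# [ ] matches one character from the set
print(re.findall(r"[aeiou]", "regex"))           # ['e', 'e']

# | matches either alternative
print(re.findall(r"cat|dog", "cat, dog, bird"))  # ['cat', 'dog']

# * allows zero repetitions, + requires at least one
print(re.findall(r"ab*", "a ab abbb"))           # ['a', 'ab', 'abbb']
print(re.findall(r"ab+", "a ab abbb"))           # ['ab', 'abbb']

# ? makes the preceding character optional
print(re.findall(r"colou?r", "color colour"))    # ['color', 'colour']

# ^ and $ anchor the pattern to the start and end of the string
print(bool(re.search(r"^\d+$", "12345")))        # True
print(bool(re.search(r"^\d+$", "123a")))         # False

# {m,n} bounds the number of repetitions
print(bool(re.fullmatch(r"\d{2,4}", "123")))     # True
print(bool(re.fullmatch(r"\d{2,4}", "12345")))   # False

# \. matches a literal dot instead of "any character"
print(re.findall(r"\d+\.\d+", "pi is 3.14, not 3"))  # ['3.14']

# \D matches non-digits, so deleting them leaves only the digits
print(re.sub(r"\D", "", "(123) 456-7890"))       # 1234567890
```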

Example Use Case: Matching Email Addresses

You can use regular expressions to validate or extract email addresses from text.

import re

pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
email = "test@example.com"

if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")

Output:

Valid email
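The same pattern can serve both purposes mentioned above. In this sketch, removing the ^ and $ anchors turns it into an extraction pattern for findall(), while validation uses re.fullmatch(), which requires the entire string to match and so makes the anchors unnecessary. The helper name is_valid_email is hypothetical, and the pattern is a simplified heuristic rather than a complete implementation of the email address grammar (RFC 5322), so treat it as an illustration.

```python
import re

# Same character classes as the validation pattern, without ^ and $
EMAIL = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

def is_valid_email(address: str) -> bool:
    # Hypothetical helper: fullmatch() plays the role of ^...$
    return re.fullmatch(EMAIL, address) is not None

for candidate in ["test@example.com", "user@domain", "user@domain.c", "no-at-sign.com"]:
    print(candidate, "->", is_valid_email(candidate))

# Extraction: findall() collects every address embedded in the text
text = "Contact alice@example.com or bob.smith@mail.example.org."
print(re.findall(EMAIL, text))  # ['alice@example.com', 'bob.smith@mail.example.org']
```

Only the first candidate is accepted: "user@domain" lacks a dot-separated top-level domain, "user@domain.c" has a one-letter one, and "no-at-sign.com" has no @.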

Example Use Case: Extracting Phone Numbers

Another common task is extracting phone numbers from a string using regular expressions.

import re

pattern = r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"  # Matches 10-digit numbers such as (123) 456-7890 or 987-654-3210
text = "You can reach me at (123) 456-7890 or 987-654-3210."

matches = re.findall(pattern, text)
print("Phone numbers found:", matches)

Output:

Phone numbers found: ['(123) 456-7890', '987-654-3210']
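Extracted numbers often need to be brought into one consistent format. One possible approach, sketched below, wraps the three digit blocks of the same pattern in capturing groups so that findall() returns tuples of digits, which are then joined with hyphens. The hyphenated output format is an arbitrary choice for illustration.

```python
import re

# Same pattern as above, with the three digit blocks captured as groups
pattern = r"\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})"
text = "You can reach me at (123) 456-7890 or 987-654-3210."

# With groups, findall() returns tuples such as ('123', '456', '7890')
normalized = ["-".join(parts) for parts in re.findall(pattern, text)]
print(normalized)  # ['123-456-7890', '987-654-3210']
```

Because each parenthesis is optional on its own, this pattern also accepts unbalanced input such as "(123 456-7890"; a stricter version would need alternation between the bracketed and unbracketed forms.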

Summary of Key Points:

  • The re module allows you to perform pattern matching on strings.
  • Functions like re.match(), re.search(), re.findall(), re.finditer(), re.sub(), and re.split() are commonly used to find, extract, replace, or split text based on regular expressions.
  • Regular expressions use special syntax to define patterns that can match specific characters, sequences, and groups in text.
  • Common use cases include validating data (such as emails or phone numbers), extracting information from strings, and performing replacements.

By mastering the re module, you can solve many text processing and string manipulation problems efficiently in Python.
