9.3.1 Pattern Matching with the re Module
Regular expressions (regex) let you search, match, and manipulate strings by describing the text you are looking for as a pattern. Python’s re module provides the functions that apply these patterns to strings.
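Every example in this section writes its pattern as a raw string (r"..."). The short sketch below shows why. In a raw string the backslash is kept as-is, so the regex engine receives sequences like \d and \b unchanged instead of Python interpreting them first. The sample text is illustrative only.

```python
import re

# In a raw string the backslash is kept literally: r"\d+" is the 3 characters \, d, +.
raw_pattern = r"\d+"
# Without the r prefix, the backslash itself must be escaped to reach the regex engine.
escaped_pattern = "\\d+"
print(raw_pattern == escaped_pattern)  # True

# Where it matters: in a normal string "\b" is a backspace character,
# while r"\b" is the regex word-boundary assertion.
print(re.findall(r"\bcat\b", "cat concat cat"))  # ['cat', 'cat']
print(re.findall("\bcat\b", "cat concat cat"))   # [] (searches for backspace characters)
```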
Key Functions in the re Module
The re module provides several functions to work with regular expressions and perform pattern matching. Here’s a summary of the most commonly used functions for pattern matching:
1. re.match()
The match() function attempts to match a pattern from the beginning of a string. If a match is found at the start, it returns a match object. If no match is found, it returns None.
```python
import re

pattern = r"hello"
text = "hello world"

result = re.match(pattern, text)
if result:
    print("Match found:", result.group())  # Returns 'hello'
else:
    print("No match")
```
Output:
Match found: hello
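The sketch below illustrates two consequences of anchoring at the start. A pattern that appears later in the string is not found by match(). A match on a prefix still counts as success. It also uses re.fullmatch(), a related re function that requires the pattern to cover the whole string.

```python
import re

text = "hello world"

# "world" is in the text, but not at position 0, so match() returns None.
print(re.match(r"world", text))  # None

# match() only needs the pattern to match a prefix of the string...
print(re.match(r"hello", text).group())  # hello

# ...whereas re.fullmatch() requires the pattern to cover the entire string.
print(re.fullmatch(r"hello", text))  # None
print(re.fullmatch(r"hello world", text).group())  # hello world
```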
2. re.search()
The search() function searches the entire string for the first occurrence of the pattern. Unlike match(), it doesn’t require the match to be at the start of the string.
```python
import re

pattern = r"world"
text = "hello world"

result = re.search(pattern, text)
if result:
    print("Search found:", result.group())  # Returns 'world'
else:
    print("No match")
```
Output:
Search found: world
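Because search() returns only the first occurrence, later matches in the same string are ignored. The sketch below shows this with an illustrative sample string. It also shows span(), which reports where the match sits, and the re.IGNORECASE flag.

```python
import re

text = "Order 12 shipped, order 345 pending."

# search() stops at the first match, even though "345" also matches.
first = re.search(r"\d+", text)
print(first.group())  # 12
print(first.span())   # (6, 8): start index and end index (exclusive)

# re.IGNORECASE lets "order" match "Order"; only the first occurrence is returned.
word = re.search(r"order", text, re.IGNORECASE)
print(word.group(), word.start())  # Order 0
```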
3. re.findall()
The findall() function returns all non-overlapping matches of the pattern in the string as a list of strings.
```python
import re

pattern = r"\d+"  # Matches all numbers
text = "The price is 100 dollars, and the discount is 20."

matches = re.findall(pattern, text)
print("Numbers found:", matches)  # Returns ['100', '20']
```
Output:
Numbers found: ['100', '20']
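The shape of findall()'s result depends on whether the pattern contains capture groups (parentheses). This sketch uses an illustrative key=value string to show all three cases:

```python
import re

text = "width=100, height=20"

# No groups: each list item is the whole match.
print(re.findall(r"\w+=\d+", text))      # ['width=100', 'height=20']

# One group: each list item is just that group's text.
print(re.findall(r"\w+=(\d+)", text))    # ['100', '20']

# Two or more groups: each list item is a tuple of the groups.
print(re.findall(r"(\w+)=(\d+)", text))  # [('width', '100'), ('height', '20')]
```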
4. re.finditer()
The finditer() function returns an iterator that yields a match object for each non-overlapping match. Unlike findall(), which returns plain strings, finditer() gives you match objects that carry more information about each match, such as its start and end positions.
```python
import re

pattern = r"\d+"
text = "There are 100 apples and 50 bananas."

matches = re.finditer(pattern, text)
for match in matches:
    print(f"Match: {match.group()}, Start: {match.start()}, End: {match.end()}")
```
Output:
Match: 100, Start: 10, End: 13
Match: 50, Start: 25, End: 27
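Since finditer() yields full match objects, you can use named groups (written (?P<name>...)) and read each match's captures by name. The sketch below does this with the same sample sentence. It builds a dictionary of counts, and the name inventory is purely illustrative.

```python
import re

text = "There are 100 apples and 50 bananas."

# Named groups make each match object self-describing.
pattern = r"(?P<count>\d+) (?P<fruit>\w+)"

inventory = {}
for match in re.finditer(pattern, text):
    # groupdict() maps each group name to the text it captured.
    fields = match.groupdict()
    inventory[fields["fruit"]] = int(fields["count"])

print(inventory)  # {'apples': 100, 'bananas': 50}
```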
5. re.sub()
The sub() function replaces every occurrence of the pattern with a replacement string and returns the resulting new string. The original string is left unchanged.
```python
import re

pattern = r"\d+"  # Matches numbers
text = "The total cost is 100 dollars."

new_text = re.sub(pattern, "X", text)
print("Updated text:", new_text)  # Returns 'The total cost is X dollars.'
```
Output:
Updated text: The total cost is X dollars.
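sub() has three other features that come up often. The count argument limits how many replacements are made. A backreference such as \1 inserts text captured by a group. A function can also be passed as the replacement to compute each one. A short sketch with an illustrative sentence:

```python
import re

text = "Item 3 costs 10 dollars."

# count=1 replaces only the first occurrence.
print(re.sub(r"\d+", "X", text, count=1))  # Item X costs 10 dollars.

# \1 in the replacement refers back to the first captured group.
print(re.sub(r"(\d+) dollars", r"$\1", text))  # Item 3 costs $10.

# A function receives each match object and returns its replacement string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
print(doubled)  # Item 6 costs 20 dollars.
```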
6. re.split()
The split() function splits the string by the occurrences of the pattern, returning a list of substrings.
```python
import re

pattern = r"\s+"  # Split by whitespace
text = "This is a test string."

result = re.split(pattern, text)
print("Split text:", result)  # Returns ['This', 'is', 'a', 'test', 'string.']
```
Output:
Split text: ['This', 'is', 'a', 'test', 'string.']
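Unlike str.split(), re.split() can split on several different delimiters at once. The sketch below shows this along with the maxsplit argument. It also shows what happens when the pattern contains a capturing group: the delimiters are kept in the result. The sample strings are illustrative.

```python
import re

text = "apples, pears;plums  ,figs"

# A character class splits on either delimiter; \s* absorbs surrounding spaces.
print(re.split(r"\s*[,;]\s*", text))  # ['apples', 'pears', 'plums', 'figs']

# maxsplit limits the number of splits; the remainder stays in the last item.
print(re.split(r"\s*[,;]\s*", text, maxsplit=1))  # ['apples', 'pears;plums  ,figs']

# A capturing group keeps the delimiters in the output list.
print(re.split(r"([,;])", "a,b;c"))  # ['a', ',', 'b', ';', 'c']
```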
Special Characters in Regular Expressions
- . (Dot): Matches any character except a newline (\n).
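- \d: Matches any decimal digit. For ASCII text this is [0-9]; in str patterns it also matches other Unicode digits unless the re.ASCII flag is used.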
- \D: Matches any character that is not a digit.
- \w: Matches any alphanumeric character (letters, digits, and underscore).
- \W: Matches any character that \w does not match (anything other than a letter, digit, or underscore).
- \s: Matches any whitespace character (space, tab, newline).
- \S: Matches any non-whitespace character.
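- ^: Matches the start of the string (and the start of each line when the re.MULTILINE flag is set).
- $: Matches the end of the string or just before a trailing newline (and the end of each line when re.MULTILINE is set).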
- []: A set of characters; matches any single character listed inside the brackets. Ranges such as [a-z] and [0-9] are allowed.
- |: Alternation, matches either the pattern on the left or the pattern on the right (similar to "OR").
- *: Matches zero or more repetitions of the preceding pattern.
- +: Matches one or more repetitions of the preceding pattern.
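- ?: Matches zero or one occurrence of the preceding pattern.
- {m,n}: Matches between m and n repetitions of the preceding pattern; {m} means exactly m, and {m,} means m or more.
- \: Escapes a special character so it is matched literally, for example \. matches a period and \( matches an opening parenthesis.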
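To see several of these characters in action, the sketch below checks each one against a small illustrative string. It uses re.fullmatch(), so every pattern must account for the whole string. The anchors ^ and $ are shown separately with re.search().

```python
import re

# Each (pattern, text) pair should match the entire text.
checks = [
    (r"c.t", "cat"),         # . matches any single character except newline
    (r"\d{2,4}", "2024"),    # \d is a digit; {2,4} repeats it 2 to 4 times
    (r"\w+", "user_42"),     # \w matches letters, digits and underscore
    (r"[aeiou]+", "eau"),    # [] matches one character from the set
    (r"cat|dog", "dog"),     # | matches either alternative
    (r"colou?r", "color"),   # ? makes the "u" optional
    (r"ab*c", "ac"),         # * allows zero "b" characters
    (r"ab+c", "abbbc"),      # + requires at least one "b"
]

for pattern, text in checks:
    print(pattern, bool(re.fullmatch(pattern, text)))  # each line ends in True

# ^ and $ anchor a search to the start and end of the string.
print(bool(re.search(r"^hello", "hello world")))  # True
print(bool(re.search(r"world$", "hello world")))  # True
print(bool(re.search(r"^world", "hello world")))  # False
```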
Example Use Case: Matching Email Addresses
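You can use regular expressions to validate or extract email addresses from text. The pattern below is a simplified check that accepts common address formats; it does not implement the full email address specification (RFC 5322).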
```python
import re

pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
email = "test@example.com"

if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")
```
Output:
Valid email
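The example above validates a single address. To extract addresses from a longer text, a minimal sketch can reuse the same character classes without the ^ and $ anchors and pass the pattern to findall(). The constant name EMAIL_PATTERN and the sample text are illustrative. The same simplifications as the validation pattern apply.

```python
import re

# Same pattern as above minus ^ and $, so it can match anywhere in the text.
EMAIL_PATTERN = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

text = "Contact alice@example.com or bob.smith@mail.example.org."
emails = re.findall(EMAIL_PATTERN, text)
print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
```

The trailing period in the sample sentence is not included, because the pattern must end with at least two letters after the last dot.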
Example Use Case: Extracting Phone Numbers
Another common task is extracting phone numbers from a string using regular expressions.
```python
import re

pattern = r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"  # Matches phone numbers
text = "You can reach me at (123) 456-7890 or 987-654-3210."

matches = re.findall(pattern, text)
print("Phone numbers found:", matches)
```
Output:
Phone numbers found: ['(123) 456-7890', '987-654-3210']
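Extracted numbers often need to be stored in one consistent format. A possible approach is sketched below. It wraps each block of digits in the phone pattern in a capture group, then rebuilds every match as XXX-XXX-XXXX. The constant name PHONE_PATTERN, the output format, and the sample text (which uses dots in the second number) are illustrative choices.

```python
import re

# The phone pattern from above, with a capture group around each block of digits.
PHONE_PATTERN = r"\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})"

text = "You can reach me at (123) 456-7890 or 987.654.3210."

# Rebuild every match from its three groups in a single format.
normalized = [
    f"{m.group(1)}-{m.group(2)}-{m.group(3)}"
    for m in re.finditer(PHONE_PATTERN, text)
]
print(normalized)  # ['123-456-7890', '987-654-3210']
```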
Summary of Key Points:
- The re module allows you to perform pattern matching on strings.
- Functions like re.match(), re.search(), re.findall(), and re.sub() are commonly used to find, extract, or manipulate strings based on regular expressions.
- Regular expressions use special syntax to define patterns that can match specific characters, sequences, and groups in text.
- Common use cases include validating data (such as emails or phone numbers), extracting information from strings, and performing replacements.
By mastering the re module, you can solve many text processing and string manipulation problems efficiently in Python.