9.3 Regular Expressions
Regular expressions (regex) are patterns used to match and manipulate strings. They let you search text for specific patterns and support tasks such as validating email addresses, extracting data, and replacing or modifying parts of strings.
In Python, the re module is used to work with regular expressions. It provides a set of functions for searching, matching, and manipulating strings based on specific patterns.
1. Introduction to Regular Expressions
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings using a specialized syntax. Common use cases include:
- Validating input (e.g., email addresses, phone numbers).
- Searching for patterns in large bodies of text.
- Extracting parts of a string, such as dates, names, or emails.
A regular expression consists of:
- Literal characters: Characters that match themselves (e.g., a, 1, @).
- Special characters: Characters that have a special meaning in regular expressions, such as . (dot), * (asterisk), + (plus), etc.
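To make the distinction concrete, here is a minimal sketch that contrasts an unescaped dot with an escaped one. The sample strings are made up for illustration, and re.fullmatch() and re.escape() are standard re functions that this section does not otherwise cover.

```python
import re

# An unescaped dot is a special character: it matches any single character.
print(bool(re.fullmatch(r"a.c", "abc")))    # True
print(bool(re.fullmatch(r"a.c", "a.c")))    # True

# Escaped with a backslash, the dot is a literal and matches only ".".
print(bool(re.fullmatch(r"a\.c", "abc")))   # False
print(bool(re.fullmatch(r"a\.c", "a.c")))   # True

# re.escape() escapes the special characters in arbitrary text for you.
escaped = re.escape("a.c")
print(escaped)                              # a\.c
```

Escaping matters most when part of a pattern comes from user input that should be matched literally.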
2. Basic Syntax of Regular Expressions
Here are some basic elements of a regular expression:
- . (dot): Matches any character except newline.
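- ^: Matches the start of a string (and the start of each line when the re.MULTILINE flag is set).
- $: Matches the end of a string, or just before a newline at the very end of the string.
- []: A set of characters; matches any single character inside the brackets.
- |: Alternation, matches one of several patterns.
- *: Matches 0 or more repetitions of the preceding element (a character, set, or group).
- +: Matches 1 or more repetitions of the preceding element.
- ?: Matches 0 or 1 occurrence of the preceding element.
- \d: Matches any digit. For string patterns in Python 3 this includes all Unicode decimal digits; with the re.ASCII flag it is equivalent to [0-9].
- \w: Matches any word character: letters, digits, and the underscore. With the re.ASCII flag it is equivalent to [a-zA-Z0-9_].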
- \s: Matches any whitespace character (spaces, tabs, newlines).
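The elements above can be tried out side by side. The following sketch runs one small pattern per element through re.findall(); the patterns and sample strings are illustrative choices, not part of the original lesson.

```python
import re

# Each entry pairs a pattern with a sample string (both made up for illustration).
samples = [
    (r"c.t", "cat cut c\nt"),       # . matches any character except newline
    (r"^Hello", "Hello world"),     # ^ anchors the match at the start
    (r"world$", "Hello world"),     # $ anchors the match at the end
    (r"[aeiou]", "regex"),          # a set matches one character from it
    (r"cat|dog", "hotdog"),         # alternation: either side may match
    (r"ab*", "a ab abbb"),          # * allows zero or more b's
    (r"ab+", "a ab abbb"),          # + requires at least one b
    (r"colou?r", "color colour"),   # ? makes the u optional
    (r"\d\s\w", "7 x"),             # digit, whitespace, word character
]

results = {pattern: re.findall(pattern, text) for pattern, text in samples}

for pattern, matches in results.items():
    print(pattern, "->", matches)
```

Comparing the ab* and ab+ results shows the difference between "zero or more" and "one or more": only ab* matches the lone a.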
3. Using the re Module
The re module in Python provides several functions to work with regular expressions. Here are some of the most commonly used functions:
3.1 re.match()
This function checks if the regular expression matches the beginning of a string.
import re
pattern = r"hello"
text = "hello world"
result = re.match(pattern, text)
if result:
    print("Match found:", result.group())
else:
    print("No match")
Output:
Match found: hello
3.2 re.search()
This function searches for the pattern anywhere in the string and returns the first match.
import re
pattern = r"world"
text = "hello world"
result = re.search(pattern, text)
if result:
    print("Search found:", result.group())
else:
    print("No match")
Output:
Search found: world
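The difference between re.match() and re.search() is easiest to see when both are given the same pattern. The sketch below reuses the sample text from above and also shows re.fullmatch(), a standard re function not covered in this section, which requires the pattern to cover the entire string.

```python
import re

text = "hello world"

# match() only succeeds if the pattern matches at position 0.
at_start = re.match(r"world", text)
# search() scans the whole string and returns the first match.
anywhere = re.search(r"world", text)
# fullmatch() requires the pattern to cover the entire string.
whole = re.fullmatch(r"hello", text)

print(at_start)          # None
print(anywhere.span())   # (6, 11)
print(whole)             # None
```

All three return None when there is no match, so check the result before calling group() or span() on it.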
3.3 re.findall()
This function finds all matches of the pattern in the string and returns them as a list.
import re
pattern = r"\d+"  # Find all numbers
text = "There are 10 apples and 5 bananas."
matches = re.findall(pattern, text)
print("Numbers found:", matches)
Output:
Numbers found: ['10', '5']
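Two related behaviours are worth knowing. When the pattern contains capturing groups, re.findall() returns the groups rather than the whole match, and re.finditer() yields match objects when positions are needed. The sketch below is an illustration on the same sample sentence; the variable names are arbitrary.

```python
import re

text = "There are 10 apples and 5 bananas."

# With capturing groups, findall() returns one tuple of groups per match.
pairs = re.findall(r"(\d+) (\w+)", text)
print(pairs)        # [('10', 'apples'), ('5', 'bananas')]

# finditer() yields match objects, which also expose the match position.
positions = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
print(positions)    # [('10', 10), ('5', 24)]
```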
3.4 re.sub()
This function is used to replace occurrences of the pattern with a specified string.
import re
pattern = r"\d+"  # Match all numbers
text = "There are 10 apples and 5 bananas."
result = re.sub(pattern, "X", text)
print("Replaced text:", result)
Output:
Replaced text: There are X apples and X bananas.
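The replacement does not have to be a fixed string. As a rough sketch of what else re.sub() accepts, the example below uses a backreference, a function as the replacement, and the count argument; the variable names are illustrative.

```python
import re

text = "There are 10 apples and 5 bananas."

# \1 in the replacement refers to the text captured by the first group.
tagged = re.sub(r"(\d+)", r"<\1>", text)
print(tagged)       # There are <10> apples and <5> bananas.

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
print(doubled)      # There are 20 apples and 10 bananas.

# count limits how many occurrences are replaced.
first_only = re.sub(r"\d+", "X", text, count=1)
print(first_only)   # There are X apples and 5 bananas.
```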
3.5 re.split()
This function splits the string based on the given pattern and returns a list.
import re
pattern = r"\s+"  # Split on one or more whitespace characters
text = "Hello world! How are you?"
result = re.split(pattern, text)
print("Split text:", result)
Output:
Split text: ['Hello', 'world!', 'How', 'are', 'you?']
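Two details of re.split() tend to surprise people: leading or trailing separators produce empty strings, and maxsplit limits the number of splits. The sketch below illustrates both with made-up input.

```python
import re

padded = "  Hello   world!\tHow are you?  "

# Separators at the edges produce empty strings at the ends of the list.
raw_parts = re.split(r"\s+", padded)
print(raw_parts)     # ['', 'Hello', 'world!', 'How', 'are', 'you?', '']

# Stripping the text first avoids the empty strings.
clean_parts = re.split(r"\s+", padded.strip())
print(clean_parts)   # ['Hello', 'world!', 'How', 'are', 'you?']

# maxsplit limits the number of splits; the remainder stays in one piece.
limited = re.split(r"\s+", "Hello world! How are you?", maxsplit=2)
print(limited)       # ['Hello', 'world!', 'How are you?']
```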
4. Common Use Cases of Regular Expressions
4.1 Email Validation
A common use case for regular expressions is validating email addresses.
import re
pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
email = "test@example.com"
if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")
Output:
Valid email
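One subtlety: $ also matches just before a trailing newline, so the re.match() version above accepts "test@example.com\n". A possible variant, sketched below, compiles the same pattern once and uses re.fullmatch() instead. The helper name is_valid_email and the candidate addresses are hypothetical, and the pattern remains a simplified check that does not cover every address the email standards allow.

```python
import re

# Same pattern as above, compiled once and without ^ and $,
# because fullmatch() already requires the whole string to match.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def is_valid_email(address):
    """Hypothetical helper: True if the whole string fits the pattern."""
    return EMAIL_RE.fullmatch(address) is not None


# Made-up candidates for illustration.
candidates = [
    "test@example.com",
    "user.name+tag@mail.example.org",
    "no-at-sign.example.com",
    "user@localhost",
    "test@example.com\n",
]

for candidate in candidates:
    print(repr(candidate), is_valid_email(candidate))
```

Only the first two candidates are accepted; the last one is rejected because of its trailing newline.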
4.2 Extracting Dates
Regular expressions can also be used to extract dates from text.
import re
pattern = r"\d{2}/\d{2}/\d{4}"  # Matches dates in the format DD/MM/YYYY
text = "The event is on 25/12/2024, and another one on 01/01/2025."
dates = re.findall(pattern, text)
print("Dates found:", dates)
Output:
Dates found: ['25/12/2024', '01/01/2025']
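The pattern only checks the shape of a date, so a string like 99/99/2024 would also match. One way to filter such cases, sketched here under the assumption that dates are in DD/MM/YYYY order, is to pass each match to datetime.strptime() from the standard library. The extra invalid date in the sample text was added for illustration.

```python
import re
from datetime import datetime

# \b keeps the pattern from matching inside longer runs of digits.
pattern = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
text = ("The event is on 25/12/2024, and another one on 01/01/2025. "
        "Ignore 99/99/2024.")

valid_dates = []
for match in pattern.finditer(text):
    try:
        # strptime() raises ValueError for impossible dates such as 99/99/2024.
        parsed = datetime.strptime(match.group(), "%d/%m/%Y").date()
    except ValueError:
        continue
    valid_dates.append(parsed)

print(valid_dates)   # [datetime.date(2024, 12, 25), datetime.date(2025, 1, 1)]
```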
4.3 Finding Phone Numbers
You can use regular expressions to identify phone numbers in various formats.
import re
pattern = r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"
text = "Call me at (123) 456-7890 or 987-654-3210."
phone_numbers = re.findall(pattern, text)
print("Phone numbers found:", phone_numbers)
Output:
Phone numbers found: ['(123) 456-7890', '987-654-3210']
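Once numbers are found, it is often useful to bring them into a single format. The sketch below adds capturing groups to the same pattern and rewrites each number as NNN-NNN-NNNN; that target format is an arbitrary choice for illustration.

```python
import re

# Same pattern as above, with the three digit runs captured as groups.
pattern = re.compile(r"\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})")
text = "Call me at (123) 456-7890 or 987-654-3210."

# With groups, findall() returns tuples that can be joined as needed.
normalized = ["-".join(parts) for parts in pattern.findall(text)]
print(normalized)   # ['123-456-7890', '987-654-3210']

# sub() with backreferences rewrites the numbers in place.
rewritten = pattern.sub(r"\1-\2-\3", text)
print(rewritten)    # Call me at 123-456-7890 or 987-654-3210.
```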
5. Regular Expressions Best Practices
- Use raw strings (r"pattern"): In a raw string, Python does not treat backslashes as escape characters, so you do not need to double them. For example, r"\d" is much more readable than "\\d".
- Be cautious with greedy matches: Quantifiers such as * and + are greedy by default (they match as much as possible). Append ? to make them non-greedy if necessary (e.g., .*?).
- Test regular expressions: Use online tools like regex101 to test and debug your regular expressions.
- Use specific patterns: Avoid overly broad patterns like .* unless necessary, as they can match more than intended and lead to inefficient matching.
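The practices above can be illustrated with a short sketch. The sample strings are made up; the first part compares greedy, non-greedy, and more specific patterns, and the second part shows what goes wrong without a raw string.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the string.
greedy = re.findall(r"<.*>", html)
print(greedy)     # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first ">" that allows a match.
lazy = re.findall(r"<.*?>", html)
print(lazy)       # ['<b>', '</b>', '<i>', '</i>']

# Specific: [^>]+ states directly that a tag contains no ">".
specific = re.findall(r"<[^>]+>", html)
print(specific)   # ['<b>', '</b>', '<i>', '</i>']

# Raw strings: without the r prefix, "\b" is a backspace character,
# not the regex word boundary, so the pattern silently fails to match.
with_raw = re.search(r"\bcat\b", "a cat")
without_raw = re.search("\bcat\b", "a cat")
print(with_raw is not None, without_raw is not None)   # True False
```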
6. Summary
- Regular Expressions are powerful tools for pattern matching in strings.
- The re module provides several functions like match(), search(), findall(), sub(), and split() to work with regular expressions.
- Regular expressions are commonly used for tasks like validation, searching, and extracting data.
- It's important to understand the syntax and use regular expressions efficiently to solve real-world problems.