Python Regular Expressions · PythonGuru Chinese Tutorial Series

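# Python Regular Expressions

> Original: [https://thepythonguru.com/python-regular-expression/](https://thepythonguru.com/python-regular-expression/)

* * *

Updated on January 7, 2020

* * *

Regular expressions are widely used for pattern matching. Python has built-in support for regular expressions. To use them, you need to import the `re` module.

```py
import re
```

Now you are ready to use regular expressions.

## The `search` method

* * *

`re.search()` finds the first occurrence of a pattern in a string.

**Syntax**: `re.search(pattern, string, flags=0)` (the `flags` argument is optional)

`re.search()` takes a pattern and a string. It returns a `match` object on success, or `None` if no match is found. The `match` object has a `group()` method that returns the matched text. You should write patterns as raw strings, that is, strings prefixed with `r`, like this:

```py
r'this \n'
```

In a raw string, Python does not process backslash escape sequences, so `\n` is not a newline character; it is just a backslash `\` followed by an `n`. This lets the backslash reach the regular expression engine unchanged.

```py
>>> import re
>>> s = "my number is 123"
>>> match = re.search(r'\d\d\d', s)
>>> match
<_sre.SRE_Match object; span=(13, 16), match='123'>
>>> match.group()
'123'
```

Above we used `\d\d\d` as the pattern. `\d` matches a single digit, so `\d\d\d` matches three consecutive digits such as `111`, `222` and `786`. It does not match `12`, which has only two digits. In a longer run of digits such as `1444`, `re.search()` matches only the first three digits, `144`.

## Basic patterns used in regular expressions

* * *

| Symbol | Description |
| --- | --- |
| `.` | Matches any character except a newline |
| `\w` | Matches any word character: a letter, a digit, or an underscore (`_`) |
| `\W` | Matches any non-word character |
| `\d` | Matches a single digit |
| `\D` | Matches a single character that is not a digit |
| `\s` | Matches any whitespace character, such as `\n`, `\t`, or a space |
| `\S` | Matches a single non-whitespace character |
| `[abc]` | Matches a single character in the set, that is, `a`, `b`, or `c` |
| `[^abc]` | Matches a single character other than `a`, `b`, and `c` |
| `[a-z]` | Matches a single character in the range `a` to `z` |
| `[a-zA-Z]` | Matches a single character in the range `a-z` or `A-Z` |
| `[0-9]` | Matches a single character in the range `0`-`9` |
| `^` | Anchors the match at the start of the string |
| `$` | Anchors the match at the end of the string |
| `+` | Matches one or more occurrences of the preceding element (greedy) |
| `*` | Matches zero or more occurrences of the preceding element (greedy) |

Here is another example:

```py
import re

s = "tim email is tim@somehost.com"
match = re.search(r'[\w.-]+@[\w.-]+', s)

# the above regular expression will match an email address
if match:
    print(match.group())
else:
    print("match not found")
```

Here we used the pattern `[\w.-]+@[\w.-]+` to match an email address. On success, `re.search()` returns a `match` object whose `group()` method returns the matched text.

## Group capturing

* * *

Group capturing lets you extract parts of the matched string. You create groups with parentheses `()`. Suppose that in the example above we want to extract the username and the host name from the email address. To do this, we put `()` around the username and host name parts of the pattern, as follows.

```py
match = re.search(r'([\w.-]+)@([\w.-]+)', s)
```

Note that the parentheses do not change what the pattern matches. If the match succeeds, `match.group(1)` contains the text matched by the first pair of parentheses and `match.group(2)` contains the text matched by the second pair.

```py
import re

s = "tim email is tim@somehost.com"
match = re.search(r'([\w.-]+)@([\w.-]+)', s)

if match:
    print(match.group())   ## tim@somehost.com (the whole match)
    print(match.group(1))  ## tim (the username, group 1)
    print(match.group(2))  ## somehost.com (the host, group 2)
```

## The `findall()` function

* * *

As you now know, `re.search()` finds only the first occurrence of a pattern. When you want every occurrence in the string, use `findall()`.

**Syntax**: `re.findall(pattern, string, flags=0)` (the `flags` argument is optional)

On success it returns all matches as a list of strings; otherwise it returns an empty list.

```py
import re

s = "Tim's phone numbers are 12345-41521 and 78963-85214"
match = re.findall(r'\d{5}', s)

if match:
    print(match)
```

**Expected output**:

```py
['12345', '41521', '78963', '85214']
```

You can also use group capturing with `findall()`. When the pattern contains more than one group, `findall()` returns a list of tuples, where each tuple holds the groups of one match. An example will make this clear.

```py
import re

s = "Tim's phone numbers are 12345-41521 and 78963-85214"
match = re.findall(r'(\d{5})-(\d{5})', s)
print(match)

for i in match:
    print()
    print(i)
    print("First group", i[0])
    print("Second group", i[1])
```

**Expected output**:

```py
[('12345', '41521'), ('78963', '85214')]

('12345', '41521')
First group 12345
Second group 41521

('78963', '85214')
First group 78963
Second group 85214
```

## Optional flags

* * *

Both `re.search()` and `re.findall()` accept an optional argument called flags. Flags modify the behavior of pattern matching.

| Flag | Description |
| --- | --- |
| `re.IGNORECASE` | Ignores the difference between uppercase and lowercase |
| `re.DOTALL` | Allows `.` to match a newline; by default `.` matches any character except a newline |
| `re.MULTILINE` | Allows `^` and `$` to match at the start and end of each line |

## Using `re.match()`

* * *

`re.match()` is very similar to `re.search()`. The difference is that it looks for a match only at the beginning of the string.

```py
import re

s = "python tuts"
match = re.match(r'py', s)

if match:
    print(match.group())
```

You can achieve the same thing with `re.search()` by prefixing the pattern with `^`.

```py
import re

s = "python tuts"
match = re.search(r'^py', s)

if match:
    print(match.group())
```

That covers the basics you need to get started with the `re` module.

* * *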
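The optional flags table above has no accompanying example, so here is a minimal sketch of how the three flags might be used. The sample string `text` and the variable names are made up for illustration; only `re.findall()`, `re.search()` and the flag constants come from the `re` module itself. Flags can be combined with the `|` operator.

```python
import re

# Hypothetical sample text with two lines, chosen only for this illustration
text = "Python\npython tuts"

# re.IGNORECASE: 'python' matches both 'Python' and 'python'
ignore_case = re.findall(r'python', text, re.IGNORECASE)
print(ignore_case)

# Without re.DOTALL, '.' cannot match the newline between the two words
no_dotall = re.search(r'Python.python', text)
print(no_dotall)

# With re.DOTALL, '.' also matches the newline
with_dotall = re.search(r'Python.python', text, re.DOTALL)
print(repr(with_dotall.group()))

# Without re.MULTILINE, '^' matches only at the very start of the string
single_line = re.findall(r'^py\w+', text, re.IGNORECASE)
print(single_line)

# With re.MULTILINE, '^' also matches right after each newline
multi_line = re.findall(r'^py\w+', text, re.IGNORECASE | re.MULTILINE)
print(multi_line)
```

Comparing the result with and without each flag is a quick way to check that a flag does what you expect on your own data.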