Syntax, Part 9: Regular Expressions in Python · My Python Booklet

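# [Regular Expressions in Python][1]

[1]: http://www.runoob.com/python/python-reg-expressions.html

[TOC]

Python provides the `re` module, which contains all of the regular expression functionality.

```python
import re

s = 'ABC\\-001'  # a normal Python string
# The corresponding regex string becomes 'ABC\-001', because Python string literals also use \ as an escape character.
# Using Python's r prefix is strongly recommended, so you don't have to think about escaping at all.
s = r'ABC\-001'  # 'ABC\-001'

# match() checks whether the pattern matches: on success it returns a Match object, otherwise None.
re.match(r'^\d{3}\-\d{3,8}$', '010-12345')  # <re.Match object; span=(0, 9), match='010-12345'>
re.match(r'^\d{3}\-\d{3,8}$', '010 12345')  # None

# Split a string on runs of one or more whitespace characters
re.split(r'\s+', 'a b   c')  # ['a', 'b', 'c']
# Split on runs of one or more commas and/or whitespace characters
re.split(r'[\s,]+', 'a,b, c d')  # ['a', 'b', 'c', 'd']

# findall() returns every substring the regex matches as a list, or an empty list if nothing matches
pattern = re.compile(r'\d+')  # find runs of digits
result1 = pattern.findall('runoob 123 google 456')  # ['123', '456']
```

## Grouping

As in other languages, group 0 is the whole match, group 1 is the first group, group 2 is the second group, and so on.

```python
m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
m.groups()  # ('010', '12345')
m.group(0)  # '010-12345'
m.group(1)  # '010'
m.group(2)  # '12345'
```

> Note! `match()` only tries to match the pattern at the start of the string; if it does not match at the start, `match()` returns `None`. To find a match anywhere in the string, use `search()`.

```python
m = re.match(r'^(\d{3})-(\d{3,8})$', 'z010-12345')  # None
n = re.search(r'(\d{3})-(\d{3,8})$', 'z010-12345')  # <re.Match object; span=(1, 10), match='010-12345'>
```

## Greedy matching

As in other languages, regex matching is greedy by default: each quantifier consumes as many characters as it can.

```python
re.match(r'^(\d+)(0*)$', '102300').groups()   # ('102300', '') greedy \d+ eats the trailing zeros, so the second group matches nothing
re.match(r'^(\d+?)(0*)$', '102300').groups()  # ('1023', '00') adding ? makes \d+ non-greedy
re.match(r'^(\d+?)$', '102300').groups()      # ('102300',) with nothing after it, the non-greedy group still has to reach $, so it matches everything
```

## Precompiling

When we use a regular expression in Python, the `re` module does two things internally:

1. Compile the regular expression; if the regex string itself is invalid, an error is raised.
2. Use the compiled regular expression to match the string.

If the same regex is used many times, compile it once in advance with `re.compile()` and reuse the resulting pattern object, so step 1 is not repeated on every call.

```python
re_telephone = re.compile(r'^(\d{3})-(\d{3,8})$')
re_telephone.match('010-12345').groups()  # ('010', '12345')
```

## Search and replace

`re.sub(pattern, repl, string)` replaces every match of `pattern` in `string` with `repl`.

```python
phone = "2004-959-559 # This is a foreign phone number"

# Remove the Python comment from the string
num = re.sub(r'#.*$', "", phone)
print(num)  # 2004-959-559
```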
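The `re.sub()` call in the search-and-replace example only strips the comment. Here is a small sketch of a few more things `re.sub()` can do, reusing that phone string and the `'runoob 123 google 456'` sample from the `findall()` example: dropping every non-digit with `\D`, reordering groups with `\1`/`\2` backreferences in the replacement, and passing a function as the replacement. The variable names are only illustrative.

```python
import re

phone = "2004-959-559 # This is a foreign phone number"

# 1. Remove the trailing comment (everything from '#' to the end of the string).
#    The space before '#' is not part of the match, so it stays.
no_comment = re.sub(r'#.*$', '', phone)

# 2. Remove every non-digit character; \D matches any character that is not a digit.
digits = re.sub(r'\D', '', no_comment)

# 3. Reorder capture groups: in the replacement, \2 and \1 refer to
#    the second and first groups of each match.
swapped = re.sub(r'(\d{3})-(\d{3,8})', r'\2-\1', '010-12345')

# 4. Use a function as the replacement: it receives each Match object
#    and returns the text to put in its place (here, the number doubled).
doubled = re.sub(r'\d+', lambda m: str(int(m.group(0)) * 2), 'runoob 123 google 456')

print(repr(no_comment))  # '2004-959-559 '
print(digits)            # 2004959559
print(swapped)           # 12345-010
print(doubled)           # runoob 246 google 912
```

A raw string (`r'\2-\1'`) is used for the replacement for the same reason as for patterns: otherwise Python would interpret the backslashes before `re` sees them.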