Chapter 10 Refactoring · Dive Into Python 3

# Chapter 10 Refactoring

> "After one has played a vast quantity of notes and more notes, it is simplicity that emerges as the crowning reward of art."
> -- [Frédéric Chopin](http://en.wikiquote.org/wiki/Fr%C3%A9d%C3%A9ric_Chopin)

## Diving In

Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by "bug"? A bug is a test case you haven't written yet.

```
>>> import roman7
>>> roman7.from_roman('')
0
```

1. This is a bug. An empty string should raise an `InvalidRomanNumeralError` exception, just like any other sequence of characters that is not a valid Roman numeral.

After reproducing the bug, and before fixing it, you should write a test case that fails, thus illustrating the bug.

```
class FromRomanBadInput(unittest.TestCase):
    .
    .
    .
    def test_blank(self):
        '''from_roman should fail with blank string'''
        self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, '')  # (1)
```

1. This part is pretty simple. Call `from_roman()` with an empty string and make sure it raises an `InvalidRomanNumeralError` exception. The hard part was finding the bug; now that you know about it, testing for it is the easy part.

Since the code has a bug, and there is now a test case that tests for this bug, the test case will fail:

```
you@localhost:~/diveintopython3/examples$ python3 romantest8.py -v
from_roman should fail with blank string ... FAIL
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok

======================================================================
FAIL: from_roman should fail with blank string
----------------------------------------------------------------------
Traceback (most recent call last):
  File "romantest8.py", line 117, in test_blank
    self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, '')
AssertionError: InvalidRomanNumeralError not raised by from_roman

----------------------------------------------------------------------
Ran 11 tests in 0.171s

FAILED (failures=1)
```

_Now_ you can fix the bug.

```
def from_roman(s):
    '''convert Roman numeral to integer'''
    if not s:                                                                  # (1)
        raise InvalidRomanNumeralError('Input can not be blank')
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: {}'.format(s))  # (2)

    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result
```

1. Only two lines of code are required: an explicit check for an empty string, and a `raise` statement.
2. This has not come up yet in this book, so let it serve as a final note on [string formatting](strings.html#formatting-strings). Starting in Python 3.1, you can omit the numbers when using positional indexes in a format specifier. That is, instead of using the format specifier `{0}` to refer to the first parameter of the `format()` method, you can simply use `{}` and Python will fill in the proper positional index. This works for any number of arguments; the first `{}` is `{0}`, the second `{}` is `{1}`, and so forth.

```
you@localhost:~/diveintopython3/examples$ python3 romantest8.py -v
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok

----------------------------------------------------------------------
Ran 11 tests in 0.156s

OK
```

1. The blank string test case now passes, so the bug is fixed.
2. All the other test cases still pass, which means that this bug fix didn't break anything else.

Stop coding.

Coding this way does not make fixing bugs any easier. Simple bugs (like this one) require simple test cases; complex bugs will require complex test cases. In a testing-centric environment, it may _seem_ like it takes longer to fix a bug, since you need to articulate in code exactly what the bug is (to write the test case), then fix the bug itself. Then, if the test case doesn't pass right away, you need to figure out whether the fix was wrong or whether the test case itself has a bug in it. In the long run, however, this back-and-forth between test code and tested code pays for itself, because it makes it more likely that bugs are fixed correctly the first time. Also, since you can easily re-run _all_ the test cases along with your new one, you are much less likely to break old code when fixing new code. Today's unit test is tomorrow's regression test.

## Handling Changing Requirements

Despite your best efforts to pin your customers down and extract exact requirements from them, however painful that process may be, requirements will change. Most customers don't know what they want until they see it, and even if they do, they aren't that good at articulating it clearly. And even if they are, they will want more in the next release anyway. So be prepared to update your test cases as requirements change.

Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Normally, no character in a Roman numeral can be repeated more than three times in a row. But the Romans were willing to make an exception to that rule by having 4 `M` characters in a row to represent `4000`. If you make this change, you will expand the range of convertible numbers from `1..3999` to `1..4999`. But first, you need to make some changes to the test cases.

```
class KnownValues(unittest.TestCase):
    known_values = ( (1, 'I'),
                      .
                      .
                      .
                     (3999, 'MMMCMXCIX'),
                     (4000, 'MMMM'),                                      # (1)
                     (4500, 'MMMMD'),
                     (4888, 'MMMMDCCCLXXXVIII'),
                     (4999, 'MMMMCMXCIX') )

class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        '''to_roman should fail with large input'''
        self.assertRaises(roman8.OutOfRangeError, roman8.to_roman, 5000)  # (2)

    .
    .
    .

class FromRomanBadInput(unittest.TestCase):
    def test_too_many_repeated_numerals(self):
        '''from_roman should fail with too many repeated numerals'''
        # (3) 'MMMMM' replaces 'MMMM'; the loop's other inputs are not shown in this excerpt
        for s in ('MMMMM',):
            self.assertRaises(roman8.InvalidRomanNumeralError, roman8.from_roman, s)

    .
    .
    .

class RoundtripCheck(unittest.TestCase):
    def test_roundtrip(self):
        '''from_roman(to_roman(n))==n for all n'''
        for integer in range(1, 5000):                                    # (4)
            numeral = roman8.to_roman(integer)
            result = roman8.from_roman(numeral)
            self.assertEqual(integer, result)
```

1. The existing known values don't change (they are all still reasonable values to test), but you need to add a few more in the `4000` range. Here I've included `4000` (the shortest), `4500` (the second shortest), `4888` (the longest), and `4999` (the largest).
2. The definition of "large input" has changed. This test used to call `to_roman()` with `4000` and expect an error; now that `4000-4999` are good values, you need to bump this up to `5000`.
3. The definition of "too many repeated numerals" has also changed. This test used to call `from_roman()` with `'MMMM'` and expect an error; now that `MMMM` is considered a valid Roman numeral, you need to bump this up to `'MMMMM'`.
4. The roundtrip check loops through every number in the range, which used to be `1` to `3999`. Since the range has now expanded, this `for` loop needs to be updated as well to go up to `4999`.

Now the test cases are up to date with the new requirements, but the code is not, so you should expect several of the test cases to fail.

```
you@localhost:~/diveintopython3/examples$ python3 romantest9.py -v
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with non-string input ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ERROR
to_roman should give known result with known input ... ERROR
from_roman(to_roman(n))==n for all n ... ERROR
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok

======================================================================
ERROR: from_roman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
  File "romantest9.py", line 82, in test_from_roman_known_values
    result = roman9.from_roman(numeral)
  File "C:\home\diveintopython3\examples\roman9.py", line 60, in from_roman
    raise InvalidRomanNumeralError('Invalid Roman numeral: {0}'.format(s))
roman9.InvalidRomanNumeralError: Invalid Roman numeral: MMMM

======================================================================
ERROR: to_roman should give known result with known input
----------------------------------------------------------------------
Traceback (most recent call last):
  File "romantest9.py", line 76, in test_to_roman_known_values
    result = roman9.to_roman(integer)
  File "C:\home\diveintopython3\examples\roman9.py", line 42, in to_roman
    raise OutOfRangeError('number out of range (must be 0..3999)')
roman9.OutOfRangeError: number out of range (must be 0..3999)

======================================================================
ERROR: from_roman(to_roman(n))==n for all n
----------------------------------------------------------------------
Traceback (most recent call last):
  File "romantest9.py", line 131, in testSanity
    numeral = roman9.to_roman(integer)
  File "C:\home\diveintopython3\examples\roman9.py", line 42, in to_roman
    raise OutOfRangeError('number out of range (must be 0..3999)')
roman9.OutOfRangeError: number out of range (must be 0..3999)
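
----------------------------------------------------------------------
Ran 12 tests in 0.171s

FAILED (errors=3)
```

1. The `from_roman()` known values test will fail as soon as it hits `'MMMM'`, because `from_roman()` still thinks this is an invalid Roman numeral.
2. The `to_roman()` known values test will fail as soon as it hits `4000`, because `to_roman()` still thinks this is out of range.
3. The roundtrip check (converting an integer to a Roman numeral and back again) will also fail as soon as it hits `4000`, because `to_roman()` still thinks this is out of range.

Now that you have test cases that fail because of the new requirements, you can think about fixing the code to bring it in line with the test cases. (When you first start writing unit tests, it can feel strange that the code being tested is never "ahead" of the test cases. While the code is behind, you still have work to do, and as soon as it catches up to the test cases, you stop coding. Once you get used to it, you will wonder how you ever programmed without tests.)

```
roman_numeral_pattern = re.compile('''
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 Ms  (1)
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 Cs),
                        #            or 500-800 (D, followed by 0 to 3 Cs)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 Xs),
                        #        or 50-80 (L, followed by 0 to 3 Xs)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 Is),
                        #        or 5-8 (V, followed by 0 to 3 Is)
    $                   # end of string
    ''', re.VERBOSE)

def to_roman(n):
    '''convert integer to Roman numeral'''
    if not (0 < n < 5000):                                              # (2)
        raise OutOfRangeError('number out of range (must be 1..4999)')
    if not isinstance(n, int):
        raise NotIntegerError('non-integers can not be converted')

    result = ''
    for numeral, integer in roman_numeral_map:
        while n >= integer:
            result += numeral
            n -= integer
    return result

def from_roman(s):
    .
    .
    .
```

1. You don't need to make any changes to the `from_roman()` function at all. The only change is to `roman_numeral_pattern`. If you look closely, you'll notice that I changed the maximum number of optional `M` characters from `3` to `4` in the first section of the regular expression. This allows the Roman numeral equivalents of `4999` instead of `3999`. The actual `from_roman()` function is completely generic; it just looks for repeated Roman numeral characters and adds them up, without caring how many times they repeat. The only reason it didn't handle `'MMMM'` before is that you explicitly stopped it with the regular expression pattern matching.
2. The `to_roman()` function only needs one small change, in the range check. Where you used to check `0 < n < 4000`, you now check `0 < n < 5000`. You also change the error message that you `raise` to reflect the new acceptable range (`1..4999` instead of `1..3999`). You don't need to make any changes to the rest of the function; it handles the new cases already. (It adds `'M'` for each thousand that it finds; given `4000`, it produces `'MMMM'`. The only reason it didn't do this before is that you explicitly stopped it with the range check.)

You may be skeptical that these two small changes are all that you need. Hey, don't take my word for it; see for yourself.

```
you@localhost:~/diveintopython3/examples$ python3 romantest9.py -v
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with non-string input ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok

----------------------------------------------------------------------
Ran 12 tests in 0.203s

OK
```

1. All the test cases pass. Stop coding.

Comprehensive unit testing means never having to rely on a programmer who says "Trust me."

## Refactoring

The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually _prove_ that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.

Refactoring is the process of taking working code and making it work better. Usually, "better" means "faster", although it can also mean "using less memory", "using less disk space", or simply "more elegantly". Whatever it means to you, to your project, in your environment, refactoring is important to the long-term health of any program.

Here, "better" means both "faster" and "easier to maintain". Specifically, the `from_roman()` function is slower and more complex than I'd like, because of that big nasty regular expression that you use to validate Roman numerals. Now, you might think, "Sure, the regular expression is big and hairy, but how else am I supposed to validate that an arbitrary string is a valid Roman numeral?"

Answer: there are only 5000 of them; why not just build a lookup table? This idea gets even better when you realize that _you don't need to use regular expressions at all_. As you build the lookup table for converting integers to Roman numerals, you can build the reverse lookup table to convert Roman numerals to integers. By the time you need to check whether an arbitrary string is a valid Roman numeral, you will have collected all the valid Roman numerals. "Validating" is reduced to a single dictionary lookup.

Best of all, you already have a complete set of unit tests. You can change over half the code in the module, but the unit tests will stay the same. That means you can prove, to yourself and to others, that the new code works just as well as the original.

```
class OutOfRangeError(ValueError): pass
class NotIntegerError(ValueError): pass
class InvalidRomanNumeralError(ValueError): pass

roman_numeral_map = (('M',  1000),
                     ('CM', 900),
                     ('D',  500),
                     ('CD', 400),
                     ('C',  100),
                     ('XC', 90),
                     ('L',  50),
                     ('XL', 40),
                     ('X',  10),
                     ('IX', 9),
                     ('V',  5),
                     ('IV', 4),
                     ('I',  1))

to_roman_table = [ None ]
from_roman_table = {}

def to_roman(n):
    '''convert integer to Roman numeral'''
    if not (0 < n < 5000):
        raise OutOfRangeError('number out of range (must be 1..4999)')
    if int(n) != n:
        raise NotIntegerError('non-integers can not be converted')
    return to_roman_table[n]

def from_roman(s):
    '''convert Roman numeral to integer'''
    if not isinstance(s, str):
        raise InvalidRomanNumeralError('Input must be a string')
    if not s:
        raise InvalidRomanNumeralError('Input can not be blank')
    if s not in from_roman_table:
        raise InvalidRomanNumeralError('Invalid Roman numeral: {0}'.format(s))
    return from_roman_table[s]

def build_lookup_tables():
    def to_roman(n):
        result = ''
        for numeral, integer in roman_numeral_map:
            if n >= integer:
                result = numeral
                n -= integer
                break
        if n > 0:
            result += to_roman_table[n]
        return result

    for integer in range(1, 5000):
        roman_numeral = to_roman(integer)
        to_roman_table.append(roman_numeral)
        from_roman_table[roman_numeral] = integer

build_lookup_tables()
```

Let's break that down into digestible pieces. Arguably, the most important line is the last one:

```
build_lookup_tables()
```

You will note that this is a function call, but there is no `if` statement around it. This is not an `if __name__ == '__main__'` block; it gets called _when the module is imported_. (It is important to understand that modules are only imported once, then cached. If you import an already-imported module, it does nothing. So this code will only run the first time you import this module.)

So what does the `build_lookup_tables()` function do? I'm glad you asked.

```
to_roman_table = [ None ]
from_roman_table = {}
.
.
.
def build_lookup_tables():
    def to_roman(n):                               # (1)
        result = ''
        for numeral, integer in roman_numeral_map:
            if n >= integer:
                result = numeral
                n -= integer
                break
        if n > 0:
            result += to_roman_table[n]
        return result

    for integer in range(1, 5000):
        roman_numeral = to_roman(integer)          # (2)
        to_roman_table.append(roman_numeral)       # (3)
        from_roman_table[roman_numeral] = integer
```

1. This is a clever bit of programming… perhaps too clever. The `to_roman()` function is defined above; it looks up values in the lookup table and returns them. But the `build_lookup_tables()` function redefines the `to_roman()` function to actually do work (like the previous examples did, before you added a lookup table). Within the `build_lookup_tables()` function, calling `to_roman()` will call this redefined version. Once the `build_lookup_tables()` function exits, the redefined version disappears; it is only defined in the local scope of the `build_lookup_tables()` function.
2. This line of code calls the redefined `to_roman()` function, which actually calculates the Roman numeral.
3. Once you have the result (from the redefined `to_roman()` function), you add the integer and its Roman numeral equivalent to both lookup tables.

Once the lookup tables are built, the rest of the code is both easy and fast.

```
def to_roman(n):
    '''convert integer to Roman numeral'''
    if not (0 < n < 5000):
        raise OutOfRangeError('number out of range (must be 1..4999)')
    if int(n) != n:
        raise NotIntegerError('non-integers can not be converted')
    return to_roman_table[n]                                            # (1)

def from_roman(s):
    '''convert Roman numeral to integer'''
    if not isinstance(s, str):
        raise InvalidRomanNumeralError('Input must be a string')
    if not s:
        raise InvalidRomanNumeralError('Input can not be blank')
    if s not in from_roman_table:
        raise InvalidRomanNumeralError('Invalid Roman numeral: {0}'.format(s))
    return from_roman_table[s]                                          # (2)
```

1. After doing the same bounds checking as before, the `to_roman()` function simply finds the appropriate value in the lookup table and returns it.
2. Similarly, the `from_roman()` function is reduced to some bounds checking and one line of code. No more regular expressions. No more looping. O(1) conversion to and from Roman numerals.

But does it work? Why yes, yes it does. And I can prove it.

```
you@localhost:~/diveintopython3/examples$ python3 romantest10.py -v
from_roman should fail with blank string ... ok
from_roman should fail with malformed antecedents ... ok
from_roman should fail with non-string input ... ok
from_roman should fail with repeated pairs of numerals ... ok
from_roman should fail with too many repeated numerals ... ok
from_roman should give known result with known input ... ok
to_roman should give known result with known input ... ok
from_roman(to_roman(n))==n for all n ... ok
to_roman should fail with negative input ... ok
to_roman should fail with non-integer input ... ok
to_roman should fail with large input ... ok
to_roman should fail with 0 input ... ok

----------------------------------------------------------------------

OK
```

1. Not only does it answer the question, it is fast, too! It appears to run roughly 10× as fast. Of course, this is not an entirely fair comparison, because this version takes longer to import (when it builds the lookup tables). But since the import is only done once, the startup cost is amortized over all the calls to the `to_roman()` and `from_roman()` functions. Since the tests make several thousand function calls (the roundtrip test alone makes roughly 10,000), the savings add up in a hurry!

The moral of the story?

* Simplicity is a virtue.
* Especially when regular expressions are involved.
* Unit tests can give you the confidence to do large-scale refactoring.

## Summary

Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility in any long-term project. It is also important to understand that unit testing is not a panacea, a magic problem solver, or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you have seen it work, you will wonder how you ever got along without it.

These few chapters have covered a lot of ground, and much of it wasn't even Python-specific. There are unit testing frameworks for many languages, all of which require you to understand the same basic concepts:

* Designing test cases that are specific, automated, and independent.
* Writing test cases _before_ the code they are testing.
* Writing tests that test good input and check for proper results.
* Writing tests that test "bad" input and check for proper failure responses.
* Writing and updating test cases to reflect new requirements.
* Refactoring mercilessly to improve performance, scalability, readability, maintainability, or whatever other -ility you are lacking.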
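
The last point, refactoring against a safety net, can be sketched in one self-contained script. This is a minimal illustration under stated assumptions, not the chapter's `roman10.py`: the names `to_roman_loop`, `to_roman_lookup`, `from_roman_lookup`, `check_equivalence`, and the upper-case table names are hypothetical, the range and type checks are omitted, and the tables are built with a plain list comprehension over the loop-based converter rather than with the nested-function trick shown earlier. It checks that the lookup-table version agrees with the loop-based version for every integer in `1..4999`, then times both with `timeit`. Timings vary by machine, so no output is claimed here.

```python
import timeit

# Same numeral map as in the chapter's listings.
ROMAN_NUMERAL_MAP = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                     ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                     ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))


def to_roman_loop(n):
    '''Loop-based conversion, as in the pre-refactoring code (checks omitted).'''
    result = ''
    for numeral, integer in ROMAN_NUMERAL_MAP:
        while n >= integer:
            result += numeral
            n -= integer
    return result


# Lookup tables built once, at import time (hypothetical names for this sketch).
# Index 0 is a placeholder so that TO_ROMAN_TABLE[n] is the numeral for n.
TO_ROMAN_TABLE = [None] + [to_roman_loop(i) for i in range(1, 5000)]
FROM_ROMAN_TABLE = {numeral: i
                    for i, numeral in enumerate(TO_ROMAN_TABLE) if numeral}


def to_roman_lookup(n):
    '''Refactored conversion: a single list lookup.'''
    return TO_ROMAN_TABLE[n]


def from_roman_lookup(s):
    '''Refactored reverse conversion: a single dictionary lookup.'''
    return FROM_ROMAN_TABLE[s]


def check_equivalence():
    '''Return True if both versions agree on 1..4999 and every value round-trips.'''
    return all(
        to_roman_lookup(i) == to_roman_loop(i)
        and from_roman_lookup(to_roman_lookup(i)) == i
        for i in range(1, 5000)
    )


if __name__ == '__main__':
    print('equivalent:', check_equivalence())
    loop_time = timeit.timeit(
        lambda: [to_roman_loop(i) for i in range(1, 5000)], number=20)
    lookup_time = timeit.timeit(
        lambda: [to_roman_lookup(i) for i in range(1, 5000)], number=20)
    print('loop: {:.4f}s  lookup: {:.4f}s'.format(loop_time, lookup_time))
```

The equivalence check plays the role of the regression suite: if a refactoring changed any result, it would return `False` before the timing numbers could mislead anyone.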