14.5. roman.py, stage 5 · Dive Into Python

# 14.5. `roman.py`, stage 5

Now that `fromRoman` works properly with valid input, it is time to fit in the last piece of the puzzle: making it work properly with invalid input. That means finding a way to check whether a string is a valid Roman numeral. This is harder than [validating numeric input](stage_3.html "14.3. roman.py, stage 3") in `toRoman`, but you have a powerful tool at your disposal: regular expressions.

If you are not familiar with regular expressions and have not read [Chapter 7, _Regular Expressions_](../regular_expressions/index.html "Chapter 7. Regular Expressions"), now would be a good time.

As you saw in [Section 7.3, “Case Study: Roman Numerals”](../regular_expressions/roman_numerals.html "7.3. Case Study: Roman Numerals"), there are several simple rules for constructing a Roman numeral from the letters `M`, `D`, `C`, `L`, `X`, `V`, and `I`. Let's review them:

1. Characters are additive. `I` is `1`, `II` is `2`, and `III` is `3`. `VI` is `6` (literally, “`5` and `1`”), `VII` is `7`, and `VIII` is `8`.
2. The tens characters (`I`, `X`, `C`, and `M`) can be repeated up to three times. At `4`, you need to subtract from the next-highest fives character. You cannot represent `4` as `IIII`; instead, it is written as `IV` (“`1` less than `5`”). `40` is written as `XL` (“`10` less than `50`”), `41` as `XLI`, `42` as `XLII`, `43` as `XLIII`, and `44` as `XLIV` (“`10` less than `50`, then `1` less than `5`”).
3. Similarly, at `9`, you need to subtract from the next-highest tens character: `8` is `VIII`, but `9` is `IX` (“`1` less than `10`”), not `VIIII` (since `I` cannot be repeated four times). `90` is `XC`, and `900` is `CM`.
4. The fives characters cannot be repeated. `10` is always written as `X`, never as `VV`. `100` is always `C`, never `LL`.
5. Roman numerals are written from highest to lowest and read from left to right, so the order of the characters matters a great deal. `DC` is `600`; `CD` is a completely different number (`400`, “`100` less than `500`”). `CI` is `101`; `IC` is not a valid Roman numeral at all (because you cannot subtract `1` directly from `100`; you would need to write `XCIX`, meaning “`10` less than `100`, then `9`, which is `1` less than `10`”).

## Example 14.12. `roman5.py`

This file is available in `py/roman/stage5/` in the examples directory. If you have not already done so, you can [download this and other examples](http://www.woodpecker.org.cn/diveintopython/download/diveintopython-exampleszh-cn-5.4b.zip "Download example scripts") used in this book.

```
"""Convert to and from Roman numerals"""
import re

#Define exceptions
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass

#Define digit mapping
romanNumeralMap = (('M',  1000),
                   ('CM', 900),
                   ('D',  500),
                   ('CD', 400),
                   ('C',  100),
                   ('XC', 90),
                   ('L',  50),
                   ('XL', 40),
                   ('X',  10),
                   ('IX', 9),
                   ('V',  5),
                   ('IV', 4),
                   ('I',  1))

def toRoman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 4000):
        raise OutOfRangeError("number out of range (must be 1..3999)")
    if int(n) != n:
        raise NotIntegerError("non-integers can not be converted")

    result = ""
    for numeral, integer in romanNumeralMap:
        while n >= integer:
            result += numeral
            n -= integer
    return result

#Define pattern to detect valid Roman numerals
romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'  # [1]

def fromRoman(s):
    """convert Roman numeral to integer"""
    if not re.search(romanNumeralPattern, s):  # [2]
        raise InvalidRomanNumeralError('Invalid Roman numeral: %s' % s)

    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result
```

| | |
| --- | --- |
| \[1\] | This is just a continuation of the pattern discussed in [Section 7.3, “Case Study: Roman Numerals”](../regular_expressions/roman_numerals.html "7.3. Case Study: Roman Numerals"). The tens place is either `XC` (`90`), `XL` (`40`), or an optional `L` followed by 0 to 3 `X` characters. The ones place is either `IX` (`9`), `IV` (`4`), or an optional `V` followed by 0 to 3 `I` characters. |
| \[2\] | With all of that logic encoded in the regular expression, the code that checks for invalid Roman numerals becomes trivial. If `re.search` returns an object, the regular expression matched and the input is valid; otherwise, the input is invalid. |
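To see how the pattern behaves on its own, here is a minimal sketch that runs it against a few strings taken from the rules reviewed earlier. The helper name `is_valid_roman` and the sample list are assumptions made for illustration; they are not part of `roman5.py`.

```python
import re

# Same pattern as romanNumeralPattern in roman5.py.
roman_numeral_pattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

def is_valid_roman(s):
    """Hypothetical helper: True if s matches the pattern, False otherwise."""
    return re.search(roman_numeral_pattern, s) is not None

# Two well-formed numerals, then four strings that break the rules.
samples = ['MCMXCIX', 'XLIV', 'IIII', 'VV', 'IC', 'MCMC']
for s in samples:
    print('%-8s %s' % (s, is_valid_roman(s)))
```

Only `MCMXCIX` and `XLIV` are accepted. `IIII` and `VV` break the repetition rules, while `IC` and `MCMC` break the ordering rules.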
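At this point, you are allowed to be skeptical that this big, ugly regular expression could possibly catch every kind of invalid Roman numeral. But don't take my word for it; look at the results:

## Example 14.13. Output of `romantest5.py` against `roman5.py`

```
fromRoman should only accept uppercase input ... ok
toRoman should always return uppercase ... ok
fromRoman should fail with malformed antecedents ... ok
fromRoman should fail with repeated pairs of numerals ... ok
fromRoman should fail with too many repeated numerals ... ok
fromRoman should give known result with known input ... ok
toRoman should give known result with known input ... ok
fromRoman(toRoman(n))==n for all n ... ok
toRoman should fail with non-integer input ... ok
toRoman should fail with negative input ... ok
toRoman should fail with large input ... ok
toRoman should fail with 0 input ... ok

----------------------------------------------------------------------
Ran 12 tests in 2.864s

OK
```

| | |
| --- | --- |
| \[1\] | One thing I have not mentioned is that regular expressions are case-sensitive by default. Since `romanNumeralPattern` is written entirely in uppercase, `re.search` rejects any input that is not completely uppercase. So the uppercase input test passes. |
| \[2\] | More importantly, the invalid input tests pass. For instance, the malformed antecedents test checks cases like `MCMC`. As you have seen, this does not match the regular expression, so `fromRoman` raises the `InvalidRomanNumeralError` exception that the test case is looking for, and the test passes. |
| \[3\] | In fact, all of the invalid input tests pass. The regular expression catches every case you could think of when you wrote the test cases. |
| \[4\] | And the anticlimax award of the year goes to the word “`OK`”, which the `unittest` module prints when all of the tests pass. |

> Note
>
> When all of your tests pass, stop coding.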
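The invalid-input checks reported in the test run can be sketched as a small, self-contained test case. This is an illustration under stated assumptions, not a copy of `romantest5.py`: the class name, the method names, and the short input lists are chosen here for brevity, and the exceptions are raised with the call form `raise Error(message)`.

```python
import re
import unittest

class InvalidRomanNumeralError(Exception):
    pass

# Pattern and digit mapping as in roman5.py.
romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'
romanNumeralMap = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                   ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                   ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

def fromRoman(s):
    """convert Roman numeral to integer"""
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: %s' % s)
    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result

class FromRomanBadInputSketch(unittest.TestCase):
    # Illustrative inputs only; the lists are deliberately short.
    def testTooManyRepeatedNumerals(self):
        """fromRoman should fail with too many repeated numerals"""
        for s in ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII'):
            self.assertRaises(InvalidRomanNumeralError, fromRoman, s)

    def testMalformedAntecedent(self):
        """fromRoman should fail with malformed antecedents"""
        for s in ('MCMC', 'IC', 'VX'):
            self.assertRaises(InvalidRomanNumeralError, fromRoman, s)

    def testUppercaseOnly(self):
        """fromRoman should only accept uppercase input"""
        for s in ('mcmxcix', 'Xliv', 'iv'):
            self.assertRaises(InvalidRomanNumeralError, fromRoman, s)

# Run the suite explicitly so the result object can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FromRomanBadInputSketch)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

Each test passes only if `fromRoman` raises `InvalidRomanNumeralError` for every string in its list, which is the behavior the regular expression check provides.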