Chapter 14 Files · Think Python 2e (Chinese edition)

# Chapter 14 Files

This chapter introduces the idea of “persistent” programs that keep data in permanent storage, and shows how to use different kinds of permanent storage, like files and databases.

## 14.1 Persistence

Most of the programs we have seen so far are transient in the sense that they run for a short time and produce some output, but when they end, their data disappears. If you run the program again, it starts with a clean slate.

Other programs are persistent: they run for a long time (or all the time); they keep at least some of their data in permanent storage (a hard drive, for example); and if they shut down and restart, they pick up where they left off.

Examples of persistent programs are operating systems, which run pretty much whenever a computer is on, and web servers, which run all the time, waiting for requests to come in on the network.

One of the simplest ways for programs to maintain their data is by reading and writing text files. We have already seen programs that read text files; in this chapter we will see programs that write them.

An alternative is to store the state of the program in a database. In this chapter I will present a simple database and a module, pickle, that makes it easy to store program data.

## 14.2 Reading and writing

A text file is a sequence of characters stored on a permanent medium like a hard drive, flash memory, or CD-ROM. We saw how to open and read a file in Section 9.1.

To write a file, you have to open it with mode 'w' as a second parameter:

> Translator's note: 'w' stands for “write”.

```Python
>>> fout = open('output.txt', 'w')
```

If the file already exists, opening it in write mode clears out the old data and starts fresh, so be careful! If the file doesn’t exist, a new one is created.

open returns a file object that provides methods for working with the file. The write method puts data into the file.

```Python
>>> line1 = "This here's the wattle,\n"
>>> fout.write(line1)
24
```

The return value is the number of characters that were written. The file object keeps track of where it is, so if you call write again, it adds the new data to the end of the file.

```Python
>>> line2 = "the emblem of our land.\n"
>>> fout.write(line2)
24
```

When you are done writing, you should close the file.

```Python
>>> fout.close()
```

If you don’t close the file, it gets closed for you when the program ends.

## 14.3 Format operator

The argument of write has to be a string, so if we want to put other values in a file, we have to convert them to strings. The easiest way to do that is with str:

```Python
>>> x = 52
>>> fout.write(str(x))
```

An alternative is to use the format operator, %. When applied to integers, % is the modulus operator. But when the first operand is a string, % is the format operator.
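
The two meanings of % can be seen side by side in a minimal sketch. The variable names `remainder` and `formatted` are made up for this illustration, and the '%d' format sequence it uses is explained in the paragraphs that follow.

```python
# With two integers, % computes the remainder (modulus).
remainder = 7 % 3

# With a string as the first operand, % formats the second operand
# according to the format sequence in the string.
formatted = '%d' % 7

print(remainder)        # prints 1
print(repr(formatted))  # prints '7' (a string, not an integer)
```
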

The first operand is the format string, which contains one or more format sequences, which specify how the second operand is formatted. The result is a string.

For example, the format sequence '%d' means that the second operand should be formatted as a decimal integer:

```Python
>>> camels = 42
>>> '%d' % camels
'42'
```

The result is the string '42', which is not to be confused with the integer value 42.

A format sequence can appear anywhere in the string, so you can embed a value in a sentence:

```Python
>>> 'I have spotted %d camels.' % camels
'I have spotted 42 camels.'
```

If there is more than one format sequence in the string, the second argument has to be a tuple. Each format sequence is matched with an element of the tuple, in order.

The following example uses '%d' to format an integer, '%g' to format a floating-point number, and '%s' to format a string:

```Python
>>> 'In %d years I have spotted %g %s.' % (3, 0.1, 'camels')
'In 3 years I have spotted 0.1 camels.'
```

The number of elements in the tuple has to match the number of format sequences in the string.
Also, the types of the elements have to match the format sequences:

```Python
>>> '%d %d %d' % (1, 2)
TypeError: not enough arguments for format string
>>> '%d' % 'dollars'
TypeError: %d format: a number is required, not str
```

In the first example, there aren’t enough elements; in the second, the element is the wrong type.

For more information on the format operator, see the [Python documentation on printf-style string formatting](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting). A more powerful alternative is the string format method, which you can read about in the [documentation for str.format](https://docs.python.org/3/library/stdtypes.html#str.format).

## 14.4 Filenames and paths

Files are organized into directories (also called “folders”). Every running program has a “current directory”, which is the default directory for most operations. For example, when you open a file for reading, Python looks for it in the current directory.

The os module provides functions for working with files and directories (“os” stands for “operating system”). os.getcwd returns the name of the current directory:

```Python
>>> import os
>>> cwd = os.getcwd()
>>> cwd
'/home/dinsdale'
```

cwd stands for “current working directory”. The result in this example is /home/dinsdale, which is the home directory of a user named dinsdale.
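
To see that a bare filename really is resolved against the current directory, here is a small sketch. It relies on things this section has not introduced and that are assumptions of the illustration: the tempfile module, os.chdir (which changes the current directory), os.listdir, and the with statement. The names `original_cwd` and `found` are made up for the example.

```python
import os
import tempfile

original_cwd = os.getcwd()          # remember where we started

with tempfile.TemporaryDirectory() as tmpdir:
    os.chdir(tmpdir)                # make the temporary directory current
    try:
        # A bare filename is created in, and looked up in, the current directory.
        with open('memo.txt', 'w') as fout:
            fout.write('hello\n')
        found = 'memo.txt' in os.listdir(os.getcwd())
    finally:
        os.chdir(original_cwd)      # always restore the original directory

print(found)  # prints True
```
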

A string like '/home/dinsdale' that identifies a file or directory is called a path.

A simple filename, like memo.txt, is also considered a path, but it is a relative path because it relates to the current directory. If the current directory is /home/dinsdale, the filename memo.txt would refer to /home/dinsdale/memo.txt.

A path that begins with / does not depend on the current directory; it is called an absolute path. To find the absolute path to a file, you can use os.path.abspath:

```Python
>>> os.path.abspath('memo.txt')
'/home/dinsdale/memo.txt'
```

os.path provides other functions for working with filenames and paths. For example, os.path.exists checks whether a file or directory exists:

```Python
>>> os.path.exists('memo.txt')
True
```

If it exists, os.path.isdir checks whether it’s a directory:

```Python
>>> os.path.isdir('memo.txt')
False
>>> os.path.isdir('/home/dinsdale')
True
```

Similarly, os.path.isfile checks whether it’s a file.

os.listdir returns a list of the files (and other directories) in the given directory:

```Python
>>> os.listdir(cwd)
['music', 'photos', 'memo.txt']
```

To demonstrate these functions, the following example “walks” through a directory, prints the names of all the files, and calls itself recursively on all the directories.

```Python
import os

def walk(dirname):
    # Visit every entry in dirname.
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)

        if os.path.isfile(path):
            print(path)    # a file: print its full path
        else:
            walk(path)     # a directory: recurse into it
```

os.path.join takes a directory and a file name and joins them into a complete path.

The os module provides a function called walk that is similar to this one but more versatile. As an exercise, read the documentation and use it to print the names of the files in a given directory and its subdirectories. You can download my solution from [thinkpython2.com/code/walk.py](http://thinkpython2.com/code/walk.py).

## 14.5 Catching exceptions

A lot of things can go wrong when you try to read and write files. If you try to open a file that doesn’t exist, you get a FileNotFoundError (an IOError in Python 2):

```Python
>>> fin = open('bad_file')
FileNotFoundError: [Errno 2] No such file or directory: 'bad_file'
```

If you don’t have permission to access a file, you get a PermissionError:

```Python
>>> fout = open('/etc/passwd', 'w')
PermissionError: [Errno 13] Permission denied: '/etc/passwd'
```

And if you try to open a directory for reading, you get an IsADirectoryError:

```Python
>>> fin = open('/home')
IsADirectoryError: [Errno 21] Is a directory: '/home'
```

To avoid these errors, you could use functions like os.path.exists and os.path.isfile, but it would take a lot of time and code to check all the possibilities (if “Errno 21” is any indication, there are at least 21 things that can go wrong).
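
To give a feel for how quickly the check-first approach grows, here is a rough sketch that covers only the three failures shown above. The function name `check_before_open` and its messages are made up for this illustration, and os.access with os.R_OK is a standard-library call that this chapter does not otherwise use.

```python
import os

def check_before_open(path):
    """Return a short reason why path cannot be read, or None if no problem was found."""
    if not os.path.exists(path):
        return 'does not exist'
    if os.path.isdir(path):
        return 'is a directory'
    if not os.access(path, os.R_OK):
        return 'is not readable'
    return None

# A path inside a directory that (we assume) does not exist.
missing = check_before_open(os.path.join('no_such_directory', 'bad_file'))
# The current directory exists, but it is not a file.
folder = check_before_open(os.getcwd())

print(missing)
print(folder)
```

Even this version is incomplete: the file could be removed or changed between the check and the call to open, and many other errors are not covered.
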

It is better to go ahead and try, and deal with problems if they happen, which is exactly what the try statement does. The syntax is similar to an if...else statement:

```Python
try:
    fin = open('bad_file')
except:
    print('Something went wrong.')
```

Python starts by executing the try clause. If all goes well, it skips the except clause and proceeds. If an exception occurs, it jumps out of the try clause and runs the except clause.

Handling an exception with a try statement is called catching an exception. In this example, the except clause prints an error message that is not very helpful. In general, catching an exception gives you a chance to fix the problem, or try again, or at least end the program gracefully.

## 14.6 Databases

A database is a file that is organized for storing data. Many databases are organized like a dictionary in the sense that they map from keys to values. The biggest difference between a database and a dictionary is that the database is on disk (or other permanent storage), so it persists after the program ends.

> Translator's note: the author has greatly simplified the concept of a database here to make it easier to understand. Real databases differ considerably from dictionaries in type, schema and functionality: there are relational and non-relational databases, distributed and single-file databases, and so on. For readers who want to learn more, the translator recommends SQLite Python Tutorial.

The module dbm provides an interface for creating and updating database files. As an example, I’ll create a database that contains captions for image files.
Opening a database is similar to opening other files:

```Python
>>> import dbm
>>> db = dbm.open('captions', 'c')
```

The mode 'c' means that the database should be created if it doesn’t already exist. The result is a database object that can be used (for most operations) like a dictionary.

When you create a new item, dbm updates the database file.

```Python
>>> db['cleese.png'] = 'Photo of John Cleese.'
```

When you access one of the items, dbm reads the file:

```Python
>>> db['cleese.png']
b'Photo of John Cleese.'
```

The result is a bytes object, which is why it begins with b. A bytes object is similar to a string in many ways. When you get farther into Python, the difference becomes important, but for now we can ignore it.

If you make another assignment to an existing key, dbm replaces the old value:

```Python
>>> db['cleese.png'] = 'Photo of John Cleese doing a silly walk.'
>>> db['cleese.png']
b'Photo of John Cleese doing a silly walk.'
```

Some dictionary methods, like items, are not supported by every kind of database object. But iteration with a for loop works:

```Python
for key in db:
    print(key, db[key])
```

As with other files, you should close the database when you are done:

```Python
>>> db.close()
```

## 14.7 Pickling

A limitation of dbm is that the keys and values have to be strings or bytes.
If you try to use any other type, you get an error.

The pickle module can help. It translates almost any type of object into a string suitable for storage in a database, and then translates strings back into objects.

pickle.dumps takes an object as a parameter and returns a string representation, which in Python 3 is a bytes object (dumps is short for “dump string”):

```Python
>>> import pickle
>>> t = [1, 2, 3]
>>> pickle.dumps(t)
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
```

The format isn’t obvious to human readers; it is meant to be easy for pickle to interpret. pickle.loads (“load string”) reconstitutes the object:

```Python
>>> t1 = [1, 2, 3]
>>> s = pickle.dumps(t1)
>>> t2 = pickle.loads(s)
>>> t2
[1, 2, 3]
```

Although the new object has the same value as the old, it is not (in general) the same object:

```Python
>>> t1 == t2
True
>>> t1 is t2
False
```

In other words, pickling and then unpickling has the same effect as copying the object.

You can use pickle to store non-strings in a database. In fact, this combination is so common that it has been encapsulated in a module called shelve.

## 14.8 Pipes

Most operating systems provide a command-line interface, also known as a shell. Shells usually provide commands to navigate the file system and launch applications. For example, in Unix you can change directories with cd, display the contents of a directory with ls, and launch a web browser by typing (for example) firefox.

Any program that you can launch from the shell can also be launched from Python using a pipe object, which represents a running program.

> Translator's note: a pipe can be thought of as the channel through which Python communicates with the operating system's shell.

For example, the Unix command ls -l normally displays the contents of the current directory in long format. You can launch ls with os.popen:

```Python
>>> cmd = 'ls -l'
>>> fp = os.popen(cmd)
```

The argument is a string that contains a shell command. The return value is an object that behaves like an open file. You can read the output from the ls process one line at a time with readline or get the whole thing at once with read:

```Python
>>> res = fp.read()
```

When you are done, you close the pipe like a file:

```Python
>>> stat = fp.close()
>>> print(stat)
None
```

The return value is the final status of the ls process; None means that it ended normally (with no errors).

For example, most Unix systems provide a command called md5sum that reads the contents of a file and computes a “checksum”. You can read about MD5 on [Wikipedia](http://en.wikipedia.org/wiki/Md5).

This command provides an efficient way to check whether two files have the same contents. The probability that different contents yield the same checksum is very small (that is, unlikely to happen before the universe collapses).
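
The idea of comparing files by checksum can also be tried without the md5sum command. The sketch below uses the standard-library hashlib module instead; that substitution is an assumption of this illustration, not the approach this section takes. The helper `md5_of_file` and the temporary file names are made up for the example.

```python
import hashlib
import os
import tempfile

def md5_of_file(path):
    """Return the hexadecimal MD5 digest of the file's contents."""
    digest = hashlib.md5()
    with open(path, 'rb') as fin:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: fin.read(8192), b''):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmpdir:
    first = os.path.join(tmpdir, 'a.txt')
    second = os.path.join(tmpdir, 'b.txt')
    third = os.path.join(tmpdir, 'c.txt')
    for path, text in [(first, 'same\n'), (second, 'same\n'), (third, 'different\n')]:
        with open(path, 'w') as fout:
            fout.write(text)

    same = md5_of_file(first) == md5_of_file(second)    # identical contents
    differ = md5_of_file(first) != md5_of_file(third)   # different contents

print(same, differ)  # prints True True
```
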

You can use a pipe to run md5sum from Python and get the result:

```Python
>>> filename = 'book.tex'
>>> cmd = 'md5sum ' + filename
>>> fp = os.popen(cmd)
>>> res = fp.read()
>>> stat = fp.close()
>>> print(res)
1e0033f0ed0656636de0d75144ba32e0  book.tex
>>> print(stat)
None
```

## 14.9 Writing modules

Any file that contains Python code can be imported as a module. For example, suppose you have a file named wc.py with the following code:

```Python
def linecount(filename):
    count = 0
    for line in open(filename):
        count += 1
    return count

print(linecount('wc.py'))
```

If you run this program, it reads itself and prints the number of lines in the file, which is 7. You can also import it like this:

```Python
>>> import wc
7
```

Now you have a module object wc:

```Python
>>> wc
<module 'wc' from 'wc.py'>
```

The module object provides linecount:

```Python
>>> wc.linecount('wc.py')
7
```

So that’s how you write modules in Python.

The only problem with this example is that when you import the module it runs the test code at the bottom. Normally when you import a module, it defines new functions but it doesn’t run them.

Programs that will be imported as modules often use the following idiom:

```Python
if __name__ == '__main__':
    print(linecount('wc.py'))
```

\_\_name\_\_ is a built-in variable that is set when the program starts.
If the program is running as a script, \_\_name\_\_ has the value '\_\_main\_\_'; in that case, the test code runs. Otherwise, if the module is being imported, the test code is skipped.

As an exercise, type this example into a file named wc.py and run it as a script. Then run the Python interpreter and import wc. What is the value of \_\_name\_\_ when the module is being imported?

Warning: If you import a module that has already been imported, Python does nothing. It does not re-read the file, even if it has changed.

If you want to reload a module, you can use the function reload (a built-in function in Python 2; in Python 3 it is importlib.reload), but it can be tricky, so the safest thing to do is restart the interpreter and then import the module again.

## 14.10 Debugging

When you are reading and writing files, you might run into problems with whitespace. These errors can be hard to debug because spaces, tabs and newlines are normally invisible:

```Python
>>> s = '1 2\t 3\n 4'
>>> print(s)
1 2      3
 4
```

The built-in function repr can help. It takes any object as an argument and returns a string representation of the object. For strings, it represents whitespace characters with backslash sequences:

```Python
>>> print(repr(s))
'1 2\t 3\n 4'
```

This can be helpful for debugging.

One other problem you might run into is that different systems use different characters to indicate the end of a line. Some systems use a newline, represented \n.
Others use a return character, represented \r. Some use both. If you move files between different systems, these inconsistencies can cause problems.

For most systems, there are applications to convert from one format to another. You can find them (and read more about this issue) on [Wikipedia](http://en.wikipedia.org/wiki/Newline). Or, of course, you could write one yourself.

> Translator's note: the translator also encourages readers to try writing small tools like this themselves when time and energy allow; it is good practice and a way to become more familiar with the language. Another recommended book is Automate the Boring Stuff with, by Al Sweigart, which shows how to carry out many common tasks in Python.

## 14.11 Glossary

persistent: Pertaining to a program that runs indefinitely and keeps at least some of its data in permanent storage.

format operator: An operator, %, that takes a format string and a tuple and generates a string that includes the elements of the tuple formatted as specified by the format string.

format string: A string, used with the format operator, that contains format sequences.

format sequence: A sequence of characters in a format string, like %d, that specifies how a value should be formatted.

text file: A sequence of characters stored in permanent storage like a hard drive.

directory: A named collection of files, also called a folder.

path: A string that identifies a file.

relative path: A path that starts from the current directory.

absolute path: A path that starts from the topmost directory in the file system.

catch: To prevent an exception from terminating a program using the try and except statements.

database: A file whose contents are organized like a dictionary with keys that correspond to values.

bytes object: An object similar to a string.

shell: A program that allows users to type commands and then executes them by starting other programs.

pipe object: An object that represents a running program, allowing a Python program to run commands and read the results.

## 14.12 Exercises

### Exercise 1

Write a function called sed that takes as arguments a pattern string, a replacement string, and two filenames; it should read the first file and write the contents into the second file (creating it if necessary). If the pattern string appears anywhere in the file, it should be replaced with the replacement string.

If an error occurs while opening, reading, writing or closing files, your program should catch the exception, print an error message, and exit. [Solution](http://thinkpython2.com/code/sed.py).

### Exercise 2

If you download my solution to Exercise 12.2 from [thinkpython2.com/code/anagram_sets.py](http://thinkpython2.com/code/anagram_sets.py), you’ll see that it creates a dictionary that maps from a sorted string of letters to the list of words that can be spelled with those letters. For example, 'opst' maps to the list ['opts', 'post', 'pots', 'spot', 'stop', 'tops'].

Write a module that imports anagram_sets and provides two new functions: store_anagrams should store the anagram dictionary in a “shelf”; read_anagrams should look up a word and return a list of its anagrams. [Solution](http://thinkpython2.com/code/anagram_db.py).

### Exercise 3

In a large collection of MP3 files, there may be more than one copy of the same song, stored in different directories or with different file names. The goal of this exercise is to search for duplicates.

1. Write a program that searches a directory and all of its subdirectories, recursively, and returns a list of complete paths for all files with a given suffix (like .mp3). Hint: os.path provides several useful functions for manipulating file and path names.
2. To recognize duplicates, you can use md5sum to compute a “checksum” for each file. If two files have the same checksum, they probably have the same contents.
3. To double-check, you can use the Unix command diff.

[Solution](http://thinkpython2.com/code/find_duplicates.py).

* * *

Note 1: popen is deprecated now, which means we are supposed to stop using it and start using the subprocess module. But for simple cases, I find subprocess more complicated than necessary. So I am going to keep using popen until they take it away.
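
For readers who want to try the subprocess module that the note mentions, here is a minimal sketch of running a command and reading its output. It is an illustration, not the method this chapter uses. The child command is the Python interpreter itself, an assumption made only so the example runs on any system; `capture_output` and `text` require Python 3.7 or later.

```python
import subprocess
import sys

# Run a child process and capture what it prints. The child here is the
# Python interpreter itself, chosen only so the sketch is portable.
result = subprocess.run(
    [sys.executable, '-c', "print('hello from a child process')"],
    capture_output=True,   # collect stdout and stderr instead of showing them
    text=True,             # decode the output as str rather than bytes
)

res = result.stdout        # plays the role of fp.read() in the popen examples
stat = result.returncode   # 0 means the command ended normally

print(res, end='')
print(stat)
```

Unlike the close method of a pipe, which returns None on success, subprocess.run reports success as a return code of 0.
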