Files (2) · Learning Python from Scratch (2nd Edition)

In the previous section we got a first look at files. Keep in mind that a file is simply one more type of data.

## File status

Quite often we need some of a file's status information (also called its attributes), such as its creation time, access time, modification time, and size. The `os` module provides a function, `os.stat()`, made specifically for inspecting these values.

~~~
>>> import os
>>> file_stat = os.stat("131.txt")    # look up the status of this file
>>> file_stat                         # many of the field names can be guessed from the English words
posix.stat_result(st_mode=33204, st_ino=5772566L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=69L, st_atime=1407897031, st_mtime=1407734600, st_ctime=1407734600)
>>> file_stat.st_ctime                # on Unix: time of the last metadata change; on Windows: creation time
1407734600.0882277
~~~

What kind of time is that? It is unreadable! Don't worry: the value is a count of seconds since the epoch, and there is another way to look at it. Python has a module, `time`, designed specifically for working with time.

~~~
>>> import time
>>> time.localtime(file_stat.st_ctime)    # now it is readable
time.struct_time(tm_year=2014, tm_mon=8, tm_mday=11, tm_hour=13, tm_min=23, tm_sec=20, tm_wday=0, tm_yday=223, tm_isdst=0)
~~~

## read/readline/readlines

The previous section briefly showed how to read the contents of a file. However, `dir(file)` lists three functions: read, readline, and readlines. What is special about each of them, and why are there three? Wouldn't one be enough?

Before reading on, think for a moment: what method would you use to answer this question? Note that I am asking how to find the answer, not what the answer is. The answer is certainly written down somewhere; the key is knowing how to find it. Searching the web? That is a good method. There is another one, used in interactive mode, which you have surely thought of as well.

~~~
>>> help(file.read)
~~~

In this way you can get the description of each of the three functions:

~~~
read(...)
    read([size]) -> read at most size bytes, returned as a string.

    If the size argument is negative or omitted, read until EOF is reached.
    Notice that when in non-blocking mode, less data than what was requested
    may be returned, even if no size parameter was given.

readline(...)
    readline([size]) -> next line from the file, as a string.

    Retain newline.  A non-negative size argument limits the maximum
    number of bytes to return (an incomplete line may be returned then).
    Return an empty string at EOF.

readlines(...)
    readlines([size]) -> list of strings, each a line from the file.

    Call readline() repeatedly and return a list of the lines so read.
    The optional size argument, if given, is an approximate bound on the
    total number of bytes in the lines returned.
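~~~

Compare the descriptions above and the similarities and differences among the three become apparent.

What does EOF mean? End-of-file. [Wikipedia](http://en.wikipedia.org/wiki/End-of-file) even has an entry explaining it:

~~~
In computing, End Of File (commonly abbreviated EOF) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream. In general, the EOF is either determined when the reader returns null as seen in Java's BufferedReader, or sometimes people will manually insert an EOF character of their choosing to signal when the file has ended.
~~~

With EOF understood, let's compare the three:

* read: if the argument size is given, it reads at most that much from the file; otherwise it reads the whole file. Everything that is read is packed into a single string. The advantage is that all the content is in memory, ready to use at any time, which is fast. That strength is also its weakness: if the file is too large, memory cannot cope. The documentation also warns about "non-blocking" mode; that issue is not the focus of this section, so we will not discuss it for now.
* readline: the optional argument size means the same as above. It returns a string one line at a time, that is, each call reads one line. If size is not limited, you can keep calling it until it returns an empty string, which means the end of the file (EOF) has been reached.
* readlines: size is as above. It returns a list whose elements are lines. In other words, it is equivalent to calling `readline()` repeatedly, putting each line's string into a list as an element, and finally returning that list.

A demonstration of each in turn will make this clear. Suppose there is a document named you.md whose content and basic layout are as follows:

> You Raise Me Up
> When I am down and, oh my soul, so weary;
> When troubles come and my heart burdened be;
> Then, I am still and wait here in the silence,
> Until you come and sit awhile with me.
> You raise me up, so I can stand on mountains;
> You raise me up, to walk on stormy seas;
> I am strong, when I am on your shoulders;
> You raise me up: To more than I can be.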
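
The optional `size` argument mentioned in the comparison above is easy to try out. The following is a minimal sketch, not part of the original tutorial: it writes the first three lines of the song to a temporary file, and the names `SAMPLE`, `demo_size_argument`, and `results` are made up for this illustration. It uses `print()` as a function and a `with` block (which closes the file automatically) so that it should run on Python 3 as well as Python 2.

```python
from __future__ import print_function  # keeps print() working on Python 2 as well

import os
import tempfile

# Hypothetical sample data: the first three lines of you.md.
SAMPLE = (
    "You Raise Me Up\n"
    "When I am down and, oh my soul, so weary;\n"
    "When troubles come and my heart burdened be;\n"
)


def demo_size_argument(path):
    """Collect what read/readline/readlines return when size is used."""
    results = {}
    with open(path) as f:                        # closed automatically on exit
        results["read_3"] = f.read(3)            # at most 3 characters
        results["rest_of_line"] = f.readline()   # continues where read(3) stopped
        results["readline_4"] = f.readline(4)    # may return an incomplete line
        results["remaining"] = f.readlines()     # everything that is left, as a list
        results["at_eof"] = f.readline()         # empty string signals EOF
    return results


# Write the sample to a temporary file so the sketch is self-contained.
fd, tmp_path = tempfile.mkstemp(suffix=".md")
with os.fdopen(fd, "w") as tmp:
    tmp.write(SAMPLE)

results = demo_size_argument(tmp_path)
os.remove(tmp_path)

for name in ("read_3", "rest_of_line", "readline_4", "remaining", "at_eof"):
    print(name, "->", repr(results[name]))
```

Run as a script, this should show that `read(3)` returns `'You'`, the following `readline()` returns the rest of that line, `readline(4)` returns the incomplete line `'When'`, `readlines()` returns the two remaining pieces as a list, and the final `readline()` returns an empty string because the file is exhausted.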

Now read this file with each of the three functions in turn.

~~~
>>> f = open("you.md")
>>> content = f.read()
>>> content
'You Raise Me Up\nWhen I am down and, oh my soul, so weary;\nWhen troubles come and my heart burdened be;\nThen, I am still and wait here in the silence,\nUntil you come and sit awhile with me.\nYou raise me up, so I can stand on mountains;\nYou raise me up, to walk on stormy seas;\nI am strong, when I am on your shoulders;\nYou raise me up: To more than I can be.\n'
>>> f.close()
~~~

**Tip: build a good habit.** Whenever you have opened a file and no longer need it, close it right away. If you do not, it keeps occupying memory and an operating-system file handle even though nothing operates on it any more, which is a waste. It also puts the file's data at risk: for a file opened for writing, buffered data may not reach the disk until the file is closed.

> Note: in Python, '\n' means a newline, which is also the UNIX convention. Windows, however, uses '\r\n' for a newline. On Windows, when a file is read in text mode, '\r\n' is converted to '\n' automatically.

Look closely: the result is one big string, and that string contains some `\n` characters because the original text has line breaks. If you output the string with print, it looks like this, with each `\n` taking effect.

~~~
>>> print content
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.

~~~

Reading with `readline()` looks like this:

~~~
>>> f = open("you.md")
>>> f.readline()
'You Raise Me Up\n'
>>> f.readline()
'When I am down and, oh my soul, so weary;\n'
>>> f.readline()
'When troubles come and my heart burdened be;\n'
>>> f.close()
~~~

The file is read line by line: each call to `f.readline()` reads one line and moves the pointer down one line, over and over. This is clearly a repetitive process, so a loop statement can be used to read the whole text.

~~~
#!/usr/bin/env python
# coding=utf-8

f = open("you.md")

while True:
    line = f.readline()
    if not line:        # at EOF readline() returns an empty string, so end the loop
        break
    print line,         # the trailing comma suppresses the '\n' that print adds; the newline from the file is kept

f.close()               # don't forget to close the file
~~~

Save it in the same directory as "you.md" (I named the file 12701.py), then run `python 12701.py` in that directory and you will see the following:

~~~
~/Documents$ python 12701.py
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.
~~~

The same file can also be read with `readlines()`:

~~~
>>> f = open("you.md")
>>> content = f.readlines()
>>> content
['You Raise Me Up\n', 'When I am down and, oh my soul, so weary;\n', 'When troubles come and my heart burdened be;\n', 'Then, I am still and wait here in the silence,\n', 'Until you come and sit awhile with me.\n', 'You raise me up, so I can stand on mountains;\n', 'You raise me up, to walk on stormy seas;\n', 'I am strong, when I am on your shoulders;\n', 'You raise me up: To more than I can be.\n']
~~~

It returns a list. Each element of the list is a string, and each string holds one line of the file, including the newline character at the end. Obviously, it can be looped over with for.

~~~
>>> for line in content:
...     print line,
...
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.
>>> f.close()
~~~

## Reading very large files

As explained earlier, if a file is too large, `read()` or `readlines()` should not be used to load the entire content into memory at once. A while loop with `readline()` can do the job instead. There is also another way: the fileinput module.

~~~
>>> import fileinput
>>> for line in fileinput.input("you.md"):
...     print line,
...
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.
~~~

I rather like this one: it is handy, concise, and it uses a plain for loop. To learn more about this module, you can explore it yourself in interactive mode with `dir()` and `help()`.

There is one more way, which is even more commonly used:

~~~
>>> f = open("you.md")    # reopen the file, since it was closed above
>>> for line in f:
...     print line,
...
You Raise Me Up
When I am down and, oh my soul, so weary;
When troubles come and my heart burdened be;
Then, I am still and wait here in the silence,
Until you come and sit awhile with me.
You raise me up, so I can stand on mountains;
You raise me up, to walk on stormy seas;
I am strong, when I am on your shoulders;
You raise me up: To more than I can be.
>>> f.close()
~~~

This works because a file object is an iterable data type, so for can iterate over it directly, one line at a time, without loading the whole file into memory.

## seek

The job of this function is to move the file pointer. Note in particular that it moves in units of bytes. For example:

~~~
>>> f = open("you.md")
>>> f.readline()
'You Raise Me Up\n'
>>> f.readline()
'When I am down and, oh my soul, so weary;\n'
~~~

The pointer is now at the end of the second line. Let's see what `seek()` can do:

~~~
>>> f.seek(0)
~~~

The intent is to go back to the very beginning of the file, so `f.readline()` should now read the first line.

~~~
>>> f.readline()
'You Raise Me Up\n'
~~~

Indeed it does. The current position of the pointer can also be displayed with `tell()`, for example:

~~~
>>> f.tell()
16L
>>> f.seek(4)
~~~

`f.seek(4)` sets the position to just after the fourth byte counted from the beginning, that is, after "You " and before the letter "R".

~~~
>>> f.tell()
4L
~~~

`tell()` agrees. If `readline()` is used now, it returns everything from the current position to the end of the line.

~~~
>>> f.readline()
'Raise Me Up\n'
>>> f.close()
~~~

`seek()` accepts one more argument. The details are as follows:

> seek(...) seek(offset[, whence]) -> None. Move to new file position.
>
> Argument offset is a byte count. Optional argument whence defaults to 0 (offset from start of file, offset should be >= 0); other values are 1 (move relative to current position, positive or negative), and 2 (move relative to end of file, usually negative, although many platforms allow seeking beyond the end of a file). If the file is opened in text mode, only offsets returned by tell() are legal. Use of other offsets causes undefined behavior. Note that not all file objects are seekable.

The values of whence:

* The default is 0, which means the offset is counted from the beginning of the file. In this case offset must be an integer greater than or equal to 0.
* When it is 1, the offset is counted from the current position. A negative offset moves backward (toward the beginning of the file) from the current position; a positive offset moves forward (toward the end).
* When it is 2, the move is relative to the end of the file, so offset is usually negative.
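
The three whence values can be compared in a short sketch. It is an illustration under assumptions rather than code from the original text: the file is a temporary one holding the first two lines of you.md, and the names `SAMPLE`, `positions`, `word`, `tail`, and `size` are invented here. The file is opened in binary mode (`"rb"`) because, as the help text above notes, arbitrary offsets are not reliable in text mode (Python 3 refuses nonzero seeks relative to the current position or the end on text files). `os.SEEK_SET`, `os.SEEK_CUR`, and `os.SEEK_END` are the standard named constants for 0, 1, and 2.

```python
from __future__ import print_function  # keeps print() working on Python 2 as well

import os
import tempfile

# Hypothetical sample data: the first two lines of you.md, as bytes.
SAMPLE = b"You Raise Me Up\nWhen I am down and, oh my soul, so weary;\n"

# Write the sample to a temporary file so the sketch is self-contained.
fd, tmp_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as tmp:
    tmp.write(SAMPLE)

positions = {}
with open(tmp_path, "rb") as f:
    f.seek(4, os.SEEK_SET)              # whence=0: 4 bytes from the start
    positions["from_start"] = f.tell()

    f.seek(6, os.SEEK_CUR)              # whence=1: 6 bytes forward from here
    positions["forward"] = f.tell()

    f.seek(-6, os.SEEK_CUR)             # whence=1: 6 bytes back again
    positions["backward"] = f.tell()
    word = f.read(5)                    # the 5 bytes that follow "You "

    f.seek(-7, os.SEEK_END)             # whence=2: 7 bytes before the end
    tail = f.read()                     # everything from there to EOF

    f.seek(0, os.SEEK_END)              # jump to the very end
    size = f.tell()                     # the position equals the file size

os.remove(tmp_path)

print(positions)
print(word, tail, size)
```

The positions recorded should be 4, 10, and 4 in that order, `word` should hold the bytes of "Raise", and `tail` the last seven bytes of the file. Seeking to offset 0 relative to the end and then calling `tell()` is a common way to obtain a file's size without reading it.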