17.1 Text Processing · Thinking in Java

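# 17.1 Text processing

If you come from a C or C++ background, you may at first be skeptical of Java's ability to handle text. The biggest fear is that it will be so slow that it gets in the way of your creativity. However, Java's tools for the job (especially the `String` class) are quite powerful, as the examples in this section show, and performance has also improved to some degree.

As you'll see, these examples were written to solve problems that came up while this book was being produced. They are not limited to that, though: with simple modifications they can be put to good use elsewhere. They also reveal a Java feature that has not been emphasized earlier in this book.

## 17.1.1 Extracting code listings

You have no doubt noticed that every complete code listing in this book (as opposed to a code fragment) begins and ends with special comment markers (`//:` and `///:~`). This meta-information is included so that the code can be extracted automatically from the book into compilable source files. For my previous book, I designed a system that automatically merged tested code files into the book. For this book, however, I found it easier to paste the code into the book once it had passed its initial testing. And since code rarely compiles correctly the first time, I edit it inside the book. But then how do you extract and test it? This program is the key. It is also worth studying if you ever need to solve a text-processing problem, and it demonstrates many of the features of the `String` class.

I first save the entire book as a single file in ASCII text format. The `CodePackager` program has two modes of operation (described in `usageString`). If you use the `-p` flag, the program expects an input file containing ASCII text (the contents of the book). It walks through that file, extracts the code using the comment markers, and uses the file name on the first line of each listing to decide what to call the file it creates. In addition, it looks for a `package` statement in case the file must be placed in a special directory (chosen according to the path specified by the `package` statement).

That's not all, though. The program also tracks package names in order to notice when the chapter changes. Every package used in a chapter starts with `c02`, `c03`, `c04`, and so on, to mark the chapter it belongs to (packages starting with `com` are the exception; they are ignored when tracking chapters). As long as the first listing in each chapter contains a `package` statement, `CodePackager` can tell when the chapter changes and put all subsequent files into the new subdirectory.

As each file is extracted, it is placed into a `SourceCodeFile` object, which is then placed into a collection (this process is described in detail later). These `SourceCodeFile` objects can simply be stored in a file, which is the second use of this project. If you invoke `CodePackager` directly, without the `-p` flag, it expects a "packed" file as input, which it then extracts (unpacks) into separate files. So the `-p` flag means that the extracted files are to be "packed" into this single file.

But why go to the trouble of a packed file at all? Because different computer platforms store text in files in different ways. The biggest problem is how line endings are represented, though other differences may exist as well. Java, however, has a special kind of I/O stream, `DataOutputStream`, which guarantees that data written with it, on whatever machine, can be read back correctly by a `DataInputStream`. In other words, Java takes care of all the platform-specific details, which is one of Java's most attractive features. So the `-p` flag stores everything in a single file in a universal format. You can download this file and the Java program from the Web and then run `CodePackager` on the file without the `-p` flag; the files will be released into the correct places on your system. (You can also name an alternate subdirectory; otherwise the subdirectories are created under the current directory.) To make sure that no platform-specific formatting is left behind, `File` objects are used wherever a file or path is described. There is also one extra safety check: an empty file is placed in each subdirectory, and the name of that file tells you how many files you should find in that subdirectory.

Here is the complete code, followed by a detailed explanation:

```
//: CodePackager.java
// "Packs" and "unpacks" the code in "Thinking
// in Java" for cross-platform distribution.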
/* Commented so CodePackager sees it and starts
   a new chapter directory, but so you don't
   have to worry about the directory where this
   program lives:
package c17;
*/
import java.util.*;
import java.io.*;

class Pr {
  static void error(String e) {
    System.err.println("ERROR: " + e);
    System.exit(1);
  }
}

class IO {
  static BufferedReader disOpen(File f) {
    BufferedReader in = null;
    try {
      in = new BufferedReader(
        new FileReader(f));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static BufferedReader disOpen(String fname) {
    return disOpen(new File(fname));
  }
  static DataOutputStream dosOpen(File f) {
    DataOutputStream in = null;
    try {
      in = new DataOutputStream(
        new BufferedOutputStream(
          new FileOutputStream(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static DataOutputStream dosOpen(String fname) {
    return dosOpen(new File(fname));
  }
  static PrintWriter psOpen(File f) {
    PrintWriter in = null;
    try {
      in = new PrintWriter(
        new BufferedWriter(
          new FileWriter(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static PrintWriter psOpen(String fname) {
    return psOpen(new File(fname));
  }
  static void close(Writer os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(DataOutputStream os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(Reader os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
}

class SourceCodeFile {
  public static final String
    startMarker = "//:", // Start of source file
    endMarker = "} ///:~", // End of source
    endMarker2 = "}; ///:~", // C++ file end
    beginContinue = "} ///:Continued",
    endContinue = "///:Continuing",
    packMarker = "###", // Packed file header tag
    eol = // Line separator on current system
      System.getProperty("line.separator"),
    filesep = // System's file path separator
      System.getProperty("file.separator");
  public static String copyright = "";
  static {
    try {
      BufferedReader cr =
        new BufferedReader(
          new FileReader("Copyright.txt"));
      String crin;
      while((crin = cr.readLine()) != null)
        copyright += crin + "\n";
      cr.close();
    } catch(Exception e) {
      copyright = "";
    }
  }
  private String filename, dirname,
    contents = new String();
  private static String chapter = "c02";
  // The file name separator from the old system:
  public static String oldsep;
  public String toString() {
    return dirname + filesep + filename;
  }
  // Constructor for parsing from document file:
  public SourceCodeFile(String firstLine,
      BufferedReader in) {
    dirname = chapter;
    // Skip past marker:
    filename = firstLine.substring(
        startMarker.length()).trim();
    // Find space that terminates file name:
    if(filename.indexOf(' ') != -1)
      filename = filename.substring(
          0, filename.indexOf(' '));
    System.out.println("found: " + filename);
    contents = firstLine + eol;
    if(copyright.length() != 0)
      contents += copyright + eol;
    String s;
    boolean foundEndMarker = false;
    try {
      while((s = in.readLine()) != null) {
        if(s.startsWith(startMarker))
          Pr.error("No end of file marker for " +
            filename);
        // For this program, no spaces before
        // the "package" keyword are allowed
        // in the input source code:
        else if(s.startsWith("package")) {
          // Extract package name:
          String pdir = s.substring(
            s.indexOf(' ')).trim();
          pdir = pdir.substring(
            0, pdir.indexOf(';')).trim();
          // Capture the chapter from the package
          // ignoring the 'com' subdirectories:
          if(!pdir.startsWith("com")) {
            int firstDot = pdir.indexOf('.');
            if(firstDot != -1)
              chapter =
                pdir.substring(0,firstDot);
            else
              chapter = pdir;
          }
          // Convert package name to path name:
          pdir = pdir.replace(
            '.', filesep.charAt(0));
          System.out.println("package " + pdir);
          dirname = pdir;
        }
        contents += s + eol;
        // Move past continuations:
        if(s.startsWith(beginContinue))
          while((s = in.readLine()) != null)
            if(s.startsWith(endContinue)) {
              contents += s + eol;
              break;
            }
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          foundEndMarker =
            true;
          break;
        }
      }
      if(!foundEndMarker)
        Pr.error(
          "End marker not found before EOF");
      System.out.println("Chapter: " + chapter);
    } catch(IOException e) {
      Pr.error("Error reading line");
    }
  }
  // For recovering from a packed file:
  public SourceCodeFile(BufferedReader pFile) {
    try {
      String s = pFile.readLine();
      if(s == null) return;
      if(!s.startsWith(packMarker))
        Pr.error("Can't find " + packMarker
          + " in " + s);
      s = s.substring(
        packMarker.length()).trim();
      dirname = s.substring(0, s.indexOf('#'));
      filename = s.substring(s.indexOf('#') + 1);
      dirname = dirname.replace(
        oldsep.charAt(0), filesep.charAt(0));
      filename = filename.replace(
        oldsep.charAt(0), filesep.charAt(0));
      System.out.println("listing: " + dirname
        + filesep + filename);
      while((s = pFile.readLine()) != null) {
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          contents += s;
          break;
        }
        contents += s + eol;
      }
    } catch(IOException e) {
      System.err.println("Error reading line");
    }
  }
  public boolean hasFile() {
    return filename != null;
  }
  public String directory() { return dirname; }
  public String filename() { return filename; }
  public String contents() { return contents; }
  // To write to a packed file:
  public void writePacked(DataOutputStream out) {
    try {
      out.writeBytes(
        packMarker + dirname + "#"
        + filename + eol);
      out.writeBytes(contents);
    } catch(IOException e) {
      Pr.error("writing " + dirname +
        filesep + filename);
    }
  }
  // To generate the actual file:
  public void writeFile(String rootpath) {
    File path = new File(rootpath, dirname);
    path.mkdirs();
    PrintWriter p =
      IO.psOpen(new File(path, filename));
    p.print(contents);
    IO.close(p);
  }
}

class DirMap {
  private Hashtable t = new Hashtable();
  private String rootpath;
  DirMap() {
    rootpath = System.getProperty("user.dir");
  }
  DirMap(String alternateDir) {
    rootpath = alternateDir;
  }
  public void add(SourceCodeFile f){
    String path = f.directory();
    if(!t.containsKey(path))
      t.put(path, new Vector());
    ((Vector)t.get(path)).addElement(f);
  }
  public void writePackedFile(String fname) {
    DataOutputStream packed = IO.dosOpen(fname);
    try {
      packed.writeBytes("###Old Separator:" +
        SourceCodeFile.filesep + "###\n");
    } catch(IOException e) {
      Pr.error("Writing separator to " + fname);
    }
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      System.out.println(
        "Writing directory " + dir);
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f =
          (SourceCodeFile)v.elementAt(i);
        f.writePacked(packed);
      }
    }
    IO.close(packed);
  }
  // Write all the files in their directories:
  public void write() {
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f =
          (SourceCodeFile)v.elementAt(i);
        f.writeFile(rootpath);
      }
      // Add file indicating file quantity
      // written to this directory as a check:
      IO.close(IO.dosOpen(
        new File(new File(rootpath, dir),
          Integer.toString(v.size())+".files")));
    }
  }
}

public class CodePackager {
  private static final String usageString =
  "usage: java CodePackager packedFileName" +
  "\nExtracts source code files from packed \n" +
  "version of Tjava.doc sources into " +
  "directories off current directory\n" +
  "java CodePackager packedFileName newDir\n" +
  "Extracts into directories off newDir\n" +
  "java CodePackager -p source.txt packedFile" +
  "\nCreates packed version of source files" +
  "\nfrom text version of Tjava.doc";
  private static void usage() {
    System.err.println(usageString);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length == 0) usage();
    if(args[0].equals("-p")) {
      if(args.length != 3)
        usage();
      createPackedFile(args);
    }
    else {
      if(args.length > 2)
        usage();
      extractPackedFile(args);
    }
  }
  private static String currentLine;
  private static BufferedReader in;
  private static DirMap dm;
  private static void
  createPackedFile(String[] args) {
    dm = new DirMap();
    in = IO.disOpen(args[1]);
    try {
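      while((currentLine = in.readLine())
          != null) {
        if(currentLine.startsWith(
            SourceCodeFile.startMarker)) {
          dm.add(new SourceCodeFile(
                   currentLine, in));
        }
        else if(currentLine.startsWith(
            SourceCodeFile.endMarker))
          Pr.error("file has no start marker");
        // Else ignore the input line
      }
    } catch(IOException e) {
      Pr.error("Error reading " + args[1]);
    }
    IO.close(in);
    dm.writePackedFile(args[2]);
  }
  private static void
  extractPackedFile(String[] args) {
    if(args.length == 2) // Alternate directory
      dm = new DirMap(args[1]);
    else // Current directory
      dm = new DirMap();
    in = IO.disOpen(args[0]);
    String s = null;
    try {
      s = in.readLine();
    } catch(IOException e) {
      Pr.error("Cannot read from " + in);
    }
    // Capture the separator used in the system
    // that packed the file. Without this header,
    // oldsep would stay null and the unpacking
    // constructor would fail:
    if(s != null &&
       s.indexOf("###Old Separator:") != -1) {
      String oldsep = s.substring(
        "###Old Separator:".length());
      oldsep = oldsep.substring(
        0, oldsep.indexOf('#'));
      SourceCodeFile.oldsep = oldsep;
    } else
      Pr.error("No separator header in " + args[0]);
    SourceCodeFile sf = new SourceCodeFile(in);
    while(sf.hasFile()) {
      dm.add(sf);
      sf = new SourceCodeFile(in);
    }
    dm.write();
  }
} ///:~
```

Notice that the `package` statement has been commented out. Because this is the first program in the chapter, the `package` statement is required to tell `CodePackager` that the chapter has changed, yet actually putting the program in a package would cause a problem. When you create a package, you tie the resulting program to a particular directory structure, which is fine for most of the examples in this book. Here, however, `CodePackager` should be compilable and runnable from any directory, so the `package` statement is commented out. To `CodePackager` it still "looks" like an ordinary `package` statement, because the program isn't sophisticated enough to detect multi-line comments (there is no need for that complexity here; convenience is all that's required).

The first two classes are support/utility classes that make the rest of the program more consistent to write and easier to read. The first, `Pr`, is similar to the ANSI C library function `perror`: both print an error message (but `Pr` also exits the program). The second class encapsulates file creation, a process introduced in Chapter 10, where you saw that it quickly becomes verbose and tedious. The solution proposed in Chapter 10 created new classes; here, `static` methods are used instead. Inside those methods the usual exceptions are caught and handled appropriately. These methods make the rest of the code much cleaner and easier to read.

The first class that actually helps solve the problem is `SourceCodeFile`, which represents all the information (contents, file name, and directory) for one source-code file in the book. It also holds a set of `String` constants: the markers for the start and end of a file, a marker used inside the packed file, the current system's line separator, the file path separator (note the use of `System.getProperty()` to find the local version), and a large copyright notice, which is read from the following `Copyright.txt` file:

```
//////////////////////////////////////////////////
// Copyright (c) Bruce Eckel, 1998
// Source code file from the book "Thinking in Java"
// All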
// rights reserved EXCEPT as allowed by the
// following statements: You may freely use this file
// for your own work (personal or commercial),
// including modifications and distribution in
// executable form only. Permission is granted to use
// this file in classroom situations, including its
// use in presentation materials, as long as the book
// "Thinking in Java" is cited as the source.
// Except in classroom situations, you may not copy
// and distribute this code; instead, the sole
// distribution point is http://www.BruceEckel.com
// (and official mirror sites) where it is
// freely available. You may not remove this
// copyright and notice. You may not distribute
// modified versions of the source code in this
// package. You may not use this file in printed
// media without the express permission of the
// author. Bruce Eckel makes no representation about
// the suitability of this software for any purpose.
// It is provided "as is" without express or implied
// warranty of any kind, including any implied
// warranty of merchantability, fitness for a
// particular purpose or non-infringement. The entire
// risk as to the quality and performance of the
// software is with you. Bruce Eckel and the
// publisher shall not be liable for any damages
// suffered by you or any third party as a result of
// using or distributing software. In no event will
// Bruce Eckel or the publisher be liable for any
// lost revenue, profit, or data, or for direct,
// indirect, special, consequential, incidental, or
// punitive damages, however caused and regardless of
// the theory of liability, arising out of the use of
// or inability to use software, even if Bruce Eckel
// and the publisher have been advised of the
// possibility of such damages. Should the software
// prove defective, you assume the cost of all
// necessary servicing, repair, or correction.
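// If you think you've found an error, please email all
// modified files with clearly commented changes to:
// Bruce@EckelObjects.com. (please use the same
// address for non-code errors found in the book).
//////////////////////////////////////////////////
```

When files are extracted from a packed file, the file separator of the system that created the packed file is recorded as well, so that it can be replaced with the one appropriate for the local system. The subdirectory for the current chapter is kept in the `chapter` field, which is initialized to `c02` (you'll notice that the listings in Chapter 2 happen not to contain a `package` statement). The `chapter` field changes only when a `package` statement is found in the current file.

(1) Building a packed file

The first constructor is used to extract a file from the ASCII text version of the book. The calling code (further down in the listing) reads and examines each line until it finds one that matches the beginning of a listing. At that point it creates a new `SourceCodeFile` object, passing it the first line (already read by the calling code) together with the `BufferedReader` object, from which the constructor reads the rest of the source listing.

From here on you'll see heavy use of `String` methods. To extract the file name, the overloaded version of `substring()` that takes a starting offset and runs to the end of the string is called. The starting index is the `length()` of `startMarker`, and `trim()` then removes white space from both ends of the result. The first line may also contain other words after the file name; these are detected with `indexOf()`, which returns -1 if the character you're looking for isn't found, or the position of its first occurrence if it is. Note that `indexOf()` is also overloaded to take a `String` argument instead of a single character.

Once the file name has been parsed and stored, the first line is placed in the `contents` string, which holds the complete text of the source listing. The remaining lines are then read and appended to `contents`. It isn't quite that simple, of course, because certain situations need special handling. One is error checking: if a `startMarker` turns up, the listing currently being collected has no end marker. That is an error condition, and the program exits.

The other special case involves the `package` keyword. Although Java is a free-form language, this program requires the `package` keyword to be at the beginning of a line. When `package` is found, the package name is extracted by looking for the space at the start and the semicolon at the end. (This could also be done in a single operation, using the overloaded `substring()` that takes both a starting and an ending index.) The dots in the package name are then replaced with the file separator, on the assumption that the separator is only one character long. That assumption probably holds for every current system, but if you run into problems, this is a place to check.

The default behavior is to append each line, plus a line separator, to `contents` until an `endMarker` is found, which tells the constructor to stop. Reaching the end of the file before an `endMarker` is considered an error.

(2) Extracting from a packed file

The second constructor recovers (extracts) source files from a packed file. Here the calling method doesn't need to worry about skipping intermediate text: the packed file contains all the source files, placed end to end. All the constructor needs is a `BufferedReader` representing the source of the information, and it takes what it needs from there. There is, however, some meta-information at the beginning of each listing, marked by `packMarker`. If `packMarker` is missing, the caller is trying to use this constructor incorrectly.

Once `packMarker` is found, it is stripped off, and the directory name (terminated by a `#`) and the file name (which runs to the end of the line) are extracted. In both cases the old separator is replaced by the one appropriate for the local system, using the `String` method `replace()`. The old separator is stored at the beginning of the packed file; you'll see how it is extracted later in the listing.

The rest of the constructor is straightforward. It reads each line and appends it to `contents` until it reaches an `endMarker`.

(3)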
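Accessing program listings

The next group of methods are simple accessors: `directory()`, `filename()` (note that a method can have exactly the same spelling and capitalization as a field), and `contents()`. `hasFile()` indicates whether this object contains a file (you'll soon see why that is needed).

The last two methods write this code listing to a file: either to a packed file via `writePacked()`, or to a Java source file via `writeFile()`. All `writePacked()` needs is a `DataOutputStream`, opened elsewhere, which represents the file being written. It puts the header information on the first line and then calls `writeBytes()` to write `contents` in a "universal" format.

To write a Java source file, the file must first be created. This is done with `IO.psOpen()`, which is handed a `File` object containing not only the file name but also the path. The question is: does that path actually exist? The user may decide to put all the source directories under a completely different subdirectory, one that may not exist yet. So before each file is written, `File.mkdirs()` is called to create the directory path the file will be written into. It builds the whole path in one call.

(4) Containing the whole collection of listings

Organizing the code listings by subdirectory is very convenient, although it requires building the whole collection in memory first. There is another compelling reason to do it this way: it makes the system more robust. As each subdirectory of code listings is created, an extra file is added whose name contains the number of files that directory should hold.

The `DirMap` class produces this effect and neatly demonstrates the concept of a "multimap." It is implemented with a `Hashtable` whose keys are the subdirectories to be created and whose values are `Vector` objects holding the `SourceCodeFile` objects for that particular directory. So instead of mapping a key to a single value, a key is "multimapped" to a series of values through the corresponding `Vector`. Although this may sound complicated, it is remarkably simple and direct to implement. You can see that most of the code in `DirMap` is concerned with writing to files, not with the "multimap"; only a tiny fraction of the code implements it.

A `DirMap` can be created in two ways: the default constructor assumes you want the directories to branch off the current location, while the other constructor lets you specify an alternate "absolute" path for the starting directory.

The `add()` method is where a lot of action happens in a small space. First, `directory()` is extracted from the `SourceCodeFile` to be added, and the `Hashtable` is checked to see whether it already contains that key. If not, a new `Vector` is added to the `Hashtable` and associated with that key. At this point the `Vector` is in place regardless of which path was taken, so it can be retrieved and the `SourceCodeFile` added to it. Because a `Vector` combines so easily with a `Hashtable` in this way, the two together are more convenient than either alone.

Writing a packed file means opening a file for writing (as a `DataOutputStream`, so the data is "universal") and writing header information about the old separator on the first line. Next, an `Enumeration` of the `Hashtable` keys is produced and stepped through, selecting each directory and fetching the `Vector` for that directory, so that each `SourceCodeFile` in the `Vector` can be written to the packed file.

Writing the Java source files into their directories with `write()` works almost exactly like `writePackedFile()`, since both methods simply call the appropriate method in `SourceCodeFile`. Here, however, the root path is passed to `SourceCodeFile.writeFile()`. When all the files have been written, the extra file whose name gives the number of files written is also created.

(5) The main program

The classes described so far are all used by `CodePackager`. The first thing you see is the usage string, which is printed whenever the end user invokes the program incorrectly. It is printed by the `usage()` method, which also exits the program. The only job of `main()` is to decide whether you want to create a packed file or extract something from one. It then makes sure the arguments are correct and calls the appropriate method.

When a packed file is created, it is assumed to go in the current directory, so `DirMap` is created with the default constructor. After the file is opened, each line is read and checked for particular conditions:

(1) If the line begins with the start marker for a source listing, a new `SourceCodeFile` object is created. The constructor reads in the rest of the source listing, and the resulting reference is added directly to the `DirMap`.

(2) If the line begins with the end marker for a source listing, something has gone wrong, because end markers should only ever be found by the `SourceCodeFile` constructor.

When a packed file is extracted, the output can go into the current directory or into an alternate directory, so the `DirMap` object is created accordingly. The file is opened and its first line is read; the old file-path separator is extracted from this line. The input is then used to create the first `SourceCodeFile` object, which is added to the `DirMap`. New `SourceCodeFile` objects are created and added for as long as they contain a file. (The last one created simply returns once the input is exhausted, and `hasFile()` then returns `false`.)

## 17.1.2 Checking capitalization style

Although the previous example may be handy as a guide for text-processing projects of your own, the next project is immediately useful, because it performs a style check to make sure your capitalization conforms to the de facto Java style standard. It opens each `.java` file in the current directory and extracts all the class names and identifiers. If it finds anything that doesn't fit the Java style, it reports it.

For the program to work correctly, you must first build a repository of class names, holding all the class names in the standard Java library. To do this, go through every source subdirectory of the standard Java library and run `ClassScanner` in each one. As arguments, give the name of the repository file (the same path and name every time) and the `-a` command-line switch, which indicates that the class names should be added to that repository file.

To check your own code with the program, run it and pass it the path and name of the repository file to use. It checks all the classes and identifiers in the current directory and tells you which ones don't follow the typical Java capitalization conventions.

Be aware that the program isn't perfect. Sometimes it will report a problem, but when you look closely at the code you'll find nothing needs changing. That's a little annoying, but it's still far better than checking all your code by hand.

The source code is listed below, followed by a detailed explanation:

```
//: ClassScanner.java
// Scans all files in directory for classes
// and identifiers, to check capitalization.
// Assumes properly compiling code listings.
// Doesn't do everything right, but is a very
// useful aid.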
import java.io.*;
import java.util.*;

class MultiStringMap extends Hashtable {
  public void add(String key, String value) {
    if(!containsKey(key))
      put(key, new Vector());
    ((Vector)get(key)).addElement(value);
  }
  public Vector getVector(String key) {
    if(!containsKey(key)) {
      System.err.println(
        "ERROR: can't find key: " + key);
      System.exit(1);
    }
    return (Vector)get(key);
  }
  public void printValues(PrintStream p) {
    Enumeration k = keys();
    while(k.hasMoreElements()) {
      String oneKey = (String)k.nextElement();
      Vector val = getVector(oneKey);
      for(int i = 0; i < val.size(); i++)
        p.println((String)val.elementAt(i));
    }
  }
}

public class ClassScanner {
  private File path;
  private String[] fileList;
  private Properties classes = new Properties();
  private MultiStringMap
    classMap = new MultiStringMap(),
    identMap = new MultiStringMap();
  private StreamTokenizer in;
  public ClassScanner() {
    path = new File(".");
    fileList = path.list(new JavaFilter());
    for(int i = 0; i < fileList.length; i++) {
      System.out.println(fileList[i]);
      scanListing(fileList[i]);
    }
  }
  void scanListing(String fname) {
    try {
      in = new StreamTokenizer(
          new BufferedReader(
            new FileReader(fname)));
      // Doesn't seem to work:
      // in.slashStarComments(true);
      // in.slashSlashComments(true);
      in.ordinaryChar('/');
      in.ordinaryChar('.');
      in.wordChars('_', '_');
      in.eolIsSignificant(true);
      while(in.nextToken() !=
            StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          eatComments();
        else if(in.ttype ==
                StreamTokenizer.TT_WORD) {
          if(in.sval.equals("class") ||
             in.sval.equals("interface")) {
            // Get class name:
            while(in.nextToken() !=
                  StreamTokenizer.TT_EOF
                  && in.ttype !=
                  StreamTokenizer.TT_WORD)
              ;
            // Input ended before a name was found
            // (sval would be null):
            if(in.ttype != StreamTokenizer.TT_WORD)
              break;
            classes.put(in.sval, in.sval);
            classMap.add(fname, in.sval);
          }
          if(in.sval.equals("import") ||
             in.sval.equals("package"))
            discardLine();
          else // It's an identifier or keyword
            identMap.add(fname, in.sval);
        }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  void discardLine() {
    try {
      while(in.nextToken() !=
            StreamTokenizer.TT_EOF
            && in.ttype !=
            StreamTokenizer.TT_EOL)
        ; // Throw away tokens to end of line
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  // StreamTokenizer's comment removal seemed
  // to be broken. This extracts them:
  void eatComments() {
    try {
      if(in.nextToken() !=
         StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          discardLine();
        else if(in.ttype != '*')
          in.pushBack();
        else
          while(true) {
            if(in.nextToken() ==
              StreamTokenizer.TT_EOF)
              break;
            if(in.ttype == '*')
              if(in.nextToken() !=
                StreamTokenizer.TT_EOF
                && in.ttype == '/')
                break;
          }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  public String[] classNames() {
    String[] result = new String[classes.size()];
    Enumeration e = classes.keys();
    int i = 0;
    while(e.hasMoreElements())
      result[i++] = (String)e.nextElement();
    return result;
  }
  public void checkClassNames() {
    Enumeration files = classMap.keys();
    while(files.hasMoreElements()) {
      String file = (String)files.nextElement();
      Vector cls = classMap.getVector(file);
      for(int i = 0; i < cls.size(); i++) {
        String className =
          (String)cls.elementAt(i);
        if(Character.isLowerCase(
             className.charAt(0)))
          System.out.println(
            "class capitalization error, file: "
            + file + ", class: "
            + className);
      }
    }
  }
  public void checkIdentNames() {
    Enumeration files = identMap.keys();
    Vector reportSet = new Vector();
    while(files.hasMoreElements()) {
      String file = (String)files.nextElement();
      Vector ids = identMap.getVector(file);
      for(int i = 0; i < ids.size(); i++) {
        String id =
          (String)ids.elementAt(i);
        if(!classes.contains(id)) {
          // Ignore identifiers of length 3 or
          // longer that are all uppercase
          // (probably static final values):
          if(id.length() >= 3 &&
             id.equals(
               id.toUpperCase()))
            continue;
          // Check to see if first char is upper:
          if(Character.isUpperCase(id.charAt(0))){
            if(reportSet.indexOf(file + id)
                == -1){ // Not reported yet
              reportSet.addElement(file + id);
              System.out.println(
                "Ident capitalization error in:"
                + file + ", ident: " + id);
            }
          }
        }
      }
    }
  }
  static final String usage =
    "Usage: \n" +
    "ClassScanner classnames -a\n" +
    "\tAdds all the class names in this \n" +
    "\tdirectory to the repository file \n" +
    "\tcalled 'classnames'\n" +
    "ClassScanner classnames\n" +
    "\tChecks all the java files in this \n" +
    "\tdirectory for capitalization errors, \n" +
    "\tusing the repository file 'classnames'";
  private static void usage() {
    System.err.println(usage);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length < 1 || args.length > 2)
      usage();
    ClassScanner c = new ClassScanner();
    File old = new File(args[0]);
    if(old.exists()) {
      try {
        // Try to open an existing
        // properties file:
        InputStream oldlist =
          new BufferedInputStream(
            new FileInputStream(old));
        c.classes.load(oldlist);
        oldlist.close();
      } catch(IOException e) {
        System.err.println("Could not open "
          + old + " for reading");
        System.exit(1);
      }
    }
    if(args.length == 1) {
      c.checkClassNames();
      c.checkIdentNames();
    }
    // Write the class names to a repository:
    if(args.length == 2) {
      if(!args[1].equals("-a"))
        usage();
      try {
        BufferedOutputStream out =
          new BufferedOutputStream(
            new FileOutputStream(args[0]));
        c.classes.save(out,
          "Classes found by ClassScanner.java");
        out.close();
      } catch(IOException e) {
        System.err.println(
          "Could not write " + args[0]);
        System.exit(1);
      }
    }
  }
}

class JavaFilter implements FilenameFilter {
  public boolean accept(File dir, String name) {
    // Strip path information:
    String f = new File(name).getName();
    return f.trim().endsWith(".java");
  }
} ///:~
```
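The two capitalization rules the listing enforces (in `checkClassNames()` and `checkIdentNames()`) can be separated from the file-scanning code and tried on single names. The sketch below is an illustration for this section, not part of `ClassScanner`. The class `CapitalizationRules` and its method names are hypothetical, and a modern `Set<String>` stands in for the `classes` `Properties` object.

```java
import java.util.Set;

// Hypothetical helper isolating the checks made by ClassScanner.
class CapitalizationRules {
  // checkClassNames(): a class or interface name must not start lowercase.
  static boolean badClassName(String name) {
    return Character.isLowerCase(name.charAt(0));
  }

  // checkIdentNames(): a name that is not a known class is flagged if it
  // starts uppercase, unless it is 3+ characters long and all uppercase
  // (probably a static final constant such as TT_EOF).
  static boolean badIdentifier(String id, Set<String> knownClasses) {
    if (knownClasses.contains(id))
      return false;
    if (id.length() >= 3 && id.equals(id.toUpperCase()))
      return false;
    return Character.isUpperCase(id.charAt(0));
  }
}
```

With `Set.of("String")` as the known classes, `badIdentifier("Count", ...)` is `true`, while `TT_EOF`, `String` and `count` all pass. Keeping the rules in small pure functions makes them easy to test on their own; the listing interleaves them with the `Enumeration` loops over `classMap` and `identMap`.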
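The `MultiStringMap` class is a special-purpose tool that lets you map a group of strings onto each key. As in the previous example, it uses a `Hashtable`, but this time through inheritance. It treats each key as a single string that maps onto a `Vector` value. The `add()` method is simple: it checks whether the key is already in the `Hashtable` and, if not, puts it there with a new, empty `Vector`; then it appends the value to that key's `Vector`. The `getVector()` method produces the `Vector` for a particular key, and `printValues()` prints all the values, one `Vector` at a time, which is very useful for debugging.

To keep things simple, the class names from the standard Java library are all placed in a `Properties` object (itself a standard library class). Remember that a `Properties` object is really a `Hashtable` that holds only `String` objects for its keys and values. However, it can be saved to disk, or restored from disk, with a single method call. Since all we actually need is a list of names, the same object is used for both the key and the value.

To find the classes and identifiers in the files of a particular directory, two `MultiStringMap`s are used: `classMap` and `identMap`. In addition, when the program starts it loads the standard class-name repository into the `Properties` object named `classes`. Whenever a new class name is found in the local directory, it is added to both `classes` and `classMap`. This way, `classMap` can be used to step through all the classes in the local directory, and `classes` can be used to check whether the current token is a class name. (In this listing that check is made in `checkIdentNames()`, which skips any token found in `classes`; every other word token is simply recorded in `identMap`.)

The default constructor of `ClassScanner` creates a list of file names (using `JavaFilter`, an implementation of `FilenameFilter`; see Chapter 10). It then calls `scanListing()` for each file name.

Inside `scanListing()`, the source file is opened and turned into a `StreamTokenizer`. According to the Java documentation, passing `true` to `slashStarComments()` and `slashSlashComments()` is supposed to strip out those comments, but this seems to be somewhat broken (it barely works in Java 1.0). So instead, those calls are commented out and the comments are extracted by a separate method. For that to work, `'/'` must be captured as an ordinary character rather than letting the `StreamTokenizer` treat it as part of a comment (by default `'/'` is the tokenizer's comment character), and the `ordinaryChar()` method tells the `StreamTokenizer` to do this. The same applies to the dot (`'.'`), because we want method calls split into separate identifiers. The underscore, on the other hand, is initially treated by `StreamTokenizer` as a separate character, but here it should stay part of an identifier, because it appears in `static final` values such as `TT_EOF`. This, of course, is true only for this particular program. The `wordChars()` method takes a range of characters to add to those that are kept inside a token parsed as a word. Finally, when parsing single-line comments or discarding a line, we need to know when a line ends, so calling `eolIsSignificant(true)` makes end-of-line (`EOL`) visible instead of being absorbed by the `StreamTokenizer`.

The rest of `scanListing()` reads and examines tokens until the end of the file, which is signaled when `nextToken()` returns the `final static` value `StreamTokenizer.TT_EOF`.

If the token is a `'/'`, it might be a comment, so `eatComments()` is called to handle it. The only other case we care about here is whether the token is a word, and that case has a few special cases of its own.

If the word is `class` or `interface`, the next word token should be a class or interface name, which is put into `classes` and `classMap`. If the word is `import` or `package`, we aren't interested in the rest of the line. Anything else must be an identifier (which we are interested in) or a keyword (which we aren't, but keywords are all lowercase, so there's no harm in collecting them too). These are added to `identMap`.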
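To see what the `StreamTokenizer` settings in `scanListing()` actually do, the following sketch tokenizes a short snippet and records each token. It is a standalone illustration: the class `TokenizerDemo` and its `describe()` helper are hypothetical names, and a `StringReader` stands in for the source file.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the tokenizer configuration used in scanListing().
class TokenizerDemo {
  static List<String> describe(String source) {
    StreamTokenizer in = new StreamTokenizer(new StringReader(source));
    in.ordinaryChar('/');      // '/' is no longer a comment character
    in.ordinaryChar('.');      // split "obj.method" into separate words
    in.wordChars('_', '_');    // keep names like TT_EOF in one word
    in.eolIsSignificant(true); // report line ends as TT_EOL
    List<String> tokens = new ArrayList<>();
    try {
      while (in.nextToken() != StreamTokenizer.TT_EOF) {
        switch (in.ttype) {
          case StreamTokenizer.TT_WORD:   tokens.add("WORD:" + in.sval); break;
          case StreamTokenizer.TT_EOL:    tokens.add("EOL"); break;
          case StreamTokenizer.TT_NUMBER: tokens.add("NUM:" + in.nval); break;
          default:                        tokens.add("CHAR:" + (char) in.ttype);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for a StringReader
    }
    return tokens;
  }
}
```

For the input `"in.nextToken(TT_EOF);\n// c"` the result is `WORD:in`, `CHAR:.`, `WORD:nextToken`, `CHAR:(`, `WORD:TT_EOF`, `CHAR:)`, `CHAR:;`, `EOL`, `CHAR:/`, `CHAR:/`, `WORD:c`. The dot splits the method call, the underscore stays inside `TT_EOF`, and the two slashes arrive as separate ordinary characters, which is what `eatComments()` relies on.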
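The `discardLine()` method is a simple tool that looks for the end of a line. Note that every time you get a new token, you must also check for the end of the file.

The `eatComments()` method is called whenever the main parsing loop encounters a forward slash. That doesn't necessarily mean a comment has been found, so the next token must be extracted to see whether it is another forward slash (in which case the rest of the line is discarded) or an asterisk. If it is neither, the token just pulled out is needed back in the main parsing loop! Fortunately, the `pushBack()` method lets you "push" the current token back onto the input stream, so that when the main parsing loop calls `nextToken()` it correctly gets the token that was pushed back.

For convenience, the `classNames()` method produces an array of all the names in the `classes` collection. The program doesn't use this method, but it is very useful for debugging.

The next two methods are where the actual checking happens. In `checkClassNames()`, the class names are extracted from `classMap` (remember that `classMap` contains only the names in this directory, organized by file name, so the file name can be printed along with the offending class name). To do this, each associated `Vector` is fetched and stepped through, checking whether the first character is lowercase. If it is, the corresponding error message is printed.

`checkIdentNames()` takes a similar approach: each identifier name is extracted from `identMap`. If the name is not in the `classes` list, it is assumed to be an identifier or a keyword. One special case is checked: if the identifier is three or more characters long and entirely uppercase, it is ignored, because it is probably a `static final` value such as `TT_EOF`. This is not a perfect algorithm, of course, but it assumes that you will eventually notice any all-uppercase identifier that is out of place.

Rather than reporting every identifier that starts with an uppercase character, the method keeps track of those already reported in a `Vector` named `reportSet`. It treats the `Vector` as a "set" that tells you whether an item is already in it. Each item is produced by concatenating the file name and the identifier. If the element isn't in the set, it is added and then reported.

The rest of the listing consists of `main()`, which handles the command-line arguments and works out whether you want to build a "repository" of class names from the standard Java library or check the correctness of code you have written. In either case a `ClassScanner` object is created.

Whether you are building a repository or using an existing one, the program must try to open the existing repository. By creating a `File` object and testing whether it exists, it decides whether to open the file and `load()` the `classes` `Properties` list inside `ClassScanner`. The classes from the repository are added to the classes found by the `ClassScanner` constructor rather than overwriting them. If you supply a single command-line argument, you want to check class and identifier names. If you supply two arguments (the second being `-a`), you want to build a class-name repository. In that case an output file is opened and the `Properties.save()` method writes the list to the file, with a string supplying the header information. (`save()` was deprecated in Java 1.2 in favor of `store()`, which throws an `IOException` on failure instead of silently ignoring it.)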
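The repository round trip performed by `main()` can be sketched with `Properties.store()` in place of the deprecated `save()`. This is an illustration under stated assumptions: the class `RepositoryDemo`, its method names, and whatever file path you pass in are hypothetical. Like the listing, it stores each class name as both key and value, and a missing repository file simply produces an empty list.

```java
import java.io.*;
import java.util.Properties;

// Hypothetical sketch of ClassScanner's repository handling via store()/load().
class RepositoryDemo {
  // Write each name as both key and value, as ClassScanner does.
  static void saveNames(File repo, String... names) {
    Properties classes = new Properties();
    for (String n : names)
      classes.setProperty(n, n);
    try (OutputStream out =
           new BufferedOutputStream(new FileOutputStream(repo))) {
      // Unlike save(), store() reports write failures as IOException.
      classes.store(out, "Classes found by ClassScanner.java");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Load an existing repository; a missing file yields an empty list.
  static Properties loadNames(File repo) {
    Properties classes = new Properties();
    if (!repo.exists())
      return classes;
    try (InputStream in =
           new BufferedInputStream(new FileInputStream(repo))) {
      classes.load(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return classes;
  }
}
```

Storing the name as both key and value matters because `checkIdentNames()` calls `classes.contains(id)`, and `Hashtable.contains()` searches the values, not the keys.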