
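[Tutorial link](https://jerkwin.github.io/2016/10/28/TCL%E5%9F%B9%E8%AE%AD%E6%95%99%E7%A8%8B/)

[TOC]

# TCL Training Tutorial

* Author: 陳旭盛

**Keywords**: TCL

**Abstract**: This is the third draft of a TCL textbook; the first two drafts were 《TCL的使用》 (Using TCL) and 《TCL培訓教程》 (TCL Training Tutorial). This draft adds a good deal of material and is the joint work of the 北研 TCL interest group. It covers every aspect of TCL in detail, in particular how to extend TCL commands in C/C++, and it includes many examples.

**Abbreviations**: TCL: Tool Command Language, a scripting language.

**References**:

| Title | Author | Number | Published | Where consulted | Publisher |
|---|---|---|---|---|---|
| Tcl and Tk ToolKit | John K. Ousterhout | 981-235-951-6 | 1999 | Personal copy | Addison Wesley Publishing Company |
| TCL的使用 | 陳旭盛 | | | Self-written document | |
| TCL培訓教程 | 陳旭盛 | | | Self-written document | |

# 1\. Introduction

TCL (Tool Command Language) is an interpreted scripting language. It offers general-purpose programming facilities (variables, procedures, and control structures) together with a powerful set of built-in core commands.

Because the TCL interpreter is implemented as a library of C/C++ procedures, TCL can in a sense also be regarded as a C library. The library contains a rich set of C/C++ procedures and functions for extending TCL commands, so TCL is easy to embed in a C/C++ application, and each application can extend the language to suit its own needs. We can extend the core command set for a particular application domain by adding domain-specific commands and, if necessary, even new control structures. The interpreter treats extension commands and control structures exactly like the built-in ones. The extended language inherits everything in the TCL core, including core commands, control structures, data types, and support for procedures. If needed, some built-in commands and control structures can even be hidden. By extending, inheriting, or hiding parts of TCL, users can provide a fully functional scripting language for their own domain without having to define its lexical rules, syntax, semantics, and pragmatics, as a new programming language normally requires.

This extensibility makes TCL well suited to product testing. Testing tasks often change quickly as designs and requirements change, which leaves testers struggling to keep up. With TCL, testers can quickly adopt new techniques and release extended command sets aimed at a product's new features, which makes it easier to keep pace with changing designs and requirements.

In addition, TCL works at a higher level of abstraction than C/C++. Programs written in TCL avoid many of the tedious details that C/C++ programs must deal with, which can greatly speed up the development of test cases. A test script written in TCL can also be modified and then run directly by the interpreter without recompiling, which saves considerable time. TCL has become a de facto standard in automated testing.

# 2\. Syntax

Put simply, TCL syntax is the set of rules by which the TCL interpreter parses TCL commands.

## 2.1 Scripts, commands, and words

A TCL script contains one or more commands. Commands must be separated by newlines or semicolons. Both of the following scripts are valid:

~~~
set a 1
set b 2
~~~

or

~~~
set a 1; set b 2
~~~

Each TCL command consists of one or more words. The first word is the command name and the remaining words are its arguments. Words must be separated by spaces or tabs.

The interpreter evaluates a command in two steps: parsing and execution. During parsing, the interpreter applies its rules to split the command into separate words and performs any necessary substitutions. During execution, the interpreter treats the first word as the command name and checks whether that command is defined. If it is, the interpreter invokes the C/C++ procedure that implements the command and passes all the words to it as arguments for processing.

## 2.2 Substitution

**Note**: In all examples in the following sections, `%` is the TCL command prompt. After you type a command and press Enter, TCL prints the result of the command on the next line. Text after `//` is my own annotation and is not part of the example.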
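When parsing a command, the TCL interpreter treats all arguments as strings. For example:

~~~
%set x 10       //define variable x and set its value to 10
10
%set y x+100    //the value of y is x+100, not the 110 we expected
x+100
~~~

In the second command, `x` is treated as part of the string `x+100`. If we want to use the value of `x`, which is `10`, we must tell the interpreter that we mean the value of the variable `x` and not the character `x`. This is done with substitution.

TCL provides three forms of substitution: variable substitution, command substitution, and backslash substitution. Each replaces part or all of a word with some other value. Substitution can occur in any word, including the command name, and substitutions can be nested.

### 2.2.1 Variable substitution

Variable substitution is marked by a `$` sign. It inserts the value of a variable into a word. For example:

~~~
%set y $x+100   //the value of y is 10+100; x has been replaced by its value 10
10+100
~~~

The value of `y` is still not the `110` we want but `10+100`, because the interpreter treats `10+100` as a string and not as an expression. To get `110`, we also need command substitution, so that TCL evaluates `10+100` as an expression.

### 2.2.2 Command substitution

A command substitution is a TCL command with its arguments enclosed in `[]`. It replaces part or all of a word with the result of that command. For example:

~~~
%set y [expr $x+100]
110
~~~

The value of `y` is `110`. When the interpreter meets the character `[`, it treats the following `expr` as a command name, invokes the C/C++ procedure that implements `expr`, and passes it `expr` and the string `10+100` obtained by variable substitution.

If we removed the `[]` from this example, TCL would report an error. Normally the interpreter treats only the first word of a command line as a command; all other words are ordinary strings that serve as arguments.

Note that the text inside `[]` must be a valid TCL script of any length. The value of the script is the return value of its last command. For example:

~~~
%set y [expr $x+100;set b 300]  //the value of y is 300, because set b 300 returns 300
300
~~~

Command substitution means that commands can be nested: the result of one command can be used as an argument of another.

### 2.2.3 Backslash substitution

Backslash substitution in TCL resembles the use of the backslash in C. It is mainly used to insert into a word characters that the interpreter would otherwise treat specially, such as newlines, spaces, `[`, and `$`. For example:

~~~
set msg multiple\ space     //the value of msg is "multiple space"
~~~

Without the `\`, TCL would report an error: the interpreter would treat the space between the last two words as a separator, find that `set` has more than two arguments, and fail. With the `\`, the space is not a separator and `multiple space` is treated as one word. Another example:

~~~
%set msg money\ \$3333\ \nArray\ a\[2]     //the result of this command is:
money $3333
Array a[2]
~~~

Here `$` is no longer treated as the variable substitution marker, and `\n` inserts a newline.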
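The three kinds of substitution described so far can be seen side by side in a short script. This is only a minimal sketch; the variable names `a`, `b`, `c`, and `d` are chosen for illustration and do not come from the original text.

~~~
set x 10

# Variable substitution only: the word becomes the string 10+1.
set a $x+1

# Command substitution: expr evaluates the already substituted string 10+100.
set b [expr $x+100]

# Backslash substitution: \$ is a literal dollar sign, and the second $ substitutes x.
set c \$$x

# Nested substitution: the inner [set x] returns the value of x, then expr adds 1.
set d [expr [set x]+1]

puts $a
puts $b
puts $c
puts $d
~~~

Run as a script, this should print `10+1`, `110`, `$10`, and `11`, one per line.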
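TCL supports the following backslash substitutions:

| Escape sequence | Replaced by |
|---|---|
| `\a` | Audible alert (0x7) |
| `\b` | Backspace (0x8) |
| `\f` | Form feed (0xc) |
| `\n` | Newline (0xa) |
| `\r` | Carriage return (0xd) |
| `\t` | Tab (0x9) |
| `\v` | Vertical tab (0xb) |
| `\ddd` | Octal value given by ddd (one, two, or three d's) |
| `\xhh` | Hex value given by hh (any number of h's) |
| `\newline space` | A single space character |

For example:

~~~
%set a \x48         //corresponds to \xhh
H                   //hexadecimal 48 is 72, the code for H
% set a \110        //corresponds to \ddd
H                   //octal 110 is 72, the code for H
%set a [expr \      //corresponds to \newline space; \newline continues a command on the next line
2+3]
5
~~~

### 2.2.4 Double quotes and braces

Besides the backslash, TCL offers two other ways to make the interpreter treat separators and substitution markers as ordinary characters: double quotes `"` and braces `{}`.

Inside double quotes the interpreter does not act on word separators, but it still performs variable substitution (`$`), command substitution (`[]`), and backslash substitution. For example:

~~~
%set x 100
100
%set y "$x ddd"
100 ddd
~~~

Inside braces, all special characters become ordinary characters and lose their special meaning; the interpreter does not process them.

~~~
%set y {\n$x [expr 10+100]}
\n$x [expr 10+100]
~~~

## 2.3 Comments

The comment character in TCL is `#`. The `#` and everything after it up to the end of the line are treated as a comment and ignored by the interpreter. Note, however, that `#` is only treated as a comment when it appears where the interpreter expects the first character of a command. For example:

~~~
%#This is a comment
%set a 100 # Not a comment
wrong # args: should be "set varName ?newValue?"
%set b 101 ; # this is a comment
101
~~~

On the second line, `#` is not a comment character because it appears in the middle of a command. The interpreter treats it and the following text as arguments of the command, which causes the error. On the fourth line, `#` does start a comment: the previous command has been ended by a semicolon, so the interpreter expects a new command, and the `#` in that position turns the rest of the line into a comment.

# 3\. Variables

## 3.1 Simple variables

A simple TCL variable has two parts: a name and a value. Both can be arbitrary strings. For example, a variable named `1323 7&*: hdgg` is legal in TCL. To make substitution easier, however, it is best to name variables according to the rules for identifiers in C/C++. When the interpreter parses a variable substitution, it takes as the variable name only the characters from the `$` sign up to the first character that is not a letter, digit, or underscore. For example:

~~~
% set a 2
2
% set a.1 4
4
% set b $a.1
2.1
~~~

In the last command we wanted to assign the value of the variable `a.1` to `b`. The interpreter, however, takes as the variable name only the characters between the `$` and the first character that is not a letter, digit, or underscore (here `.`), which is `a`. It therefore replaces `$a` with `2` and assigns the string `2.1` to `b`, which is clearly not what we intended.

If a variable name contains characters other than letters, digits, and underscores and you want to substitute it, enclose the name in braces. For example:

~~~
%set b ${a.1}
4
~~~

The `set` command can create a variable, and it can also read or change a variable's value. For example:

~~~
% set a {kdfj kjdf}
kdfj kjdf
~~~

If the variable `a` is not yet defined, this command creates it and sets its value to `kdfj kjdf`. If `a` already exists, the command simply sets its value to `kdfj kjdf`.

~~~
%set a
kdfj kjdf
~~~

With only one argument, `set` reads the current value of `a`, which is `kdfj kjdf`.

## 3.2 Arrays

An array is a collection of elements. TCL arrays differ considerably from arrays in most programming languages.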
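In TCL an array cannot be declared on its own; it can only be created together with one of its elements. The name of an array element has two parts: the array name and the name of the element within the array. The element name (the subscript) can be any string. For example:

~~~
set day(monday) 1
set day(tuesday) 2
~~~

The first command creates an array named `day` and, in it, an element named `monday` with the value `1`. The second command creates an element named `tuesday` with the value `2`.

Substitution of simple variables was discussed in the previous section. Substitution of array elements works the same way, apart from the parentheses. For example:

~~~
set a monday
set day(monday) 1
set b $day(monday)  //the value of b is 1, the value of day(monday)
set c $day($a)      //the value of c is 1, the value of day(monday)
~~~

TCL does not support complex data types. This is a significant shortcoming and one of the most frequently criticized aspects of TCL, but the TCL extension ITCL fills the gap.

## 3.3 Related commands

### 3.3.1 `set`

This command was described in detail in 3.1.

### 3.3.2 `unset`

This command deletes variables from the interpreter. It accepts any number of arguments; each is a variable name, which can be a simple variable, an array, or an array element. For example:

~~~
% unset a b day(monday)
~~~

This statement deletes the variables `a` and `b` and the array element `day(monday)`. The array `day` itself is not deleted and its other elements still exist. To delete a whole array, just give the array name. For example:

~~~
%puts $day(monday)
can't read "day(monday)": no such element in array
% puts $day(tuesday)
2
%unset day
% puts $day(tuesday)
can't read "day(tuesday)": no such variable
~~~

### 3.3.3 `append` and `incr`

These two commands provide simple ways to change a variable's value.

`append` adds text to the end of a variable. For example:

~~~
% set txt hello
hello
% append txt "! How are you"
hello! How are you
~~~

`incr` adds an integer to a variable's value. Both the variable's original value and the increment must be integers.

~~~
%set b a
a
% incr b
expected integer but got "a"
%set b 2
2
%incr b 3
5
~~~

# 4\. Expressions

## 4.1 Operands

The operands of a TCL expression are usually integers or real numbers. Integers are normally decimal, but if the first character of an integer is `0` (zero), TCL treats it as octal, and if the first two characters are `0x`, TCL treats it as hexadecimal. Real numbers are written exactly as in ANSI C. For example:

~~~
2.1
7.9e+12
6e4
3.
~~~

## 4.2 Operators and precedence

The table below lists the operators available in TCL. Their syntax and usage are very similar to ANSI C, so they are not described one by one. The operators are listed from highest to lowest precedence; operators in the same precedence level have equal precedence.

| Level | Syntax | Result | Operand types |
|---|---|---|---|
| 1 | `-a` | Negative of a | int, float |
| 1 | `!a` | Logical NOT of a | int, float |
| 1 | `~a` | Bitwise complement of a | int |
| 2 | `a*b` | Multiply | int, float |
| 2 | `a/b` | Divide | int, float |
| 2 | `a%b` | Remainder | int |
| 3 | `a+b` | Add | int, float |
| 3 | `a-b` | Subtract | int, float |
| 4 | `a<<b` | Left shift | int |
| 4 | `a>>b` | Right shift | int |
| 5 | `a<b` | Less than | int, float, string |
| 5 | `a>b` | Greater than | int, float, string |
| 5 | `a<=b` | Less than or equal | int, float, string |
| 5 | `a>=b` | Greater than or equal | int, float, string |
| 6 | `a==b` | Equal | int, float, string |
| 6 | `a!=b` | Not equal | int, float, string |
| 7 | `a&b` | Bitwise AND | int |
| 8 | `a^b` | Bitwise exclusive OR | int |
| 9 | `a\|b` | Bitwise OR | int |
| 10 | `a&&b` | Logical AND | int, float |
| 11 | `a\|\|b` | Logical OR | int, float |
| 12 | `a?b:c` | Choice | a: int, float |

## 4.3 Math functions

TCL supports the common math functions. In expressions they are written much as in C/C++. The arguments of a math function can be arbitrary expressions, and multiple arguments are separated by commas. For example:

~~~
%set x 2
2
% expr 2* sin($x<3)
1.68294196962
~~~

Here `expr` is a TCL command with the syntax `expr arg ?arg ...?`. Arguments written between two `?` characters are optional; this notation is used for optional arguments in all command descriptions that follow.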
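`expr` takes one or more arguments, joins them all together into a single expression, and evaluates it:

~~~
%expr 1+2*3
7
%expr 1 +2 *3
7
~~~

Note that math functions are not commands; they are meaningful only inside expressions.

TCL supports the following math functions:

| Function | Result |
|---|---|
| `abs(x)` | Absolute value of x |
| `acos(x)` | Arc cosine of x, in the range [0, π] |
| `asin(x)` | Arc sine of x, in the range [-π/2, π/2] |
| `atan(x)` | Arc tangent of x, in the range [-π/2, π/2] |
| `atan2(y, x)` | Arc tangent of y/x, in the range [-π, π] |
| `ceil(x)` | Smallest integer not less than x |
| `cos(x)` | Cosine of x (x in radians) |
| `cosh(x)` | Hyperbolic cosine of x |
| `double(i)` | Real value equal to integer i |
| `exp(x)` | e raised to the power x |
| `floor(x)` | Largest integer not greater than x |
| `fmod(x, y)` | Floating-point remainder of x divided by y |
| `hypot(x, y)` | Square root of (x^2 + y^2) |
| `int(x)` | Integer value produced by truncating x |
| `log(x)` | Natural logarithm of x |
| `log10(x)` | Base 10 logarithm of x |
| `pow(x, y)` | x raised to the power y |
| `round(x)` | Integer value produced by rounding x |
| `sin(x)` | Sine of x (x in radians) |
| `sinh(x)` | Hyperbolic sine of x |
| `sqrt(x)` | Square root of x |
| `tan(x)` | Tangent of x (x in radians) |
| `tanh(x)` | Hyperbolic tangent of x |

Many TCL commands take expressions as arguments. The most typical is `expr`; the tests of control commands such as `if`, `while`, and `for` are also expressions.

# 5 List

In TCL the list is the concept used to represent collections. A list is an ordered collection of elements. Each element can be an arbitrary string or another list, so lists can be nested. The following are all valid lists:

~~~
{}              //the empty list
{a b c d}
{a {b c} d}     //lists can be nested
~~~

The list is one of the more important data structures in TCL and is a great help in writing complex scripts. TCL provides many basic commands that operate on lists; they are described below.

## 5.1 The `list` command

Syntax: `list ?value value ...?`

This command creates a list whose elements are all the `value` arguments. For example:

~~~
% list 1 2 {3 4}
1 2 {3 4}
~~~

## 5.2 The `concat` command

Syntax: `concat list ?list ...?`

This command joins several lists into one list. The elements of each argument list become elements of the new list.

## 5.3 The `lindex` command

Syntax: `lindex list index`

Returns element number `index` (0-based) of the list. For example:

~~~
% lindex {1 2 {3 4}} 2
3 4
~~~

## 5.4 The `llength` command

Syntax: `llength list`

Returns the number of elements in the list. For example:

~~~
% llength {1 2 {3 4}}
3
~~~

## 5.5 The `linsert` command

Syntax: `linsert list index value ?value ...?`

Returns a new list formed by inserting all the `value` arguments before element number `index` (0-based) of the list. For example:

~~~
% linsert {1 2 {3 4}} 1 7 8 {9 10}
1 7 8 {9 10} 2 {3 4}
~~~

## 5.6 The `lreplace` command

Syntax: `lreplace list first last ?value value ...?`

Returns a new list formed by replacing elements `first` through `last` (both 0-based) of the list with all the `value` arguments.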
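If there are no `value` arguments, elements `first` through `last` are deleted. For example:

~~~
% lreplace {1 7 8 {9 10} 2 {3 4}} 3 3
1 7 8 2 {3 4}
% lreplace {1 7 8 2 {3 4}} 4 4 4 5 6
1 7 8 2 4 5 6
~~~

## 5.7 The `lrange` command

Syntax: `lrange list first last`

Returns the list made up of elements `first` through `last` (both 0-based). If `last` is `end`, the result runs from element `first` to the end of the list. For example:

~~~
% lrange {1 7 8 2 4 5 6} 3 end
2 4 5 6
~~~

## 5.8 The `lappend` command

Syntax: `lappend varname value ?value ...?`

Appends each `value` as one element to the variable `varname` and returns the variable's new value. If `varname` does not exist, it is created. For example:

~~~
% lappend a 1 2 3
1 2 3
% set a
1 2 3
~~~

## 5.9 The `lsearch` command

Syntax: `lsearch ?-exact? ?-glob? ?-regexp? list pattern`

Returns the index of the first element in the list that matches `pattern`, or -1 if nothing matches. `-exact`, `-glob`, and `-regexp` select one of three matching techniques. `-exact` means exact matching. `-glob` matches in the same way as the `string match` command, which is described in section 8 together with the `string` command. `-regexp` means regular expression matching, which is described in section 8 together with the `regexp` command. The default is `-glob`. For example:

~~~
% set a { how are you }
how are you
% lsearch $a y*
2
% lsearch $a y?
-1
~~~

## 5.10 The `lsort` command

Syntax: `lsort ?options? list`

This command returns the list in sorted order. `options` can be any of the following:

* `-ascii`: compare elements in ASCII character order. This is the default.
* `-dictionary`: sort in dictionary order. It differs from `-ascii` in two ways: (1) case is ignored, and (2) digits embedded in elements are compared as integers. Thus `bigBoy` sorts between `bigbang` and `bigboy`, and `x10y` sorts between `x9y` and `x11y`.
* `-integer`: convert the list elements to integers and sort numerically.
* `-real`: convert the list elements to floating-point numbers and sort numerically.
* `-increasing`: sort in ascending order. This is the default.
* `-decreasing`: sort in descending order.
* `-command command`: TCL uses the command `command` to compare pairs of elements and sorts according to its results.

## 5.11 The `split` command

Syntax: `split string ?splitChars?`

Splits `string` into words at the separator characters in `splitChars` and returns the list of those words. If `splitChars` is the empty string `{}`, `string` is split into individual characters. If `splitChars` is omitted, whitespace is used as the separator. For example:

~~~
% split "how.are.you" .
how are you
% split "how are you"
how are you
% split "how are you" {}
h o w { } a r e { } y o u
~~~

## 5.12 The `join` command

Syntax: `join list ?joinString?`

`join` is the inverse of `split`. It merges all elements of the list into one string, separated by `joinString`. The default `joinString` is a space. For example:

~~~
% join {h o w { } a r e { } y o u} {}
how are you
% join {how are you} .
how.are.you
~~~

# 6\. Control flow

Control flow in TCL resembles C and includes the commands `if`, `while`, `for`, `foreach`, `switch`, `break`, and `continue`. They are described in turn below.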
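As a quick preview of how several of these commands work together, the following sketch sums the positive numbers in a list and stops at a marker element. The list contents and the `stop` marker are made up for illustration; `string equal` is the comparison command described in section 8.

~~~
# Sum the positive numbers in a list, stopping at the first "stop" element.
set items {3 -1 4 stop 5}
set sum 0
foreach item $items {
    if {[string equal $item stop]} {
        break       ;# leave the loop entirely
    }
    if {$item < 0} {
        continue    ;# skip the rest of this iteration
    }
    incr sum $item
}
puts $sum
~~~

With the list above the script should print `7`: the `-1` is skipped by `continue`, and `break` ends the loop before the `5` is reached.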
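## 6.1 The `if` command

Syntax: `if test1 body1 ?elseif test2 body2 elseif ...? ?else bodyn?`

TCL first evaluates `test1` as an expression. If the value is non-zero, it executes `body1` as a script and returns the result. Otherwise it evaluates `test2` as an expression; if that value is non-zero, it executes `body2` as a script and returns the result, and so on. For example:

~~~
if { $x<0 } {
    .....
} elseif { $x==1 } {
    .....
} elseif { $x==2 } {
    ....
} else {
    .....
}
~~~

Note that in this example **each `{` must be written at the end of the previous line**. Otherwise the interpreter considers the `if` command to end at the newline and treats the next line as a new command, which leads to wrong results. The same applies to the loop commands below. Also note that **there must be a space between `if` and `{`** (and likewise around `elseif` and `else`); otherwise the interpreter treats `if{` as a single command name, which causes an error.

## 6.2 Loop commands: `while`, `for`, `foreach`

### 6.2.1 The `while` command

Syntax: `while test body`

The argument `test` is an expression and `body` is a script. As long as the expression is non-zero, the script is run. The loop stops when the expression becomes `0`, and `while` then returns an empty string.

For example, assuming the variable `a` holds a list, the following script copies the elements of `a` into `b` in reverse order:

~~~
set b {}
set i [expr [llength $a] -1]
while { $i>=0 } {
    lappend b [lindex $a $i]
    incr i -1
}
~~~

### 6.2.2 The `for` command

Syntax: `for init test reinit body`

The argument `init` is an initialization script. The second argument `test` is an expression that decides when the loop ends. The third argument `reinit` is a script run after each iteration, and the fourth argument `body` is the script that forms the loop body. The following example has the same effect as the previous one:

~~~
set b {}
for {set i [expr [llength $a] -1]} {$i>=0} {incr i -1} {
    lappend b [lindex $a $i]
}
~~~

### 6.2.3 The `foreach` command

This command has two forms.

* `foreach varName list body`

    The first argument `varName` is a variable, the second argument `list` is a list (an ordered collection), and the third argument `body` is the loop body. The body is executed once for each element taken from the list. The following example has the same effect as the previous one:

    ~~~
    set b {}
    foreach i $a {
        set b [linsert $b 0 $i]
    }
    ~~~

* `foreach varlist1 list1 ?varlist2 list2 ...? body`

    This form includes the first one. The first argument `varlist1` is a list of loop variables and the second argument is a list `list1`; the variables in `varlist1` take successive values from `list1`. The `body` argument is the loop body. `?varlist2 list2 ...?` means that several pairs of variable list and value list may be given. For example:

    ~~~
    set x {}
    foreach {i j} {a b c d e f} {
        lappend x $j $i
    }
    ~~~

    This loop runs three times and the value of `x` is `b a d c f e`.

    ~~~
    set x {}
    foreach i {a b c} j {d e f g} {
        lappend x $i $j
    }
    ~~~

    This loop runs four times and the value of `x` is `a d b e c f {} g`.

    ~~~
    set x {}
    foreach i {a b c} {j k} {d e f g} {
        lappend x $i $j $k
    }
    ~~~

    This loop runs three times and the value of `x` is `a d e b f g c {} {}`.

### 6.2.4 The `break` and `continue` commands

Inside a loop body, `break` and `continue` interrupt the loop. `break` ends the whole loop and jumps out of it; `continue` ends only the current iteration.

### 6.2.5 The `switch` command

As with the `switch` statement in C, everything the TCL `switch` command does could also be written with `if`, only more laboriously. The syntax of `switch` is:

`switch ?options? string { pattern body ?pattern body ...? }`

The first, optional argument `options` selects the matching method. TCL supports three: `-exact`, `-glob`, and `-regexp`. The default is `-exact`. `-exact` means exact matching, `-glob` matches in the same way as the `string match` command (described in section 8), and `-regexp` uses regular expression matching (described in section 8). The second argument `string` is the value to be tested. The third argument is a braced list of one or more pattern and body pairs. For example:

~~~
switch $x {
    a -
    b {incr t1}
    c {incr t2}
    default {incr t3}
}
~~~

A `-` after `a` means "use the same script as the next pattern". `default` matches any value. As soon as `switch` finds a matching pattern, it executes the corresponding script and returns the script's value as the return value of `switch`.

## 6.3 The `eval` command

`eval` is a command for building and executing TCL scripts. Its syntax is:

`eval arg ?arg ...?`

It accepts one or more arguments, joins them with spaces into a single script, and then evaluates that script. For example:

~~~
%eval {set a 2; set b 4}
4
~~~

## 6.4 The `source` command

`source` reads a file and evaluates its contents as a script. For example:

~~~
source e:/tcl&c/hello.tcl
~~~

Note that paths should be written as on UNIX, with `/` and not `\`.

# 7\. Procedures

TCL supports the definition and invocation of procedures. A procedure can be regarded as a command implemented by a TCL script, and it behaves much like a built-in command. We can define our own procedures at any time with the `proc` command. Procedures in TCL are similar to functions in C.

## 7.1 Procedure definitions and return values

Procedures are created by the `proc` command. For example:

~~~
% proc add {x y } {expr $x+$y}
~~~

The first argument of `proc` is the name of the procedure you want to define. The second argument is the procedure's parameter list, with parameters separated by spaces. The third argument is a TCL script that forms the procedure body. `proc` creates a new command that can be called just like a built-in one:

~~~
% add 1 2
3
~~~

If the body finishes without executing `return`, the return value of the procedure is the return value of the last command executed in the body.

When defining a procedure, you can use the `return` command to return a value from anywhere in the body. `return` ends the procedure immediately and uses its argument as the procedure's result. For example:

~~~
% proc abs {x} {
    if {$x >= 0} { return $x }
    return [expr -$x]
}
~~~

## 7.2 Local and global variables

Variables defined inside a procedure can only be accessed inside that procedure and are deleted automatically when the procedure exits, so they are called local variables. Variables defined outside all procedures are called global variables. In TCL a local and a global variable may have the same name, and their scopes do not overlap: the scope of a local variable is the inside of its procedure, and the scope of a global variable excludes the inside of every procedure. This is quite different from C.

To refer to a global variable inside a procedure, use the `global` command. For example:

~~~
% set a 4
4
% proc sample { x } {
    global a
    incr a
    return [expr $a+$x]
}
% sample 3
8
% set a
5
~~~

The global variable `a` is accessed inside the procedure, and changes made to `a` there are directly visible at global level. If the statement `global a` is removed, `a` inside the procedure no longer refers to the global variable. In the TCL versions this tutorial was written for, `incr a` then fails because the local variable `a` does not exist (since Tcl 8.5, `incr` instead creates a new local `a` starting from 0, so the global `a` is simply left unchanged).

## 7.3 Default arguments and variable numbers of arguments

TCL also provides three special forms of parameter list.

First, you can define a procedure with no parameters. For example:

~~~
proc add {} { expr 2+3 }
~~~

Second, you can define a procedure with default argument values. Defaults can be given for some or all parameters; if a call does not supply a value for such a parameter, the procedure automatically uses the default. As with default arguments in C/C++ functions, parameters with defaults can only appear at the end of the parameter list: every parameter after the first one that has a default must also have a default.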
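For example:

~~~
proc add {val1 {val2 2} {val3 3}} {
    expr $val1+$val2+$val3
}
~~~

Then:

~~~
add 1       //the value is 6
add 2 20    //the value is 25
add 4 5 6   //the value is 15
~~~

Third, a procedure definition can accept a variable number of arguments. If the last parameter of a procedure is `args`, the procedure can be called with a variable number of arguments. The parameters before `args` are handled like ordinary parameters, but any additional arguments must be handled specially in the body: the local variable `args` is set to a list whose elements are all the additional arguments. If there are none, `args` is set to an empty string. Here is an example:

~~~
proc add { val1 args } {
    set sum $val1
    foreach i $args {
        incr sum $i
    }
    return $sum
}
~~~

Then:

~~~
add 2           //the value is 2
add 2 3 4 5 6   //the value is 20
~~~

## 7.4 References: the `upvar` command

Syntax: `upvar ?level? otherVar myVar ?otherVar myVar ...?`

`upvar` lets a procedure access global variables or the local variables of other procedures. The argument `otherVar` is the name of the variable we want to access by reference, and `myVar` is the name of a local variable in the current procedure. Once `upvar` has bound `otherVar` to `myVar`, reading or writing the local variable `myVar` inside the procedure is equivalent to reading or writing the variable named by `otherVar` in the procedure's caller. Here is an example:

~~~
% proc temp { arg } {
    upvar $arg b
    set b [expr $b+2]
}
% proc myexp { var } {
    set a 4
    temp a
    return [expr $var+$a]
}
~~~

Then:

~~~
% myexp 7
13
~~~

In this example `upvar` binds `$arg` (which is in fact the variable `a` in procedure `myexp`) to the variable `b` in procedure `temp`, so reading or writing `b` is equivalent to reading or writing `a`.

The `level` argument of `upvar` gives the position in the call stack, relative to the procedure calling `upvar`, of the frame that holds the variable `otherVar`. For example:

~~~
upvar 2 other x
~~~

This command makes the variable `other` in the caller of the caller of the current procedure accessible in the current procedure as `x`. By default `level` is `1`, meaning that a variable in the caller of the current procedure (`a` in `myexp` in the example above) can be accessed in the current procedure (`temp`) through a local variable (`b`).

To access a global variable you can write:

~~~
upvar #0 other x
~~~

Then, wherever the current procedure is in the call stack, it can access the global variable `other` as `x`.

# 8\. String operations

Because TCL treats all input as strings, it provides fairly powerful string handling. The commands related to string operations are `string`, `format`, `regexp`, `regsub`, `scan`, and others.

## 8.1 The `format` command

Syntax: `format formatstring ?value value ...?`

`format` is similar to the `sprintf` function in ANSI C and to the `Format` member function of the `CString` class in MFC. It combines the `value` arguments into `formatstring` according to the format it specifies, forming a new string, and returns that string. For example:

~~~
%set name john
john
%set age 20
20
%set msg [format "%s is %d years old" $name $age]
john is 20 years old
~~~

## 8.2 The `scan` command

Syntax: `scan string format varName ?varName ...?`

`scan` can be regarded as the inverse of `format`; its function is similar to `sscanf` in ANSI C. It parses `string` according to `format` and stores the results in the `varName` variables. Note that, apart from spaces and tabs, the characters in `string` must match the literal characters and `%` conversions in `format`.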
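For example:

~~~
% scan "some 26 34" "some %d %d" a b
2
% set a
26
% set b
34
% scan "12.34.56.78" "%d.%d.%d.%d" c d e f
4
% puts [format "the value of c is %d,d is %d,e is %d ,f is %d" $c $d $e $f]
the value of c is 12,d is 34,e is 56 ,f is 78
~~~

The return value of `scan` is the number of variables that were matched. Also, if a variable `varName` does not exist, TCL creates it automatically.

## 8.3 The `regexp` command

Syntax: `regexp ?switches? ?--? exp string ?matchVar? ?subMatchVar subMatchVar ...?`

`regexp` determines whether the regular expression `exp` matches all or part of the string `string`. It returns `1` if there is a match and `0` otherwise.

Some characters have a special meaning in regular expressions. The table below lists and explains them.

| Character | Meaning |
|---|---|
| `.` | Matches any single character |
| `^` | Anchors the match at the start of the string |
| `$` | Anchors the match at the end of the string |
| `\x` | Matches the character x; this suppresses any special meaning of x |
| `[chars]` | Matches any one character from the set chars. If the first character of chars is `^`, matches any character not in chars. Ranges such as a-z are supported |
| `(regexp)` | Matches regexp as a single item |
| `*` | Matches zero or more occurrences of the item before the `*` |
| `+` | Matches one or more occurrences of the item before the `+` |
| `?` | Matches zero or one occurrence of the item before the `?` |
| `regexp1\|regexp2` | Matches either regexp1 or regexp2 |

The following example is taken from 《Tcl and Tk ToolKit》 and is explained below:

`^((0x)?[0-9a-fA-F]+|[0-9]+)$`

This regular expression matches any hexadecimal or decimal integer. The two regular expressions separated by `|`, namely `(0x)?[0-9a-fA-F]+` and `[0-9]+`, mean that either one may match; the former matches hexadecimal numbers and the latter decimal numbers. `^` means the match must start at the beginning of the string, so the expression does not match strings such as `jk12` that do not begin with `0x` or a hexadecimal digit. `$` means the match must extend to the end of the string, so the expression does not match strings such as `12jk` that do not end with a digit or a letter in `a-fA-F`.

Take `(0x)?[0-9a-fA-F]+` as an example. `(0x)` makes `0x` a single item. `?` means the preceding item `(0x)` may appear zero times or once. `[0-9a-fA-F]` stands for any single digit from 0 to 9 or any single letter from a to f or from A to F. `+` means that such a digit or letter may be repeated one or more times.

~~~
% regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} ab
1
% regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} 0xabcd
1
% regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} 12345
1
% regexp {^((0x)?[0-9a-fA-F]+|[0-9]+)$} 123j
0
~~~

If `regexp` is followed by the arguments `matchVar` and `subMatchVar`, all of them are treated as variable names; variables that do not exist are created. `regexp` assigns the substring that matches the whole regular expression to the first variable, the substring that matches the leftmost parenthesized subexpression to the second variable, and so on. For example (note the two spaces between `100` and `apples`):

~~~
% regexp { ([0-9]+) *([a-z]+)} " there is 100  apples" total num word
1
% puts " $total ,$num,$word"
  100  apples ,100,apples
~~~

`regexp` accepts some switches that control the matching result:

* `-nocase`: ignore case when matching.
* `-indices`: changes the values stored in the variables. Each variable then holds the indices of the corresponding matched substring within the whole string. For example:

    ~~~
    % regexp -indices { ([0-9]+) *([a-z]+)} " there is 100  apples" total num word
    1
    % puts " $total ,$num,$word"
     9 20 ,10 12,15 20
    ~~~

    Indeed, the substring ` 100  apples` (including its leading space) occupies indices 9-20, `100` occupies 10-12, and `apples` occupies 15-20.
* `-about`: returns information about the regular expression itself and does not match it against the string. The result is a list whose first element is the number of subexpressions; the remaining elements describe the regular expression.
* `-expanded`: enables the expanded syntax, in which whitespace and comments are ignored. This is equivalent to the embedded option `(?x)`.
* `-line`: enables line-sensitive matching. Normally `^` and `$` match only at the start and end of the string and not at line boundaries inside it; with this switch they can also match at internal line boundaries. It is equivalent to using both `-linestop` and `-lineanchor`, or to the embedded option `(?n)`.
* `-linestop`: makes `[^...]` bracket expressions and `.` stop at newlines. This is equivalent to the embedded option `(?p)`.
* `-lineanchor`: changes the behavior of `^` and `$` so that they match at the beginning and end of lines inside the string. This is equivalent to the embedded option `(?w)`.
* `-all`: matches the regular expression as many times as possible in the string and returns the number of matches found.
* `-inline`: causes the command to return, as a list, the data that would otherwise be placed in match variables. When using `-inline`, match variables may not be specified. If used with `-all`, the list will be concatenated at each iteration, such that a flat list is always returned. For each match iteration, the command will append the overall match data, plus one element for each subexpression in the regular expression. Examples are:

    ~~~
    regexp -inline -- {\w(\w)} " inlined "
    => {in n}
    regexp -all -inline -- {\w(\w)} " inlined "
    => {in n li i ne e}
    ~~~

* `-start index`: forces matching to start at offset `index` in the string. With this switch, `^` no longer matches the beginning of the line, while `\A` still matches the start of the string at `index`. If `-indices` is also given, the reported indices are counted from the absolute beginning of the input string.
* `--`: marks the end of the switches. An argument after it is treated as the regular expression even if it starts with `-`.

## 8.4 The `regsub` command

Syntax: `regsub ?switches? exp string subSpec varname`

The first argument of `regsub` is a regular expression and the second is an input string, exactly as for `regexp`. Like `regexp`, it returns 1 when there is a match and 0 otherwise (more precisely, it returns the number of replacements made). In addition, `regsub` replaces the part of `string` that matches the regular expression with the value of the third argument. The fourth argument is taken as a variable name, and the string after replacement is stored in that variable. For example:

~~~
% regsub there "They live there lives " their x
1
% puts $x
They live their lives
~~~

Here `there` has been replaced by `their`.

`regsub` also has a few switches:

* `-nocase`: same meaning as for `regexp`.
* `-all`: without this switch `regsub` replaces only the first match; with it, `regsub` replaces every match.
* `--`: same meaning as for `regexp`.

## 8.5 The `string` command

Syntax: `string option arg ?arg ...?`

`string` is a powerful command for manipulating strings and has as many as 20 different `option` values. The commonly used ones are described below.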
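Before going through the options one by one, here is a brief preview of the general `string option arg ?arg ...?` form. It is only an illustrative sketch: the sample string and the variable names are made up, and the options used are the ones described in the subsections that follow.

~~~
set s "Hello, Tcl"

set len   [string length $s]        ;# number of characters in s
set upper [string toupper $s]       ;# copy of s converted to upper case
set tail  [string range $s 7 end]   ;# characters from index 7 to the end
set pos   [string first Tcl $s]     ;# index of the first occurrence of Tcl

puts "$len $upper $tail $pos"
~~~

For this input the script should print `10 HELLO, TCL Tcl 7`. Every option takes the string to operate on as an ordinary argument and returns a new value; none of them modifies the variable `s`.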
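### 8.5.1 `string compare ?-nocase? ?-length int? string1 string2`

Compares `string1` with `string2`. The return value is -1, 0, or 1, depending on whether `string1` is less than, equal to, or greater than `string2`. If `-length` is given, only the first `int` characters are compared; if `int` is negative, the option is ignored. If `-nocase` is given, the comparison is case-insensitive.

### 8.5.2 `string equal ?-nocase? ?-length int? string1 string2`

Compares `string1` with `string2`. Returns 1 if they are identical and 0 otherwise. The other arguments are the same as in 8.5.1.

### 8.5.3 `string first string1 string2 ?startindex?`

Searches `string2` from the beginning for a sequence of characters that matches `string1`. If one is found, the position (0-based) of the first character of the match is returned; otherwise -1 is returned. If `startindex` is given, the search starts at `startindex`. For example:

~~~
% string first ab defabc
3
% string first ab defabc 4
-1
~~~

### 8.5.4 `string index string charIndex`

Returns character number `charIndex` (0-based) of `string`. `charIndex` can take the following values:

* an integer `n`: the character at index n (0-based)
* `end`: the last character
* `end-n` for an integer n: the character n positions before the last one. `string index "abcd" end-1` returns the character `c`.

If `charIndex` is less than 0, or greater than or equal to the length of `string`, an empty string is returned. For example:

~~~
% string index abcdef 2
c
% string index abcdef end-2
d
~~~

### 8.5.5 `string last string1 string2 ?startindex?`

See 8.5.3; the only difference is that the search runs from the end towards the beginning.

### 8.5.6 `string length string`

Returns the length of `string`.

### 8.5.7 `string match ?-nocase? pattern string`

Returns 1 if `pattern` matches `string` and 0 otherwise. If `-nocase` is given, the match is case-insensitive. The following wildcards can be used in `pattern`:

* `*`: matches any sequence of characters in string, including the empty string.
* `?`: matches any single character in string.
* `[chars]`: matches any one character from the set chars; ranges such as A-Z can be used.
* `\x`: matches the single character x. The `\` allows x to be one of the special characters `*`, `?`, `[`, `]`, and `\`.

Examples:

~~~
% string match * abcdef
1
% string match a* abcdef
1
% string match a?cdef abcdef
1
% string match {a[b-f]cdef} abcdef   //the braces are required; otherwise the interpreter would treat
1                                    //b-f as a command name, causing an error
% string match {a[b-f]cdef} accdef
1
~~~

### 8.5.8 `string range string first last`

Returns the substring of `string` from character `first` through character `last` (0-based). If `first` is less than 0, it is treated as 0. If `last` is greater than or equal to the length of the string, it is treated as `end`. If `first` is greater than `last`, an empty string is returned.

### 8.5.9 `string repeat string count`

Returns the string formed by repeating `string` `count` times.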
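For example:

~~~
% string repeat "abc" 2
abcabc
~~~

### 8.5.10 `string replace string first last ?newstring?`

Returns `string` with characters `first` through `last` (0-based) removed. If `newstring` is given, those characters are replaced by `newstring`. If `first` is less than 0, it is treated as 0. If `last` is greater than or equal to the length of the string, it is treated as `end`. If `first` is greater than `last`, or greater than the length of `string`, or if `last` is less than 0, `string` is returned unchanged.

### 8.5.11 `string tolower string ?first? ?last?`

Returns `string` converted to lower case. If `first` and `last` are given, only the characters between `first` and `last` are converted.

### 8.5.12 `string toupper string ?first? ?last?`

Same as 8.5.11, but converts to upper case.

### 8.5.13 `string trim string ?chars?`

Returns `string` with any characters in the set `chars` removed from its beginning and end. If `chars` is not given, spaces, tabs, newlines, and carriage returns are removed. For example:

~~~
% string trim "abcde" {a d e}
bc
% string trim " def
> "
def
~~~

### 8.5.14 `string trimleft string ?chars?`

Same as 8.5.13, but removes characters only on the left.

### 8.5.15 `string trimright string ?chars?`

Same as 8.5.13, but removes characters only on the right.

# 9\. File access

TCL provides a rich set of commands for working with files. With them you can manipulate file names (for example, find the files that match a pattern), read and write files sequentially or at random positions, and retrieve file information kept by the system (such as the time of last access).

## 9.1 File names

File names in TCL differ somewhat from the familiar Windows notation: directory levels are separated by `/` and not `\`, which is a consequence of TCL having first been implemented on UNIX. For example, the file `sample.tcl` in the directory tcl on drive C is written in TCL as `C:/tcl/sample.tcl`.

## 9.2 Basic file input and output commands

The following procedure, named `tgrep`, illustrates the basic features of file I/O in TCL:

~~~
proc tgrep { pattern filename } {
    set f [open $filename r]
    while { [gets $f line] >= 0 } {
        if {[regexp $pattern $line]} {
            puts stdout $line
        }
    }
    close $f
}
~~~

This procedure is very much like the UNIX `grep` command. You call it with two arguments, a pattern and a file name, and `tgrep` prints all lines of the file that match the pattern. Note that the loop tests the return value of `gets` with `>= 0`, because `gets` returns -1 at end of file and 0 for an empty line.

The basic file input and output commands used in this procedure are described below.

* `open name ?access?`

    `open` opens the file `name` in the mode given by `access` and returns a file identifier for use by other commands (`gets`, `close`, and so on). If the first character of `name` is `|`, a command pipeline is started and no file is opened. The access modes resemble those of C:

    * `r`: open for reading only. The file must already exist. This is the default.
    * `r+`: open for reading and writing. The file must already exist.
    * `w`: open for writing only. If the file exists its contents are discarded; otherwise a new empty file is created.
    * `w+`: open for reading and writing. If the file exists its contents are discarded; otherwise a new empty file is created.
    * `a`: open for writing only, with the access position set to the end of the file. If the file does not exist, a new empty file is created.
    * `a+`: open for reading and writing, with the access position set to the end of the file. If the file does not exist, a new empty file is created.

    `open` returns a string that identifies the open file. This file identifier is what you pass to other commands (such as `gets`, `puts`, and `close`) to operate on the open file.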
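    TCL has three predefined file identifiers, `stdin`, `stdout`, and `stderr`, which correspond to the standard input, standard output, and standard error channels. You can use them at any time.

* `gets fileId ?varName?`

    Reads the next line of the file identified by `fileId` and discards the newline. If `varName` is given, the line is assigned to it and the number of characters in the line is returned (-1 at end of file). If `varName` is not given, the line itself is returned as the result of the command (an empty string at end of file).

    A command similar to `gets` is `read`, which is not line-oriented. It has two forms:

    * `read ?-nonewline? fileId`: reads and returns all remaining bytes of the file identified by `fileId`. If the `-nonewline` switch is given, a newline at the very end of the file is discarded.
    * `read fileId numBytes`: reads and returns the next `numBytes` bytes of the file identified by `fileId`.

* `puts ?-nonewline? ?fileId? string`

    `puts` writes `string` to `fileId` and appends a newline unless the `-nonewline` switch is given. `fileId` defaults to `stdout`. The command returns an empty string.

    Output written with `puts` is buffered, in the manner of C's standard I/O library, which means that it does not necessarily appear in the target file immediately. If you want the data to appear in the file at once, call the `flush` command.

* `flush fileId`

    Writes the buffered output to the file identified by `fileId` and returns an empty string. `flush` forces the buffered data out to the file and does not return until the data has been written. Buffered data is flushed automatically when a file is closed.

* `close fileId`

    Closes the file identified by `fileId`. The command returns an empty string.

One point worth stressing is that in TCL, serial ports, pipes, sockets, and the like are handled in the same way as files, so the file commands above apply to them as well.

## 9.3 Random file access

By default, file input and output is sequential: each `gets` or `read` returns the bytes that follow the position reached by the previous `gets` or `read`, and each `puts` writes its data right after the position where the previous `puts` stopped. TCL provides the commands `seek`, `tell`, and `eof` so that files can be accessed non-sequentially.

Every open file has an access position, which is where the next read or write will start. When a file is opened, the access position is set to the beginning or the end of the file, depending on the access mode used. After each read or write, the access position moves forward by the number of bytes transferred. The `seek` command changes the access position:

* `seek fileId offset ?origin?`: sets the access position of the file identified by `fileId` to `offset` bytes relative to `origin`. `origin` can be `start`, `current`, or `end`; the default is `start`. The command returns an empty string.

    For example, `seek fileId 2000` changes the access position of the file identified by `fileId` so that the next read or write starts at byte offset 2000 in the file.

    The third argument of `seek` says where the offset is measured from and must be one of `start`, `current`, or `end`. With `start`, the default, the offset is relative to the beginning of the file. With `current`, the offset is relative to the current access position. With `end`, the offset is relative to the end of the file.
* `tell fileId`: returns the current access position of the file identified by `fileId`.
* `eof fileId`: returns 1 if the end of the file identified by `fileId` has been reached, and 0 otherwise.

## 9.4 The current working directory

TCL provides two commands for managing the current working directory: `pwd` and `cd`.

`pwd` works exactly like the UNIX `pwd` command. It takes no arguments and returns the full path of the current directory.

`cd` also works like its UNIX counterpart. It takes one argument and changes the working directory to the directory given. If `cd` is called without an argument, on UNIX the working directory becomes the home directory of the user running the TCL script, and on Windows it becomes the root directory of the drive on which Windows is installed (for example C:/). Note that paths given to `cd` should use `/` and not `\`, as in `cd C:/TCL/lib`. This is the UNIX style.

## 9.5 File operations and file information

TCL provides two commands for working with file names, `glob` and `file`, which are used to operate on files and to obtain information about them.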
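`glob` takes one or more patterns as arguments and returns a list of all files that match the pattern or patterns. Its syntax is:

`glob ?switches? pattern ?pattern ...?`

`switches` can take the following values:

* `-nocomplain`: allows an empty list to be returned. Without `-nocomplain`, an empty result raises an error.
* `--`: marks the end of `switches`; arguments after it are not treated as switches even if they start with `-`.

The patterns of `glob` follow the matching rules of the `string match` command (see 8.5.7). For example:

~~~
%glob *.c *.h
main.c hash.c hash.h
~~~

This returns the names of all files in the current directory that end in `.c` or `.h`. `glob` also allows a pattern to contain several alternatives, separated by commas and enclosed in braces. For example:

~~~
%glob {{src,backup}/*.[ch]}
src/main.c src/hash.c src/hash.h backup/hash.c
~~~

The following command is equivalent to the one above:

~~~
glob {src/*.[ch]} {backup/*.[ch]}
~~~

Note that the braces around the patterns in these examples are required to prevent command substitution. The interpreter removes them before calling the C procedure that implements `glob`.

If a pattern ends with a slash, it matches only directory names. For example, `glob */` returns only the subdirectories of the current directory.

If the list of file names returned by `glob` is empty, an error is normally raised. If, however, the first argument of `glob`, before the patterns, is `-nocomplain`, no error is raised even when the result is empty.

The second command for working with file names is `file`. It is a frequently used command with many options, which can both operate on files and retrieve information about them. This section discusses the name-related options and the next section describes the others. When using `file`, you will notice clear traces of its UNIX origins.

* `file atime name`: returns a decimal string giving the time at which the file `name` was last accessed, measured in seconds since midnight (12:00 AM) on January 1, 1970. An error is returned if the file `name` does not exist or its access time cannot be queried. For example:

    ~~~
    % file atime license.txt
    975945600
    ~~~

* `file copy ?-force? ?--? source target`
* `file copy ?-force? ?--? source ?source ...? targetDir`

    This command copies the files or directories given by `source`, recursively, to the destination `target` or `targetDir`. Existing files are overwritten only if `-force` is given. Attempting to overwrite a non-empty directory, to overwrite a directory with a file, or to overwrite a file with a directory causes an error. `--` has the same meaning as described earlier.
* `file delete ?-force? ?--? pathname ?pathname ...?`: deletes the files or directories given by `pathname`. Non-empty directories are deleted only if `-force` is given. Read-only files are deleted even without `-force`. Deleting a file that does not exist does not raise an error.
* `file dirname name`: returns all characters of `name` before the last `/`. If `name` contains no `/`, returns `.`. If the last `/` in `name` is the first character of `name`, returns `/`.
* `file executable name`: returns 1 if `name` is executable by the current user, and 0 otherwise.
* `file exists name`: returns 1 if `name` exists and the current user has search permission for the directories leading to it, and 0 otherwise.
* `file extension name`: returns all characters of `name` from the last `.` onward (including the dot). Returns an empty string if `name` contains no `.` or there is no `.` after the last slash.
* `file isdirectory name`: returns 1 if `name` is a directory, and 0 otherwise.
* `file isfile name`: returns 1 if `name` is a regular file, and 0 otherwise.
* `file lstat name arrayName`: the same as `file stat`, except that it uses the `lstat` kernel call in place of `stat`. This means that if `name` is a symbolic link, the command returns information about the link itself and not about the file the link points to. On operating systems that do not support symbolic links, this command is identical to `file stat`.
* `file mkdir dir ?dir ...?`: similar to the UNIX `mkdir` command; creates the directories given by `dir`. If `dir` already exists, the command does nothing and does not return an error. However, attempting to overwrite an existing file with a directory causes an error. The command processes its arguments in order and stops immediately if an error occurs.
* `file mtime name`: returns a decimal string giving the time at which the file `name` was last modified, measured in seconds since midnight (12:00 AM) on January 1, 1970.
* `file owned name`: returns 1 if `name` is owned by the current user, and 0 otherwise.
* `file readable name`: returns 1 if the current user can read `name`, and 0 otherwise.
* `file readlink name`: returns the file that the symbolic link `name` points to. Returns an error if `name` is not a symbolic link or the link cannot be read. On operating systems that do not support symbolic links (such as Windows), the `readlink` option is not defined.
* `file rename ?-force? ?--? source target`
* `file rename ?-force? ?--? source ?source ...? targetDir`

    This command both renames and moves files and directories. It renames the file or directory given by `source` to `target`, or moves it into `targetDir`. Existing files are overwritten only if `-force` is given. Attempting to overwrite a non-empty directory, to overwrite a directory with a file, or to overwrite a file with a directory causes an error.
* `file rootname name`: returns all characters of `name` before the last `.` (not including the dot). If `name` contains no `.`, returns `name`.
* `file size name`: returns a decimal string giving the size of `name` in bytes. Returns an error if the file does not exist or its size cannot be obtained.
* `file stat name arrayName`: invokes the `stat` kernel call on `name` and uses `arrayName` to hold the information that `stat` returns. `arrayName` is treated as an array and receives the following elements: `atime`, `ctime`, `dev`, `gid`, `ino`, `mode`, `mtime`, `nlink`, `size`, `type`, and `uid`. Every element except `type` is a decimal string; the `type` element has the same value as the result of `file type`. The meaning of the other elements is as follows:

    * `atime`: time of last access.
    * `ctime`: time of last status change.
    * `dev`: identifier of the device containing the file.
    * `gid`: identifier of the file's group.
    * `ino`: serial number of the file within the device.
    * `mode`: mode bits of the file.
    * `mtime`: time of last modification.
    * `nlink`: number of links to the file.
    * `size`: size of the file in bytes.
    * `uid`: identifier of the file's owner.

    The `atime`, `mtime`, and `size` elements have the same values as the `file` options discussed earlier. For more information on the other elements, consult the documentation of the `stat` system call; each element is taken directly from the corresponding field of the structure returned by `stat`. The `stat` option is a simple way to obtain several pieces of information about a file in one call, which is significantly faster than calling `file` several times to obtain the same information.
* `file tail name`: returns all characters of `name` after the last slash. If there is no slash, returns `name`.
* `file type name`: returns a string giving the type of the file, which is one of `file`, `directory`, `characterSpecial`, `blockSpecial`, `fifo`, `link`, or `socket`.
* `file writable name`: returns 1 if the current user can write to `name`, and 0 otherwise.

# 10\. Errors and exceptions

A mechanism for handling errors and exceptions is one of the prerequisites for building large, robust applications. Many programming languages provide one, and TCL is no exception. Errors can be regarded as a special case of exceptions. In TCL an exception is an event that causes a script to be terminated; besides errors, exceptions are also produced by commands such as `break`, `continue`, and `return`.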
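TCL allows a program to catch exceptions, so that only part of the program's work is undone. After catching an exception, a script can ignore it or recover from it. If the script cannot recover from the exception, it can raise it again. The TCL commands related to exceptions are:

* `catch command ?varName?`: evaluates `command` as a TCL script and returns an integer that indicates how `command` ended. If `varName` is given, TCL creates the variable `varName` and stores in it the error message produced by `command`.
* `error message ?info? ?code?`: raises an error with `message` as the error message. If `info` is given, it is used to initialize the global variable `errorInfo`. If `code` is given, it is stored in the global variable `errorCode`.
* `return -code code ?-errorinfo info? ?-errorcode errorCode? ?string?`: makes the current procedure return with an exception. `code` specifies the type of exception and must be `ok`, `error`, `return`, `break`, `continue`, or an integer. The `-errorinfo` option specifies the initial value of the global variable `errorInfo`, and `-errorcode` specifies the initial value of the global variable `errorCode`. `string` gives the return value of `return` or the associated error message; it defaults to an empty string.

## 10.1 Errors

When a TCL error occurs, the current command is terminated. If the command is part of a larger script, the whole script is terminated. If an error occurs while a TCL procedure is running, the procedure is terminated, as are the procedure that called it and all other active procedures on the call stack, and an error indication and an error message are returned.

As an example, consider the following script, which is meant to compute the sum of the elements of a list:

~~~
set list {44 16 123 98 57}
set sum 0
foreach el $list {
    set sum [expr $sum+$element]
}
=> can't read "element": no such variable
~~~

The script is wrong because there is no variable named `element`. When TCL parses the `expr` command, it tries to substitute the value of `element`, cannot find a variable of that name, and reports an error. Because `foreach` uses the TCL interpreter to evaluate the loop body, the error indication is returned to `foreach`. On receiving the error, `foreach` stops the loop and returns the same error indication to its own caller. In this way the whole script is terminated. The error message `can't read "element": no such variable` is passed all the way back and is very likely to be shown to the user.

In many cases the error message gives enough information to show where and why the error occurred. If the error occurs inside a deeply nested set of procedure calls, however, the message alone is not enough to locate it. To help locate errors, TCL builds a stack trace as it unwinds the commands that were running and stores it in the global variable `errorInfo`. The stack trace describes each level of nested call. For example, after the error above `errorInfo` has the following value:

~~~
can't read "element": no such variable
    while executing
"expr $sum+$element"
    ("foreach" body line 2)
    invoked from within
"foreach el $list {
    set sum [expr $sum+$element]
}"
~~~

TCL provides a little additional information in the global variable `errorCode`. `errorCode` is a list of one or more elements. The first element identifies the class of error and the others give more detailed related information. `errorCode` is, however, a relatively recent addition to TCL, and only some of the commands that deal with file access and child processes set it. If a command raises an error without setting `errorCode`, TCL fills in the value `NONE`.

When you want detailed information about an error, you can therefore inspect the global variables `errorInfo` and `errorCode` in addition to the error message in the command's return value.

## 10.2 Raising errors from TCL scripts

Most TCL errors are raised by the C code that implements the TCL interpreter and the built-in commands.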
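However, errors can also be generated by executing the TCL `error` command, as in the following example:

~~~
if {($x<0)||($x>100)} {
    error "x is out of range ($x)"
}
~~~

The `error` command generates an error and uses its argument as the error message. As a matter of programming style, use `error` only when terminating the program is the only option left. If you think the error is easy to recover from without aborting the whole script, it is better to report success or failure through the normal `return` mechanism (for example, return one value on success and another on failure, or set a variable to indicate the outcome). Recovering from an error is possible, but the recovery mechanism is more complicated than ordinary return values. It is therefore best to use `error` only when you do not intend to recover.

## 10.3 Catching errors with `catch`

An error normally terminates all active TCL commands, but sometimes it is useful to continue the script after an error has occurred. For example, you may use `unset` to remove the variable `x`, but `x` may not exist when `unset` runs. Unsetting a variable that does not exist generates an error:

~~~
% unset x
can't unset "x": no such variable
~~~

You can use the `catch` command to ignore the error:

~~~
% catch {unset x}
1
~~~

The argument to `catch` is a TCL script. If the script completes normally, `catch` returns 0. If an error occurs in the script, `catch` traps the error (so that `catch` itself is not terminated) and returns 1 to indicate that an error occurred. The example above ignores any error from `unset`: if `x` exists it is removed, and if `x` did not exist the script is not affected.

`catch` accepts a second argument. If given, it must be a variable name, and `catch` stores the script's return value or the error message in that variable.

~~~
% catch {unset x} msg
1
% set msg
can't unset "x": no such variable
~~~

Here `unset` generates an error, so `msg` is set to the error message. If the variable `x` had existed, `unset` would have succeeded, `catch` would have returned 0, and `msg` would hold the return value of `unset`, which is the empty string. This form is useful when you want the script's return value after a normal return, and also when you want to do something with the error message, such as writing it to a log file.

## 10.4 Other exceptions

Errors are not the only thing that can terminate running code. An error is just one case of a group of events called exceptions. Besides `error`, TCL has three other kinds of exception, produced by the `break`, `continue` and `return` commands. All exceptions terminate the active script in the same way, with two differences. First, `errorInfo` and `errorCode` are set only for errors. Second, exceptions other than errors are almost always caught by some enclosing command and go no further, whereas an error usually unwinds all the work in the whole program. For example, `break` and `continue` are normally used inside a loop command such as `foreach`; `foreach` catches the `break` or `continue` exception and then ends the loop or moves on to the next iteration. Similarly, `return` normally appears only in procedures or in files read by `source`. The procedure implementation and the `source` command catch the `return` exception.

Every exception carries a string value. For an error the string is the error message, for `return` it is the return value of the procedure or script, and for `break` and `continue` it is empty.

`catch` can in fact catch all exceptions, not only errors. Its return value tells which kind of exception occurred, and its second argument receives the string associated with the exception. For example:

~~~
% catch {return "all done"} string
2
% set string
all done
~~~

The following table describes `catch command ?varName?`.

| `catch` return value | Description | Caught by |
|---|---|---|
| 0 | Normal return; `varName` holds the return value | No exception |
| 1 | Error; `varName` holds the error message | `catch` |
| 2 | `return` was executed; `varName` holds the procedure's return value or the result returned to `source` | `catch`, `source`, procedure calls |
| 3 | `break` was executed; `varName` is empty | `catch`, `for`, `foreach`, `while`, procedures |
| 4 | `continue` was executed; `varName` is empty | `catch`, `for`, `foreach`, `while`, procedures |
| Other values | Defined by the user or the application | `catch` |

Just as `catch` can trap every kind of exception, `return` can generate every kind of exception. Here is an implementation of a `do` command that uses `catch` and `return` to handle exceptions correctly:

~~~
proc do {varName first last body} {
    global errorInfo errorCode
    upvar $varName v
    for {set v $first} {$v <= $last} {incr v} {
        switch [catch {uplevel $body} string] {
            1 {return -code error -errorinfo $errorInfo \
                    -errorcode $errorCode $string}
            2 {return -code return $string}
            3 return
        }
    }
}
~~~

This implementation evaluates the loop body inside `catch` and then checks how the body completed. If no exception occurred (0), or the exception was `continue` (4), `do` goes on to the next iteration. If an `error` (1) or `return` (2) occurred, `do` uses `return` to pass the exception on to its caller. If a `break` (3) occurred, `do` returns normally to its caller and the loop ends.

When `do` passes an error up to its caller, it uses the `-errorinfo` option of `return` so that a correct stack trace is available after the error. The `-errorcode` option serves a similar purpose: it passes on the original `errorCode` obtained through `catch` as the `errorCode` of the `do` command. Without the `-errorcode` option, the `errorCode` variable would always end up with the value `NONE`.

# 11\. TCL Internals

This chapter describes a set of commands that let you query and manipulate the internal state of the TCL interpreter. With these commands you can, for example, check whether a variable exists, find out which entries an array has, monitor all accesses to a variable, rename or delete a command, and handle references to undefined commands.

## 11.1 Querying the elements of an array

The `array` command provides information about the elements currently defined in an array variable. Its general form is:

`array option arrayName ?arg arg ...?`

The command takes several forms, depending on `option`. To step through the elements of an array, first start a search with the following command:

* `array startsearch arrayName`: Initializes a search through all the elements of the array `arrayName` and returns a search identifier, which is then passed to `array nextelement`, `array anymore` and `array donesearch`.
* `array nextelement arrayName searchId`: Returns the name of the next element of `arrayName`, or an empty string if all elements have already been returned in this search. The search identifier `searchId` must be the return value of `array startsearch`. Note: if elements are added to or deleted from `arrayName`, all searches end automatically, as if `array donesearch` had been called, and `array nextelement` will then fail.
* `array anymore arrayName searchId`: Returns 1 if there are elements left in the search, 0 otherwise. `searchId` is as above. This command is particularly useful for arrays that have an element whose name is empty, because in that case the result of `array nextelement` does not tell whether the search is finished.
* `array donesearch arrayName searchId`: Ends a search and destroys all state associated with it. `searchId` is as above. The command returns an empty string. Always call this command when a search is finished.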
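The four search commands above are normally used together. The following is a minimal sketch of that pattern; the array `color` and its contents are made up for illustration, and the element order is not guaranteed, so no particular output is shown.

```tcl
# Walk every element of an array with an explicit search.
# The array "color" and its values are hypothetical sample data.
array set color {red #ff0000 green #00ff00 blue #0000ff}

set names {}
set id [array startsearch color]          ;# begin the search
while {[array anymore color $id]} {       ;# are there elements left?
    set key [array nextelement color $id] ;# name of the next element
    lappend names $key
    puts "color($key) = $color($key)"
}
array donesearch color $id                ;# always release the search
```

The loop only reads the array; adding or removing elements inside it would end the search, as noted above.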
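The other options of `array` are:

* `array exists arrayName`: Returns 1 if an array named `arrayName` exists, 0 otherwise.
* `array get arrayName ?pattern?`: Returns a list with an even number of elements. Taking the elements two at a time from the front, the first element of each pair is the name of an element of `arrayName` and the second is the value of that element. The order of the pairs is undefined. Without `pattern`, all elements of the array are included; with `pattern`, only the elements whose names match `pattern` (using the rules of `string match`) are included. If `arrayName` is not the name of an array variable, or the array has no elements, an empty list is returned. Example:

~~~
% set b(first) 1
1
% set b(second) 2
2
% array get b
second 2 first 1
~~~

* `array set arrayName list`: Sets the values of elements of the array `arrayName`. `list` has the same form as the list returned by `array get`. If `arrayName` does not exist, it is created. Example:

~~~
% array set a {first 1 second 2}
% puts $a(first)
1
% array get a
second 2 first 1
~~~

* `array names arrayName ?pattern?`: Returns a list of the names of the elements of `arrayName` that match `pattern`. Without `pattern`, all element names are returned. If no element matches, or `arrayName` is not the name of an array, an empty string is returned.
* `array size arrayName`: Returns a decimal string giving the number of elements in the array, or 0 if `arrayName` is not the name of an array.

The following example uses `array names` and `foreach` to enumerate all the elements of an array:

~~~
foreach i [array names a] {
    puts "a($i)=$a($i)"
}
~~~

An array can also be traversed with the `startsearch`, `anymore`, `nextelement` and `donesearch` options. Because this avoids building the complete list of names, it can be more efficient than the `foreach` approach above, but it is considerably more cumbersome and therefore rarely used.

## 11.2 The `info` command

The `info` command gives access to information about the TCL interpreter. It has more than a dozen options, described in the following sections.

### 11.2.1 Information about variables

Several options of `info` provide information about variables.

* `info exists varName`: Returns 1 if a variable named `varName` exists in the current context (as a global or local variable), 0 otherwise.
* `info globals ?pattern?`: Without `pattern`, returns a list of the names of all global variables. With `pattern`, returns only the global variables that match `pattern` (matched as in `string match`).
* `info locals ?pattern?`: Without `pattern`, returns a list of the names of all local variables, including the arguments of the current procedure; variables defined with `global` and `upvar` are not returned. With `pattern`, returns only the local variables that match `pattern` (matched as in `string match`).
* `info vars ?pattern?`: Without `pattern`, returns a list of the names of the local variables and the visible global variables. With `pattern`, returns only the local and visible global variables that match `pattern`. The pattern may be qualified with a `namespace`, as in `::foo::option*`, in which case only the variables in that `namespace` that match `option*` are returned. (Note: the `namespace` concept was introduced in Tcl 8.0. For the small TCL programs we usually write it can be ignored; look up the relevant documentation if you are interested.)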
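A common use of `info exists` is to initialize a variable the first time it is needed. The sketch below assumes a hypothetical procedure `bump` and a hypothetical counter variable `hits`; neither name comes from the tutorial.

```tcl
# Increment a counter, creating it on first use.
# "bump" and "hits" are names invented for this sketch.
proc bump {varName} {
    upvar 1 $varName v          ;# link v to the caller's variable
    if {![info exists v]} {     ;# not defined yet in the caller
        set v 0
    }
    incr v
}

bump hits
bump hits
puts $hits   ;# prints 2
```

`info exists` also works through `upvar`, so the check refers to the caller's variable rather than to a local of `bump`.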
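The following example illustrates the commands above. Assume that the global variables `global1` and `global2` exist and that the following procedure is defined:

~~~
proc test {arg1 arg2} {
    global global1
    set local1 1
    set local2 2
    ...
}
~~~

Executing the following commands inside the procedure gives:

~~~
% info vars
global1 arg1 arg2 local2 local1    //global2 is not visible
% info globals
global2 global1
% info locals
arg1 arg2 local2 local1
% info vars *al*
global1 local2 local1
~~~

### 11.2.2 Information about procedures

Other options of `info` provide information about procedures.

* `info procs ?pattern?`: Without `pattern`, returns the names of all procedures defined in the current `namespace`. With `pattern`, returns only the names of the procedures that match `pattern` (matched as in `string match`).
* `info body procname`: Returns the body of the procedure `procname`. `procname` must be a TCL procedure.
* `info args procname`: Returns a list of the names of all the arguments of the procedure `procname`. `procname` must be a TCL procedure.
* `info default procname arg varname`: `procname` must be a TCL procedure and `arg` must be one of its arguments. If `arg` has no default value, the command returns 0; otherwise it returns 1 and stores the default value of `arg` in the variable `varname`.
* `info level ?number?`: Without `number`, returns the position of the current procedure in the call stack. With `number`, returns a list containing the name and arguments of the procedure at position `number` in the call stack.

The following example illustrates these commands:

~~~
proc maybeprint {a b {c 24}} {
    if {$a<$b} {
        puts stdout "c is $c"
    }
}
% info body maybeprint
if {$a<$b} {
    puts stdout "c is $c"
}
% info args maybeprint
a b c
% info default maybeprint a x
0
% info default maybeprint c x
1
% set x
24
~~~

The following procedure prints the current call stack, showing the name and arguments of each active procedure:

~~~
proc printStack {} {
    set level [info level]
    for {set i 1} {$i<$level} {incr i} {
        puts "Level $i:[info level $i]"
    }
}
~~~

### 11.2.3 Information about commands

Further options of `info` provide information about commands.

* `info commands ?pattern?`: Without `pattern`, returns a list of the names of all commands in the current `namespace`, including built-in commands, extension commands and procedures defined with `proc`. `pattern` has the same meaning as in `info procs`.
* `info cmdcount`: Returns a decimal string giving the number of commands that have been executed in the interpreter.
* `info complete command`: Returns 1 if `command` is complete, 0 otherwise. Completeness is judged only by whether quotes, brackets and braces are matched.
* `info script`: If a script file is currently being evaluated by the Tcl interpreter, returns the name of the innermost active script file; otherwise returns an empty string.

### 11.2.4 TCL version and library

* `info tclversion`: Returns the version number of the Tcl interpreter in the form `major.minor`, for example `8.3`.
* `info library`: Returns the full path of the Tcl library directory. This directory holds the standard scripts used by Tcl, which TCL evaluates during initialization.

## 11.3 Timing command execution

TCL provides the `time` command for measuring the performance of TCL scripts:

`time script ?count?`

This command executes `script` `count` times.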
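It then divides the total elapsed time by `count` and returns the average time of one execution, in microseconds. If `count` is omitted, the script is executed once.

## 11.4 Tracing variables

TCL provides the `trace` command for tracing one or more variables. Once a trace has been set on a variable, an associated Tcl command is invoked whenever the variable is read, written or deleted. Traces have many uses:

1. Monitoring how a variable is used (for example, printing every read or write).
2. Propagating changes of a variable to other parts of the system (for example, a TK program that always shows the current value of a variable in a small icon).
3. Restricting operations on a variable (for example, generating an error on any attempt to set the variable to something other than a decimal number) or overriding operations (for example, re-creating a variable each time it is deleted).

The syntax of `trace` is:

`trace option ?arg arg ...?`

where `option` takes the following forms:

* `trace variable name ops command`: Sets a trace on the variable `name`: whenever one of the operations in `ops` is performed on `name`, `command` is executed. `name` may be a simple variable, an element of an array, or a whole array. `ops` is one or more of the following:
    * `r`: invoke `command` when the variable is read.
    * `w`: invoke `command` when the variable is written.
    * `u`: invoke `command` when the variable is deleted. A variable is deleted explicitly with `unset`, and all local variables are deleted implicitly when a procedure call ends. Variables are also deleted when the interpreter is deleted, but traces no longer fire at that point.

When a trace on a variable fires, the TCL interpreter automatically appends three arguments to `command`, so that the command actually executed is `command name1 name2 op`. `op` identifies the operation performed on the variable. `name1` and `name2` identify the variable: for a scalar, `name1` is the name of the variable and `name2` is an empty string; for an element of an array, `name1` is the name of the array and `name2` is the name of the element; for a whole array, `name1` is the name of the array and `name2` is an empty string.

The following example illustrates this:

~~~
trace variable color w pvar
trace variable a(length) w pvar
proc pvar {name element op} {
    if {$element != ""} {
        set name ${name}($element)
    }
    upvar $name x
    puts "Variable $name set to $x"
}
~~~

In this example, writing to the scalar variable `color` or to the array element `a(length)` invokes the trace procedure `pvar`. `pvar` takes three arguments, which the TCL interpreter passes automatically when the trace fires. If we change the value of `color`, for instance, the command invoked is `pvar color "" w`. Typing:

~~~
% set color green
Variable color set to green
green
~~~

`command` executes in the same context as the code that triggered the trace: if the traced variable is accessed inside a procedure, `command` can access the local variables of that procedure. For example:

~~~
proc Hello { } {
    set a 2
    trace variable b w { puts $a ;list }
    set b 3
}
% Hello
2
3
~~~

For read and write traces, `command` runs after the access has taken place but before the value is returned to the code that triggered it. `command` can therefore change the value of the variable, and the new value becomes the result of the read or write. Because traces on the variable are temporarily disabled while `command` runs, reading or writing the variable inside `command` does not invoke `command` recursively. For example:

~~~
% trace variable b r tmp
% proc tmp {var1 var2 var3 } {
    upvar $var1 t
    incr t 1
}
% set b 2
2
% puts $b
3
% puts $b
4
~~~

If a read or write trace fails, that is, if `command` fails, the traced read or write fails as well and returns the same failure message as `command`.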
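This mechanism can be used to implement read-only variables. The following example implements a variable whose value must be a positive integer:

~~~
trace variable size w forceInt
proc forceInt {name element op} {
    upvar $name x
    if {![regexp {^[0-9]*$} $x]} {
        error "value must be a positive integer"
    }
}
~~~

If a variable has several traces, they fire in the following order: the most recently added trace fires first, and if one trace generates an error the remaining traces do not fire.

* `trace vdelete name ops command`: Removes the trace for the operations `ops` on the variable `name`. Returns an empty string.
* `trace vinfo name`: Returns information about the traces on a variable. The result is a list in which each element is a sublist of two elements: the traced operations and the command associated with them. If the variable `name` does not exist or has no traces, an empty string is returned.

## 11.5 Renaming and deleting commands

The `rename` command renames or deletes a command. `rename oldName newName` renames the command `oldName` to `newName`; if `newName` is empty, the command `oldName` is deleted from the interpreter.

The following script deletes the file I/O commands:

~~~
foreach cmd {open close read gets puts} {
    rename $cmd {}
}
~~~

Any Tcl command can be renamed or deleted, including built-in commands and the procedures and commands defined by an application. Renaming a built-in command can be very useful. For example, the `exit` command is defined in Tcl to exit the process immediately. If an application wants a chance to clean up its internal state before exiting, it can do the following:

~~~
rename exit exit.old
proc exit status {
    application-specific cleanup
    ...
    exit.old $status
}
~~~

In this example the `exit` command is renamed to `exit.old` and a new `exit` command is defined. The new command performs the cleanup the application needs and then calls the renamed `exit` command to end the process. Existing scripts that call `exit` thereby give the application a chance to clean up its state.

## 11.6 The `unknown` command

The syntax of `unknown` is:

`unknown cmdName ?arg arg ...?`

When a script tries to execute a command that does not exist, the TCL interpreter invokes `unknown` and passes it the name and arguments of the missing command. `unknown` is not part of the TCL core; it is implemented as a TCL script, and its definition can be found in the file `init.tcl` in the `lib` subdirectory of the TCL installation directory.

`unknown` provides the following features:

1. If the command is a procedure defined in one of the TCL library files (here, the TCL script files under the `lib` subdirectory of the TCL directory), the library is loaded and the command is executed again. This is called "auto-loading" and is described in the next section.
2. If a program exists whose name matches the unknown command, `exec` is called to run that program. This feature is called "auto-exec". For example, if you type `dir` as a command, `unknown` executes `exec dir` to list the contents of the current directory. Unless the command explicitly asks for input or output redirection, auto-exec uses the standard input, standard output and standard error streams of the current Tcl application. This differs from calling `exec` directly, but it provides a way to run other programs directly from a Tcl application.
3. If the command is one of a set of special character sequences, a new invocation is generated whose content is a command executed earlier. For example, if the command is `!!`, the previously executed command is executed again. The next chapter describes this feature in detail.
4. If the command is a unique abbreviation of a known command, the command with the corresponding full name is invoked. TCL lets you abbreviate command names as long as the abbreviation is unique.

If you do not like the default behaviour of `unknown`, you can write your own version or extend the `unknown` command in the library to add a feature. If you do not want any handling of unknown commands at all, you can delete `unknown`, so that calling an unknown command simply generates an error.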
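One way to extend `unknown` without losing its default behaviour is to combine it with `rename` from section 11.5. The sketch below is only an illustration under stated assumptions: the names `unknown.orig`, `::missedCommands` and `noSuchCommandForDemo` are invented here, and the script is assumed to run non-interactively (so auto-exec is not attempted).

```tcl
# Record every unknown command name, then fall back to the default handler.
# "unknown.orig" and "::missedCommands" are hypothetical names.
set ::missedCommands {}

# The default handler normally comes from init.tcl; define a bare
# fallback in case the interpreter has none.
if {[llength [info commands unknown]] == 0} {
    proc unknown {args} {
        error "invalid command name \"[lindex $args 0]\""
    }
}
rename unknown unknown.orig

proc unknown {args} {
    # The first word is the name of the command that was not found.
    lappend ::missedCommands [lindex $args 0]
    # Pass the original words to the saved handler in the caller's context.
    uplevel 1 [linsert $args 0 unknown.orig]
}

# A command that does not exist still fails, but its name is logged first.
set failed [catch {noSuchCommandForDemo 1 2} msg]
puts "failed=$failed, logged: $::missedCommands"
```

Because the wrapper delegates to the saved handler, auto-loading and the other features listed above keep working.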
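## 11.7 Auto-loading

One very useful feature of the `unknown` procedure is auto-loading. Auto-loading lets you write a set of Tcl procedures in a script file and place that file in a library directory. The first time a program calls one of these procedures the command does not exist yet, so `unknown` is invoked; `unknown` finds the library file that contains the definition of the procedure, loads it, and executes the command again. The next time the command is used it already exists and executes normally, so the auto-loading mechanism is not triggered again.

Auto-loading has two benefits. First, you can collect useful procedures into libraries without having to know exactly which source file defines each procedure; the auto-loader finds it for you. Second, auto-loading is efficient. Without it you would have to `source` every library file that might be needed at the start of the TCL application. With auto-loading, no library file has to be loaded at startup, and library files that are never used are never loaded, which shortens startup time and saves memory.

Using auto-loading takes three simple steps.

First, create a set of script files in a directory to serve as the library. These files usually end in `.tcl`. Each file may contain any number of procedure definitions. It is advisable to keep dependencies between script files to a minimum and to put related procedures in the same file. For auto-loading to work correctly, each `proc` command must start at the left margin, be separated from the procedure name by a space, and have the procedure name on the same line as `proc`.

Second, build an index for auto-loading. Start a Tcl application such as `tclsh` and call the command `auto_mkindex dir pattern`, where the first argument is a directory name and the second is a pattern. `auto_mkindex` scans the directory `dir` for files whose names match `pattern`, builds an index that records which procedures are defined in which files, and saves the index in a file named `tclIndex` in `dir`. If files are modified or procedures are added or removed, the index must be regenerated.

Third, set the variable `auto_path` in the application so that it includes the directories that hold the libraries you want to use. `auto_path` contains a list of directories. When auto-loading is triggered, the directories in `auto_path` are searched, and the `tclIndex` file in each directory is checked to determine which file defines the procedure. If a procedure is defined in several libraries, auto-loading uses the library that comes first in `auto_path`.

For example, if an application uses the library in the directory `/usr/local/tcl/lib/shapes`, add the following to its startup script:

~~~
set auto_path [linsert $auto_path 0 /usr/local/tcl/lib/shapes]
~~~

This makes `/usr/local/tcl/lib/shapes` the first directory searched and leaves all the Tcl/Tk libraries in place, but the procedures defined in `/usr/local/tcl/lib/shapes` take precedence. Once a directory that contains an index has been added to `auto_path`, all the procedures in it are available through auto-loading.

# 12\. History

This chapter describes the TCL history mechanism, which lets you reuse commands that were executed earlier. The history mechanism keeps the most recently executed commands in a list, so that you do not have to retype them. You can also modify an earlier command to create a new one without typing the new command in full, which is especially convenient for long commands.

The format of the `history` command is:

`history ?option? ?arg arg ...?`

where `option` is one of `add`, `change`, `clear`, `event`, `info`, `keep`, `nextid` or `redo`. Older versions also had `substitute` and `words`; these were removed in current versions (8.0 and later) and `clear` was added. The options are described one by one below.

* `history`: The same as `history info`. Displays the previously executed commands with their numbers. If more commands have been executed than the history list can hold, only the most recent commands, up to the maximum, are shown. Note that the `history` command itself also takes up one entry in the history list. For example, if the only earlier command was `set a 123` and you now type `history`, two history entries are shown:

~~~
% history
1 set a 123
2 history
~~~

* `history add command ?exec?`: Adds a command to the history list. If `exec` is given, the command is executed and its result is returned; otherwise an empty string is returned. The new command must be enclosed in double quotes or braces.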
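For example:

~~~
% history add "set b 100" exec
100
% history
1 history add "set b 100" exec
2 set b 100
3 history
~~~

* `history change newValue ?event?`: Replaces the command numbered `event` with `newValue`; the entry in the history list is rewritten. If `event` is omitted, the current command is replaced. For example:

~~~
% history
1 set a 123
2 set b 23
3 history
% history change "set a 100" 1
% history
1 set a 100    //set a 123 was replaced by set a 100
2 set b 23
3 history
4 history change "set a 100" 1
5 history
% history change "set c 1"
% history
1 set a 100
2 set b 23
3 history
4 history change "set a 100" 1
5 history
6 set c 1    //here history change "set c 1" was replaced by set c 1
7 history
~~~

* `history clear`: Clears the contents of the history list. The maximum number of entries the list may hold is preserved. For example, if `history keep 50` has set the maximum to 50, the list is empty after `history clear` but the maximum is still 50.
* `history event ?event?`: `event` is the number of a history event; the command returns the command line with that number. If `event` is omitted, the previous command line is returned. For example:

~~~
% history
1 set a 123
2 set b 23
3 history
% history event 1
set a 123
% history event
history event 1
~~~

* `history info ?count?`: Without `count`, the same as `history`. With `count`, returns the `count` most recently executed commands.
* `history keep ?count?`: Sets the maximum number of entries in the history list to `count`. The initial maximum is 20.
* `history nextid`: Returns the number of the next command that will be added to the history list. Example:

~~~
% history
1 set a 123
2 set b 23
3 history
% history nextid
5    //because history nextid itself is number 4
~~~

* `history redo ?event?`: Re-executes the command numbered `event` in the history list. `event` defaults to -1.

Shortcuts:

* `!!`: The same as `history redo`.
* `!event`: The same as `history redo event`.

These two shortcuts correspond to the third feature of the `unknown` command described in the previous chapter.

# 13\. TCL and C\\C++ (omitted)

# 14\. Summary

This document supplements and revises "TCL的使用" and "TCL培訓教程" and adds a good deal of new material. Chapters 9, 10, 11 and 12 are new and are the joint work of the members of the TCL interest group in the Beijing Research Institute test department: chapters 9 and 12 were written by 付劍仲, chapter 10 by 杜祥宇, and chapter 11 by 鄧沈鴻. I then edited and revised all chapters and assembled the final text.

The document covers most aspects of TCL and describes in particular detail how to extend TCL with commands written in C\\C++, which is hard to find in the reference books. By following the examples here, users should be able to write their own TCL extension commands. I hope this document helps to promote the use of TCL in the test department.

Learning a programming language ultimately requires hands-on practice. I hope everyone in the test department will install TCL and practise with it; real improvement comes from applying it. For anything not covered here, consult the help that ships with TCL.

# Other tutorials

* [Tcl/Tk Quick Start](http://www.ibm.com/developerworks/cn/education/linux/l-tcl/l-tcl-blt.html)

# Appendix: TCL regular expression syntax in detail

## ◆DESCRIPTION

A regular expression describes strings of characters.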
It's a pattern that matches certain strings and doesn't match others.

## ◆DIFFERENT FLAVORS OF REs

Regular expressions, as defined by POSIX, come in two flavors: extended REs and basic REs. EREs are roughly those of the traditional egrep, while BREs are roughly those of the traditional ed. This implementation adds a third flavor, advanced REs, basically EREs with some significant extensions.

This manual page primarily describes AREs. BREs mostly exist for backward compatibility in some old programs; they will be discussed at the end. POSIX EREs are almost an exact subset of AREs. Features of AREs that are not present in EREs will be indicated.

## ◆REGULAR EXPRESSION SYNTAX

Tcl regular expressions are implemented using the package written by Henry Spencer, based on the 1003.2 spec and some (not quite all) of the Perl5 extensions (thanks, Henry!). Much of the description of regular expressions below is copied verbatim from his manual entry.

An ARE is one or more branches, separated by `|`, matching anything that matches any of the branches.

A branch is zero or more constraints or quantified atoms, concatenated. It matches a match for the first, followed by a match for the second, etc; an empty branch matches the empty string.

A quantified atom is an atom possibly followed by a single quantifier. Without a quantifier, it matches a match for the atom. The quantifiers, and what a so-quantified atom matches, are:

~~~
Quantifier  Meaning
*           a sequence of 0 or more matches of the atom
+           a sequence of 1 or more matches of the atom
?           a sequence of 0 or 1 matches of the atom
{m}         a sequence of exactly m matches of the atom
{m,}        a sequence of m or more matches of the atom
{m,n}       a sequence of m through n (inclusive) matches of the atom; m may not exceed n
*? +? ?? {m}? {m,}? {m,n}?
            non-greedy quantifiers, which match the same possibilities, but prefer the
            smallest number rather than the largest number of matches (see MATCHING)
~~~

The forms using `{` and `}` are known as bounds.
The numbers m and n are unsigned decimal integers with permissible values from 0 to 255 inclusive.

An atom is one of:

~~~
Atom     Meaning
(re)     (where re is any regular expression) matches a match for re, with the match noted for possible reporting
(?:re)   as previous, but does no reporting
()       matches an empty string, noted for possible reporting
(?:)     matches an empty string, without reporting
[chars]  a bracket expression, matching any one of the chars (see BRACKET EXPRESSIONS for more detail)
.        matches any single character
\k       (where k is a non-alphanumeric character) matches that character taken as an ordinary character, e.g. \\ matches a backslash character
\c       where c is alphanumeric (possibly followed by other characters), an escape (AREs only), see ESCAPES below
{        when followed by a character other than a digit, matches the left-brace character {; when followed by a digit, it is the beginning of a bound (see above)
x        where x is a single character with no other significance, matches that character
~~~

A constraint matches an empty string when specific conditions are met. A constraint may not be followed by a quantifier. The simple constraints are as follows; some more constraints are described later, under ESCAPES.

~~~
Constraint  Meaning
^           matches at the beginning of a line
$           matches at the end of a line
(?=re)      positive lookahead (AREs only), matches at any point where a substring matching re begins
(?!re)      negative lookahead (AREs only), matches at any point where no substring matching re begins
~~~

The lookahead constraints may not contain back references (see later), and all parentheses within them are considered non-capturing.

An RE may not end with `\`.

## ◆BRACKET EXPRESSIONS

A bracket expression is a list of characters enclosed in `[]`. It normally matches any single character from the list (but see below). If the list begins with `^`, it matches any single character (but see below) not from the rest of the list.
If two characters in the list are separated by `-`, this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g. `[0-9]` in ASCII matches any decimal digit. Two ranges may not share an endpoint, so e.g. `a-c-e` is illegal. Ranges are very collating-sequence-dependent, and portable programs should avoid relying on them.

To include a literal `]` or `-` in the list, the simplest method is to enclose it in `[.` and `.]` to make it a collating element (see below). Alternatively, make it the first character (following a possible `^`), or (AREs only) precede it with `\`. Alternatively, for `-`, make it the last character, or the second endpoint of a range. To use a literal `-` as the first endpoint of a range, make it a collating element or (AREs only) precede it with `\`. With the exception of these, some combinations using `[` (see next paragraphs), and escapes, all other special characters lose their special significance within a bracket expression.

Within a bracket expression, a collating element (a character, a multi-character sequence that collates as if it were a single character, or a collating-sequence name for either) enclosed in `[.` and `.]` stands for the sequence of characters of that collating element. The sequence is a single element of the bracket expression's list. A bracket expression in a locale that has multi-character collating elements can thus match more than one character. So (insidiously), a bracket expression that starts with `^` can match multi-character collating elements even if none of them appear in the bracket expression! (Note: Tcl currently has no multi-character collating elements. This information is only for illustration.)

For example, assume the collating sequence includes a `ch` multi-character collating element. Then the RE `[[.ch.]]*c` (zero or more `ch`'s followed by `c`) matches the first five characters of `chchcc`.
Also, the RE `[^c]b` matches all of `chb` (because `[^c]` matches the multi-character `ch`).

Within a bracket expression, a collating element enclosed in `[=` and `=]` is an equivalence class, standing for the sequences of characters of all collating elements equivalent to that one, including itself. (If there are no other equivalent collating elements, the treatment is as if the enclosing delimiters were `[.` and `.]`.) For example, if `o` and `ô` are the members of an equivalence class, then `[[=o=]]`, `[[=ô=]]`, and `[oô]` are all synonymous. An equivalence class may not be an endpoint of a range. (Note: Tcl currently implements only the Unicode locale. It doesn't define any equivalence classes. The examples above are just illustrations.)

Within a bracket expression, the name of a character class enclosed in `[:` and `:]` stands for the list of all characters (not all collating elements!) belonging to that class. Standard character classes are:

~~~
Class   Meaning
alpha   A letter.
upper   An upper-case letter.
lower   A lower-case letter.
digit   A decimal digit.
xdigit  A hexadecimal digit.
alnum   An alphanumeric (letter or digit).
print   A printable character (same as graph, but also including space).
blank   A space or tab character.
space   A character producing white space in displayed text.
punct   A punctuation character.
graph   A character with a visible representation.
cntrl   A control character.
~~~

A locale may provide others. (Note that the current Tcl implementation has only one locale: the Unicode locale.) A character class may not be used as an endpoint of a range.

There are two special cases of bracket expressions: the bracket expressions `[[:<:]]` and `[[:>:]]` are constraints, matching empty strings at the beginning and end of a word respectively. A word is defined as a sequence of word characters that is neither preceded nor followed by word characters. A word character is an alnum character or an underscore (`_`).
These special bracket expressions are deprecated; users of AREs should use constraint escapes instead (see below).

## ◆ESCAPES

Escapes (AREs only), which begin with a `\` followed by an alphanumeric character, come in several varieties: character entry, class shorthands, constraint escapes, and back references. A `\` followed by an alphanumeric character but not constituting a valid escape is illegal in AREs. In EREs, there are no escapes: outside a bracket expression, a `\` followed by an alphanumeric character merely stands for that character as an ordinary character, and inside a bracket expression, `\` is an ordinary character. (The latter is the one actual incompatibility between EREs and AREs.)

Character-entry escapes (AREs only) exist to make it easier to specify non-printing and otherwise inconvenient characters in REs:

~~~
Escape      Meaning
\a          alert (bell) character, as in C
\b          backspace, as in C
\B          synonym for \ to help reduce backslash doubling in some applications where there are multiple levels of backslash processing
\cX         (where X is any character) the character whose low-order 5 bits are the same as those of X, and whose other bits are all zero
\e          the character whose collating-sequence name is `ESC`, or failing that, the character with octal value 033
\f          formfeed, as in C
\n          newline, as in C
\r          carriage return, as in C
\t          horizontal tab, as in C
\uwxyz      (where wxyz is exactly four hexadecimal digits) the Unicode character U+wxyz in the local byte ordering
\Ustuvwxyz  (where stuvwxyz is exactly eight hexadecimal digits) reserved for a somewhat-hypothetical Unicode extension to 32 bits
\v          vertical tab, as in C
\xhhh       (where hhh is any sequence of hexadecimal digits) the character whose hexadecimal value is 0xhhh (a single character no matter how many hexadecimal digits are used)
\0          the character whose value is 0
\xy         (where xy is exactly two octal digits, and is not a back reference (see below)) the character whose octal value is 0xy
\xyz        (where xyz is exactly three octal digits, and is not a back reference (see below)) the character whose octal value is 0xyz
~~~

Hexadecimal digits are `0-9`, `a-f`, and `A-F`. Octal digits are `0-7`.

The character-entry escapes are always taken as ordinary characters. For example, `\135` is `]` in ASCII, but `\135` does not terminate a bracket expression. Beware, however, that some applications (e.g., C compilers) interpret such sequences themselves before the regular-expression package gets to see them, which may require doubling (quadrupling, etc.) the `\`.

Class-shorthand escapes (AREs only) provide shorthands for certain commonly-used character classes:

~~~
Shorthand  Equivalent expression
\d         [[:digit:]]
\s         [[:space:]]
\w         [[:alnum:]_] (note underscore)
\D         [^[:digit:]]
\S         [^[:space:]]
\W         [^[:alnum:]_] (note underscore)
~~~

Within bracket expressions, `\d`, `\s`, and `\w` lose their outer brackets, and `\D`, `\S`, and `\W` are illegal. (So, for example, `[a-c\d]` is equivalent to `[a-c[:digit:]]`. Also, `[a-c\D]`, which is equivalent to `[a-c^[:digit:]]`, is illegal.)
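The class shorthands can be tried directly with Tcl's `regexp` command. The following is a small sketch; the sample strings and variable names are invented for illustration.

```tcl
# Class-shorthand escapes in an ARE. The sample data is made up.
set line "id=42  name=node_7"

# \d+ matches a run of digits, \s+ a run of white space,
# \w+ a run of letters, digits and underscores.
regexp {id=(\d+)\s+name=(\w+)} $line whole id name
puts "$id $name"    ;# prints: 42 node_7

# Inside a bracket expression \d loses its outer brackets,
# so [a-c\d] means the same as [a-c[:digit:]].
set hits [regexp -all -inline {[a-c\d]} "a1-d2-c3"]
puts $hits          ;# prints: a 1 2 c 3
```

The patterns are written in braces so that Tcl's own backslash substitution does not touch the escapes before the regular-expression package sees them.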
A constraint escape (AREs only) is a constraint, matching the empty string if specific conditions are met, written as an escape:

~~~
Escape  Meaning
\A      matches only at the beginning of the string (see MATCHING, below, for how this differs from `^`)
\m      matches only at the beginning of a word
\M      matches only at the end of a word
\y      matches only at the beginning or end of a word
\Y      matches only at a point that is not the beginning or end of a word
\Z      matches only at the end of the string (see MATCHING, below, for how this differs from `$`)
\m      (where m is a nonzero digit) a back reference, see below
\mnn    (where m is a nonzero digit, and nn is some more digits, and the decimal value mnn is not greater than the number of closing capturing parentheses seen so far) a back reference, see below
~~~

A word is defined as in the specification of `[[:<:]]` and `[[:>:]]` above. Constraint escapes are illegal within bracket expressions.

A back reference (AREs only) matches the same string matched by the parenthesized subexpression specified by the number, so that (e.g.) `([bc])\1` matches `bb` or `cc` but not `bc`. The subexpression must entirely precede the back reference in the RE. Subexpressions are numbered in the order of their leading parentheses. Non-capturing parentheses do not define subexpressions.

There is an inherent historical ambiguity between octal character-entry escapes and back references, which is resolved by heuristics, as hinted at above. A leading zero always indicates an octal escape. A single non-zero digit, not followed by another digit, is always taken as a back reference. A multi-digit sequence not starting with a zero is taken as a back reference if it comes after a suitable subexpression (i.e. the number is in the legal range for a back reference), and otherwise is taken as octal.

## ◆METASYNTAX

In addition to the main syntax described above, there are some special forms and miscellaneous syntactic facilities available.
Normally the flavor of RE being used is specified by application-dependent means. However, this can be overridden by a director. If an RE of any flavor begins with `***:`, the rest of the RE is an ARE. If an RE of any flavor begins with `***=`, the rest of the RE is taken to be a literal string, with all characters considered ordinary characters.

An ARE may begin with embedded options: a sequence `(?xyz)` (where xyz is one or more alphabetic characters) specifies options affecting the rest of the RE. These supplement, and can override, any options specified by the application. The available option letters are:

~~~
Letter  Meaning
b       rest of RE is a BRE
c       case-sensitive matching (usual default)
e       rest of RE is an ERE
i       case-insensitive matching (see MATCHING, below)
m       historical synonym for n
n       newline-sensitive matching (see MATCHING, below)
p       partial newline-sensitive matching (see MATCHING, below)
q       rest of RE is a literal string, all ordinary characters
s       non-newline-sensitive matching (usual default)
t       tight syntax (usual default; see below)
w       inverse partial newline-sensitive matching (see MATCHING, below)
x       expanded syntax (see below)
~~~

Embedded options take effect at the `)` terminating the sequence. They are available only at the start of an ARE, and may not be used later within it.

In addition to the usual (tight) RE syntax, in which all characters are significant, there is an expanded syntax, available in all flavors of RE with the `-expanded` switch, or in AREs with the embedded `x` option. In the expanded syntax, white-space characters are ignored and all characters between a `#` and the following newline (or the end of the RE) are ignored, permitting paragraphing and commenting a complex RE.
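As a minimal sketch of the expanded syntax just described, the same ARE is written below once in tight form and once with the embedded `x` option. The date string and the variable names are made up for illustration.

```tcl
# One pattern, written tightly and in expanded syntax.
set tight {^(\d{4})-(\d{2})-(\d{2})$}
set expanded {(?x)
    ^ (\d{4})   # year
    - (\d{2})   # month
    - (\d{2})   # day
    $
}

set date "2016-10-28"
set okTight    [regexp $tight    $date -> y  m  d]
set okExpanded [regexp $expanded $date -> y2 m2 d2]
puts "$okTight $okExpanded $y2 $m2 $d2"   ;# prints: 1 1 2016 10 28
```

The embedded `(?x)` has to be the very first thing in the pattern, which is why it follows the opening brace directly.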
There are three exceptions to that basic rule:

* a white-space character or `#` preceded by `\` is retained
* white space or `#` within a bracket expression is retained
* white space and comments are illegal within multi-character symbols like the ARE `(?:` or the BRE `\(`

Expanded-syntax white-space characters are blank, tab, newline, and any character that belongs to the space character class.

Finally, in an ARE, outside bracket expressions, the sequence `(?#ttt)` (where ttt is any text not containing a `)`) is a comment, completely ignored. Again, this is not allowed between the characters of multi-character symbols like `(?:`. Such comments are more a historical artifact than a useful facility, and their use is deprecated; use the expanded syntax instead.

None of these metasyntax extensions is available if the application (or an initial `***=` director) has specified that the user's input be treated as a literal string rather than as an RE.

## ◆MATCHING

In the event that an RE could match more than one substring of a given string, the RE matches the one starting earliest in the string. If the RE could match more than one substring starting at that point, its choice is determined by its preference: either the longest substring, or the shortest.

Most atoms, and all constraints, have no preference. A parenthesized RE has the same preference (possibly none) as the RE. A quantified atom with quantifier `{m}` or `{m}?` has the same preference (possibly none) as the atom itself. A quantified atom with other normal quantifiers (including `{m,n}` with m equal to n) prefers longest match. A quantified atom with other non-greedy quantifiers (including `{m,n}?` with m equal to n) prefers shortest match. A branch has the same preference as the first quantified atom in it which has a preference. An RE consisting of two or more branches connected by the `|` operator prefers longest match.
Subject to the constraints imposed by the rules for matching the whole RE, subexpressions also match the longest or shortest possible substrings, based on their preferences, with subexpressions starting earlier in the RE taking priority over ones starting later. Note that outer subexpressions thus take priority over their component subexpressions.

Note that the quantifiers `{1,1}` and `{1,1}?` can be used to force longest and shortest preference, respectively, on a subexpression or a whole RE.

Match lengths are measured in characters, not collating elements. An empty string is considered longer than no match at all. For example, `bb*` matches the three middle characters of `abbbc`, `(week|wee)(night|knights)` matches all ten characters of `weeknights`, when `(.*).*` is matched against `abc` the parenthesized subexpression matches all three characters, and when `(a*)*` is matched against `bc` both the whole RE and the parenthesized subexpression match an empty string.

If case-independent matching is specified, the effect is much as if all case distinctions had vanished from the alphabet. When an alphabetic that exists in multiple cases appears as an ordinary character outside a bracket expression, it is effectively transformed into a bracket expression containing both cases, so that `x` becomes `[xX]`. When it appears inside a bracket expression, all case counterparts of it are added to the bracket expression, so that `[x]` becomes `[xX]` and `[^x]` becomes `[^xX]`.

If newline-sensitive matching is specified, `.` and bracket expressions using `^` will never match the newline character (so that matches will never cross newlines unless the RE explicitly arranges it) and `^` and `$` will match the empty string after and before a newline respectively, in addition to matching at beginning and end of string respectively. ARE `\A` and `\Z` continue to match beginning or end of string only.

If partial newline-sensitive matching is specified, this affects `.` and bracket expressions as with newline-sensitive matching, but not `^` and `$`.

If inverse partial newline-sensitive matching is specified, this affects `^` and `$` as with newline-sensitive matching, but not `.` and bracket expressions. This isn't very useful but is provided for symmetry.

## ◆LIMITS AND COMPATIBILITY

No particular limit is imposed on the length of REs. Programs intended to be highly portable should not employ REs longer than 256 bytes, as a POSIX-compliant implementation can refuse to accept such REs.

The only feature of AREs that is actually incompatible with POSIX EREs is that `\` does not lose its special significance inside bracket expressions. All other ARE features use syntax which is illegal or has undefined or unspecified effects in POSIX EREs; the `***` syntax of directors likewise is outside the POSIX syntax for both BREs and EREs.

Many of the ARE extensions are borrowed from Perl, but some have been changed to clean them up, and a few Perl extensions are not present. Incompatibilities of note include `\b`, `\B`, the lack of special treatment for a trailing newline, the addition of complemented bracket expressions to the things affected by newline-sensitive matching, the restrictions on parentheses and back references in lookahead constraints, and the longest/shortest-match (rather than first-match) matching semantics.

The matching rules for REs containing both normal and non-greedy quantifiers have changed since early beta-test versions of this package. (The new rules are much simpler and cleaner, but don't work as hard at guessing the user's real intentions.)

Henry Spencer's original 1986 regexp package, still in widespread use (e.g., in pre-8.1 releases of Tcl), implemented an early version of today's EREs. There are four incompatibilities between regexp's near-EREs ("RREs" for short) and AREs.
In roughly increasing order of significance:

* In AREs, `\` followed by an alphanumeric character is either an escape or an error, while in RREs, it was just another way of writing the alphanumeric. This should not be a problem because there was no reason to write such a sequence in RREs.
* `{` followed by a digit in an ARE is the beginning of a bound, while in RREs, `{` was always an ordinary character. Such sequences should be rare, and will often result in an error because following characters will not look like a valid bound.
* In AREs, `\` remains a special character within `[]`, so a literal `\` within `[]` must be written `\\`. `\\` also gives a literal `\` within `[]` in RREs, but only truly paranoid programmers routinely doubled the backslash.
* AREs report the longest/shortest match for the RE, rather than the first found in a specified search order. This may affect some RREs which were written in the expectation that the first match would be reported. (The careful crafting of RREs to optimize the search order for fast matching is obsolete (AREs examine all possible matches in parallel, and their performance is largely insensitive to their complexity) but cases where the search order was exploited to deliberately find a match which was not the longest/shortest will need rewriting.)

## ◆BASIC REGULAR EXPRESSIONS

BREs differ from EREs in several respects. `|`, `+`, and `?` are ordinary characters and there is no equivalent for their functionality. The delimiters for bounds are `\{` and `\}`, with `{` and `}` by themselves ordinary characters. The parentheses for nested subexpressions are `\(` and `\)`, with `(` and `)` by themselves ordinary characters.
`^` is an ordinary character except at the beginning of the RE or the beginning of a parenthesized subexpression, `$` is an ordinary character except at the end of the RE or the end of a parenthesized subexpression, and `*` is an ordinary character if it appears at the beginning of the RE or the beginning of a parenthesized subexpression (after a possible leading `^`). Finally, single-digit back references are available, and `\<` and `\>` are synonyms for `[[:<:]]` and `[[:>:]]` respectively; no other escapes are available.
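
The BRE rules above can be tried from Tcl by starting a pattern with the embedded option `(?b)`, which asks the engine to parse the rest of the RE as a BRE. The following is a minimal sketch, assuming Tcl 8.1 or later; the variable names `whole` and `sub` are illustrative.

```tcl
# \( and \) delimit a subexpression in a BRE; bare parentheses would be
# ordinary characters.
regexp {(?b)a\(b*\)c} abbbc whole sub
puts "$whole $sub"                       ;# abbbc bbb

# + is an ordinary character in a BRE, so a+ means "a" followed by "+".
puts [regexp {(?b)^a+$} a+]              ;# 1
puts [regexp {(?b)^a+$} aa]              ;# 0

# Bounds are delimited by \{ and \}.
puts [regexp {(?b)^a\{2\}$} aa]          ;# 1

# * is an ordinary character at the beginning of the RE, even after a
# leading ^.
puts [regexp {(?b)^*a$} *a]              ;# 1

# \< and \> match at the beginning and end of a word.
puts [regexp {(?b)\<cat\>} "a cat sat"]  ;# 1
puts [regexp {(?b)\<cat\>} concat]       ;# 0
```

Bracing the patterns keeps the Tcl parser from touching the backslashes, so the regular expression engine sees `\(`, `\{`, and `\<` exactly as written.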