awk · Linux Study Handbook

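[TOC]

# Introduction to awk

```
free | awk '/^Mem:/{print $4}'
ps -ef | grep Up | awk '{print $NF}'   # print the last field
```

awk takes its name from the initials of the surnames of its creators: Alfred Aho, Peter Weinberger and Brian Kernighan. AWK really does have a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort and process data, perform calculations on the input and generate reports, among many other things.

awk is well suited to text processing and report generation. Its syntax is familiar because it borrows from other languages such as C. It plays an important part in day-to-day work on Linux systems, and it is usually counted as the leader of the "three swordsmen" of text processing (grep, sed and awk).

## Usage

awk 'pattern { action }' {filenames}

The operations can become complicated, but the syntax is always the same: pattern is what AWK looks for in the data, and action is the series of commands executed when a match is found. Braces ({}) do not have to appear in every program, but they are used to group a series of instructions under a particular pattern. The pattern is usually a regular expression enclosed in slashes; it can also be a boolean expression.

The most basic job of the awk language is to browse and extract information from files or strings according to given rules; only after awk has extracted the information can other text operations be carried out. Complete awk scripts are typically used to format the information in text files. awk normally processes a file one line at a time: it reads a line, then runs the corresponding commands on it.

## How awk works

A short command shows how awk works.

```
[root@Gin scripts]# awk '{print $0}' /etc/passwd
...
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
...
[root@localhost opt]# echo "ha1233" | awk '{print "this is a test"}'
this is a test
[root@localhost opt]# echo -e "ha1233\nha1344" | awk '{print "this is a test"}'
this is a test
this is a test
```

The first command prints the contents of /etc/passwd. Here is what awk did. When calling awk we named /etc/passwd as the input file. awk then ran the print command on each line of /etc/passwd in turn. All output is sent to stdout, so the result is identical to running cat /etc/passwd. Now for the `{ print $0 }` block. In awk, braces group statements into a block, much as in C. This block holds a single print command. `$0` is the whole current line, and a bare `print` with no arguments prints the same thing. Once more: awk runs this script for every line of the input file. The two echo commands show the same rule: the action runs once per input line, so one line of input produces one line of output and two lines produce two.

![](https://i.vgy.me/gIHQBo.jpg)

```
$ awk -F":" '{ print $1 }' /etc/passwd
$ awk -F":" '{ print $1 $3 }' /etc/passwd
$ awk -F":" '{ print $1 " " $3 }' /etc/passwd
$ awk -F":" '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd
```

The -F option sets the field separator; one or more separator characters can be given. The items that follow print without a comma between them are concatenated into one string.

The following examples show more of how awk works:

* Example 1: show only lines 5 to 15 of test.txt, a file of 100 lines (a common interview question)

```
[root@localhost opt]# awk '{if(NR>=5 && NR<=15) print $1}' test.txt
5
6
7
8
9
10
11
12
13
14
15
```

* Example 2: test.txt is known to contain:

```
[root@Gin scripts]# cat test.txt
I am Poe,my qq is 33794712
```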
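Extract the string 'Poe' and the number 33794712 from this file, so that the final output is: Poe 33794712

```
[root@Gin scripts]# awk -F '[" ",]+' '{print $3" "$7}' test.txt
Poe 33794712
```

## BEGIN and END blocks

Normally awk executes each block of the script once for every input line. In many programs, however, you need to run initialisation code before awk starts processing the text of the input file. For this awk lets you define a BEGIN block. Because awk executes the BEGIN block before it starts reading the input, it is a good place to set the FS (field separator) variable, print a header, or initialise other global variables that the program uses later.

awk also provides another special block, the END block. awk executes it after all lines of the input have been processed. The END block is usually used for final calculations or for printing a summary at the end of the output.

* Example 1: count the accounts in /etc/passwd

```
awk '{count++;print $0;} END{print "user count is ",count}' /etc/passwd
...
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
zabbix:x:995:993:Zabbix Monitoring System:/var/lib/zabbix:/sbin/nologin
geoclue:x:994:992:User for geoclue:/var/lib/geoclue:/sbin/nologin
im_user:x:1000:1000::/home/im_user:/bin/bash
user count is 27
```

count is a user-defined variable. The earlier action{} blocks each held a single print, but print is just one statement, and an action{} can hold several statements separated by semicolons. count is not initialised here. It defaults to 0, but the tidier practice is to initialise it explicitly:

```
awk 'BEGIN{count=0;print "[start] user count is ",count} {count=count+1;print $0} END{print "[end] user count is ",count}' /etc/passwd
[start] user count is 0
...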
mysql:x:27:27:MariaDB Server:/var/lib/mysql:/sbin/nologin
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
zabbix:x:995:993:Zabbix Monitoring System:/var/lib/zabbix:/sbin/nologin
geoclue:x:994:992:User for geoclue:/var/lib/geoclue:/sbin/nologin
im_user:x:1000:1000::/home/im_user:/bin/bash
[end] user count is 27
```

## awk operators

![](https://i.vgy.me/Fxwgga.jpg)

### awk assignment operators

a+=5 is equivalent to a=a+5; the other compound assignment operators work the same way.

```
[root@Gin scripts]# awk 'BEGIN{a=5;a+=5;print a}'
10
```

### awk logical operators

```
awk 'BEGIN{a=1;b=2;print (a>2 && b>1,a==1 || b>1)}'
0 1
```

The first value shows whether the expression a>2 && b>1 is true or false; the second does the same for a==1 || b>1.

### awk regular expression operators

```
[root@Gin scripts]# awk 'BEGIN{a="100testaa";if(a~/100/) {print "ok"}}'
ok
[root@Gin scripts]# echo | awk 'BEGIN{a="100testaaa"}a~/test/{print "ok"}'
ok
```

### Relational operators

Operators such as > and < can compare strings or numbers, depending on the operands. If either operand is a string, the comparison is a string comparison. **Only when both operands are numbers is the comparison numeric.** String comparison follows ASCII order, where the usual ranking is 0-9 < A-Z < a-z.

```
[root@Gin scripts]# awk 'BEGIN{a="11";if (a>=9) {print "ok"}}'      # no output
[root@Gin scripts]# awk 'BEGIN{a="11";if (a>="9") {print "ok";}}'   # no output
[root@Gin scripts]# awk 'BEGIN{a="11";if (a<="9") {print "ok";}}'
ok
[root@Gin scripts]# awk 'BEGIN{a=11;if(a>=9){print "ok"}}'
ok
[root@Gin scripts]# awk 'BEGIN{a;if(a>=b){print "ok"}}'
ok
```

### awk arithmetic operators

Operands of arithmetic operators are converted to numbers automatically, and every non-numeric value becomes 0.

```
[root@Gin scripts]# awk 'BEGIN{a="b";print a++,++a}'
0 2
[root@Gin scripts]# awk 'BEGIN{a="20";print a++,++a}'
20 22
```

a++ and ++a behave as they do in JavaScript: a++ yields the current value and then increments, while ++a increments first and then yields the new value.

### Ternary operator ?:

```
[root@Gin scripts]# awk 'BEGIN{a="b";print a=="b"?"ok":"err"}'
ok
[root@Gin scripts]# awk 'BEGIN{a="b";print a=="c"?"ok":"err"}'
err
```

## Common awk built-in variables
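![](https://i.vgy.me/MwdCAC.jpg)

|Variable |Description|
| --- | --- |
|$0 |The current record (as a single variable)|
|$1~$n |The nth field of the current record; fields are separated by FS|
|FS |Input field separator; the default is a space|
|NF |Number of fields in the current record, that is, the number of columns|
|NR |Number of records read so far, that is, the line number, starting at 1|
|RS |Input record separator; the default is a newline|
|OFS |Output field separator; the default is also a space|
|ORS |Output record separator; the default is a newline|
|ARGC |Number of command-line arguments|
|ARGV |Array of command-line arguments|
|FILENAME |Name of the current input file|
|IGNORECASE |If true, matching ignores case (gawk extension)|
|ARGIND |Index in ARGV of the file currently being processed (gawk extension)|
|CONVFMT |Number conversion format, %.6g|
|ENVIRON |UNIX environment variables|
|ERRNO |UNIX system error message (gawk extension)|
|FIELDWIDTHS |Whitespace-separated list of input field widths (gawk extension)|
|FNR |Record number within the current file|
|OFMT |Output format for numbers, %.6g|
|RSTART |Start position of the string matched by the match function|
|RLENGTH |Length of the string matched by the match function|
|SUBSEP |Subscript separator for array indices, \034|

### Field separator FS

> FS="\t+": fields separated by one or more tabs

```
[root@Gin scripts]# cat tab.txt
ww   CC        IDD
[root@Gin scripts]# awk 'BEGIN{FS="\t+"}{print $1,$2,$3}' tab.txt
ww   CC        IDD
```

> FS="[[:space:]]+": fields separated by one or more whitespace characters. The **default** FS, a single space, also splits on runs of blanks and tabs.

Note that the commands below use `[[:space:]+]`, which is a bracket expression matching exactly one whitespace character or a literal `+`. To match a run of whitespace, the `+` has to go outside the brackets: `[[:space:]]+`.

```
[root@Gin scripts]# cat space.txt
we are    studing awk now!
[root@Gin scripts]# awk -F '[[:space:]+]' '{print $1,$2,$3,$4,$5}' space.txt
we are
[root@Gin scripts]# awk -F '[[:space:]+]' '{print $1,$2}' space.txt
we are
```

> FS="[ :]+": fields separated by one or more spaces or colons

```
[root@Gin scripts]# cat hello.txt
root:x:0:0:root:/root:/bin/bash
[root@Gin scripts]# awk -F '[ :]+' '{print $1,$2,$3}' hello.txt
root x 0
```

### Field count NF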
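As a minimal sketch of what NF holds, the snippet below feeds awk two made-up colon-separated lines (sample data invented for illustration, not taken from hello.txt) and prints the field count together with `$NF`, the last field of each line. The variable name `nf_out` is likewise only illustrative.

```shell
# Two sample lines with 3 and 4 colon-separated fields (hypothetical data).
nf_out=$(printf 'root:x:0\nbin:x:1:extra\n' | awk -F ':' '{ print NF, $NF }')

# NF is the number of fields on the line; $NF is the value of the last one.
printf '%s\n' "$nf_out"
# prints:
# 3 0
# 4 extra
```

Because NF is an ordinary number, `$NF` always refers to the last field regardless of how many fields a line has.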
```
[root@Gin scripts]# cat hello.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin:888
[root@Gin scripts]# awk -F ":" 'NF==8{print $0}' hello.txt
bin:x:1:1:bin:/bin:/sbin/nologin:888
[root@Gin scripts]# awk -F ":" 'NF==7{print $0}' hello.txt
root:x:0:0:root:/root:/bin/bash
```

**NF==7 selects only the lines that have exactly 7 fields.**

### Record number NR

```
[root@localhost opt]# ifconfig enp1s0
enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.166  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::428d:5cff:fe7d:9c95  prefixlen 64  scopeid 0x20<link>
        ether 40:8d:5c:7d:9c:95  txqueuelen 1000  (Ethernet)
        RX packets 14921658  bytes 2812237063 (2.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1072435  bytes 425047270 (405.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
[root@localhost opt]# ifconfig enp1s0 | awk 'NR==2{print $2}'
192.168.0.166
```

### RS record separator variable

RS is the input record separator; its default value is a newline, so each line is one record.

### OFS output field separator

```
[root@Gin scripts]# cat hello.txt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin:888
[root@Gin scripts]# awk 'BEGIN{FS=":"}{print $1","$2","$3}' hello.txt
root,x,0
bin,x,1
[root@Gin scripts]# awk 'BEGIN{FS=":";OFS="#"}{print $1,$2,$3}' hello.txt
root#x#0
bin#x#1
```

### ORS output record separator

```
[root@localhost opt]# awk -F ":" 'BEGIN{ORS="\n"} {print $1,$2,$3}' test
root x 0
bin x 1
[root@localhost opt]# awk -F ":" 'BEGIN{ORS="\n\n"} {print $1,$2,$3}' test
root x 0

bin x 1

[root@localhost opt]#
```

## awk regular expressions
![](https://i.vgy.me/Si37zw.jpg)

Using regular expressions

### Regular expression patterns

awk '/REG/{action}' file: /REG/ is a regular expression. Every record ($0) that matches it is passed to action for processing.

```
[root@Gin scripts]# awk '/root/{print $0}' passwd   ## match every line that contains root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@Gin scripts]# awk -F: '$5~/root/{print $0}' passwd   ## use a colon as the separator and match lines whose 5th field contains root
root:x:0:0:root:/root:/bin/bash
[root@Gin scripts]# ifconfig eth0|awk 'BEGIN{FS="[[:space:]:]+"} NR==2{print $4}'
192.168.17.129
```

### Boolean expressions

awk 'boolean expression{action}' file: awk executes the code block only when the boolean expression in front of it evaluates to true.

```
[root@Gin scripts]# awk -F: '$1=="root"{print $0}' passwd
root:x:0:0:root:/root:/bin/bash
[root@Gin scripts]# awk -F: '($1=="root")&&($5=="root") {print $0}' passwd
root:x:0:0:root:/root:/bin/bash
```

## awk if statements, loops and arrays

### Conditional statements

awk provides an if statement very similar to the one in C.

```
{
    if ($1=="foo") {
        if ($2=="foo") {
            print "uno"
        } else {
            print "one"
        }
    } else if ($1=="bar") {
        print "two"
    } else {
        print "three"
    }
}
```

An if statement can also be used to turn this code:

```
! /matchme/ { print $1 $3 $4 }
```

into:

```
{
    if ( $0 !~ /matchme/ ) {
        print $1 $3 $4
    }
}
```

### Loop structures

awk has a while loop that is equivalent to the while loop in C. awk also has a "do...while" loop, which evaluates its condition at the end of the code block rather than at the start as a standard while loop does. It resembles the "repeat...until" loop of other languages. Here is an example:

#### do...while example

```
{
    count=1
    do {
        print "I get printed at least once no matter what"
    } while ( count != 1 )
}
```

Because the condition is evaluated after the code block, a "do...while" loop always runs at least once. An ordinary while loop, by contrast, never runs at all if its condition is false the first time it is reached.

#### for loops

awk lets you write for loops. Like the while loop, the for loop matches its C counterpart:

```
for ( initial assignment; comparison; increment ) {
    code block
}
```

A short example:

```
for ( x=1;x<=4;x++ ) {
    print "iteration", x
}
```

This code prints:

```
iteration 1
iteration 2
iteration 3
iteration 4
```

#### break and continue

As in C, awk provides break and continue statements, which give finer control over awk's loops. Here is a fragment that badly needs a break statement:

```
# infinite while loop
while (1) {
    print "forever and ever..."
}
```

Since 1 always counts as true, this while loop runs forever.

The following loop runs only ten times:
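```
# break statement example
x=1
while (1) {
    print "iteration", x
    if ( x==10 ) {
        break
    }
    x++
}
```

Here the break statement is used to "escape" from the innermost loop. break ends the loop immediately, and execution continues with the statement after the loop's code block.

The continue statement complements break. It works like this:

```
x=1
while (1) {
    if ( x==4 ) {
        x++
        continue
    }
    print "iteration", x
    if ( x>20 ) {
        break
    }
    x++
}
```

This code prints "iteration 1" through "iteration 21", except for "iteration 4". When x equals 4, the code increments x and calls continue, which makes awk start the next loop iteration at once without running the rest of the code block. Like break, continue works in every kind of awk loop. Inside the body of a for loop, continue still lets the loop control variable be incremented automatically. Here is an equivalent loop:

```
for ( x=1;x<=21;x++ ) {
    if ( x==4 ) {
        continue
    }
    print "iteration", x
}
```

Unlike the while version, the for loop does not need to increment x before calling continue, because the for loop increments x by itself.

## Arrays

Arrays in AWK are all associative arrays; numeric indices are converted to string indices as well.

```
{
    cities[1]="beijing"
    cities[2]="shanghai"
    cities["three"]="guangzhou"
    for ( c in cities ) {
        print cities[c]
    }
    print cities[1]
    print cities["1"]
    print cities["three"]
}
```

Because the array is associative, for...in visits its elements in no particular order. To read the elements in order, access them by subscript.

A typical use of arrays is summarising server connection states with awk:

```
netstat -an|awk '/^tcp/{++s[$NF]}END{for(a in s)print a,s[a]}'
ESTABLISHED 1
LISTEN 20
```

Summarise web log traffic: print the number of hits, the requested page or image, the total size for each request path, and the overall total traffic.

```
awk '{a[$7]+=$10;++b[$7];total+=$10}END{for(x in a)print b[x],x,a[x]|"sort -rn -k1";print "total size is :"total}' /app/log/access_log
total size is :172230
21 /icons/poweredby.png 83076
14 / 70546
8 /icons/apache_pb.gif 18608
```

a[$7]+=$10 uses column 7 (the request path) as the array subscript and adds up column 10 (the size of that request), which gives the total size served for each path. The for loop relies on a small trick: arrays a and b have the same subscripts, so a single for statement covers both. The total line appears first because the lines piped to sort are only written out when the pipe is closed at the end of the program.

## Common string functions

![](https://i.vgy.me/fV07nF.jpg)

![](https://i.vgy.me/XIWU6Q.jpg)

Using string functions

### Replace

```
awk 'BEGIN{info="this is a test2010test!";gsub(/[0-9]+/,"!",info);print info}'
this is a test!test!
```

gsub finds every match of the regular expression /[0-9]+/ in info, replaces it with "!", and stores the result back in info. If the info argument is omitted, gsub works on $0.

### Find

```
awk 'BEGIN{info="this is a test2010test!";print index(info,"test")?"ok":"no found";}'
ok
# index returns 0 when the string is not found
```

### Match

```
awk 'BEGIN{info="this is a test2010test!";print match(info,/[0-9]+/)?"ok":"no found";}'
ok
# if digits are found the match succeeds and prints ok; otherwise it fails and prints no found
```

### Substring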
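A small sketch of how substr counts positions, using a made-up string ("hello world") and an illustrative variable name `substr_out`, neither of which comes from the original text. Positions start at 1, and leaving out the length returns everything up to the end of the string.

```shell
substr_out=$(awk 'BEGIN {
    s = "hello world"
    print substr(s, 1, 5)   # 5 characters starting at position 1
    print substr(s, 7)      # from position 7 to the end of the string
}')

printf '%s\n' "$substr_out"
# prints:
# hello
# world
```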
```
awk 'BEGIN{info="this is a test2010test!";print substr(info,4,10);}'
s is a tes
# take a 10-character substring starting at the 4th character
```

### Split

```
awk 'BEGIN{info="this is a test";split(info,tA," ");print length(tA);for(k in tA){print k,tA[k];}}'
4
4 test
1 this
2 is
3 a
# split breaks up info and creates the array tA on the fly; awk's for...in loop is unordered and does not run through subscripts 1...n in sequence
```
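Since for...in gives no ordering guarantee, one way to read the pieces back in their original order is to loop over the subscripts by number. The sketch below assumes the same info string as above and relies on the fact that split returns the number of pieces; the names `n`, `i` and `ordered` are illustrative only.

```shell
ordered=$(awk 'BEGIN {
    info = "this is a test"
    n = split(info, tA, " ")      # split returns the number of pieces
    for (i = 1; i <= n; i++)      # count from 1 to n instead of using for...in
        print i, tA[i]
}')

printf '%s\n' "$ordered"
# prints:
# 1 this
# 2 is
# 3 a
# 4 test
```

Using the return value of split also avoids calling length on an array, which not every awk implementation supports.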