Text Processing: grep · linux&shell

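In Linux you often need to filter text or command output, and the most commonly used filtering command is `grep`:

~~~
grep [OPTIONS] PATTERN [FILE...]
~~~

`grep` scans its input line by line and prints every line that contains a match for `PATTERN`. Here `PATTERN` is a regular expression (see the [previous article](https://segmentfault.com/a/1190000007405687); this article gives examples that combine regular expressions with grep).

Print the lines of `/etc/passwd` that contain `root`:

~~~
[root@centos7 temp]# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
~~~

Or read the input from standard input:

~~~
[root@centos7 temp]# cat /etc/passwd | grep root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
~~~

One point to watch: when file names are given on the command line and data is also piped in, grep reads only the named files and ignores standard input, unless the symbol `-` is used to stand for standard input:

~~~
[root@centos7 temp]# cat /etc/passwd | grep root /etc/passwd -
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
(standard input):root:x:0:0:root:/root:/bin/bash
(standard input):operator:x:11:0:operator:/root:/sbin/nologin
~~~

In this case grep labels each result to show whether it came from the file or from standard input.

Print the lines of /etc/passwd and /etc/group that start with root:

~~~
[root@centos7 temp]# grep "^root" /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/group:root:x:0:
~~~

Print the lines of /etc/passwd that end with /bin/bash:

~~~
[root@centos7 temp]# grep "/bin/bash$" /etc/passwd
root:x:0:0:root:/root:/bin/bash
learner:x:1000:1000::/home/learner:/bin/bash
~~~

Note that in the two examples above `PATTERN` is wrapped in double quotes to keep the shell from interpreting it. Double quotes still allow some shell expansion (for example `$VAR` and backticks), so single quotes are the safer choice when a pattern contains `$` or backslashes.

Print the lines of /etc/passwd that do not start with any letter from a to s:

~~~
[root@centos7 temp]# grep "^[^a-s]" /etc/passwd
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
~~~

The two `^` characters here mean different things: the first `^` anchors the match to the start of the line, while the `^` that appears as the first character inside `[]` negates the set.

Print the lines of /etc/passwd in which the character `0` appears three or more times in a row (note the escape character `\`):

~~~
[root@centos7 temp]# grep "0\{3,\}" /etc/passwd
learner:x:1000:1000::/home/learner:/bin/bash
~~~

Print the lines of /etc/passwd that start with the character `r` or `l`. The characters inside `[]` need no separator; a comma placed there would itself be treated as one of the characters to match:

~~~
[root@centos7 temp]# grep "^[rl]" /etc/passwd
root:x:0:0:root:/root:/bin/bash
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
learner:x:1000:1000::/home/learner:/bin/bash
~~~

The option `-i` makes grep ignore case when matching the pattern:

~~~
[root@centos7 temp]# grep -i abcd file
ABCD
function abcd() {
[root@centos7 temp]#
~~~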
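The anchors, bracket expressions, and interval patterns shown so far can be tried safely on a throwaway file rather than on the real /etc/passwd. The following is only a minimal sketch: the sample lines and the temporary file are assumptions made up for illustration, and `LC_ALL=C` is set on the range example because the meaning of a range such as `a-s` can depend on the locale.

```shell
#!/bin/sh
# Hypothetical sample data standing in for /etc/passwd (invented for this example).
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
tss:x:59:59:tss:/dev/null:/sbin/nologin
learner:x:1000:1000::/home/learner:/bin/bash
EOF

# ^ anchors the match to the start of the line.
starts_root=$(grep '^root' "$sample")

# $ anchors the match to the end of the line; -c counts the matching lines.
bash_count=$(grep -c '/bin/bash$' "$sample")

# First character is NOT in the range a-s (C locale makes the range plain ASCII).
not_a_to_s=$(LC_ALL=C grep '^[^a-s]' "$sample")

# Three or more consecutive zeros, written as a basic-regex interval.
triple_zero=$(grep '0\{3,\}' "$sample")

# -i ignores case, so ROOT matches the lowercase "root" line.
ci_count=$(grep -ic 'ROOT' "$sample")

printf '%s\n' "$starts_root" "$bash_count" "$not_a_to_s" "$triple_zero" "$ci_count"
rm -f "$sample"
```

Single quotes are used around every pattern here so that the shell passes `$` and `\` to grep unchanged.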
The option `-o` prints only the matched text instead of the whole line:

~~~
[root@centos7 temp]# grep -oi abcd file
ABCD
abcd
[root@centos7 temp]#
~~~

The option `-c` counts the matching lines (it counts lines, not individual matches, even when combined with `-o`):

~~~
[root@centos7 temp]# grep -oic abcd file
2
[root@centos7 temp]#
~~~

The option `-v` inverts the match. For example, print the lines of /etc/passwd that do not end with /sbin/nologin:

~~~
[root@centos7 temp]# grep -v "/sbin/nologin$" /etc/passwd
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
learner:x:1000:1000::/home/learner:/bin/bash
~~~

The option `-f FILE` uses each line of the file FILE as a pattern:

~~~
[root@centos7 temp]# cat test
abcd
ABCD
[root@centos7 temp]# grep -f test file
ABCD
function abcd() {
[root@centos7 temp]#
~~~

The option `-x` requires the pattern to match the whole line:

~~~
[root@centos7 temp]# grep -xf test file
ABCD
[root@centos7 temp]#
~~~

The option `-w` requires the pattern to match a whole word:

~~~
[root@centos7 temp]# grep here file
here
there
[root@centos7 temp]# grep -w here file
here
[root@centos7 temp]#
~~~

The option `-h` suppresses the file name prefix when there are multiple inputs:

~~~
[root@centos7 temp]# cat /etc/passwd|grep ^root - /etc/passwd -h
root:x:0:0:root:/root:/bin/bash
root:x:0:0:root:/root:/bin/bash
~~~

The option `-n` shows line numbers:

~~~
[root@centos7 temp]# grep -n "^[rl]" /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
5:lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
24:learner:x:1000:1000::/home/learner:/bin/bash
~~~

The options `-A N`, `-B N`, and `-C N` print each matching line together with its surrounding lines:

~~~
-A N  print the matching line and the N lines after it
-B N  print the matching line and the N lines before it
-C N  print the matching line and N lines both before and after it
[root@centos7 temp]# grep -A 2 ^operator /etc/passwd
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
[root@centos7 temp]# grep -B2 ^operator /etc/passwd
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
[root@centos7 temp]# grep -C1 ^operator /etc/passwd
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
~~~

The option `-F` treats `PATTERN` as a literal string (special characters lose their special meaning), which is equivalent to running the `fgrep` command:

~~~
[root@centos7 temp]# grep -F ^root /etc/passwd
[root@centos7 temp]#
~~~

The command prints nothing, because no line contains the literal text `^root`.

The option `-E` enables extended regular expressions, as if you had run the `egrep` command:

~~~
[root@centos7 temp]# egrep "^root|^learner" /etc/passwd
root:x:0:0:root:/root:/bin/bash
learner:x:1000:1000::/home/learner:/bin/bash
~~~

Using extended regular expressions means that the characters `?`, `+`, `{`, `|`, `(`, and `)` take on their special meaning without being escaped.

The option `-P` matches using Perl regular expressions, for example:

~~~
[root@centos7 ~]# echo "helloworld123456"| grep -oP "\d+"
123456
[root@centos7 ~]#
~~~

In Perl regular expressions `\d` stands for a digit and `+` means one or more repetitions (similar to vim).

The option `-a` processes a binary file as if it were text:

~~~
[root@centos7 ~]# grep -a online /usr/bin/ls
%s online help: <%s>
[root@centos7 ~]#
~~~

The options `--exclude=GLOB` and `--include=GLOB` skip or select the files whose names match GLOB, where GLOB is a wildcard pattern (for find and xargs, see [Basic Commands, Part 3](https://segmentfault.com/a/1190000007354176)). Quoting the globs keeps the shell from expanding them before grep sees them:

~~~
[root@centos7 temp]# find . -type f | xargs grep --exclude='*.txt' --include='test*' bash
./test.sh:#!/bin/bash
[root@centos7 temp]#
~~~

The filtering power of `grep` comes from combining its many options with regular expressions; later articles will include more examples.
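Several of the options above are easier to tell apart when run side by side on a small file. The sketch below is only an illustration under stated assumptions: the sample lines and the temporary file are invented, and the occurrence count uses `wc -l`, which the article itself does not use.

```shell
#!/bin/sh
# Hypothetical sample file; its contents are invented for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
^root is just literal text on this line
learner:x:1000:1000::/home/learner:/bin/bash
EOF

# -E: alternation with | needs no backslash in an extended regular expression.
ere_lines=$(grep -cE '^root|^learner' "$sample")

# -F: the caret is an ordinary character, so only the line that literally
# contains "^root" is printed.
fixed_line=$(grep -F '^root' "$sample")

# -c counts matching lines; -o prints each match on its own line, so piping
# -o into wc -l counts occurrences instead (tr strips padding some wc add).
line_count=$(grep -c 'root' "$sample")
match_count=$(grep -o 'root' "$sample" | wc -l | tr -d ' ')

# -v inverts the match: count the lines that do not end in /bin/bash.
inverse_count=$(grep -vc '/bin/bash$' "$sample")

# -w matches whole words only, so "there" no longer counts as a match for "here".
plain_lines=$(printf 'here\nthere\n' | grep -c 'here')
word_lines=$(printf 'here\nthere\n' | grep -cw 'here')

printf 'lines=%s occurrences=%s\n' "$line_count" "$match_count"
rm -f "$sample"
```

The difference between `line_count` and `match_count` is the same one behind the `grep -oic` example: adding `-o` does not change what `-c` reports.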