SRS Performance (CPU) and Memory Optimization Tools · Distributed Server Development

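## SRS Performance (CPU) and Memory Optimization Tools

## RTC

RTC runs over UDP, so first enlarge the `network queue buffers` (the kernel socket buffer limits shown below). The following commands are commonly used for UDP analysis:

~~~bash
# Show the UDP buffer sizes; the default is only about 200KB.
sysctl net.core.rmem_max && sysctl net.core.rmem_default && sysctl net.core.wmem_max && sysctl net.core.wmem_default

# Raise the buffer sizes to 16MB.
sysctl net.core.rmem_max=16777216
sysctl net.core.rmem_default=16777216
sysctl net.core.wmem_max=16777216
sysctl net.core.wmem_default=16777216
~~~

You can also edit the system file `/etc/sysctl.conf`, so that the settings still apply after a reboot:

~~~bash
# vi /etc/sysctl.conf
# For RTC
net.core.rmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_max=16777216
net.core.wmem_default=16777216
~~~

Check the receive and send drop counters:

~~~bash
# Show packet drops
netstat -suna

# Show the difference in drops over 30 seconds
netstat -suna && sleep 30 && netstat -suna
~~~

> Example output explained:
> `224911319 packets received`: the total number of packets received.
> `65731106 receive buffer errors`: receive drops, that is, packets discarded because they could not be processed in time.
> `123534411 packets sent`: the total number of packets sent.
> `0 send buffer errors`: send drops.

> Note: The SRS log prints UDP receive and send drops, for example `loss=(r:49,s:0)`, which means 49 packets per second could not be received in time and no packets were dropped on send.

Check the receive and send queue lengths:

~~~bash
netstat -lpun
~~~

> Example output explained:
> `Recv-Q 427008`: the amount of data, in bytes, waiting in the program's receive queue. From the netstat manual: "Established: The count of bytes not copied by the user program connected to this socket."
> `Send-Q 0`: the amount of data, in bytes, waiting in the program's send queue. From the netstat manual: "Established: The count of bytes not acknowledged by the remote host."

Some of the netstat options used above:

> `--udp|-u`: select the UDP protocol.
> `--numeric|-n`: show numeric addresses and ports instead of names, for example 80 instead of http.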
> `--statistics|-s`: show summary statistics for each protocol.
> `--all|-a`: show both listening and non-listening sockets.
> `--listening|-l`: show only listening sockets.
> `--program|-p`: show the name of the program that owns each socket, that is, who is using the FD.

## PERF

PERF is a Linux performance analysis tool; see \[PERF\](perf record -e block:block\_rq\_issue -ag). It can show the current SRS hot functions in real time:

~~~bash
perf top -p `ps aux|grep srs|grep conf|awk '{print $2}'`
~~~

Or record data for a period of time:

~~~bash
perf record -p `ps aux|grep srs|grep conf|awk '{print $2}'`

# Press CTRL+C to stop recording, then run:
perf report
~~~

Record stacks and show the call graph:

~~~bash
perf record -a --call-graph fp -p `ps aux|grep srs|grep conf|awk '{print $2}'`
perf report --call-graph --stdio
~~~

> Note: You can also write the report to a file: `perf report --call-graph --stdio >t.txt`.

> Remark: ST's stacks are non-standard, so the stacks that perf records with `-g` are garbled. perf can therefore only show SRS hotspots, not stack information. If you need stacks, use `GPERF: GCP`, described in a later section.

## GPROF

GPROF is the GNU CPU profiler. See [SRS GPROF](https://links.jianshu.com/go?to=https%3A%2F%2Fgithub.com%2Fossrs%2Fsrs%2Fwiki%2Fv1_CN_GPROF) and [GNU GPROF](https://links.jianshu.com/go?to=http%3A%2F%2Fwww.cs.utah.edu%2Fdept%2Fold%2Ftexinfo%2Fas%2Fgprof.html).

Usage:

~~~bash
# Build SRS with GPROF
./configure --with-gprof && make

# Start SRS with GPROF
./objs/srs -c conf/console.conf

# Or CTRL+C to stop GPROF
killall -2 srs

# Analyze the result
gprof -b ./objs/srs gmon.out
~~~

## GPERF

GPERF is the set of CPU and memory tools provided with Google tcmalloc; see [GPERF](https://links.jianshu.com/go?to=https%3A%2F%2Fgithub.com%2Fossrs%2Fsrs%2Fwiki%2Fv1_CN_GPERF).

### GPERF: GCP

GCP is the CPU profiler. It is used to find what is usually called a performance bottleneck: which function calls consume too much CPU. See [GCP](https://links.jianshu.com/go?to=http%3A%2F%2Fgoogle-perftools.googlecode.com%2Fsvn%2Ftrunk%2Fdoc%2Fcpuprofile.html).

Usage:

~~~bash
# Build SRS with GCP
./configure --with-gperf --with-gcp && make

# Start SRS with GCP
./objs/srs -c conf/console.conf

# Or CTRL+C to stop GCP
killall -2 srs

# Analyze the CPU profile
./objs/pprof --text objs/srs gperf.srs.gcp*
~~~

For a graphical view, install dot on CentOS:

~~~bash
yum install -y graphviz
~~~

Then generate an SVG image, which can be opened in Chrome:

~~~bash
./objs/pprof --svg ./objs/srs gperf.srs.gcp >t.svg
~~~

### GPERF: GMD

GMD is the memory defense tool provided by GPERF; it detects out-of-bounds memory access and wild pointers. An out-of-bounds write often causes no visible damage right away. The corruption is noticed only later, after switching to another thread that uses the damaged object, which makes this kind of memory problem hard to track down. GMD triggers a core dump at the moment of the out-of-bounds access or wild-pointer use, so the dump points at the place where the problem occurs. See [GMD](https://links.jianshu.com/go?to=http%3A%2F%2Fblog.csdn.net%2Fwin_lin%2Farticle%2Fdetails%2F50461709).

Usage:

~~~bash
# Build SRS with GMD.
./configure --with-gperf --with-gmd && make

# Start SRS with GMD.
env TCMALLOC_PAGE_FENCE=1 ./objs/srs -c conf/console.conf
~~~

### GPERF: GMC

GMC is the memory leak checker; see [GMC](https://links.jianshu.com/go?to=http%3A%2F%2Fgoogle-perftools.googlecode.com%2Fsvn%2Ftrunk%2Fdoc%2Fheap_checker.html).

Usage:

~~~bash
# Build SRS with GMC
./configure --with-gperf --with-gmc && make

# Start SRS with GMC
env PPROF_PATH=./objs/pprof HEAPCHECK=normal ./objs/srs -c conf/console.conf 2>gmc.log

# Or CTRL+C to stop GMC
killall -2 srs

# Analyze the memory leak report
cat gmc.log
~~~

### GPERF: GMP

GMP is the memory profiler. It can detect, for example, performance problems caused by frequent heap allocation and release. See [GMP](https://links.jianshu.com/go?to=http%3A%2F%2Fgoogle-perftools.googlecode.com%2Fsvn%2Ftrunk%2Fdoc%2Fheapprofile.html).

Usage:

~~~bash
# Build SRS with GMP
./configure --with-gperf --with-gmp && make

# Start SRS with GMP
./objs/srs -c conf/console.conf

# Or CTRL+C to stop GMP
killall -2 srs

# Analyze the memory profile
./objs/pprof --text objs/srs gperf.srs.gmp*
~~~

## VALGRIND

VALGRIND is the well-known analysis tool for C programs, and SRS supports it from SRS3 onward. Before SRS3, ST had to be patched before VALGRIND could be used, because SRS is built on ST.

~~~bash
valgrind --leak-check=full ./objs/srs -c conf/console.conf
~~~

> Remark: For versions before SRS3, you can patch ST manually to support VALGRIND; see [state-threads](https://links.jianshu.com/go?to=https%3A%2F%2Fgithub.com%2Fossrs%2Fstate-threads%23usage), and for details see [ST#2](https://links.jianshu.com/go?to=https%3A%2F%2Fgithub.com%2Fossrs%2Fstate-threads%2Fissues%2F2).

## Syscall

For troubleshooting system call performance, see [A collection of performance analysis tools for CentOS 6](https://links.jianshu.com/go?to=https%3A%2F%2Fblog.csdn.net%2Fwin_lin%2Farticle%2Fdetails%2F9377209).

## OSX

On OSX/Darwin/Mac you can use Instruments. In Xcode, choose Open Developer Tool to find Instruments, or launch the application directly. See [Profiling c++ on mac os x](https://links.jianshu.com/go?to=https%3A%2F%2Fstackoverflow.com%2Fquestions%2F11445619%2Fprofiling-c-on-mac-os-x).

~~~bash
instruments -l 30000 -t Time\ Profiler -p 72030
~~~

> Remark: You can also select the process in Activity Monitor and then choose Sample. DTrace is available as well; see [Dynamic Tracing Techniques (Part 2) -
DTrace, SystemTap, and Flame Graphs](https://links.jianshu.com/go?to=https%3A%2F%2Fwww.cnblogs.com%2Fwelhzh%2Fp%2F9221155.html) or [A Brief Look at Dynamic Tracing with DTrace](https://www.jianshu.com/p/6acd36976fba).

## Multi-core and Softirq

On a multi-core machine, the NIC softirqs are usually handled on CPU0, so you can schedule SRS onto the other CPUs:

~~~bash
taskset -p 0xfe `cat objs/srs.pid`
~~~

Or pin SRS to CPU1:

~~~bash
taskset -pc 1 `cat objs/srs.pid`
~~~

After the change, run `top` and press the number `1` to see the load on each CPU:

~~~bash
top # Press 1 once the interface is up
#%Cpu0 : 1.8 us, 1.1 sy, 0.0 ni, 90.8 id, 0.0 wa, 0.0 hi, 6.2 si, 0.0 st
#%Cpu1 : 67.6 us, 17.6 sy, 0.0 ni, 14.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
~~~

Or use `mpstat -P ALL`:

~~~bash
mpstat -P ALL
#01:23:14 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
#01:23:14 PM  all  33.33   0.00   8.61    0.04   0.00   3.00   0.00   0.00   0.00  55.02
#01:23:14 PM    0   2.46   0.00   1.32    0.06   0.00   6.27   0.00   0.00   0.00  89.88
#01:23:14 PM    1  61.65   0.00  15.29    0.02   0.00   0.00   0.00   0.00   0.00  23.03
~~~

> You can run `cat /proc/softirqs` to see the specific softirq types on every CPU; see [Introduction to deferred interrupts (Softirq, Tasklets and Workqueues)](https://links.jianshu.com/go?to=https%3A%2F%2F0xax.gitbooks.io%2Flinux-insides%2Fcontent%2FInterrupts%2Flinux-interrupts-9.html).

> If SRS is forcibly bound to CPU0, `softirq` usage becomes noticeably higher, probably because the process and the system softirqs are both on CPU0; `si` is also much higher than when they run on separate CPUs.

With more CPUs, for example 4, the NIC interrupts may be bound to several CPUs. The following commands show how the NIC interrupts are bound:

~~~bash
# grep virtio /proc/interrupts | grep -e in -e out
 29:   64580032          0          0          0   PCI-MSI-edge      virtio0-input.0
 30:          1         49          0          0   PCI-MSI-edge      virtio0-output.0
 31:   48663403          0   11845792          0   PCI-MSI-edge      virtio0-input.1
 32:          1          0          0         52   PCI-MSI-edge      virtio0-output.1

# cat /proc/irq/29/smp_affinity
1 # virtio0-input.0 (receive queue 0) is bound to CPU0
# cat /proc/irq/30/smp_affinity
2 # virtio0-output.0 (send queue 0) is bound to CPU1
# cat /proc/irq/31/smp_affinity
4 # virtio0-input.1 (receive queue 1) is bound to CPU2
# cat /proc/irq/32/smp_affinity
8 # virtio0-output.1 (send queue 1) is bound to CPU3
~~~

We can force the NIC softirqs onto CPU0; see [Linux: scaling softirq among many CPU cores](https://links.jianshu.com/go?to=http%3A%2F%2Fnatsys-lab.blogspot.com%2F2012%2F09%2Flinux-scaling-softirq-among-many-cpu.html) and [SMP IRQ
affinity](https://links.jianshu.com/go?to=https%3A%2F%2Fwww.kernel.org%2Fdoc%2FDocumentation%2FIRQ-affinity.txt):

~~~bash
for irq in $(grep virtio /proc/interrupts | grep -e in -e out | cut -d: -f1); do
    echo 1 > /proc/irq/$irq/smp_affinity
done
~~~

> Note: To bind to `CPU 0-1`, run `echo 3 > /proc/irq/$irq/smp_affinity`.

Then bind all SRS threads to CPUs other than CPU0:

~~~bash
taskset -a -p 0xfe $(cat objs/srs.pid)
~~~

## Process Priority

You can give SRS a higher priority so that it gets more CPU time:

~~~bash
renice -n -15 -p `cat objs/srs.pid`
~~~

> Note: nice values range from `-20` to `19` and default to `0`. High-priority processes on ECS typically run at `-10`, so `-15` is used here.

The nice value of the process is shown in the `NI` column, for example in top:

~~~bash
top -n1 -p `cat objs/srs.pid`
#  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
# 1505 root       5 -15  519920 421556   4376 S  66.7  5.3   4:41.12 srs
~~~
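
The affinity values used in the multi-core section (`0xfe` for `taskset`, `1` and `3` for `smp_affinity`) are hexadecimal bitmasks in which bit N selects CPU N. As a minimal sketch, the helper `cpu_mask` below builds such a mask from a list of CPU indexes. The function name is hypothetical and introduced only for illustration; it is not part of SRS, taskset, or the kernel.

~~~bash
# cpu_mask: print the hexadecimal affinity mask for the given CPU indexes.
# Hypothetical helper for illustration only, not part of SRS or util-linux.
cpu_mask() {
    local mask=0
    local cpu
    for cpu in "$@"; do
        # Set bit number $cpu in the mask.
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0               # CPU0 only: prints 1
cpu_mask 0 1             # CPU0 and CPU1: prints 3
cpu_mask 1 2 3 4 5 6 7   # CPU1 to CPU7, everything except CPU0: prints fe
~~~

The output could then be passed on, for example as `taskset -a -p 0x$(cpu_mask 1 2 3) $(cat objs/srs.pid)` on a 4-CPU machine, assuming the NIC interrupts have been moved to CPU0 as described above.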