## Introduction
Using the industry-standard TPC-C benchmark, this article walks through tuning a PostgreSQL database from the operating-system level down to the database level, and shows how TPmC was raised from 256195.32 to 606466.31.
## Test Environment
* 16 cores with hyper-threading enabled, 32 threads in total
* 256 GB of 1600 MHz memory
* 10 GbE NIC
* 3 x 6.4 TB AliFlash PCI-E SSDs
* Striped logical volume
* XFS
* Data block alignment
## XFS Filesystem Optimization
This falls into three parts:
1. Logical volume tuning
2. XFS mkfs tuning
3. XFS mount tuning
All of them are covered by the man pages; understand the principles and use cases there before you start tuning.
~~~
man lvcreate
man xfs
man mkfs.xfs
man mount
~~~
### Logical Volume Tuning
For devices not on LVM's accepted-types list, you may need to edit lvm.conf first and add the device's major number; otherwise a PV cannot be created on it.
~~~
# cat /proc/devices
252 shannon
[root@localhost ~]# vi /etc/lvm/lvm.conf
# types = [ "fd", 16 ]
types = [ "shannon", 252 ]
~~~
1. Before creating the PV, align the block device partition. Alignment avoids double writes: SSDs have a minimum write unit, and a misaligned partition can cause a single write to span multiple SSD blocks. It is best to leave the first 1MB unallocated and start the partition at sector 2048. (The --dataalignment option of pvcreate achieves the same goal.)
~~~
fdisk -c -u /dev/dfa
start 2048
end + (2048*n) - 1
~~~
Or create the partition with parted.
Per LVM's [layout](https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Logical_Volume_Manager_Administration/LVM_components.html#pv_illustration), the DATA area also needs to be aligned when creating the PV.
Allocate DATA extents starting at the 4MB mark:
`# pvcreate --dataalignment 4M /dev/sdc`
"1st PE" is the offset at which data begins.
~~~
[root@digoal ~]# pvs -o+pe_start
PV VG Fmt Attr PSize PFree 1st PE
/dev/sda2 centos lvm2 a-- 19.51g 40.00m 1.00m
/dev/sdc lvm2 --- 20.00g 20.00g 4.00m
~~~
List all the fields that pvs can display:
`pvs -o+`
2. When creating the LV, two parameters matter. The stripe count should simply equal the number of PVs; if the PV is itself a RAID device, derive the stripe count from the number of member disks:
for a 5-disk RAID5, exclude the parity disk and use 4; for a 10-disk RAID10, use 5; for a 10-disk RAID0, use 10.
~~~
-i, --stripes Stripes
Gives the number of stripes. This is equal to the number of physical volumes to scatter the logical volume.
~~~
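The stripe-count rule of thumb above can be written down as a tiny helper; a sketch in Python, covering only the three RAID levels named in the text:

```python
def stripe_count(raid_level: str, disks: int) -> int:
    """Stripe count to pass to lvcreate -i, given the PV's RAID layout."""
    if raid_level == "raid5":
        return disks - 1    # one disk's worth of capacity holds parity
    if raid_level == "raid10":
        return disks // 2   # half of the disks are mirror copies
    if raid_level == "raid0":
        return disks        # every disk carries data
    raise ValueError("unhandled RAID level: " + raid_level)

# The three examples from the text:
print(stripe_count("raid5", 5))    # 4
print(stripe_count("raid10", 10))  # 5
print(stripe_count("raid0", 10))   # 10
```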
The stripe size should match the database block size, e.g. 8KB, the PostgreSQL default.
~~~
-I, --stripesize StripeSize
Gives the number of kilobytes for the granularity of the stripes.
StripeSize must be 2^n (n = 2 to 9) for metadata in LVM1 format. For metadata in LVM2 format, the stripe size may be a larger power of 2 but must not exceed the physical extent size.
~~~
When creating snapshots, the chunksize parameter should also match the database block size, e.g. 8KB, the PostgreSQL default.
~~~
-c, --chunksize ChunkSize
Power of 2 chunk size for the snapshot logical volume between 4k and 512k.
~~~
For example, reserving 2GB for the XFS log device:
~~~
#lvcreate -i 3 -I 8 -n lv02 -L 2G vgdata01
Logical volume "lv02" created
#lvcreate -i 3 -I 8 -n lv01 -l 100%FREE vgdata01
Logical volume "lv01" created
#lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lv01 vgdata01 -wi-a- 17.29t
lv02 vgdata01 -wi-a- 2g
~~~
### XFS mkfs Tuning
#### First, understand the XFS layout
XFS consists of three sections: data, log, and realtime files. By default the log is stored inside the data section and there is no realtime section. Every section is made of blocks; the block size is set with -b when the filesystem is created.
1. data
Holds metadata (inodes, directories, indirect blocks), user file data, and non-realtime files.
The data section is split into multiple allocation groups; mkfs.xfs lets you choose the number of groups and the size of each.
More groups allow more file and block allocations to proceed in parallel: think of operations within a single group as serialized and operations across groups as parallel. But more groups also consume more CPU, so there is a trade-off. For highly concurrent write workloads, use more groups (for example, a host running many small but busy databases).
2. log
Stores the metadata log. Before metadata can be modified, the change must first be logged; only then is the metadata in the data section updated. The log is also used for recovery after a crash.
3. realtime
Divided into many small extents. To write a file into the realtime section, you must flip a bit on the file descriptor with xfsctl, and this must be done before any data is written. File sizes in the realtime section are multiples of the realtime extent size.
#### mkfs.xfs tuning
1. data section
The allocation group count multiplied by the AG size equals the block device size.
The AG count should match the parallelism your workload needs.
The AG size must be between 16MB and 1TB; around 1GB is recommended for PostgreSQL.
-b size=8192 matches the database block size (but not every XFS version supports block sizes above 4K; if mount fails complaining that only block sizes up to 4K are supported, reformat with a smaller block size).
-d agcount=9000,sunit=16,swidth=48
agcount=9000 assumes roughly 9000 concurrent write streams, one allocation group each.
sunit (in 512-byte units) is aligned with the stripe size of the LVM or RAID device.
swidth is aligned with the full stripe width of the device; 48 = 3 * 16 here, matching lvcreate -i 3 -I 8.
2. log section
Best placed on SSD; the faster the better. Avoid throttling the log device's IOPS with cgroups.
3. realtime section
If you don't need it, don't create it.
agsize must never be a multiple of the stripe width. (With 3 stripes of 8K each, the width is 24K.)
If the agsize derived from the requested agcount comes out as a multiple of swidth, mkfs prints a warning.
In the example below,
agsize=156234 blks is a multiple of swidth=6 blks (156234 = 26039 * 6).
The suggested fix is to subtract one stripe unit (8K, i.e. sunit = 2 blks): 156234 blks - 2 blks = 156232 blks.
Converted to bytes, 156232 blks = 156232 * 4096 = 639926272 bytes, or 156232 * 4 = 624928K.
~~~
#mkfs.xfs -f -b size=4096 -l logdev=/dev/mapper/vgdata01-lv01,size=2136997888,sunit=16 -d agcount=30000,sunit=16,swidth=48 /dev/mapper/vgdata01-lv02
Warning: AG size is a multiple of stripe width. This can cause performance
problems by aligning all AGs on the same disk. To avoid this, run mkfs with
an AG size that is one stripe unit smaller, for example 156232.
meta-data=/dev/mapper/vgdata01-lv02 isize=256 agcount=30000, agsize=156234 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=4686971904, imaxpct=5
= sunit=2 swidth=6 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =/dev/mapper/vgdata01-lv01 bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=2 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
~~~
For the mkfs.xfs invocation above, change it to:
~~~
#mkfs.xfs -f -b size=4096 -l logdev=/dev/mapper/vgdata01-lv01,size=2136997888,sunit=16 -d agsize=639926272,sunit=16,swidth=48 /dev/mapper/vgdata01-lv02
~~~
or
~~~
#mkfs.xfs -f -b size=4096 -l logdev=/dev/mapper/vgdata01-lv01,size=2136997888,sunit=16 -d agsize=624928k,sunit=16,swidth=48 /dev/mapper/vgdata01-lv02
~~~
The output becomes:
~~~
meta-data=/dev/mapper/vgdata01-lv02 isize=256 agcount=30001, agsize=156232 blks (約600MB)
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=4686971904, imaxpct=5
= sunit=2 swidth=6 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =/dev/mapper/vgdata01-lv01 bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=2 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
~~~
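The block-to-byte arithmetic behind the agsize fix is easy to slip on (agsize, sunit, and swidth are all reported in 4 KiB filesystem blocks, while -d agsize= accepts bytes or a k suffix); a quick check of the numbers above in Python:

```python
agsize, sunit, swidth = 156234, 2, 6   # in 4 KiB blocks, from the mkfs output above
block = 4096

# agsize is an exact multiple of the stripe width, which triggers the warning
assert agsize % swidth == 0            # 156234 = 26039 * 6

fixed = agsize - sunit                 # mkfs's advice: shrink by one stripe unit
print(fixed)            # 156232 blocks
print(fixed * block)    # 639926272 bytes -> -d agsize=639926272
print(fixed * 4)        # 624928 KiB     -> -d agsize=624928k
```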
### XFS mount Tuning
nobarrier
largeio: for applications with heavy sequential reads, such as data warehousing and streaming media
nolargeio: for OLTP
logbsize=262144: size of the log buffer
logdev=: the block device holding the log section; use the fastest SSD available
noatime,nodiratime
swalloc: stripe-aligned allocation
allocsize=16M: buffered IO size for delayed allocation writeout
inode64: indicates that XFS is allowed to create inodes at any location in the filesystem
#### mount.xfs example
~~~
#mount -t xfs -o allocsize=16M,inode64,nobarrier,nolargeio,logbsize=262144,noatime,nodiratime,swalloc,logdev=/dev/mapper/vgdata01-lv02 /dev/mapper/vgdata01-lv01 /data01
~~~
### xfsctl Tuning
Omitted.
### Troubleshooting
~~~
#mount -o noatime,swalloc /dev/mapper/vgdata01-lv01 /data01
mount: Function not implemented
~~~
The cause is an unsupported block size:
~~~
[ 5736.642924] XFS (dm-0): File system with blocksize 8192 bytes. Only pagesize (4096) or less will currently work.
[ 5736.695146] XFS (dm-0): SB validate failed with error -38.
~~~
Resolution: reformat with a 4K block size.
~~~
# mkfs.xfs -f -b size=4096 -l logdev=/dev/mapper/vgdata01-lv02,size=2136997888,sunit=16 -d agcount=9000,sunit=16,swidth=48 /dev/mapper/vgdata01-lv01
meta-data=/dev/mapper/vgdata01-lv01 isize=256 agcount=9000, agsize=515626 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=4640621568, imaxpct=5
= sunit=2 swidth=6 blks
naming =version 2 bsize=4096 ascii-ci=0
log =/dev/mapper/vgdata01-lv02 bsize=4096 blocks=521728, version=2
= sectsz=512 sunit=2 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
~~~
Specify logdev when mounting:
~~~
#mount -t xfs -o allocsize=16M,inode64,nobarrier,nolargeio,logbsize=262144,noatime,nodiratime,swalloc,logdev=/dev/mapper/vgdata01-lv02 /dev/mapper/vgdata01-lv01 /data01
~~~
## Installing BenchmarkSQL
[http://sourceforge.net/projects/benchmarksql/](http://sourceforge.net/projects/benchmarksql/)
Download and install JDK 7:
~~~
http://www.oracle.com/technetwork/cn/java/javase/downloads/jdk7-downloads-1880260.html
wget http://download.oracle.com/otn-pub/java/jdk/7u79-b15/jdk-7u79-linux-x64.rpm
rpm -ivh jdk-7u79-linux-x64.rpm
~~~
Check where the package was installed (a location can also be given directly when installing with rpm):
~~~
rpm -ql jdk
...
/usr/java/jdk1.7.0_79/bin/java
...
~~~
Set the Java environment variables:
~~~
$ export JAVA_HOME=/usr/java/jdk1.7.0_79
$ export PATH=$JAVA_HOME/bin:$PATH
$ export CLASSPATH=.:$CLASSPATH
~~~
Download the PostgreSQL JDBC jar matching the installed Java version:
~~~
wget https://jdbc.postgresql.org/download/postgresql-9.4.1207.jre7.jar
mv postgresql-9.4.1207.jre7.jar benchmarksql-4.1.0/lib/
~~~
Configure BenchmarkSQL to use the new PostgreSQL JDBC driver:
~~~
$ vi runBenchmark.sh
java -cp .:../lib/postgresql-9.4.1207.jre7.jar:../lib/log4j-1.2.17.jar:../lib/apache-log4j-extras-1.1.jar:../dist/BenchmarkSQL-4.1.jar -Dprop=$1 jTPCC
$ vi runLoader.sh
java -cp .:../lib/postgresql-9.4.1207.jre7.jar:../dist/BenchmarkSQL-4.1.jar -Dprop=$1 LoadData $2 $3 $4 $5
$ vi runSQL.sh
myCP="../lib/postgresql-9.4.1207.jre7.jar"
myCP="$myCP:../dist/BenchmarkSQL-4.1.jar"
myOPTS="-Dprop=$1"
myOPTS="$myOPTS -DcommandFile=$2"
java -cp .:$myCP $myOPTS ExecJDBC
~~~
Modify log4j to reduce the amount of logging: set priority to info so that only the final results are printed, not the per-order logs.
~~~
$ vi log4j.xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j='http://jakarta.apache.org/log4j/'>
<appender name="console" class="org.apache.log4j.ConsoleAppender">
<param name="Threshold" value="info"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d %5p - %m%n"/>
</layout>
</appender>
<appender name="R" class="org.apache.log4j.rolling.RollingFileAppender">
<param name="Append" value="True" />
<rollingPolicy class="org.apache.log4j.rolling.TimeBasedRollingPolicy">
<param name="FileNamePattern" value="log/archive/benchmarksql.%d{yyyyMMddHHmm}.log"/>
<param name="ActiveFileName" value="log/benchmarksql.log"/>
</rollingPolicy>
<triggeringPolicy class="org.apache.log4j.rolling.SizeBasedTriggeringPolicy">
<param name="MaxFileSize" value="1"/>
</triggeringPolicy>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%5p\t[%d{yyyy-MM-dd HH:mm:ss.SSS}]\t%t \t%m%n"/>
</layout>
<filter class="org.apache.log4j.filter.StringMatchFilter">
<param name="StringToMatch" value ="\n" />
<param name="AcceptOnMatch" value="false" />
</filter>
</appender>
<appender name="E" class="org.apache.log4j.rolling.RollingFileAppender">
<param name="Append" value="True" />
<param name="Threshold" value="warn"/>
<rollingPolicy class="org.apache.log4j.rolling.TimeBasedRollingPolicy">
<param name="FileNamePattern" value="log/BenchmarkSQLError.%d.log"/>
<param name="ActiveFileName" value="log/BenchmarkSQLError.log"/>
</rollingPolicy>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%5p\t[%d{yyyy-MM-dd HH:mm:ss.SSS}]\t%t \t%m%n"/>
</layout>
</appender>
<root>
<priority value="info"/>
<appender-ref ref="R"/>
<appender-ref ref="E"/>
</root>
</log4j:configuration>
~~~
## System Configuration Tuning
~~~
Kernel configuration
/etc/grub.conf
numa=off
elevator=deadline
Compiler version
gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)
/etc/sysctl.conf
vm.swappiness = 0
kernel.shmmax=135497418752
net.core.rmem_max = 4194304
net.core.wmem_max = 4194304
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.ipv4.ip_local_port_range = 9000 65535
kernel.sem = 50100 64128000 50100 1280
vm.dirty_background_bytes = 102400000
vm.dirty_ratio = 80
vm.nr_hugepages = 102352
/etc/security/limits.conf
* soft nofile 655360
* hard nofile 655360
* soft nproc 655360
* hard nproc 655360
* soft stack unlimited
* hard stack unlimited
* soft memlock 250000000
* hard memlock 250000000
Block device readahead
blockdev --setra 16384 /dev/dfa
blockdev --setra 16384 /dev/dfb
blockdev --setra 16384 /dev/dfc
blockdev --setra 16384 /dev/dm-0
~~~
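The vm.nr_hugepages value above can be sanity-checked against the shared_buffers it must cover. A sketch, assuming the default 2 MiB huge page size on x86_64 (the 164GB figure is the shared_buffers used in the later stages; note the actual shared segment is slightly larger than shared_buffers itself):

```python
HUGEPAGE = 2 * 1024 * 1024        # assumed: default 2 MiB huge pages on x86_64

def pages_needed(shared_mem_bytes: int) -> int:
    return -(-shared_mem_bytes // HUGEPAGE)   # ceiling division

nr_hugepages = 102352             # value from the sysctl.conf above
print(nr_hugepages * HUGEPAGE // 2**30)   # ~199 GiB reserved as huge pages
print(pages_needed(164 * 2**30))          # 83968 pages needed for 164 GiB
assert nr_hugepages >= pages_needed(164 * 2**30)
```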
## Installing PostgreSQL
PostgreSQL configure options:
~~~
./configure --prefix=/u02/digoal/soft_bak/pgsql9.5 --with-blocksize=8 --with-pgport=1921 --with-perl --with-python --with-tcl --with-openssl --with-pam --with-ldap --with-libxml --with-libxslt --enable-thread-safety
gmake world -j32
gmake install-world -j32
~~~
Set the postgres environment variables:
~~~
$ vi env_pg.sh
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=1921
export PGDATA=/data01/pgdata/pg_root
export LANG=en_US.utf8
export PGHOME=/u02/digoal/soft_bak/pgsql9.5
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGDATABASE=postgres
export PGUSER=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
~~~
Configure postgresql.conf:
~~~
$ vi $PGDATA/postgresql.conf
port = 1921 # (change requires restart)
max_connections = 300 # (change requires restart)
unix_socket_directories = '.' # comma-separated list of directories
shared_buffers = 32GB # min 128kB
huge_pages = try # on, off, or try
maintenance_work_mem = 2GB # min 1MB
dynamic_shared_memory_type = posix # the default is the first option
bgwriter_delay = 10ms # 10-10000ms between rounds
wal_level = minimal # minimal, archive, hot_standby, or logical
synchronous_commit = off # synchronization level;
full_page_writes = off # recover from partial page writes; safe to disable with backups and archiving in place, restoring from backup after a crash to guard against partial writes
wal_buffers = 16MB # min 32kB, -1 sets based on shared_buffers
wal_writer_delay = 10ms # 1-10000 milliseconds
max_wal_size = 32GB
effective_cache_size = 240GB
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_truncate_on_rotation = on # If on, an existing log file with the
~~~
### Edit the BenchmarkSQL connection and workload configuration
1000 warehouses, roughly 500 million rows.
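The ~500 million row estimate follows from TPC-C's initial table cardinalities. A rough check in Python, using the per-warehouse row counts from the TPC-C specification (order_line averages about 10 lines per order; item is a fixed 100k rows regardless of scale):

```python
W = 1000                      # warehouses configured above
per_warehouse = {
    "warehouse": 1,
    "district": 10,
    "customer": 30_000,
    "history": 30_000,
    "oorder": 30_000,
    "new_order": 9_000,
    "order_line": 300_000,    # ~10 lines per order on average
    "stock": 100_000,
}
total = W * sum(per_warehouse.values()) + 100_000   # + the fixed-size item table
print(total)                  # 499111000, i.e. roughly 500 million rows
```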
~~~
$ vi props.pg
driver=org.postgresql.Driver
conn=jdbc:postgresql://localhost:1921/postgres
user=postgres
password=123
warehouses=1000
terminals=96
//To run specified transactions per terminal- runMins must equal zero
runTxnsPerTerminal=0
//To run for specified minutes- runTxnsPerTerminal must equal zero
runMins=1
//Number of total transactions per minute
limitTxnsPerMin=0
//The following five values must add up to 100
//The default percentages of 45, 43, 4, 4 & 4 match the TPC-C spec
newOrderWeight=40
paymentWeight=36
orderStatusWeight=8
deliveryWeight=8
stockLevelWeight=8
~~~
### Generating Test Data
Set the default search_path for the postgres role:
~~~
$ psql
psql (9.5.0)
Type "help" for help.
postgres=# alter role postgres set search_path='benchmarksql','public';
~~~
Create a directory to hold the generated CSV files:
~~~
$ mkdir /u02/digoal/soft_bak/benchcsv
~~~
Edit BenchmarkSQL's sqlTableCopies to point at that directory:
~~~
$ vi sqlTableCopies
copy benchmarksql.warehouse
(w_id, w_ytd, w_tax, w_name, w_street_1, w_street_2, w_city, w_state, w_zip)
from '/u02/digoal/soft_bak/benchcsv/warehouse.csv' WITH CSV;
copy benchmarksql.item
(i_id, i_name, i_price, i_data, i_im_id)
from '/u02/digoal/soft_bak/benchcsv/item.csv' WITH CSV;
copy benchmarksql.stock
(s_i_id, s_w_id, s_quantity, s_ytd, s_order_cnt, s_remote_cnt, s_data,
s_dist_01, s_dist_02, s_dist_03, s_dist_04, s_dist_05,
s_dist_06, s_dist_07, s_dist_08, s_dist_09, s_dist_10)
from '/u02/digoal/soft_bak/benchcsv/stock.csv' WITH CSV;
copy benchmarksql.district
(d_id, d_w_id, d_ytd, d_tax, d_next_o_id, d_name, d_street_1,
d_street_2, d_city, d_state, d_zip)
from '/u02/digoal/soft_bak/benchcsv/district.csv' WITH CSV;
copy benchmarksql.customer
(c_id, c_d_id, c_w_id, c_discount, c_credit, c_last, c_first, c_credit_lim,
c_balance, c_ytd_payment, c_payment_cnt, c_delivery_cnt, c_street_1,
c_street_2, c_city, c_state, c_zip, c_phone, c_since, c_middle, c_data)
from '/u02/digoal/soft_bak/benchcsv/customer.csv' WITH CSV;
copy benchmarksql.history
(hist_id, h_c_id, h_c_d_id, h_c_w_id, h_d_id, h_w_id, h_date, h_amount, h_data)
from '/u02/digoal/soft_bak/benchcsv/cust-hist.csv' WITH CSV;
copy benchmarksql.oorder
(o_id, o_w_id, o_d_id, o_c_id, o_carrier_id, o_ol_cnt, o_all_local, o_entry_d)
from '/u02/digoal/soft_bak/benchcsv/order.csv' WITH CSV;
copy benchmarksql.order_line
(ol_w_id, ol_d_id, ol_o_id, ol_number, ol_i_id, ol_delivery_d,
ol_amount, ol_supply_w_id, ol_quantity, ol_dist_info)
from '/u02/digoal/soft_bak/benchcsv/order-line.csv' WITH CSV;
copy benchmarksql.new_order
(no_w_id, no_d_id, no_o_id)
from '/u02/digoal/soft_bak/benchcsv/new-order.csv' WITH CSV;
~~~
Create the tables:
~~~
$ cd benchmarksql-4.1.0/run
$ ./runSQL.sh props.pg sqlTableCreates
~~~
Generate the CSV files:
~~~
$ ./runLoader.sh props.pg numWarehouses 1000 fileLocation /u02/digoal/soft_bak/benchcsv/
~~~
1000 warehouses yield 69 GB of data:
~~~
total 69G
-rw-r--r-- 1 digoal users 2.0G Jan 9 15:53 cust-hist.csv
-rw-r--r-- 1 digoal users 16G Jan 9 15:53 customer.csv
-rw-r--r-- 1 digoal users 898K Jan 9 15:12 district.csv
-rw-r--r-- 1 digoal users 7.0M Jan 9 14:22 item.csv
-rw-r--r-- 1 digoal users 95M Jan 9 16:14 new-order.csv
-rw-r--r-- 1 digoal users 1.3G Jan 9 16:14 order.csv
-rw-r--r-- 1 digoal users 22G Jan 9 16:14 order-line.csv
-rw-r--r-- 1 digoal users 28G Jan 9 15:12 stock.csv
-rw-r--r-- 1 digoal users 84K Jan 9 14:22 warehouse.csv
~~~
Load into the database:
~~~
$ ./runSQL.sh props.pg sqlTableCopies
~~~
Create constraints and indexes:
~~~
$ ./runSQL.sh props.pg sqlIndexCreates
~~~
Back up the loaded data:
~~~
$ pg_dump -f /u02/digoal/soft_bak/benchmarksql.dmp -F c -n benchmarksql postgres
~~~
## Stage 1 TPC-C Benchmark
~~~
nohup ./runBenchmark.sh props.pg >/dev/null 2>./errrun.log &
~~~
Test results:
~~~
INFO [2016-01-09 22:03:39.963] Thread-7 Term-00, Measured tpmC (NewOrders) = 102494.46
INFO [2016-01-09 22:03:39.963] Thread-7 Term-00, Measured tpmTOTAL = 256195.32
INFO [2016-01-09 22:03:39.964] Thread-7 Term-00, Session Start = 2016-01-09 21:53:39
INFO [2016-01-09 22:03:39.964] Thread-7 Term-00, Session End = 2016-01-09 22:03:39
INFO [2016-01-09 22:03:39.964] Thread-7 Term-00, Transaction Count = 2563088
~~~
Host statistics: top and iostat output captured in the 9th minute of the run.
~~~
TOP
top - 22:02:09 up 3 days, 12:55, 3 users, load average: 19.23, 15.97, 8.37
Tasks: 619 total, 10 running, 609 sleeping, 0 stopped, 0 zombie
Cpu(s): 35.0%us, 9.4%sy, 0.0%ni, 52.6%id, 0.1%wa, 0.0%hi, 2.9%si, 0.0%st
Mem: 264643396k total, 241719372k used, 22924024k free, 36672k buffers
Swap: 18825200k total, 0k used, 18825200k free, 196557376k cached
iostat -x
avg-cpu: %user %nice %system %iowait %steal %idle
35.07 0.00 12.30 0.12 0.00 52.51
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
dfa 0.00 0.00 57.40 743.40 918.40 11849.00 15.94 0.02 0.03 0.03 2.08
dfb 0.00 0.00 57.20 740.40 915.20 11829.00 15.98 0.02 0.03 0.03 2.04
dfc 0.00 0.00 58.40 730.80 934.40 11675.80 15.98 0.03 0.03 0.03 2.52
dm-0 0.00 0.00 173.00 2213.20 2768.00 35331.40 15.97 0.08 0.03 0.03 7.02
~~~
PostgreSQL can be profiled with oprofile or perf top (see [here](http://blog.163.com/digoal@126/blog/static/163877040201549115140794/)); this is how the code paths worth optimizing are found.
~~~
CPU: Intel Ivy Bridge microarchitecture, speed 2600 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
vma samples % app name symbol name
007a7780 751274 5.1565 /soft/digoal/soft_bak/pgsql9.5/bin/postgres hash_search_with_hash_value
004a92f0 574315 3.9419 /soft/digoal/soft_bak/pgsql9.5/bin/postgres _bt_compare
006a4bd0 514473 3.5312 /soft/digoal/soft_bak/pgsql9.5/bin/postgres LWLockAcquire
0078a090 510962 3.5071 /soft/digoal/soft_bak/pgsql9.5/bin/postgres SearchCatCache
007bc3a0 484601 3.3262 /soft/digoal/soft_bak/pgsql9.5/bin/postgres AllocSetAlloc
006969c0 442341 3.0361 /soft/digoal/soft_bak/pgsql9.5/bin/postgres GetSnapshotData
00498930 352134 2.4170 /soft/digoal/soft_bak/pgsql9.5/bin/postgres heap_hot_search_buffer
005b8f70 279718 1.9199 /soft/digoal/soft_bak/pgsql9.5/bin/postgres ExecInitExpr
006895d0 249377 1.7117 /soft/digoal/soft_bak/pgsql9.5/bin/postgres PinBuffer
006a4220 168770 1.1584 /soft/digoal/soft_bak/pgsql9.5/bin/postgres LWLockRelease
007ac620 161861 1.1110 /soft/digoal/soft_bak/pgsql9.5/bin/postgres pg_encoding_mbcliplen
007a2180 161090 1.1057 /soft/digoal/soft_bak/pgsql9.5/bin/postgres FunctionCall2Coll
004aaa80 153079 1.0507 /soft/digoal/soft_bak/pgsql9.5/bin/postgres _bt_checkkeys
007a3950 147078 1.0095 /soft/digoal/soft_bak/pgsql9.5/bin/postgres fmgr_info_cxt_security
0049bce0 136680 0.9381 /soft/digoal/soft_bak/pgsql9.5/bin/postgres heap_page_prune_opt
0048c8f0 130807 0.8978 /soft/digoal/soft_bak/pgsql9.5/bin/postgres hash_any
006b2e50 130564 0.8962 /soft/digoal/soft_bak/pgsql9.5/bin/postgres PostgresMain
0046c790 121776 0.8358 /soft/digoal/soft_bak/pgsql9.5/bin/postgres slot_deform_tuple
......
~~~
### Stage 1 PostgreSQL 9.5.0 TPmC: 256195.32
### Stage 1 Bottleneck Analysis
The system statistics show plenty of idle CPU and IO capacity, so the true limit must lie well beyond this result.
## Stage 2 TPC-C Optimization
BenchmarkSQL was moved to a separate host, connected over 10GbE through the same switch.
To push past the limits of the test client itself, four schemas were used, each holding 1000 warehouses: about 2 billion rows in total, roughly 400GB once loaded. Each test client drives one schema. The total terminal count stays the same: 24 terminals per client, 96 in all.
### Making BenchmarkSQL support multiple schemas
BenchmarkSQL as shipped is compiled and configured for the schema name benchmarksql. To stress one database through several schemas, we must run several BenchmarkSQL clients, each connecting to its own schema, so BenchmarkSQL has to be rebuilt once per schema.
Directory layout:
~~~
drwxr-xr-x 2 digoal users 4096 Jan 10 13:24 build
-rwxr-xr-x 1 digoal users 1112 Jan 10 13:24 build.xml
drwxr-xr-x 2 digoal users 4096 Jan 10 13:24 dist
-rw-r--r-- 1 digoal users 128 Jan 10 13:24 env_java.sh
-rwxr-xr-x 1 digoal users 1927 Jan 10 13:24 HOW-TO-RUN.txt
drwxr-xr-x 2 digoal users 4096 Jan 10 13:24 lib
-rwxr-xr-x 1 digoal users 2825 Jan 10 13:24 README.txt
drwxr-xr-x 3 digoal users 4096 Jan 10 13:24 run
drwxr-xr-x 6 digoal users 4096 Jan 10 13:24 src
~~~
Files that need changing:
~~~
src/LoadData/LoadData.java
src/client/jTPCCTerminal.java
run/props.ora
run/props.pg
run/sqlIndexCreates
run/sqlIndexDrops
run/sqlTableCopies
run/sqlTableCreates
run/sqlTableDrops
run/sqlTableTruncates
~~~
Replace every occurrence of benchmarksql with the new schema name, e.g. test01:
~~~
sed -i "s/benchmarksql/test01/g" src/LoadData/LoadData.java
sed -i "s/benchmarksql/test01/g" src/client/jTPCCTerminal.java
sed -i "s/benchmarksql/test01/g" run/props.ora
sed -i "s/benchmarksql/test01/g" run/props.pg
sed -i "s/benchmarksql/test01/g" run/sqlIndexCreates
sed -i "s/BENCHMARKSQL/TEST01/g" run/sqlIndexCreates
sed -i "s/benchmarksql/test01/g" run/sqlIndexDrops
sed -i "s/benchmarksql/test01/g" run/sqlTableCopies
sed -i "s/benchmarksql/test01/g" run/sqlTableCreates
sed -i "s/benchmarksql/test01/g" run/sqlTableDrops
sed -i "s/benchmarksql/test01/g" run/sqlTableTruncates
~~~
Then repackage the project with ant (if ant is missing, install it with yum install -y ant).
Rebuild benchmarksql.jar with ant:
~~~
$ant -buildfile ./build.xml
Buildfile: ./build.xml
Trying to override old definition of task javac
init:
compile:
[javac] Compiling 16 source files to /soft/digoal/soft_bak/benchmarksql-4.1.0_oracle01/build
[javac] Note: /soft/digoal/soft_bak/benchmarksql-4.1.0_oracle01/src/client/jTPCCTerminal.java uses unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
dist:
[jar] Building jar: /soft/digoal/soft_bak/benchmarksql-4.1.0_oracle01/dist/BenchmarkSQL-4.1.jar
BUILD SUCCESSFUL
Total time: 2 seconds
~~~
BenchmarkSQL now uses the test01 schema. Repeat the same steps to produce builds for the test02, test03, and test04 schemas.
Create four databases named test01 through test04, load the stage-1 pg_dump into each, and rename the schema to match (e.g. `ALTER SCHEMA benchmarksql RENAME TO test01;`).
Data volumes:
~~~
postgres=# \l+
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------+----------+----------+---------+-------+-----------------------+---------+------------+--------------------------------------------
test01 | test01 | UTF8 | C | C | | 100 GB | pg_default |
test02 | test02 | UTF8 | C | C | | 100 GB | pg_default |
test03 | test03 | UTF8 | C | C | | 100 GB | pg_default |
test04 | test04 | UTF8 | C | C | | 100 GB | pg_default |
~~~
BenchmarkSQL directories:
~~~
$ ll
drwxr-xr-x 7 digoal users 4.0K Jan 10 14:41 benchmarksql-4.1.0_pg01
drwxr-xr-x 7 digoal users 4.0K Jan 10 14:41 benchmarksql-4.1.0_pg02
drwxr-xr-x 7 digoal users 4.0K Jan 10 14:41 benchmarksql-4.1.0_pg03
drwxr-xr-x 7 digoal users 4.0K Jan 10 14:41 benchmarksql-4.1.0_pg04
~~~
Configure each client's props.pg with its own connection string, for example:
~~~
$cat run/props.pg
driver=org.postgresql.Driver
conn=jdbc:postgresql://xxx.xxx.xxx.xxx:1921/test01?preparedStatementCacheSizeMiB=10
user=test01
password=123
warehouses=1000
terminals=20
//To run specified transactions per terminal- runMins must equal zero
runTxnsPerTerminal=0
//To run for specified minutes- runTxnsPerTerminal must equal zero
runMins=10
//Number of total transactions per minute
limitTxnsPerMin=0
//The following five values must add up to 100
//The default percentages of 45, 43, 4, 4 & 4 match the TPC-C spec
newOrderWeight=40
paymentWeight=36
orderStatusWeight=8
deliveryWeight=8
stockLevelWeight=8
~~~
Configure pg_hba.conf so the benchmark host can connect:
~~~
vi $PGDATA/pg_hba.conf
host all all 0.0.0.0/0 md5
pg_ctl reload
~~~
### Stage 2 TPC-C Benchmark
~~~
cd benchmarksql-4.1.0_pg01/run
nohup ./runBenchmark.sh props.pg >/dev/null 2>./errrun.log &
cd ../../benchmarksql-4.1.0_pg02/run
nohup ./runBenchmark.sh props.pg >/dev/null 2>./errrun.log &
cd ../../benchmarksql-4.1.0_pg03/run
nohup ./runBenchmark.sh props.pg >/dev/null 2>./errrun.log &
cd ../../benchmarksql-4.1.0_pg04/run
nohup ./runBenchmark.sh props.pg >/dev/null 2>./errrun.log &
cd ../..
~~~
### Stage 2 PostgreSQL 9.5.0 TPmC: 453058.64
~~~
$ cat benchmarksql-4.1.0_pg01/run/log/benchmarksql.log
INFO [2016-01-10 17:54:04.925] Thread-22 Term-00, Measured tpmC (NewOrders) = 45416.28
INFO [2016-01-10 17:54:04.925] Thread-22 Term-00, Measured tpmTOTAL = 113487.61
INFO [2016-01-10 17:54:04.925] Thread-22 Term-00, Session Start = 2016-01-10 17:44:04
INFO [2016-01-10 17:54:04.925] Thread-22 Term-00, Session End = 2016-01-10 17:54:04
INFO [2016-01-10 17:54:04.925] Thread-22 Term-00, Transaction Count = 1134913
$ cat benchmarksql-4.1.0_pg02/run/log/benchmarksql.log
INFO [2016-01-10 17:54:04.943] Thread-12 Term-00, Measured tpmC (NewOrders) = 45292.48
INFO [2016-01-10 17:54:04.943] Thread-12 Term-00, Measured tpmTOTAL = 113269.54
INFO [2016-01-10 17:54:04.943] Thread-12 Term-00, Session Start = 2016-01-10 17:44:04
INFO [2016-01-10 17:54:04.944] Thread-12 Term-00, Session End = 2016-01-10 17:54:04
INFO [2016-01-10 17:54:04.944] Thread-12 Term-00, Transaction Count = 1132770
$ cat benchmarksql-4.1.0_pg03/run/log/benchmarksql.log
INFO [2016-01-10 17:54:04.955] Thread-12 Term-00, Measured tpmC (NewOrders) = 45336.15
INFO [2016-01-10 17:54:04.955] Thread-12 Term-00, Measured tpmTOTAL = 113247.19
INFO [2016-01-10 17:54:04.956] Thread-12 Term-00, Session Start = 2016-01-10 17:44:04
INFO [2016-01-10 17:54:04.956] Thread-12 Term-00, Session End = 2016-01-10 17:54:04
INFO [2016-01-10 17:54:04.956] Thread-12 Term-00, Transaction Count = 1132537
$ cat benchmarksql-4.1.0_pg04/run/log/benchmarksql.log
INFO [2016-01-10 17:54:04.986] Thread-23 Term-00, Measured tpmC (NewOrders) = 45231.67
INFO [2016-01-10 17:54:04.987] Thread-23 Term-00, Measured tpmTOTAL = 113054.3
INFO [2016-01-10 17:54:04.987] Thread-23 Term-00, Session Start = 2016-01-10 17:44:04
INFO [2016-01-10 17:54:04.987] Thread-23 Term-00, Session End = 2016-01-10 17:54:04
INFO [2016-01-10 17:54:04.987] Thread-23 Term-00, Transaction Count = 1130640
TPM :
113487.61 + 113269.54 + 113247.19 + 113054.3 = 453058.64
~~~
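The stage-2 headline number is simply the sum of the four clients' tpmTOTAL readings; a quick check:

```python
tpm_totals = [113487.61, 113269.54, 113247.19, 113054.30]  # per-client tpmTOTAL above
print(round(sum(tpm_totals), 2))   # 453058.64
```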
Operating system statistics from the 9th minute:
~~~
TOP
top - 17:38:27 up 4 days, 8:32, 4 users, load average: 78.54, 68.64, 37.22
Tasks: 658 total, 34 running, 624 sleeping, 0 stopped, 0 zombie
Cpu(s): 70.2%us, 15.7%sy, 0.0%ni, 5.5%id, 1.5%wa, 0.0%hi, 7.1%si, 0.0%st
Mem: 264643396k total, 229866068k used, 34777328k free, 59652k buffers
Swap: 18825200k total, 0k used, 18825200k free, 183529592k cached
iostat -x
avg-cpu: %user %nice %system %iowait %steal %idle
71.39 0.00 22.47 1.26 0.00 4.88
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
dfa 0.00 0.00 3659.33 7008.67 58538.67 112050.67 15.99 5.85 0.55 0.06 68.17
dfb 0.00 0.00 3714.67 6888.67 59418.67 110173.33 15.99 5.98 0.56 0.06 67.87
dfc 0.00 0.00 3709.00 6974.33 59328.00 111504.00 15.99 5.63 0.52 0.07 71.60
dm-0 0.00 0.00 11083.00 20870.33 177285.33 333706.67 15.99 17.60 0.55 0.03 92.10
~~~
oprofile report captured during the run:
~~~
#/home/digoal/oprof/bin/opreport -l -f -w -x -t 0.5
Using /soft/digoal/soft_bak/oprof_test/oprofile_data/samples/ for samples directory.
WARNING! Some of the events were throttled. Throttling occurs when
the initial sample rate is too high, causing an excessive number of
interrupts. Decrease the sampling frequency. Check the directory
/soft/digoal/soft_bak/oprof_test/oprofile_data/samples/current/stats/throttled
for the throttled event names.
CPU: Intel Ivy Bridge microarchitecture, speed 2600 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
vma samples % app name symbol name
007a7780 2632700 5.2511 /soft/digoal/soft_bak/pgsql9.5/bin/postgres hash_search_with_hash_value
004a92f0 1895924 3.7816 /soft/digoal/soft_bak/pgsql9.5/bin/postgres _bt_compare
006969c0 1844371 3.6787 /soft/digoal/soft_bak/pgsql9.5/bin/postgres GetSnapshotData
0078a090 1775031 3.5404 /soft/digoal/soft_bak/pgsql9.5/bin/postgres SearchCatCache
006a4bd0 1725350 3.4413 /soft/digoal/soft_bak/pgsql9.5/bin/postgres LWLockAcquire
007bc3a0 1565190 3.1219 /soft/digoal/soft_bak/pgsql9.5/bin/postgres AllocSetAlloc
00498930 1406694 2.8058 /soft/digoal/soft_bak/pgsql9.5/bin/postgres heap_hot_search_buffer
005b8f70 965646 1.9261 /soft/digoal/soft_bak/pgsql9.5/bin/postgres ExecInitExpr
006895d0 767078 1.5300 /soft/digoal/soft_bak/pgsql9.5/bin/postgres PinBuffer
004aaa80 617741 1.2321 /soft/digoal/soft_bak/pgsql9.5/bin/postgres _bt_checkkeys
007a2180 588043 1.1729 /soft/digoal/soft_bak/pgsql9.5/bin/postgres FunctionCall2Coll
006a4220 575864 1.1486 /soft/digoal/soft_bak/pgsql9.5/bin/postgres LWLockRelease
007ac620 485162 0.9677 /soft/digoal/soft_bak/pgsql9.5/bin/postgres pg_encoding_mbcliplen
007a3950 471102 0.9396 /soft/digoal/soft_bak/pgsql9.5/bin/postgres fmgr_info_cxt_security
0046c790 441548 0.8807 /soft/digoal/soft_bak/pgsql9.5/bin/postgres slot_deform_tuple
0048c8f0 425867 0.8494 /soft/digoal/soft_bak/pgsql9.5/bin/postgres hash_any
006b2e50 404548 0.8069 /soft/digoal/soft_bak/pgsql9.5/bin/postgres PostgresMain
007bd0f0 396510 0.7909 /soft/digoal/soft_bak/pgsql9.5/bin/postgres palloc
0049bce0 394201 0.7863 /soft/digoal/soft_bak/pgsql9.5/bin/postgres heap_page_prune_opt
007bce00 353243 0.7046 /soft/digoal/soft_bak/pgsql9.5/bin/postgres pfree
0049b300 335896 0.6700 /soft/digoal/soft_bak/pgsql9.5/bin/postgres heap_page_prune
0046c580 313145 0.6246 /soft/digoal/soft_bak/pgsql9.5/bin/postgres heap_getsysattr
006b14a0 311776 0.6219 /soft/digoal/soft_bak/pgsql9.5/bin/postgres exec_bind_message
007cb070 292106 0.5826 /soft/digoal/soft_bak/pgsql9.5/bin/postgres HeapTupleSatisfiesMVCC
007bd210 275282 0.5491 /soft/digoal/soft_bak/pgsql9.5/bin/postgres MemoryContextAllocZeroAligned
005b8530 273199 0.5449 /soft/digoal/soft_bak/pgsql9.5/bin/postgres ExecProject
00494ba0 266495 0.5315 /soft/digoal/soft_bak/pgsql9.5/bin/postgres heap_update
007bca10 265556 0.5297 /soft/digoal/soft_bak/pgsql9.5/bin/postgres AllocSetFree
~~~
### Stage 2 Bottleneck Analysis
1. Per-request IO latency is fairly high, at about 0.06 ms;
2. System calls take a large share of CPU;
3. Snapshot acquisition in the database burns significant CPU; fixing that requires changes to the PostgreSQL code, which this article does not cover.
## Stage 3 TPC-C Optimization
1. Enable PostgreSQL prefetching, with a prefetch depth of (n-1) where n is the stripe count, so in this case `effective_io_concurrency` = 2. In PostgreSQL this prefetching only takes effect for bitmap heap scans; interested readers can check the source.
Its effect is similar to XFS's largeio option and to block device readahead.
Prefetching can waste IO when everything is already in the buffer cache; it is most effective for OLAP.
2. Enable huge page support, sized at 168GB;
~~~
/etc/sysctl.conf
vm.nr_hugepages = 102352
sysctl -p
/etc/security/limits.conf
* soft memlock 250000000
* hard memlock 250000000
# set memlock > nr_hugepages > shared_buffers
~~~
3. Use group commit: `commit_delay` = 10, `commit_siblings` = 16.
Smooth checkpoints across 0.8 of the checkpoint cycle to reduce the IO impact of fsyncing dirty pages.
~~~
http://blog.163.com/digoal@126/blog/static/1638770402016011115141697/
shared_buffers = 164GB # min 128kB
huge_pages = on # on, off, or try
maintenance_work_mem = 2GB # min 1MB
wal_buffers = 16MB # min 32kB, -1 sets based on shared_buffers
wal_writer_delay = 10ms # 1-10000 milliseconds
commit_delay = 10 # range 0-100000, in microseconds
commit_siblings = 16 # range 1-1000
checkpoint_timeout = 35min # range 30s-1h
max_wal_size = 320GB
checkpoint_completion_target = 0.8 # checkpoint target duration, 0.0 - 1.0
effective_cache_size = 240GB
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_truncate_on_rotation = on # If on, an existing log file with the
~~~
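The smoothing effect of the two checkpoint settings above can be quantified: with checkpoint_timeout = 35min and checkpoint_completion_target = 0.8, each checkpoint's writes are paced over roughly 28 minutes rather than issued in one burst.

```python
checkpoint_timeout_min = 35      # checkpoint_timeout from the config above
completion_target = 0.8          # checkpoint_completion_target
print(checkpoint_timeout_min * completion_target)   # 28.0 minutes of paced writes
```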
### Stage 3 TPC-C Benchmark
~~~
$tail -n 5 benchmarksql-4.1.0_pg01/run/log/benchmarksql.log
INFO [2016-01-11 13:33:55.917] Thread-14 Term-00, Measured tpmC (NewOrders) = 48151.07
INFO [2016-01-11 13:33:55.917] Thread-14 Term-00, Measured tpmTOTAL = 120215.48
INFO [2016-01-11 13:33:55.917] Thread-14 Term-00, Session Start = 2016-01-11 13:23:55
INFO [2016-01-11 13:33:55.917] Thread-14 Term-00, Session End = 2016-01-11 13:33:55
INFO [2016-01-11 13:33:55.917] Thread-14 Term-00, Transaction Count = 1202222
$tail -n 5 benchmarksql-4.1.0_pg02/run/log/benchmarksql.log
INFO [2016-01-11 13:33:55.971] Thread-16 Term-00, Measured tpmC (NewOrders) = 48505.54
INFO [2016-01-11 13:33:55.971] Thread-16 Term-00, Measured tpmTOTAL = 121182.26
INFO [2016-01-11 13:33:55.971] Thread-16 Term-00, Session Start = 2016-01-11 13:23:55
INFO [2016-01-11 13:33:55.972] Thread-16 Term-00, Session End = 2016-01-11 13:33:55
INFO [2016-01-11 13:33:55.972] Thread-16 Term-00, Transaction Count = 1211858
$tail -n 5 benchmarksql-4.1.0_pg03/run/log/benchmarksql.log
INFO [2016-01-11 13:33:55.985] Thread-4 Term-00, Measured tpmC (NewOrders) = 48119.61
INFO [2016-01-11 13:33:55.985] Thread-4 Term-00, Measured tpmTOTAL = 120523.98
INFO [2016-01-11 13:33:55.985] Thread-4 Term-00, Session Start = 2016-01-11 13:23:55
INFO [2016-01-11 13:33:55.985] Thread-4 Term-00, Session End = 2016-01-11 13:33:55
INFO [2016-01-11 13:33:55.985] Thread-4 Term-00, Transaction Count = 1205271
$tail -n 5 benchmarksql-4.1.0_pg04/run/log/benchmarksql.log
INFO [2016-01-11 13:33:55.958] Thread-21 Term-00, Measured tpmC (NewOrders) = 48087.55
INFO [2016-01-11 13:33:55.958] Thread-21 Term-00, Measured tpmTOTAL = 120461.29
INFO [2016-01-11 13:33:55.958] Thread-21 Term-00, Session Start = 2016-01-11 13:23:55
INFO [2016-01-11 13:33:55.958] Thread-21 Term-00, Session End = 2016-01-11 13:33:55
INFO [2016-01-11 13:33:55.958] Thread-21 Term-00, Transaction Count = 1204638
TPM:
120215.48 + 121182.26 + 120523.98 + 120461.29 = 482383.01
~~~
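The aggregate is just the sum of the four clients' tpmTOTAL lines. A quick sketch of extracting and summing them (the regex mirrors the log format shown; `total_tpm` is a hypothetical helper):

```python
import re

# Sum the "Measured tpmTOTAL" values from several BenchmarkSQL log tails.
PATTERN = re.compile(r"Measured tpmTOTAL = ([\d.]+)")

def total_tpm(log_texts):
    total = 0.0
    for text in log_texts:
        m = PATTERN.search(text)
        if m:
            total += float(m.group(1))
    return round(total, 2)

logs = [
    "Measured tpmTOTAL = 120215.48",
    "Measured tpmTOTAL = 121182.26",
    "Measured tpmTOTAL = 120523.98",
    "Measured tpmTOTAL = 120461.29",
]
print(total_tpm(logs))  # 482383.01
```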
### Stage 3 PostgreSQL 9.5.0 TPmC : 482383.01
### Stage 3 performance bottleneck analysis
1. The OS flushes dirty pages in large background bursts, which causes jitter;
2. Tune the concurrency to reduce the CPU cost of transaction snapshots;
3. Tune work_mem to avoid on-disk sorts;
4. Tune the group commit thresholds.
## Stage 4 TPC-C optimization
* Tune the group commit delay, the minimum number of concurrent committing transactions, work_mem, and so on
* OS kernel parameter tuning
* Wake the background dirty-page flusher more frequently
* `vm.dirty_writeback_centisecs`=10
* Tune the dirty-page age threshold: pages dirtied more than 60 seconds ago are written back
* `vm.dirty_expire_centisecs`=6000
* Raise the user-process flush threshold: a user process only flushes dirty pages itself once they exceed 80% of RAM, so user processes almost never have to flush
* `vm.dirty_ratio`=80
* Lower the kernel flusher threshold, so background kernel threads do as much of the flushing as possible
* `vm.dirty_background_bytes`=102400000
* Tune the terminal count: 20 terminals per benchmarksql instance, 80 in total.
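Put in absolute terms for this 256GB host, the thresholds show why user processes should almost never hit the synchronous flush path (a small sanity-check sketch; the figures follow the sysctls above):

```python
# Express the dirty-page sysctls above as absolute figures for a 256GB host.
RAM_GB = 256

dirty_background_bytes = 102_400_000   # kernel flusher starts very early
dirty_ratio = 80                       # user processes block only past 80% of RAM
dirty_expire_centisecs = 6000          # pages older than 60s are written back
dirty_writeback_centisecs = 10         # flusher wakes every 0.1s

print(round(dirty_background_bytes / 2**20, 1))  # 97.7 (MB)
print(RAM_GB * dirty_ratio // 100)               # 204 (GB ceiling before blocking)
print(dirty_expire_centisecs / 100)              # 60.0 (seconds)
```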
PostgreSQL parameters
~~~
listen_addresses = '0.0.0.0' # what IP address(es) to listen on;
port = 1921 # (change requires restart)
max_connections = 300 # (change requires restart)
unix_socket_directories = '.' # comma-separated list of directories
shared_buffers = 164GB # min 128kB
huge_pages = on # on, off, or try
work_mem = 256MB # min 64kB
maintenance_work_mem = 2GB # min 1MB
autovacuum_work_mem = 2GB # min 1MB, or -1 to use maintenance_work_mem
dynamic_shared_memory_type = mmap # the default is the first option
vacuum_cost_delay = 10 # 0-100 milliseconds
vacuum_cost_limit = 10000 # 1-10000 credits
bgwriter_delay = 10ms # 10-10000ms between rounds
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round
bgwriter_lru_multiplier = 10.0 # 0-10.0 multipler on buffers scanned/round
effective_io_concurrency = 2 # 1-1000; 0 disables prefetching
wal_level = minimal # minimal, archive, hot_standby, or logical
synchronous_commit = off # synchronization level;
full_page_writes = off # recover from partial page writes
wal_buffers = 1GB # min 32kB, -1 sets based on shared_buffers
wal_writer_delay = 10ms # 1-10000 milliseconds
commit_delay = 10 # range 0-100000, in microseconds
commit_siblings = 6 # range 1-1000
checkpoint_timeout = 55min # range 30s-1h
max_wal_size = 320GB
checkpoint_completion_target = 0.99 # checkpoint target duration, 0.0 - 1.0
random_page_cost = 1.0 # same scale as above
effective_cache_size = 240GB
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_truncate_on_rotation = on # If on, an existing log file with the
log_timezone = 'PRC'
update_process_title = off
track_activities = off
autovacuum = on # Enable autovacuum subprocess? 'on'
~~~
Restart the database
~~~
pg_ctl restart
~~~
Load the data into shared buffers
~~~
psql
\c test01 test01
explain analyze select * from customer; explain analyze select * from stock;
\c test02 test02
explain analyze select * from customer; explain analyze select * from stock;
\c test03 test03
explain analyze select * from customer; explain analyze select * from stock;
\c test04 test04
explain analyze select * from customer; explain analyze select * from stock;
~~~
### Stage 4 TPC-C benchmark
~~~
$ tail -n 5 benchmarksql-4.1.0_pg01/run/log/benchmarksql.log
INFO [2016-01-12 11:55:09.461] Thread-12 Term-00, Measured tpmC (NewOrders) = 57995.55
INFO [2016-01-12 11:55:09.461] Thread-12 Term-00, Measured tpmTOTAL = 144975.59
INFO [2016-01-12 11:55:09.461] Thread-12 Term-00, Session Start = 2016-01-12 11:45:09
INFO [2016-01-12 11:55:09.461] Thread-12 Term-00, Session End = 2016-01-12 11:55:09
INFO [2016-01-12 11:55:09.462] Thread-12 Term-00, Transaction Count = 1449796
$ tail -n 5 benchmarksql-4.1.0_pg02/run/log/benchmarksql.log
INFO [2016-01-12 11:55:09.499] Thread-0 Term-00, Measured tpmC (NewOrders) = 58013.75
INFO [2016-01-12 11:55:09.499] Thread-0 Term-00, Measured tpmTOTAL = 145006.74
INFO [2016-01-12 11:55:09.499] Thread-0 Term-00, Session Start = 2016-01-12 11:45:09
INFO [2016-01-12 11:55:09.500] Thread-0 Term-00, Session End = 2016-01-12 11:55:09
INFO [2016-01-12 11:55:09.500] Thread-0 Term-00, Transaction Count = 1450110
$ tail -n 5 benchmarksql-4.1.0_pg03/run/log/benchmarksql.log
INFO [2016-01-12 11:55:09.541] Thread-14 Term-00, Measured tpmC (NewOrders) = 57322.05
INFO [2016-01-12 11:55:09.541] Thread-14 Term-00, Measured tpmTOTAL = 143227.03
INFO [2016-01-12 11:55:09.542] Thread-14 Term-00, Session Start = 2016-01-12 11:45:09
INFO [2016-01-12 11:55:09.542] Thread-14 Term-00, Session End = 2016-01-12 11:55:09
INFO [2016-01-12 11:55:09.542] Thread-14 Term-00, Transaction Count = 1432298
$ tail -n 5 benchmarksql-4.1.0_pg04/run/log/benchmarksql.log
INFO [2016-01-12 11:55:09.574] Thread-7 Term-00, Measured tpmC (NewOrders) = 57863.92
INFO [2016-01-12 11:55:09.574] Thread-7 Term-00, Measured tpmTOTAL = 144596.45
INFO [2016-01-12 11:55:09.575] Thread-7 Term-00, Session Start = 2016-01-12 11:45:09
INFO [2016-01-12 11:55:09.575] Thread-7 Term-00, Session End = 2016-01-12 11:55:09
INFO [2016-01-12 11:55:09.575] Thread-7 Term-00, Transaction Count = 1445978
TPM:
144975.59 + 145006.74 + 143227.03 + 144596.45 = 577805.81
~~~
### Stage 4 PostgreSQL 9.5.0 TPmC : 577805.81
### Stage 4 performance bottleneck analysis
No obvious bottleneck remains; the next step is to optimize at the compiler and code level.
## Stage 5 TPC-C optimization
Use the latest gcc and compile with -O3.
* [gcc compiler upgrade](http://blog.163.com/digoal@126/blog/static/163877040201601313814429/)
* [Intel compiler](https://software.intel.com/en-us/intel-compilers)
* [CLANG compiler](http://blog.163.com/digoal@126/blog/static/163877040201601382640309/)
Recompile with gcc 4.9.3, adding -O3 to CFLAGS
~~~
$ export LD_LIBRARY_PATH=/u02/digoal/gcc4.9.3/lib:/u02/digoal/cloog/lib:/u02/digoal/gmp/lib:/u02/digoal/isl/lib:/u02/digoal/mpc/lib:/u02/digoal/mpfr/lib:$LD_LIBRARY_PATH
$ export PATH=/u02/digoal/gcc4.9.3/bin:$PATH
$ CFLAGS="-O3 -march=native -flto" CC=/u02/digoal/gcc4.9.3/bin/gcc ./configure --prefix=/u02/digoal/soft_bak/pgsql9.5 --with-blocksize=8 --with-pgport=1921 --with-perl --with-python --with-tcl --with-openssl --with-pam --with-ldap --with-libxml --with-libxslt --enable-thread-safety --with-wal-segsize=64
$ make world -j 32
$ make install-world -j 32
~~~
### Stage 5 TPC-C benchmark
~~~
digoal tail -n 5 benchmarksql-4.1.0_pg01/run/log/benchmarksql.log
INFO [2016-01-13 02:00:49.699] Thread-15 Term-00, Measured tpmC (NewOrders) = 59092.33
INFO [2016-01-13 02:00:49.699] Thread-15 Term-00, Measured tpmTOTAL = 147832.44
INFO [2016-01-13 02:00:49.699] Thread-15 Term-00, Session Start = 2016-01-13 01:50:49
INFO [2016-01-13 02:00:49.699] Thread-15 Term-00, Session End = 2016-01-13 02:00:49
INFO [2016-01-13 02:00:49.699] Thread-15 Term-00, Transaction Count = 1478385
digoal tail -n 5 benchmarksql-4.1.0_pg02/run/log/benchmarksql.log
INFO [2016-01-13 02:00:49.704] Thread-0 Term-00, Measured tpmC (NewOrders) = 60051.49
INFO [2016-01-13 02:00:49.704] Thread-0 Term-00, Measured tpmTOTAL = 150231.54
INFO [2016-01-13 02:00:49.704] Thread-0 Term-00, Session Start = 2016-01-13 01:50:49
INFO [2016-01-13 02:00:49.704] Thread-0 Term-00, Session End = 2016-01-13 02:00:49
INFO [2016-01-13 02:00:49.704] Thread-0 Term-00, Transaction Count = 1502367
digoal tail -n 5 benchmarksql-4.1.0_pg03/run/log/benchmarksql.log
INFO [2016-01-13 02:00:49.693] Thread-16 Term-00, Measured tpmC (NewOrders) = 60273.99
INFO [2016-01-13 02:00:49.694] Thread-16 Term-00, Measured tpmTOTAL = 150601.93
INFO [2016-01-13 02:00:49.694] Thread-16 Term-00, Session Start = 2016-01-13 01:50:49
INFO [2016-01-13 02:00:49.694] Thread-16 Term-00, Session End = 2016-01-13 02:00:49
INFO [2016-01-13 02:00:49.694] Thread-16 Term-00, Transaction Count = 1506066
digoal tail -n 5 benchmarksql-4.1.0_pg04/run/log/benchmarksql.log
INFO [2016-01-13 02:00:49.715] Thread-18 Term-00, Measured tpmC (NewOrders) = 60180.69
INFO [2016-01-13 02:00:49.715] Thread-18 Term-00, Measured tpmTOTAL = 150591.78
INFO [2016-01-13 02:00:49.716] Thread-18 Term-00, Session Start = 2016-01-13 01:50:49
INFO [2016-01-13 02:00:49.716] Thread-18 Term-00, Session End = 2016-01-13 02:00:49
INFO [2016-01-13 02:00:49.716] Thread-18 Term-00, Transaction Count = 1505962
TPM
599257.69
~~~
### Stage 5 PostgreSQL 9.5.0 TPmC : 599257.69
### Stage 5 performance bottleneck analysis
Try the CLANG compiler next.
## Stage 6 TPC-C optimization
For installing clang, see [here](http://blog.163.com/digoal@126/blog/static/163877040201601421045406/).
Compile with clang
~~~
CC=/u02/digoal/llvm/bin/clang CFLAGS="-O2 -fstrict-enums" ./configure --prefix=/u02/digoal/soft_bak/pgsql9.5 --with-pgport=1921 --with-perl --with-python --with-tcl --with-openssl --with-pam --with-ldap --with-libxml --with-libxslt --enable-thread-safety
make world -j 32
make install-world -j 32
~~~
### Stage 6 TPC-C benchmark
~~~
$ tail -n 5 benchmarksql-4.1.0_pg01/run/log/benchmarksql.log
INFO [2016-01-16 07:21:58.070] Thread-12 Term-00, Measured tpmC (NewOrders) = 60519.19
INFO [2016-01-16 07:21:58.070] Thread-12 Term-00, Measured tpmTOTAL = 151235.02
INFO [2016-01-16 07:21:58.070] Thread-12 Term-00, Session Start = 2016-01-16 07:11:58
INFO [2016-01-16 07:21:58.071] Thread-12 Term-00, Session End = 2016-01-16 07:21:58
INFO [2016-01-16 07:21:58.071] Thread-12 Term-00, Transaction Count = 1512377
$ tail -n 5 benchmarksql-4.1.0_pg02/run/log/benchmarksql.log
INFO [2016-01-16 07:21:58.180] Thread-15 Term-00, Measured tpmC (NewOrders) = 60924.87
INFO [2016-01-16 07:21:58.180] Thread-15 Term-00, Measured tpmTOTAL = 152126.73
INFO [2016-01-16 07:21:58.180] Thread-15 Term-00, Session Start = 2016-01-16 07:11:58
INFO [2016-01-16 07:21:58.180] Thread-15 Term-00, Session End = 2016-01-16 07:21:58
INFO [2016-01-16 07:21:58.180] Thread-15 Term-00, Transaction Count = 1521312
$ tail -n 5 benchmarksql-4.1.0_pg03/run/log/benchmarksql.log
INFO [2016-01-16 07:21:58.198] Thread-0 Term-00, Measured tpmC (NewOrders) = 60481.19
INFO [2016-01-16 07:21:58.198] Thread-0 Term-00, Measured tpmTOTAL = 151294.63
INFO [2016-01-16 07:21:58.199] Thread-0 Term-00, Session Start = 2016-01-16 07:11:58
INFO [2016-01-16 07:21:58.199] Thread-0 Term-00, Session End = 2016-01-16 07:21:58
INFO [2016-01-16 07:21:58.199] Thread-0 Term-00, Transaction Count = 1512968
$ tail -n 5 benchmarksql-4.1.0_pg04/run/log/benchmarksql.log
INFO [2016-01-16 07:21:58.200] Thread-5 Term-00, Measured tpmC (NewOrders) = 60715.57
INFO [2016-01-16 07:21:58.200] Thread-5 Term-00, Measured tpmTOTAL = 151809.93
INFO [2016-01-16 07:21:58.200] Thread-5 Term-00, Session Start = 2016-01-16 07:11:58
INFO [2016-01-16 07:21:58.200] Thread-5 Term-00, Session End = 2016-01-16 07:21:58
INFO [2016-01-16 07:21:58.200] Thread-5 Term-00, Transaction Count = 1518149
TPM:
606466.31
~~~
### Stage 6 PostgreSQL 9.5.0 TPmC : 606466.31
Current `perf top` output
~~~
samples pcnt function DSO
_______ _____ _________________________________ __________________________________________
15900.00 3.2% hash_search_with_hash_value /u02/digoal/soft_bak/pgsql9.5/bin/postgres
13970.00 2.8% _bt_compare /u02/digoal/soft_bak/pgsql9.5/bin/postgres
13215.00 2.6% AllocSetAlloc /u02/digoal/soft_bak/pgsql9.5/bin/postgres
10678.00 2.1% LWLockAcquire /u02/digoal/soft_bak/pgsql9.5/bin/postgres
10298.00 2.1% memcpy /lib64/libc-2.12.so
9016.00 1.8% SearchCatCache /u02/digoal/soft_bak/pgsql9.5/bin/postgres
8577.00 1.7% heap_hot_search_buffer /u02/digoal/soft_bak/pgsql9.5/bin/postgres
8059.00 1.6% GetSnapshotData /u02/digoal/soft_bak/pgsql9.5/bin/postgres
6975.00 1.4% ExecInitExpr /u02/digoal/soft_bak/pgsql9.5/bin/postgres
6517.00 1.3% fmgr_info_cxt_security /u02/digoal/soft_bak/pgsql9.5/bin/postgres
5232.00 1.0% PostgresMain /u02/digoal/soft_bak/pgsql9.5/bin/postgres
4328.00 0.9% LWLockRelease /u02/digoal/soft_bak/pgsql9.5/bin/postgres
4044.00 0.8% PinBuffer /u02/digoal/soft_bak/pgsql9.5/bin/postgres
4037.00 0.8% _int_malloc /lib64/libc-2.12.so
4026.00 0.8% StrategyGetBuffer /u02/digoal/soft_bak/pgsql9.5/bin/postgres
3777.00 0.8% slot_deform_tuple /u02/digoal/soft_bak/pgsql9.5/bin/postgres
3755.00 0.7% FunctionCall2Coll /u02/digoal/soft_bak/pgsql9.5/bin/postgres
3741.00 0.7% __GI_vfprintf /lib64/libc-2.12.so
3403.00 0.7% __strncpy_ssse3 /lib64/libc-2.12.so
3305.00 0.7% aliflash_reconfig_task [aliflash]
3090.00 0.6% _bt_checkkeys /u02/digoal/soft_bak/pgsql9.5/bin/postgres
3012.00 0.6% __memset_sse2 /lib64/libc-2.12.so
2881.00 0.6% palloc /u02/digoal/soft_bak/pgsql9.5/bin/postgres
2698.00 0.5% __strlen_sse42 /lib64/libc-2.12.so
2585.00 0.5% _int_free /lib64/libc-2.12.so
2505.00 0.5% heap_page_prune /u02/digoal/soft_bak/pgsql9.5/bin/postgres
2495.00 0.5% hash_any /u02/digoal/soft_bak/pgsql9.5/bin/postgres
2442.00 0.5% heap_page_prune_opt /u02/digoal/soft_bak/pgsql9.5/bin/postgres
2437.00 0.5% __schedule [kernel.kallsyms]
2210.00 0.4% MemoryContextAllocZeroAligned /u02/digoal/soft_bak/pgsql9.5/bin/postgres
2111.00 0.4% pfree /u02/digoal/soft_bak/pgsql9.5/bin/postgres
2048.00 0.4% heap_update /u02/digoal/soft_bak/pgsql9.5/bin/postgres
2012.00 0.4% update_blocked_averages [kernel.kallsyms]
1937.00 0.4% __switch_to [kernel.kallsyms]
1925.00 0.4% heap_getsysattr /u02/digoal/soft_bak/pgsql9.5/bin/postgres
1916.00 0.4% TupleDescInitEntry /u02/digoal/soft_bak/pgsql9.5/bin/postgres
1905.00 0.4% irq_entries_start [kernel.kallsyms]
1863.00 0.4% AllocSetFree /u02/digoal/soft_bak/pgsql9.5/bin/postgres
1821.00 0.4% _wordcopy_bwd_aligned /lib64/libc-2.12.so
1761.00 0.4% _raw_spin_lock [kernel.kallsyms]
1758.00 0.4% check_stack_depth /u02/digoal/soft_bak/pgsql9.5/bin/postgres
1749.00 0.3% _bt_binsrch /u02/digoal/soft_bak/pgsql9.5/bin/postgres
1748.00 0.3% ReadBuffer_common /u02/digoal/soft_bak/pgsql9.5/bin/postgres
1747.00 0.3% expression_tree_walker /u02/digoal/soft_bak/pgsql9.5/bin/postgres
1651.00 0.3% __GI___libc_malloc /lib64/libc-2.12.so
1608.00 0.3% __memcmp_sse4_1 /lib64/libc-2.12.so
1586.00 0.3% LockAcquireExtended /u02/digoal/soft_bak/pgsql9.5/bin/postgres
------------------------------------------------------------------------------------------------------------
~~~
### Stage 6 performance bottleneck analysis
Further gains require optimization inside the PostgreSQL code itself.
#### Other optimizations not attempted in this article
Interested readers can try:
1. Compiling with Intel's icc to see whether performance improves further;
2. Disabling auto-analyze on the tables, and disabling autovacuum and auto-analyze on the history (log) table;
3. Tuning the PostgreSQL JDBC driver, which this article did not touch, e.g. avoiding type conversions (benchmarksql's code already handles this) and the query plan cache size.
See [here](http://www.postgresql.org/docs/9.2/interactive/libpq-connect.html).
4. Optimizing the PostgreSQL code itself, e.g. partitioned tables and their code paths, snapshot handling under high concurrency, and LLVM.
Stay tuned to AliCloudDB for PgSQL; we will keep optimizing the community PostgreSQL at the code level.
## Summary
Kernel parameter tuning summary, with the rationale behind each setting.
~~~
kernel.shmmax = 135497418752
kernel.shmmni = 4096
fs.file-max = 7672460
fs.aio-max-nr = 1048576
vm.zone_reclaim_mode=0 # disable NUMA zone reclaim (or disable NUMA in the kernel boot line)
vm.swappiness = 0 # avoid swapping as much as possible
kernel.shmmax=135497418752 # maximum shared memory segment size
net.core.rmem_max = 4194304 # The maximum receive socket buffer size in bytes
net.core.wmem_max = 4194304 # The maximum send socket buffer size in bytes.
net.core.rmem_default = 262144 # The default setting of the socket receive buffer in bytes.
net.core.wmem_default = 262144 # The default setting (in bytes) of the socket send buffer.
net.ipv4.ip_local_port_range = 9000 65535 # range of locally auto-assigned TCP/UDP ports
kernel.sem = 50100 64128000 50100 1280 # semaphores
vm.dirty_background_bytes = 102400000 # once system-wide dirty pages reach this size, the background flusher (pdflush or similar) writes out pages dirtied more than (dirty_expire_centisecs/100) seconds ago
vm.dirty_expire_centisecs = 6000 # dirty pages older than this are flushed to disk; 6000 means 60 seconds
vm.dirty_writeback_centisecs = 50 # wakeup interval of the background flusher; 50 means 0.5 seconds
vm.dirty_ratio = 80 # if background flushing falls behind and dirty pages exceed 80% of RAM, user processes that write to disk (fsync, fdatasync, etc.) must flush dirty pages themselves
vm.nr_hugepages = 102352 # number of huge pages; multiply by Hugepagesize from /proc/meminfo for the amount of memory reserved
vm.overcommit_memory = 0 # allow modest over-allocation on malloc
vm.overcommit_ratio = 90 # with overcommit_memory = 2, used when computing the allowed commit limit
~~~
For an explanation of the memory allocation policy, see [here](http://blog.163.com/digoal@126/blog/static/163877040201563044143325/).
~~~
With vm.overcommit_memory=0, slight overcommit is allowed.
With vm.overcommit_memory=1, overcommit is always allowed; this is risky and mainly used by some scientific-computing applications.
With vm.overcommit_memory=2, Committed_AS must not exceed CommitLimit.
How the allocation limit is computed:
The CommitLimit is calculated with the following formula:
CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
overcommit_ratio / 100 + [total swap pages]
For example, on a system with 1G of physical RAM and 7G
of swap with a `vm.overcommit_ratio` of 30 it would
yield a CommitLimit of 7.3G.
[root@digoal postgresql-9.4.4]# free
total used free shared buffers cached
Mem: 1914436 713976 1200460 72588 32384 529364
-/+ buffers/cache: 152228 1762208
Swap: 1048572 542080 506492
[root@digoal ~]# cat /proc/meminfo |grep Commit
CommitLimit: 2005788 kB
Committed_AS: 132384 kB
The ~2G CommitLimit in this example comes from the formula above.
The rationale for the overcommit limit: memory is not consumed immediately after malloc, so if several processes request memory at once, forbidding overcommit could make some requests fail even though memory is actually still available.
The Linux kernel therefore offers several policies:
2 is the conservative, safest choice.
1 is risky: the allocation succeeds, but there may not be enough memory to back it, which can end in an OOM kill.
0 is the most common: modest overcommit is allowed, but very large requests are refused.
See also the kernel code:
security/commoncap.c::cap_vm_enough_memory()
So if the database fails to start, either reduce the memory it requests (e.g. lower shared_buffers or max_connections) or change the overcommit policy.
~~~
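The CommitLimit formula can be checked directly (a minimal sketch; `commit_limit_gb` is a hypothetical helper):

```python
# CommitLimit = (total RAM - huge TLB pages) * overcommit_ratio/100 + swap
def commit_limit_gb(ram_gb, swap_gb, overcommit_ratio, hugetlb_gb=0):
    return (ram_gb - hugetlb_gb) * overcommit_ratio / 100 + swap_gb

# The documentation example: 1G RAM, 7G swap, vm.overcommit_ratio = 30
print(commit_limit_gb(1, 7, 30))  # 7.3
# The free/meminfo output above: ~1.9G RAM, ~1G swap, default ratio 50
print(round(commit_limit_gb(1914436 / 2**20, 1048572 / 2**20, 50), 2))
# 1.91, matching CommitLimit: 2005788 kB
```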
vi /etc/security/limits.conf
~~~
# add by digoal.zhou
* soft nofile 131072
* hard nofile 131072
* soft nproc 131072
* hard nproc 131072
* soft core unlimited
* hard core unlimited
* soft memlock 500000000
* hard memlock 500000000
~~~
Kernel boot parameter summary:
Disable NUMA
Use the deadline IO scheduler
~~~
kernel /vmlinuz-3.18.24 numa=off elevator=deadline intel_idle.max_cstate=0 scsi_mod.scan=sync
~~~
Block device tuning summary: readahead
~~~
blockdev --setra 16384 /dev/dfa
blockdev --setra 16384 /dev/dfb
blockdev --setra 16384 /dev/dfc
blockdev --setra 16384 /dev/dm-0
~~~
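`blockdev --setra` takes its argument in 512-byte sectors, so the setting above is an 8MB readahead window (a quick conversion sketch):

```python
# blockdev --setra counts 512-byte sectors: 16384 sectors -> 8 MB readahead.
SECTOR_BYTES = 512

def readahead_mb(sectors):
    return sectors * SECTOR_BYTES / 2**20

print(readahead_mb(16384))  # 8.0
```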
Database parameter tuning summary
~~~
max_connections = 300 # (change requires restart)
unix_socket_directories = '.' # comma-separated list of directories
shared_buffers = 194GB # let the database manage memory itself: less double caching, better efficiency
huge_pages = on # on, off, or try ; use huge pages
work_mem = 256MB # min 64kB ; reduce the chance of external (on-disk) sorts
maintenance_work_mem = 2GB # min 1MB ; speed up index builds
autovacuum_work_mem = 2GB # min 1MB, or -1 to use maintenance_work_mem ; speed up garbage collection
dynamic_shared_memory_type = mmap # the default is the first option
vacuum_cost_delay = 0 # 0-100 milliseconds ; no vacuum throttling: under extreme load this reduces the chance of bloat
bgwriter_delay = 10ms # 10-10000ms between rounds ; scheduling interval of the bgwriter that flushes dirty shared-buffer pages; schedule it frequently so user backends rarely have to flush dirty pages themselves (which would raise response times)
bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round ; maximum dirty pages flushed per round
bgwriter_lru_multiplier = 10.0 # 0-10.0 multipler on buffers scanned/round ; buffers scanned per round, as a multiple of the pages flushed last round
effective_io_concurrency = 2 # 1-1000; 0 disables prefetching ; number of blocks to prefetch when the executor node is a bitmap heap scan
wal_level = minimal # minimal, archive, hot_standby, or logical ; in a real deployment, enabling archiving is recommended
synchronous_commit = off # synchronization level ; asynchronous commit
wal_sync_method = open_sync # the default is the first option ; with no standby here, pick an xlog sync method that supports O_DIRECT
full_page_writes = off # recover from partial page writes ; in production this can stay off if you have incremental backups and archiving, improving performance
wal_buffers = 1GB # min 32kB, -1 sets based on shared_buffers ; wal buffer size, increase it if backends wait heavily on wal buffers
wal_writer_delay = 10ms # 1-10000 milliseconds ; wal writer scheduling interval, similar to bgwriter_delay
commit_delay = 20 # range 0-100000, in microseconds ; group commit wait time
commit_siblings = 9 # range 1-1000 ; trigger group commit once this many transactions are in the commit phase concurrently
checkpoint_timeout = 55min # range 30s-1h ; time-driven checkpoint interval
max_wal_size = 320GB # upper bound on WAL generated between two checkpoints
checkpoint_completion_target = 0.99 # checkpoint target duration, 0.0 - 1.0 ; smoothing factor: if the previous cycle produced 100 XLOG files, this checkpoint is paced to complete while 100*checkpoint_completion_target new XLOG files are produced
random_page_cost = 1.0 # same scale as above ; cost factor for random page reads, the SSDs here are fast enough for 1.0
effective_cache_size = 240GB # OS cache assumed to be available
log_destination = 'csvlog' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_truncate_on_rotation = on # If on, an existing log file with the
update_process_title = off
track_activities = off
autovacuum = on # Enable autovacuum subprocess? 'on'
autovacuum_max_workers = 4 # max number of autovacuum subprocesses ; how many vacuum workers may run concurrently
autovacuum_naptime = 6s # time between autovacuum runs ; wakeup interval of the autovacuum launcher
autovacuum_vacuum_cost_delay = 0 # default vacuum cost delay ; no vacuum throttling
~~~
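The checkpoint smoothing described for `checkpoint_completion_target` can be sketched as simple pacing arithmetic (an illustration of the scheduling idea, not the server's exact code):

```python
# Checkpoint pacing: a checkpoint triggered after N WAL segments is spread
# out so it finishes while roughly N * checkpoint_completion_target new
# segments are written, smoothing the IO instead of bursting it.
def checkpoint_write_budget(segments_last_cycle, completion_target):
    return segments_last_cycle * completion_target

print(checkpoint_write_budget(100, 0.99))  # 99.0
print(checkpoint_write_budget(100, 0.8))   # 80.0
```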
Other takeaways:
1. Minimize wasted IO requests: from the block device through the logical volume to the filesystem, keep block sizes aligned with the database block size;
2. Use alignment to avoid overlapping writes;
3. Use huge pages to cut memory-management overhead;
4. Use multiple clients to fully utilize the database host's hardware;
5. Reduce client-side log output to lower client-side interference;
6. Use a newer compiler to improve the quality of the generated executable.