（六）--- ziplist壓縮列表 · Redis源碼分析

? ? ? ? ?ziplist和之前我解析過的adlist列表名字看上去的很像，但是作用卻完全不同。之前的adlist主要針對的是普通的數據鏈表操作。而今天的ziplist指的是壓縮鏈表，為什么叫壓縮鏈表呢，因為鏈表中我們一般常用pre，next來指明當前的結點的前一個指針或當前的結點的下一個指針，這其實是在一定程度上占據了比較多的內存空間，ziplist采用了長度的表示方法，整個ziplist其實是超級長的字符串，通過里面各個結點的長度，上一個結點的長度等信息，通過快速定位實現相關操作，而且編寫者，在長度上也做了動態分配字節的方法，表示長度，避免了一定的內存耗費，比如一個結點的字符串長度每個都很短，而你使用好幾個字節表示字符串的長度，顯然造成大量浪費，所以在長度表示方面，ziplist 就做到了壓縮，也體現了壓縮的性能。ziplist 用在什么地方呢，ziplist 就是用在我們平常最常用的一個命令rpush,lpush等這些往鏈表添加數據的方法，這些數據就是存在ziplist 中的。之后我們會看到相應的實現方法。 ? ?在學習ziplist的開始，一定要理解他的結構，關于這一點，必須花一定時間想想，要不然不太容易明白人家的設計。下面是我的理解，幫助大家理解： ~~~ /* The ziplist is a specially encoded dually linked list that is designed * to be very memory efficient. It stores both strings and integer values, * where integers are encoded as actual integers instead of a series of * characters. It allows push and pop operations on either side of the list * in O(1) time. However, because every operation requires a reallocation of * the memory used by the ziplist, the actual complexity is related to the * amount of memory used by the ziplist. * * ziplist是一個編碼后的列表，特殊的設計使得內存操作非常有效率，此列表可以同時存放 * 字符串和整數類型，列表可以在頭尾各邊支持推加和彈出操作在O(1)常量時間,但是，因為每次 * 操作設計到內存的重新分配釋放，所以加大了操作的復雜性 * ---------------------------------------------------------------------------- * * ziplist的結構組成： * ZIPLIST OVERALL LAYOUT: * The general layout of the ziplist is as follows: * <zlbytes><zltail><zllen><entry><entry><zlend> * * <zlbytes> is an unsigned integer to hold the number of bytes that the * ziplist occupies. This value needs to be stored to be able to resize the * entire structure without the need to traverse it first. * <zipbytes>代表著ziplist占有的字節數，這方便當重新調整大小的時候不需要重新從頭遍歷 * * <zltail> is the offset to the last entry in the list. This allows a pop * operation on the far side of the list without the need for full traversal. * <zltail>記錄了最后一個entry的位置在列表中，可以方便快速在列表末尾彈出操作 * * <zllen> is the number of entries.When this value is larger than 2**16-2, * we need to traverse the entire list to know how many items it holds. * <zllen>記錄的是ziplist里面entry數據結點的總數 * * <zlend> is a single byte special value, equal to 255, which indicates the * end of the list. * <zlend>代表的是結束標識別，用單字節表示，值是255，就是11111111 * * ZIPLIST ENTRIES: * Every entry in the ziplist is prefixed by a header that contains two pieces * of information. First, the length of the previous entry is stored to be * able to traverse the list from back to front. Second, the encoding with an * optional string length of the entry itself is stored. * 每個entry數據結點主要包含2部分信息，第一個，上一個結點的長度，主要就可以可以從任意結點從后往前遍歷整個列表 * 第二個，編碼字符串的方式的類型保存 * * The length of the previous entry is encoded in the following way: * If this length is smaller than 254 bytes, it will only consume a single * byte that takes the length as value. When the length is greater than or * equal to 254, it will consume 5 bytes. The first byte is set to 254 to * indicate a larger value is following. The remaining 4 bytes take the * length of the previous entry as value. * 之前的數據結點的字符串長度的長度少于254個字節，他將消耗單個字節，一個字節8位，最大可表示長度為2的8次方 * 當字符串的長度大于254個字節，則用5個字節表示，第一個字節被設置成254,其余的4個字節占據的長度為之前的數據結點的長度 * * The other header field of the entry itself depends on the contents of the * entry. When the entry is a string, the first 2 bits of this header will hold * the type of encoding used to store the length of the string, followed by the * actual length of the string. When the entry is an integer the first 2 bits * are both set to 1. The following 2 bits are used to specify what kind of * integer will be stored after this header. An overview of the different * types and encodings is as follows: * 頭部信息中的另一個值記錄著編碼的方式，當編碼的是字符串，頭部的前2位為00,01,10共3種 * 如果編碼的是整型數字的時候，則頭部的前2位為11,代表的是整數編碼，后面2位代表什么類型整型值將會在頭部后面被編碼 * 00-int16_t, 01-int32_t, 10-int64_t, 11-24 bit signed，還有比較特殊的2個，11111110-8 bit signed, * 1111 0000 - 1111 1101,代表的是整型值0-12,頭尾都已經存在，都不能使用，與傳統的通過固定的指針表示長度，這么做的好處實現 * 可以更合理的分配內存 * * String字符串編碼的3種形式 * |00pppppp| - 1 byte * String value with length less than or equal to 63 bytes (6 bits). * |01pppppp|qqqqqqqq| - 2 bytes * String value with length less than or equal to 16383 bytes (14 bits). * |10______|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes * String value with length greater than or equal to 16384 bytes. * |11000000| - 1 byte * Integer encoded as int16_t (2 bytes). * |11010000| - 1 byte * Integer encoded as int32_t (4 bytes). * |11100000| - 1 byte * Integer encoded as int64_t (8 bytes). * |11110000| - 1 byte * Integer encoded as 24 bit signed (3 bytes). * |11111110| - 1 byte * Integer encoded as 8 bit signed (1 byte). * |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer. * Unsigned integer from 0 to 12. The encoded value is actually from * 1 to 13 because 0000 and 1111 can not be used, so 1 should be * subtracted from the encoded 4 bit value to obtain the right value. * |11111111| - End of ziplist. * * All the integers are represented in little endian byte order. * * ---------------------------------------------------------------------------- ~~~ 希望大家能仔細反復閱讀，理解作者的設計思路，下面給出的他的實際結構體的定義： ~~~ /* 實際存放數據的數據結點 */ typedef struct zlentry { //prevrawlen為上一個數據結點的長度，prevrawlensize為記錄該長度數值所需要的字節數 unsigned int prevrawlensize, prevrawlen; //len為當前數據結點的長度，lensize表示表示當前長度表示所需的字節數 unsigned int lensize, len; //數據結點的頭部信息長度的字節數 unsigned int headersize; //編碼的方式 unsigned char encoding; //數據結點的數據(已包含頭部等信息)，以字符串形式保存 unsigned char *p; } zlentry; /* <zlentry>的結構圖線表示 <pre_node_len>(上一結點的長度信息)<node_encode>(本結點的編碼方式和編碼數據的長度信息)<node>(本結點的編碼數據) */ ~~~ 我們看一下里面比較核心的操作，插入操作，里面涉及指針的各種來回移動，這些都是內存地址的調整： ~~~ /* Insert item at "p". */ /* 插入操作的實現 */ static unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) { size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen; unsigned int prevlensize, prevlen = 0; size_t offset; int nextdiff = 0; unsigned char encoding = 0; long long value = 123456789; /* initialized to avoid warning. Using a value that is easy to see if for some reason we use it uninitialized. */ zlentry tail; /* Find out prevlen for the entry that is inserted. */ //尋找插入的位置 if (p[0] != ZIP_END) { //定位到指定位置 ZIP_DECODE_PREVLEN(p, prevlensize, prevlen); } else { //如果插入的位置是尾結點，直接定位到尾結點，看第一個字節的就可以判斷 unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl); if (ptail[0] != ZIP_END) { prevlen = zipRawEntryLength(ptail); } } /* See if the entry can be encoded */ if (zipTryEncoding(s,slen,&value,&encoding)) { /* 'encoding' is set to the appropriate integer encoding */ reqlen = zipIntSize(encoding); } else { /* 'encoding' is untouched, however zipEncodeLength will use the * string length to figure out how to encode it. */ reqlen = slen; } /* We need space for both the length of the previous entry and * the length of the payload. */ reqlen += zipPrevEncodeLength(NULL,prevlen); reqlen += zipEncodeLength(NULL,encoding,slen); /* When the insert position is not equal to the tail, we need to * make sure that the next entry can hold this entry's length in * its prevlen field. */ nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0; /* Store offset because a realloc may change the address of zl. */ //調整大小，為新結點的插入預留空間 offset = p-zl; zl = ziplistResize(zl,curlen+reqlen+nextdiff); p = zl+offset; /* Apply memory move when necessary and update tail offset. */ if (p[0] != ZIP_END) { /* Subtract one because of the ZIP_END bytes */ //如果插入的位置不是尾結點，則挪動位置 memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff); /* Encode this entry's raw length in the next entry. */ zipPrevEncodeLength(p+reqlen,reqlen); /* Update offset for tail */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen); /* When the tail contains more than one entry, we need to take * "nextdiff" in account as well. Otherwise, a change in the * size of prevlen doesn't have an effect on the *tail* offset. */ tail = zipEntry(p+reqlen); if (p[reqlen+tail.headersize+tail.len] != ZIP_END) { ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff); } } else { //如果是尾結點，直接設置新尾結點 /* This element will be the new tail. */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl); } /* When nextdiff != 0, the raw length of the next entry has changed, so * we need to cascade the update throughout the ziplist */ if (nextdiff != 0) { offset = p-zl; zl = __ziplistCascadeUpdate(zl,p+reqlen); p = zl+offset; } /* Write the entry */ //寫入新的數據結點信息 p += zipPrevEncodeLength(p,prevlen); p += zipEncodeLength(p,encoding,slen); if (ZIP_IS_STR(encoding)) { memcpy(p,s,slen); } else { zipSaveInteger(p,value,encoding); } //更新列表的長度加1 ZIPLIST_INCR_LENGTH(zl,1); return zl; } ~~~ 下面是刪除操作： ~~~ /* Delete "num" entries, starting at "p". Returns pointer to the ziplist. */ /* 刪除方法涉及p指針的滑動，后面的地址內容都需要滑動 */ static unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p, unsigned int num) { unsigned int i, totlen, deleted = 0; size_t offset; int nextdiff = 0; zlentry first, tail; first = zipEntry(p); for (i = 0; p[0] != ZIP_END && i < num; i++) { p += zipRawEntryLength(p); deleted++; } totlen = p-first.p; if (totlen > 0) { if (p[0] != ZIP_END) { /* Storing `prevrawlen` in this entry may increase or decrease the * number of bytes required compare to the current `prevrawlen`. * There always is room to store this, because it was previously * stored by an entry that is now being deleted. */ nextdiff = zipPrevLenByteDiff(p,first.prevrawlen); p -= nextdiff; zipPrevEncodeLength(p,first.prevrawlen); /* Update offset for tail */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))-totlen); /* When the tail contains more than one entry, we need to take * "nextdiff" in account as well. Otherwise, a change in the * size of prevlen doesn't have an effect on the *tail* offset. */ tail = zipEntry(p); if (p[tail.headersize+tail.len] != ZIP_END) { ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff); } /* Move tail to the front of the ziplist */ memmove(first.p,p, intrev32ifbe(ZIPLIST_BYTES(zl))-(p-zl)-1); } else { /* The entire tail was deleted. No need to move memory. */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe((first.p-zl)-first.prevrawlen); } /* Resize and update length */ //調整列表大小 offset = first.p-zl; zl = ziplistResize(zl, intrev32ifbe(ZIPLIST_BYTES(zl))-totlen+nextdiff); ZIPLIST_INCR_LENGTH(zl,-deleted); p = zl+offset; /* When nextdiff != 0, the raw length of the next entry has changed, so * we need to cascade the update throughout the ziplist */ if (nextdiff != 0) zl = __ziplistCascadeUpdate(zl,p); } return zl; } ~~~ 該方法的意思是從index索引對應的結點開始算起，刪除num個結點，這是刪除的最原始的方法，其他方法都是對此方法的包裝。下面我們看看我們在redis命令行中輸入的lpush或rpush調用的是什么方法呢？調用的形式： ~~~ zl = ziplistPush(zl, (unsigned char*)"foo", 3, ZIPLIST_TAIL); zl = ziplistPush(zl, (unsigned char*)"quux", 4, ZIPLIST_TAIL); zl = ziplistPush(zl, (unsigned char*)"hello", 5, ZIPLIST_HEAD); ~~~ ~~~ /* 在列表2邊插入數據的方法 */ unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where) { unsigned char *p; //這里開始直接定位 p = (where == ZIPLIST_HEAD) ? ZIPLIST_ENTRY_HEAD(zl) : ZIPLIST_ENTRY_END(zl); //組后調用插入數據的insert方法 return __ziplistInsert(zl,p,s,slen); } ~~~ 到最后還是調用了insert方法。在寫之前看了一些別人分析的ziplist分析，感覺有些說的的都很粗略，還是自己仔細過一遍心里會清楚很多，建議大家多多閱讀源碼。每個人側重點都是不一樣的。最后給出頭文件和比較關鍵的宏定義： ~~~ /* zip列表的末尾值 */ #define ZIP_END 255 /* zip列表的最大長度 */ #define ZIP_BIGLEN 254 /* Different encoding/length possibilities */ /* 不同的編碼 */ #define ZIP_STR_MASK 0xc0 #define ZIP_INT_MASK 0x30 #define ZIP_STR_06B (0 << 6) #define ZIP_STR_14B (1 << 6) #define ZIP_STR_32B (2 << 6) #define ZIP_INT_16B (0xc0 | 0<<4) #define ZIP_INT_32B (0xc0 | 1<<4) #define ZIP_INT_64B (0xc0 | 2<<4) #define ZIP_INT_24B (0xc0 | 3<<4) #define ZIP_INT_8B 0xfe /* 4 bit integer immediate encoding */ #define ZIP_INT_IMM_MASK 0x0f //后續的好多運算都需要與掩碼進行位運算 #define ZIP_INT_IMM_MIN 0xf1 /* 11110001 */ #define ZIP_INT_IMM_MAX 0xfd /* 11111101 */ //最大值不能為11111111，這跟最末尾的結點重復了 #define ZIP_INT_IMM_VAL(v) (v & ZIP_INT_IMM_MASK) #define INT24_MAX 0x7fffff #define INT24_MIN (-INT24_MAX - 1) /* Macro to determine type */ #define ZIP_IS_STR(enc) (((enc) & ZIP_STR_MASK) < ZIP_STR_MASK) /* Utility macros */ /* 下面是一些用來到時能夠直接定位的數值偏移量 */ #define ZIPLIST_BYTES(zl) (*((uint32_t*)(zl))) #define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t)))) #define ZIPLIST_LENGTH(zl) (*((uint16_t*)((zl)+sizeof(uint32_t)*2))) #define ZIPLIST_HEADER_SIZE (sizeof(uint32_t)*2+sizeof(uint16_t)) #define ZIPLIST_ENTRY_HEAD(zl) ((zl)+ZIPLIST_HEADER_SIZE) #define ZIPLIST_ENTRY_TAIL(zl) ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))) #define ZIPLIST_ENTRY_END(zl) ((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-1) ~~~ .h文件： ~~~ /* * Copyright (c) 2009-2012, Pieter Noordhuis <pcnoordhuis at gmail dot com> * Copyright (c) 2009-2012, Salvatore Sanfilippo <antirez at gmail dot com> * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * * Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * Neither the name of Redis nor the names of its contributors may be used * to endorse or promote products derived from this software without * specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. */ /* 標記列表頭節點和尾結點的標識 */ #define ZIPLIST_HEAD 0 #define ZIPLIST_TAIL 1 unsigned char *ziplistNew(void); //創建新列表 unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where); //像列表中推入數據 unsigned char *ziplistIndex(unsigned char *zl, int index); //索引定位到列表的某個位置 unsigned char *ziplistNext(unsigned char *zl, unsigned char *p); //獲取當前列表位置的下一個值 unsigned char *ziplistPrev(unsigned char *zl, unsigned char *p); //獲取當期列表位置的前一個值 unsigned int ziplistGet(unsigned char *p, unsigned char **sval, unsigned int *slen, long long *lval); //獲取列表的信息 unsigned char *ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen); //向列表中插入數據 unsigned char *ziplistDelete(unsigned char *zl, unsigned char **p); //列表中刪除某個結點 unsigned char *ziplistDeleteRange(unsigned char *zl, unsigned int index, unsigned int num); //從index索引對應的結點開始算起，刪除num個結點 unsigned int ziplistCompare(unsigned char *p, unsigned char *s, unsigned int slen); //列表間的比較方法 unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip); //在列表中尋找某個結點 unsigned int ziplistLen(unsigned char *zl); //返回列表的長度 size_t ziplistBlobLen(unsigned char *zl); //返回列表的二進制長度，返回的是字節數 ~~~