The SQLite Virtual Machine · Understanding SQLite in Depth

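## Introduction to and Analysis of SQLite (7): A Brief Look at the SQLite Virtual Machine

Preface: Virtual machines are a very active technology today, and they have a long history. Early examples go back as far as IBM's VM/370. In the 1990s the programming-language world saw a revolutionary development, the Java language, which differs most from C++ in that it must run on the Java virtual machine. The Java virtual machine set off a wave of interest in virtual machine technology, and Microsoft, not to be outdone, followed with the ambitious .NET platform. This article is mainly about the SQLite virtual machine, so it does not comment on those platforms at length, but it first gives an overview of the Java virtual machine for comparison.

### 1. Overview

A virtual machine is an abstraction of a real computer's resources and environment; it gives programs in an interpreted language a complete machine interface. The idea has strongly influenced modern compilation: source code is first compiled to virtual machine instructions, and the virtual machine is then implemented for each target computer. A virtual machine defines a set of abstract logical components, including a register set, a data stack, and an instruction set. Interpreting one virtual machine instruction takes three steps:

1. Fetch the instruction's operands.
2. Perform the function that the instruction stands for.
3. Dispatch the next instruction.

Steps 1 and 3 make up the execution overhead of the virtual machine.

Many languages use a virtual machine as their runtime environment. As competitors for the next-generation computing platform, Sun's Java and Microsoft's .NET both rely on one. Java runs on the Java Virtual Machine (JVM), and .NET runs on the Common Language Runtime (CLR). The JVM is a typical virtual machine architecture.

The figure below shows the structure of the Java platform. The JVM sits at the centre, with the porting interface beneath it. The porting interface has a platform-dependent part and a platform-independent part; the platform-dependent part is called the adapter. The JVM is implemented on a concrete operating system through the porting interface. On a Java operating system (JOS), no platform-dependent adapter is needed, because the JOS already does that work. The operating system and the hardware below it are therefore transparent to the JVM. Above the JVM are the Java classes and the Java application programming interface (Java API), on which Java applications and Java applets are written. At that level the operating system and hardware are even more transparent: a Java program runs on any Java platform without modification.

![document/2015-09-15/55f7c9d26a949](https://box.kancloud.cn/document_2015-09-15_55f7c9d26a949.png)

The JVM defines a platform-independent class file format and an instruction set in the form of bytecode. In the bytecode of any Java program, references to variables and methods are symbolic rather than concrete numbers. Because the memory layout is only fixed at run time, changes to a class's variables and methods do not invalidate existing bytecode. For example, if a Java program refers to a class that belongs to another system, updating that class does not make the program crash. This also improves Java's platform independence.

Virtual machines generally use a stack-based architecture because it is easy to implement. The virtual machine approach markedly improves the portability and safety of a language, at the cost of lower execution efficiency.

### 2. The Java virtual machine

#### 2.1 Overview

The main job of the Java virtual machine is to load class files and execute the bytecode they contain. The JVM includes a class loader, which loads class files from the program and from the API. Only the Java API classes that the program actually needs at run time are loaded. The bytecode is executed by the execution engine.

Different Java virtual machines may implement the execution engine differently. Software implementations generally take one of the following forms:

1. Interpretation: simple to implement but slow. This is how the earliest Java implementations worked.
2. Just-in-time (JIT) compilation: faster, but uses more memory. The first time a piece of bytecode runs, it is compiled to native machine code and cached for reuse.
3. Adaptive optimization: the virtual machine starts by interpreting the bytecode, monitors the running program, records the most frequently used code, and compiles only that code to native code; the rest stays as bytecode. This improves speed while keeping memory overhead down.

A virtual machine can also be implemented in hardware, in which case Java bytecode is executed natively.

![document/2015-09-15/55f7c9e80956e](https://box.kancloud.cn/document_2015-09-15_55f7c9e80956e.png)

#### 2.2 Structure of the Java virtual machine

The Java virtual machine consists of the class loader subsystem, the runtime data areas, the execution engine, and the native method interface. The runtime data areas are in turn divided into the method area, the heap, the Java stacks, the PC registers, and the native method stacks.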
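![document/2015-09-15/55f7c9f8a8989](https://box.kancloud.cn/document_2015-09-15_55f7c9f8a8989.png)

That is as far as this article goes on the Java virtual machine. The subject is far too large to cover here; for more, see *Inside the Java Virtual Machine*.

### 3. The SQLite virtual machine

The layer directly above SQLite's backend is usually called the virtual database engine (VDBE), or simply the virtual machine (VM). Functionally, it is the core of SQLite. SQL statements issued by the user program are processed by the frontend compiler (covered in a later article), which generates a bytecode program that the VM then interprets. While it runs, the VM calls the B-tree module's interfaces and outputs the results. This section follows one concrete query through that process.

#### 3.1 Internal structure of the virtual machine

Start with a simple example:

~~~
#include <stdio.h>
#include <string.h>
#include <sqlite3.h>

int main(void)
{
  int rc, id, cid;
  const char *name;
  const char *sql;
  sqlite3 *db;
  sqlite3_stmt *stmt;

  sql = "select id,name,cid from episodes";

  /* Open the database */
  rc = sqlite3_open("test.db", &db);
  if( rc != SQLITE_OK ){
    fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
    sqlite3_close(db);
    return 1;
  }

  /* Compile the SQL statement */
  rc = sqlite3_prepare(db, sql, (int)strlen(sql), &stmt, NULL);
  if( rc != SQLITE_OK ){
    fprintf(stderr, "prepare failed: %s\n", sqlite3_errmsg(db));
    sqlite3_close(db);
    return 1;
  }

  /* Invoke the VM to run the VDBE program */
  rc = sqlite3_step(stmt);
  while( rc == SQLITE_ROW ){
    id = sqlite3_column_int(stmt, 0);
    name = (const char *)sqlite3_column_text(stmt, 1);
    cid = sqlite3_column_int(stmt, 2);
    if( name != NULL ){
      fprintf(stderr, "Row: id=%i, cid=%i, name='%s'\n", id, cid, name);
    }else{
      /* Field is NULL */
      fprintf(stderr, "Row: id=%i, cid=%i, name=NULL\n", id, cid);
    }
    rc = sqlite3_step(stmt);
  }

  /* Release the statement */
  sqlite3_finalize(stmt);
  /* Close the database */
  sqlite3_close(db);
  return 0;
}
~~~

The program is simple: it walks the whole table and prints the query results.

In SQLite, every SQL statement issued by the user is turned by the compiler into a virtual machine instance. In the example above, the statement held in the variable sql is processed by sqlite3_prepare(), which produces a virtual machine instance, stmt. Seen from outside, the instance is the data structure named sqlite3_stmt; internally, it is an instance of the Vdbe data structure. Their definitions show this:

~~~
/* sqlite3.h */
typedef struct sqlite3_stmt sqlite3_stmt;

/* Definition of Vdbe, the virtual machine data structure (vdbeInt.h) */
struct Vdbe {
  sqlite3 *db;        /* The whole database */
  Vdbe *pPrev,*pNext; /* Linked list of VDBEs with the same Vdbe.db */
  FILE *trace;        /* Write an execution trace here, if not NULL */
  int nOp;            /* Number of instructions in the program */
  int nOpAlloc;       /* Number of slots allocated for aOp[] */
  Op *aOp;            /* Space to hold the virtual machine's program */
  int nLabel;         /* Number of labels used */
  int nLabelAlloc;    /* Number of slots allocated in aLabel[] */
  int *aLabel;        /* Space to hold the labels */
  Mem *aStack;        /* The operand stack, except string values */
  Mem *pTos;          /* Top entry in the operand stack */
  Mem **apArg;        /* Arguments to currently executing user function */
  Mem *aColName;      /* Column names to return */
  int nCursor;        /* Number of slots in apCsr[] */
  Cursor **apCsr;     /* One element of this array for each open cursor */
  int nVar;           /* Number of entries in aVar[] */
  Mem *aVar;          /* Values for the OP_Variable opcode */
  char **azVar;       /* Name of variables */
  int okVar;          /* True if azVar[] has been initialized */
  int magic;          /* Magic number for sanity checking */
  int nMem;           /* Number of memory locations currently allocated */
  Mem *aMem;          /* The memory locations (hold temporary values) */
  int nCallback;      /* Number of callbacks invoked so far */
  int cacheCtr;       /* Cursor row cache generation counter */
  Fifo sFifo;         /* A list of ROWIDs */
  int contextStackTop;   /* Index of top element in the context stack */
  int contextStackDepth; /* The size of the "context" stack */
  Context *contextStack; /* Stack used by opcodes ContextPush & ContextPop */
  int pc;             /* The program counter (initial value on entry) */
  int rc;             /* Value to return */
  unsigned uniqueCnt; /* Used by OP_MakeRecord when P2!=0 */
  int errorAction;    /* Recovery action to do in case of an error */
  int inTempTrans;    /* True if temp database is transactioned */
  int returnStack[100]; /* Return address stack for OP_Gosub & OP_Return */
  int returnDepth;    /* Next unused element in returnStack[] */
  int nResColumn;     /* Number of columns in one row of the result set */
  char **azResColumn; /* Values for one row of result */
  int popStack;       /* Pop the stack this much on entry to VdbeExec() */
  char *zErrMsg;      /* Error message written here */
  u8 resOnStack;      /* True if there are result values on the stack */
  u8 explain;         /* True if EXPLAIN present on SQL command */
  u8 changeCntOn;     /* True to update the change-counter */
  u8 aborted;         /* True if ROLLBACK in another VM causes an abort */
  u8 expired;         /* True if the VM needs to be recompiled */
  u8 minWriteFileFormat; /* Minimum file format for writable database files */
  int nChange;        /* Number of db changes made since last reset */
  i64 startTime;      /* Time when query started - used for profiling */
#ifdef SQLITE_SSE
  int fetchId;        /* Statement number used by sqlite3_fetch_statement */
  int lru;            /* Counter used for LRU cache replacement */
#endif
};
~~~

From the definition of Vdbe, the internal structure of the SQLite virtual machine can be summarized as follows:

![document/2015-09-15/55f7ca32a6568](https://box.kancloud.cn/document_2015-09-15_55f7ca32a6568.png)

#### 3.2 Instructions

~~~
int nOp;  /* Number of instructions in the program */
Op *aOp;  /* Space to hold the virtual machine's program */
~~~

The aOp array holds all the instructions generated by compiling the SQL statement. For the example above they are listed below. Each line gives the instruction address, the opcode name with its numeric value in hexadecimal and decimal, and then P1 and P2; judging from the jump targets, P2 is printed in hexadecimal, so 0c is 12 and 0a is 10.

~~~
 0  Goto(0x5b-91)           |0|0c
 1  Integer(0x2d-45)        |0|0
 2  OpenRead(0x0c-12)       |0|2
 3  SetNumColumns(0x64-100) |0|03
 4  Rewind(0x77-119)        |0|0a
 5  Rowid(0x23-35)          |0|0
 6  Column(0x02-2)          |0|1
 7  Column(0x02-2)          |0|2
 8  Callback(0x36-54)       |3|0
 9  Next(0x68)              |0|5
10  Close
11  Halt
12  Transaction(0x66-102)   |0|0
13  VerifyCookie(0x61-97)   |0|1
14  Goto(0x5b-91)           |0|1|
~~~

sqlite3_step() makes the VDBE interpreter execute this code. The instructions run as follows:

- Goto: a jump instruction; here it only jumps to instruction 12.
- Transaction: begins a transaction (a read transaction).
- VerifyCookie: checks that the schema cookie of the database still has the expected value.
- Goto: jumps to instruction 1.
- Integer: pushes operand P1 onto the stack. The 0 here is the number of the database that OpenRead will open.
- OpenRead: opens a cursor on a table. The database number is taken from the top of the stack, P1 is the cursor number, and P2 is the root page. If P2<=0, the root page number is taken from the stack instead.
- SetNumColumns: sets the number of columns of the cursor identified by P1 to P2 (3 here). It must run before any OP_Column instruction so that the column count of the table is known.
- Rewind: moves cursor P1 to the first record of the table or index. If the table is empty, it jumps to P2 (instruction 10 here).
- Rowid: pushes the key of the record that cursor P1 points to onto the stack.
- Column: decodes the record that the cursor points to. P1 is the cursor index and P2 is the column number; the result is pushed onto the stack.
- Callback: ends the current run of sqlite3_step(), which returns SQLITE_ROW if there is a record, and sets the VDBE's PC to the following instruction, Next. When sqlite3_step() is called again, execution therefore resumes at Next (see the instruction walkthroughs later in this article).
- Next: moves the cursor to the next record and, if there is one, sets the PC to instruction 5; otherwise execution falls through to Close.
- Close: closes the cursor.

#### 3.3 The stack

~~~
Mem *aStack;  /* The operand stack, except string values */
Mem *pTos;    /* Top entry in the operand stack */
~~~

aStack is the stack the VDBE uses while executing. It mainly holds the arguments that instructions need and the intermediate results they produce (see the instruction walkthroughs later in this article).

In computer hardware, register-based architectures have overtaken stack-based ones and are now the mainstream. In interpreted virtual machines, however, stack-based implementations have had the upper hand, for several reasons:

1. From the compiler's point of view, many programming languages can be compiled to a stack machine language very easily. With a register architecture, the compiler has to optimize to get good performance, for example with global register allocation, which requires data-flow analysis. Such complex optimization undermines the convenience of a virtual machine.
2. With a register architecture, the virtual machine must frequently save and restore register contents. These operations cost far more in a virtual machine than on hardware, because every virtual machine instruction pays for a time-consuming dispatch. Other instructions have to be dispatched too, but each of them carries more semantic content.
3. With a register architecture, an instruction's operands live in different registers, so addressing them is a problem of its own. In a stack-based virtual machine, operands are on top of the stack or immediately follow the instruction.

Because stack-based architectures are so simple, some query language implementations have adopted them as well.

The SQLite virtual machine described here (version 3.3.6) is a stack-based implementation; later SQLite releases moved to a register-based VDBE. Every Vdbe has a top-of-stack pointer, which holds the Vdbe's initial top-of-stack value. The interpreter also has a pTos, and the two differ:

1. The Vdbe's pTos does not change during one run of the VDBE until an instruction explicitly updates it. In the example above, Callback updates it (see the instruction walkthroughs).
2. The interpreter's pTos changes dynamically as instructions execute. In the example above, Integer and Column both move it.

#### 3.4 Program counter (PC)

Every Vdbe has a program counter that holds the initial counter value. As with pTos, the interpreter has its own pc, which points to the next instruction the VM will execute.

#### 3.5 The interpreter

Every Vdbe produced by the compiler is ultimately executed by the interpreter. The principle behind SQLite's interpreter is very simple: it is essentially a for loop around a large number of case statements. Because SQLite has many instructions (139 in version 3.3.6), the code is fairly large. The whole interpreter is implemented in a single function:

~~~
int sqlite3VdbeExec(
  Vdbe *p   /* The VDBE */
)
~~~

The code is shown below, abridged for readability; see the SQLite source for the full version.

~~~
/* Execute the VDBE program. Whenever a row of data is fetched from the
** database, this function invokes the callback (if there is one) or
** returns SQLITE_ROW.
*/
int sqlite3VdbeExec(
  Vdbe *p                    /* The VDBE */
){
  int pc;                    /* The program counter */
  Op *pOp;                   /* Current operation */
  int rc = SQLITE_OK;        /* Value to return */
  sqlite3 *db = p->db;       /* The database */
  u8 encoding = ENC(db);     /* The database encoding */
  Mem *pTos;                 /* Top entry in the operand stack */

  if( p->magic!=VDBE_MAGIC_RUN ) return SQLITE_MISUSE;
  /* Start from the top-of-stack pointer saved in the Vdbe */
  pTos = p->pTos;
  if( p->rc==SQLITE_NOMEM ){
    /* This happens if a malloc() inside a call to sqlite3_column_text() or
    ** sqlite3_column_text16() failed.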
    */
    goto no_mem;
  }
  p->rc = SQLITE_OK;
  /* Pop the previous row's values off the stack if requested */
  if( p->popStack ){
    popStack(&pTos, p->popStack);
    p->popStack = 0;
  }
  /* No result values on the stack yet */
  p->resOnStack = 0;
  db->busyHandler.nBusy = 0;
  /* Execute instructions */
  for(pc=p->pc; rc==SQLITE_OK; pc++){
    /* Fetch the current instruction */
    pOp = &p->aOp[pc];
    switch( pOp->opcode ){
      /* Jump to the instruction addressed by P2 */
      case OP_Goto: {             /* no-push */
        CHECK_FOR_INTERRUPT;
        /* The loop's pc++ then lands on P2 */
        pc = pOp->p2 - 1;
        break;
      }
      /* Push P1 onto the stack */
      case OP_Integer: {
        /* Move the stack top up one slot */
        pTos++;
        /* Mark the entry as an integer */
        pTos->flags = MEM_Int;
        /* Store operand P1 */
        pTos->i = pOp->p1;
        break;
      }
      /* ... implementations of the other opcodes ... */
    } /* end switch */
  } /* end for */
}
~~~

#### 3.6 Instruction walkthroughs

For reasons of space, only a few instruction implementations are shown here; see the source code for the rest.

##### 1. The Callback instruction

After this instruction executes, the PC points to the next instruction. The top P1 values on the stack are the query result. The instruction makes sqlite3_step() finish with the return code SQLITE_ROW, at which point the user program can read the values on the stack through the sqlite3_column_XXX functions. When sqlite3_step() runs again, the top P1 values are popped automatically before Next executes.

~~~
case OP_Callback: {            /* no-push */
  Mem *pMem;
  Mem *pFirstColumn;
  assert( p->nResColumn==pOp->p1 );

  /* Data in the pager might be moved or changed out from under us
  ** in between the return from this sqlite3_step() call and the
  ** next call to sqlite3_step().  So deephemeralize everything on
  ** the stack.  Note that ephemeral data is never stored in memory
  ** cells so we do not have to worry about them.
  */
  pFirstColumn = &pTos[0-pOp->p1];
  for(pMem = p->aStack; pMem<pFirstColumn; pMem++){
    Deephemeralize(pMem);
  }

  /* Invalidate all ephemeral cursor row caches */
  p->cacheCtr = (p->cacheCtr + 2)|1;

  /* Make sure the results of the current row are \000 terminated
  ** and have an assigned type.  The results are deephemeralized as
  ** a side effect.
  */
  for(; pMem<=pTos; pMem++ ){
    sqlite3VdbeMemNulTerminate(pMem);
    /* Record the data type of each result value */
    storeTypeInfo(pMem, encoding);
  }

  /* Set up the statement structure so that it will pop the current
  ** results from the stack when the statement returns.
  */
  p->resOnStack = 1;        /* there are results on the stack */
  p->nCallback++;           /* one more callback */
  /* Number of entries to pop; done first thing on the next run */
  p->popStack = pOp->p1;
  /* Resume at the following instruction */
  p->pc = pc + 1;
  /* Save the top-of-stack pointer; the stack now holds the results */
  p->pTos = pTos;
  /* Note: this is a return, not a break; SQLITE_ROW goes back to sqlite3_step().
  ** When the user program calls sqlite3_step() again, the VDBE resumes.
  */
  return SQLITE_ROW;
}
~~~

##### 2. The Rewind instruction

~~~
/* Move the cursor to the first record of the table or index.
** If the table is empty and P2>0, jump to P2; otherwise continue with
** the next instruction.
*/
case OP_Rewind: {        /* no-push */
  int i = pOp->p1;
  Cursor *pC;
  BtCursor *pCrsr;
  int res;

  assert( i>=0 && i<p->nCursor );
  /* Look up the cursor */
  pC = p->apCsr[i];
  assert( pC!=0 );
  if( (pCrsr = pC->pCursor)!=0 ){
    /* Ask the B-tree module to move the cursor to the first record */
    rc = sqlite3BtreeFirst(pCrsr, &res);
    pC->atFirst = res==0;
    pC->deferredMoveto = 0;
    pC->cacheStatus = CACHE_STALE;
  }else{
    res = 1;
  }
  pC->nullRow = res;
  if( res && pOp->p2>0 ){
    pc = pOp->p2 - 1;
  }
  break;
}
~~~

##### 3. The Column instruction

~~~
/* Decode the record that the cursor points to.
** P1 is the cursor index, P2 is the column number.
*/
case OP_Column: {
  u32 payloadSize;   /* Number of bytes in the record */
  int p1 = pOp->p1;  /* P1 value of the opcode */
  int p2 = pOp->p2;  /* column number to retrieve */
  Cursor *pC = 0;    /* The VDBE cursor */
  char *zRec;        /* Pointer to complete record-data */
  BtCursor *pCrsr;   /* The BTree cursor */
  u32 *aType;        /* aType[i] holds the numeric type of the i-th column */
  u32 *aOffset;      /* aOffset[i] is offset to start of data for i-th column */
  u32 nField;        /* number of fields in the record */
  int len;           /* The length of the serialized data for the column */
  int i;             /* Loop counter */
  char *zData;       /* Part of the record being decoded */
  Mem sMem;          /* For storing the record being decoded */

  sMem.flags = 0;
  assert( p1<p->nCursor );
  /* Move the stack top up one slot */
  pTos++;
  pTos->flags = MEM_Null;

  /* This block sets the variable payloadSize to be the total number of
  ** bytes in the record.
  **
  ** zRec is set to be the complete text of the record if it is available.
  ** The complete record text is always available for pseudo-tables
  ** If the record is stored in a cursor, the complete record text
  ** might be available in the pC->aRow cache.  Or it might not be.
  ** If the data is unavailable,  zRec is set to NULL.
  **
  ** We also compute the number of columns in the record.  For cursors,
  ** the number of columns is stored in the Cursor.nField element.
  ** For records on the stack, the next entry down on the stack is an integer
  ** which is the number of records.
  */
  /* Look up the cursor */
  pC = p->apCsr[p1];
  assert( pC!=0 );
  if( pC->pCursor!=0 ){
    /* The record is stored in a B-Tree */
    /* Complete any deferred move of the cursor */
    rc = sqlite3VdbeCursorMoveto(pC);
    if( rc ) goto abort_due_to_error;
    zRec = 0;
    pCrsr = pC->pCursor;
    if( pC->nullRow ){
      payloadSize = 0;
    }else if( pC->cacheStatus==p->cacheCtr ){
      payloadSize = pC->payloadSize;
      zRec = (char*)pC->aRow;
    }else if( pC->isIndex ){
      i64 payloadSize64;
      sqlite3BtreeKeySize(pCrsr, &payloadSize64);
      payloadSize = payloadSize64;
    }else{
      /* payloadSize receives the number of data bytes in the cell */
      sqlite3BtreeDataSize(pCrsr, &payloadSize);
    }
    nField = pC->nField;
  }else if( pC->pseudoTable ){
    /* The record is the sole entry of a pseudo-table */
    payloadSize = pC->nData;
    zRec = pC->pData;
    pC->cacheStatus = CACHE_STALE;
    assert( payloadSize==0 || zRec!=0 );
    nField = pC->nField;
    pCrsr = 0;
  }else{
    zRec = 0;
    payloadSize = 0;
    pCrsr = 0;
    nField = 0;
  }

  /* If payloadSize is 0, then just push a NULL onto the stack. */
  if( payloadSize==0 ){
    assert( pTos->flags==MEM_Null );
    break;
  }

  assert( p2<nField );

  /* Read and parse the table header.  Store the results of the parse
  ** into the record header cache fields of the cursor.
  */
  if( pC && pC->cacheStatus==p->cacheCtr ){
    aType = pC->aType;
    aOffset = pC->aOffset;
  }else{
    u8 *zIdx;        /* Index into header */
    u8 *zEndHdr;     /* Pointer to first byte after the header */
    u32 offset;      /* Offset into the data */
    int szHdrSz;     /* Size of the header size field at start of record */
    int avail;       /* Number of bytes of available data */

    /* Array of column types */
    aType = pC->aType;
    if( aType==0 ){
      /* One allocation for two arrays of nField entries each, aType[] and
      ** aOffset[]; sizeof(aType) is the pointer size, 4 on a 32-bit build */
      pC->aType = aType = sqliteMallocRaw( 2*nField*sizeof(aType) );
    }
    if( aType==0 ){
      goto no_mem;
    }
    /* Offsets of each column's data follow the types */
    pC->aOffset = aOffset = &aType[nField];
    pC->payloadSize = payloadSize;
    pC->cacheStatus = p->cacheCtr;

    /* Figure out how many bytes are in the header */
    if( zRec ){
      zData = zRec;
    }else{
      if( pC->isIndex ){
        zData = (char*)sqlite3BtreeKeyFetch(pCrsr, &avail);
      }else{
        /* Fetch the data */
        zData = (char*)sqlite3BtreeDataFetch(pCrsr, &avail);
      }
      /* If KeyFetch()/DataFetch() managed to get the entire payload,
      ** save the payload in the pC->aRow cache.  That will save us from
      ** having to make additional calls to fetch the content portion of
      ** the record.
      */
      if( avail>=payloadSize ){
        zRec = zData;
        pC->aRow = (u8*)zData;
      }else{
        pC->aRow = 0;
      }
    }
    assert( zRec!=0 || avail>=payloadSize || avail>=9 );
    /* Read the header size */
    szHdrSz = GetVarint((u8*)zData, offset);

    /* The KeyFetch() or DataFetch() above are fast and will get the entire
    ** record header in most cases.  But they will fail to get the complete
    ** record header if the record header does not fit on a single page
    ** in the B-Tree.  When that happens, use sqlite3VdbeMemFromBtree() to
    ** acquire the complete header text.
    */
    if( !zRec && avail<offset ){
      rc = sqlite3VdbeMemFromBtree(pCrsr, 0, offset, pC->isIndex, &sMem);
      if( rc!=SQLITE_OK ){
        goto op_column_out;
      }
      zData = sMem.z;
    }
    /* An example record:
    ** 08 | 08 | 04 00 13 01 | 63 61 74 01
    ** 08: nSize, the total payload size: the 8 bytes at the end
    ** 08: key size; for an integer key this is the key itself
    ** 04: header size, 4 bytes including itself: 04 00 13 01
    ** 00: type of column 1: NULL
    ** 13: type of column 2: string of length (19-13)/2 = 3, "cat"
    ** 01: type of column 3: integer, one byte, value 1
    ** For this record, zData holds: 04 00 13 01 63 61 74 01
    */
    /* The data after the header; in the example: 63 61 74 01 */
    zEndHdr = (u8 *)&zData[offset];
    /* The type entries of the header; in the example: 00 13 01 */
    zIdx = (u8 *)&zData[szHdrSz];

    /* Scan the header and use it to fill in the aType[] and aOffset[]
    ** arrays.  aType[i] will contain the type integer for the i-th
    ** column and aOffset[i] will contain the offset from the beginning
    ** of the record to the start of the data for the i-th column
    */
    for(i=0; i<nField; i++){
      if( zIdx<zEndHdr ){
        /* Offset of this column's data */
        aOffset[i] = offset;
        /* Type of this column */
        zIdx += GetVarint(zIdx, aType[i]);
        /* Advance offset to the next column */
        offset += sqlite3VdbeSerialTypeLen(aType[i]);
      }else{
        /* If i is less than nField, then there are fewer fields in this
        ** record than SetNumColumns indicated there are columns in the
        ** table. Set the offset for any extra columns not present in
        ** the record to 0. This tells code below to push a NULL onto the
        ** stack instead of deserializing a value from the record.
        */
        aOffset[i] = 0;
      }
    }
    Release(&sMem);
    sMem.flags = MEM_Null;

    /* If we have read more header data than was contained in the header,
    ** or if the end of the last field appears to be past the end of the
    ** record, then we must be dealing with a corrupt database.
    */
    if( zIdx>zEndHdr || offset>payloadSize ){
      rc = SQLITE_CORRUPT_BKPT;
      goto op_column_out;
    }
  }

  /* Get the column information. If aOffset[p2] is non-zero, then
  ** deserialize the value from the record. If aOffset[p2] is zero,
  ** then there are not enough fields in the record to satisfy the
  ** request.  In this case, set the value NULL or to P3 if P3 is
  ** a pointer to a Mem object.
  */
  /* Fetch the data of the column selected by P2 */
  if( aOffset[p2] ){
    assert( rc==SQLITE_OK );
    if( zRec ){
      /* The column's data is already in memory */
      zData = &zRec[aOffset[p2]];
    }else{
      len = sqlite3VdbeSerialTypeLen(aType[p2]);
      rc = sqlite3VdbeMemFromBtree(pCrsr, aOffset[p2], len, pC->isIndex,&sMem);
      if( rc!=SQLITE_OK ){
        goto op_column_out;
      }
      zData = sMem.z;
    }
    /* Decode zData and store the result in pTos */
    sqlite3VdbeSerialGet((u8*)zData, aType[p2], pTos);
    pTos->enc = encoding;
  }else{
    if( pOp->p3type==P3_MEM ){
      sqlite3VdbeMemShallowCopy(pTos, (Mem *)(pOp->p3), MEM_Static);
    }else{
      pTos->flags = MEM_Null;
    }
  }

  /* If we dynamically allocated space to hold the data (in the
  ** sqlite3VdbeMemFromBtree() call above) then transfer control of that
  ** dynamically allocated space over to the pTos structure.
  ** This prevents a memory copy.
  */
  if( (sMem.flags & MEM_Dyn)!=0 ){
    assert( pTos->flags & MEM_Ephem );
    assert( pTos->flags & (MEM_Str|MEM_Blob) );
    assert( pTos->z==sMem.z );
    assert( sMem.flags & MEM_Term );
    pTos->flags &= ~MEM_Ephem;
    pTos->flags |= MEM_Dyn|MEM_Term;
  }

  /* pTos->z might be pointing to sMem.zShort[].  Fix that so that we
  ** can abandon sMem */
  rc = sqlite3VdbeMemMakeWriteable(pTos);

op_column_out:
  break;
}
~~~

##### 4. The Next instruction

~~~
/* Move the cursor to the next record of the table
** (or to the previous record, for OP_Prev).
*/
case OP_Prev:          /* no-push */
case OP_Next: {        /* no-push */
  Cursor *pC;
  BtCursor *pCrsr;

  CHECK_FOR_INTERRUPT;
  assert( pOp->p1>=0 && pOp->p1<p->nCursor );
  pC = p->apCsr[pOp->p1];
  assert( pC!=0 );
  if( (pCrsr = pC->pCursor)!=0 ){
    int res;
    if( pC->nullRow ){
      res = 1;
    }else{
      assert( pC->deferredMoveto==0 );
      /* Ask the B-tree module to move the cursor to the next record */
      rc = pOp->opcode==OP_Next ? sqlite3BtreeNext(pCrsr, &res) :
                                  sqlite3BtreePrevious(pCrsr, &res);
      pC->nullRow = res;
      pC->cacheStatus = CACHE_STALE;
    }
    if( res==0 ){
      /* There is another record: jump to P2 */
      pc = pOp->p2 - 1;
      sqlite3_search_count++;
    }
  }else{
    pC->nullRow = 1;
  }
  pC->rowidIsValid = 0;
  break;
}
~~~
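The pieces walked through in section 3 can be seen working together in a toy interpreter: the `for`/`switch` loop, the operand stack, the saved `pc`, and the way Callback returns a row and lets the next step resume where it left off. The sketch below is not SQLite code. `ToyVm`, `ToyOp`, `toy_init`, `toy_step`, `toy_column`, the `TOY_*` constants and the demo program are all hypothetical names invented for this illustration, only four opcodes are modelled, and the stack holds plain integers rather than `Mem` cells.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical opcodes and return codes; the values are not SQLite's. */
enum { TOY_GOTO, TOY_INTEGER, TOY_CALLBACK, TOY_HALT };
enum { TOY_ROW = 100, TOY_DONE = 101 };
#define TOY_STACK_SIZE 16

typedef struct ToyOp { int opcode, p1, p2; } ToyOp;

typedef struct ToyVm {
  const ToyOp *aOp;            /* the program */
  int nOp;                     /* number of instructions */
  int aStack[TOY_STACK_SIZE];  /* operand stack */
  int tos;                     /* saved top-of-stack index, -1 when empty */
  int pc;                      /* saved program counter */
  int popStack;                /* entries to pop on the next toy_step() */
} ToyVm;

static void toy_init(ToyVm *vm, const ToyOp *aOp, int nOp){
  vm->aOp = aOp;
  vm->nOp = nOp;
  vm->tos = -1;
  vm->pc = 0;
  vm->popStack = 0;
}

/* Run until a row is ready (TOY_ROW) or the program ends (TOY_DONE). */
static int toy_step(ToyVm *vm){
  int pc;
  int tos = vm->tos - vm->popStack;  /* drop the previous row's values */
  vm->popStack = 0;
  for(pc = vm->pc; pc < vm->nOp; pc++){
    const ToyOp *pOp = &vm->aOp[pc];
    switch( pOp->opcode ){
      case TOY_GOTO:
        pc = pOp->p2 - 1;            /* the loop's pc++ lands on P2 */
        break;
      case TOY_INTEGER:
        assert( tos + 1 < TOY_STACK_SIZE );
        vm->aStack[++tos] = pOp->p1; /* push P1 */
        break;
      case TOY_CALLBACK:
        vm->popStack = pOp->p1;      /* pop the row on the next entry */
        vm->pc = pc + 1;             /* resume after this instruction */
        vm->tos = tos;
        return TOY_ROW;              /* a return, not a break */
      default:                       /* TOY_HALT or an unknown opcode */
        vm->pc = pc;
        vm->tos = tos;
        return TOY_DONE;
    }
  }
  vm->pc = pc;
  vm->tos = tos;
  return TOY_DONE;
}

/* Value i (0-based) of the row left on the stack by the last TOY_CALLBACK. */
static int toy_column(const ToyVm *vm, int i){
  assert( i >= 0 && i < vm->popStack );
  return vm->aStack[vm->tos - vm->popStack + 1 + i];
}

/* Demo program with the same "Goto to the end, Goto back" shape as the listing. */
static const ToyOp toy_demo[] = {
  { TOY_GOTO,     0, 6 },   /* 0: jump to the setup code at 6 */
  { TOY_INTEGER, 10, 0 },   /* 1: push 10 */
  { TOY_CALLBACK, 1, 0 },   /* 2: hand back a one-value row */
  { TOY_INTEGER, 20, 0 },   /* 3: push 20 */
  { TOY_CALLBACK, 1, 0 },   /* 4: hand back a second row */
  { TOY_HALT,     0, 0 },   /* 5: stop */
  { TOY_GOTO,     0, 1 },   /* 6: setup done, jump back to 1 */
};

#ifdef TOY_VM_DEMO
int main(void){
  ToyVm vm;
  toy_init(&vm, toy_demo, (int)(sizeof(toy_demo)/sizeof(toy_demo[0])));
  while( toy_step(&vm)==TOY_ROW ){
    printf("row: %d\n", toy_column(&vm, 0));
  }
  return 0;
}
#endif
```

Compiled with `-DTOY_VM_DEMO`, the driver loop has the same shape as the `sqlite3_step()` loop in the first example and prints one line per row. The point of the design is that `toy_step` works on local copies of `pc` and `tos` and writes them back to the `ToyVm` only when it returns, which mirrors the distinction the article draws between the Vdbe's saved `pTos` and `pc` and the interpreter's working copies.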