（6）多遍翻譯的宏翻譯系統 · 編譯器架構的王者LLVM

LLVM平臺，短短幾年間，改變了眾多編程語言的走向，也催生了一大批具有特色的編程語言的出現，不愧為編譯器架構的王者，也榮獲2012年ACM軟件系統獎 —— 題記版權聲明：本文為西風逍遙游原創文章，轉載請注明出處西風世界 [http://blog.csdn.net/xfxyy_sxfancy](http://blog.csdn.net/xfxyy_sxfancy) # 多遍翻譯的宏翻譯系統上次我們討論了構建語法樹的基本模型，我們能夠利用Lex+Bison+Node,幾個組件將我們的目標語法翻譯成AST語法樹了，在第四章，我們也給出了RedApple這款實現型小編譯器的語法結構，那么我們的準備工作基于基本完成。我們在搞定了AST語法樹的構建后，需要有一種機制，能夠遍歷整棵語法樹，然后將其翻譯為LLVM的一個模塊，然后再輸出成.bc字節碼。這種機制我稱其為多趟宏翻譯系統，因為它要多次掃描整棵語法樹，每次掃描需要的部分，然后構建整個模塊。我希望能實現類似Java的語法特性，無需考慮定義順序，只要定義了，那么就能夠找到該符號。這樣我們就需要合理的掃描順序。 ### 掃描順序的確定首先，我們必須先掃描出所有的類型，因為類型的聲明很重要，沒有類型聲明，就無法構建函數。其次，我們要掃描出所有的函數，為其構建函數的聲明。最后，我們掃描出所有的函數定義，構建每個函數的函數體。這樣我們是三次掃描，無需擔心效率問題，因為前兩次掃描都是在根節點下一層，掃描的元素非常少，所以處理起來很快。 ### 待掃描的AST語法樹這是我們之前生成好的AST語法樹，結構還算清晰吧。我們能用的遍歷手段也就是上次我們實現的next指針，然后不斷的去判斷當前節點的數據，然后對應的代碼生成出來。為了能夠區分每條語句的含義，我在每個列表最前，都添加了翻譯宏的名稱，這個設計是仿照lisp做的，宏相當于是編譯器中的函數，處理元數據，然后將其翻譯成對應的內容。例如這段代碼： ~~~ void hello(int k, int g) { int y = k + g; printf("%d\n", y); if (k + g < 5) printf("right\n"); } void go(int k) { int a = 0; while (a < k) { printf("go-%d\n", a); a = a + 1; } } void print(int k) { for (int i = 0; i < 10; i = i+1) { printf("hello-%d\n",i); } } void main() { printf("hello world\n"); hello(1,2); print(9); } ~~~ 其AST語法樹如下： ~~~ Node Node String function String void String hello Node Node String set String int String k Node String set String int String g Node Node String set String int String y Node String opt2 String + ID k ID g Node String call String printf String %d ID y Node String if Node String opt2 String < Node String opt2 String + ID k ID g Int 5 Node String call String printf String right Node String function String void String go Node Node String set String int String k Node Node String set String int String a Int 0 Node String while Node String opt2 String < ID a ID k Node Node String call String printf String go-%d ID a Node String opt2 String = ID a Node String opt2 String + ID a Int 1 Node String function String void String print Node Node String set String int String k Node Node String for Node String set String int String i Int 0 Node String opt2 String < ID i Int 10 Node String opt2 String = ID i Node String opt2 String + ID i Int 1 Node Node String call String printf String hello-%d ID i Node String function String void String main Node Node Node String call String printf String hello world Node String call String hello Int 1 Int 2 Node String call String print Int 9 ~~~ ### 掃描中的上下文由于翻譯過程中，我們還需要LLVMContext變量，符號表，宏定義表等必要信息，我們還需要自己實現一個上下文類，來存儲必要的信息，上下文類需要在第一遍掃描前就初始化好。例如我們在翻譯中，遇到了一個變量，那么該變量是臨時的還是全局的呢？是什么類型，都需要我們在符號表中存儲表達，另外當前翻譯的語句是屬于哪條宏，該怎么翻譯？我們必須有一個類來保存這些信息。于是我們先不談實現，將接口寫出來 ~~~ class CodeGenContext; typedef Value* (*CodeGenFunction)(CodeGenContext*, Node*); typedef struct _funcReg { const char* name; CodeGenFunction func; } FuncReg; class CodeGenContext { public: CodeGenContext(Node* node); ~CodeGenContext(); // 必要的初始化方法 void PreInit(); void PreTypeInit(); void Init(); void MakeBegin() { MacroMake(root); } // 這個函數是用來一條條翻譯Node宏的 Value* MacroMake(Node* node); // 遞歸翻譯該節點下的所有宏 void MacroMakeAll(Node* node); CodeGenFunction getMacro(string& str); // C++注冊宏 // void AddMacros(const FuncReg* macro_funcs); // 為只添加不替換保留 void AddOrReplaceMacros(const FuncReg* macro_funcs); // 代碼塊棧的相關操作 BasicBlock* getNowBlock(); BasicBlock* createBlock(); BasicBlock* createBlock(Function* f); // 獲取當前模塊中已注冊的函數 Function* getFunction(Node* node); Function* getFunction(std::string& name); void nowFunction(Function* _nowFunc); void setModule(Module* pM) { M = pM; } Module* getModule() { return M; } void setContext(LLVMContext* pC) { Context = pC; } LLVMContext* getContext() { return Context; } // 類型的定義和查找 void DefType(string name, Type* t); Type* FindType(string& name); Type* FindType(Node*); void SaveMacros(); void RecoverMacros(); bool isSave() { return _save; } void setIsSave(bool save) { _save = save; } id* FindST(Node* node) const; id* FindST(string& str) const { return st->find(str); } IDTable* st; private: // 語法樹根節點 Node* root; // 當前的LLVM Module Module* M; LLVMContext* Context; Function* nowFunc; BasicBlock* nowBlock; // 這是用來查找是否有該宏定義的 map<string, CodeGenFunction> macro_map; // 這個棧是用來臨時保存上面的查詢表的 stack<map<string, CodeGenFunction> > macro_save_stack; void setNormalType(); // 用來記錄當前是讀取還是存入狀態 bool _save; }; ~~~ ### 宏的注冊宏是內部的非常重要的函數，本身是一個C函數指針，宏有唯一的名字，通過map表，去查找該宏對應的函數，然后調用其對當前的語法節點進行解析。宏函數的定義： ~~~ typedef Value* (*CodeGenFunction)(CodeGenContext*, Node*); ~~~ 注冊我是仿照lua的方式設計的，將函數指針組織成數組，然后初始化進入結構體： ~~~ extern const FuncReg macro_funcs[] = { {"function", function_macro}, {"struct", struct_macro}, {"set", set_macro}, {"call", call_macro}, {"opt2", opt2_macro}, {"for", for_macro}, {"while", while_macro}, {"if", if_macro}, {"return", return_macro}, {"new", new_macro}, {NULL, NULL} }; ~~~ 這樣寫是為了方便我們一次就導入一批函數進入我們的系統。函數指針我還是習慣使用C指針，一般避免使用C++的成員指針，那樣太復雜，而且不容易和其他模塊鏈接，因為C++是沒有標準ABI的，但C語言有。 ### 實現掃描的引導掃描其實很簡單了，如果當前節點是個字符串，而且在宏定義中能夠找到，那么我們就調用這條宏來處理，否則如果是列表的化，就對每一條分別遞歸處理。宏的查找我直接使用了stl模版庫中的map和string，非常的方便。 ~~~ Value* CodeGenContext::MacroMake(Node* node) { if (node == NULL) return NULL; if (node->isStringNode()) { StringNode* str_node = (StringNode*)node; CodeGenFunction func = getMacro(str_node->getStr()); if (func != NULL) { return func(this, node->getNext()); } return NULL; } if (node->getChild() != NULL && node->getChild()->isStringNode()) return MacroMake(node->getChild()); Value* ans; for (Node* p = node->getChild(); p != NULL; p = p->getNext()) ans = MacroMake(p); return ans; } CodeGenFunction CodeGenContext::getMacro(string& str) { auto func = macro_map.find(str); if (func != macro_map.end()) return func->second; else return NULL; } ~~~ 就這樣，我們可以引導宏翻譯了，那多遍翻譯是如何實現的呢？其實很簡單，使用宏注冊函數將當前的宏替換就好了，重新執行翻譯引導，不就是多遍翻譯了？