4.14 Alternative ways to specify lexers · [Translation] Python Lex Yacc Manual

In the examples above, the lexer was always specified in a single Python module. If you want to put the token rules in a different module, use the `module` keyword argument. For example, you might have a dedicated module that contains only the token rules:

~~~
# module: tokrules.py
# This module just contains the lexing rules

# List of token names.   This is always required
tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)

# Regular expression rules for simple tokens
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_LPAREN  = r'\('
t_RPAREN  = r'\)'

# A regular expression rule with some action code
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Define a rule so we can track line numbers
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

# A string containing ignored characters (spaces and tabs)
t_ignore  = ' \t'

# Error handling rule
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
~~~

Now, to build a lexer from the rules in that other module, do the following (shown in interactive mode):

~~~
>>> import ply.lex as lex
>>> import tokrules
>>> lexer = lex.lex(module=tokrules)
>>> lexer.input("3 + 4")
>>> lexer.token()
LexToken(NUMBER,3,1,0)
>>> lexer.token()
LexToken(PLUS,'+',1,2)
>>> lexer.token()
LexToken(NUMBER,4,1,4)
>>> print(lexer.token())
None
~~~

The `module` option can also be given an instance of a class. For example:

~~~
import ply.lex as lex

class MyLexer:
    # List of token names.   This is always required
    tokens = (
        'NUMBER',
        'PLUS',
        'MINUS',
        'TIMES',
        'DIVIDE',
        'LPAREN',
        'RPAREN',
    )

    # Regular expression rules for simple tokens
    t_PLUS    = r'\+'
    t_MINUS   = r'-'
    t_TIMES   = r'\*'
    t_DIVIDE  = r'/'
    t_LPAREN  = r'\('
    t_RPAREN  = r'\)'

    # A regular expression rule with some action code
    # Note addition of self parameter since we're in a class
    def t_NUMBER(self, t):
        r'\d+'
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(self, t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore  = ' \t'

    # Error handling rule
    def t_error(self, t):
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    # Build the lexer
    def build(self, **kwargs):
        self.lexer = lex.lex(module=self, **kwargs)

    # Test its output
    def test(self, data):
        self.lexer.input(data)
        while True:
            tok = self.lexer.token()
            if not tok:
                break
            print(tok)

# Build the lexer and try it out
m = MyLexer()
m.build()           # Build the lexer
m.test("3 + 4")     # Test it
~~~

When you define a lexer in a class, you need to create an instance of the class and pass that instance, not the class itself, to `lex()`. This is because PLY only works correctly when the lexer's rule functions are bound methods. When the `module` option is given to `lex()`, PLY collects the symbols from the object with `dir()`; it does not access the object's `__dict__` attribute directly. (Translator's note: this is probably for compatibility reasons, since an object is not guaranteed to have a `__dict__` attribute.)

Finally, if you want to keep things nicely encapsulated but don't want to put everything inside a class, a lexer can also be defined inside a closure. For example:

~~~
import ply.lex as lex

# List of token names.   This is always required
tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)

def MyLexer():
    # Regular expression rules for simple tokens
    t_PLUS    = r'\+'
    t_MINUS   = r'-'
    t_TIMES   = r'\*'
    t_DIVIDE  = r'/'
    t_LPAREN  = r'\('
    t_RPAREN  = r'\)'

    # A regular expression rule with some action code
    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore  = ' \t'

    # Error handling rule
    def t_error(t):
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    # Build the lexer from my environment and return it
    return lex.lex()
~~~
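Why does the closure version work even though `lex()` is called with no arguments? As far as we understand PLY 3.x, when `module` is omitted, `lex()` inspects the frame of its caller and merges that frame's globals with its locals, so both the module-level `tokens` and the `t_` rules defined inside `MyLexer()` are visible. The sketch below imitates that lookup in plain Python. `collect_caller_symbols()` is a hypothetical helper for illustration only, not part of PLY's API.

```python
import sys

# Module-level token list, as in the closure example
tokens = ('NUMBER', 'PLUS')

def collect_caller_symbols():
    """Hypothetical helper mimicking how lex() without module= finds rules:
    start from the caller's globals, then overlay the caller's locals."""
    frame = sys._getframe(1)          # the frame of whoever called this helper
    symbols = dict(frame.f_globals)   # module-level names such as `tokens`
    symbols.update(frame.f_locals)    # names local to the closure, such as t_PLUS
    return symbols

def MyLexer():
    t_PLUS = r'\+'

    def t_NUMBER(t):
        r'\d+'
        return t

    # In the real example this would be `return lex.lex()`
    return collect_caller_symbols()

env = MyLexer()
print(env['tokens'])                                  # ('NUMBER', 'PLUS')
print(sorted(k for k in env if k.startswith('t_')))   # ['t_NUMBER', 't_PLUS']
```

The closure's rules never become module globals, yet they are still reachable through the calling frame, which is what lets each call to `MyLexer()` build an independent lexer.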
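The requirement to pass an instance rather than the class can be illustrated without PLY. The following sketch uses a hypothetical `collect_rules()` helper that gathers `t_`-prefixed names through `dir()` and `getattr()`, in the spirit of what the text says `lex()` does with the `module` option; it is an illustration, not PLY's actual code.

```python
import inspect

class MyLexer:
    # Cut-down stand-in for the class above: one string rule, one function rule
    t_PLUS = r'\+'

    def t_NUMBER(self, t):
        r'\d+'
        return t

def collect_rules(obj):
    """Hypothetical helper: collect t_* symbols via dir() + getattr()
    instead of reading obj.__dict__ directly."""
    return {name: getattr(obj, name) for name in dir(obj) if name.startswith('t_')}

m = MyLexer()

# The rules are class attributes, so the instance's own __dict__ is empty
print(m.__dict__)                       # {}

# dir() also reports attributes inherited from the class, so the rules are found
rules = collect_rules(m)
print(sorted(rules))                    # ['t_NUMBER', 't_PLUS']

# Looked up on the instance, t_NUMBER is a bound method: self is already
# supplied, so it can be called with the token argument alone
print(inspect.ismethod(rules['t_NUMBER']))                     # True
print(rules['t_NUMBER']('tok'))                                # tok

# Looked up on the class, it is a plain function that still expects self
print(inspect.ismethod(collect_rules(MyLexer)['t_NUMBER']))    # False
```

Because a rule taken from the class would still need a `self` argument, a tool that calls each rule with just the token object only works when it is given the instance.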