Analyzers · JAVA

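[TOC]

# Introduction

By default, Lucene uses the standard analyzer, `StandardAnalyzer`. To see what an analyzer produces, call the `tokenStream` method of the `Analyzer` object. It returns a `TokenStream` object, which contains the final tokenization result.

Steps:

1. Create an `Analyzer` object, here a `StandardAnalyzer`.
2. Call the analyzer's `tokenStream` method to obtain a `TokenStream` object.
3. Add a `CharTermAttribute` to the `TokenStream`. It acts like a pointer to the text of the current token.
4. Call `reset()` on the `TokenStream`. Consuming the stream without calling `reset()` throws an exception.
5. Iterate over the `TokenStream` in a `while` loop with `incrementToken()`.
6. Call `end()` and then close the `TokenStream`.

# Standard analyzer: StandardAnalyzer

~~~
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.junit.Test;

@Test
public void testTokenStream() throws IOException {
    //1. Create an Analyzer object, here a StandardAnalyzer
    StandardAnalyzer analyzer = new StandardAnalyzer();
    //2. Get a TokenStream from the analyzer; arguments: field name, text to analyze
    TokenStream tokenStream = analyzer.tokenStream("", "數據庫like查詢和全文檢索的區別");
    //3. Add an attribute to the TokenStream; it acts like a pointer to the current token's text
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    //4. Call reset() on the TokenStream; skipping it throws an exception
    tokenStream.reset();
    //5. Iterate over the TokenStream in a while loop
    while (tokenStream.incrementToken()) {
        System.out.println(charTermAttribute.toString());
    }
    //6. Finish and close the TokenStream, then release the analyzer
    tokenStream.end();
    tokenStream.close();
    analyzer.close();
    /*
     * Output
     * 數
     * 據
     * 庫
     * like
     * 查
     * 詢
     * 和
     * 全
     * 文
     * 檢
     * 索
     * 的
     * 區
     * 別
     */
}
~~~

# IK Chinese analyzer

How to use IKAnalyzer:

1. Add the IKAnalyzer jar to the project.
2. Put the configuration file, the extension dictionary and the stop-word dictionary on the project's classpath.

The extension dictionary adds new words; the stop-word dictionary lists meaningless or sensitive words that should not be emitted as tokens.

Note: the dictionary files, such as `hotword.dic` and `ext_stopword.dic`, must be encoded as UTF-8 without a BOM. In other words, do not edit them with Windows Notepad, which (at least in older versions of Windows) saves UTF-8 files with a BOM.

The configuration is an XML properties file; IK looks for it on the classpath under the name `IKAnalyzer.cfg.xml`:

~~~
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict">hotword.dic;</entry>
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
~~~

`ext_dict` lists the extension dictionaries and `ext_stopwords` the stop-word dictionaries, separated by `;`.

~~~
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

@Test
public void testTokenStream() throws IOException {
    //1. Create the analyzer object, here an IKAnalyzer
    IKAnalyzer analyzer = new IKAnalyzer();
    //2. Get a TokenStream from the analyzer; arguments: field name, text to analyze
    TokenStream tokenStream = analyzer.tokenStream("", "1.2.1全文檢索");
    //3. Add an attribute to the TokenStream; it acts like a pointer to the current token's text
    CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
    //4. Call reset() on the TokenStream; skipping it throws an exception
    tokenStream.reset();
    //5. Iterate over the TokenStream in a while loop
    while (tokenStream.incrementToken()) {
        System.out.println(charTermAttribute.toString());
    }
    //6. Finish and close the TokenStream, then release the analyzer
    tokenStream.end();
    tokenStream.close();
    analyzer.close();
}
~~~

Extension dictionary (`hotword.dic`):

~~~
文
~~~

Stop-word dictionary (`stopword.dic`):

~~~
1.2
~~~

Output. IK first logs the dictionaries it loads (the extension dictionary `hotword.dic` and the extension stop-word dictionary `stopword.dic`), then the tokens follow:

~~~
加載擴展詞典：hotword.dic
加載擴展停止詞典：stopword.dic
1.2.1 全文文檢索
~~~

The stop word `1.2` does not remove the token `1.2.1`, because stop words are compared against whole tokens.

# Creating an index with the IK analyzer

Specify IK in `IndexWriterConfig`:

~~~
import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

@Test
public void createIndex() throws IOException {
    //1. Create a Directory object that specifies where the index is stored
    //   (here the index is saved to disk)
    Directory directory = FSDirectory.open(new File("/Users/jdxia/Desktop/study/studylucene/lucene-first/index").toPath());
    //2. Create an IndexWriter based on the Directory.
    //   IndexWriterConfig uses StandardAnalyzer by default, not IK, so pass an IKAnalyzer explicitly
    IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
    IndexWriter indexWriter = new IndexWriter(directory, config);
    //3. Read the files on disk and create one document per file
    File dir = new File("/Users/jdxia/Desktop/study/studylucene/lucene-first/search");
    File[] files = dir.listFiles();
    for (File f : files) {
        // file name
        String fileName = f.getName();
        // file path
        String filePath = f.getPath();
        // file content
        String fileContent = FileUtils.readFileToString(f, "utf-8");
        // file size
        long fileSize = FileUtils.sizeOf(f);
        // create the fields
        // arg 1: field name, arg 2: field content, arg 3: whether to store the value
        TextField fieldName = new TextField("name", fileName, Field.Store.YES);
        TextField fieldPath = new TextField("path", filePath, Field.Store.YES);
        TextField fieldContent = new TextField("content", fileContent, Field.Store.YES);
        TextField fieldSize = new TextField("size", fileSize + "", Field.Store.YES);
        // create the document
        Document document = new Document();
        // add the fields to the document
        document.add(fieldName);
        document.add(fieldPath);
        document.add(fieldContent);
        document.add(fieldSize);
        // write the document to the index
        indexWriter.addDocument(document);
    }
    // close the IndexWriter, then the Directory
    indexWriter.close();
    directory.close();
}
~~~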
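The IK configuration file shown earlier is an ordinary `java.util.Properties` XML file (that is what the `properties.dtd` DOCTYPE declares), and each entry holds a `;`-separated list of dictionary file names. The following is a minimal sketch of reading such a file with the JDK alone, for example to check which dictionaries a configuration points to. `IkConfigReader`, `load` and `fileList` are hypothetical names for this illustration; they are not part of IK Analyzer, and the sketch does not claim to reproduce IK's own loading code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: reads a properties XML file and lists the dictionary files it names. */
public class IkConfigReader {

    /** Loads a file in the java.util.Properties XML format; loadFromXML closes the stream. */
    public static Properties load(InputStream in) {
        Properties props = new Properties();
        try {
            props.loadFromXML(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits a ';'-separated entry such as "hotword.dic;" into file names, skipping empty pieces. */
    public static List<String> fileList(Properties props, String key) {
        List<String> files = new ArrayList<>();
        String value = props.getProperty(key);
        if (value == null) {
            return files; // the key is not configured
        }
        for (String part : value.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                files.add(name);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "<comment>IK Analyzer extension configuration</comment>\n"
                + "<entry key=\"ext_dict\">hotword.dic;</entry>\n"
                + "<entry key=\"ext_stopwords\">stopword.dic;</entry>\n"
                + "</properties>\n";
        Properties props = load(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(fileList(props, "ext_dict"));
        System.out.println(fileList(props, "ext_stopwords"));
    }
}
```

Running `main` prints `[hotword.dic]` and then `[stopword.dic]`. The trailing `;` in the entries only produces an empty piece, which the sketch skips.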
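The requirement that the IK dictionary files be UTF-8 without a BOM can be checked programmatically. A UTF-8 BOM is the three bytes `EF BB BF` at the start of a file, and Java's UTF-8 decoder does not strip it, so code that reads such a file line by line sees U+FEFF glued to the first word. The sketch below detects and removes the BOM and parses one word per line. It assumes the one-word-per-line layout of the dictionaries shown earlier; `DictFileCheck` and its methods are hypothetical names, not part of IK Analyzer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for checking dictionary files for a UTF-8 byte order mark. */
public class DictFileCheck {

    // U+FEFF encoded as UTF-8
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the bytes start with a UTF-8 BOM. */
    public static boolean hasUtf8Bom(byte[] content) {
        return content.length >= 3
                && content[0] == UTF8_BOM[0]
                && content[1] == UTF8_BOM[1]
                && content[2] == UTF8_BOM[2];
    }

    /** Returns the bytes without a leading BOM (the same array if there is none). */
    public static byte[] stripUtf8Bom(byte[] content) {
        return hasUtf8Bom(content) ? Arrays.copyOfRange(content, 3, content.length) : content;
    }

    /** Decodes the dictionary as UTF-8, one word per line, skipping blank lines. */
    public static List<String> parseWords(byte[] content) {
        String text = new String(stripUtf8Bom(content), StandardCharsets.UTF_8);
        List<String> words = new ArrayList<>();
        for (String line : text.split("\\R")) { // \R matches \n, \r\n and other line breaks
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    /** Rewrites the file without its BOM; returns true if a BOM was removed. */
    public static boolean removeBomInPlace(Path file) throws IOException {
        byte[] content = Files.readAllBytes(file);
        if (!hasUtf8Bom(content)) {
            return false;
        }
        Files.write(file, stripUtf8Bom(content));
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DictFileCheck hotword.dic stopword.dic
        for (String name : args) {
            boolean removed = removeBomInPlace(Paths.get(name));
            System.out.println(name + (removed ? ": BOM removed" : ": no BOM"));
        }
    }
}
```

For example, `java DictFileCheck hotword.dic stopword.dic` rewrites any of the listed files that start with a BOM and reports which ones it changed.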
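To see why the dictionaries matter, compare the two analyzers above. `StandardAnalyzer` uses no dictionary. It follows Unicode text segmentation rules, which emit each Chinese character as its own token. A dictionary-based analyzer such as IK can keep whole words together, and its extension and stop-word dictionaries change which words it finds or drops. The following is a hypothetical toy segmenter using greedy forward maximum matching, written only to illustrate that idea. It is not IK Analyzer's actual algorithm, which has its own segmenters and ambiguity handling, and `ToySegmenter` with its methods are made-up names. Runs of ASCII letters and digits are kept together so that `like` stays one token, as in the `StandardAnalyzer` output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical toy segmenter: greedy forward maximum matching over a word set.
 * NOT IK Analyzer's algorithm; it only shows how added words and stop words change the tokens.
 */
public class ToySegmenter {

    private final Set<String> words = new HashSet<>();
    private final Set<String> stopWords = new HashSet<>();
    private int maxWordLength = 1;

    /** Adds dictionary words (plays the role of an extension dictionary). */
    public void addWords(String... newWords) {
        for (String w : newWords) {
            words.add(w);
            maxWordLength = Math.max(maxWordLength, w.length());
        }
    }

    /** Adds stop words; a token equal to one of them is dropped. */
    public void addStopWords(String... newStopWords) {
        stopWords.addAll(Arrays.asList(newStopWords));
    }

    private static boolean isAsciiLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** Scans left to right, taking the longest dictionary match at each position. */
    public List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end;
            if (isAsciiLetterOrDigit(text.charAt(pos))) {
                // keep a run of ASCII letters/digits together as one token
                end = pos;
                while (end < text.length() && isAsciiLetterOrDigit(text.charAt(end))) {
                    end++;
                }
            } else {
                // try the longest window first and shrink it until a dictionary word
                // matches; a single character is the fallback
                end = Math.min(text.length(), pos + maxWordLength);
                while (end - pos > 1 && !words.contains(text.substring(pos, end))) {
                    end--;
                }
            }
            String token = text.substring(pos, end);
            // stop words are compared against the whole token
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "數據庫like查詢和全文檢索的區別";
        ToySegmenter segmenter = new ToySegmenter();
        // empty dictionary: one token per Chinese character
        System.out.println(segmenter.segment(text));
        segmenter.addWords("數據庫", "查詢", "全文檢索", "區別");
        segmenter.addStopWords("和", "的");
        System.out.println(segmenter.segment(text));
    }
}
```

With an empty dictionary, `main` prints `[數, 據, 庫, like, 查, 詢, 和, 全, 文, 檢, 索, 的, 區, 別]`, the same tokens as the `StandardAnalyzer` output. After adding the words and stop words it prints `[數據庫, like, 查詢, 全文檢索, 區別]`.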