Creating an Index · Lucene Case Development

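Please credit the source when reposting: [http://blog.csdn.net/xiaojimanman/article/details/42872711](http://blog.csdn.net/xiaojimanman/article/details/42872711)

Starting with this post, both the API introductions and the later case studies are based on Lucene 4.3.1. You can download Lucene 4.3.1 [here](http://archive.apache.org/dist/lucene/java/4.3.1/), other Lucene versions [here](http://archive.apache.org/dist/lucene/java/), and the official Lucene 4.3.1 API documentation is [here](http://lucene.apache.org/core/4_3_1/core/).

**Index creation demo**

Before going into details, let's look at a simple demo program that creates an index:

~~~
/**
 * @Description: index creation demo
 */
package com.lulei.lucene.study;

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class IndexCreate {

    public static void main(String[] args) {
        // Analyzer used at index time; here it is the standard analyzer
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);
        // IndexWriter configuration
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_43, analyzer);
        // Open mode: create the index if it does not exist, otherwise open it
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
        Directory directory = null;
        IndexWriter indexWrite = null;
        try {
            // Path on disk where the index is stored
            directory = FSDirectory.open(new File("D://study/index/testindex"));
            // If the index is locked, unlock it
            // (only safe when no other process is writing to this index)
            if (IndexWriter.isLocked(directory)) {
                IndexWriter.unlock(directory);
            }
            // The IndexWriter object used to operate on the index
            indexWrite = new IndexWriter(directory, indexWriterConfig);
        } catch (Exception e) {
            e.printStackTrace();
            // Without a writer nothing below can run, so stop here
            return;
        }

        // Create document 1
        Document doc1 = new Document();
        // Set the name field to "測試標題" and store the field value
        doc1.add(new TextField("name", "測試標題", Store.YES));
        // Set the content field to "測試內容" and store the field value
        doc1.add(new TextField("content", "測試內容", Store.YES));
        try {
            // Write the document to the index
            indexWrite.addDocument(doc1);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Create document 2
        Document doc2 = new Document();
        doc2.add(new TextField("name", "基于lucene的案例開發：索引數學模型", Store.YES));
        doc2.add(new TextField("content", "lucene將一篇文檔分成若干個域，每個域又分成若干個詞元，通過詞元在文檔中的重要程度，將文檔轉化為N維的空間向量，通過計算兩個向量之間的夾角余弦值來計算兩個文檔的相似程度", Store.YES));
        try {
            // Write the document to the index
            indexWrite.addDocument(doc2);
        } catch (Exception e) {
            e.printStackTrace();
        }

        // Commit the IndexWriter operations; without a commit,
        // the operations above are not saved to disk
        try {
            // Commit is expensive, so it should follow some strategy
            // rather than run after every document
            indexWrite.commit();
            // Release resources
            indexWrite.close();
            directory.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
~~~

The program above is commented in detail, so I won't walk through each statement again. Here are the index files created after running this main method:

![](https://box.kancloud.cn/2016-02-22_56ca7bed79c25.jpg)

With the index inspection tool Luke we can take a quick look at the contents of the index:

![](https://box.kancloud.cn/2016-02-22_56ca7bed90234.jpg)

![](https://box.kancloud.cn/2016-02-22_56ca7beda979e.jpg)

From the two screenshots above we can see that the index contains two documents, that the content field has 50 terms and the name field has 18 terms, and that the full field values of the documents are stored in the index.

**Core classes for creating an index**

The index creation process above uses several core classes: **IndexWriter**, **Directory**, **Analyzer**, **Document**, and **Field**.

**IndexWriter**

IndexWriter is the central component of the indexing process. It creates a new index or opens an existing one, and adds, deletes, and updates the indexed documents. An IndexWriter needs somewhere to store the index, and that storage is provided by a Directory.

**Directory**

The Directory class describes where a Lucene index is stored. It is an abstract class whose subclasses determine how and where the index is kept. In the example above we call FSDirectory.open to get a Directory backed by a real path in the file system and then pass it to the IndexWriter constructor.

**Analyzer**

Before document text is indexed it has to be processed by an Analyzer. The example above uses the standard analyzer; later posts will cover the various analyzers and their use cases separately.

**Document**

A Document has a simple structure: it is a container for multiple Field objects. The documents in the example above each contain two fields, name and content.

**Field**

Every document in the index contains one or more named fields. Each field has a name, a value, and a set of options that control precisely how Lucene indexes that value. When a document holds several fields with the same name, their text behaves at search time as if it had been concatenated into a single text field.

The core classes above are important and used constantly when working with Lucene. For more detail, please refer to the [official API documentation](http://lucene.apache.org/core/4_3_1/core/).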
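The term counts that Luke reports for the demo index (18 for name, 50 for content) can be sanity-checked without Lucene. The sketch below is only a rough approximation of what the standard analyzer does to this particular text: it assumes each Han character becomes its own term and each run of Latin letters or digits becomes one lowercased term. It ignores stop words and the rest of the Unicode segmentation rules, and the class and method names (`TermCountSketch`, `tokenize`, `countDistinctTerms`) are made up for illustration, not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TermCountSketch {

    // Approximation only: one term per Han character, one lowercased term
    // per run of other letters/digits; everything else is a separator.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isIdeographic(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending letter/digit run, if any, as one token.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // Number of distinct terms across all values of one field.
    public static int countDistinctTerms(String... fieldValues) {
        Set<String> terms = new LinkedHashSet<String>();
        for (String value : fieldValues) {
            terms.addAll(tokenize(value));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        // Same field values as the two demo documents
        int nameTerms = countDistinctTerms(
                "測試標題",
                "基于lucene的案例開發：索引數學模型");
        int contentTerms = countDistinctTerms(
                "測試內容",
                "lucene將一篇文檔分成若干個域，每個域又分成若干個詞元，通過詞元在文檔中的重要程度，將文檔轉化為N維的空間向量，通過計算兩個向量之間的夾角余弦值來計算兩個文檔的相似程度");
        System.out.println("name terms: " + nameTerms);
        System.out.println("content terms: " + contentTerms);
    }
}
```

With the demo strings this counts 18 distinct terms for name and 50 for content, which agrees with the Luke screenshots. It also shows why the standard analyzer is a weak fit for Chinese text: every character is indexed as a separate term.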