Search · Elasticsearch: The Definitive Guide (Chinese edition)

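## Retrieving a document

Now that we have some data stored in Elasticsearch, we can get to work on the business requirements. The first requirement is the ability to retrieve the details of a single employee. This is easy in Elasticsearch. We simply execute an HTTP GET request and specify the "address" of the document: the index, the type, and the ID. With those three pieces of information, we can get back the original JSON document:

```Javascript
GET /megacorp/employee/1
```

The response contains some metadata about the document, and John Smith's original JSON document is included in the `_source` field.

```Javascript
{
  "_index" :   "megacorp",
  "_type" :    "employee",
  "_id" :      "1",
  "_version" : 1,
  "found" :    true,
  "_source" :  {
      "first_name" :  "John",
      "last_name" :   "Smith",
      "age" :         25,
      "about" :       "I love to go rock climbing",
      "interests":  [ "sports", "music" ]
  }
}
```

>We retrieve a document with the HTTP `GET` method. In the same way, we can use the `DELETE` method to delete a document and the `HEAD` method to check whether a document exists. To update an existing document, we just `PUT` it again.

## Simple search

A `GET` request is simple: you can easily fetch the document you ask for. Let's try something a little more advanced, such as a simple search! We start with the simplest search of all, a request for every employee:

```Javascript
GET /megacorp/employee/_search
```

You can see that we are still using the `megacorp` index and the `employee` type, but instead of a document ID we now end the path with the `_search` keyword. The `hits` array in the response contains all three of our documents. By default, a search returns the top 10 results.

```Javascript
{
  "took":      6,
  "timed_out": false,
  "_shards": { ... },
  "hits": {
    "total":     3,
    "max_score": 1,
    "hits": [
      {
        "_index":  "megacorp",
        "_type":   "employee",
        "_id":     "3",
        "_score":  1,
        "_source": {
          "first_name": "Douglas",
          "last_name":  "Fir",
          "age":        35,
          "about":      "I like to build cabinets",
          "interests":  [ "forestry" ]
        }
      },
      {
        "_index":  "megacorp",
        "_type":   "employee",
        "_id":     "1",
        "_score":  1,
        "_source": {
          "first_name": "John",
          "last_name":  "Smith",
          "age":        25,
          "about":      "I love to go rock climbing",
          "interests":  [ "sports", "music" ]
        }
      },
      {
        "_index":  "megacorp",
        "_type":   "employee",
        "_id":     "2",
        "_score":  1,
        "_source": {
          "first_name": "Jane",
          "last_name":  "Smith",
          "age":        32,
          "about":      "I like to collect rock albums",
          "interests":  [ "music" ]
        }
      }
    ]
  }
}
```

>**Note**:
>The response not only tells us which documents matched, it also includes the complete documents themselves: everything we need to display the search results to the user.

Next, let's search for employees whose last name contains **"Smith"**. To do this, we will use a lightweight search method that is easy to use from the command line. This method is often called **query string** search, because we pass the query as a URL parameter:

```Javascript
GET /megacorp/employee/_search?q=last_name:Smith
```

We still use the `_search` keyword in the request, and pass the query itself in the `q=` parameter. This returns all of the results whose last name is Smith:

```Javascript
{
  ...
  "hits": {
    "total":     2,
    "max_score": 0.30685282,
    "hits": [
      {
        ...
        "_source": {
          "first_name": "John",
          "last_name":  "Smith",
          "age":        25,
          "about":      "I love to go rock climbing",
          "interests":  [ "sports", "music" ]
        }
      },
      {
        ...
        "_source": {
          "first_name": "Jane",
          "last_name":  "Smith",
          "age":        32,
          "about":      "I like to collect rock albums",
          "interests":  [ "music" ]
        }
      }
    ]
  }
}
```

## Searching with Query DSL

Query string search is handy for **ad hoc** searches from the command line, but it has its limitations (see the simple search chapter). Elasticsearch provides a rich, flexible query language called **Query DSL**, which lets you build much more complex and powerful queries.

The **DSL (Domain Specific Language)** is expressed as a JSON request body. We can represent the previous search for "Smith" like this:

```Javascript
GET /megacorp/employee/_search
{
  "query" : {
    "match" : {
      "last_name" : "Smith"
    }
  }
}
```

This returns the same results as the previous query. You can see that a few things have changed: we no longer pass a **query string** as a parameter, but send a request body instead. The request body is written in JSON and uses a `match` clause (one of several query types, which we will cover later).

## More complicated searches

Let's make the search a little more complicated. We still want to find employees whose last name is "Smith", but we only want those who are older than 30. Our request adds a **filter**, which lets us execute a structured search efficiently:

```Javascript
GET /megacorp/employee/_search
{
  "query" : {
    "filtered" : {
      "filter" : {
        "range" : {
          "age" : { "gt" : 30 } <1>
        }
      },
      "query" : {
        "match" : {
          "last_name" : "smith" <2>
        }
      }
    }
  }
}
```

* <1> This part of the query is a **range filter**, which finds all documents with an age greater than 30. `gt` stands for "greater than".
* <2> This part of the query is the same `match` **query** that we used before.

Don't worry too much about the syntax for now; we will discuss it in detail later. All you need to know is that we added a **filter** that performs a range search, and reused the earlier `match` query. Our search results now contain a single employee, 32 years old and named "Jane Smith":

```Javascript
{
  ...
  "hits": {
    "total":     1,
    "max_score": 0.30685282,
    "hits": [
      {
        ...
        "_source": {
          "first_name": "Jane",
          "last_name":  "Smith",
          "age":        32,
          "about":      "I like to collect rock albums",
          "interests":  [ "music" ]
        }
      }
    ]
  }
}
```

## Full-text search

The searches so far have been simple: searching for a specific name and filtering by age. Let's try a more advanced kind of search, full-text search, which is something traditional databases find very hard to do. We will search for all employees who enjoy **"rock climbing"**:

```Javascript
GET /megacorp/employee/_search
{
  "query" : {
    "match" : {
      "about" : "rock climbing"
    }
  }
}
```

You can see that we use the same `match` query as before, this time to search the `about` field for **"rock climbing"**. We get back two matching documents:

```Javascript
{
  ...
  "hits": {
    "total":     2,
    "max_score": 0.16273327,
    "hits": [
      {
        ...
        "_score": 0.16273327, <1>
        "_source": {
          "first_name": "John",
          "last_name":  "Smith",
          "age":        25,
          "about":      "I love to go rock climbing",
          "interests":  [ "sports", "music" ]
        }
      },
      {
        ...
        "_score": 0.016878016, <2>
        "_source": {
          "first_name": "Jane",
          "last_name":  "Smith",
          "age":        32,
          "about":      "I like to collect rock albums",
          "interests":  [ "music" ]
        }
      }
    ]
  }
}
```

- <1><2> The relevance scores.

By default, Elasticsearch sorts the result set by relevance score, that is, by how well each document matches the query. The first and highest-scoring result is obvious: the `about` field of `John Smith` clearly says **"rock climbing"**. But why does `Jane Smith` appear in the results as well? The reason is that **"rock"** is mentioned in her `about` field. Because only **"rock"** is mentioned and **"climbing"** is not, her `_score` is lower than John's.

This example shows how Elasticsearch performs full-text search across text fields and returns the most relevant results first. The concept of **relevance** is very important in Elasticsearch, and it is quite foreign to traditional relational databases, where a record either matches a query or it does not.

## Phrase search

So far we can search for individual words in a field, which is fine, but sometimes you want to match an exact sequence of words, or **phrase**. For example, we may want to find only employee records that contain both "rock" and "climbing", with the two words next to each other. To do this, we only need to change the `match` query to a `match_phrase` query:

```Javascript
GET /megacorp/employee/_search
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  }
}
```

As expected, this query returns only John Smith's document:

```Javascript
{
  ...
  "hits": {
    "total":     1,
    "max_score": 0.23013961,
    "hits": [
      {
        ...
        "_score": 0.23013961,
        "_source": {
          "first_name": "John",
          "last_name":  "Smith",
          "age":        25,
          "about":      "I love to go rock climbing",
          "interests":  [ "sports", "music" ]
        }
      }
    ]
  }
}
```

## Highlighting our searches

Many applications like to **highlight** the matched keywords in each search result, so that users can see why a document matched their query. Highlighting snippets is easy in Elasticsearch. Let's add a `highlight` parameter to the previous request:

```Javascript
GET /megacorp/employee/_search
{
  "query" : {
    "match_phrase" : {
      "about" : "rock climbing"
    }
  },
  "highlight": {
    "fields" : {
      "about" : {}
    }
  }
}
```

When we run this request, it matches the same result as before, but the response now has a new section called `highlight`. It contains the text from the `about` field, with the matched words wrapped in `<em></em>` tags.

```Javascript
{
  ...
  "hits": {
    "total":     1,
    "max_score": 0.23013961,
    "hits": [
      {
        ...
        "_score": 0.23013961,
        "_source": {
          "first_name": "John",
          "last_name":  "Smith",
          "age":        25,
          "about":      "I love to go rock climbing",
          "interests":  [ "sports", "music" ]
        },
        "highlight": {
          "about": [
            "I love to go <em>rock</em> <em>climbing</em>" <1>
          ]
        }
      }
    ]
  }
}
```

- <1> The highlighted fragment of the original text.

You can read more about highlighting search snippets in the highlighting chapter.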