Chapter 7: Query Basics (Part 1) · Elasticsearch 6.x Study Notes

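## **Elasticsearch queries**

There are two ways to send a search request to ES:

* the lightweight URI search, where the query is passed as URL parameters;
* a full JSON request body written in the structured Query DSL.

The DSL is more readable and more expressive for anything beyond a trivial query, so it is what most people use. A DSL query is sent as a JSON body (with POST in these notes). Because the body is free-form JSON, it is very flexible, and the same query can be written in several shapes.

One thing to watch out for: many JSON examples in the official documentation show only the query clause itself. They cannot be pasted in as-is. You usually need to wrap them in an outer object under the `query` key.

## **URI search**

Official documentation: [URI Search](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html)

![](https://box.kancloud.cn/8d8ea3e39e5b61c5dadf8967b5ec7037_620x203.png)

A search can be expressed entirely with URL query parameters. The most common parameters are:

* q: the query string;
* df: the default field to search when the query string does not name a field;
* sort: the sort order;
* timeout: the search timeout;
* from, size: pagination (offset of the first hit, and number of hits to return).

For example:

```
# Find documents whose user field contains alfred, sorted by age ascending, returning hits 5 to 14.
# If the search has not finished after 1s, it times out and returns the hits collected so far.
GET /my_index/_search?q=alfred&df=user&sort=age:asc&from=4&size=10&timeout=1s
```

## **Request body search**

Here the query is sent in the JSON request body.

### **(1) match query**

The match query is sometimes loosely called a "fuzzy" query, but it is unrelated to Elasticsearch's dedicated `fuzzy` query. It works in two steps:

1. It runs the search text through the field's analyzer to split it into terms.
2. It matches each resulting term against the field. By default, a document matches if it contains any one of those terms.

match has two close relatives, match_phrase and multi_match. Example:

```
# Create the index and load sample data
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "title": {
          "type": "text"
        },
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "name": "張三",
  "title": "我的寶馬有222馬力"
}

PUT my_index/_doc/2
{
  "name": "李四",
  "title": "我的奧迪有220馬力"
}

PUT my_index/_doc/3
{
  "name": "王五",
  "title": "我的瑪莎拉蒂有250馬力"
}

# match query
POST my_index/_doc/_search
{
  "query": {
    "match": {
      "title": "寶馬瑪力"
    }
  },
  "highlight": {
    "pre_tags": "<tag1>",
    "post_tags": "</tag1>",
    "fields": {
      "title": {}
    }
  }
}

# Response
{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.970927,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.970927,
        "_source": {
          "name": "張三",
          "title": "我的寶馬有222馬力"
        },
        "highlight": {
          "title": [
            "我的<tag1>寶</tag1><tag1>馬</tag1>有222<tag1>馬</tag1><tag1>力</tag1>"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.8630463,
        "_source": {
          "name": "王五",
          "title": "我的瑪莎拉蒂有250馬力"
        },
        "highlight": {
          "title": [
            "我的<tag1>瑪</tag1>莎拉蒂有250<tag1>馬</tag1><tag1>力</tag1>"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "name": "李四",
          "title": "我的奧迪有220馬力"
        },
        "highlight": {
          "title": [
            "我的奧迪有220<tag1>馬</tag1><tag1>力</tag1>"
          ]
        }
      }
    ]
  }
}
```

Explanation: title uses the default standard analyzer, which splits Chinese text into single characters. The query text "寶馬瑪力" is therefore broken into the terms "寶", "馬", "瑪" and "力", and each term is matched separately. Every document that contains at least one of them is returned. That is why all three documents match, even though none contains "寶馬瑪力" as written.

### **(2) match_phrase query (phrase matching)**

Like match, match_phrase first analyzes the query string into a list of terms. It then searches for all of those terms, but keeps only documents that meet two conditions:

* the document contains every one of the terms;
* the terms sit at adjacent positions, in the same order as in the query.

The adjacency requirement can be relaxed with slop, described below.

```
# Add one more document
PUT my_index/_doc/5
{
  "name": "陳六",
  "title": "我的寶瑪有250馬力"
}

# Search
POST my_index/_doc/_search
{
  "query": {
    "match_phrase": {
      "title": "寶馬瑪力"
    }
  },
  "highlight": {
    "pre_tags": "<h1>",
    "post_tags": "</h1>",
    "fields": {
      "title": {}
    }
  }
}

# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
```

Explanation: no document contains the terms 寶, 馬, 瑪, 力 next to each other in that order, so nothing is returned.

An exact phrase match can be too strict. To allow some leeway, use the slop parameter. It sets how many positions the terms may be moved and still count as a phrase match. Slop does not let a term be missing: every term must still be present. For example:

```
# Add two more documents
PUT my_index/_doc/6
{
  "name": "陳六",
  "title": "我的寶馬的瑪力有250馬力"
}

PUT my_index/_doc/7
{
  "name": "陳六",
  "title": "我的寶馬的李瑪力有250馬力"
}

# Search
POST my_index/_doc/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "寶馬瑪力",
        "slop": 1
      }
    }
  },
  "highlight": {
    "pre_tags": "<h1>",
    "post_tags": "</h1>",
    "fields": {
      "title": {}
    }
  }
}

# Response
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0359334,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "6",
        "_score": 1.0359334,
        "_source": {
          "name": "陳六",
          "title": "我的寶馬的瑪力有250馬力"
        },
        "highlight": {
          "title": [
            "我的<h1>寶</h1><h1>馬</h1>的<h1>瑪</h1><h1>力</h1>有250馬力"
          ]
        }
      }
    ]
  }
}
```

Explanation:

* Document 6, "我的寶馬的瑪力有250馬力", contains all of the query terms. The extra 的 puts 瑪力 only one position away from where the phrase expects it, so it matches with slop 1.
* Document 7 also has an extra 李, which puts 瑪力 two positions away. It would need a slop of at least 2.

### **(3) multi_match query**

To search the same text across several fields, so that a document matches when any one of those fields matches, use multi_match:

```
# Add one more document
PUT my_index/_doc/9
{
  "name": "瑪力",
  "title": "我有一輛紅旗"
}

# Search
POST my_index/_doc/_search
{
  "query": {
    "multi_match": {
      "query": "瑪力",
      "fields": ["title", "name"]
    }
  },
  "highlight": {
    "pre_tags": "<h1>",
    "post_tags": "</h1>",
    "fields": {
      "title": {}
    }
  }
}

# Response
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "9",
        "_score": 0.2876821,
        "_source": {
          "name": "瑪力",
          "title": "我有一輛紅旗"
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "張三",
          "title": "我的寶馬有222馬力"
        },
        "highlight": {
          "title": [
            "我的寶馬有222馬<h1>力</h1>"
          ]
        }
      }
    ]
  }
}
```

Note: this response appears to have been captured when only documents 1 and 9 were in the index. If documents 2, 3 and 5 to 7 from the earlier examples are still indexed, they match as well, because each of their titles contains 力.

multi_match also raises the question of how the scores from different fields are combined. The type parameter controls this:

* best_fields (the default): use this when documents that match the query well within a single field should score highest. The score comes from the best-matching field.
* most_fields: use this when documents that match in more fields should score higher. The scores of all matching fields are combined.
* cross_fields: use this when the terms of the query are expected to be spread across different fields. The fields are treated as one combined field.

```
# Same query with the type set explicitly
POST my_index/_doc/_search
{
  "query": {
    "multi_match": {
      "query": "瑪力",
      "fields": ["title", "name"],
      "type": "best_fields"
    }
  },
  "highlight": {
    "pre_tags": "<h1>",
    "post_tags": "</h1>",
    "fields": {
      "title": {}
    }
  }
}
```

## **term query**

A term query looks for an exact term. The search value is not run through an analyzer, and the field's indexed terms must contain that whole value. So before using term, check whether the field is analyzed. text fields are analyzed, and a string field created by dynamic mapping is a text field by default.

```
DELETE my_index

PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "title": {
          "type": "text"
        },
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "name": "張三",
  "title": "我的寶馬有222馬力"
}

# Search
POST my_index/_doc/_search
{
  "query": {
    "term": {
      "title": "寶馬"
    }
  }
}

# Response
{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
```

**Because title is a text field, it is analyzed: its value is split into terms before being stored, and "寶馬" is never stored as a single term. So the term query finds nothing.**

```
POST my_index/_doc/_search
{
  "query": {
    "term": {
      "name": "張三"
    }
  }
}

# Response
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "張三",
          "title": "我的寶馬有222馬力"
        }
      }
    ]
  }
}
```

**name, on the other hand, is a keyword field. Its value is not split but stored as-is, so the exact term matches.**

Note: if you want a term query to find Chinese words in a text field, index the field with a word-level Chinese analyzer such as ik_max_word. This analyzer is provided by the IK analysis plugin, which has to be installed separately. The term query then matches only if the analyzer emits "寶馬" as a single token.

```
DELETE my_index

PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "name": "張三",
  "title": "我的寶馬有222馬力"
}

# Search
POST my_index/_doc/_search
{
  "query": {
    "term": {
      "title": "寶馬"
    }
  }
}

# Response
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "張三",
          "title": "我的寶馬有222馬力"
        }
      }
    ]
  }
}
```

## **bool compound query: must, should, must_not**

Some requirements combine several conditions, for example "name is 寶馬, but title does not contain 寶馬". For these you need a bool compound query, which combines clauses under three keywords:

* must: the document must match every clause under must.
* should: holds one or more clauses.
  * When the bool query has no must clause, the document must match at least one should clause.
  * When must is present, should clauses are optional by default and only raise the score (unless minimum_should_match is set).
* must_not: the document must not match any of these clauses.

The example below continues with the ik_max_word index from the previous section:

```
PUT my_index/_doc/2
{
  "name": "寶馬",
  "title": "我的寶馬x5有260馬力"
}

PUT my_index/_doc/3
{
  "name": "寶馬",
  "title": "我的奧迪有260馬力"
}

# Search
POST my_index/_doc/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "name": "寶馬"
        }
      },
      "must_not": {
        "term": {
          "title": "寶馬"
        }
      }
    }
  }
}

# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "name": "寶馬",
          "title": "我的奧迪有260馬力"
        }
      }
    ]
  }
}
```

Explanation:

* Document 1 fails the must clause, because its name is 張三.
* Document 2 is excluded by must_not, because its title contains 寶馬.
* Only document 3 remains.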
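To make the must / should / must_not rules above concrete, here is a toy Python model of how a bool query decides whether a document matches. It is only a sketch, not Elasticsearch code, and it rests on several assumptions:

* Documents are plain dicts mapping each field to a set of already-analyzed terms.
* The title tokens are assumed, roughly what a word-level analyzer like ik_max_word might emit, not real analyzer output.
* The helper names `term`, `bool_query` and `search` are hypothetical.
* Scoring is ignored entirely.
* The default for `minimum_should_match` follows the rule described above: 1 when there is no must clause, otherwise 0.

```python
from typing import Callable, Dict, List, Optional, Set

# Toy model, NOT Elasticsearch code: each field maps to its set of indexed
# terms, i.e. analysis is assumed to have already happened.
Doc = Dict[str, Set[str]]
Clause = Callable[[Doc], bool]

# Hypothetical documents mirroring the bool example; title tokens are assumed.
DOCS: Dict[str, Doc] = {
    "1": {"name": {"張三"}, "title": {"我", "的", "寶馬", "有", "222", "馬力"}},
    "2": {"name": {"寶馬"}, "title": {"我", "的", "寶馬", "x5", "有", "260", "馬力"}},
    "3": {"name": {"寶馬"}, "title": {"我", "的", "奧迪", "有", "260", "馬力"}},
}


def term(field: str, value: str) -> Clause:
    """Like a term query: the exact, unanalyzed value must be an indexed term."""
    return lambda doc: value in doc.get(field, set())


def bool_query(must: Optional[List[Clause]] = None,
               should: Optional[List[Clause]] = None,
               must_not: Optional[List[Clause]] = None,
               minimum_should_match: Optional[int] = None) -> Clause:
    """Match-only model of a bool query (no scores are computed)."""
    must, should, must_not = must or [], should or [], must_not or []
    if minimum_should_match is None:
        # Assumed default: at least one should clause is required only when
        # there is no must clause; otherwise should clauses are optional.
        minimum_should_match = 1 if should and not must else 0

    def matches(doc: Doc) -> bool:
        if not all(clause(doc) for clause in must):
            return False  # every must clause has to match
        if any(clause(doc) for clause in must_not):
            return False  # a single must_not match excludes the document
        should_hits = sum(1 for clause in should if clause(doc))
        return should_hits >= minimum_should_match

    return matches


def search(query: Clause) -> List[str]:
    """Return the ids of matching documents, in insertion order."""
    return [doc_id for doc_id, doc in DOCS.items() if query(doc)]


# The query from the bool example: name is 寶馬, title does not contain 寶馬.
print(search(bool_query(must=[term("name", "寶馬")],
                        must_not=[term("title", "寶馬")])))  # ['3']
# should only: at least one should clause must match.
print(search(bool_query(should=[term("name", "張三"), term("title", "奧迪")])))  # ['1', '3']
# must + should: should is optional here, so it filters nothing out.
print(search(bool_query(must=[term("name", "寶馬")],
                        should=[term("title", "奧迪")])))  # ['2', '3']
```

The last two calls show the two roles of should:

* Without must, should acts as a filter.
* Alongside must, it filters nothing. In real Elasticsearch it only affects relevance, which this toy model does not compute.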