Multi Match Query · Elasticsearch 5.4 Chinese Documentation

# Multi Match Query

Original: [https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html)

Translation: [http://www.le.wiki/pages/viewpage.action?pageId=4883323](http://www.le.wiki/pages/viewpage.action?pageId=4883323)

Contributor: @羊兩頭

## Multi Match Query

The `multi_match` query builds on the `match` query to allow multi-field queries:

```
GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "this is a test", (1)
      "fields": [ "subject", "message" ] (2)
    }
  }
}
```

(1) The query string.

(2) The fields to be queried.

## Fields and per-field boosting

Fields can be specified with wildcards, for example:

```
GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "Will Smith",
      "fields": [ "title", "*_name" ] (1)
    }
  }
}
```

(1) Queries the `title`, `first_name` and `last_name` fields.

Individual fields can be boosted with the caret (`^`) notation:

```
GET /_search
{
  "query": {
    "multi_match" : {
      "query" : "this is a test",
      "fields" : [ "subject^3", "message" ] (1)
    }
  }
}
```

(1) The `subject` field is three times as important as the `message` field.

## Types of `multi_match` query

The way the `multi_match` query is executed internally depends on the `type` parameter, which can be set to:

* `best_fields`: (default) Finds documents that match any field, but uses the score from the best field. See [`best_fields`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fields "best_fields").
* `most_fields`: Finds documents that match any field and combines the score from each field. See [`most_fields`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-most-fields "most_fields").
* `cross_fields`: Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. See [`cross_fields`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-cross-fields "cross_fields").
* `phrase`: Runs a `match_phrase` query on each field and combines the score from each field. See [`phrase` and `phrase_prefix`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-phrase "phrase and phrase_prefix").
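* `phrase_prefix`: Runs a `match_phrase_prefix` query on each field and combines the score from each field. See [`phrase` and `phrase_prefix`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-phrase "phrase and phrase_prefix").

## best_fields

The `best_fields` type is most useful when you are searching for multiple words that are best found in the same field. For instance, "brown fox" in a single field is more meaningful than "brown" in one field and "fox" in another.

```
GET /_search
{
  "query": {
    "multi_match" : {
      "query":       "brown fox",
      "type":        "best_fields",
      "fields":      [ "subject", "message" ],
      "tie_breaker": 0.3
    }
  }
}
```

This is equivalent to executing:

```
GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "subject": "brown fox" }},
        { "match": { "message": "brown fox" }}
      ],
      "tie_breaker": 0.3
    }
  }
}
```

Normally the `best_fields` type uses the score of the single best matching field, but if `tie_breaker` is specified, the score is calculated as follows:

* the score from the best matching field
* plus `tie_breaker * _score` for every other matching field

Also accepts `analyzer`, `boost`, `operator`, `minimum_should_match`, `fuzziness`, `lenient`, `prefix_length`, `max_expansions`, `rewrite`, `zero_terms_query` and `cutoff_frequency`, as explained in the `match` query.

### `operator` and `minimum_should_match`

The `best_fields` and `most_fields` types are field-centric: they generate one `match` query per field. This means that the `operator` and `minimum_should_match` parameters are applied to each field individually, which is probably not what you want.

Take this query as an example:

```
GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "Will Smith",
      "type":     "best_fields",
      "fields":   [ "first_name", "last_name" ],
      "operator": "and" (1)
    }
  }
}
```

(1) All terms must be present.

This query is executed as:

    (+first_name:will +first_name:smith) | (+last_name:will +last_name:smith)

In other words, all terms must be present in a single field for a document to match. See `cross_fields` for a better solution.

## most_fields

The `most_fields` type is most useful when querying multiple fields that contain the same text analyzed in different ways. For instance, the main field may contain synonyms, stemming and terms without diacritics. A second field may contain the original terms, and a third field may contain shingles. By combining the scores from all three fields, we can match as many documents as possible with the main field, while using the second and third fields to push the most similar results to the top of the list.

This query:

```
GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "quick brown fox",
      "type":   "most_fields",
      "fields": [ "title", "title.original", "title.shingles" ]
    }
  }
}
```

is equivalent to executing:

```
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":          "quick brown fox" }},
        { "match": { "title.original": "quick brown fox" }},
        { "match": { "title.shingles": "quick brown fox" }}
      ]
    }
  }
}
```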
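The two rewrites shown so far (`dis_max` for `best_fields`, `bool`/`should` for `most_fields`) can be sketched as a small Python helper. This is only an illustration of the equivalence described above, not how Elasticsearch implements it: `expand_multi_match` is a hypothetical name, and the sketch ignores wildcards and `^` boosts in field names.

```python
import json


def expand_multi_match(query, fields, match_type="best_fields", tie_breaker=None):
    """Build the query that a field-centric multi_match is described as equivalent to.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    # One match clause per field: this is what makes both types field-centric.
    clauses = [{"match": {field: query}} for field in fields]

    if match_type == "best_fields":
        dis_max = {"queries": clauses}
        if tie_breaker is not None:
            dis_max["tie_breaker"] = tie_breaker
        return {"dis_max": dis_max}
    if match_type == "most_fields":
        return {"bool": {"should": clauses}}
    raise ValueError(f"unsupported type for this sketch: {match_type}")


# Reproduce the best_fields example from the text.
best = expand_multi_match("brown fox", ["subject", "message"], "best_fields", 0.3)
print(json.dumps({"query": best}, indent=2))
```

Because every clause is a separate `match` query, any `operator` or `minimum_should_match` setting would be evaluated inside each clause, which is the per-field behaviour the text warns about.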
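The scores from each `match` clause are added together, then divided by the number of `match` clauses.

Also accepts `analyzer`, `boost`, `operator`, `minimum_should_match`, `fuzziness`, `lenient`, `prefix_length`, `max_expansions`, `rewrite`, `zero_terms_query` and `cutoff_frequency`, as explained in the `match` query, but see `operator` and `minimum_should_match`.

## `phrase` and `phrase_prefix`

The `phrase` and `phrase_prefix` types behave just like `best_fields`, but they use a `match_phrase` or `match_phrase_prefix` query instead of a `match` query.

This query:

```
GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "quick brown f",
      "type":   "phrase_prefix",
      "fields": [ "subject", "message" ]
    }
  }
}
```

is equivalent to executing:

```
GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match_phrase_prefix": { "subject": "quick brown f" }},
        { "match_phrase_prefix": { "message": "quick brown f" }}
      ]
    }
  }
}
```

Also accepts `analyzer`, `boost`, `lenient`, `slop` and `zero_terms_query`, as explained in the `match` query. The `phrase_prefix` type additionally accepts `max_expansions`.

### `phrase`, `phrase_prefix` and `fuzziness`

The `fuzziness` parameter cannot be used with the `phrase` or `phrase_prefix` type.

## `cross_fields`

The `cross_fields` type is particularly useful with structured documents where multiple fields should match. For instance, when querying the `first_name` and `last_name` fields for "Will Smith", the best match is likely to have "Will" in one field and "Smith" in the other.

This sounds like a job for `most_fields`, but there are two problems with that approach. The first problem is that `operator` and `minimum_should_match` are applied per field, instead of per term (see the explanation above).

The second problem is to do with relevance: the different term frequencies in the `first_name` and `last_name` fields can produce unexpected results. For instance, imagine we have two people: "Will Smith" and "Smith Jones". "Smith" as a last name is very common (and so is of low importance), but "Smith" as a first name is very uncommon (and so is of great importance).

If we search for "Will Smith", the "Smith Jones" document will probably appear above the better matching "Will Smith", because the score of `first_name:smith` has trumped the combined scores of `first_name:will` plus `last_name:smith`.

One way of dealing with these types of queries is simply to index the `first_name` and `last_name` fields into a single `full_name` field. Of course, this can only be done at index time.

The `cross_fields` type tries to solve these problems at query time by taking a term-centric approach. It first analyzes the query string into individual terms, then looks for each term in any of the fields, as though they were one big field.

A query like:

```
GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "Will Smith",
      "type":     "cross_fields",
      "fields":   [ "first_name", "last_name" ],
      "operator": "and"
    }
  }
}
```

is executed as:

    +(first_name:will last_name:will) +(first_name:smith last_name:smith)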
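The difference between the field-centric and the term-centric expression can be sketched in a few lines of Python. This is a toy model that only decides whether a document matches under `"operator": "and"`. It assumes lowercasing plus whitespace splitting as a stand-in for the analyzer, and the function names are hypothetical.

```python
def tokens(doc, field):
    """Toy analyzer (assumption): lowercase the field value and split on whitespace."""
    return doc.get(field, "").lower().split()


def field_centric_and(doc, terms, fields):
    """best_fields / most_fields with operator "and":
    some single field must contain every term."""
    return any(all(t in tokens(doc, f) for t in terms) for f in fields)


def term_centric_and(doc, terms, fields):
    """cross_fields with operator "and":
    every term must appear in at least one of the fields."""
    return all(any(t in tokens(doc, f) for f in fields) for t in terms)


doc = {"first_name": "Will", "last_name": "Smith"}
terms = ["will", "smith"]
fields = ["first_name", "last_name"]

print(field_centric_and(doc, terms, fields))  # False: no single field holds both terms
print(term_centric_and(doc, terms, fields))   # True: each term is found in some field
```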
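In other words, all terms must be present in at least one field for a document to match. (Compare this to the logic used for `best_fields` and `most_fields`.)

That solves one of the two problems. The problem of differing term frequencies is solved by blending the term frequencies of all fields in order to even out the differences.

In practice, `first_name:smith` will be treated as though it has the same frequencies as `last_name:smith`, plus one. This makes matches on `first_name` and `last_name` have comparable scores, with a tiny advantage for `last_name`, since it is the field most likely to contain `smith`.

Note that `cross_fields` is usually only useful on short string fields that all have a boost of 1. Otherwise boosts, term frequencies and length normalization contribute to the score in such a way that the blending of term statistics is no longer meaningful.

If you run the above query through the Validate API, it returns this explanation:

    +blended("will", fields: [first_name, last_name])
    +blended("smith", fields: [first_name, last_name])

Also accepts `analyzer`, `boost`, `operator`, `minimum_should_match`, `lenient`, `zero_terms_query` and `cutoff_frequency`, as explained in the `match` query.

## `cross_fields` and analysis

The `cross_fields` type can only work in term-centric mode on fields that have the same analyzer. Fields with the same analyzer are grouped together, as in the example above. If there are multiple groups, they are combined with a `bool` query.

For instance, if we have a `first` and a `last` field which share the same analyzer, plus a `first.edge` and a `last.edge` field which both use an `edge_ngram` analyzer, this query:

```
GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "Jon",
      "type":   "cross_fields",
      "fields": [ "first", "first.edge", "last", "last.edge" ]
    }
  }
}
```

is executed as:

    blended("jon", fields: [first, last])
    | (
        blended("j",   fields: [first.edge, last.edge])
        blended("jo",  fields: [first.edge, last.edge])
        blended("jon", fields: [first.edge, last.edge])
    )

In other words, `first` and `last` are grouped together and treated as a single field, and `first.edge` and `last.edge` are grouped together and treated as a single field.

Having multiple groups is fine, but when combined with `operator` or `minimum_should_match`, it can suffer from the same problem as `most_fields` or `best_fields`.

You can easily rewrite this query yourself as two separate `cross_fields` queries combined with a `bool` query, and apply the `minimum_should_match` parameter to just one of them:

```
GET /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match" : {
            "query":  "Will Smith",
            "type":   "cross_fields",
            "fields": [ "first", "last" ],
            "minimum_should_match": "50%" (1)
          }
        },
        {
          "multi_match" : {
            "query":  "Will Smith",
            "type":   "cross_fields",
            "fields": [ "*.edge" ]
          }
        }
      ]
    }
  }
}
```

(1) Either `will` or `smith` must be present in either the `first` or the `last` field.

You can force all fields into the same group by specifying the `analyzer` parameter in the query:

```
GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "Jon",
      "type":     "cross_fields",
      "analyzer": "standard", (1)
      "fields":   [ "first", "last", "*.edge" ]
    }
  }
}
```

(1) Use the `standard` analyzer for all fields.

This is executed as:

    blended("jon", fields: [first, first.edge, last.edge, last])

## `tie_breaker`

By default, each per-term blended query uses the best score returned by any field in a group, and these scores are then added together to give the final score. The `tie_breaker` parameter changes this default behaviour of the per-term blended queries. It accepts:

* `0.0`: Take the single best score out of (for example) `first_name:will` and `last_name:will` (default).
* `1.0`: Add together the scores for (for example) `first_name:will` and `last_name:will`.
* `0.0 < n < 1.0`: Take the single best score plus `tie_breaker` multiplied by each of the scores from the other matching fields.

## `cross_fields` and `fuzziness`

The `fuzziness` parameter cannot be used with the `cross_fields` type.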
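The `tie_breaker` arithmetic for per-term blended queries, described in the `tie_breaker` section, can be sketched as follows. This is a minimal model under stated assumptions: the per-field scores are invented numbers rather than real Elasticsearch output, and both function names are hypothetical.

```python
def blended_term_score(field_scores, tie_breaker=0.0):
    """Combine the scores one term received from the fields of a group.

    tie_breaker = 0.0 keeps only the best score, 1.0 adds all scores,
    and values in between add a fraction of the non-best scores.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others


def cross_fields_score(per_term_scores, tie_breaker=0.0):
    """Add the blended per-term scores together to give the final score."""
    return sum(blended_term_score(s, tie_breaker) for s in per_term_scores.values())


# Illustrative scores only: [first_name score, last_name score] for each term.
scores = {"will": [1.5, 0.25], "smith": [0.5, 2.0]}

print(cross_fields_score(scores, tie_breaker=0.0))  # 3.5   (1.5 + 2.0)
print(cross_fields_score(scores, tie_breaker=1.0))  # 4.25  (1.75 + 2.5)
print(cross_fields_score(scores, tie_breaker=0.5))  # 3.875 (1.625 + 2.25)
```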