<!-- 秀川 -->
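### Combining Queries
In *Combining Filters* we discussed how to use the `bool` filter to combine multiple filter clauses with `and`, `or`, and `not` logic. In query land, the `bool` query does a similar job, but with one important difference.
Filters make a binary decision: should this document be included in the results list or not? Queries are more subtle. They decide not only whether to include a document, but also how _relevant_ that document is.
Like its filter equivalent, the `bool` query accepts multiple query clauses under the `must`, `must_not`, and `should` parameters. For example: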
```Javascript
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "dog"   }}
      ]
    }
  }
}
```
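The results from the preceding query include any document whose `title` field contains the term `quick`, except for those that also contain `lazy`. So far, this is pretty similar to how the `bool` filter works.
The difference comes in with the two `should` clauses: a document is _not required_ to contain either `brown` or `dog`, but if it does, it is considered _more relevant_:
```Javascript
{
  "hits": [
    {
      "_id": "3",
      "_score": 0.70134366, <1>
      "_source": {
        "title": "The quick brown fox jumps over the quick dog"
      }
    },
    {
      "_id": "1",
      "_score": 0.3312608,
      "_source": {
        "title": "The quick brown fox"
      }
    }
  ]
}
```
<1> Document 3 scores higher because it contains both `brown` and `dog`.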
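#### Score Calculation
The `bool` query calculates the relevance `_score` for each document by adding together the `_score` from all of the matching `must` and `should` clauses, and then dividing by the total number of `must` and `should` clauses.
The `must_not` clauses do not affect the score; their only purpose is to exclude documents that might otherwise have been included.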
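
The arithmetic described above can be sketched in a few lines of Python. This is a literal reading of the description in this section, not Lucene's actual scoring code; the function name `bool_score` and the per-clause scores are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def bool_score(
    must_scores: Sequence[Optional[float]],
    should_scores: Sequence[Optional[float]],
) -> float:
    """Simplified bool scoring, following the description in the text.

    Each entry is the _score of one clause, or None if that clause did not
    match the document. must_not clauses are left out because they never
    contribute to the score.
    """
    clause_scores = list(must_scores) + list(should_scores)
    if not clause_scores:
        return 0.0
    # Add up the scores of the clauses that matched...
    matched_total = sum(s for s in clause_scores if s is not None)
    # ...and divide by the total number of must and should clauses.
    return matched_total / len(clause_scores)


# Hypothetical per-clause scores (assumptions, not real Elasticsearch output).
both_should = bool_score(must_scores=[0.5], should_scores=[0.25, 0.25])
one_should = bool_score(must_scores=[0.5], should_scores=[0.25, None])
print(both_should > one_should)  # True
```

Under this simplified model, a document that matches more `should` clauses ends up with a higher score because the divisor stays the same while the sum grows.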
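#### Controlling Precision
All the `must` clauses must match, and all the `must_not` clauses must not match, but how many `should` clauses should match? By default, none of the `should` clauses are required to match, with one exception: if there are no `must` clauses, then at least one `should` clause must match.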
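
The default matching rule can be illustrated with a toy model. The sketch below assumes each clause is a single term and that "matching" means the term appears in a lowercased, whitespace-split title; the helper names `tokens` and `bool_matches` are hypothetical and are not part of any Elasticsearch API.

```python
def tokens(title):
    """Toy analyzer (an assumption): lowercase and split on whitespace."""
    return set(title.lower().split())


def bool_matches(title, must=(), must_not=(), should=()):
    """Apply the default bool matching rule described in the text."""
    words = tokens(title)
    # Every must clause has to match.
    if not all(term in words for term in must):
        return False
    # No must_not clause may match.
    if any(term in words for term in must_not):
        return False
    # With no must clauses, at least one should clause has to match.
    if not must and should:
        return any(term in words for term in should)
    # Otherwise should clauses are optional.
    return True


# should clauses are optional when a must clause is present.
print(bool_matches("The quick cat", must=["quick"], should=["brown", "dog"]))  # True
# Without a must clause, at least one should clause is required.
print(bool_matches("The quick cat", should=["brown", "dog"]))  # False
```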
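Just as we can control the precision of the `match` query, we can control how many `should` clauses need to match by using the `minimum_should_match` parameter, either as an absolute number or as a percentage:
```Javascript
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "fox"   }},
        { "match": { "title": "dog"   }}
      ],
      "minimum_should_match": 2 <1>
    }
  }
}
```
<1> This could also be expressed as a percentage.
The results would include only documents whose `title` field contains `"brown" AND "fox"`, `"brown" AND "dog"`, or `"fox" AND "dog"`. If a document contains all three, it is considered more relevant than those that contain just two of the three.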
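
To make the absolute-number and percentage forms concrete, here is a minimal sketch of how such a value could be turned into a required clause count. It assumes percentages are rounded down, as the Elasticsearch `minimum_should_match` documentation describes, and it covers only the two forms mentioned here. The function names and the whitespace-based matching are hypothetical simplifications.

```python
def required_should_count(minimum_should_match, total_should):
    """Turn a minimum_should_match value into a number of clauses.

    Handles only a non-negative integer (2) or a percentage string ("75%").
    Percentages are rounded down (assumption stated in the lead-in).
    """
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        # Integer arithmetic avoids floating-point rounding surprises.
        return (total_should * percent) // 100
    return int(minimum_should_match)


def enough_should_match(title, should_terms, minimum_should_match):
    """Toy check: count the should terms present in a whitespace-split title."""
    words = set(title.lower().split())
    matched = sum(1 for term in should_terms if term in words)
    return matched >= required_should_count(minimum_should_match, len(should_terms))


terms = ["brown", "fox", "dog"]
print(required_should_count("75%", len(terms)))                  # 2
print(enough_should_match("The quick brown fox", terms, 2))      # True
print(enough_should_match("The quick brown cat", terms, "75%"))  # False
```

With three `should` clauses, `"75%"` resolves to the same requirement as the absolute value `2` used in the request above.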