## Query Phase
During the initial _query phase_, the query is broadcast to a shard copy (a primary or replica shard) of every shard in the index. Each shard executes the search locally and builds a _priority queue_ of matching documents.
> #### Priority Queue
> A _priority queue_ is just a sorted list that holds the _top-n_ matching documents. The size of the priority queue depends on the pagination parameters `from` and `size`. For example, the following search request would require a priority queue big enough to hold 100 documents:
``` JavaScript
GET /_search
{
"from": 90,
"size": 10
}
```
The query phase process is depicted in Figure 1, "Query phase of distributed search".

![Query phase of distributed search](images/elas_0901.png)
Figure 1: Query phase of distributed search
The query phase consists of the following three steps:
1. The client sends a `search` request to `Node 3`, which creates an empty priority queue of size `from + size`.
2. `Node 3` forwards the search request to a primary or replica copy of every shard in the index. Each shard executes the query locally and adds the results to a local sorted priority queue of size `from + size`.
3. Each shard returns the doc IDs and sort values of all the documents in its priority queue to the coordinating node, `Node 3`, which merges these values into its own priority queue to produce a globally sorted list of results.
When a search request is sent to a node, that node becomes the coordinating node. It is the job of this node to broadcast the search request to all involved shards and to gather their responses into a globally sorted result set that it can return to the client.
The first step is to broadcast the request to a shard copy of every shard in the index. Just like document `GET` requests, search requests can be handled by a primary shard or by any of its replicas. This is how more replicas (when combined with more hardware) can increase search throughput. On subsequent requests, the coordinating node round-robins through all shard copies in order to spread the load.
Each shard executes the query locally and builds a sorted priority queue of length `from + size`; in other words, enough results to satisfy the global search request all by itself. It returns a lightweight list of results to the coordinating node, containing just the doc IDs and any values required for sorting, such as the `_score`.
The coordinating node merges these shard-level results into its own sorted priority queue, which represents the globally sorted result set. Here the query phase ends.
The whole process resembles a merge sort: each shard sorts its own results first, and the coordinating node then merges them together, which suits this distributed scenario well.
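To make the merge step concrete, here is a minimal Python sketch of the coordinating node's job (it is not Elasticsearch's actual implementation). It assumes each shard hands back a list of `(score, doc ID)` pairs already sorted by `_score` in descending order; the function and variable names (`merge_shard_results`, `shard_results`, `from_`) are made up for illustration:
``` python
import heapq
import itertools

def merge_shard_results(shard_results, from_, size):
    """Merge per-shard sorted hit lists into one global, sorted page.

    shard_results: one list per shard of (score, doc_id) tuples, each
    already sorted by score in descending order; this stands in for the
    shard's local priority queue of length from_ + size.
    """
    # n-way merge of the already-sorted shard lists, highest score first
    merged = heapq.merge(*shard_results,
                         key=lambda hit: hit[0], reverse=True)
    # keep only the global top from_ + size hits, then drop the first from_
    top = list(itertools.islice(merged, from_ + size))
    return top[from_:]

shard_1 = [(0.95, "doc_12"), (0.80, "doc_3"), (0.41, "doc_5")]
shard_2 = [(0.90, "doc_7"), (0.66, "doc_9"), (0.60, "doc_2")]
print(merge_shard_results([shard_1, shard_2], from_=0, size=3))
# [(0.95, 'doc_12'), (0.90, 'doc_7'), (0.80, 'doc_3')]
```
Because every shard already returns its own top `from + size` hits, the coordinating node never needs to look beyond the first `from + size` merged entries to build the final page.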
> ### Note
> An index can consist of one or more primary shards, so a search request against a single index needs to be able to combine the results from multiple shards. A search against _multiple_ or _all_ indices works in exactly the same way; there are just more shards involved.
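For instance, here is a hypothetical sketch using the official `elasticsearch-py` 8.x client against a local cluster (the index names are made up). A search across several indices is issued exactly like a single-index search; the query phase simply fans out to the shard copies of every index named:
``` python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Naming two indices just means more shards take part in the query phase;
# the coordinating node's merge works exactly as described above.
resp = es.search(
    index="index_one,index_two",   # hypothetical index names
    query={"match": {"title": "distributed search"}},
    from_=90,
    size=10,
)
print(resp["hits"]["hits"])
```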