## Fetch Phase
The query phase identifies which documents satisfy the search request, but we still need to retrieve those documents themselves. This is the job of the fetch phase, shown in Figure 2, _Fetch phase of distributed search_.

Figure 2. Fetch phase of distributed search

The fetch phase consists of the following steps:
1. The coordinating node identifies which documents need to be fetched and issues a multi `GET` request to the relevant shards.
2. Each shard loads the documents and _enriches_ them if required, then returns the documents to the coordinating node.
3. Once all documents have been fetched, the coordinating node returns the results to the client.
The coordinating node first decides which documents _actually_ need to be fetched. For instance, if our query specified `{ "from": 90, "size": 10 }`, the first 90 results would be discarded and only the next 10 would need to be retrieved. These documents may come from one, some, or all of the shards involved in the original search request.
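Concretely, a request for results 91 through 100 looks like this:

```
GET /_search
{
    "from": 90,
    "size": 10
}
```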
The coordinating node builds a multi-get request for each shard that holds a pertinent document, and sends the request to the same shard copy that handled the query phase.
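You never issue this internal request yourself, but its shape mirrors the user-facing `mget` API met earlier. As a rough sketch, with placeholder `website`/`blog` index and type names:

```
GET /_mget
{
    "docs": [
        { "_index": "website", "_type": "blog", "_id": "1" },
        { "_index": "website", "_type": "blog", "_id": "2" }
    ]
}
```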
The shard loads the document bodies (the `_source` field) and, if requested, enriches the results with metadata and search-snippet highlighting. Once the coordinating node receives all results, it assembles them into a single response that it returns to the client.
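For instance, a search that asks for highlighted snippets, which the shards generate during this fetch phase, might look like the following sketch (the `title` field name is only illustrative):

```
GET /_search
{
    "query":     { "match":  { "title": "fetch phase" } },
    "highlight": { "fields": { "title": {} } }
}
```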
### Deep Pagination
The query-then-fetch process supports pagination with the `from` and `size` parameters, but _within limits_. Remember that each shard must build a priority queue of length `from + size`, all of which needs to be passed back to the coordinating node. The coordinating node then has to sort through `number_of_shards * (from + size)` documents in order to find the correct `size` documents.
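To make the arithmetic concrete: on an index with 5 primary shards, requesting page 100 with `{ "from": 990, "size": 10 }` means each shard builds a priority queue of 990 + 10 = 1,000 entries, and the coordinating node sorts through 5 × 1,000 = 5,000 documents, only to discard 4,990 of them and return 10.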
Depending on the size of your documents, the number of shards, and the hardware you are using, paging 10,000 to 50,000 results deep (1,000 to 5,000 pages) should be perfectly doable. But with big-enough `from` values, the sorting process can become very heavy indeed, using vast amounts of CPU, memory, and bandwidth. For this reason, we strongly advise against deep paging.
In practice, "deep pagers" are seldom human anyway. A human will stop after two or three pages and change the search criteria instead. The culprits are usually bots or web spiders that tirelessly fetch page after page until your servers crumble at the knees.
If you _do_ need to fetch large numbers of documents from your cluster, you can do so efficiently by disabling sorting with the `scan` search type, which we discuss later in this chapter.
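As a preview, such a request might look like the sketch below; the index name `my_index` and the one-minute `scroll` timeout are placeholder choices, and the details are covered in the later discussion:

```
GET /my_index/_search?search_type=scan&scroll=1m
{
    "query": { "match_all": {} },
    "size":  1000
}
```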