### Single Query String
The `bool` query is the mainstay of multiclause queries. It works well in many cases, especially when you are able to map different query strings to individual fields.
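
As a rough sketch of that case, assume a search form with separate title and author inputs. The field names and search strings below are illustrative and not taken from the text. Each query string can be mapped to its own `match` clause inside a `bool` query. The request body is built as a plain Python dictionary, so it does not depend on any particular client library:

```python
import json


def multi_string_query(strings_by_field):
    """Build a bool query with one match clause per (field, query string) pair."""
    should = [
        {"match": {field: text}}
        for field, text in strings_by_field.items()
    ]
    return {"query": {"bool": {"should": should}}}


# Hypothetical form input: one query string per field.
form_body = multi_string_query({"title": "war and peace", "author": "leo tolstoy"})
print(json.dumps(form_body, indent=2))
```

The resulting dictionary could be sent as the body of a search request. Whether `should` or `must` clauses are appropriate depends on how strict the form is meant to be.
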
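The catch is that users now expect to type all of their search terms into a single box and have the application work out how to return the right results. It is somewhat ironic that a search form with multiple fields is known as Advanced Search: it may look more advanced to the user, but it is actually much simpler to implement.
There is no one-size-fits-all approach to multiword, multifield queries. To get the best results, you need to know your data and know how to use the appropriate tools.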
#### Know Your Data
When a single query string is the only input from the user, you will frequently run into three scenarios:
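##### 1. Best fields
When searching for words that represent a concept, such as "brown fox", the words mean more together than they do individually. Fields like `title` and `body` are related, but they also compete with each other. A document should contain as many of the search terms as possible in the same field, and its score should come from the best-matching field.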
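
One way to express the best-fields idea in Elasticsearch's query DSL is a `dis_max` query, which takes its score from the best-matching clause. The sketch below is only an illustration under that assumption: the helper name is hypothetical, and the `title` and `body` fields and the "brown fox" string are reused from the paragraph above.

```python
import json


def best_fields_query(text, fields, tie_breaker=0.0):
    """Build a dis_max body: one match clause per field, best clause wins.

    tie_breaker (0.0 to 1.0) lets the other matching clauses contribute
    a fraction of their score; 0.0 means only the best field counts.
    """
    return {
        "query": {
            "dis_max": {
                "queries": [{"match": {field: text}} for field in fields],
                "tie_breaker": tie_breaker,
            }
        }
    }


best_body = best_fields_query("brown fox", ["title", "body"])
print(json.dumps(best_body, indent=2))
```

With `tie_breaker` left at `0.0`, a document that matches both words in `title` is scored on `title` alone, regardless of what `body` contributes.
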
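##### 2. Most fields
A common technique for fine-tuning relevance is to index the same data into multiple fields, each with its own analysis chain.

The main field may contain words in their stemmed form, synonyms, and words stripped of their diacritics. It is used to match as many documents as possible.

The same text can then be indexed into other fields to provide more precise matching. One field may contain the unstemmed words, another the original words with their diacritics, and a third might use shingles to provide information about [word proximity](http://blog.csdn.net/dm_vincent/article/details/41800351).

These additional fields act as signals that raise the relevance score of each matching document. The more fields that match, the more relevant the document.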
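
A minimal sketch of a most-fields search, assuming the same text has been indexed into a main field plus two more precise variants. The names `title`, `title.unstemmed`, and `title.shingles` are hypothetical and would have to match your own mapping. Elasticsearch's `multi_match` query accepts `"type": "most_fields"`, which combines the scores of all matching fields:

```python
import json


def most_fields_query(text, fields):
    """Build a multi_match body that combines scores from every matching field."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",
                "fields": list(fields),
            }
        }
    }


# Hypothetical field names: a stemmed main field and two "signal" fields.
most_body = most_fields_query(
    "brown fox",
    ["title", "title.unstemmed", "title.shingles"],
)
print(json.dumps(most_body, indent=2))
```

The main field keeps recall high, while each additional field that also matches adds to the score of the document.
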
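##### 3. Cross fields
For some entities, the identifying information is spread across multiple fields, each of which contains only part of the whole:

* Person: `first_name` and `last_name`
* Book: `title`, `author`, and `description`
* Address: `street`, `city`, `country`, and `postcode`

In this case, we want to find as many words as possible in any of the listed fields. We need to search across multiple fields as if they were one big field.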
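
As a sketch of the cross-fields case, Elasticsearch's `multi_match` query also accepts `"type": "cross_fields"`, which treats the listed fields roughly as one combined field. The example uses the Person fields from the list above. The helper name, the search string, and the choice of the `and` operator are assumptions made for illustration:

```python
import json


def cross_fields_query(text, fields, operator="and"):
    """Build a multi_match body that searches several fields as if they were one."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": list(fields),
                "operator": operator,
            }
        }
    }


# Hypothetical search string for a person entity.
cross_body = cross_fields_query("peter smith", ["first_name", "last_name"])
print(json.dumps(cross_body, indent=2))
```

With `operator` set to `and`, every term must appear in at least one of the fields, so "peter" may match `first_name` while "smith" matches `last_name`.
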
All of these are multiword, multifield queries, but each calls for a different strategy. The rest of this chapter examines each strategy in turn.