
# analyzer

The values of [`analyzed`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/mapping-index.html) **string fields** are passed through an [`analyzer`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis.html), which converts the string into a stream of **tokens** or **terms**. For example, depending on the analyzer used, the string `"The quick Brown Foxes"` may be analyzed into the terms `quick`, `brown`, `fox`. These are the terms that are actually indexed for the field, which makes it possible to search efficiently for individual words inside large blocks of text.

Analysis happens not only at index time but also at query time: the query string has to be passed through the same (or a similar) analyzer, so that the terms it looks for have the same format as the terms stored in the index.

**Elasticsearch** ships with a number of [pre-defined analyzers](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-analyzers.html), which can be used without further configuration. It also ships with many [character filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-charfilters.html), [tokenizers](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-tokenizers.html), and [token filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-tokenfilters.html), which can be combined to configure custom analyzers per index.

Analyzers can be specified per query, per field, or per index. At index time, **Elasticsearch** looks for an analyzer in this order:

* The `analyzer` defined in the field mapping.
* An analyzer named `default` in the index settings.
* The [`standard`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-standard-analyzer.html) analyzer.

At query time, there are a few more layers:

* The `analyzer` defined in a [full-text query](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/full-text-queries.html).
* The `search_analyzer` defined in the field mapping.
* The `analyzer` defined in the field mapping.
* An analyzer named `default_search` in the index settings.
* An analyzer named `default` in the index settings.
* The [`standard`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-standard-analyzer.html) analyzer.

The easiest way to specify an analyzer for a particular field is to define it in the field mapping, as follows:
```
curl -XPUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
'

curl -XGET 'localhost:9200/my_index/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "field": "text",
  "text": "The quick Brown Foxes."
}
'

curl -XGET 'localhost:9200/my_index/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "field": "text.english",
  "text": "The quick Brown Foxes."
}
'
```

* The `text` field uses the default `standard` analyzer.
* The `text.english` multi-field uses the `english` analyzer, which removes **stop words** and applies **stemming**.
* The first `_analyze` request (field `text`) returns the tokens [`the`, `quick`, `brown`, `foxes`].
* The second `_analyze` request (field `text.english`) returns the tokens [`quick`, `brown`, `fox`].

## search_quote_analyzer

The `search_quote_analyzer` setting lets you specify an analyzer for phrases. This is particularly useful when you want to disable **stop words** for phrase queries.

To disable stop words for phrases, a field needs three analyzer settings:

1. An `analyzer` setting for indexing all **terms**, including **stop words**.
2. A `search_analyzer` setting for non-phrase queries, which removes **stop words**.
3. A `search_quote_analyzer` setting for phrase queries, which does not remove **stop words**.

```
curl -XPUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        },
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_analyzer",
          "search_analyzer": "my_stop_analyzer",
          "search_quote_analyzer": "my_analyzer"
        }
      }
    }
  }
}
'
```

```
PUT my_index/my_type/1
{
  "title": "The Quick Brown Fox"
}

PUT my_index/my_type/2
{
  "title": "A Quick Brown Fox"
}

GET my_index/my_type/_search
{
  "query": {
    "query_string": {
      "query": "\"the quick brown fox\""
    }
  }
}
```

* `my_analyzer` tokenizes all **terms**, including **stop words**.
* `my_stop_analyzer` removes **stop words**.
* The `analyzer` setting points to `my_analyzer`, which is used at index time.
* The `search_analyzer` setting points to `my_stop_analyzer`, which removes **stop words** from non-phrase queries.
* The `search_quote_analyzer` setting points to `my_analyzer`, which ensures that **stop words** are not removed from phrase queries.

Because the query string in the search request is wrapped in quotes, it is detected as a phrase query. The `search_quote_analyzer` therefore kicks in and ensures that the stop words are not removed from the query. The `my_analyzer` analyzer returns the tokens [`the`, `quick`, `brown`, `fox`], which match only one of the documents.

Non-phrase term queries, by contrast, are analyzed with `my_stop_analyzer`, which filters out **stop words**. A search for either `The quick brown fox` or `A quick brown fox` (without quotes) therefore returns both documents, because both contain the tokens [`quick`, `brown`, `fox`].

Without the `search_quote_analyzer`, exact matching for **phrase queries** would not be possible: the **stop words** would be removed from the phrase query as well, so both documents would match.
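
The matching behaviour described in this section can be reproduced with a small toy model. The sketch below is not how Elasticsearch is implemented. It assumes whitespace tokenization, a three-word stop list standing in for `_english_`, and OR semantics for non-phrase queries. The helpers `analyze`, `phrase_matches`, and `search` are hypothetical names introduced only for illustration.

```python
# Toy model of analyzer / search_analyzer / search_quote_analyzer.
# Assumptions: whitespace tokenization, a tiny stop list, OR matching for
# non-phrase queries. This only mimics the behaviour described above.

STOP_WORDS = {"a", "an", "the"}  # illustrative subset, not the real _english_ list


def analyze(text, remove_stop_words):
    """Lowercase and split on whitespace, returning (position, term) pairs.

    Removed stop words leave a gap in the positions, so the remaining
    terms keep their original positions.
    """
    tokens = []
    for position, word in enumerate(text.lower().split()):
        if remove_stop_words and word in STOP_WORDS:
            continue
        tokens.append((position, word))
    return tokens


def phrase_matches(doc_tokens, query_tokens):
    """True if the query terms occur in the document at the same relative positions."""
    if not query_tokens:
        return False
    doc = set(doc_tokens)
    first_pos = query_tokens[0][0]
    for start, _ in doc_tokens:
        if all((start + pos - first_pos, term) in doc for pos, term in query_tokens):
            return True
    return False


def search(index, query, use_search_quote_analyzer=True):
    """Return the sorted ids of matching documents.

    A query wrapped in double quotes is treated as a phrase query. With
    use_search_quote_analyzer=False, phrases fall back to the stop-word
    removing search analyzer.
    """
    is_phrase = len(query) >= 2 and query[0] == '"' and query[-1] == '"'
    if is_phrase:
        query_tokens = analyze(
            query.strip('"'), remove_stop_words=not use_search_quote_analyzer
        )
        return sorted(
            doc_id for doc_id, toks in index.items()
            if phrase_matches(toks, query_tokens)
        )
    query_terms = {term for _, term in analyze(query, remove_stop_words=True)}
    return sorted(
        doc_id for doc_id, toks in index.items()
        if query_terms & {term for _, term in toks}
    )


# Index time: keep every term, including stop words (the role of my_analyzer).
documents = {1: "The Quick Brown Fox", 2: "A Quick Brown Fox"}
index = {doc_id: analyze(title, remove_stop_words=False)
         for doc_id, title in documents.items()}

print(search(index, '"the quick brown fox"'))                                   # [1]
print(search(index, 'the quick brown fox'))                                     # [1, 2]
print(search(index, '"the quick brown fox"', use_search_quote_analyzer=False))  # [1, 2]
```

The last call shows why the third setting matters in this model: once stop words are stripped from the phrase, both titles contain `quick brown fox` at the same positions, so the phrase query can no longer tell the two documents apart.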