# Custom Analyzer

When the built-in analyzers do not meet your needs, you can create a `custom` analyzer that uses the appropriate combination of:

* zero or more [character filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-custom-analyzer.html)
* a [tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-tokenizers.html)
* zero or more [token filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-custom-analyzer.html)

## Configuration

The `custom` analyzer accepts the following parameters:

| Parameter | Description |
| --- | --- |
| `tokenizer` | A built-in or customised tokenizer. (Required) |
| `char_filter` | An optional array of built-in or customised [character filters](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-charfilters.html). |
| `filter` | An optional array of built-in or customised token filters. |
| `position_increment_gap` | When indexing an array of text values, Elasticsearch inserts a fake "gap" between the last term of one value and the first term of the next value, so that a phrase query does not match two terms from different array elements. Defaults to `100`. See [position_increment_gap](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/position-increment-gap.html) for more information. |

## Example configuration

Here is an example that combines the following:

Character filter

* [HTML Strip Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-htmlstrip-charfilter.html "HTML Strip Char Filter")

Tokenizer

* [Standard Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-standard-tokenizer.html "Standard Tokenizer")

Token filters

* [Lowercase Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html "Lowercase Token Filter")
* [ASCII-Folding Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-asciifolding-tokenfilter.html "ASCII Folding Token Filter")

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Is this <b>déjà vu</b>?"
}
```

The request above produces the following terms:
```
[ is, this, deja, vu ]
```

The previous example used the tokenizer, token filters, and character filter with their default configurations, but you can also create a configured version of each and use it in a custom analyzer.

Here is a more complicated example that combines the following:

Character filter

* [Mapping Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-mapping-charfilter.html "Mapping Char Filter"), configured to replace `:)` with `_happy_` and `:(` with `_sad_`

Tokenizer

* [Pattern Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-pattern-tokenizer.html "Pattern Tokenizer"), configured to split on punctuation characters

Token filters

* [Lowercase Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html "Lowercase Token Filter")
* [Stop Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-stop-tokenfilter.html "Stop Token Filter"), configured to use the predefined list of English stop words

## Example

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I'm a :) person, and you?"
}
```

The `emoticons` character filter, the `punctuation` tokenizer, and the `english_stop` token filter are custom implementations defined in the same index settings.

The example above produces the following terms:

```
[ i'm, _happy_, person, you ]
```
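
To make the order of operations in the second example concrete, here is a minimal Python sketch of the same pipeline: character filters run on the raw text, the tokenizer splits the result, and token filters then run on the tokens in the order listed. It only approximates Elasticsearch's behavior and is not how Elasticsearch implements analysis. The function names (`emoticons_char_filter`, `punctuation_tokenizer`, `lowercase_filter`, `english_stop_filter`, `analyze`) are hypothetical, and the `ENGLISH_STOP` set is assumed to correspond to the `_english_` stop word list.

```python
import re

# Assumption: these words stand in for the predefined `_english_` stop word list.
ENGLISH_STOP = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def emoticons_char_filter(text):
    """Approximate the `mapping` char filter: rewrite emoticons before tokenizing."""
    for source, target in ((":)", "_happy_"), (":(", "_sad_")):
        text = text.replace(source, target)
    return text


def punctuation_tokenizer(text):
    """Approximate the `pattern` tokenizer: split on the pattern, drop empty tokens."""
    return [token for token in re.split(r"[ .,!?]", text) if token]


def lowercase_filter(tokens):
    """Approximate the `lowercase` token filter."""
    return [token.lower() for token in tokens]


def english_stop_filter(tokens):
    """Approximate the `stop` token filter using the assumed stop word set."""
    return [token for token in tokens if token not in ENGLISH_STOP]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run char filters, then the tokenizer, then token filters, in that order."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze(
    "I'm a :) person, and you?",
    char_filters=[emoticons_char_filter],
    tokenizer=punctuation_tokenizer,
    token_filters=[lowercase_filter, english_stop_filter],
)
print(terms)  # prints ["i'm", '_happy_', 'person', 'you']
```

The order matters in this sketch: `lowercase_filter` runs before `english_stop_filter`, so capitalized stop words such as "And" would also be removed, and the emoticon mapping must run before tokenizing because the tokenizer pattern would otherwise split or discard the punctuation that makes up `:)`.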