# Pattern Analyzer

Original: [https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-pattern-analyzer.html](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-pattern-analyzer.html)

Translation: [http://www.apache.wiki/display/Elasticsearch/analysis-pattern-analyzer.html](http://www.apache.wiki/display/Elasticsearch)

Contributors: [ApacheCN](/display/~apachecn), [Apache中文網](/display/~apachechina)

The `pattern` analyzer uses a regular expression to split text into terms. The regular expression should match the **token separators**, not the **tokens** themselves. The regular expression defaults to `\W+` (all non-word characters).

## **Beware of Pathological Regular Expressions**

The `pattern` analyzer uses Java regular expressions.

A badly written regular expression can run very slowly, or even throw a `StackOverflowError` and cause the node it is running on to exit suddenly.

Read more about pathological regular expressions and how to avoid them.

## **Definition**

It consists of:

Tokenizer

* [Pattern Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-pattern-tokenizer.html "Pattern Tokenizer")

Token filters

* [Lower Case Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html "Lowercase Token Filter")
* [Stop Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-stop-tokenfilter.html "Stop Token Filter") (disabled by default)

## **Example Output**

```
POST _analyze
{
  "analyzer": "pattern",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

The above sentence would produce the following terms:

```
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
```

## **Configuration**

The `pattern` analyzer accepts the following parameters:

* `pattern`: A Java regular expression. Defaults to `\W+`.
* `flags`: Java regular expression flags. Multiple flags should be pipe-separated, for example `"CASE_INSENSITIVE|COMMENTS"`.
* `lowercase`: Whether terms should be lowercased. Defaults to `true`.
* `stopwords`: A predefined stop word list such as `_english_`, or an array containing a list of stop words. Defaults to `_none_`.
* `stopwords_path`: The path to a file containing stop words.

See the Stop Token Filter for more information about stop word configuration.

## **Example Configuration**

In this example, we configure the `pattern` analyzer to split email addresses on non-word characters or underscores (`\W|_`), and to lowercase the result:

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_email_analyzer": {
          "type": "pattern",
          "pattern": "\\W|_",
          "lowercase": true
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_email_analyzer",
  "text": "John_Smith@foo-bar.com"
}
```

Note that backslashes in the pattern need to be escaped when the pattern is specified as a JSON string, which is why `\W|_` is written as `"\\W|_"` above.

The above example produces the following terms:

```
[ john, smith, foo, bar, com ]
```

## CamelCase Tokenizer

The following more complicated example splits CamelCase text into tokens:

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "camel": {
          "type": "pattern",
          "pattern": "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])"
        }
      }
    }
  }
}

GET my_index/_analyze
{
  "analyzer": "camel",
  "text": "MooseX::FTPClass2_beta"
}
```

The above example produces the following terms:

```
[ moose, x, ftp, class, 2, beta ]
```

The regular expression above is easier to understand when written as:

```
  ([^\p{L}\d]+)                 # swallow non letters and numbers,
| (?<=\D)(?=\d)                 # or non-number followed by number,
| (?<=\d)(?=\D)                 # or number followed by non-number,
| (?<=[ \p{L} && [^\p{Lu}]])    # or lower case
  (?=\p{Lu})                    #   followed by upper case,
| (?<=\p{Lu})                   # or upper case
  (?=\p{Lu}                     #   followed by upper case
    [\p{L}&&[^\p{Lu}]]          #   then lower case
  )
```
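
Because the analyzer relies on Java regular expressions, a pattern can be tried out in plain Java before it goes into an index definition. The following is a minimal sketch, not Elasticsearch's actual implementation: the class `PatternAnalyzerSketch` and its `tokenize` method are hypothetical names introduced here, and the sketch assumes that splitting on separator matches, dropping empty pieces, and lowercasing is a close enough approximation of the analyzer for the inputs used on this page. Stop words and `flags` are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper: approximates the pattern analyzer with java.util.regex only.
class PatternAnalyzerSketch {

    // Splits text on every match of the separator regex, skips empty pieces,
    // and optionally lowercases each remaining piece.
    static List<String> tokenize(String text, String separatorRegex, boolean lowercase) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(separatorRegex).split(text)) {
            if (piece.isEmpty()) {
                continue; // a separator at the start of the text yields an empty piece
            }
            tokens.add(lowercase ? piece.toLowerCase(Locale.ROOT) : piece);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Default separator \W+ (backslashes doubled, as in a JSON string)
        System.out.println(tokenize(
            "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.", "\\W+", true));
        // prints [the, 2, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone]

        // Email example: split on non-word characters or underscores
        System.out.println(tokenize("John_Smith@foo-bar.com", "\\W|_", true));
        // prints [john, smith, foo, bar, com]

        // CamelCase example: same pattern string as in the index settings
        String camel = "([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"
            + "|(?<=[\\p{L}&&[^\\p{Lu}]])(?=\\p{Lu})"
            + "|(?<=\\p{Lu})(?=\\p{Lu}[\\p{L}&&[^\\p{Lu}]])";
        System.out.println(tokenize("MooseX::FTPClass2_beta", camel, true));
        // prints [moose, x, ftp, class, 2, beta]
    }
}
```

Java string literals need the same doubled backslashes as JSON strings, so a pattern that works in this sketch can be pasted into the `pattern` setting unchanged. Results on other inputs may still differ from Elasticsearch, so the `_analyze` API remains the authoritative check.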