# Pattern Tokenizer

The **Pattern Tokenizer** uses a regular expression to split text: it either breaks the text into terms whenever it matches a word separator, or captures matching text as terms. The default pattern is `\W+`, which splits text whenever it encounters non-word characters.

**Beware of pathological regular expressions**

The **Pattern Tokenizer** uses [Java regular expressions](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). A badly written regular expression can run very slowly, or even throw a StackOverflowError and cause the node it is running on to exit suddenly. Read more about [pathological regular expressions and how to avoid them](http://www.regular-expressions.info/catastrophic.html).

Original link: [https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-pattern-tokenizer.html](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-pattern-tokenizer.html)

Translated version: [http://www.apache.wiki/display/Elasticsearch/Pattern+Tokenizer](http://www.apache.wiki/display/Elasticsearch/Pattern+Tokenizer)

Contributors: [陳益雷](/display/~chenyilei), [ApacheCN](/display/~apachecn), [Apache中文網](/display/~apachechina)

## **Example output**

```
POST _analyze
{
  "tokenizer": "pattern",
  "text": "The foo_bar_size's default is 5."
}
```

The above sentence produces the following terms:

```
[ The, foo_bar_size, s, default, is, 5 ]
```

## **Configuration**

The **Pattern Tokenizer** accepts the following parameters:

| Parameter | Description |
| --- | --- |
| `pattern` | A [Java regular expression](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Defaults to `\W+`. |
| `flags` | Java regular expression [flags](http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#field.summary). Flags should be pipe-separated, e.g. `"CASE_INSENSITIVE\|COMMENTS"`. |
| `group` | Which capture group to extract as tokens. Defaults to `-1` (split mode). |

## **Example configuration**

In this example, we configure the **Pattern Tokenizer** to break text into tokens whenever it encounters commas:

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": ","
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "comma,separated,values"
}
```

The output is:

```
[ comma, separated, values ]
```

In the next example, we configure the **Pattern Tokenizer** to capture values enclosed in double quotes (ignoring embedded escaped quotes `\"`). The regex itself looks like this:

```
"((?:\\"|[^"]|\\")*)"
```

And reads as follows:

* A literal `"`
* Start capturing:
* A literal `\"` OR any character other than `"`
* Repeat until no more characters match
* A literal closing `"`

When the pattern is written in JSON, the `"` and `\` characters need to be escaped, so the pattern ends up looking like:

```
\"((?:\\\\\"|[^\"]|\\\\\")+)\"
```

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "\"((?:\\\\\"|[^\"]|\\\\\")+)\"",
          "group": 1
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "\"value\", \"value with embedded \\\" quote\""
}
```

The output is:

```
[ value, value with embedded \" quote ]
```
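The `flags` parameter from the configuration table is not exercised by the examples above. As a minimal sketch (not part of the original document), the request below reuses the same `my_index`/`my_tokenizer` naming convention as the earlier examples and passes the Java `CASE_INSENSITIVE` flag so that the pattern `\s+and\s+` splits on the word `and` regardless of case:

```
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "pattern",
          "pattern": "\\s+and\\s+",
          "flags": "CASE_INSENSITIVE"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "apples AND oranges and pears"
}
```

With the default `group` of `-1` the tokenizer stays in split mode, so this should yield the three terms `apples`, `oranges`, and `pears`; treat that output as an expectation under these assumptions rather than a documented result.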