## 1. Tokenization

When a document is stored, Elasticsearch uses an analyzer to extract tokens from it; these tokens are what get indexed and searched. Elasticsearch ships with many built-in analyzers, but the built-in ones handle Chinese poorly.

For example, analyze a piece of text with the _analyze API:

```
curl --user elastic:'ray!@#333' -H "Content-Type: application/json" -X POST localhost:9200/_analyze?pretty -d '{"text":"ray elasticsearch"} '
```

which is equivalent to:

```
curl --user elastic:'ray!@#333' -H "Content-Type: application/json" -X POST localhost:9200/_analyze?pretty -d '{"analyzer": "standard","text":"ray elasticsearch"} '
```

![](https://img.kancloud.cn/ff/64/ff64cdcb78a5eafa3dce9e05d79d7464_1419x424.png)

The result shows that "ray elasticsearch" is split into two tokens. English words are naturally separated by spaces, so splitting on whitespace works perfectly well here.

Now a Chinese example (the phrase 全文檢索網 means roughly "full-text search site"):

```
curl --user elastic:'ray!@#333' -H "Content-Type: application/json" -X POST localhost:9200/_analyze?pretty -d '{"text":"全文檢索網"} '
```

which is equivalent to:

```
curl --user elastic:'ray!@#333' -H "Content-Type: application/json" -X POST localhost:9200/_analyze?pretty -d '{"analyzer": "standard","text":"全文檢索網"} '
```

![](https://img.kancloud.cn/1a/2e/1a2eb5ec97623e6f46c65e3774a74bad_1416x835.png)

As the result shows, every Chinese character is split out as its own token, which is meaningless for Chinese word segmentation, so Elasticsearch's default analyzer is a poor fit for Chinese.

The default analyzer used above is named standard. To analyze with a different analyzer, just set the "analyzer" field to the name of the analyzer you want. Elasticsearch supports third-party analyzers through plugins.

## 2. Chinese Word Segmentation

The commonly used Chinese analyzers are smartcn (based on ICTCLAS from the Chinese Academy of Sciences) and IK Analyzer; here we use IK Analyzer.

### **Installation**

Go to the ${elasticsearch}/plugins directory and create an ik subdirectory.

Download:

```
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.15.1/elasticsearch-analysis-ik-7.15.1.zip
```

Unzip:

```
unzip elasticsearch-analysis-ik-7.15.1.zip
```

Restart the elasticsearch process and the IK analyzer is enabled.

### **Testing**

```
curl --user elastic:'ray!@#333' -H "Content-Type: application/json" -X POST localhost:9200/_analyze?pretty -d '{"analyzer": "ik_max_word","text":"全文檢索網"} '
```

![](https://img.kancloud.cn/e6/bc/e6bc56f64321d03e1e791905681cf389_1410x565.png)

Compared with the standard analyzer, IK's segmentation is clearly much more reasonable.

>[danger] IK ships two analyzers: ik_max_word and ik_smart;
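
Since IK ships both analyzers, a quick way to see how they differ is to rerun the same _analyze request with ik_smart instead of ik_max_word (same credentials and text as above; ik_max_word does the finest-grained split, while ik_smart does a coarser one, so expect fewer tokens here):

```
curl --user elastic:'ray!@#333' -H "Content-Type: application/json" -X POST localhost:9200/_analyze?pretty -d '{"analyzer": "ik_smart","text":"全文檢索網"} '
```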
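
A common pattern is to combine the two in an index mapping: ik_max_word at index time and ik_smart at query time. Below is a minimal sketch of such a mapping for Elasticsearch 7.x; the index name news and the field name content are placeholders, not part of the original article:

```
curl --user elastic:'ray!@#333' -H "Content-Type: application/json" -X PUT localhost:9200/news?pretty -d '
{
  "mappings": {
    "properties": {
      "content": { "type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart" }
    }
  }
}'
```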
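
Finally, if either IK analyzer is reported as unknown after the restart, one way to confirm that the plugin was actually loaded is the _cat/plugins API, which lists the plugins installed on each node:

```
curl --user elastic:'ray!@#333' localhost:9200/_cat/plugins?v
```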