Course outline

## 1. Introduction to term vectors

> Purpose: retrieve statistics for each term within a given field of a document.
>
> * term information: term frequency in the field, term positions, start and end offsets, term payloads
> * term statistics (returned when term_statistics=true):
>   * ttf (total term frequency): how many times a term occurs across all documents
>   * document frequency: how many documents contain the term
> * field statistics:
>   * document count: how many documents contain this field
>   * sum of document frequency: the sum of df over all terms in the field
>   * sum of total term frequency: the sum of ttf over all terms in the field
>
> Use case: you want to know how many documents contain a particular term, such as 大話西游, or how many docs contain a particular field, such as film_desc (a film's description).

## 2. Index-time term vector experiment

> Term vectors involve many term-level and field-level statistics. There are two ways to collect them:
>
> (1) index-time: configure it in the mapping, and the term and field statistics are generated when documents are indexed.
>
> (2) query-time: no term vector information has been generated beforehand; when you request the term vector, the statistics are computed on the fly and returned to you.

~~~
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "term_vector": "with_positions_offsets_payloads",
          "store": true,
          "analyzer": "fulltext_analyzer"
        },
        "fullname": {
          "type": "text",
          "analyzer": "fulltext_analyzer"
        }
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    },
    "analysis": {
      "analyzer": {
        "fulltext_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "type_as_payload"
          ]
        }
      }
    }
  }
}
~~~

~~~
PUT /my_index/my_type/1
{
  "fullname": "Leo Li",
  "text": "hello test test test "
}

PUT /my_index/my_type/2
{
  "fullname": "Leo Li",
  "text": "other hello test ..."
}
~~~

### 2.1 Term statistics

~~~
GET /my_index/my_type/1/_termvectors
{
  "fields": ["text"],
  "offsets": true,
  "payloads": true,
  "positions": true,
  "term_statistics": true,
  "field_statistics": true
}
~~~

* Result

~~~
"term_vectors": {
  "text": {
    "field_statistics": {
      "sum_doc_freq": 6,
      "doc_count": 2,
      "sum_ttf": 8
    },
    "terms": {
      "hello": {
        "doc_freq": 2,    # appears in 2 docs
        "ttf": 2,         # appears 2 times across all docs
        "term_freq": 1,   # number of occurrences in the current doc
        "tokens": [
          {
            "position": 0,
            "start_offset": 0,
            "end_offset": 5,
            "payload": "d29yZA=="
          }
        ]
      },
      "test": {
        "doc_freq": 2,
        "ttf": 4,
        "term_freq": 3,
        "tokens": [       # "test" occurs three times; these are the three positions and offsets
          {
            "position": 1,
            "start_offset": 6,
            "end_offset": 10,
            "payload": "d29yZA=="
          },
          {
            "position": 2,
            "start_offset": 11,
            "end_offset": 15,
            "payload": "d29yZA=="
          },
          {
            "position": 3,
            "start_offset": 16,
            "end_offset": 20,
            "payload": "d29yZA=="
          }
        ]
~~~

### 2.2 Query-time term vector experiment

The fullname field has no term_vector setting in the mapping, so no term vector information was generated for it at index time. Requesting it makes Elasticsearch compute the statistics on the fly.

~~~
GET /my_index/my_type/1/_termvectors
{
  "fields": ["fullname"],
  "offsets": true,
  "positions": true,
  "term_statistics": true,
  "field_statistics": true
}
~~~

In general, if conditions allow, query-time term vectors are enough: whatever data you want to probe, just probe it on the spot.

### 2.3 Counting occurrences of a given term in one field across all docs

Here we probe the text field to find how often test occurs across all docs.

~~~
GET /my_index/my_type/_termvectors
{
  "doc": {
    "fullname": "li xiao long",
    "text": "test"
  },
  "fields": ["text"],
  "offsets": true,
  "payloads": true,
  "positions": true,
  "term_statistics": true,
  "field_statistics": true
}
~~~

Result

~~~
"terms": {
  "test": {
    "doc_freq": 2,
    "ttf": 5,         # number of times test occurs in the text field across all docs
    "term_freq": 1,
    "tokens": [
      {
        "position": 0,
        "start_offset": 0,
        "end_offset": 4
      }
~~~

* Look up the frequency of the terms in "li hong zhang" in the fullname field

~~~
GET /my_index/my_type/_termvectors
{
  "doc": {
    "fullname": "li hong zhang",
    "text": "test"
  },
  "fields": ["fullname"],
  "offsets": true,
  "payloads": true,
  "positions": true,
  "term_statistics": true,
  "field_statistics": true
}
~~~

Supplying a doc by hand is not really about the document itself. It is a way to pass in the terms you want to probe, for example hello test, by placing them in a field. The field value is analyzed into terms, and for each term the statistics are computed against all existing docs.

This is quite useful: it lets you name the terms whose statistics you want to inspect, for example the term 大話西游.

### 2.4 Specifying the analyzer manually when generating term vectors

~~~
GET /my_index/my_type/_termvectors
{
  "doc": {
    "fullname": "Leo Li",
    "text": "hello test test test"
  },
  "fields": ["text"],
  "offsets": true,
  "payloads": true,
  "positions": true,
  "term_statistics": true,
  "field_statistics": true,
  "per_field_analyzer": {
    "text": "standard"
  }
}
~~~

### 2.5 Terms filter

~~~
GET /my_index/my_type/_termvectors
{
  "doc": {
    "fullname": "Leo Li",
    "text": "hello test test test"
  },
  "fields": ["text"],
  "offsets": true,
  "payloads": true,
  "positions": true,
  "term_statistics": true,
  "field_statistics": true,
  "filter": {
    "max_num_terms": 3,   # maximum number of terms returned per field
    "min_term_freq": 1,   # ignore terms that occur fewer times than this in the doc
    "min_doc_freq": 1     # ignore terms that occur in fewer docs than this
  }
}
~~~

In other words, the term vector results are filtered by term statistics so that you only see the terms you care about. This is also handy: when probing data, you can filter out terms whose frequency is too low and ignore them.

### 2.6 Multi term vectors

~~~
GET _mtermvectors
{
  "docs": [
    {
      "_index": "my_index",
      "_type": "my_type",
      "_id": "2",
      "term_statistics": true
    },
    {
      "_index": "my_index",
      "_type": "my_type",
      "_id": "1",
      "fields": ["text"]
    }
  ]
}
~~~

~~~
GET /my_index/_mtermvectors
{
  "docs": [
    {
      "_type": "my_type",
      "_id": "2",
      "fields": ["text"],
      "term_statistics": true
    },
    {
      "_type": "my_type",
      "_id": "1"
    }
  ]
}
~~~

~~~
GET /my_index/my_type/_mtermvectors
{
  "docs": [
    {
      "_id": "2",
      "fields": ["text"],
      "term_statistics": true
    },
    {
      "_id": "1"
    }
  ]
}
~~~

~~~
GET /_mtermvectors
{
  "docs": [
    {
      "_index": "my_index",
      "_type": "my_type",
      "doc": {
        "fullname": "Leo Li",
        "text": "hello test test test"
      }
    },
    {
      "_index": "my_index",
      "_type": "my_type",
      "doc": {
        "fullname": "Leo Li",
        "text": "other hello test ..."
      }
    }
  ]
}
~~~
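
To make the numbers in the section 2.1 response easier to check, here is a minimal Python sketch that recomputes them offline from the two sample documents. It is not how Elasticsearch computes term vectors internally. The helpers `analyze` and `term_vector` are hypothetical names introduced for illustration, and `analyze` only approximates the `whitespace` tokenizer plus `lowercase` filter from the mapping.

```python
import base64
import re
from collections import Counter

# The "text" field of the two sample documents indexed in section 2.
DOCS = {
    "1": "hello test test test ",
    "2": "other hello test ...",
}


def analyze(text):
    """Approximate the whitespace tokenizer + lowercase filter (an assumption).

    Returns a list of (term, position, start_offset, end_offset) tuples.
    """
    return [
        (m.group().lower(), pos, m.start(), m.end())
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]


def term_vector(doc_id, docs=DOCS):
    """Hypothetical helper: rebuild term and field statistics for one doc."""
    analyzed = {d: analyze(t) for d, t in docs.items()}

    ttf = Counter()       # total occurrences of each term across all docs
    doc_freq = Counter()  # number of docs that contain each term
    for tokens in analyzed.values():
        terms = [tok[0] for tok in tokens]
        ttf.update(terms)
        doc_freq.update(set(terms))

    terms_out = {}
    for term, pos, start, end in analyzed[doc_id]:
        entry = terms_out.setdefault(
            term,
            {"doc_freq": doc_freq[term], "ttf": ttf[term],
             "term_freq": 0, "tokens": []},
        )
        entry["term_freq"] += 1  # occurrences within this doc only
        entry["tokens"].append(
            {"position": pos, "start_offset": start, "end_offset": end}
        )

    field_statistics = {
        "sum_doc_freq": sum(doc_freq.values()),
        "doc_count": sum(1 for tokens in analyzed.values() if tokens),
        "sum_ttf": sum(ttf.values()),
    }
    return {"field_statistics": field_statistics, "terms": terms_out}


if __name__ == "__main__":
    tv = term_vector("1")
    print(tv["field_statistics"])
    print(tv["terms"]["test"])
    # Payloads in the response are base64-encoded; with type_as_payload
    # the payload carries the token type.
    print(base64.b64decode("d29yZA=="))
```

For document 1 the sketch yields the same field statistics, term frequencies, positions and offsets as the response shown in section 2.1, and the payload `d29yZA==` decodes to the bytes `word`.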