Attributes and Methods · JAVA

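[TOC]

# Mapping parameters

![](https://img.kancloud.cn/35/8e/358e31e4b4e0838b58a32f56e88104e2_931x531.png)

## analyzer

* The analyzer; defaults to the standard analyzer. It tokenizes the field both when the field is indexed and when it is searched

## boost

* Field weight; defaults to 1.0

## dynamic

* Once a field's type is set in the mapping it cannot be modified directly, because the inverted index built by Lucene cannot be changed after it is generated
* The only option is to create a new index and then reindex the data
* Adding new fields is allowed by default
* The dynamic parameter controls how new fields are handled:
    * true (default): new fields are added automatically
    * false: new fields are not added automatically; documents are still written normally, but the new fields cannot be queried or otherwise operated on
    * strict: the document cannot be written and an error is returned

~~~
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": false,
      "properties": {
        "user": {
          "properties": {
            "name": {
              "type": "text"
            },
            "social_networks": {
              "dynamic": true,
              "properties": {}
            }
          }
        }
      }
    }
  }
}
~~~

With this definition, fields cannot be added automatically to the my\_index index, but sub-fields can be added automatically under user.social\_networks.

## copy\_to

* Copies the value of this field into a target field, giving an effect similar to \_all
* The target field does not appear in \_source; it is used only for searching

~~~
DELETE my_index

PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}
~~~

## index

* Controls whether the field is indexed. Defaults to true (indexed); with false the field is not indexed and therefore cannot be searched

## index\_options

* The index\_options parameter controls what information is added to the inverted index for use in searching and highlighting. Possible values: docs, freqs, positions, offsets
    * docs: indexes only the doc id
    * freqs: indexes the doc id and term frequency; term frequency may be used for scoring
    * positions: indexes the doc id, term frequency, and position; position information may be needed for proximity or phrase queries
    * offsets: indexes the doc id, term frequency, position, and start and end offsets; highlighting uses the offsets

## fielddata

* Whether to preload fielddata; defaults to false
* On the first query, Elasticsearch loads the complete inverted index of this field from all segments into memory
* If we have some 5 GB index segments and want to load 10 GB of fielddata into memory, this can take tens of seconds
* Setting fielddata to true moves the cost of loading fielddata to index refresh time instead of query time, which greatly improves the search experience
* Reference: [Preloading fielddata](https://www.elastic.co/guide/cn/elasticsearch/guide/current/preload-fielddata.html)

## eager\_global\_ordinals

* Whether to build global ordinals eagerly; defaults to false
* Reference: [Eager global ordinals](https://www.elastic.co/guide/cn/elasticsearch/guide/current/preload-fielddata.html#global-ordinals)

## doc\_values

* Reference: [Doc Values and Fielddata](https://www.elastic.co/guide/cn/elasticsearch/guide/current/docvalues-and-fielddata.html)

## fields

* The purpose of this parameter is to implement multi-fields
* One field, several data types
* For example: a field city has the type text for full-text search; fields can additionally define a keyword version of it for sorting and aggregations

~~~
# Set the mapping
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

# Insert two documents
PUT my_index/_doc/1
{
  "city": "New York"
}

PUT my_index/_doc/2
{
  "city": "York"
}

# Query: city is used for the full-text match, city.raw for sorting and aggregation
GET my_index/_search
{
  "query": {
    "match": {
      "city": "york"
    }
  },
  "sort": {
    "city.raw": "asc"
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw"
      }
    }
  }
}
~~~

## format

* Because JSON has no date type, Elasticsearch uses the format parameter to define date formats in advance; strings that match are recognized as dates and converted to a timestamp (in milliseconds)
* format defaults to: `strict_date_optional_time||epoch_millis`
* Date formats built into Elasticsearch:

![](https://img.kancloud.cn/4d/b9/4db9e4fdd6aed8f81227c904799fb12c_362x693.png)

* Adding the prefix `strict_` to the names above selects the strict variant of the format
* See the documentation for more formats

## properties

* Used to define **sub-fields** for \_doc and for fields of type object and nested

~~~
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "manager": {
          "properties": {
            "age": { "type": "integer" },
            "name": { "type": "text" }
          }
        },
        "employees": {
          "type": "nested",
          "properties": {
            "age": { "type": "integer" },
            "name": { "type": "text" }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "region": "US",
  "manager": {
    "name": "Alice White",
    "age": 30
  },
  "employees": [
    {
      "name": "John Smith",
      "age": 34
    },
    {
      "name": "Peter Brown",
      "age": 26
    }
  ]
}
~~~

## normalizer

* Similar to analyzer, except that analyzer applies to text fields and produces multiple tokens, while normalizer applies to keyword fields and produces a single token (the whole field value becomes one token instead of being split into several)
* The example defines a custom normalizer that uses the uppercase filter (together with asciifolding)

~~~
PUT test_index_4
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["uppercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

# Insert data
POST test_index_4/_doc/1
{
  "foo": "hello world"
}

POST test_index_4/_doc/2
{
  "foo": "Hello World"
}

POST test_index_4/_doc/3
{
  "foo": "hello elasticsearch"
}

# Search for hello: the result is empty, not 3 hits!
GET test_index_4/_search
{
  "query": {
    "match": {
      "foo": "hello"
    }
  }
}

# Search for hello world: 2 hits, documents 1 and 2
GET test_index_4/_search
{
  "query": {
    "match": {
      "foo": "hello world"
    }
  }
}
~~~

## Other parameters

* coerce
    * Type coercion: converts a value in the JSON to the data type of the field in ES, for example the string "5" to the integer 5
    * coerce defaults to true
    * If coerce is set to false, a document whose JSON value does not match the ES field type is rejected
    * Set coerce for the whole index with "settings": { "index.mapping.coerce": false }
* enabled
    * Whether the field is indexed; defaults to true
    * Can be set at two levels: \_doc and individual fields
* ignore\_above
    * Sets the maximum length of a field value that will be indexed
    * Values longer than this are not indexed and therefore cannot be searched, but they are still visible in terms aggregations
* null\_value
    * Defines how a null value is handled for this field. The default is null, meaning no value, in which case ES ignores the value
    * Setting this parameter gives the field a default value to use when it is null
* ignore\_malformed
    * When the data type does not match and coercion fails, the default behavior is to throw an exception and reject the whole document
    * If this parameter is set to true, the exception is ignored and the document is accepted; the malformed field is not indexed, while the other fields are processed as usual
* norms
    * norms store various normalization factors that are later used at query time to compute how well a document matches the query
    * norms are useful for **scoring** but take up a lot of disk space
    * If scoring is not needed on a field, norms can be disabled for that field
* position\_increment\_gap
    * Related to proximity queries and phrase queries
    * Default value is 100
* search\_analyzer
    * The analyzer used at search time
    * Defaults to the same value as analyzer
* similarity
    * Sets the relevance algorithm; the default in ES 5.x and ES 6.x is BM25
    * classic and boolean can also be selected
* store
    * store means: whether to keep a separate stored copy of the field outside \_source; the default is false
    * When ES stores data, it puts the JSON object into the "\_source" field. "\_source" keeps all fields together as a single document (reading it takes 1 IO), and individual fields are extracted with source filtering
    * When there are many fields or the content is large, and not every field needs to be retrieved, store can be set to true on specific fields to store them separately (reading takes 1 IO), together with an exclude setting on \_source
    * For more on this parameter, see: [Setting the mapping store attribute in ES](https://blog.csdn.net/helllochun/article/details/52136954)
* term\_vector
    * Related to the inverted index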