Chapter 5: Mapping Parameters · Elasticsearch 6.x Study Notes

## **Mapping parameters**

Official documentation: [https://www.elastic.co/guide/en/elasticsearch/reference/6.x/mapping-params.html](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/mapping-params.html)

**1. analyzer**

Specifies the tokenizer (more accurately, the analyzer) for a field. It applies at both index time and search time unless search_analyzer overrides the search side. The example below configures the IK analyzers, which are provided by the IK analysis plugin and must be installed first: ik_smart at index time and ik_max_word at search time.

(1) Define the index and its mapping

```
PUT test
{
  "mappings": {
    "it": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "ik_smart",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}
```

(2) Insert data

```
PUT test/it/1
{ "name": "美國留給伊拉克的是個爛攤子" }

PUT test/it/2
{ "name": "中國駐洛杉磯領事館遭亞裔男子槍擊，嫌犯已自首" }

PUT test/it/3
{ "name": "中韓漁船沖突調查：韓警平均扣留一艘國漁船" }

PUT test/it/4
{ "name": "公安部：各地校車將享受最高路權" }
```

(3) Query

```
POST test/it/_search
{
  "query": {
    "match": { "name": "中國" }
  }
}
```

Result:

```
{
  "took": 8,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.65109104,
    "hits": [
      {
        "_index": "test",
        "_type": "it",
        "_id": "2",
        "_score": 0.65109104,
        "_source": { "name": "中國駐洛杉磯領事館遭亞裔男子槍擊，嫌犯已自首" }
      }
    ]
  }
}
```

**2. normalizer**

A normalizer standardizes the value of a keyword field before it is indexed, for example by converting all characters to lowercase.

(1) Create the index

```
PUT my_index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": { "type": "keyword", "normalizer": "my_normalizer" }
      }
    }
  }
}
```

(2) Insert data

```
PUT my_index/_doc/1
{ "foo": "BàR" }

PUT my_index/_doc/2
{ "foo": "bar" }

PUT my_index/_doc/3
{ "foo": "baz" }
```

(3) Query the data

```
GET my_index/_search
{
  "query": {
    "term": { "foo": "BAR" }
  }
}

GET my_index/_search
{
  "query": {
    "match": { "foo": "BAR" }
  }
}
```

Result (the normalizer is also applied to the query term, so both queries match documents 1 and 2):

```
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.2876821,
        "_source": { "foo": "bar" }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": { "foo": "BàR" }
      }
    ]
  }
}
```

**3. boost**

A boost value controls the relative weight of each query clause. The default is 1; a boost greater than 1 increases the relative weight of that clause.

(1)
Create the index and insert data:

```
# Create the index (run DELETE my_index first if it still exists from the previous example)
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": { "type": "text", "boost": 2 },
        "content": { "type": "text" }
      }
    }
  }
}

# Insert data
PUT my_index/_doc/1
{
  "title": "hello world",
  "content": "你好世界"
}
```

(2) Query:

```
# Query
POST my_index/_search
{
  "query": {
    "match": {
      "title": { "query": "hello world" }
    }
  }
}

# Result:
{
  "took": 13,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1.1507283,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.1507283,
        "_source": { "title": "hello world", "content": "你好世界" }
      }
    ]
  }
}
```

The boost parameter increases the relative weight of a clause (when boost is greater than 1) or decreases it (when boost is between 0 and 1), but the effect is not guaranteed to be linear. In other words, setting boost to 2 does not necessarily double the final _score, because the new _score may be normalized after the boost is applied, and each query type has its own normalization algorithm. What can be said is that a higher boost value produces a higher _score.

**4. coerce**

coerce cleans up dirty data and defaults to true. The integer 5, for example, may arrive as the string "5" or as the floating-point number 5.0. With coerce enabled:

* Strings are coerced to integers
* Floating-point numbers are coerced to integers

The example below disables coerce on age, so the integer is accepted and the string is rejected:

```
# Create the index
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": { "type": "text" },
        "content": { "type": "text" },
        "age": { "type": "integer", "coerce": false }
      }
    }
  }
}

# First insert: age is an integer
PUT my_index/_doc/1
{
  "title": "hello world",
  "content": "你好世界",
  "age": 5
}

# Response to the first insert
{
  "_index": "my_index",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "_seq_no": 0,
  "_primary_term": 1
}

# Second insert: age is a string
PUT my_index/_doc/1
{
  "title": "hello world",
  "content": "你好世界",
  "age": "5"
}

# Response to the second insert
{
  "error": {
    "root_cause": [
      { "type": "mapper_parsing_exception", "reason": "failed to parse [age]" }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [age]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Integer value passed as String"
    }
  },
  "status": 400
}
```

**5. copy_to**

copy_to builds a custom _all-style field: the values of several fields are copied into one combined field. For example, first_name and second_name can be combined into a full_name field.

```
# Create the index
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": { "type": "text", "copy_to": "full_name" },
        "second_name": { "type": "text", "copy_to": "full_name" },
        "full_name": { "type": "text" }
      }
    }
  }
}

# Insert data
PUT my_index/_doc/1
{
  "first_name": "hello",
  "second_name": "world"
}

# Query
POST my_index/_search
{
  "query": {
    "match": {
      "full_name": { "query": "hello world", "operator": "and" }
    }
  }
}

# Result
{
  "took": 6,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": { "first_name": "hello", "second_name": "world" }
      }
    ]
  }
}
```

**6. doc_values**

doc_values speed up sorting and aggregations. When the inverted index is built, an additional column-oriented structure is written alongside it, trading disk space for speed. doc_values are enabled by default on the field types that support them and can be disabled for fields that will never be sorted or aggregated on. text fields do not support doc_values, so the example disables them on a keyword field.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": { "type": "text", "copy_to": "full_name" },
        "second_name": { "type": "keyword", "copy_to": "full_name", "doc_values": false },
        "full_name": { "type": "text" }
      }
    }
  }
}
```

**7. dynamic**

The dynamic setting controls how newly detected fields are handled, that is, fields that appear in a document but are not defined in the mapping. It takes three values:

* true: newly detected fields are added to the mapping. (default)
* false: newly detected fields are ignored. New fields must be added explicitly.
* strict: if a new field is detected, an exception is thrown and the document is rejected.

```
# Create the index
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "first_name": { "type": "text", "copy_to": "full_name" },
        "second_name": { "type": "text", "copy_to": "full_name", "doc_values": false },
        "full_name": { "type": "text" }
      }
    }
  }
}

# Add a document that contains an undefined field
PUT my_index/_doc/1
{
  "first_name": "hello",
  "second_name": "world",
  "age": 10
}

# Result
{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [age] within [_doc] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [age] within [_doc] is not allowed"
  },
  "status": 400
}
```

**8. enabled**
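By default Elasticsearch indexes every field. For a field with enabled set to false, Elasticsearch skips the field's content: the value can still be retrieved from _source but cannot be searched. Because the content is not parsed, it can be of any type. The enabled setting applies only to object fields (and to the mapping type itself); in the example below age has no type, so it is mapped as an object field.

```
# Create the index
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "first_name": { "type": "text", "copy_to": "full_name" },
        "second_name": { "type": "text", "copy_to": "full_name", "doc_values": false },
        "full_name": { "type": "text" },
        "age": { "enabled": false }
      }
    }
  }
}

# Insert data
PUT my_index/_doc/1
{
  "first_name": "hello",
  "second_name": "world",
  "age": 10
}

# Query
POST my_index/_search
{
  "query": {
    "match": {
      "age": { "query": 10 }
    }
  }
}

# Result
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}
```

**9. format**

When the field type is date, format specifies the date format used for its values. Besides the built-in formats, you can supply a custom pattern, for example yyyy/MM/dd (uppercase MM is the month; lowercase mm means minutes). (Formats are covered in detail in a later chapter.)

**10. ignore_above**

ignore_above sets the maximum length of a string that will be indexed and stored; longer values are ignored. (It cannot be used on fields of type text.)

```
# Create the index
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "keyword": { "type": "keyword", "ignore_above": 5 }
      }
    }
  }
}

# Add the first document (no more than 5 characters)
PUT my_index/_doc/1
{ "keyword": "hello" }

# Add the second document (more than 5 characters)
PUT my_index/_doc/2
{ "keyword": "hello world" }

# Query the field
POST my_index/_search
{
  "query": {
    "match": {
      "keyword": { "query": "hello" }
    }
  }
}

# Result: the value longer than 5 characters is ignored
{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": { "keyword": "hello" }
      }
    ]
  }
}
```

The mapping sets ignore_above to 5. The value in the first document is no longer than 5 characters, so it is indexed; the value in the second document is longer than 5, so it is not. Note that a match on hello would not hit the keyword value hello world in any case, because a keyword field is matched as a whole; to confirm that document 2 was skipped, query for hello world directly.

**11. ignore_malformed**

ignore_malformed makes Elasticsearch tolerate malformed data. Take a userid field: some people may enter an integer, others an email address. Indexing a value of the wrong data type into a field normally throws an exception, and the whole document fails to index. If ignore_malformed is set to true, the exception is suppressed: the malformed field is not indexed, and the other fields are indexed normally.

```
# Case 1: ignore_malformed is false
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "age": { "type": "integer", "ignore_malformed": false }
      }
    }
  }
}

# Insert a value that parses as an integer
PUT my_index/_doc/2
{ "age": "10" }

# Result: the insert succeeds
{
  "_index": "my_index",
  "_type": "_doc",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "_seq_no": 0,
  "_primary_term": 1
}

# Insert a value that is not an integer
PUT my_index/_doc/1
{ "age": "hello" }

# Result
{
  "error": {
    "root_cause": [
      { "type": "mapper_parsing_exception", "reason": "failed to parse [age]" }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse [age]",
    "caused_by": {
      "type": "number_format_exception",
      "reason": "For input string: \"hello\""
    }
  },
  "status": 400
}

# Case 2: ignore_malformed is true (recreate the index)
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "age": { "type": "integer", "ignore_malformed": true }
      }
    }
  }
}

# Insert both an integer and a non-integer value
PUT my_index/_doc/1
{ "age": "hello" }

PUT my_index/_doc/2
{ "age": "10" }

# Both inserts succeed
{
  "_index": "my_index",
  "_type": "_doc",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": { "total": 2, "successful": 1, "failed": 0 },
  "_seq_no": 0,
  "_primary_term": 1
}
```

**12. index_options**

Controls what is recorded in the inverted index. There are four options:

* docs: only the document number is indexed.
* freqs: document numbers and term frequencies are indexed.
* positions: document numbers, term frequencies, and term positions are indexed.
* offsets: document numbers, term frequencies, positions, and start and end character offsets are indexed.

![](https://box.kancloud.cn/8b06439432f4d7237142a535cea2cf41_855x207.png)

```
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": { "type": "text", "index_options": "offsets" }
      }
    }
  }
}
```

**13. index**

The index setting specifies whether a field is indexed; a field that is not indexed cannot be searched. It accepts true or false.

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "name": { "type": "text", "index": false },
        "title": { "type": "text" }
      }
    }
  }
}
```

**14. null_value**

Defines how a null value in the field is handled. The default is null, meaning the field is treated as empty and Elasticsearch ignores it. Setting null_value supplies a default that is indexed in place of an explicit null. (This setting cannot be used on fields of type text.)

```
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict",
      "properties": {
        "name": { "type": "text", "index": false },
        "title": { "type": "keyword", "null_value": "null" }
      }
    }
  }
}
```

**15. fields**

fields lets the same text be indexed in several different ways. For example, a string field can be mapped as text for full-text search and as keyword for aggregations and sorting.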
```
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        }
      }
    }
  }
}

PUT my_index/my_type/1
{ "city": "New York" }

PUT my_index/my_type/2
{ "city": "York" }

GET my_index/_search
{
  "query": {
    "match": { "city": "york" }
  },
  "sort": { "city.raw": "asc" },
  "aggs": {
    "Cities": {
      "terms": { "field": "city.raw" }
    }
  }
}

# Result
{
  "took": 31,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": null,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": null,
        "_source": { "city": "New York" },
        "sort": [ "New York" ]
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": null,
        "_source": { "city": "York" },
        "sort": [ "York" ]
      }
    ]
  },
  "aggregations": {
    "Cities": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "New York", "doc_count": 1 },
        { "key": "York", "doc_count": 1 }
      ]
    }
  }
}
```
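
To make it easier to see why the match on york hits both documents while the sort and the terms aggregation still work on the full values, here is a toy Python model of the city / city.raw multi-field above. It is only a sketch under stated assumptions: analyze is a hypothetical stand-in that lowercases and splits on non-word characters, which approximates but does not reproduce the standard analyzer, and the dictionaries are not how Elasticsearch actually stores its indices.

```python
import re
from collections import Counter


def analyze(text):
    """Hypothetical stand-in for the standard analyzer:
    lowercase the text and split it on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


# The two documents from the example, keyed by _id.
docs = {"1": "New York", "2": "York"}

# "city" (text): inverted index from each analyzed token to the ids containing it.
text_index = {}
for doc_id, value in docs.items():
    for token in analyze(value):
        text_index.setdefault(token, set()).add(doc_id)

# "city.raw" (keyword): the whole value is kept as one unanalyzed term.
raw_values = dict(docs)


def match_city(query):
    """Model of a match query on "city": any analyzed query token may hit."""
    hits = set()
    for token in analyze(query):
        hits |= text_index.get(token, set())
    return hits


hits = match_city("york")
# Sort and aggregate on the keyword sub-field, as the request does with city.raw.
sorted_hits = sorted(hits, key=lambda doc_id: raw_values[doc_id])
buckets = Counter(raw_values[doc_id] for doc_id in hits)

print(sorted_hits)
print(dict(buckets))
```

In this model the token york appears in both documents, so both match, while sorting and bucketing use the untouched values New York and York, which mirrors the search result shown above.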