Parent-Child Relationship Model · TUNA-daily

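[TOC]

## Basics

> * Nested-object modeling has one drawback: it stores data redundantly, packing all related data into a single document, so maintenance cost is relatively high.
> * Parent-child modeling resembles third normal form in a relational database. The entities are split apart and linked to each other through parent-child relations, so the data does not all have to live together, and updating a parent doc or a child doc does not affect the other.
> * A one-to-many model in the relational style is easy to maintain, but joining in the application layer performs poorly because it takes multiple searches. The parent-child data model avoids this. Although the entities are stored separately, es resolves the underlying relationship for us at search time and takes measures to keep search fast.
>
> 1. Compared with the nested data model, the advantage of the parent-child data model is that parent docs and child docs do not affect each other.
>
> Key points:
>
> 1. The parent-child metadata mapping is what keeps queries fast, but it comes with one restriction: a parent and its children must live on the same shard.
> 2. Because parent and child data sit on one shard, together with the metadata that maps their relationship, a parent-child search never has to cross shards to resolve the relation. Each shard handles it locally, which is why performance is high.

## 1. Parent-child modeling

> 1. Define two types in the index mapping (a parent type and a child type; multiple types per index and `_parent` apply to Elasticsearch 5.x and earlier)
> 2. Add the `_parent` property to the child type to name its parent type

Case background: an R&D center staff-management example. An IT company has several R&D centers, and each R&D center has several employees.

~~~
PUT /company
{
  "mappings": {
    "rd_center": {},
    "employee": {
      "_parent": {
        "type": "rd_center"
      }
    }
  }
}
~~~

This is the core of parent-child modeling: several types are related as parent and child, and `_parent` names the parent type (here `rd_center`).

* Insert the parent data

~~~
POST /company/rd_center/_bulk
{ "index": { "_id": "1" }}
{ "name": "北京研發總部", "city": "北京", "country": "中國" }
{ "index": { "_id": "2" }}
{ "name": "上海研發中心", "city": "上海", "country": "中國" }
{ "index": { "_id": "3" }}
{ "name": "硅谷人工智能實驗室", "city": "硅谷", "country": "美國" }
~~~

> * At shard-routing time, the rd_center doc with id=1 is routed by its own id to some shard by default.

* Insert the child data. The `parent` parameter gives the id of the parent doc and ensures that parent and child are routed to the same shard.

~~~
PUT /company/employee/1?parent=1
{
  "name": "張三",
  "birthday": "1970-10-24",
  "hobby": "爬山"
}
~~~

> * The core of maintaining the relationship is `parent=1`, which gives the id of this doc's parent.
> * The parent-child relation then guarantees that the parent doc and the child doc are stored on one shard. Internally this is still doc routing: employee and rd_center docs both use the parent id as the routing value, so they land on the same shard.
> * The employee doc is therefore not routed by its own id (id=1) but by `parent=1`, the id of its parent doc. This low-level routing mechanism is what keeps parent and child data on one shard.

~~~
POST /company/employee/_bulk
{ "index": { "_id": "2", "parent": "1" }}
{ "name": "李四", "birthday": "1982-05-16", "hobby": "游泳" }
{ "index": { "_id": "3", "parent": "2" }}
{ "name": "王二", "birthday": "1979-04-01", "hobby": "爬山" }
{ "index": { "_id": "4", "parent": "3" }}
{ "name": "趙五", "birthday": "1987-05-11", "hobby": "騎馬" }
~~~

## 2. Search and aggregation

1. Search for R&D centers that have employees born in 1980 or later

> Because we want R&D centers back, the search runs against the rd_center type (the parent) and uses a `has_child` query, which selects parents by a condition on their children (here, a birthday from 1980 onward). `has_child` names the query kind, much as `nested` does for nested objects, and `type` names the child type that the inner query runs against.

~~~
GET /company/rd_center/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "query": {
        "range": {
          "birthday": {
            "gte": "1980-01-01"
          }
        }
      }
    }
  }
}
~~~

Result:

~~~
"hits": [
  {
    "_index": "company",
    "_type": "rd_center",
    "_id": "1",
    "_score": 1,
    "_source": {
      "name": "北京研發總部",
      "city": "北京",
      "country": "中國"
    }
  },
  {
    "_index": "company",
    "_type": "rd_center",
    "_id": "3",
    "_score": 1,
    "_source": {
      "name": "硅谷人工智能實驗室",
      "city": "硅谷",
      "country": "美國"
    }
  }
]
~~~

2. Search for R&D centers that have an employee named 張三

~~~
GET /company/rd_center/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "query": {
        "match": {
          "name": "張三"
        }
      }
    }
  }
}
~~~

3. Search for R&D centers that have at least 2 employees. `min_children` requires a parent to have at least that many matching child docs.

~~~
GET /company/rd_center/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "min_children": 2,
      "query": {
        "match_all": {}
      }
    }
  }
}
~~~

4. Search for the employees of R&D centers located in China. Here the condition is on the parent type and the hits are child docs.

* Incorrect:

~~~
GET /company/employee/_search
{
  "query": {
    "has_parent": {
      "parent_type": "rd_center",
      "query": {
        "term": {
          "country": {
            "value": "中國"
          }
        }
      }
    }
  }
}
~~~

In the inverted index, "中國" has already been split into the two tokens "中" and "國", and `term` is an exact-match query, so the query above matches nothing. The correct approach is to run the `term` query against the `country.keyword` sub-field that es creates for us by default.

## 3. Three generations: data modeling and search for a grandparent, parent, child hierarchy

This mapping replaces the one from section 1, so the existing `company` index has to be deleted before it is recreated.

~~~
PUT /company
{
  "mappings": {
    "country": {},
    "rd_center": {
      "_parent": {
        "type": "country"
      }
    },
    "employee": {
      "_parent": {
        "type": "rd_center"
      }
    }
  }
}
~~~

country -> rd_center -> employee is the three-generation data model.

~~~
POST /company/country/_bulk
{ "index": { "_id": "1" }}
{ "name": "中國" }
{ "index": { "_id": "2" }}
{ "name": "美國" }

POST /company/rd_center/_bulk
{ "index": { "_id": "1", "parent": "1" }}
{ "name": "北京研發總部" }
{ "index": { "_id": "2", "parent": "1" }}
{ "name": "上海研發中心" }
{ "index": { "_id": "3", "parent": "2" }}
{ "name": "硅谷人工智能實驗室" }
~~~

~~~
PUT /company/employee/1?parent=1&routing=1
{
  "name": "張三",
  "dob": "1970-10-24",
  "hobby": "爬山"
}
~~~

About the `routing` parameter: it must equal the grandparent's id, otherwise the hierarchy breaks.

* country is routed by its own id.
* rd_center specifies `parent`, so it is routed by the country id.
* employee, if it specifies only `parent`, is routed by the rd_center id, which can leave the three generations on different shards. For the grandchild generation, set `routing` manually to the id of the grandparent doc.

Search for the countries that have employees whose hobby is 爬山 (hiking):

~~~
GET /company/country/_search
{
  "query": {
    "has_child": {
      "type": "rd_center",
      "query": {
        "has_child": {
          "type": "employee",
          "query": {
            "match": {
              "hobby": "爬山"
            }
          }
        }
      }
    }
  }
}
~~~

~~~
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "company",
        "_type": "country",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "中國"
        }
      }
    ]
  }
}
~~~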
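
The routing rule for the three-generation model can be illustrated with a small simulation. This is only a sketch under stated assumptions. Elasticsearch documents the shard choice as `hash(_routing) % number_of_primary_shards`; the code below substitutes `zlib.crc32` for the real hash, so the shard numbers it produces are illustrative and will not match a real cluster. `shard_for` and `routing_value` are hypothetical helper names, the shard count of 5 mirrors `_shards.total` in the response above, and the chain country 2 -> rd_center 3 -> employee 4 is a combination chosen purely for illustration.

```python
import zlib
from typing import Optional

NUM_SHARDS = 5  # assumption: mirrors "_shards.total" in the sample response


def shard_for(routing: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value to a shard number.

    Stand-in for hash(_routing) % number_of_primary_shards.
    crc32 replaces the real hash function, so results are illustrative only.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def routing_value(doc_id: str,
                  parent: Optional[str] = None,
                  routing: Optional[str] = None) -> str:
    """Pick the routing value: explicit routing, else parent id, else own id."""
    if routing is not None:
        return routing
    if parent is not None:
        return parent
    return doc_id


# Hypothetical chain: country 2 -> rd_center 3 -> employee 4
country_shard = shard_for(routing_value("2"))
rd_center_shard = shard_for(routing_value("3", parent="2"))
employee_parent_only = shard_for(routing_value("4", parent="3"))
employee_with_routing = shard_for(routing_value("4", parent="3", routing="2"))

# The rd_center routes by its parent's id, so it shares the country's shard.
print("rd_center with country:", rd_center_shard == country_shard)
# With routing set to the grandparent id, the employee joins them.
print("employee (routing=2) with country:", employee_with_routing == country_shard)
# With parent only, the employee routes by the rd_center id; co-location is not guaranteed.
print("employee (parent only) with country:", employee_parent_only == country_shard)
```

The first two comparisons hold by construction, because both documents hash the same routing value. The third depends on whether the ids "3" and "2" happen to hash to the same shard, which is exactly why the grandchild needs an explicit `routing`.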