4. Document core metadata and concurrency control in ES · Java Core Knowledge Notes

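## I. Document core metadata

### 1\. \_index metadata

~~~
(1) Identifies which index a document is stored in.
(2) Similar data goes in one index and dissimilar data goes in different indices: a product index (all products), a sales index (all product sales data), an inventory index (all inventory data). Putting product, sales, and human resource (employee) data together in one big index, say a company index, is not appropriate.
(3) An index contains many similar documents. "Similar" means that most of their fields are the same. If you store 3 documents whose fields are all completely different, they are not similar and do not belong in the same index.
(4) An index name must be lowercase, must not start with an underscore, and must not contain commas, e.g. product, website, blog.
~~~

### 2\. \_type metadata

~~~
(1) Identifies which category (type) within the index a document belongs to.
(2) An index is usually divided into several types, which logically group slightly different kinds of data in the index: documents of one kind share most of their fields but may differ in a few. For example, products may be divided into electronics, fresh food, household chemicals, and so on.
(3) A type name may be uppercase or lowercase, but must not start with an underscore and must not contain commas.
~~~

### 3\. \_id metadata

~~~
(1) The unique identifier of a document. Together with the index and type, it uniquely identifies and locates a document.
(2) We can specify the document id ourselves (PUT /index/type/id), or leave it out and let ES generate one for us.
~~~

### 4\. Document id: specified manually vs. generated automatically

~~~
1. Specifying the document id manually

(1) Whether to specify ids manually depends on the application. It is typically done when importing data into ES from another system, using the unique identifier the data already has as the document id in ES. For example, suppose we are building search for an e-commerce site, or employee lookup for an OA system. The data first lives in the database of the website or internal IT system, so it already has a database primary key (auto-increment, UUID, or a business number). When importing that data into ES, the existing primary key is a good fit for the document id. If, on the other hand, ES is the system's only data store, so that data has no id when it is produced and goes straight into ES, specifying the id manually is a poor fit because you do not know what the id should be. In that case, let ES generate the id as described below.

(2) PUT /index/type/id

PUT /test_index/test_type/2
{
  "test_content": "my test"
}
~~~

~~~
2. Generating the document id automatically

(1) POST /index/type

POST /test_index/test_type
{
  "test_content": "my test"
}

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "AVp4RN0bhjxldOOnBxaE",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

(2) An auto-generated id is 20 characters long, URL-safe, and Base64-encoded. It is produced by a GUID algorithm, so ids generated in parallel across a distributed system have an essentially zero chance of colliding.
~~~

### 5\. \_source metadata

~~~
PUT /test_index/test_type/1
{
  "test_field1": "test field1",
  "test_field2": "test field2"
}

GET /test_index/test_type/1

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "test_field1": "test field1",
    "test_field2": "test field2"
  }
}

The _source metadata is the JSON string we put in the request body when creating the document. By default, a GET returns it unchanged.

To customize the result, specify which fields of _source to return:

GET /test_index/test_type/1?_source=test_field2

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "test_field2": "test field2"
  }
}
~~~

### 6\. Full document replacement and forced creation

~~~
1. Full replacement of a document
(1) The syntax is the same as creating a document: if the document id does not exist, the request creates it; if it already exists, the request fully replaces the document's JSON content.
(2) Documents are immutable. To change a document's content, the first option is full replacement: re-index the document and replace all of its content.
(3) ES marks the old document as deleted and adds the new document we supplied. As more and more documents are created, ES removes the documents marked as deleted in the background at an appropriate time.

2. Forced creation of a document
(1) Creating and fully replacing a document use the same syntax. Sometimes we only want to create a new document and never replace an existing one. How do we force creation?
(2) PUT /index/type/id?op_type=create, or PUT /index/type/id/_create. If a document with that id already exists, the request fails with a 409 conflict instead of replacing it.

3. Deleting a document
(1) DELETE /index/type/id
(2) The document is not physically deleted right away. It is only marked as deleted and is removed in the background as data accumulates.
~~~

## II. Concurrency conflicts in ES

~~~
The problem: in an e-commerce scenario, for example, several users order the same product at the same time, so multiple threads modify the inventory concurrently. Without any control, two threads can both read a stock of 100 and both write back 99, losing one decrement.

Concurrency control options

1. Pessimistic locking
Pros: convenient; the lock is taken directly and is transparent to the program, with no extra work needed.
Cons: low concurrency; only one thread can operate on the data at a time.

2. Optimistic locking
Pros: high concurrency; no locks are taken, so many threads can work concurrently.
Cons: more work; every update has to compare version numbers.
~~~

##### 1\. ES uses \_version internally for version control

Every document carries a \_version: it is 1 when the document is first created and increases by 1 each time the document is modified or deleted.

##### 2\. Concurrency control with \_version

~~~
(1) First create a document

PUT /test_index/test_type/7
{
  "test_field": "test test"
}

(2) Simulate two clients that both fetch the same document

GET /test_index/test_type/7

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "7",
  "_version": 1,
  "found": true,
  "_source": {
    "test_field": "test test"
  }
}

(3) One client updates first, sending the version number of the data it read. The update is applied only if the version in ES equals the version held by the client.

PUT /test_index/test_type/7?version=1
{
  "test_field": "test client 1"
}

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "7",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}

(4) The other client then tries to update based on the version=1 data, also sending the version number for optimistic concurrency control

PUT /test_index/test_type/7?version=1
{
  "test_field": "test client 2"
}

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[test_type][7]: version conflict, current version [2] is different than the one provided [1]",
        "index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
        "shard": "3",
        "index": "test_index"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[test_type][7]: version conflict, current version [2] is different than the one provided [1]",
    "index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
    "shard": "3",
    "index": "test_index"
  },
  "status": 409
}

(5) After the optimistic lock has blocked the conflicting write, complete the update correctly

GET /test_index/test_type/7

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "7",
  "_version": 2,
  "found": true,
  "_source": {
    "test_field": "test client 1"
  }
}

Modify based on the latest data and version number, and send the latest version number with the write. This step may have to be repeated several times before it succeeds, especially when many threads update the same document frequently.

PUT /test_index/test_type/7?version=2
{
  "test_field": "test client 2"
}

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "7",
  "_version": 3,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}
~~~

##### 3\. Optimistic concurrency control with an external version

ES provides a feature that lets you base concurrency control on a version number you maintain yourself instead of the internal \_version. For example, suppose your data also lives in MySQL and your application already maintains its own version number, however it is generated and controlled. In that case you may want optimistic concurrency control to use your own version rather than the internal \_version of ES.

The two forms of the parameter are:

?version=1

?version=1&version\_type=external

The only difference is the comparison. With the internal \_version, the write succeeds only if the version you provide is exactly equal to the \_version in ES, and any other value is an error. With version\_type=external, the write succeeds only if the version you provide is greater than the \_version in ES.

If \_version=1 in ES, ?version=1 is required for the update to succeed.

If \_version=1 in ES, ?version>1&version\_type=external is required, for example ?version=2&version\_type=external.

~~~
(1) First create a document

PUT /test_index/test_type/8
{
  "test_field": "test"
}

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "8",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

(2) Simulate two clients that fetch this document at the same time

GET /test_index/test_type/8

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "8",
  "_version": 1,
  "found": true,
  "_source": {
    "test_field": "test"
  }
}

(3) The first client updates first. The client program has read the latest version number of this record from its own database, say 2.

PUT /test_index/test_type/8?version=2&version_type=external
{
  "test_field": "test client 1"
}

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "8",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}

(4) The second client has read the same version number, 2, from its own database and also sends an update based on version=2

PUT /test_index/test_type/8?version=2&version_type=external
{
  "test_field": "test client 2"
}

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[test_type][8]: version conflict, current version [2] is higher or equal to the one provided [2]",
        "index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
        "shard": "1",
        "index": "test_index"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[test_type][8]: version conflict, current version [2] is higher or equal to the one provided [2]",
    "index_uuid": "6m0G7yx7R1KECWWGnfH1sw",
    "shard": "1",
    "index": "test_index"
  },
  "status": 409
}

(5) After concurrency control has blocked the write, retry the update with the latest version number

GET /test_index/test_type/8

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "8",
  "_version": 2,
  "found": true,
  "_source": {
    "test_field": "test client 1"
  }
}

PUT /test_index/test_type/8?version=3&version_type=external
{
  "test_field": "test client 2"
}

{
  "_index": "test_index",
  "_type": "test_type",
  "_id": "8",
  "_version": 3,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}
~~~

## III. Partial update

#### 1\. What is a partial update?

~~~
PUT /index/type/id creates and replaces a document with the same syntax. In an application, the flow for a full replacement usually looks like this:
(1) The application sends a GET request, fetches the document, and shows it in the front-end UI for the user to view and edit.
(2) The user edits the data in the UI and submits it to the back end.
(3) The back-end code applies the user's changes in memory and assembles the full modified document.
(4) It then sends a PUT request to ES to do a full replacement.
(5) ES marks the old document as deleted and creates a new document.

partial update

POST /index/type/id/_update
{
  "doc": {
    "only the few fields to change; the full document is not needed"
  }
}

PUT /test_index/test_type/10
{
  "test_field1": "test1",
  "test_field2": "test2"
}

POST /test_index/test_type/10/_update
{
  "doc": {
    "test_field2": "updated test2"
  }
}

This is more convenient: each request carries only the few fields that changed, and the full document does not have to be sent.
~~~

#### 2\. Advantages of partial update over full replacement

~~~
1. The read, modify, and write-back all happen inside a single shard in ES, which avoids transferring the document over the network between the application and ES and greatly improves performance.
2. The interval between the read and the write is shorter, which effectively reduces concurrency conflicts.
~~~

#### 3\. Built-in optimistic concurrency control in partial update

~~~
A partial update automatically applies the optimistic concurrency control described above and, when retry_on_conflict is set, retries on conflict.

Retry strategy:
1. Fetch the document data and the latest version number again.
2. Retry the update based on the latest version number. If it succeeds, we are done.
3. If it fails, repeat steps 1 and 2. The maximum number of retries is controlled by the retry_on_conflict parameter, e.g. retry_on_conflict=5.

POST /index/type/id/_update?retry_on_conflict=5

Note: retry_on_conflict cannot be combined with an explicit version (such as version=6) in the same request; ES rejects a request that provides both.
~~~

Author: Leo\_CX330 · Link: https://juejin.cn/post/6926341303101980680 · Source: Juejin. Copyright belongs to the author. For commercial reuse, contact the author for permission; for non-commercial reuse, credit the source.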
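The version checks and the `retry_on_conflict` loop described above can be modeled in a few lines of Python. This is only a rough sketch under stated assumptions: `VersionedStore`, `VersionConflict`, `partial_update`, and the `after_read` hook are hypothetical names made up for illustration, the store is a plain in-memory dictionary, and nothing here talks to a real ES cluster or uses a real ES client API. It only mirrors the comparison rules from the notes (internal version must match exactly, external version must be strictly greater).

```python
class VersionConflict(Exception):
    """Stand-in for ES's version_conflict_engine_exception (HTTP 409)."""


class VersionedStore:
    """Hypothetical in-memory model of ES version checks; not a real client."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return version, dict(source)  # copy so callers cannot mutate the store

    def index(self, doc_id, source, version=None, version_type="internal"):
        current = self._docs[doc_id][0] if doc_id in self._docs else 0
        if version is None:
            new_version = current + 1  # unconditional write
        elif version_type == "internal":
            if version != current:  # internal: must match exactly
                raise VersionConflict(
                    f"current version [{current}] is different than the one provided [{version}]")
            new_version = current + 1
        elif version_type == "external":
            if version <= current:  # external: must be strictly greater
                raise VersionConflict(
                    f"current version [{current}] is higher or equal to the one provided [{version}]")
            new_version = version  # the caller's version is stored as-is
        else:
            raise ValueError(f"unknown version_type: {version_type}")
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


def partial_update(store, doc_id, partial_doc, retry_on_conflict=0, after_read=None):
    """Read, merge, and write back with a version check, retrying on conflict."""
    for attempt in range(retry_on_conflict + 1):
        version, source = store.get(doc_id)  # step 1: latest data and version
        source.update(partial_doc)           # merge only the changed fields
        if after_read is not None:
            after_read(attempt)              # demo hook: lets a competing writer sneak in
        try:
            return store.index(doc_id, source, version=version)  # step 2
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise                        # retries exhausted, surface the conflict


store = VersionedStore()
store.index("10", {"test_field1": "test1", "test_field2": "test2"})


def competing_writer(attempt):
    # Another client overwrites the document during our first attempt only.
    if attempt == 0:
        store.index("10", {"test_field1": "changed", "test_field2": "test2"})


new_version = partial_update(store, "10", {"test_field2": "updated test2"},
                             retry_on_conflict=5, after_read=competing_writer)
print(new_version, store.get("10")[1])
```

In this run the first attempt conflicts because the competing writer bumps the version between the read and the write; the second attempt re-reads the document, merges the change on top of the competing write, and succeeds. Passing `retry_on_conflict=0` instead would let the conflict propagate to the caller, which corresponds to the manual GET-then-PUT retry shown in the `_version` walkthrough.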