
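# Bulk API

The bulk API performs many index/delete operations in a single API call. This can greatly increase indexing speed.

> ## Client support for bulk requests
>
> Some of the officially supported clients provide helpers to assist with bulk requests and with reindexing documents from one index to another:
>
> Perl
>
> See [Search::Elasticsearch::Bulk](https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Bulk) and [Search::Elasticsearch::Scroll](https://metacpan.org/pod/Search::Elasticsearch::Client::5_0::Scroll)
>
> Python
>
> See [elasticsearch.helpers.\*](http://elasticsearch-py.readthedocs.io/en/master/helpers.html)

The REST API endpoint is `/_bulk`, and it expects the following newline-delimited JSON (NDJSON) structure:

```
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
```

Note: the final line of data must end with a newline character `\n`. Each newline character may be preceded by a carriage return `\r`. When sending requests to this endpoint, the `Content-Type` header should be set to `application/x-ndjson`.

The possible actions are `index`, `create`, `delete` and `update`. `index` and `create` expect a source on the next line and have the same semantics as the `op_type` parameter of the standard index API (that is, `create` fails if a document with the same index, type, and ID already exists, whereas `index` adds or replaces the document as necessary). `delete` does not expect a source on the following line and has the same semantics as the standard `delete` API. `update` expects the partial document, `upsert`, and `script` and its options to be specified on the next line.

If you are providing text file input to `curl`, you must use the `--data-binary` flag instead of plain `-d`. The latter does not preserve newlines. Example:

```
$ cat requests
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7, "errors": false, "items":[{"index":{"_index":"test","_type":"type1","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
```

Because this format uses literal `\n` characters as delimiters, make sure that the JSON actions and sources are not pretty-printed. Here is an example of a correct sequence of bulk commands:

```
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
```

The result of this bulk operation is:

```
{
   "took": 30,
   "errors": false,
   "items": [
      {
         "index": {
            "_index": "test",
            "_type": "type1",
            "_id": "1",
            "_version": 1,
            "result": "created",
            "_shards": {
               "total": 2,
               "successful": 1,
               "failed": 0
            },
            "created": true,
            "status": 201
         }
      },
      {
         "delete": {
            "found": false,
            "_index": "test",
            "_type": "type1",
            "_id": "2",
            "_version": 1,
            "result": "not_found",
            "_shards": {
               "total": 2,
               "successful": 1,
               "failed": 0
            },
            "status": 404
         }
      },
      {
         "create": {
            "_index": "test",
            "_type": "type1",
            "_id": "3",
            "_version": 1,
            "result": "created",
            "_shards": {
               "total": 2,
               "successful": 1,
               "failed": 0
            },
            "created": true,
            "status": 201
         }
      },
      {
         "update": {
            "_index": "test",
            "_type": "type1",
            "_id": "1",
            "_version": 2,
            "result": "updated",
            "_shards": {
               "total": 2,
               "successful": 1,
               "failed": 0
            },
            "status": 200
         }
      }
   ]
}
```

The endpoints are `/_bulk`, `/{index}/_bulk`, and `/{index}/{type}/_bulk`. When the index or the index/type are provided in the URL, they are used as defaults for bulk items that do not specify them explicitly.

A note on the format: the idea is to make processing as fast as possible. Because some actions will be redirected to other shards on other nodes, only the `action_and_meta_data` line is parsed on the receiving node. Client libraries using this protocol should try to do something similar on the client side and reduce buffering as much as possible.

The response to a bulk action is a large JSON structure containing the individual result of each action that was performed. The failure of a single action does not affect the remaining actions.

There is no "correct" number of actions to perform in a single `bulk` call. You should experiment with different settings to find the optimum size for your particular workload.

If you use the HTTP API, make sure that the client does not send HTTP chunks, as this slows things down.

## Versioning

Each bulk item can include a version value using the `_version`/`version` field. It automatically follows the behavior of the index/delete operation based on the `_version` mapping. It also supports `version_type`/`_version_type` (see [versioning](Index_API.md#index-versioning)).

## Routing

Each bulk item can include a routing value using the `_routing`/`routing` field. It automatically follows the behavior of the index/delete operation based on the `_routing` mapping.

## Wait for active shards

When making bulk calls, you can set the `wait_for_active_shards` parameter to require a minimum number of shard copies to be active before processing of the bulk request starts. For further details and a usage example, see [here](Index_API.md#index-wait-for-active-shards).

## Refresh

Controls when the changes made by this request become visible to search. See [refresh](refresh.html).

## Update

When using the `update` action, `_retry_on_conflict` can be used as a field in the action itself (not in the extra payload line) to specify how many times an update should be retried in the case of a version conflict.

The `update` action payload supports the following options: `doc` (partial document), `upsert`, `doc_as_upsert`, `script`, `params` (for the script), `lang` (for the script) and `_source`. See the [update documentation](Update_API.md) for details on the options. Example with update actions:

```
POST _bulk
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "script" : { "inline": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_type" : "type1", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}
```

## Security

See [URL-based access control](../API_Conventions/URL-based_access_control.md)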
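
## Example: assembling a bulk body in Python

The NDJSON rules described at the top of this page (one JSON object per line, no pretty-printing, a source line after every action except `delete`, and a trailing `\n`) are easy to get wrong when a body is built by hand. The following is a minimal sketch using only the Python standard library. The helper names `build_bulk_body`, `send_bulk`, and `non_2xx_items` are hypothetical and are not part of any Elasticsearch client. The URL `http://localhost:9200/_bulk` is an assumption taken from the `curl` example. For real workloads the official `elasticsearch.helpers` module mentioned above is likely the better choice.

```python
import json
import urllib.request


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON bulk body.

    `source` must be None for "delete" and a dict for every other action.
    """
    lines = []
    for action, meta, source in operations:
        # json.dumps without `indent` never emits a raw newline, so each
        # object stays on a single line as the bulk format requires.
        lines.append(json.dumps({action: meta}, separators=(",", ":")))
        if action == "delete":
            if source is not None:
                raise ValueError("delete takes no source line")
        else:
            if source is None:
                raise ValueError(action + " requires a source line")
            lines.append(json.dumps(source, separators=(",", ":")))
    # The final line must also end with a newline character.
    return "\n".join(lines) + "\n"


def send_bulk(body, url="http://localhost:9200/_bulk"):
    """POST the body with the NDJSON content type (URL is an assumption)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def non_2xx_items(response):
    """List (position, action, status) for items whose status is not 2xx."""
    flagged = []
    for position, item in enumerate(response.get("items", [])):
        # Each item is a single-key object such as {"index": {...}}.
        action, result = next(iter(item.items()))
        status = result.get("status")
        if status is None or not 200 <= status < 300:
            flagged.append((position, action, status))
    return flagged


# Build a body equivalent to part of the sample request shown earlier.
ops = [
    ("index", {"_index": "test", "_type": "type1", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_type": "type1", "_id": "2"}, None),
    ("update", {"_index": "test", "_type": "type1", "_id": "1"}, {"doc": {"field2": "value2"}}),
]
body = build_bulk_body(ops)
print(body, end="")
```

Because one failed action does not affect the others, a caller would typically pass the parsed response to something like `non_2xx_items` rather than rely on the HTTP status of the whole request. Note that a non-2xx item status is not necessarily an error: in the sample response above, deleting a missing document returns status 404 while the top-level `errors` field is still `false`.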