Elasticsearch Document Operations · 碼山有道：Java技術生態 · 看云


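[TOC]

## 1. Indexing Documents

Documents are indexed through the index API, which makes the data both stored and searchable. A document is uniquely identified by its `_index`, `_type`, and `_id`. You can supply your own `_id` or let the index API generate one.

### 1.1 Using Your Own ID

If the document has a natural identifier (for example a `user_account` field or some other value that identifies the document), you can supply your own `_id` with this form of the index API:

```shell
PUT /{index}/{type}/{id}
{
  "field": "value",
  ...
}
```

For example, if the index is called "website", the type is called "blog", and the ID is "123", the index request looks like this:

```shell
curl -H "Content-Type: application/json" -XPUT localhost:9200/website/blog/123 -d '
{
  "title": "My first blog entry",
  "text": "Just trying this out...",
  "date": "2014/01/01"
}'
```

Elasticsearch responds with:

```json
{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "result": "created"
}
```

The response indicates that the document was created successfully. It contains the `_index`, `_type`, and `_id` metadata plus a new element, `_version`. Every document in Elasticsearch has a version number, and every change to a document (including deletion) increments `_version`.

### 1.2 Auto-generated IDs

If the data has no natural ID, Elasticsearch can generate one automatically. The request structure changes: instead of the PUT method ("store this document at this URL", which can be thought of as an update), use the POST method ("store this document under this type", which matches the create semantics of POST). The URL now contains only `_index` and `_type`:

```shell
curl -H "Content-Type: application/json" -XPOST localhost:9200/website/blog -d '
{
  "title": "My second blog entry",
  "text": "Still trying this out...",
  "date": "2014/01/01"
}'
```

The response is similar to the previous one, except that `_id` is now an auto-generated value:

```json
{
  "_index": "website",
  "_type": "blog",
  "_id": "zqIzcm0BgOKExS3TRFoX",
  "_version": 1,
  "result": "created"
}
```

### 1.3 Creating a New Document

When indexing a document, how can you be sure that you are creating an entirely new document rather than overwriting an existing one? The combination of `_index`, `_type`, and `_id` uniquely identifies a document, so the simplest way to guarantee that a document is new is to use the POST method and let Elasticsearch generate a unique `_id`:

```shell
POST /website/blog/
```

If you want to use your own `_id`, you must tell Elasticsearch to accept the request only when no document with the same `_index`, `_type`, and `_id` already exists. There are two ways to do this:

1. The first uses the `op_type` query parameter:

   ```shell
   PUT /website/blog/123?op_type=create
   ```

2. The second appends `/_create` to the URL as the endpoint:

   ```shell
   PUT /website/blog/123/_create
   ```

If the new document is created successfully, Elasticsearch returns the usual metadata with a 201 Created status code. If a document with the same `_index`, `_type`, and `_id` already exists, Elasticsearch returns a 409 Conflict status code with the message `document already exists`.

## 2. Retrieving Documents

### 2.1 Retrieving a Whole Document

To get a document out of Elasticsearch, use the same `_index`, `_type`, and `_id`, but change the HTTP method to GET. (Adding the `pretty` parameter to any query string makes Elasticsearch pretty-print the JSON response so that it is easier to read.)

```shell
curl -XGET "localhost:9200/website/blog/123?pretty"
```

The response contains the metadata seen earlier plus a `_source` field, which holds the original document that was sent to Elasticsearch when it was indexed:

```json
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "123",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title" : "My first blog entry",
    "text" : "Just trying this out...",
    "date" : "2014/01/01"
  }
}
```

### 2.2 Retrieving Part of a Document

By default, a GET request returns the whole document, stored in the `_source` field. You may only be interested in some fields, such as `title`. Individual fields can be requested with the `_source` parameter; separate multiple fields with commas:

```shell
curl -i -XGET "localhost:9200/website/blog/123?_source=title,text"
```

The `-i` option includes the HTTP response headers in the output. The `_source` field now contains only the requested fields; the `date` field has been filtered out:

```json
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 148

{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 1,
  "found": true,
  "_source": {
    "text": "Just trying this out...",
    "title": "My first blog entry"
  }
}
```

If you want only the `_source` field without any other metadata, request it like this:

```shell
curl -XGET "localhost:9200/website/blog/123/_source?pretty"
```

This returns just:

```json
{
  "title" : "My first blog entry",
  "text" : "Just trying this out...",
  "date" : "2014/01/01"
}
```

### 2.3 Checking Whether a Document Exists

If all you want to do is check whether a document exists, use the HEAD method instead of GET. A HEAD request returns no response body, only HTTP headers:

```shell
curl --head http://localhost:9200/website/blog/123
```

Elasticsearch returns a 200 OK status if the document exists, and 404 Not Found if it does not:

```json
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 188
```

### 2.4 Retrieving Multiple Documents

Fetching a single document is fast, but fetching several can be faster still: combining multiple requests into one avoids the network overhead of each separate request. If you need to retrieve multiple documents from Elasticsearch, it is quicker to fetch them all in one request with the multi-get (mget) API than to fetch them one at a time.

The mget API expects a `docs` array, each element of which specifies the `_index`, `_type`, and `_id` metadata of one document. If you only want to retrieve one or a few specific fields, you can also specify a `_source` parameter:

```shell
curl -H "Content-Type: application/json" -XPOST "localhost:9200/_mget?pretty" -d '
{
  "docs": [
    { "_index": "website", "_type": "blog", "_id": 1 },
    { "_index": "website", "_type": "blog", "_id": 2, "_source": "views" }
  ]
}'
```

The response body also contains a `docs` array with one response per document, listed in the same order as in the request:

```json
{
  "docs" : [
    {
      "_index" : "website",
      "_type" : "blog",
      "_id" : "1",
      "found" : false
    },
    {
      "_index" : "website",
      "_type" : "blog",
      "_id" : "2",
      "_version" : 3,
      "found" : true,
      "_source" : {
        "views" : 3
      }
    }
  ]
}
```

If the documents you want to retrieve are all in the same `_index` (or even the same `_type`), you can specify a default `/_index` or `/_index/_type` in the URL. Individual entries can still override these defaults:

```shell
curl -H "Content-Type: application/json" -XPOST "localhost:9200/website/blog/_mget?pretty" -d '
{
  "docs": [
    { "_id": 1 },
    { "_type": "pageviews", "_id": 1 }
  ]
}'
```

If all the documents share the same `_index` and `_type`, you can pass a simple `ids` array instead of the full `docs` array:

```shell
curl -H "Content-Type: application/json" -XPOST "localhost:9200/website/blog/_mget?pretty" -d '
{
  "ids": ["2", "1"]
}'
```

## 3. Updating Documents

### 3.1 Updating a Whole Document

Documents in Elasticsearch are immutable; they cannot be modified. To update an existing document, you reindex it or replace it:

```shell
curl -H "Content-Type: application/json" -XPUT "localhost:9200/website/blog/123?pretty" -d '
{
  "title": "My first blog entry",
  "text": "I am starting to get the hang of this...",
  "date": "2014/01/02"
}'
```

In the response, you can see that Elasticsearch has incremented `_version`:

```json
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "123",
  "_version" : 2,
  "result" : "updated"
}
```

Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version does not disappear immediately, but it is no longer accessible. Elasticsearch cleans up deleted documents as you continue to index more data.

### 3.2 Updating with a Specified Version

Elasticsearch is distributed. When a document is created, updated, or deleted, the new version of the document has to be replicated to other nodes in the cluster. Elasticsearch is also asynchronous and concurrent, meaning that these replication requests are sent in parallel and may arrive at their destination out of sequence. Elasticsearch therefore needs a way to ensure that an older version of a document never overwrites a newer one.

As the index, get, and delete requests above showed, every document has a `_version` number that is incremented whenever the document changes. Elasticsearch uses this `_version` to make sure that changes are applied in the correct order. If an older version arrives after a newer one, it is simply ignored.

You can take advantage of `_version` to ensure that conflicting changes do not cause data loss: specify the `version` of the document you intend to change, and the request fails if that version is no longer current.

Create a new blog post:

```shell
curl -H "Content-Type: application/json" -XPUT localhost:9200/website/blog/1/_create -d '
{
  "title": "My first blog entry",
  "text": "Just trying this out..."
}'
```

First, retrieve the document:

```shell
curl -XGET "localhost:9200/website/blog/1?pretty"
```

The response body shows that `_version` is 1:

```json
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title" : "My first blog entry",
    "text" : "Just trying this out..."
  }
}
```

Now, when saving changes by reindexing the document, specify the `version` parameter so that the update takes effect only if the document's current `_version` is 1:

```shell
curl -H "Content-Type: application/json" -XPUT "localhost:9200/website/blog/1?version=1" -d '
{
  "title": "My first blog entry",
  "text": "Starting to get the hang of this..."
}'
```

The request succeeds, and the response body shows that `_version` has been incremented to 2:

```json
{
  "_index": "website",
  "_type": "blog",
  "_id": "1",
  "_version": 2,
  "result": "updated"
}
```

If you rerun the same index request, still specifying `version=1`, Elasticsearch responds with a 409 Conflict HTTP status.

### 3.3 Partial Updates to Documents

So far, updating a document has meant retrieving it, modifying it, and reindexing the whole document. With the update API, a single request can make a partial update, such as adding new fields or changing existing ones.

Documents are immutable: they cannot be changed, only replaced. The update API must obey the same rule. Externally it appears to update the document in place; internally, the update API runs the same retrieve-modify-reindex process described above, which also reduces the chance of conflicting changes from other processes.

The simplest form of the update request accepts a partial document as the `doc` parameter, which is merged into the existing document: objects are merged together, existing scalar fields are overwritten, and new fields are added. The following request adds a `tags` field and a `views` field to the blog post:

```shell
curl -H "Content-Type: application/json" -XPOST "localhost:9200/website/blog/1/_update?pretty" -d '
{
  "doc": {
    "tags": [ "testing" ],
    "views": 0
  }
}'
```

If the request succeeds, the response is similar to that of an index request:

```json
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "1",
  "_version" : 3,
  "result" : "updated"
}
```

Retrieving the document shows the updated `_source` field:

```json
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "1",
  "_version" : 3,
  "found" : true,
  "_source" : {
    "title" : "My first blog entry",
    "text" : "Starting to get the hang of this...",
    "views" : 0,
    "tags" : [ "testing" ]
  }
}
```

### 3.4 Partial Updates with Scripts

When the `doc` form of the update API is not enough, Elasticsearch lets you implement your own logic in a script. The default scripting language is Painless (in Elasticsearch 5.0 and later; earlier releases defaulted to Groovy), whose syntax resembles Java. A script used with the update API can change the contents of the `_source` field, which is referred to inside the script as `ctx._source`. For example, a script can increment the number of `views` of the blog post:

```shell
curl -H "Content-Type: application/json" -XPOST "localhost:9200/website/blog/1/_update?pretty" -d '
{
  "script": "ctx._source.views+=1"
}'
```

A script can also add a new tag to the `tags` array:

```shell
curl -H "Content-Type: application/json" -XPOST "localhost:9200/website/blog/1/_update?pretty" -d '
{
  "script": {
    "source": "ctx._source.tags.add(params.new_tag)",
    "params": {
      "new_tag": "search"
    }
  }
}'
```

### 3.5 Updating a Document That May Not Exist

Suppose you want to store a page-view counter in Elasticsearch. Every time a user views a page, you increment the counter for that page. If the page is new, you cannot be sure the counter exists yet, and an update against a nonexistent document fails. In that case, use the `upsert` parameter to specify the document that should be created if it does not already exist:

```shell
curl -H "Content-Type: application/json" -XPOST "localhost:9200/website/blog/2/_update?pretty" -d '
{
  "script": "ctx._source.views+=1",
  "upsert": {
    "views": 1
  }
}'
```

The first time the request runs, the `upsert` value is indexed as a new document, which initializes the `views` field to 1. On later runs the document already exists, so the `script` update is applied instead and increments `views`.

Now consider partial updates from multiple users, where two processes both increment the page-view counter. The order of the increments does not matter, so if a conflict occurs, it is enough to retry the update. This can be done automatically with the `retry_on_conflict` parameter, which sets how many times the update is retried before it fails; the default is 0.

```shell
curl -H "Content-Type: application/json" -XPOST "localhost:9200/website/blog/2/_update?retry_on_conflict=5" -d '
{
  "script": "ctx._source.views+=1",
  "upsert": {
    "views": 0
  }
}'
```

### 3.6 Bulk Operations

Just as mget retrieves multiple documents at once, the bulk API performs multiple create, index, update, or delete requests in a single request. This is particularly useful for indexing a data stream such as log events, which can be indexed in order in batches of hundreds or thousands.

The bulk request body has the following, slightly unusual, format:

```json
{ action: { metadata }}\n
{ request body        }\n
{ action: { metadata }}\n
{ request body        }\n
...
```

This format is like a stream of one-line JSON documents joined by newline ("\n") characters. Two points are important:

- Every line must end with "\n", including the last one. These markers separate one line from the next.
- Lines cannot contain unescaped newline characters, because they would interfere with parsing. This means the JSON cannot be pretty-printed.

#### 3.6.1 action/metadata

This line specifies which action to perform on which document. The action must be one of the following:

- create: create a document only if it does not already exist
- index: create a new document or replace an existing one
- update: partially update a document
- delete: delete a document

The metadata specifies the `_index`, `_type`, and `_id` of the document to be indexed, created, updated, or deleted. (As the index action in section 3.6.3 shows, `_id` may be omitted for index, in which case an ID is generated.) For example, a delete request looks like this:

```json
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
```

#### 3.6.2 Request body

The request body line consists of the document `_source` itself, that is, the fields and values that the document contains.

- It is required for index and create operations, which makes sense: you must supply the document to index.
- It is also required for update operations, and should have the same form as the body of the update API (`doc`, `upsert`, `script`, and so on).
- Delete operations do not need a request body.

```json
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
```

#### 3.6.3 The bulk request

Putting it all together, a complete bulk request looks like this:

```shell
curl -H "Content-Type: application/json" -XPOST "localhost:9200/_bulk?pretty" -d '
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 5}}
{ "doc" : {"title" : "My updated blog post"}}
'
```

The Elasticsearch response contains an `items` array that lists the result of each request, in the same order as the requests were made:

```json
{
  "took" : 1069,
  "errors" : false,
  "items" : [
    {
      "delete" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 1,
        "result" : "not_found",
        "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 404
      }
    },
    {
      "create" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 2,
        "result" : "created",
        "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "_--jcm0Bsjr6Q3VWz7kw",
        "_version" : 1,
        "result" : "created",
        "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "update" : {
        "_index" : "website",
        "_type" : "blog",
        "_id" : "123",
        "_version" : 3,
        "result" : "updated",
        "_shards" : { "total" : 1, "successful" : 1, "failed" : 0 },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 200
      }
    }
  ]
}
```

Each sub-request is executed independently, so the failure of one does not affect the others. If any request fails, the top-level `errors` flag is set to true and the error details are reported under the corresponding item. This also means that bulk requests are not atomic and cannot be used to implement transactions.

Specifying the same metadata for every document is redundant. As with the mget API, a bulk request accepts a default `/_index` or `/_index/_type` in the URL:

```shell
POST /website/_bulk
{ "index": { "_type": "log" }}
{ "event": "User logged in" }
```

You can still override `_index` and `_type` in the metadata line; where they are not overridden, the values from the URL are used as defaults:

```shell
POST /website/log/_bulk
{ "index": {}}
{ "event": "User logged in" }
{ "index": { "_type": "blog" }}
{ "title": "Overriding the default type" }
```

## 4. Deleting Documents

The syntax for deleting a document follows the same pattern as before, but uses the DELETE method:

```shell
curl -XDELETE localhost:9200/website/blog/123
```

If the document is found, Elasticsearch returns a 200 OK status code and the following response body. Note that the `_version` number has been incremented:

```json
{
  "_index": "website",
  "_type": "blog",
  "_id": "123",
  "_version": 3,
  "result": "deleted"
}
```

Author: w1992wishes. Link: https://www.jianshu.com/p/dfc29826f793. Source: Jianshu (簡書). Copyright belongs to the author. For commercial reproduction, contact the author for authorization; for non-commercial reproduction, credit the source.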
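The newline-delimited bulk body described in section 3.6 is easy to get wrong by hand, particularly the requirement that the last line also ends with a newline. As a minimal sketch, assuming a POSIX shell, the body could be assembled with `printf`, which terminates every argument with "\n". The helper name `bulk_body` is hypothetical, and the commented `curl` line assumes a node listening on `localhost:9200`.

```shell
#!/bin/sh
# Sketch only: bulk_body is a hypothetical helper, not part of Elasticsearch.

# Print a bulk request body, one JSON object per line.
# printf '%s\n' emits each argument followed by "\n", so the final line
# is newline-terminated as the bulk format requires.
bulk_body() {
  printf '%s\n' \
    '{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}' \
    '{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}' \
    '{ "title": "My first blog post" }' \
    '{ "index": { "_index": "website", "_type": "blog" }}' \
    '{ "title": "My second blog post" }'
}

# Show the body, then the number of newline-terminated lines it contains.
bulk_body
bulk_body | wc -l

# To send it (assumes a node listening on localhost:9200):
# bulk_body | curl -H "Content-Type: application/json" -XPOST \
#   "localhost:9200/_bulk?pretty" --data-binary @-
```

The commented command uses `--data-binary` because curl's `-d @file` form strips newlines when it reads the data from a file or from stdin, which would break the one-object-per-line format.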
