Histogram Aggregation · my-elasticsearch-cn

## Histogram Aggregation

A multi-bucket values-source based aggregation that can be applied to numeric values extracted from the documents. It dynamically builds fixed-size (a.k.a. interval) buckets over the values. For example, if the documents have a field that holds a price (numeric), we can configure this aggregation to dynamically build buckets with interval `5` (in the case of price, this may represent $5). When the aggregation executes, the price field of every document is evaluated and rounded down to its closest bucket. For example, if the price is `32` and the bucket size is `5`, the rounding yields `30`, so the document "falls" into the bucket associated with the key `30`. More formally, the bucket key is computed with the following formula:

```
bucket_key = Math.floor((value - offset) / interval) * interval + offset
```

The `interval` must be a positive number, while the `offset` must be a number in `[0, interval)` (greater than or equal to `0` and less than `interval`).

The following snippet "buckets" the products based on their `price` with an interval of `50`:

```
POST /sales/_search?size=0
{
  "aggs" : {
    "prices" : {
      "histogram" : {
        "field" : "price",
        "interval" : 50
      }
    }
  }
}
```

It may return the following response:

```
{
  ...
  "aggregations": {
    "prices" : {
      "buckets": [
        {
          "key": 0.0,
          "doc_count": 1
        },
        {
          "key": 50.0,
          "doc_count": 1
        },
        {
          "key": 100.0,
          "doc_count": 0
        },
        {
          "key": 150.0,
          "doc_count": 2
        },
        {
          "key": 200.0,
          "doc_count": 3
        }
      ]
    }
  }
}
```

### Minimum document count

The response above shows that no documents have a price in the range `[100, 150)`. By default, the response fills gaps in the histogram with empty buckets. This behavior can be changed, and buckets can be requested with a higher minimum count, using the `min_doc_count` setting:

```
POST /sales/_search?size=0
{
  "aggs" : {
    "prices" : {
      "histogram" : {
        "field" : "price",
        "interval" : 50,
        "min_doc_count" : 1
      }
    }
  }
}
```

Response:

```
{
  ...
  "aggregations": {
    "prices" : {
      "buckets": [
        {
          "key": 0.0,
          "doc_count": 1
        },
        {
          "key": 50.0,
          "doc_count": 1
        },
        {
          "key": 150.0,
          "doc_count": 2
        },
        {
          "key": 200.0,
          "doc_count": 3
        }
      ]
    }
  }
}
```
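The bucket-key formula and the gap-filling behavior of `min_doc_count` can be modelled in a few lines of Python. This is a minimal sketch, not how Elasticsearch computes histograms internally: `bucket_key` and `histogram` are hypothetical helper names, and the sample prices are made up so that the output mirrors the two responses above.

```python
import math
from collections import Counter


def bucket_key(value, interval, offset=0.0):
    """bucket_key = floor((value - offset) / interval) * interval + offset"""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= offset < interval:
        raise ValueError("offset must be in [0, interval)")
    return math.floor((value - offset) / interval) * interval + offset


def histogram(values, interval, offset=0.0, min_doc_count=0):
    """Return (key, doc_count) pairs sorted by key.

    Every key between the smallest and the largest populated bucket is
    generated, so empty buckets appear when min_doc_count is 0; buckets
    holding fewer than min_doc_count documents are then dropped.
    """
    counts = Counter(bucket_key(v, interval, offset) for v in values)
    if not counts:
        return []
    lo, hi = min(counts), max(counts)
    steps = round((hi - lo) / interval)
    buckets = [(lo + i * interval, counts.get(lo + i * interval, 0))
               for i in range(steps + 1)]
    return [(k, c) for k, c in buckets if c >= min_doc_count]


if __name__ == "__main__":
    prices = [10, 60, 155, 175, 200, 220, 245]  # hypothetical sample data
    print(bucket_key(32, 5))  # 30.0
    print(histogram(prices, 50))                   # includes the empty 100.0 bucket
    print(histogram(prices, 50, min_doc_count=1))  # empty bucket dropped
    print(histogram(range(5, 15), 10))             # two buckets of 5
    print(histogram(range(5, 15), 10, offset=5))   # one bucket of 10
```

The last two calls anticipate the `offset` option described later in this page: shifting the boundaries by `5` merges the values `5` to `14` into a single bucket.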
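By default, the `histogram` returns all the buckets within the range of the data itself. That is, the document with the smallest value (of the field the histogram is built on) determines the min bucket (the bucket with the smallest key), and the document with the highest value determines the max bucket (the bucket with the highest key). When requesting empty buckets, this often causes confusion, especially when the data is also filtered.

To understand why, consider an example. Suppose you filter your request to get all documents with values between `0` and `500`, and you also want to slice the data with a histogram using an interval of `50`. You also specify `"min_doc_count" : 0`, because you want all the buckets, even the empty ones. If it happens that all products (documents) have prices higher than `100`, the first bucket you get will be the one with `100` as its key. This is confusing, since in many cases you also want the buckets between `0` and `100`.

With the `extended_bounds` setting, you can "force" the histogram aggregation to start building buckets at a specific `min` value and to keep building buckets up to a `max` value (even if there are no more documents). Using `extended_bounds` only makes sense when `min_doc_count` is `0` (empty buckets are never returned if `min_doc_count` is greater than `0`).

Note that, as the name suggests, `extended_bounds` does not filter buckets. If `extended_bounds.min` is higher than the values extracted from the documents, the documents still dictate what the first bucket will be (and the same goes for `extended_bounds.max` and the last bucket). To filter buckets, nest the histogram aggregation under a `range` `filter` aggregation with the appropriate `from`/`to` settings.

Example:

```
POST /sales/_search?size=0
{
  "query" : {
    "constant_score" : { "filter": { "range" : { "price" : { "to" : "500" } } } }
  },
  "aggs" : {
    "prices" : {
      "histogram" : {
        "field" : "price",
        "interval" : 50,
        "extended_bounds" : {
          "min" : 0,
          "max" : 500
        }
      }
    }
  }
}
```

### Order

By default, the returned buckets are sorted by their `key` in ascending order, though the order behavior can be controlled with the `order` setting.

Ordering the buckets by their key, descending:

```
POST /sales/_search?size=0
{
  "aggs" : {
    "prices" : {
      "histogram" : {
        "field" : "price",
        "interval" : 50,
        "order" : { "_key" : "desc" }
      }
    }
  }
}
```

Ordering the buckets by their `doc_count`, ascending:

```
POST /sales/_search?size=0
{
  "aggs" : {
    "prices" : {
      "histogram" : {
        "field" : "price",
        "interval" : 50,
        "order" : { "_count" : "asc" }
      }
    }
  }
}
```

If the histogram aggregation has a direct metrics sub-aggregation, the latter can determine the order of the buckets:

```
POST /sales/_search?size=0
{
  "aggs" : {
    "prices" : {
      "histogram" : {
        "field" : "price",
        "interval" : 50,
        "order" : { "price_stats.min" : "asc" }
      },
      "aggs" : {
        "price_stats" : { "stats" : {"field" : "price"} }
      }
    }
  }
}
```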
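To make the `extended_bounds` and `order` rules above concrete, here is a small client-side model in Python. It is a sketch under stated assumptions, not Elasticsearch's implementation: `histogram_buckets` and `order_buckets` are hypothetical helpers, each bucket's `min` field stands in for the `price_stats.min` sub-aggregation, and the tie-breaking (ascending key) and the placement of empty buckets when sorting by a metric (last) are choices made only for this sketch.

```python
import math
from collections import defaultdict


def histogram_buckets(values, interval, extended_bounds=None):
    """Model histogram buckets with min_doc_count = 0.

    Each bucket records its key, doc_count and the smallest value it holds
    ("min"), which stands in for a price_stats.min sub-aggregation.
    """
    groups = defaultdict(list)
    for v in values:
        groups[math.floor(v / interval) * interval].append(v)
    keys = list(groups)
    if extended_bounds is not None:
        # extended_bounds only widens the range; data outside the bounds
        # still creates its own buckets, so nothing is filtered out.
        keys += [math.floor(b / interval) * interval for b in extended_bounds]
    if not keys:
        return []
    lo, hi = min(keys), max(keys)
    buckets = []
    for i in range(round((hi - lo) / interval) + 1):
        key = lo + i * interval
        vals = groups.get(key, [])
        buckets.append({"key": key, "doc_count": len(vals),
                        "min": min(vals) if vals else None})
    return buckets


def order_buckets(buckets, by="_key", direction="asc"):
    """Sort buckets by "_key", "_count", or a metric field such as "min".

    Ties keep ascending key order (Python's sort is stable, even with
    reverse=True). Buckets without a metric value are placed last.
    """
    reverse = direction == "desc"
    by_key = sorted(buckets, key=lambda b: b["key"])
    if by == "_key":
        return sorted(by_key, key=lambda b: b["key"], reverse=reverse)
    if by == "_count":
        return sorted(by_key, key=lambda b: b["doc_count"], reverse=reverse)
    present = [b for b in by_key if b[by] is not None]
    absent = [b for b in by_key if b[by] is None]
    return sorted(present, key=lambda b: b[by], reverse=reverse) + absent


if __name__ == "__main__":
    prices = [120, 135, 160, 410]  # hypothetical data, all above 100
    print([b["key"] for b in histogram_buckets(prices, 50)])
    print([b["key"] for b in histogram_buckets(prices, 50, extended_bounds=(0, 500))])
    print([b["key"] for b in order_buckets(histogram_buckets(prices, 50), "_count", "desc")])
```

Without `extended_bounds` the first key is `100`, exactly the confusing case described above; with `extended_bounds=(0, 500)` the keys run from `0` to `500`.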
With `"order" : { "price_stats.min" : "asc" }`, the buckets are sorted by the `min` value of their `price_stats` sub-aggregation.

It is also possible to order the buckets based on a "deeper" aggregation in the hierarchy. This is supported as long as the aggregations along the path are of a single-bucket type, where the last aggregation in the path may be either a single-bucket one or a metrics one. If it is a single-bucket type, the order is defined by the number of documents in the bucket (i.e. `doc_count`). If it is a metrics one, the same rules as above apply: for a multi-value metrics aggregation the path must indicate the metric name to sort by, and for a single-value metrics aggregation the sort is applied to that value.

The path must be defined in the following form:

```
AGG_SEPARATOR       =  '>' ;
METRIC_SEPARATOR    =  '.' ;
AGG_NAME            =  <the name of the aggregation> ;
METRIC              =  <the name of the metric (in case of multi-value metrics aggregation)> ;
PATH                =  <AGG_NAME> [ <AGG_SEPARATOR>, <AGG_NAME> ]* [ <METRIC_SEPARATOR>, <METRIC> ] ;
```

```
POST /sales/_search?size=0
{
  "aggs" : {
    "prices" : {
      "histogram" : {
        "field" : "price",
        "interval" : 50,
        "order" : { "promoted_products>rating_stats.avg" : "desc" }
      },
      "aggs" : {
        "promoted_products" : {
          "filter" : { "term" : { "promoted" : true }},
          "aggs" : {
            "rating_stats" : { "stats" : { "field" : "rating" }}
          }
        }
      }
    }
  }
}
```

The above sorts the buckets based on the average rating among the promoted products.

### Offset

By default, the bucket keys start at `0` and continue in evenly spaced steps of `interval`. For example, if the interval is `10`, the first buckets (assuming there is data inside them) are `[0, 10)`, `[10, 20)`, `[20, 30)`. The bucket boundaries can be shifted with the `offset` option.

This is best illustrated with an example. If there are 10 documents with values ranging from `5` to `14`, using interval `10` results in two buckets with 5 documents each. If an additional offset of `5` is used, there is only one bucket, `[5, 15)`, containing all 10 documents.

### Response Format

By default, the buckets are returned as an ordered array. It is also possible to request the response as a hash keyed by the bucket keys instead:

```
POST /sales/_search?size=0
{
  "aggs" : {
    "prices" : {
      "histogram" : {
        "field" : "price",
        "interval" : 50,
        "keyed" : true
      }
    }
  }
}
```

Response:

```
{
  ...
  "aggregations": {
    "prices": {
      "buckets": {
        "0.0": {
          "key": 0.0,
          "doc_count": 1
        },
        "50.0": {
          "key": 50.0,
          "doc_count": 1
        },
        "100.0": {
          "key": 100.0,
          "doc_count": 0
        },
        "150.0": {
          "key": 150.0,
          "doc_count": 2
        },
        "200.0": {
          "key": 200.0,
          "doc_count": 3
        }
      }
    }
  }
}
```

### Missing value

The `missing` parameter defines how documents that are missing a value should be treated. By default they are ignored, but it is also possible to treat them as if they had a value.

```
POST /sales/_search?size=0
{
  "aggs" : {
    "quantity" : {
      "histogram" : {
        "field" : "quantity",
        "interval": 10,
        "missing": 0
      }
    }
  }
}
```

With `"missing": 0`, documents without a value in the `quantity` field fall into the same bucket as documents that have the value `0`.
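The `keyed` and `missing` options can be modelled the same way. The Python sketch below is hypothetical: the function name `histogram_with_missing` and the sample documents are made up, a document is treated as missing the field when the key is absent or `None`, and hash keys are built with Python's `str()`, which matches the `"0.0"`-style keys shown above for small values but is not guaranteed to match Elasticsearch's key formatting in general.

```python
import math
from collections import Counter


def histogram_with_missing(docs, field, interval, missing=None, keyed=False):
    """Model a histogram over `field` with optional `missing` and `keyed`.

    Documents lacking `field` are skipped unless `missing` is given, in
    which case they are counted as if they held that value.
    """
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            value = missing
        if value is None:
            continue  # default: documents without a value are ignored
        counts[float(math.floor(value / interval) * interval)] += 1
    if not counts:
        return {} if keyed else []
    lo, hi = min(counts), max(counts)
    buckets = [{"key": lo + i * interval,
                "doc_count": counts.get(lo + i * interval, 0)}
               for i in range(round((hi - lo) / interval) + 1)]
    if keyed:
        # keyed=True: a map from the stringified key to its bucket
        return {str(b["key"]): b for b in buckets}
    return buckets  # keyed=False: an array ordered by key


if __name__ == "__main__":
    docs = [{"quantity": 3}, {"quantity": 12}, {}, {"quantity": 25}]  # hypothetical
    print(histogram_with_missing(docs, "quantity", 10))
    print(histogram_with_missing(docs, "quantity", 10, missing=0, keyed=True))
```

With `missing=0`, the document that has no `quantity` is counted in the `0.0` bucket alongside the document whose quantity is `3`.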