First steps · Chinese translation of the official Prometheus documentation

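# **First steps with Prometheus**

Welcome to Prometheus! Prometheus is a monitoring platform that collects metrics from monitored targets by scraping HTTP endpoints on those targets. This guide shows you how to install and configure Prometheus and use it to monitor your first resource. You will download, install, and run Prometheus. You will also download and install an exporter, a tool that exposes time series data about hosts and services. Our first exporter will be Prometheus itself, which provides a wide variety of host-level metrics about memory usage, garbage collection, and more.

## **Downloading Prometheus**

[Download the latest release](https://prometheus.io/download) of Prometheus, then extract it:

```
tar xvfz prometheus-*.tar.gz
cd prometheus-*
```

The Prometheus server is a single binary named `prometheus` (or `prometheus.exe` on Microsoft Windows). Run the binary with the `--help` flag to see help on its options.

~~~bash
./prometheus --help
usage: prometheus [<flags>]

The Prometheus monitoring server
~~~

Before starting Prometheus, let's configure it.

## **Configuration**

Prometheus configuration files are written in YAML. The download ships with a sample configuration file named `prometheus.yml`. Most of the comments in the sample file have been removed here to keep it concise.

~~~yaml
global:
  scrape_interval:     15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
~~~

The sample configuration file contains three configuration blocks: `global`, `rule_files`, and `scrape_configs`.
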
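To make the three-block layout concrete, here is a minimal Python sketch that lists the top-level keys of the sample configuration. It is a plain string scan rather than a real YAML parser, and the names `SAMPLE_CONFIG` and `top_level_blocks` are made up for this illustration; they are not part of Prometheus.

```python
# Copy of the sample prometheus.yml shown in this guide.
SAMPLE_CONFIG = """\
global:
  scrape_interval:     15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
"""


def top_level_blocks(config_text: str) -> list[str]:
    """Return the names of unindented `key:` lines, in file order.

    Hypothetical helper: a simple line scan, NOT a YAML parser. It only
    works for small files laid out like the sample above.
    """
    blocks = []
    for line in config_text.splitlines():
        # Skip blank lines, comments, list items, and indented (nested) lines.
        if not line or line[0] in " \t#-":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            blocks.append(key.strip())
    return blocks


if __name__ == "__main__":
    print(top_level_blocks(SAMPLE_CONFIG))
```

For anything beyond a quick look, a proper YAML library is the safer choice, since this scan ignores quoting, multi-line values, and other YAML features.
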
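
The `global` block controls the Prometheus server's global configuration. Two options are shown here:

* `scrape_interval`: controls how often Prometheus scrapes targets. You can override it for individual targets. In this example, the global setting is to scrape every 15 seconds.
* `evaluation_interval`: controls how often Prometheus evaluates rules. Prometheus uses rules to create new time series and to generate alerts.

The `rule_files` block specifies any rules we want the Prometheus server to load. For now there are no rules.

The last block, `scrape_configs`, controls which resources Prometheus monitors. Because Prometheus exposes data about itself through an HTTP endpoint, it can scrape and monitor its own health. The default configuration contains a single job, named `prometheus`, which scrapes the time series data exposed by the Prometheus server. The job contains a single, statically configured target: `localhost:9090`. Prometheus expects metrics to be available on targets at the `/metrics` path, so this default job scrapes the URL `http://localhost:9090/metrics`. The time series data returned describes the state and performance of the Prometheus server in detail.

For a complete specification of configuration options, see the [configuration documentation](https://prometheus.io/docs/operating/configuration).

## **Starting Prometheus**

Change to the directory containing the Prometheus binary and start Prometheus with the new configuration file:

~~~bash
./prometheus --config.file=prometheus.yml
~~~

Prometheus should start up. You should also be able to browse to a status page about Prometheus itself at `http://localhost:9090`. Give it about 30 seconds to collect data about itself from its own HTTP metrics endpoint.

You can also verify that Prometheus is serving metrics about itself by visiting its own metrics endpoint: `http://localhost:9090/metrics`.

## **Using the expression browser**

Let's look at some of the data Prometheus has collected about itself. Open `http://localhost:9090/graph` and choose the `Console` view within the `Graph` tab to use Prometheus's built-in expression browser.

As you can gather from `http://localhost:9090/metrics`, one metric that Prometheus exports about itself is named `promhttp_metric_handler_requests_total` (the total number of `/metrics` requests the Prometheus server has served). Enter it into the expression console:

~~~
promhttp_metric_handler_requests_total
~~~

This returns several different time series (along with the latest value recorded for each), all with the metric name `promhttp_metric_handler_requests_total` but with different labels. The labels distinguish requests by status.

If we are only interested in requests that resulted in HTTP code 200, we can retrieve them with this query:

~~~
promhttp_metric_handler_requests_total{code="200"}
~~~

To count the number of returned time series, you can write:

~~~
count(promhttp_metric_handler_requests_total)
~~~

For more about the expression language, see the [expression language documentation](https://prometheus.io/docs/querying/basics/).

## **Using the graphing interface**

To graph expressions, open `http://localhost:9090/graph` and select the `Graph` tab.

For example, enter the following expression to graph the per-second rate of HTTP requests returning status code 200 on the self-scraped Prometheus:

~~~
rate(promhttp_metric_handler_requests_total{code="200"}[1m])
~~~

## **Monitoring other targets**

Collecting metrics from Prometheus alone does not give a good picture of what Prometheus can do. To get a better sense of its capabilities, we recommend exploring the documentation for other exporters. Using the `node exporter` to monitor host metrics on Linux or macOS is a good place to start.

## **Summary**

In this guide, you installed Prometheus, configured a Prometheus instance to monitor resources, and learned some basics of working with time series data in Prometheus's expression browser. To continue learning about Prometheus, check out the [Overview](https://prometheus.io/docs/introduction/overview) for some ideas about what to explore next.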
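
The expressions used in this guide can also be evaluated outside the expression browser. The following is a minimal Python sketch, assuming the server from this guide is still running at `localhost:9090` and using the instant-query endpoint of the Prometheus HTTP API, `/api/v1/query`, which this guide does not otherwise cover. The helper names (`build_query_url`, `parse_instant_vector`, `run_query`) are invented for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the local server started earlier in this guide.
BASE_URL = "http://localhost:9090"


def build_query_url(expr: str, base_url: str = BASE_URL) -> str:
    """Build an instant-query URL, URL-encoding the PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_instant_vector(payload: dict) -> list[tuple[dict, float]]:
    """Turn an instant-vector API response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample looks like:
    #   {"metric": {<labels>}, "value": [<unix timestamp>, "<value as string>"]}
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def run_query(expr: str, base_url: str = BASE_URL) -> list[tuple[dict, float]]:
    """Send the query to the server and return the parsed samples."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=5) as resp:
        return parse_instant_vector(json.load(resp))
```

With the server running, `run_query('promhttp_metric_handler_requests_total{code="200"}')` should return one `(labels, value)` pair per matching time series. Expressions that evaluate to a scalar or a range are not handled by this sketch and raise a `ValueError`.
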