Prometheus Configuration File · Kubernetes

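[TOC]

>[info] Note: this chapter covers only the commonly used parameters.

# global: global configuration

Commonly used parameters:

| Parameter | Description |
| :-: | :-: |
| **scrape_interval** | How often targets are scraped by default. Default: 1m |
| **scrape_timeout** | How long before a scrape request times out. Default: 10s |
| **evaluation_interval** | How often rules are evaluated. Default: 1m |

# rule_files: rule files

This section takes no parameters; it is simply a list of files to load, as in the example below. Paths may use glob patterns (wildcards such as `*`), not regular expressions.

```yaml
rule_files:
- /etc/prometheus/*.rules
```

# scrape_config_files: scrape configuration files

This section takes no parameters; it is simply a list of files to load, as in the example below. Paths may use glob patterns (wildcards such as `*`), not regular expressions.

```yaml
scrape_config_files:
- /etc/prometheus/*.target
```

>[info] The file content has the same format as the scrape_configs section, so you can copy the scrape_configs content into it directly. For the individual settings, see the scrape_configs section below.

# scrape_configs: scrape rules

This is the core of monitoring: Prometheus scrapes targets according to the configuration below. Only the commonly used options are listed here. For the complete list, see the [official Prometheus documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config).

| Parameter | Description |
| :-: | :- |
| **job_name** | Name of the job |
| **scrape_interval** | How often to scrape targets; falls back to the global setting if unset |
| **scrape_timeout** | How long before a scrape request times out; falls back to the global setting if unset |
| **metrics_path** | HTTP resource path from which to fetch metrics. Default: `/metrics` |
| **scheme** | Protocol scheme used for requests. Default: http |
| **params** | Optional HTTP URL parameters |
| **relabel_configs** | Dynamically rewrites a target's labels before it is scraped **[Important]** |
| **basic_auth** | Sets the `Authorization` header on every scrape request using the configured username and password. `password` and `password_file` are mutually exclusive |
| **authorization** | Sets the `Authorization` header on every scrape request using the configured credentials |
| **tls_config** | TLS settings for scrape requests |
| **static_configs** | List of labeled, statically configured targets for this job |
| **file_sd_configs** | List of file-based service discovery configurations |
| **consul_sd_configs** | List of Consul service discovery configurations |
| **docker_sd_configs** | List of Docker service discovery configurations |
| **kubernetes_sd_configs** | Kubernetes SD configurations retrieve scrape targets from the Kubernetes REST API and always stay synchronized with the cluster state |

There are two kinds of scrape rules in scrape_configs:

1. Static configuration (`static_configs`, the fifth row from the bottom of the table above). Every change requires reloading or restarting the Prometheus service.
2. Service discovery (the last four rows of the table above; for other mechanisms, see the official documentation). prometheus-server discovers targets automatically, with no restart of the Prometheus service required. **[Recommended]**

How does Prometheus determine the scrape address, path, and HTTP scheme?

1. Scrape address: there are two cases
   - With static configuration and file-based service discovery, the IP address and port come from `targets`.
   - Other service discovery mechanisms set the `__address__` label, and Prometheus scrapes that address. The `instance` label is not the scrape address: if `instance` is not set after relabeling, it defaults to the value of `__address__`.
2. Scheme: set with the `scheme` parameter, which defaults to http. It can also be set per target through the `__scheme__` label.
3. Scrape path: set with the `metrics_path` parameter, which defaults to `/metrics`. It can also be set per target through the `__metrics_path__` label.

## relabel_configs

Relabeling is a powerful tool for dynamically rewriting the label set of a target before it is scraped. Multiple relabeling steps can be configured per scrape configuration. They are applied to each target's label set in the order in which they appear in the configuration file. For all parameters, [see the official documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config).

| Parameter | Description |
| :-: | :- |
| source_labels | The source labels select values from existing labels |
| separator | Separator placed between the concatenated source label values. Default: `;` |
| target_label | Label to which the resulting value is written in a replace action |
| regex | Regular expression matched against the value extracted from the source labels. Default: `(.*)` |
| replacement | Replacement value used for the regex replace when the regular expression matches. Default: `$1` |
| action | Action to perform based on the regex match. Default: `replace` |

> Common `action` values
> - replace: match the regex against the concatenated source_labels. If it matches, set target_label to the replacement, with match group references (${1}, ${2}, ...) in the replacement substituted by their values. If the regex does not match, no replacement takes place.
> - keep: drop targets whose concatenated source_labels do not match the regex.
> - drop: drop targets whose concatenated source_labels match the regex.

## Workflow for configuring a scrape rule

1. Use curl from a shell to confirm that the metrics can be fetched
   a. Determine the parameters needed to fetch the metrics
   b. Determine whether the address is static or dynamically discovered
2. Add the Prometheus scrape configuration for those metrics

## Static configuration

**Fetch the Prometheus metrics from a shell**

```shell
$ curl -s localhost:9090/metrics | head
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.7762e-05
go_gc_duration_seconds{quantile="0.25"} 0.000101175
go_gc_duration_seconds{quantile="0.5"} 0.00016822
go_gc_duration_seconds{quantile="0.75"} 0.000428428
go_gc_duration_seconds{quantile="1"} 0.00079745
go_gc_duration_seconds_sum 0.002778413
go_gc_duration_seconds_count 11
# HELP go_goroutines Number of goroutines that currently exist.
```

**Add one configuration file to the prometheus-target ConfigMap**

```yaml
prometheus.targets: |
  scrape_configs:
  - job_name: "prometheus"
    static_configs:
    - targets:
      - "localhost:9090"
```

**Verify on the Prometheus targets page**

![targets01](https://img.kancloud.cn/c2/80/c280c59735f85956d9b68c44b57752d3_1689x209.png)

## File-based service discovery

File-based service discovery provides a more generic way to configure static targets and serves as an interface for plugging in custom service discovery mechanisms.

**Fetch the node-exporter metrics from a shell**

```shell
curl -s 192.168.31.103:9100/metrics | head
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.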
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.6659e-05
go_gc_duration_seconds{quantile="0.25"} 8.684e-05
go_gc_duration_seconds{quantile="0.5"} 0.00018778
go_gc_duration_seconds{quantile="0.75"} 0.000327928
go_gc_duration_seconds{quantile="1"} 0.092123081
go_gc_duration_seconds_sum 0.200803256
go_gc_duration_seconds_count 50
# HELP go_goroutines Number of goroutines that currently exist.
```

**Add two configuration files to the prometheus-target ConfigMap**

```yaml
# File-based discovery
# Any further file-based discovery jobs are added as extra job_name entries in this file
file_discovery.targets: |
  scrape_configs:
  - job_name: "node-exporter"
    file_sd_configs:
    - files:
      - /etc/prometheus/target/node-exporter.yml
      # Refresh interval at which the files are re-read
      refresh_interval: 1m

# Adding or removing node-exporter nodes only requires editing this file
# Changes take effect without reloading or restarting the Prometheus service
node-exporter.yml: |
  - targets:
    - "192.168.31.103:9100"
    - "192.168.31.79:9100"
    - "192.168.31.95:9100"
    - "192.168.31.78:9100"
    - "192.168.31.253:9100"
```

**Verify on the Prometheus targets page**

![target02](https://img.kancloud.cn/4d/62/4d62635bcc01b4ac9024d4abfb6d9954_1727x343.png)

## Kubernetes service discovery

### kubernetes node

The `node` role discovers one target per cluster node, with the address defaulting to the **`Kubelet`**'s HTTP port. The target address defaults to the first existing address of the Kubernetes node object, checked in the address type order NodeInternalIP, NodeExternalIP, NodeLegacyHostIP, NodeHostName. The first type that matches supplies the value of `__address__`. For details, [see the official documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#node).

![](https://img.kancloud.cn/bf/7e/bf7e11f5e20615753250846a042e8577_1155x166.png)

**Fetch the kubelet metrics from a shell**

```shell
# Get the ServiceAccount's ca.crt certificate
kubectl -n kube-system get secret `kubectl -n kube-system get sa prometheus -ojsonpath="{.secrets[0].name}"` -ojsonpath='{.data.ca\.crt}' | base64 -d > /tmp/ca.crt

# Get the token
TOKEN=$(kubectl -n kube-system get secret `kubectl -n kube-system get sa prometheus -ojsonpath="{.secrets[0].name}"` -ojsonpath='{.data.token}' | base64 -d)

# Fetch the kubelet metrics
curl -k --cacert /tmp/ca.crt -H "Authorization: Bearer ${TOKEN}" https://192.168.32.127:10250/metrics | head
# HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
# HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
# TYPE apiserver_client_certificate_expiration_seconds histogram
apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
```

> curl options explained
> 1. `-k`: disables curl's verification of the server certificate
> 2. `--cacert`: supplies the CA certificate used when accessing the kubelet
> 3. `-H "Authorization: Bearer ${TOKEN}"`: supplies the token used when accessing the kubelet

**Add one configuration file to the prometheus-target ConfigMap**

```yaml
kubernetes.targets: |
  scrape_configs:
  - job_name: "kubelet"
    scheme: https
    tls_config:
      # Corresponds to the curl --cacert option
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      # Corresponds to the curl -k option
      insecure_skip_verify: true
    # Corresponds to the curl -H "Authorization: Bearer ${TOKEN}" option
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
    - role: node
```

### kubernetes endpoints

The `endpoints` role discovers targets from the listed endpoints of a service. For each endpoint address, one target is discovered per port. If the endpoint is backed by a pod, all other container ports of that pod that are not bound to an endpoint port are discovered as targets as well. For all parameters, [see the official documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpoints).

**Fetch the kube-apiserver metrics from a shell**

```shell
# Get the ServiceAccount's ca.crt certificate
kubectl -n kube-system get secret `kubectl -n kube-system get sa prometheus -ojsonpath="{.secrets[0].name}"` -ojsonpath='{.data.ca\.crt}' | base64 -d > /tmp/ca.crt

# Get the token
TOKEN=$(kubectl -n kube-system get secret `kubectl -n kube-system get sa prometheus -ojsonpath="{.secrets[0].name}"` -ojsonpath='{.data.token}' | base64 -d)

curl -sk --cacert /tmp/ca.crt -H "Authorization: Bearer ${TOKEN}" https://192.168.32.127:6443/metrics | head
# HELP aggregator_openapi_v2_regeneration_count [ALPHA] Counter of OpenAPI v2 spec regeneration count broken down by causing APIService name and reason.
# TYPE aggregator_openapi_v2_regeneration_count counter
aggregator_openapi_v2_regeneration_count{apiservice="*",reason="startup"} 0
aggregator_openapi_v2_regeneration_count{apiservice="k8s_internal_local_delegation_chain_0000000002",reason="update"} 0
# HELP aggregator_openapi_v2_regeneration_duration [ALPHA] Gauge of OpenAPI v2 spec regeneration duration in seconds.
# TYPE aggregator_openapi_v2_regeneration_duration gauge
aggregator_openapi_v2_regeneration_duration{reason="startup"} 0.016283936
aggregator_openapi_v2_regeneration_duration{reason="update"} 0.021537866
# HELP aggregator_unavailable_apiservice [ALPHA] Gauge of APIServices which are marked as unavailable broken down by APIService name.
# TYPE aggregator_unavailable_apiservice gauge
```

**Add this to the scrape_configs section of the Prometheus configuration file, or to a scrape_config_files file**

```yaml
- job_name: "kube-apiserver"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
  # __meta_kubernetes_namespace=default
  # __meta_kubernetes_endpoints_name=kubernetes
  # __meta_kubernetes_endpoint_port_name=https
  # Only endpoints that match all of the rules above are kept
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
```

**Verify on the Prometheus targets page**

![](https://img.kancloud.cn/76/e1/76e1910ae914928dd35da110e2e075eb_1653x225.png)

### kubernetes pod

The `pod` role discovers all pods and exposes their containers as targets. For each declared port of a container, a single target is generated. If a container has no declared ports, a port-free target is created per container so that a port can be added manually through relabeling. For all parameters, [see the official documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#pod).

**Fetch the cilium metrics from a shell**

```shell
$ kubectl -n kube-system get pod -owide -l app.kubernetes.io/name=cilium-operator
NAME                              READY   STATUS    RESTARTS      AGE   IP               NODE             NOMINATED NODE   READINESS GATES
cilium-operator-584b8c6b7-znlwd   1/1     Running   3 (29h ago)   11d   192.168.32.128   192.168.32.128   <none>           <none>

$ curl -s 192.168.32.128:9963/metrics | head
# HELP cilium_operator_ces_queueing_delay_seconds CiliumEndpointSlice queueing delay in seconds
# TYPE cilium_operator_ces_queueing_delay_seconds histogram
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.005"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.01"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.025"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.05"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.1"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.25"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="0.5"} 0
cilium_operator_ces_queueing_delay_seconds_bucket{le="1"} 0
```

**Add this to the scrape_configs section of the Prometheus configuration file, or to a scrape_config_files file**

```yaml
- job_name: "Service/cilium"
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep pods that have the annotation prometheus_io_scrape=true, a container name
  # starting with cilium-, and a container port named prometheus
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape, __meta_kubernetes_pod_container_name, __meta_kubernetes_pod_container_port_name]
    action: keep
    regex: true;cilium-.+;prometheus
  # Build the metrics scrape address from the pod IP and the annotated port
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+):(\d+);(\d+)
    replacement: ${1}:${3}
    target_label: __address__
  # Keep the following labels. The first rule is written out in full;
  # the rest rely on the default action, regex, and replacement
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    regex: (.+)
    replacement: $1
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_container_name]
    target_label: pod_container_name
  - source_labels: [__meta_kubernetes_pod_controller_kind]
    target_label: pod_controller_kind
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod_name
  - source_labels: [__meta_kubernetes_pod_node_name]
    target_label: pod_node_name
```
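
To make the `__address__` rewrite in the configuration above easier to follow, here is a minimal Python sketch of what a single `replace` step does. It is an illustration, not Prometheus code: the helper name `relabel_replace` and the sample label values are hypothetical, Python's `re` module stands in for the RE2 engine Prometheus uses, and only the behavior described in the relabel_configs table (join with the separator, match the whole string, substitute group references) is modeled.

```python
import re


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Hypothetical helper that mimics one relabel step with action: replace."""
    # Join the source label values with the separator; missing labels are empty.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # The regex must match the whole joined string, not just a part of it.
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)  # no match: the label set is left unchanged
    # Convert $1 / ${1} group references into Python's \g<1> syntax.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


# Made-up labels for a discovered pod whose declared port differs from the annotated one.
pod_labels = {
    "__address__": "10.0.0.5:8080",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "9963",
}
rewritten = relabel_replace(
    pod_labels,
    ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
    "__address__",
    regex=r"(.+):(\d+);(\d+)",
    replacement="${1}:${3}",
)
print(rewritten["__address__"])
```

The joined value here is `10.0.0.5:8080;9963`, so group 1 captures the IP, group 2 the declared port, and group 3 the annotated port. When the annotation is missing, the joined value ends in `;`, the regex does not match, and `__address__` stays as discovered.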
**Verify on the Prometheus targets page**

![](https://img.kancloud.cn/6f/39/6f390608fc0174cd073427c80f580e04_1899x845.png)
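
The targets page can also be checked from a script through the Prometheus HTTP API endpoint `/api/v1/targets`. The following is a rough sketch under stated assumptions: Prometheus is reachable at `http://localhost:9090` (the address used in the static configuration example), the helper names `summarize_targets` and `fetch_targets` are hypothetical, and the sample payload is made up and contains only the response fields the sketch reads.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Return sorted (job, scrape URL, health) tuples from an /api/v1/targets response."""
    rows = []
    for target in payload.get("data", {}).get("activeTargets", []):
        job = target.get("labels", {}).get("job", "")
        rows.append((job, target.get("scrapeUrl", ""), target.get("health", "unknown")))
    return sorted(rows)


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the targets list; base_url is an assumption, adjust it to your setup."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


# Made-up sample shaped like the API response, so the sketch runs without a server.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "prometheus", "instance": "localhost:9090"},
             "scrapeUrl": "http://localhost:9090/metrics", "health": "up"},
            {"labels": {"job": "node-exporter", "instance": "192.168.31.103:9100"},
             "scrapeUrl": "http://192.168.31.103:9100/metrics", "health": "down"},
        ]
    },
}

for job, url, health in summarize_targets(SAMPLE):
    print(f"{job:15} {health:8} {url}")
```

Against a live server, replace `SAMPLE` with `fetch_targets()`. Any target that is not `up` deserves the same curl check used throughout this chapter.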