[TOC]
> Note:
> - Prometheus runs as a non-root user, so watch the ownership of any files you create
> - Keep the target configuration in the Prometheus config files identical across all nodes
## Static Monitoring
```yaml
- job_name: "Prometheus"
  static_configs:
    - targets:
        - "localhost:9090"
```
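Whenever the main config changes, it can be validated with promtool (shipped alongside Prometheus) before reloading; the path below assumes the layout used later in this section (`/data/prometheus/prometheus.yml`).
```shell
# Validate the configuration before reloading Prometheus
promtool check config /data/prometheus/prometheus.yml
```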
## File-Based Service Discovery
1. Create the scrape job
```yaml
- job_name: "node-exporter"
  file_sd_configs:
    - files:
        - "targets/node-exporter.yml"
      # Refresh interval for re-reading the files
      refresh_interval: 1m
```
2. Create the target file
```shell
sudo mkdir -p /data/prometheus/targets
cat <<-EOF | sudo tee /data/prometheus/targets/node-exporter.yml > /dev/null
- targets:
  - 192.168.31.103:9100
  - 192.168.31.79:9100
  - 192.168.31.95:9100
  - 192.168.31.78:9100
  - 192.168.31.253:9100
EOF
sudo chown -R ops. /data/prometheus
```
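The file SD format also allows attaching extra labels to a group of targets, which then appear on every series scraped from them; the `env` label below is purely illustrative:
```yaml
- targets:
    - 192.168.31.103:9100
  labels:
    env: prod
```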
3. Hot-reload the configuration
```shell
sudo systemctl reload prometheus
```
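`systemctl reload` only works if the unit file defines an ExecReload; Prometheus itself re-reads its configuration on SIGHUP, or via an HTTP endpoint when started with `--web.enable-lifecycle`, so either of the following is an equivalent fallback:
```shell
# Send SIGHUP directly to the Prometheus process
kill -HUP $(pidof prometheus)
# Or use the lifecycle API (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
```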
4. Sync the files to the other nodes
```shell
# Main config file and the file discovery directory
cd /data/prometheus && scp -r prometheus.yml targets ops@k8s-master02:/data/prometheus
# Update the node-specific replica label
ssh ops@k8s-master02 "sed -ri 's@(replica): .*@\1: B@g' /data/prometheus/prometheus.yml"
# Validate the config file
ssh ops@k8s-master02 "promtool check config /data/prometheus/prometheus.yml"
# Hot-reload the configuration
ssh ops@k8s-master02 "sudo systemctl reload prometheus"
```
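With several peers, the same push can be wrapped in a loop; the host list below is hypothetical, and each node's `replica` label still has to be set to its own unique value as shown above:
```shell
# Hypothetical peer list; adjust to the actual node names
for node in k8s-master02 k8s-master03; do
  scp -r prometheus.yml targets ops@${node}:/data/prometheus
  ssh ops@${node} "promtool check config /data/prometheus/prometheus.yml && sudo systemctl reload prometheus"
done
```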
## Kubernetes-Based Service Discovery
> Because Thanos is deployed from binaries (outside the cluster), the ServiceAccount and its monitoring permissions must first be created in the Kubernetes cluster
1. Create the permissions for Prometheus to monitor the Kubernetes cluster (run on a k8s master node)
```shell
cat <<-EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: kube-system
EOF
```
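The binding can be sanity-checked with kubectl's impersonation support before going further:
```shell
# Both should print "yes" if the ClusterRole is bound correctly
kubectl auth can-i list nodes --as=system:serviceaccount:kube-system:prometheus
kubectl auth can-i get /metrics --as=system:serviceaccount:kube-system:prometheus
```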
2. Retrieve the token for monitoring Kubernetes (run on a k8s master node)
```shell
kubectl -n kube-system get secret $(kubectl -n kube-system get sa prometheus -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 --decode > /data/prometheus/token
```
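This relies on the token Secret that older Kubernetes versions create automatically for a ServiceAccount; from v1.24 on no such Secret exists, and a token has to be requested explicitly (the duration below is an arbitrary example):
```shell
# Kubernetes >= 1.24: issue a long-lived token for the ServiceAccount
kubectl -n kube-system create token prometheus --duration=8760h > /data/prometheus/token
```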
3. Example (on the Thanos node)
```yaml
- job_name: "Service/kube-apiserver"
  scheme: https
  tls_config:
    insecure_skip_verify: true
  # The token obtained in the previous step
  bearer_token_file: /data/prometheus/token
  kubernetes_sd_configs:
    - role: endpoints
      # Entry point of the cluster
      api_server: https://192.168.31.100:6443
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /data/prometheus/token
  relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https
```
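Before reloading, the token and API server address can be verified from the Thanos node; `-k` mirrors the `insecure_skip_verify: true` above:
```shell
# A JSON node list in the response means the token has the required permissions
curl -sk -H "Authorization: Bearer $(cat /data/prometheus/token)" \
  https://192.168.31.100:6443/api/v1/nodes
```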
4. Hot-reload the configuration
```shell
sudo systemctl reload prometheus
```
5. Sync the files to the other nodes
```shell
# Main config file and the file discovery directory
cd /data/prometheus && scp -r prometheus.yml targets ops@k8s-master02:/data/prometheus
# Update the node-specific replica label
ssh ops@k8s-master02 "sed -ri 's@(replica): .*@\1: B@g' /data/prometheus/prometheus.yml"
# Validate the config file
ssh ops@k8s-master02 "promtool check config /data/prometheus/prometheus.yml"
# Hot-reload the configuration
ssh ops@k8s-master02 "sudo systemctl reload prometheus"
```
## Monitoring Kubernetes (Complete Example)
> The certificates, token, file discovery directory, and so on referenced below must be created or copied by hand; this is only an example of the main config file
```yaml
scrape_configs:
  # File-based service discovery
  - job_name: "node-exporter"
    file_sd_configs:
      - files:
          - "targets/node-exporter.yml"
        # Refresh interval for re-reading the files
        refresh_interval: 1m
    relabel_configs:
      # Strip the node-exporter port so the instance label is only the host address
      - source_labels: [__address__]
        action: replace
        regex: (.*):9100
        target_label: instance
        replacement: $1
  # Kubernetes-based service discovery
  - job_name: "Service/kube-apiserver"
    scheme: https
    tls_config:
      insecure_skip_verify: true
    # Create the token as described above
    bearer_token_file: /data/prometheus/token
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
- job_name: "Service/kube-controller-manager"
scheme: https
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
kubernetes_sd_configs:
- role: node
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
relabel_configs:
- source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
action: keep
regex: true
- source_labels: [__address__]
action: replace
regex: (.*):10250
target_label: __address__
replacement: $1:10257
- job_name: "Service/kube-scheduler"
scheme: https
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
kubernetes_sd_configs:
- role: node
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
relabel_configs:
- source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
action: keep
regex: true
- source_labels: [__address__]
action: replace
regex: (.*):10250
target_label: __address__
replacement: $1:10259
- job_name: "Service/kubelet"
scheme: https
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
kubernetes_sd_configs:
- role: node
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
- job_name: "Service/kube-proxy"
kubernetes_sd_configs:
- role: node
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
relabel_configs:
- source_labels: [__address__]
action: replace
regex: (.*):10250
target_label: __address__
replacement: $1:10249
- job_name: "Service/etcd"
scheme: https
tls_config:
ca_file: targets/certs/ca.pem
cert_file: targets/certs/etcd.pem
key_file: targets/certs/etcd-key.pem
insecure_skip_verify: true
file_sd_configs:
- files:
- targets/etcd.yml
- job_name: "Service/calico"
kubernetes_sd_configs:
- role: node
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
relabel_configs:
- source_labels: [__address__]
action: replace
regex: (.*):10250
target_label: __address__
replacement: $1:9091
- job_name: "Service/coredns"
kubernetes_sd_configs:
- role: endpoints
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: kube-system;kube-dns;metrics
- job_name: "Service/ingress-nginx"
kubernetes_sd_configs:
- role: endpoints
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: ingress-nginx;ingress-nginx-metrics;metrics
- job_name: "kube-state-metrics"
kubernetes_sd_configs:
- role: endpoints
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: kube-system;kube-state-metrics;http-metrics
- job_name: "service-http-probe"
scrape_interval: 1m
metrics_path: /probe
# 使用blackbox exporter配置文件的http_2xx的探針
params:
module: [ http_2xx ]
kubernetes_sd_configs:
- role: service
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
relabel_configs:
# 保留service注釋有prometheus.io/scrape: true和prometheus.io/http-probe: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_service_annotation_prometheus_io_http_probe]
action: keep
regex: true;true
# 將原標簽名__meta_kubernetes_service_name改成service_name
- source_labels: [__meta_kubernetes_service_name]
action: replace
regex: (.*)
target_label: service_name
# 將原標簽名__meta_kubernetes_namespace改成namespace
- source_labels: [__meta_kubernetes_namespace]
action: replace
regex: (.*)
target_label: namespace
# 將instance改成 `clusterIP:port` 地址
- source_labels: [__meta_kubernetes_service_cluster_ip, __meta_kubernetes_service_annotation_prometheus_io_http_probe_port, __meta_kubernetes_service_annotation_pretheus_io_http_probe_path]
action: replace
regex: (.*);(.*);(.*)
target_label: __param_target
replacement: $1:$2$3
- source_labels: [__param_target]
target_label: instance
# 將__address__的值改成 `blackbox-exporter:9115`
- target_label: __address__
replacement: blackbox-exporter:9115
- job_name: "service-tcp-probe"
scrape_interval: 1m
metrics_path: /probe
# 使用blackbox exporter配置文件的tcp_connect的探針
params:
module: [ tcp_connect ]
kubernetes_sd_configs:
- role: service
api_server: https://192.168.31.100:6443
tls_config:
insecure_skip_verify: true
bearer_token_file: /data/prometheus/token
relabel_configs:
# 保留prometheus.io/scrape: "true"和prometheus.io/tcp-probe: "true"的service
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe]
action: keep
regex: true;true
# 將原標簽名__meta_kubernetes_service_name改成service_name
- source_labels: [__meta_kubernetes_service_name]
action: replace
regex: (.*)
target_label: service_name
# 將原標簽名__meta_kubernetes_service_name改成service_name
- source_labels: [__meta_kubernetes_namespace]
action: replace
regex: (.*)
target_label: namespace
# 將instance改成 `clusterIP:port` 地址
- source_labels: [__meta_kubernetes_service_cluster_ip, __meta_kubernetes_service_annotation_prometheus_io_http_probe_port]
action: replace
regex: (.*);(.*)
target_label: __param_target
replacement: $1:$2
- source_labels: [__param_target]
target_label: instance
# 將__address__的值改成 `blackbox-exporter:9115`
- target_label: __address__
replacement: blackbox-exporter:9115
```
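For reference, a Service that the `service-http-probe` job above would discover could be annotated as follows; Prometheus normalizes dots and dashes in annotation names to underscores when building the `__meta_kubernetes_service_annotation_*` labels, and the name, port, and path here are illustrative:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-web        # illustrative service
  namespace: default
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/http-probe: "true"
    prometheus.io/http-probe-port: "80"
    prometheus.io/http-probe-path: "/healthz"
spec:
  selector:
    app: demo-web
  ports:
    - name: http
      port: 80
```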