[TOC]
# Monitoring Kubernetes Core Components: Examples
## Master nodes
Because kube-controller-manager, kube-scheduler, and etcd expose their metrics only on 127.0.0.1, we use haproxy as a proxy to make those endpoints reachable over the network.
**Exposing the metrics endpoints via haproxy**
```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-proxy
  namespace: kube-system
data:
  # haproxy configuration file
  haproxy.cfg: |
    global
      log stdout local2 info
    defaults
      mode tcp
      log global
      option tcplog
      maxconn 100
      timeout connect 5s
      timeout client 30s
      timeout server 30s
    # expose haproxy's own metrics
    frontend metrics
      bind *:8405
      mode http
      http-request use-service prometheus-exporter if { path /metrics }
      no log
    listen etcd
      bind *:12381
      tcp-request connection reject if !{ src -f /usr/local/etc/haproxy/whitelist.lst }
      server server1 127.0.0.1:2381 check
    listen kube-controller-manager
      bind *:20257
      tcp-request connection reject if !{ src -f /usr/local/etc/haproxy/whitelist.lst }
      server server1 127.0.0.1:10257 check
    listen kube-scheduler
      bind *:20259
      tcp-request connection reject if !{ src -f /usr/local/etc/haproxy/whitelist.lst }
      server server1 127.0.0.1:10259 check
  # Whitelisted addresses: the nodes Prometheus runs on plus pod IPs.
  # (For convenience, the whole cluster and the pod CIDR range are allowed.)
  whitelist.lst: |
    192.168.32.127
    192.168.32.128
    192.168.32.129
    # pod CIDR range
    10.0.0.0/8
EOF
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metrics-proxy
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: metrics-proxy
  template:
    metadata:
      labels:
        app: metrics-proxy
    spec:
      containers:
      - name: metrics-proxy
        image: haproxy:2.8-alpine
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: conf
          mountPath: /usr/local/etc/haproxy
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/master: ""
      volumes:
      - name: conf
        configMap:
          name: metrics-proxy
EOF
```
**kube-apiserver**
```yaml
- job_name: "k8s/kube-apiserver"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https
```
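Prometheus joins the values of `source_labels` with `;` (the default separator) and keeps a target only when `regex` matches the joined string in full. The `keep` rule above can be sketched like this:

```shell
# Join namespace;endpoints_name;port_name the way Prometheus does,
# then test it against the anchored regex from the keep rule.
result=$(echo "default;kubernetes;https" \
  | grep -qxE 'default;kubernetes;https' && echo keep || echo drop)
echo "$result"
# -> keep
```

Any endpoint whose joined labels differ (another namespace, another port name) would be dropped by the same rule.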
**kube-controller-manager**
```yaml
- job_name: "k8s/kube-controller-manager"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
    action: keep
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:20257
```
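The second relabel rule above swaps the discovered kubelet port for the haproxy listener port. The rewrite Prometheus performs can be sketched with `sed` (the node IP is only an example value):

```shell
# rewrite <node>:10250 (kubelet) to <node>:20257 (haproxy -> kube-controller-manager)
echo "192.168.32.127:10250" | sed -E 's/(.*):10250/\1:20257/'
# -> 192.168.32.127:20257
```

The etcd and kube-scheduler jobs below use the same pattern with ports 12381 and 20259.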
> Note: master nodes must carry the node-role.kubernetes.io/master label.
**kube-scheduler**
```yaml
- job_name: "k8s/kube-scheduler"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
    action: keep
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:20259
```
> Note: master nodes must carry the node-role.kubernetes.io/master label.
**etcd**
```yaml
- job_name: "k8s/etcd"
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
    action: keep
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:12381
```
> Note: master nodes must carry the node-role.kubernetes.io/master label.
**metrics-proxy**
```yaml
- job_name: "k8s/metrics-proxy"
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
    action: keep
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:8405
```
## Worker nodes
**kubelet**
>[info] There are two ways to scrape the metrics; either one is fine.
```yaml
# scrape metrics directly from the kubelet address
- job_name: "k8s/kubelet"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
# scrape kubelet metrics through the apiserver proxy
- job_name: "k8s/kubelet"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    action: replace
    regex: (.+)
    replacement: /api/v1/nodes/$1/proxy/metrics
    target_label: __metrics_path__
```
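The second job rewrites `__metrics_path__` so that every node is scraped through the apiserver proxy. How the relabel rule builds the proxy path from the discovered node name can be sketched with `sed` (the node name is only an example value):

```shell
# build /api/v1/nodes/<node>/proxy/metrics from the discovered node name
echo "node-1" | sed -E 's#(.+)#/api/v1/nodes/\1/proxy/metrics#'
# -> /api/v1/nodes/node-1/proxy/metrics
```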
**containers**
```yaml
- job_name: "k8s/cadvisor"
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  # monitor the containers on every k8s node
  kubernetes_sd_configs:
  - role: node
  metrics_path: /metrics/cadvisor
  relabel_configs:
  - regex: __meta_kubernetes_node_label_(.+)
    action: labelmap
```
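The `labelmap` rule copies every Kubernetes node label onto the scraped series, naming each target label after the regex capture group. The name mapping can be sketched (the label name is only an example value):

```shell
# __meta_kubernetes_node_label_<name> becomes the series label <name>
echo "__meta_kubernetes_node_label_kubernetes_io_hostname" \
  | sed -E 's/__meta_kubernetes_node_label_(.+)/\1/'
# -> kubernetes_io_hostname
```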
**kube-proxy**
>[info] By default kube-proxy also binds its metrics endpoint to 127.0.0.1. Here we expose it on 0.0.0.0:10249.
```shell
# edit the kube-proxy configuration
$ kubectl -n kube-system edit cm kube-proxy
# change the metricsBindAddress parameter to the following
metricsBindAddress: "0.0.0.0:10249"
# restart kube-proxy
$ kubectl -n kube-system rollout restart ds/kube-proxy
daemonset.apps/kube-proxy restarted
```
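If you prefer not to edit the ConfigMap interactively, the value change itself can be scripted. A minimal sketch of the substitution, applied here to a sample config line (on kubeadm clusters the default value is usually an empty string):

```shell
# flip metricsBindAddress so kube-proxy listens on all interfaces
printf 'metricsBindAddress: ""\n' \
  | sed -E 's/metricsBindAddress: ".*"/metricsBindAddress: "0.0.0.0:10249"/'
# -> metricsBindAddress: "0.0.0.0:10249"
```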
```yaml
# kube-proxy serves metrics over plain http
- job_name: "k8s/kube-proxy"
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:10249
```
## Monitoring k8s ecosystem add-ons
**calico**
calico does not expose a metrics port by default; it has to be enabled explicitly.
```shell
$ kubectl -n kube-system edit ds calico-node
# 1. Enable the metrics endpoint: add the following under
#    spec.template.spec.containers.env of calico-node
- name: FELIX_PROMETHEUSMETRICSENABLED
  value: "True"
- name: FELIX_PROMETHEUSMETRICSPORT
  value: "9091"
# 2. Add the following under spec.template.spec.containers of calico-node
ports:
- containerPort: 9091
  name: http-metrics
  protocol: TCP
```
```yaml
- job_name: "k8s/calico"
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    action: replace
    regex: (.*):10250
    target_label: __address__
    replacement: $1:9091
```
**cilium**
If cilium was installed via helm, monitoring can be enabled with the following values:
```shell
# back up the current cilium install values
$ helm -n kube-system get values cilium > /tmp/cilium-values.yml
$ sed -ri '1d' /tmp/cilium-values.yml
# add the following settings to enable monitoring
$ vim /tmp/cilium-values.yml
operator:
  prometheus:
    enabled: true
prometheus:
  enabled: true
# apply the updated values
$ helm -n kube-system upgrade cilium -f /tmp/cilium-values.yml
Release "cilium" has been upgraded. Happy Helming!
```
```yaml
- job_name: "k8s/cilium"
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape, __meta_kubernetes_pod_container_name, __meta_kubernetes_pod_container_port_name]
    action: keep
    regex: true;cilium-.+;prometheus
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: (.+):(\d+);(\d+)
    replacement: ${1}:${3}
    target_label: __address__
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    regex: (.+)
    replacement: $1
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_container_name]
    target_label: pod_container_name
  - source_labels: [__meta_kubernetes_pod_controller_kind]
    target_label: pod_controller_kind
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod_name
  - source_labels: [__meta_kubernetes_pod_node_name]
    target_label: pod_node_name
```
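The `__address__` rewrite above joins the discovered pod address with the `prometheus.io/port` annotation value, then keeps the IP from the first part and the port from the second. Sketched with `sed` (the IP and ports are example values):

```shell
# "<podIP>:<discovered port>;<annotation port>" -> "<podIP>:<annotation port>"
echo "10.0.1.5:4240;9962" | sed -E 's/(.+):([0-9]+);([0-9]+)/\1:\3/'
# -> 10.0.1.5:9962
```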
**coredns**
```yaml
- job_name: "k8s/coredns"
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: kube-system;kube-dns;metrics
```
**ingress-nginx**
If ingress-nginx was installed via helm, monitoring can be enabled with the following values:
```shell
# back up the current ingress-nginx install values
$ helm -n kube-system get values ingress-nginx > /tmp/ingress-nginx-values.yml
$ sed -ri '1d' /tmp/ingress-nginx-values.yml
# add the following settings to enable monitoring
$ vim /tmp/ingress-nginx-values.yml
controller:
  metrics:
    enabled: true
    port: 10254
# apply the updated values
$ helm -n kube-system upgrade ingress-nginx -f /tmp/ingress-nginx-values.yml
Release "ingress-nginx" has been upgraded. Happy Helming!
# confirm the endpoints
$ kubectl -n kube-system get endpoints ingress-nginx-controller-metrics
NAME                               ENDPOINTS                                                        AGE
ingress-nginx-controller-metrics   192.168.32.127:10254,192.168.32.128:10254,192.168.32.129:10254   24d
```
```yaml
- job_name: "k8s/ingress-nginx"
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: kube-system;ingress-nginx-controller-metrics;metrics
```