[TOC]
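The kubelet checks disk usage against two filesystems, `imagefs` and `nodefs`:
- imagefs: the partition that holds the directory set by Docker's `data-root` (or the older `graph`) startup option. Default: `/var/lib/docker`
- nodefs: the partition that holds the directory set by the kubelet's `--root-dir` startup flag. Default: `/var/lib/kubelet`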
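To see which partition actually backs each directory, `df` can be pointed at the directory itself. The sketch below is only an illustration: `mount_point_of` is a hypothetical helper name, and it assumes mount points contain no spaces.

```shell
# Print the mount point of the filesystem that holds a directory.
# mount_point_of is a hypothetical helper, not a kubelet or Docker command.
mount_point_of() {
  # -P selects the POSIX output format (one line per filesystem);
  # column 6 is the mount point, assuming it contains no spaces.
  df -P "$1" | awk 'NR==2 { print $6 }'
}

# Usage with the default directories (adjust to your own data-root / --root-dir):
#   mount_point_of /var/lib/docker
#   mount_point_of /var/lib/kubelet
```

If both calls print the same mount point, the two directories share one partition.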
## Environment
Kubernetes version
```shell
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready master 85d v1.18.18
k8s-master02 Ready master 85d v1.18.18
k8s-node01 Ready <none> 85d v1.18.18
k8s-node02 Ready <none> 85d v1.18.18
k8s-node03 Ready <none> 85d v1.18.18
```
Node status
```shell
$ kubectl describe node k8s-master01
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 01 Dec 2021 11:39:29 +0800 Wed, 01 Dec 2021 11:39:29 +0800 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 01 Dec 2021 13:59:51 +0800 Wed, 01 Dec 2021 11:39:25 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 01 Dec 2021 13:59:51 +0800 Wed, 01 Dec 2021 11:39:25 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 01 Dec 2021 13:59:51 +0800 Wed, 01 Dec 2021 11:39:25 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 01 Dec 2021 13:59:51 +0800 Wed, 01 Dec 2021 11:39:25 +0800 KubeletReady kubelet is posting ready status
```
Docker data directory
```shell
$ docker info | grep "Docker Root Dir"
Docker Root Dir: /data/docker/data
```
kubelet data directory
```shell
$ ps -ef | grep kubelet
/data/k8s/bin/kubelet --alsologtostderr=true --logtostderr=false --v=4 --log-dir=/data/k8s/logs/kubelet --hostname-override=k8s-master01 --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --kubeconfig=/data/k8s/certs/kubelet.kubeconfig --bootstrap-kubeconfig=/data/k8s/certs/bootstrap.kubeconfig --config=/data/k8s/conf/kubelet-config.yaml --cert-dir=/data/k8s/certs/ --root-dir=/data/k8s/data/kubelet/ --pod-infra-container-image=ecloudedu/pause-amd64:3.0
```
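The `--root-dir` value can also be pulled out of that long command line instead of being read by eye. A minimal sketch, assuming the flag is written in the `--root-dir=VALUE` form shown above; `kubelet_root_dir` is a hypothetical helper name.

```shell
# Read a kubelet command line on stdin and print the --root-dir value.
# kubelet_root_dir is a hypothetical helper; it only handles the
# "--root-dir=VALUE" form, not "--root-dir VALUE".
kubelet_root_dir() {
  # Put each argument on its own line, then keep the part after the "=".
  tr ' ' '\n' | sed -n 's/^--root-dir=//p'
}

# Usage:
#   ps -ef | grep '[k]ubelet' | kubelet_root_dir
```

If nothing is printed, the flag is not set and the kubelet uses its default, `/var/lib/kubelet`.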
Partition usage
```shell
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 40G 8.8G 32G 23% /
/dev/sdb 40G 1.9G 39G 10% /data/docker/data
...
```
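## Verification plan
1. Verify nodefs exceeding its threshold
2. Verify imagefs exceeding its threshold
3. Verify imagefs and nodefs both exceeding their thresholds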
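Each scenario works by setting an `*.available` threshold just above the percentage that is currently free, so the threshold counts as met straight away. The arithmetic can be sketched as follows. `threshold_met` is a hypothetical helper, and `df`'s rounded `Use%` only approximates the byte counts the kubelet itself works with.

```shell
# Succeeds when the free percentage is below the eviction threshold.
# threshold_met is a hypothetical helper, not part of the kubelet.
#   $1: used percentage (df's Use% column without the % sign)
#   $2: threshold percentage (e.g. 78 for "nodefs.available: 78%")
threshold_met() {
  [ $((100 - $1)) -lt "$2" ]
}

# nodefs: 23% used leaves 77% free, below a 78% threshold.
if threshold_met 23 78; then echo "nodefs: threshold met"; fi
# imagefs: 10% used leaves 90% free, below a 91% threshold.
if threshold_met 10 91; then echo "imagefs: threshold met"; fi
# With a 10% threshold, 77% free is nowhere near the limit.
if ! threshold_met 23 10; then echo "nodefs: no pressure"; fi
```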
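### Verify nodefs exceeding its threshold
The partition (`/`) that holds the kubelet's `--root-dir` is 23% used, so about 77% is free. Now set the nodefs threshold to 78%; the node should exceed the nodefs threshold.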
```yaml
evictionHard:
  memory.available: 10%
  nodefs.available: 78%
  nodefs.inodesFree: 10%
  imagefs.available: 10%
  imagefs.inodesFree: 10%
```
Then check the node status. The event `Attempting to reclaim ephemeral-storage` means the kubelet is trying to reclaim disk space.
```shell
$ kubectl describe node k8s-master01
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 01 Dec 2021 14:18:56 +0800 Wed, 01 Dec 2021 14:18:56 +0800 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 01 Dec 2021 15:03:52 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Wed, 01 Dec 2021 15:03:52 +0800 Wed, 01 Dec 2021 14:56:13 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Wed, 01 Dec 2021 15:03:52 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 01 Dec 2021 15:03:52 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletReady kubelet is posting ready status
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 6m45s kubelet Starting kubelet.
Normal NodeAllocatableEnforced 6m45s kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 6m45s kubelet Node k8s-master01 status is now: NodeHasSufficientMemory
Normal NodeHasDiskPressure 6m45s kubelet Node k8s-master01 status is now: NodeHasDiskPressure
Normal NodeHasSufficientPID 6m45s kubelet Node k8s-master01 status is now: NodeHasSufficientPID
Warning EvictionThresholdMet 105s (x31 over 6m45s) kubelet Attempting to reclaim ephemeral-storage
```
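### Verify imagefs exceeding its threshold
The partition that holds the Docker storage directory (`/data/docker/data`) is 10% used, so about 90% is free. Now set the imagefs threshold to 91%; the node should exceed the imagefs threshold.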
```yaml
evictionHard:
  memory.available: 10%
  nodefs.available: 10%
  nodefs.inodesFree: 10%
  imagefs.available: 91%
  imagefs.inodesFree: 10%
```
Then check the node status. The event `Attempting to reclaim ephemeral-storage` means the kubelet is trying to reclaim disk space.
```shell
$ kubectl describe node k8s-master01
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 01 Dec 2021 14:18:56 +0800 Wed, 01 Dec 2021 14:18:56 +0800 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 01 Dec 2021 15:17:31 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Wed, 01 Dec 2021 15:17:31 +0800 Wed, 01 Dec 2021 14:56:13 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Wed, 01 Dec 2021 15:17:31 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 01 Dec 2021 15:17:31 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletReady kubelet is posting ready status
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeHasSufficientPID 18s kubelet Node k8s-master01 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 18s kubelet Updated Node Allocatable limit across pods
Warning EvictionThresholdMet 18s kubelet Attempting to reclaim ephemeral-storage
Normal NodeHasSufficientMemory 18s kubelet Node k8s-master01 status is now: NodeHasSufficientMemory
Normal NodeHasDiskPressure 18s kubelet Node k8s-master01 status is now: NodeHasDiskPressure
Normal Starting 18s kubelet Starting kubelet.
```
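### Verify imagefs and nodefs both exceeding their thresholds
Now set the imagefs threshold to 91% and the nodefs threshold to 78%; the node should exceed both the imagefs and the nodefs thresholds.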
```yaml
evictionHard:
  memory.available: 10%
  nodefs.available: 78%
  nodefs.inodesFree: 10%
  imagefs.available: 91%
  imagefs.inodesFree: 10%
```
Then check the node status. The event `Attempting to reclaim ephemeral-storage` means the kubelet is trying to reclaim disk space.
```shell
$ kubectl describe node k8s-master01
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 01 Dec 2021 14:18:56 +0800 Wed, 01 Dec 2021 14:18:56 +0800 CalicoIsUp Calico is running on this node
MemoryPressure False Wed, 01 Dec 2021 15:23:03 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Wed, 01 Dec 2021 15:23:03 +0800 Wed, 01 Dec 2021 15:23:03 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Wed, 01 Dec 2021 15:23:03 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 01 Dec 2021 15:23:03 +0800 Wed, 01 Dec 2021 14:14:34 +0800 KubeletReady kubelet is posting ready status
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 2m9s kubelet Starting kubelet.
Normal NodeHasSufficientPID 2m9s kubelet Node k8s-master01 status is now: NodeHasSufficientPID
Normal NodeAllocatableEnforced 2m9s kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 2m9s kubelet Node k8s-master01 status is now: NodeHasSufficientMemory
Normal NodeHasDiskPressure 2m7s (x2 over 2m9s) kubelet Node k8s-master01 status is now: NodeHasDiskPressure
Warning EvictionThresholdMet 8s (x13 over 2m9s) kubelet Attempting to reclaim ephemeral-storage
```
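In all three scenarios the signal that matters is the `DiskPressure` condition. Rather than scanning the full `describe` output, the condition can be read directly with a JSONPath query. A minimal sketch, where `disk_pressure` is a hypothetical wrapper name:

```shell
# Print the DiskPressure condition status ("True" or "False") of a node.
# disk_pressure is a hypothetical wrapper around kubectl's JSONPath output.
disk_pressure() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'
}

# Usage:
#   disk_pressure k8s-master01
```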
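## Summary
1. nodefs is the partition that holds the kubelet's `--root-dir` directory; imagefs is the partition that holds the Docker data directory (`data-root`).
2. Sharing one partition between nodefs and imagefs is recommended, but that partition should be sized generously.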
3. When nodefs and imagefs share one partition, the other kubelet parameters `--root-dir`, `--cert-dir`