**Components of Prometheus:**
The Prometheus ecosystem consists of multiple components, most of which are optional:
* The **Prometheus server**, which scrapes and stores time series data;
* Client libraries (**Client Library**) for instrumenting application code;
* A **Push Gateway** for supporting short-lived jobs;
* Special-purpose **exporters** that expose the metrics of monitored components over HTTP; ready-made exporters exist for services such as HAProxy, StatsD, MySQL, Nginx and Graphite;
* An **Alertmanager** for handling alerts;
* Various support tools.
**Overall Architecture of Prometheus**

The overall workflow of Prometheus:
1) The Prometheus server periodically scrapes metrics from the configured jobs or exporters, or receives metrics pushed from the Push Gateway.
2) The Prometheus server stores the collected metrics locally and aggregates them.
3) It evaluates the configured alert.rules, recording new time series or pushing alerts to the Alertmanager.
4) The Alertmanager processes the alerts it receives according to its configuration and sends notifications via email or other channels.
5) Graphing tools such as Grafana fetch the monitoring data and display it graphically.
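For step 1, a minimal standalone prometheus.yml looks roughly like the sketch below (the job name and target address are placeholders rather than part of the cluster setup described later):
global:
  scrape_interval: 15s          # how often to scrape targets
scrape_configs:
- job_name: 'node'              # an example job scraping a single node_exporter
  static_configs:
  - targets: ['localhost:9100']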
**Data Model**
Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions. Besides stored time series, Prometheus may generate temporary derived time series as the result of queries.
**Metric names and labels**: every time series is uniquely identified by its metric name and a set of key-value pairs, also known as labels. The **metric name** specifies the feature of the system being measured (for example, http_requests_total - the total number of HTTP requests received). It may contain ASCII letters and digits, as well as underscores and colons, and it must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*. **Labels** enable Prometheus's dimensional data model: for the same metric name, any given combination of labels identifies a particular dimensional instance of that metric. The query language allows filtering and aggregation based on these dimensions. Changing any label value, including adding or removing a label, creates a new time series. Label names may contain ASCII letters, digits, and underscores, and must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*. Label names beginning with __ are reserved for internal use.
**Samples**: the actual time series data; each sample consists of a float64 value and a millisecond-precision timestamp.
**Notation:** given a metric name and a set of labels, a time series is commonly identified using this notation:
<metric name>{<label name>=<label value>, ...}
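For example, a series of the metric api_http_requests_total with the labels method="POST" and handler="/messages" (names taken from the Prometheus documentation's example) would be written as:
api_http_requests_total{method="POST", handler="/messages"}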
**Metric Types**
The Prometheus client libraries provide four main metric types: Counter, Gauge, Histogram and Summary:
**Counter**: a cumulative metric whose value can only increase, or be reset to zero on restart.
**Gauge**: a single numerical value that can arbitrarily go up and down.
**Histogram**: samples observations (for example request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values.
**Summary**: similar to a Histogram, a Summary samples observations (typically request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.
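To make the types concrete, here are a few hedged PromQL queries over typical metrics of each kind (the metric names are conventional examples and may not exist in every setup):
rate(http_requests_total[5m])                                               # per-second rate derived from a Counter
node_memory_MemFree_bytes                                                   # a Gauge read directly
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))    # 95th percentile from Histogram buckets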

There are two ways to collect data: pull and push.
pull: an exporter is installed on the client; the exporter collects the data, Prometheus issues an HTTP GET to the exporter, and the exporter returns the data.
push: a Pushgateway is installed; a script of your own packages the data as key/value metrics and sends it to the Pushgateway, which Prometheus then scrapes.
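As an illustration of the push path, a single metric can be pushed to a Pushgateway with a plain HTTP request (the host name is a placeholder):
echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job
Prometheus then collects the pushed metric from the Pushgateway's /metrics endpoint like any other scrape target.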
**PromQL Examples**
(sum(increase(node_cpu{mode="idle"}[1m])) by (instance) / sum(increase(node_cpu[1m])) by (instance)) * 100
The metric being queried is node_cpu (named node_cpu_seconds_total in node_exporter 0.16 and later).
increase() computes the increase over a time range; [1m] means the increase over the last 1 minute.
{mode="idle"} restricts the query to the 1-minute increase of idle CPU time.
sum() adds the values together.
by(instance) splits the summed value back out by the given label; instance identifies the machine.
/ is division; PromQL supports the arithmetic operators + - * / % ^.
The expression above therefore gives the idle CPU percentage; CPU usage is 100 minus this value.
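If the usage percentage itself is wanted, the same pieces can be rearranged (same assumptions about the node_cpu metric as above):
(1 - sum(increase(node_cpu{mode="idle"}[1m])) by (instance) / sum(increase(node_cpu[1m])) by (instance)) * 100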
rate() computes the per-second average rate of increase over a time range; it is intended for Counter metrics.
{exported_instance=~"XXX"} filters with a regular-expression (fuzzy) match.
topk() returns the series with the highest values; generally used for ad-hoc queries in the console.
count() counts how many series match a condition, for example the total number of pods rather than which pods they are.
predict_linear() computes the rate of change of a series and extrapolates it to predict future values.
Only the more commonly used functions are listed above; the official documentation has many more.
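A few hedged examples of these functions; the metric names come from node_exporter and kube-state-metrics and may be named differently in your versions:
topk(3, sum(rate(node_cpu{mode!="idle"}[5m])) by (instance))     # the 3 busiest machines
count(kube_pod_info)                                             # total number of pods
predict_linear(node_filesystem_free[4h], 24 * 3600)              # free disk space predicted 24 hours ahead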
**Installing Prometheus**
Prometheus can be created as a container inside Kubernetes, or installed outside the cluster from a binary package.
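For completeness, the binary route is roughly the sketch below (the release version and download URL are assumptions; adjust to the version you actually use). The rest of this section uses the in-cluster approach.
# wget https://github.com/prometheus/prometheus/releases/download/v2.0.0/prometheus-2.0.0.linux-amd64.tar.gz
# tar xzf prometheus-2.0.0.linux-amd64.tar.gz
# cd prometheus-2.0.0.linux-amd64
# ./prometheus --config.file=prometheus.yml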
# vim prometheus.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
---
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  labels:
    name: prometheus-deployment
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - image: prom/prometheus:v2.0.0
        name: prometheus
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/prometheus"
          name: data
        - mountPath: "/etc/prometheus"
          name: config-volume
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 2500Mi
      serviceAccountName: prometheus
      imagePullSecrets:
      - name: regsecret
      volumes:
      - name: data
        emptyDir: {}
      - name: config-volume
        configMap:
          name: prometheus-config
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: monitoring
spec:
  type: ClusterIP
  ports:
  - port: 9090
    targetPort: 9090
  selector:
    app: prometheus
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus
  namespace: monitoring
spec:
  rules:
  - host: prometheus.pkbeta.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus
          servicePort: 9090
The YAML above creates a Prometheus instance: a Namespace, the RBAC objects (ClusterRole, ServiceAccount, ClusterRoleBinding), a Deployment, a Service and an Ingress.
**The most important part, the ConfigMap holding the Prometheus configuration file, is listed separately below:**
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s          # scrape targets every 15s
      evaluation_interval: 15s      # evaluate rules every 15s
    scrape_configs:
    - job_name: 'kubernetes-apiservers'   # name of this scrape job
      kubernetes_sd_configs:              # discover targets from the Kubernetes API by role
      - role: endpoints                   # the apiserver is scraped via its endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:                    # rewrite targets and their labels before scraping
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]   # which labels to match
        action: keep                      # keep only endpoints whose source labels match the regex
        regex: default;kubernetes;https
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-services'
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module: [http_2xx]
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-ingresses'
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme, __address__, __meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.example.com:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
The jobs above define what data is scraped:
kubernetes-apiservers collects performance metrics about the apiserver.
cadvisor collects performance metrics about containers.
And so on.
So far only the Prometheus server has been installed; collectors are still needed on the monitored side:
cAdvisor collects container performance metrics and is already built into the kubelet.
prometheus-node-exporter collects host-level metrics; it has to run on every node, so it is deployed as a DaemonSet.
kube-state-metrics collects metrics about Kubernetes resource objects and component health; it talks to the apiserver, so it runs as a regular Deployment rather than on every node (the manifest below uses a Deployment).
# vim node-exporter.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
spec:
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      containers:
      - image: prom/node-exporter:v0.16.0
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: http
        volumeMounts:
        - name: time
          mountPath: /etc/localtime
          readOnly: true
      volumes:
      - name: time
        hostPath:
          path: /etc/localtime
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/app-metrics: 'true'
    prometheus.io/app-metrics-path: '/metrics'
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 9100
    targetPort: 9100
    protocol: TCP
  selector:
    k8s-app: node-exporter
# vim kube-state-metrics.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: gcr.io/google_containers/kube-state-metrics:v0.5.0
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: monitoring
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-directory-size-metrics
  namespace: monitoring
  annotations:
    description: |
      This `DaemonSet` provides metrics in Prometheus format about disk usage on the nodes.
      The container `read-du` reads in sizes of all directories below /mnt and writes that to `/tmp/metrics`. It only reports directories larger than `100M` for now.
      The other container `caddy` just hands out the contents of that file on request via `http` on `/metrics` at port `9102` which are the defaults for Prometheus.
      These are scheduled on every node in the Kubernetes cluster.
      To choose directories from the node to check, just mount them on the `read-du` container below `/mnt`.
spec:
  template:
    metadata:
      labels:
        app: node-directory-size-metrics
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9102'
        description: |
          This `Pod` provides metrics in Prometheus format about disk usage on the node.
          The container `read-du` reads in sizes of all directories below /mnt and writes that to `/tmp/metrics`. It only reports directories larger than `100M` for now.
          The other container `caddy` just hands out the contents of that file on request on `/metrics` at port `9102` which are the defaults for Prometheus.
          This `Pod` is scheduled on every node in the Kubernetes cluster.
          To choose directories from the node to check just mount them on `read-du` below `/mnt`.
    spec:
      containers:
      - name: read-du
        image: giantswarm/tiny-tools
        imagePullPolicy: Always
        command:
        - fish
        - --command
        - |
          touch /tmp/metrics-temp
          while true
            for directory in (du --bytes --separate-dirs --threshold=100M /mnt)
              echo $directory | read size path
              echo "node_directory_size_bytes{path=\"$path\"} $size" \
                >> /tmp/metrics-temp
            end
            mv /tmp/metrics-temp /tmp/metrics
            sleep 300
          end
        volumeMounts:
        - name: host-fs-var
          mountPath: /mnt/var
          readOnly: true
        - name: metrics
          mountPath: /tmp
      - name: caddy
        image: dockermuenster/caddy:0.9.3
        command:
        - "caddy"
        - "-port=9102"
        - "-root=/var/www"
        ports:
        - containerPort: 9102
        volumeMounts:
        - name: metrics
          mountPath: /var/www
      volumes:
      - name: host-fs-var
        hostPath:
          path: /var
      - name: metrics
        emptyDir:
          medium: Memory
Now install Prometheus:
# kubectl create -f .
The Ingress configured earlier gives Prometheus the domain prometheus.pkbeta.com.
You can now open that domain in a browser to view the UI.

1. Enter a PromQL query to retrieve the data you need
2. The Execute button that runs the query
3. A drop-down listing the metrics that can be queried
4. The query result shown as text (Console)
5. The query result shown as a graph
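For example, entering the built-in query up and pressing Execute lists every scrape target with the value 1 (reachable) or 0 (down), which is a quick way to confirm that the jobs defined in the ConfigMap are actually being scraped.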
**Installing Grafana**
**Basic concepts**
1. Data source: Grafana is only a time-series visualization tool; the time series it displays are supplied by data sources.
2. Organization: Grafana supports multiple organizations, so a single instance can serve several organizations that do not trust one another.
3. User: a user can belong to one or more organizations, and the same user can be given different permission levels in different organizations.
4. Row: a row is a divider within a dashboard, used to group panels.
5. Panel: the panel is the basic display unit, and every panel provides a query editor.
6. Query editor: the query editor exposes the capabilities of the data source; different data sources have different query editors.
7. Dashboard: the dashboard is where all of these components are combined and finally displayed.
# vim grafana.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: grafana-core
  namespace: monitoring
  labels:
    app: grafana
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - image: grafana/grafana:4.2.0
        name: grafana
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GF_INSTALL_PLUGINS
          value: "alexanderzobnin-zabbix-app"
        - name: GF_AUTH_BASIC_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "false"
        readinessProbe:
          httpGet:
            path: /login
            port: 3000
        volumeMounts:
        - name: grafana-persistent-storage
          mountPath: /var
      volumes:
      - name: grafana-persistent-storage
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  type: ClusterIP
  ports:
  - port: 3000
  selector:
    app: grafana
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  rules:
  - host: grafana.pkbeta.com
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
# kubectl create -f grafana.yaml
The Ingress above gives Grafana the domain grafana.pkbeta.com. Open it in a browser to reach the Grafana login page; the default username/password is admin/admin.


There is no data source or dashboard yet.
First, add a data source.



Name: a name for the data source (anything you like)
Type: the data source type; choose prometheus
Url: ideally the in-cluster Prometheus service address plus port (see the example below)
Then click Add to save the data source.
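Based on the Service and Namespace created earlier (and assuming the default cluster.local cluster domain), the in-cluster address would typically be:
http://prometheus.monitoring.svc.cluster.local:9090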

Now that there is a data source, a dashboard is still needed.
You can import a template made by someone else, or build your own.



Click a panel's title to bring up its menu.

Change the data source from default to prometheus.

Edit the Query to the PromQL expression you want to graph.
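For example, a panel graphing per-pod CPU usage from cAdvisor might use a query like the one below (the pod_name label is what cAdvisor exposed at the time; newer versions use pod, so treat the label name as an assumption):
sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (pod_name)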

Alternatively, you can import a dashboard template made by someone else.


The first option uploads a template file.
Or enter the ID of a template to import.
Or paste the template's JSON directly.
Name: the name of the dashboard template.
Prometheus: select the data source.
