Deploying Prometheus, Grafana, and Alertmanager on Kubernetes (via kube-prometheus)
0x01 About kube-prometheus
kube-prometheus bundles Prometheus, Grafana, and Alertmanager into an out-of-the-box, Kubernetes-native monitoring stack. It simplifies deployment and maintenance while remaining extensible and well supported by the community.
kube-prometheus repository: https://github.com/prometheus-operator/kube-prometheus/
0x02 Downloading kube-prometheus
Pick the kube-prometheus release that is compatible with your Kubernetes version. My cluster runs Kubernetes 1.20, which is supported by release-0.8; the full compatibility matrix is in the README of the kube-prometheus repository.
root@test-k8s-master:~# git clone --single-branch --branch release-0.8 https://github.com/prometheus-operator/kube-prometheus.git
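If you are unsure which release branches exist, you can list them straight from the remote without cloning; a quick check, assuming only that git is installed:
root@test-k8s-master:~# git ls-remote --heads https://github.com/prometheus-operator/kube-prometheus.git | grep release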
0x03 Mirroring the Images to a Private Registry (optional)
The images the deployment needs are listed in kube-prometheus/manifests. You can pull them ahead of time, push them to a private registry, and rewrite the image references in the manifests to avoid ImagePullBackOff errors.
root@test-k8s-master:~# cd kube-prometheus/manifests/
root@test-k8s-master:~/kube-prometheus/manifests# grep "image: " * -r
alertmanager-alertmanager.yaml: image: quay.io/prometheus/alertmanager:v0.21.0
blackbox-exporter-deployment.yaml: image: quay.io/prometheus/blackbox-exporter:v0.18.0
blackbox-exporter-deployment.yaml: image: jimmidyson/configmap-reload:v0.5.0
blackbox-exporter-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.8.0
grafana-deployment.yaml: image: grafana/grafana:7.5.4
kube-state-metrics-deployment.yaml: image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
kube-state-metrics-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.8.0
kube-state-metrics-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.8.0
node-exporter-daemonset.yaml: image: quay.io/prometheus/node-exporter:v1.1.2
node-exporter-daemonset.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.8.0
prometheus-adapter-deployment.yaml: image: directxman12/k8s-prometheus-adapter:v0.8.4
prometheus-prometheus.yaml: image: quay.io/prometheus/prometheus:v2.26.0
setup/prometheus-operator-deployment.yaml: image: quay.io/prometheus-operator/prometheus-operator:v0.47.0
setup/prometheus-operator-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.8.0
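Rather than copying this list by hand, you can extract the unique image references directly from the manifests; a small sketch run from the same directory:
root@test-k8s-master:~/kube-prometheus/manifests# grep -rh "image: " . | awk '{print $2}' | sort -u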
On a machine that can reach the upstream registries, pull all of the images and push them to the private registry. Save the script below as mirror_images.sh, then run: chmod +x mirror_images.sh && ./mirror_images.sh
#!/bin/bash
# Upstream image references to mirror
images=(
  quay.io/prometheus/alertmanager:v0.21.0
  quay.io/prometheus/blackbox-exporter:v0.18.0
  jimmidyson/configmap-reload:v0.5.0
  quay.io/brancz/kube-rbac-proxy:v0.8.0
  grafana/grafana:7.5.4
  k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
  quay.io/prometheus/node-exporter:v1.1.2
  directxman12/k8s-prometheus-adapter:v0.8.4
  quay.io/prometheus/prometheus:v2.26.0
  quay.io/prometheus-operator/prometheus-operator:v0.47.0
)

# Pull all images in parallel (run on a host with access to the upstream registries)
echo "${images[@]}" | xargs -n 1 -P 10 docker pull

# Re-tag each image for the private registry and push it
for image in "${images[@]}"; do
  # Keep everything after the first path component of the original reference
  new_image="172.30.82.223:5443/kube-prometheus/${image#*/}"
  docker tag "$image" "$new_image"
  docker push "$new_image"
  echo "Pushed $new_image"
done
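To confirm the pushes succeeded, you can query the registry's catalog endpoint. This assumes 172.30.82.223:5443 is a standard Docker Registry v2 served over TLS; add -k if it uses a self-signed certificate:
# List repositories in the private registry (Docker Registry v2 API)
curl -k https://172.30.82.223:5443/v2/_catalog
# List tags for one of the mirrored images
curl -k https://172.30.82.223:5443/v2/kube-prometheus/prometheus/alertmanager/tags/list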
Then point the image references in the manifests at the private registry. Note that the sed below only touches *.yaml in the current directory; setup/prometheus-operator-deployment.yaml also references images, so run the same command in the setup directory as well.
# Batch-replace the image registries with sed, from the kube-prometheus/manifests directory
root@test-k8s-master:~/kube-prometheus/manifests# sed -i 's#image: \([^/]*\)/#image: 172.30.82.223:5443/kube-prometheus/#g' *.yaml
# Verify the result
root@test-k8s-master:~/kube-prometheus/manifests# grep "image: " * -r
alertmanager-alertmanager.yaml: image: 172.30.82.223:5443/kube-prometheus/prometheus/alertmanager:v0.21.0
...
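A stricter check is to make sure no manifest still references a public registry; if the command below prints nothing, every image has been rewritten:
root@test-k8s-master:~/kube-prometheus/manifests# grep -r "image: " . | grep -v "172.30.82.223:5443"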
0x04 Installation
First, install the Prometheus Operator and its CRDs from the kube-prometheus/manifests/setup directory:
root@test-k8s-master:~/kube-prometheus/manifests# cd setup
root@test-k8s-master:~/kube-prometheus/manifests/setup# kubectl create -f .
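The manifests in the next step depend on the CRDs created here, and it can take the API server a moment to register them. The upstream README suggests waiting with a loop along these lines:
# Poll until the ServiceMonitor CRD is served by the API server
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done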
Then deploy the rest of the stack from the kube-prometheus/manifests directory:
root@test-k8s-master:~/kube-prometheus/manifests# kubectl create -f .
# Around 90 resources should be created; just check that none of them failed or errored.
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
...
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
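Before exposing anything, make sure all pods in the monitoring namespace reach the Running state (Ctrl-C to stop watching):
root@test-k8s-master:~/kube-prometheus/manifests# kubectl get pods -n monitoring -w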
The alertmanager-main, grafana, and prometheus-k8s Services can be changed to type NodePort for external access, or exposed behind a domain name instead.
root@test-k8s-master:~/kube-prometheus/manifests# kubectl edit svc alertmanager-main -n monitoring
...
type: NodePort
...
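If you prefer a non-interactive change (handy for scripting), kubectl patch achieves the same result; repeat for each of the three Services:
root@test-k8s-master:~/kube-prometheus/manifests# kubectl patch svc grafana -n monitoring -p '{"spec": {"type": "NodePort"}}'
root@test-k8s-master:~/kube-prometheus/manifests# kubectl patch svc prometheus-k8s -n monitoring -p '{"spec": {"type": "NodePort"}}'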
Look up the assigned NodePort ports, then open the corresponding web UIs:
root@test-k8s-master:~/kube-prometheus/manifests# kubectl get svc -n monitoring | grep -E 'alertmanager-main|grafana|prometheus-k8s'
alertmanager-main NodePort 10.43.108.198 <none> 9093:31790/TCP 53m
grafana NodePort 10.43.110.211 <none> 3000:30321/TCP 53m
prometheus-k8s NodePort 10.43.159.36 <none> 9090:30435/TCP 53m
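With the ports above, Grafana is reachable at http://<any-node-ip>:30321 (default login admin/admin; Grafana prompts for a new password on first login). A quick reachability check from the shell, substituting a real node IP:
curl -s http://<node-ip>:30321/api/health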
If the Services are set to NodePort but Grafana, Prometheus, or Alertmanager is still unreachable, the likely cause is that kube-prometheus ships default NetworkPolicy resources that restrict external traffic. Delete the corresponding resources to allow access from outside the cluster:
kubectl delete -f manifests/prometheus-networkPolicy.yaml
kubectl delete -f manifests/grafana-networkPolicy.yaml
kubectl delete -f manifests/alertmanager-networkPolicy.yaml
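Before deleting, you can check whether your release actually ships these NetworkPolicy resources (older releases such as release-0.8 may not include them):
kubectl get networkpolicy -n monitoring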
0x05 Testing
Check the prometheus-k8s Service and its port:
root@test-k8s-master:~/kube-prometheus/manifests# kubectl get svc -n monitoring | grep prometheus
...
prometheus-k8s ClusterIP 10.43.192.230 <none> 9090/TCP 69s
...
Fetch the Prometheus server's own metrics to confirm it is healthy (lines beginning with # HELP describe a metric; lines beginning with # TYPE state its type):
root@test-k8s-master:~/kube-prometheus/manifests# curl 10.43.192.230:9090/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 5.1543e-05
go_gc_duration_seconds{quantile="0.25"} 0.000103596
go_gc_duration_seconds{quantile="0.5"} 0.000119353
go_gc_duration_seconds{quantile="0.75"} 0.000148745
go_gc_duration_seconds{quantile="1"} 0.000440788
go_gc_duration_seconds_sum 0.007106225
go_gc_duration_seconds_count 52
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 235
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.16.2"} 1
...
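Besides scraping /metrics directly, Prometheus can be queried through its HTTP API. For example, the built-in up metric reports which scrape targets are currently healthy (1 = up, 0 = down):
root@test-k8s-master:~/kube-prometheus/manifests# curl -s '10.43.192.230:9090/api/v1/query?query=up'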
0x06 About Prometheus Metrics
Prometheus defines four metric types:
Counter: monotonically increasing; suited to running totals (e.g., the number of HTTP requests).
Gauge: can go up and down; represents a current value (e.g., memory usage, concurrency).
Histogram: samples observations into buckets, typically by value range (e.g., request latency).
Summary: exposes pre-computed quantiles of the observed samples, suited to quick percentile analysis.
For example, here is a Summary-type metric:
root@test-k8s-master:~/kube-prometheus/manifests# curl 10.43.192.230:9090/metrics | grep prometheus_target_interval_length_seconds -A5
# TYPE prometheus_target_interval_length_seconds summary
prometheus_target_interval_length_seconds{interval="15s",quantile="0.01"} 14.999101165 # 1% 分位数,响应时间约 14.9991 秒
prometheus_target_interval_length_seconds{interval="15s",quantile="0.05"} 14.999191173 # 5% 分位数,响应时间约 14.9992 秒
prometheus_target_interval_length_seconds{interval="15s",quantile="0.5"} 14.99998389 # 50% 分位数(中位数),响应时间约 14.9999 秒
prometheus_target_interval_length_seconds{interval="15s",quantile="0.9"} 15.00071464 # 90% 分位数,响应时间约 15.0007 秒
prometheus_target_interval_length_seconds{interval="15s",quantile="0.99"} 15.000961493 # 99% 分位数,响应时间约 15.0010 秒
prometheus_target_interval_length_seconds_sum{interval="15s"} 1485.0014637670001 # 在 15 秒间隔内的总响应时间约 1485.0015 秒
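For comparison, a Histogram metric exposes cumulative _bucket series (with an le, "less than or equal", label) plus _sum and _count, and quantiles are computed at query time rather than by the client. Prometheus's own HTTP request-duration histogram illustrates this; the PromQL shown in the comment is what you would run in the expression browser:
# Cumulative buckets of a Histogram metric
root@test-k8s-master:~/kube-prometheus/manifests# curl -s 10.43.192.230:9090/metrics | grep prometheus_http_request_duration_seconds_bucket | head -5
# 90th-percentile request latency over the last 5 minutes, computed at query time:
#   histogram_quantile(0.9, rate(prometheus_http_request_duration_seconds_bucket[5m]))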