Myluzh Blog

Deploying Prometheus, Grafana, and Alertmanager on K8S (via kube-prometheus)

Published: 2024-10-21  Author: myluzh  Category: Kubernetes


0x01 About kube-prometheus
kube-prometheus bundles Prometheus, Grafana, and Alertmanager into an out-of-the-box, Kubernetes-native monitoring stack. It simplifies deployment and maintenance while remaining extensible and well supported by the community.
Repository: https://github.com/prometheus-operator/kube-prometheus/

0x02 Downloading kube-prometheus
Pick the kube-prometheus release that matches your Kubernetes version; the compatibility matrix is in the README of the kube-prometheus repository. My cluster runs Kubernetes 1.20, which corresponds to release-0.8.
root@test-k8s-master:~# git clone --single-branch --branch release-0.8 https://github.com/prometheus-operator/kube-prometheus.git
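Before cloning, it helps to confirm the cluster's actual server version so you check out the right branch. A minimal sanity check (the `--short` flag was still available on kubectl of this vintage):

```shell
# Print client and server versions; the server minor version
# (here expected to be 1.20) determines the release branch to use.
kubectl version --short
```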

0x03 Mirroring Images to a Private Registry (Optional)

The images required for deployment are listed in kube-prometheus/manifests. You can pull them in advance, push them to a private registry, and then rewrite the image addresses in the manifests to avoid ImagePullBackOff errors.
root@test-k8s-master:~# cd kube-prometheus/manifests/
root@test-k8s-master:~/kube-prometheus/manifests# grep "image: " * -r
alertmanager-alertmanager.yaml:  image: quay.io/prometheus/alertmanager:v0.21.0
blackbox-exporter-deployment.yaml:        image: quay.io/prometheus/blackbox-exporter:v0.18.0
blackbox-exporter-deployment.yaml:        image: jimmidyson/configmap-reload:v0.5.0
blackbox-exporter-deployment.yaml:        image: quay.io/brancz/kube-rbac-proxy:v0.8.0
grafana-deployment.yaml:        image: grafana/grafana:7.5.4
kube-state-metrics-deployment.yaml:        image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
kube-state-metrics-deployment.yaml:        image: quay.io/brancz/kube-rbac-proxy:v0.8.0
kube-state-metrics-deployment.yaml:        image: quay.io/brancz/kube-rbac-proxy:v0.8.0
node-exporter-daemonset.yaml:        image: quay.io/prometheus/node-exporter:v1.1.2
node-exporter-daemonset.yaml:        image: quay.io/brancz/kube-rbac-proxy:v0.8.0
prometheus-adapter-deployment.yaml:        image: directxman12/k8s-prometheus-adapter:v0.8.4
prometheus-prometheus.yaml:  image: quay.io/prometheus/prometheus:v2.26.0
setup/prometheus-operator-deployment.yaml:        image: quay.io/prometheus-operator/prometheus-operator:v0.47.0
setup/prometheus-operator-deployment.yaml:        image: quay.io/brancz/kube-rbac-proxy:v0.8.0
On a machine that can reach the upstream registries, pull all of the images and push them to the private registry. Save the script below as mirror_images.sh, then run: chmod +x mirror_images.sh && ./mirror_images.sh
# Fill in the upstream image addresses to mirror
images=(
quay.io/prometheus/alertmanager:v0.21.0
quay.io/prometheus/blackbox-exporter:v0.18.0
jimmidyson/configmap-reload:v0.5.0
quay.io/brancz/kube-rbac-proxy:v0.8.0
grafana/grafana:7.5.4
k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.0.0
quay.io/prometheus/node-exporter:v1.1.2
directxman12/k8s-prometheus-adapter:v0.8.4
quay.io/prometheus/prometheus:v2.26.0
quay.io/prometheus-operator/prometheus-operator:v0.47.0
)

# Pull all images in parallel on a machine with access to the upstream registries
echo "${images[@]}" | xargs -n 1 -P 10 docker pull

# Push everything to the private registry
for image in "${images[@]}"; do
    # Build the new image reference (strip the upstream registry prefix)
    new_image="172.30.82.223:5443/kube-prometheus/${image#*/}"
    # Retag the image
    docker tag "$image" "$new_image"
    # Push the image to the private registry
    docker push "$new_image"
    echo "Pushed $new_image"
done
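After the script finishes, it is worth confirming that the images actually landed in the private registry. A quick check using the Docker Registry HTTP API v2 (assuming the registry at 172.30.82.223:5443 serves a self-signed certificate, hence -k; the repository path shown is one of those pushed above):

```shell
# List all repositories in the private registry
curl -k https://172.30.82.223:5443/v2/_catalog
# List the available tags for one mirrored image
curl -k https://172.30.82.223:5443/v2/kube-prometheus/prometheus/tags/list
```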
Then change the image addresses in the YAML files under the manifests directory to point at the private registry.
# In the kube-prometheus/manifests directory, batch-replace the image addresses with sed
root@test-k8s-master:~/kube-prometheus/manifests# sed -i 's#image: \([^/]*\)/#image: 172.30.82.223:5443/kube-prometheus/#g' *.yaml
# Verify the replacement succeeded
root@test-k8s-master:~/kube-prometheus/manifests# grep "image: " * -r
alertmanager-alertmanager.yaml:  image: 172.30.82.223:5443/kube-prometheus/prometheus/alertmanager:v0.21.0
...


0x04 Installation

First, enter the kube-prometheus/manifests/setup directory and install the operator:
root@test-k8s-master:~/kube-prometheus/manifests# cd setup
root@test-k8s-master:~/kube-prometheus/manifests/setup# kubectl create -f .
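The CRDs created in the setup step take a moment to register with the API server. The upstream README recommends waiting until they are queryable before applying the remaining manifests; a sketch of that wait loop:

```shell
# Block until the ServiceMonitor CRD is served by the API server
until kubectl get servicemonitors --all-namespaces; do
    echo "waiting for the ServiceMonitor CRD..."
    sleep 1
done
```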
Then go back to the kube-prometheus/manifests directory and deploy the rest:
root@test-k8s-master:~/kube-prometheus/manifests# kubectl create -f .
# Roughly 90+ "created" lines should follow; just make sure none of them report an error.
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
... 
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
The alertmanager-main, grafana, and prometheus-k8s Services can be changed to type NodePort for external access, or exposed behind a domain name instead.
root@test-k8s-master:~/kube-prometheus/manifests# kubectl edit svc alertmanager-main -n monitoring
...
  type: NodePort
...
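If you would rather not open an interactive editor for each Service, the same change can be made non-interactively with kubectl patch (shown here for all three Services; adjust the names to your environment):

```shell
# Switch the three Services to NodePort without opening an editor
for svc in alertmanager-main grafana prometheus-k8s; do
    kubectl -n monitoring patch svc "$svc" -p '{"spec": {"type": "NodePort"}}'
done
```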
Check the assigned NodePort ports, then open the corresponding pages in a browser:
root@test-k8s-master:~/kube-prometheus/manifests# kubectl get svc -n  monitoring | grep -E 'alertmanager-main|grafana|prometheus-k8s'
alertmanager-main       NodePort    10.43.108.198   <none>        9093:31790/TCP               53m
grafana                 NodePort    10.43.110.211   <none>        3000:30321/TCP               53m
prometheus-k8s          NodePort    10.43.159.36    <none>        9090:30435/TCP               53m
If Grafana, Prometheus, or Alertmanager is still unreachable after the NodePort change, the cause is the default NetworkPolicy resources shipped with the Prometheus Operator manifests; delete them to allow external access:
kubectl delete -f manifests/prometheus-networkPolicy.yaml
kubectl delete -f manifests/grafana-networkPolicy.yaml
kubectl delete -f manifests/alertmanager-networkPolicy.yaml

0x05 Testing
Check the Prometheus Service port:
root@test-k8s-master:~/kube-prometheus/manifests# kubectl get svc -n monitoring | grep prometheus
...
prometheus-k8s          ClusterIP   10.43.192.230   <none>        9090/TCP                     69s
...
Query the Prometheus server's own metrics endpoint to confirm it is healthy (lines beginning with # HELP describe a metric; lines beginning with # TYPE state its type):
root@test-k8s-master:~/kube-prometheus/manifests# curl 10.43.192.230:9090/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 5.1543e-05
go_gc_duration_seconds{quantile="0.25"} 0.000103596
go_gc_duration_seconds{quantile="0.5"} 0.000119353
go_gc_duration_seconds{quantile="0.75"} 0.000148745
go_gc_duration_seconds{quantile="1"} 0.000440788
go_gc_duration_seconds_sum 0.007106225
go_gc_duration_seconds_count 52
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 235
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.16.2"} 1
...
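The _sum and _count series of a summary make averages easy to derive: dividing go_gc_duration_seconds_sum by go_gc_duration_seconds_count from the output above gives the mean GC pause. A quick check with awk, using the values printed above:

```shell
# Mean GC pause = sum / count
awk 'BEGIN { printf "%.9f\n", 0.007106225 / 52 }'
# → 0.000136658
```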

0x06 About Prometheus Metric Types
Counter: monotonically increasing; use it for totals (e.g. number of HTTP requests).
Gauge: can go up or down; represents a current value (e.g. memory usage, concurrency).
Histogram: samples observations into configurable buckets, typically used for distributions (e.g. request latency).
Summary: exposes precomputed quantiles calculated on the client side, handy for quick percentile readings (e.g. real-time latency analysis).
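Beyond scraping /metrics directly, these metrics can be queried through Prometheus's HTTP query API. For instance, an instant query for a gauge (using the ClusterIP from the earlier output; substitute your own Service address):

```shell
# Ask Prometheus for the current value of a gauge via the HTTP API
curl 'http://10.43.192.230:9090/api/v1/query?query=go_goroutines'
```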

For example, here is a metric of type Summary. Note that prometheus_target_interval_length_seconds measures the actual interval between consecutive scrapes of a target, not a response time:
root@test-k8s-master:~/kube-prometheus/manifests# curl 10.43.192.230:9090/metrics | grep prometheus_target_interval_length_seconds -A5
# TYPE prometheus_target_interval_length_seconds summary
prometheus_target_interval_length_seconds{interval="15s",quantile="0.01"} 14.999101165  # 1st percentile: actual interval ≈ 14.9991 s
prometheus_target_interval_length_seconds{interval="15s",quantile="0.05"} 14.999191173  # 5th percentile: actual interval ≈ 14.9992 s
prometheus_target_interval_length_seconds{interval="15s",quantile="0.5"} 14.99998389  # 50th percentile (median): actual interval ≈ 14.9999 s
prometheus_target_interval_length_seconds{interval="15s",quantile="0.9"} 15.00071464  # 90th percentile: actual interval ≈ 15.0007 s
prometheus_target_interval_length_seconds{interval="15s",quantile="0.99"} 15.000961493  # 99th percentile: actual interval ≈ 15.0010 s
prometheus_target_interval_length_seconds_sum{interval="15s"} 1485.0014637670001  # sum of all observed intervals ≈ 1485.0015 s




References:
kube-prometheus deployment walkthrough: https://blog.csdn.net/slc09/article/details/132571091
Understanding Prometheus histograms in one article: https://www.cnblogs.com/ryanyangcs/p/11309373.html


Tags: k8s, k8s deployment, kube-prometheus, Prometheus, Grafana, Alertmanager
