Prometheus Blackbox (Black-Box Monitoring)
Author: myluzh  Category: Kubernetes
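0x00 Preface
White-box monitoring: observes the system from the inside, using metrics exposed by the application itself (for example, a Prometheus /metrics endpoint), and gives deep technical insight into performance.
Black-box monitoring: collects data from the outside through probes such as HTTP requests and TCP checks. It focuses on externally visible behavior and functionality and usually needs no access to the application's internals.
A kube-prometheus deployment ships with blackbox-exporter by default. If your cluster does not have it, deploy it manually as described below.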
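0x01 Deploying blackbox_exporter manually
GitHub: https://github.com/prometheus/blackbox_exporter
Instead of writing the blackbox.yml in the ConfigMap below by hand, you can download the upstream example and import it as a ConfigMap:
# curl -o blackbox.yml https://raw.githubusercontent.com/prometheus/blackbox_exporter/master/blackbox.yml
# kubectl create configmap blackbox-exporter-config --from-file=blackbox.yml --namespace=monitoring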
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
  namespace: monitoring
  labels:
    app: blackbox-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      containers:
        - name: blackbox-exporter
          image: quay.io/prometheus/blackbox-exporter:latest
          # The image defaults to /etc/blackbox_exporter/config.yml, which the
          # ConfigMap mount below hides, so point at the mounted blackbox.yml.
          args:
            - --config.file=/etc/blackbox_exporter/blackbox.yml
          ports:
            - containerPort: 9115
          # livenessProbe:
          #   httpGet:
          #     path: /-/healthy
          #     port: 9115
          #   initialDelaySeconds: 30
          #   periodSeconds: 10
          # readinessProbe:
          #   httpGet:
          #     path: /-/ready
          #     port: 9115
          #   initialDelaySeconds: 30
          #   periodSeconds: 10
          volumeMounts:
            - name: config-volume
              mountPath: /etc/blackbox_exporter
      volumes:
        - name: config-volume
          configMap:
            name: blackbox-exporter-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: blackbox-exporter-config
  namespace: monitoring
data:
  blackbox.yml: |
    modules:
      http_2xx:
        prober: http
        http:
          preferred_ip_protocol: "ip4"
      http_post_2xx:
        prober: http
        http:
          method: POST
      tcp_connect:
        prober: tcp
      pop3s_banner:
        prober: tcp
        tcp:
          query_response:
            - expect: "^+OK"
          tls: true
          tls_config:
            insecure_skip_verify: false
      grpc:
        prober: grpc
        grpc:
          tls: true
          preferred_ip_protocol: "ip4"
      grpc_plain:
        prober: grpc
        grpc:
          tls: false
          service: "service1"
      ssh_banner:
        prober: tcp
        tcp:
          query_response:
            - expect: "^SSH-2.0-"
            - send: "SSH-2.0-blackbox-ssh-check"
      ssh_banner_extract:
        prober: tcp
        timeout: 5s
        tcp:
          query_response:
            - expect: "^SSH-2.0-([^ -]+)(?: (.*))?$"
              labels:
                - name: ssh_version
                  value: "${1}"
                - name: ssh_comments
                  value: "${2}"
      irc_banner:
        prober: tcp
        tcp:
          query_response:
            - send: "NICK prober"
            - send: "USER prober prober prober :prober"
            - expect: "PING :([^ ]+)"
              send: "PONG ${1}"
            - expect: "^:[^ ]+ 001"
      icmp:
        prober: icmp
      icmp_ttl5:
        prober: icmp
        timeout: 5s
        icmp:
          ttl: 5
---
apiVersion: v1
kind: Service
metadata:
  name: blackbox-exporter
  namespace: monitoring
  labels:
    app: blackbox-exporter
spec:
  ports:
    - port: 9115
      targetPort: 9115
      protocol: TCP
  selector:
    app: blackbox-exporter
  type: ClusterIP
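To make the ssh_banner_extract module in the ConfigMap easier to read, here is a small Python sketch of what its expect pattern captures. The banner strings are made-up examples, and Python's re engine is not the Go regexp engine that blackbox_exporter uses, so treat this as an illustration of the capture groups only. Returning an empty string for a missing second group is an assumption of this sketch.

```python
import re

# Pattern copied from the ssh_banner_extract module in the ConfigMap.
SSH_BANNER = re.compile(r"^SSH-2.0-([^ -]+)(?: (.*))?$")


def extract_labels(banner: str) -> dict:
    """Return the labels the pattern would yield for one banner line."""
    match = SSH_BANNER.match(banner)
    if match is None:
        # A banner that does not match means the expect step is not satisfied.
        return {}
    return {
        "ssh_version": match.group(1),
        # Group 2 is optional; this sketch uses "" when it is absent.
        "ssh_comments": match.group(2) or "",
    }


if __name__ == "__main__":
    # Hypothetical banners, for illustration only.
    print(extract_labels("SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6"))
    print(extract_labels("SSH-2.0-OpenSSH_9.6"))
```

The first group stops at the first space or hyphen, so everything after the space ends up in ssh_comments.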
Test whether Blackbox Exporter can successfully probe a given target URL:
root@k8s-master:~/kube-prometheus/manifests# kubectl get svc -n monitoring | grep black
blackbox-exporter ClusterIP 10.43.9.81 <none> 9115/TCP,19115/TCP 6d1h
root@k8s-master:~/kube-prometheus/manifests# curl 'http://10.43.9.81:19115/probe?target=http://www.baidu.com&module=http_2xx'
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
...
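The same check can be scripted. The sketch below builds the /probe URL with both query parameters properly encoded (the target is itself a URL, and an unquoted & would be interpreted by the shell) and reads probe_success out of the text the exporter returns. The function names are my own, and the sample text is the output shown above, not a live response.

```python
from urllib.parse import urlencode


def probe_url(exporter: str, target: str, module: str) -> str:
    """Build a /probe URL with target and module safely URL-encoded."""
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter}/probe?{query}"


def probe_succeeded(metrics_text: str) -> bool:
    """Return True if the exposition text contains 'probe_success 1'."""
    for line in metrics_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        if name == "probe_success":
            return float(value) == 1.0
    return False  # metric missing: treat as failure


SAMPLE = """# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
"""

if __name__ == "__main__":
    print(probe_url("10.43.9.81:19115", "http://www.baidu.com", "http_2xx"))
    print(probe_succeeded(SAMPLE))
```

To run it against a real exporter, fetch the URL with urllib.request.urlopen and pass the decoded body to probe_succeeded.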
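0x02 Creating the scrape configuration (Additional)
1. Write prometheus-additional.yaml. Remember to change the blackbox exporter address below: in my cluster the exporter serves HTTP on port 19115, while other setups use 9115.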
root@k8s-master:~/prom# vi prometheus-additional.yaml
- job_name: 'blackbox'
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - https://www.baidu.com
        - http://dxp.test.sxhlcloud.com/
        - http://dxp.test.sxhlcloud.com/prod-api/code
        - http://dxp.sxhlcloud.com/
        - http://dxp.sxhlcloud.com/prod-api/code
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter:19115
    #- source_labels: [instance]
    #  regex: 'https?://([^/]+)/.*'
    #  target_label: target
    #  replacement: '$1'
    - source_labels: [instance]
      target_label: target
- job_name: 'blackbox_exporter'
  static_configs:
    - targets: ['blackbox-exporter:19115']
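The relabel_configs in the blackbox job are the least obvious part of this file, so here is a rough Python walk-through of what they do to one target. It is a simplified model of the four active rules, not Prometheus's real relabeling engine, and the relabel function is my own name.

```python
def relabel(target: str, exporter: str = "blackbox-exporter:19115") -> dict:
    """Apply the four active relabel rules of the 'blackbox' job to one target."""
    labels = {"__address__": target}

    # Rule 1: copy the probed URL into the ?target= query parameter.
    labels["__param_target"] = labels["__address__"]

    # Rule 2: keep the probed URL as the instance label shown in queries.
    labels["instance"] = labels["__param_target"]

    # Rule 3: scrape the exporter itself instead of the probed URL.
    labels["__address__"] = exporter

    # Rule 4: mirror instance into a custom 'target' label.
    labels["target"] = labels["instance"]
    return labels


if __name__ == "__main__":
    for key, value in relabel("https://www.baidu.com").items():
        print(f"{key} = {value}")
```

The net effect is that Prometheus calls the exporter at /probe with target=https://www.baidu.com and module=http_2xx, while the stored series still carry the probed URL in instance and target.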
2. Create a Secret from the file you just wrote
root@k8s-master:~/prom# kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml --dry-run=client -o yaml > additional-scrape-configs.yaml
root@k8s-master:~/prom# kubectl create -f additional-scrape-configs.yaml -n monitoring
secret/additional-scrape-configs created
root@k8s-master:~/prom# kubectl get secret -n monitoring | grep additional
additional-scrape-configs Opaque 1 4m12s
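For reference, the generated Secret stores the file under a data key equal to the file name, with the content base64-encoded. The Python sketch below builds roughly that structure. It is an approximation for illustration (the real kubectl output may contain additional metadata fields), and secret_manifest is a hypothetical helper name.

```python
import base64


def secret_manifest(name: str, filename: str, content: str) -> dict:
    """Build a dict resembling the Secret created from a single file."""
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        # The data key is the file name; the value is base64 of the file.
        "data": {filename: encoded},
    }


if __name__ == "__main__":
    manifest = secret_manifest(
        "additional-scrape-configs",
        "prometheus-additional.yaml",
        "- job_name: 'blackbox'\n",
    )
    print(manifest["data"])
```

The key name matters because the Prometheus resource refers to the Secret by name and to the file by key.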
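3. Edit the Prometheus resource named k8s and add an additionalScrapeConfigs field that references the prometheus-additional.yaml key in the additional-scrape-configs Secret.
On success, kubectl prints: prometheus.monitoring.coreos.com/k8s edited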
root@k8s-master:~/kube-prometheus/manifests# kubectl get Prometheus -n monitoring
NAME VERSION REPLICAS AGE
k8s 2.26.0 2 6d18h
root@k8s-master:~/kube-prometheus/manifests# kubectl edit Prometheus k8s -n monitoring
spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-scrape-configs
4. The new targets should now appear under Prometheus -> Status -> Targets. If they do not, restart Prometheus and the operator, or reload Prometheus.
kubectl rollout restart deployment prometheus-operator -n monitoring
kubectl rollout restart statefulset prometheus-k8s -n monitoring
# If the new configuration is still not picked up, trigger a reload manually by sending a POST to Prometheus's /-/reload endpoint. This endpoint only works when Prometheus runs with --web.enable-lifecycle.
curl -X POST http://172.30.233.87:30926/-/reload
5. You can also query the probe_success metric in Prometheus with curl. All metrics produced by black-box probes start with probe_.
root@k8s-master:~/prom# curl -G 'http://10.43.162.41:9090/api/v1/query' --data-urlencode 'query=probe_success'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"probe_success","instance":"http://prometheus.io","job":"blackbox"},"value":[1730171008.843,"1"]},{"metric":{"__name__":"probe_success","instance":"http://secret.cn","job":"blackbox"},"value":[1730171008.843,"1"]},{"metric":{"__name__":"probe_success","instance":"https://prometheus.io","job":"blackbox"},"value":[1730171008.843,"1"]},{"metric":{"__name__":"probe_success","instance":"https://www.itho.cn","job":"blackbox"},"value":[1730171008.843,"0"]}]}}
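A response like this is easy to post-process. The following sketch lists the instances whose probe_success is 0. The RESPONSE string is an abbreviated copy of the output above (two of the four series), and failed_targets is my own helper name.

```python
import json

# Abbreviated copy of the query response shown above.
RESPONSE = """
{"status":"success","data":{"resultType":"vector","result":[
 {"metric":{"__name__":"probe_success","instance":"https://prometheus.io","job":"blackbox"},
  "value":[1730171008.843,"1"]},
 {"metric":{"__name__":"probe_success","instance":"https://www.itho.cn","job":"blackbox"},
  "value":[1730171008.843,"0"]}
]}}
"""


def failed_targets(body: str) -> list:
    """Return the instance label of every series whose sample value is 0."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("query did not succeed")
    failed = []
    for series in payload["data"]["result"]:
        # Each instant-vector sample is [timestamp, "value-as-string"].
        _timestamp, value = series["value"]
        if float(value) == 0.0:
            failed.append(series["metric"]["instance"])
    return failed


if __name__ == "__main__":
    print(failed_targets(RESPONSE))
```

Note that sample values arrive as strings in the JSON, so they need converting before comparison.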
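6. Common probe_ metrics
probe_success: whether the probe succeeded. 1 means success, 0 means failure.
probe_duration_seconds: how long the probe took, in seconds. Useful for measuring response time.
probe_http_status_code: the HTTP status code returned by the target. It indicates the health of the service (for example, 200 is success, 404 is not found, 500 is a server error).
probe_ssl_earliest_cert_expiry: the expiry time, as a Unix timestamp, of the certificate in the target's chain that expires first.
probe_ssl_last_chain_expiry_timestamp_seconds: the expiry time, as a Unix timestamp, of the last valid certificate chain, that is, when the target's TLS setup stops being valid.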
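The certificate metrics report expiry as a Unix timestamp in seconds, so a typical use is converting it into "days remaining". Below is a minimal Python sketch of that arithmetic. The 30-day threshold and the function names are assumptions for illustration. In PromQL the equivalent expression would be along the lines of (probe_ssl_earliest_cert_expiry - time()) / 86400.

```python
SECONDS_PER_DAY = 86400


def days_until_expiry(expiry_ts: float, now_ts: float) -> float:
    """Days between now and the certificate expiry (negative if expired)."""
    return (expiry_ts - now_ts) / SECONDS_PER_DAY


def needs_renewal(expiry_ts: float, now_ts: float, threshold_days: float = 30) -> bool:
    """True when the certificate expires within threshold_days (assumed default: 30)."""
    return days_until_expiry(expiry_ts, now_ts) < threshold_days


if __name__ == "__main__":
    import time

    now = time.time()
    # Hypothetical expiry timestamp 10 days from now.
    expiry = now + 10 * SECONDS_PER_DAY
    print(round(days_until_expiry(expiry, now), 1), needs_renewal(expiry, now))
```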
0x03 Adding a Blackbox dashboard to Grafana
https://grafana.com/grafana/dashboards/14928-prometheus-blackbox-exporter/