K8s 部署 OpenTelemetry v0.41.0(适用于 Kubernetes v1.20 to v1.22)
myluzh 发布于 阅读:31 Kubernetes
0x00 前言
OpenTelemetry Operator:v0.41.0(api-versions:opentelemetry.io/v1alpha1)
Kubernetes:v1.20 to v1.22
Cert-Manager:1.6.1
业务 Pod 的遥测数据流向为:JavaAgent(自动注入) -> Sidecar Collector -> Center Collector -> 观测后端 (Loki/Jaeger)。
0x01 安装 opentelemetry operator
# 安装cert-manager
root@iZbp12bkuvg20e1j3y9gtxZ:~/k8s-yaml/opentelemetry# wget -O cert-manager-v1.6.1.yaml https://github.com/cert-manager/cert-manager/releases/download/v1.6.1/cert-manager.yaml
root@iZbp12bkuvg20e1j3y9gtxZ:~/k8s-yaml/opentelemetry# kubectl apply -f cert-manager-v1.6.1.yaml
root@iZbp12bkuvg20e1j3y9gtxZ:~/k8s-yaml/opentelemetry# kubectl get pod -n cert-manager
NAME READY STATUS RESTARTS AGE
cert-manager-55658cdf68-4crgm 1/1 Running 0 24s
cert-manager-cainjector-967788869-dwlb2 1/1 Running 0 24s
cert-manager-webhook-7b86bc6578-l6xr4 1/1 Running 0 24s
# 安装opentelemetry-operator
root@iZbp12bkuvg20e1j3y9gtxZ:~/k8s-yaml/opentelemetry# wget -O opentelemetry-operator-v0.41.0.yaml https://github.com/open-telemetry/opentelemetry-operator/releases/download/v0.41.0/opentelemetry-operator.yaml
root@iZbp12bkuvg20e1j3y9gtxZ:~/k8s-yaml/opentelemetry# kubectl apply -f opentelemetry-operator-v0.41.0.yaml
# 创建命名空间
root@iZbp12bkuvg20e1j3y9gtxZ:~# kubectl create ns opentelemetry
0x02 部署Collector
1、部署center
在 loki3.0 之前的版本,使用lokiexporter导出,参考文档:https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/lokiexporter
但是 3.0 后的版本已弃用 lokiExporter,需要使用 otlphttp 导出,参考文档:https://grafana.com/docs/loki/latest/send-data/otel/
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
name: center
namespace: opentelemetry
spec:
mode: deployment
image: registry.sxhlcloud.com:5443/base/otel/opentelemetry-collector-contrib:0.44.0 # contrib 镜像支持的更多
config: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317 # 启用 gRPC
http:
endpoint: 0.0.0.0:4318 # 启动 HTTP
processors: # 用于处理收集到的数据
batch: {} # 分批发送提高效率
exporters:
# 1. 调试用 (生产环境可设为 info 或关掉)
logging:
loglevel: debug
# 2. 链路 -> Jaeger
otlp/jaeger:
endpoint: "jaeger-collector.opentelemetry.svc.cluster.local:4317"
tls:
insecure: true # 跳过证书验证
# 3. 日志 -> Loki
#loki:
# endpoint: "http://loki-gateway.loki.svc.cluster.local/loki/api/v1/push"
otlphttp/loki:
endpoint: "http://loki-gateway.loki.svc.cluster.local/otlp/v1/logs"
tls:
insecure: true
# 4. 指标 -> Prometheus (暴露端点等待拉取),需要在 Center Collector 的 Service (k8s service) 中暴露 8889 端口,否则 Prometheus Server 抓不到数据。
prometheus:
endpoint: "0.0.0.0:8889"
service:
pipelines:
# 链路:发给 Jaeger
#traces:
# receivers: [otlp]
# processors: [batch]
# exporters: [logging, otlp/jaeger]
# 指标:暴露给 Prometheus
#metrics:
# receivers: [otlp]
# processors: [batch]
# exporters: [logging, prometheus]
# 日志:发给 Loki
logs:
receivers: [otlp]
processors: [batch] # 批处理.提高效率
exporters: [logging, otlphttp/loki]
2、部署sidecar
这个版本的OpenTelemetry Operator,Sidecar 模式的 Collector 必须与其监控的应用程序部署在同一命名空间中。
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
name: sidecar
namespace: xfsh # 业务应用所在的命名空间
spec:
mode: sidecar
image: registry.sxhlcloud.com:5443/base/otel/opentelemetry-collector-contrib:0.44.0
config: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 1s
send_batch_size: 100
exporters:
otlp:
# 指向 Center Collector
endpoint: "http://center-collector.opentelemetry.svc.cluster.local:4317"
tls:
insecure: true
service:
pipelines:
# 1. 链路
#traces:
# receivers: [otlp]
# processors: [batch]
# exporters: [otlp]
# 2. 日志
logs:
receivers: [otlp]
processors: [batch]
exporters: [otlp]
# 3. 指标
#metrics:
# receivers: [otlp]
# processors: [batch]
# exporters: [otlp]
0x03 自动埋点
1、编写 Instrumentation
Instrumentation也需要与其监控的应用程序部署在同一命名空间中。
由于 Operator v0.41.0 对协议参数支持有限,在Instrumentation不能显示指定通过grpc传输,我一开始尝试使用gRPC 4317,agent还是通过http传输,然后报错。为了避免错误,所以使用http4318,把数据给到sidecar。
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
name: xfsh-instrumentation
namespace: xfsh # 业务应用所在的命名空间
spec:
exporter:
endpoint: http://localhost:4318
propagators:
- tracecontext
- baggage
- b3
sampler:
type: parentbased_traceidratio
argument: "1"
java:
image: registry.sxhlcloud.com:5443/base/otel/autoinstrumentation-java:2.23.0
2、设置应用自动埋点
1、添加注解
可以直接在需要的deployment或者整个ns 添加注解sidecar.opentelemetry.io/inject: "true"
metadata:
annotations:
sidecar.opentelemetry.io/inject: true # 注入Sidecar容器
instrumentation.opentelemetry.io/inject-java: xfsh-instrumentation # 上面创建 instrumentation 的名称,向业务容器注入定制化的Agent。
2、重新部署
完成后,把旧的pod删掉,新起来的pod就会自带sidecar,自动注入agent。