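Deploy Doris Cluster Monitoring
This document describes how to monitor a Doris cluster deployed by Doris Operator and how to query its logs centrally.
Configure DorisMonitor
Doris Operator collects Doris cluster metrics with Prometheus, collects Doris cluster logs with Loki, and provides a unified visual interface in Grafana.
When you create a new Doris cluster with Doris Operator, you can create and configure an independent monitoring system for each Doris cluster. It runs in the same Namespace as the Doris cluster and consists of four components: Prometheus, Grafana, Loki, and Promtail.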
The DorisMonitor CR defines the configuration of the Doris monitoring and visualization components:
A basic DorisMonitor CR sample
# IT IS NOT SUITABLE FOR PRODUCTION USE.
# This YAML describes a basic set of Doris monitor components with minimum resource requirements,
# which should be able to run in any Kubernetes cluster with storage support.
apiVersion: al-assad.github.io/v1beta1
kind: DorisMonitor
metadata:
  name: basic-monitor
spec:
  # The doris cluster name to be monitored
  cluster: basic
  prometheus:
    image: prom/prometheus:v2.37.8
    # The retention time of the prometheus data in the storage
    retentionTime: 15d
    # The storage size of prometheus persistent data at pvc.
    # It is recommended to be greater than 50Gi in the production env.
    requests:
      storage: 5Gi
  grafana:
    image: grafana/grafana:9.5.2
    # The default admin user and password of grafana (optional)
    adminUser: admin
    adminPassword: admin
    # The storage size of grafana persistent data at pvc.
    # It is recommended to be 10Gi in the production env.
    requests:
      storage: 1Gi
  loki:
    image: grafana/loki:2.9.1
    # The retention time of the loki data in the storage
    retentionTime: 15d
    # The storage size of loki persistent data at pvc.
    # It is recommended to be greater than 50Gi in the production env.
    requests:
      storage: 5Gi
  promtail:
    image: grafana/promtail:2.9.1
An advanced DorisMonitor CR sample
apiVersion: al-assad.github.io/v1beta1
kind: DorisMonitor
metadata:
  name: basic-monitor
spec:
  ## The doris cluster name to be monitored
  cluster: basic
  ## ImagePullPolicy of Doris monitor Pods
  ## Ref: https://kubernetes.io/docs/concepts/configuration/overview/#container-images
  # imagePullPolicy: IfNotPresent
  ## Ref: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
  # imagePullSecrets:
  # - name: secretName
  ## The storageClassName of the persistent volume for prometheus/grafana/loki data storage.
  ## Kubernetes default storage class is used if this field is not set.
  # storageClassName: ""
  ## Specifies the service account for prometheus/grafana/loki/promtail components.
  # serviceAccount: ""
  ## Whether to disable Loki for log collection
  # disableLoki: false
  ## NodeSelector of pods.
  ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  # nodeSelector:
  #   node-role.kubernetes.io/doris-monitor: "true"
  ############################
  # Prometheus Configuration #
  ############################
  prometheus:
    ## Image of the prometheus
    image: prom/prometheus:v2.37.8
    ## The retention time of the prometheus data in the storage
    ## When this field is not set, all data from Prometheus will be retained.
    # retentionTime: 15d
    ## The resource requirements
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
      # cpu: 500m
      # memory: 500Mi
      ## The storage size of prometheus,
      ## it is recommended to be greater than 50Gi in the production env.
      storage: 5Gi
    ## Describes the resource limit
    # limits:
    #   cpu: 4
    #   memory: 8Gi
    ## Defines Kubernetes service for prometheus-service
    # service:
    #   type: NodePort
    #   httpPort: 0
    ## NodeSelector of pods.
    # nodeSelector: {}
  #########################
  # Grafana Configuration #
  #########################
  grafana:
    ## Image of the grafana
    image: grafana/grafana:9.5.2
    ## The default admin user and password of grafana (optional)
    # adminUser: admin
    # adminPassword: admin
    ## Describes the resource requirements
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
      # cpu: 250m
      # memory: 500Mi
      ## It is recommended to be 10Gi in the production env.
      storage: 1Gi
    ## Describes the resource limit
    # limits:
    #   cpu: 4
    #   memory: 8Gi
    ## The storageClassName of the persistent volume for grafana data storage.
    # storageClassName: ""
    ## Defines Kubernetes service for grafana-service
    # service:
    #   type: NodePort
    #   httpPort: 0
    ## NodeSelector of pods.
    # nodeSelector: {}
  ######################
  # Loki Configuration #
  ######################
  loki:
    ## Image of the loki
    image: grafana/loki:2.9.1
    ## The retention time of the loki data in the storage
    ## When this field is not set, all data from Loki will be retained.
    retentionTime: 15d
    ## Describes the resource requirements
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
      # cpu: 500m
      # memory: 500Mi
      ## It is recommended to be greater than 50Gi in the production env.
      storage: 5Gi
    ## Describes the resource limit
    # limits:
    #   cpu: 4
    #   memory: 8Gi
    ## The storageClassName of the persistent volume for loki data storage.
    # storageClassName: ""
    ## NodeSelector of pods.
    # nodeSelector: {}
  ##########################
  # Promtail Configuration #
  ##########################
  promtail:
    ## Image of the promtail
    image: grafana/promtail:2.9.1
    ## The resource requirements
    # requests:
    #   cpu: 250m
    #   memory: 256Mi
    # limits:
    #   cpu: 4
    #   memory: 8Gi
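Organize the Doris cluster configuration under the ${cluster_name} directory, and save the DorisMonitor manifest as ${cluster_name}/doris-monitor.yaml.
Storage
spec.storageClassName defines the storage class used by the monitoring components. For details, refer to the storage configuration documentation.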
spec:
  # ...
  storageClassName: ${storageClassName}
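To pick a value for ${storageClassName}, it can help to list the storage classes the cluster offers. The helper below is a minimal sketch that assumes kubectl is already configured for the target cluster; the function name list_storage_classes is only illustrative and is not part of Doris Operator.

```shell
# List the StorageClass objects available in the cluster.
# kubectl marks the cluster default with "(default)" next to its name.
# Assumes kubectl points at the cluster that will run DorisMonitor.
list_storage_classes() {
  kubectl get storageclass
}
```

Calling list_storage_classes prints one row per storage class. If spec.storageClassName is left unset, the Kubernetes default storage class is used.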
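spec.<prometheus/grafana/loki>.requests.storage defines the persistent storage size of Prometheus, Grafana, and Loki. Choose a size that matches your data retention time. The recommendations for production environments are:
- prometheus: 50Gi or more;
- loki: 50Gi or more;
- grafana: 5Gi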
spec:
  # ...
  prometheus:
    requests:
      storage: 50Gi
  grafana:
    requests:
      storage: 5Gi
  loki:
    requests:
      storage: 50Gi
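Because the right size depends on retention time, a rough estimate can be worked out before choosing requests.storage for Prometheus. The sketch below uses the rule of thumb from the Prometheus storage documentation: disk space is roughly retention seconds times ingested samples per second times bytes per sample, with about 1 to 2 bytes per sample. The ingestion rate in the example call is an assumption, not a measured Doris figure; replace it with the rate observed in your own cluster. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Rough Prometheus disk estimate, in whole GiB (rounded up):
#   bytes = retention_seconds * samples_per_second * bytes_per_sample
# All three inputs are assumptions to be replaced with measured values.
estimate_prometheus_gib() {
  local retention_days=$1 samples_per_second=$2 bytes_per_sample=$3
  local bytes=$(( retention_days * 86400 * samples_per_second * bytes_per_sample ))
  # Add (1 GiB - 1) before dividing so the result is rounded up.
  echo $(( (bytes + 1073741823) / 1073741824 ))
}

# Hypothetical workload: 15 days retention, 10000 samples/s, 2 bytes per sample.
estimate_prometheus_gib 15 10000 2
```

With these example inputs the function prints 25, so a 50Gi volume would leave headroom for growth and for Prometheus' write-ahead log.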
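Data retention time
You can configure the data retention time of the Prometheus and Loki components through spec.<prometheus/loki>.retentionTime. When this value is not set, Prometheus and Loki data is kept permanently on the PVC bound to each component.
The following example sets the data retention time of prometheus and loki to 15 days: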
spec:
  # ...
  prometheus:
    retentionTime: 15d
  loki:
    retentionTime: 15d
Deploy DorisMonitor
kubectl apply -f ${cluster_name}/doris-monitor.yaml --namespace=${namespace}
Check the status of the monitor components:
kubectl get dorismonitor ${dorismonitor_name} -n ${namespace} -o yaml
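Besides the CR status, it is often useful to look at the workloads behind it. The helper below is a sketch that lists the Pods, Services, and PVCs in the namespace. It does not filter by label, because the labels the operator attaches to monitor resources are not assumed here, so the output also includes the Doris cluster itself. The function name is illustrative.

```shell
# Show the Pods, Services and PVCs in the monitor's namespace.
# No label selector is used: this lists everything in the namespace,
# including the Doris cluster that shares it with the monitor.
show_monitor_resources() {
  local namespace=$1
  kubectl get pods,svc,pvc -n "${namespace}"
}
```

For example, show_monitor_resources "${namespace}" should show the Prometheus, Grafana, Loki, and Promtail Pods once the monitor has been reconciled.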
Access DorisMonitor
Access the Grafana dashboard
You can access the Grafana monitoring dashboard through kubectl port-forward:
kubectl port-forward -n ${namespace} svc/${dorismonitor_name}-grafana 3000:3000
Then open http://localhost:3000 in a browser. The default username and password are both admin.
You can also set spec.grafana.service.type to NodePort and view the monitoring dashboard through the NodePort.
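The NodePort option can be sketched as two small helpers: one merge-patches spec.grafana.service.type on the DorisMonitor CR, the other reads back the node port from the ${dorismonitor_name}-grafana Service. Two assumptions are made here: that the operator reconciles the Service after the CR is patched, and that the HTTP port is the first port listed on the Service. Editing doris-monitor.yaml and running kubectl apply again is an equivalent route. The function names are illustrative.

```shell
# Switch the Grafana Service to NodePort by merge-patching the DorisMonitor CR.
# Assumption: the operator reconciles the Service after the CR changes.
expose_grafana_nodeport() {
  local name=$1 namespace=$2
  kubectl patch dorismonitor "${name}" -n "${namespace}" --type merge \
    -p '{"spec":{"grafana":{"service":{"type":"NodePort"}}}}'
}

# Print the node port allocated to the Grafana Service.
# Assumption: the HTTP port is the first port listed on the Service.
grafana_node_port() {
  local name=$1 namespace=$2
  kubectl get svc "${name}-grafana" -n "${namespace}" \
    -o jsonpath='{.spec.ports[0].nodePort}'
}
```

After expose_grafana_nodeport "${dorismonitor_name}" "${namespace}", the port printed by grafana_node_port can be combined with the address of any cluster node to reach Grafana.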
Access Prometheus monitoring data
If you need to access the monitoring data directly, you can reach Prometheus through kubectl port-forward:
kubectl port-forward -n ${namespace} svc/${dorismonitor_name}-prometheus 9090:9090
Then open http://localhost:9090 in a browser, or access this address with a client tool.
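As one example of client-tool access, the sketch below wraps curl around the instant-query endpoint of the Prometheus HTTP API (/api/v1/query). It assumes the port-forward to localhost:9090 is still running in another terminal; the function name is illustrative.

```shell
# Run an instant PromQL query against the port-forwarded Prometheus.
# /api/v1/query is the Prometheus HTTP API endpoint for instant queries;
# -G sends the URL-encoded query as a GET parameter.
prometheus_query() {
  local query=$1
  curl -sG "http://localhost:9090/api/v1/query" --data-urlencode "query=${query}"
}
```

For instance, prometheus_query up returns a JSON document describing which scrape targets Prometheus currently sees as up. The built-in up metric is used because the names of the metrics exported by Doris are not assumed here.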
You can also set spec.prometheus.service.type to NodePort and access the monitoring data through the NodePort.