Deploy Monitor for Doris Cluster
This document describes how to monitor a Doris cluster deployed through Doris Operator.
Configure the DorisMonitor
Doris Operator collects Doris cluster metrics through Prometheus and gathers Doris cluster logs using Loki, providing a unified visualization interface through Grafana.
When creating a new Doris cluster through Doris Operator, it’s possible to create and configure an independent monitoring system for each Doris cluster. This monitoring system operates within the same namespace as the Doris cluster and consists of four components: Prometheus, Grafana, Loki, and Promtail.
The DorisMonitor CR defines the configuration for the Doris monitoring components:
A basic DorisMonitor CR sample
# IT IS NOT SUITABLE FOR PRODUCTION USE.
# This YAML describes basic Doris monitor components with minimum resource requirements,
# which should be able to run in any Kubernetes cluster with storage support.
apiVersion: al-assad.github.io/v1beta1
kind: DorisMonitor
metadata:
  name: basic-monitor
spec:
  # The doris cluster name to be monitored
  cluster: basic
  prometheus:
    image: prom/prometheus:v2.37.8
    # The retention time of the prometheus data in the storage
    retentionTime: 15d
    # The storage size of prometheus persistent data at pvc.
    # It is recommended to be greater than 50Gi in the production env.
    requests:
      storage: 5Gi
  grafana:
    image: grafana/grafana:9.5.2
    # The default admin user and password of grafana (optional)
    adminUser: admin
    adminPassword: admin
    # The storage size of grafana persistent data at pvc.
    # It is recommended to be 10Gi in the production env.
    requests:
      storage: 1Gi
  loki:
    image: grafana/loki:2.9.1
    # The retention time of the loki data in the storage
    retentionTime: 15d
    # The storage size of loki persistent data at pvc.
    # It is recommended to be greater than 50Gi in the production env.
    requests:
      storage: 5Gi
  promtail:
    image: grafana/promtail:2.9.1
An advanced DorisMonitor CR sample
apiVersion: al-assad.github.io/v1beta1
kind: DorisMonitor
metadata:
  name: basic-monitor
spec:
  ## The doris cluster name to be monitored
  cluster: basic
  ## ImagePullPolicy of Doris monitor Pods
  ## Ref: https://kubernetes.io/docs/concepts/configuration/overview/#container-images
  # imagePullPolicy: IfNotPresent
  ## Ref: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
  # imagePullSecrets:
  #   - name: secretName
  ## The storageClassName of the persistent volume for prometheus/grafana/loki data storage.
  ## The Kubernetes default storage class is used if this field is not set.
  # storageClassName: ""
  ## Specifies the service account for prometheus/grafana/loki/promtail components.
  # serviceAccount: ""
  ## Whether to disable Loki for log collection
  # disableLoki: false
  ## NodeSelector of pods.
  ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  # nodeSelector:
  #   node-role.kubernetes.io/doris-monitor: "true"
  ############################
  # Prometheus Configuration #
  ############################
  prometheus:
    ## Image of the prometheus
    image: prom/prometheus:v2.37.8
    ## The retention time of the prometheus data in the storage
    ## When this field is not set, all data from Prometheus will be retained.
    # retentionTime: 15d
    ## The resource requirements
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
      # cpu: 500m
      # memory: 500Mi
      ## The storage size of prometheus.
      ## It is recommended to be greater than 50Gi in the production env.
      storage: 5Gi
    ## Describes the resource limit
    # limits:
    #   cpu: 4
    #   memory: 8Gi
    ## Defines Kubernetes service for prometheus-service
    # service:
    #   type: NodePort
    #   httpPort: 0
    ## NodeSelector of pods.
    # nodeSelector: {}
  #########################
  # Grafana Configuration #
  #########################
  grafana:
    ## Image of the grafana
    image: grafana/grafana:9.5.2
    ## The default admin user and password of grafana (optional)
    # adminUser: admin
    # adminPassword: admin
    ## Describes the resource requirements
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
      # cpu: 250m
      # memory: 500Mi
      ## It is recommended to be 10Gi in the production env.
      storage: 1Gi
    ## Describes the resource limit
    # limits:
    #   cpu: 4
    #   memory: 8Gi
    ## The storageClassName of the persistent volume for grafana data storage.
    # storageClassName: ""
    ## Defines Kubernetes service for grafana-service
    # service:
    #   type: NodePort
    #   httpPort: 0
    ## NodeSelector of pods.
    # nodeSelector: {}
  ######################
  # Loki Configuration #
  ######################
  loki:
    ## Image of the loki
    image: grafana/loki:2.9.1
    ## The retention time of the loki data in the storage
    ## When this field is not set, all data from Loki will be retained.
    retentionTime: 15d
    ## Describes the resource requirements
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
      # cpu: 500m
      # memory: 500Mi
      ## It is recommended to be greater than 50Gi in the production env.
      storage: 5Gi
    ## Describes the resource limit
    # limits:
    #   cpu: 4
    #   memory: 8Gi
    ## The storageClassName of the persistent volume for loki data storage.
    # storageClassName: ""
    ## NodeSelector of pods.
    # nodeSelector: {}
  ##########################
  # Promtail Configuration #
  ##########################
  promtail:
    ## Image of the promtail
    image: grafana/promtail:2.9.1
    ## The resource requirements
    # requests:
    #   cpu: 250m
    #   memory: 256Mi
    # limits:
    #   cpu: 4
    #   memory: 8Gi
Save the DorisMonitor CR in the ${cluster_name} directory as ${cluster_name}/doris-monitor.yaml.
Storage
spec.storageClassName defines the storage type of the monitoring components. Refer to the storage configuration document.
spec:
  # ...
  storageClassName: ${storageClassName}
spec.<prometheus/grafana/loki>.requests.storage defines the persistent storage size for Prometheus, Grafana, and Loki. Choose an appropriate size based on your data retention time. Recommended sizes for production environments:
- prometheus: 50Gi or more
- loki: 50Gi or more
- grafana: 5Gi
spec:
  # ...
  prometheus:
    requests:
      storage: 50Gi
  grafana:
    requests:
      storage: 5Gi
  loki:
    requests:
      storage: 50Gi
Data Retention Time
You can configure the data retention time for the Prometheus and Loki components using spec.<prometheus/loki>.retentionTime. When this value is not set, data in Prometheus and Loki is retained permanently on the respective bound PVCs.
The following example sets the data retention time for Prometheus and Loki to 15 days:
spec:
  # ...
  prometheus:
    retentionTime: 15d
  loki:
    retentionTime: 15d
Deploy the DorisMonitor
kubectl apply -f ${cluster_name}/doris-monitor.yaml --namespace=${namespace}
View the status of the monitor components:
kubectl get dorismonitor ${dorismonitor_name} -n ${namespace} -o yaml
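Because the monitoring components run in the same namespace as the Doris cluster, a quick way to inspect them is to list the Pods, Services, and PVCs in that namespace. The sketch below assumes nothing about operator-specific labels, so it also shows the Doris cluster's own resources; show_monitor_resources is a hypothetical helper name, not part of Doris Operator.

```shell
# Hypothetical helper (not provided by Doris Operator):
# list Pods, Services, and PVCs in the namespace that hosts the monitor.
show_monitor_resources() {
  # $1: namespace of the Doris cluster and its DorisMonitor
  kubectl get pods,svc,pvc -n "$1"
}

# Usage (requires access to the cluster):
#   show_monitor_resources "${namespace}"
```

The PVC list is also a convenient place to confirm that the storage sizes requested in the CR were actually provisioned.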
Access the DorisMonitor
Access Grafana Dashboard
You can access the Grafana monitoring dashboard using kubectl port-forward:
kubectl port-forward -n ${namespace} svc/${dorismonitor_name}-grafana 3000:3000
Then open http://localhost:3000 in your browser. The default username and password are both admin.
You can also set spec.grafana.service.type to NodePort to access the monitoring dashboard through a NodePort.
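As a rough sketch, the NodePort setting can be applied to an existing DorisMonitor with a merge patch instead of editing the YAML file. This assumes the operator reconciles changes made to a DorisMonitor that is already deployed, and that the HTTP port is the first port of the Grafana Service; neither is confirmed here. Both function names are hypothetical.

```shell
# Hypothetical helpers (not provided by Doris Operator).

# Set spec.grafana.service.type to NodePort on an existing DorisMonitor.
expose_grafana_nodeport() {
  # $1: DorisMonitor name, $2: namespace
  kubectl patch dorismonitor "$1" -n "$2" --type merge \
    -p '{"spec":{"grafana":{"service":{"type":"NodePort"}}}}'
}

# Print the node port assigned to the Grafana Service.
# Assumption: the HTTP port is the first entry in spec.ports.
grafana_nodeport() {
  # $1: DorisMonitor name, $2: namespace
  kubectl get svc "$1-grafana" -n "$2" \
    -o jsonpath='{.spec.ports[0].nodePort}'
}

# Usage (requires access to the cluster):
#   expose_grafana_nodeport "${dorismonitor_name}" "${namespace}"
#   grafana_nodeport "${dorismonitor_name}" "${namespace}"
```

Grafana is then reachable at any node address on the printed port. Editing ${cluster_name}/doris-monitor.yaml and re-running kubectl apply keeps the file as the source of truth and may be preferable.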
Access Prometheus Monitoring Data
If you need direct access to the monitoring data, you can access Prometheus using kubectl port-forward:
kubectl port-forward -n ${namespace} svc/${dorismonitor_name}-prometheus 9090:9090
Then open http://localhost:9090 in your browser or access this address through a client tool.
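For client-tool access, one option is the Prometheus HTTP API. The sketch below assumes the port-forward above is still running on localhost:9090; prom_query is a hypothetical helper name. It sends a PromQL expression to the instant-query endpoint /api/v1/query and prints the JSON response.

```shell
# Hypothetical helper: run an instant PromQL query against the
# port-forwarded Prometheus and print the raw JSON response.
prom_query() {
  # $1: PromQL expression; --data-urlencode handles escaping,
  # and --get sends it as a query-string parameter.
  curl -s --get --data-urlencode "query=$1" \
    "http://localhost:9090/api/v1/query"
}

# Usage (requires the port-forward to be active):
#   prom_query 'up'
```

The up metric is generated by Prometheus for every scrape target, so it works as a basic check of which targets are currently being scraped successfully.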
You can also set spec.prometheus.service.type to NodePort to access the monitoring data through a NodePort.