Deploy Monitor for Doris Cluster

This document describes how to monitor a Doris cluster deployed through Doris Operator.

Configure the DorisMonitor

Doris Operator collects Doris cluster metrics through Prometheus and gathers Doris cluster logs using Loki, providing a unified visualization interface through Grafana.

When creating a new Doris cluster through Doris Operator, it’s possible to create and configure an independent monitoring system for each Doris cluster. This monitoring system operates within the same namespace as the Doris cluster and consists of four components: Prometheus, Grafana, Loki, and Promtail.
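
Loki and Promtail are only needed for log collection. The advanced sample in this document includes a disableLoki field for clusters that only need metrics. The fragment below is a minimal sketch. It assumes that setting the field to true skips the Loki/Promtail log pipeline and leaves Prometheus and Grafana in place. This document does not describe the exact behavior, so confirm it against the operator's CRD.

```yaml
spec:
  # ...
  # Assumption: turns off Loki-based log collection (Loki and Promtail);
  # Prometheus metrics and Grafana dashboards are still deployed.
  disableLoki: true
```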

The DorisMonitor CR defines the configuration for these monitoring components:

A basic DorisMonitor CR sample

doris-monitor.yaml

# IT IS NOT SUITABLE FOR PRODUCTION USE.
# This YAML describes a basic set of Doris monitoring components with minimal resource requirements,
# which should be able to run in any Kubernetes cluster with storage support.

apiVersion: al-assad.github.io/v1beta1
kind: DorisMonitor
metadata:
  name: basic-monitor
spec:
  # Name of the Doris cluster to be monitored
  cluster: basic

  prometheus:
    image: prom/prometheus:v2.37.8
    # The retention time of the prometheus data in the storage
    retentionTime: 15d
    # PVC storage size for Prometheus persistent data.
    # 50Gi or more is recommended for production.
    requests:
      storage: 5Gi

  grafana:
    image: grafana/grafana:9.5.2
    # The default admin user and password of grafana (optional)
    adminUser: admin
    adminPassword: admin
    # PVC storage size for Grafana persistent data.
    # 10Gi is recommended for production.
    requests:
      storage: 1Gi

  loki:
    image: grafana/loki:2.9.1
    # The retention time of the loki data in the storage
    retentionTime: 15d
    # PVC storage size for Loki persistent data.
    # 50Gi or more is recommended for production.
    requests:
      storage: 5Gi

  promtail:
    image: grafana/promtail:2.9.1

An advanced DorisMonitor CR sample

doris-monitor.yaml

apiVersion: al-assad.github.io/v1beta1
kind: DorisMonitor
metadata:
  name: basic-monitor
spec:
  ## Name of the Doris cluster to be monitored
  cluster: basic

  ## ImagePullPolicy of Doris monitor Pods
  ## Ref: https://kubernetes.io/docs/concepts/configuration/overview/#container-images
  # imagePullPolicy: IfNotPresent

  ## Ref: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
  # imagePullSecrets:
  # - name: secretName

  ## The storageClassName of the persistent volume for prometheus/grafana/loki data storage.
  ## The Kubernetes default storage class is used if this field is not set.
  # storageClassName: ""

  ## Specifies the service account for prometheus/grafana/loki/promtail components.
  # serviceAccount: ""

  ## Whether to disable Loki for log collection
  # disableLoki: false

  ## NodeSelector of the monitor Pods.
  ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  # nodeSelector:
  #   node-role.kubernetes.io/doris-monitor: "true"

  ###########################
  # Prometheus Configuration #
  ###########################
  prometheus:
    ## Image of the prometheus
    image: prom/prometheus:v2.37.8

    ## The retention time of the prometheus data in the storage
    ## When this field is not set, all data from Prometheus will be retained.
    # retentionTime: 15d

    ## The resource requirements
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
      # cpu: 500m
      # memory: 500Mi
      ## The PVC storage size for Prometheus.
      ## 50Gi or more is recommended for production.
      storage: 5Gi
    ## The resource limits
    # limits:
    #   cpu: 4
    #   memory: 8Gi

    ## The Kubernetes Service for Prometheus
    # service:
    #  type: NodePort
    #  httpPort: 0

    ## NodeSelector of the Prometheus Pods.
    # nodeSelector: {}

  ########################
  # Grafana Configuration #
  ########################
  grafana:
    ## Image of the grafana
    image: grafana/grafana:9.5.2

    ## The default admin user and password of grafana (optional)
    # adminUser: admin
    # adminPassword: admin

    ## Describes the resource requirements
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
      # cpu: 250m
      # memory: 500Mi
      ## 10Gi is recommended for production.
      storage: 1Gi
    ## The resource limits
    # limits:
    #   cpu: 4
    #   memory: 8Gi

    ## The storageClassName of the persistent volume for grafana data storage.
    # storageClassName: ""

    ## The Kubernetes Service for Grafana
    # service:
    #  type: NodePort
    #  httpPort: 0

    ## NodeSelector of the Grafana Pods.
    # nodeSelector: {}

  #####################
  # Loki Configuration #
  #####################
  loki:
    ## Image of the loki
    image: grafana/loki:2.9.1

    ## The retention time of the loki data in the storage
    ## When this field is not set, all data from Loki will be retained.
    retentionTime: 15d

    ## Describes the resource requirements
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
      # cpu: 500m
      # memory: 500Mi
      ## 50Gi or more is recommended for production.
      storage: 5Gi
    ## The resource limits
    # limits:
    #   cpu: 4
    #   memory: 8Gi

    ## The storageClassName of the persistent volume for Loki data storage.
    # storageClassName: ""

    ## NodeSelector of the Loki Pods.
    # nodeSelector: {}

  #########################
  # Promtail Configuration #
  #########################
  promtail:
    ## Image of the promtail
    image: grafana/promtail:2.9.1
    ## The resource requirements
    # requests:
    #   cpu: 250m
    #   memory: 256Mi
    # limits:
    #   cpu: 4
    #   memory: 8Gi

Note
It is recommended to organize the Doris cluster’s configuration under the ${cluster_name} directory and save it as ${cluster_name}/doris-monitor.yaml.

Storage

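spec.storageClassName sets the StorageClass used for the persistent volumes of the monitoring components. If it is not set, the Kubernetes default storage class is used. Refer to the storage configuration document for details.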

spec:
  # ...
  storageClassName: ${storageClassName}
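
The advanced sample also shows a storageClassName field under grafana and loki. The fragment below is a sketch that assumes a component-level value takes precedence over spec.storageClassName for that component's PVC. This document does not state the precedence, so verify it against the operator. `standard` and `fast-ssd` are hypothetical StorageClass names.

```yaml
spec:
  # ...
  # Hypothetical StorageClass used by the monitoring PVCs by default
  storageClassName: standard
  grafana:
    # Assumption: overrides spec.storageClassName for the Grafana PVC only
    storageClassName: fast-ssd
  loki:
    # Assumption: overrides spec.storageClassName for the Loki PVC only
    storageClassName: fast-ssd
```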

spec.<prometheus/grafana/loki>.requests.storage defines the persistent storage size for Prometheus, Loki, and Grafana. Please choose an appropriate size based on your data retention time. Below are recommended sizes for production environments:

  • prometheus: 50Gi or more
  • loki: 50Gi or more
  • grafana: 5Gi

spec:
  # ...
  prometheus:
    requests:
      storage: 50Gi
  grafana:
    requests:
      storage: 5Gi
  loki:
    requests:
      storage: 50Gi
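
The same requests block also accepts cpu and memory, and each component has a matching limits block. Both appear commented out in the advanced sample. The following sketch applies them to Prometheus. The values are copied from the sample's commented defaults and are illustrative, not sizing guidance.

```yaml
spec:
  # ...
  prometheus:
    requests:
      cpu: 500m
      memory: 500Mi
      storage: 50Gi
    # Upper bounds for the Prometheus container
    limits:
      cpu: 4
      memory: 8Gi
```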

Data Retention Time

You can configure the data retention time for Prometheus and Loki components using spec.<prometheus/loki>.retentionTime. When this value is not set, data in Prometheus and Loki will be retained permanently on the respective bound PVCs.

The following example sets the data retention time for Prometheus and Loki to 15 days:

spec:
  # ...
  prometheus:
    retentionTime: 15d
  loki:
    retentionTime: 15d

Deploy the DorisMonitor

Apply the DorisMonitor CR in the same namespace as the Doris cluster it monitors:

kubectl apply -f ${cluster_name}/doris-monitor.yaml --namespace=${namespace}

View the status of the monitor components:

kubectl get dorismonitor ${dorismonitor_name} -n ${namespace} -o yaml

Access the DorisMonitor

Access Grafana Dashboard

You can access the Grafana monitoring dashboard using kubectl port-forward:

kubectl port-forward -n ${namespace} svc/${dorismonitor_name}-grafana 3000:3000

Then open http://localhost:3000 in your browser. Unless you have changed spec.grafana.adminUser and spec.grafana.adminPassword, the default username and password are both admin.

You can also set spec.grafana.service.type to NodePort to access the monitoring dashboard through NodePort.
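
The following fragment combines the NodePort setting with replacement Grafana credentials, because the default admin/admin login is then reachable from outside the cluster. It is a sketch: `my-strong-password` is a placeholder. The port is left for Kubernetes to assign, because this document does not explain the httpPort field in the advanced sample.

```yaml
spec:
  # ...
  grafana:
    # Replace the default admin/admin credentials (placeholder password)
    adminUser: admin
    adminPassword: my-strong-password
    # Expose Grafana through a NodePort Service
    service:
      type: NodePort
```

After applying the change, run `kubectl get svc ${dorismonitor_name}-grafana -n ${namespace}` to see which node port was assigned.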

Access Prometheus Monitoring Data

If you need direct access to the monitoring data, you can reach Prometheus using kubectl port-forward:

kubectl port-forward -n ${namespace} svc/${dorismonitor_name}-prometheus 9090:9090

Then open http://localhost:9090 in your browser or access this address through a client tool.

You can also set spec.prometheus.service.type to NodePort to access the monitoring data through NodePort.
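
A minimal fragment for exposing Prometheus this way follows. It is a sketch that sets only the Service type and lets Kubernetes assign the node port.

```yaml
spec:
  # ...
  prometheus:
    # Expose Prometheus through a NodePort Service
    service:
      type: NodePort
```

Run `kubectl get svc ${dorismonitor_name}-prometheus -n ${namespace}` to find the assigned port. Prometheus's standard HTTP API is then reachable at an address such as http://${node_ip}:${node_port}/api/v1/query?query=up, where ${node_ip} and ${node_port} are placeholders for a node address and the assigned port.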