Automatic Scaling Doris Cluster

This document describes how to enable automatic horizontal scaling for CN (Compute Node) in a Doris cluster on Kubernetes, which based on the actual workload.

Prerequisites

Automatic scaling with the Doris Operator requires Kubernetes version 1.22+ and the installation of Metric Server on the Kubernetes cluster.

Configuring DorisAutoscaler

You can configure the automatic scaling behavior of the Doris cluster by configuring the DorisInitializer custom resource (CR).

A DorisAutoscaler CR sample

doris-autoscaler.yaml

apiVersion: al-assad.github.io/v1beta1
kind: DorisAutoscaler
metadata:
  name: basic-autoscale
spec:
  # The doris cluster name to be scaled
  cluster: basic

  # Whether to disable the behavior of scaling down.
  # disableScaleDown: false

  # The period of time in seconds for each scaling operation.
  # scalePeriodSeconds:
  #   scaleUp: 60
  #   scaleDown: 60

  cn:
    # The maximum and minimum CN replicas of automatic scaling.
    replicas:
      min: 1
      max: 5
    # Metrics rules for scaling
    rules:
      # Use CPU metrics as scaling rules (optional)
      # The maximum and minimum CPU utilization of CN, the value is a percentage, such as 80 represents 80%.
      #
      # When the average overall cpu usage of a CN cluster is greater than the max value for a period of time,
      # one replica would be automatically added until the next round of computation is below this max value.
      # When the average overall cpu usage of a CN cluster is less than the min value for a period of time,
      # one replica would be automatically removed until the next round of computation is above this min value.
      cpu:
        max: 90
        min: 20
      # Use Memory metrics as scaling rules (optional)
      # The maximum and minimum CPU utilization of CN, the value is a percentage, such as 80 represents 80%.
      memory:
        max: 80
        min: 20


Note
It’s recommended to organize the Doris cluster configuration under the ${cluster_name} directory and save it as ${cluster_name}/doris-autoscaler.yaml.

Replica Limit

spec.cn.replicas defines the maximum and minimum replica limits for CN’s automatic scaling. In the following example, the maximum number of replicas for CN scaling is limited to 5, and the minimum for scaling ins is 1.

spec:
  cn:
    # ...
    replicas:
      min: 1
      max: 5

Scaling Rules

spec.cn.rules defines the scaling rules for CN based on CPU and memory metrics.

spec:
  cn:
    # ...
    rules:
      cpu:
        max: 90
        min: 20
        memory:
          max: 80
          min: 20

In the above example, max and min values are in percentages, e.g., 90 represents 90%. The DorisAutoscaler dynamically scales in or out based on the overall CPU and memory utilization of CN nodes. Taking CPU as an example:

  • When the overall average CPU usage of the CN cluster continuously exceeds cpu.max for a period, it automatically adds a replica until the next assessment does not exceed cpu.max.
  • When the overall average CPU usage of the CN cluster falls below cpu.min for a period, it automatically removes a replica until the next assessment shows a CPU usage above cpu.min.

Apply DorisAutoscaler

kubectl apply -f ${cluster_name}/doris-autoscale.yaml --namespace=${namespace}

View the status of the DorisAutoscaler:

kubectl get dorisautoscaler ${dorisautoscaler_name} -n ${namespace} -o yaml

Delete DorisAutoScaler

kubectl delete -f ${cluster_name}/doris-autoscale.yaml --namespace=${namespace}