Version: 2.16.0

Advanced Configuration for the Smart Scaler Agent

This topic describes the advanced configuration for the Smart Scaler agent.

Configure an Existing Secret

To avoid storing the clientSecret in plain text in the values file, store it in a Kubernetes Secret and reference that Secret as existingSecret in the ss-agents-values.yaml file, as shown below:

caution

To configure existingSecret, you must remove the clientSecret parameter from the ss-agents-values.yaml file.

existingSecret: "<secret name>" # If provided, the agent uses this existing secret instead of creating a new one

Supported Format for an Existing Secret

Ensure that the existingSecret is stored in the following format:

apiVersion: v1
kind: Secret
metadata:
  name: secret-name
  namespace: <namespace>
type: Opaque
data:
  # Client ID is plain text, therefore it must be base64 encoded
  clientId: ""
  # Client Secret is already base64 encoded
  clientSecret: ""
  # Cluster Display Name is plain text, therefore it must be base64 encoded
  clusterDisplayName: ""
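
For illustration only, the following is a populated example of such a Secret. The name, namespace, and encoded values are placeholders, not real credentials (dGVuYW50LWFwb2xsbw== is the base64 encoding of tenant-apollo, and bXktY2x1c3Rlcg== of my-cluster):

apiVersion: v1
kind: Secret
metadata:
  name: ss-agent-credentials # placeholder; use this name as the existingSecret value
  namespace: smart-scaler # placeholder namespace
type: Opaque
data:
  clientId: dGVuYW50LWFwb2xsbw== # base64 of the plain-text client ID "tenant-apollo"
  clientSecret: <already-base64-encoded client secret> # used as-is, no additional encoding
  clusterDisplayName: bXktY2x1c3Rlcg== # base64 of the plain-text display name "my-cluster"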

Configure an External Secrets Operator

To synchronize the clientSecret from an external secret management system, such as HashiCorp Vault, into Kubernetes, configure the External Secrets Operator in the ss-agents-values.yaml file. The following code snippet shows the externalSecret configuration that you must add to the ss-agents-values.yaml file.

caution

To configure the External Secrets Operator, you must remove the clientSecret parameter from the ss-agents-values.yaml file.

externalSecret: # Configuration for external secrets
  enabled: false # Set to true to use external secrets
  refreshInterval: "1h" # How often to refresh the secret
  secretStoreRef:
    name: "" # Name of the SecretStore/ClusterSecretStore
    kind: "SecretStore" # Kind of the secret store (SecretStore or ClusterSecretStore)
  # Client ID configuration
  clientIdRemoteRefKey: "kv/clientid" # The name or identifier of the client ID in the external secret store
  clientIdRemoteRefProperty: "clientId" # The property/key within the secret to fetch for client ID
  # Client Secret configuration
  clientSecretRemoteRefKey: "kv/clientsecret" # The name or identifier of the client secret in the external secret store
  clientSecretRemoteRefProperty: "clientSecret" # The property/key within the secret to fetch for client secret
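
The secretStoreRef above points to a SecretStore or ClusterSecretStore object that you create separately with the External Secrets Operator. The following is a minimal sketch, assuming a HashiCorp Vault KV v2 backend with token authentication; the store name, Vault address, and token secret are hypothetical:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend # reference this name in secretStoreRef.name
  namespace: smart-scaler
spec:
  provider:
    vault:
      server: "https://vault.example.com" # hypothetical Vault address
      path: "kv" # KV mount expected to hold kv/clientid and kv/clientsecret
      version: "v2"
      auth:
        tokenSecretRef:
          name: vault-token # hypothetical Kubernetes Secret containing a Vault token
          key: token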

Manage Smart Scaler Agent Deployments through Terraform

You can manage Smart Scaler deployments through Terraform, as shown in the following example.

provider "helm" {
kubernetes {
config_path = "~/.kube/config"
}
}

resource "helm_release" "smart_scaler_agent" {
name = "smart-scaler-agent"
namespace = "smart-scaler" # Change if needed
repository = "https://smartscaler.nexus.aveshalabs.io/repository/smartscaler-helm-ent-prod/" # The actual Helm repo URL; replace it if required
chart = "smart-scaler-agent" # Replace with the actual chart name
version = "1.2.3" # Replace with the actual version if needed

values = [
yamlencode({
agentConfiguration = {
host = "https://gateway.saas1.smart-scaler.io"
clusterDisplayName = "ss-agent-values"
clientID = "tenant-apollo"
clientSecret = "<client secret>" # Replace or use var.client_secret
smartScalerAgentMonitorNs = ".*"
smartScalerAgentExcludeNs = "istio-system,kube-node-lease,kube-public,kube-system"
deploymentChunk = "300"
namespaceAnnotationKey = ""
}
})
]
}

Apply Smart Scaling to Specific Services in Target Namespaces

Apply Smart Scaling to specific services at the namespace level in the ss-agents-values.yaml file. To exclude a namespace, add it to the hpaAutoApplyExcludeNamespaces list; Smart Scaler then skips that namespace. The following is an example configuration.

warning

For event scaling with Smart Scaler agent versions 2.9.28 or earlier, each application deployment in a namespace must have its own configured HPA. Without an individual HPA, event scaling fails (a minimal example HPA is shown after the configuration below).

However, starting with Smart Scaler agent version 2.9.29, event scaling no longer requires an individual HPA for each application deployment.

eventAutoscaler:
  autoscalerProperties:
    hpaAutoApply:
      enabled: true
      syncInterval: 3m
      hpaStabilizationDownWindowSeconds: 30
      recommendationTriggerType:
        cpu: true
        rl: true
      hpaAutoApplyExcludeNamespaces:
        - kube-system
        - default
        - istio-system
        - bookinfo2
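
For agent versions 2.9.28 or earlier, each application deployment needs its own HPA, as noted in the warning above. The following is a minimal sketch of such an HPA; the deployment name, namespace, and CPU target are hypothetical and only illustrate the shape of the object:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend # hypothetical deployment name
  namespace: boutique # hypothetical application namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # hypothetical CPU utilization target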

Ensure Beyla DaemonSets Run on All Nodes

To allow the Beyla DaemonSet to run on all nodes, including tainted ones, add tolerations to the ss-agent-values.yaml file as shown below:

beyla:
  tolerations:
    - effect: NoSchedule
      key: [KEY_NAME]
      operator: Exists

To ensure that the eBPF Beyla DaemonSet has sufficient resources for production traffic, you can configure CPU and memory requests and limits.

The following is an example YAML:

beyla:
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 200m
      memory: 350Mi

The following steps ensure that Beyla DaemonSets are scheduled on all nodes across all node pools in the Kubernetes cluster, including nodes with taints. This is done by extracting the node taints and applying the corresponding tolerations in the Helm chart values file.

Step 1: Identify the Node Taints

To determine which taints exist on the nodes, run the following command:

kubectl get nodes -o json | jq -r '
.items[] |
select(.spec.taints) |
.metadata.name as $node |
.spec.taints[] |
"Node: \($node)\nTaint: key=\(.key), value=\(.value // \"None\"), effect=\(.effect)\nToleration:\n- key: \(.key)\n operator: \"Equal\"\n value: \"\(.value // \"None\")\"\n effect: \"\(.effect)\"\n"'

Example Output:

Node: 10.0.69.54
Taint: key=custom.taint.key1, value=None, effect=NoSchedule
Toleration:
- key: custom.taint.key1
  operator: "Equal"
  value: "None"
  effect: "NoSchedule"

Node: 10.0.69.54
Taint: key=custom.taint.key2, value=custom-value, effect=NoSchedule
Toleration:
- key: custom.taint.key2
  operator: "Equal"
  value: "custom-value"
  effect: "NoSchedule"

Node: 10.0.69.55
Taint: key=custom.taint.key3, effect=NoExecute
Toleration:
- key: custom.taint.key3
  operator: "Exists"
  effect: "NoExecute"

Step 2: Apply the Tolerations in the Helm Chart

After you identify the required tolerations, add them to the Helm chart values file for the Beyla agent. Modify the values file (ss-agent-values.yaml) as follows:

tolerations:
  - key: custom.taint.key1
    operator: "Equal"
    value: "None"
    effect: "NoSchedule"
  - key: custom.taint.key2
    operator: "Equal"
    value: "custom-value"
    effect: "NoSchedule"
  - key: custom.taint.key3
    operator: "Exists"
    effect: "NoExecute"

beyla:
  tolerations:
    - key: custom.taint.key1
      operator: "Equal"
      value: "None"
      effect: "NoSchedule"
    - key: custom.taint.key2
      operator: "Equal"
      value: "custom-value"
      effect: "NoSchedule"
    - key: custom.taint.key3
      operator: "Exists"
      effect: "NoExecute"

Step 3: Apply the Values File

Apply the changes by upgrading the Helm release. Use the following command to apply the values file:

helm upgrade --install smartscaler smart-scaler/smartscaler-agent -f ss-agent-values.yaml -n smart-scaler

This ensures that the Beyla DaemonSet tolerates the identified taints and is scheduled across all node pools in the cluster.

Auto Smart Sizing of Pods in a Kubernetes Cluster

Starting with the Smart Scaler agent version 2.9.13, you can configure Smart Scaler to automatically apply in-cluster Smart Sizing recommendations. This feature is useful for automatically resizing the deployments in lower environments.

info

You can apply Smart Sizing yourself by reviewing the recommendations on the Smart Scaler Console's Smart Sizing page. If you want Smart Scaler to apply the recommendations automatically, enable Auto Smart Sizing.

caution

Applying Auto Smart Sizing to production needs to be carefully planned, including integrating it into the production CI/CD process with manual approval.

The Smart Scaler Agent running in an application cluster requests Smart Sizing recommendations from the Smart Scaler SaaS cloud and applies them to the cluster, one deployment at a time. When a deployment is smart sized, the Kubernetes default behavior is to ensure that at least 75% of the required pods are running (25% maximum unavailable). This allows for a rolling update that ensures service availability without adding stress on the node pool of the cluster.
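
For reference, the rolling-update behavior described above corresponds to the default Deployment update strategy shown below. You do not need to set it explicitly; it is the Kubernetes default:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25% # at most 25% of the desired pods are unavailable during a resize
      maxSurge: 25% # up to 25% extra pods may be created during the rollout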

Prerequisites

  • The Smart Scaler Agent requires privileges to patch deployments across all application namespaces within the cluster, as sketched in the example after this list.
  • Smart Sizing recommendations must be available for Auto Smart Sizing to be applied to application namespaces.
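
Depending on how the agent is deployed, its Helm chart may already grant these privileges. The following is only an illustrative sketch, with a hypothetical name, of the kind of RBAC rule that the first prerequisite implies:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: smart-scaler-agent-smart-sizing # hypothetical name
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "patch"] # patch is required to apply Smart Sizing recommendations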

Enable Auto Smart Sizing

Smart Scaler is deployed into a Kubernetes cluster with a Helm values YAML file. To enable Auto Smart Sizing in a cluster, add the following properties to the ss-agent-values.yaml file:

eventAutoscaler:
  autoscalerProperties:
    rightSizing:
      enabled: true
      # syncInterval: 24h
      # deploymentReadyTimeout: 10m

To enable Smart Sizing, set enabled: true. The commented items in the above YAML snippet are default values that can be adjusted based on your environment. The following two properties must be used with caution:

  • syncInterval: The time interval between consecutive Smart Sizing configuration applications.

    warning

    Do not apply the configuration more frequently than once every two hours; that is, allow a gap of at least two hours between consecutive Auto Smart Sizing applications.

  • deploymentReadyTimeout: The maximum time to wait for the entire deployment to be Smart Sized. If this timeout is exceeded, some pods in the deployment may not be resized; however, the service continues to function as before. See the example after this list.
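
For example, to set both properties explicitly with their default values (illustrative only; keep syncInterval at two hours or more, as cautioned above):

eventAutoscaler:
  autoscalerProperties:
    rightSizing:
      enabled: true
      syncInterval: 24h
      deploymentReadyTimeout: 10m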

note

If you need to Smart Size immediately, restart the agent-controller pod.

Configure APM Tools for Metrics Collection

Smart Scaler supports any application performance management (APM) tool from which its agents can retrieve metrics data. The following configuration shows Prometheus and eBPF data sources configured in the ss-agent-values.yaml file.

namedDataSources:
  - name: "prometheus" # This is the same name that you used in the agent configuration
    datasourceType: "prometheus" # This is the type of the datasource
    url: "http://prometheus-server.monitoring" # URL of the datasource
    credentials:
      username: "" # For prometheus
      password: "" # For prometheus
  - name: "ebpf" # This is the same name that you used in the agent configuration
    datasourceType: "ebpf" # This is the type of the datasource

inferenceAgent:
  inferenceAgentConfig:
    metric_interval: 60
    push_interval: 40
    app:
      - metric_labels:
          app: "boutique-ebpf"
          app_version: "1.0"
          customer: tenant-apollo
          ss_agent_name: ebpf-prom-test
        default_fallback: 3
        use_collector_queries: false
        use_jobs: false
        clusters:
          - name: "ebpf-prom-test"
            namespaces:
              - name: "boutique"
                deployments:
                  - name: adservice
                    fallback: 4
                    data_source:
                  - name: cartservice
                    fallback: 4
                    data_source:
                  - name: checkoutservice
                    fallback: 4
                    data_source:
                  - name: currencyservice
                    fallback: 4
                    data_source:
                  - name: emailservice
                    fallback: 4
                    data_source:
                  - name: frontend
                    fallback: 4
                    data_source:
                  - name: loadgenerator
                    fallback: 4
                    data_source:
                  - name: paymentservice
                    fallback: 4
                    data_source:
                  - name: productcatalogservice
                    fallback: 4
                    data_source:
                  - name: recommendationservice
                    fallback: 4
                    data_source:
                  - name: redis-cart
                    fallback: 4
                    data_source:
                  - name: shippingservice
                    fallback: 4
                    data_source:
        metrics:
          - name: istio_requests_total_rate
            description: forwarded rps data from ebpf requests
            namespace: boutique
            deployment: (adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice)
            additional_labels:
              kube_cluster_name: ebpf-prom-test
              data_source: ebpf
          - name: istio_request_duration_milliseconds_bucket_rate
            description: forwarded latency data from ebpf requests
            namespace: boutique
            deployment: (adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice)
            additional_labels:
              kube_cluster_name: ebpf-prom-test
              data_source: ebpf
      - metric_labels:
          app: "boutique-prometheus"
          app_version: "1.0"
          customer: tenant-apollo
          ss_agent_name: ebpf-prom-test
        default_fallback: 3
        use_collector_queries: false
        use_jobs: false
        clusters:
          - name: "ebpf-prom-test"
            namespaces:
              - name: "boutique"
                deployments:
                  - name: adservice
                    fallback: 4
                    data_source:
                  - name: cartservice
                    fallback: 4
                    data_source:
                  - name: checkoutservice
                    fallback: 4
                    data_source:
                  - name: currencyservice
                    fallback: 4
                    data_source:
                  - name: emailservice
                    fallback: 4
                    data_source:
                  - name: frontend
                    fallback: 4
                    data_source:
                  - name: loadgenerator
                    fallback: 4
                    data_source:
                  - name: paymentservice
                    fallback: 4
                    data_source:
                  - name: productcatalogservice
                    fallback: 4
                    data_source:
                  - name: recommendationservice
                    fallback: 4
                    data_source:
                  - name: redis-cart
                    fallback: 4
                    data_source:
                  - name: shippingservice
                    fallback: 4
                    data_source:
        metrics:
          - name: istio_requests_total_rate
            description: forwarded rps data from istio requests
            query: sum(rate(label_replace(istio_requests_total{namespace=~'boutique', destination_workload=~'.*(adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice).*'},'kube_namespace', '$1', 'namespace', '(.*)')[2m:])) by (destination_service_name,response_code,destination_workload,source_workload,reporter,kube_namespace)
            additional_labels:
              kube_cluster_name: ebpf-prom-test
              data_source: prometheus
          - name: istio_request_duration_milliseconds_bucket_rate
            description: forwarded latency data from istio requests
            query: sum(irate(label_replace(istio_request_duration_milliseconds_bucket{namespace=~'boutique',reporter=~'destination', destination_workload=~'.*(adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice).*'},'kube_namespace', '$1', 'namespace', '(.*)')[2m:])) by (le, response_code, destination_service_name, destination_workload, source_workload, reporter, kube_namespace)
            additional_labels:
              kube_cluster_name: ebpf-prom-test
              data_source: prometheus
          - name: istio_average_latency
            description: forwarded average latency data from istio requests
            query: ((sum(irate(label_replace(istio_request_duration_milliseconds_sum{namespace=~'boutique',reporter=~'destination', destination_workload=~'.*(adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice).*'},'kube_namespace', '$1', 'namespace', '(.*)')[2m:])) by ( destination_service_name, destination_workload, source_workload, kube_namespace))/(sum(irate(label_replace(istio_request_duration_milliseconds_count{namespace=~'boutique',reporter=~'destination', destination_workload=~'.*(adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice).*'},'kube_namespace', '$1', 'namespace', '(.*)')[2m:])) by ( destination_service_name, destination_workload, source_workload, kube_namespace)))>=0
            additional_labels:
              kube_cluster_name: ebpf-prom-test
              data_source: prometheus
    app_datasource:
      prometheus:
        url: http://prometheus-server.monitoring
        name: prometheus
      generic_prom_client:
        url: "http://127.0.0.1:9000"
        name: "ebpf"
      grafana_beyla:
        namespace: smart-scaler
        ds-name: smartscaler-beyla
        port: 8999