Version: 2.16.0

Advanced Configuration for the Smart Scaler Agent

This topic describes the advanced configuration for the Smart Scaler agent.

Configure an Existing Secret

To avoid storing the clientSecret in plain text in the values file, store it in a Kubernetes Secret and reference that Secret as existingSecret in the ss-agents-values.yaml file, as shown below:

caution

To configure existingSecret, you must remove the clientSecret parameter from the ss-agents-values.yaml file.

existingSecret: "<secret name>" # If provided, the agent uses this existing secret instead of creating a new one

Supported Format for an Existing Secret

Ensure that the existingSecret is stored in the following format:

apiVersion: v1
kind: Secret
metadata:
  name: secret-name
  namespace: <namespace>
type: Opaque
data:
  # Client ID is plain text, therefore it must be base64 encoded
  clientId: ""
  # Client Secret is already base64 encoded
  clientSecret: ""
  # Cluster Display Name is plain text, therefore it must be base64 encoded
  clusterDisplayName: ""
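
For illustration only, the following is a populated example of such a Secret. The name, namespace, and encoded values are placeholders, not real credentials (dGVuYW50LWFwb2xsbw== is the base64 encoding of tenant-apollo, and bXktY2x1c3Rlcg== of my-cluster):

apiVersion: v1
kind: Secret
metadata:
  name: ss-agent-credentials # placeholder; use this name as the existingSecret value
  namespace: smart-scaler # placeholder namespace
type: Opaque
data:
  clientId: dGVuYW50LWFwb2xsbw== # base64 of the plain-text client ID "tenant-apollo"
  clientSecret: <already-base64-encoded client secret> # used as-is, no additional encoding
  clusterDisplayName: bXktY2x1c3Rlcg== # base64 of the plain-text display name "my-cluster"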

Configure an External Secrets Operator

To synchronize the clientSecret from an external secret management system, such as HashiCorp Vault, into Kubernetes, configure the External Secrets Operator in the ss-agents-values.yaml file. The following code snippet shows the externalSecret configuration that you must add to the ss-agents-values.yaml file.

caution

To configure the External Secrets Operator, you must remove the clientSecret parameter from the ss-agents-values.yaml file.

externalSecret: # Configuration for external secrets
  enabled: false # Set to true to use external secrets
  refreshInterval: "1h" # How often to refresh the secret
  secretStoreRef:
    name: "" # Name of the SecretStore/ClusterSecretStore
    kind: "SecretStore" # Kind of the secret store (SecretStore or ClusterSecretStore)
  # Client ID configuration
  clientIdRemoteRefKey: "kv/clientid" # The name or identifier of the client ID in the external secret store
  clientIdRemoteRefProperty: "clientId" # The property/key within the secret to fetch for client ID
  # Client Secret configuration
  clientSecretRemoteRefKey: "kv/clientsecret" # The name or identifier of the client secret in the external secret store
  clientSecretRemoteRefProperty: "clientSecret" # The property/key within the secret to fetch for client secret
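
The secretStoreRef above points to a SecretStore or ClusterSecretStore object that you create separately with the External Secrets Operator. The following is a minimal sketch, assuming a HashiCorp Vault KV v2 backend with token authentication; the store name, Vault address, and token secret are hypothetical:

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend # reference this name in secretStoreRef.name
  namespace: smart-scaler
spec:
  provider:
    vault:
      server: "https://vault.example.com" # hypothetical Vault address
      path: "kv" # KV mount expected to hold kv/clientid and kv/clientsecret
      version: "v2"
      auth:
        tokenSecretRef:
          name: vault-token # hypothetical Kubernetes Secret containing a Vault token
          key: token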

Manage Smart Scaler Agent Deployments through Terraform

You can manage Smart Scaler deployments through Terraform, as shown in the following example.

provider "helm" {
kubernetes {
config_path = "~/.kube/config"
}
}

resource "helm_release" "smart_scaler_agent" {
name = "smart-scaler-agent"
namespace = "smart-scaler" # Change if needed
repository = "https://smartscaler.nexus.aveshalabs.io/repository/smartscaler-helm-ent-prod/" # The actual Helm repo URL; replace it if required
chart = "smart-scaler-agent" # Replace with the actual chart name
version = "1.2.3" # Replace with the actual version if needed

values = [
yamlencode({
agentConfiguration = {
host = "https://gateway.saas1.smart-scaler.io"
clusterDisplayName = "ss-agent-values"
clientID = "tenant-apollo"
clientSecret = "<client secret>" # Replace or use var.client_secret
smartScalerAgentMonitorNs = ".*"
smartScalerAgentExcludeNs = "istio-system,kube-node-lease,kube-public,kube-system"
deploymentChunk = "300"
namespaceAnnotationKey = ""
}
})
]
}

Apply Smart Scaling to Specific Services in Target Namespaces

Apply Smart Scaling to specific services at the namespace level in the ss-agents-values.yaml file. To exclude a namespace, add it to the hpaAutoApplyExcludeNamespaces list; Smart Scaler then skips that namespace. The following is an example configuration.

warning

For event scaling with Smart Scaler agent versions 2.9.28 or earlier, each application deployment in a namespace must have its own configured HPA. Without an individual HPA, event scaling fails (a minimal example HPA is shown after the configuration below).

However, starting with Smart Scaler agent version 2.9.29, event scaling no longer requires an individual HPA for each application deployment.

eventAutoscaler:
  autoscalerProperties:
    hpaAutoApply:
      enabled: true
      syncInterval: 3m
      hpaStabilizationDownWindowSeconds: 30
      recommendationTriggerType:
        cpu: true
        rl: true
      hpaAutoApplyExcludeNamespaces:
        - kube-system
        - default
        - istio-system
        - bookinfo2
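
For agent versions 2.9.28 or earlier, each application deployment needs its own HPA, as noted in the warning above. The following is a minimal sketch of such an HPA; the deployment name, namespace, and CPU target are hypothetical and only illustrate the shape of the object:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend # hypothetical deployment name
  namespace: boutique # hypothetical application namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # hypothetical CPU utilization target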

Ensure Beyla DaemonSets Run on All Nodes

To allow the Beyla DaemonSet to run on all nodes, including tainted ones, add tolerations to the ss-agent-values.yaml file as shown below:

beyla:
  tolerations:
    - effect: NoSchedule
      key: [KEY_NAME]
      operator: Exists

To ensure that the eBPF Beyla DaemonSet has sufficient resources for production traffic, you can configure CPU and memory requests and limits.

The following is an example YAML:

beyla:
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 200m
      memory: 350Mi

The following steps ensure that Beyla DaemonSets are scheduled on all nodes across all node pools in the Kubernetes cluster, including nodes with taints. This is done by extracting the node taints and applying the corresponding tolerations in the Helm chart values file.

Step 1: Identify the Node Taints

To determine which taints exist on the nodes, run the following command:

kubectl get nodes -o json | jq -r '
.items[] |
select(.spec.taints) |
.metadata.name as $node |
.spec.taints[] |
"Node: \($node)\nTaint: key=\(.key), value=\(.value // \"None\"), effect=\(.effect)\nToleration:\n- key: \(.key)\n operator: \"Equal\"\n value: \"\(.value // \"None\")\"\n effect: \"\(.effect)\"\n"'

Example Output:

Node: 10.0.69.54
Taint: key=custom.taint.key1, value=None, effect=NoSchedule
Toleration:
- key: custom.taint.key1
  operator: "Equal"
  value: "None"
  effect: "NoSchedule"

Node: 10.0.69.54
Taint: key=custom.taint.key2, value=custom-value, effect=NoSchedule
Toleration:
- key: custom.taint.key2
  operator: "Equal"
  value: "custom-value"
  effect: "NoSchedule"

Node: 10.0.69.55
Taint: key=custom.taint.key3, effect=NoExecute
Toleration:
- key: custom.taint.key3
  operator: "Exists"
  effect: "NoExecute"

Step 2: Apply the Tolerations in the Helm Chart

After you identify the required tolerations, add them to the Helm chart values file for the Beyla agent. Modify the values file (ss-agent-values.yaml) as follows:

tolerations:
  - key: custom.taint.key1
    operator: "Equal"
    value: "None"
    effect: "NoSchedule"
  - key: custom.taint.key2
    operator: "Equal"
    value: "custom-value"
    effect: "NoSchedule"
  - key: custom.taint.key3
    operator: "Exists"
    effect: "NoExecute"

beyla:
  tolerations:
    - key: custom.taint.key1
      operator: "Equal"
      value: "None"
      effect: "NoSchedule"
    - key: custom.taint.key2
      operator: "Equal"
      value: "custom-value"
      effect: "NoSchedule"
    - key: custom.taint.key3
      operator: "Exists"
      effect: "NoExecute"

Step 3: Apply the Values File

Apply the changes by upgrading the Helm release. Use the following command to apply the values file:

helm upgrade --install smartscaler smart-scaler/smartscaler-agent -f ss-agent-values.yaml -n smart-scaler

This ensures that the Beyla DaemonSet tolerates the identified taints and is scheduled across all node pools in the cluster.

Auto Smart Sizing of Pods in a Kubernetes Cluster

Starting with the Smart Scaler agent version 2.9.13, you can configure Smart Scaler to automatically apply in-cluster Smart Sizing recommendations. This feature is useful for automatically resizing the deployments in lower environments.

info

You can apply Smart Sizing yourself by reviewing the recommendations on the Smart Scaler Console's Smart Sizing page. If you want Smart Scaler to apply the recommendations automatically, enable Auto Smart Sizing.

caution

Applying Auto Smart Sizing to production needs to be carefully planned, including integrating it into the production CI/CD process with manual approval.

The Smart Scaler Agent running in an application cluster requests Smart Sizing recommendations from the Smart Scaler SaaS cloud and applies them to the cluster, one deployment at a time. When a deployment is smart sized, the Kubernetes default behavior is to ensure that at least 75% of the required pods are running (25% maximum unavailable). This allows for a rolling update that ensures service availability without adding stress on the node pool of the cluster.
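
For reference, the rolling-update behavior described above corresponds to the default Deployment update strategy shown below. You do not need to set it explicitly; it is the Kubernetes default:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25% # at most 25% of the desired pods are unavailable during a resize
      maxSurge: 25% # up to 25% extra pods may be created during the rollout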

Prerequisites

  • The Smart Scaler Agent requires privileges to patch deployments across all application namespaces within the cluster, as sketched in the example after this list.
  • Smart Sizing recommendations must be available for Auto Smart Sizing to be applied to application namespaces.
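
Depending on how the agent is deployed, its Helm chart may already grant these privileges. The following is only an illustrative sketch, with a hypothetical name, of the kind of RBAC rule that the first prerequisite implies:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: smart-scaler-agent-smart-sizing # hypothetical name
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "patch"] # patch is required to apply Smart Sizing recommendations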

Enable Auto Smart Sizing

Smart Scaler is deployed into a Kubernetes cluster with a Helm values YAML file. To enable Auto Smart Sizing in a cluster, add the following properties to the ss-agent-values.yaml file:

eventAutoscaler:
  autoscalerProperties:
    rightSizing:
      enabled: true
      # syncInterval: 24h
      # deploymentReadyTimeout: 10m

To enable Smart Sizing, set enabled: true. The commented items in the above YAML snippet are default values that can be adjusted based on your environment. The following two properties must be used with caution:

  • syncInterval: The time interval between consecutive Smart Sizing configuration applications.

    warning

    Do not apply the configuration more frequently than once every two hours; that is, allow a gap of at least two hours between consecutive Auto Smart Sizing applications.

  • deploymentReadyTimeout: The maximum time to wait for the entire deployment to be Smart Sized. If this timeout is exceeded, some pods in the deployment may not be resized; however, the service continues to function as before. See the example after this list.
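
For example, to set both properties explicitly with their default values (illustrative only; keep syncInterval at two hours or more, as cautioned above):

eventAutoscaler:
  autoscalerProperties:
    rightSizing:
      enabled: true
      syncInterval: 24h
      deploymentReadyTimeout: 10m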

note

If you need to Smart Size immediately, restart the agent-controller pod.

Configure APM Tools for Metrics Collection

Smart Scaler supports any application performance management (APM) tool from which its agents can retrieve metrics data. The following configuration shows Prometheus and eBPF data sources configured in the ss-agent-values.yaml file.

namedDataSources:
  - name: "prometheus" # This is the same name that you used in the agent configuration
    datasourceType: "prometheus" # This is the type of the datasource
    url: "http://prometheus-server.monitoring" # URL of the datasource
    credentials:
      username: "" # For prometheus
      password: "" # For prometheus
  - name: "ebpf" # This is the same name that you used in the agent configuration
    datasourceType: "ebpf" # This is the type of the datasource

inferenceAgent:
  inferenceAgentConfig:
    metric_interval: 60
    push_interval: 40
    app:
      - metric_labels:
          app: "boutique-ebpf"
          app_version: "1.0"
          customer: tenant-apollo
          ss_agent_name: ebpf-prom-test
        default_fallback: 3
        use_collector_queries: false
        use_jobs: false
        clusters:
          - name: "ebpf-prom-test"
            namespaces:
              - name: "boutique"
                deployments:
                  - name: adservice
                    fallback: 4
                    data_source:
                  - name: cartservice
                    fallback: 4
                    data_source:
                  - name: checkoutservice
                    fallback: 4
                    data_source:
                  - name: currencyservice
                    fallback: 4
                    data_source:
                  - name: emailservice
                    fallback: 4
                    data_source:
                  - name: frontend
                    fallback: 4
                    data_source:
                  - name: loadgenerator
                    fallback: 4
                    data_source:
                  - name: paymentservice
                    fallback: 4
                    data_source:
                  - name: productcatalogservice
                    fallback: 4
                    data_source:
                  - name: recommendationservice
                    fallback: 4
                    data_source:
                  - name: redis-cart
                    fallback: 4
                    data_source:
                  - name: shippingservice
                    fallback: 4
                    data_source:
        metrics:
          - name: istio_requests_total_rate
            description: forwarded rps data from ebpf requests
            namespace: boutique
            deployment: (adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice)
            additional_labels:
              kube_cluster_name: ebpf-prom-test
              data_source: ebpf
          - name: istio_request_duration_milliseconds_bucket_rate
            description: forwarded latency data from ebpf requests
            namespace: boutique
            deployment: (adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice)
            additional_labels:
              kube_cluster_name: ebpf-prom-test
              data_source: ebpf
      - metric_labels:
          app: "boutique-prometheus"
          app_version: "1.0"
          customer: tenant-apollo
          ss_agent_name: ebpf-prom-test
        default_fallback: 3
        use_collector_queries: false
        use_jobs: false
        clusters:
          - name: "ebpf-prom-test"
            namespaces:
              - name: "boutique"
                deployments:
                  - name: adservice
                    fallback: 4
                    data_source:
                  - name: cartservice
                    fallback: 4
                    data_source:
                  - name: checkoutservice
                    fallback: 4
                    data_source:
                  - name: currencyservice
                    fallback: 4
                    data_source:
                  - name: emailservice
                    fallback: 4
                    data_source:
                  - name: frontend
                    fallback: 4
                    data_source:
                  - name: loadgenerator
                    fallback: 4
                    data_source:
                  - name: paymentservice
                    fallback: 4
                    data_source:
                  - name: productcatalogservice
                    fallback: 4
                    data_source:
                  - name: recommendationservice
                    fallback: 4
                    data_source:
                  - name: redis-cart
                    fallback: 4
                    data_source:
                  - name: shippingservice
                    fallback: 4
                    data_source:
        metrics:
          - name: istio_requests_total_rate
            description: forwarded rps data from istio requests
            query: sum(rate(label_replace(istio_requests_total{namespace=~'boutique', destination_workload=~'.*(adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice).*'},'kube_namespace', '$1', 'namespace', '(.*)')[2m:])) by (destination_service_name,response_code,destination_workload,source_workload,reporter,kube_namespace)
            additional_labels:
              kube_cluster_name: ebpf-prom-test
              data_source: prometheus
          - name: istio_request_duration_milliseconds_bucket_rate
            description: forwarded latency data from istio requests
            query: sum(irate(label_replace(istio_request_duration_milliseconds_bucket{namespace=~'boutique',reporter=~'destination', destination_workload=~'.*(adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice).*'},'kube_namespace', '$1', 'namespace', '(.*)')[2m:])) by (le, response_code, destination_service_name, destination_workload, source_workload, reporter, kube_namespace)
            additional_labels:
              kube_cluster_name: ebpf-prom-test
              data_source: prometheus
          - name: istio_average_latency
            description: forwarded average latency data from istio requests
            query: ((sum(irate(label_replace(istio_request_duration_milliseconds_sum{namespace=~'boutique',reporter=~'destination', destination_workload=~'.*(adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice).*'},'kube_namespace', '$1', 'namespace', '(.*)')[2m:])) by ( destination_service_name, destination_workload, source_workload, kube_namespace))/(sum(irate(label_replace(istio_request_duration_milliseconds_count{namespace=~'boutique',reporter=~'destination', destination_workload=~'.*(adservice|cartservice|checkoutservice|currencyservice|emailservice|frontend|loadgenerator|paymentservice|productcatalogservice|recommendationservice|redis-cart|shippingservice).*'},'kube_namespace', '$1', 'namespace', '(.*)')[2m:])) by ( destination_service_name, destination_workload, source_workload, kube_namespace)))>=0
            additional_labels:
              kube_cluster_name: ebpf-prom-test
              data_source: prometheus
    app_datasource:
      prometheus:
        url: http://prometheus-server.monitoring
        name: prometheus
      generic_prom_client:
        url: "http://127.0.0.1:9000"
        name: "ebpf"
      grafana_beyla:
        namespace: smart-scaler
        ds-name: smartscaler-beyla
        port: 8999