Frequently Asked Questions
This topic provides help with the most common questions about Smart Scaler.
Installation and Configuration
These questions cover the installation and configuration of Smart Scaler.
How do I check the logs?
Check the logs using the following command:
kubectl logs <pod name> -n <namespace>
where <pod name> is the name of the pod and <namespace> is its namespace. For example, inference-agent-r9wbf is the pod name and customer-1 is the namespace. For an Inference Agent pod, look for the inference-agent prefix.
Example
kubectl logs -f inference-agent-r9wbf -n customer-1
Expected Output
{"level":"info","ts":1688372847.6437705,"msg":"Starting Smart Scaler Inference Agent"}
{"level":"info","ts":1688372847.643794,"msg":"reading configuration"}
{"level":"info","ts":1688372847.6438043,"msg":"Unable to find Scrapper configuration file path in env.SMART_SCALER_CONFIGURATION_PATH. Using /app/configuration.yaml"}
{"level":"info","ts":1688372847.6441996,"msg":"Using startTime 2023-05-08 19:00:00 +0000 UTC and end time 2023-05-09 19:00:00 +0000 UTC to set duration 24.000000"}
{"level":"info","ts":1688372847.6442132,"msg":"bootstrapping saas apis service"}
{"level":"info","ts":1688372847.6442165,"msg":"exchanging credentials for access token"}
How do I validate the communication between the Smart Scaler Agent and the Smart Scaler cloud?
Communication is established when the job starts running. Check the logs using the following command:
kubectl logs <pod name> -n <namespace>
where <pod name> is the name of the pod and <namespace> is its namespace.
In the log output, if the communication is successful, the agent logs the following statement:
{"level":"info","ts":1687944475.416642,"msg":"Successfully connected to the Smart Scaler APIs"}
How do I validate that the Smart Scaler Agent sources data from the correct data source?
The Smart Scaler Agent logs a message for the data source it is using. Check the logs for the message using datasource <datasource-name> with the following command:
kubectl logs <pod name> -n <namespace>
where <pod name> is the name of the pod and <namespace> is its namespace.
Example
kubectl logs inference-agent-tl4d9 | egrep 'data source|datasource'
Expected Output
{"level":"info","ts":1689953376.0342093,"msg":"initializing datasource"}
{"level":"info","ts":1689953376.0342476,"msg":"init prometheus data source with name my-prometheus"}
{"level":"info","ts":1689953376.0342827,"msg":"init datadog data source with name us5-datadog"}
{"level":"info","ts":1689953376.0343823,"msg":"using datasource my-prometheus"}
How should I handle secrets/keys if I am using a vault?
Enter the secrets/keys into the vault/keystore. Save the path to the secret/key and use it in the ss-agent-values.yaml file.
You will need a third-party tool, such as a helm resolver, to take the path, retrieve the secret/key from the vault, and apply it to the cluster.
See the example below.
dataSources:
  datadog:
    existingSecret: ""
    apiKey: path:avesha/data/infrastructure/smart-scaler/#datadog_api_key
    appkey: path:avesha/data/infrastructure/smart-scaler/#datadog_app_key
How do I securely pass the Smart Scaler clientSecret?
To securely pass the Smart Scaler clientSecret, use one of the following methods:
- Configure clientSecret to reference an existing, securely stored secret in the ss-agent-values.yaml file. For more information, see Configure an Existing Secret. A minimal sketch of creating such a secret follows this list.
- Configure clientSecret using the External Secrets Operator in the ss-agent-values.yaml file. For more information, see Configure an External Secrets Operator.
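As a minimal sketch of the first method, you can store the clientSecret in a standard Kubernetes Secret and then reference that secret from the values file. The secret name smart-scaler-client-secret and the key clientSecret below are illustrative assumptions; use the names that your chart and Configure an Existing Secret expect.
kubectl create secret generic smart-scaler-client-secret \
  --from-literal=clientSecret=<your client secret> \
  -n smart-scaler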
Can I change the values under the agentConfiguration section of the downloaded ss-agent-values.yaml file?
It is important that you change the clusterDisplayName field to match the cluster name you defined in Datadog/Prometheus/New Relic or another metrics data source.
Other than that, do not change the values under agentConfiguration, as they are set specifically for your setup. Changing these values could lead to errors in the behavior of the Smart Scaler agent.
How can I change the clusterDisplayName in the ss-agent-values.yaml file?
If you change the clusterDisplayName under the agentConfiguration section of the ss-agent-values.yaml file, then:
- Run the following command:
helm upgrade --install smartscaler smart-scaler/smartscaler-agent -f <valuesfilename>.yaml -n smart-scaler --create-namespace --set configHelper.enabled=true
- Confirm that the cluster display name change is reflected in the management console's Deployed Agents page.
How do I get the values for smartscalerApi and smartscalerMetrics in the ss-agent-values.yaml file?
The values for smartscalerApi and smartscalerMetrics are provided to you through email from Smart Scaler support. Wait until you receive this email before proceeding with the installation. If you haven't received it, contact Avesha support at support@avesha.io.
After installing the Smart Scaler Agent, why is it not showing up in the Smart Scaler management console?
Make sure that the agentConfiguration section in the values file is not changed (apart from clusterDisplayName); changing these values will result in a handshake failure between the Smart Scaler agent and the Avesha SaaS.
Carefully read and follow the instructions in the downloaded ss-agent-values.yaml file.
After the Smart Scaler Agent is configured, why don't I see any data in the management console's Dashboard?
Check that the smartscalerApi and smartscalerMetrics values provided in the email are correctly added to the ss-agent-values.yaml file.
How do I obtain the URL for the Prometheus metrics data source?
For Prometheus, it is the external IP of the prometheus-server service. If it is a ClusterIP service, use the DNS name of the service in this format: http://my-svc.my-namespace.svc.cluster-domain.example:port. For example, http://prometheus.istio-system.svc.cluster.local:9090.
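For example, you can inspect the Prometheus service to find its type, address, and port (the service name and namespace below are placeholders for your installation):
kubectl get svc <prometheus service name> -n <namespace>
If the TYPE column shows LoadBalancer, use the EXTERNAL-IP with the listed port; if it shows ClusterIP, build the DNS name from the service name, namespace, and port as described above.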
What are the recommendations for configuring metrics server replicas on a Kubernetes cluster with a large number of nodes and services?
It is recommended to configure one replica for every 100 deployments in an application. For example, configure four replicas for monitoring an application containing 400 deployments.
For more information, see Kubernetes Metrics Server.
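As a hedged sketch, assuming the metrics server runs as the metrics-server deployment in the kube-system namespace (names vary by installation), you can adjust the replica count with kubectl:
kubectl -n kube-system scale deployment metrics-server --replicas=4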
Does Smart Scaler use customer data to train Avesha models? What specific data (if any) leaves customer site, and what is it used for?
No. Avesha's models are pre-trained. We do not use customer data to train the models. We only use application and Kubernetes metrics data to train the models. For example, we use data such as application RPS, response latency, number of pods, and CPU and memory usage metrics.
We do not need, collect, use, or transmit customer, user, or account details.
The model uses metrics data to understand application behavior in the cluster under user traffic, detect anomalies, correct configurations, and proactively autoscale applications.
Cluster Autoscaler
How does Smart Scaler work with a Cluster Autoscaler?
Smart Scaler operates at the application/microservice layer of Kubernetes, scaling pods as required.
A Cluster Autoscaler manages node scaling based on the pod status. It adds nodes when there are pending pods and removes them when pods scale down. No changes are required for the Cluster Autoscaler.
Beyla Configuration
How can I ensure Beyla DaemonSets run on all nodes?
You can ensure Beyla DaemonSets run on all nodes by extracting the node taints and applying the necessary tolerations in the Helm chart values file. For more information, see Ensure Beyla DaemonSets run on all nodes.
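For example, you can list the taints on each node with the following generic kubectl command (not specific to Smart Scaler) and then add matching tolerations to the Beyla values file as described in that topic:
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'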
Karpenter
How does Karpenter select OCI images?
Karpenter launches self-managed nodes.
OL7 and OL8 images are required for self-managed nodes. Karpenter deploys the images as described in the following steps:
- Karpenter uses the first image found on the existing OKE node pools.
- Override this behavior by setting the IMAGE_OCID environment variable to the required image (a sketch of one way to set it follows this list). The image that you configure must be OL7 or OL8 with Kubernetes installed.
- Override the ImageOCID property in the OciNodeClass definition as shown in the following example:
apiVersion: karpenter.multicloud.sh/v1alpha1
kind: OciNodeClass
metadata:
  name: ocinodeclass1
spec:
  #ImageOCID: <image.ocid>
  #BootVolumeSizeGB: <size in GB>
  #SubnetOCID: <subnet OCID>
  #NetworkSgOCID: <nsgOCid1,nsgOCid2>
  #PodsSubnetOCID: <PODs subnet OCID> #For OCI VCN-Native Pod Networking
  #PodsNetworkSgOCIDs: <PODs nsgOCid1,PODs nsgOCid2> #For OCI VCN-Native Pod Networking
  #SSHKeys: "ssh-rsa ********"
- Updating ImageOCID in the OciNodeClass definition marks all nodes as drifted and replaces them with new nodes.
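A minimal sketch of setting the IMAGE_OCID environment variable, assuming Karpenter runs as a deployment named karpenter in the karpenter namespace (both names are assumptions; your installation may set this through Helm values instead):
kubectl set env deployment/karpenter -n karpenter IMAGE_OCID=<image.ocid>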
How can I disable disruption on a NodePool?
On a NodePool definition, replace the following disruption object parameters:
disruption:
  consolidationPolicy: WhenEmptyOrUnderutilized
With the following disruption object parameters:
disruption:
  consolidationPolicy: WhenEmpty
  consolidateAfter: 1m
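For context, the disruption block sits under spec in the NodePool definition. The following fragment is a sketch using the upstream karpenter.sh/v1 API and a NodePool named default; the apiVersion and name may differ in your OCI setup, and other required NodePool fields are omitted:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  # ... other NodePool fields such as template are omitted
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 1m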
How do I make sure a Pod is not disrupted by Karpenter?
If you are using Karpenter and want to ensure a pod is not disrupted during scale-down or node consolidation, set the karpenter.sh/do-not-disrupt: "true" annotation on the pod. Add it under the metadata.annotations field in your Pod spec.
The following is an example YAML file:
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    metadata:
      annotations:
        karpenter.sh/do-not-disrupt: "true"
This annotation tells Karpenter not to consider the pod for disruption during node termination, consolidation, or deprovisioning operations.
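If a pod is already running, you can also apply the annotation directly with kubectl (pod name and namespace are placeholders); note that an annotation added this way is lost when the pod is recreated, so the Deployment template above remains the durable place to set it:
kubectl annotate pod <pod name> karpenter.sh/do-not-disrupt="true" -n <namespace>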
We receive additional discounts on E5 instance types, which could help with cost savings. Is there a way to prioritize E5 (or any specific instance type) when Karpenter scales up new nodes?
The following example values file represents the prices per OCPU and per GB of memory per hour, taken from the OCI Price List.
oci:
  prices:
    vm:
      - name: "VM.Standard.E3.Flex"
        ocpu: 0.025
        mem: 0.0015
      - name: "VM.Standard.E4.Flex"
        ocpu: 0.025
        mem: 0.0015
      - name: "VM.Standard.E5.Flex"
        ocpu: 0.03
        mem: 0.002
      - name: "VM.GPU.A10"
        gpu: 2
      - name: "VM.GPU2"
        gpu: 1.275
      - name: "VM.GPU3"
        gpu: 2.95
If you assign the same pricing values to E3, E4, and E5 instance types, Karpenter will treat them as equally cost-effective and may provision any of them interchangeably. However, if E5 is configured with a lower price, Karpenter will prioritize it to optimize for cost savings.
Another way to control which VM shapes Karpenter uses is by setting the karpenter.oci.sh/instance-family requirement in the node pool definition:
- key: "karpenter.oci.sh/instance-family"
  operator: In
  values: ["E3","E4"]
This acts as a restrictive filter. For example:
- If you specify ["E3", "E4", "E5"] and the pricing for all is equal, Karpenter will likely favor E3 and E4 due to internal preferences or availability.
- If you set it to only ["E5"], then only E5 instances will be provisioned regardless of price, because you have restricted the selection explicitly.