Troubleshooting Guide
This topic describes the troubleshooting steps for known issues.
Installation Issues
If the known issues described here don't resolve your problem, we encourage you to submit a support ticket.
The Inference Agent will go into a crash loop after installing the Smart Scaler Agent
Introduction
If Smart Scaler configuration is delayed by more than 5 minutes after installation, the Inference Agent goes into a crash loop. For example, you see the following message:
Error: INSTALLATION FAILED: 1 error occurred: * admission webhook "vapplicationconfig.kb.io" denied the request: ApplicationConfig.agent.smart-scaler.io "llm-inference" is forbidden: Invalid License. Please contact Smart Scaler Support at support@avesha.io
Solution
To troubleshoot and identify the root cause, follow these steps:
Log into the Smart Scaler UI to verify that your application and services appear on the dashboard.
Check whether the pod count, CPU usage, and requests appear correctly. If they do not, or if they do but your application still does not appear in the SaaS UI, check the agent logs:
kubectl logs -n smart-scaler inference-agent-5b8f87c49f-hc77v
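Once you have the log output, a quick way to confirm the agent is reaching the SaaS is to search for the success markers described below. The pipeline is a minimal sketch run against sample lines copied from this guide; in a live cluster, run it against the `kubectl logs` output instead (the pod name suffix changes on every restart, so copy the current name from `kubectl get pods -n smart-scaler`):

```shell
# In a live cluster you would capture the logs with something like:
#   kubectl logs -n smart-scaler inference-agent-5b8f87c49f-hc77v > agent.log
# Sample lines are used here so the pipeline can be shown end to end.
cat > agent.log <<'EOF'
{"level":"info","ts":1722545554.283851,"msg":"Metrics sent successfully for app &{App:boutique AppVersion:1.0 Namespace: ClusterName:}"}
{"level":"debug","ts":1722545563.1009502,"msg":"errorCount value = 0"}
EOF

# A non-zero count means metrics are reaching the SaaS.
grep -c 'Metrics sent successfully' agent.log

# The most recent errorCount should be 0.
grep 'errorCount' agent.log | tail -n 1
```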
If the log output ends with the lines below, the Inference Agent is communicating with the SaaS.
{"level":"info","ts":1722545554.283851,"msg":"Metrics sent successfully for app &{App:boutique AppVersion:1.0 Namespace: ClusterName:}"} {"level":"debug","ts":1722545563.1009502,"msg":"errorCount value = 0"}
If the agent is communicating with the SaaS but the SaaS does not have the right information, check the application configuration. Examine the logs in more detail; you should see periodic blocks such as the following:
{"level":"info","ts":1722545322.270564,"msg":"Namespace: demo"} {"level":"debug","ts":1722545322.2759717,"msg":"filtering deployments for namespace demo"} {"level":"debug","ts":1722545322.275996,"msg":"Filtered deployment list: [adservice cartservice checkoutservice emailservice frontend paymentservice productcatalogservice recommendationservice shippingservice]"} {"level":"info","ts":1722545322.6678247,"msg":"Metrics sent successfully for app &{App:boutique AppVersion:1.0 Namespace: ClusterName:}"} {"level":"debug","ts":1722545323.1008768,"msg":"errorCount value = 0"}
The output above shows that data is collected for the services specified in the configuration file. If any expected service is missing, look for errors collecting it, then check the configuration file and verify that the namedDataSources are correctly configured for the data source and that the application namespaces and deployments are correctly identified.
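A common cause of a missing service is a mismatch between the names in the configuration file and the deployments that actually exist in the namespace. The sketch below compares the two lists; the service names and the sample deployment list are illustrative, and in a live cluster you would build the actual list from `kubectl get deploy`:

```shell
# Services your configuration file expects (illustrative names):
expected="adservice cartservice frontend"

# Deployments present in the application namespace. In a live cluster:
#   actual=$(kubectl get deploy -n demo -o jsonpath='{.items[*].metadata.name}')
actual="adservice frontend"

# Report any expected service with no matching deployment.
for svc in $expected; do
  echo "$actual" | tr ' ' '\n' | grep -qx "$svc" \
    || echo "missing deployment: $svc"
done
```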
Additional Considerations
Configuration Timing: To avoid delays, complete the Smart Scaler configuration immediately after installation. You can plan your installation and configuration process to reduce downtime and delays.
Error Logging and Diagnostics: You can check the logs for the Inference Agent to gather more detailed information about the crash loop. This can provide insights into any additional issues or misconfigurations that may be causing the problem.
Smart Scaler Documentation: You can refer to the Smart Scaler documentation for specific instructions or prerequisites related to installation and configuration. There might be steps or settings required to prevent issues.
Support Contact: If you continue to face issues, gather all relevant details (error messages, logs, configuration settings) and contact Smart Scaler Support at support@avesha.io for further assistance.
The pods and CPU/Memory usage are visible on the dashboard with no other details
Introduction
After the Inference Agent is configured, only the pods and the CPU/Memory usage are visible on the dashboard; no other details appear.
Solution
After installing the agent, complete the agent configuration by performing the following actions:
Complete the Configuration:
Action: Proceed with the configuration process to ensure that all required settings and parameters are applied. This step is crucial for the proper functioning of the Inference Agent.
Validate Metrics:
Action: Confirm that CPU/Memory usage and the number of pods are visible in the monitoring dashboard. This indicates that the Inference Agent is properly collecting and pushing metric data to the SaaS platform.
Initial Wait Time:
Action: Wait for at least 30 minutes after completing the configuration for metric data to appear. Some systems may take time to start reporting data.
Troubleshooting Steps if Data Is Not Visible:
Check Error Events:
Action: Examine the error events in the Agent cluster to identify any issues or warnings that could explain why metric data is not being collected or sent.
Contact Support:
Action: If you do not see any metric data after the 30-minute wait period, contact Avesha support. Provide the details required to troubleshoot the issue, including any error messages and the steps you have already taken.
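The error-event check in the steps above can be done with `kubectl get events`. The filter below is a minimal sketch run against sample output (the event lines are illustrative, not actual Smart Scaler output); in a live cluster, pipe the real events through the same `awk` filter:

```shell
# Warning events in the agent namespace often explain missing metrics.
# In a live cluster:
#   kubectl get events -n smart-scaler --sort-by=.lastTimestamp
# Illustrative sample output is used here to demonstrate the filter:
cat > events.txt <<'EOF'
LAST SEEN   TYPE      REASON    OBJECT                               MESSAGE
5m          Normal    Pulled    pod/inference-agent-5b8f87c49f-abc   Container image pulled
2m          Warning   BackOff   pod/inference-agent-5b8f87c49f-abc   Back-off restarting failed container
EOF

# Keep only Warning events (the TYPE column):
awk '$2 == "Warning"' events.txt
```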
By following the steps above, you can ensure that the Inference Agent is properly configured and that any issues with metric data reporting are addressed.
Not able to see the application details on the SaaS management console
Introduction
The application details are not visible on the SaaS management console. No data is available on the management console for Number of Pods, CPU Usage, or Requests Served (SLO).
Solution
First, make sure the Inference Agent is running successfully using the following command:
kubectl get pods -n smart-scaler
Example Output
NAME READY STATUS RESTARTS AGE
agent-controller-manager-56476b5676-jfhst 2/2 Running 0 86m
git-operations-5476c685c6-nc79m 1/1 Running 0 86m
inference-agent-5b8f87c49f-hc77v 1/1 Running 0 4m54s
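Pods that are not fully ready or that keep restarting deserve a closer look. The check below is a minimal sketch run against sample `kubectl get pods` output (the CrashLoopBackOff line is illustrative); in a live cluster, pipe the real output through the same filter:

```shell
# In a live cluster:
#   kubectl get pods -n smart-scaler
# Illustrative sample output is used here to demonstrate the check:
cat > pods.txt <<'EOF'
NAME                                        READY   STATUS             RESTARTS   AGE
agent-controller-manager-56476b5676-jfhst   2/2     Running            0          86m
git-operations-5476c685c6-nc79m             1/1     Running            0          86m
inference-agent-5b8f87c49f-hc77v            0/1     CrashLoopBackOff   7          4m54s
EOF

# Print pods whose STATUS is not Running or that have restarted:
awk 'NR > 1 && ($3 != "Running" || $4 != "0")' pods.txt
```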
If the agent is running and you do not see the application details on the management console, check the logs for the agent using the following command:
kubectl logs -n smart-scaler inference-agent-5b8f87c49f-hc77v
In the output, check if the log ends with the following log levels:
{"level":"info","ts":1722545554.283851,"msg":"Metrics sent successfully for app &{App:boutique AppVersion:1.0 Namespace: ClusterName:}"}
{"level":"debug","ts":1722545563.1009502,"msg":"errorCount value = 0"}
If it ends with the lines above, your agent is communicating successfully with the Smart Scaler cloud. If the agent is communicating but the Smart Scaler cloud does not have the right information, there is probably a problem with your application configuration. Look at the log in more detail; you should see the following periodic blocks.
{"level":"info","ts":1722545322.270564,"msg":"Namespace: demo"}
{"level":"debug","ts":1722545322.2759717,"msg":"filtering deployments for namespace demo"}
{"level":"debug","ts":1722545322.275996,"msg":"Filtered deployment list: [adservice cartservice checkoutservice emailservice frontend paymentservice productcatalogservice recommendationservice shippingservice]"}
{"level":"info","ts":1722545322.6678247,"msg":"Metrics sent successfully for app &{App:boutique AppVersion:1.0 Namespace: ClusterName:}"}
{"level":"debug","ts":1722545323.1008768,"msg":"errorCount value = 0"}
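If the periodic blocks are absent, filtering the structured logs by level can surface the failure quickly. The pipeline below is a minimal sketch run against illustrative sample lines (the error message is not actual Smart Scaler output); in a live cluster, pipe the `kubectl logs` output through the same `grep`:

```shell
# In a live cluster:
#   kubectl logs -n smart-scaler inference-agent-5b8f87c49f-hc77v \
#     | grep -E '"level":"(error|warn)"'
# Illustrative sample lines are used here to demonstrate the filter:
cat > agent.log <<'EOF'
{"level":"info","ts":1722545322.270564,"msg":"Namespace: demo"}
{"level":"error","ts":1722545330.12345,"msg":"failed to list deployments in namespace demo"}
{"level":"debug","ts":1722545323.1008768,"msg":"errorCount value = 0"}
EOF

# Keep only error- and warn-level entries:
grep -E '"level":"(error|warn)"' agent.log
```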
This shows successful data collection for the services specified in your configuration file. If any services you expect are missing, you should see errors trying to collect them. Look at your configuration file and verify that the namedDataSources are correctly configured for where to collect the data and that the application namespaces and deployments are correctly identified.
Additional Considerations
Smart Scaler Documentation: You can refer to the Smart Scaler documentation for specific instructions or prerequisites related to installation and configuration. There might be steps or settings required to prevent issues.
Support Contact: If you continue to face issues, gather all relevant details (error messages, logs, and configuration settings) and contact Smart Scaler Support at support@avesha.io for further assistance.