Configure High Availability for the Inference Agent
Smart Scaler Agent versions 2.9.40 and later provide High Availability (HA) with priority-based coordination of custom metrics. This coordination ensures that the Horizontal Pod Autoscaler (HPA) always queries the pod delivering the highest-quality data.
Key Benefits
- Removes any single point of failure in the system
- Ensures the HPA consistently receives the highest-quality available metrics
- Maintains service continuity with graceful degradation during partial outages
- Automatically fails over to the healthiest data source based on data quality
Data Priority in Custom Metrics Coordination
When coordinating custom metrics, Smart Scaler Agent versions 2.9.40 and later rank data sources in the following priority order, from highest to lowest.
| Priority Level | Description |
|---|---|
| SaaS-connected (highest priority) | Real-time recommendations from SaaS |
| Dynamic-fallback | Calculated by the HPA algorithm using real-time metrics |
| Static-fallback (lowest priority) | Uses fixed fallback values |
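The priority order in the table can be illustrated with a minimal Python sketch. It is only a model of the selection rule, not the agent's actual implementation: the names `DataPriority`, `ReplicaStatus`, `select_metrics_source`, the `healthy` flag, and the pod names are all hypothetical, and the way the real agent tracks replica health is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Sequence


class DataPriority(IntEnum):
    """Priority levels from the table above; a higher value wins."""
    STATIC_FALLBACK = 1    # fixed fallback values (lowest priority)
    DYNAMIC_FALLBACK = 2   # calculated from real-time metrics
    SAAS_CONNECTED = 3     # real-time recommendations from SaaS (highest)


@dataclass(frozen=True)
class ReplicaStatus:
    """Hypothetical per-replica state; not part of the agent's API."""
    pod_name: str
    priority: DataPriority
    healthy: bool = True


def select_metrics_source(replicas: Sequence[ReplicaStatus]) -> Optional[ReplicaStatus]:
    """Return the healthy replica with the highest data priority, or None."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        return None
    # max() keeps the first replica seen when priorities tie.
    return max(healthy, key=lambda r: r.priority)


if __name__ == "__main__":
    replicas = [
        ReplicaStatus("agent-0", DataPriority.DYNAMIC_FALLBACK),
        ReplicaStatus("agent-1", DataPriority.SAAS_CONNECTED),
    ]
    print(select_metrics_source(replicas).pod_name)  # agent-1
```

In this model, if the SaaS-connected replica becomes unhealthy, the selection falls back to the dynamic-fallback replica and then to the static-fallback replica, which mirrors the graceful degradation described earlier.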
Configure HA
To use HA, configure multiple replicas in your ss-agent-values.yaml file.
For production environments, we recommend configuring at least two replicas.
By default, the deployed agent runs a single replica.
Enable HA for the Smart Scaler Agent by adding the following configuration to the ss-agent-values.yaml file:
inferenceAgent:
  replicas: 2 # Enable HA with 2 replicas