Version: 2.17.0

Configure High Availability for the Inference Agent

The Smart Scaler Agent version 2.9.40 and later provides High Availability (HA) with priority-based custom metrics coordination. This ensures that the HPA always queries the pod delivering the highest-quality data.

Key Benefits

  • Removes any single point of failure in the system
  • Ensures the HPA consistently receives the highest-quality available metrics
  • Maintains service continuity with graceful degradation during partial outages
  • Automatically fails over to the healthiest data source based on data quality

Data Priority in Custom Metrics Coordination

The Smart Scaler Agent version 2.9.40 and later supports the following data priority levels in custom metrics coordination, listed from highest to lowest priority.

Priority Level                       Description
SaaS-connected (highest priority)    Real-time recommendations from SaaS
Dynamic-fallback                     Calculated by the HPA algorithm using real-time metrics
Static-fallback (lowest priority)    Uses fixed fallback values
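
The priority ordering can be illustrated with a short sketch. This is not the agent's actual implementation: the names DataPriority and pick_best_replica, the replica names, and the use of None to mark an unavailable replica are all hypothetical and chosen only for illustration. The sketch shows how a coordinator could pick the replica whose data has the highest priority among those currently available.

```python
from enum import IntEnum
from typing import Dict, Optional


class DataPriority(IntEnum):
    """Hypothetical encoding of the three priority levels (higher is better)."""
    STATIC_FALLBACK = 1   # fixed fallback values
    DYNAMIC_FALLBACK = 2  # calculated by the HPA algorithm using real-time metrics
    SAAS_CONNECTED = 3    # real-time recommendations from SaaS


def pick_best_replica(replicas: Dict[str, Optional[DataPriority]]) -> Optional[str]:
    """Return the name of the available replica with the highest data priority.

    A value of None marks a replica that is unavailable (an assumption of
    this sketch). Returns None when no replica is available.
    """
    available = {name: prio for name, prio in replicas.items() if prio is not None}
    if not available:
        return None
    # On a tie, max() keeps the first replica encountered.
    return max(available, key=lambda name: available[name])


# Example: one replica is SaaS-connected, one has degraded, one is down.
replicas = {
    "agent-0": DataPriority.DYNAMIC_FALLBACK,
    "agent-1": DataPriority.SAAS_CONNECTED,
    "agent-2": None,
}
best = pick_best_replica(replicas)
```

In this example, agent-1 is selected because it is SaaS-connected. If agent-1 became unavailable, the same function would select agent-0, which matches the graceful degradation described under Key Benefits.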

Configure HA

To use HA, configure multiple replicas in your ss-agent-values.yaml file. For production environments, we recommend configuring at least two replicas.

By default, the agent is deployed with a single replica.

Enable HA for the Smart Scaler Agent by adding the following configuration to the ss-agent-values.yaml file:

inferenceAgent:
  replicas: 2  # Enable HA with 2 replicas