Best Practices
This topic outlines best practices for achieving secure and efficient production-grade operations.
Graceful Node Consolidation with Smart Karpenter on OCI
To ensure reliable and predictable node consolidation when using Smart Karpenter on Oracle Cloud Infrastructure (OCI), review the following recommended practices:
- Configure an Appropriate consolidateAfter Value
  OCI nodes typically require around 3 minutes to provision. The consolidateAfter buffer period allows new nodes to initialize and applications to stabilize before Smart Karpenter initiates consolidation or voluntary disruptions.
  If this parameter is not configured, Smart Karpenter may consolidate nodes too aggressively. To prevent overly aggressive node consolidation:
  - Confirm whether consolidateAfter is explicitly defined in the NodePool configuration.
  - Set consolidateAfter to approximately 10 minutes.
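
  The consolidateAfter recommendation could look like the following NodePool fragment. This is a minimal sketch, assuming the upstream Karpenter karpenter.sh/v1 NodePool schema applies to Smart Karpenter. The name example-nodepool is hypothetical, the consolidation policy shown is an assumption, and the template section (including the OCI node class reference) is omitted because it depends on your installation.

  ```yaml
  apiVersion: karpenter.sh/v1          # assumed API version; use the one your cluster serves
  kind: NodePool
  metadata:
    name: example-nodepool             # hypothetical name
  spec:
    disruption:
      # Assumed policy; keep whichever policy your NodePool already uses.
      consolidationPolicy: WhenEmptyOrUnderutilized
      # Buffer of about 10 minutes, comfortably longer than the ~3 minute
      # OCI provisioning time mentioned above.
      consolidateAfter: 10m
    # spec.template (requirements, node class reference) intentionally omitted
  ```

  One way to confirm whether the value is already set is to run kubectl get nodepool example-nodepool -o yaml and look under spec.disruption.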
- Use Pod Disruption Budgets (PDBs) and Anti-Affinity Rules
  Implement PDBs and anti-affinity rules for workloads, especially mission-critical or stateful applications. These safeguards help maintain availability during rescheduling events initiated by Smart Karpenter or Kubernetes, and they are considered standard best practices even outside of Smart Karpenter environments.
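
  As an illustration of these safeguards, the sketch below pairs a PodDisruptionBudget with a Deployment that uses pod anti-affinity. It uses the standard Kubernetes policy/v1 and apps/v1 APIs. The app label example-app, the replica and minAvailable counts, and the image reference are hypothetical placeholders, not values from this guide.

  ```yaml
  # Keep at least 2 replicas running during voluntary evictions.
  apiVersion: policy/v1
  kind: PodDisruptionBudget
  metadata:
    name: example-app-pdb              # hypothetical name
  spec:
    minAvailable: 2
    selector:
      matchLabels:
        app: example-app               # hypothetical label
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-app
  spec:
    replicas: 3
    selector:
      matchLabels:
        app: example-app
    template:
      metadata:
        labels:
          app: example-app
      spec:
        affinity:
          podAntiAffinity:
            # Prefer spreading replicas across nodes so that removing
            # one node does not take down several replicas at once.
            preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      app: example-app
                  topologyKey: kubernetes.io/hostname
        containers:
          - name: app
            image: registry.example.com/example-app:1.0   # placeholder image
  ```

  Setting minAvailable equal to the replica count would block every voluntary eviction, so leaving at least one replica of headroom lets consolidation proceed while still protecting availability. The preferred (soft) anti-affinity form is used here so pods can still schedule when few nodes are available; a required rule is stricter.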
- Protect Highly Sensitive Workloads from Voluntary Disruption
  For workloads that must never be disrupted during consolidation, add the following annotation to the pod specification:
  karpenter.sh/do-not-disrupt: "true"
  This prevents Smart Karpenter from evicting the pod during voluntary disruption workflows, including consolidation operations.
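
  For a workload managed by a Deployment, the annotation would typically go in the pod template's metadata rather than on the Deployment object itself, so that every pod it creates carries it. The following is a minimal sketch; the name example-critical-app and the image reference are hypothetical placeholders.

  ```yaml
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-critical-app         # hypothetical name
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: example-critical-app
    template:
      metadata:
        labels:
          app: example-critical-app
        annotations:
          # Applied to each pod created from this template.
          karpenter.sh/do-not-disrupt: "true"
      spec:
        containers:
          - name: app
            image: registry.example.com/example-critical-app:1.0   # placeholder image
  ```

  Because a node hosting such a pod cannot be voluntarily drained while the pod is running, it is worth limiting the annotation to workloads that genuinely need it.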
If node consolidation issues continue, share the Smart Karpenter logs, NodePool definition, and relevant application manifests with Avesha Support. This information helps the support team diagnose the issue accurately and offer more targeted guidance.