Version: 1.2.0

Best Practices

This topic outlines recommended best practices for achieving secure and efficient production-grade operations.

Graceful Node Consolidation with Smart Karpenter on OCI

To ensure reliable and predictable node consolidations when using Smart Karpenter on Oracle Cloud Infrastructure (OCI), review the following recommended practices:

Configure an Appropriate consolidateAfter Value

The consolidateAfter parameter in the NodePool configuration sets a buffer period before Smart Karpenter initiates consolidation or other voluntary disruptions. Because OCI nodes typically take around 3 minutes to provision, this buffer gives new nodes time to initialize and applications time to stabilize.

If consolidateAfter is not configured, Smart Karpenter may consolidate nodes too aggressively. To prevent this:
1. Confirm whether consolidateAfter is explicitly defined in the NodePool configuration.
2. Set consolidateAfter to approximately 10 minutes.
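
The effect of this buffer can be sketched as a simple eligibility check. This is a minimal Python illustration, not Smart Karpenter's implementation. It assumes, as in upstream Karpenter, that the consolidateAfter timer counts from the last time a pod was added to or removed from the node. The function name `is_consolidation_candidate` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Recommended buffer from this guide (approximately 10 minutes).
CONSOLIDATE_AFTER = timedelta(minutes=10)


def is_consolidation_candidate(last_pod_change: datetime, now: datetime,
                               consolidate_after: timedelta = CONSOLIDATE_AFTER) -> bool:
    """Hypothetical check: a node is considered for consolidation only after it
    has gone `consolidate_after` without a pod being added or removed."""
    return now - last_pod_change >= consolidate_after


# A node that finished provisioning (about 3 minutes on OCI) at 12:03 UTC
# and received its last pod at that time.
last_change = datetime(2024, 1, 1, 12, 3, tzinfo=timezone.utc)
print(is_consolidation_candidate(last_change, last_change + timedelta(minutes=5)))   # False
print(is_consolidation_candidate(last_change, last_change + timedelta(minutes=10)))  # True
```

A shorter value, such as a few seconds, would make a freshly provisioned node a candidate almost immediately. This is the aggressive behavior the recommendation avoids.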

Use Pod Disruption Budgets (PDBs) and Anti-Affinity Rules

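Define PDBs and anti-affinity rules for your workloads, especially mission-critical or stateful applications. A PDB limits how many replicas of an application can be evicted at the same time. Anti-affinity rules spread replicas across nodes, so draining one node does not remove every replica. Together, these safeguards help maintain availability during rescheduling events initiated by Smart Karpenter or Kubernetes. They are standard best practices even outside Smart Karpenter environments.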
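
To see how a PDB limits evictions during a drain, the following minimal Python sketch approximates how Kubernetes computes allowed disruptions for a PDB that uses minAvailable. It is a simplification: it ignores maxUnavailable and unhealthy-pod eviction policies. The function name is hypothetical. As in the Kubernetes disruption controller, percentage values are scaled against the expected pod count and rounded up.

```python
import math
from typing import Union


def allowed_disruptions(healthy_pods: int, expected_pods: int,
                        min_available: Union[int, str]) -> int:
    """Approximate PodDisruptionBudget status.disruptionsAllowed for minAvailable.

    min_available is either an absolute count (2) or a percentage string ("50%").
    """
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        # Percentages are scaled against the expected pod count and rounded up.
        desired_healthy = math.ceil(percent * expected_pods / 100)
    else:
        desired_healthy = int(min_available)
    # Evictions are allowed only while enough healthy pods would remain.
    return max(healthy_pods - desired_healthy, 0)


print(allowed_disruptions(3, 3, 2))      # 1: one pod at a time can be evicted
print(allowed_disruptions(3, 3, "50%"))  # 1: 50% of 3 rounds up to 2
print(allowed_disruptions(2, 3, 2))      # 0: the drain waits until a pod recovers
```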

Protect Highly Sensitive Workloads from Voluntary Disruption

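For workloads that must never be disrupted during consolidation, add the following annotation to the pod metadata. For pods managed by a Deployment or StatefulSet, add it under spec.template.metadata.annotations.
```yaml
metadata:
  annotations:
    karpenter.sh/do-not-disrupt: "true"
```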
This prevents Smart Karpenter from evicting the pod during voluntary disruption workflows, including consolidation operations.
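
The following Python sketch illustrates the effect of this annotation on consolidation. A node that hosts any pod annotated with karpenter.sh/do-not-disrupt: "true" is excluded from voluntary disruption. The sketch assumes pods are plain dictionaries shaped like the output of `kubectl get pods -o json`. The function names are hypothetical, and this is not Smart Karpenter's code.

```python
from typing import Dict, List, Set

DO_NOT_DISRUPT = "karpenter.sh/do-not-disrupt"


def blocks_disruption(pod: Dict) -> bool:
    """True if the pod carries karpenter.sh/do-not-disrupt: "true"."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    return annotations.get(DO_NOT_DISRUPT) == "true"


def disruptable_nodes(pods: List[Dict], nodes: List[str]) -> List[str]:
    """Return the nodes that host no pod blocking voluntary disruption."""
    blocked: Set[str] = {
        pod["spec"]["nodeName"] for pod in pods
        if pod.get("spec", {}).get("nodeName") and blocks_disruption(pod)
    }
    return [node for node in nodes if node not in blocked]


# Hypothetical pods on two nodes; only node-b hosts a protected pod.
pods = [
    {"metadata": {"name": "web-1"}, "spec": {"nodeName": "node-a"}},
    {"metadata": {"name": "db-0", "annotations": {DO_NOT_DISRUPT: "true"}},
     "spec": {"nodeName": "node-b"}},
]
print(disruptable_nodes(pods, ["node-a", "node-b"]))  # ['node-a']
```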

note

If node consolidation issues continue, share the Smart Karpenter logs, NodePool definition, and relevant application manifests with Avesha Support. This information helps the support team diagnose the issue accurately and offer more targeted guidance.

Node Rotation on OCI

Smart Karpenter rotates nodes through disruption actions such as drift handling, consolidation, expiration, or explicit user-triggered operations.

Graceful Time-Based Node Rotation on OCI

Smart Karpenter 1.2.0 introduces support for graceful, scheduled node rotation through the new rotateAfter parameter. This capability enables organizations to rotate nodes predictably while minimizing workload disruption.

Property	Description
Field	`rotateAfter` (optional)
Type	Duration string (for example, `"168h"` for one week, `"720h"` for 30 days)
Default	`""` (disabled)
Purpose	Initiates graceful node replacement based on configured node age
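
The example values in this table appear to use the Go-style duration format common in Kubernetes tooling. The following Python sketch shows how such a value maps to a time span and how the empty default disables rotation. It handles only the h, m, and s units; that restriction and the function name are assumptions for illustration.

```python
import re
from datetime import timedelta
from typing import Optional

# Subset of Go duration syntax handled by this sketch: hours, minutes, seconds.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_PART = re.compile(r"(\d+(?:\.\d+)?)([hms])")


def parse_rotate_after(value: str) -> Optional[timedelta]:
    """Convert a rotateAfter value such as "168h" to a timedelta.

    Returns None for the default "" (rotation disabled).
    """
    if value == "":
        return None
    seconds, pos = 0.0, 0
    for part in _PART.finditer(value):
        if part.start() != pos:  # reject characters between components
            raise ValueError(f"invalid duration: {value!r}")
        seconds += float(part.group(1)) * _UNIT_SECONDS[part.group(2)]
        pos = part.end()
    if pos != len(value):  # reject trailing text or unsupported units such as "d"
        raise ValueError(f"invalid duration: {value!r}")
    return timedelta(seconds=seconds)


print(parse_rotate_after("168h"))  # 7 days, 0:00:00
print(parse_rotate_after("720h"))  # 30 days, 0:00:00
print(parse_rotate_after(""))      # None
```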

How It Works

Smart Karpenter continuously evaluates the age of nodes in the cluster:

When a node exceeds the configured rotateAfter duration, it is marked as drifted.
Smart Karpenter provisions a new replacement node and waits for it to reach the Ready state.
The existing node is gracefully drained, adhering to Pod Disruption Budgets (PDBs).
After all workloads are evicted safely, the old node is deleted.
The process strictly respects configured disruption policies.
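
The age check in the first step can be sketched in Python as follows. The sketch rests on several assumptions. Node age is measured from the node's creation timestamp. A node is due once its age strictly exceeds rotateAfter. The `Node` type and function name are hypothetical. The replacement, drain, and deletion steps are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Node:
    name: str
    created: datetime  # assumed to come from metadata.creationTimestamp


def nodes_to_mark_drifted(nodes: List[Node], rotate_after: Optional[timedelta],
                          now: datetime) -> List[str]:
    """Return names of nodes whose age exceeds rotateAfter (none if disabled)."""
    if rotate_after is None:  # rotateAfter: "" disables time-based rotation
        return []
    return [node.name for node in nodes if now - node.created > rotate_after]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
nodes = [
    Node("node-a", now - timedelta(days=10)),
    Node("node-b", now - timedelta(days=3)),
]
print(nodes_to_mark_drifted(nodes, timedelta(hours=168), now))  # ['node-a']
print(nodes_to_mark_drifted(nodes, None, now))                  # []
```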

Key Benefits

Predictable, scheduled node rotation for compliance or operational policies.
Zero-downtime rotations when used with appropriate PDB settings.
A graceful alternative to expireAfter that provides more controlled lifecycle management.

Manual Node Rotation with Smart Karpenter on OCI

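You can manually mark all nodes in a NodePool as drifted, so that Smart Karpenter treats them as needing replacement. To do this, set the karpenter.sh/nodepool-hash annotation on the NodePool to a new value: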

```bash
# Trigger manual rotation
kubectl annotate nodepool <nodepool-name> \
  karpenter.sh/nodepool-hash="manual-rotation-$(date +%s)" \
  --overwrite --namespace smart-scaler

# View current annotation
kubectl get nodepool <nodepool-name> -n smart-scaler \
  -o jsonpath='{.metadata.annotations.karpenter\.sh/nodepool-hash}'
```
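
The manual trigger relies on hash-based drift detection. Each node is assumed to record the karpenter.sh/nodepool-hash value that was current when it was created. Writing a new value to the NodePool therefore makes every existing node mismatch. The Python sketch below illustrates that comparison under this assumption. It is not Smart Karpenter's implementation, and the function name and data shapes are hypothetical.

```python
import time
from typing import Dict, List

HASH_KEY = "karpenter.sh/nodepool-hash"


def drifted_nodes(nodepool_annotations: Dict[str, str],
                  node_annotations: Dict[str, Dict[str, str]]) -> List[str]:
    """Return nodes whose recorded hash differs from the NodePool's current hash."""
    current = nodepool_annotations.get(HASH_KEY)
    return [name for name, annotations in node_annotations.items()
            if annotations.get(HASH_KEY) != current]


# Before the trigger: every node recorded the NodePool's hash (hypothetical value).
nodepool = {HASH_KEY: "1234567890"}
nodes = {"node-a": {HASH_KEY: "1234567890"}, "node-b": {HASH_KEY: "1234567890"}}
print(drifted_nodes(nodepool, nodes))  # []

# Equivalent of the kubectl annotate command: overwrite with a unique value.
nodepool[HASH_KEY] = f"manual-rotation-{int(time.time())}"
print(drifted_nodes(nodepool, nodes))  # ['node-a', 'node-b']
```

Using a timestamp in the value makes each trigger unique, so running the command again later causes another rotation.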

The following steps occur during manual node rotation:

Smart Karpenter identifies drift and marks all nodes in the NodePool as drifted.
New nodes are provisioned to replace the drifted ones.
Smart Karpenter waits until the new nodes become Ready and can safely take over workloads.
Old nodes are tainted with karpenter.sh/disrupted:NoSchedule to prevent new workloads from being scheduled on them.
Existing workloads on old nodes are drained, honoring any configured Pod Disruption Budgets (PDBs).
Old nodes are deleted once draining is complete.
Throughout the process, Smart Karpenter respects all configured disruption budgets to ensure controlled and safe rotation.
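
The ordering of these steps can be sketched in Python as an event sequence. The sketch makes several assumptions. Drifted nodes are processed in batches no larger than the disruption budget allows. Each step finishes for the whole batch before the next step begins. Replacements become Ready without delay. The function name and event strings are hypothetical, and real draining also waits on PDBs.

```python
from typing import List

# One entry per step in the list above, applied to each old node.
STEPS = [
    "provision replacement for {node}",
    "wait until replacement for {node} is Ready",
    "taint {node} with karpenter.sh/disrupted:NoSchedule",
    "drain {node}, honoring PDBs",
    "delete {node}",
]


def rotation_plan(drifted: List[str], max_parallel: int) -> List[str]:
    """Produce the event order for replacing drifted nodes, batch by batch."""
    if max_parallel < 1:
        raise ValueError("the disruption budget must allow at least one node")
    events: List[str] = []
    for start in range(0, len(drifted), max_parallel):
        batch = drifted[start:start + max_parallel]
        # Each step completes for the whole batch before the next step begins.
        for step in STEPS:
            events.extend(step.format(node=node) for node in batch)
    return events


for event in rotation_plan(["node-a", "node-b", "node-c"], max_parallel=2):
    print(event)
```

With `max_parallel=2`, node-a and node-b are replaced and deleted before a replacement for node-c is provisioned. A node is never drained before its replacement is Ready.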

Disruption Budgets on OCI

Configure disruption budgets to prevent Smart Karpenter from rotating many nodes at the same time, which can cause service disruptions.

Disruption Budget Parameters

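Parameter	Description
nodes	Maximum number of nodes that can be disrupted at the same time, as a percentage ("20%") or an absolute count ("5")
reasons	Disruption reasons the budget applies to (`Empty`, `Drifted`, `Underutilized`, or `Expired`)
schedule	Cron expression that defines when a time-based budget becomes active
duration	How long the budget stays active each time its schedule starts it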
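
The following Python sketch illustrates how several budgets could combine. It assumes Smart Karpenter follows upstream Karpenter semantics:

- A budget applies only to the reasons it lists, or to all reasons when reasons is omitted.
- Percentages are rounded up.
- Only the most restrictive applicable budget counts.
- A scheduled budget counts only during its active window.

The `active` flag stands in for evaluating schedule and duration, and all names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Budget:
    nodes: str                                        # "20%" or "5"
    reasons: List[str] = field(default_factory=list)  # empty list = all reasons
    active: bool = True                               # stand-in for schedule + duration


def budget_limit(budget: Budget, total_nodes: int) -> int:
    """Translate a budget's nodes value into a node count (percentages round up)."""
    if budget.nodes.endswith("%"):
        return math.ceil(int(budget.nodes[:-1]) * total_nodes / 100)
    return int(budget.nodes)


def allowed_disruptions(budgets: List[Budget], total_nodes: int, reason: str) -> int:
    """Most restrictive limit among active budgets that cover this reason."""
    limits = [budget_limit(b, total_nodes) for b in budgets
              if b.active and (not b.reasons or reason in b.reasons)]
    return min(limits) if limits else total_nodes  # no applicable budget: no limit


budgets = [
    Budget(nodes="20%"),                     # at most 20% of nodes at once
    Budget(nodes="1", reasons=["Drifted"]),  # rotate drifted nodes one at a time
    Budget(nodes="0", active=False),         # e.g. a scheduled freeze, currently inactive
]
print(allowed_disruptions(budgets, 10, "Drifted"))        # 1
print(allowed_disruptions(budgets, 10, "Underutilized"))  # 2
```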

Automatic Kubernetes Version Detection on OCI

Smart Karpenter automatically detects the Kubernetes version from the OKE control plane and selects the appropriate node image, ensuring consistent and compatible node provisioning without manual intervention.

How It Works

Smart Karpenter performs the following steps to select the correct OKE node image:

Detects the Kubernetes version from the OKE control plane
Searches the OCI image catalog for corresponding OKE images that match the detected version
Selects the latest compatible image from the available options

After a control plane upgrade and a subsequent Smart Karpenter restart:
- Nodes using older images are automatically marked as drifted.
- Smart Karpenter initiates graceful node rotation, respecting Pod Disruption Budgets (PDBs) and disruption policies.
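
The selection and drift steps can be sketched in Python. For illustration only, the sketch assumes that OKE image display names embed the Kubernetes version as `OKE-<version>-` and that each image has a creation time. The image names, IDs, and function names below are hypothetical placeholders, not values returned by OCI.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Image:
    image_id: str
    display_name: str
    time_created: datetime


def select_oke_image(images: List[Image], k8s_version: str) -> Optional[Image]:
    """Pick the newest image whose name matches the control plane version."""
    version = k8s_version.lstrip("v")  # "v1.30.1" -> "1.30.1"
    # The trailing "-" keeps "1.30.1" from matching "1.30.10".
    matching = [img for img in images if f"OKE-{version}-" in img.display_name]
    return max(matching, key=lambda img: img.time_created, default=None)


def nodes_on_old_images(node_images: Dict[str, str], selected: Image) -> List[str]:
    """Nodes not running the selected image are treated as drifted."""
    return [node for node, image_id in node_images.items()
            if image_id != selected.image_id]


images = [
    Image("img-1", "Oracle-Linux-8.10-2024.08.01-0-OKE-1.29.1-700", datetime(2024, 8, 1)),
    Image("img-2", "Oracle-Linux-8.10-2024.08.01-0-OKE-1.30.1-701", datetime(2024, 8, 1)),
    Image("img-3", "Oracle-Linux-8.10-2024.09.01-0-OKE-1.30.1-720", datetime(2024, 9, 1)),
]
chosen = select_oke_image(images, "v1.30.1")
print(chosen.image_id)                                                      # img-3
print(nodes_on_old_images({"node-a": "img-1", "node-b": "img-3"}, chosen))  # ['node-a']
```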

Image Selection Priority

Smart Karpenter uses the following priority order to determine which image to apply:

Environment variables (highest priority, if configured)
Auto-detected version from the OKE control plane
NodePool image configuration (fallback option)
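
This priority order works as a first-match fallback chain. In the following Python sketch, the three arguments stand for the environment variable override, the auto-detected image, and the NodePool image setting. The environment variable names are not listed on this page, so the function takes already-resolved values. Its name is hypothetical.

```python
from typing import Optional, Tuple


def resolve_image(env_image: Optional[str], detected_image: Optional[str],
                  nodepool_image: Optional[str]) -> Tuple[str, str]:
    """Return (image, source) using the documented priority order."""
    candidates = [
        ("environment variable", env_image),         # highest priority, if configured
        ("auto-detected", detected_image),           # matched to the OKE control plane
        ("NodePool configuration", nodepool_image),  # fallback
    ]
    for source, image in candidates:
        if image:  # skip values that are unset or empty
            return image, source
    raise LookupError("no node image could be resolved")


print(resolve_image(None, "img-auto", "img-pool"))  # ('img-auto', 'auto-detected')
print(resolve_image("img-env", "img-auto", None))   # ('img-env', 'environment variable')
print(resolve_image(None, None, "img-pool"))        # ('img-pool', 'NodePool configuration')
```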

Key Benefits

The automatic Kubernetes version detection:

Eliminates the need for manual image ID management
Ensures automatic compatibility with the control plane version
Reduces operational overhead across multiple clusters
Simplifies and accelerates cluster upgrade workflows

note

Image detection automatically searches for matching OKE images in both your OCI compartment and Oracle's public compartment.

Persistent Volumes

For workloads that use persistent volumes on OCI, ensure you use the CSI StorageClass or patch existing FlexVolume PVs as described in Persistent Volumes and Provisioner Support.
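
Before migrating, it can help to list which PersistentVolumes still use FlexVolume. The following Python sketch reads JSON in the shape printed by `kubectl get pv -o json` and reports PVs that set spec.flexVolume rather than spec.csi. It inspects only standard Kubernetes PV fields and does not perform the patch described in the linked topic. The function name and sample data are hypothetical.

```python
import json
from typing import Dict, List


def flexvolume_pvs(pv_list: Dict) -> List[str]:
    """Return names of PVs that use the FlexVolume source instead of CSI."""
    return [pv["metadata"]["name"] for pv in pv_list.get("items", [])
            if "flexVolume" in pv.get("spec", {})]


# Trimmed, hypothetical output of `kubectl get pv -o json`.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "pv-legacy"},
   "spec": {"flexVolume": {"driver": "oracle/oci"}}},
  {"metadata": {"name": "pv-csi"},
   "spec": {"csi": {"driver": "blockvolume.csi.oraclecloud.com"}}}
]}
""")
print(flexvolume_pvs(sample))  # ['pv-legacy']
```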
