Version: 1.2.0

Best Practices

This topic outlines best practices for secure and efficient production-grade operations.

Graceful Node Consolidation with Smart Karpenter on OCI

To ensure reliable and predictable node consolidations when using Smart Karpenter on Oracle Cloud Infrastructure (OCI), review the following recommended practices:

  1. Configure an Appropriate consolidateAfter Value

    The consolidateAfter parameter defines a buffer period before Smart Karpenter initiates consolidation or voluntary disruptions. OCI nodes typically require around 3 minutes to provision, so this buffer gives new nodes time to initialize and applications time to stabilize.

    If this parameter is not configured, Smart Karpenter may consolidate nodes too aggressively. To prevent overly aggressive consolidation:

    1. Confirm whether consolidateAfter is explicitly defined in the NodePool configuration.
    2. Set consolidateAfter to approximately 10 minutes.
  2. Use Pod Disruption Budgets (PDBs) and Anti-Affinity Rules

    Implement PDBs and anti-affinity rules for workloads, especially mission-critical or stateful applications. These safeguards help maintain availability during rescheduling events initiated by Smart Karpenter or Kubernetes, and they are standard best practices even outside of Smart Karpenter environments.

  3. Protect Highly Sensitive Workloads from Voluntary Disruption

    For workloads that must never be disrupted during consolidation, add the following annotation to the pod specification:

    karpenter.sh/do-not-disrupt: "true"

    This prevents Smart Karpenter from evicting the pod during voluntary disruption workflows, including consolidation operations.
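The three practices above can be sketched as manifests. This is a minimal illustration, not a definitive configuration. It assumes that consolidateAfter lives under spec.disruption, as in upstream Karpenter; confirm this against the NodePool CRD installed by your Smart Karpenter version. The file names, the workload name web, the label app: web, and the container image are hypothetical placeholders.

```shell
#!/bin/sh
set -eu

# Practice 1: merge patch that sets consolidateAfter to 10 minutes.
# ASSUMPTION: the field path spec.disruption.consolidateAfter follows
# upstream Karpenter; verify it against your NodePool CRD.
cat > consolidate-after-patch.yaml <<'EOF'
spec:
  disruption:
    consolidateAfter: 10m
EOF

# Practices 2 and 3: a PDB, a soft anti-affinity rule, and the
# do-not-disrupt annotation for a hypothetical "web" workload.
cat > web-availability.yaml <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2          # keep at least 2 pods running during drains
  selector:
    matchLabels:
      app: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
      annotations:
        # Only for workloads that must never be voluntarily disrupted.
        karpenter.sh/do-not-disrupt: "true"
    spec:
      affinity:
        podAntiAffinity:
          # Prefer spreading replicas across different nodes.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: web
      containers:
        - name: web
          image: nginx:1.27   # placeholder image
EOF

# To apply against a cluster (left commented so this script only writes files):
# kubectl patch nodepool <nodepool-name> -n smart-scaler --type merge \
#   --patch-file consolidate-after-patch.yaml
# kubectl apply -f web-availability.yaml
```

The annotation blocks voluntary disruption of any node that hosts an annotated pod, so it is best reserved for the few workloads that genuinely cannot tolerate eviction. For most workloads, the PDB and anti-affinity rule alone are the less restrictive choice.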

note

If node consolidation issues continue, share the Smart Karpenter logs, NodePool definition, and relevant application manifests with Avesha Support. This information helps the support team diagnose the issue accurately and offer more targeted guidance.

Node Rotation on OCI

Smart Karpenter rotates nodes through disruption actions such as drift handling, consolidation, expiration, or explicit user-triggered operations.

Graceful Time-Based Node Rotation on OCI

Smart Karpenter 1.2.0 introduces support for graceful, scheduled node rotation through the new rotateAfter parameter. This capability enables organizations to rotate nodes predictably while minimizing workload disruption.

Property    Description
Field       rotateAfter (optional)
Type        Duration string (for example, "168h" for one week, "720h" for 30 days)
Default     "" (disabled)
Purpose     Initiates graceful node replacement based on configured node age

How It Works

Smart Karpenter continuously evaluates the age of nodes in the cluster:

  1. When a node exceeds the configured rotateAfter duration, it is marked as drifted.
  2. Smart Karpenter provisions a new replacement node and waits for it to reach the Ready state.
  3. The existing node is gracefully drained, adhering to Pod Disruption Budgets (PDBs).
  4. After all workloads are evicted safely, the old node is deleted.
  5. The process strictly respects configured disruption policies.
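To preview which nodes would exceed a given rotateAfter value, the age comparison in step 1 can be approximated from the command line. The following is a read-only sketch and does not reproduce Smart Karpenter's internal logic. The function names are hypothetical, only the hour-based duration form ("168h") is handled, and list_nodes_due assumes kubectl access to the cluster and GNU date.

```shell
#!/bin/sh

# Convert an hour-based duration string such as "168h" to seconds.
# Only the "<N>h" form is supported in this sketch.
duration_to_seconds() {
  hours="${1%h}"
  case "$1" in
    *h) ;;
    *) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  case "$hours" in
    ''|*[!0-9]*) echo "unsupported duration: $1" >&2; return 1 ;;
  esac
  echo $(( hours * 3600 ))
}

# Succeed (exit 0) when a node of the given age in seconds has
# exceeded the given rotateAfter duration.
is_due_for_rotation() {
  limit_seconds="$(duration_to_seconds "$2")" || return 2
  [ "$1" -gt "$limit_seconds" ]
}

# Print the names of nodes whose age exceeds the given duration.
# Requires kubectl and GNU date; defined here but not called.
list_nodes_due() {
  now="$(date +%s)"
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.creationTimestamp}{"\n"}{end}' |
  while read -r name created; do
    age=$(( now - $(date -d "$created" +%s) ))
    if is_due_for_rotation "$age" "$1"; then
      echo "$name"
    fi
  done
}
```

For example, `list_nodes_due 168h` would print the nodes that are more than one week old, which can help estimate how many replacements a new rotateAfter setting would trigger.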

Key Benefits

  • Predictable, scheduled node rotation for compliance or operational policies.
  • Zero-downtime rotations when used with appropriate PDB settings.
  • Non-disruptive alternative to expireAfter, providing more controlled lifecycle management.

Manual Node Rotation with Smart Karpenter on OCI

You can manually mark nodes as drifted so Smart Karpenter treats them as needing replacement.

# Trigger manual rotation
kubectl annotate nodepool <nodepool-name> \
karpenter.sh/nodepool-hash="manual-rotation-$(date +%s)" \
--overwrite --namespace smart-scaler

# View current annotation
kubectl get nodepool <nodepool-name> -n smart-scaler \
-o jsonpath='{.metadata.annotations.karpenter\.sh/nodepool-hash}'

Manual node rotation proceeds as follows:

  1. Smart Karpenter identifies drift and marks all nodes in the NodePool as drifted.
  2. New nodes are provisioned to replace the drifted ones.
  3. Smart Karpenter waits until the new nodes become Ready and can safely take over workloads.
  4. Old nodes are tainted with karpenter.sh/disrupted:NoSchedule to prevent new workloads from being scheduled on them.
  5. Existing workloads on old nodes are drained, honoring any configured Pod Disruption Budgets (PDBs).
  6. Old nodes are deleted once draining is complete.
  7. Throughout the process, Smart Karpenter respects all configured disruption budgets to ensure controlled and safe rotation.

Disruption Budgets on OCI

Configure disruption budgets to prevent simultaneous multi-node rotation that can cause service disruptions.

Disruption Budget Parameters

Parameter   Description
nodes       Percentage ("20%") or absolute count ("5")
reasons     Filter by disruption reason (Empty, Drifted, Underutilized, or Expired)
schedule    Cron expression for time-based budgets
duration    How long the budget applies
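As an illustration of how these parameters combine, the sketch below writes a merge patch with two budgets. It assumes that budgets are defined under spec.disruption.budgets, as in upstream Karpenter, and that schedules are evaluated in UTC; verify both against your Smart Karpenter NodePool CRD. The file name and the specific values are hypothetical.

```shell
#!/bin/sh
set -eu

# ASSUMPTION: budgets live at spec.disruption.budgets, as in upstream Karpenter.
cat > disruption-budgets-patch.yaml <<'EOF'
spec:
  disruption:
    budgets:
      # Allow at most 20% of nodes to be disrupted at once due to drift.
      - nodes: "20%"
        reasons:
          - Drifted
      # For 8 hours starting at 09:00 on weekdays, allow only one node
      # at a time to be disrupted, for any reason.
      - nodes: "1"
        schedule: "0 9 * * mon-fri"
        duration: 8h
EOF

# To apply against a cluster (left commented so this script only writes a file):
# kubectl patch nodepool <nodepool-name> -n smart-scaler --type merge \
#   --patch-file disruption-budgets-patch.yaml
```

A merge patch replaces the whole budgets list, so include every budget you want to keep. In upstream Karpenter, when several budgets are active at the same time, the most restrictive one applies; confirm that Smart Karpenter behaves the same way before relying on it.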

Automatic Kubernetes Version Detection on OCI

Smart Karpenter automatically detects the Kubernetes version from the OKE control plane and selects the appropriate node image, ensuring consistent and compatible node provisioning without manual intervention.

How It Works

Smart Karpenter performs the following steps to select the correct OKE node image:

  1. Detects the Kubernetes version from the OKE control plane.
  2. Searches the OCI image catalog for OKE images that match the detected version.
  3. Selects the latest compatible image from the available options.
  4. After a control plane upgrade and a subsequent Smart Karpenter restart:
    • Nodes using older images are automatically marked as drifted.
    • Smart Karpenter initiates graceful node rotation, respecting Pod Disruption Budgets (PDBs) and disruption policies.

Image Selection Priority

Smart Karpenter uses the following priority order to determine which image to apply:

  • Environment variables (highest priority, if configured)
  • Auto-detected version from the OKE control plane
  • NodePool image configuration (fallback option)
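The priority order above amounts to a first-match fallthrough. The sketch below only illustrates that ordering and is not Smart Karpenter's implementation. The function name select_image and its three arguments are hypothetical, and treating an empty value as "not configured" is an assumption.

```shell
#!/bin/sh

# Return the first non-empty image source in priority order:
#   $1 = image from environment variables (highest priority)
#   $2 = image auto-detected from the OKE control plane version
#   $3 = image from the NodePool configuration (fallback)
select_image() {
  env_override="${1:-}"
  auto_detected="${2:-}"
  nodepool_image="${3:-}"
  if [ -n "$env_override" ]; then
    echo "$env_override"
  elif [ -n "$auto_detected" ]; then
    echo "$auto_detected"
  elif [ -n "$nodepool_image" ]; then
    echo "$nodepool_image"
  else
    echo "no image source available" >&2
    return 1
  fi
}
```

For example, `select_image "" "auto-img" "pool-img"` prints `auto-img`, because no environment override is set and the auto-detected image outranks the NodePool fallback.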

Key Benefits

Automatic Kubernetes version detection:

  • Eliminates the need for manual image ID management
  • Ensures automatic compatibility with the control plane version
  • Reduces operational overhead across multiple clusters
  • Simplifies and accelerates cluster upgrade workflows
note

Image detection automatically searches for matching OKE images in both your OCI compartment and Oracle's public compartment.