Version: 1.1.0

Configure Smart Karpenter on Oracle Cloud

Smart Karpenter for OCI uses self-managed nodes to extend the OKE cluster to have sufficient power to run all the pods. This implies that all the OCI self-managed nodes' requirements apply to Smart Karpenter too.

Supported Network Types and Oracle VM Shapes

| Network Types                                          | Oracle VM Shapes (GPU) | Oracle VM Shapes (Standard) |
|--------------------------------------------------------|------------------------|-----------------------------|
| Flannel                                                | BM.GPU.A10             | VM.Standard.E3.Flex         |
| Cilium (Flannel-based OKE network converted to Cilium) | VM.GPU2.*              | VM.Standard.E3.Flex         |
| VCN-Native                                             | VM.GPU3.*              | VM.Standard.E5.Flex         |
| -                                                      | VM.GPU.A10.*           | -                           |

Prerequisites

The following section describes the prerequisites for running Smart Karpenter on Oracle Cloud.

  1. Create a new OKE cluster or choose an existing one where you want to deploy Smart Karpenter. Share the cluster ID (OCID) with Avesha to obtain a license. Note down the cluster name; you will use it in the configuration later.

  2. Ensure that the OKE cluster is an enhanced cluster and that it uses a CNI plugin for pod networking. Smart Karpenter also works on Cilium OKE clusters.

  3. The OKE cluster must contain Dynamic Group and associated policies created to allow node joining. Ensure that the following policies are configured:

    Allow dynamic-group <dynamic-group-name> to {CLUSTER_JOIN} in compartment <compartment-name>
    Allow dynamic-group <dynamic-group-name> to manage cluster-node-pools in compartment <compartment-name>
    Allow dynamic-group <dynamic-group-name> to manage instance-family in compartment <compartment-name>
    Allow dynamic-group <dynamic-group-name> to use subnets in compartment <compartment-name>
    Allow dynamic-group <dynamic-group-name> to read virtual-network-family in compartment <compartment-name>
    Allow dynamic-group <dynamic-group-name> to use vnics in compartment <compartment-name>
    Allow dynamic-group <dynamic-group-name> to inspect compartments in compartment <compartment-name>
  4. You must have helm and kubectl tools configured on the target OKE cluster.

  5. Before deploying Smart Karpenter on Oracle Kubernetes Engine (OKE), ensure that outbound Internet connectivity is properly configured. The following prerequisites ensure that Smart Karpenter can securely access OCI APIs and external services while keeping the pod network private:

    note

    Internet Gateways (IGWs) cannot be used for pod egress, as they route only public IP traffic and do not perform NAT.

    • Network Mode: The OKE cluster uses OCI_VCN_IP_NATIVE pod networking, where pods have private IPs from private subnets and no public IPs.
    • NAT Gateway: Configure a NAT Gateway (NGW) for any private subnet hosting Smart Karpenter controller pods or system components that require external connectivity.
    • Route Table: The subnet route table must include a default route (0.0.0.0/0 → NAT Gateway).
    • VCN Association: The NAT Gateway must reside in the same Virtual Cloud Network (VCN).
    • Security Rules: Ensure security lists or Network Security Groups (NSGs) allow outbound HTTPS (TCP 443) traffic to OCI service endpoints.
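
    The NAT Gateway route described above can be added with the OCI CLI; a sketch with placeholder OCIDs (substitute your own route table and NAT gateway):

    ```shell
    # Sketch: point the pod subnet's default route at the NAT gateway.
    # <route-table-ocid> and <nat-gateway-ocid> are placeholders, not real values.
    oci network route-table update \
      --rt-id <route-table-ocid> \
      --route-rules '[{"destination": "0.0.0.0/0", "destinationType": "CIDR_BLOCK", "networkEntityId": "<nat-gateway-ocid>"}]'
    ```

    Note that `--route-rules` replaces the entire rule list, so include any existing rules for the subnet in the JSON array.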

Install Smart Karpenter

  1. Add the repository using the following commands:

    helm repo add smartscaler https://smartscaler.nexus.aveshalabs.io/repository/smartscaler-helm-ent-prod
    helm repo update
  2. To view the Smart Karpenter charts, use the following command:

    helm search repo avesha-karpenter
  3. Retrieve the values.yaml file from the repository you added using the following command:

    helm show values smartscaler/avesha-karpenter > values.yaml
  4. In the values.yaml file:

    1. Change the value of CLUSTER_NAME to your OKE cluster name, as shown below:

      name: CLUSTER_NAME
      value: "<your OKE cluster name>"
    2. Add the license you received from Avesha, as shown below:

      license:
        name: "<name you received from Avesha>"
        license: "<license you received from Avesha>"
        licensekey: "<license key you received from Avesha>"
  5. Install Smart Karpenter using the modified values.yaml file using the following command:

    helm install karpenter smartscaler/avesha-karpenter -f values.yaml --namespace smart-scaler --create-namespace
  6. Create a new file to define the Oracle Node Class Definition using the following YAML:

    info

    For more information, see Understand OCI Features.

    apiVersion: karpenter.multicloud.sh/v1alpha1
    kind: OciNodeClass
    metadata:
      name: ocinodeclass
    spec:
      # ImageOCID: <image.ocid> # One OKE NodePool for CPU workloads and one for GPU
      # BootVolumeSizeGB: <size in GB>
      # SubnetOCID: <subnet OCID>
      # NetworkSgOCID: <nsgOCid1,nsgOCid2>
      # PodsSubnetOCID: <PODs subnet OCID> # For OCI VCN-Native pod networking
      # PodsNetworkSgOCIDs: <PODs nsgOCid1,PODs nsgOCid2> # For OCI VCN-Native pod networking
      # SSHKeys: "ssh-rsa ********"

    A valid OciNodeClass is mandatory and must be referenced by every NodePool; the same class can be shared across multiple NodePools. All spec parameters are optional. Except for SSHKeys, any parameter set in the OciNodeClass object overrides the corresponding value from the OKE NodePools.
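
    For illustration, an OciNodeClass with the optional fields populated might look like the following sketch; all OCIDs and the SSH key below are hypothetical placeholders, not real resources:

    ```yaml
    apiVersion: karpenter.multicloud.sh/v1alpha1
    kind: OciNodeClass
    metadata:
      name: ocinodeclass
    spec:
      ImageOCID: ocid1.image.oc1.iad.exampleuniqueid    # placeholder OCID
      BootVolumeSizeGB: 100
      SubnetOCID: ocid1.subnet.oc1.iad.exampleuniqueid  # placeholder OCID
      SSHKeys: "ssh-rsa AAAA... user@example"           # placeholder key
    ```

    Because these values override the OKE NodePool configuration, set only the fields you actually need to pin.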

  7. Apply the OCINodeClass that you just created using the following command:

    kubectl apply -f <Name of the OCINodeClass>.yaml --namespace smart-scaler
  8. Create a new file to deploy at least one NodePool on the OKE cluster as shown in the following example:

    info

    For more information, see NodePool Examples.

    For GPU workloads, we have listed the successfully tested OCI images and the image to avoid under GPU NodePool Example.

    note
    • To exclude a NodePool from Smart Karpenter, tag it with karpenter=false.
    • The OKE NodePools used by Smart Karpenter may have their node count set to 0.
    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: preemptible
    spec:
      template:
        spec:
          requirements:
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64"]
            - key: kubernetes.io/os
              operator: In
              values: ["linux"]
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["spot"]
            - key: "karpenter.oci.sh/instance-family"
              operator: In
              values: ["E3","E4"]
          nodeClassRef:
            name: ocinodeclass
            kind: OciNodeClass
            group: karpenter.multicloud.sh
          expireAfter: 720h # 30 * 24h = 720h
      limits:
        cpu: 20
        nvidia.com/gpu: 0
      disruption:
        consolidationPolicy: WhenEmptyOrUnderutilized
        consolidateAfter: 10m
      weight: 2
  9. Apply the NodePool that you created using the following command:

    kubectl apply -f <Name of the NodePool>.yaml --namespace smart-scaler
  10. To view OCINodeClass and NodePools, use the following commands:

    kubectl --namespace smart-scaler get ocinodeclass
    kubectl --namespace smart-scaler get nodepools
  11. To view generated NodeClaims (by OCINodeClass and NodePools), use the following command:

    kubectl --namespace smart-scaler get nodeclaims
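
    Once a NodePool scales up, nodes provisioned by Smart Karpenter can be distinguished from OKE-managed nodes. Assuming nodes carry the upstream Karpenter `karpenter.sh/nodepool` label, a quick check is:

    ```shell
    # List only Karpenter-provisioned nodes, with a column showing the owning pool
    # (assumes the upstream karpenter.sh/nodepool node label is applied)
    kubectl get nodes -l karpenter.sh/nodepool -L karpenter.sh/nodepool
    ```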

NodePool Examples

Preemptible (Spot) NodePool

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: preemptible
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: "karpenter.oci.sh/instance-family"
          operator: In
          values: ["E3","E4"]
      nodeClassRef:
        name: ocinodeclass
        kind: OciNodeClass
        group: karpenter.multicloud.sh
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 20
    nvidia.com/gpu: 0
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m
  weight: 2
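
A workload that should land on nodes from this pool can require spot capacity through its node selector. A minimal sketch (the deployment name and image are illustrative, not part of the product):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-tolerant-app   # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spot-tolerant-app
  template:
    metadata:
      labels:
        app: spot-tolerant-app
    spec:
      # Pin pods to nodes created with spot (preemptible) capacity
      nodeSelector:
        karpenter.sh/capacity-type: spot
      containers:
        - name: app
          image: nginx   # illustrative image
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
```

Only interruption-tolerant workloads should use this selector, since preemptible nodes can be reclaimed by Oracle at short notice.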

On-Demand NodePool

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: "karpenter.oci.sh/instance-cpu"
          operator: In
          values: ["6", "8"]
        - key: "karpenter.oci.sh/instance-memory"
          operator: In
          values: ["16","32"]
      nodeClassRef:
        name: ocinodeclass
        kind: OciNodeClass
        group: karpenter.multicloud.sh
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 10
    nvidia.com/gpu: 0
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m
  weight: 2

GPU NodePool

note

Smart Karpenter has been successfully tested with the following OCI images in the US-Ashburn-1 region:

caution

For GPU workloads, avoid using the oracle-linux-8.10-gen2-gpu-2025.06.17-0-oke-1.31.10-878 image, as it may cause compatibility issues.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: "karpenter.oci.sh/instance-family"
          operator: In
          values: ["A10"] # A10, GPU2 or GPU3
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
      nodeClassRef:
        name: ocinodeclass
        kind: OciNodeClass
        group: karpenter.multicloud.sh
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 200
    nvidia.com/gpu: 10
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m
  weight: 1
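
Because this pool taints its nodes with nvidia.com/gpu, a GPU workload must both request a GPU and tolerate the taint. A minimal sketch (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test   # illustrative name
spec:
  # Tolerate the taint the GPU NodePool applies to its nodes
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # illustrative image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # the GPU request triggers provisioning from this pool
  restartPolicy: Never
```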

Bare Metal GPU NodePool Beta

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-bms
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["bms"]
        - key: "karpenter.oci.sh/instance-family"
          operator: In
          values: ["A10"]
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
      nodeClassRef:
        name: ocinodeclass
        kind: OciNodeClass
        group: karpenter.multicloud.sh
  limits:
    cpu: 200
    nvidia.com/gpu: 4
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
  weight: 10

Delete Smart Karpenter

  1. To delete all NodePools that Smart Karpenter deployed, use the following command:

    kubectl delete nodepools --all
  2. To delete the OCINodeClass that Smart Karpenter deployed, use the following command:

    kubectl delete ocinodeclass <Name of the OCINodeClass> --namespace smart-scaler
  3. To remove Smart Karpenter and associated CRDs, use the following command:

    helm uninstall karpenter --namespace smart-scaler

Understand OCI Features

Subnet Selection

Smart Karpenter reads the subnet configuration of OKE-managed nodes and uses the same subnet for provisioning additional nodes.

Node Placement and Availability Domains

Smart Karpenter reads the availability domains from OKE NodePool definitions and randomly selects one for newly created nodes.

Preemptible (Spot) VM Support

Oracle Preemptible Instances cost 50% less than standard VMs, but can be terminated by Oracle at any time with two minutes' notice.

Smart Karpenter allows users to enable or disable Preemptible VMs at the NodePool level.
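
The toggle lives in the NodePool's capacity-type requirement; a sketch of the relevant fragment:

```yaml
# Inside spec.template.spec.requirements of a NodePool:
- key: karpenter.sh/capacity-type
  operator: In
  values: ["spot"]        # preemptible VMs enabled; use ["on-demand"] to disable them
```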

For more information on Preemptible VM Shapes, see OCI Documentation.