# Configure Smart Karpenter on Oracle Cloud

Smart Karpenter for OCI uses self-managed nodes to extend an OKE cluster with enough capacity to run all the pods. This means that all of the requirements for OCI self-managed nodes also apply to Smart Karpenter.
## Supported Network Types and Oracle VM Shapes

| Network Type | Oracle VM Shapes (GPU) | Oracle VM Shapes (CPU) |
|---|---|---|
| Flannel | BM.GPU.A10 | VM.Standard.E3.Flex |
| Cilium (Flannel-based OKE network converted to Cilium) | VM.GPU2.* | VM.Standard.E3.Flex |
| VCN-Native | VM.GPU3.* | VM.Standard.E5.Flex |
| - | VM.GPU.A10.* | - |
## Prerequisites

The following section describes the prerequisites for Smart Karpenter on Oracle Cloud.
- Create or use an existing OKE cluster where you want to deploy Smart Karpenter. Share the cluster ID (OCID) with Avesha for licensing. Note down the cluster name; you need it later in the configuration.

- Ensure that the OKE cluster is an enhanced cluster and that it uses a supported CNI plugin for pod networking. Smart Karpenter also works on Cilium-based OKE clusters.
- The OKE cluster must have a Dynamic Group and associated policies that allow nodes to join. Ensure that the following policies are configured:

  ```text
  Allow dynamic-group <dynamic-group-name> to {CLUSTER_JOIN} in compartment <compartment-name>
  Allow dynamic-group <dynamic-group-name> to manage cluster-node-pools in compartment <compartment-name>
  Allow dynamic-group <dynamic-group-name> to manage instance-family in compartment <compartment-name>
  Allow dynamic-group <dynamic-group-name> to use subnets in compartment <compartment-name>
  Allow dynamic-group <dynamic-group-name> to read virtual-network-family in compartment <compartment-name>
  Allow dynamic-group <dynamic-group-name> to use vnics in compartment <compartment-name>
  Allow dynamic-group <dynamic-group-name> to inspect compartments in compartment <compartment-name>
  ```
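The seven statements above differ only in the verb and resource. A small sketch (a hypothetical helper, not part of Smart Karpenter or the OCI CLI) that renders them for your dynamic group and compartment names, so you can paste them into a policy:

```python
# Render the node-joining policy statements shown above for a given
# dynamic group and compartment name.
STATEMENTS = [
    "Allow dynamic-group {dg} to {{CLUSTER_JOIN}} in compartment {comp}",
    "Allow dynamic-group {dg} to manage cluster-node-pools in compartment {comp}",
    "Allow dynamic-group {dg} to manage instance-family in compartment {comp}",
    "Allow dynamic-group {dg} to use subnets in compartment {comp}",
    "Allow dynamic-group {dg} to read virtual-network-family in compartment {comp}",
    "Allow dynamic-group {dg} to use vnics in compartment {comp}",
    "Allow dynamic-group {dg} to inspect compartments in compartment {comp}",
]

def render_policies(dynamic_group: str, compartment: str) -> list[str]:
    """Fill in the dynamic-group and compartment placeholders."""
    return [s.format(dg=dynamic_group, comp=compartment) for s in STATEMENTS]

# Example with made-up names:
for line in render_policies("oke-self-managed-nodes", "dev-compartment"):
    print(line)
```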
- You must have the `helm` and `kubectl` tools configured for the target OKE cluster.
- Before deploying Smart Karpenter on Oracle Kubernetes Engine (OKE), ensure that outbound Internet connectivity is properly configured. The following prerequisites ensure that Smart Karpenter can securely access OCI APIs and external services while keeping the pod network private:

  :::note
  Internet Gateways (IGWs) cannot be used for pod egress, as they route only public IP traffic and do not perform NAT.
  :::

  - **Network Mode**: The OKE cluster uses `OCI_VCN_IP_NATIVE` pod networking, where pods have private IPs from private subnets and no public IPs.
  - **NAT Gateway**: Configure a NAT Gateway (NGW) for any private subnet hosting Smart Karpenter controller pods or system components that require external connectivity.
  - **Route Table**: The subnet route table must include a default route (0.0.0.0/0 → NAT Gateway).
  - **VCN Association**: The NAT Gateway must reside in the same Virtual Cloud Network (VCN) as the subnet.
  - **Security Rules**: Ensure that security lists or Network Security Groups (NSGs) allow outbound HTTPS (TCP 443) traffic to OCI service endpoints.
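The route-table requirement can be checked mechanically. The sketch below is an illustration only; the dictionary shape is an assumption loosely modeled on the OCI SDK's `RouteRule` fields, and the OCIDs are placeholders:

```python
# Check that a subnet's route rules include a default route (0.0.0.0/0)
# whose target is a NAT Gateway.
def has_nat_default_route(route_rules: list[dict]) -> bool:
    for rule in route_rules:
        dest_ok = rule.get("destination") == "0.0.0.0/0"
        # NAT Gateway OCIDs carry "natgateway" as their resource type.
        target_ok = "natgateway" in rule.get("network_entity_id", "")
        if dest_ok and target_ok:
            return True
    return False

# Placeholder rules resembling what the OCI API returns:
rules = [
    {"destination": "10.0.0.0/16", "network_entity_id": "ocid1.localpeeringgateway.oc1..aaaa"},
    {"destination": "0.0.0.0/0", "network_entity_id": "ocid1.natgateway.oc1..bbbb"},
]
print(has_nat_default_route(rules))  # True
```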
## Install Smart Karpenter
- Add the repository using the following commands:

  ```shell
  helm repo add smartscaler https://smartscaler.nexus.aveshalabs.io/repository/smartscaler-helm-ent-prod
  helm repo update
  ```

- To view the Smart Karpenter charts, use the following command:

  ```shell
  helm search repo avesha-karpenter
  ```
- Retrieve the `values.yaml` file from the repository you added using the following command:

  ```shell
  helm show values smartscaler/avesha-karpenter > values.yaml
  ```

- In the `values.yaml` file:

  - Change the value of `CLUSTER_NAME` to your OKE cluster name, as shown below:

    ```yaml
    name: CLUSTER_NAME
    value: "<your OKE cluster name>"
    ```

  - Add the license you received from Avesha, as shown below:

    ```yaml
    license:
      name: "<name you received from Avesha>"
      license: "<License you received from Avesha>"
      licensekey: "<License key you received from Avesha>"
    ```

- Install Smart Karpenter using the modified `values.yaml` file with the following command:

  ```shell
  helm install karpenter smartscaler/avesha-karpenter -f values.yaml --namespace smart-scaler --create-namespace
  ```
- Create a new file to define the Oracle node class using the following YAML:

  :::info
  For more information, see Understand OCI Features.
  :::

  ```yaml
  apiVersion: karpenter.multicloud.sh/v1alpha1
  kind: OciNodeClass
  metadata:
    name: ocinodeclass
  spec:
    #ImageOCID: <image.ocid> # One OKE NodePool for CPU workloads and one for GPU
    #BootVolumeSizeGB: <size in GB>
    #SubnetOCID: <subnet OCID>
    #NetworkSgOCID: <nsgOCid1,nsgOCid2>
    #PodsSubnetOCID: <PODs subnet OCID> # For OCI VCN-Native pod networking
    #PodsNetworkSgOCIDs: <PODs nsgOCid1,PODs nsgOCid2> # For OCI VCN-Native pod networking
    #SSHKeys: "ssh-rsa ********"
  ```

  A valid `OciNodeClass` is mandatory and must be referenced by every NodePool. The same class can be used across multiple NodePools. All parameters are optional. Except for `SSHKeys`, the parameters in the `OciNodeClass` object overwrite values from the OCI NodePools.
- Apply the `OciNodeClass` that you just created using the following command:

  ```shell
  kubectl apply -f <Name of the OciNodeClass>.yaml --namespace smart-scaler
  ```
- Create a new file to deploy at least one NodePool on the OKE cluster, as shown in the following example:

  :::info
  For more information, see NodePool Examples. For GPU workloads, the successfully tested OCI images and the image to avoid are listed under the GPU NodePool example.
  :::

  :::note
  - To exclude a NodePool from Smart Karpenter, tag it with `karpenter=false`.
  - The OKE NodePools for Smart Karpenter may have **Node count** set to `0`.
  :::

  ```yaml
  apiVersion: karpenter.sh/v1
  kind: NodePool
  metadata:
    name: preemptible
  spec:
    template:
      spec:
        requirements:
          - key: kubernetes.io/arch
            operator: In
            values: ["amd64"]
          - key: kubernetes.io/os
            operator: In
            values: ["linux"]
          - key: karpenter.sh/capacity-type
            operator: In
            values: ["spot"]
          - key: "karpenter.oci.sh/instance-family"
            operator: In
            values: ["E3","E4"]
        nodeClassRef:
          name: ocinodeclass
          kind: OciNodeClass
          group: karpenter.multicloud.sh
        expireAfter: 720h # 30 * 24h = 720h
    limits:
      cpu: 20
      nvidia.com/gpu: 0
    disruption:
      consolidationPolicy: WhenEmptyOrUnderutilized
      consolidateAfter: 10m
    weight: 2
  ```
- Apply the NodePool that you created using the following command:

  ```shell
  kubectl apply -f <Name of the NodePool>.yaml --namespace smart-scaler
  ```

- To view the OciNodeClass and NodePools, use the following commands:

  ```shell
  kubectl --namespace smart-scaler get ocinodeclass
  kubectl --namespace smart-scaler get nodepools
  ```

- To view the generated NodeClaims (created from the OciNodeClass and NodePools), use the following command:

  ```shell
  kubectl --namespace smart-scaler get nodeclaims
  ```
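The `requirements` block in a NodePool follows Karpenter's node-selector semantics: a candidate instance type is eligible only if its labels satisfy every requirement. A minimal sketch of the `In` operator only (illustrative; the real scheduler supports more operators and the label values here are examples, not a full OCI catalog):

```python
# Evaluate Karpenter-style "In" requirements against an instance type's labels.
def matches(labels: dict, requirements: list[dict]) -> bool:
    """True only if every requirement's key maps to one of its allowed values."""
    return all(labels.get(r["key"]) in r["values"] for r in requirements)

# Requirements mirroring the preemptible NodePool above.
requirements = [
    {"key": "kubernetes.io/arch", "values": ["amd64"]},
    {"key": "karpenter.sh/capacity-type", "values": ["spot"]},
    {"key": "karpenter.oci.sh/instance-family", "values": ["E3", "E4"]},
]

e4_spot = {
    "kubernetes.io/arch": "amd64",
    "karpenter.sh/capacity-type": "spot",
    "karpenter.oci.sh/instance-family": "E4",
}
e5_on_demand = {
    "kubernetes.io/arch": "amd64",
    "karpenter.sh/capacity-type": "on-demand",
    "karpenter.oci.sh/instance-family": "E5",
}
print(matches(e4_spot, requirements))       # True
print(matches(e5_on_demand, requirements))  # False
```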
## NodePool Examples

### Preemptible (Spot) NodePool

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: preemptible
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: "karpenter.oci.sh/instance-family"
          operator: In
          values: ["E3","E4"]
      nodeClassRef:
        name: ocinodeclass
        kind: OciNodeClass
        group: karpenter.multicloud.sh
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 20
    nvidia.com/gpu: 0
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m
  weight: 2
```
### On-Demand NodePool

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: "karpenter.oci.sh/instance-cpu"
          operator: In
          values: ["6", "8"]
        - key: "karpenter.oci.sh/instance-memory"
          operator: In
          values: ["16","32"]
      nodeClassRef:
        name: ocinodeclass
        kind: OciNodeClass
        group: karpenter.multicloud.sh
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 10
    nvidia.com/gpu: 0
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m
  weight: 2
```
### GPU NodePool

Smart Karpenter has been successfully tested with the following OCI images in the US-Ashburn-1 region:

For GPU workloads, avoid using the `oracle-linux-8.10-gen2-gpu-2025.06.17-0-oke-1.31.10-878` image, as it may cause compatibility issues.
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: "karpenter.oci.sh/instance-family"
          operator: In
          values: ["A10"] # A10, GPU2 or GPU3
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
      nodeClassRef:
        name: ocinodeclass
        kind: OciNodeClass
        group: karpenter.multicloud.sh
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 200
    nvidia.com/gpu: 10
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m
  weight: 1
```
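The `nvidia.com/gpu` taint in this NodePool keeps non-GPU pods off the provisioned nodes, so GPU workloads must carry a matching toleration. A minimal pod sketch under that assumption; the pod name, image, and resource request are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload            # placeholder name
spec:
  tolerations:
    - key: nvidia.com/gpu       # matches the NodePool taint above
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: cuda
      image: <your GPU image>   # placeholder
      resources:
        limits:
          nvidia.com/gpu: 1     # request one GPU from the device plugin
```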
### Bare Metal GPU NodePool

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-bms
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["bms"]
        - key: "karpenter.oci.sh/instance-family"
          operator: In
          values: ["A10"]
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
      nodeClassRef:
        name: ocinodeclass
        kind: OciNodeClass
        group: karpenter.multicloud.sh
  limits:
    cpu: 200
    nvidia.com/gpu: 4
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m
  weight: 10
```
## Delete Smart Karpenter

- To delete all NodePools that Smart Karpenter deployed, use the following command:

  ```shell
  kubectl delete nodepools --all
  ```

- To delete the OciNodeClass that Smart Karpenter deployed, use the following command:

  ```shell
  kubectl delete ocinodeclass <Name of the OciNodeClass> --namespace smart-scaler
  ```

- To remove Smart Karpenter and its associated CRDs, use the following command:

  ```shell
  helm uninstall karpenter --namespace smart-scaler
  ```
## Understand OCI Features

### Subnet Selection
Smart Karpenter reads the subnet configuration of OKE-managed nodes and uses the same subnet for provisioning additional nodes.
### Node Placement and Availability Domains
Smart Karpenter reads the availability domains from OKE NodePool definitions and randomly selects one for newly created nodes.
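The random selection can be pictured as follows (an illustrative sketch, not Smart Karpenter's actual code; the AD names are placeholders):

```python
import random

# Pick an availability domain for a new node from the ADs configured on
# the OKE NodePool definition.
def pick_availability_domain(ads: list[str]) -> str:
    if not ads:
        raise ValueError("NodePool defines no availability domains")
    return random.choice(ads)

ads = ["US-ASHBURN-AD-1", "US-ASHBURN-AD-2", "US-ASHBURN-AD-3"]
print(pick_availability_domain(ads))  # one of the three ADs, chosen at random
```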
### Preemptible (Spot) VM Support

Oracle Preemptible Instances are available at 50% lower cost than standard VMs, but Oracle can terminate them at any time with two minutes' notice.

Smart Karpenter allows users to enable or disable preemptible VMs at the NodePool level.

For more information on preemptible VM shapes, see the OCI documentation.
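As a rough illustration of the 50% discount, the sketch below compares a month of on-demand versus preemptible compute. The hourly rate is a made-up placeholder, not an actual OCI price:

```python
# Toy comparison of on-demand vs. preemptible monthly compute cost.
ON_DEMAND_PER_OCPU_HOUR = 0.05   # assumed $/OCPU-hour, placeholder only
PREEMPTIBLE_DISCOUNT = 0.50      # preemptible instances cost 50% less

def monthly_cost(ocpus: int, hours: int = 720, preemptible: bool = False) -> float:
    """Cost of running `ocpus` OCPUs for `hours` hours (default: one 30-day month)."""
    rate = ON_DEMAND_PER_OCPU_HOUR
    if preemptible:
        rate *= 1 - PREEMPTIBLE_DISCOUNT
    return ocpus * hours * rate

print(f"on-demand:   ${monthly_cost(20):.2f}")                    # $720.00
print(f"preemptible: ${monthly_cost(20, preemptible=True):.2f}")  # $360.00
```

This is why interruption-tolerant workloads are the natural fit for the preemptible NodePool: the savings come with the two-minute termination notice described above.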