Configure Karpenter on Oracle Cloud
Karpenter for OCI provisions self-managed nodes to give the OKE cluster enough capacity to run all of its pods. This means that all requirements for OCI self-managed nodes also apply to Karpenter.
Procure the Karpenter License
The following are the steps to procure a Karpenter license.
- Contact Avesha Sales at sales@avesha.io to obtain a Karpenter license.
- License options:
  - A trial license is available for evaluation purposes and is valid for a limited time.
  - For production deployments, a commercial license must be purchased.
- Provide the OCID (Oracle Cloud Identifier) of the cluster to be licensed.
- To set resource limits, specify the maximum number of vCPUs that Karpenter is allowed to provision in the cluster.
You must provide the cluster OCID for each cluster you intend to license; Avesha issues licenses on a per-cluster basis. After processing your request, Avesha shares the license details with you.
Prerequisites
The following section describes the prerequisites for running Karpenter on Oracle Cloud.
OCI Requirements
- Ensure that the OKE cluster is an enhanced cluster and uses the Flannel CNI plugin for pod networking. Karpenter also works on Cilium OKE clusters.
- The OKE cluster must have a dynamic group and associated policies that allow nodes to join the cluster. Ensure that the following policies are configured:
Allow dynamic-group <dynamic-group-name> to {CLUSTER_JOIN} in compartment <compartment-name>
Allow dynamic-group <dynamic-group-name> to manage cluster-node-pools in compartment <compartment-name>
Allow dynamic-group <dynamic-group-name> to manage instance-family in compartment <compartment-name>
Allow dynamic-group <dynamic-group-name> to use subnets in compartment <compartment-name>
Allow dynamic-group <dynamic-group-name> to read virtual-network-family in compartment <compartment-name>
Allow dynamic-group <dynamic-group-name> to use vnics in compartment <compartment-name>
Allow dynamic-group <dynamic-group-name> to inspect compartments in compartment <compartment-name>
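The dynamic group itself must match the compute instances that Karpenter launches. As a minimal sketch, a matching rule that covers all instances in the cluster's compartment could look like the following (the compartment OCID is a placeholder):
Any {instance.compartment.id = 'ocid1.compartment.oc1..<unique-id>'}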
Supported Network Types
- Flannel
- Cilium (Flannel-based OKE network converted to Cilium)
- VCN-Native (supported starting from karpenter-oci v0.0.13)
Supported Oracle VM Shapes
Currently, the following VM shapes are supported:
- VM.Standard.E3.Flex
- VM.Standard.E4.Flex
- VM.Standard.E5.Flex
- VM.GPU2.*
- VM.GPU3.*
- VM.GPU.A10.*
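These shapes are selected through the karpenter.oci.sh/instance-family requirement key shown in the NodePool examples later in this section. As a sketch, a requirement restricting a NodePool to the E4 and E5 flex shapes (the "E5" value is an assumption inferred from the supported-shape list above):
- key: "karpenter.oci.sh/instance-family"
  operator: In
  values: ["E4", "E5"]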
Configure OCI-Specific Features
The following configurations are derived from existing OKE NodePools:
- Node Subnet
- Node Image (one OKE NodePool for CPU workloads and one for GPU workloads)
- Node Placement
- Node Network Security Group
- Pod Subnet (for OCI VCN-Native Pod Networking)
- Pod Network Security Groups (for OCI VCN-Native Pod Networking)
- To exclude an OKE NodePool from Karpenter, tag it with karpenter=false (see the tagging sketch after this list).
- The OKE NodePools used by Karpenter may have their node count set to 0.
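One way to apply the karpenter=false tag is as a freeform tag through the OCI CLI. This is a sketch with a placeholder node pool OCID, assuming your OCI CLI version supports freeform tags on node pools; note that --freeform-tags replaces the node pool's existing freeform tags, so include any tags you want to keep:
# Tag an OKE node pool so Karpenter ignores it (placeholder OCID).
oci ce node-pool update \
  --node-pool-id ocid1.nodepool.oc1..<unique-id> \
  --freeform-tags '{"karpenter": "false"}'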
Subnet Selection
Karpenter reads the subnet configuration of OKE-managed nodes and uses the same subnet for provisioning additional nodes.
Node Placement and Availability Domains
Karpenter reads the availability domains from OKE NodePool definitions and randomly selects one for newly created nodes.
Preemptible (Spot) VM Support
Oracle Preemptible Instances cost 50% less than standard VMs, but Oracle can terminate them at any time with two minutes' notice.
Karpenter allows users to enable or disable Preemptible VMs at the NodePool level.
For more information on Preemptible VM shapes, see the OCI documentation.
Oracle Node Class Definition
Karpenter on OCI uses Avesha's karpenter-multicloud platform, which is based on karpenter-core version 1.0.
OciNodeClass
apiVersion: karpenter.multicloud.sh/v1alpha1
kind: OciNodeClass
metadata:
  name: ocinodeclass1
spec:
  #ImageOCID: <image.ocid>
  #BootVolumeSizeGB: <size in GB>
  #SubnetOCID: <subnet OCID>
  #NetworkSgOCID: <nsgOCid1,nsgOCid2>
  #PodsSubnetOCID: <PODs subnet OCID> #For OCI VCN-Native Pod Networking
  #PodsNetworkSgOCIDs: <PODs nsgOCid1,PODs nsgOCid2> #For OCI VCN-Native Pod Networking
  #SSHKeys: "ssh-rsa ********"
A valid OciNodeClass is mandatory and must be referenced by every NodePool; the same class can be shared across multiple NodePools. All spec fields are optional. Except for SSHKeys, any field set in the OciNodeClass overrides the corresponding value derived from the OKE NodePools.
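For illustration, a minimal OciNodeClass that only overrides the boot volume size and injects an SSH key, leaving everything else to be derived from the OKE NodePools (the values shown are placeholders):
apiVersion: karpenter.multicloud.sh/v1alpha1
kind: OciNodeClass
metadata:
  name: ocinodeclass1
spec:
  BootVolumeSizeGB: 100
  SSHKeys: "ssh-rsa AAAA... admin@example.com"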
Install Karpenter
You must have helm and kubectl configured for the target cluster.
- Specify the OCID received as part of the Karpenter license when you configure Karpenter on the OKE cluster.
- Add the repository using the following commands:
  helm repo add smartscaler https://smartscaler.nexus.aveshalabs.io/repository/smartscaler-helm-ent-prod
  helm repo update
- To see the Smart Karpenter charts, use the following command:
  helm search repo avesha-karpenter
- Retrieve the values.yaml file from the repository you added using the following command:
  helm show values smartscaler/avesha-karpenter > values.yaml
  Modify the values.yaml file as required.
- Install Karpenter using the modified values.yaml file with the following command:
  helm install karpenter smartscaler/avesha-karpenter -f values.yaml --namespace smart-scaler --create-namespace
- Deploy at least one NodePool on the OCI cluster. For more information, see NodePool Examples.
- To see NodePools and NodeClaims, use the following commands:
  kubectl -n smart-scaler get nodepools
  kubectl -n smart-scaler get nodeclaims
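After installation, a quick way to confirm that the Karpenter controller is running (assuming the release name and namespace used in the commands above):
kubectl -n smart-scaler get pods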
NodePool Examples
Preemptible (Spot) NodePool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: preemptible
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: "karpenter.oci.sh/instance-family"
          operator: In
          values: ["E3", "E4"]
      nodeClassRef:
        name: ocinodeclass1
        kind: OciNodeClass
        group: karpenter.multicloud.sh
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 20
    nvidia.com/gpu: 0
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m
  weight: 2
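To exercise this NodePool, a test deployment that explicitly requests spot capacity can trigger a scale-up. This is a sketch; the inflate name, pause image, and CPU request are illustrative, not part of the product:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot # matches the capacity-type requirement above
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9 # illustrative workload
          resources:
            requests:
              cpu: "1"
Scaling the deployment back to zero leaves the new nodes empty, so the consolidation policy above removes them once consolidateAfter elapses.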
On-Demand NodePool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: "karpenter.oci.sh/instance-cpu"
          operator: In
          values: ["6", "8"]
        - key: "karpenter.oci.sh/instance-memory"
          operator: In
          values: ["16", "32"]
      nodeClassRef:
        name: ocinodeclass1
        kind: OciNodeClass
        group: karpenter.multicloud.sh
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 10
    nvidia.com/gpu: 0
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m
  weight: 2
GPU NodePool
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: "karpenter.oci.sh/instance-family"
          operator: In
          values: ["A10"] # A10, GPU2, or GPU3
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
      nodeClassRef:
        name: ocinodeclass1
        kind: OciNodeClass
        group: karpenter.multicloud.sh
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 200
    nvidia.com/gpu: 10
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m
  weight: 1
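Pods that should land on these nodes must tolerate the nvidia.com/gpu taint and request GPU resources. A minimal smoke-test sketch; the CUDA image is an assumption, so substitute your own workload image:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  tolerations:
    - key: nvidia.com/gpu # matches the taint defined in the NodePool above
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04 # illustrative image (assumption)
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1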
Delete Karpenter and NodePools
To delete all nodes that Karpenter deployed, use the following command:
kubectl delete nodepools --all
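Before removing the chart, you can verify that the NodeClaims, and with them the provisioned nodes, are gone (using the same namespace convention as above):
kubectl -n smart-scaler get nodeclaims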
To remove Karpenter and associated CRDs, use the following command:
helm delete karpenter --namespace smart-scaler