Configure Dynamic Node Allocation for GPRs
This topic describes how to enable and use dynamic node allocation for GPU provisioning requests (GPRs) in EGS. Dynamic node allocation lets EGS automatically scale GPU node pools on supported cloud providers based on demand for GPU resources.
Currently, dynamic node allocation is only supported for clusters using Linode LKE. Support for additional cloud providers will be added in future releases.
Overview
Dynamic Node is an EGS feature that automatically scales GPU node pools on supported cloud providers. When it is enabled, EGS can create new node pools, or add nodes to existing pools, when GPU demand cannot be met by the current inventory. It can also remove idle nodes to reduce costs.
Actions performed by the dynamic node allocation feature include:
- Scale-up: When a new GPU request is submitted and the current cluster does not have enough available capacity, the system automatically provisions additional compute resources so that the request can be fulfilled.
- Scale-down: When GPU resources remain unused for a certain period of time, the system automatically deallocates them to optimize infrastructure costs.
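To make the two rules concrete, here is a minimal shell sketch of the decision they describe. It is not EGS code: the function name scaling_action, its inputs, and the 30-minute idle limit (IDLE_LIMIT_MIN) are hypothetical, because this guide does not state the actual idle period or how EGS measures demand.

```shell
#!/bin/sh
# Hypothetical illustration of the scale-up / scale-down rules described above.
# Not EGS code: the idle limit and the inputs are assumptions for this sketch.
IDLE_LIMIT_MIN="${IDLE_LIMIT_MIN:-30}"   # assumed idle period, in minutes

# Usage: scaling_action <pending_gpus> <free_gpus> <idle_minutes>
#   pending_gpus  GPUs requested by GPRs that are still waiting
#   free_gpus     GPUs currently unallocated in the cluster
#   idle_minutes  how long the free GPUs have been unused
# Prints one of: scale-up, scale-down, none.
scaling_action() {
  local pending="$1" free="$2" idle="$3"
  if [ "$pending" -gt "$free" ]; then
    echo "scale-up"      # demand exceeds current capacity
  elif [ "$pending" -eq 0 ] && [ "$free" -gt 0 ] && [ "$idle" -ge "$IDLE_LIMIT_MIN" ]; then
    echo "scale-down"    # nothing waiting and spare GPUs idle long enough
  else
    echo "none"
  fi
}
```

For example, scaling_action 4 2 0 prints scale-up, and scaling_action 0 2 45 prints scale-down with the assumed 30-minute limit.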
Configure an API Token for Linode LKE
For clusters using Linode LKE as the cloud provider, a Linode API token is required to enable dynamic node allocation. The GPR Manager uses this token to interact with the Linode API for scaling operations:
- To scale up, the controller creates new node pools or adds nodes through the Linode API when there is demand for GPU resources.
- To scale down, the controller deletes node pools for idle nodes using the Linode API.
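For orientation, the shell functions below show the kind of Linode API v4 node-pool calls that scale-up and scale-down correspond to: create a pool, resize a pool, and delete a pool. This guide does not document which calls the GPR Manager actually makes, so treat this as a sketch under that assumption. The helper names are hypothetical, and calling them against a real cluster creates or deletes billable nodes. The token is read from LINODE_TOKEN, the same variable the GPR Manager uses.

```shell
#!/bin/sh
# Sketch of Linode API v4 LKE node-pool calls (hypothetical helper names).
# Requires LINODE_TOKEN in the environment. These calls change real resources.
LINODE_API="https://api.linode.com/v4"

# Scale-up by adding a pool: lke_create_pool <cluster_id> <plan_type> <count>
lke_create_pool() {
  curl -s -X POST \
    -H "Authorization: Bearer ${LINODE_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"type\": \"$2\", \"count\": $3}" \
    "${LINODE_API}/lke/clusters/$1/pools"
}

# Scale-up by adding nodes to a pool: lke_resize_pool <cluster_id> <pool_id> <count>
lke_resize_pool() {
  curl -s -X PUT \
    -H "Authorization: Bearer ${LINODE_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "{\"count\": $3}" \
    "${LINODE_API}/lke/clusters/$1/pools/$2"
}

# Scale-down by removing a pool and its nodes: lke_delete_pool <cluster_id> <pool_id>
lke_delete_pool() {
  curl -s -X DELETE \
    -H "Authorization: Bearer ${LINODE_TOKEN}" \
    "${LINODE_API}/lke/clusters/$1/pools/$2"
}
```

The plan type is a Linode plan ID such as g1-gpu-rtx6000-1. Check the IDs currently offered (GET /v4/linode/types) rather than relying on any specific value.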
The GPR Manager authenticates with a Linode Personal Access Token, which it reads from the LINODE_TOKEN environment variable of the GPR Manager deployment. If the variable is not set, scaling operations fail (the logs may show: LINODE_TOKEN environment variable is not set).
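To check for this failure from the command line, the sketch below looks for LINODE_TOKEN in the GPR Manager Deployment's pod template and searches recent logs for the error message above. The function names are hypothetical, and you must pass the Deployment name yourself because this guide does not name it.

```shell
#!/bin/sh
# Hypothetical helpers; pass the GPR Manager namespace and Deployment name.

# Succeeds if any container in the Deployment's pod template defines LINODE_TOKEN.
# Usage: gpr_has_linode_token <namespace> <deployment>
gpr_has_linode_token() {
  kubectl -n "$1" get deployment "$2" \
    -o jsonpath='{.spec.template.spec.containers[*].env[*].name}' \
    | tr ' ' '\n' | grep -qx 'LINODE_TOKEN'
}

# Succeeds if the last 500 log lines contain the missing-token error.
# Usage: gpr_logs_missing_token <namespace> <deployment>
gpr_logs_missing_token() {
  kubectl -n "$1" logs "deployment/$2" --tail=500 \
    | grep -qF 'LINODE_TOKEN environment variable is not set'
}
```

The first check only confirms that the variable is declared. It does not confirm that the referenced Secret exists or holds a valid token.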
Create a Linode API Token
To create a Linode API token:
- Log in to the Linode Cloud Manager.
- Click your username (top right) → API Tokens.
- Click Create a Personal Access Token.
- Configure the token:
  - Label: a descriptive name, for example, gpr-manager-lke.
  - Expiry: choose a duration (for example, 90 days or 1 year).
  - Access: grant Read/Write access to Linodes and Kubernetes (LKE).
- Click Create Token.
- Copy the token immediately and store it securely. It is shown only once.
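Before storing the token, you can optionally confirm that the Linode API accepts it. This sketch (the helper name is hypothetical) calls GET /v4/lke/clusters with the token from LINODE_TOKEN and succeeds only on HTTP 200. It verifies read access to LKE only; it does not prove the Read/Write scopes that scaling needs.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if LINODE_TOKEN can list LKE clusters.
# Checks read access only; write scopes are not exercised.
linode_token_can_list_lke() {
  local status
  if [ -z "${LINODE_TOKEN:-}" ]; then
    echo "LINODE_TOKEN is not set" >&2
    return 2
  fi
  # Discard the response body and keep only the HTTP status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${LINODE_TOKEN}" \
    https://api.linode.com/v4/lke/clusters)
  [ "$status" = "200" ]
}
```

For example, export LINODE_TOKEN in your shell and run linode_token_can_list_lke && echo "token OK".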
Use the Token in EGS
To configure the GPR Manager with the Linode API token:
- Create a Kubernetes Secret to store the token securely:

```shell
kubectl create secret generic linode-api-token --from-literal=token='YOUR_LINODE_TOKEN' -n <kubeslice-controller>
```

- Update the GPR Manager deployment to reference the token from the Secret:

```yaml
env:
  - name: LINODE_TOKEN
    valueFrom:
      secretKeyRef:
        name: linode-api-token
        key: token
```

Replace YOUR_LINODE_TOKEN and <kubeslice-controller> with your actual token and the namespace where the GPR Manager is deployed.
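As an alternative to editing the deployment YAML by hand, kubectl set env can add the same secretKeyRef entry. With --from=secret/linode-api-token, kubectl creates one variable per Secret key (the key token becomes TOKEN), and --prefix=LINODE_ turns that into LINODE_TOKEN. The helper name is hypothetical, and you must supply the GPR Manager Deployment name because this guide does not give it.

```shell
#!/bin/sh
# Hypothetical helper: add LINODE_TOKEN to the GPR Manager Deployment from the
# linode-api-token Secret, then wait for the resulting rollout to finish.
# Usage: wire_linode_token <namespace> <deployment>
wire_linode_token() {
  kubectl -n "$1" set env "deployment/$2" \
    --from=secret/linode-api-token --prefix=LINODE_ || return
  kubectl -n "$1" rollout status "deployment/$2"
}
```

Changing the pod template makes Kubernetes roll out new GPR Manager pods automatically. If the Deployment has more than one container, add -c <container-name> to the set env command so that only the GPR Manager container receives the variable.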
Enable Dynamic Node Pool Allocation
Dynamic Node is controlled by the ENABLE_DYNAMIC_NODE environment variable in the GPR Manager deployment. This variable is set to true by default in the deployment, so dynamic node allocation is enabled unless you change it.
Apart from the Linode API token, no additional configuration is required. When dynamic node allocation is enabled, EGS automatically manages GPU node pools based on GPU requests: when a GPU request is submitted and there are insufficient resources, EGS provisions additional nodes to fulfill it.
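To see which value your GPR Manager deployment currently uses, the sketch below reads ENABLE_DYNAMIC_NODE from the Deployment's pod template with a kubectl JSONPath filter. The helper name is hypothetical, and the Deployment name is yours to supply.

```shell
#!/bin/sh
# Hypothetical helper: print the literal ENABLE_DYNAMIC_NODE value from the
# GPR Manager Deployment's pod template, or "<unset>" if none is found.
# A value supplied through valueFrom (not a literal value) also shows as "<unset>".
# Usage: dynamic_node_setting <namespace> <deployment>
dynamic_node_setting() {
  local value
  value=$(kubectl -n "$1" get deployment "$2" \
    -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="ENABLE_DYNAMIC_NODE")].value}')
  echo "${value:-<unset>}"
}
```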
Disable Dynamic Node Pool Allocation
To disable dynamic node allocation, set ENABLE_DYNAMIC_NODE to false (or remove the variable) in the GPR Manager deployment, and then restart the GPR Manager pods.
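The disable steps can be scripted with standard kubectl commands. In this sketch the helper name is hypothetical and the Deployment name is a value you supply. It sets the variable to false, restarts the pods as this guide instructs, and waits for the new pods to become ready.

```shell
#!/bin/sh
# Hypothetical helper: set ENABLE_DYNAMIC_NODE=false on the GPR Manager
# Deployment, restart its pods, and wait for the rollout to complete.
# Stops at the first failing kubectl command.
# Usage: disable_dynamic_node <namespace> <deployment>
disable_dynamic_node() {
  kubectl -n "$1" set env "deployment/$2" ENABLE_DYNAMIC_NODE=false || return
  kubectl -n "$1" rollout restart "deployment/$2" || return
  kubectl -n "$1" rollout status "deployment/$2"
}
```

When the value actually changes, set env already triggers a rollout, so the explicit restart mainly covers the case where the variable was already false. To re-enable the feature, run the same commands with ENABLE_DYNAMIC_NODE=true.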