Version: 1.17.0

Configure Dynamic Node Allocation for GPRs

This topic describes how to enable and use dynamic node allocation for GPU provisioning requests (GPRs) in EGS. Dynamic node allocation allows the system to automatically scale GPU node pools on supported cloud providers, such as Linode LKE, based on the demand for GPU resources.

Info: Dynamic node allocation is currently supported only for clusters that use Linode LKE. Support for additional cloud providers will be added in future releases.

Overview

Dynamic Node is a feature in EGS that enables automatic scaling of GPU node pools on supported cloud providers. When enabled, the system automatically creates new node pools or adds nodes to existing pools when demand for GPU resources exceeds the current inventory. The system can also remove idle nodes automatically to optimize costs.

Actions performed by the dynamic node allocation feature include:

  • Scale-up: When a new GPU request is submitted and the current cluster does not have enough available capacity, the system automatically provisions additional compute resources so the request can be fulfilled.

  • Scale-down: When GPU resources remain unused for a certain period of time, the system automatically deallocates those resources to optimize infrastructure costs.
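The two actions above can be read as a simple decision rule. The following is only an illustrative sketch, not the EGS implementation: the function name, its arguments, and the idle threshold are all hypothetical, and the real controller may weigh other factors.

```shell
# Hypothetical sketch of the scale-up / scale-down decision described above.
# Arguments (illustrative only, not EGS parameters):
#   $1 requested GPUs   $2 available GPUs
#   $3 seconds idle     $4 idle threshold in seconds
decide_scaling_action() (
  requested=$1
  available=$2
  idle_seconds=$3
  idle_threshold=$4

  if [ "$requested" -gt "$available" ]; then
    # Demand exceeds free capacity: more nodes are needed.
    echo "scale-up"
  elif [ "$requested" -eq 0 ] && [ "$idle_seconds" -ge "$idle_threshold" ]; then
    # Nothing is requested and the resources have been idle long enough.
    echo "scale-down"
  else
    echo "no-op"
  fi
)

# Usage: decide_scaling_action 4 2 0 600
```

In this sketch, a request for 4 GPUs with only 2 available yields scale-up, while no pending requests and an idle time past the threshold yields scale-down.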

Configure an API Token for Linode LKE

For clusters using Linode LKE as the cloud provider, a Linode API token is required to enable dynamic node allocation. The GPR Manager uses this token to interact with the Linode API for scaling operations:

  • To scale up, the controller creates new node pools or adds nodes through the Linode API when there is demand for GPU resources.
  • To scale down, the controller deletes node pools for idle nodes using the Linode API.

The GPR Manager authenticates with a Linode Personal Access Token, which it reads from the LINODE_TOKEN environment variable in its deployment. If the token is not set, scaling operations fail, and the logs may show: LINODE_TOKEN environment variable is not set.
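One way to check for the missing-token error is to search the GPR Manager logs for the message quoted above. This is a minimal sketch: the namespace default and the deployment name gpr-manager are assumptions, so replace them with the values from your installation.

```shell
# Placeholders: adjust both values to match your installation.
NAMESPACE="${NAMESPACE:-kubeslice-controller}"                   # assumed namespace
GPR_MANAGER_DEPLOYMENT="${GPR_MANAGER_DEPLOYMENT:-gpr-manager}"  # hypothetical name

# Print GPR Manager log lines that report the missing-token error.
# Produces no output when the error does not appear in the logs.
find_missing_token_errors() (
  kubectl logs "deployment/${GPR_MANAGER_DEPLOYMENT}" -n "${NAMESPACE}" \
    | grep "LINODE_TOKEN environment variable is not set"
)

# Usage: find_missing_token_errors
```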

Create a Linode API Token

To create a Linode API token:

  1. Log in to the Linode Cloud Manager.

  2. Click your username (top right) → API Tokens.

  3. Click Create a Personal Access Token.

  4. Configure:

    • Label: Enter a name for the token, for example, gpr-manager-lke.
    • Expiry: Choose a duration (for example, 90 days or 1 year).
    • Access: Grant read/write access for Linodes and Kubernetes/LKE.

  5. Click Create Token.

  6. Copy the token immediately and store it securely (it is shown only once).
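Before wiring the token into EGS, it can help to confirm that the Linode API accepts it. The sketch below calls the Linode API v4 endpoint that lists LKE clusters and prints only the HTTP status code. The function name is hypothetical. A 200 response suggests the token is valid and can read LKE resources, but it does not prove that write access was granted.

```shell
# Print the HTTP status code returned when listing LKE clusters with a token.
#   $1  the Linode Personal Access Token to check
# 200 suggests a valid token with LKE read access; 401 suggests an invalid
# or expired token.
linode_token_http_status() (
  token=$1
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${token}" \
    https://api.linode.com/v4/lke/clusters
)

# Usage: linode_token_http_status "YOUR_LINODE_TOKEN"
```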

Use the Token in EGS

To configure the GPR Manager with the Linode API token:

  1. Create a Kubernetes Secret to store the token securely:

    kubectl create secret generic linode-api-token --from-literal=token='YOUR_LINODE_TOKEN' -n <kubeslice-controller>

  2. Update the GPR Manager deployment to reference the token from the Secret:

    env:
      - name: LINODE_TOKEN
        valueFrom:
          secretKeyRef:
            name: linode-api-token
            key: token

    Replace YOUR_LINODE_TOKEN and <kubeslice-controller> with your actual token and the namespace where the GPR Manager is deployed.
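After applying both steps, the configuration can be checked with kubectl. This is a sketch under assumptions: the namespace default and the deployment name gpr-manager are placeholders, and the helper function names are illustrative.

```shell
# Placeholders: adjust both values to match your installation.
NAMESPACE="${NAMESPACE:-kubeslice-controller}"                   # assumed namespace
GPR_MANAGER_DEPLOYMENT="${GPR_MANAGER_DEPLOYMENT:-gpr-manager}"  # hypothetical name

# Succeed only if the linode-api-token Secret has a non-empty "token" key.
secret_has_token_key() (
  data=$(kubectl get secret linode-api-token -n "${NAMESPACE}" \
    -o jsonpath='{.data.token}')
  [ -n "$data" ]
)

# Print the name of the Secret that LINODE_TOKEN is read from.
# Prints nothing if the deployment does not reference a Secret for it.
linode_token_secret_name() (
  kubectl get deployment "${GPR_MANAGER_DEPLOYMENT}" -n "${NAMESPACE}" \
    -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="LINODE_TOKEN")].valueFrom.secretKeyRef.name}'
)

# Usage:
#   secret_has_token_key && echo "Secret is populated"
#   linode_token_secret_name
```

If the second function prints linode-api-token, the deployment reads the token from the Secret created in step 1.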

Enable Dynamic Node Pool Allocation

Dynamic Node is controlled by the ENABLE_DYNAMIC_NODE environment variable in the GPR Manager deployment. This parameter is set to true by default, which means dynamic node allocation is enabled.

No additional configuration is required to enable dynamic node allocation. While it is enabled, EGS automatically manages GPU node pools based on GPU requests: when a request is submitted and the available resources are insufficient, EGS provisions additional nodes to fulfill it.
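To see what the deployment currently specifies for ENABLE_DYNAMIC_NODE, the variable can be read from the pod template. This is a minimal sketch: the namespace default and the deployment name gpr-manager are assumptions, and the function prints "unset" when the variable is not defined in the manifest rather than guessing the effective value.

```shell
# Placeholders: adjust both values to match your installation.
NAMESPACE="${NAMESPACE:-kubeslice-controller}"                   # assumed namespace
GPR_MANAGER_DEPLOYMENT="${GPR_MANAGER_DEPLOYMENT:-gpr-manager}"  # hypothetical name

# Print the value of ENABLE_DYNAMIC_NODE from the deployment's pod template,
# or "unset" if the variable is not defined there.
dynamic_node_setting() (
  value=$(kubectl get deployment "${GPR_MANAGER_DEPLOYMENT}" -n "${NAMESPACE}" \
    -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="ENABLE_DYNAMIC_NODE")].value}')
  echo "${value:-unset}"
)

# Usage: dynamic_node_setting
```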

Disable Dynamic Node Pool Allocation

To disable dynamic node allocation, set the ENABLE_DYNAMIC_NODE environment variable to false (or remove it) in the GPR Manager deployment, and then restart the GPR Manager pods.
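The change can be made with standard kubectl commands, as in the sketch below. The namespace default and the deployment name gpr-manager are assumptions. Note that kubectl set env already changes the pod template and so triggers a rollout on its own; the explicit restart is kept here only to mirror the step described above.

```shell
# Placeholders: adjust both values to match your installation.
NAMESPACE="${NAMESPACE:-kubeslice-controller}"                   # assumed namespace
GPR_MANAGER_DEPLOYMENT="${GPR_MANAGER_DEPLOYMENT:-gpr-manager}"  # hypothetical name

# Set ENABLE_DYNAMIC_NODE=false, restart the pods, and wait for the rollout.
disable_dynamic_node() (
  kubectl set env "deployment/${GPR_MANAGER_DEPLOYMENT}" -n "${NAMESPACE}" \
    ENABLE_DYNAMIC_NODE=false &&
  kubectl rollout restart "deployment/${GPR_MANAGER_DEPLOYMENT}" -n "${NAMESPACE}" &&
  kubectl rollout status "deployment/${GPR_MANAGER_DEPLOYMENT}" -n "${NAMESPACE}"
)

# Usage: disable_dynamic_node
```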