Version: 1.15.0

Configure Workload Placement

This topic describes how to enable automatic workload placement across clusters in a workspace using the Workload Placement feature in EGS.

Overview

Workload Placement enables elastic, cross-cluster scaling in EGS. It automatically deploys additional workload replicas to other clusters within the same workspace when the primary cluster runs out of GPU capacity.

A GPU Provision Request (GPR) secures baseline GPUs in the primary cluster. When demand increases through Horizontal Pod Autoscaler (HPA) actions or manual replica updates, and the additional pods cannot be scheduled on the primary cluster, Workload Placement evaluates the available clusters and bursts the incremental replicas to the optimal targets.

warning

When a scaled deployment bursts to a new cluster, it is created on that cluster through a fresh helm install; the replica count of the source cluster deployment is not mirrored there all at once.

Existing replicas remain on the primary cluster. Networking between replicas is seamless, and burst replicas are automatically removed when they are no longer required. Cluster selection considers resource availability, wait time, and policy or priority settings defined in GPR templates, ensuring compliance with governance policies.
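A standard Kubernetes HorizontalPodAutoscaler on the primary deployment is one common source of such scale events. The following is only a minimal sketch; the Deployment name, namespace, and thresholds are illustrative and are not part of the EGS configuration itself:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vllm-hpa # illustrative name
  namespace: vllm-demo # assumed workload namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vllm-deployment # hypothetical Deployment created by your workload
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu # CPU utilization used here as a simple trigger
      target:
        type: Utilization
        averageUtilization: 80

When the HPA raises the replica count beyond what the primary cluster's GPUs can accommodate, the additional replicas become candidates for bursting.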

Custom Resource Definitions

Workload Placement introduces two Custom Resource Definitions (CRDs) that operate together.

  • WorkloadTemplate: a reusable blueprint that defines what to deploy (Helm charts, manifests, or commands) and the deployment steps. It is prepared ahead of time.
  • WorkloadPlacement: an execution request that deploys the blueprint to a specific cluster. It is created automatically during scale events by Auto-GPR or manually by a user.

Together, these CRDs support automated, policy-driven workload bursting while still allowing manual, on-demand deployments when required.

The following table summarizes the differences between WorkloadTemplate and WorkloadPlacement:

| Dimension | WorkloadTemplate | WorkloadPlacement |
| --- | --- | --- |
| Purpose | Defines workload specifications | Executes workload deployments |
| Creation | Created manually by users | Created automatically by EGS during scaling events |
| Reusability | Can be reused across multiple deployments | Specific to a single deployment instance |
| Target Cluster | Not tied to any specific cluster | Deployed to a designated target cluster; spec.clusterName is a required parameter |
| Content | Contains deployment details (Helm: helmConfig, manifests: manifestResources, commands: cmdExec) and optional ordered steps, burstDuration, deletionPolicy, and gprTemplates | References a WorkloadTemplate and includes deployment parameters |
| Typical Trigger | HPA or manual scale-up that needs extra GPUs | Created by Auto-GPR, or triggered directly by a user |
| Lifecycle | Template object persists and can be reused | Has phases (Running/Succeeded/Failed/Completed); burst resources are cleaned up per policy |
| Customization | Highly customizable using parameters | Limited customization, focused on deployment |
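To confirm that both CRDs are installed on the controller cluster, you can list them. This is a generic check and assumes the CRD names are derived from the kinds shown above:

kubectl get crds | grep -Ei 'workloadtemplate|workloadplacement'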

Workflow

  1. Enable the Auto-GPR feature in the GPR Template to allow automatic workload placement across clusters. While creating a GPR Template, ensure the Auto-GPR option is selected. This setting allows EGS to automatically create Workload Placements when scaling events occur.
  2. Create a Workload Template describing your workload configuration and optional GPR template references.
  3. When a scaling event exceeds the GPU capacity of the primary cluster, Auto-GPR evaluates the available clusters in the workspace.
  4. Auto-GPR selects an appropriate target cluster and creates a Workload Placement based on the template.
  5. The Workload Placement deploys only the additional replicas to the target cluster. These replicas are cleaned up when the burst duration expires or when the workload scales down. (A command to watch placements as they are created is shown after this list.)
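To watch Workload Placements as they are created during a scale event, you can watch the resource with kubectl. This is a generic sketch; it assumes the default plural resource name workloadplacements and that you are connected to the cluster where your installation registers the resource (controller or target worker, depending on your setup):

kubectl get workloadplacements -A -w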

Prerequisites

Before you begin, ensure you meet the following prerequisites:

1. EGS Controller and Worker Components

EGS Controller and EGS Worker components are installed and running. For installation steps, see Install EGS.

2. Registered Clusters

At least two clusters (for example, worker-1, worker-2) are registered with the controller cluster. For more information on registering clusters, see Register Clusters.

3. Primary and Target Clusters

Designate one cluster as the primary cluster where the initial workload is deployed. The other clusters serve as target clusters for workload bursting. At least one cluster in the workspace must have available GPU inventory for auto-placement to trigger successfully.
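A quick way to confirm that a cluster has schedulable GPU inventory is to inspect the allocatable resources on its nodes. This is a generic Kubernetes check, assuming GPUs are exposed through the nvidia.com/gpu resource, not an EGS-specific command:

kubectl get nodes -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'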

4. Workspace Setup

Create a workspace across the clusters. For more information on how to create a workspace, see Manage Workspaces.

5. Namespace Onboarding

Onboard namespaces onto a workspace. For more information, see Onboard Namespaces.

6. GPR Templates

Create a GPR Template for each worker cluster that is part of the workspace, ensuring the Auto-GPR option is enabled in the GPR Template. For more information on how to create a GPR Template, see Manage GPR Templates.

7. Service Account and RBAC Permissions

Ensure access to the destination worker cluster where workloads will be deployed, and verify permissions to create ServiceAccounts, Roles, and RoleBindings in the kubeslice-system namespace. For more information, see ServiceAccount and RBAC Setup for Workload Templates.

Configuration Parameters

The following sections describe the configuration parameters for WorkloadTemplate and WorkloadPlacement resources.

Workload Template Configuration Parameters

apiVersion: gpr.kubeslice.io/v1alpha1
kind: WorkloadTemplate
metadata:
  name: <template-name>
  namespace: <namespace>
spec:
  # List of GPU Provisioning Request (GPR) template names that this workload template can use
  gprTemplates: []string

  # Kubernetes resources to be deployed
  manifestResources: []ManifestResource # Array of ManifestResource objects. The list of Kubernetes resources to be deployed on the managed cluster

  # Helm chart configurations
  helmConfig: []HelmConfig

  # Duration for which the workload will be bursted (for example, "3m", "1h")
  burstDuration: string

  # Service account for workload execution
  # IMPORTANT: The ServiceAccount must be created in the kubeslice-system namespace in the destination cluster before deployment
  serviceAccount: string # The name of the ServiceAccount to be used for the workload deployment

  # kubectl commands to execute before deployment
  cmdExec: []CmdExec

  # Deletion policy: "Delete" (default) or "Retain"
  deletionPolicy: DeletionPolicyType

  # Ordered list of execution steps
  steps: []WorkloadStep

  # Workspace name for multi-tenant environments
  workspaceName: string # The name of the workspace this template belongs to

  # Namespaces to create/manage
  namespaces: []string # The list of namespaces associated with this workload template

Workload Placement Configuration Parameters

apiVersion: worker.kubeslice.io/v1alpha1
kind: WorkloadPlacement
metadata:
  name: <placement-name>
  namespace: <namespace>
spec:
  # REQUIRED: Target cluster name for deployment
  clusterName: string

  # Kubernetes resources to be deployed
  manifestResources: []ManifestResource

  # Helm chart configurations
  helmConfig: []HelmConfig

  # Duration for which the workload will be bursted
  burstDuration: string

  # Service account for workload execution
  # IMPORTANT: The ServiceAccount must be created in the kubeslice-system namespace
  # in the destination cluster before deployment. See ServiceAccount and RBAC Setup
  # documentation for details on creating custom Roles and RoleBindings.
  serviceAccount: string

  # kubectl commands to execute
  cmdExec: []CmdExec

  # Deletion policy: "Delete" (default) or "Retain"
  deletionPolicy: DeletionPolicyType

  # Ordered list of execution steps
  steps: []WorkloadStep
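If the CRDs are installed and publish a structural schema, kubectl explain can show the field documentation for either resource directly from the cluster. The resource names below assume the defaults derived from the kinds above:

kubectl explain workloadtemplate.spec
kubectl explain workloadplacement.spec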

ServiceAccount and RBAC Setup for Workload Templates

When deploying workloads using Workload Template or Workload Placement, you may need to configure custom ServiceAccounts with appropriate RBAC (Role-Based Access Control) permissions.

The ServiceAccount must be created in the kubeslice-system namespace in the destination cluster, as the workload deployment is managed through the EGS control plane. This also ensures namespace isolation and maintains security boundaries.

The following example YAML configurations create a custom ServiceAccount and set up RBAC for Workload Templates.

1. Create a ServiceAccount

In the following example, we create a ServiceAccount named vllm-sa for a vLLM inference workload.

Create a YAML file named serviceaccount.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: vllm-sa # ServiceAccount name
  namespace: kubeslice-system
  labels:
    app: vllm-app # Application label
    purpose: workload-execution # Purpose of the ServiceAccount

Use the following command to apply the YAML file in the destination cluster (in the kubeslice-system namespace):

kubectl apply -f serviceaccount.yaml

2. Create a Role

In the following example, we create a Role named vllm-role that grants permissions to manage ConfigMaps, Secrets, Pods, and Jobs.

Create a YAML file named role.yaml:

apiVersion: rbac.authorization.k8s.io/v1 # RBAC API version
kind: Role
metadata:
  name: vllm-role # Role name
  namespace: vllm-demo # Workload namespace
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]

Use the following command to apply the YAML file in the destination cluster (in the workload namespace):

kubectl apply -f role.yaml

3. Create a RoleBinding

In the following example, we create a RoleBinding named vllm-rolebinding that binds the vllm-role to the vllm-sa ServiceAccount.

Create a YAML file named rolebinding.yaml:

apiVersion: rbac.authorization.k8s.io/v1 # RBAC API version
kind: RoleBinding
metadata:
  name: vllm-rolebinding # RoleBinding name
  namespace: vllm-demo # Workload namespace
subjects:
- kind: ServiceAccount
  name: vllm-sa
  namespace: kubeslice-system
roleRef:
  kind: Role
  name: vllm-role
  apiGroup: rbac.authorization.k8s.io # RBAC API group

Use the following command to apply the YAML file in the destination cluster (in the workload namespace):

kubectl apply -f rolebinding.yaml

4. Verify a ServiceAccount in the Destination Cluster

After creating the WorkloadTemplate, you can verify that the ServiceAccount and RBAC permissions are correctly set up in the destination cluster using the following commands:

# Connect to destination cluster
kubectl get serviceaccount vllm-sa -n kubeslice-system

# Verify RoleBinding
kubectl get rolebinding vllm-rolebinding -n vllm-demo

# Check permissions
kubectl auth can-i create pods --as=system:serviceaccount:kubeslice-system:vllm-sa -n vllm-demo

Auto Workload Placement Using the Workload Template

This section provides an example of creating a Workload Template to enable automatic Workload Placement across clusters in a workspace.

Create and Apply a Workload Template

Create a Workload Template that defines the workload configuration to be deployed across clusters.

  1. Create a YAML file called workload-template.yaml. The following is an example Workload Template that deploys a vLLM Helm chart:

    apiVersion: gpr.kubeslice.io/v1alpha1
    kind: WorkloadTemplate
    metadata:
      name: vllm-workload-template
      namespace: kubeslice-avesha
    spec:
      # Reference the ServiceAccount created in the destination cluster
      serviceAccount: vllm-sa

      # Define Helm configurations
      helmConfig:
      - name: "vllm-app"
        chart: vllm/vllm-stack
        releaseName: vllm
        releaseNamespace: vllm-demo
        repoName: vllm
        repoURL: https://vllm-project.github.io/production-stack
        helmFlags: "--debug"
        values:
          servingEngineSpec:
            modelSpec:
            - name: "llama3"
              repository: "vllm/vllm-openai"
              tag: "v0.10.1"
              modelURL: "meta-llama/Llama-3.2-1B-Instruct" # "TheBloke/deepseek-llm-7B-chat-AWQ" #"meta-llama/Llama-3.1-8B-Instruct"
              replicaCount: 1

              requestCPU: 4
              requestMemory: "8Gi"
              requestGPU: 1

              pvcStorage: "100Gi"
              storageClass: "standard"
              #pvcMatchLabels:
              #  model: "llama3-pv"

              vllmConfig:
                maxModelLen: 4096
                #quantization: 'awq'
                #extraArgs:

              env:
              - name: VLLM_FLASHINFER_DISABLED
                value: "1"

              hf_token: <your-huggingface-token>
          routerSpec:
            resources:
              requests:
                cpu: "2"
                memory: "8G"
              limits:
                cpu: "8"
                memory: "32G"

      # Define command executions
      cmdExec:
      - name: "get-namespace"
        cmd: "kubectl get pods -n vllm-demo"

      # Define the order of execution (steps)
      steps:
      - name: "get-namespace"
        type: "command"
      - name: "vllm-app"
        type: "helm"

      # Duration after which helm releases should be uninstalled
      burstDuration: "10m"
  2. Apply the workload-template.yaml file on the controller cluster:

    kubectl apply -f workload-template.yaml

Verify the Workload Template Creation

  1. To verify that the Workload Template has been created successfully, run the following command on the controller cluster:

    Example

    kubectl get workloadTemplates -n kubeslice-avesha

    Example Output

    NAME                     AGE
    vllm-workload-template 44h
  2. To verify the Workload Template details, run the following command on the controller cluster:

    Example

    kubectl get workloadtemplates -n kubeslice-avesha vllm-workload-template -o yaml

    Example Output

    apiVersion: gpr.kubeslice.io/v1alpha1
    kind: WorkloadTemplate
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"gpr.kubeslice.io/v1alpha1","kind":"WorkloadTemplate","metadata":{"annotations":{},"name":"vllm-workload-template","namespace":"kubeslice-avesha"},"spec":{"burstDuration":"10m","cmdExec":[{"cmd":"kubectl get pods -n vllm-demo","name":"get-namespace"}],"helmConfig":[{"chart":"vllm/vllm-stack","helmFlags":"--debug","name":"vllm-app","releaseName":"vllm","releaseNamespace":"vllm-demo","repoName":"vllm","repoURL":"https://vllm-project.github.io/production-stack","values":{"routerSpec":{"resources":{"limits":{"cpu":"8","memory":"32G"},"requests":{"cpu":"2","memory":"8G"}}},"servingEngineSpec":{"modelSpec":[{"env":[{"name":"VLLM_FLASHINFER_DISABLED","value":"1"}],"hf_token":"<your-huggingface-token>","maxModelLen":4096,"modelURL":"meta-llama/Llama-3.1-1B-Instruct","name":"llama3","pvcStorage":"100Gi","replicaCount":1,"repository":"vllm/vllm-openai","requestCPU":4,"requestGPU":1,"requestMemory":"8Gi","storageClass":"standard","tag":"v0.10.1","vllmConfig":null}]}}}],"steps":[{"name":"get-namespace","type":"command"},{"name":"vllm-app","type":"helm"}]}}
      creationTimestamp: "2025-11-04T09:25:26Z"
      generation: 8
      name: vllm-workload-template
      namespace: kubeslice-avesha
      resourceVersion: "1762286904037935011"
      uid: 06c04428-a041-4424-86c5-af6d4c490892
    spec:
      burstDuration: 10m
      cmdExec:
      - cmd: kubectl get pods -n vllm-demo
        name: get-namespace
      deletionPolicy: Delete
      helmConfig:
      - chart: vllm/vllm-stack
        helmFlags: --debug
        name: vllm-app
        releaseName: vllm
        releaseNamespace: vllm-demo
        repoName: vllm
        repoURL: https://vllm-project.github.io/production-stack
        values:
          routerSpec:
            resources:
              limits:
                cpu: "8"
                memory: 32G
              requests:
                cpu: "2"
                memory: 8G
          servingEngineSpec:
            modelSpec:
            - env:
              - name: VLLM_FLASHINFER_DISABLED
                value: "1"
              hf_token: <your-huggingface-token>
              modelURL: meta-llama/Llama-3.2-1B-Instruct
              name: llama3
              pvcStorage: 100Gi
              replicaCount: 1
              repository: vllm/vllm-openai
              requestCPU: 4
              requestGPU: 1
              requestMemory: 8Gi
              storageClass: standard
              tag: v0.10.1
              vllmConfig:
                maxModelLen: 4096
      steps:
      - name: get-namespace
        type: command
      - name: vllm-app
        type: helm
    status:
      workloadSelector:
        name: vllm
        namespace: vllm-demo

Verify the GPR Creation

When the workload scales beyond the GPU capacity of the primary cluster, EGS automatically creates GPU Provision Requests (GPRs).
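To trigger a burst manually instead of waiting for an HPA event, you can scale the workload's Deployment on the primary cluster. The Deployment name below is a placeholder; it depends on what the Helm chart creates:

kubectl scale deployment <vllm-deployment-name> -n vllm-demo --replicas=3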

  1. Verify the GPRs created using the following command on the worker clusters:

    Example

    kubectl get gprs -n kubeslice-avesha

    Example Output

    NAME                       AGE
    gpr-03a815ee-c2dc-460f-a 38h
  2. To get the list of GPU Provisioning Requests (GPRs) created as part of workload placement, run the following command on the controller cluster:

    Example

    kubectl get gpuprovisioningrequests.gpr.kubeslice.io -n kubeslice-avesha

    Example Output

    NAME                       AGE                                                                                                          
    gpr-03a815ee-c2dc-460f-a 38h
    gpr-05adeb5c-61b6-4dc0-9 37h
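To inspect why a particular GPR was created and which cluster it targets, describe it on the controller cluster. Replace the placeholder with one of the GPR names listed above:

kubectl describe gpuprovisioningrequests.gpr.kubeslice.io <gpr-name> -n kubeslice-avesha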

Examples

The following are examples of WorkloadPlacement configurations using different deployment methods.

Manifest Deployment Example

apiVersion: aiops.kubeslice.io/v1alpha1
kind: WorkloadPlacement
metadata:
  name: workload-placement-example
spec:
  manifestResources:
  - name: "sample-configmap"
    manifest:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: sample-configmap
        namespace: default
      data:
        ui.properties: |
          color=purple
          theme=dark
          language=en
        database.properties: |
          host=localhost
          port=5432
          database=myapp
  steps:
  - name: "sample-configmap"
    type: "manifest"
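After the placement completes, you can confirm that the manifest was applied by checking for the ConfigMap on the target cluster:

kubectl get configmap sample-configmap -n default -o yaml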

Helm Deployment Example

apiVersion: aiops.kubeslice.io/v1alpha1
kind: WorkloadPlacement
metadata:
  name: workload-placement-helm-example
spec:
  helmConfig:
  - name: "gpu-operator"
    chart: "nvidia/gpu-operator"
    repoName: "nvidia"
    repoURL: "https://helm.ngc.nvidia.com/nvidia"
    releaseName: "gpu-operator"
    releaseNamespace: "gpu-operator"
    version: "1.0.0"
    helmFlags: "--create-namespace --wait --timeout 5m"
  - name: "hello-world"
    chart: "examples/hello-world"
    repoName: "examples"
    repoURL: "https://helm.github.io/examples"
    releaseName: "ahoy"
    releaseNamespace: "xyz"
    helmFlags: "--wait --create-namespace --timeout 5m"
  steps:
  - name: "gpu-operator"
    type: "helm"
  - name: "hello-world"
    type: "helm"
  deletionPolicy: "Delete" # or "Retain"
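To confirm that the Helm releases were installed on the target cluster, list them in their release namespaces. This assumes you have the helm CLI and kubeconfig access to that cluster:

helm list -n gpu-operator
helm list -n xyz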

Command Execution Example

apiVersion: aiops.kubeslice.io/v1alpha1
kind: WorkloadPlacement
metadata:
  name: workload-placement-cmd-example
spec:
  cmdExec:
  - name: "create-namespace"
    cmd: "kubectl create ns test-ns --dry-run=client"
  - name: "label-nodes"
    cmd: "kubectl label node worker-node1 gpu=true"
  steps:
  - name: "create-namespace"
    type: "command"
  - name: "label-nodes"
    type: "command"
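To verify that the command steps ran on the target cluster, check for the node label applied by the label-nodes step. Note that the create-namespace step uses --dry-run=client, so it does not actually create the namespace:

kubectl get nodes -l gpu=true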