Configure Workload Placement
This topic describes how to enable automatic workload placement across clusters in a workspace using the Workload Placement feature in EGS.
Overview
Workload Placement enables elastic, cross-cluster scaling in EGS. It automatically deploys additional workload replicas to other clusters within the same workspace when the primary cluster runs out of GPU capacity.
A GPU Provision Request (GPR) secures baseline GPUs in the primary cluster. When demand increases through Horizontal Pod Autoscaler (HPA) actions or manual replica updates and additional pods cannot be scheduled, Workload Placement evaluates available clusters and bursts the incremental replicas to the optimal targets.
When a scaled deployment bursts to a new cluster, the burst replicas are created there through a helm install; the replica count of the source cluster deployment is not mirrored all at once.
Existing replicas remain on the primary cluster. Networking between replicas is seamless, and burst replicas are automatically removed when they are no longer required. Cluster selection considers resource availability, wait time, and policy or priority settings defined in GPR templates, ensuring compliance with governance policies.
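For reference, a standard Kubernetes HorizontalPodAutoscaler such as the following can provide the scale-up signal that triggers bursting. This is a minimal sketch; the Deployment name, namespace, and CPU target are illustrative and are not part of the Workload Placement feature itself:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vllm-hpa            # illustrative name
  namespace: vllm-demo      # workload namespace on the primary cluster
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vllm              # illustrative name of the workload served by the GPR-reserved GPUs
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80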
Custom Resource Definitions
Workload Placement introduces two Custom Resource Definitions (CRDs) that operate together.
- WorkloadTemplate: a reusable blueprint that defines what to deploy, such as Helm, manifests, or commands, and the deployment steps. It is prepared ahead of time.
- WorkloadPlacement: an execution request that deploys the blueprint to a specific cluster. It is created automatically during scale events by Auto-GPR or manually by a user.
Together, these CRDs support automated, policy-driven workload bursting while still allowing manual, on-demand deployments when required.
The following table summarizes the differences between WorkloadTemplate and WorkloadPlacement:
| Dimension | WorkloadTemplate | WorkloadPlacement |
|---|---|---|
| Purpose | Define workload specifications | Execute workload deployments |
| Creation | Created manually by users | Created automatically by EGS during scaling events |
| Reusability | Can be reused across multiple deployments | Specific to a single deployment instance |
| Target Cluster | Not tied to any specific cluster | Deployed to a designated target cluster; requires the spec.clusterName parameter |
| Content | Contains deployment details (Helm - helmConfig, manifests - manifestResources, commands - cmdExe). Optional ordered steps, burstDuration, deletionPolicy, gprTemplates. | References a WorkloadTemplate and includes deployment parameters |
| Typical Trigger | HPA/manual scale-up that needs extra GPUs | Auto-GPR creates it, or a user triggers it directly |
| Lifecycle | Template object persists and can be re-used | Has phases (Running/Succeeded/Failed/Completed) and cleanup of burst resources per policy |
| Customization | Highly customizable using parameters | Limited customization, focused on deployment |
Workflow
- The auto-GPR feature must be enabled in the GPR Template to allow automatic workload placement across clusters. While creating a GPR Template, ensure the Auto-GPR option is selected. This setting allows EGS to automatically create Workload Placements when scaling events occur.
- Create a Workload Template describing your workload configuration and optional GPR template references.
- When a scaling event exceeds the GPU capacity of the primary cluster, Auto-GPR evaluates the available clusters in the workspace.
- Auto-GPR selects an appropriate target cluster and creates a Workload Placement based on the template.
- The Workload Placement deploys only the additional replicas to the target cluster. These replicas are cleaned up when the burst duration expires or when the workload scales down.
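As an illustration, a manual scale-up on the primary cluster such as the following can exceed the GPR-reserved GPU capacity and cause the additional replicas to be placed on another cluster. The deployment name and namespace below are placeholders for your own workload:
# Scale the workload beyond the GPUs reserved on the primary cluster (names are placeholders)
kubectl scale deployment <your-deployment> --replicas=4 -n <workload-namespace>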
Prerequisites
Before you begin, ensure you meet the following prerequisites:
1. EGS Controller and Worker Components
EGS Controller and EGS Worker components are installed and running. For installation steps, see Install EGS.
2. Registered Clusters
At least two clusters (for example, worker-1, worker-2) are registered with the controller cluster. For more information on registering clusters, see Register Clusters.
3. Primary and Target Clusters
Designate one cluster as the primary cluster where the initial workload is deployed. The other clusters serve as target clusters for workload bursting. At least one cluster in the workspace must have available GPU inventory for auto-placement to trigger successfully.
4. Workspace Setup
Create a workspace across the clusters. For more information on how to create a workspace, see Manage Workspaces.
5. Namespace Onboarding
Onboard namespaces onto a workspace. For more information, see Onboard Namespaces.
6. GPR Templates
Create a GPR Template for each worker cluster that is part of the workspace, ensuring the Auto-GPR option is enabled in the GPR Template.
For more information on how to create a GPR Template, see Manage GPR Templates.
7. Service Account and RBAC Permissions
Ensure access to the destination worker cluster where workloads will be deployed, and verify permissions to create ServiceAccounts in the kubeslice-system namespace and Roles and RoleBindings in the workload namespaces.
For more information, see ServiceAccount and RBAC Setup for Workload Templates.
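Before continuing, you can confirm which worker clusters are registered with the controller. The following check is a sketch that assumes the project namespace kubeslice-avesha used in the examples in this topic; substitute your own project namespace:
# List the clusters registered with the controller (project namespace is an assumption)
kubectl get clusters.controller.kubeslice.io -n kubeslice-avesha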
Configuration Parameters
The following sections describe the configuration parameters for WorkloadTemplate and WorkloadPlacement resources.
Workload Template Configuration Parameters
apiVersion: gpr.kubeslice.io/v1alpha1
kind: WorkloadTemplate
metadata:
  name: <template-name>
  namespace: <namespace>
spec:
  # List of GPU Provisioning Request (GPR) template names that this workload template can use
  gprTemplates: []string
  # Kubernetes resources to be deployed
  manifestResources: []ManifestResource # Array of ManifestResource objects. The list of Kubernetes resources to be deployed on the managed cluster
  # Helm chart configurations
  helmConfig: []HelmConfig
  # Duration for which the workload will be bursted (for example, "3m", "1h")
  burstDuration: string
  # Service account for workload execution
  # IMPORTANT: The ServiceAccount must be created in the kubeslice-system namespace in the destination cluster before deployment
  serviceAccount: string # The name of the ServiceAccount to be used for the workload deployment
  # kubectl commands to execute before deployment
  cmdExec: []CmdExec
  # Deletion policy: "Delete" (default) or "Retain"
  deletionPolicy: DeletionPolicyType
  # Ordered list of execution steps
  steps: []WorkloadStep
  # Workspace name for multi-tenant environments
  workspaceName: string # The name of the workspace this template belongs to
  # Namespaces to create/manage
  namespaces: []string # The list of namespaces associated with this workload template
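For orientation, the following is a minimal sketch of a WorkloadTemplate that deploys a single ConfigMap through a manifest step. The template name, namespaces, and GPR Template name are illustrative and assume a GPR Template with Auto-GPR enabled already exists:
apiVersion: gpr.kubeslice.io/v1alpha1
kind: WorkloadTemplate
metadata:
  name: demo-config-template        # illustrative template name
  namespace: kubeslice-avesha       # project namespace on the controller (illustrative)
spec:
  gprTemplates:
    - worker-2-gpr-template         # illustrative GPR Template name with Auto-GPR enabled
  manifestResources:
    - name: "demo-config"
      manifest:
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: demo-config
          namespace: demo           # illustrative workload namespace
        data:
          mode: "burst"
  steps:
    - name: "demo-config"
      type: "manifest"
  burstDuration: "30m"              # burst resources are removed after 30 minutes
  deletionPolicy: "Delete"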
Workload Placement Configuration Parameters
apiVersion: worker.kubeslice.io/v1alpha1
kind: WorkloadPlacement
metadata:
  name: <placement-name>
  namespace: <namespace>
spec:
  # REQUIRED: Target cluster name for deployment
  clusterName: string
  # Kubernetes resources to be deployed
  manifestResources: []ManifestResource
  # Helm chart configurations
  helmConfig: []HelmConfig
  # Duration for which the workload will be bursted
  burstDuration: string
  # Service account for workload execution
  # IMPORTANT: The ServiceAccount must be created in the kubeslice-system namespace
  # in the destination cluster before deployment. See ServiceAccount and RBAC Setup
  # documentation for details on creating custom Roles and RoleBindings.
  serviceAccount: string
  # kubectl commands to execute
  cmdExec: []CmdExec
  # Deletion policy: "Delete" (default) or "Retain"
  deletionPolicy: DeletionPolicyType
  # Ordered list of execution steps
  steps: []WorkloadStep
ServiceAccount and RBAC Setup for Workload Templates
When deploying workloads using Workload Template or Workload Placement, you may need to configure custom ServiceAccounts with appropriate RBAC (Role-Based Access Control) permissions.
The ServiceAccount must be created in the kubeslice-system namespace in the destination cluster, because the workload deployment is managed through the EGS control plane. This also ensures namespace isolation and security boundaries.
The following example YAML configurations create a custom ServiceAccount and set up RBAC for Workload Templates.
1. Create a ServiceAccount
In the following example, we create a ServiceAccount named vllm-sa for a Virtual Large Language Model (vLLM) workload.
Create a YAML file named serviceaccount.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vllm-sa # ServiceAccount name
  namespace: kubeslice-system
  labels:
    app: vllm-app # Application label
    purpose: workload-execution # Purpose of the ServiceAccount
Use the following command to apply the YAML file in the destination cluster (in the kubeslice-system namespace):
kubectl apply -f serviceaccount.yaml
2. Create a Role
In the following example, we create a Role named vllm-role that grants permissions to manage ConfigMaps, Secrets, Pods, and Jobs.
Create a YAML file named role.yaml:
apiVersion: rbac.authorization.k8s.io/v1 # RBAC API version
kind: Role
metadata:
  name: vllm-role # Role name
  namespace: vllm-demo # Workload namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
Use the following command to apply the YAML file in the destination cluster (in the workload namespace):
kubectl apply -f role.yaml
3. Create a RoleBinding
In the following example, we create a RoleBinding named vllm-rolebinding that binds the vllm-role to the vllm-sa ServiceAccount.
Create a YAML file named rolebinding.yaml:
apiVersion: rbac.authorization.k8s.io/v1 # RBAC API version
kind: RoleBinding
metadata:
  name: vllm-rolebinding # RoleBinding name
  namespace: vllm-demo # Workload namespace
subjects:
  - kind: ServiceAccount
    name: vllm-sa
    namespace: kubeslice-system
roleRef:
  kind: Role
  name: vllm-role
  apiGroup: rbac.authorization.k8s.io # RBAC API group
Use the following command to apply the YAML file in the destination cluster (in the workload namespace):
kubectl apply -f rolebinding.yaml
4. Verify a ServiceAccount in the Destination Cluster
After creating these resources, you can verify that the ServiceAccount and RBAC permissions are correctly set up in the destination cluster using the following commands:
# Connect to destination cluster
kubectl get serviceaccount vllm-sa -n kubeslice-system
# Verify RoleBinding
kubectl get rolebinding vllm-rolebinding -n vllm-demo
# Check permissions
kubectl auth can-i create pods --as=system:serviceaccount:kubeslice-system:vllm-sa -n vllm-demo
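If the ServiceAccount and RBAC resources are set up correctly, the permission check returns the following; a no response usually means the Role or RoleBinding is missing or was created in the wrong namespace:
Example Output
yes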
Auto Workload Placement Using the Workload Template
This section provides an example of creating a Workload Template to enable automatic Workload Placement across clusters in a workspace.
Create and Apply a Workload Template
Create a Workload Template that defines the workload configuration to be deployed across clusters.
- Create a YAML file called workload-template.yaml. The following is an example Workload Template that deploys a vLLM Helm chart:

apiVersion: gpr.kubeslice.io/v1alpha1
kind: WorkloadTemplate
metadata:
  name: vllm-workload-template
  namespace: kubeslice-avesha
spec:
  # Reference the ServiceAccount created in the destination cluster
  serviceAccount: vllm-sa
  # Define Helm configurations
  helmConfig:
    - name: "vllm-app"
      chart: vllm/vllm-stack
      releaseName: vllm
      releaseNamespace: vllm-demo
      repoName: vllm
      repoURL: https://vllm-project.github.io/production-stack
      helmFlags: "--debug"
      values:
        servingEngineSpec:
          modelSpec:
            - name: "llama3"
              repository: "vllm/vllm-openai"
              tag: "v0.10.1"
              modelURL: "meta-llama/Llama-3.2-1B-Instruct" # alternatives: "TheBloke/deepseek-llm-7B-chat-AWQ", "meta-llama/Llama-3.1-8B-Instruct"
              replicaCount: 1
              requestCPU: 4
              requestMemory: "8Gi"
              requestGPU: 1
              pvcStorage: "100Gi"
              storageClass: "standard"
              #pvcMatchLabels:
              #  model: "llama3-pv"
              vllmConfig:
                maxModelLen: 4096
                #quantization: 'awq'
                #extraArgs:
              env:
                - name: VLLM_FLASHINFER_DISABLED
                  value: "1"
              hf_token: <your-huggingface-token>
        routerSpec:
          resources:
            requests:
              cpu: "2"
              memory: "8G"
            limits:
              cpu: "8"
              memory: "32G"
  # Define command executions
  cmdExec:
    - name: "get-namespace"
      cmd: "kubectl get pods -n vllm-demo"
  # Define the order of execution (steps)
  steps:
    - name: "get-namespace"
      type: "command"
    - name: "vllm-app"
      type: "helm"
  # Duration after which helm releases should be uninstalled
  burstDuration: "10m"

- Apply the workload-template.yaml file on the controller cluster:

kubectl apply -f workload-template.yaml
Verify the Workload Template Creation
- To verify that the Workload Template has been created successfully, run the following command on the controller cluster:

Example

kubectl get workloadtemplates -n kubeslice-avesha

Example Output

NAME                     AGE
vllm-workload-template   44h

- To verify the Workload Template details, run the following command on the controller cluster:

Example

kubectl get workloadtemplates -n kubeslice-avesha vllm-workload-template -o yaml

Example Output
apiVersion: gpr.kubeslice.io/v1alpha1
kind: WorkloadTemplate
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"gpr.kubeslice.io/v1alpha1","kind":"WorkloadTemplate","metadata":{"annotations":{},"name":"vllm-workload-template","namespace":"kubeslice-avesha"},"spec":{"burstDuration":"10m","cmdExec":[{"cmd":"kubectl get pods -n vllm-demo","name":"get-namespace"}],"helmConfig":[{"chart":"vllm/vllm-stack","helmFlags":"--debug","name":"vllm-app","releaseName":"vllm","releaseNamespace":"vllm-demo","repoName":"vllm","repoURL":"https://vllm-project.github.io/production-stack","values":{"routerSpec":{"resources":{"limits":{"cpu":"8","memory":"32G"},"requests":{"cpu":"2","memory":"8G"}}},"servingEngineSpec":{"modelSpec":[{"env":[{"name":"VLLM_FLASHINFER_DISABLED","value":"1"}],"hf_token":"<your-huggingface-token>","maxModelLen":4096,"modelURL":"meta-llama/Llama-3.1-1B-Instruct","name":"llama3","pvcStorage":"100Gi","replicaCount":1,"repository":"vllm/vllm-openai","requestCPU":4,"requestGPU":1,"requestMemory":"8Gi","storageClass":"standard","tag":"v0.10.1","vllmConfig":null}]}}}],"steps":[{"name":"get-namespace","type":"command"},{"name":"vllm-app","type":"helm"}]}}
  creationTimestamp: "2025-11-04T09:25:26Z"
  generation: 8
  name: vllm-workload-template
  namespace: kubeslice-avesha
  resourceVersion: "1762286904037935011"
  uid: 06c04428-a041-4424-86c5-af6d4c490892
spec:
  burstDuration: 10m
  cmdExec:
  - cmd: kubectl get pods -n vllm-demo
    name: get-namespace
  deletionPolicy: Delete
  helmConfig:
  - chart: vllm/vllm-stack
    helmFlags: --debug
    name: vllm-app
    releaseName: vllm
    releaseNamespace: vllm-demo
    repoName: vllm
    repoURL: https://vllm-project.github.io/production-stack
    values:
      routerSpec:
        resources:
          limits:
            cpu: "8"
            memory: 32G
          requests:
            cpu: "2"
            memory: 8G
      servingEngineSpec:
        modelSpec:
        - env:
          - name: VLLM_FLASHINFER_DISABLED
            value: "1"
          hf_token: <your-huggingface-token>
          modelURL: meta-llama/Llama-3.2-1B-Instruct
          name: llama3
          pvcStorage: 100Gi
          replicaCount: 1
          repository: vllm/vllm-openai
          requestCPU: 4
          requestGPU: 1
          requestMemory: 8Gi
          storageClass: standard
          tag: v0.10.1
          vllmConfig:
            maxModelLen: 4096
  steps:
  - name: get-namespace
    type: command
  - name: vllm-app
    type: helm
status:
  workloadSelector:
    name: vllm
    namespace: vllm-demo
Verify the GPR Creation
When the workload scales beyond the GPU capacity of the primary cluster, EGS automatically creates GPU Provision Requests (GPRs).
- Verify the GPRs created by running the following command on the worker clusters:

Example

kubectl get gprs -n kubeslice-avesha

Example Output

NAME                       AGE
gpr-03a815ee-c2dc-460f-a   38h

- To get the list of GPU Provisioning Requests (GPRs) created as part of workload placement, run the following command on the controller cluster:

Example

kubectl get gpuprovisioningrequests.gpr.kubeslice.io -n kubeslice-avesha

Example Output

NAME                       AGE
gpr-03a815ee-c2dc-460f-a   38h
gpr-05adeb5c-61b6-4dc0-9   37h
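You can also list the WorkloadPlacement objects that Auto-GPR creates for each burst. The following command is a sketch that assumes the CRD's plural resource name is workloadplacements and uses the same project namespace as the examples above:

# List Workload Placements created during bursting (resource plural and namespace are assumptions)
kubectl get workloadplacements -n kubeslice-avesha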
Examples
The following are examples of WorkloadPlacement configurations using different deployment methods.
Manifest Deployment Example
apiVersion: aiops.kubeslice.io/v1alpha1
kind: WorkloadPlacement
metadata:
  name: workload-placement-example
spec:
  # REQUIRED: target cluster for the deployment
  clusterName: <target-cluster-name>
  manifestResources:
    - name: "sample-configmap"
      manifest:
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: sample-configmap
          namespace: default
        data:
          ui.properties: |
            color=purple
            theme=dark
            language=en
          database.properties: |
            host=localhost
            port=5432
            database=myapp
  steps:
    - name: "sample-configmap"
      type: "manifest"
Helm Deployment Example
apiVersion: aiops.kubeslice.io/v1alpha1
kind: WorkloadPlacement
metadata:
  name: workload-placement-helm-example
spec:
  # REQUIRED: target cluster for the deployment
  clusterName: <target-cluster-name>
  helmConfig:
    - name: "gpu-operator"
      chart: "nvidia/gpu-operator"
      repoName: "nvidia"
      repoURL: "https://helm.ngc.nvidia.com/nvidia"
      releaseName: "gpu-operator"
      releaseNamespace: "gpu-operator"
      version: "1.0.0"
      helmFlags: "--create-namespace --wait --timeout 5m"
    - name: "hello-world"
      chart: "examples/hello-world"
      repoName: "examples"
      repoURL: "https://helm.github.io/examples"
      releaseName: "ahoy"
      releaseNamespace: "xyz"
      helmFlags: "--wait --create-namespace --timeout 5m"
  steps:
    - name: "gpu-operator"
      type: "helm"
    - name: "hello-world"
      type: "helm"
  deletionPolicy: "Delete" # or "Retain"
Command Execution Example
apiVersion: aiops.kubeslice.io/v1alpha1
kind: WorkloadPlacement
metadata:
  name: workload-placement-cmd-example
spec:
  # REQUIRED: target cluster for the deployment
  clusterName: <target-cluster-name>
  cmdExec:
    - name: "create-namespace"
      cmd: "kubectl create ns test-ns --dry-run=client"
    - name: "label-nodes"
      cmd: "kubectl label node worker-node1 gpu=true"
  steps:
    - name: "create-namespace"
      type: "command"
    - name: "label-nodes"
      type: "command"