
Release Notes for EGS Version 1.11.0

Release Date: 20th February 2025

The EGS (Elastic GPU Service) platform is an innovative solution designed to optimize GPU utilization and efficiency for your AI projects. EGS leverages the power of Kubernetes to deliver optimized GPU resource management, GPU provisioning and GPU fault identification.

We continue to add new features and enhancements to EGS.

These release notes describe the new changes and enhancements in this version.

info

Across our documentation, we refer to the workspace as the slice workspace. The two terms are used interchangeably.

What's New 🔈

GPU Nodes Cost Exploration

The EGS platform introduces a new feature called KubeTally that helps you explore GPU node costs. The admin dashboard contains a new tab called Cost Analysis with tiles that provide GPU node costs. A Cost Management tab lets you explore GPU node costs in more detail.

For more information, see:

Multi-Instance GPU Node Support

The EGS platform now supports Multi-Instance GPU (MIG) nodes. When creating a new GPU request (GPR), users can configure the memory for MIG nodes.

info

Currently, a node type must be provided as part of the GPR definition, so it is not possible to request GPUs distributed across different node types in a single GPR.
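
As a rough illustration, a MIG-aware GPR could look like the sketch below. This is not the documented EGS schema: the resource kind and every field name shown (GPUProvisioningRequest, nodeType, memoryPerGpu, and so on) are assumptions for illustration only, so check the GPR reference for your EGS version before relying on them.

# Hypothetical GPR sketch; the kind, API group, and field names are assumed, not the actual EGS schema.
apiVersion: gpr.kubeslice.io/v1alpha1    # assumed API group/version
kind: GPUProvisioningRequest             # assumed kind
metadata:
  name: mig-training-gpr
  namespace: team-a-workspace            # slice workspace that requests the GPUs
spec:
  clusterName: worker-1                  # worker cluster to provision from
  nodeType: a100-mig                     # a node type is required in every GPR (see the note above)
  numberOfGpus: 2
  memoryPerGpu: 10Gi                     # memory requested per MIG instance
  exitDuration: 0d4h0m                   # how long the GPR is allowed to run
  priority: 150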

For more information, see:

Auto Eviction of a GPR

An admin can enable auto eviction of GPRs while registering a worker cluster, and can configure auto eviction of low-priority GPRs so that an existing GPR can run immediately.
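
If your deployment registers worker clusters declaratively, enabling this feature would typically be a single toggle in the registration values. The snippet below is only a hypothetical sketch; the actual property names and their location depend on how your EGS version registers worker clusters (portal, Helm values, or API).

# Hypothetical worker-cluster registration values; key names are assumed.
cluster:
  name: worker-1
  gpr:
    autoEviction:
      enabled: true    # allow low-priority GPRs to be evicted so another GPR can run immediately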

For more information, see:

GPR Auto Remediation

To prevent GPU downtime, auto remediation can be configured during GPR creation. When issues arise, EGS automatically detects problems with one or more GPUs in the provisioned GPR, removes the GPR (its GPUs and nodes) from the slice workspace, and re-queues that GPR.
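
Continuing the hypothetical GPR sketch above, auto remediation would most likely be a single boolean set at GPR creation time. The field name below (enableAutoRemediation) is an assumption for illustration, not the documented schema.

# Hypothetical fragment of a GPR spec; the field name is assumed.
spec:
  nodeType: a100-mig
  numberOfGpus: 2
  enableAutoRemediation: true    # re-queue this GPR if a provisioned GPU becomes unhealthy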

For more information, see:

Cluster Policy Limits

By default, there is no change in cluster behavior (that is, no cluster-wide limits are enforced). If you want to apply limits, define them in a ConfigMap manifest (saved here as policy-cfgmap.yaml) as shown below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeslice-api-gw-request-limits
data:
  request-limits.json: |
    {
      "gpr": {
        "enableLimits": true,
        "limits": {
          "gpusPerNode": {
            "max": 100
          },
          "gpuNodes": {
            "max": 100
          },
          "pendingGprs": {
            "max": 100
          },
          "idleTimeout": {
            "max": 30,
            "fallback": "0d0h30m",
            "forceEnforcement": true
          },
          "exitDuration": {
            "max": 525600
          },
          "priority": {
            "max": 299,
            "bypass": 261
          },
          "workspaceGprQueue": {
            "max": 100
          }
        }
      }
    }
note

The exitDuration is the maximum time a GPR may run, measured in minutes; the value above, 525600 minutes, corresponds to one year. After the ConfigMap is applied, these limits affect only users, not the admin.

Apply the ConfigMap you defined to the controller namespace using the following command:

kubectl apply -n kubeslice-controller -f policy-cfgmap.yaml
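
To confirm that the limits are in place, you can read the ConfigMap back from the controller namespace, for example:

# Inspect the applied ConfigMap
kubectl get configmap kubeslice-api-gw-request-limits -n kubeslice-controller -o yaml

# Print only the JSON policy payload (the dot in the key must be escaped in jsonpath)
kubectl get configmap kubeslice-api-gw-request-limits -n kubeslice-controller \
  -o jsonpath='{.data.request-limits\.json}'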