Release Notes for EGS Version 1.11.0
Release Date: 20th February 2025
The EGS (Elastic GPU Service) platform is an innovative solution designed to optimize GPU utilization and efficiency for your AI projects. EGS leverages Kubernetes to deliver optimized GPU resource management, GPU provisioning, and GPU fault identification.
We continue to add new features and enhancements to EGS.
These release notes describe the changes and enhancements in this version.
Across our documentation, we refer to the workspace as the slice workspace. The two terms are used interchangeably.
What's New
GPU Node Cost Exploration
The EGS platform introduces a new feature called KubeTally that helps you explore GPU node costs. The admin dashboard now contains a Cost Analysis tab, whose tiles display GPU node costs, and a Cost Management tab that you can use to explore those costs in more detail.
For more information, see:
Multi-Instance GPU Node Support
The EGS platform now supports Multi-Instance GPU (MIG) nodes. When creating a new GPU request (GPR), users can configure the memory for MIG nodes.
Currently, you must provide a node type as part of the GPR definition, so a single GPR cannot request GPUs distributed across different node types. A rough sketch of a MIG-oriented GPR is shown below.
For more information, see:
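As an illustration of how such a request could be expressed, the sketch below shows a GPR-style spec that names a node type and a MIG memory profile. This is only a minimal sketch: the API group, kind, and field names (for example instanceType, migProfile, exitDuration) are hypothetical placeholders, not the confirmed EGS GPR schema, so consult the EGS documentation for the actual fields.

# Hypothetical sketch only -- field names are illustrative placeholders,
# not the confirmed EGS GPR schema.
apiVersion: egs.example.io/v1          # placeholder API group
kind: GPUProvisioningRequest           # placeholder kind for a GPR
metadata:
  name: mig-training-gpr
spec:
  sliceWorkspace: team-a-workspace     # slice workspace that receives the GPUs
  instanceType: a100-mig-node          # a node type is mandatory in a GPR
  numberOfGpus: 2
  migProfile: 1g.10gb                  # requested MIG profile / memory per GPU instance
  exitDuration: 0d4h0m                 # requested run time for the GPR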
Auto Eviction of a GPR
An admin can enable auto eviction of a GPR while registering a worker cluster. With auto eviction enabled, low-priority GPRs are automatically evicted when an existing GPR needs to run immediately.
For more information, see:
GPR Auto Remediation
To prevent GPU downtime, auto remediation can be configured during GPR creation. When an adverse situation occurs, EGS automatically detects issues with one or more GPUs in the provisioned GPR, removes the GPR (its GPUs and nodes) from the slice workspace, and re-queues the GPR (see the sketch below).
For more information, see:
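As a rough sketch of where such a setting might live in the GPR spec shown earlier, the toggle could look like the fragment below. The field name enableAutoRemediation is a hypothetical placeholder, not the confirmed EGS schema.

spec:
  enableAutoRemediation: true   # hypothetical flag: re-queue the GPR when a GPU fault is detected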
Cluster Policy Limits
By default, there is no change in cluster behavior; that is, no cluster-wide limits are enforced. If you want to apply limits, define them in a ConfigMap (saved, for example, in a file named policy-cfgmap.yaml) as shown below:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubeslice-api-gw-request-limits
data:
  request-limits.json: |
    {
      "gpr": {
        "enableLimits": true,
        "limits": {
          "gpusPerNode": {
            "max": 100
          },
          "gpuNodes": {
            "max": 100
          },
          "pendingGprs": {
            "max": 100
          },
          "idleTimeout": {
            "max": 30,
            "fallback": "0d0h30m",
            "forceEnforcement": true
          },
          "exitDuration": {
            "max": 525600
          },
          "priority": {
            "max": 299,
            "bypass": 261
          },
          "workspaceGprQueue": {
            "max": 100
          }
        }
      }
    }
The exitDuration is the maximum time a GPR may run, measured in minutes. For example, the value 525600 above corresponds to one year (365 × 24 × 60 minutes).
After the ConfigMap is applied, these limits affect only users, not the admin.
Apply the ConfigMap you defined to the controller namespace using the following command:
kubectl apply -n kubeslice-controller -f policy-cfgmap.yaml
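After applying, you can confirm that the ConfigMap exists in the controller namespace with a standard kubectl query; the namespace and ConfigMap name below match the example above.

kubectl get configmap kubeslice-api-gw-request-limits -n kubeslice-controller -o yaml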