
Release Notes for EGS Version 1.15.4

Release Date: 14th Nov 2025

The Elastic Grid Service (EGS) platform optimizes GPU utilization and efficiency for your AI projects. Built on Kubernetes, EGS delivers streamlined GPU resource management, GPU provisioning, and GPU fault identification.

These release notes describe the new features and enhancements in this version.

info
  • Across our documentation, we refer to the workspace as the slice workspace. The two terms are used interchangeably.
  • The EGS Controller is also referred to as the KubeSlice Controller in some diagrams and in the YAML files.
  • The EGS Admin Portal is also referred to as the KubeSlice Manager (UI) in some diagrams and in the YAML files.

What's New 🔈

Workload Placement

We have introduced the Workload Placement feature to improve workload scheduling and distribution across clusters.

This enhancement enables:

  1. Auto Replica Placement: When the primary cluster runs out of GPU capacity, Workload Placement automatically deploys additional replicas to other clusters within the same workspace. This ensures elastic scaling and uninterrupted inference performance without manual intervention.

  2. Placement During Initial Deployment: During workload deployment, EGS evaluates the desired number of deployments, preferred clusters defined in the workload template, and the available GPU capacity across all clusters in the workspace. Based on these factors, EGS automatically selects the optimal clusters and distributes the workload accordingly.

    Currently, this feature can be enabled only through YAML configuration.

    For more information on how to enable Workload Placement, see Configure Workload Placement.
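As an illustration of the two behaviors above, a placement configuration might look like the following sketch. The release notes do not include the actual schema, so every field name here (`placementPolicy`, `autoReplicaPlacement`, `preferredClusters`) and the cluster names are hypothetical; see Configure Workload Placement for the real YAML.

```yaml
# Hypothetical sketch only - field names are illustrative, not the EGS API.
# See Configure Workload Placement for the actual schema.
placementPolicy:
  autoReplicaPlacement: true     # deploy extra replicas to other clusters in the
                                 # same workspace when the primary cluster runs
                                 # out of GPU capacity
  preferredClusters:             # clusters considered first during initial
    - worker-cluster-1           # deployment, weighed against the available GPU
    - worker-cluster-2           # capacity across all clusters in the workspace
```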

CPU Workloads on GPU Nodes

You can now run and monitor CPU workloads on GPU nodes within your workspace. This enhancement improves resource utilization by letting CPU-intensive tasks use the spare capacity of GPU nodes while the GPUs themselves are not in use.
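As a sketch, a CPU workload can be steered onto GPU nodes with a standard Kubernetes node selector. The deployment name and image below are placeholders, and the node label shown is the one applied by the NVIDIA GPU Operator; your clusters may label GPU nodes differently. The key point is that the pod requests only CPU and memory, so the node's GPUs remain free for GPU workloads.

```yaml
# Illustrative sketch - adjust the node label to match how your GPU nodes
# are labeled in your clusters.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-batch-job            # hypothetical workload name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cpu-batch-job
  template:
    metadata:
      labels:
        app: cpu-batch-job
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"   # NVIDIA GPU Operator label; may differ
      containers:
        - name: worker
          image: busybox
          command: ["sh", "-c", "echo CPU work on a GPU node; sleep 3600"]
          resources:
            requests:
              cpu: "2"           # CPU and memory only - no nvidia.com/gpu
              memory: 1Gi        # request, so GPUs stay available
```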

For more information, see:

Network Latency Monitoring

We have introduced network topology and latency monitoring to help users and administrators better understand the network performance between clusters and nodes. This feature provides insights into network health, enabling proactive management of potential issues that could impact workload performance.

For more information, see: