Release Notes for EGS Version 1.15.3
Release Date: 13th Oct 2025
The Elastic GPU Service (EGS) platform is an innovative solution designed to optimize GPU utilization and efficiency for your AI projects. EGS leverages the power of Kubernetes to deliver optimized GPU resource management, GPU provisioning, and GPU fault identification.
We continue to add new features and enhancements to EGS.
These release notes describe the new changes and enhancements in this version.
- Across our documentation, we refer to the workspace as the slice workspace. The two terms are used interchangeably.
- The EGS Controller is also referred to as the KubeSlice Controller in some diagrams and in the YAML files.
- The EGS Admin Portal is also referred to as the KubeSlice Manager (UI) in some diagrams and in the YAML files.
What's New
Pre-Check Node Health for GPU Allocation
You can now run a pre-check to verify node health when creating a GPU Provision Request (GPR). Enable this option on the Create GPU Request pane, under Advanced Configuration. When you create a GPR with pre-check enabled, EGS performs a node health check to ensure that the requested GPU nodes can be allocated.
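To illustrate the idea of a node health pre-check, here is a minimal sketch in Python. It is not the EGS implementation: the `Node` class and the `node_problems` and `precheck` functions are hypothetical names, and the only rule applied is the standard Kubernetes node condition types (`Ready`, `MemoryPressure`, `DiskPressure`, `PIDPressure`) plus the cordon flag. The checks that EGS actually runs may differ.

```python
from dataclasses import dataclass

# Standard Kubernetes node condition types and the status each one
# should report on a healthy node. This rule set is an assumption made
# for illustration, not the list of checks EGS performs.
REQUIRED_STATUS = {
    "Ready": "True",
    "MemoryPressure": "False",
    "DiskPressure": "False",
    "PIDPressure": "False",
}


@dataclass
class Node:
    """Hypothetical, simplified view of a cluster node."""
    name: str
    conditions: dict  # condition type -> status string ("True"/"False"/"Unknown")
    unschedulable: bool = False


def node_problems(node):
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    if node.unschedulable:
        problems.append("node is cordoned (unschedulable)")
    for cond, want in REQUIRED_STATUS.items():
        # A missing condition is treated as "Unknown", which never passes.
        got = node.conditions.get(cond, "Unknown")
        if got != want:
            problems.append(f"{cond}={got}, expected {want}")
    return problems


def precheck(nodes):
    """Map each unhealthy node name to its problems; healthy nodes are omitted."""
    report = {}
    for node in nodes:
        problems = node_problems(node)
        if problems:
            report[node.name] = problems
    return report
```

In this sketch, an empty result from `precheck` means every candidate node passed, so a request could proceed to allocation. A non-empty result names the nodes that would block it and the reason for each.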
For more information, see Manage GPU Requests.
Workspace Policies
We have introduced Workspace Policies, which allow administrators to define and manage resource quotas, access controls, and other policies at the workspace level. This ensures that workloads running within a workspace adhere to the defined policies and resource constraints.
For more information, see Workspace Policies.
Replicate Resources
You can now replicate resources from a source cluster to a destination cluster on a given workspace. This feature lets you copy a workspace's resources from an existing cluster, including their configurations, settings, and data.
For more information, see Workspace Replication.
TLS-based Prometheus Authentication in EGS
You can now configure TLS-based Prometheus authentication in EGS. This enhancement improves the security of monitoring data transmitted between Prometheus and EGS components. Currently, TLS-based Prometheus authentication is supported only for ingress endpoints.
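As a rough illustration of what a TLS-authenticated Prometheus query looks like from the client side, the following Python sketch uses only the standard library. It does not show EGS configuration. The host name `prometheus.example.com`, the certificate file paths, and the function names are hypothetical, and presenting a client certificate (mutual TLS) is an assumption about the setup. The `/api/v1/query` path is the standard Prometheus HTTP API instant-query endpoint.

```python
import json
import ssl
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def make_tls_context(ca_file=None, cert_file=None, key_file=None):
    """Create a TLS context that verifies the server certificate.

    ca_file: CA bundle used to verify the Prometheus ingress certificate
             (the system trust store is used when omitted).
    cert_file/key_file: optional client certificate and key, assuming the
             endpoint requires mutual TLS.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def query_prometheus(base_url, promql, ctx, timeout=10):
    """Run an instant query over HTTPS and return the decoded JSON body."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (hypothetical host and file paths, not run here):
# ctx = make_tls_context("ca.crt", "client.crt", "client.key")
# result = query_prometheus("https://prometheus.example.com", "up", ctx)
```

The context keeps certificate and host name verification enabled by default, so a connection to an endpoint whose certificate does not chain to the given CA fails instead of silently falling back to an unverified session.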
For more information, see TLS authentication.
Support for CPU Workloads
You can now schedule CPU workloads in EGS. This feature allows you to run CPU-intensive applications alongside GPU workloads, optimizing resource utilization within your Kubernetes clusters.
Issues Resolved
This release includes fixes for identified vulnerabilities in third-party dependencies, including KServe and Network Service Mesh (NSM) components, to enhance overall platform security.