Version: 1.14.0

EGS Components

The EGS architecture is designed around several key components that work collaboratively to provide a comprehensive GPU management solution. The main components include:

Controller Services

KubeSlice Controller

The KubeSlice Controller centralizes the management of configurations across multiple clusters. Its key functions include:

  • Facilitating communication between Slice Operators across different clusters.
  • Reconciling SliceRoleBinding and SliceRoleTemplate resources in the controller cluster.
  • Providing APIs for workspace creation and management through the EGS User Interface (UI).
  • Acting as the EGS API Gateway.

API Gateway

The API Gateway serves as the central interface for all EGS components, offering a unified entry point for both external and internal interactions. It provides a set of APIs to support essential operations, including:

  • Inventory APIs: Tracking and managing GPU resources across the system.
  • Workspace APIs: Configuring and maintaining user-specific environments and organizational structures.
  • GPR APIs: Managing the creation, modification, and oversight of GPU Provisioning Requests (GPRs).

EGS Core APIs

  • Inference Endpoint APIs: Responsible for the deployment, bursting, and monitoring of AI models as managed services.
  • GPR Templates APIs: Allowing users to create and reuse templates for GPRs.
  • GPR Template Binding APIs: Linking GPR templates to specific workflows or environments.

For more information on APIs, refer to the REST API documentation and the SDK documentation.

The EGS Admin Portal uses these APIs, enabling users to manage and interact with their GPRs through the portal.

GPR Manager

The GPR Manager is responsible for managing the lifecycle of GPRs, acting as the controller-manager of the GPR Custom Resource Definition (CRD). Its key functions include:

  • Managing inventory by allocating GPUs to GPRs via Inventory Manager APIs.
  • Sending enqueue requests to the Queue Manager to handle priorities for GPRs.
  • Updating the status of GPRs to reflect allocation and usage.
  • Freeing up GPU resources for high-priority requests by evicting lower-priority ones.
  • Handling auto-remediation by re-queuing GPRs in the Failed state.
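
The eviction behavior above can be sketched as a simple priority comparison. The function below is an illustrative model only, not the actual GPR Manager logic: it assumes a lower priority number means higher priority, and that each running GPR records its name, priority, and GPU count.

```python
# Hedged sketch of priority-based eviction: when a higher-priority GPR cannot
# be placed, running GPRs of strictly lower priority are eviction candidates.
# Field names and the priority convention (lower number = higher priority)
# are assumptions for illustration.
def pick_evictions(running, needed_gpus, incoming_priority):
    """running: list of (name, priority, gpus) tuples for running GPRs.
    Returns the names to evict, lowest-priority first, until enough GPUs
    would be freed; returns [] if eviction cannot satisfy the request."""
    victims = []
    freed = 0
    # Consider the lowest-priority (highest number) workloads first.
    for name, prio, gpus in sorted(running, key=lambda r: -r[1]):
        if freed >= needed_gpus:
            break
        if prio > incoming_priority:  # never evict equal or higher priority
            victims.append(name)
            freed += gpus
    return victims if freed >= needed_gpus else []

running = [("batch-a", 10, 2), ("prod-b", 1, 4), ("dev-c", 7, 2)]
pick_evictions(running, needed_gpus=3, incoming_priority=2)  # evicts batch-a, dev-c
```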

Wait-Time Service

The Wait-Time Service estimates how long a new workload must wait before it can start by considering three main factors:

  • The current GPU inventory, assessing available GPUs and their current status (e.g., idle, reserved, or in use).
  • Active workloads that are utilizing the GPU resources.
  • The queue of workloads waiting for GPUs.
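
A minimal model of this estimate combines the three factors above. The sketch below is an assumption-laden simplification (one GPU per workload, known remaining runtimes), not the actual Wait-Time Service algorithm:

```python
import heapq

# Simplified wait-time estimation: GPUs free up as active workloads finish,
# queued workloads ahead claim GPUs in order, and the new workload starts on
# the next free GPU. One GPU per workload and known runtimes are assumptions.
def estimate_wait_minutes(idle_gpus, active_remaining, queued_ahead_runtimes):
    """idle_gpus: count of currently idle GPUs.
    active_remaining: remaining minutes of each running workload.
    queued_ahead_runtimes: runtimes of workloads ahead in the queue."""
    # Times at which each GPU becomes free: idle GPUs are free now (t=0).
    free_at = [0] * idle_gpus + sorted(active_remaining)
    heapq.heapify(free_at)
    # Each queued workload takes the earliest-free GPU for its runtime.
    for runtime in queued_ahead_runtimes:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + runtime)
    # The new workload starts as soon as the next GPU frees up.
    return heapq.heappop(free_at)

# No idle GPUs, one workload finishing in 30 min, empty queue: wait is 30 min.
estimate_wait_minutes(0, [30], [])
```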

Queue Manager

The Queue Manager is responsible for organizing and processing all pending GPU Provisioning Requests (GPRs). Its key functions include:

  • Maintaining a queue of all pending GPR requests based on their priority number.
  • Processing GPRs in order of their priority, ensuring that higher-priority requests are handled first.
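
The two functions above amount to a priority queue. The following sketch illustrates the idea with Python's `heapq`; the class and field names are hypothetical, and the convention that a lower priority number is served first is an assumption:

```python
import heapq
from dataclasses import dataclass, field

# Illustrative model of priority-ordered GPR queuing, not the actual
# Queue Manager implementation.
@dataclass(order=True)
class QueuedGPR:
    priority: int                      # lower number = higher priority (assumption)
    name: str = field(compare=False)   # excluded from ordering

class GPRQueue:
    def __init__(self):
        self._heap = []

    def enqueue(self, gpr):
        heapq.heappush(self._heap, gpr)

    def dequeue(self):
        # The highest-priority pending GPR is always processed first.
        return heapq.heappop(self._heap)

q = GPRQueue()
q.enqueue(QueuedGPR(priority=5, name="training-job"))
q.enqueue(QueuedGPR(priority=1, name="inference-burst"))
q.dequeue().name  # "inference-burst": priority 1 beats priority 5
```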

Inventory Manager

The Inventory Manager oversees the real-time GPU resources available across all clusters registered on the controller cluster. Its key functions include:

  • Continuously monitoring and updating the status of GPU resources across all clusters and projects.
  • Allocating GPUs from the available inventory when requests are received.
  • Releasing GPU resources after a workload is completed or when the GPU is no longer needed, thus making them available for new requests.
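
The allocate/release cycle above can be sketched as simple bookkeeping. The data model below (clusters mapping to free GPU IDs) is an assumption for illustration, not the EGS inventory schema:

```python
# Hypothetical sketch of inventory bookkeeping: GPUs move between a per-cluster
# free pool and a per-GPR allocation map. Names and structure are illustrative.
class GPUInventory:
    def __init__(self, gpus_per_cluster):
        # cluster name -> set of free GPU ids
        self.free = {c: set(ids) for c, ids in gpus_per_cluster.items()}
        self.allocated = {}  # gpr name -> (cluster, gpu id)

    def allocate(self, gpr_name, cluster):
        # Serve the request from available inventory; None means no capacity,
        # so the GPR stays queued.
        if not self.free.get(cluster):
            return None
        gpu = self.free[cluster].pop()
        self.allocated[gpr_name] = (cluster, gpu)
        return gpu

    def release(self, gpr_name):
        # Free the GPU when the workload completes, making it
        # available for new requests.
        cluster, gpu = self.allocated.pop(gpr_name)
        self.free[cluster].add(gpu)

inv = GPUInventory({"cluster-1": ["gpu-0"]})
inv.allocate("gpr-a", "cluster-1")  # "gpu-0"
inv.release("gpr-a")                # gpu-0 is free again
```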

Worker Services

Slice Operator

The Slice Operator is responsible for managing KubeSlice Custom Resource Definitions (CRDs). Its key functions include:

  • Reconciliation of slice resources with updates from the KubeSlice Controller.
  • Creation of Roles, RoleBindings, and ServiceAccounts for each workspace, based on access rules defined by administrators.
  • Updating the GPU workload resource usage to the controller cluster.
  • Ensuring that the slice resources are in sync with the controller cluster, allowing for efficient resource management and allocation.

AIOps Operator

The AIOps Operator improves GPU node utilization for AI workloads. Its key functions include:

  • Reservation of GPU nodes for incoming GPU Provisioning Requests (GPRs).
  • Configuration of node affinities and tolerations to ensure effective scheduling of workloads across GPU nodes.
  • Continuous monitoring of AI workloads and the health of GPU nodes, triggering alerts based on resource usage, temperature, and power consumption.
  • Reconciliation of inventory details to maintain the current state of GPU node information.
  • Support for the detection, onboarding, and deboarding of existing (brownfield) GPU nodes.
  • Handling of IdleTimeout GPRs.
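
The health-monitoring alerts above can be illustrated with a threshold check. The metric names and limits below are assumptions for the sketch, not actual AIOps Operator configuration:

```python
# Illustrative alert-threshold check over per-node GPU metrics. The metric
# names and limit values are assumptions, not AIOps Operator defaults.
DEFAULT_LIMITS = {"utilization": 0.95, "temperature_c": 85, "power_w": 400}

def node_alerts(metrics, limits=DEFAULT_LIMITS):
    """metrics: dict of metric name -> latest reading for one GPU node.
    Returns the names of metrics that breached their limit."""
    return [m for m, limit in limits.items() if metrics.get(m, 0) > limit]

node_alerts({"utilization": 0.99, "temperature_c": 70})  # only utilization breaches
```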

EGS Agent

The EGS Agent is responsible for handling Auto GPR Create, Read, Update, and Delete (CRUD) operations.