EGS Serverless
EGS Serverless is a cloud-hosted model deployment service that lets developers and teams deploy and run AI/ML models without managing the underlying infrastructure. Instead of provisioning servers or clusters, you simply upload or select your model, and EGS Serverless handles the rest: compute allocation, scaling, and inference endpoint management.
Purpose
- Deployment-centric inference: Developers and data scientists can focus solely on building models and applications while EGS Serverless automates infrastructure, scaling, availability, and reliability for inference workloads.
- Operational abstraction: EGS Serverless removes the complexity of managing Kubernetes clusters, GPUs, load balancing, autoscaling, and lifecycle operations such as patching and instance health; the platform handles all of these automatically.
- Production-grade readiness: The platform is designed to support production workloads, ensuring consistent performance, high availability, and seamless scaling for real-time model serving.
Key Features
EGS Serverless provides the following core features:
Managed Inference Endpoints
The EGS Serverless platform builds inference endpoints for AI models that serve predictions or generate outputs in response to application requests.
The platform supports deploying models with both CPU and GPU configurations as needed.
The platform enables deployment of multiple endpoints per workspace, facilitating multi-model serving and workload separation.
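To make the CPU/GPU configurations and multi-endpoint workspaces above concrete, the sketch below models two endpoint specifications in Python. Every field name here ("resources", "memory", and so on) is an assumption for illustration; it is not EGS Serverless's actual deployment schema.

```python
# Hypothetical endpoint specifications -- field names are illustrative
# assumptions, not EGS Serverless's real schema.
cpu_endpoint = {
    "name": "text-classifier",
    "model": "sentiment-base",
    "resources": {"cpu": "2", "memory": "4Gi"},  # CPU-only configuration
}

gpu_endpoint = {
    "name": "llm-chat",
    "model": "llama-3-8b",
    "resources": {"gpu": "1", "memory": "16Gi"},  # GPU-backed configuration
}

# Multiple endpoints can live in a single workspace, separating workloads
# (a lightweight classifier vs. a GPU-heavy model) within one logical
# environment.
workspace = {
    "name": "team-a",
    "endpoints": [cpu_endpoint, gpu_endpoint],
}
```

Grouping endpoints under a workspace this way mirrors the workload separation described above: each spec carries its own resource profile, while the workspace provides the shared environment.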
Seamless Model Serving
Once created, endpoints can be invoked via stable URLs or APIs, making them easy to integrate into applications, services, or workflows.
EGS handles ingress management, endpoint exposure, and request routing, so teams do not need to build this plumbing manually.
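Invoking an endpoint from application code can be sketched as a plain HTTP POST. The URL, `Bearer` token authentication, and JSON payload shape below are assumptions for illustration; consult your workspace's endpoint details for the actual values.

```python
# Hypothetical sketch: calling an EGS Serverless inference endpoint over HTTP.
# URL, auth scheme, and payload schema are assumptions, not documented APIs.
import json
import urllib.request


def build_inference_request(endpoint_url: str, api_token: str,
                            payload: dict) -> urllib.request.Request:
    """Build a POST request carrying a JSON inference payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # token auth is an assumption
        },
        method="POST",
    )


def invoke_endpoint(endpoint_url: str, api_token: str, payload: dict) -> dict:
    """Send the request and decode the JSON prediction response."""
    req = build_inference_request(endpoint_url, api_token, payload)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (not executed here; the URL is hypothetical):
# result = invoke_endpoint(
#     "https://egs.example.com/workspaces/team-a/endpoints/sentiment",
#     "MY_API_TOKEN",
#     {"inputs": ["great product"]},
# )
```

Because EGS handles ingress and routing, the application only needs the stable endpoint URL and credentials; there is no load balancer or gateway to configure on the caller's side.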
Model Flexibility
Standard models with preset configurations can be selected directly when deploying. Custom model deployments are also supported.
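The standard-versus-custom choice can be sketched as a lookup against a preset catalog. The catalog contents and field names below are hypothetical; they only illustrate the two deployment paths described above.

```python
# Hypothetical preset catalog -- model names and config fields are
# illustrative assumptions, not EGS Serverless's actual presets.
PRESET_CATALOG = {
    "sentiment-base": {"framework": "pytorch", "resources": {"cpu": "2"}},
    "llama-3-8b": {"framework": "vllm", "resources": {"gpu": "1"}},
}


def resolve_model(spec: dict) -> dict:
    """Return a full model config: a preset selected by name, or a
    user-supplied custom configuration."""
    if "preset" in spec:
        # Standard model: preset configuration comes from the catalog.
        return PRESET_CATALOG[spec["preset"]]
    # Custom deployment: the caller supplies the full configuration.
    return spec["custom"]
```

A standard deployment only names a preset (`{"preset": "llama-3-8b"}`), while a custom deployment carries its own configuration under `"custom"`.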
Workspaces and Logical Isolation
Inference endpoints are deployed within workspaces, each representing a logical environment or team context. This helps isolate workloads, resources, and permissions appropriately.
Controlled Access and Governance
EGS integrates with role-based access control, ensuring that inference endpoints (and associated resources) are securely managed per user or team policies.
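The role-based model can be sketched as a mapping from roles to permitted actions. The roles and action names below are assumptions for illustration only, not EGS Serverless's actual policy model.

```python
# Illustrative RBAC sketch -- role and action names are hypothetical.
ROLE_PERMISSIONS = {
    "admin": {"create", "invoke", "delete"},
    "developer": {"create", "invoke"},
    "viewer": {"invoke"},
}


def is_allowed(role: str, action: str) -> bool:
    """Check whether a role may perform an action on an inference endpoint.

    Unknown roles get no permissions (deny by default).
    """
    return action in ROLE_PERMISSIONS.get(role, set())
```

Under such a policy, a viewer could invoke an endpoint but not delete it, keeping endpoint lifecycle operations restricted to the appropriate team roles.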