Version: 2.18.0

Bring Your Training Job

EGS Serverless supports training your custom AI models. You can bring your own model to the EGS Serverless SaaS platform for either fine-tuning or training.

This feature supports:

  • Distributed Multi-GPU Training: Run training workloads across multiple GPU nodes with NVLink and InfiniBand connectivity.
  • Unified Data Access: Access training data, checkpoints, and artifacts through Persistent Volumes or S3 storage.
  • Framework Compatibility: Supports PyTorch, TensorFlow, Hugging Face, Horovod, Ray, and more.
  • MLOps Integration: Works with Kubeflow, Flyte, MLflow, and modern MLOps stacks.
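
As an illustration of how a training script can fit the distributed and data-access model described above, here is a minimal, framework-free sketch. It assumes the job is started by a launcher that sets `RANK` and `WORLD_SIZE` (the variables `torchrun` uses) and that checkpoints go to a directory on mounted storage. The `CHECKPOINT_DIR` variable, the `state.json` file name, and all function names are hypothetical and are not documented EGS settings.

```python
import json
import os
import tempfile
from pathlib import Path


def shard_indices(num_samples: int, rank: int, world_size: int) -> list[int]:
    """Return the sample indices this worker processes (strided split)."""
    return list(range(rank, num_samples, world_size))


def load_checkpoint(ckpt_dir: Path) -> dict:
    """Load the last saved state, or a fresh state if none exists."""
    ckpt_file = ckpt_dir / "state.json"
    if ckpt_file.exists():
        return json.loads(ckpt_file.read_text())
    return {"epoch": 0}


def save_checkpoint(ckpt_dir: Path, state: dict) -> None:
    """Write to a temporary file, then rename it over the target."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    tmp = ckpt_dir / "state.json.tmp"
    tmp.write_text(json.dumps(state))
    os.replace(tmp, ckpt_dir / "state.json")


def run(ckpt_dir: Path, total_epochs: int, num_samples: int = 10) -> dict:
    # RANK and WORLD_SIZE default to a single-worker run when unset.
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    my_samples = shard_indices(num_samples, rank, world_size)

    # Resume from the last completed epoch if a checkpoint is present.
    state = load_checkpoint(ckpt_dir)
    for epoch in range(state["epoch"], total_epochs):
        # Placeholder for the real training step over my_samples.
        state = {"epoch": epoch + 1, "samples_seen": len(my_samples)}
        if rank == 0:  # only one worker writes the shared checkpoint
            save_checkpoint(ckpt_dir, state)
    return state


if __name__ == "__main__":
    # CHECKPOINT_DIR is a hypothetical name used only for this sketch.
    ckpt_dir = Path(os.environ.get("CHECKPOINT_DIR", tempfile.mkdtemp()))
    print(run(ckpt_dir, total_epochs=3))
```

Because the checkpoint records the last completed epoch, rerunning the script against the same directory continues from that epoch rather than starting over. A real job would replace the placeholder step with framework code such as a PyTorch or TensorFlow training loop.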

To use this feature:

  1. Go to Bring Your Training Job in the left sidebar.

  2. Contact Avesha Support at support@avesha.io for more information.