Version: 1.14.0

Install EGS using Script

This topic describes the steps to install EGS on the cluster using the script provided in the egs-installation repository.

info

Across our documentation, we refer to the workspace as the slice workspace. The two terms are used interchangeably.

Prerequisites

  1. Kubernetes cluster with GPU nodes.

    • If fractional GPUs must be supported, the GPU node must have MIG capability (for example, the NVIDIA A100).
  2. GPU Operator is installed on the cluster.

    • If MIG capability is supported and if shared CPU features are required, the GPU Operator must be configured correctly. For more information, see GPU Operator with MIG.
  3. On the cluster, verify the following:

    • The NVIDIA GPU Operator is installed and the nvidia-dcgm-exporter pod is running.
    • Prometheus is running and uses a PVC for data persistence.
    • Prometheus is configured to collect metrics from the DCGM exporter.
    • Grafana is configured with the NVIDIA DCGM dashboard.
  4. The Admin Kubeconfig is required to access the Kubernetes cluster.

  5. Outbound Internet connectivity from the Kubernetes cluster to several image repositories.

  6. You must have the privileges to create the kubeslice-controller and kubeslice-system namespaces used to deploy EGS, and a namespace for each project, named kubeslice-<PROJECT NAME>.

  7. You must have permission to create the load balancer service.

  8. The PostgreSQL database must be backed by a PVC for data persistence.

  9. An ingress controller, such as nginx, must be installed on the cluster.

  10. The following command line tools are required for installation:

    • bash version higher than 5.0.0
    • helm
    • kubectl
    • jq
    • yq

    For more information, see required tools to install EGS.

  11. To receive tokens for image pull secrets, you must first register. To register, visit the KubeSlice registration page.
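The command line tool requirements in item 10 can be spot-checked locally before you clone the repository. The following is a minimal sketch and is not part of the egs-installation repository. The helper names missing_tools and version_gt are hypothetical, and the check reads "higher than 5.0.0" literally as a strict comparison.

```shell
#!/usr/bin/env bash

# Print the name of every tool in the argument list that is not on PATH.
# Always returns 0; an empty output means nothing is missing.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# Succeed when dotted version $1 is strictly higher than dotted version $2.
# Components are compared numerically, so a suffix such as "(1)-release" is ignored.
version_gt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    for (i = 1; i <= (n > m ? n : m); i++) {
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 1
  }'
}

# Tools listed in the prerequisites above.
missing=$(missing_tools bash helm kubectl jq yq)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found on PATH"
fi

# BASH_VERSION is only set when the script runs under bash.
if version_gt "${BASH_VERSION:-0}" "5.0.0"; then
  echo "bash version is sufficient: $BASH_VERSION"
else
  echo "bash is missing or not higher than 5.0.0"
fi
```

The egs-preflight-check.sh script described later remains the authoritative check. This sketch only confirms that the binaries can be found on PATH.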

Clone the Repository

Clone the repository using the following command:

git clone https://github.com/kubeslice-ent/egs-installation.git

note

Ensure the YAML configuration file is correctly formatted and contains all necessary fields. The script will exit with an error if any critical steps fail unless configured to skip on failure. Paths specified in the YAML file must be relative to the base_path unless absolute paths are used.

Check for Prerequisites

Use the egs-preflight-check.sh script to verify the prerequisites for installing EGS.

  • Navigate to the cloned repository and use the following command to change the file permission:

    chmod +x egs-preflight-check.sh    
  • Use the following command to run the script:

    ./egs-preflight-check.sh --kubeconfig <ADMIN KUBECONFIG> --kubecontext-list <KUBECTX>

After the script passes all of the required checks, proceed to install EGS on the cluster.

Modify the Configuration File

  1. Gather the following information required for installation:

    • Prometheus endpoint
    • Grafana endpoint
    • PostgreSQL connection configuration
    • Admin kubeconfig and context for the cluster with GPU nodes

    # From the email received after registering
    IMAGE_REPOSITORY="https://index.docker.io/v1/"
    USERNAME="xxx"
    PASSWORD="xxx"

    KUBECONFIG="kubeconfig" #location of kubeconfig file
    KUBECONTEXT="kubecontext" # cluster context

    # Define required variables
    PROMETHEUS_ENDPOINT="http://prometheus.monitoring.svc.cluster.local:9090"
    GRAFANA_DASHBOARD_BASE_URL="http://grafana.egs-monitoring.svc.cluster.local:8088"

    INGRESS_CLASS_NAME="nginx"

    CONTROLLER_ENDPOINT="$(kubectl cluster-info | awk '/control plane/ {print $NF}')"

    #set helm version
    EGS_VERSION="1.11.0"
  2. Navigate to the cloned repository and locate the input configuration file, egs-only-config.yaml.

  3. Update the egs-only-config.yaml file using the information from Step 1.

  4. Update the following mandatory parameters in the egs-only-config.yaml file:

    a. Set all the Prometheus URL values

    • kubeslice_controller_egs:
        inline_values:
          global:
            KubeTally:
              prometheusUrl: <set-prometheus-url>
    • kubeslice_ui_egs:
        inline_values:
          kubeslice:
            prometheus:
              url: <set-prometheus-url>
    • kubeslice_worker_egs:
        inline_values:
          egs:
            prometheusEndpoint: <set-prometheus-url>
    • cluster_registration:
        - cluster_name:
          telemetry:
            endpoint: <set-telemetry-endpoint>

    b. Set the Grafana URL values

    kubeslice_worker_egs:
      inline_values:
        egs:
          grafanaDashboardBaseUrl: <set-grafana-url>

    c. Set all use_local_charts to false

    use_local_charts: false

    d. Set the helm repo URL

    global_helm_repo_url: "https://smartscaler.nexus.aveshalabs.io/repository/kubeslice-egs-helm-ent-prod"

info

You can add the kubeslice.io/managed-by-egs=false label to GPU nodes. This label excludes or filters the associated GPU nodes from the EGS inventory.
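As a sketch of how this label might be applied and verified with kubectl: the helper names exclude_node_from_egs and list_excluded_nodes, and the node name gpu-node-1 in the usage comment, are hypothetical. The label key and value are the ones given above.

```shell
# Label one GPU node so that EGS leaves it out of its inventory.
# --overwrite replaces any existing value of the same label key.
exclude_node_from_egs() {
  kubectl label node "$1" kubeslice.io/managed-by-egs=false --overwrite
}

# List the nodes that currently carry the exclusion label.
list_excluded_nodes() {
  kubectl get nodes -l kubeslice.io/managed-by-egs=false
}

# Example usage (gpu-node-1 is a placeholder node name):
#   exclude_node_from_egs gpu-node-1
#   list_excluded_nodes
```

Both helpers use whichever kubeconfig and context kubectl currently points at, so confirm that the GPU cluster is selected before running them.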

Install EGS

note

The installation script creates a default project workspace and registers a worker cluster.

To register additional worker clusters, use the k8s Clusters page on the Admin Portal after running this script. For more information, see Register Clusters.
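Before running the installer, it can help to confirm that no placeholder values such as <set-prometheus-url> remain in the configuration file. The following is a minimal sketch under that assumption. The helper name unset_placeholders is hypothetical, and the check only looks for the literal <set-...> pattern, so it does not validate the YAML itself.

```shell
# List lines in a config file that still hold a <set-...> placeholder,
# prefixed with their line numbers.
# Exit status follows grep: 0 when placeholders remain, 1 when none are found.
unset_placeholders() {
  grep -n '<set-[A-Za-z0-9_-]*>' "$1"
}

# Default to the config file name used in this topic.
CONFIG="${1:-egs-only-config.yaml}"
if [ -f "$CONFIG" ]; then
  if unset_placeholders "$CONFIG"; then
    echo "Fill in the values listed above before running the installer." >&2
  else
    echo "No <set-...> placeholders left in $CONFIG."
  fi
fi
```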

Use the following command to install EGS:

./egs-installer.sh --input-yaml egs-only-config.yaml

Uninstall EGS

Use the following command to uninstall EGS:

./egs-uninstall.sh --input-yaml egs-only-config.yaml

Troubleshooting

  • For missing binaries, ensure all required binaries are installed and accessible in your system's PATH.
  • For cluster access issues, verify that the kubeconfig files are configured correctly so that the script can access the clusters specified in the YAML configuration.
  • For timeout issues, if a component fails to install within the specified timeout, increase the verify_install_timeout value in the YAML file.
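The last two checks can be sketched as small shell helpers. This is an illustrative sketch only: the helper names show_install_timeouts and check_cluster_access are hypothetical, and the arguments in the usage comments are placeholders.

```shell
# Show every verify_install_timeout setting in the config, with line numbers,
# so the value to increase is easy to locate.
show_install_timeouts() {
  grep -n 'verify_install_timeout' "$1"
}

# Confirm that a kubeconfig/context pair can reach its cluster.
# Returns the exit status of kubectl cluster-info.
check_cluster_access() {
  kubectl --kubeconfig "$1" --context "$2" cluster-info >/dev/null
}

# Example usage:
#   show_install_timeouts egs-only-config.yaml
#   check_cluster_access <ADMIN KUBECONFIG> <KUBECTX> && echo "cluster reachable"
```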