Version: 1.15.0

Prerequisites to Install EGS Controller

This topic describes the prerequisites to install Elastic GPU Service (EGS) Controller on a Kubernetes cluster.

EGS requires the following components:

  1. A monitoring stack for metrics collection (for example, kube-prometheus-stack).
  2. A Service Monitor to scrape the EGS controller metrics.
  3. A PostgreSQL database for the KubeTally (Cost Management) features.
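Before you install anything, you can confirm that the command-line tools used throughout this topic are on your PATH. The following is a minimal POSIX shell sketch. The function name `require_tools` is hypothetical, and the tool list (`kubectl`, `helm`) is taken from the commands shown in this topic, not from any official requirements list.

```shell
#!/bin/sh
# Hypothetical preflight helper: report every command that is not on PATH.
# Returns 0 when all tools are found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (uncomment to run): check the CLIs used in this topic.
# require_tools kubectl helm || exit 1
```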

Install Prometheus and PostgreSQL Using the Script

You can use the egs-install-prerequisites.sh script to install the prerequisites on the controller cluster. Modify the egs-installer-config.yaml file to add the Prometheus and PostgreSQL parameters. To install the additional applications, always set the enable_install_additional_apps parameter to true. The script then installs and configures all the required components.

The following is an example configuration YAML to install Prometheus and PostgreSQL using the script:

# Enable additional applications installation
enable_install_additional_apps: true

# Enable custom applications
enable_custom_apps: true

# Command execution settings
run_commands: false

# Additional applications configuration
additional_apps:
  - name: "prometheus"
    skip_installation: false
    use_global_kubeconfig: true
    namespace: "egs-monitoring"
    release: "prometheus"
    chart: "kube-prometheus-stack"
    repo_url: "https://prometheus-community.github.io/helm-charts"
    version: "v45.0.0"
    specific_use_local_charts: true
    inline_values:
      prometheus:
        service:
          type: ClusterIP
        prometheusSpec:
          storageSpec: {}
          additionalScrapeConfigs:
            - job_name: tgi
              kubernetes_sd_configs:
                - role: endpoints
              relabel_configs:
                - source_labels: [__meta_kubernetes_pod_name]
                  target_label: pod_name
                - source_labels: [__meta_kubernetes_pod_container_name]
                  target_label: container_name
            - job_name: gpu-metrics
              scrape_interval: 1s
              metrics_path: /metrics
              scheme: http
              kubernetes_sd_configs:
                - role: endpoints
                  namespaces:
                    names:
                      - egs-gpu-operator
              relabel_configs:
                - source_labels: [__meta_kubernetes_endpoints_name]
                  action: drop
                  regex: .*-node-feature-discovery-master
                - source_labels: [__meta_kubernetes_pod_node_name]
                  action: replace
                  target_label: kubernetes_node
      grafana:
        enabled: true
        grafana.ini:
          auth:
            disable_login_form: true
            disable_signout_menu: true
          auth.anonymous:
            enabled: true
            org_role: Viewer
        service:
          type: ClusterIP
        persistence:
          enabled: false
          size: 1Gi
    helm_flags: "--debug"
    verify_install: false
    verify_install_timeout: 600
    skip_on_verify_fail: true
    enable_troubleshoot: false

  - name: "postgresql"
    skip_installation: false
    use_global_kubeconfig: true
    namespace: "kt-postgresql"
    release: "kt-postgresql"
    chart: "postgresql"
    repo_url: "oci://registry-1.docker.io/bitnamicharts/postgresql"
    version: "16.2.1"
    specific_use_local_charts: true
    inline_values:
      auth:
        postgresPassword: "postgres"
        username: "postgres"
        password: "postgres"
        database: "postgres"
      primary:
        persistence:
          enabled: false
          size: 10Gi
    helm_flags: "--wait --debug"
    verify_install: true
    verify_install_timeout: 600
    skip_on_verify_fail: false

Run the installer script using the following command:

./egs-install-prerequisites.sh --input-yaml egs-installer-config.yaml

The script installs the Prometheus Stack in the egs-monitoring namespace and PostgreSQL in the kt-postgresql namespace.
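Because the script installs the additional applications only when enable_install_additional_apps is set to true, a small guard can check the flag before you invoke the script. This is a sketch that assumes the flag appears as a top-level key on its own line, as in the example configuration above. It uses grep rather than a YAML parser, so other YAML layouts are not handled. The function name `additional_apps_enabled` is hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the config enables additional apps.
# Assumes a top-level line such as "enable_install_additional_apps: true",
# optionally followed by a trailing "# comment".
additional_apps_enabled() {
  config_file="$1"
  [ -f "$config_file" ] || { echo "config not found: $config_file" >&2; return 1; }
  grep -Eq '^enable_install_additional_apps:[[:space:]]*true([[:space:]]*(#.*)?)$' "$config_file"
}

# Example (uncomment to run):
# additional_apps_enabled egs-installer-config.yaml &&
#   ./egs-install-prerequisites.sh --input-yaml egs-installer-config.yaml
```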

Install Prometheus

To install Prometheus manually in an existing setup, follow these steps:

Install kube-prometheus-stack

The kube-prometheus-stack is the recommended monitoring solution as it provides a complete monitoring stack with Prometheus, Grafana, and AlertManager.

  1. Add the Helm repository using the following commands:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
  2. Create a monitoring namespace using the following command:

    kubectl create namespace egs-monitoring
  3. Install the Prometheus stack using the following command:

    helm install prometheus prometheus-community/kube-prometheus-stack \
    --namespace egs-monitoring \
    --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false \
    --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
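If you prefer to keep the install options in a file, the two --set flags above can be written to a values file and passed to helm with -f. The function name `write_prometheus_values` and the file name prometheus-values.yaml are assumptions for illustration; the two keys are the same chart values used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: write the selector overrides used above to a values file.
# With both *NilUsesHelmValues values set to false, Prometheus selects
# PodMonitors and ServiceMonitors without requiring the Helm release label.
write_prometheus_values() {
  cat > "$1" <<'EOF'
prometheus:
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
EOF
}

# Example (uncomment to run):
# write_prometheus_values prometheus-values.yaml
# helm install prometheus prometheus-community/kube-prometheus-stack \
#   --namespace egs-monitoring -f prometheus-values.yaml
```

Keeping the overrides in a file makes it easier to reapply the same settings later with helm upgrade.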

Monitoring Configuration

The EGS Controller exposes metrics on port 18080 and requires monitoring configuration so that Prometheus can scrape them. To create a Service Monitor and a Pod Monitor manually, follow these steps:
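Before creating the monitors, you can optionally confirm that the controller actually serves metrics on port 18080, for example through a port-forward. In this sketch, the function name `looks_like_prometheus_metrics` is hypothetical, and so is the Deployment name kubeslice-controller-manager in the commented example (check the real name with kubectl get deploy -n kubeslice-controller). The function only checks that the response contains at least one sample line in the Prometheus text format; it does not validate the full format.

```shell
#!/bin/sh
# Hypothetical check: read a /metrics response on stdin and succeed if it
# contains at least one sample line ("name{labels} value" or "name value").
# Comment lines such as "# HELP" and "# TYPE" are ignored.
looks_like_prometheus_metrics() {
  grep -Eq '^[a-zA-Z_:][a-zA-Z0-9_:]*(\{[^}]*\})?[[:space:]]+[^[:space:]]+'
}

# Example (uncomment to run; the Deployment name is an assumption):
# kubectl -n kubeslice-controller port-forward deploy/kubeslice-controller-manager 18080:18080 &
# sleep 2
# curl -s http://localhost:18080/metrics | looks_like_prometheus_metrics && echo "metrics OK"
```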

Service Monitor Configuration

Create a Service Monitor to scrape metrics from the EGS Controller service. Use the following example configuration to create a servicemonitor.yaml file:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubeslice-controller-manager-monitor
  namespace: egs-monitoring # NAMESPACE: Change this to your monitoring namespace
  labels:
    app.kubernetes.io/instance: kube-prometheus-stack # PROMETHEUS_INSTANCE: Change to your Prometheus instance
    release: prometheus # PROMETHEUS_RELEASE: Change to your Prometheus release name
spec:
  endpoints:
    - interval: 30s # SCRAPE_INTERVAL: How often to collect metrics (30s, 15s, 60s, etc.)
      port: metrics # Service port name where metrics are exposed (port 18080)
      path: /metrics # METRICS_PATH: Path where metrics are exposed (default: /metrics)
      scrapeTimeout: 10s # SCRAPE_TIMEOUT: Maximum time to wait for a metrics response
      scheme: http # SCHEME: Use http for port 18080
  namespaceSelector:
    matchNames:
      - kubeslice-controller # KUBESLICE_CONTROLLER_NAMESPACE: Namespace where the controller is deployed
  selector:
    matchLabels:
      control-plane: controller-manager # Must match the labels on the controller Service
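Because several values in the manifest above may need to change for your environment (marked NAMESPACE, PROMETHEUS_RELEASE, and so on in the comments), a small shell function can render it with your own values. This is a sketch: the function name and argument order are hypothetical, and only the monitoring namespace, the Prometheus release label, the controller namespace, and the scrape interval are parameterized.

```shell
#!/bin/sh
# Hypothetical renderer for the ServiceMonitor above.
# Usage: render_service_monitor MONITORING_NS PROMETHEUS_RELEASE CONTROLLER_NS [INTERVAL]
# Note: Prometheus rejects a scrapeTimeout (10s here) longer than the interval.
render_service_monitor() {
  if [ "$#" -lt 3 ]; then
    echo "usage: render_service_monitor MONITORING_NS PROMETHEUS_RELEASE CONTROLLER_NS [INTERVAL]" >&2
    return 1
  fi
  monitoring_ns="$1"
  prometheus_release="$2"
  controller_ns="$3"
  interval="${4:-30s}"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubeslice-controller-manager-monitor
  namespace: ${monitoring_ns}
  labels:
    app.kubernetes.io/instance: kube-prometheus-stack
    release: ${prometheus_release}
spec:
  endpoints:
    - interval: ${interval}
      port: metrics
      path: /metrics
      scrapeTimeout: 10s
      scheme: http
  namespaceSelector:
    matchNames:
      - ${controller_ns}
  selector:
    matchLabels:
      control-plane: controller-manager
EOF
}

# Example (uncomment to run):
# render_service_monitor egs-monitoring prometheus kubeslice-controller > servicemonitor.yaml
```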

Pod Monitor Configuration

Create a Pod Monitor for direct pod metrics collection. Use the following example configuration to create a podmonitor.yaml file:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kubeslice-controller-manager-pod-monitor
  namespace: egs-monitoring # NAMESPACE: Change this to your monitoring namespace
  labels:
    app.kubernetes.io/instance: kube-prometheus-stack # PROMETHEUS_INSTANCE: Change to your Prometheus instance
    release: prometheus # PROMETHEUS_RELEASE: Change to your Prometheus release name
spec:
  selector:
    matchLabels:
      control-plane: controller-manager # Matches the pod labels
  namespaceSelector:
    matchNames:
      - kubeslice-controller # KUBESLICE_CONTROLLER_NAMESPACE: Namespace where the controller is deployed
  podMetricsEndpoints:
    - interval: 30s # SCRAPE_INTERVAL: How often to collect metrics (30s, 15s, 60s, etc.)
      port: "18080" # PORT: Prometheus Operator matches this field against container port names, not numbers; replace it with the name of the container port that serves metrics on 18080
      path: /metrics # METRICS_PATH: Path where metrics are exposed (default: /metrics)
      scrapeTimeout: 10s # SCRAPE_TIMEOUT: Maximum time to wait for a metrics response
      scheme: http # SCHEME: Use http for direct pod access
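Prometheus Operator matches the port field of a PodMonitor endpoint against the names of the pod's container ports, and Kubernetes only accepts container port names that follow the IANA_SVC_NAME rules: at most 15 characters, only lowercase letters, digits, and hyphens, at least one letter, and no leading, trailing, or consecutive hyphens. A purely numeric value such as "18080" therefore cannot be a port name. The following sketch (the function name is hypothetical) checks a candidate name against those rules, and the commented command shows one way to list the controller's container port names.

```shell
#!/bin/sh
# Hypothetical validator for Kubernetes container port names (IANA_SVC_NAME):
# 1-15 chars of [a-z0-9-], at least one letter, no leading/trailing/double hyphen.
# Explicit character lists avoid locale-dependent [a-z] ranges.
is_valid_port_name() {
  name="$1"
  [ "${#name}" -ge 1 ] && [ "${#name}" -le 15 ] || return 1
  case "$name" in
    *[!abcdefghijklmnopqrstuvwxyz0123456789-]*) return 1 ;;  # invalid character
    -*|*-|*--*) return 1 ;;                                  # bad hyphen placement
  esac
  case "$name" in
    *[abcdefghijklmnopqrstuvwxyz]*) return 0 ;;              # has at least one letter
    *) return 1 ;;
  esac
}

# Example (uncomment to run): list the controller's container port names.
# kubectl get pods -n kubeslice-controller -l control-plane=controller-manager \
#   -o jsonpath='{.items[*].spec.containers[*].ports[*].name}'
```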

Apply the Configuration

Apply the servicemonitor.yaml and podmonitor.yaml files using the following commands:

kubectl apply -f servicemonitor.yaml
kubectl apply -f podmonitor.yaml
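To catch problems such as missing ServiceMonitor or PodMonitor CRDs before anything is created, you can validate each manifest with a server-side dry run first. This is a sketch: the function name `apply_monitors` and the KUBECTL override variable are hypothetical conveniences, while --dry-run=server is a standard kubectl apply option.

```shell
#!/bin/sh
# Hypothetical helper: validate every manifest with a server-side dry run,
# then apply them. Set KUBECTL to override the kubectl binary.
apply_monitors() {
  kubectl_bin="${KUBECTL:-kubectl}"
  for manifest in "$@"; do
    [ -f "$manifest" ] || { echo "manifest not found: $manifest" >&2; return 1; }
  done
  # Dry run first so that nothing is applied if any manifest is rejected.
  for manifest in "$@"; do
    "$kubectl_bin" apply --dry-run=server -f "$manifest" || return 1
  done
  for manifest in "$@"; do
    "$kubectl_bin" apply -f "$manifest" || return 1
  done
}

# Example (uncomment to run):
# apply_monitors servicemonitor.yaml podmonitor.yaml
```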

PostgreSQL Database Setup

The EGS Controller uses PostgreSQL for KubeTally functionality, which handles chargeback and metrics storage. You have two options for PostgreSQL deployment:

  1. Internal PostgreSQL deployment (development/testing)
  2. External PostgreSQL deployment

Internal PostgreSQL Deployment

  1. Install PostgreSQL using Helm.

    1. Add the Helm repository using the following commands:

      helm repo add bitnami https://charts.bitnami.com/bitnami
      helm repo update
    2. Create a kt-postgresql namespace using the following command:

      kubectl create namespace kt-postgresql
    3. Install PostgreSQL using the following command:

      helm install kt-postgresql oci://registry-1.docker.io/bitnamicharts/postgresql \
      --namespace kt-postgresql \
      --version 16.2.1 \
      --set auth.postgresPassword=postgres \
      --set auth.username=postgres \
      --set auth.database=postgres \
      --set primary.persistence.enabled=false \
      --set primary.persistence.size=10Gi
  2. Configure the EGS Controller to use PostgreSQL. Add the PostgreSQL parameters to the kubeslice_controller_egs object in the egs-installer-config.yaml file:

    kubeslice_controller_egs:
      inline_values:
        global:
          kubeTally:
            enabled: true
            postgresSecretName: kubetally-db-credentials
            postgresAddr: "kt-postgresql.kt-postgresql.svc.cluster.local"
            postgresPort: 5432
            postgresUser: "postgres"
            postgresPassword: "postgres"
            postgresDB: "postgres"
            postgresSslmode: disable
            prometheusUrl: "http://prometheus-kube-prometheus-prometheus.egs-monitoring.svc.cluster.local:9090"
  3. Create a database credentials secret using the following commands:

    # Get PostgreSQL credentials
    export POSTGRES_PASSWORD=$(kubectl get secret --namespace kt-postgresql kt-postgresql -o jsonpath="{.data.postgres-password}" | base64 -d)
    export POSTGRES_HOST="kt-postgresql.kt-postgresql.svc.cluster.local"
    export POSTGRES_PORT="5432"
    export POSTGRES_DB="postgres"

    # Create secret for EGS Controller
    kubectl create secret generic kubetally-db-credentials \
    --from-literal=postgres-addr=$POSTGRES_HOST \
    --from-literal=postgres-port=$POSTGRES_PORT \
    --from-literal=postgres-user=postgres \
    --from-literal=postgres-password=$POSTGRES_PASSWORD \
    --from-literal=postgres-db=$POSTGRES_DB \
    --from-literal=postgres-sslmode=disable \
    -n kubeslice-controller
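To check the credentials before installing the controller, you can connect with psql from a temporary pod. The sketch below builds a libpq connection URI from the same values used for the secret. The function name `pg_uri` is hypothetical, and it does not percent-encode the user name or password, so a password containing characters such as @, :, / or % must be encoded first. The postgres:16 client image in the commented example is an assumption; any image that ships psql works.

```shell
#!/bin/sh
# Hypothetical helper: build a libpq connection URI.
# Usage: pg_uri USER PASSWORD HOST PORT DB SSLMODE
# NOTE: USER and PASSWORD are inserted as-is (no percent-encoding).
pg_uri() {
  [ "$#" -eq 6 ] || { echo "usage: pg_uri USER PASSWORD HOST PORT DB SSLMODE" >&2; return 1; }
  printf 'postgresql://%s:%s@%s:%s/%s?sslmode=%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example (uncomment to run): test the in-cluster database from a throwaway pod.
# URI=$(pg_uri postgres "$POSTGRES_PASSWORD" "$POSTGRES_HOST" "$POSTGRES_PORT" "$POSTGRES_DB" disable)
# kubectl run psql-check --rm -it --restart=Never -n kt-postgresql \
#   --image=postgres:16 -- psql "$URI" -c 'SELECT 1'
```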

External PostgreSQL Connection

Prerequisites

  • PostgreSQL 12+ with SSL support
  • Database named kubetally
  • User with appropriate permissions
  • Network access from the Kubernetes cluster
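You can check the version and SSL prerequisites against the values the server reports for SHOW server_version_num and SHOW ssl. In this sketch, the function name `pg_prereqs_ok` is hypothetical, and URI in the commented example stands for a libpq connection string to your external database. The check relies on server_version_num being at least 120000 for PostgreSQL 12 and later.

```shell
#!/bin/sh
# Hypothetical check for the external-database prerequisites listed above.
# Usage: pg_prereqs_ok SERVER_VERSION_NUM SSL_SETTING
#   SERVER_VERSION_NUM: output of "SHOW server_version_num", e.g. 160002
#   SSL_SETTING:        output of "SHOW ssl", "on" or "off"
pg_prereqs_ok() {
  version_num="$1"
  ssl_setting="$2"
  case "$version_num" in
    ''|*[!0123456789]*) echo "not a version number: $version_num" >&2; return 1 ;;
  esac
  if [ "$version_num" -lt 120000 ]; then
    echo "PostgreSQL 12 or later is required (server_version_num=$version_num)" >&2
    return 1
  fi
  if [ "$ssl_setting" != "on" ]; then
    echo "SSL is not enabled on the server (ssl=$ssl_setting)" >&2
    return 1
  fi
}

# Example (uncomment to run; URI is your connection string):
# pg_prereqs_ok "$(psql "$URI" -tAc 'SHOW server_version_num')" "$(psql "$URI" -tAc 'SHOW ssl')"
```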
  1. Install PostgreSQL using Helm.

    1. Add the Helm repository using the following commands:

      helm repo add bitnami https://charts.bitnami.com/bitnami
      helm repo update
    2. Create a kt-postgresql namespace using the following command:

      kubectl create namespace kt-postgresql
    3. Install PostgreSQL using the following command:

      helm install kt-postgresql oci://registry-1.docker.io/bitnamicharts/postgresql \
      --namespace kt-postgresql \
      --version 16.2.1 \
      --set auth.postgresPassword=postgres \
      --set auth.username=postgres \
      --set auth.database=postgres \
      --set primary.persistence.enabled=false \
      --set primary.persistence.size=10Gi
  2. Configure the EGS Controller to use PostgreSQL. Add the PostgreSQL parameters to the kubeslice_controller_egs object in the egs-installer-config.yaml file:

    kubeslice_controller_egs:
      inline_values:
        global:
          kubeTally:
            enabled: true
            postgresSecretName: kubetally-db-credentials
            postgresAddr: "kt-postgresql.kt-postgresql.svc.cluster.local"
            postgresPort: 5432
            postgresUser: "postgres"
            postgresPassword: "postgres"
            postgresDB: "postgres"
            postgresSslmode: disable
            prometheusUrl: "http://prometheus-kube-prometheus-prometheus.egs-monitoring.svc.cluster.local:9090"
  3. Create the database credentials secret for the external database using the following command:

    kubectl create secret generic kubetally-db-credentials \
    --from-literal=postgres-addr=your-external-postgres-host \
    --from-literal=postgres-port=5432 \
    --from-literal=postgres-user=your-username \
    --from-literal=postgres-password=your-password \
    --from-literal=postgres-db=your-database-name \
    --from-literal=postgres-sslmode=require \
    -n kubeslice-controller
  4. The EGS Controller automatically creates the required database schema when it starts. Ensure the database user has the following permissions:

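    -- In psql, connect to your PostgreSQL instance and switch to the target database
    \c your-database-name

    -- Grant necessary permissions (quote database names that contain hyphens)
    GRANT CREATE ON DATABASE "your-database-name" TO your_username;
    -- Required on PostgreSQL 15+, where CREATE on the public schema is no longer granted to all users
    GRANT USAGE, CREATE ON SCHEMA public TO your_username;
    GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO your_username;
    GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO your_username;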
  5. Configure your egs-installer-config.yaml file:

    # Enable additional applications installation
    enable_install_additional_apps: true

    # PostgreSQL configuration
    additional_apps:
      - name: "postgresql"
        skip_installation: false
        use_global_kubeconfig: true
        namespace: "kt-postgresql"
        release: "kt-postgresql"
        chart: "postgresql"
        repo_url: "oci://registry-1.docker.io/bitnamicharts/postgresql"
        version: "16.2.1"
        specific_use_local_charts: true
        inline_values:
          auth:
            postgresPassword: "postgres"
            username: "postgres"
            password: "postgres"
            database: "postgres"
          primary:
            persistence:
              enabled: false
              size: 10Gi
        helm_flags: "--wait --debug"
        verify_install: true
        verify_install_timeout: 600
        skip_on_verify_fail: false
  6. Run the prerequisites installer script using the following command:

    ./egs-install-prerequisites.sh --input-yaml egs-installer-config.yaml
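If you script the external-database steps, a quick check of the values you pass to the kubetally-db-credentials secret can catch leftover placeholders such as your-external-postgres-host, an invalid port, or a misspelled sslmode. This sketch is hypothetical (function name, argument order, and the "your-" placeholder convention are assumptions); the accepted sslmode values are the ones defined by libpq.

```shell
#!/bin/sh
# Hypothetical sanity check for the values used in kubetally-db-credentials.
# Usage: check_pg_settings HOST PORT USER DB SSLMODE
check_pg_settings() {
  [ "$#" -eq 5 ] || { echo "usage: check_pg_settings HOST PORT USER DB SSLMODE" >&2; return 1; }
  # Reject empty values and unreplaced "your-..." placeholders.
  for value in "$1" "$3" "$4"; do
    case "$value" in
      ''|your-*) echo "placeholder or empty value: '$value'" >&2; return 1 ;;
    esac
  done
  case "$2" in
    ''|*[!0123456789]*) echo "port must be numeric: $2" >&2; return 1 ;;
  esac
  if [ "$2" -lt 1 ] || [ "$2" -gt 65535 ]; then
    echo "port out of range: $2" >&2
    return 1
  fi
  # sslmode values accepted by libpq.
  case "$5" in
    disable|allow|prefer|require|verify-ca|verify-full) return 0 ;;
    *) echo "unknown sslmode: $5" >&2; return 1 ;;
  esac
}

# Example (uncomment to run) before creating the secret:
# check_pg_settings db.example.com 5432 kubetally kubetally require
```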