Version: 1.1.0

Install Obliq AI SRE Agent

Obliq is an AI-powered infrastructure management platform designed to bring intelligence, automation, and reliability to operations across Kubernetes and AWS environments.

Install Obliq AI SRE Agent using Helm charts.

Set up Installation Prerequisites

Installation Packages

Before starting deployment, ensure you have the following packages installed:

Kubernetes Tools

| Tool | Version | Notes |
| --- | --- | --- |
| kubectl | v1.24+ | Compatible with your cluster version |
| helm | v3.8+ | |
| helmfile | v0.148+ | |

Commands to Install Kubernetes Tools

Use the following commands to install the Kubernetes tools listed above:

# kubectl - Kubernetes command line tool
brew install kubectl # macOS
# or
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" # Linux
chmod +x kubectl
sudo mv kubectl /usr/local/bin/

# helm - Kubernetes package manager
brew install helm # macOS
# or
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash # Linux

# helmfile - Helm chart deployment tool
brew install helmfile # macOS
# or
curl -L https://github.com/helmfile/helmfile/releases/download/v0.148.0/helmfile_linux_amd64 > /usr/local/bin/helmfile # Linux
chmod +x /usr/local/bin/helmfile

Network and DNS Tools

The following network and DNS tools are required to test connectivity, resolve issues, and validate configurations.

| Tool | Purpose | Availability |
| --- | --- | --- |
| nslookup and dig | DNS lookup tools | Usually pre-installed on most systems |
| curl | HTTP client (API testing) | Usually pre-installed on most systems |

Verify the Installation of the Required Packages

  1. Check versions using the following commands:

    kubectl version --client
    helm version
    helmfile version
  2. Test kubectl connectivity using the following command:

    kubectl cluster-info
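The version checks above can also be scripted. The following is a minimal sketch that compares the installed helmfile against the v0.148+ minimum from the prerequisites table; it assumes a version-sorting `sort -V` is available (GNU coreutils, standard on most Linux distributions).

```shell
# Minimal sketch: verify the installed helmfile meets the minimum
# version from the prerequisites table (v0.148+).
# Assumption: `sort -V` (version sort) is available.
required="0.148.0"
installed="$(helmfile version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)"
lowest="$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n1)"
if [ "$lowest" = "$required" ]; then
  echo "helmfile version OK ($installed)"
else
  echo "helmfile too old or not found (need >= $required)"
fi
```

The same pattern works for kubectl and helm by adjusting the `required` value and the version command.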

Create ACR Secret

Create the ACR image pull secret using the following command:

kubectl create secret docker-registry registry \
--docker-server=avesha.azurecr.io \
--docker-username=<your-acr-username> \
--docker-password=<your-acr-password> \
--docker-email=<your-email> \
--namespace=avesha
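As a quick sanity check, you can confirm the secret was created. Note that the `avesha` namespace must exist before a namespaced secret can be created in it; the namespace-creation command below is only needed if it is not created elsewhere in your workflow.

```shell
# Confirm the image pull secret exists in the avesha namespace.
kubectl get secret registry -n avesha

# If the secret creation failed because the namespace is missing,
# create it first (only if your workflow does not create it already):
# kubectl create namespace avesha
```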

Download the Helm Charts Package

  1. Download the AI SRE Agent Helm charts using the following command:

    wget https://smartscaler.nexus.aveshalabs.io/repository/agents/obliq-sre-agent-charts-v1.0.0.zip

    All versions are available at this location.

  2. Install the unzip package and extract the file you just downloaded using the following commands:

    sudo apt-get install -y unzip
    unzip obliq-sre-agent-charts-v1.0.0.zip -d agents
    cd agents

Platforms Installed From the Helm Charts

| Name | Namespace | Enabled | Installed | Labels | Chart | Version |
| --- | --- | --- | --- | --- | --- | --- |
| cert-manager | cert-manager | true | true | step:platform | cert-manager/cert-manager | v1.13.3 |
| cert-manager-clusterissuer | cert-manager | true | true | step:platform | ./apps/cert-manager-clusterissuer | 0.1.0 |
| nginx-ingress | ingress-nginx | true | true | step:platform | ingress-nginx/ingress-nginx | 4.7.1 |

Applications Installed From the Helm Charts

| Name | Namespace | Enabled | Installed | Labels | Chart | Version |
| --- | --- | --- | --- | --- | --- | --- |
| anomaly-detection | avesha | true | true | step:apps | ./apps/anomaly-detection | |
| active-inventory | avesha | true | true | step:apps | ./apps/active-inventory | |
| aws-ec2-cloudwatch-alarms | avesha | true | true | step:apps | ./apps/aws-ec2-cloudwatch-alarms | |
| kubernetes-events-ingester | avesha | true | true | step:apps | ./apps/kubernetes-events-ingester | |
| slack-ingester | avesha | true | true | step:apps | ./apps/slack-ingester | |
| loki-mcp | avesha | true | true | step:apps | ./apps/loki-mcp | |
| cloudwatch-mcp | avesha | true | true | step:apps | ./apps/cloudwatch-mcp | |
| neo4j-mcp | avesha | true | true | step:apps | ./apps/neo4j-mcp | |
| opentelemetry-collector | avesha | true | true | step:apps | open-telemetry/opentelemetry-collector | 0.130.2 |
| incident-manager | avesha | true | true | step:apps | ./apps/incident-manager | |
| rca-agent | avesha | true | true | step:apps | ./apps/rca-agent | |
| auto-remediation | avesha | true | true | step:apps | ./apps/auto-remediation | |
| aws-mcp | avesha | true | true | step:apps | ./apps/aws-mcp | |
| k8s-mcp | avesha | true | true | step:apps | ./apps/k8s-mcp | |
| prometheus-mcp | avesha | true | true | step:apps | ./apps/prometheus-mcp | |
| neo4j | avesha | true | true | step:apps | neo4j/neo4j | 5.26.10 |
| mongodb | avesha | true | true | step:apps | bitnami/mongodb | 13.17.0 |
| backend | avesha | true | true | step:apps | ./apps/backend | |
| infra-agent | avesha | true | true | step:apps | ./apps/infra-agent | |
| avesha-unified-ui | avesha | true | true | step:apps | ./apps/avesha-unified-ui | |
| service-graph-engine | avesha | true | true | step:apps | ./apps/service-graph-engine | |
| orchestrator | avesha | true | true | step:apps | ./apps/orchestrator | |

Set up an Environment

Set up an environment using the following commands:

info

Currently, we have env.sample only for Sandbox.

cp env.sample .env
vim .env
note

We recommend that you review env.sample to better understand the environment settings. To learn more, see Explore the env.sample Helm Chart.
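After sourcing `.env`, a quick guard against missing values can save a failed deployment. The sketch below (bash-specific, using indirect expansion) checks the two variables referenced later in the SSL Setup section; extend the list to match the variables your env.sample actually defines.

```shell
# Load the environment, then warn about unset variables.
# CERT_MANAGER_EMAIL and DOMAIN_NAME are used later in the SSL setup;
# extend the list with any other variables your env.sample defines.
set -a; source .env; set +a

for v in CERT_MANAGER_EMAIL DOMAIN_NAME; do
  [ -n "${!v}" ] || echo "warning: $v is not set in .env"
done
```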

Environments

info

Currently, we have env.sample only for Sandbox.

  • sandbox - Development/testing
  • env/sandbox/ - Sandbox environment values (.gotmpl files)

Configure Kubernetes Access

A few services require kubeconfig files to access Kubernetes clusters. Edit the following files in the env/sandbox/ directory.

info

For detailed information, see Kubernetes Permissions.

Edit the env/sandbox/k8s-mcp.yaml.gotmpl File

configMap:
  enabled: true
  files:
    config: |
      # Add your kubeconfig content here
      apiVersion: v1
      kind: Config
      clusters:
        - name: your-cluster
          cluster:
            server: https://your-cluster-server:6443
            certificate-authority-data: LS0tLS1CRUdJTi...
      users:
        - name: your-user
          user:
            client-certificate-data: LS0tLS1CRUdJTi...
            client-key-data: LS0tLS1CRUdJTi...
      contexts:
        - name: your-context
          context:
            cluster: your-cluster
            user: your-user
      current-context: your-context

Edit the env/sandbox/kubernetes-events-ingester.yaml.gotmpl File

configMap:
  enabled: true
  files:
    config: |
      # Add your kubeconfig content here
      apiVersion: v1
      kind: Config
      clusters:
        - name: your-cluster
          cluster:
            server: https://your-cluster-server:6443
            certificate-authority-data: LS0tLS1CRUdJTi...
      users:
        - name: your-user
          user:
            client-certificate-data: LS0tLS1CRUdJTi...
            client-key-data: LS0tLS1CRUdJTi...
      contexts:
        - name: your-context
          context:
            cluster: your-cluster
            user: your-user
      current-context: your-context

Get Your Kubeconfig

You can either get the current cluster configuration or copy your local kubeconfig using the corresponding commands listed below:

  • Get current cluster configuration using the following command:

    kubectl config view --raw
  • Copy from your local kubeconfig using the following command:

    cat ~/.kube/config
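When your local kubeconfig contains several clusters, a self-contained config for just the current context is usually what you want to paste into the .gotmpl files above. Standard kubectl flags produce one:

```shell
# Emit a flattened kubeconfig containing only the current context,
# with certificate data inlined so it is safe to paste elsewhere.
kubectl config view --raw --minify --flatten
```

`--minify` drops everything except the current context, and `--flatten` inlines any certificate files referenced by path.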

Install Obliq AI SRE Agent in a Green Field Environment

info

Currently, the AI SRE Agent only supports the Jira tracking tool.

Initialize Helmfile

Before installing the Agent, initialize the Helmfile environment using the following command:

# Initialize Helmfile (required before first deployment)
helmfile init

Install Obliq AI SRE Agent Quickly

  1. Load environment variables using the following command:

    set -a ; source .env; set +a;
  2. Install the Helm chart using the following command:

    helmfile -e sandbox apply

The Helm chart deployment installs platforms and applications on your cluster. See their complete list in Platforms Installed From the Helm Charts and Applications Installed From the Helm Charts.
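After the apply completes, you can wait for the workloads to come up before moving on. This is a sketch, not part of the installer; the 300-second timeout is an arbitrary choice you may need to raise on slower clusters.

```shell
# Block until every pod in the avesha namespace reports Ready
# (fails with a nonzero exit code if the timeout is reached).
kubectl wait --for=condition=Ready pods --all -n avesha --timeout=300s

# Then review the overall status.
kubectl get pods -n avesha
```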

Install Obliq AI SRE Agent in Phases

Install in two phases using step labels:

  1. Deploy the platform using the following command:

    helmfile -e sandbox -l step=platform apply

    This command installs only the platform components, nginx-ingress and cert-manager. See the complete list in Platforms Installed From the Helm Charts.

  2. Install applications using the following command:

    helmfile -e sandbox -l step=apps apply

    This command installs all applications. See the complete list in Applications Installed From the Helm Charts.

Install Platform and Applications

Install both platforms and applications using the following command:

helmfile -e sandbox apply

This command installs platforms and applications. See their complete list in Platforms Installed From the Helm Charts and Applications Installed From the Helm Charts.

Get IP Address for the Obliq AI SRE Agent Console

Get the IP address of the LoadBalancer, which you can use to access the Obliq AI SRE Agent console.

To get the external IP address from nginx-ingress LoadBalancer for DNS configuration, use the following commands:

# Get the LoadBalancer IP
kubectl get service -n ingress-nginx nginx-ingress-ingress-nginx-controller

# Or watch until the external IP is assigned
kubectl get service -n ingress-nginx nginx-ingress-ingress-nginx-controller -w

# Alternative: Get all services in ingress-nginx namespace
kubectl get service -n ingress-nginx

Example Output:

NAME                                     TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                      AGE
nginx-ingress-ingress-nginx-controller   LoadBalancer   10.96.123.45   203.0.113.10   80:30080/TCP,443:30443/TCP   5m

Use the EXTERNAL-IP (203.0.113.10) for your DNS A records as shown in the example below:

  • *.yourdomain.com → 203.0.113.10

See Access the Obliq AI SRE Agent Console to learn how to add a port number to the LoadBalancer IP address to create the console access URL.
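If you want to feed the external IP into DNS automation rather than read it from the table, a jsonpath query captures it directly. The fallback to `hostname` is for clouds such as AWS, whose load balancers expose a DNS name instead of an IP.

```shell
# Capture the LoadBalancer external IP (or hostname, e.g. on AWS ELBs).
SVC=nginx-ingress-ingress-nginx-controller
EXTERNAL_IP=$(kubectl get service -n ingress-nginx "$SVC" \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
[ -n "$EXTERNAL_IP" ] || EXTERNAL_IP=$(kubectl get service -n ingress-nginx "$SVC" \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "Point *.yourdomain.com at: $EXTERNAL_IP"
```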

Troubleshooting and Cleanup

Check Access Issues

Check access and communication issues using the commands listed in the following table.

| Task | Command |
| --- | --- |
| Check DNS resolution | nslookup ui.yourdomain.com |
| Check SSL certificates | kubectl get certificates -n avesha |
| Check ingress status | kubectl get ingress -n avesha |

Delete Persistent Volume Claims and Persistent Volumes

When you need to completely clean up your deployment or troubleshoot storage issues, you may need to delete all Persistent Volume Claims (PVCs) and Persistent Volumes (PVs).

warning

⚠️ This will permanently delete all data!

Delete All PVCs in a Namespace

Delete all PVCs in a namespace using the following commands:

# List all PVCs in the namespace
kubectl get pvc -n avesha

# Delete all PVCs in the namespace
kubectl delete pvc --all -n avesha

# Or delete specific PVCs
kubectl delete pvc <pvc-name> -n avesha

Delete All PVs

Delete all PVs using the following commands:

# List all PVs
kubectl get pv

# Delete all PVs (be careful - this affects the entire cluster)
kubectl delete pv --all

# Or delete specific PVs
kubectl delete pv <pv-name>

Force Delete Stuck PVCs/PVs

If PVCs or PVs are stuck in the Terminating state, force delete them using the following commands:

# Force delete stuck PVC
kubectl patch pvc <pvc-name> -n avesha -p '{"metadata":{"finalizers":null}}' --type=merge

# Force delete stuck PV
kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}' --type=merge
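If many PVCs are stuck at once, patching them one by one is tedious. The loop below is a hedged sketch that clears finalizers on every PVC in the namespace; like the single-resource patches above, it bypasses normal cleanup and should be used with care.

```shell
# Clear finalizers on every PVC in the avesha namespace.
# WARNING: this bypasses normal cleanup; only use on resources
# genuinely stuck in the Terminating state.
for pvc in $(kubectl get pvc -n avesha -o name); do
  kubectl patch "$pvc" -n avesha -p '{"metadata":{"finalizers":null}}' --type=merge
done
```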

Complete Cleanup Script

Use this cleanup script in the following scenarios:

  • Complete environment reset: Starting fresh with no data
  • Storage troubleshooting: Resolving persistent storage issues
  • Environment migration: Moving to a different storage class
  • Development cleanup: Removing test data

Use the following script for a complete cleanup:

#!/bin/bash
NAMESPACE="avesha"

echo "⚠️ WARNING: This will delete ALL data in namespace: $NAMESPACE"
read -p "Are you sure? (y/N): " -n 1 -r
echo

if [[ $REPLY =~ ^[Yy]$ ]]; then
  echo "Deleting all PVCs in namespace: $NAMESPACE"
  kubectl delete pvc --all -n "$NAMESPACE"

  echo "Deleting all PVs"
  kubectl delete pv --all

  echo "Cleanup complete!"
else
  echo "Cleanup cancelled."
fi

Post Cleanup

After deleting PVCs and PVs, redeploy your applications using the following commands:

# Load environment variables
set -a ; source .env; set +a;

# Redeploy
helmfile -e sandbox apply

Charts

  • service-graph-engine - Service graph engine
  • cert-manager-clusterissuer - SSL certificate issuers

Global Configuration

Each environment includes a global.yaml.gotmpl file with common settings:

  • Image registry configuration
  • Resource limits
  • Security contexts
  • Monitoring settings
  • Ingress configuration
  • Environment variables from .env file

Service-Specific Configuration

Each service has its own .gotmpl file in the environment directory:

  • backend.yaml.gotmpl - Backend service configuration
  • anomaly-detection.yaml.gotmpl - Anomaly detection configuration
  • incident-manager.yaml.gotmpl - Incident manager configuration
  • service-graph-engine.yaml.gotmpl - Service graph engine configuration
  • avesha-unified-ui.yaml.gotmpl - Unified UI configuration
  • infra-agent.yaml.gotmpl - Infrastructure agent configuration

Customization

# Override state values
helmfile -e sandbox --state-values-set backend.image.tag=v1.2.0 apply

# Use a custom state values file
helmfile -e sandbox --state-values-file custom-values.yaml apply

SSL Setup

Add the following properties to your .env file:

CERT_MANAGER_EMAIL=admin@yourdomain.com
DOMAIN_NAME=yourdomain.com

Services will be available at:

  • https://ui.yourdomain.com
  • https://api.yourdomain.com