Version: 1.16.0

Cluster and Registration Issues

Connectivity Issues

Cluster Registration Failure Due to Incorrect Application of registration.yaml File

Introduction

This scenario addresses situations where the cluster registration process fails despite having a correct registration.yaml file. The root cause of this issue is mistakenly applying the registration.yaml file to multiple clusters, leading to conflicts and incorrect configurations. To ensure a successful cluster registration process, it is crucial to apply the registration.yaml file only to the intended cluster. For accurate steps on using registration.yaml for installation, refer to registering clusters through YAML.

Background

During the registration process, a registration.yaml file is utilized to provide essential configuration details for the cluster being registered with KubeSlice. While the process is straightforward for registering a single cluster, issues arise when the same registration.yaml file is mistakenly applied to multiple clusters.

Root Cause

The failure in cluster registration occurs when the registration.yaml file, meant for registering a single cluster, is inadvertently used to register multiple clusters. This results in conflicts and incorrect configurations, causing the registration process to fail.

Best Practice

To avoid cluster registration failures due to incorrect application of the registration.yaml file, adhere to the following best practice:

Ensure Single Cluster Application

Apply the registration.yaml file exclusively to the target cluster that is intended to be registered. Avoid using the same configuration file for multiple clusters to prevent conflicts and ensure a seamless registration process.
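As a minimal sketch of this practice, assuming kubectx is installed and the registration YAML for the target cluster is saved locally as registration.yaml (the context name and file name below are placeholders), switch to the intended cluster's context and verify it before applying anything; see the registering clusters through YAML documentation for the exact target and namespace:

    # Point kubectl at the cluster that is meant to be registered (placeholder context name)
    kubectx <target-cluster-context>

    # Double-check that the current context is the intended cluster
    kubectl config current-context

    # Apply the registration YAML to this cluster only
    kubectl apply -f registration.yaml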

Steps to Rectify the Issue

If a cluster registration fails due to the incorrect application of the registration.yaml file to multiple clusters, follow these steps to rectify the issue:

  1. Identify Misapplied registration.yaml

    Review the registration.yaml file and verify if it has been applied to multiple clusters.

  2. Backup and Isolate the Config File

    Create a backup of the original registration.yaml file and remove it from any clusters to which it was mistakenly applied.

  3. Generate Separate Config Files

    If you intend to register multiple clusters, ensure that each cluster has its own dedicated and correctly customized registration.yaml file.

  4. Retry Registration

    After ensuring that the registration.yaml file is accurately applied to the intended cluster, retry the registration process.

Conclusion

Cluster registration failure with a correct registration.yaml file can often be attributed to the incorrect application of the file to multiple clusters. By adhering to the best practice of using a dedicated registration.yaml file for each target cluster, users can avoid conflicts and successfully register their clusters with KubeSlice. Ensuring precise configuration and isolation of the configuration files streamlines the registration process and provides a smooth experience when utilizing KubeSlice's features for Kubernetes management and scaling. For detailed installation steps using registration.yaml, refer to registering clusters through YAML.

Handling Cluster Registration with Duplicate Names in KubeSlice

Introduction

This scenario addresses the behavior of KubeSlice when registering clusters with duplicate names. When attempting to register multiple clusters with the same name, Kubernetes treats each instance as a separate cluster and does not throw an error. This scenario provides insights into how KubeSlice handles such scenarios and best practices to avoid duplicating cluster names for clarity and consistency.

Duplicate Cluster Registration

KubeSlice allows users to register multiple clusters to effectively manage and monitor their Kubernetes environments. Surprisingly, registering clusters with identical names does not trigger an error or conflict. Instead, Kubernetes treats each instance as an individual, distinct cluster.

Behavior Explanation

When a user registers clusters with the same name in KubeSlice, each instance is identified based on its unique Kubernetes configuration, API endpoint, and authentication token. Kubernetes ignores the duplication of cluster names, enabling each cluster to be recognized separately by its unique credentials.
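To check what is actually registered, you can list the Cluster objects in the project namespace on the controller cluster. This is a minimal sketch that uses the full resource name shown later in this guide, with <project-name> as a placeholder:

    # List the registered Cluster objects in the project namespace and review their names
    kubectl get clusters.controller.kubeslice.io -n <project-name>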

Best Practices

While Kubernetes might allow duplicate cluster names, it is essential to adhere to best practices for clarity and ease of management:

  1. Unique Cluster Names

    To avoid confusion and ambiguity, it is recommended to use unique names when registering clusters with KubeSlice. Choose descriptive names that reflect the identity or purpose of each cluster.

  2. Clear Cluster Identification

    Employing distinctive names ensures clear identification of registered clusters, streamlining navigation and operations within the KubeSlice platform.

  3. Documentation and Communication

    Maintain proper documentation and communication within your team to ensure everyone is aware of the registered clusters and their respective names. This practice enhances collaboration and avoids misunderstandings.

Conclusion

While Kubernetes permits registering clusters with duplicate names, KubeSlice treats each instance as a separate cluster, leading to potential confusion in management and monitoring. To promote clarity and consistency, it is recommended to use unique names for registering clusters within KubeSlice. Employing descriptive names and adhering to best practices ensures a seamless experience in managing and monitoring multiple clusters with KubeSlice.

Manual Clean-Up for Node IP Address Changes in Registered Clusters

Introduction

This scenario addresses cases where the Node IP address on a registered cluster is changed but the KubeSlice components are not automatically updated to reflect the new IP. To ensure smooth functionality and communication between KubeSlice components and the registered cluster, a manual clean-up is necessary in such cases.

Background

During the registration process of a cluster with KubeSlice, the Node IP address is automatically configured by pulling the value from the cluster. However, if the Node IP address is changed manually or becomes invalid, KubeSlice components might continue using the old IP, leading to communication issues.

Best Practice

It is recommended not to change the Node IP manually when it is already configured by KubeSlice. Moreover, adding an invalid Node IP address should be avoided to prevent potential complications.

Manual Clean-Up Process

To address Node IP address changes in registered clusters and update the KubeSlice components accordingly, follow these manual clean-up steps:

  1. Identify Node IP Change

    Verify that the Node IP address on the registered cluster has been changed or updated. It is crucial to ensure that the change indeed occurred and requires action.

  2. Stop KubeSlice Components

    On the registered cluster, stop all KubeSlice components, including the Slice Operator and other relevant services.

  3. Update Configuration Files

    Navigate to the configuration files for the KubeSlice components, such as the Slice Operator YAML configuration file. Update the Node IP address in the configuration files to reflect the new, valid IP.

  4. Restart KubeSlice Components

    After updating the configuration files, restart the KubeSlice components to apply the changes. Ensure that all services are up and running without any errors.

  5. Verify Communication

    Verify that the KubeSlice components can now successfully communicate with the registered cluster using the updated Node IP address.
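The following is a hedged command sketch of steps 2 through 5. It assumes the Slice Operator runs as a deployment named kubeslice-operator in the kubeslice-system namespace of the registered cluster; both names are assumptions and may differ in your installation, so adjust them to whatever kubectl get deployments reports:

    # Confirm the node IP addresses currently reported by Kubernetes
    kubectl get nodes -o wide

    # Stop the Slice Operator while the configuration is updated
    # (deployment and namespace names are assumptions; adjust to your installation)
    kubectl scale deployment kubeslice-operator -n kubeslice-system --replicas=0

    # Update the Node IP in the Slice Operator configuration (step 3 above), then restart it
    kubectl scale deployment kubeslice-operator -n kubeslice-system --replicas=1

    # Verify that the components are running again without errors
    kubectl get pods -n kubeslice-system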

Conclusion

When the Node IP address is changed or updated on a registered cluster, it is essential to perform a manual clean-up to ensure that the KubeSlice components are using the correct and valid IP. Avoiding manual changes to the Node IP already configured by KubeSlice and following the recommended clean-up process helps maintain smooth communication between KubeSlice components and the registered cluster. By proactively addressing Node IP changes and ensuring proper configuration, you can enhance the overall stability and performance of your KubeSlice environment.

Troubleshooting Avesha Router Connectivity Issues in KubeSlice

Introduction

This scenario addresses troubleshooting steps for Avesha Router connectivity issues in KubeSlice. The Avesha Router is a vital component of KubeSlice that manages network connectivity within the worker clusters. Connectivity disruptions can occur when one or more nodes in the worker clusters are restarted. Understanding the root cause and following the correct resolution steps will help restore stable and uninterrupted connectivity, ensuring smooth operations within the clusters.

Background

KubeSlice, an enterprise-grade solution for Kubernetes, includes the Avesha Router as a crucial component. The Avesha Router manages network connectivity between pods, services, and external resources within the worker clusters. During node restarts in the worker clusters, connectivity issues may arise, impacting the performance of applications and services.

Root Cause

The root cause of Avesha Router connectivity issues lies in the interruption of network connections during node restarts in the worker clusters. These disruptions may lead to temporary connectivity problems that affect the flow of data and communication.

Impact

The connectivity disruptions can have various impacts on KubeSlice:

  1. Service Unavailability

    The connectivity issues can render services temporarily unavailable, affecting critical processes and workflows.

  2. Intermittent Application Access

    Users may experience intermittent access to applications due to the connectivity problems.

  3. Data Transmission Delay

    Communication delays between pods and external resources may occur, causing data transmission delays.

Solution

To restore Avesha Router connectivity and mitigate the impact of node restarts, follow these recommended steps:

  1. Restart Application Pods

    After a node restart, identify the application pods affected by the connectivity issue. Restart these pods to re-establish network connections and restore connectivity (a command sketch follows this list).

  2. Monitoring and Alerts

    Implement monitoring and alert mechanisms to detect connectivity disruptions and node restart events. Automated alerts will facilitate quick response and timely remediation.

  3. Node Restart Scheduling

    Whenever possible, schedule node restarts during maintenance windows or periods of low traffic to minimize the impact on critical operations.
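For step 1, the following hedged sketch shows two common ways to restart the affected application pods; the namespace and workload names are placeholders for your own applications:

    # Restart the affected workload so its pods re-establish their network connections
    kubectl rollout restart deployment <app-deployment> -n <app-namespace>

    # Alternatively, delete the affected pods and let their controller recreate them
    kubectl delete pod <pod-name> -n <app-namespace>

    # Watch the pods come back up healthy
    kubectl get pods -n <app-namespace> -w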

Conclusion

Troubleshooting Avesha Router connectivity issues in KubeSlice is crucial for maintaining stable operations within the worker clusters. By understanding the root cause of the problem and adopting appropriate remediation and preventive measures, administrators can restore and ensure uninterrupted connectivity. The Avesha Router, along with other KubeSlice components, contributes to a reliable infrastructure that enhances overall performance and user experience within the Kubernetes environment.

Troubleshooting Connectivity Issues with Unregistered Cluster in KubeSlice Controller

Introduction

This scenario addresses the issue where a registered cluster is not connected to the KubeSlice Controller. When a cluster fails to connect, it can result from various factors, such as installation problems with the Slice Operator or misconfiguration of the KubeSlice Controller endpoint and token. Follow the steps provided below to troubleshoot and resolve the connectivity issue.

Issue Description

The registered cluster is not connected to the KubeSlice Controller, preventing it from being managed and monitored through the KubeSlice platform.

Solution

To troubleshoot and resolve the connectivity issue with the unregistered cluster:

  1. Switch to Registered Cluster Context

    Use the kubectx command to switch to the context of the registered cluster where you are facing the connectivity issues.

    kubectx <cluster name>
  2. Validate Slice Operator Installation

    Check the installation status of the Slice Operator on the registered cluster. Run the following command to see the pods belonging to the kubeslice-controller-system namespace and verify their status.

    kubectl get pods -n kubeslice-controller-system
  3. Verify the Controller Endpoint and Token

    If the connectivity issue persists, ensure that the KubeSlice Controller endpoint and token in the cluster are correctly configured in the Slice Operator YAML configuration file applied to the registered cluster.

    Review the Slice Operator YAML file to confirm that the Controller endpoint and token are accurate and match the KubeSlice Controller setup.
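If the endpoint and token values look correct but the cluster still does not connect, a quick reachability check from inside the registered cluster can help isolate a network problem. This is a hedged sketch: the curl image and the <controller-endpoint> placeholder are illustrative, and a 401 or 403 response without a token still shows that the endpoint is reachable:

    # Launch a throwaway pod in the registered cluster and probe the controller endpoint
    kubectl run endpoint-check --rm -it --restart=Never --image=curlimages/curl -- \
      curl -k -sS -o /dev/null -w "%{http_code}\n" https://<controller-endpoint>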

Additional Considerations

If you have followed the above steps and are still experiencing connectivity issues with the registered cluster, consider the following points:

  • Verify Network Connectivity

    Ensure that the registered cluster has network connectivity with the KubeSlice Controller. Check for any network restrictions or firewalls that may be blocking communication.

  • Review Slice Operator Documentation

    Consult the Slice Operator documentation for any specific requirements or troubleshooting steps related to connecting clusters to the KubeSlice Controller.

  • Seek Technical Support

    If you are unable to resolve the connectivity issue on your own, consider seeking assistance from the Avesha Systems support team or your system administrator.

Conclusion

By validating the Slice Operator installation, checking the KubeSlice Controller endpoint and token configuration, and ensuring network connectivity, you can troubleshoot and resolve the connectivity issue between the registered cluster and the KubeSlice Controller. Following the provided steps enables successful cluster connection, allowing you to effectively manage and monitor the cluster through the KubeSlice platform.

Troubleshooting Reachability Issues for KubeSlice Controller Endpoint

Introduction

This scenario addresses cases where the KubeSlice Controller's endpoint is not reachable by a slice after a successful installation. When encountering such issues, it is crucial to investigate potential causes and perform troubleshooting steps to ensure seamless communication between the slice and the KubeSlice Controller.

Possible Causes

Several factors could lead to the KubeSlice Controller's endpoint being inaccessible by a slice:

  1. Incorrect Endpoint Configuration

    During the installation of the Slice Operator on the worker cluster, if the controller endpoint is misconfigured or contains errors, the slice may fail to establish communication.

  2. Invalid Secret Token

    The secret token and CA-cert installed on the worker cluster might be incorrect, resulting in failed authentication and preventing the slice from reaching the KubeSlice Controller.

Solution

To resolve reachability issues with the KubeSlice Controller's endpoint:

  1. Validate Endpoint Configuration

    Ensure that the controller endpoint specified during the installation of the Slice Operator on the worker cluster is accurate and accessible. Verify the correctness of the API endpoint URL and any associated authentication mechanisms.

  2. Check Secret Token and CA-Cert

    Verify the correctness of the controller cluster's secret token and CA-cert installed on the worker cluster. Incorrect or outdated credentials can cause authentication failures and hinder communication (a manual check is sketched after this list).

  3. Refer to the Automated Retrieval of Registered Cluster Secrets Documentation

    Consult the documentation section titled Automated Retrieval of Registered Cluster Secrets for detailed information on automatically retrieving and validating the necessary secrets for cluster communication.
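Complementing the automated retrieval described above, the worker cluster's secret can also be inspected manually on the controller cluster. This is a hedged sketch: the secret name and project namespace are placeholders, and the exact names in your environment come from the registered cluster's secret as described in the referenced documentation:

    # List the secrets in the project namespace on the controller cluster
    kubectl get secrets -n <project-namespace>

    # Decode the token and CA cert from the worker cluster's secret (names are placeholders)
    kubectl get secret <worker-secret-name> -n <project-namespace> -o jsonpath='{.data.token}' | base64 --decode
    kubectl get secret <worker-secret-name> -n <project-namespace> -o jsonpath='{.data.ca\.crt}' | base64 --decode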

Conclusion

When the KubeSlice Controller's endpoint is successfully installed but not reachable by a slice, it is crucial to examine the endpoint configuration, secret token, and CA-cert used for authentication. Ensuring the accuracy of these components will facilitate seamless communication between the slice and the KubeSlice Controller, enhancing the overall functionality and effectiveness of the KubeSlice platform. By following the troubleshooting steps and referring to the provided documentation, users can efficiently address reachability issues and optimize their experience with KubeSlice.

Cluster Issues

Resolving Stuck CRD Object Error/Warning in KubeSlice Controller

Introduction

This scenario provides steps to address the error/warning related to a stuck CRD (Custom Resource Definition) object in the KubeSlice Controller. A stuck CRD object can lead to operational issues and hinder the proper functioning of KubeSlice. By patching an empty finalizer for the failing CRD object and performing an uninstall and reinstall of the KubeSlice Controller, administrators can resolve this issue effectively.

Background

In KubeSlice, Custom Resource Definitions (CRDs) define new resource types that extend the Kubernetes API. Occasionally, a CRD object may become stuck or encounter issues, resulting in an error/warning that affects the overall stability of the KubeSlice Controller.

Error/Warning Description

The error/warning message indicates that a CRD object is stuck, potentially leading to unintended behavior in KubeSlice operations.

Root Cause

The root cause of the stuck CRD object can vary and may be attributed to multiple factors, such as improper configuration, resource constraints, or network issues.

Impact

The impact of a stuck CRD object includes:

  1. Impaired Functionality

    The CRD object's stuck state may lead to impaired functionality of KubeSlice features and operations.

  2. Unpredictable Behavior

    KubeSlice behavior can become unpredictable due to the unresolved CRD object.

Solution

To address the error/warning related to the stuck CRD object in the KubeSlice Controller:

  1. Patch Empty Finalizer

    a. Identify the failing CRD object (for example, serviceexportconfigs.hub.kubeslice.io) that is stuck.

    b. Use the following kubectl patch command to patch an empty finalizer for the CRD object:

    kubectl patch crd/<CRD_OBJECT_NAME> -p '{"metadata":{"finalizers":[]}}' --type=merge

    Replace <CRD_OBJECT_NAME> with the name of the specific CRD object that is failing, as indicated in the error/warning message.

  2. Uninstall and Reinstall KubeSlice Controller

    a. Uninstall the existing KubeSlice Controller using the appropriate package manager or helm command (a hedged helm sketch follows this list).

    b. Ensure that any leftover artifacts or configuration files related to the previous installation are completely removed.

    c. Reinstall the KubeSlice Controller with the latest version to ensure a clean and updated installation.
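The following is a hedged helm sketch for step 2. The release name, namespace, and chart reference are assumptions based on a typical KubeSlice Controller installation and may differ from your environment; reuse the names and values file from your original installation:

    # Uninstall the existing controller release (release and namespace names are assumptions)
    helm uninstall kubeslice-controller -n kubeslice-controller

    # Confirm that no leftover KubeSlice resources remain in the namespace before reinstalling
    kubectl get all -n kubeslice-controller

    # Reinstall the controller with your original values file (chart reference is an assumption)
    helm install kubeslice-controller kubeslice/kubeslice-controller -n kubeslice-controller -f <your-values>.yaml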

Preventive Measures

To prevent similar CRD object issues in the future, consider implementing the following preventive measures:

  1. Regular Monitoring and Auditing

    Implement regular monitoring and auditing of CRD objects to detect and address potential issues early.

  2. Backup and Restore Strategy

    Establish a backup and restore strategy to safeguard critical configurations and data in case of unexpected issues.

Conclusion

Resolving the stuck CRD object error/warning in the KubeSlice Controller is essential for maintaining stable and predictable operations. By patching the empty finalizer for the failing CRD object and performing a fresh reinstall of the KubeSlice Controller, administrators can ensure smooth functioning and reliable performance of KubeSlice. Additionally, adopting preventive measures helps minimize the occurrence of such issues in the future, enhancing the overall reliability and availability of the KubeSlice environment.

Resolving Stuck Project Namespace Error in KubeSlice Controller

Introduction

This scenario provides steps to address the error related to a stuck project namespace in the KubeSlice Controller. A stuck namespace can lead to operational issues and hinder the proper functioning of KubeSlice. By deleting the stuck namespace using the provided kubectl patch command and performing an uninstall and reinstall of the KubeSlice Controller, administrators can resolve this issue effectively.

Background

In KubeSlice, project namespaces are used to logically separate and organize resources. Occasionally, a project namespace may become stuck or encounter issues, resulting in an error that affects the overall stability of the KubeSlice Controller.

Issue Description

The error message indicates that a project namespace is stuck, potentially leading to unintended behavior in KubeSlice operations.

Root Cause

The root cause of the stuck project namespace can vary and may be attributed to multiple factors, such as improper configuration, resource constraints, or network issues.

Impact

The impact of a stuck project namespace includes:

  1. Impaired Functionality

    The project namespace's stuck state may lead to impaired functionality of KubeSlice features and operations within that specific namespace.

  2. Unpredictable Behavior

    KubeSlice behavior can become unpredictable within the stuck project namespace.

Solution

To address the error related to the stuck project namespace in the KubeSlice Controller:

  1. Delete Stuck Namespace

    a. Identify the stuck project namespace (for example, <stuck-namespace>) as indicated in the error message (a verification sketch follows this list).

    b. Use the following kubectl patch command to clear the finalizers on the stuck namespace so that its deletion can complete:

    kubectl patch ns/<stuck-namespace> -p '{"metadata":{"finalizers":[]}}' --type=merge

    Replace <stuck-namespace> with the name of the specific namespace that is stuck.

  2. Uninstall and Reinstall KubeSlice Controller

    a. Uninstall the existing KubeSlice Controller using the appropriate package manager or the helm command.

    b. Ensure that any leftover artifacts or configuration files related to the previous installation are completely removed.

    c. Reinstall the KubeSlice Controller with the latest version to ensure a clean and updated installation.
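Before patching, it can help to confirm that the namespace really is stuck in the Terminating phase and to see which finalizers are holding it, as referenced in step 1. This is a minimal sketch; <stuck-namespace> is the same placeholder used above:

    # Find namespaces that are stuck in the Terminating phase
    kubectl get namespaces | grep Terminating

    # Inspect the finalizers that are blocking deletion of the stuck namespace
    kubectl get namespace <stuck-namespace> -o jsonpath='{.metadata.finalizers}'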

Preventive Measures

To prevent similar project namespace issues in the future, consider implementing the following preventive measures:

  1. Regular Monitoring and Auditing

    Implement regular monitoring and auditing of project namespaces to detect and address potential issues early.

  2. Namespace Resource Management

    Monitor resource utilization within namespaces and ensure that resources are efficiently allocated to avoid resource constraints.

Conclusion

Resolving the stuck project namespace error in the KubeSlice Controller is essential for maintaining stable and predictable operations. By deleting the stuck namespace and performing a fresh reinstall of the KubeSlice Controller, administrators can ensure smooth functioning and reliable performance of KubeSlice. Additionally, adopting preventive measures helps minimize the occurrence of such issues in the future, enhancing the overall reliability and availability of the KubeSlice environment.

Cluster Registration Issues

Dashboard Does Not Display Metrics Chart

Problem Description

After accessing the KubeSlice Manager dashboard, you notice that the metrics chart is not displayed. This problem occurs when the Prometheus URL is either not provided or provided incorrectly during the cluster registration process.

Solution

  1. Access KubeSlice Manager

    Log in to the KubeSlice Manager using your credentials to access the management console.

  2. Navigate to Cluster Operations

    From the dashboard, navigate to the Clusters section to manage registered clusters.

  3. Edit Cluster Details

    Locate the affected cluster in the list of registered clusters. Click on the cluster's name to access its details and configuration.

  4. Check Prometheus URL

    In the cluster details page, verify the accuracy of the Prometheus URL provided during registration. Ensure that the correct and valid URL for Prometheus is used.

  5. Update Prometheus URL

    If the Prometheus URL is missing or incorrect, update it with the correct URL. Make sure to provide the full URL, including the protocol (for example, http:// or https://) and the domain or IP address where Prometheus is accessible.

  6. Save Changes

    After updating the Prometheus URL, click the Edit Cluster button to apply the changes.

  7. Verify Dashboard Metrics Chart

    After the changes are saved, navigate back to the KubeSlice Manager dashboard. Check if the metrics chart is now displayed and accessible for the registered cluster.

note

It may take a few moments for the metrics data to be fetched and displayed in the chart, especially if there is a delay in Prometheus data retrieval.

If the metrics chart is still not displaying, double-check the correctness of the Prometheus URL and ensure that the connectivity to Prometheus is established.
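One quick way to confirm that the Prometheus URL entered in the KubeSlice Manager is live is to query its HTTP API directly. This is a hedged check; replace <prometheus-url> with the exact URL, including the protocol and port, that you entered during registration:

    # A healthy Prometheus endpoint answers this query with a JSON body containing "status":"success"
    curl -s "<prometheus-url>/api/v1/query?query=up"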

For more detailed information on cluster operations and troubleshooting, refer to edit a cluster.

Node Information and Kubernetes Dashboard Not Showing Up After Cluster Registration

Introduction

This scenario addresses an issue where node information and the Kubernetes dashboard do not appear after registering a cluster with the KubeSlice Controller. This problem occurs when an incorrect Kube API endpoint is entered during the cluster registration process. The solution provides steps to resolve this issue by updating the correct cluster Kube API endpoint on the KubeSlice Manager.

Issue Description

After successfully registering a cluster with the KubeSlice Controller, you encounter an issue where the node information and Kubernetes dashboard are not visible or not showing up in the KubeSlice Manager. This problem occurs when the incorrect Kube API endpoint is provided during the cluster registration process.

Solution

To resolve the issue of missing node information and the Kubernetes dashboard, follow these steps to update the correct cluster Kube API endpoint on the KubeSlice Manager:

  1. Access KubeSlice Manager

    Log in to the KubeSlice Manager dashboard using your credentials.

  2. Locate the Registered Cluster

    On the KubeSlice Manager dashboard, find the registered cluster where the node information and Kubernetes dashboard are not displaying correctly.

  3. Navigate to Clusters

    From the dashboard, navigate to the Clusters page to manage registered clusters.

  4. Edit Cluster Information

    Locate the affected cluster in the list of registered clusters. Click the cluster's name to access its details and configuration.

  5. Check Kube API Endpoint

    In the cluster details page, verify the accuracy of the Kube API endpoint provided during registration. Ensure that the correct and valid endpoint URL is used.

  6. Update Kube API Endpoint

    If the Kube API endpoint is incorrect or invalid, update it with the correct URL. Make sure to use the appropriate protocol (for example, HTTP or HTTPS) and provide the correct domain or IP address.

  7. Save Changes

    After updating the Kube API endpoint, click the Edit Cluster button to apply the changes.

  8. Verify Node Information and Dashboard

    After the changes are saved, navigate back to the KubeSlice Manager dashboard. Verify that the node information and Kubernetes dashboard are now visible and accessible for the registered cluster.

note

It may take a few moments for the updated information to propagate, so give it some time if the changes don't reflect immediately.

If the node information and Kubernetes dashboard are still not showing up, double-check the correctness of the Kube API endpoint and ensure that the cluster's connectivity to the KubeSlice Controller is established.
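To confirm that the Kube API endpoint you entered is reachable and belongs to the intended cluster, you can probe it directly. This is a hedged sketch; depending on the cluster's anonymous-access settings, the /version path may require credentials, in which case a 401 or 403 response still shows that the endpoint is reachable:

    # Compare the endpoint reported by kubectl with the value entered in the KubeSlice Manager
    kubectl cluster-info

    # Probe the Kube API endpoint directly (replace with the URL entered during registration)
    curl -k https://<kube-api-endpoint>/version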

For more detailed information on cluster operations and troubleshooting, refer to edit a cluster.

Conclusion

By updating the correct Kube API endpoint on the KubeSlice Manager for the affected cluster, you can resolve the issue of missing node information and the Kubernetes dashboard. Following the provided steps ensures that the registered cluster's configuration aligns correctly with the KubeSlice Controller, allowing users to access the necessary cluster details and dashboard features.

Partner Cluster Issues

Accessing Cluster Details on Rancher-Managed Clusters with KubeSlice

Introduction

This scenario explains the issue faced when attempting to get cluster details from a Rancher-managed cluster using the kubectl get clusters command. Due to a conflict in preexisting Custom Resource Definitions (CRDs) between KubeSlice and Rancher, the short form of the command is unable to retrieve information related to KubeSlice. The section below resolves this by using the full form of the command to access the KubeSlice version of cluster details on Rancher-managed clusters.

Issue Description

When running the short form of the command, kubectl get clusters -n <project-name>, on a Rancher-managed cluster, it may not return the expected information related to KubeSlice. The conflict arises because the CRDs for both KubeSlice (clusters.controller.kubeslice.io) and Rancher (clusters.provisioning.cattle.io) respond to the same short form of the command, leading to ambiguity and inaccurate results.

Solution

To access the KubeSlice version of the command on a Rancher-managed cluster and retrieve accurate cluster details:

  1. Use Full Form of the Command

    Instead of using the short form of the command (kubectl get clusters -n <project-name>), use the full form of the command with the specific KubeSlice CRD:

    kubectl get clusters.controller.kubeslice.io -n <project-name>

    This full form of the command explicitly points to the KubeSlice CRD, bypassing the conflict with the Rancher CRD and providing accurate cluster details for the specified project name.

    note

    Ensure that you have the necessary permissions to access the KubeSlice CRD and the specified project namespace.
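To see the conflict for yourself, you can list the API resources that register the clusters name; on a Rancher-managed cluster with KubeSlice installed, entries from both controller.kubeslice.io and provisioning.cattle.io should appear. The exact output varies by cluster, so treat this as an illustrative check:

    # List every API resource whose name contains "clusters" and note the API groups
    kubectl api-resources | grep -i clusters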

Conclusion

By using the full form of the command (kubectl get clusters.controller.kubeslice.io) instead of the short form (kubectl get clusters), you can successfully access cluster details on Rancher-managed clusters with KubeSlice installed. This solution resolves the conflict between the KubeSlice and Rancher CRDs and provides accurate information related to KubeSlice for the specified project namespace.