Version: 1.13.0

Manage Inference Endpoints

An Inference Endpoint is a hosted service that performs inference tasks, such as making predictions or generating outputs using a pre-trained AI model. It enables real-time or batch processing for AI tasks such as natural language processing and speech recognition. An Inference Endpoint serves as the operational interface for exposing deployed AI models to users and applications.

This topic describes how to manage Inference Endpoints on the EGS platform. An administrator can create and manage multiple Inference Endpoints.

info

Across our documentation, we refer to the workspace as the slice workspace. The two terms are used interchangeably.

View Inference Endpoints

  1. Go to Inference Endpoints on the left sidebar.

  2. On the Workspaces page, click a workspace whose Inference Endpoints you want to view.


  3. On the Inference Endpoints page, you see a list of Inference Endpoints for that workspace.


  4. Click the > icon for the Inference Endpoint that you want to view.


Deploy an Inference Endpoint

  1. Go to Inference Endpoints on the left sidebar.

  2. On the Workspaces page, go to the workspace on which you want to deploy an Inference Endpoint.

  3. On the Inference Endpoints page, click Deploy Inference Endpoint.


  4. On the Create Inference Endpoint pane, under Basic Specifications:

    1. Enter a name for the Inference Endpoint in the Endpoint Name text box.
    2. From the Cluster Name drop-down list, select the worker cluster on which you want to deploy this Inference Endpoint.
  5. Under Advanced Options, for Model Specifications:

    info

    The following parameters are standard and work for most models. However, if these parameters do not meet your model's requirements, select the Specify your own model configuration checkbox. For more information, see Own Model Configuration.

    1. Enter a name in the Model Format Name text box.
    2. Add the storage URI in the Storage URI text box.
    3. Add the CPU value in the CPU text box.
    4. Add the Memory value in the Memory text box.
    5. Add the arguments in the Args text box.
    6. To add a secret key-value pair, click the plus sign next to Secret and enter the key and value.

    Own Model Configuration

    When the parameters provided under Model Specifications do not meet your model requirements, you can select the Specify your own model configuration checkbox.

    To add your own model configuration:

    1. Select the Specify your own model configuration checkbox. A terminal screen appears.


    2. On the terminal screen, enter your model configuration as a KServe InferenceService specification, as illustrated in the sketch below. For more information, see KServe.
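
      The exact configuration depends on your model and serving runtime. The following is a minimal InferenceService sketch; the field-to-UI mapping shown in the comments is indicative only, and the endpoint name, model format, storage URI, resource values, and secret name are placeholders:

      ```yaml
      # Minimal KServe InferenceService sketch (placeholder values).
      apiVersion: serving.kserve.io/v1beta1
      kind: InferenceService
      metadata:
        name: sklearn-iris                  # Endpoint Name
      spec:
        predictor:
          model:
            modelFormat:
              name: sklearn                 # Model Format Name
            storageUri: gs://kfserving-examples/models/sklearn/1.0/model   # Storage URI
            args: ["--workers=1"]           # Args (placeholder)
            resources:
              requests:
                cpu: "1"                    # CPU
                memory: 2Gi                 # Memory
              limits:
                cpu: "1"
                memory: 2Gi
            env:                            # Secret key-value pairs (hypothetical mapping)
              - name: HF_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: my-model-secret   # hypothetical secret name
                    key: HF_TOKEN
      ```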

  6. Under GPU Specifications:

    info

    If you want CPU-only inference, select the Create CPU-only Inference checkbox.


    1. Select a node type from the Node Type drop-down list.

    2. The GPU Shape and Memory per GPU values are automatically populated.

    3. The GPU Nodes and GPUs Per Node parameters have default values. Change them if you want different values.

    4. The Reserve For duration parameter, in DDHHMM format, has a default value of 365 days.

      info

      The maximum duration is 365 days. To reserve the GPUs for a shorter period, change the duration to less than 365 days.

    5. The Priority parameter has a default value. To change it, select a different priority (low: 1-100, medium: 101-200, high: 201-300) from the drop-down list.

    6. The priority number also has a default value that corresponds to the default priority. To change it, set a priority number within the range of the selected priority.

    7. Click Create Inference Endpoint. The status goes to Pending before it changes to Ready.

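      Behind the scenes, the endpoint's readiness generally tracks the conditions of the underlying KServe InferenceService (a sketch assuming standard KServe behavior; the URL is a placeholder). Once all conditions report True, the status changes to Ready:

      ```yaml
      # Typical InferenceService status once the endpoint is serving.
      status:
        url: http://my-endpoint.my-workspace.example.com   # placeholder external URL
        conditions:
          - type: PredictorReady
            status: "True"
          - type: IngressReady
            status: "True"
          - type: Ready
            status: "True"
      ```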

Delete an Inference Endpoint

  1. On the Workspaces page, click a workspace whose Inference Endpoint you want to delete.

  2. On the Inference Endpoints page, click the Delete button in the top-right corner.

  3. On the confirmation dialog, type the name of the Inference Endpoint and click Delete.
