Version: 1.10.0

Inference Endpoint APIs

This topic describes the SDK APIs to create and manage Inference Endpoints.

Create an Inference Endpoint

Use this API to create an Inference Endpoint using the standard model specifications.

Syntax

egs.create_inference_endpoint(cluster_name, endpoint_name, workspace, model_spec(model_format_name, storage_uri, args, secret, resources), gpu_spec(gpu_shape, instance_type, memory_per_gpu, number_of_gpus, number_of_gpu_nodes, exit_duration, priority), authenticated_session=None)

Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| cluster_name | String | The worker cluster on which you want to deploy the Inference Endpoint. | Mandatory |
| endpoint_name | String | The name of the Inference Endpoint that you want to deploy. | Mandatory |
| workspace | String | The workspace associated with the worker cluster on which you will deploy the Inference Endpoint. | Mandatory |
| model_spec | Object | The model specifications that you want to provide for inference. | Mandatory |
| gpu_spec | Object | The GPU specifications. For CPU-only inference, set this parameter to None. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None, in which case the SDK tries to use its default session; if no default is found, an exception is raised. | Optional |

Model Spec Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| model_format_name | String | The name of the Inference model format. | Mandatory |
| storage_uri | String | The storage URI of the Inference model. | Optional |
| args | String | The arguments for the Inference model. | Optional |
| secret | Dictionary | The secret value, in dictionary format. | Optional |
| resources | Resource | The resources, such as CPU and memory. | Optional |

GPU Spec Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| gpu_shape | String | The name of the GPU type, which you can get from the inventory details. | Mandatory |
| instance_type | String | The type of instance requested. | Mandatory |
| memory_per_gpu | Integer | The memory requirement, in GB, per GPU. | Mandatory |
| number_of_gpus | Integer | The number of GPUs requested. | Mandatory |
| number_of_gpu_nodes | Integer | The number of GPU nodes requested. | Mandatory |
| exit_duration | String | The duration for which the GPU is requested, in the format 0d0h0m. | Mandatory |
| priority | Integer | The priority of the GPU request (GPR) in the queue. You can increase the priority number (low: 1-100, medium: 101-200, high: 201-300) to move a GPR higher in the queue. | Mandatory |
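
The exit_duration format and the priority bands above can be sketched with small helper functions. These are illustrative utilities for client-side validation, not part of the egs SDK:

```python
import re

def parse_exit_duration(duration: str) -> int:
    """Parse an exit_duration string in the 0d0h0m format into total minutes."""
    match = re.fullmatch(r"(\d+)d(\d+)h(\d+)m", duration)
    if match is None:
        raise ValueError(f"exit_duration must match 0d0h0m, got {duration!r}")
    days, hours, minutes = (int(g) for g in match.groups())
    return days * 24 * 60 + hours * 60 + minutes

def priority_band(priority: int) -> str:
    """Map a GPR priority number to its band: low 1-100, medium 101-200, high 201-300."""
    if 1 <= priority <= 100:
        return "low"
    if 101 <= priority <= 200:
        return "medium"
    if 201 <= priority <= 300:
        return "high"
    raise ValueError("priority must be between 1 and 300")
```

For example, parse_exit_duration("1d2h30m") returns 1590 minutes, and priority_band(150) returns "medium".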

Response Returned

| Returns | Description |
|---|---|
| String | The name of the Inference Endpoint that was successfully created. |

Exceptions Raised

| Raises | Description |
|---|---|
| exceptions.Unhandled | Raised when the API encounters an error for which it has no appropriate handling. |

Example

import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.create_inference_endpoint(
    cluster_name = "<clusterName>",
    endpoint_name = "<endpointName>",
    workspace = "<workspaceName>",
    model_spec = model_spec(
        model_format_name = "sklearn",
        storage_uri = "gs://kfserving-examples/models/sklearn/1.0/model"
    ),
    gpu_spec = None,  # CPU-only inference; pass a gpu_spec for GPU-backed inference
    authenticated_session = auth
)

Create an Inference Endpoint with Custom Model

Use this API to create an Inference Endpoint with custom model specifications.

Syntax

egs.create_inference_endpoint(cluster_name, endpoint_name, workspace, raw_model_spec, gpu_spec(gpu_shape, instance_type, memory_per_gpu, number_of_gpus, number_of_gpu_nodes, exit_duration, priority), authenticated_session=None)

Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| cluster_name | String | The worker cluster on which you want to deploy the Inference Endpoint. | Mandatory |
| endpoint_name | String | The name of the Inference Endpoint that you want to deploy. | Mandatory |
| workspace | String | The workspace associated with the worker cluster on which you will deploy the Inference Endpoint. | Mandatory |
| raw_model_spec | String | The custom model specifications. | Mandatory |
| gpu_spec | Object | The GPU specifications. For CPU-only inference, set this parameter to None. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None, in which case the SDK tries to use its default session; if no default is found, an exception is raised. | Optional |

GPU Spec Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| gpu_shape | String | The name of the GPU type, which you can get from the inventory details. | Mandatory |
| instance_type | String | The type of instance requested. | Mandatory |
| memory_per_gpu | Integer | The memory requirement, in GB, per GPU. | Mandatory |
| number_of_gpus | Integer | The number of GPUs requested. | Mandatory |
| number_of_gpu_nodes | Integer | The number of GPU nodes requested. | Mandatory |
| exit_duration | String | The duration for which the GPU is requested, in the format 0d0h0m. | Mandatory |
| priority | Integer | The priority of the GPU request (GPR) in the queue. You can increase the priority number (low: 1-100, medium: 101-200, high: 201-300) to move a GPR higher in the queue. | Mandatory |

Response Returned

| Returns | Description |
|---|---|
| String | The name of the Inference Endpoint that was successfully created. |

Exceptions Raised

| Raises | Description |
|---|---|
| exceptions.Unhandled | Raised when the API encounters an error for which it has no appropriate handling. |

Example

import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.create_inference_endpoint(
    cluster_name = "<clusterName>",
    endpoint_name = "<endpointName>",
    workspace = "<workspaceName>",
    raw_model_spec = "<custom model specifications>",
    gpu_spec = None,  # CPU-only inference; pass a gpu_spec for GPU-backed inference
    authenticated_session = auth
)

List Inference Endpoints

Use this API to list all the Inference Endpoints for a given slice workspace.

Syntax

egs.list_inference_endpoint(workspace, authenticated_session=None)

Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| workspace | String | The name of the slice workspace whose Inference Endpoints you want to view. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None, in which case the SDK tries to use its default session; if no default is found, an exception is raised. | Optional |

Response Returned

| Returns | Description |
|---|---|
| Class InferenceEndpointBrief | The Inference Endpoint object contains the endpoint name, the model name, its status, the associated cluster, and the namespace details. |

Class InferenceEndpointBrief

| Parameter | Type | Description |
|---|---|---|
| endpoint_name | String | The name of the Inference Endpoint. |
| model_name | String | The name of the Inference model. |
| status | String | The status of the Inference Endpoint. |
| endpoint | String | The endpoint URL of the Inference Endpoint. |
| cluster_name | String | The worker cluster on which the Inference Endpoint is deployed. |
| namespace | String | The namespace that contains the Inference Endpoint. |
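
The shape of each returned object can be illustrated with a plain dataclass that mirrors the fields above plus a small formatter for a one-line listing. This is a sketch for illustration only; the SDK returns its own InferenceEndpointBrief class:

```python
from dataclasses import dataclass

@dataclass
class InferenceEndpointBrief:
    """Illustrative mirror of the fields returned by egs.list_inference_endpoint."""
    endpoint_name: str
    model_name: str
    status: str
    endpoint: str
    cluster_name: str
    namespace: str

def summarize(endpoints: list[InferenceEndpointBrief]) -> list[str]:
    """Render one status line per endpoint, e.g. for a CLI-style listing."""
    return [
        f"{ep.endpoint_name} ({ep.model_name}) on {ep.cluster_name}/{ep.namespace}: {ep.status}"
        for ep in endpoints
    ]
```

For example, an endpoint named inference-1 running a sklearn model on worker-1/slice-1 would render as "inference-1 (sklearn) on worker-1/slice-1: Ready".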

Exceptions Raised

| Raises | Description |
|---|---|
| exceptions.Unhandled | Raised when the API encounters an error for which it has no appropriate handling. |

Example

import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.list_inference_endpoint(workspace = "slice-1")

Describe an Inference Endpoint

Use this API to describe an Inference Endpoint.

Syntax

egs.describe_inference_endpoint(workspace, endpoint_name, authenticated_session=None)

Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| endpoint_name | String | The name of the Inference Endpoint whose detailed description you want to view. | Mandatory |
| workspace | String | The name of the slice workspace to which the Inference Endpoint's parent cluster is connected. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None, in which case the SDK tries to use its default session; if no default is found, an exception is raised. | Optional |

Response Returned

| Returns | Description |
|---|---|
| Class InferenceEndpoint | The response object returns the Inference Endpoint details, including the DNS records and GPU requests (GPRs). |

Class InferenceEndpoint

| Parameter | Type | Description |
|---|---|---|
| endpoint_name | String | The name of the Inference Endpoint. |
| model_name | String | The name of the Inference model. |
| status | String | The status of the Inference Endpoint. |
| endpoint | String | The endpoint URL of the Inference Endpoint. |
| cluster_name | String | The worker cluster on which the Inference Endpoint is deployed. |
| namespace | String | The namespace in which the Inference Endpoint is deployed. |
| predict_status | String | Indicates whether the Inference Endpoint is ready for predictions. |
| ingress_status | String | The status of the Kubernetes Ingress that routes incoming network requests to the Inference Endpoint. |
| try_command | String | An example command that you can use to check which models are available for prediction. |
| dns_records | Object | The DNS record details related to the Inference Endpoint. |
| gpu_request | Object | The GPU request details associated with the Inference Endpoint. |

DNS Records

You must add the following DNS records to your domain host zone file for traffic routing to work with the provided DNS Name.

| Parameter | Type | Description |
|---|---|---|
| dns | String | The name of the DNS record. |
| type | String | The type of the DNS record. |
| value | String | The value of the DNS record. |
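
The returned records can be turned into zone-file lines with a small helper. This is an illustrative utility, not part of the egs SDK; it assumes each record is a dict keyed by the field names in the table above:

```python
def to_zone_file_lines(records: list[dict], ttl: int = 300) -> list[str]:
    """Format DNS record dicts (keys: dns, type, value) as zone-file lines
    suitable for pasting into a domain's host zone file."""
    return [f"{r['dns']}. {ttl} IN {r['type']} {r['value']}" for r in records]
```

For example, a CNAME record for inference-1.example.com pointing at a load balancer hostname would render as a single "IN CNAME" zone-file line.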

GPU Request

| Parameter | Type | Description |
|---|---|---|
| gpr_name | String | The name of the GPU request. |
| gprId | String | The GPU request ID. |
| instance_type | String | The instance type of the node. |
| gpu_shape | String | The shape of the GPU. |
| number_of_gpus | Integer | The number of GPUs requested. |
| number_of_gpu_nodes | Integer | The number of GPU nodes requested. |
| memoryPerGPU | String | The memory requirement, in GB, per GPU. |
| status | String | The status of the GPU request. |

Exceptions Raised

| Raises | Description |
|---|---|
| exceptions.Unhandled | Raised when the API encounters an error for which it has no appropriate handling. |

Example

import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.describe_inference_endpoint(endpoint_name = "inference-1-gpu", workspace="slice-1")

Delete an Inference Endpoint

Use this API to delete an Inference Endpoint associated with a given slice workspace.

Syntax

egs.delete_inference_endpoint(endpoint_name, cluster_name, workspace, authenticated_session=None)

Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| cluster_name | String | The worker cluster from which you want to delete the deployed Inference Endpoint. | Mandatory |
| endpoint_name | String | The name of the Inference Endpoint that you want to delete. | Mandatory |
| workspace | String | The slice workspace that contains the Inference Endpoint you want to delete. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None, in which case the SDK tries to use its default session; if no default is found, an exception is raised. | Optional |

Response Returned

| Returns | Description |
|---|---|
| Void | There is no response object. |

Exceptions Raised

| Raises | Description |
|---|---|
| exceptions.Unhandled | Raised when the API encounters an error for which it has no appropriate handling. |

Example

import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.delete_inference_endpoint(endpoint_name = "inference-1-gpu", cluster_name = "worker-1", workspace = "slice-1")