Version: 1.10.0

Inference Endpoint APIs

This topic describes the SDK APIs to create and manage Inference Endpoints.

Create an Inference Endpoint

Use this API to create an Inference Endpoint using the standard model specifications.

Syntax

egs.create_inference_endpoint(cluster_name, endpoint_name, workspace, model_spec(model_format_name, storage_uri, args, secret, resources), gpu_spec(gpu_shape, instance_type, memory_per_gpu, number_of_gpus, number_of_gpu_nodes, exit_duration, priority), authenticated_session=None)

Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| cluster_name | String | The worker cluster on which you want to deploy the Inference Endpoint. | Mandatory |
| endpoint_name | String | The name of the Inference Endpoint that you want to deploy. | Mandatory |
| workspace | String | The workspace associated with the worker cluster on which you will deploy the Inference Endpoint. | Mandatory |
| model_spec | Object | The model specifications that you want to provide for inference. | Mandatory |
| gpu_spec | Object | The GPU specifications. For CPU-only inference, set this parameter to None. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None, in which case the SDK tries to use its default session; if no default is found, an exception is raised. | Optional |

Model Spec Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| model_format_name | String | The name of the Inference model format. | Mandatory |
| storage_uri | String | The storage URI of the Inference model. | Optional |
| args | String | The arguments for the Inference model. | Optional |
| secret | Dictionary | The secret value, in dictionary format. | Optional |
| resources | Resource | The resources, such as CPU and memory. | Optional |

GPU Spec Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| gpu_shape | String | The name of the GPU type, which you can get from the inventory details. | Mandatory |
| instance_type | String | The type of instance requested. | Mandatory |
| memory_per_gpu | Integer | The memory requirement, in GB, per GPU. | Mandatory |
| number_of_gpus | Integer | The number of GPUs requested. | Mandatory |
| number_of_gpu_nodes | Integer | The number of GPU nodes requested. | Mandatory |
| exit_duration | String | The duration for which the GPU is requested, in the format 0d0h0m. | Mandatory |
| priority | Integer | The priority of the GPU request (GPR) in the queue. You can increase the priority number (low: 1-100, medium: 101-200, high: 201-300) to move a GPR higher in the queue. | Mandatory |
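
The exit_duration format and the priority bands above can be sketched with small helper functions. These are illustrative utilities for client-side validation, not part of the egs SDK:

```python
import re

def parse_exit_duration(duration: str) -> int:
    """Parse an exit_duration string in the 0d0h0m format into total minutes."""
    match = re.fullmatch(r"(\d+)d(\d+)h(\d+)m", duration)
    if match is None:
        raise ValueError(f"exit_duration must match 0d0h0m, got {duration!r}")
    days, hours, minutes = (int(g) for g in match.groups())
    return days * 24 * 60 + hours * 60 + minutes

def priority_band(priority: int) -> str:
    """Map a GPR priority number to its band: low 1-100, medium 101-200, high 201-300."""
    if 1 <= priority <= 100:
        return "low"
    if 101 <= priority <= 200:
        return "medium"
    if 201 <= priority <= 300:
        return "high"
    raise ValueError("priority must be between 1 and 300")
```

For example, parse_exit_duration("1d2h30m") returns 1590 minutes, and priority_band(150) returns "medium".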

Response Returned

| Returns | Description |
|---|---|
| String | The name of the Inference Endpoint that was successfully created. |

Exceptions Raised

| Raises | Description |
|---|---|
| exceptions.Unhandled | Raised when the API encounters an error for which it has no appropriate handling. |

Example

import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.create_inference_endpoint(
    cluster_name = "<clusterName>",
    endpoint_name = "<endpointName>",
    workspace = "<workspaceName>",
    model_spec = model_spec(
        model_format_name = "sklearn",
        storage_uri = "gs://kfserving-examples/models/sklearn/1.0/model"
    ),
    gpu_spec = None,  # CPU-only inference; pass a gpu_spec for GPU-backed inference
    authenticated_session = auth
)

Create an Inference Endpoint with Custom Model

Use this API to create an Inference Endpoint with custom model specifications.

Syntax

egs.create_inference_endpoint(cluster_name, endpoint_name, workspace, raw_model_spec, gpu_spec(gpu_shape, instance_type, memory_per_gpu, number_of_gpus, number_of_gpu_nodes, exit_duration, priority), authenticated_session=None)

Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| cluster_name | String | The worker cluster on which you want to deploy the Inference Endpoint. | Mandatory |
| endpoint_name | String | The name of the Inference Endpoint that you want to deploy. | Mandatory |
| workspace | String | The workspace associated with the worker cluster on which you will deploy the Inference Endpoint. | Mandatory |
| raw_model_spec | String | The custom model specifications. | Mandatory |
| gpu_spec | Object | The GPU specifications. For CPU-only inference, set this parameter to None. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None, in which case the SDK tries to use its default session; if no default is found, an exception is raised. | Optional |

GPU Spec Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| gpu_shape | String | The name of the GPU type, which you can get from the inventory details. | Mandatory |
| instance_type | String | The type of instance requested. | Mandatory |
| memory_per_gpu | Integer | The memory requirement, in GB, per GPU. | Mandatory |
| number_of_gpus | Integer | The number of GPUs requested. | Mandatory |
| number_of_gpu_nodes | Integer | The number of GPU nodes requested. | Mandatory |
| exit_duration | String | The duration for which the GPU is requested, in the format 0d0h0m. | Mandatory |
| priority | Integer | The priority of the GPU request (GPR) in the queue. You can increase the priority number (low: 1-100, medium: 101-200, high: 201-300) to move a GPR higher in the queue. | Mandatory |

Response Returned

| Returns | Description |
|---|---|
| String | The name of the Inference Endpoint that was successfully created. |

Exceptions Raised

| Raises | Description |
|---|---|
| exceptions.Unhandled | Raised when the API encounters an error for which it has no appropriate handling. |

Example

import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.create_inference_endpoint(
    cluster_name = "<clusterName>",
    endpoint_name = "<endpointName>",
    workspace = "<workspaceName>",
    raw_model_spec = "<custom model specifications>",
    gpu_spec = None,  # CPU-only inference; pass a gpu_spec for GPU-backed inference
    authenticated_session = auth
)

List Inference Endpoints

Use this API to list all the Inference Endpoints for a given slice workspace.

Syntax

egs.list_inference_endpoint(workspace, authenticated_session=None)

Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| workspace | String | The name of the slice workspace whose Inference Endpoints you want to view. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None, in which case the SDK tries to use its default session; if no default is found, an exception is raised. | Optional |

Response Returned

| Returns | Description |
|---|---|
| Class InferenceEndpointBrief | The Inference Endpoint object contains the endpoint name, the model name, its status, the associated cluster, and the namespace details. |

Class InferenceEndpointBrief

| Parameter | Type | Description |
|---|---|---|
| endpoint_name | String | The name of the Inference Endpoint. |
| model_name | String | The name of the Inference model. |
| status | String | The status of the Inference Endpoint. |
| endpoint | String | The endpoint URL of the Inference Endpoint. |
| cluster_name | String | The worker cluster on which the Inference Endpoint is deployed. |
| namespace | String | The namespace that contains the Inference Endpoint. |
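
The shape of each returned object can be illustrated with a plain dataclass that mirrors the fields above plus a small formatter for a one-line listing. This is a sketch for illustration only; the SDK returns its own InferenceEndpointBrief class:

```python
from dataclasses import dataclass

@dataclass
class InferenceEndpointBrief:
    """Illustrative mirror of the fields returned by egs.list_inference_endpoint."""
    endpoint_name: str
    model_name: str
    status: str
    endpoint: str
    cluster_name: str
    namespace: str

def summarize(endpoints: list[InferenceEndpointBrief]) -> list[str]:
    """Render one status line per endpoint, e.g. for a CLI-style listing."""
    return [
        f"{ep.endpoint_name} ({ep.model_name}) on {ep.cluster_name}/{ep.namespace}: {ep.status}"
        for ep in endpoints
    ]
```

For example, an endpoint named inference-1 running a sklearn model on worker-1/slice-1 would render as "inference-1 (sklearn) on worker-1/slice-1: Ready".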

Exceptions Raised

| Raises | Description |
|---|---|
| exceptions.Unhandled | Raised when the API encounters an error for which it has no appropriate handling. |

Example

import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.list_inference_endpoint(workspace = "slice-1")

Describe an Inference Endpoint

Use this API to describe an Inference Endpoint.

Syntax

egs.describe_inference_endpoint(workspace, endpoint_name, authenticated_session=None)

Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| endpoint_name | String | The name of the Inference Endpoint whose detailed description you want to view. | Mandatory |
| workspace | String | The name of the slice workspace to which the Inference Endpoint's parent cluster is connected. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None, in which case the SDK tries to use its default session; if no default is found, an exception is raised. | Optional |

Response Returned

| Returns | Description |
|---|---|
| Class InferenceEndpoint | The response object returns the Inference Endpoint details, including the DNS records and GPU requests (GPRs). |

Class InferenceEndpoint

| Parameter | Type | Description |
|---|---|---|
| endpoint_name | String | The name of the Inference Endpoint. |
| model_name | String | The name of the Inference model. |
| status | String | The status of the Inference Endpoint. |
| endpoint | String | The endpoint URL of the Inference Endpoint. |
| cluster_name | String | The worker cluster on which the Inference Endpoint is deployed. |
| namespace | String | The namespace in which the Inference Endpoint is deployed. |
| predict_status | String | Indicates whether the Inference Endpoint is ready for predictions. |
| ingress_status | String | The status of the Kubernetes Ingress that routes incoming network requests to the Inference Endpoint. |
| try_command | String | An example command that you can use to check which models are available for prediction. |
| dns_records | Object | The DNS record details related to the Inference Endpoint. |
| gpu_request | Object | The GPU request details associated with the Inference Endpoint. |

DNS Records

You must add the following DNS records to your domain host zone file for traffic routing to work with the provided DNS Name.

| Parameter | Type | Description |
|---|---|---|
| dns | String | The name of the DNS record. |
| type | String | The type of the DNS record. |
| value | String | The value of the DNS record. |
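
The returned records can be turned into zone-file lines with a small helper. This is an illustrative utility, not part of the egs SDK; it assumes each record is a dict keyed by the field names in the table above:

```python
def to_zone_file_lines(records: list[dict], ttl: int = 300) -> list[str]:
    """Format DNS record dicts (keys: dns, type, value) as zone-file lines
    suitable for pasting into a domain's host zone file."""
    return [f"{r['dns']}. {ttl} IN {r['type']} {r['value']}" for r in records]
```

For example, a CNAME record for inference-1.example.com pointing at a load balancer hostname would render as a single "IN CNAME" zone-file line.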

GPU Request

| Parameter | Type | Description |
|---|---|---|
| gpr_name | String | The name of the GPU request. |
| gprId | String | The GPU request ID. |
| instance_type | String | The instance type of the node. |
| gpu_shape | String | The shape of the GPU. |
| number_of_gpus | Integer | The number of GPUs requested. |
| number_of_gpu_nodes | Integer | The number of GPU nodes requested. |
| memoryPerGPU | String | The memory requirement, in GB, per GPU. |
| status | String | The status of the GPU request. |

Exceptions Raised

| Raises | Description |
|---|---|
| exceptions.Unhandled | Raised when the API encounters an error for which it has no appropriate handling. |

Example

import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.describe_inference_endpoint(endpoint_name = "inference-1-gpu", workspace="slice-1")

Delete an Inference Endpoint

Use this API to delete an Inference Endpoint associated with a given slice workspace.

Syntax

egs.delete_inference_endpoint(endpoint_name, cluster_name, workspace, authenticated_session=None)

Parameters

| Parameter | Parameter Type | Description | Required |
|---|---|---|---|
| cluster_name | String | The worker cluster from which you want to delete the deployed Inference Endpoint. | Mandatory |
| endpoint_name | String | The name of the Inference Endpoint that you want to delete. | Mandatory |
| workspace | String | The slice workspace that contains the Inference Endpoint you want to delete. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None, in which case the SDK tries to use its default session; if no default is found, an exception is raised. | Optional |

Response Returned

| Returns | Description |
|---|---|
| Void | There is no response object. |

Exceptions Raised

| Raises | Description |
|---|---|
| exceptions.Unhandled | Raised when the API encounters an error for which it has no appropriate handling. |

Example

import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.delete_inference_endpoint(endpoint_name = "inference-1-gpu", cluster_name = "worker-1", workspace = "slice-1")