Inference Endpoint APIs
This topic describes the SDK APIs to create and manage Inference Endpoints.
Create an Inference Endpoint
Use this API to create an Inference Endpoint using the standard model specifications.
Syntax
egs.create_inference_endpoint(cluster_name, endpoint_name, workspace, model_spec(model_format_name, storage_uri, args, secret, resources), gpu_spec(gpu_shape, instance_type, memory_per_gpu, number_of_gpu_nodes, number_of_gpus, exit_duration, priority), authenticated_session=None)
Parameters
Parameter | Parameter Type | Description | Required |
---|---|---|---|
cluster_name | String | The worker cluster name on which you want to deploy the Inference Endpoint. | Mandatory |
endpoint_name | String | The name of the Inference Endpoint that you want to deploy. | Mandatory |
workspace | String | The workspace associated with the worker cluster on which you will deploy the Inference Endpoint. | Mandatory |
model_spec | Object | The model specifications that you want to provide for inference. | Mandatory |
gpu_spec | Object | The GPU specifications. For CPU-only inference, set this parameter to None. | Mandatory |
authenticated_session | [AuthenticatedSession] | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
Model Spec Parameters
Parameter | Parameter Type | Description | Required |
---|---|---|---|
model_format_name | String | The name of the Inference model format. | Mandatory |
storage_uri | String | The storage URI of the Inference model. | Optional |
args | String | The arguments for the Inference model. | Optional |
secret | Dictionary | The value of the secret in the dictionary data type format. | Optional |
resources | Resource | Specify resources such as CPU and Memory. | Optional |
GPU Spec Parameters
Parameter | Parameter Type | Description | Required |
---|---|---|---|
gpu_shape | String | The name of the GPU type that you can get from the Inventory details. | Mandatory |
instance_type | String | The type of instance requested. | Mandatory |
memory_per_gpu | Integer | The memory requirement in GB per GPU. | Mandatory |
number_of_gpus | Integer | The number of GPUs requested. | Mandatory |
number_of_gpu_nodes | Integer | The number of GPU nodes requested. | Mandatory |
exit_duration | String | The duration for which the GPU is requested, in the 0d0h0m format (days, hours, minutes). | Mandatory |
priority | Integer | The priority of the request in the GPR queue. Increase the priority number (low: 1-100, medium: 101-200, high: 201-300) to move a GPR higher in the queue. | Mandatory |
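As an illustration, the exit_duration format and the priority bands described above can be checked with a small helper. This is a sketch only; validate_gpu_request is not part of the egs SDK, and the field names follow the table above.

```python
import re

# Hypothetical helper (not part of the egs SDK): validates the two
# formatted GPU spec fields described in the table above.
def validate_gpu_request(exit_duration: str, priority: int) -> str:
    # exit_duration must look like "0d0h0m" (days, hours, minutes).
    if not re.fullmatch(r"\d+d\d+h\d+m", exit_duration):
        raise ValueError(f"exit_duration must match 0d0h0m, got {exit_duration!r}")
    # Priority bands: low 1-100, medium 101-200, high 201-300.
    if 1 <= priority <= 100:
        return "low"
    if 101 <= priority <= 200:
        return "medium"
    if 201 <= priority <= 300:
        return "high"
    raise ValueError(f"priority must be between 1 and 300, got {priority}")
```

For example, validate_gpu_request("0d2h30m", 150) returns "medium", while a malformed duration such as "2h" raises a ValueError.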
Response Returned
Returns | Description |
---|---|
String | The Inference Endpoint name that is successfully created. |
Exceptions Raised
Raises | Description |
---|---|
exceptions.Unhandled | This exception is raised when the API encounters an error that is not handled by a specific exception. |
Example
import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.create_inference_endpoint(
    cluster_name = "<clusterName>", endpoint_name = "<endpointName>", workspace = "<workspaceName>",
    model_spec = {"model_format_name": "sklearn", "storage_uri": "gs://kfserving-examples/models/sklearn/1.0/model"},
    gpu_spec = {"gpu_shape": "<gpuShape>", "instance_type": "<instanceType>", "memory_per_gpu": 16,
                "number_of_gpu_nodes": 1, "number_of_gpus": 1, "exit_duration": "0d1h0m", "priority": 100})
Create an Inference Endpoint with Custom Model
Use this API to create an Inference Endpoint with custom model specifications.
Syntax
egs.create_inference_endpoint(cluster_name, endpoint_name, workspace, raw_model_spec, gpu_spec(gpu_shape, instance_type, memory_per_gpu, number_of_gpu_nodes, number_of_gpus, exit_duration, priority), authenticated_session=None)
Parameters
Parameter | Parameter Type | Description | Required |
---|---|---|---|
cluster_name | String | The worker cluster name on which you want to deploy the Inference Endpoint. | Mandatory |
endpoint_name | String | The name of the Inference Endpoint that you want to deploy. | Mandatory |
workspace | String | The workspace associated with the worker cluster on which you will deploy the Inference Endpoint. | Mandatory |
raw_model_spec | String | The custom model specifications. | Mandatory |
gpu_spec | Object | The GPU specifications. For CPU-only inference, set this parameter to None. | Mandatory |
authenticated_session | [AuthenticatedSession] | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
GPU Spec Parameters
Parameter | Parameter Type | Description | Required |
---|---|---|---|
gpu_shape | String | The name of the GPU type that you can get from the Inventory details. | Mandatory |
instance_type | String | The type of instance requested. | Mandatory |
memory_per_gpu | Integer | The memory requirement in GB per GPU. | Mandatory |
number_of_gpus | Integer | The number of GPUs requested. | Mandatory |
number_of_gpu_nodes | Integer | The number of GPU nodes requested. | Mandatory |
exit_duration | String | The duration for which the GPU is requested, in the 0d0h0m format (days, hours, minutes). | Mandatory |
priority | Integer | The priority of the request in the GPR queue. Increase the priority number (low: 1-100, medium: 101-200, high: 201-300) to move a GPR higher in the queue. | Mandatory |
Response Returned
Returns | Description |
---|---|
String | The Inference Endpoint name that is successfully created. |
Exceptions Raised
Raises | Description |
---|---|
exceptions.Unhandled | This exception is raised when the API encounters an error that is not handled by a specific exception. |
Example
import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.create_inference_endpoint(
    cluster_name = "<clusterName>", endpoint_name = "<endpointName>", workspace = "<workspaceName>",
    raw_model_spec = "<custom model specifications>",
    gpu_spec = {"gpu_shape": "<gpuShape>", "instance_type": "<instanceType>", "memory_per_gpu": 16,
                "number_of_gpu_nodes": 1, "number_of_gpus": 1, "exit_duration": "0d1h0m", "priority": 100})
List Inference Endpoints
Use this API to list all the Inference Endpoints for a given slice workspace.
Syntax
egs.list_inference_endpoint(workspace, authenticated_session=None)
Parameters
Parameter | Parameter Type | Description | Required |
---|---|---|---|
workspace | String | The name of the slice workspace whose Inference Endpoints you want to view. | Mandatory |
authenticated_session | [AuthenticatedSession] | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
Response Returned
Returns | Description |
---|---|
Class InferenceEndpointBrief | The Inference Endpoint brief object contains the endpoint name, the model name, its status, the endpoint URL, and the associated cluster and namespace details. |
Class InferenceEndpointBrief
Parameter | Type | Description |
---|---|---|
endpoint_name | String | The name of the Inference Endpoint. |
model_name | String | The name of the Inference model. |
status | String | The status of the Inference Endpoint. |
endpoint | String | The endpoint or the URL of the Inference Endpoint. |
cluster_name | String | The worker cluster on which the Inference Endpoint is deployed. |
namespace | String | The namespace that contains the Inference Endpoint. |
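For reference, the fields above can be pictured as a plain Python dataclass. This is a sketch that mirrors the documented fields only; the SDK's actual class definition may differ.

```python
from dataclasses import dataclass

# Sketch of the documented fields of InferenceEndpointBrief.
# The real egs SDK class may differ; this mirrors the table above.
@dataclass
class InferenceEndpointBrief:
    endpoint_name: str   # name of the Inference Endpoint
    model_name: str      # name of the Inference model
    status: str          # status of the Inference Endpoint
    endpoint: str        # endpoint URL of the Inference Endpoint
    cluster_name: str    # worker cluster where the endpoint is deployed
    namespace: str       # namespace containing the endpoint

# Illustrative instance (the endpoint URL is a hypothetical value).
brief = InferenceEndpointBrief(
    endpoint_name="inference-1-gpu", model_name="sklearn", status="Ready",
    endpoint="https://inference-1-gpu.example.com", cluster_name="worker-1",
    namespace="slice-1")
```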
Exceptions Raised
Raises | Description |
---|---|
exceptions.Unhandled | This exception is raised when the API encounters an error that is not handled by a specific exception. |
Example
import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.list_inference_endpoint(workspace = "slice-1")
Describe an Inference Endpoint
Use this API to describe an Inference Endpoint.
Syntax
egs.describe_inference_endpoint(workspace, endpoint_name, authenticated_session=None)
Parameters
Parameter | Parameter Type | Description | Required |
---|---|---|---|
endpoint_name | String | The name of the Inference Endpoint whose detailed description you want to view. | Mandatory |
workspace | String | The name of the slice workspace to which the Inference Endpoint's parent cluster is connected. | Mandatory |
authenticated_session | [AuthenticatedSession] | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
Response Returned
Returns | Description |
---|---|
Class InferenceEndpoint | The response object returns the Inference Endpoint details, including the DNS records and GPRs. |
Class InferenceEndpoint
Parameter | Type | Description |
---|---|---|
endpoint_name | String | The name of the Inference Endpoint. |
model_name | String | The name of the Inference model. |
status | String | The status of the Inference Endpoint. |
endpoint | String | The endpoint URL of the Inference Endpoint. |
cluster_name | String | The worker cluster on which the Inference Endpoint is deployed. |
namespace | String | The namespace in which the Inference Endpoint is deployed. |
predict_status | String | Indicates whether the Inference Endpoint is ready for predictions. |
ingress_status | String | The status of the Kubernetes Ingress that routes incoming network requests to the Inference Endpoint. |
try_command | String | An example command that you can use to check the models that are available for prediction. |
dns_records | Object | The DNS record details related to the Inference Endpoint. |
gpu_request | Object | The GPU request details associated with the Inference Endpoint. |
DNS Records
You must add the following DNS records to your domain host zone file for traffic routing to work with the provided DNS Name.
Parameter | Type | Description |
---|---|---|
dns | String | The name of the DNS record. |
type | String | The type of the DNS record. |
value | String | The value of the DNS record. |
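To illustrate, each record from dns_records can be turned into a zone-file line for your domain host zone file. This is a sketch only; to_zone_line is not part of the egs SDK, the record keys follow the table above, and the record values shown are hypothetical.

```python
# Hypothetical helper: formats one DNS record (keys as documented
# above) into a BIND-style zone-file line.
def to_zone_line(record: dict, ttl: int = 300) -> str:
    return f"{record['dns']} {ttl} IN {record['type']} {record['value']}"

# Illustrative record; real values come from the describe response.
line = to_zone_line({"dns": "inference-1-gpu.example.com.",
                     "type": "CNAME",
                     "value": "lb.example.com."})
```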
GPU Request
Parameter | Type | Description |
---|---|---|
gpr_name | String | The name of the GPU request. |
gprId | String | The GPU request ID. |
instance_type | String | The instance type of the node. |
gpu_shape | String | The shape of the GPU. |
number_of_gpus | Integer | The number of GPUs requested. |
number_of_gpu_nodes | Integer | The number of GPU nodes requested. |
memoryPerGPU | String | The memory requirement in GB per GPU. |
status | String | The status of the GPU request. |
Exceptions Raised
Raises | Description |
---|---|
exceptions.Unhandled | This exception is raised when the API encounters an error that is not handled by a specific exception. |
Example
import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.describe_inference_endpoint(endpoint_name = "inference-1-gpu", workspace="slice-1")
Delete an Inference Endpoint
Use this API to delete an Inference Endpoint associated with a given slice workspace.
Syntax
egs.delete_inference_endpoint(endpoint_name, cluster_name, workspace, authenticated_session=None)
Parameters
Parameter | Parameter Type | Description | Required |
---|---|---|---|
cluster_name | String | The worker cluster name from which you want to delete the deployed Inference Endpoint. | Mandatory |
endpoint_name | String | The Inference Endpoint name that you want to delete. | Mandatory |
workspace | String | The slice workspace name that contains the Inference Endpoint you want to delete. | Mandatory |
authenticated_session | [AuthenticatedSession] | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
Response Returned
Returns | Description |
---|---|
Void | There is no response object. |
Exceptions Raised
Raises | Description |
---|---|
exceptions.Unhandled | This exception is raised when the API encounters an error that is not handled by a specific exception. |
Example
import egs
auth = egs.authenticate("https://egs-core-apis.example.com", "5067bd55-1aef-4c84-8987-3e966e917f07")
egs.delete_inference_endpoint(endpoint_name = "inference-1-gpu", cluster_name = "worker-1", workspace = "slice-1")