Version: 1.14.0

GPR Template APIs

This topic describes how to use the Python SDK to interact with the GPR Template APIs, including creation, retrieval, listing, updating, and deletion of GPU provisioning templates.

Create a GPR Template

Use this API to create a new GPR template.

Syntax

egs.create_gpr_template (name, cluster_name, gpu_per_node_count, num_gpu_nodes, memory_per_gpu, gpu_shape, instance_type, exit_duration, priority, idle_timeout_duration, enforce_idle_timeout, enable_eviction, requeue_on_failure, authenticated_session)

Parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| name | String | The name of the GPR template. | Mandatory |
| cluster_name | String | The name of the cluster this template is being created for. | Mandatory |
| gpu_per_node_count | Integer | The number of GPUs required per node to run the workload. | Mandatory |
| num_gpu_nodes | Integer | The number of GPU nodes required. | Mandatory |
| memory_per_gpu | Integer | The memory requirement in GB per GPU. | Mandatory |
| gpu_shape | String | The name of the GPU type, which you can get from the Inventory details. | Mandatory |
| instance_type | String | The instance type requested. | Mandatory |
| exit_duration | String | The duration for which the GPU is requested. The format is 0d0h0m. | Mandatory |
| priority | Integer | The priority of the request in the queue. Increase the priority number to move a GPR higher in the queue (low: 1-100, medium: 101-200, high: 201-300). | Mandatory |
| idle_timeout_duration | String | The duration for which a GPU node can be considered idle before it can be used by another GPR. | Optional |
| enforce_idle_timeout | Boolean | If you set idle_timeout_duration, this parameter is enabled by default. Set it to false if you do not want the idle timeout to be enforced. | Optional |
| enable_eviction | Boolean | Set to true to enable auto-eviction of low-priority GPRs. | Optional |
| requeue_on_failure | Boolean | Set to true to requeue the GPR if it fails. | Optional |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
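The `exit_duration` format and the priority bands in the table above can be checked on the client before the API is called. The following is a minimal sketch and not part of the SDK: `parse_duration` and `priority_tier` are hypothetical helper names. The parser accepts the full `0d0h0m` form as well as the shortened forms used in the examples on this page (such as `1h` or `10m`); whether the SDK itself accepts shortened forms is not confirmed here.

```python
import re
from datetime import timedelta

# Optional day, hour, and minute components, in that order (for example "2d3h30m").
_DURATION_RE = re.compile(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?")


def parse_duration(text: str) -> timedelta:
    """Convert a duration string such as "0d2h30m" into a timedelta."""
    match = _DURATION_RE.fullmatch(text)
    # The pattern also matches the empty string, so reject that explicitly.
    if not text or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    days, hours, minutes = (int(part or 0) for part in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)


def priority_tier(priority: int) -> str:
    """Map a priority number onto the low/medium/high bands listed above."""
    if 1 <= priority <= 100:
        return "low"
    if 101 <= priority <= 200:
        return "medium"
    if 201 <= priority <= 300:
        return "high"
    raise ValueError(f"priority outside 1-300: {priority}")
```

Calling `parse_duration(exit_duration)` and `priority_tier(priority)` before building the request surfaces malformed values locally instead of as a failed API call.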

Response Returned

| Returns | Description |
| --- | --- |
| String | The name of the GPR template that was created. |

Exceptions Raised

| Raises | Description |
| --- | --- |
| ValueError | This error is raised if the idle timeout is enforced but no duration is provided. |
| UnhandledException | This exception is raised if the API call fails. |
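The `ValueError` condition can be checked before any request is sent. The sketch below assumes the rule is simply "enforcement requested and no duration given"; `check_idle_timeout` is a hypothetical helper, not an SDK function, and the SDK's own check may differ in detail.

```python
from typing import Optional


def check_idle_timeout(
    enforce_idle_timeout: Optional[bool],
    idle_timeout_duration: Optional[str],
) -> None:
    """Raise ValueError when enforcement is requested without a duration."""
    if enforce_idle_timeout and not idle_timeout_duration:
        raise ValueError(
            "idle_timeout_duration is required when enforce_idle_timeout is True"
        )


# These combinations pass: enforcement with a duration, or no enforcement.
check_idle_timeout(True, "10m")
check_idle_timeout(False, None)
check_idle_timeout(None, None)
```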

Example

from egs.gpr_template import create_gpr_template

create_gpr_template(
    name="my-template",
    cluster_name="worker-cluster",
    gpu_per_node_count=1,
    num_gpu_nodes=2,
    memory_per_gpu=40,
    gpu_shape="A100",
    instance_type="a2-highgpu-2g",
    exit_duration="1h",
    priority=100,
    enforce_idle_timeout=True,
    idle_timeout_duration="10m",
    enable_eviction=True,
    requeue_on_failure=False
)

Get a GPR Template

Use this API to get a GPR template by specifying its name.

Syntax

egs.get_gpr_template (gpr_template_name, authenticated_session)

Parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| gpr_template_name | String | The name of the GPR template that you want to retrieve. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is provided, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |

Response Returned

| Returns | Description |
| --- | --- |
| GetGprTemplateResponse | The specified GPR template is returned. |

GetGprTemplateResponse

| Parameter | Type | Description |
| --- | --- | --- |
| name | String | The name of the GPR template. |
| clusterName | String | The name of the cluster this GPR template is tied to. |
| exitDuration | String | The duration after which the GPR resources will be automatically exited. |
| gpuShape | String | The type/model of GPU (for example, NVIDIA-A100-SXM4-40GB). |
| gpuSharingMode | String | The sharing mode for the GPU (for example, exclusive, time-sharing). |
| instanceType | String | The cloud provider instance type (for example, a2-highgpu-2g). |
| memoryPerGpu | Integer | The memory in GB assigned per GPU. |
| numberOfGPUNodes | Integer | The number of GPU-enabled nodes requested. |
| numberOfGPUs | Integer | The total number of GPUs requested across nodes. |
| priority | Integer | The priority score for scheduling this GPR template. |
| requeueOnFailure | Boolean | Whether the system should retry if this GPR template fails. |
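Because `numberOfGPUs` is a total across nodes while `memoryPerGpu` is per GPU, a few derived figures are easy to compute from a response. The sketch below uses `TemplateSummary`, a hypothetical stand-in class that mirrors four of the fields above; it is not the SDK's `GetGprTemplateResponse`, and the assumption that GPUs are spread evenly across nodes is mine.

```python
from dataclasses import dataclass


@dataclass
class TemplateSummary:
    """Hypothetical stand-in for a few GetGprTemplateResponse fields."""

    name: str
    numberOfGPUNodes: int
    numberOfGPUs: int
    memoryPerGpu: int  # GB per GPU


def gpus_per_node(template: TemplateSummary) -> int:
    """Return GPUs per node, assuming an even spread across nodes."""
    if template.numberOfGPUNodes <= 0:
        raise ValueError("numberOfGPUNodes must be positive")
    per_node, remainder = divmod(template.numberOfGPUs, template.numberOfGPUNodes)
    if remainder:
        raise ValueError("GPUs are not evenly divisible across nodes")
    return per_node


def total_gpu_memory_gb(template: TemplateSummary) -> int:
    """Return the total GPU memory in GB across all requested GPUs."""
    return template.numberOfGPUs * template.memoryPerGpu


summary = TemplateSummary("my-template", numberOfGPUNodes=2, numberOfGPUs=4, memoryPerGpu=40)
print(gpus_per_node(summary), total_gpu_memory_gb(summary))
```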

Exceptions Raised

| Raises | Description |
| --- | --- |
| UnhandledException | This exception is raised if the API call fails. |

Example

from egs.gpr_template import get_gpr_template

template = get_gpr_template("my-template")
print(template.name)

List GPR Templates

Use this API to list all GPR templates.

Parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |

Response Returned

| Returns | Description |
| --- | --- |
| List of GetGprTemplateResponse objects | The object that contains all the GPR templates is returned. |

Exceptions Raised

| Raises | Description |
| --- | --- |
| UnhandledException | This exception is raised if the API call fails. |

Example

from egs.gpr_template import list_gpr_templates

templates = list_gpr_templates()
for t in templates.items:
    print(t.name)
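A common follow-up to listing is filtering the result, for example by cluster. The sketch below runs against stand-in objects rather than the SDK. It assumes each template exposes `name` and `clusterName` attributes as in the `GetGprTemplateResponse` table, and it tolerates both a plain list and a wrapper with an `items` attribute, since this page shows both shapes. `as_template_list` and `names_for_cluster` are hypothetical helpers.

```python
from types import SimpleNamespace
from typing import Any, List


def as_template_list(result: Any) -> List[Any]:
    """Return the templates whether `result` is a list or wraps one in `.items`."""
    items = getattr(result, "items", None)
    # Skip callables so that a dict's .items() method is not mistaken for data.
    if items is not None and not callable(items):
        return list(items)
    return list(result)


def names_for_cluster(result: Any, cluster_name: str) -> List[str]:
    """Return the names of the templates tied to `cluster_name`."""
    return [
        t.name
        for t in as_template_list(result)
        if getattr(t, "clusterName", None) == cluster_name
    ]


# Stand-in data shaped like the list response; not real SDK output.
fake_result = SimpleNamespace(
    items=[
        SimpleNamespace(name="train-a100", clusterName="worker-1"),
        SimpleNamespace(name="infer-t4", clusterName="worker-2"),
    ]
)
print(names_for_cluster(fake_result, "worker-1"))
```

With the SDK installed, the return value of `list_gpr_templates()` would take the place of `fake_result`.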

Update a GPR Template

Use this API to update an existing GPR template.

Syntax

egs.update_gpr_template (name, cluster_name, gpu_per_node_count, num_gpu_nodes, memory_per_gpu, gpu_shape, instance_type, exit_duration, priority, idle_timeout_duration, enforce_idle_timeout, enable_eviction, requeue_on_failure, authenticated_session)

Parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| name | String | The name of the GPR template. | Mandatory |
| cluster_name | String | The name of the cluster this template is being updated for. | Mandatory |
| gpu_per_node_count | Integer | The number of GPUs required per node to run the workload. | Mandatory |
| num_gpu_nodes | Integer | The number of GPU nodes required. | Mandatory |
| memory_per_gpu | Integer | The memory requirement in GB per GPU. | Mandatory |
| gpu_shape | String | The name of the GPU type, which you can get from the Inventory details. | Mandatory |
| instance_type | String | The instance type requested. | Mandatory |
| exit_duration | String | The duration for which the GPU is requested. The format is 0d0h0m. | Mandatory |
| priority | Integer | The priority of the request in the queue. Increase the priority number to move a GPR higher in the queue (low: 1-100, medium: 101-200, high: 201-300). | Mandatory |
| idle_timeout_duration | String | The duration for which a GPU node can be considered idle before it can be used by another GPR. | Optional |
| enforce_idle_timeout | Boolean | If you set idle_timeout_duration, this parameter is enabled by default. Set it to false if you do not want the idle timeout to be enforced. | Optional |
| enable_eviction | Boolean | Set to true to enable auto-eviction of low-priority GPRs. | Optional |
| requeue_on_failure | Boolean | Set to true to requeue the GPR if it fails. | Optional |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |

Response Returned

| Returns | Description |
| --- | --- |
| UpdateGprTemplateResponse | The updated GPR template object is returned. |

Exceptions Raised

| Raises | Description |
| --- | --- |
| UnhandledException | This exception is raised if the API call fails. |

Example

from egs.gpr_template import update_gpr_template

update_gpr_template(
    name="my-template",
    cluster_name="worker-cluster",
    gpu_per_node_count=2,
    num_gpu_nodes=2,
    memory_per_gpu=40,
    gpu_shape="A100",
    instance_type="a2-highgpu-2g",
    exit_duration="2h",
    priority=120,
    enforce_idle_timeout=True,
    idle_timeout_duration="15m",
    enable_eviction=True,
    requeue_on_failure=True
)
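Because the update call takes the full set of mandatory parameters, changing one or two values means resupplying the rest. One way to manage that, sketched here under assumptions, is to keep the template settings in a dictionary and apply overrides before the call. `with_overrides` and `UPDATE_PARAMS` are hypothetical names; the parameter names are taken from the Syntax line for this API.

```python
from typing import Any, Dict

# Parameter names copied from the update_gpr_template Syntax line.
UPDATE_PARAMS = frozenset({
    "name", "cluster_name", "gpu_per_node_count", "num_gpu_nodes",
    "memory_per_gpu", "gpu_shape", "instance_type", "exit_duration",
    "priority", "idle_timeout_duration", "enforce_idle_timeout",
    "enable_eviction", "requeue_on_failure",
})


def with_overrides(base: Dict[str, Any], **changes: Any) -> Dict[str, Any]:
    """Return a copy of `base` with `changes` applied, rejecting unknown names."""
    unknown = set(changes) - UPDATE_PARAMS
    if unknown:
        raise TypeError(f"unknown parameter(s): {sorted(unknown)}")
    return {**base, **changes}


base = {
    "name": "my-template",
    "cluster_name": "worker-cluster",
    "gpu_per_node_count": 1,
    "num_gpu_nodes": 2,
    "memory_per_gpu": 40,
    "gpu_shape": "A100",
    "instance_type": "a2-highgpu-2g",
    "exit_duration": "1h",
    "priority": 100,
}
updated = with_overrides(base, exit_duration="2h", priority=120)
# With the SDK installed, the call would be: update_gpr_template(**updated)
```

Rejecting unknown names locally catches misspelled keyword arguments before the request is built, and returning a copy leaves the original settings untouched.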

Delete a GPR Template

Use this API to delete a GPR template by specifying its name.

Syntax

egs.delete_gpr_template (gpr_template_name, authenticated_session)

Parameters

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| gpr_template_name | String | The name of the template that you want to delete. | Mandatory |
| authenticated_session | AuthenticatedSession | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |

Response Returned

| Returns | Description |
| --- | --- |
| Void | There is no response object. |

Exceptions Raised

| Raises | Description |
| --- | --- |
| UnhandledException | This exception is raised if the API call fails. |

Example

from egs.gpr_template import delete_gpr_template

delete_gpr_template("my-template")