GPR Template APIs
This topic describes how to use the Python SDK to interact with the GPR Template APIs, including creation, retrieval, listing, updating, and deletion of GPU provisioning templates.
Create a GPR Template
Use this API to create a new GPR template.
Syntax
egs.create_gpr_template (name, cluster_name, gpu_per_node_count, num_gpu_nodes, memory_per_gpu, gpu_shape, instance_type, exit_duration, priority, idle_timeout_duration, enforce_idle_timeout, enable_eviction, requeue_on_failure, authenticated_session)
Parameters
Parameter | Type | Description | Required |
---|---|---|---|
name | String | The name of the GPR template. | Mandatory |
cluster_name | String | The name of the cluster this template is being created for. | Mandatory |
gpu_per_node_count | Integer | The number of GPUs required per node to run the workload. | Mandatory |
num_gpu_nodes | Integer | The number of GPU nodes required. | Mandatory |
memory_per_gpu | Integer | The memory requirement in GB per GPU. | Mandatory |
gpu_shape | String | The name of the GPU type that you can get from the Inventory details. | Mandatory |
instance_type | String | The type of the instance requested for. | Mandatory |
exit_duration | String | The duration for which the GPU is requested. The format should be 0d0h0m. | Mandatory |
priority | Integer | The priority of the request, which determines its position in the queue. A higher number moves the GPR higher in the queue (low: 1-100, medium: 101-200, high: 201-300). | Mandatory |
idle_timeout_duration | String | The duration for which a GPU node can be considered idle before it can be used by another GPR. | Optional |
enforce_idle_timeout | Boolean | If you set idle_timeout_duration, this parameter is enabled by default. Set the value to False if you do not want to enforce the idle timeout. | Optional |
enable_eviction | Boolean | Enables auto-eviction of the low-priority GPR. | Optional |
requeue_on_failure | Boolean | Requeues the GPR if it fails. | Optional |
authenticated_session | [AuthenticatedSession] | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
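The duration format and the priority bands in the table can be checked on the client before the API is called. The following is a minimal sketch, not part of the SDK: parse_duration and priority_band are hypothetical helpers, and the sketch assumes that the 0d0h0m format means non-negative whole days, hours, and minutes, with shortened forms such as 1h or 10m also accepted.

```python
import re
from datetime import timedelta

# Assumption: a duration is <days>d<hours>h<minutes>m, each part optional.
_DURATION_RE = re.compile(r"^(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?$")


def parse_duration(value: str) -> timedelta:
    """Convert a string such as '0d1h30m' into a timedelta, or raise ValueError."""
    match = _DURATION_RE.match(value)
    if not value or match is None:
        raise ValueError(f"expected a duration like '0d1h30m', got {value!r}")
    days, hours, minutes = (int(part or 0) for part in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)


def priority_band(priority: int) -> str:
    """Map a priority number to the band named in the parameter table."""
    if 1 <= priority <= 100:
        return "low"
    if 101 <= priority <= 200:
        return "medium"
    if 201 <= priority <= 300:
        return "high"
    raise ValueError(f"priority must be between 1 and 300, got {priority}")


print(parse_duration("0d1h30m"))  # 1:30:00
print(priority_band(150))         # medium
```

Whether the EGS Controller itself accepts the shortened forms is not stated here, so treat the lenient pattern as a convenience for local checks only.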
Response Returned
Returns | Description |
---|---|
String | The name of the GPR template that was created. |
Exceptions Raised
Raises | Description |
---|---|
ValueError | This error is raised if the idle timeout is enforced but no duration is provided. |
UnhandledException | This exception is raised if the API call fails. |
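The ValueError condition above can be mirrored on the client so that an invalid combination is caught before any call is made. This is a sketch of that rule only: check_idle_timeout is a hypothetical helper, not an SDK function, and the SDK's own check may differ in detail.

```python
from typing import Optional


def check_idle_timeout(enforce_idle_timeout: bool,
                       idle_timeout_duration: Optional[str]) -> None:
    """Raise ValueError when enforcement is requested without a duration."""
    if enforce_idle_timeout and not idle_timeout_duration:
        raise ValueError(
            "idle_timeout_duration is required when enforce_idle_timeout is True"
        )


# Valid: enforcement with a duration, or no enforcement at all.
check_idle_timeout(True, "10m")
check_idle_timeout(False, None)

# Invalid: enforcement without a duration.
try:
    check_idle_timeout(True, None)
except ValueError as err:
    print(f"rejected: {err}")
```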
Example
from egs.gpr_template import create_gpr_template
create_gpr_template(
    name="my-template",
    cluster_name="worker-cluster",
    gpu_per_node_count=1,
    num_gpu_nodes=2,
    memory_per_gpu=40,
    gpu_shape="A100",
    instance_type="a2-highgpu-2g",
    exit_duration="1h",
    priority=100,
    enforce_idle_timeout=True,
    idle_timeout_duration="10m",
    enable_eviction=True,
    requeue_on_failure=False
)
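Because create_gpr_template takes many keyword arguments, it can help to gather them in one place before the call. The sketch below uses a hypothetical GprTemplateSpec dataclass (not part of the SDK) whose field names are copied from the parameter table. The SDK call itself appears only as a comment because it needs a live EGS Controller.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GprTemplateSpec:
    """Hypothetical container; field names mirror the documented parameters."""
    name: str
    cluster_name: str
    gpu_per_node_count: int
    num_gpu_nodes: int
    memory_per_gpu: int
    gpu_shape: str
    instance_type: str
    exit_duration: str
    priority: int
    idle_timeout_duration: Optional[str] = None
    enforce_idle_timeout: Optional[bool] = None
    enable_eviction: Optional[bool] = None
    requeue_on_failure: Optional[bool] = None

    def as_kwargs(self) -> dict:
        """Return keyword arguments, leaving out optional fields that are unset."""
        return {key: value for key, value in asdict(self).items() if value is not None}


spec = GprTemplateSpec(
    name="my-template", cluster_name="worker-cluster",
    gpu_per_node_count=1, num_gpu_nodes=2, memory_per_gpu=40,
    gpu_shape="A100", instance_type="a2-highgpu-2g",
    exit_duration="1h", priority=100, requeue_on_failure=False,
)
kwargs = spec.as_kwargs()
print(sorted(kwargs))
# With the SDK installed and a session configured, the call would be:
# create_gpr_template(**kwargs)
```

Filtering on `is not None` rather than on truthiness keeps an explicit False, such as requeue_on_failure above, in the arguments.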
Get a GPR Template
Use this API to get a GPR template by specifying its name.
Syntax
egs.get_gpr_template (gpr_template_name, authenticated_session)
Parameters
Parameter | Type | Description | Required |
---|---|---|---|
gpr_template_name | String | The name of the GPR template that you want to retrieve. | Mandatory |
authenticated_session | [AuthenticatedSession] | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is provided, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
Response Returned
Returns | Description |
---|---|
GetGprTemplateResponse | The specified GPR Template is returned. |
GetGprTemplateResponse
Parameter | Type | Description |
---|---|---|
name | String | The name of the GPR template. |
clusterName | String | The name of the cluster this GPR template is tied to. |
exitDuration | String | The duration after which the GPR resources will be automatically exited. |
gpuShape | String | The type/model of GPU (for example, NVIDIA-A100-SXM4-40GB). |
gpuSharingMode | String | The sharing mode for the GPU (for example, exclusive, time-sharing). |
instanceType | String | The cloud provider instance type (for example, a2-highgpu-2g). |
memoryPerGpu | Integer | The memory in GB assigned per GPU. |
numberOfGPUNodes | Integer | The number of GPU-enabled nodes requested. |
numberOfGPUs | Integer | The total number of GPUs requested across nodes. |
priority | Integer | The priority score for scheduling this GPR template. |
requeueOnFailure | Boolean | Whether the system should retry if this GPR template fails. |
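The relationship between numberOfGPUs (the total across nodes) and numberOfGPUNodes can be illustrated with a small calculation. The sketch below works on a plain dictionary keyed by the field names in the table. How the SDK exposes these fields on the Python object (attribute names and casing) is not shown in this topic, so the dictionary and both helper functions are assumptions made for illustration.

```python
def gpus_per_node(fields: dict) -> float:
    """Divide the total GPU count by the node count from a template's fields."""
    nodes = fields["numberOfGPUNodes"]
    if nodes <= 0:
        raise ValueError("numberOfGPUNodes must be positive")
    return fields["numberOfGPUs"] / nodes


def total_memory_gb(fields: dict) -> int:
    """Total GPU memory in GB: memory per GPU times the total GPU count."""
    return fields["memoryPerGpu"] * fields["numberOfGPUs"]


# Illustrative values only, not real output from the API.
sample = {
    "name": "my-template",
    "numberOfGPUNodes": 2,
    "numberOfGPUs": 4,
    "memoryPerGpu": 40,
}
print(gpus_per_node(sample))    # 2.0
print(total_memory_gb(sample))  # 160
```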
Exceptions Raised
Raises | Description |
---|---|
UnhandledException | This exception is raised if the API call fails. |
Example
from egs.gpr_template import get_gpr_template
template = get_gpr_template("my-template")
print(template.name)
List GPR Templates
Use this API to list all GPR templates.
Syntax
egs.list_gpr_templates (authenticated_session)
Parameters
Parameter | Type | Description | Required |
---|---|---|---|
authenticated_session | [AuthenticatedSession] | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
Response Returned
Returns | Description |
---|---|
List of GetGprTemplateResponse objects | The object that contains all the GPR templates is returned. |
Exceptions Raised
Raises | Description |
---|---|
UnhandledException | This exception is raised if the API call fails. |
Example
from egs.gpr_template import list_gpr_templates
templates = list_gpr_templates()
for t in templates.items:
    print(t.name)
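A common use of the listing is to check whether a template name is already taken. The sketch below assumes only what the example shows, namely that each item has a name attribute. find_template is a hypothetical helper, and SimpleNamespace objects stand in for the SDK's response items so that the snippet runs without a controller.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_template(items: Iterable, name: str) -> Optional[object]:
    """Return the first item whose name matches, or None if there is none."""
    return next((item for item in items if item.name == name), None)


# Stand-ins for the items returned by list_gpr_templates().
items = [SimpleNamespace(name="my-template"), SimpleNamespace(name="batch-template")]

print(find_template(items, "my-template") is not None)  # True
print(find_template(items, "missing") is None)          # True
```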
Update a GPR Template
Use this API to update an existing GPR template.
Syntax
egs.update_gpr_template (name, cluster_name, gpu_per_node_count, num_gpu_nodes, memory_per_gpu, gpu_shape, instance_type, exit_duration, priority, idle_timeout_duration, enforce_idle_timeout, enable_eviction, requeue_on_failure, authenticated_session)
Parameters
Parameter | Type | Description | Required |
---|---|---|---|
name | String | The name of the GPR template. | Mandatory |
cluster_name | String | The name of the cluster this template is being updated for. | Mandatory |
gpu_per_node_count | Integer | The number of GPUs required per node to run the workload. | Mandatory |
num_gpu_nodes | Integer | The number of GPU nodes required. | Mandatory |
memory_per_gpu | Integer | The memory requirement in GB per GPU. | Mandatory |
gpu_shape | String | The name of the GPU type that you can get from the Inventory details. | Mandatory |
instance_type | String | The type of the instance requested for. | Mandatory |
exit_duration | String | The duration for which the GPU is requested. The format should be 0d0h0m. | Mandatory |
priority | Integer | The priority of the request, which determines its position in the queue. A higher number moves the GPR higher in the queue (low: 1-100, medium: 101-200, high: 201-300). | Mandatory |
idle_timeout_duration | String | The duration for which a GPU node can be considered idle before it can be used by another GPR. | Optional |
enforce_idle_timeout | Boolean | If you set idle_timeout_duration, this parameter is enabled by default. Set the value to False if you do not want to enforce the idle timeout. | Optional |
enable_eviction | Boolean | Enables auto-eviction of the low-priority GPR. | Optional |
requeue_on_failure | Boolean | Requeues the GPR if it fails. | Optional |
authenticated_session | [AuthenticatedSession] | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
Response Returned
Returns | Description |
---|---|
UpdateGprTemplateResponse | The updated GPR Template object is returned. |
Exceptions Raised
Raises | Description |
---|---|
UnhandledException | This exception is raised if the API call fails. |
Example
from egs.gpr_template import update_gpr_template
update_gpr_template(
    name="my-template",
    cluster_name="worker-cluster",
    gpu_per_node_count=2,
    num_gpu_nodes=2,
    memory_per_gpu=40,
    gpu_shape="A100",
    instance_type="a2-highgpu-2g",
    exit_duration="2h",
    priority=120,
    enforce_idle_timeout=True,
    idle_timeout_duration="15m",
    enable_eviction=True,
    requeue_on_failure=True
)
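Since update_gpr_template lists the same mandatory parameters as creation, changing one field means supplying the rest again. One way to do that is to keep the current values and overlay the changes. The sketch below is plain Python with a hypothetical merge_for_update helper. The starting dictionary is written by hand here, not fetched from the API, and its keys follow the parameter names in the Syntax line above.

```python
MANDATORY = (
    "name", "cluster_name", "gpu_per_node_count", "num_gpu_nodes",
    "memory_per_gpu", "gpu_shape", "instance_type", "exit_duration", "priority",
)


def merge_for_update(current: dict, **changes) -> dict:
    """Overlay changes on current values and check mandatory fields are present."""
    merged = {**current, **changes}
    missing = [key for key in MANDATORY if key not in merged]
    if missing:
        raise ValueError(f"missing mandatory fields: {', '.join(missing)}")
    return merged


current = {
    "name": "my-template", "cluster_name": "worker-cluster",
    "gpu_per_node_count": 1, "num_gpu_nodes": 2, "memory_per_gpu": 40,
    "gpu_shape": "A100", "instance_type": "a2-highgpu-2g",
    "exit_duration": "1h", "priority": 100,
}
kwargs = merge_for_update(current, exit_duration="2h", priority=120)
print(kwargs["exit_duration"], kwargs["priority"])  # 2h 120
# With the SDK installed and a session configured, the call would be:
# update_gpr_template(**kwargs)
```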
Delete a GPR Template
Use this API to delete a GPR template by specifying its name.
Syntax
egs.delete_gpr_template (gpr_template_name, authenticated_session)
Parameters
Parameter | Type | Description | Required |
---|---|---|---|
gpr_template_name | String | The name of the template that you want to delete. | Mandatory |
authenticated_session | [AuthenticatedSession] | The authenticated session with the EGS Controller. The default value is None. If no authenticated session is set, the SDK tries to use the SDK default. If no SDK default is found, an exception is raised. | Optional |
Response Returned
Returns | Description |
---|---|
Void | There is no response object. |
Exceptions Raised
Raises | Description |
---|---|
UnhandledException | This exception is raised if the API call fails. |
Example
from egs.gpr_template import delete_gpr_template
delete_gpr_template("my-template")
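Deletion returns nothing, so a failure is visible only as an exception. The sketch below wraps a delete call and reports the outcome as a boolean. safe_delete is a hypothetical helper, and the delete function is passed in as a parameter so that the snippet runs without the SDK. It catches the broad Exception type because the import path of UnhandledException is not shown in this topic.

```python
from typing import Callable


def safe_delete(delete_fn: Callable[[str], None], template_name: str) -> bool:
    """Call delete_fn(template_name); return False instead of raising on failure."""
    try:
        delete_fn(template_name)
    except Exception as err:  # narrow this to the SDK's exception type in real code
        print(f"could not delete {template_name!r}: {err}")
        return False
    return True


# Stand-in for delete_gpr_template that fails for unknown names.
def fake_delete(name: str) -> None:
    if name != "my-template":
        raise RuntimeError("template not found")


print(safe_delete(fake_delete, "my-template"))  # True
print(safe_delete(fake_delete, "other"))        # logs the error, then False
```

With the SDK installed, delete_gpr_template could be passed as delete_fn in place of the stand-in.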