Slurm Generic Resource (GRES) Plugin API

Overview

This document describes Slurm generic resource (GRES) plugins and the API that defines them. It is intended as a resource to programmers wishing to write their own Slurm GRES plugins.

Slurm GRES plugins must conform to the Slurm Plugin API with the following specifications:

const char *plugin_type="major/minor"

major must be gres. minor can be any suitable name representing the GRES type of the plugin.

const char *plugin_name

Some descriptive name for the plugin. There is no requirement with respect to its format.

const uint32_t plugin_version

If specified, identifies the version of Slurm used to build this plugin. Any attempt to load the plugin from a different version of Slurm will result in an error. If not specified, the plugin may be loaded by Slurm commands and daemons from any version; however, this may result in difficult-to-diagnose failures due to changes in the arguments to plugin functions or changes in other Slurm functions used by the plugin.
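As a rough illustration, the three identification symbols described above might be declared as follows. The minor name "example", the DEMO_BUILD_VERSION placeholder and the helper demo_is_gres_plugin() are hypothetical and exist only so the fragment compiles on its own; a real plugin would take its version value from the Slurm build it is compiled against.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder (assumption): stands in for the version value that a
 * real plugin would obtain from the Slurm build. */
#define DEMO_BUILD_VERSION 0

/* "example" is a hypothetical minor name; major must be "gres". */
const char *plugin_type = "gres/example";
const char *plugin_name = "Example GRES plugin";
const uint32_t plugin_version = DEMO_BUILD_VERSION;

/* Hypothetical helper: checks that the major part of a type is "gres". */
int demo_is_gres_plugin(const char *type)
{
	return type != NULL && strncmp(type, "gres/", 5) == 0;
}
```

The helper is not part of the plugin API; it only makes the major/minor naming rule concrete.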

We include samples in the Slurm distribution for:

  • gpu — Manage GPUs (Graphics Processing Units).
  • mps — Manage MPS (CUDA Multi-Process Service).

API Functions

All of the following functions are required. Functions which are not implemented must be stubbed.

int init(void)

Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.

Arguments: None.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

void fini(void)

Description:
Called when the plugin is removed. Clear any allocated storage here.

Arguments: None.

Returns: None.

Note: init() and fini() are not the same as the _init() and _fini() functions described in dlopen(3). The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
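A minimal sketch of a matching init()/fini() pair is shown below. The return-code macros are local stand-ins so the fragment builds without the Slurm headers, and the demo_dev_table state is hypothetical; a real plugin would allocate whatever global state it actually needs.

```c
#include <stdlib.h>

/* Stand-ins (assumption) for the Slurm return codes, so that the
 * sketch is self-contained. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1
#endif

/* Hypothetical plugin-global state: a small table of per-device data. */
static int *demo_dev_table = NULL;
static const size_t demo_dev_slots = 8;

/* Called once when the plugin is loaded: allocate global state. */
int init(void)
{
	if (demo_dev_table)
		return SLURM_SUCCESS;	/* already initialized */
	demo_dev_table = calloc(demo_dev_slots, sizeof(*demo_dev_table));
	return demo_dev_table ? SLURM_SUCCESS : SLURM_ERROR;
}

/* Called when the plugin is removed: release what init() allocated. */
void fini(void)
{
	free(demo_dev_table);
	demo_dev_table = NULL;
}
```

Resetting the pointer in fini() keeps a repeated fini() call harmless and lets init() be called again after an unload.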

int node_config_load(List gres_conf_list, node_config_load_t *config)

Description:
This function is called by the slurmd daemon after the slurm.conf and gres.conf files have been read. It can be used to validate or infer the system configuration by testing the actual hardware resources available, or simply to confirm that an entry for the resource was included in the gres.conf file.

Arguments:
gres_conf_list (input/output) a list of configuration records generated by reading the slurm.conf and gres.conf files
config (input) Additional data. Contains fields cpu_cnt and xcpuinfo_mac_to_abs.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
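The kind of validation described for node_config_load() could look roughly like the sketch below. It deliberately does not use the real List or node_config_load_t types: demo_gres_conf_t, demo_node_config_load() and the DEMO_ return codes are simplified, hypothetical stand-ins so that the example compiles without the Slurm source tree. It confirms that at least one record was configured and that every device file named in a record exists.

```c
#include <stddef.h>
#include <unistd.h>

#define DEMO_SUCCESS 0
#define DEMO_ERROR  -1

/* Hypothetical, simplified stand-in for one gres.conf record. */
typedef struct {
	const char *name;	/* GRES name */
	const char *file;	/* device file path, or NULL if none given */
} demo_gres_conf_t;

/* Confirm that at least one record exists and that every listed
 * device file is present on this node. */
int demo_node_config_load(const demo_gres_conf_t *recs, size_t rec_cnt)
{
	if (recs == NULL || rec_cnt == 0)
		return DEMO_ERROR;	/* no entry for this resource */

	for (size_t i = 0; i < rec_cnt; i++) {
		if (recs[i].file && access(recs[i].file, F_OK) != 0)
			return DEMO_ERROR;	/* configured device is missing */
	}
	return DEMO_SUCCESS;
}
```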

void job_set_env(char ***job_env_ptr, void *gres_ptr, int node_inx)

Description:
This function is called by the slurmd daemon after getting a job credential and can be used to set environment variables for the job based upon GRES state information in that credential.

Arguments:
job_env_ptr (input/output) pointer to the job's environment variable structure.
gres_ptr (input) pointer to the job's GRES allocation information.
node_inx (input) zero origin node index, used to interpret node-specific GRES data.

Returns: None.

void step_set_env(char ***job_env_ptr, void *gres_ptr, uint32_t flags)

Description:
This function is called by the slurmd daemon after getting a job step credential and can be used to set environment variables for the job step based upon GRES state information in that credential.

Arguments:
job_env_ptr (input/output) pointer to the job step's environment variable structure.
gres_ptr (input) pointer to the step's GRES allocation information.
flags (input) Various flags to alter behavior. Currently, only a verbose flag is used to print verbose GRES binding information to stderr for each task.

Returns: None.
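Both job_set_env() and step_set_env() receive the environment as a char *** so that the array can be grown in place. The sketch below shows that pattern in plain C under stated assumptions: demo_env_append() and demo_job_set_env() are hypothetical names, the allocation is reduced to a simple bit mask instead of the real GRES state structure, and the variable name CUDA_VISIBLE_DEVICES is used only as a familiar example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append "name=value" to a NULL-terminated,
 * heap-allocated environment array, growing it as needed.
 * Returns 0 on success, -1 on allocation failure. */
static int demo_env_append(char ***env_ptr, const char *name,
			   const char *value)
{
	size_t cnt = 0;
	char **env = *env_ptr;

	while (env && env[cnt])
		cnt++;

	size_t len = strlen(name) + strlen(value) + 2;	/* '=' and '\0' */
	char *entry = malloc(len);
	if (!entry)
		return -1;
	snprintf(entry, len, "%s=%s", name, value);

	char **grown = realloc(env, (cnt + 2) * sizeof(*grown));
	if (!grown) {
		free(entry);
		return -1;
	}
	grown[cnt] = entry;
	grown[cnt + 1] = NULL;
	*env_ptr = grown;	/* the array may have moved */
	return 0;
}

/* Hypothetical stand-in for the allocation data: bit i of dev_mask
 * set means that device i is allocated to the job. */
void demo_job_set_env(char ***job_env_ptr, unsigned int dev_mask)
{
	char list[128] = "";
	size_t used = 0;

	for (unsigned int i = 0; i < 32 && used < sizeof(list) - 4; i++) {
		if (!(dev_mask & (1u << i)))
			continue;
		used += (size_t) snprintf(list + used, sizeof(list) - used,
					  "%s%u", used ? "," : "", i);
	}
	(void) demo_env_append(job_env_ptr, "CUDA_VISIBLE_DEVICES", list);
}
```

Because the helper writes the possibly relocated array back through env_ptr, the caller's pointer stays valid after the call, which is the reason the argument is passed by reference.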

void send_stepd(int fd)

Description:
This function is called by the slurmd daemon to send any needed information to the slurmstepd step shepherd.

Arguments:
fd (input) file descriptor to write information to.

Returns: None.

void recv_stepd(int fd)

Description:
This function is called by the slurmstepd step shepherd to read any needed information from the slurmd daemon.

Arguments:
fd (input) file descriptor to read information from.

Returns: None.
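send_stepd() and recv_stepd() form a matched pair: whatever one side writes to the descriptor, the other must read back in the same order and size. The following sketch illustrates this with a single hypothetical value, demo_dev_count. All demo_ names are assumptions made for the example, and it uses plain read(2) and write(2) rather than any Slurm helper routines.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical plugin state that slurmstepd needs a copy of. */
static uint32_t demo_dev_count = 0;

void demo_set_dev_count(uint32_t n) { demo_dev_count = n; }
uint32_t demo_get_dev_count(void) { return demo_dev_count; }

/* Write exactly len bytes, continuing after short writes. */
static int demo_write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t) n;
	}
	return 0;
}

/* Read exactly len bytes, continuing after short reads. */
static int demo_read_all(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);
		if (n <= 0)
			return -1;	/* error or unexpected end of stream */
		p += n;
		len -= (size_t) n;
	}
	return 0;
}

/* slurmd side: send the plugin state to the step shepherd. */
void demo_send_stepd(int fd)
{
	(void) demo_write_all(fd, &demo_dev_count, sizeof(demo_dev_count));
}

/* slurmstepd side: read the state in the same order it was sent. */
void demo_recv_stepd(int fd)
{
	uint32_t n;

	if (demo_read_all(fd, &n, sizeof(n)) == 0)
		demo_dev_count = n;
}
```

Both processes run on the same node, so the sketch sends the value in host byte order. For brevity it treats an interrupted call as a failure instead of retrying.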

int job_info(gres_job_state_t *job_gres_data, uint32_t node_inx, enum gres_job_data_type data_type, void *data)

Description:
This function is used to extract plugin-specific data from the job's GRES data structure. Note that enum gres_job_data_type values GRES_JOB_DATA_COUNT and GRES_JOB_DATA_BITMAP are processed in common code rather than within the plugin and return data types of uint32_t* and bitstr_t**, respectively.

Arguments:
job_gres_data (input) Information about the job's GRES resources.
node_inx (input) Zero origin index within the job's resource allocation for which data is desired.
data_type (input) Type of information to be gathered from the data structure.
data (output) Pointer to data within job_gres_data. No data is copied or needs to be freed. Data type depends upon the value of data_type.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

int step_info(gres_step_state_t *step_gres_data, uint32_t node_inx, enum gres_step_data_type data_type, void *data)

Description:
This function is used to extract plugin-specific data from the step's GRES data structure. Note that enum gres_step_data_type values GRES_STEP_DATA_COUNT and GRES_STEP_DATA_BITMAP are processed in common code rather than within the plugin and return data types of uint32_t* and bitstr_t**, respectively.

Arguments:
step_gres_data (input) Information about the step's GRES resources.
node_inx (input) Zero origin index within the job's resource allocation for which data is desired.
data_type (input) Type of information to be gathered from the data structure.
data (output) Pointer to data within step_gres_data. No data is copied or needs to be freed. Data type depends upon the value of data_type.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
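The untyped data argument of job_info() and step_info() follows a common C convention: the caller passes the address of a variable whose type is implied by data_type, and the function stores a pointer or value there without copying the underlying data. A minimal sketch of that convention is given below. The structure, the enum and the DEMO_ return codes are hypothetical stand-ins and do not reflect the real gres_job_state_t layout.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SUCCESS 0
#define DEMO_ERROR  -1

/* Hypothetical plugin-specific data type. */
enum demo_job_data_type {
	DEMO_JOB_DATA_LABEL	/* data is a (const char **) */
};

/* Hypothetical, simplified stand-in for the job's GRES state. */
typedef struct {
	uint32_t node_cnt;		/* nodes in the allocation */
	const char **label_per_node;	/* one label per node */
} demo_job_state_t;

int demo_job_info(demo_job_state_t *job_gres_data, uint32_t node_inx,
		  enum demo_job_data_type data_type, void *data)
{
	if (!job_gres_data || !data || node_inx >= job_gres_data->node_cnt)
		return DEMO_ERROR;

	switch (data_type) {
	case DEMO_JOB_DATA_LABEL:
		/* Hand back a pointer into the existing structure:
		 * nothing is copied and the caller must not free it. */
		*(const char **) data =
			job_gres_data->label_per_node[node_inx];
		return DEMO_SUCCESS;
	default:
		return DEMO_ERROR;	/* unknown data type */
	}
}
```

A caller would declare a const char * variable and pass its address as data; on success the variable points into the job state and remains valid only as long as that state does.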

List get_devices(void)

Description:
This function returns the list of GRES devices.

Arguments: None.

Returns:
A List of GRES device records of type gres_slurmd_conf_t.

void step_hardware_init(bitstr_t *usable_gres, char *settings)

Description:
Configure device hardware corresponding to all the GRES devices of the plugin type. The slurmstepd calls this function while privileged and before tasks are forked and executed. The gres/gpu plugin sets GPU frequencies here.

Arguments:
usable_gres (input) A bit string specifying all GRES devices of the plugin type allocated to the step.
settings (input) A string containing device hardware settings to be set for all specified hardware devices.

Returns: None.

void step_hardware_fini(void)

Description:
Perform hardware configuration after the step has finished. This is meant to allow Slurm to undo hardware configuration changes performed by step_hardware_init(). The slurmstepd calls this function while privileged and after tasks complete. The gres/gpu plugin resets GPU frequencies to high here.

Arguments: None.

Returns: None.

gres_epilog_info_t *epilog_build_env(gres_job_state_t *gres_job_ptr)

Description:
Given a job's GRES allocation data, translate it into the data required by epilog_set_env() to set environment variables for the Prolog and Epilog programs.

Arguments:
gres_job_ptr (input) job's GRES allocation data.

Returns: Data structure containing the information required by epilog_set_env() to set environment variables for the Prolog and Epilog programs.

void epilog_set_env(char ***epilog_env_ptr, gres_epilog_info_t *epilog_info, int node_inx)

Description:
Set GRES specific environment variables for the Prolog and Epilog programs.

Arguments:
epilog_env_ptr (input/output) environment variables set for the Prolog and Epilog programs. This array may be reallocated as needed to contain additional environment variables.
epilog_info (input) GRES specific job allocation information. Built by epilog_build_env().
node_inx (input) zero-origin index of this node in the job's allocation. Needed to identify the resources on a specific node allocated to this job.

Returns: None.

Last modified 30 October 2020