Job Container Plugin API

Overview

This document describes Slurm job container plugins and the API that defines them. It is intended as a resource for programmers wishing to write their own Slurm job container plugins. Note that the job container plugin is designed for use with Slurm jobs, and it also applies to the sbcast server process on compute nodes. A separate proctrack plugin is designed for use with Slurm job steps.

Slurm job container plugins are Slurm plugins that implement the Slurm job container API described herein. They must conform to the Slurm Plugin API with the following specifications:

const char plugin_type[]
The major type must be "job_container". The minor type can be any recognizable abbreviation for the type of job container. We recommend, for example:

  • cncu: Designed for use on Cray systems only. Interfaces with Compute Node Clean Up (CNCU), the Cray infrastructure.
  • none: Designed for all other systems.

const char plugin_name[]
Some descriptive name for the plugin. There is no requirement with respect to its format.

const uint32_t plugin_version
If specified, identifies the version of Slurm used to build this plugin, and any attempt to load the plugin from a different version of Slurm will result in an error. If not specified, the plugin may be loaded by Slurm commands and daemons from any version. However, this may result in difficult-to-diagnose failures due to changes in the arguments to plugin functions or changes in other Slurm functions used by the plugin.
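As an illustration of the three symbols above, the following is a minimal sketch of how a plugin source file might declare them. The minor type "example" and the descriptive name are hypothetical. SLURM_VERSION_NUMBER is normally supplied by the Slurm headers; it is given a placeholder value here only so that the fragment compiles on its own.

```c
#include <stdint.h>

/* Assumption: a real build obtains this macro from the Slurm headers.
 * The placeholder value below exists only so the sketch is standalone. */
#ifndef SLURM_VERSION_NUMBER
#define SLURM_VERSION_NUMBER 0
#endif

/* Free-form descriptive name (hypothetical). */
const char plugin_name[] = "Example job container plugin";

/* Major type "job_container", hypothetical minor type "example". */
const char plugin_type[] = "job_container/example";

/* Ties the plugin to the Slurm version it was built against. */
const uint32_t plugin_version = SLURM_VERSION_NUMBER;
```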

The programmer is urged to study src/plugins/job_container/cncu/job_container_cncu.c for an example implementation of a Slurm job container plugin.

Data Objects

The implementation must support a container ID of type uint64_t. This container ID is generated by the proctrack plugin.

The implementation must maintain (though not necessarily directly export) an enumerated errno to allow Slurm to discover, as far as is practical, the reason for any failed API call. These values must not be used as return values in integer-valued functions in the API. The proper error return value from integer-valued functions is SLURM_ERROR. The implementation should endeavor to provide useful and pertinent information by whatever means is practical. Successful API calls are not required to reset errno to a known value.
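The error convention above can be sketched as follows. This is an illustration under stated assumptions, not code from Slurm: the helper example_validate_job_id() is hypothetical, as is the rule that a job ID of zero is invalid, and the standard C errno stands in for the plugin's error value. SLURM_SUCCESS and SLURM_ERROR are defined locally only so the fragment compiles without the Slurm headers.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles without the Slurm headers. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#endif
#ifndef SLURM_ERROR
#define SLURM_ERROR (-1)
#endif

/* Hypothetical helper: the reason for failure goes into errno,
 * while the return value is only ever SLURM_SUCCESS or SLURM_ERROR. */
int example_validate_job_id(uint32_t job_id)
{
    if (job_id == 0) {          /* assumed to be invalid for this sketch */
        errno = EINVAL;         /* the reason for the failure */
        return SLURM_ERROR;     /* never return the errno value itself */
    }
    /* Success: errno is deliberately left untouched. */
    return SLURM_SUCCESS;
}
```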

API Functions

The following functions must appear. Functions which are not implemented should be stubbed.

int init (void)

Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

void fini (void)

Description:
Called when the plugin is removed. Clear any allocated storage here.

Returns: None.

Note: These init and fini functions are not the same as the _init() and _fini() functions described in the dlopen(3) manual page. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
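A minimal sketch of the init and fini pair, using the signatures given above, might look like this. The example_initialized flag is a hypothetical piece of plugin state used only to show where global setup and teardown would go. SLURM_SUCCESS is defined locally so the fragment compiles without the Slurm headers.

```c
/* Local stand-in so the sketch compiles without the Slurm headers. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#endif

/* Hypothetical plugin-wide state. */
static int example_initialized = 0;

/* Called once when the plugin is loaded: global setup goes here. */
int init(void)
{
    example_initialized = 1;
    return SLURM_SUCCESS;
}

/* Called when the plugin is removed: release anything init() set up. */
void fini(void)
{
    example_initialized = 0;
}
```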

int container_p_create (uint32_t job_id);

Description: Create a container. The caller should ensure that a matching container_p_delete() is called once the container is no longer needed.

Arguments: job_id    (input) Job ID.

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.

int container_p_add_cont (uint32_t job_id, uint64_t cont_id);

Description: Add a specific process tracking container (PAGG) to a given job's container.

Arguments:
job_id    (input) Job ID.
cont_id    (input) Process tracking container value as set by the proctrack plugin.

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
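One way to picture container_p_add_cont() is as recording an association between a job ID and a process tracking container ID. The sketch below keeps those pairs in a small fixed-size table. The table, its size limit, and the choice of ENOSPC are all assumptions made for illustration. A real plugin would typically hand the pair to the underlying system facility instead.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles without the Slurm headers. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#endif
#ifndef SLURM_ERROR
#define SLURM_ERROR (-1)
#endif

#define EXAMPLE_MAX_LINKS 128   /* hypothetical capacity */

/* Hypothetical record tying a proctrack container to a job. */
struct example_link {
    uint32_t job_id;
    uint64_t cont_id;
};

static struct example_link example_links[EXAMPLE_MAX_LINKS];
static int example_link_cnt = 0;

int container_p_add_cont(uint32_t job_id, uint64_t cont_id)
{
    /* Adding the same pair twice is treated as a no-op. */
    for (int i = 0; i < example_link_cnt; i++) {
        if (example_links[i].job_id == job_id &&
            example_links[i].cont_id == cont_id)
            return SLURM_SUCCESS;
    }
    if (example_link_cnt >= EXAMPLE_MAX_LINKS) {
        errno = ENOSPC;         /* table full */
        return SLURM_ERROR;
    }
    example_links[example_link_cnt].job_id = job_id;
    example_links[example_link_cnt].cont_id = cont_id;
    example_link_cnt++;
    return SLURM_SUCCESS;
}
```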

int container_p_delete (uint32_t job_id);

Description: Destroy or otherwise invalidate a job container. This does not imply the container is empty, just that it is no longer needed.

Arguments: job_id    (input) Job ID.

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
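The pairing of container_p_create() and container_p_delete() can be sketched with a simple in-memory table of job IDs. Everything beyond the two function signatures is an assumption for illustration: the table, its capacity, the idempotent create, and the errno values chosen. As the description above notes, delete only invalidates the record and does not check that the container is empty.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles without the Slurm headers. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#endif
#ifndef SLURM_ERROR
#define SLURM_ERROR (-1)
#endif

#define EXAMPLE_MAX_JOBS 64     /* hypothetical capacity */

static uint32_t example_jobs[EXAMPLE_MAX_JOBS];
static int example_used[EXAMPLE_MAX_JOBS];

/* Return the slot holding job_id, or -1 if there is none. */
static int example_find(uint32_t job_id)
{
    for (int i = 0; i < EXAMPLE_MAX_JOBS; i++) {
        if (example_used[i] && example_jobs[i] == job_id)
            return i;
    }
    return -1;
}

int container_p_create(uint32_t job_id)
{
    if (example_find(job_id) >= 0)
        return SLURM_SUCCESS;   /* already exists: treated as success */
    for (int i = 0; i < EXAMPLE_MAX_JOBS; i++) {
        if (!example_used[i]) {
            example_used[i] = 1;
            example_jobs[i] = job_id;
            return SLURM_SUCCESS;
        }
    }
    errno = ENOSPC;             /* no free slot */
    return SLURM_ERROR;
}

int container_p_delete(uint32_t job_id)
{
    int i = example_find(job_id);
    if (i < 0) {
        errno = ENOENT;         /* no such container */
        return SLURM_ERROR;
    }
    example_used[i] = 0;        /* invalidate; emptiness is not checked */
    return SLURM_SUCCESS;
}
```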

int container_p_join(uint32_t job_id, uid_t uid);

Description: Add this process to a given job's container. The process is first placed into a process tracking container (PAGG).

Arguments:
job_id    (input) Job ID.
uid    (input) Owning user ID.

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.

int container_p_join_external(uint32_t job_id);

Description: Add this external process to a given job's container. The process is first placed into a process tracking container (PAGG).

Arguments:
job_id    (input) Job ID.

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set the errno to an appropriate value to indicate the reason for failure.
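Both join functions act on the calling process, so a plugin might route them through one shared helper. The sketch below assumes a hypothetical helper example_attach() that merely records the job ID and process ID. A real plugin would instead attach the process to the job's container through the underlying system facility. The uid argument is unused in this sketch.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Local stand-ins so the sketch compiles without the Slurm headers. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#endif
#ifndef SLURM_ERROR
#define SLURM_ERROR (-1)
#endif

/* Hypothetical record of the most recent attachment. */
static uint32_t example_last_job_id;
static pid_t example_last_pid;

/* Hypothetical helper standing in for the real attach operation. */
static int example_attach(uint32_t job_id, pid_t pid)
{
    if (pid <= 0) {
        errno = ESRCH;          /* not a usable process ID */
        return SLURM_ERROR;
    }
    example_last_job_id = job_id;
    example_last_pid = pid;
    return SLURM_SUCCESS;
}

int container_p_join(uint32_t job_id, uid_t uid)
{
    (void) uid;                 /* owner is not needed in this sketch */
    return example_attach(job_id, getpid());
}

int container_p_join_external(uint32_t job_id)
{
    return example_attach(job_id, getpid());
}
```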

void container_p_reconfig (void);

Description: Note a change in configuration, especially the value of DebugFlags with respect to JobContainer.
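A reconfigure handler usually just refreshes values the plugin has cached from the configuration. The sketch below assumes a hypothetical variable example_conf_debug_flags standing in for the configured DebugFlags, and a hypothetical bit EXAMPLE_DEBUG_FLAG_JOB_CONT standing in for the JobContainer flag. Neither name comes from Slurm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit standing in for the JobContainer debug flag. */
#define EXAMPLE_DEBUG_FLAG_JOB_CONT UINT64_C(0x1)

/* Hypothetical stand-in for the DebugFlags value in the configuration. */
uint64_t example_conf_debug_flags = 0;

/* Cached copy consulted by the plugin's logging paths. */
static bool example_debug = false;

/* Re-read the configuration and refresh the cached debug setting. */
void container_p_reconfig(void)
{
    example_debug =
        (example_conf_debug_flags & EXAMPLE_DEBUG_FLAG_JOB_CONT) != 0;
}
```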

Last modified 22 April 2019