Slurm Job Completion Logging Plugin API

Overview

This document describes Slurm job completion logging plugins and the API that defines them. It is intended as a resource to programmers wishing to write their own Slurm job completion logging plugins.

Slurm job completion logging plugins are Slurm plugins that implement the Slurm API for logging job information when a job completes. They can be used to log job information to a text file, a database, or another destination. The plugins must conform to the Slurm Plugin API with the following specifications:

const char plugin_type[]
The major type must be "jobcomp". The minor type can be any recognizable abbreviation for the type of job completion logging. We recommend, for example:

  • none — No job logging.
  • elasticsearch — Log job information to an Elasticsearch server.
  • filetxt — Log job information to a text file.
  • mysql — Job completion is written to a mysql database.
  • script — Execute a script passing in job information in environment variables.

const char plugin_name[]
Some descriptive name for the plugin. There is no requirement with respect to its format.

const uint32_t plugin_version
If specified, identifies the version of Slurm used to build this plugin. Any attempt to load the plugin from a different version of Slurm will then result in an error. If not specified, the plugin may be loaded by Slurm commands and daemons from any version. However, this may result in difficult-to-diagnose failures due to changes in the arguments to plugin functions or changes in other Slurm functions used by the plugin.

The programmer is urged to study src/plugins/jobcomp/filetxt/jobcomp_filetxt.c and src/plugins/jobcomp/none/jobcomp_none.c for sample implementations of a Slurm job completion logging plugin.
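As a rough illustration of the three identification symbols described above, a plugin source file might begin as follows. This is a minimal sketch, not taken from the Slurm sources: the minor type "example" and the plugin name are hypothetical, the "major/minor" spelling of plugin_type follows the convention used by the bundled plugins (for example "jobcomp/filetxt"), and the fallback definition of SLURM_VERSION_NUMBER is a placeholder that only exists so the fragment compiles without the Slurm headers.

```c
#include <stdint.h>

/*
 * Assumption: in a real build SLURM_VERSION_NUMBER is provided by the
 * Slurm headers. The fallback below is a placeholder so that this
 * sketch compiles on its own.
 */
#ifndef SLURM_VERSION_NUMBER
#define SLURM_VERSION_NUMBER 0x000000u
#endif

/* Descriptive name; there is no required format. */
const char plugin_name[] = "Job completion example logging plugin";

/* Major type "jobcomp", hypothetical minor type "example". */
const char plugin_type[] = "jobcomp/example";

/* Ties the plugin to the Slurm version it was built against. */
const uint32_t plugin_version = SLURM_VERSION_NUMBER;
```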

API Functions

The following functions must appear. Functions which are not implemented should be stubbed.

int init (void)

Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

void fini (void)

Description:
Called when the plugin is removed. Clear any allocated storage here.

Returns: None.

Note: These init and fini functions are not the same as those described in the dlopen(3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
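The init and fini pair described above could look like the following. This is a sketch under stated assumptions: the scratch buffer record_buf and its size are hypothetical plugin state invented for illustration, and the fallback SLURM_SUCCESS and SLURM_ERROR definitions stand in for the values normally supplied by the Slurm headers.

```c
#include <stdlib.h>

/* Assumption: stand-ins for the return codes defined by the Slurm headers. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1
#endif

/* Hypothetical plugin-global state: a scratch buffer for formatting records. */
#define RECORD_BUF_SIZE 4096
static char *record_buf = NULL;

/* Called once when the plugin is loaded, before any other plugin function. */
int init(void)
{
	record_buf = malloc(RECORD_BUF_SIZE);
	if (!record_buf)
		return SLURM_ERROR;
	record_buf[0] = '\0';
	return SLURM_SUCCESS;
}

/* Called when the plugin is removed; releases what init() allocated. */
void fini(void)
{
	free(record_buf);
	record_buf = NULL;
}
```

Resetting the pointer to NULL in fini() keeps a repeated call harmless, since free(NULL) is a no-op.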

int slurm_jobcomp_set_location(char *location);

Description: Specify the location to be used for job logging.

Argument: location    (input) specification of where logging should be done. The interpretation of this string is at the discretion of the plugin implementation.

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set errno to an appropriate value to indicate the reason for failure.
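One possible shape for this function is sketched below. It assumes the plugin simply keeps a private copy of the location string for later use; the log_location variable and the choice of EINVAL and ENOMEM as errno values are illustrative, not mandated by the API. The return-code definitions are stand-ins for those in the Slurm headers.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: stand-ins for the return codes defined by the Slurm headers. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1
#endif

/* Hypothetical plugin state: private copy of the configured location. */
static char *log_location = NULL;

int slurm_jobcomp_set_location(char *location)
{
	char *copy;

	/* Reject a missing or empty location and report why through errno. */
	if (!location || location[0] == '\0') {
		errno = EINVAL;
		return SLURM_ERROR;
	}

	copy = malloc(strlen(location) + 1);
	if (!copy) {
		errno = ENOMEM;
		return SLURM_ERROR;
	}
	strcpy(copy, location);

	/* Replace any previously configured location. */
	free(log_location);
	log_location = copy;
	return SLURM_SUCCESS;
}
```

How the string is interpreted (file path, host name, script path) remains up to the plugin; this sketch only stores it.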

int slurm_jobcomp_log_record(job_record_t *job_ptr);

Description: Record that a job is about to terminate or change size. The job's state includes the JOB_RESIZING flag if and only if the job is about to change size; otherwise the job is terminating. The job record's resize_time field is available if one wishes to record information about a job at each size (i.e., a history of the job as its size changes through time).

Argument:
job_ptr   (input) Pointer to job record as defined in src/slurmctld/slurmctld.h

Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR and set errno to an appropriate value to indicate the reason for failure.
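The distinction between a resize event and a termination can be sketched as follows. To keep the fragment self-contained, job_record_t here is a reduced stand-in carrying only a few fields; the real structure is the one defined in src/slurmctld/slurmctld.h and is far larger. The JOB_RESIZING value, the return codes, and the output format written to standard output are likewise assumptions for illustration, so a real plugin should rely on the Slurm headers and its own storage backend.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Assumption: stand-ins for definitions normally taken from Slurm headers. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1
#endif
#ifndef JOB_RESIZING
#define JOB_RESIZING 0x00002000
#endif

/* Reduced stand-in for the job record; the real type has many more fields. */
typedef struct {
	uint32_t job_id;
	uint32_t job_state;   /* base state plus flag bits such as JOB_RESIZING */
	time_t resize_time;   /* time of the last size change */
	time_t end_time;      /* time of termination */
} job_record_t;

int slurm_jobcomp_log_record(job_record_t *job_ptr)
{
	if (!job_ptr) {
		errno = EINVAL;
		return SLURM_ERROR;
	}

	if (job_ptr->job_state & JOB_RESIZING) {
		/* The job is changing size: log one record per size. */
		printf("JobId=%u Event=resize Time=%lld\n",
		       (unsigned int) job_ptr->job_id,
		       (long long) job_ptr->resize_time);
	} else {
		/* No resize flag: the job is terminating. */
		printf("JobId=%u Event=end Time=%lld\n",
		       (unsigned int) job_ptr->job_id,
		       (long long) job_ptr->end_time);
	}
	return SLURM_SUCCESS;
}
```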

List slurm_jobcomp_get_jobs(acct_job_cond_t *job_cond);

Description: Get completed job info from storage.

Arguments:
job_cond     (input) specification of filters to identify the jobs we wish information about (start time, end time, cluster name, user id, etc). acct_job_cond_t is defined in common/slurm_accounting_storage.h.

Returns: A list of job records or NULL on error. Elements on the list are of type jobcomp_job_rec_t, which is defined in common/slurm_jobcomp.h. Any returned list must be destroyed to avoid memory leaks.

void slurm_jobcomp_archive(List selected_parts, void *params)

Description: Archive old data.

Arguments:
List selected_parts (input) list of char * strings naming the partitions to query against.
void *params (input) to be cast as sacct_parameters_t in the plugin.

Returns: None
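Because every function in this API must be present, a plugin that only writes records and offers no retrieval or archiving would still need stubs for the last two functions. The sketch below shows one way such stubs might look. The opaque List and acct_job_cond_t declarations are stand-ins so the fragment compiles without the Slurm headers. Note that returning NULL from slurm_jobcomp_get_jobs is documented above as the error return, so callers of this stub would see no job data.

```c
#include <stddef.h>

/* Assumption: opaque stand-ins for types normally declared by Slurm headers. */
typedef struct xlist *List;
typedef struct acct_job_cond acct_job_cond_t;

/* Stub: this plugin keeps no queryable history, so there is no list to return. */
List slurm_jobcomp_get_jobs(acct_job_cond_t *job_cond)
{
	(void) job_cond;	/* unused */
	return NULL;
}

/* Stub: nothing is stored in a form that could be archived. */
void slurm_jobcomp_archive(List selected_parts, void *params)
{
	(void) selected_parts;	/* unused */
	(void) params;		/* unused */
}
```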

Last modified 23 October 2019