Core Specialization Plugin Programmer Guide

Overview

This document describes the Slurm core specialization plugins and the API that defines them. It is intended as a resource for programmers who wish to write their own Slurm core specialization plugin. This is version 100 of the API.

Slurm core specialization plugins must conform to the Slurm Plugin API with the following specifications:

const char plugin_name[]="full text name"

A free-formatted ASCII text string that identifies the plugin.

const char plugin_type[]="major/minor"

The major type must be "core_spec". The minor type can be any suitable name for the type of core specialization package. The following core specialization plugins are included in the Slurm distribution:

  • cray_aries — Use Cray XC APIs to enforce core specialization.
  • none — Can be configured to log calls to its functions, but otherwise does nothing.

Slurm can be configured to use multiple core specialization plugins if desired.

const uint32_t plugin_version

If specified, identifies the version of Slurm used to build this plugin; any attempt to load the plugin from a different version of Slurm will result in an error. If not specified, the plugin may be loaded by Slurm commands and daemons from any version. However, this may result in difficult-to-diagnose failures due to changes in the arguments to plugin functions or changes in other Slurm functions used by the plugin.
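
The three identification symbols above might be declared as in the following minimal sketch. The plugin name, the minor type "example", and the helper has_core_spec_major() are hypothetical and exist only for illustration. A real plugin would normally take SLURM_VERSION_NUMBER from the Slurm headers; the placeholder definition here is an assumption that keeps the sketch compilable on its own.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Placeholder only: a real build is assumed to get this macro from the
 * Slurm headers. The value 0 is not a real Slurm version. */
#ifndef SLURM_VERSION_NUMBER
#define SLURM_VERSION_NUMBER 0
#endif

/* Free-format text that identifies the plugin (hypothetical name). */
const char plugin_name[] = "Example core specialization plugin";

/* Major type must be "core_spec"; the minor type "example" is made up. */
const char plugin_type[] = "core_spec/example";

/* Ties the plugin to the Slurm version it was built against. */
const uint32_t plugin_version = SLURM_VERSION_NUMBER;

/* Hypothetical helper (not part of the Slurm API): checks that a type
 * string carries the "core_spec" major type. */
bool has_core_spec_major(const char *type)
{
	return strncmp(type, "core_spec/", 10) == 0;
}
```

With these declarations, has_core_spec_major(plugin_type) returns true, while a string such as "proctrack/cgroup" is rejected.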

NOTE: The core_spec_p_set, core_spec_p_clear, core_spec_p_suspend, and core_spec_p_resume functions described below all accept as an argument the job step's container ID (as set by the proctrack plugin). Each job step will have a different container ID. Because a job may execute multiple job steps sequentially and/or in parallel, these functions will be called once for each job step on each compute node.

API Functions

All of the following functions are required. Functions which are not implemented must be stubbed.

int init (void)

Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

void fini (void)

Description:
Called when the plugin is removed. Clear any allocated storage here.

Returns: None.

Note: These init and fini functions are not the same as those described in the dlopen(3) manual page. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
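
As an illustration of the init/fini pairing, the sketch below allocates one plugin-global buffer in init() and releases it in fini(). The buffer and its size are hypothetical. SLURM_SUCCESS and SLURM_ERROR are defined locally as stand-ins (0 and -1) so the sketch compiles without the Slurm headers; a real plugin would include those headers instead.

```c
#include <stdlib.h>

/* Local stand-ins for the Slurm return codes (assumption: a real
 * plugin gets these from the Slurm headers). */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

#define SCRATCH_SIZE 256	/* hypothetical buffer size */

/* Hypothetical plugin-global storage. */
static char *scratch = NULL;

/* Called once when the plugin is loaded. */
int init(void)
{
	scratch = malloc(SCRATCH_SIZE);
	if (!scratch)
		return SLURM_ERROR;
	scratch[0] = '\0';
	return SLURM_SUCCESS;
}

/* Called once when the plugin is removed. */
void fini(void)
{
	free(scratch);		/* free(NULL) is a harmless no-op */
	scratch = NULL;
}
```

Resetting the pointer in fini() keeps a later init() call well defined if the plugin is loaded again in the same process.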

int core_spec_p_set(uint64_t cont_id, uint16_t core_count)

Description:
This function is called by the slurmstepd daemon after the job step's tasks have been forked and exec'ed, and immediately before they are released from a held state. Each job step will have a different container ID. Because a job may execute multiple job steps sequentially and/or in parallel, this function will be called once for each job step on each compute node.

Arguments:
cont_id (input) the job step's container ID as set by the proctrack plugin.
core_count (input) number of specialized cores to be reserved for the job.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
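
A stub in the spirit of the "none" plugin, which only logs the call, could look like the sketch below. Writing to stderr with fprintf() is an assumption made so the example has no Slurm dependencies; the message format is also made up, and the return codes are local stand-ins for the Slurm definitions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the Slurm return codes (assumption). */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

/* Log the request and report success without reserving anything. */
int core_spec_p_set(uint64_t cont_id, uint16_t core_count)
{
	fprintf(stderr, "core_spec_p_set: cont_id=%" PRIu64 " core_count=%u\n",
		cont_id, (unsigned int) core_count);
	return SLURM_SUCCESS;
}
```

The PRIu64 macro from <inttypes.h> keeps the format string correct for a 64-bit container ID on every platform.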

int core_spec_p_clear(uint64_t cont_id)

Description:
This function is called by the slurmstepd daemon after the job step's tasks have all exited. Each job step will have a different container ID. Because a job may execute multiple job steps sequentially and/or in parallel, this function will be called once for each job step on each compute node.

Arguments:
cont_id (input) the job step's container ID as set by the proctrack plugin.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
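
Since core_spec_p_set and core_spec_p_clear are each called once per job step, keyed by container ID, the pairing can be modeled with a small lookup table. The sketch below is a toy in-memory model only: the table, its size, and the choice to return SLURM_ERROR for an unknown container are assumptions, and a real plugin would act on the container through system APIs rather than bookkeeping of this kind. The return codes are local stand-ins for the Slurm definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the Slurm return codes (assumption). */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

#define MAX_STEPS 16		/* hypothetical table size */

/* One record per job step container (hypothetical structure). */
struct step_rec {
	bool     used;
	uint64_t cont_id;
	uint16_t core_count;
};

static struct step_rec steps[MAX_STEPS];

/* Record the reservation in the first free slot. */
int core_spec_p_set(uint64_t cont_id, uint16_t core_count)
{
	for (int i = 0; i < MAX_STEPS; i++) {
		if (!steps[i].used) {
			steps[i] = (struct step_rec){ true, cont_id, core_count };
			return SLURM_SUCCESS;
		}
	}
	return SLURM_ERROR;	/* table full */
}

/* Drop the record for a container whose tasks have all exited. */
int core_spec_p_clear(uint64_t cont_id)
{
	for (int i = 0; i < MAX_STEPS; i++) {
		if (steps[i].used && steps[i].cont_id == cont_id) {
			steps[i].used = false;
			return SLURM_SUCCESS;
		}
	}
	return SLURM_ERROR;	/* unknown container ID */
}
```

In this model, clearing the same container ID twice fails the second time, which makes an unbalanced set/clear sequence visible.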

int core_spec_p_suspend(uint64_t cont_id, uint16_t core_count)

Description:
This function is called by the slurmstepd daemon immediately after the job step's tasks have all been sent a SIGSTOP signal. Each job step will have a different container ID. Because a job may execute multiple job steps sequentially and/or in parallel, this function will be called once for each job step on each compute node.

Arguments:
cont_id (input) the job step's container ID as set by the proctrack plugin.
core_count (input) number of specialized cores to be reserved for the job.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

int core_spec_p_resume(uint64_t cont_id, uint16_t core_count)

Description:
This function is called by the slurmstepd daemon immediately before the job step's tasks are all sent a SIGCONT signal. Each job step will have a different container ID. Because a job may execute multiple job steps sequentially and/or in parallel, this function will be called once for each job step on each compute node.

Arguments:
cont_id (input) the job step's container ID as set by the proctrack plugin.
core_count (input) number of specialized cores to be reserved for the job.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
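
The suspend and resume calls bracket the period between SIGSTOP and SIGCONT, so a plugin can treat them as a matched pair. The following sketch tracks that state with a single flag. The flag and the decision to reject an unmatched call with SLURM_ERROR are assumptions made for illustration, not documented Slurm behavior; the return codes are local stand-ins for the Slurm definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the Slurm return codes (assumption). */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

/* Hypothetical state: true between a suspend and the matching resume. */
static bool step_suspended = false;

/* Called right after the step's tasks have been sent SIGSTOP. */
int core_spec_p_suspend(uint64_t cont_id, uint16_t core_count)
{
	(void) cont_id;		/* unused in this sketch */
	(void) core_count;	/* unused in this sketch */
	if (step_suspended)
		return SLURM_ERROR;	/* already suspended */
	step_suspended = true;
	return SLURM_SUCCESS;
}

/* Called right before the step's tasks are sent SIGCONT. */
int core_spec_p_resume(uint64_t cont_id, uint16_t core_count)
{
	(void) cont_id;		/* unused in this sketch */
	(void) core_count;	/* unused in this sketch */
	if (!step_suspended)
		return SLURM_ERROR;	/* nothing to resume */
	step_suspended = false;
	return SLURM_SUCCESS;
}
```

A plugin that does not need this check can simply return SLURM_SUCCESS from both functions, which satisfies the requirement that unimplemented functions be stubbed.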

Last modified 27 March 2015