Slurm Energy Accounting Plugin API (AcctGatherEnergyType)

Overview

This document describes Slurm's energy accounting plugins and the API that defines them. It is intended as a resource to programmers wishing to write their own Slurm energy accounting plugins.

Slurm energy accounting plugins must conform to the Slurm Plugin API with the following specifications:

const char plugin_name[]="full text name"

A free-formatted ASCII text string that identifies the plugin.

const char plugin_type[]="major/minor"

The major type must be "acct_gather_energy". The minor type can be any suitable name for the type of energy accounting. The following minor types are currently in use:

  • none — No energy consumption data is provided.
  • ipmi — Gets energy consumption data from the BMC (Baseboard Management Controller) using the IPMI (Intelligent Platform Management Interface) tool.
  • pm_counters — Energy consumption data is collected from the Baseboard Management Controller (BMC) for HPE Cray systems.
  • rapl — Gets energy consumption data from hardware sensors on each core/socket, using RAPL (Running Average Power Limit) sensors. Note that enabling RAPL may require the execution of the command "sudo modprobe msr".
  • xcc — Gets energy consumption data from the Lenovo ThinkSystem SD650 XClarity Controller (XCC) using IPMI OEM raw commands.

const uint32_t plugin_version

If specified, identifies the version of Slurm used to build this plugin. Any attempt to load the plugin from a different version of Slurm will result in an error. If not specified, the plugin may be loaded by Slurm commands and daemons from any version; however, this may result in difficult-to-diagnose failures due to changes in the arguments to plugin functions or changes in other Slurm functions used by the plugin.
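The three identification symbols above might be declared as in the following minimal sketch. The minor type "example" and the plugin name are hypothetical. A real plugin takes SLURM_VERSION_NUMBER from the Slurm headers; the placeholder definition here exists only so the fragment compiles on its own.

```c
#include <stdint.h>

/*
 * Stand-in for the version macro that a real build obtains from the
 * Slurm headers. The value below is a placeholder, not a real version.
 */
#ifndef SLURM_VERSION_NUMBER
#define SLURM_VERSION_NUMBER 0x000000u
#endif

/* "example" is a hypothetical minor type used only for illustration. */
const char plugin_name[] = "Example energy accounting plugin";
const char plugin_type[] = "acct_gather_energy/example";

/* Ties the plugin to the Slurm version it was built against. */
const uint32_t plugin_version = SLURM_VERSION_NUMBER;
```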

The programmer is urged to study src/plugins/acct_gather_energy/rapl and src/common/slurm_acct_gather_energy.c for a sample implementation of a Slurm energy accounting plugin.

API Functions

All of the following functions are required. Functions which are not implemented must be stubbed.

int init (void)

Description:
Called when the plugin is loaded, before any other functions are called. Put global initialization here.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.

void fini (void)

Description:
Called when the plugin is removed. Clear any allocated storage here.

Returns: None.

Note: These init and fini functions are not the same as those described in the dlopen(3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
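As a rough illustration of the init/fini pairing, the sketch below allocates plugin-global storage in init() and releases it in fini(). The buffer, its size, and the local definitions of SLURM_SUCCESS and SLURM_ERROR are stand-ins for this example; a real plugin gets the return codes from the Slurm headers.

```c
#include <stdlib.h>

/* Stand-ins for the return codes defined in the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR (-1)

/* Hypothetical plugin-global state: a buffer of recent power samples. */
#define SAMPLE_CNT 16
static double *sample_buf = NULL;

int init(void)
{
	/* Tolerate repeated calls: keep the existing buffer if present. */
	if (sample_buf)
		return SLURM_SUCCESS;

	sample_buf = calloc(SAMPLE_CNT, sizeof(*sample_buf));
	return sample_buf ? SLURM_SUCCESS : SLURM_ERROR;
}

void fini(void)
{
	/* free(NULL) is a no-op, so calling fini() twice is harmless. */
	free(sample_buf);
	sample_buf = NULL;
}
```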

int acct_gather_energy_p_update_node_energy(void)

Description:
Updates energy accounting data for a node. Sets/updates the energy and power accounting values in the acct_gather_energy_t structure for the node on which it is called. Called by the slurmd daemon.

Arguments:
None

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
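One way such an update could work is sketched below: read a cumulative joule counter, add the difference since the previous poll to the node total, and derive average power over the interval. The struct is a simplified stand-in for acct_gather_energy_t, and read_sensor_joules() is a hypothetical sensor read that returns synthetic values; neither is taken from the Slurm sources.

```c
#include <stdint.h>
#include <time.h>

/* Stand-ins for the return codes defined in the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR (-1)

/* Simplified stand-in for acct_gather_energy_t (fields are assumptions). */
typedef struct {
	uint64_t consumed_energy; /* joules accumulated since the first poll */
	uint32_t current_watts;   /* average power over the last interval */
	time_t poll_time;         /* time of the last successful poll */
} acct_gather_energy_t;

static acct_gather_energy_t node_energy;
static uint64_t last_raw_joules;
static int have_baseline;

/* Hypothetical sensor read: +500 J and +10 s on every call. */
static int read_sensor_joules(uint64_t *joules, time_t *now)
{
	static uint64_t fake_joules = 1000;
	static time_t fake_time = 100;

	fake_joules += 500;
	fake_time += 10;
	*joules = fake_joules;
	*now = fake_time;
	return 0;
}

int acct_gather_energy_p_update_node_energy(void)
{
	uint64_t raw;
	time_t now;

	if (read_sensor_joules(&raw, &now) != 0)
		return SLURM_ERROR;

	/*
	 * The first poll only records a baseline. If the counter went
	 * backwards (wrap or reset), skip the interval and re-baseline.
	 */
	if (have_baseline && now > node_energy.poll_time &&
	    raw >= last_raw_joules) {
		uint64_t delta = raw - last_raw_joules;
		uint64_t secs = (uint64_t)(now - node_energy.poll_time);

		node_energy.consumed_energy += delta;
		node_energy.current_watts = (uint32_t)(delta / secs);
	}

	last_raw_joules = raw;
	node_energy.poll_time = now;
	have_baseline = 1;
	return SLURM_SUCCESS;
}
```

Recording only a baseline on the first poll avoids attributing energy consumed before the plugin started to the node's running total.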

int acct_gather_energy_p_get_data(enum acct_energy_type data_type, acct_gather_energy_t *energy)

Description:
Updates and returns energy consumption of a task, or returns current energy and power consumption of a node, according to specified data_type. Called by jobacct_gather plugin to update and return energy consumption of a task. Called by slurmd to return energy and power consumption of a node.

Arguments:
data_type (input) type of energy/power data to be returned.
energy (output) pointer to the acct_gather_energy_t struct in which energy/power data is to be returned.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
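The dispatch on data_type could look roughly like the sketch below. The two enum constants, the struct layout, and refresh_task_energy() are illustrative stand-ins chosen for this example; the actual list of data types and the real structure are defined in the Slurm headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Stand-ins for the return codes defined in the Slurm headers. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR (-1)

/* Illustrative subset of data types (names assumed for this sketch). */
enum acct_energy_type {
	ENERGY_DATA_JOULES_TASK, /* refresh, then report task energy */
	ENERGY_DATA_STRUCT       /* report cached node energy and power */
};

/* Simplified stand-in for acct_gather_energy_t. */
typedef struct {
	uint64_t consumed_energy; /* joules */
	uint32_t current_watts;   /* watts */
	time_t poll_time;         /* time of the last poll */
} acct_gather_energy_t;

/* Cached values, as a previous poll might have left them. */
static acct_gather_energy_t node_energy = { 1200, 85, 0 };

/* Hypothetical refresh: pretend 50 J were consumed since the last poll. */
static void refresh_task_energy(void)
{
	node_energy.consumed_energy += 50;
}

int acct_gather_energy_p_get_data(enum acct_energy_type data_type,
				  acct_gather_energy_t *energy)
{
	if (!energy)
		return SLURM_ERROR;

	switch (data_type) {
	case ENERGY_DATA_JOULES_TASK:
		/* Task accounting wants fresh numbers, so update first. */
		refresh_task_energy();
		*energy = node_energy;
		return SLURM_SUCCESS;
	case ENERGY_DATA_STRUCT:
		/* Node queries are answered from the cached values. */
		*energy = node_energy;
		return SLURM_SUCCESS;
	default:
		return SLURM_ERROR;
	}
}
```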

int acct_gather_energy_p_set_data(enum acct_energy_type data_type, acct_gather_energy_t *energy)

Description:
Sets the energy consumption data for a node. Not currently used.

Arguments:
data_type (input) type of energy/power data to be set.
energy (input) pointer to acct_gather_energy_t struct from which energy/power data is to be taken.

Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
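Because every API function must exist even when a plugin has nothing to do for it, an unimplemented function can be stubbed. The following is a minimal sketch of such a stub; the enum and struct are simplified stand-ins, and returning SLURM_SUCCESS from a no-op is an assumption of this example rather than a documented requirement.

```c
#include <stddef.h>

/* Stand-in for the return code defined in the Slurm headers. */
#define SLURM_SUCCESS 0

/* Simplified stand-ins so the stub compiles on its own. */
enum acct_energy_type { ENERGY_DATA_STRUCT };
typedef struct {
	unsigned long long consumed_energy;
} acct_gather_energy_t;

int acct_gather_energy_p_set_data(enum acct_energy_type data_type,
				  acct_gather_energy_t *energy)
{
	/* Nothing to set: mark the arguments as intentionally unused. */
	(void) data_type;
	(void) energy;
	return SLURM_SUCCESS;
}
```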

Parameters

These parameters can be set in slurm.conf to select the plugin and the frequency at which node energy data is gathered.

AcctGatherEnergyType
Specifies which plugin should be used.
AcctGatherNodeFreq
Time interval between polls, in seconds.
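For illustration, a slurm.conf fragment that selects the rapl plugin and polls node energy every 30 seconds might look like this. The interval is an arbitrary example value, not a recommendation.

```
AcctGatherEnergyType=acct_gather_energy/rapl
AcctGatherNodeFreq=30
```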

Last modified 20 August 2020