Remote Procedure Call (RPC) Update Guide

Overview

Slurm uses RPCs (Remote Procedure Calls) to communicate between daemons and commands. The format of many RPCs change between major Slurm releases (the first two number designate the major release, and the last one the maintenance release level, 17.02.7 is major=17.02, maintenance=7, and changes in the maintenance release (also known as the "micro") value indicate bug fixes since the previous release, while changes in the major version indicate fundamental changes in logic). The reason for format changes is typically new fields being added for new Slurm capabilities. Slurm communications support the most current release plus the two previous major releases. For example, Slurm version 17.11.x daemons can communicate with Slurm commands from version 17.11.x, 17.02.x and 16.05.x. The same is true for state save files. Slurm can be upgraded through two major releases without loss of data since the older state files are still recognized. Slurm commands initiated under one version of Slurm can also continue to execute and communicate with the Slurm daemons through two major release upgrades of the daemons. Upgrades beyond two major releases will result in unrecognized state information, but intermediate upgrades can be performed to reformat the state information and prevent its loss. As new versions of Slurm are released, support for the oldest communication protocols is removed from the code.

Code Changes

The code used for packing and unpacking RPCs can be mostly found in src/common, although additional logic can also be found in each of of the Slurm daemons (slurmctld, slurmd, and slurmdbd) plus some of the plugins.

All RPCs and state save files contain Slurm version information. This version information permits the server to recognize its format and respond using a message format understood by the client. When fields are added to or removed from existing RPCs, it is essential that only the format of RPCs for the newest release of Slurm (which is still under development) altered to avoid breaking pre-existing communication protocols. It may be necessary to add a new protocol version number to the code and support for releases beyond two previous releases may be removed. In the new code, copy the pack and unpack logic from the previous version of the code and add new fields as needed. Be sure to pack and unpack data in the identical order. It is also important to keep in mind that when adding support for new fields in an RPC protocol, that logic may need to be added to the older protocols setting the new field values to a reasonable default value (e.g. NULL, 0). Slurm supports RPCs up to two newer major versions, so a 16.05 version will be able to communicate with a 16.05, 17.02 and 17.11 versions, but not with a 18.x. For example, if a job is submitted using an old version of sbatch to a newer slurmctld daemon, job specifications that are unknown to the old sbatch should be set by the slurmctld daemon to reasonable values. An example of the changes required are shown below. In this trivial example, we want to add a new "max_nodes" field to the message for Slurm version 15.08.x.

/*
 * Original code in Slurm v2.5.x
 */
if (protocol_version >= SLURM_14_11_PROTOCOL_VERSION) {
	pack32(msg->job_id, buffer);
	pack32(msg->user_id, buffer);
	pack32(msg->min_nodes, buffer);
} else if (protocol_version >= SLURM_14_03_PROTOCOL_VERSION) {
	pack32(msg->job_id, buffer);
	pack32(msg->user_id, buffer);
} else if (protocol_version >= SLURM_13_08_PROTOCOL_VERSION) {
	pack32(msg->job_id, buffer);
} else {
	error("pack_whatever_msg: protocol_version "
	      "%hu not supported", protocol_version);
}
/*
 * New code in Slurm v15.08.x
 */
if (protocol_version >= SLURM_15_08_PROTOCOL_VERSION) {
	pack32(msg->job_id, buffer);
	pack32(msg->user_id, buffer);
	pack32(msg->min_nodes, buffer);
	pack32(msg->max_nodes, buffer);
} else if (protocol_version >= SLURM_14_11_PROTOCOL_VERSION) {
	pack32(msg->job_id, buffer);
	pack32(msg->user_id, buffer);
	pack32(msg->min_nodes, buffer);
} else if (protocol_version >= SLURM_14_03_PROTOCOL_VERSION) {
	pack32(msg->job_id, buffer);
	pack32(msg->user_id, buffer);
} else {
	error("pack_whatever_msg: protocol_version "
	      "%hu not supported", protocol_version);
}

Last modified 26 April 2019