Slurm Plugin
The Slurm Job Launcher Plugin provides the capability to launch executables on a Slurm cluster.
Configuration
/etc/rstudio/launcher.slurm.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
server-user | User to run the executable as. The plugin should be started as root and will lower its privilege to this user for normal execution. It is recommended not to change the default value, as this is populated by the Launcher service itself. | N | rstudio-server |
thread-pool-size | Size of the thread pool used by the plugin. It is recommended not to change the default value, as this is populated by the Launcher service itself. | N | Number of CPUs * 2 |
enable-debug-logging | Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled). | N | 0 |
scratch-path | Scratch directory where the plugin writes temporary state. | N | /var/lib/rstudio-launcher |
logging-dir | Specifies the path where debug logs should be written. | N | /var/log/rstudio/launcher |
job-expiry-hours | Number of hours before completed jobs are removed from the system. | N | 24 |
profile-config | Path to the user and group profiles configuration file (explained in more detail below). | N | /etc/rstudio/launcher.slurm.profiles.conf |
slurm-service-user | The user to run Slurm service commands as. This user must have privileges to query the Slurm cluster. | Y | |
slurm-bin-path | The installation location of the Slurm command line utilities (e.g. sbatch, scontrol). If left blank, the command line utilities must be available on the default path. | N | "" |
user-storage-path | The default location to store Slurm job output. Can be templated with {HOME} or {USER}. Users must have write access to the configured location. Paths beginning with ~ will be correctly evaluated. | N | ~/slurm-data |
max-output-stream-seconds | The maximum amount of time to keep the job output stream open after the job is completed, in seconds. Since job output may be buffered, the output stream will stay open until it sees an end of stream notifier or it waits the configured number of seconds. Setting this option to a low value may cause job output to appear truncated. Reloading the job output window should resolve that. A value of 0 will cause the output stream to close immediately when the job finishes. | N | 30 |
max-output-file-wait-seconds | The maximum amount of time to wait for the output files to be created after the output stream is started, in seconds. This can be useful if job output is being buffered for a long period of time, or if the file system is particularly slow. Setting this to a value that is too low may cause job-output-not-found errors when attempting to retrieve output shortly after the job starts. | N | 30 |
rsandbox-path | Location of the rsandbox executable. | N | /usr/lib/rstudio-server/bin/rsandbox |
unprivileged | Runs the Launcher in unprivileged mode. Child processes will not require root permissions. If the plugin cannot acquire root permissions it will run without root and will not change users or perform any impersonation. | N | 0 |
enable-gpus | Whether to allow users to request GPU resources when submitting jobs. The types of GPUs available for request can be controlled via the gpu-types option. | N | 0 |
enable-gres | Whether to allow users to request arbitrary GRES resources when submitting jobs. The value of this field will be passed directly to the --gres option provided by sbatch. | N | 0 |
gpu-types | A comma-separated list of GPU types that are available. If GPUs are enabled and this field is empty, users will be able to request general GPUs (i.e. sbatch --gpus=<n>). Otherwise users will be able to request GPUs for each type (i.e. sbatch --gpus=<type>:<n>[,<type>:<n>]). For Singularity only: GPU brand (one of nvidia|amd|intel) should be included with the format type:brand if default-gpu-brand is not set, or GPU is of a different brand than the default. | N | |
default-gpu-brand | Singularity only. One of nvidia|amd|intel, the default brand of GPU available when requesting jobs. Appended to entries of gpu-types above that don’t already include a brand. | N | |
allow-requeue | Whether to allow jobs submitted through the Slurm Launcher Plugin to be re-queued by Slurm. This option requires special consideration when using with Workbench Launcher Sessions. See Using the Slurm Launcher plugin with Workbench for more details. | N | 0 |
persist-env-vars | Whether to persist environment variables submitted on the job as Job metadata to allow the data to be persisted across restarts of the Launcher and made available in load balanced scenarios. This should be disabled if you allow Slurm users to view other users’ job data to prevent potentially sensitive information from leaking. See Job Isolation and Security for more details. | N | 1 |
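The gpu-types row above describes two shapes for the sbatch --gpus value: a bare count when gpu-types is empty, and a comma-separated list of type:count pairs otherwise. The following Python sketch illustrates that formatting only. The helper name build_gpus_arg is hypothetical and is not part of the plugin, and the plugin's own logic may differ.

```python
def build_gpus_arg(requests, gpu_types=()):
    """Format the value that would follow `sbatch --gpus=`.

    requests:  an int when gpu_types is empty (general GPUs), otherwise a
               dict mapping a GPU type name to the number requested.
    gpu_types: the configured gpu-types entries (hypothetical input).
    """
    if not gpu_types:
        # gpu-types is empty: general GPUs, i.e. --gpus=<n>
        return str(int(requests))

    # Reject types that are not listed in gpu-types.
    unknown = sorted(set(requests) - set(gpu_types))
    if unknown:
        raise ValueError(f"GPU types not listed in gpu-types: {unknown}")

    # Typed request, i.e. --gpus=<type>:<n>[,<type>:<n>]
    parts = [f"{t}:{requests[t]}" for t in gpu_types if requests.get(t, 0) > 0]
    return ",".join(parts)


if __name__ == "__main__":
    print(build_gpus_arg(2))
    print(build_gpus_arg({"tesla": 2, "v100": 1}, ("tesla", "v100")))
```

Types with a count of zero are omitted from the typed form, which is an assumption made for this sketch.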
Singularity and GPU Support
Singularity in Workbench supports using GPUs, but requires the brand to be specified either by default-gpu-brand, or by appending the brand nvidia, amd, or intel after the GPU type in launcher.slurm.conf. For example: v100:nvidia.
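To make the type:brand rule concrete, here is a minimal Python sketch that resolves a brand for each gpu-types entry, falling back to default-gpu-brand when an entry has none. The function parse_gpu_types is a hypothetical illustration, not the plugin's implementation, and the error behavior for a missing brand is an assumption.

```python
VALID_BRANDS = {"nvidia", "amd", "intel"}


def parse_gpu_types(gpu_types, default_gpu_brand=None):
    """Return {gpu_type: brand} for a comma-separated gpu-types value."""
    resolved = {}
    for entry in gpu_types.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # "v100:nvidia" -> ("v100", "nvidia"); "tesla" -> ("tesla", "")
        gpu_type, _, brand = entry.partition(":")
        brand = brand or default_gpu_brand
        if brand not in VALID_BRANDS:
            # Assumption: a type without any usable brand is treated as an error.
            raise ValueError(f"no valid GPU brand for type {gpu_type!r}")
        resolved[gpu_type] = brand
    return resolved


if __name__ == "__main__":
    print(parse_gpu_types("tesla,v100:amd", default_gpu_brand="nvidia"))
```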
User storage directory
If the user-storage-path is left as the default value (~/slurm-data) or the {HOME} variable is used in the definition, the server-user must have read access to each user’s home directory in order to look up the real value. The Slurm Launcher Plugin will then attempt to automatically create the configured user-storage-path. This action will be taken as the user starting the Job, so the server-user does not need write access to the user’s home directory.
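As a rough illustration of how a templated user-storage-path could resolve for a given user, the Python sketch below substitutes {HOME} and {USER} and expands a leading ~. The function expand_user_storage_path and its exact rules are assumptions made for this example; the plugin's real expansion logic is not documented here.

```python
import posixpath


def expand_user_storage_path(template, user, home):
    """Resolve a user-storage-path template for one user.

    template: the configured value, e.g. "~/slurm-data"
    user:     the user's login name
    home:     the user's home directory
    """
    path = template.replace("{HOME}", home).replace("{USER}", user)
    # Treat a leading "~" as the same user's home directory.
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    return posixpath.normpath(path)


if __name__ == "__main__":
    print(expand_user_storage_path("~/slurm-data", "jsmith", "/home/jsmith"))
    print(expand_user_storage_path("/scratch/{USER}/slurm", "jsmith", "/home/jsmith"))
```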
Job isolation and security
The Slurm Launcher prevents users from seeing details about other users’ jobs when interacting with the Launcher. However, depending on your Slurm configuration, user jobs may be visible to other users by default when using Slurm commands like sacct directly. Some job fields, such as environment variables, could contain sensitive information, so it is recommended that you prevent Slurm users from viewing job data for other users.
This is configurable by setting the PrivateData configuration option within slurmdbd.conf. For more information, see the Slurm documentation.
User and group profiles
The Slurm plugin also allows you to specify user and group configuration profiles, similar to Posit Workbench’s profiles, in the configuration file /etc/rstudio/launcher.slurm.profiles.conf (or an arbitrary file as specified in profile-config within the main configuration; see above). These are entirely optional.
Profiles are divided into sections of three different types:
- Global ([*])
- Per-group ([@groupname])
- Per-user ([username])
Here is an example profiles file that illustrates each of these types:
/etc/rstudio/launcher.slurm.profiles.conf
[*]
default-cpus=1
default-mem-mb=512
max-cpus=2
max-mem-mb=1024
max-gpus-v100=1
max-gpus-tesla=1
allowed-partitions=mars,jupiter
singularity-image-directory=/singularity-images
[@posit-power-users]
default-cpus=4
default-mem-mb=4096
max-cpus=20
max-mem-mb=20480
max-gpus-tesla=10
default-gpus-tesla=2
max-gpus-v100=8
default-gpus-v100=0
allowed-partitions=mars,jupiter,saturn,titan
[jsmith]
max-cpus=3
This configuration specifies that by default users will be allowed to launch jobs with a maximum of 1024 MB of memory and 2 CPUs to either the mars or jupiter partitions. It also specifies that members of the posit-power-users group will be allowed to use many more resources and additional partitions. Finally, the user jsmith is limited to a maximum of 3 CPUs.
Note that the profiles file is processed from top to bottom (i.e. settings matching the current user that occur later in the file always override ones that appeared prior). The settings available in the file are described in more depth in the table below. Also note that if the Slurm cluster has been configured to have a maximum and/or default memory value, these values will be returned whenever a maximum or default value is not configured for a user.
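The top-to-bottom override rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the plugin's parser: the functions parse_profiles and effective_settings are hypothetical, the sample text is a shortened version of the example file above, and group membership is passed in explicitly rather than looked up on the system.

```python
PROFILES = """\
[*]
default-cpus=1
max-cpus=2
allowed-partitions=mars,jupiter

[@posit-power-users]
default-cpus=4
max-cpus=20
allowed-partitions=mars,jupiter,saturn,titan

[jsmith]
max-cpus=3
"""


def parse_profiles(text):
    """Parse profiles text into an ordered list of (section, {key: value})."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, _, value = line.partition("=")
            sections[-1][1][key.strip()] = value.strip()
    return sections


def effective_settings(sections, user, groups):
    """Apply matching sections in file order; later values override earlier ones."""
    settings = {}
    for name, values in sections:
        matches = (
            name == "*"
            or name == user
            or (name.startswith("@") and name[1:] in groups)
        )
        if matches:
            settings.update(values)
    return settings


if __name__ == "__main__":
    sections = parse_profiles(PROFILES)
    print(effective_settings(sections, "jsmith", {"posit-power-users"}))
    print(effective_settings(sections, "adoe", set()))
```

In this sketch, jsmith (a member of posit-power-users) ends up with max-cpus=3 because the [jsmith] section appears last, while keeping the group's default-cpus and partitions.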
/etc/rstudio/launcher.slurm.profiles.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
default-cpus | Number of CPUs available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Slurm) |
default-mem-mb | Number of MB of RAM available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Slurm) |
max-cpus | Maximum number of CPUs available to a job. Setting this to a negative value will disable setting CPUs on a job. If set, the value of default-cpus will always be used. | N | 0.0 (infinite - managed by Slurm) |
max-mem-mb | Maximum number of MB of RAM available to a job. Setting this to a negative value will disable setting memory on a job. If set, the value of default-mem-mb will always be used. | N | 0.0 (infinite - managed by Slurm) |
max-gpus | Only valid if enable-gpus is set to 1 and gpu-types is empty in launcher.slurm.conf. Maximum number of GPUs that can be requested per job. | N | 0.0 (infinite - managed by Slurm) |
default-gpus | Only valid if enable-gpus is set to 1 and gpu-types is empty in launcher.slurm.conf. Number of GPUs available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Slurm) |
max-gpus-<type> | Only valid if enable-gpus is set to 1 and type is included in the gpu-types field in launcher.slurm.conf. Maximum number of GPUs of the specified type that can be requested per job. | N | 0.0 (infinite - managed by Slurm) |
default-gpus-<type> | Only valid if enable-gpus is set to 1 and type is included in the gpu-types field in launcher.slurm.conf. Number of GPUs of the specified type available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Slurm) |
allowed-partitions | A comma-separated list of partitions that may be used to launch jobs. The partition(s) must be valid partitions as seen by your Slurm cluster from the sinfo command. | N | (empty - all partitions may be used) |
default-partition | The default partition for jobs. | N | The cluster-wide default |
resource-profiles | Available resource profiles. See Resource profiles. | N | |
allow-custom-resources | Whether jobs can use the custom resource profile. See Resource profiles. | N | 1 |
singularity-image-directory | Path to a directory of containers in the Singularity Image File (SIF) format. When set, the plugin will allow users to launch Apptainer or Singularity containers with these images. This directory should be on shared storage accessible to all Slurm nodes (e.g. /opt/apptainer/containers/). | N | "" |
default-singularity-image | Default singularity image name. | N | default.simg |
Resource profiles
Resource profiles greatly simplify the task of assigning CPU, memory, or GPU resources for a job (provided that GPUs are available). They are configured in the optional /etc/rstudio/launcher.slurm.resources.conf file. For example:
/etc/rstudio/launcher.slurm.resources.conf
[default]
name = "Default" # optional, derived from the section name when absent
cpus=1
mem-mb=4096
[small]
cpus=1
mem-mb=512
[default-gpu]
cpus=1
mem-mb=4096
nvidia-gpus=1
amd-gpus=0
[hugemem]
name = "Huge Memory"
cpus=8
mem-mb=262144
By default, all profiles are available to all users, and jobs can also use a special custom profile to specify CPU, memory, and GPU resources directly instead. However, users are still subject to the constraints in User and group profiles, and administrators may also limit access to individual resource profiles with that configuration file.
For example, suppose an admin wants to restrict the resource profiles above such that (1) GPUs and large memory jobs are only available to users in the bioinformatics group; and (2) only users in the posit-power-users group can use the custom resource profile to set their own resources directly. This might result in the following /etc/rstudio/launcher.slurm.profiles.conf file:
/etc/rstudio/launcher.slurm.profiles.conf
[*]
resource-profiles=default,small
allow-custom-resources=0
[@bioinformatics]
resource-profiles=default,small,default-gpu,hugemem
[@posit-power-users]
resource-profiles=default,small,default-gpu,hugemem
allow-custom-resources=1
The settings available in each section of the /etc/rstudio/launcher.slurm.resources.conf file are described in more depth in the table below:
/etc/rstudio/launcher.slurm.resources.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
name | A user-friendly name for the profile, e.g. Default (1 CPU, 4G mem) or m4.xlarge. | N | The section title |
cpus | The CPU limit. | Y | |
mem-mb | The memory limit, in megabytes. | Y | |
gpus | Only valid if enable-gpus is set to 1 and gpu-types is empty in launcher.slurm.conf. Number of GPUs. | N | 0 |
<type>-gpus | Only valid if enable-gpus is set to 1 and type is included in the gpu-types field in launcher.slurm.conf. Number of GPUs of the specified type. | N | 0 |
partition | The specific partition (queue) to use, if any. | N | |
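Because cpus and mem-mb are the only required options in each resource profile, a quick pre-deployment check can catch incomplete sections. The Python sketch below uses the standard library's configparser as a stand-in for the plugin's own parser, which may treat the file differently. The function name find_incomplete_profiles is hypothetical.

```python
import configparser

REQUIRED_OPTIONS = ("cpus", "mem-mb")


def find_incomplete_profiles(text):
    """Return {section: [missing required options]} for resource profile text."""
    parser = configparser.ConfigParser(
        interpolation=None,
        # Allow trailing "# ..." comments such as the one on the name option.
        inline_comment_prefixes=("#",),
    )
    parser.read_string(text)
    problems = {}
    for section in parser.sections():
        missing = [
            opt for opt in REQUIRED_OPTIONS if not parser.has_option(section, opt)
        ]
        if missing:
            problems[section] = missing
    return problems


if __name__ == "__main__":
    sample = """\
[small]
cpus=1
mem-mb=512

[broken]
name = "No memory limit" # illustrative section, not from the docs
cpus=2
"""
    print(find_incomplete_profiles(sample))
```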
Slurm cluster requirements
In order for the Slurm plugin to run correctly, the following assumptions about the Slurm cluster must be true:
- The Slurm service account (specified in the main configuration file) must have full cluster-admin privileges.
- The Slurm control machine (the one running slurmctld), the Workbench server host, and all Slurm nodes must have a shared home directory.
- The Workbench server host must have the following properties:
  - the Slurm executables installed (e.g. sinfo, scontrol, etc.). See Supported Slurm versions for supported versions. You may experience unexpected behavior with unsupported versions.
  - the same slurm.conf file as the desired Slurm cluster
  - network connectivity to the machine running slurmctld (i.e. the Workbench server host can resolve the IP or hostname of the Slurm control machine and connect via the slurmctld port configured in slurm.conf)
  - properly configured and running Slurm plugins, as required (e.g. if using MUNGE as an authentication service, munged must be running under the same user on all machines connected to the Slurm cluster)
  - properly configured users and groups (i.e. all users with the same name have the same UID, group, group ID on all machines connected to the cluster)
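One of the requirements above, that the Slurm executables are installed on the Workbench server host, is easy to check from a script. The Python sketch below looks for a few of the utilities named in this document, either in a given directory (mirroring slurm-bin-path) or on the default PATH. The function missing_slurm_tools and the list of tools are illustrative assumptions, not an exhaustive list of what the plugin calls.

```python
import os
import shutil

# Utilities mentioned in this document; the plugin may use others as well.
SLURM_TOOLS = ("sbatch", "scontrol", "sinfo")


def missing_slurm_tools(slurm_bin_path=""):
    """Return the tools that cannot be found.

    slurm_bin_path: directory to search, or "" to search the default PATH.
    """
    search_path = slurm_bin_path or os.environ.get("PATH", "")
    return [
        tool
        for tool in SLURM_TOOLS
        if shutil.which(tool, path=search_path) is None
    ]


if __name__ == "__main__":
    missing = missing_slurm_tools("/slurm/bin")
    if missing:
        print("Missing Slurm utilities:", ", ".join(missing))
    else:
        print("All checked Slurm utilities were found.")
```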
For more information about configuring and running a Slurm cluster, please see the Slurm documentation. Information about available Slurm plugins can also be found in the relevant section of the Slurm documentation. For example, the documentation about Slurm Accounting also includes information about the available plugins and how to use them.
Below is an example of a launcher configuration which might be used in this scenario:
/etc/rstudio/launcher.conf
[server]
address=127.0.0.1
port=5559
server-user=rstudio-server
admin-group=rstudio-server
enable-debug-logging=1
[cluster]
name=Slurm
type=Slurm
/etc/rstudio/launcher.slurm.conf
slurm-service-user=slurm
job-expiry-hours=48
user-storage-path=~/slurm-data
max-output-stream-seconds=15
slurm-bin-path=/slurm/bin
enable-gpus=1
gpu-types=tesla,v100
Supported Slurm versions
Slurm Version | Notes | Certified (Y/N) |
---|---|---|
23.02 | | Y |
22.05 | | Y |
21.08 | | Y |
20.11 | | Y |
19.05 | Slurm 19.05 no longer recommended due to CVE-2021-31215. | N |
Using the Slurm Launcher plugin with Posit Workbench
To support launching Workbench Sessions via the Slurm Launcher plugin, the following must be true in addition to the requirements listed in the Slurm cluster requirements section:
- The Workbench host must have network access to every Slurm node that may run a session via any TCP port
- Slurm nodes must have network access to the Workbench host via the launcher-sessions-callback-address, as described in Launcher Configuration
- To incorporate RStudio Pro Session configurations, rsession.conf must be accessible by all Slurm nodes that may run RStudio Pro Sessions. The default expected location can be changed by adding rsession-config-file=<path/to/rsession.conf> to /etc/rstudio/rserver.conf
If the allow-requeue option in launcher.slurm.conf is enabled (i.e. allow-requeue=1) and Workbench sessions may be preempted by higher priority jobs, it is advisable to set the Slurm preemption mode to SUSPEND rather than REQUEUE to avoid any loss of data in the Session. For more details, please see the Slurm Preemption Documentation.
Singularity with Posit Workbench Requirements
Apptainer is the Linux Foundation’s fork of the Singularity project. At the time of writing, Posit Workbench uses features of Apptainer and Singularity that are provided by both technologies in the same way. As a consequence, any reference to Singularity applies to Apptainer and vice versa.
To use the Slurm launcher plugin with Apptainer support, the following must be true in addition to the requirements listed in Using the Slurm launcher plugin with Posit Workbench:
- The singularity-image-directory must be on shared storage mounted on both the Workbench server node and all Slurm compute nodes.
- All Singularity images must have session components that match the Workbench server version. Follow the steps in Install Workbench session components on Slurm compute nodes.
- The Workbench server has launcher-sessions-create-container-user=0 in the rserver.conf file.
Singularity integrates with other Workbench functionality such as:
- User and group profiles
- Including GPU support in Singularity containers
  - Note that Singularity requires you to specify GPU brands
- Launcher mounts
  - Note that Singularity is only compatible with the Host MountType
- The launcher-sessions-forward-container-environment option in rserver.conf, which allows environment variables set by the Slurm job in Singularity to be forwarded into the running session.
Multiple versions of R and module loading
As described in the R versions section, it is possible to use multiple versions of R and load environment modules per R Version with RStudio Pro Sessions launched via the Slurm Launcher Plugin by configuring the /etc/rstudio/r-versions file. In order to properly support this feature the following must be true:
- R must be installed on all Slurm nodes in the same location.
- The modules in question must be installed on all Slurm nodes.
- The file /var/lib/rstudio-server/r-versions must be reachable by all Slurm nodes. This file is generated by Workbench, and its location may be changed by setting r-versions-path=<shared directory>/r-versions in rserver.conf.
Load balancing considerations
When using the Slurm Launcher Plugin with a load balanced Workbench, it is recommended to configure the Slurm cluster and Slurm Launcher Plugins so that the values for job-expiry-hours are the same in all copies of launcher.slurm.conf and the value for MinJobAge in slurm.conf is at least as long as the configured job-expiry-hours value. Note that MinJobAge is set in seconds, rather than hours.
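Since the two settings use different units, the comparison is easy to get wrong. The following Python sketch shows the hours-to-seconds conversion and the "at least as long" check. The function names are hypothetical and the values are examples only.

```python
SECONDS_PER_HOUR = 3600


def min_job_age_seconds(job_expiry_hours):
    """Smallest MinJobAge (in seconds) that covers job-expiry-hours."""
    return job_expiry_hours * SECONDS_PER_HOUR


def is_min_job_age_sufficient(min_job_age, job_expiry_hours):
    """True when MinJobAge (seconds) is at least job-expiry-hours (hours)."""
    return min_job_age >= min_job_age_seconds(job_expiry_hours)


if __name__ == "__main__":
    # With job-expiry-hours=48, MinJobAge needs to be at least 172800 seconds.
    print(min_job_age_seconds(48))
    print(is_min_job_age_sufficient(86400, 48))
```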
Additional considerations
This section lists notable considerations related to the use of the Slurm Plugin.
- Slurm does not provide a time zone for any time values which it returns. All times related to Slurm jobs returned from the launcher have the same time zone as the configured Slurm cluster.