Slurm Plugin
The Slurm Job Launcher Plugin provides the capability to launch executables on a Slurm cluster.
Configuration
/etc/rstudio/launcher.slurm.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
server-user | User to run the executable as. The plugin should be started as root and will lower its privilege to this user for normal execution. It is recommended not to change the default value, as this is populated by the Launcher service itself. | N | rstudio-server |
thread-pool-size | Size of the thread pool used by the plugin. It is recommended not to change the default value, as this is populated by the Launcher service itself. | N | Number of CPUs * 2 |
enable-debug-logging | Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled). | N | 0 |
scratch-path | Scratch directory where the plugin writes temporary state. | N | /var/lib/rstudio-launcher |
logging-dir | Specifies the path where debug logs should be written. | N | /var/log/rstudio/launcher |
job-expiry-hours | Number of hours before completed jobs are removed from the system. | N | 24 |
profile-config | Path to the user and group profiles configuration file (explained in more detail below). | N | /etc/rstudio/launcher.slurm.profiles.conf |
slurm-service-user | The user to run Slurm service commands as. This user must have privileges to query the Slurm cluster. | Y | |
slurm-bin-path | The installation location of the Slurm command line utilities (e.g. sbatch, scontrol). If left blank, the command line utilities must be available on the default path. | N | "" |
user-storage-path | The default location to store Slurm job output. Can be templated with {HOME} or {USER}. Users must have write access to the configured location. Paths beginning with ~ will be correctly evaluated. | N | ~/slurm-data |
max-output-stream-seconds | The maximum amount of time to keep the job output stream open after the job is completed, in seconds. Since job output may be buffered, the output stream will stay open until it sees an end of stream notifier or it waits the configured number of seconds. Setting this option to a low value may cause job output to appear truncated. Reloading the job output window should resolve that. A value of 0 will cause the output stream to close immediately when the job finishes. | N | 30 |
max-output-file-wait-seconds | The maximum amount of time to wait for the output files to be created after the output stream is started, in seconds. This can be useful if job output is being buffered for a long period of time, or if the file system is particularly slow. Setting this to a value that is too low may cause job-output-not-found errors when attempting to retrieve output shortly after the job starts. | N | 30 |
rsandbox-path | Location of the rsandbox executable. | N | /usr/lib/rstudio-server/bin/rsandbox |
unprivileged | Runs the Launcher in unprivileged mode. Child processes will not require root permissions. If the plugin cannot acquire root permissions it will run without root and will not change users or perform any impersonation. | N | 0 |
enable-gpus | Whether to allow users to request GPU resources when submitting jobs. The types of GPUs available for request can be controlled via the gpu-types option. | N | 0 |
enable-gres | Whether to allow users to request arbitrary GRES resources when submitting jobs. The requested GRES value is passed directly to the --gres option of sbatch. | N | 0 |
gpu-types | A comma-separated list of available GPU types. If GPUs are enabled and this field is empty, users can request generic GPUs (i.e. sbatch --gpus=<n>). Otherwise, users can request GPUs by type (i.e. sbatch --gpus=<type>:<n>[,<type>:<n>]). | N | |
allow-requeue | Whether to allow jobs submitted through the Slurm Launcher Plugin to be requeued by Slurm. This option requires special consideration when using with RStudio Launcher Sessions. See Using the Slurm Launcher Plugin with Workbench for more details. | N | 0 |
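To make the GPU options concrete, the following is a hypothetical sketch (not the plugin's actual code) of how a job's GPU request could map onto the sbatch flags described above; the function name and argument shapes are invented for illustration:

```python
def sbatch_gpu_flag(requested, gpu_types):
    """Build the sbatch GPU flag for a job's GPU request.

    requested: {"tesla": 2} for typed requests, or {"": 2} when no
               specific type is requested.
    gpu_types: the list parsed from the gpu-types config option
               ([] when the option is unset).
    """
    if not gpu_types:
        # gpu-types is empty: only generic requests, e.g. --gpus=2
        return "--gpus={}".format(sum(requested.values()))
    # gpu-types is configured: per-type requests, e.g. --gpus=tesla:2,v100:1
    parts = ["{}:{}".format(t, n) for t, n in requested.items() if t in gpu_types]
    return "--gpus=" + ",".join(parts)

print(sbatch_gpu_flag({"": 2}, []))                                 # --gpus=2
print(sbatch_gpu_flag({"tesla": 2, "v100": 1}, ["tesla", "v100"]))  # --gpus=tesla:2,v100:1
```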
User Storage Directory
If the user-storage-path is left as the default value (~/slurm-data) or the {HOME} variable is used in the definition, the server-user must have read access to each user’s home directory in order to look up the real value. The Slurm Launcher Plugin will then attempt to automatically create the configured user-storage-path. This action is performed as the user starting the job, so the server-user does not need write access to the user’s home directory.
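For example, templating the path with {USER} under an absolute prefix avoids any dependency on home-directory lookups entirely (the /shared/slurm-data prefix below is a hypothetical shared location):

```ini
# /etc/rstudio/launcher.slurm.conf
user-storage-path=/shared/slurm-data/{USER}
```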
User and Group Profiles
The Slurm plugin also allows you to specify user and group configuration profiles, similar to RStudio Workbench’s profiles, in the configuration file /etc/rstudio/launcher.slurm.profiles.conf (or an arbitrary file as specified in profile-config within the main configuration; see above). These are entirely optional.
Profiles are divided into sections of three different types:
Global ([*])
Per-group ([@groupname])
Per-user ([username])
Here’s an example profiles file that illustrates each of these types:
# /etc/rstudio/launcher.slurm.profiles.conf
[*]
default-cpus=1
default-mem-mb=512
max-cpus=2
max-mem-mb=1024
max-gpus-v100=1
max-gpus-tesla=1
[@rstudio-power-users]
default-cpus=4
default-mem-mb=4096
max-cpus=20
max-mem-mb=20480
max-gpus-tesla=10
default-gpus-tesla=2
max-gpus-v100=8
default-gpus-v100=0
[jsmith]
max-cpus=3
This configuration specifies that, by default, users are allowed to launch jobs with at most 2 CPUs and 1024 MB of memory. It also specifies that members of the rstudio-power-users group are allowed considerably more resources.
Note that the profiles file is processed from top to bottom (i.e. settings matching the current user that occur later in the file always override ones that appeared prior). The settings available in the file are described in more depth in the table below. Also note that if the Slurm cluster has been configured to have a maximum and/or default memory value, these values will be returned whenever a maximum or default value is not configured for a user.
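The top-to-bottom override behavior can be sketched as follows (a hypothetical illustration, not the plugin's implementation; the parsed file is represented as an ordered list of sections):

```python
def resolve_profile(sections, user, groups):
    """Merge profile sections top to bottom for a given user.

    sections: ordered list of (section_name, settings_dict) pairs as they
    appear in the file; later matching sections override earlier ones.
    """
    effective = {}
    for name, settings in sections:
        applies = (
            name == "*"                                       # global section
            or (name.startswith("@") and name[1:] in groups)  # group section
            or name == user                                   # user section
        )
        if applies:
            effective.update(settings)
    return effective

sections = [
    ("*", {"max-cpus": "2", "max-mem-mb": "1024"}),
    ("@rstudio-power-users", {"max-cpus": "20"}),
    ("jsmith", {"max-cpus": "3"}),
]
# jsmith is in rstudio-power-users, but the later [jsmith] section wins:
# resolves to max-cpus=3, max-mem-mb=1024
print(resolve_profile(sections, "jsmith", {"rstudio-power-users"}))
```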
/etc/rstudio/launcher.slurm.profiles.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
default-cpus | Number of CPUs available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Slurm) |
default-mem-mb | Number of MB of RAM available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Slurm) |
max-cpus | Maximum number of CPUs available to a job. Setting this to a negative value will disable setting CPUs on a job. If set, the value of default-cpus will always be used. | N | 0.0 (infinite - managed by Slurm) |
max-mem-mb | Maximum number of MB of RAM available to a job. Setting this to a negative value will disable setting memory on a job. If set, the value of default-mem-mb will always be used. | N | 0.0 (infinite - managed by Slurm) |
max-gpus | Only valid if enable-gpus is set to 1 and gpu-types is empty in launcher.slurm.conf. Maximum number of GPUs that can be requested per job. | N | 0.0 (infinite - managed by Slurm) |
default-gpus | Only valid if enable-gpus is set to 1 and gpu-types is empty in launcher.slurm.conf. Number of GPUs available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Slurm) |
max-gpus-<type> | Only valid if enable-gpus is set to 1 and <type> is included in the gpu-types field in launcher.slurm.conf. Maximum number of GPUs of the specified type that can be requested per job. | N | 0.0 (infinite - managed by Slurm) |
default-gpus-<type> | Only valid if enable-gpus is set to 1 and <type> is included in the gpu-types field in launcher.slurm.conf. Number of GPUs of the specified type available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Slurm) |
Slurm Cluster Requirements
In order for the Slurm plugin to run correctly, the following assumptions about the Slurm cluster must be true:
- The Slurm service account (specified in the main configuration file) must have full cluster-admin privileges.
- The Slurm control machine (the one running slurmctld), the RStudio Launcher host machine, and all Slurm nodes must have a shared home directory.
- The RStudio Launcher host machine must have the following properties:
  - the Slurm executables installed (e.g. sinfo, scontrol, etc.). See Supported Slurm Versions for supported versions. You may experience unexpected behavior with unsupported versions.
  - the same slurm.conf file as the desired Slurm cluster
  - network connectivity to the machine running slurmctld (i.e. the RStudio Launcher host machine can resolve the IP or hostname of the Slurm control machine and connect via the slurmctld port configured in slurm.conf)
  - properly configured and running Slurm plugins, as required (e.g. if using MUNGE as an authentication service, munged must be running under the same user on all machines connected to the Slurm cluster)
  - properly configured users and groups (i.e. all users with the same name have the same UID, group, and group ID on all machines connected to the cluster)
For more information about configuring and running a Slurm cluster, please see the Slurm documentation. Information about available Slurm plugins can also be found in the relevant sections of the Slurm documentation. For example, the Slurm Accounting documentation includes information about the available accounting plugins and how to use them.
Below is an example of a Launcher configuration that might be used with such a cluster:
# /etc/rstudio/launcher.conf
[server]
address=127.0.0.1
port=5559
server-user=rstudio-server
admin-group=rstudio-server
enable-debug-logging=1
[cluster]
name=Slurm
type=Slurm
# /etc/rstudio/launcher.slurm.conf
slurm-service-user=slurm
job-expiry-hours=48
user-storage-path=~/slurm-data
max-output-stream-seconds=15
slurm-bin-path=/slurm/bin
enable-gpus=1
gpu-types=tesla,v100
Supported Slurm Versions
Slurm Version | Notes | Certified (Y/N) |
---|---|---|
21.08 | | Y |
20.11 | | Y |
19.05 | Slurm 19.05 is no longer recommended due to CVE-2021-31215. | N |
Using the Slurm Launcher Plugin with RStudio Workbench
To support launching Workbench R Sessions via the Slurm Launcher plugin, the following must be true in addition to the requirements listed in the Slurm Cluster Requirements section:
- The Workbench host must have network access, via any TCP port, to every Slurm node that may run an R Session.
- Slurm nodes must have network access to the Workbench host via the launcher-sessions-callback-address in order to support launcher jobs via the session, as described in Launcher Configuration.
- To incorporate R Session configurations, rsession.conf must be accessible by all Slurm nodes that may run R Sessions. The default expected location can be changed by adding rsession-config-file=<path/to/rsession.conf> to /etc/rstudio/rserver.conf.
If the allow-requeue option in launcher.slurm.conf is enabled (i.e. allow-requeue=1) and RStudio R Sessions may be preempted by higher-priority jobs, it is advisable to set the Slurm preemption mode to SUSPEND rather than REQUEUE to avoid any loss of data in the session. For more details, please see the Slurm Preemption Documentation.
Multiple Versions of R and Module Loading
As described in the R Versions section, it is possible to use multiple versions of R, and to load environment modules per R version, with R sessions launched via the Slurm Launcher Plugin by configuring the /etc/rstudio/r-versions file. In order to properly support this feature, the following must be true:
- R must be installed on all Slurm nodes in the same location.
- The modules in question must be installed on all Slurm nodes.
- The file /var/lib/rstudio-server/r-versions must be reachable by all Slurm nodes. Note that this file is generated by Workbench, and that its location may be changed by setting r-versions-path=<shared directory>/r-versions in rserver.conf.
Load Balancing Considerations
When using the Slurm Launcher Plugin with a load-balanced Workbench, it is recommended to configure the Slurm cluster and Slurm Launcher Plugins so that the value of job-expiry-hours is the same in all copies of launcher.slurm.conf, and the value of MinJobAge in slurm.conf is at least as long as the configured job-expiry-hours value. Note that MinJobAge is set in seconds, rather than hours.
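For example, with the default job-expiry-hours of 24, MinJobAge should be at least 24 * 3600 = 86400 seconds:

```ini
# launcher.slurm.conf (identical on every Workbench node)
job-expiry-hours=24
```

```ini
# slurm.conf
MinJobAge=86400   # 24 hours * 3600 seconds/hour
```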
Additional Considerations
This section lists notable considerations related to the use of the Slurm Plugin.
- Slurm does not provide a time zone for the time values it returns. All times related to Slurm jobs returned from the Launcher will be in the time zone of the configured Slurm cluster.