3 Kubernetes Plugin

The Kubernetes Job Launcher Plugin provides the capability to launch executables on a Kubernetes cluster.

3.1 Configuration

It is recommended to leave all default values unchanged and to configure only the required fields outlined below.

/etc/rstudio/launcher.kubernetes.conf

Config Option	Description	Required (Y/N)	Default Value
server-user	User to run the executable as. The plugin should be started as root, and will lower its privilege to this user for normal execution. It is recommended not to change the default value, as this is populated by the Launcher service itself.	N	rstudio-server
thread-pool-size	Size of the thread pool used by the plugin. It is recommended not to change the default value, as this is populated by the Launcher service itself.	N	Number of CPUs * 2
enable-debug-logging	Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled).	N	0
scratch-path	Scratch directory where the plugin writes temporary state.	N	/var/lib/rstudio-launcher/{name of plugin}
job-expiry-hours	Number of hours before completed jobs are removed from the system.	N	24
profile-config	Path to the user and group profiles configuration file (explained in more detail below).	N	/etc/rstudio/launcher.kubernetes.profiles.conf
api-url	The Kubernetes API base URL. This can be an HTTP or HTTPS URL. The URL should be up to, but not including the /api endpoint.	Y	Example: https://192.168.99.100:8443
auth-token	The auth token for the `job-launcher` service account. This is used to authenticate with the Kubernetes API. This should be base-64 encoded. See below for more information.	Y
kubernetes-namespace	The Kubernetes namespace to create jobs in. Note that the account specified by the `auth-token` setting must have full API privileges within this namespace. See Kubernetes Cluster Requirements below for more information.	N	rstudio
verify-ssl-certs	Whether or not to verify SSL certificates when connecting to `api-url`. Only applicable if connecting over HTTPS. For production use, always leave this at the default of 1 (enabled); it can be set to 0 (disabled) for testing purposes.	N	1
certificate-authority	Certificate authority to use when connecting to Kubernetes over SSL and when verifying SSL certificates. This must be a Base64-encoded PEM certificate, which is what most Kubernetes systems will report as the certificate authority in use. Leave this blank to use the system root CA store.	N
watch-timeout-seconds	Number of seconds before the watch calls to Kubernetes stop. This is to help prevent job status updates from hanging in some environments due to network middleware silently dropping idle connections. It is recommended to keep the default, but it can be raised if job status hangs are not apparent, or turned off by setting this to 0.	N	180
fetch-limit	The maximum number of objects to request per API call from the Kubernetes Service for GET collection requests. It is recommended you only change the default if you run into size issues with the returned payloads.	N	500
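As an illustration of how the required and optional settings fit together, the following Python sketch reads a configuration and fills in a few of the defaults from the table above. It assumes the file holds one key=value pair per line and that lines starting with # are comments; both are assumptions of this sketch. The helper name load_launcher_config and the sample values are hypothetical and are not part of the plugin.

```python
# Defaults copied from the table above (only a subset is shown).
DEFAULTS = {
    "kubernetes-namespace": "rstudio",
    "verify-ssl-certs": "1",
    "watch-timeout-seconds": "180",
    "fetch-limit": "500",
    "job-expiry-hours": "24",
}
REQUIRED = ("api-url", "auth-token")


def load_launcher_config(text):
    """Parse key=value lines, apply defaults, and check required options."""
    settings = dict(DEFAULTS)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line!r}")
        settings[key.strip()] = value.strip()
    missing = [name for name in REQUIRED if not settings.get(name)]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    # The table states that api-url must stop before the /api endpoint.
    if settings["api-url"].rstrip("/").endswith("/api"):
        raise ValueError("api-url should stop before the /api endpoint")
    return settings


example = """
api-url=https://192.168.99.100:8443
auth-token=<token for the job-launcher service account>
"""
config = load_launcher_config(example)
print(config["kubernetes-namespace"], config["fetch-limit"])
```

Only api-url and auth-token have to be supplied here; every other option falls back to its documented default.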

In order to retrieve the auth-token value, run the following commands. Note that the account must first be created and given appropriate permissions (see Kubernetes Cluster Requirements below).

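# Look up the name of the secret that holds the job-launcher service account token
KUBERNETES_AUTH_SECRET=$(kubectl get serviceaccount job-launcher --namespace=rstudio -o jsonpath='{.secrets[0].name}')
# Print the decoded token stored in that secret
kubectl get secret "$KUBERNETES_AUTH_SECRET" --namespace=rstudio -o jsonpath='{.data.token}' | base64 -d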
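The final base64 -d step is needed because Kubernetes stores Secret values Base64-encoded under .data. A minimal Python sketch of the same decoding step follows, assuming the Secret has been exported as JSON (for example with -o json). The function name and the sample Secret are hypothetical and use a dummy token.

```python
import base64
import json


def token_from_secret(secret_json):
    """Return the decoded service account token from a Secret's JSON."""
    secret = json.loads(secret_json)
    encoded = secret["data"]["token"]  # same field as jsonpath '{.data.token}'
    return base64.b64decode(encoded).decode("utf-8")


# Hypothetical Secret with a dummy token, for illustration only.
sample = json.dumps({
    "kind": "Secret",
    "data": {"token": base64.b64encode(b"dummy-token").decode("ascii")},
})
print(token_from_secret(sample))
```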

3.1.1 User and Group Profiles

The Kubernetes plugin also allows you to specify user and group configuration profiles, similar to RStudio Workbench’s profiles, in the configuration file /etc/rstudio/launcher.kubernetes.profiles.conf (or any arbitrary file as specified in profile-config within the main configuration file; see above). These are entirely optional.

Profiles are divided into sections of three different types:

Global ([*])

Per-group ([@groupname])

Per-user ([username])

Here’s an example profiles file that illustrates each of these types:

/etc/rstudio/launcher.kubernetes.profiles.conf

[*]
placement-constraints=node,region:us,region:eu
default-cpus=1
default-mem-mb=512
max-cpus=2
max-mem-mb=1024
container-images=r-session:3.4.2,r-session:3.5.0
allow-unknown-images=0

[@rstudio-power-users]
default-cpus=4
default-mem-mb=4096
default-nvidia-gpus=0
default-amd-gpus=0
max-nvidia-gpus=2
max-amd-gpus=3
max-cpus=20
max-mem-mb=20480
container-images=r-session:3.4.2,r-session:3.5.0,r-session:preview
allow-unknown-images=1

[jsmith]
max-cpus=3

This configuration specifies that by default users will be allowed to launch jobs with a maximum of 1024 MB of memory, and use only two different R containers. It also specifies that members of the rstudio-power-users group will be allowed to use many more resources, including GPUs, to see the r-session:preview image, and to run any image they specify.

Note that the profiles file is processed from top to bottom (i.e. settings matching the current user that occur later in the file always override ones that appeared prior). The settings available in the file are described in more depth in the table below.
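The top-to-bottom rule can be sketched in Python as follows. This is an illustration of the override order described above, not the plugin's implementation: effective_settings is a hypothetical helper, the group membership list is supplied by the caller, and the profile text is a trimmed copy of the example file.

```python
import configparser

PROFILES = """
[*]
max-cpus=2
max-mem-mb=1024
allow-unknown-images=0

[@rstudio-power-users]
max-cpus=20
max-mem-mb=20480
allow-unknown-images=1

[jsmith]
max-cpus=3
"""


def effective_settings(profiles_text, user, groups):
    """Merge every matching section in file order; later sections win."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(profiles_text)
    merged = {}
    for section in parser.sections():  # sections() preserves file order
        applies = (
            section == "*"
            or section == user
            or (section.startswith("@") and section[1:] in groups)
        )
        if applies:
            merged.update(parser[section])
    return merged


print(effective_settings(PROFILES, "jsmith", ["rstudio-power-users"]))
```

For jsmith as a member of rstudio-power-users, the later [jsmith] section lowers max-cpus to 3, while max-mem-mb and allow-unknown-images keep the values from the group section.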

/etc/rstudio/launcher.kubernetes.profiles.conf

Config Option	Description	Required (Y/N)	Default Value
container-images	Comma-separated string of allowed images that users may see and run.	N
default-container-image	The default container image to use for the Job if none is specified.	N
allow-unknown-images	Whether or not to allow users to run any image they want within their job containers, or if they have to use the ones specified in `container-images`.	N	1
placement-constraints	Comma-separated string of available placement constraints in the form of `key1:value1,key2:value2,...` where the `:value` part is optional to indicate free-form fields. See Kubernetes Cluster Requirements below for more details.	N
default-cpus	Number of CPUs available to a job by default if not specified by the job.	N	0.0 (infinite - managed by Kubernetes)
default-mem-mb	Number of MB of RAM available to a job by default if not specified by the job.	N	0.0 (infinite - managed by Kubernetes)
max-cpus	Maximum number of CPUs available to a job.	N	0.0 (infinite - managed by Kubernetes)
max-mem-mb	Maximum number of MB of RAM available to a job.	N	0.0 (infinite - managed by Kubernetes)
job-json-overrides	JSON path overrides of the generated Kubernetes Job JSON. See Modifying Jobs.	N
cpu-request-ratio	Ratio within the range (0.0, 1.0] representing the Kubernetes container resource `request` to set for the CPU. This will be the ratio of the `limit` amount specified by the user when creating the job.	N	1.0
memory-request-ratio	Ratio within the range (0.0, 1.0] representing the Kubernetes container resource `request` to set for the memory. This will be the ratio of the `limit` amount specified by the user when creating the job.	N	1.0
default-nvidia-gpus	Number of NVIDIA GPUs available to a job by default if not specified by the job. See below for more information.	N	0
default-amd-gpus	Number of AMD GPUs available to a job by default if not specified by the job. See below for more information.	N	0
max-nvidia-gpus	Maximum number of NVIDIA GPUs available to a job. See below for more information.	N	0
max-amd-gpus	Maximum number of AMD GPUs available to a job. See below for more information.	N	0

Note that resource limits correspond to the Kubernetes container resource limits, which represent hard caps for the resources a job can use. Kubernetes allows jobs to request fewer resources and occasionally burst up to the limit amount, and this can be controlled by setting the cpu-request-ratio and memory-request-ratio settings as detailed above. Resource management in Kubernetes is a complex topic, and in general you should simply leave these at the default value of 1.0 unless you understand the implications of using both requests and limits. See the Kubernetes documentation on container resource management for more information.
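The relationship between the two values is simple arithmetic: the request is the limit multiplied by the ratio. A small Python illustration, using a hypothetical helper name (this is not the plugin's code):

```python
def resource_request(limit, ratio=1.0):
    """Return the request amount for a given limit and request ratio."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be within (0.0, 1.0]")
    return limit * ratio


# A job limited to 4 CPUs and 4096 MB, with both ratios set to 0.5.
print(resource_request(4, 0.5), resource_request(4096, 0.5))
```

With the default ratio of 1.0 the request equals the limit, so the job never asks for less than its cap.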

In order to provide GPUs as a schedulable resource, you must first enable the feature in Kubernetes by installing the necessary GPU drivers and device plugins supplied by your desired vendor (AMD or NVIDIA). Once available in Kubernetes, simply set the desired default and max values for the GPU type you intend to use. If not using GPUs, no GPU configuration in the profiles is necessary. For information on adding support for GPUs in Kubernetes, see the Kubernetes documentation.
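To make the image and maximum settings from the table above concrete, the following Python sketch checks a job request against a set of profile values. It is only one reading of the table: check_job is a hypothetical helper, a maximum of 0 is treated as "no limit", and how the plugin itself responds to a request that exceeds the profile is not covered here.

```python
def check_job(settings, image, cpus, mem_mb):
    """Return a list of reasons the job exceeds the profile, if any."""
    problems = []
    allowed = [i for i in settings.get("container-images", "").split(",") if i]
    # allow-unknown-images defaults to 1 according to the table.
    allow_unknown = settings.get("allow-unknown-images", "1") == "1"
    if image not in allowed and not allow_unknown:
        problems.append(f"image {image!r} is not in container-images")
    # A maximum of 0 means the profile imposes no limit.
    for name, value in (("max-cpus", cpus), ("max-mem-mb", mem_mb)):
        maximum = float(settings.get(name, 0.0))
        if maximum > 0 and value > maximum:
            problems.append(f"{value} exceeds {name}={maximum:g}")
    return problems


# Values taken from the global ([*]) section of the example profiles file.
profile = {
    "container-images": "r-session:3.4.2,r-session:3.5.0",
    "allow-unknown-images": "0",
    "max-cpus": "2",
    "max-mem-mb": "1024",
}
print(check_job(profile, "r-session:preview", 4, 512))
```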

3.1.2 Modifying Jobs

Whenever a job is submitted to the Kubernetes Launcher plugin, a JSON job object is generated and sent to Kubernetes. In some cases, it may be desirable to add or modify fields within this automatically generated JSON blob.

In order to do that, you may specify job-json-overrides within the profiles file. The form of the value should be "{json path}":"{path to json value file}","{json path 2}":"{path to json value file 2}",....

The JSON path should be a valid JSON Pointer as specified in the JSON Pointer RFC (RFC 6901).
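The value format and the pointer syntax can be explored with a short Python sketch. It relies on the observation that a comma-separated list of "key":"value" pairs becomes a JSON object once wrapped in braces; that is a convenience of this sketch, not a statement about how the plugin parses the setting. Both function names are hypothetical.

```python
import json


def parse_overrides(value):
    """Split a job-json-overrides value into {json pointer: file path}."""
    # Wrapping the pair list in braces yields a JSON object
    # (an assumption made for this sketch).
    return json.loads("{" + value + "}")


def pointer_tokens(pointer):
    """Split an RFC 6901 pointer into its unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must be empty or start with '/'")
    # Per RFC 6901, "~1" decodes to "/" first, then "~0" decodes to "~".
    return [
        token.replace("~1", "/").replace("~0", "~")
        for token in pointer[1:].split("/")
    ]


overrides = parse_overrides(
    '"/spec/template/spec/hostAliases":"/etc/rstudio/kube-host-aliases"'
)
for pointer, path in overrides.items():
    print(pointer_tokens(pointer), path)
```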

The JSON value path specified must be a file readable by the service user, and must contain valid JSON. For example, to add Host Aliases to all submitted jobs:

/etc/rstudio/launcher.kubernetes.profiles.conf

job-json-overrides="/spec/template/spec/hostAliases":"/etc/rstudio/kube-host-aliases"

/etc/rstudio/kube-host-aliases

[
  {
    "ip": "10.2.141.12",
    "hostnames": ["db01"]
  },
  {
    "ip": "10.2.141.13",
    "hostnames": ["db02"]
  }
]

Because the pod itself is nested within the Kubernetes Job object, it is located at the path /spec/template/spec. In the example above, we simply add a JSON array of HostAlias objects as defined by the Kubernetes API. See the Kubernetes API Documentation for an exhaustive list of fields that can be set.

Any job-json-overrides-specified fields will overwrite already existing fields in the auto-generated job spec. Note that the Kubernetes Launcher plugin requires certain fields to be set in order to properly parse saved job data. It is strongly recommended you use the job-json-overrides feature sparingly, and only use it to add additional fields to the automatically generated job object when necessary.
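The overwrite behavior can be pictured with the following Python sketch, which sets a value at a JSON Pointer location inside a nested dictionary. It is a simplified illustration under stated assumptions: apply_override is a hypothetical helper, the pointer must be non-empty, only object members (not array indices) are handled, and the job shown is a heavily trimmed stand-in rather than the plugin's real output.

```python
import copy


def apply_override(job, pointer, value):
    """Return a copy of job with the field at pointer set to value."""
    result = copy.deepcopy(job)
    tokens = [
        token.replace("~1", "/").replace("~0", "~")
        for token in pointer.split("/")[1:]
    ]
    target = result
    for token in tokens[:-1]:
        target = target[token]  # KeyError if a parent object is missing
    target[tokens[-1]] = value  # adds the field, or overwrites an existing one
    return result


# Hypothetical, heavily trimmed stand-in for the generated Job object.
job = {"spec": {"template": {"spec": {"restartPolicy": "Never"}}}}
host_aliases = [{"ip": "10.2.141.12", "hostnames": ["db01"]}]
patched = apply_override(job, "/spec/template/spec/hostAliases", host_aliases)
print(patched["spec"]["template"]["spec"])
```

Pointing an override at a path that already exists, such as /spec/template/spec/restartPolicy in this stand-in, replaces the generated value, which is why the feature is best limited to adding new fields.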

3.2 Kubernetes Cluster Requirements

In order for the Kubernetes plugin to run correctly, the following assumptions about the Kubernetes cluster must be true:

The Kubernetes API must be enabled and reachable from the machine running the Job Launcher
There must be a namespace to create jobs in, which can be specified via the kubernetes-namespace configuration mentioned above (this defaults to rstudio)
There must be a service account that has full API access for all endpoints and API groups underneath the aforementioned namespace, and the account’s auth token must be supplied to the plugin via the auth-token setting
The service account should have access to view the nodes list via the API (optional, but without it the IP addresses returned for a job are restricted to the internal IP, as /nodes is needed to fetch a node’s external IP address)
The cluster must have the metrics-server addon running and working properly to provide job resource utilization streaming

In order to use placement constraints, you must attach labels to your nodes that match the configured placement constraints. For example, if you have a node with the label az=us-east and have a placement constraint defined as az:us-east, incoming jobs specified with the az:us-east placement constraint will be routed to the desired node. For more information on Kubernetes placement constraints, see the Kubernetes documentation.
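The correspondence between the placement-constraints setting and node labels can be sketched in Python. The sketch only mirrors the label-matching rule described above; the actual scheduling is done by Kubernetes, and both helper names are hypothetical.

```python
def parse_constraints(value):
    """Map each constraint key to its allowed values ([] means free-form)."""
    allowed = {}
    for item in value.split(","):
        key, _, val = item.strip().partition(":")
        allowed.setdefault(key, [])
        if val:
            allowed[key].append(val)
    return allowed


def node_matches(node_labels, requested):
    """True if the node carries every requested key/value pair as a label."""
    return all(node_labels.get(k) == v for k, v in requested.items())


# Setting taken from the example profiles file.
print(parse_constraints("node,region:us,region:eu"))
# A node labeled az=us-east satisfies a job constrained to az:us-east.
print(node_matches({"az": "us-east"}, {"az": "us-east"}))
```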

The following sample script can be run to create a job-launcher service account and rstudio namespace, granting the service account (and thus, the launcher) full API access to manage RStudio jobs:

kubectl create namespace rstudio
kubectl create serviceaccount job-launcher --namespace rstudio
kubectl create rolebinding job-launcher-admin \
   --clusterrole=cluster-admin \
   --group=system:serviceaccounts:rstudio \
   --namespace=rstudio
kubectl create clusterrole job-launcher-clusters \
   --verb=get,watch,list \
   --resource=nodes
kubectl create clusterrolebinding job-launcher-list-clusters \
   --clusterrole=job-launcher-clusters \
   --group=system:serviceaccounts:rstudio