Kubernetes Plugin
The Kubernetes Job Launcher Plugin provides the capability to launch executables on a Kubernetes cluster.
Kubernetes provides a stable, backward-compatible API, and Posit Launcher relies on major features of the API that are unlikely to be removed. New versions of Posit products will remain compatible with previously released versions of Kubernetes that are still supported, and are very likely to be compatible with newer versions as they are released.
The most recent compatible version of Kubernetes tested by Posit is 1.30.
Configuration
It is recommended not to change any of the default values and only configure required fields as outlined below.
/etc/rstudio/launcher.kubernetes.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
server-user | Service user. The plugin should be started as root, and will lower its privilege to this user for normal execution. It is recommended not to change the default value, as this is populated by the Launcher service itself. | N | rstudio-server |
thread-pool-size | Size of the thread pool used by the plugin. It is recommended not to change the default value, as this is populated by the Launcher service itself. | N | Number of CPUs * 2 |
enable-debug-logging | Enables/disables verbose debug logging. Can be 1 (enabled) or 0 (disabled). | N | 0 |
scratch-path | Scratch directory where the plugin writes temporary state. | N | /var/lib/rstudio-launcher/{name of plugin} |
logging-dir | Specifies the path where debug logs should be written. | N | /var/log/rstudio/launcher |
job-expiry-hours | Number of hours before completed jobs are removed from the system. | N | 24 |
profile-config | Path to the user and group profiles configuration file (explained in more detail below). | N | /etc/rstudio/launcher.kubernetes.profiles.conf |
api-url | The Kubernetes API base URL. This can be an HTTP or HTTPS URL. Include everything up to, but not including, the /api endpoint. | Y | Example: https://192.168.99.100:8443 |
auth-token-path | The path to a file that contains the auth token for the job-launcher service account. This is used to authenticate with the Kubernetes API. See below for more information. Required unless auth-token is set. | Y | |
auth-token | The auth token for the job-launcher service account. This is used to authenticate with the Kubernetes API. See below for more information. Required if auth-token-path is not set. | N | |
kubernetes-namespace | The Kubernetes namespace in which to create jobs. Note that the account specified by the auth-token setting must have full API privileges within this namespace. See Kubernetes cluster requirements below for more information. | N | rstudio |
shared-process-namespace | Use a shared process namespace. This improves behavior when sending termination signals to processes. | N | true |
verify-ssl-certs | Whether or not to verify SSL certificates when connecting to api-url. Only applicable if connecting over HTTPS. Do not disable this option in production use. | N | 1 |
certificate-authority | Certificate authority to use when connecting to Kubernetes over SSL and when verifying SSL certificates. This must be the Base64-encoded PEM certificate reported by Kubernetes as the certificate authority in use. Leave this blank to use the system root CA store. | N | |
watch-timeout-seconds | Number of seconds before the watch calls to Kubernetes stop. It is strongly recommended to not change this value unless instructed by Posit support. | N | 180 |
fetch-limit | The maximum number of objects to request per API call from the Kubernetes Service for GET collection requests. It is recommended that you change the default only if you run into size issues with the returned payloads. | N | 500 |
use-templating | Enables the new Kubernetes object templating feature (see Kubernetes object templating below). When enabled, any configured job-json-overrides are ignored. | N | 0 |
To generate the contents of the file pointed to by auth-token-path (or the value for auth-token), run the following commands. Note that the account must first be created and given appropriate permissions (see Kubernetes cluster requirements below). The file pointed to by auth-token-path must be owned by the account configured as your server-user (usually rstudio-server).
# Look up the name of the secret that holds the job-launcher service account token.
KUBERNETES_AUTH_SECRET=$(kubectl get serviceaccount job-launcher --namespace=rstudio -o jsonpath='{.secrets[0].name}')
# Write token to file. This file must be owned by the `server-user` (usually `rstudio-server`).
kubectl get secret "$KUBERNETES_AUTH_SECRET" --namespace=rstudio -o jsonpath='{.data.token}' | base64 -d > /etc/rstudio/kubernetes.launcher.token
chmod 0600 /etc/rstudio/kubernetes.launcher.token
sudo chown rstudio-server /etc/rstudio/kubernetes.launcher.token
# Print token for copy/paste.
kubectl get secret "$KUBERNETES_AUTH_SECRET" --namespace=rstudio -o jsonpath='{.data.token}' | base64 -d
Kubernetes container auto configuration
If you are running the Launcher within a Kubernetes container, a few configuration variables can be inferred automatically by using Kubernetes-injected environment variables and files. These values are automatically added by Kubernetes when a container is launched. Therefore, it is not required to configure these options when running the Launcher within Kubernetes.
Config Option | Obtained From |
---|---|
api-url | https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} |
auth-token | /var/run/secrets/kubernetes.io/serviceaccount/token |
certificate-authority | Base64-encoded value of /var/run/secrets/kubernetes.io/serviceaccount/ca.crt |
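The mapping in the table above can be sketched in Python. This is an illustration only, not the Launcher's implementation: the function name `infer_kubernetes_settings` and its parameters are hypothetical, and the sketch assumes the standard service account mount path shown in the table.

```python
import base64
import os
from pathlib import Path

# Location where Kubernetes mounts service account credentials (from the table above).
SERVICE_ACCOUNT_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def infer_kubernetes_settings(env=None, sa_dir=SERVICE_ACCOUNT_DIR):
    """Return inferred api-url, auth-token, and certificate-authority values.

    Hypothetical helper for illustration; it is not part of the Launcher.
    """
    env = os.environ if env is None else env
    sa_dir = Path(sa_dir)

    host = env["KUBERNETES_SERVICE_HOST"]
    port = env["KUBERNETES_SERVICE_PORT"]

    # The token file holds the bearer token for the pod's service account.
    token = (sa_dir / "token").read_text().strip()
    # certificate-authority is the Base64-encoded content of ca.crt.
    ca_pem = (sa_dir / "ca.crt").read_bytes()

    return {
        "api-url": f"https://{host}:{port}",
        "auth-token": token,
        "certificate-authority": base64.b64encode(ca_pem).decode("ascii"),
    }
```

Passing `env` and `sa_dir` explicitly makes the sketch easy to try outside a cluster; inside a pod, calling it with no arguments reads the injected values.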
User and group profiles
The Kubernetes plugin also allows you to specify user and group configuration profiles, similar to Posit Workbench’s profiles, in the configuration file /etc/rstudio/launcher.kubernetes.profiles.conf (or any arbitrary file as specified in profile-config within the main configuration file; see above). These are entirely optional.
Profiles are divided into sections of three different types:
Global ([*])
Per-group ([@groupname])
Per-user ([username])
Here is an example profiles file that illustrates each of these types:
/etc/rstudio/launcher.kubernetes.profiles.conf
[*]
placement-constraints=node,region:us,region:eu
default-cpus=1
default-cpus-request=0.5
default-mem-mb=512
default-mem-mb-request=256
max-cpus=2
max-mem-mb=1024
container-images=r-session:3.4.2,r-session:3.5.0
allow-unknown-images=0
[@posit-power-users]
default-cpus=4
default-mem-mb=4096
default-nvidia-gpus=0
default-amd-gpus=0
max-nvidia-gpus=2
max-amd-gpus=3
max-cpus=20
max-mem-mb=20480
container-images=r-session:3.4.2,r-session:3.5.0,r-session:preview
allow-unknown-images=1
[jsmith]
max-cpus=3
This configuration specifies that by default users will be allowed to launch jobs with a maximum of 1024 MB of memory, and use only two different R containers. It also specifies that members of the posit-power-users group will be allowed to use many more resources, including GPUs, and will be able to see the r-session:preview image, in addition to being able to run any image they specify.
Note that the profiles file is processed from top to bottom (i.e. settings matching the current user that occur later in the file always override ones that appeared prior). The settings available in the file are described in more depth in the table below.
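The top-to-bottom override rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the Launcher's parser: the function name `effective_settings` is hypothetical, the sample text is a trimmed version of the example file above, and group membership is passed in rather than looked up from the system.

```python
import configparser

# Trimmed version of the example profiles file shown above.
EXAMPLE_PROFILES = """\
[*]
max-cpus=2
max-mem-mb=1024

[@posit-power-users]
max-cpus=20
max-mem-mb=20480

[jsmith]
max-cpus=3
"""


def effective_settings(profiles_text, user, groups):
    """Merge every section that matches the user, in file order.

    Later matching sections override earlier ones, mirroring the
    top-to-bottom processing described above.
    """
    # Only "=" is treated as a delimiter so values such as "region:us" stay intact.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(profiles_text)

    merged = {}
    for section in parser.sections():  # sections are returned in file order
        applies = (
            section == "*"
            or (section.startswith("@") and section[1:] in groups)
            or section == user
        )
        if applies:
            merged.update(parser[section])
    return merged
```

For the user jsmith in the posit-power-users group, the merged result takes max-mem-mb from the group section and max-cpus from the per-user section, because the per-user section appears last in the file.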
/etc/rstudio/launcher.kubernetes.profiles.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
container-images | Comma-separated string of allowed images that users may see and run. | N | |
default-container-image | The default container image to use for the Job if none is specified. | N | |
allow-unknown-images | Whether or not to allow users to run any image they want within their job containers, or if they have to use the ones specified in container-images. | N | 1 |
placement-constraints | Comma-separated string of available placement constraints in the form of key1:value1,key2:value2,... where the :value part is optional to indicate free-form fields. See next section for more details. | N | |
default-cpus | Number of CPUs available to a job by default if not specified by the job. | N | 0.0 (infinite - managed by Kubernetes) |
default-cpus-request | Number of CPUs requested to be available to a job by default if not specified by the job. Corresponds to the Kubernetes CPU Request. | N | 0.0 (not specified - managed by Kubernetes) |
default-mem-mb | Number of MB of RAM available to a job by default if not specified by the job. Corresponds to the Kubernetes Memory Limit. | N | 0.0 (infinite - managed by Kubernetes) |
default-mem-mb-request | Number of MB of RAM requested to be available to a job by default if not specified by the job. Corresponds to the Kubernetes Memory Request. | N | 0.0 (not specified - managed by Kubernetes) |
max-cpus | Maximum number of CPUs available to a job. Corresponds to the Kubernetes CPU Limit. | N | 0.0 (infinite - managed by Kubernetes) |
max-cpus-request | Maximum number of CPUs that can be requested. Corresponds to the Kubernetes CPU Request. | N | 0.0 (not specified - managed by Kubernetes) |
max-mem-mb | Maximum number of MB of RAM available to a job. Corresponds to the Kubernetes Memory Limit. | N | 0.0 (infinite - managed by Kubernetes) |
max-mem-mb-request | Maximum number of MB of RAM that can be requested. Corresponds to the Kubernetes Memory Request. | N | 0.0 (not specified - managed by Kubernetes) |
job-json-overrides | JSON path overrides of the generated Kubernetes Job JSON. See Modifying jobs. | N | |
cpu-request-ratio | Ratio within the range (0.0, 1.0] representing the Kubernetes container resource request to set for the CPU. This will be the ratio of the limit amount specified by the user when creating the job if no request was specified and no default was determined via profile settings. | N | 1.0 |
memory-request-ratio | Ratio within the range (0.0, 1.0] representing the Kubernetes container resource request to set for the memory. This will be the ratio of the limit amount specified by the user when creating the job if no request was specified and no default was determined via profile settings. | N | 1.0 |
default-nvidia-gpus | Number of NVIDIA GPUs available to a job by default if not specified by the job. See below for more information. | N | 0 |
default-amd-gpus | Number of AMD GPUs available to a job by default if not specified by the job. See below for more information. | N | 0 |
max-nvidia-gpus | Maximum number of NVIDIA GPUs available to a job. See below for more information. | N | 0 |
max-amd-gpus | Maximum number of AMD GPUs available to a job. See below for more information. | N | 0 |
resource-profiles | Available resource profiles. See Resource profiles. | N | |
allow-custom-resources | Whether jobs can use the custom resource profile. See Resource profiles. | N | 1 |
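To make the placement-constraints format from the table concrete, here is a small parsing sketch. It is not the Launcher's parser: the function name `parse_placement_constraints` is hypothetical, and treating a repeated key as a list of allowed values is an assumption based on the example value node,region:us,region:eu.

```python
def parse_placement_constraints(spec):
    """Parse 'key1:value1,key2' into {key: [allowed values]}.

    An empty list marks a free-form key (no ':value' part was given).
    ASSUMPTION: repeated keys accumulate their values.
    """
    constraints = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        key, sep, value = item.partition(":")
        values = constraints.setdefault(key, [])
        if sep:
            values.append(value)
    return constraints
```

Applied to node,region:us,region:eu, this yields a free-form node key and a region key limited to us and eu.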
Note that resource limits correspond to the Kubernetes container resource limits, which represent hard caps for the resources a job can use. Kubernetes allows jobs to request fewer resources and occasionally burst up to the limit amount, and this can be controlled by setting the cpu-request-ratio and memory-request-ratio settings as detailed above. Resource management in Kubernetes is a complex topic, and in general you should leave these at the default value of 1.0 unless you understand the implications of using both requests and limits. See the Kubernetes documentation on resource management for more information.
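The way a request is derived from a limit can be sketched as below. This is a reading of the cpu-request-ratio and memory-request-ratio descriptions above, not the Launcher's code: the function name `effective_request` is hypothetical, `None` stands in for "not specified", and the precedence of an explicit request over a profile default is an assumption.

```python
def effective_request(limit, request=None, default_request=None, ratio=1.0):
    """Return the resource request for a job.

    The ratio is only applied when the job specified no request and the
    profile settings supplied no default request.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be within the range (0.0, 1.0]")
    if request is not None:
        return request  # ASSUMPTION: an explicit request wins
    if default_request is not None:
        return default_request
    return limit * ratio
```

With a CPU limit of 2 and a cpu-request-ratio of 0.5, the sketch returns a request of 1.0 CPU; with the default ratio of 1.0, the request equals the limit.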
In order to provide GPUs as a schedulable resource, you must first enable the feature in Kubernetes by installing the necessary GPU drivers and device plugins supplied by your desired vendor (AMD or NVIDIA). Once available in Kubernetes, simply set the desired default and max values for the GPU type you intend to use. If not using GPUs, no GPU configuration in the profiles is necessary. For information on adding support for GPUs in Kubernetes, see the Kubernetes documentation.
Resource profiles
Resource profiles greatly simplify the task of assigning CPU, memory, or GPU resources for a job (provided that GPUs are available). They are configured in the optional /etc/rstudio/launcher.kubernetes.resources.conf file. For example:
/etc/rstudio/launcher.kubernetes.resources.conf
[default]
name = "Default" # optional, derived from the section name when absent
cpus=1
mem-mb=4096
[small]
cpus=1
mem-mb=512
[default-gpu]
cpus=1
mem-mb=4096
nvidia-gpus=1
amd-gpus=0
[hugemem]
name = "Huge Memory"
cpus=8
mem-mb=262144
By default, all profiles are available to all users, and jobs can also use a special custom profile to specify CPU, memory, and GPU resources directly instead. However, users are still subject to the constraints in User and group profiles, and administrators may also limit access to individual resource profiles with that configuration file.
For example, suppose an admin wants to restrict the resource profiles above such that (1) GPUs and large memory jobs are only available to users in the bioinformatics group; and (2) only users in the posit-power-users group can use the custom resource profile to set their own resources directly. This might result in the following /etc/rstudio/launcher.kubernetes.profiles.conf file:
/etc/rstudio/launcher.kubernetes.profiles.conf
[*]
resource-profiles=default,small
allow-custom-resources=0
[@bioinformatics]
resource-profiles=default,small,default-gpu,hugemem
[@posit-power-users]
resource-profiles=default,small,default-gpu,hugemem
allow-custom-resources=1
The settings available in each section of the /etc/rstudio/launcher.kubernetes.resources.conf file are described in more depth in the table below:
/etc/rstudio/launcher.kubernetes.resources.conf
Config Option | Description | Required (Y/N) | Default Value |
---|---|---|---|
name | A user-friendly name for the profile, e.g. Default (1 CPU, 4G mem) or m4.xlarge. | N | The section title |
cpus | The CPU limit. | Y | |
cpus-request | The CPU request. | N | |
mem-mb | The memory limit, in megabytes. | Y | |
mem-mb-request | The memory request, in megabytes. | N | |
nvidia-gpus | Number of NVIDIA GPUs, if supported by the cluster. | N | 0 |
amd-gpus | Number of AMD GPUs, if supported by the cluster. | N | 0 |
placement-constraints | Any placement constraints, as a comma-separated list of the form key1:value1,key2:value2. | N | |
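The required and default values in the table above can be checked with a short script before deploying a resources file. This is a sketch only: `load_resource_profiles` is a hypothetical helper, and stripping the surrounding quotes from name is an assumption about how the Launcher reads that value.

```python
import configparser

# Sample input based on the example resources file above.
EXAMPLE_RESOURCES = """\
[default]
name = "Default" # optional, derived from the section name when absent
cpus=1
mem-mb=4096

[small]
cpus=1
mem-mb=512
"""


def load_resource_profiles(text):
    """Parse resource profiles, enforcing the required cpus and mem-mb options."""
    parser = configparser.ConfigParser(
        interpolation=None,
        delimiters=("=",),
        inline_comment_prefixes=("#",),  # allow trailing "# ..." comments
    )
    parser.read_string(text)

    profiles = {}
    for section in parser.sections():
        options = parser[section]
        missing = [key for key in ("cpus", "mem-mb") if key not in options]
        if missing:
            raise ValueError(
                f"[{section}] is missing required option(s): {', '.join(missing)}"
            )
        profiles[section] = {
            # The display name falls back to the section title.
            "name": options.get("name", section).strip('"'),
            "cpus": float(options["cpus"]),
            "mem-mb": int(options["mem-mb"]),
            "nvidia-gpus": int(options.get("nvidia-gpus", "0")),
            "amd-gpus": int(options.get("amd-gpus", "0")),
        }
    return profiles
```

A section that omits cpus or mem-mb raises a ValueError that names the section, which makes a misconfigured file easier to spot.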
Kubernetes object templating
The new preferred method for modifying objects (jobs and services) submitted to Kubernetes is to use the new templating feature. This allows templating of the entire YAML payload that is submitted to the Kubernetes API using a syntax similar to Helm charts. This new method is easier to use than the previous job-json-overrides functionality, and allows for the use of conditional logic to make job modification dynamic, as opposed to the static transformations offered by job-json-overrides.
To use the templating feature, enable it in /etc/rstudio/launcher.kubernetes.conf:
/etc/rstudio/launcher.kubernetes.conf
use-templating=1
Launcher supports templating of these Kubernetes Resource types:
Resource | API Docs |
---|---|
Job | Job API |
Service | Service API |
Generating templates
After restarting the Launcher, job and service templates will automatically be written to the Kubernetes Launcher scratch-path (/var/lib/rstudio-launcher/Kubernetes by default), and these templates will be used to create the jobs and services that are submitted to Kubernetes when starting Launcher jobs. These templates will only be created if they do not already exist to ensure that any changes made are not overwritten.
You can also generate the templates via a command instead of having to first run the Launcher to create them. This can be done with the --generate-templates command:
sudo /path/to/rstudio-kubernetes-launcher --generate-templates
Note that the templates will be created in the scratch-path as mentioned above. Only the templates found in the scratch-path will be used for templating. The scratch-path is determined by parsing the launcher.conf and launcher.kubernetes.conf files. If these files do not yet exist, you can specify the scratch-path by adding --scratch-path <path> to the --generate-templates command.
Modifying templates
Once the templates have been generated, you can modify them as necessary for your Kubernetes cluster. For example, you can add additional annotations to jobs, add Linux capabilities to the container, or even run a side-car container.
To modify the templates, edit the job.tpl and service.tpl files that were previously generated within the scratch-path. You can add additional fields to the job, but it is strongly recommended that you do not delete fields from the template. Doing so may cause your jobs to fail to launch properly, or cause unintended subtle issues with Job Launcher functionality.
Changes to the template require either a restart of the Launcher or a SIGHUP signal. The SIGHUP signal can be sent to the process to cause the templates to be reloaded during run-time. Note that if the new changes are not valid, the original templates will continue to be used until valid changes are reloaded.
sudo kill -s SIGHUP $(pidof rstudio-kubernetes-launcher)
The object templates are modeled after Helm charts and have identical syntax. Like Helm charts, conditional logic, loops, and various functions are supported via the Sprig library. In contrast to Helm, there are no Values files and no directory structure for templates - all templates are simply stored in the scratch-path. The only values available for use in the template are available on the .Job object, which represents the job being submitted via the Launcher.
All templates must start with a comment indicating the version of the template in use, which must be compatible with the version required by the Launcher. Starting with Launcher version 2.6.0, this version follows Semantic Versioning. This is to ensure that your templates are up-to-date with what is needed by the Launcher (see Examples for details). Launcher will fail to start or load a template that does not satisfy the required version. The expected version comment is generated when invoking the --generate-templates command.
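A pre-flight check of the version comment might look like the sketch below. It is illustrative only: the function names are hypothetical, the comment format is taken from the examples in this document (# Version: 2.2.0), and the compatibility rule in `is_compatible` (same major version, template not older than the required version) is an assumption based on common Semantic Versioning practice, not the Launcher's documented rule.

```python
import re

# Matches a first line such as "# Version: 2.2.0".
VERSION_COMMENT = re.compile(r"^#\s*Version:\s*(\d+)\.(\d+)\.(\d+)\s*$")


def template_version(template_text):
    """Return (major, minor, patch) from the template's leading comment."""
    lines = template_text.splitlines()
    match = VERSION_COMMENT.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("template must start with a '# Version: X.Y.Z' comment")
    return tuple(int(part) for part in match.groups())


def is_compatible(template, required):
    """ASSUMPTION: compatible means same major version and template >= required."""
    return template[0] == required[0] and template >= required
```

The authoritative check is the one the Launcher performs at startup or on reload; the version to compare against is the one written by --generate-templates.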
The following fields are available on the .Job object:
Field | Type | Description |
---|---|---|
id | string | The unique ID for the job. Only set for service objects. |
cluster | string | The name of the Launcher cluster. |
generateName | string | Used for generating unique object names within Kubernetes to ensure object names do not collide. |
name | string | The name of the Job. |
user | string | The submitting user of the Job. |
workingDirectory | string | The working directory for the job command being executed. |
container | map | The container that will run the job command. |
container.image | string | The Docker image of the container. |
container.runAsUser | int | The UID of the container. May be nil if the default of 0 (root) is used. |
container.runAsGroup | int | The GID of the container. May be nil if the default of 0 (root) is used. |
container.supplementalGroupIds | int array | An array of supplemental GIDs for the container user. May be empty. |
host | string | The desired host to run the Job on. Usually empty. |
command | string | The Job command to execute. If empty, exe will be specified. |
exe | string | The executable to execute. If empty, command will be specified. |
stdin | string | The Job’s stdin. |
args | string array | The arguments for the command/executable to be run. |
placementConstraints | map array | The placement constraints to be used for deciding where the job should be run. |
placementConstraint.name | string | The name of the Placement Constraint. Example: availability-zone, region, etc. |
placementConstraint.value | string | The value of the Placement Constraint. Example: us-east-2, m3xlarge, etc. |
exposedPorts | map array | The ports that should be exposed/opened for the job. |
exposedPort.protocol | string | The protocol for the port (TCP or UDP). |
exposedPort.targetPort | string | The port within the container to expose/open. |
metadata | map | The arbitrary JSON object that was provided via the metadata field. Used to apply overrides to annotations, labels, init containers, etc. |
mounts | map array | The requested mounts for the job. This is generally complicated to work with to transform into Kubernetes objects, so volumes and volumeMounts are provided. |
mount.mountPath | string | The path where the mount should be mounted to. |
mount.readOnly | bool | Whether or not the mount should be read-only. |
mount.mountSource | map | The description of the mount itself. |
mount.mountSource.type | string | The type of the mount (e.g. host, nfs, etc.). |
mount.mountSource.source | map | The underlying description of the type of the mount. Varies by mount type. |
config | map array | Job-specific config unique to the Kubernetes Launcher. Currently used to specify secret env vars. |
config.name | string | The name of the Job config. |
config.value | string | The value of the Job config. |
resourceLimits | map array | The resource limits to be used for the Job. |
resourceLimit.type | string | The type of the resource limit (memory, cpuCount, NVIDIA GPUs, or AMD GPUs). |
resourceLimit.value | string | The value of the resource limit. |
volumes | map array | The Kubernetes volumes that should be mounted. This is constructed from the requested Job mounts. |
volume.name | string | The unique name of the volume. |
volume.? | map | The sub-object describing the volume. This varies based on the type of the volume being mounted. |
volumeMounts | map array | The Kubernetes volume mounts describing how volumes should be mounted. This is constructed from the requested Job mounts. |
volumeMount.name | string | The name of the volume to be mounted. Must match a volume.name. |
volumeMount.mountPath | string | The path where the mount should be mounted to. |
volumeMount.readOnly | bool | Whether or not the mount should be read-only. |
tags | string array | The tags for the job. |
servicePortsJson | string | Used by the Launcher to ensure that services are created properly after a restart. |
shareProcessNamespace | bool | Reflects the current shared-process-namespace configuration setting to control if a container should use shared process namespacing. |
memoryRequestRatio | decimal | Reflects the memory-request-ratio as specified in the User/Group profiles. Deprecated - do not use. |
cpuRequestRatio | decimal | Reflects the cpu-request-ratio as specified in the User/Group profiles. Deprecated - do not use. |
serviceAccountName | string | The Kubernetes Service Account specified for the Job Pods, if any. |
The following template functions are available for use in addition to the previously mentioned Sprig template functions:
Name | Description | Example |
---|---|---|
include | Renders the specified template file (other than job.tpl and service.tpl) with the specified values and returns the result. | {{ include "custom.tpl" . }} |
toYaml | Renders the specified object as YAML. | {{ toYaml .Job.volumes }} |
exec | Executes the specified command or executable with the given arguments, returning an ExecResult type, which if rendered directly returns the stdout for the process. See Examples below. | {{ exec "echo" "Hello, world" }} |
groups | Returns a list of groups for the specified user by shelling out to the groups Linux command. | {{ groups .Job.user }} |
To see what has been modified in the template files, you can use the --diff-templates command. This will show the output of the diff Linux command comparing the changes that you’ve made to the original templates generated with the --generate-templates command. Note that the diff command must be present on the PATH in order to use this functionality.
sudo /path/to/rstudio-kubernetes-launcher --diff-templates
Examples
Adding host aliases to the job
Host Aliases are defined on spec.template.spec.
job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
(...omitted for brevity...)
spec:
  backoffLimit: 0
  template:
    (...omitted for brevity...)
    spec:
      hostAliases:
        - ip: "10.2.141.12"
          hostnames: ["db01"]
        - ip: "10.2.141.13"
          hostnames: ["db02"]
Adding Linux capabilities to the job container
Capabilities are defined on spec.template.spec.securityContext. The securityContext is conditionally constructed and is not present for all jobs, so you need to modify the template so that it is always constructed with your desired capabilities.
job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
(...omitted for brevity...)
spec:
  backoffLimit: 0
  template:
    (...omitted for brevity...)
    spec:
      (...omitted for brevity...)
      securityContext:
        {{- if $securityContext }}
        {{- range $key, $val := $securityContext }}
        {{ $key }}: {{ $val }}
        {{- end }}
        {{- end }}
        capabilities:
          add: ["NET_ADMIN", "SYS_PTRACE"]
Adding an ImagePullSecret
imagePullSecrets are defined on spec.template.spec.
job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
(...omitted for brevity...)
spec:
  backoffLimit: 0
  template:
    (...omitted for brevity...)
    spec:
      imagePullSecrets:
        - name: mysecret
Custom annotation with dynamic execution
It can be useful to annotate jobs with special annotations depending on certain business logic. This business logic can be encapsulated in a command or executable that you run that decides the value of the annotation. The following example shows what it might look like to fetch the organizational cost center for a user given their user name and a custom application maintained by the business.
job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        my.org/costCenter: {{ exec "get-user-details" "--cost-center" .Job.user }}
The hypothetical get-user-details command would then write the user’s cost center to stdout when invoked, which would cause it to be stamped as an annotation on the job as my.org/costCenter, which could be used with various Kubernetes reporting tools.
Using the exec function, it is also possible to obtain more information about the result of the invoked process from the ExecResult return value:
job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        {{- $res := exec "get-user-details" "--cost-center" .Job.user }}
        {{- if eq $res.ExitCode 1 }}
        my.org/costCenter: "UNKNOWN: {{ $res.Err }}"
        {{- else }}
        my.org/costCenter: {{ $res.Stdout }}
        {{- end }}
The following fields are available on an ExecResult:
Field | Type | Description |
---|---|---|
ExitCode | int | The exit code of the process. If the process encountered an error before running or exited due to signal, this is set to -1. |
Err | error | The error that occurred while running the process, if any. May be nil. |
Stdout | string | The stdout of the process. |
Stderr | string | The stderr of the process. |
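The ExecResult semantics in the table can be illustrated with a Python analogue. This is not the Launcher's implementation and the names `ExecResult` and `exec_command` here are only a sketch; in particular, setting Err for a non-zero exit code is an assumption drawn from the template example above, which renders Err when the exit code is 1.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExecResult:
    ExitCode: int
    Err: Optional[Exception]
    Stdout: str
    Stderr: str


def exec_command(*argv):
    """Run a command and report the outcome using the fields from the table above."""
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True)
    except OSError as err:
        # The process could not be started at all.
        return ExecResult(-1, err, "", "")

    if proc.returncode < 0:
        # subprocess reports termination by signal as a negative return code.
        err = RuntimeError(f"terminated by signal {-proc.returncode}")
        return ExecResult(-1, err, proc.stdout, proc.stderr)

    # ASSUMPTION: a non-zero exit status is also surfaced through Err.
    err = None
    if proc.returncode != 0:
        err = RuntimeError(f"exit status {proc.returncode}")
    return ExecResult(proc.returncode, err, proc.stdout, proc.stderr)
```

A template that branches on ExitCode, as in the example above, should therefore handle three cases: 0 for success, a positive exit code reported by the command, and -1 when the command could not run or was killed by a signal.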
Add annotation for group members
In this example, we use the groups template function to determine if the job user belongs to the admin-grp group, adding an annotation if so.
job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        {{- $grp := groups .Job.user }}
        {{- if has "admin-grp" $grp }}
        my.org/admin: "true"
        {{- end }}
IAM permissions with AWS roles for EKS service accounts
When running the Launcher in an EKS cluster within AWS, you can set up IAM roles to be assumed by running Launcher jobs. The setup within AWS requires the following:
Enabling the OIDC provider for your EKS cluster.
Various IAM roles which can be assumed. Each of these roles will be associated with a specific Kubernetes service account.
Various Kubernetes service accounts, one for each IAM role you want assumable by Launcher jobs.
For details on how to create these within AWS, see the AWS EKS Documentation. Once these are set, you can use the templating feature to determine which service account to use for your jobs, thus allowing EKS to automatically associate AWS credentials with the started pods to allow your users to access IAM-protected resources.
For example, let’s assume you have written a script called get-kube-svc-account that maintains a many-to-one mapping of users to Kubernetes service accounts (which implies a mapping of users to IAM roles). We can modify the job template to stamp the correct service account on the running pod by invoking the script like so.
job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
      (...omitted for brevity...)
      generateName: {{ toYaml .Job.generateName }}
    spec:
      {{- $res := exec "get-kube-svc-account" .Job.user }}
      {{- if eq $res.ExitCode 0 }}
      serviceAccountName: {{ $res.Stdout }}
      {{- end }}
      (...omitted for brevity...)
      {{- $res := exec "id" "-g" .Job.user }}
      {{- if eq $res.ExitCode 0 }}
      {{- $_ := set $securityContext "fsGroup" $res.Stdout }}
      {{- end }}
      {{- if $securityContext }}
      securityContext:
        {{- range $key, $val := $securityContext }}
        {{ $key }}: {{ $val }}
        {{- end }}
      {{- end }}
The fsGroup parameter is required to ensure that the secret IAM temporary credential files are able to be read by the user running within the pod. Without this, only the root user would be able to read these files, preventing your user from accessing AWS resources. Care must be taken to ensure that the running user *actually* belongs to the fsGroup, as this group ID is added to the running pod user. A mistake here could inadvertently give the user access to other files unintentionally, such as any shared files that are mounted with a volume.
Though a hypothetical mapping script was used for this example, you could use a more robust approach, such as calling out to a service that maintains a mapping, or by managing the mappings in LDAP or some other user database.
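For illustration, the lookup logic of such a hypothetical mapping script could be as small as the sketch below. Every name in it is invented for the example: the user names, the service account names, and the `main` function are assumptions, and a real deployment would more likely query a service or directory as described above.

```python
import sys

# Hypothetical many-to-one mapping of users to Kubernetes service accounts.
USER_TO_SERVICE_ACCOUNT = {
    "jsmith": "analytics-readonly",
    "adoe": "analytics-readonly",
    "mlee": "ml-training",
}


def main(argv, mapping=USER_TO_SERVICE_ACCOUNT, out=sys.stdout):
    """Write the service account for the given user and return an exit code.

    Returns 0 on success and 1 when the user is unknown or the arguments
    are wrong, so a template can branch on ExitCode.
    """
    if len(argv) != 1 or argv[0] not in mapping:
        return 1
    # Written without a trailing newline; whether the Launcher trims the
    # output of exec is not stated in this document.
    out.write(mapping[argv[0]])
    return 0
```

An executable wrapper would pass the command-line arguments to `main` and exit with its return value, so that an unknown user produces a non-zero exit code and the template leaves serviceAccountName unset.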
Microsoft Entra ID tokens for AKS service accounts (Azure Workload Identity)
Similar to the previous example on AWS IAM roles, Azure provides Azure Workload Identity to allow access of Azure resources within AKS pods. The setup within Azure requires the following:
Enabling the OIDC provider for your AKS cluster.
Installing the Mutating Admission Webhook within the cluster.
Creation of Azure resources that you will need to access within AKS. Each resource must specify the service principal of a specific Azure Active Directory application which will be tied to an AKS service account that will be used when running Launcher jobs.
Various AKS service accounts that have been associated with the aforementioned Azure Active Directory application.
For details on how to create these within Azure, see the Azure Workload Identity documentation. Once these are set, you can use the templating feature to determine which service account to use for your jobs (similar to the previous example), thus allowing your users access to your Azure resources within Launcher jobs. Use the example in the AWS IAM section to start your Launcher jobs with the desired AKS service account.
IAM permissions with GKE (GKE Workload Identity)
GKE also allows access to IAM-protected resources within Google Cloud. The setup within Google Cloud requires the following:
Enabling Workload Identity on your GKE cluster.
Enabling Workload Identity on your GKE node pool.
One or more IAM service accounts that have associated IAM roles providing access to the Google Cloud resources you want available in your Launcher jobs.
Various GKE service accounts, one for each IAM service account that provides the previously mapped IAM roles to the account.
For details on how to create these within Google Cloud, see the GKE Workload Identity documentation. Once these are set, you can use the templating feature to determine which service account to use for your jobs (similar to the previous examples), thus allowing your users access to your Google Cloud resources within Launcher jobs. Use the example in the AWS IAM section to start your Launcher jobs with the desired GKE service account.
You will also need to add a nodeSelector to the job pod (via the job.tpl file) to ensure that the GKE metadata server is accessible from the node(s) that will be running your jobs.
job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        (...omitted for brevity...)
      generateName: {{ toYaml .Job.generateName }}
    spec:
      nodeSelector:
        iam.gke.io/gke-metadata-server-enabled: "true"
IAM permissions with kiam (deprecated)
You can extend the above example to include IAM roles per group using kiam. The following example assigns users in the group admin-grp to an existing IAM role (rs-admin-role). All other users are given a different IAM role (rs-user-role).
job.tpl
# Version: 2.2.0
apiVersion: batch/v1
kind: Job
metadata:
  generateName: {{ toYaml .Job.generateName }}
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        {{- $grp := groups .Job.user }}
        {{- if has "admin-grp" $grp }}
        my.org/admin: "true"
        iam.amazonaws.com/role: "rs-admin-role"
        {{- else }}
        iam.amazonaws.com/role: "rs-user-role"
        {{- end }}
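The conditional in the template above boils down to a simple group check. The following Python sketch mirrors that logic purely for illustration; annotations_for is a hypothetical helper and is not part of Launcher, which evaluates the template itself.

```python
# Illustrative only: mirrors the if/else in the job.tpl annotations block.
# `annotations_for` is a hypothetical helper, not part of Launcher.

def annotations_for(user_groups):
    """Return the pod annotations the template would render for a user."""
    if "admin-grp" in user_groups:
        return {
            "my.org/admin": "true",
            "iam.amazonaws.com/role": "rs-admin-role",
        }
    # Every user outside admin-grp falls through to the default role.
    return {"iam.amazonaws.com/role": "rs-user-role"}


print(annotations_for(["admin-grp", "staff"]))
print(annotations_for(["staff"]))
```

A user who belongs to admin-grp receives both annotations, while every other user receives only the rs-user-role annotation.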
Change the Kubernetes Service type
Launcher supports configuring a Kubernetes Service with a type of ClusterIP, LoadBalancer, or NodePort. Launcher will create a Kubernetes Service only when a job definition includes one or more exposedPorts. Please reference the Launcher API documentation for further details.
To make the Service reachable only from within the cluster, set the type field to ClusterIP, as shown in the following example.
service.tpl
# Version: 2.2.0
apiVersion: v1
kind: Service
metadata:
  (...omitted for brevity...)
spec:
  (...omitted for brevity...)
  clusterIP: ''
  type: ClusterIP
On cloud providers that support external load balancers, setting the type field to LoadBalancer provisions a load balancer for your Service. Consult the documentation from your cloud provider for details on support for additional parameters (e.g. loadBalancerIP or loadBalancerClass).
Using the LoadBalancer service type may result in new infrastructure being provisioned by your cloud provider. Your cloud provider will charge you for any infrastructure that is provisioned.
service.tpl
# Version: 2.2.0
apiVersion: v1
kind: Service
metadata:
  (...omitted for brevity...)
spec:
  (...omitted for brevity...)
  clusterIP: ''
  type: LoadBalancer
AWS Fargate
Launcher supports running jobs on AWS Fargate. Please review the AWS Fargate Considerations and the Fargate pod configuration to understand the limits of scheduling pods.
Pods scheduled on AWS Fargate may take 10 minutes or longer to start. Using a larger instance and a smaller container image can result in a faster startup time.
- Create one or more Fargate Profiles.
- Configure the Launcher Kubernetes Plugin for one or more Launcher Clusters.
- Modify the Kubernetes Templates for the relevant Launcher Cluster(s):
  - The Kubernetes Service type must be one of ClusterIP or LoadBalancer. Fargate does not support NodePort.
  - If your Fargate Profile specifies Kubernetes labels to match, you can use placement constraints to conditionally add these labels to the job template.
    - Configure a placement constraint in the appropriate profiles configuration file.
      launcher.kubernetes.profiles.conf
      [*]
      placement-constraints=eks.amazonaws.com/compute-type:fargate
    - The matching label(s) need to be added to the pod metadata (spec.template.metadata.labels). In this example, the Fargate profile is configured to match the label node-type=fargate.
      service.tpl
      spec:
        (...omitted for brevity...)
        template:
          (...omitted for brevity...)
          metadata:
            (...omitted for brevity...)
            labels:
              (...potential additional labels...)
              {{- range .Job.placementConstraints -}}
              {{- if and (eq .name "eks.amazonaws.com/compute-type") (eq .value "fargate")}}
              node-type: fargate
              {{- end }}
              {{- end }}
AWS Fargate schedules each Pod within a Virtual Machine (VM). This VM will not be terminated until the Kubernetes Job is deleted.
AWS will charge you for these EC2 instances until the Jobs are deleted. To ensure these nodes are cleaned up promptly, lower the value of job-expiry-hours in the relevant plugin configuration file.
To clean up Launcher jobs immediately after completion, use the following configuration, which will terminate the associated EC2 instance when the Kubernetes Job is removed.
job-expiry-hours=0
Validating templates
Once changes have been made to templates, it is recommended to run the --validate-templates command to ensure that the changes are valid YAML and that no important Job Launcher pieces have been tampered with inadvertently. This command generates a test Job payload and renders the templates in the scratch-path.
sudo /path/to/rstudio-kubernetes-launcher --validate-templates
If any errors are found during validation, they will be reported so that the templates can be modified to fix them. The validator tests for the following issues:
- Is the template itself valid? Do all functions exist and is all syntax correct?
- Is the rendered template YAML valid?
- Is there any extra white space on the YAML object?
- Are all pieces of the object needed by the Job Launcher available and of the correct type?
- Are all required hard-coded values set correctly?
Errors produced during validation are strongly indicative of issues that should be addressed, but the Launcher does not require validation to complete without errors before using the templates.
It is also sometimes desirable to inspect the actual YAML that is generated after templating. This can be done by adding --verbose to the --validate-templates command detailed above.
JSON overrides
Whenever a job is submitted to the Kubernetes Launcher plugin, a JSON job object is generated and sent to Kubernetes. In some cases, it may be desirable to add or modify fields within this automatically generated JSON blob.
In order to do that, you may specify job-json-overrides within the profiles file. The value is a comma-separated list of pairs in the following form:
"{json path}":"{path to json value file}","{json path 2}":"{path to json value file 2}",...
The JSON path should be a valid JSON Pointer as specified in the JSON Pointer RFC (RFC 6901).
The JSON value path specified must be a file readable by the service user, and must contain valid JSON. For example, to add Host Aliases to all submitted jobs:
/etc/rstudio/launcher.kubernetes.profiles.conf
job-json-overrides="/spec/template/spec/hostAliases":"/etc/rstudio/kube-host-aliases"
/etc/rstudio/kube-host-aliases
[
  {
    "ip": "10.2.141.12",
    "hostnames": ["db01"]
  },
  {
    "ip": "10.2.141.13",
    "hostnames": ["db02"]
  }
]
Because the pod itself is nested within the Kubernetes Job object, it is located at the path /spec/template/spec. In the example above, we simply add a JSON array of HostAlias objects as defined by the Kubernetes API. See the Kubernetes API Documentation for an exhaustive list of fields that can be set.
Any fields specified via job-json-overrides will overwrite already existing fields in the auto-generated job spec. Note that the Kubernetes Launcher plugin requires certain fields to be set in order to properly parse saved job data. It is strongly recommended that you use the job-json-overrides feature sparingly, and only use it to add additional fields to the automatically generated job object when necessary.
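To make the overwrite behavior concrete, here is a minimal Python sketch of how a JSON Pointer addresses a location in a job object and how a value placed there replaces whatever was present. It is an illustration only and not the plugin's implementation: parse_pointer and apply_override are hypothetical names, the job object is a heavily trimmed stand-in, and the creation of missing intermediate objects is an assumption made for this sketch.

```python
import copy
import json


def parse_pointer(pointer):
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    # Per RFC 6901, '~1' decodes to '/' first, then '~0' decodes to '~'.
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_override(job, pointer, value):
    """Return a copy of `job` with `value` set at `pointer`.

    Assumption for this sketch: missing intermediate objects are created,
    and an existing value at the target is overwritten. Only JSON objects
    (dicts) are traversed; array indices are not handled.
    """
    tokens = parse_pointer(pointer)
    if not tokens:
        return copy.deepcopy(value)
    result = copy.deepcopy(job)
    node = result
    for token in tokens[:-1]:
        node = node.setdefault(token, {})
    node[tokens[-1]] = value
    return result


# A heavily trimmed stand-in for the automatically generated Job object.
job = {"spec": {"template": {"spec": {"containers": []}}}}
# Contents that would be read from the JSON value file.
host_aliases = json.loads('[{"ip": "10.2.141.12", "hostnames": ["db01"]}]')

patched = apply_override(job, "/spec/template/spec/hostAliases", host_aliases)
print(json.dumps(patched["spec"]["template"]["spec"], indent=2))
```

Because the value is written at the exact location the pointer names, pointing an override at a field the plugin already generates would replace that field wholesale, which is why adding new fields is the safer use.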
Kubernetes cluster requirements
In order for the Kubernetes plugin to run correctly, the following assumptions about the Kubernetes cluster must be true:
- The Kubernetes API must be enabled and reachable from the machine running the Job Launcher
- There must be a namespace to create jobs in, which can be specified via the kubernetes-namespace configuration mentioned above (this defaults to rstudio)
- There must be a service account that has full API access for all endpoints and API groups underneath the aforementioned namespace, and the account’s auth token must be supplied to the plugin via the auth-token setting
- The service account must have access to view the nodes list via the API (optional, but will restrict IP addresses returned for a job to the internal IP if not properly configured, as /nodes is needed to fetch a node’s external IP address)
- The cluster must have the metrics-server add-on running and working properly to provide job resource utilization streaming
In order to use placement constraints, you must attach labels to the node that match the given configured placement constraints. For example, if you have a node with the label az=us-east and have defined the placement constraint az:us-east, incoming jobs specified with the az:us-east placement constraint will be routed to the desired node. For more information on Kubernetes’ placement constraints, see the Kubernetes documentation.
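The relationship between a name:value placement constraint and a name=value node label can be sketched in Python as follows. This only illustrates the matching rule described above; actual scheduling is performed by Kubernetes, and the node names, labels, and helper functions here are hypothetical.

```python
def parse_constraint(constraint):
    """Split a 'name:value' placement constraint into (name, value)."""
    name, _, value = constraint.partition(":")
    return name, value


def node_satisfies(node_labels, constraints):
    """True if the node carries a matching label for every constraint."""
    return all(
        node_labels.get(name) == value
        for name, value in map(parse_constraint, constraints)
    )


# Hypothetical nodes and the labels attached to them.
nodes = {
    "node-a": {"az": "us-east"},
    "node-b": {"az": "us-west"},
}
eligible = [
    name for name, labels in nodes.items()
    if node_satisfies(labels, ["az:us-east"])
]
print(eligible)
```

In this sketch only node-a is eligible for a job submitted with the az:us-east constraint, since node-b carries a different value for the az label.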
The following sample script can be run to create a job-launcher service account and rstudio namespace, granting the service account (and thus, the launcher) full API access to manage Workbench jobs:
kubectl create namespace rstudio
kubectl create serviceaccount job-launcher --namespace rstudio
cat > job-launcher-role.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-launcher-role
  namespace: rstudio
rules:
  - apiGroups:
      - ""
    resources:
      - "pods"
      - "pods/log"
      - "pods/attach"
      - "pods/exec"
    verbs:
      - "get"
      - "create"
      - "update"
      - "patch"
      - "watch"
      - "list"
      - "delete"
  - apiGroups:
      - ""
    resources:
      - "events"
    verbs:
      - "watch"
  - apiGroups:
      - ""
    resources:
      - "services"
    verbs:
      - "create"
      - "get"
      - "watch"
      - "list"
      - "delete"
  - apiGroups:
      - "batch"
    resources:
      - "jobs"
    verbs:
      - "create"
      - "update"
      - "patch"
      - "get"
      - "watch"
      - "list"
      - "delete"
  - apiGroups:
      - "metrics.k8s.io"
    resources:
      - "pods"
    verbs:
      - "get"
  - apiGroups:
      - ""
    resources:
      - "serviceaccounts"
    verbs:
      - "list"
EOF
kubectl create -f job-launcher-role.yaml; rm job-launcher-role.yaml
kubectl create rolebinding job-launcher-role-binding --namespace rstudio \
--role=job-launcher-role \
--serviceaccount=rstudio:job-launcher
kubectl create clusterrole job-launcher-clusters \
--verb=get,watch,list \
--resource=nodes
kubectl create clusterrolebinding job-launcher-list-clusters \
--clusterrole=job-launcher-clusters \
--group=system:serviceaccounts:rstudio
It should be noted that the ClusterRole created above is only used to get information about the nodes in the cluster that can run Launcher jobs. This is sometimes necessary so that the Launcher can determine all of the IP addresses that belong to a node and report them properly to upstream clients of the Launcher, which ensures that external clients can connect to their Launcher jobs as required. If you ensure that all clients are connecting to the Launcher internally within the same network segment (meaning that clients can connect to the internal IP address of the Kubernetes node), you can forgo the ClusterRole and ClusterRoleBinding. If you run into problems where clients cannot connect to their Launcher jobs, you may need the external IP address of the node(s) in your cluster, which will require the ClusterRole to be given to the Launcher service account.