RStudio Jobs
RStudio has the ability to send long running R scripts to local and remote background jobs. This functionality can dramatically improve the productivity of data scientists and analysts using R since they can continue working in RStudio while jobs are running in the background.
Local background jobs are supported by all versions of the RStudio IDE, server and desktop. Remote background jobs are a feature of Posit Workbench and are orchestrated by the Launcher, which also supports running interactive sessions on remote resource managers like Kubernetes. More information about configuring the Job Launcher can be found in the Posit Workbench Admin Guide, and more information about using Workbench Jobs can be found in the Posit Workbench User Guide.
Background jobs can be started manually or programatically. Local background jobs are ideal for interactive, ad-hoc usage.
Background jobs
A “background job” is an R script that runs in a separate, dedicated R session. Any R script can be run in a separate session by:
from the Background Job tab in the Console, select Start Background Job.
Or from within a .R file, click the Source menu and select Source as Background Job.
Either method will give you some options for running your job.
By default, the job runs in a clean R session, and its temporary workspace is discarded when the job is complete. This is the fastest and safest configuration, suitable for reproducible scripts without side effects. Because the job runs as a child process of the current session, any environment variables set at the process-level or shell-level are passed to the background job from the main session.
However, if you want to feed data from your current R session into the job, or have the job return data to your current R session, change the dialog options as follows:
Run job with copy of global environment:
If selected, this option saves your global environment and loads it into the job’s R session before it runs. This is useful because it will allow your job to see all the same variables you can see in the IDE. Note that this can be slow if you have large objects in your environment.
Copy job results:
By default, the temporary workspace in which the job runs is not saved. If you’d like to import data from your job back into your R session, you have a couple of choices:
- Global environment: This places all the R objects your job creates back in your R session’s global environment.
Use Copy jobs results to Global environment option with caution! The objects created by the job will overwrite, without a warning, any objects that have the same name in your environment.
- Results object: This places all the R objects your job creates into a new environment named with the following pattern “yourscriptname” + “_results”. For example, a script titled
training.R
executed as Background Job would be saved back into the global environment as an object titledtraining_results
.
An additional option is to include specific R code to save out the results to files on disk such as saveRDS(object, "results.rds")
.
Lifetime
Background jobs run as non-interactive child R processes of your main R process, which means that they will be shut down if R is. While the R session is running jobs:
You will be warned if you attempt to close the window while jobs are still running (on RStudio Desktop)
Your R session will not be suspended (on Posit Workbench)
While background jobs are running, a progress bar will appear in the R console summarizing the progress of all running jobs.
Detailed progress
The progress bar RStudio shows for your job represents the execution of each top-level statement in your R script. If you want a little more insight into which part of the script is currently running, you can use RStudio’s code sections feature. Add a section marker like this to your R script:
# Apply the model ----
When your job reaches that line in your script, the name of the section will appear on the progress bar.
You can also emit output using the usual R mechanisms, such as: - print
- message
- cat
This output appears in the Jobs pane when you select your job.
Scripting
You can script the creation of jobs using the rstudioapi package method jobRunScript; it has options which correspond to each dialog option above. This makes it possible to automate and orchestrate more complicated sets of background tasks.
However, the RStudio’s background job runner is generally designed for interactive script runs. If you are writing R code and need to run a subtask asynchronously in a background R session, we recommend using the callr package instead.
Showing task progress
RStudio’s Jobs pane can show more than just the progress of Background or Workbench jobs. It can also be scripted from R packages (and R code) to show status, progress, and output for any long-running task.
To show progress and/or output from a task using the jobs UI, refer to the rstudioapi documentation for details; start with jobAdd()
, which creates a new job in the UI and returns a handle you can use to update the UI as the job progresses.
Interact with the jobs pane. | |
---|---|
jobAdd() |
Add a Job |
jobAddOutput() |
Add Background Job Output |
jobAddProgress() |
Add Background Job Progress |
jobRemove() |
Remove a Background Job |
jobSetProgress() |
Set Background Job Progress |
jobSetState() |
Set Background Job State |
jobSetStatus() |
Set Background Job Status |
Workbench (Launcher) jobs
On Posit Workbench, you also have the option of running your R script on your company’s compute infrastructure, using a Workbench Job. To do this, from the Source drop-down menu, select Source as Workbench Job:
See more documentation specific to Workbench Jobs: