Process Management
Posit Connect launches processes to perform a variety of tasks. This includes:
- Installing Python and R packages
- Rendering reports such as Jupyter Notebooks, and R Markdown and Quarto documents
- Running Shiny, Dash, Streamlit, or Bokeh Applications
- Running APIs using Flask, FastAPI, Plumber, or TensorFlow Serving
- Customizing a parameterized R Markdown document
Python installations are configured by setting Python.Executable. Quarto installations are configured by setting Quarto.Executable. The location of R defaults to whichever R is found on the PATH. Customize the R.Executable setting to use a specific R installation. See the Python, R, and Quarto sections for details.
Sandboxing
This section about process sandboxing only applies to the local execution mode of Posit Connect.
If you are running Posit Connect with off-host Kubernetes execution, the Posit Connect container does not need to be run as privileged or with the CAP_SYS_ADMIN capability. When using off-host Kubernetes execution, isolation between content items is handled using separate containers and Kubernetes VolumeMounts. An overview of this architecture is available in the appendix.
The Posit Connect process runs as the root user. These elevated privileges let Connect bind to protected ports, manage files and processes for your RunAs users, and create “unshare” environments where content processes are run.
Posit Connect uses unprivileged users to run any content process. Content will run using either a system-default user or a content-specific override. See the User Account for Processes section for details. Content processes are also run within an “unshare” environment, which partially isolates those processes.
Posit Connect runs processes within an environment constructed using unshare(2). Within this environment, a mount namespace (CLONE_NEWNS) is constructed, which lets Connect create bind mounts with mount(2) that hide and isolate parts of the file system from the content process. A user namespace (CLONE_NEWUSER) is created to partially isolate the process.
If you are running Posit Connect within a container using its local execution mode, the container must be started with additional privileges. The Docker section discusses privileged containers and the capabilities needed by Posit Connect.
Some systems disable user namespaces by default (e.g. Red Hat Enterprise Linux 7). Content process sandboxes will not include a user namespace on these systems. Enable user namespaces for additional process isolation.
You can use sysctl to determine if user.max_user_namespaces is set to zero, indicating that support is disabled.
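As a minimal sketch of that check, the following snippet reads the value with sysctl and classifies it. The userns_status helper is a hypothetical name introduced for illustration; where the key or sysctl itself is unavailable, the result is reported as unknown rather than guessed.

```shell
# Hypothetical helper: classify the value of user.max_user_namespaces.
userns_status() {
  case "$1" in
    '' | *[!0-9]*) echo "unknown" ;;   # key missing or not a number
    0) echo "disabled" ;;              # zero means user namespaces are off
    *) echo "enabled" ;;
  esac
}

# sysctl -n prints only the value; errors are ignored so the sketch still
# runs where the key (or sysctl itself) is unavailable.
value="$(sysctl -n user.max_user_namespaces 2>/dev/null || true)"
echo "user namespaces: $(userns_status "$value")"
```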
Sandboxing references
These references link to the Linux documentation about namespaces and the system calls used to create the process sandbox.
File system access and isolation
The following locations are masked during content execution:
- The Server.DataDir directory containing all variable data used by Posit Connect.
- The SQLite.Dir directory, which can optionally be placed outside the data directory.
- Configuration directories, including /etc/rstudio-connect.
- The Server.TempDir directory, which contains a connect-workspaces sub-directory with per-process temporary directories.
The following information is exposed during content execution:
- Data directories containing installed packages and environments. The exact set of directories will vary depending on the type of content.
- The directory containing the deployed and unpackaged code.
- The document rendering destination directory (only for executable reports).
- A directory to receive log files containing process output.
- A per-process temporary directory specified in the TMPDIR environment variable of the process. This temporary directory is created under Server.TempDir/connect-workspaces.
When Applications.HomeMounting is enabled, the contents of /home are masked by an additional bind mount as follows:
- The contents of /home are masked by the home directory of the RunAs user.
- If the RunAs user does not have a home directory, an empty directory masks /home.
The path to the home directory is always available through the HOME environment variable. With Applications.HomeMounting, the mounted path to the HOME directory is subject to change. Avoid hard-coding paths to either /home or /home/username.
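As an illustration of relying on HOME instead of a hard-coded path, here is a minimal shell sketch. The home_path helper and the file name passed to it are hypothetical and exist only for this example.

```shell
# Hypothetical helper: resolve a file beneath the home directory of the
# user running the process, without assuming where that directory lives.
home_path() {
  printf '%s/%s\n' "${HOME%/}" "$1"
}

# Prints a path under whatever HOME is at runtime.
home_path ".config/myapp/settings.ini"
```

The same idea applies in R and Python content: read HOME from the environment at runtime instead of embedding /home/username in code.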
Interactive applications
Interactive applications like Shiny, Flask, FastAPI, Dash, and Plumber have write access to the directory containing the unpackaged code. This directory is the working directory when launching an application. Data written into this directory is visible to all processes associated with that application but not to processes associated with other content. Application directory data remains available until that application is next deployed to Posit Connect. A deployment creates a new application directory containing only the deployed content.
Posit Connect may launch multiple processes to service requests for an application. There is no coordination between these processes. Applications that write to local files could experience problems when different processes attempt to write to a single file.
For example, two different processes writing to the same file may see output incorrectly interleaved or even overwritten.
We do not recommend using the file system for data persistence.
Rendered documents
Executable reports have write access to a directory containing their source code and a separate output directory that receives the rendered result. A new directory containing the deployed source code is created each time the content is executed. This copy of the code ensures that simultaneous rendering processes are isolated from each other and cannot overwrite each other’s output files. The temporary source directory is the working directory when rendering the content. A distinct output directory is used each time the content is rendered. Data created during one rendering is not visible to another.
Posit Connect serves rendered content using the results written into the output directory. This result remains available until the next successful render. Incomplete and unsuccessful document renderings do not affect the availability of a previously rendered result.
Rendered sites
R Markdown multi-document sites have a slightly different rendering pipeline than standalone documents. Posit Connect uses the rmarkdown::render_site function, which does its rendering in-place. The content from the source directory is copied into the rendering destination directory in preparation for rendering. Site rendering has write access to the destination directory. Access to the original source directory is not provided because the source content is duplicated in the destination directory.
The rmarkdown::render_site call usually places its output into a subdirectory (typically, _site). The contents of this output subdirectory will be moved to the root of the rendering destination directory, replacing any other content. No post-rendering file movement occurs if rmarkdown::render_site is instructed to render into the current directory instead of a subdirectory. This means that both source and output files will be available for serving.
We recommend against configuring rmarkdown::render_site to write its output into the current directory. Rendering the site into a subdirectory (the default) allows Posit Connect to remove source files from the output directory.
Posit Connect serves the rendered results for a multi-document site. This result remains available until the next successful render. Incomplete and unsuccessful site renderings do not affect the availability of a previously rendered result.
Temporary Directory
Each process started by Posit Connect is given its own unique temporary directory. These directories are created under Server.TempDir/connect-workspaces.
The default value for Server.TempDir is taken from the TMPDIR environment variable when it is set, and falls back to /tmp otherwise.
You may wish to override Server.TempDir if the default temporary directory has too little space or is mounted with the noexec option.
If you do override Server.TempDir, ensure the location can be reached, read, and written by any user on the system. On most systems, temporary directories have permissions of 1777.
You can learn more about the noexec option here.
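The two requirements above (1777 permissions and no noexec mount) can be checked with a short shell sketch like the one below. The helper names are hypothetical, and the execute probe is a heuristic: it simply tries to run a tiny script from the directory, which fails on a noexec mount.

```shell
# Hypothetical helpers for vetting a candidate Server.TempDir location.

# Succeeds when the directory has the usual 1777 (drwxrwxrwt) permissions.
tempdir_mode_ok() {
  mode="$(ls -ld "$1" | cut -c1-10)"
  [ "$mode" = "drwxrwxrwt" ]
}

# Succeeds when a script placed in the directory can be executed, which
# fails on file systems mounted with noexec.
tempdir_exec_ok() {
  probe="$(mktemp "$1/probe.XXXXXX")" || return 1
  printf '#!/bin/sh\nexit 0\n' > "$probe"
  chmod 0755 "$probe"
  if "$probe"; then rc=0; else rc=1; fi
  rm -f "$probe"
  return "$rc"
}

# Example: create a scratch directory and vet it.
candidate="$(mktemp -d)"
chmod 1777 "$candidate"
tempdir_mode_ok "$candidate" && echo "mode ok"
tempdir_exec_ok "$candidate" && echo "exec ok"
rm -rf "$candidate"
```

Run the checks as an unprivileged account, not as root, so the result reflects what content processes will experience.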
Environment Variables
Posit Connect sets the environment variables USER, USERNAME, LOGNAME, HOME, and TMPDIR for all content processes. Additional environment variables may be set, depending on Connect’s configuration, described in more detail here.
When Posit Connect is in local execution mode, content processes inherit all environment variables from the server. Set Applications.InheritSystemEnvVars to false to disable this behavior and prevent content from inheriting server environment variables. The environment variable PATH is always inherited.
When using off-host execution, content does not inherit environment variables from the server.
Other methods are available to set environment variables for content:
- A supervisor script runs before all content processes, and can be used to set environment variables. Supervisors cannot be used to set per-content-item variables.
- Environment variables for individual content items can be set in the Vars pane of the content settings sidebar.
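To see which of these variables a process actually receives, a small diagnostic like the following can be dropped into a shell script that runs in the content environment. It is a sketch only: print_connect_env is a hypothetical name, and output goes to standard error on the assumption that standard output should be left untouched.

```shell
# Print the variables Posit Connect sets for content processes.
# Output goes to standard error so standard output stays clean.
print_connect_env() {
  for name in USER USERNAME LOGNAME HOME TMPDIR; do
    value="$(printenv "$name" || echo "<unset>")"
    printf '%s=%s\n' "$name" "$value" >&2
  done
}

print_connect_env
```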
Applications & APIs
Posit Connect manages both batch-oriented and long-lived processes. Batch-oriented processes tend to be narrowly scoped and short-lived, while a process for an application built with a web framework such as Shiny, Flask, FastAPI, Dash, Plumber, or Streamlit may handle many requests for many users over its lifetime.
Posit Connect launches a process tied to a live application when the first request arrives for that application. That process will continue to service requests until it becomes idle and is eventually terminated. If there is sufficient traffic against that application, Posit Connect may launch additional processes to service those requests.
There are a number of configuration parameters which control the conditions under which processes for applications are launched and eventually reaped. The default values are appropriate for most applications but occasionally need customization in specialized environments. The Scheduler configuration appendix explains each of the options.
We recommend that adjustment to these runtime properties be done gradually.
User Account for Processes
When using off-host execution on Kubernetes, the Applications.RunAs user may not exist within the content container. Posit Connect does not provision the user for you; however, the container will be started with the uid/gid of the Applications.RunAs user from the Posit Connect server container.
Posit Connect executes your content with an unprivileged Unix account. The Applications.RunAs setting tells Posit Connect which account to use. The rstudio-connect account is created during installation and used as the default value for Applications.RunAs.
The root account never executes deployed user code.
Administrators can configure some pieces of content to be executed by a different Unix account than the Applications.RunAs default. This setting is found in the Runtime tab when editing content settings. Non-administrators are prohibited from changing the RunAs setting.
Each Unix account used as a custom RunAs must be a member of the Unix group Applications.SharedRunAsUnixGroup. This group membership requirement always applies, even when Applications.RunAs does not use the default rstudio-connect user.
The rstudio-connect user has a primary group also named rstudio-connect.
Let’s customize the Unix RunAs user and SharedRunAsUnixGroup to allow alternate Unix accounts for specific pieces of content.
We want to use the ds-system Unix account as our default RunAs user and the data-scientists Unix group as our shared group.
The following configuration tells Posit Connect to use ds-system:
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
RunAs = "ds-system"
SharedRunAsUnixGroup = "data-scientists"
Other Unix accounts that belong to the data-scientists group can be used as RunAs overrides. For example, the Unix account hadley must be a member of the data-scientists group before it can be used to run your application.
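Group membership can be verified before assigning an override. The sketch below uses the standard id command; in_group is a hypothetical helper name, and hadley and data-scientists are the example names from this section.

```shell
# Hypothetical helper: succeed when a Unix account belongs to a group.
in_group() {
  for g in $(id -Gn "$1" 2>/dev/null); do
    [ "$g" = "$2" ] && return 0
  done
  return 1
}

# Account and group names are the ones used in this example configuration.
if in_group hadley data-scientists; then
  echo "hadley can be used as a RunAs override"
else
  # An administrator could add the account with:
  #   usermod -aG data-scientists hadley
  echo "hadley is not in data-scientists" >&2
fi
```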
Package installation always happens as the Applications.RunAs user. An application or report may override its RunAs setting; this alters how the deployed code is executed and does not impact package installation. See the Sandboxing section for more information about process sandboxing.
The RunAs Unix account does not need to be associated with a Posit Connect user account. Most installations use a small number of shared Unix accounts. Some configurations (e.g. PAM authentication) pair Posit Connect user accounts with Unix accounts, but this is not required.
Current user execution
The Applications.RunAsCurrentUser setting is not available when using off-host execution.
Posit Connect can use a local Unix account associated with the currently logged-in user when executing applications. This works for Shiny apps, Shiny documents, and Python Dash, Streamlit, and Bokeh apps. This feature requires that user authentication use PAM.
See Authentication Integration with PAM for information about using PAM for user authentication.
The Applications.RunAsCurrentUser property specifies that content can be configured to execute as the currently logged-in user.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
RunAsCurrentUser = true
Content execution settings are not altered when Applications.RunAsCurrentUser is enabled. The Applications.RunAsCurrentUser setting permits current-user execution but by itself does not change how processes are launched. Each application or interactive document must explicitly request current-user execution.
When Applications.RunAsCurrentUser is enabled, administrators can customize individual content items to run as the Unix account associated with the logged-in user. The Runtime content setting tab gives content the option of executing using “The Unix account of the current user”.
Content configured to run as the current user will execute as the specified fallback RunAs user when accessed anonymously.
Content without this current-user customization will continue to run as the specified RunAs user.
See the User Account for Processes section for more information about RunAs customization.
All Unix accounts used to execute content must be members of the Unix group defined by Applications.SharedRunAsUnixGroup. Applications are not permitted to launch if the Unix account associated with the logged-in user does not have the proper group membership.
The Applications.RunAs setting uses the rstudio-connect user by default. This user has a primary group also named rstudio-connect. Any Unix account that may be used to execute content must be a member of the rstudio-connect group.
PAM sessions
PAM sessions are not forwarded to the content container when using off-host execution.
When using PAM for authentication, Posit Connect can leverage Linux PAM to establish the environment and resources available when content runs.
PAM sessions are enabled with the PAM.UseSession setting.
; /etc/rstudio-connect/rstudio-connect.gcfg
[PAM]
UseSession = true
The default PAM service name used for PAM sessions is su. This gives Posit Connect the ability to launch processes as the specified user without requiring a password.
You can change the PAM service name used for PAM sessions with the PAM.SessionService setting.
; /etc/rstudio-connect/rstudio-connect.gcfg
[PAM]
SessionService = posit-connect-session
The PAM service named by PAM.SessionService must contain the PAM directive that enables authentication with root privileges. Otherwise, processes will not run and will return error code 70.
# Allows root to su without passwords (required)
auth sufficient pam_rootok.so
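A quick way to confirm that a custom service includes this directive is to search its service file. The sketch below assumes the conventional /etc/pam.d location and reuses the posit-connect-session name from the PAM.SessionService example; pam_allows_rootok is a hypothetical helper, and it does not follow include directives.

```shell
# Hypothetical helper: succeed when a PAM service file contains an
# uncommented "auth sufficient pam_rootok.so" directive.
pam_allows_rootok() {
  grep -Eq '^[[:space:]]*auth[[:space:]]+sufficient[[:space:]]+pam_rootok\.so' "$1"
}

# The /etc/pam.d location is an assumption and may differ on your system.
service_file="/etc/pam.d/posit-connect-session"
if [ -r "$service_file" ] && pam_allows_rootok "$service_file"; then
  echo "pam_rootok is configured"
else
  echo "pam_rootok directive not found in $service_file" >&2
fi
```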
PAM Credential Caching (Kerberos)
Posit Connect’s PAM cache is encrypted and is not stored on disk. The credentials must expire after a certain period of time.
Posit Connect can be configured to securely cache a user’s PAM credentials when they log in to Posit Connect. This enables Posit Connect to let users run processes as their current UNIX account when the PAM profile requires a user’s credentials, such as when using Kerberos.
The following config settings are required for credential caching to be enabled:
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
RunAsCurrentUser = true
[PAM]
; Enable PAM sessions
UseSession = true
; Forward the current user's password into the PAM session
ForwardPassword = true
; Cache passwords for 12 hours after login
PasswordLifetime = 12h
; PAM service that accepts credentials ("su" is the default)
AuthenticatedSessionService = YOUR_PAM_SERVICE_HERE
Replace 12h with the amount of time you would like credentials to be cached. Credential lifetime is counted from the moment the user logs into Posit Connect. It is not tied to the user’s web session, except that logging in again will restart the timer for that user’s credentials.
The PAM.AuthenticatedSessionService setting is similar to PAM.SessionService, except that it should accept user credentials and validate them. For example, a PAM service that uses the host’s Kerberos configuration to expose functionality could be:
auth required pam_krb5.so
account [default=bad success=ok user_unknown=ignore] pam_krb5.so
password sufficient pam_krb5.so use_authtok
session requisite pam_krb5.so
Some distributions (such as Red Hat Enterprise Linux 8) do not support the use of pam_krb5.so. On those systems, we recommend using pam_sss.so instead and configuring SSSD to provide Kerberos authentication.
Path Rewriting
The sandboxing used by Posit Connect involves bind mounts which map physical locations on disk onto different directory structures at runtime. Paths used by your code use these sandboxed locations. If you need to find the physical file on disk, you will need to undo the path transformation.
This section gives some examples of path rewriting and offers some ways of finding the file you need.
Let’s start with an app.R file that describes a Shiny application. This file will be in the apps/XX/YY/ directory underneath the Server.DataDir location. The XX and YY path components correspond to the application ID and bundle (or deployment) ID for this version of your application. This directory is available at runtime as /opt/rstudio-connect/mnt/app/.
The directory structure of /opt/rstudio-connect/mnt/ is just a number of empty directories. The “unshare” environment created during sandboxing allows Posit Connect to associate different application directories with these mount directories.
Here are some common path transformations that may be helpful. All of the physical paths are beneath the Server.DataDir hierarchy that defaults to /var/lib/rstudio-connect. All of the sandbox paths are beneath the mount directory /opt/rstudio-connect/mnt/. This location is not customizable.
Physical path | Sandbox path |
---|---|
DataDir/apps/XX/YY/ | MountDir/app/ (non-renders) |
DataDir/reports/v2/XX/YY/temp.render.TT | MountDir/app/ (renders) |
DataDir/reports/v2/XX/VV/RR | MountDir/report/ |
DataDir/R | MountDir/R |
DataDir/packrat | MountDir/packrat |
DataDir/python-environments | MountDir/python-environments |
Here are some actual path transformations using the default Server.DataDir location:
# A source Shiny application
/var/lib/rstudio-connect/apps/4/7/app.R
=> /opt/rstudio-connect/mnt/app/app.R
# A source Flask API
/var/lib/rstudio-connect/apps/42/12/app.py
=> /opt/rstudio-connect/mnt/app/app.py
# A source Plumber API
/var/lib/rstudio-connect/apps/38/10/plumber.R
=> /opt/rstudio-connect/mnt/app/plumber.R
# A source R Markdown document
/var/lib/rstudio-connect/reports/v2/8/12/temp.render.639085504/index.Rmd
=> /opt/rstudio-connect/mnt/app/index.Rmd
# An HTML document rendered from that R Markdown document
/var/lib/rstudio-connect/reports/v2/8/2/17/index.html
=> /opt/rstudio-connect/mnt/report/index.html
# A statically deployed document
/var/lib/rstudio-connect/apps/17/21/index.html
=> /opt/rstudio-connect/mnt/app/index.html
# The Shiny package inside the packrat cache
/var/lib/rstudio-connect/packrat/3.2.5/v2/library/shiny/
28d6903a44dc53bd4823fa43ccdc08e5/shiny
=> /opt/rstudio-connect/mnt/packrat/3.2.5/v2/library/shiny/
28d6903a44dc53bd4823fa43ccdc08e5/shiny
# A virtual environment inside the environment cache
/var/lib/rstudio-connect/python-environments/pip/3.8.12/1B2M2Y8AsgTpgAmY7PhCfg/
=> /opt/rstudio-connect/mnt/python-environments/pip/3.8.12/1B2M2Y8AsgTpgAmY7PhCfg
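Undoing the transformation can be scripted for the simpler cases. The sketch below is a hypothetical helper, not part of Posit Connect: it handles the directly mirrored directories (R, packrat, python-environments) and non-render app paths, for which you must supply the application ID and bundle ID yourself. Render and report paths are deliberately left out because their physical location includes identifiers that cannot be derived from the sandbox path.

```shell
# Hypothetical helper: map a sandbox path back to its physical location.
# Covers only the directly mirrored directories and non-render app paths.
DATA_DIR="${DATA_DIR:-/var/lib/rstudio-connect}"   # default Server.DataDir
MOUNT_DIR="/opt/rstudio-connect/mnt"

sandbox_to_physical() {
  # $1: sandbox path; $2, $3: application ID and bundle ID (app paths only)
  rel="${1#"$MOUNT_DIR"/}"
  case "$rel" in
    R/* | packrat/* | python-environments/*)
      printf '%s/%s\n' "$DATA_DIR" "$rel" ;;
    app/*)
      if [ -z "${2:-}" ] || [ -z "${3:-}" ]; then
        echo "app paths need an application ID and a bundle ID" >&2
        return 1
      fi
      printf '%s/apps/%s/%s/%s\n' "$DATA_DIR" "$2" "$3" "${rel#app/}" ;;
    *)
      echo "unrecognized sandbox path: $1" >&2
      return 1 ;;
  esac
}

sandbox_to_physical /opt/rstudio-connect/mnt/app/app.R 4 7
sandbox_to_physical /opt/rstudio-connect/mnt/packrat/3.2.5/v2/library/shiny
```

With the default data directory, the first call reproduces the Shiny example listed above.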
Program Supervisors
When using off-host execution on Kubernetes, application supervisor scripts must be present on the image that is used to execute the content. The Supervisor configuration should be specified in the runtimes.yaml configuration which defines the set of available content images. For details, see the Execution Environments appendix.
You may need to modify the environment or resources available to processes before the processes are launched. This can be accomplished with a program supervisor, configured through the Applications.Supervisor setting.
The supervisor command is provided the full target command-line, usually Python or R, which MUST be invoked by the supervisor. The process exit code from the target command MUST be returned as the exit code of the supervisor. The file descriptors for standard input, output, and error MUST NOT be intercepted by the supervisor.
The supervisor command or script must be executable by any user that may perform package installation or run content (see the next paragraph). It must not be located in a directory that will be masked as described in the Sandboxing section. (If you are unsure where to put your supervisor script, /usr/local/bin/ is a safe location.) If the command is not executable, is in a disallowed directory, or does not execute its target command-line properly, Posit Connect will log an error and fail to start.
A supervisor is executed as the appropriate RunAs user. Package installation always uses the Applications.RunAs user. Other processes will use the content-specific RunAs account, falling back to Applications.RunAs if no override was configured. See the User Account for Processes section for details.
Supervisors run within the sandbox established for any process. See the Sandboxing section for more information about process sandboxes.
Posit Connect configures the TMPDIR and HOME environment variables for launched processes. Posit Connect also manages package installation and references. Avoid altering any of this behavior in program supervisors.
Supervisor scripts must echo all informational messages to standard error to prevent Posit Connect from processing them.
RSTUDIO_PANDOC
You can customize the RSTUDIO_PANDOC environment variable in a supervisor script or with a content-specific environment variable.
If unset, the RSTUDIO_PANDOC environment variable is automatically configured as R starts. The rmarkdown package uses this environment variable to discover Pandoc binaries.
rmarkdown versions < 1.9 use Pandoc 1. rmarkdown versions >= 1.9 and < 2.5 use a Pandoc 2.x release before 2.11. rmarkdown versions >= 2.5 use Pandoc 2.11+.
A global RSTUDIO_PANDOC setting may cause problems in some environments, as not all rmarkdown package versions are compatible with all Pandoc versions.
The Applications.Pandoc1Dir and Applications.Pandoc2Dir settings offer more granular control than the RSTUDIO_PANDOC environment variable.
Example: nice
This configuration uses the nice command to lower the priority of content execution processes. See https://man7.org/linux/man-pages/man1/nice.1.html for details about nice. Because process supervisors are run as a RunAs user and not as root or another super-user, you may not be permitted to assign a negative (higher priority) niceness value.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
Supervisor = "/usr/bin/nice -n 2"
The Applications.Supervisor setting must contain the absolute path to the target application or script.
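To see the effect of this supervisor outside of Posit Connect, you can compare niceness values from a shell. This sketch relies on the GNU coreutils behavior that nice, run with no command, prints the current niceness; other implementations may differ.

```shell
# Running nice with no command prints the niceness of the current process.
base="$(nice)"

# Run the same query under "nice -n 2", as the supervisor would run content.
adjusted="$(nice -n 2 nice)"

echo "niceness without supervisor: $base"
echo "niceness with supervisor:    $adjusted"
```

The second value should be two higher than the first, unless the niceness limit of 19 has been reached.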
Example: Environment Variables
This configuration uses a custom script to prepare a custom execution environment before finally running the target command.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
Supervisor = "/usr/local/bin/connect-env-supervisor.sh"
Here is an example supervisor that echoes its arguments, sets an environment variable, then invokes whatever arguments have been passed.
#!/usr/bin/env bash
#
# /usr/local/bin/connect-env-supervisor.sh
# echo informational messages to standard error to
# prevent Connect from processing them.
echo arguments: "$@" >&2
echo >&2
export COMPANY_DATA_HOME="/data/resides/here"
# Execute the target process after the environment is established.
# All customization must happen before this "exec".
exec "$@"
The argument list of the supervisor is the full command-line of the target command. The supervisor MUST invoke this target command using exec or an equivalent technique.
Example: Script chaining
The supervisor is a program (script) that eventually invokes the target content command. This example combines our two previous examples: it sets some environment variables and alters process priority.
; /etc/rstudio-connect/rstudio-connect.gcfg
[Applications]
Supervisor = "/usr/local/bin/connect-env-and-nice-supervisor.sh"
This example supervisor configures its environment then uses nice to launch the target command.
#!/usr/bin/env bash
#
# /usr/local/bin/connect-env-and-nice-supervisor.sh
export COMPANY_DATA_HOME="/data/resides/here"
# Execute the target process after the environment is established.
# All customization must happen before this "exec".
exec /usr/bin/nice -n 2 "$@"
This example uses nice to demonstrate how your supervisor might use some other tool to run the target process. This approach can be helpful if you have existing tools and scripts that you want to combine into your single supervisor.
If you use chaining, each command in the chain MUST invoke the next command using exec or an equivalent technique.
Supervisor Development
Your organization may use shell initialization scripts to establish a particular environment. This environment might not be completely compatible with how Posit Connect attempts to launch Python and R. We recommend building supervisor scripts gradually and carefully. Changes to the environment can alter how your content executes or even prevent Python and R from running correctly.
A configured supervisor is used to run all content in its Posit Connect environment.
- The supervisor must be readable and executable to other users (e.g. chmod 0755).
- The supervisor must not output to standard output; use standard error.
- The supervisor must exec its target command.
- Within shell script supervisors, remember to export environment variables that need to be visible to sub-processes.
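Parts of this checklist can be automated. The following pre-flight check is a sketch under stated assumptions: check_supervisor is a hypothetical helper, and its search for exec is a plain-text heuristic that suits shell script supervisors only.

```shell
# Hypothetical pre-flight check for a shell script supervisor.
check_supervisor() {
  script="$1"
  [ -f "$script" ] || { echo "not a file: $script" >&2; return 1; }

  # Characters 8-10 of the ls mode string are the "other" permission bits.
  other="$(ls -l "$script" | cut -c8-10)"
  case "$other" in
    r?x) ;;
    *) echo "not readable and executable by other users: $other" >&2
       return 1 ;;
  esac

  # Heuristic: the script should hand off to its target with exec "$@".
  grep -q 'exec .*"\$@"' "$script" || {
    echo "no exec of the target command found" >&2
    return 1
  }
}

check_supervisor /usr/local/bin/connect-env-supervisor.sh ||
  echo "supervisor needs attention" >&2
```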
Testing supervisors
You can usually test supervisors by running them manually. For example, we can use our supervisor to run Python and R and display their versions:
/usr/local/bin/connect-env-supervisor.sh \
  /opt/Python/3.8.12/bin/python --version
/usr/local/bin/connect-env-supervisor.sh \
  /opt/R/3.6.3/bin/R --version
We can also test arbitrary commands.
/usr/local/bin/connect-env-supervisor.sh \
  /usr/bin/env echo "Hello World!"
We could also examine the environment that we expect to be established by the supervisor under different runtimes.
/usr/local/bin/connect-env-supervisor.sh \
  /opt/Python/3.8.12/bin/python -c 'import os; print(os.getenv("COMPANY_DATA_HOME"))'
/usr/local/bin/connect-env-supervisor.sh \
  /opt/R/3.6.3/bin/R -s -e 'Sys.getenv("COMPANY_DATA_HOME")'
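Manual tests can also confirm the two supervisor contracts described earlier: the exit code of the target must be returned unchanged, and standard output must carry only the target's output. The sketch below is illustrative; the SUPERVISOR variable is an assumption of this example, and a throwaway stand-in script is generated when it is unset so the snippet runs on its own.

```shell
# Point SUPERVISOR at your real script. When it is unset, a throwaway
# stand-in is generated so that this sketch can run on its own.
if [ -z "${SUPERVISOR:-}" ]; then
  SUPERVISOR="$(mktemp)"
  printf '%s\n' '#!/bin/sh' 'echo "arguments: $*" >&2' 'exec "$@"' > "$SUPERVISOR"
  chmod 0755 "$SUPERVISOR"
fi

# The exit code of the target command must come back unchanged.
status=0
"$SUPERVISOR" /bin/sh -c 'exit 3' 2>/dev/null || status=$?
echo "exit code: $status (expected 3)"

# Standard output must carry only the target's output.
out="$("$SUPERVISOR" /bin/sh -c 'echo hello' 2>/dev/null)"
echo "stdout: '$out' (expected 'hello')"
```

A supervisor that prints informational messages to standard output would cause the second check to show extra text before hello.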
Using the config Package
The config package makes it easy to manage environment-specific configuration values in R code. For example, you might want to use one value for a variable locally, and another value when deployed on Posit Connect. The package vignette contains more information.
The config package identifies its target configuration by using the R_CONFIG_ACTIVE environment variable.
Connect sets the R_CONFIG_ACTIVE environment variable when running content that uses R. The value for R_CONFIG_ACTIVE is taken from the R.ConfigActive configuration setting, with a default value of rsconnect.
The R_CONFIG_ACTIVE environment variable is set when running content, not when installing packages.
Individual content items may customize the R_CONFIG_ACTIVE environment variable; installations can prevent this customization by adding R_CONFIG_ACTIVE to the set of prohibited environment variables defined by Applications.ProhibitedEnvironment.
Specifying Protocols
Posit Connect provides a wide variety of techniques to keep Shiny application data in the web browser synchronized. The preferred technique, and the one most widely used, is the use of WebSockets. If WebSockets are not supported, either by some intermediate network between the server and your client or by your client’s web browser, then a fallback protocol will be used.
Python applications on Posit Connect that make use of WebSockets require WebSocket support from the network and any intermediate proxies.
In order of preference, the connection methods supported by Shiny applications are:
- WebSocket
- XHR Streaming
- iframe Eventsource
- iframe HTML File
- XHR Polling
- iframe XHR Polling
- JSONP Polling
Use the Applications.DisabledProtocol setting to disable specific protocols.
Client Protocol Selection - Shiny Applications
To change the available protocols from the client, open a Shiny application and press the keyboard shortcut: Ctrl+Alt+Shift+A (or, from a Mac: control+option+shift+A). This will open a window that will allow you to select or deselect any of the above protocols. After you confirm the changes, these settings will be saved in your browser for future visits to this server. These settings will take effect upon loading an application hosted on this domain, and will last until you explicitly change them again; they will only have an effect on the browser in which this action was performed.