Configuration
There are several requirements for nodes within Posit Workbench clusters:
- All nodes must run the same version of Posit Workbench.
- Server configurations (i.e., the contents of the /etc/rstudio directory) must be identical, with the exception of options related to the address of each node (www-host-name in load-balancer, and www-address and www-port in rserver.conf).
- User accounts must be accessible from each node, and usernames and user ids must be identical on all nodes. The same applies to any groups used by Workbench users and to the Workbench service user account, rstudio-server.
- The clocks on all nodes must be synchronized.
- User home directories must be accessible via shared storage (e.g., all nodes mounting the same NFS volume or Amazon EFS; see Using Amazon EFS with Posit Team).
- An explicit server-wide shared storage path must also be defined. See the Shared storage section for additional details.
- Workbench must be configured to use a PostgreSQL database, and an empty database must be present for Workbench to write important cross-node state. If you have previously run Workbench with a SQLite database, it is strongly advised that you migrate the database to PostgreSQL first. For more information, see Database.
- When using Launcher sessions, see the additional requirements under Launcher considerations. Note that Local Launcher sessions are enabled by default.
- The ports used by the Workbench server (8787 by default) and the Job Launcher (5559 by default) must be open to other nodes in the cluster. If the default node address is not visible to other nodes in the cluster, set www-host-name in the load-balancer config before defining the cluster. See Configuring load balancing.
Enable load balancing
The default SQLite database does not support load-balanced configurations, so make sure you have configured a PostgreSQL database for the cluster. The database is used to store the cluster definition and to maintain the current status of the nodes. Only one load balancing cluster can exist per database.
Set the config value load-balancing-enabled=1 in rserver.conf to turn on load balancing.
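For example:
# /etc/rstudio/rserver.conf
load-balancing-enabled=1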
Load balancing options are stored in a file called load-balancer in the config directory (default: /etc/rstudio/load-balancer). If the load-balancer file exists and load-balancing-enabled is not set in rserver.conf, load balancing is enabled for compatibility with versions prior to 2023.09.
Synchronize keys
When using a single node, the secure-cookie-key is auto-generated and stored in /var/lib/rstudio-server/secure-cookie-key. When load balancing, however, this key must match on all nodes in the cluster.
Create a secure-cookie-key file in the config directory (default /etc/rstudio/secure-cookie-key) and copy this file to all nodes in the cluster. If enabling load balancing on a production system with running sessions, copy the existing /var/lib/rstudio-server/secure-cookie-key to preserve interoperability with running sessions. Otherwise, create a new secure-cookie-key:
# generate secure-cookie-key as a simple UUID
sudo sh -c "echo `uuidgen` > /etc/rstudio/secure-cookie-key"
# ensure that the cookie is only readable by root
sudo chmod 0600 /etc/rstudio/secure-cookie-key
If using the Job Launcher, also create launcher.pem and launcher.pub files and distribute them to all nodes. See the Job Launcher configuration documentation for details.
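As a minimal sketch (not the authoritative procedure; follow the Job Launcher documentation), an RSA key pair could be generated with standard openssl commands and then copied to the other nodes:
# generate a private key and derive the matching public key (one possible approach)
sudo openssl genpkey -algorithm RSA -out /etc/rstudio/launcher.pem -pkeyopt rsa_keygen_bits:2048
sudo openssl rsa -in /etc/rstudio/launcher.pem -pubout -out /etc/rstudio/launcher.pub
# restrict access to the private key
sudo chmod 0600 /etc/rstudio/launcher.pem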
When the first Workbench node in the cluster is started, it initializes the cluster in the database. The cluster data contains the communication protocol (http, https, or https no verify), the secure cookie key, and the Launcher key pair. These must match on all nodes in the cluster.
Changing keys requires a clean restart of all sessions and services, so be careful to start services with the proper key when setting up a load-balanced cluster.
Key mismatch errors
If a node's keys on the file system conflict with what it finds in the cluster database, an error is logged and the node uses the secure-cookie-key file on its local file system. This error must be resolved, as it causes conflicts when a user's requests are handled by more than one server due to load balancing. Correct the keys on the affected node by copying the key files from the original node used to define the cluster. Restart the rstudio-server service, as well as any sessions started with a different key. If using the Job Launcher, restart the rstudio-launcher service as well.
Normally, a clean restart of all nodes resets the copy of the secure-cookie-key in the database to match the file on the first node to start. If there are stale online nodes, they prevent the keys from being updated and the key-mismatch errors may persist. In that case, reset the cluster in one of two ways:
- Select a node with the rstudio-server service running and run rstudio-server reset-cluster from the command line, then restart all services on that node. This copies the secure-cookie-key from this node into the cluster database. Copy the secure-cookie-key from the filesystem of this node to all other nodes and restart their servers and sessions.
- As a last resort, to reset the keys in the cluster when the rstudio-server service is not running and cannot be restarted, clear the node and cluster information using psql (replace the username and database arguments):
# psql -U username database-name -c 'sql command'
psql -U rstudio rstudio -c 'DELETE from node; DELETE from cluster;'
After resetting the keys, restart the servers and sessions.
Checking cluster status
To view the nodes and their current statuses in the load balancer cluster, run the shell command rstudio-server list-nodes on a running Workbench node. This displays all nodes in the cluster database, ordered by whether or not they are online and by node ID. It shows the status, health, and last seen time for each node using the server's local time zone.
When a node is Online and healthy, it updates its last seen time every 5 minutes. A status of “Missed check-in” is displayed when a node's status is Online but its last seen time has not been updated in 10 minutes. When a node's status is listed as “Missed check-in”, other nodes will continue to try to send it requests for some period of time.
If you see the “Missed check-in” status, it may be a valid cluster node that stopped without shutting down cleanly, a node that is running but cannot connect to the database, or a stale node that did not shut down cleanly before it was removed from the cluster.
If the node is running, look in the rserver.log file for database-related errors when the node updates its last seen time.
If this is a stale node that did not shut down cleanly, use rstudio-server list-nodes to find the node's ID, and remove it with rstudio-server delete-node.
To see a specific node's view of the online nodes and the sessions allocated to them, run the command rstudio-server node-status. It returns a formatted JSON object containing each online node and the local sessions running on that node. This is the data used by the balancing algorithm when choosing where to start a new session.
Configuring load balancing
Additional load balancing options are set in the load-balancer configuration file. It supports changing the load balancing strategy, the node's address used by other cluster members, and a few other less commonly used options.
If your Workbench server is deployed in Kubernetes, or in an environment where each node's address may change, set delete-node-on-exit=1 in the load-balancer configuration file. When the server is stopped, its entry in the database is removed so that stale node entries are not left behind.
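For example, a minimal load-balancer file for such an environment might look like:
# /etc/rstudio/load-balancer
balancer=sessions
delete-node-on-exit=1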
When the default address is not suitable, set www-host-name to indicate the address at which other nodes in the cluster can reach this node. The following ordered strategies are used to determine this address, with preference given to the first that is successful:
1. Use the www-host-name value provided in the configuration file.
2. Use the www-address defined in rserver.conf, in combination with www-port or the default port.
3. Use the first non-loopback, non-multicast IP address found by resolving the system's hostname.
4. Use a system call to determine the machine's IP addresses and use the last IPv4, non-loopback, non-multicast address provided.
Most users will want to configure Posit Workbench to use one of the first two approaches.
If www-host-name is provided without a port, but a port has been set with www-port in rserver.conf, www-host-name is used with the custom www-port as the load balancing address.
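For instance, assuming a custom port of 8989 (an illustrative value), other nodes would reach this node at server1.example.com:8989 with the following settings:
# /etc/rstudio/rserver.conf
www-port=8989

# /etc/rstudio/load-balancer
www-host-name=server1.example.com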
For example, to use the www-host-name option to define a cluster of two nodes that balances load based on the number of actively running sessions, you could use the following configuration:
On the first node, which can be reached at server1.example.com:
# /etc/rstudio/load-balancer
balancer=sessions
www-host-name=server1.example.com
On the second node, which can be reached at server2.example.com:
# /etc/rstudio/load-balancer
balancer=sessions
www-host-name=server2.example.com
When load balancing is configured, each node queries the internal database during startup for information about the active cluster and nodes. If the relevant data doesn't exist for a particular node, that node inserts it using the node's address and is assigned a unique integer node ID. It then alerts existing nodes of its presence and configuration.
When load balancing is enabled, Posit Workbench automatically manages the secure-cookie-key and Launcher key pair on each node. With a load-balanced configuration, it is best if these files do not exist on startup. The first node to define the cluster creates new keys, and the other nodes update their config files to match. If a node attempts to join the cluster with a key on its file system that doesn't match the cluster's key, it will fail to start.
Previous versions of Posit Workbench required the host name of each node to be listed on every active node under a [nodes] title, in lieu of the www-host-name field, with a [config] title preceding the balancing options. This configuration continues to work, but it is no longer the preferred configuration method. It is highly recommended that you update your config files accordingly.
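For reference, the legacy layout looked roughly like the following (hostnames are placeholders; treat this as a sketch rather than an exact schema):
# /etc/rstudio/load-balancer (legacy format, no longer preferred)
[config]
balancer=sessions

[nodes]
server1.example.com
server2.example.com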
Previous versions of Posit Workbench relied on the presence of the load-balancer configuration file to determine whether load balancing is enabled. This configuration continues to work, but it is highly recommended that you update the rserver.conf file to enable load balancing. If your load-balancer file is empty, you can delete it once this option is set.
The following table lists the configuration options that can be specified in the load-balancer configuration file:
Config Option | Description | Possible Values | Default Value |
---|---|---|---|
www-host-name | The hostname or IP address that other nodes in the cluster should use to communicate with this node. | A hostname or IP address, optionally with a port. | Not set; the node determines its address by following steps 2-4 above. |
balancer | The balancing method used by the cluster for balancing Workbench Sessions in Local mode, and rworkspaces. See Balancing Methods for details. | See Balancing Methods. | sessions |
delete-node-on-exit | Set to 1 to automatically manage the nodes database for environments where node addresses change, such as when running Workbench in a Kubernetes pod. When set to 1, this node's entry is removed from the database when the service is stopped, avoiding stale entries when pods are stopped and restarted with a new address. | 0 or 1 | 0 |
diagnostics | Enables detailed diagnostic logging for load balancing traffic and state. See Diagnostics for details. | tmp or stderr | Not set; detailed diagnostics are not available. |
timeout | Amount of time in seconds that a node will wait for a response. See Node network instability for details. | A positive integer, or 0 to use the system defaults | 10 |
verify-ssl-certs | Whether to verify SSL certificates when communicating with other nodes. Should never be set to 0 in production, except for troubleshooting purposes or if the connection between nodes is secured by other means. See SSL for details. | 0 or 1 | 1 |
launcher-balancing-enabled | Enables or disables Local Launcher load balancing node synchronization. This should be enabled when using the Local Job Launcher with load balancing. The default setting is recommended for most configurations. | 0 or 1 | 1 |
launcher-balancing-cluster | Specifies the Local Launcher cluster to synchronize node state with for load balancing. This should only be set if more than one Local Launcher cluster is configured. | The name of a Local Launcher cluster | Not set |
launcher-local-proxy | When enabled, local Job Launcher session requests are first proxied to the session node's Workbench instance, then forwarded to the session over a local connection. When disabled, session requests are proxied directly from the Workbench instance that receives the request, requiring the system's firewall to permit connections in the session's port range (ip_local_port_range). | 0 or 1 | Value of launcher-balancing-enabled |
Key file requirements
The following are the requirements for the secure cookie key file:
- The key value must have a minimum length of 128 bits (16 bytes/characters). Workbench won’t start if the key is too weak.
- The key file must have restrictive permissions (i.e. 0600) to protect its contents from other users.
- The key file must be identical on all nodes in a load-balanced cluster, so that the nodes can communicate with each other.
- The key must have a secret value that cannot be guessed. Randomly generating the value is recommended; see below for one mechanism for doing so.
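To sanity-check an existing key file against these requirements, you could inspect its size and permissions, for example:
# expect at least 16 bytes
sudo wc -c /etc/rstudio/secure-cookie-key
# expect mode 600
sudo stat -c '%a %n' /etc/rstudio/secure-cookie-key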
Generating a key
If a secure cookie key file does not exist on a node, Posit Workbench creates one. Workbench looks for the secure cookie key in the default configuration file location and in /var/lib/rstudio-server/secure-cookie-key. If a node's secure cookie key file does not match the key stored in the cluster, the node will not come online.
This secure cookie key will also be used for encrypting and decrypting the PostgreSQL database password, if applicable. See PostgreSQL Password Encryption for more details.
Key file location
You may optionally change the path of the secure-cookie-key by changing the secure-cookie-key-file setting in rserver.conf, though it is not necessary. Changing the path in this manner is only recommended in very specific circumstances, such as when running the Launcher with both Posit Workbench and Package Manager simultaneously. For example:
/etc/rstudio/rserver.conf
secure-cookie-key-file=/mnt/rstudio/secure-cookie-key
In addition, an explicit server-wide shared storage path must be defined (this is used for inter-node synchronization). This path is defined in the /etc/rstudio/rserver.conf file. For example:
/etc/rstudio/rserver.conf
server-shared-storage-path=/shared/rstudio-server/shared-storage
For convenience, this path is often located on the same volume used for shared home directory storage (e.g., at the path /home/rstudio-server/shared-storage).
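A minimal sketch of creating this directory, assuming it should be owned by the rstudio-server service account (adjust ownership and permissions to your environment):
sudo mkdir -p /shared/rstudio-server/shared-storage
sudo chown rstudio-server /shared/rstudio-server/shared-storage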
Launcher considerations
If you are running Posit Workbench load balancing in addition to using Launcher sessions, you need to ensure that the /etc/rstudio/launcher.pub and /etc/rstudio/launcher.pem files match on all Workbench nodes in the cluster. These files are created when they do not exist, but if conflicting versions of the files exist on a node, it will fail to start.
For more information, see Posit Workbench integration.
Local Launcher plugin
When the Job Launcher is configured to use Local sessions, Posit Workbench chooses the server on which to start each session using the configured load balancing strategy.
Follow these additional steps to configure load balancing with Local Launcher Sessions:
- Ensure that each cluster node has a unique hostname. If www-host-name was manually specified, ensure that the same values are specified for the Local Launcher plugins in their respective launcher.local.conf files.
- Ensure that individual servers in the cluster can reach each other by their configured addresses (host or IP). The node uses the same address as the Workbench service; this can be determined from the output of the rstudio-server list-nodes command.
- Ensure the following ports are open to allow the exchange of session metadata and the proxying of requests between rservers, the Launcher servers, and sessions started by the Local Launcher plugin (a hypothetical firewall example appears after this list):
  - The rserver port (configured in rserver.conf with www-port)
  - The launcher server port (configured in launcher.conf with the port value in the [server] section)
  Previous versions of Posit Workbench required open firewall ports between rservers in the range used by sessions. This is no longer required in the default configuration (see the launcher-local-proxy config option for details).
- If Jupyter Lab or Jupyter Notebook sessions are in use, a range of ports beginning at 8888 must be opened to connect to the Jupyter sessions. Each Jupyter session requires an additional port, so the length of the range must be large enough to satisfy all the Jupyter sessions expected to run on the node. For example, if 10 Jupyter sessions were expected to run on one node, the range 8888-8897 would need to be opened.
- If you have enabled SSL for the Job Launcher with launcher-use-ssl=1 in launcher.conf, make sure the SSL certificate's Subject Alternative Names field includes each node's address as used by other nodes in the cluster. The Job Launcher nodes use the same address as Workbench when launcher-balancing-enabled=1; you can see these addresses using rstudio-server list-nodes. When launcher-balancing-enabled=0, the Job Launcher uses NFS-based load balancing, and the node's IP address is used by other nodes in the cluster.
- To permanently remove a node from the cluster, it's important to first stop all sessions running on that node. Sessions on a stopped node remain in an active status until that node is restarted.
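As referenced in the ports item above, a hypothetical firewalld example for opening the default ports to other cluster nodes (substitute your actual ports and firewall tooling):
sudo firewall-cmd --permanent --add-port=8787/tcp       # rserver (www-port)
sudo firewall-cmd --permanent --add-port=5559/tcp       # launcher server port
sudo firewall-cmd --permanent --add-port=8888-8897/tcp  # only if Jupyter sessions run on this node
sudo firewall-cmd --reload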
For more information, see the Local Plugin section of the Job Launcher documentation.
File locking
In order to synchronize the creation of sessions across multiple nodes, Posit Workbench uses a cross-node locking scheme. This scheme relies on the clocks on all nodes being synchronized. Posit Workbench includes a locktester utility which you can use to verify that file locking is working correctly. To use the locktester, log in (e.g., via SSH or telnet) to at least two nodes using the same user account and then invoke the utility from both sessions as follows:
$ /usr/lib/rstudio-server/bin/locktester
The first node you execute the utility from should indicate the types of locks it was able to acquire, for example:
* Acquired advisory lock
* Acquired link-based lock
After the messages are printed, the process pauses so that it can retain the locks (you can cause it to release them by interrupting it, e.g., via Ctrl+C).
On the second and subsequent nodes you run the utility from, it will attempt to acquire the same locks. A message is printed to the console indicating which types of locks are supported, for example:
* Acquired advisory lock
* Unable to acquire link-based lock
Your filesystem appears to support link-based locks.
In this example, advisory locks are not supported (because both nodes were able to acquire an advisory lock), but link-based locks are. See Lock configuration for more information on configuring lock types.
If you interrupt the first node (e.g. via Ctrl+C) the lock will be released and you can then acquire it from the other nodes.
If either of the following occurs then there is an issue with file locking capabilities (or configuration) that should be addressed prior to using load balancing:
- All nodes successfully acquire the file lock (i.e. more than one node can hold it concurrently).
- No nodes are able to acquire the file lock.
If either of the above conditions holds, Workbench won't be able to correctly synchronize the creation of sessions throughout the cluster (potentially resulting in duplicate sessions and lost data due to sessions overwriting each other's state).
Lock configuration
Workbench’s file locking scheme can be configured using a file at /etc/rstudio/file-locks
. Valid entries are:
lock-type=[linkbased|advisory]
use-symlinks=[0|1]
refresh-rate=[seconds]
timeout-interval=[seconds]
enable-logging=[0|1]
log-file=[path]
The default locking scheme, linkbased, considers a lock acquired when the process successfully hardlinks a dummy file to a location within the folder Workbench uses for client state (typically ~/.local/share/rstudio). This scheme is generally more robust with older network file systems, and the locks should survive temporary filesystem mounts/unmounts. The use-symlinks option can be turned on with linkbased locking for filesystems that do not support hardlinks.
If you are using EFS, the default linkbased lock type will not work out of the box. We recommend enabling use-symlinks for linkbased compatibility, or using the advisory lock type.
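For example, either of the following /etc/rstudio/file-locks configurations could be used on EFS:
# /etc/rstudio/file-locks
lock-type=linkbased
use-symlinks=1

# or, alternatively:
# lock-type=advisory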
The timeout-interval and refresh-rate options can be used to configure how often the locks generated by the linkbased locking scheme are refreshed and reaped. By default, a process refreshes any locks it owns every 20 seconds, and scans for stale locks every 30 seconds. If an rsession process crashes, it can leave behind stale lock files; those lock files are cleaned up after they expire by any newly launched rsession processes.
The advisory lock type can be selected to use advisory file locks (using, e.g., fcntl() or flock()). These locks are robust but are not supported by all network file systems.
If you are having issues with file locking, you can set enable-logging=1 and set the log-file option to a path where output should be written. When logging is enabled, Workbench reports its attempts to acquire and release locks to the log file specified by log-file. When log-file is unset, log entries are emitted to the system logfile, typically located at /var/log/messages or /var/lib/syslog.
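For example, to capture lock activity to a dedicated file (the path here is just an illustration; any writable location works):
# /etc/rstudio/file-locks
enable-logging=1
log-file=/var/log/rstudio/file-locking.log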
Managing nodes
Starting up
When configuring each node, be sure to copy all of the configuration files from /etc/rstudio/ to each node. If applicable, then add the load-balancer configuration file with the optional www-host-name option, which is unique for each node. Assuming that the server is already installed and running on each node, you can then apply the load balancing configuration by restarting the server:
$ sudo rstudio-server restart
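If you are distributing an existing node's configuration to a new node before that restart, one possible approach (the hostname and staging path are hypothetical) is:
# copy the configuration from an existing node, then adjust any per-node options such as www-host-name
sudo scp -r /etc/rstudio/ admin@new-node.example.com:/tmp/rstudio-config/
# on the new node:
sudo cp -r /tmp/rstudio-config/. /etc/rstudio/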
Current status
Endpoint status
Once the cluster is running you can inspect its state (which sessions are running where) using the load balancing status HTTP endpoint. For example, when running the server on the default port (8787):
$ curl http://localhost:8787/load-balancer/status
Note that the status endpoint is accessed using localhost rather than an external IP address. This is because the endpoint is IP-restricted to be accessible only within the cluster, so it must be accessed directly from one of the nodes.
The status endpoint returns output similar to the following:
192.168.55.101:8787 Load: 0.45, 0.66, 0.32
12108 - jdoe
12202 - kmccurdy
192.168.55.102:8787 Load: 1, 0.75, 0.31
3404 - bdylan
192.168.55.103:8787 (unreachable) Load: 0, 0, 0
192.168.55.104:8787 (offline) Load: 0.033, 0.38, 0.24
This output shows all of the nodes in the cluster. Each node is indicated by its address and an optional status indicating whether the node is unreachable or offline. If the node does not indicate a status, it is healthy and servicing requests. Following the node address is its CPU Load, shown as three decimal values giving the last known 1-minute, 5-minute, and 15-minute load averages, represented as a fraction of total CPU load. On subsequent output lines, each RStudio Pro session running on that particular node is listed along with its process ID and running user.
An unreachable node indicates an issue connecting to it via the network. In most cases, this means that the rstudio-server service is not running on the node; troubleshoot by reviewing any startup issues in the system logs for that particular node (see Diagnostics if the service is running and healthy) and by checking the node's database status with the command rstudio-server list-nodes. An offline node is one that was specifically put into offline mode via the command sudo rstudio-server offline, which prevents new sessions from being started on that node.
Database status
While the status endpoint retrieves information from each node via HTTP requests, each node also maintains its own Status field within the PostgreSQL database. These statuses can be viewed in the output of the rstudio-server list-nodes command, under the 'Status' column. The possible statuses are listed in the following table.
Node Database Statuses
Status | Description |
---|---|
Offline | The node was properly configured and is currently Offline. |
Starting | The node is processing and validating its configuration; each node only spends a short amount of time in this state. |
Deleting | A request is being processed to delete this node or a request was received to delete this node but it could not complete. If the request could not complete, an error is logged in the system logs with more information. |
Failed to resolve | The node attempted to come online, but could not resolve its host name to an IP address. To manually specify the node’s host name, specify www-host-name in the load balancer configuration file. |
Invalid secure cookie key | The node attempted to come online, but its secure-cookie-key does not match the existing cluster’s key. Manually remove this file or reset the stored data by following either approach described in Defining the cluster. |
Invalid permissions on secure cookie key | The node attempted to come online, but its secure cookie key file has invalid permissions. The permissions must be set to 0600. |
Error reading launcher key | An error occurred while attempting to read the launcher key pair. Investigate the associated error message in the rserver.log file. |
Invalid launcher key | The node attempted to come online, but an invalid launcher.pem or launcher.pub key was found. Manually remove these files or reset the stored data by following either approach described in Defining the cluster. |
Invalid license | The node attempted to come online, but the maximum number of Online nodes allowed by your license has been reached. |
Online | The node is properly configured and online. A node may also have this status when it is not properly shutdown. When rstudio-server list-nodes is run and the node’s Last Seen column has not been updated in the last 10 minutes, this value will be displayed as “Missed check-in”. |
Adding and removing nodes
To temporarily remove a node from the cluster you can simply stop it:
$ sudo rstudio-server stop
RStudio Pro sessions running on that node are automatically moved to another active node. Note that only the session state is moved, not the running processes. The node will now appear in the list-nodes command output with an offline status. To restore the node, simply start it back up again:
$ sudo rstudio-server start
To add a new node, configure the new node with load-balancing-enabled=1 and, if desired, a load-balancer config file. When the rstudio-server service is restarted, it broadcasts its arrival to the other online nodes in the cluster; they do not have to be restarted or reloaded. All nodes sharing a database are part of the same cluster.
Reloading the load balancer configuration also causes the rserver-http proxy configuration to be updated, which affects Workbench's running HTTP server. It is recommended that you do not make any other HTTP-related changes when updating the load balancer configuration unless you are aware of the potential side effects!
The rstudio-server delete-node command can be used to permanently remove nodes from the database and cluster, but data loss can occur if this command is run for a node that is actively running sessions. To prevent this, stop the node first. Alternatively, you can follow the instructions in Endpoint status to view active sessions on the node and suspend them using any of the Session Management suspend commands. The delete-node command can also be used to remove entries from the database that do not represent any physical node in the cluster. Such entries may exist with an invalid host or IP address and a status that is not “Online” if an attempt was made to bring a node online before it was properly configured.
The node must be deleted from an active node, and this requires knowing the to-be-deleted node's ID, which can be retrieved with the list-nodes command. For example, after starting the server with a typo in www-host-name, your commands may look like the following:
$ sudo rstudio-server list-nodes
Cluster
-------
Protocol
Http
Nodes
-----
ID Host IPv4 Port Status
1 rsw-primaryyy 8787 Failed to resolve
2 rsw-secondary 123.456.78.100 8787 Online
3 rsw-primary 123.456.78.101 8787 Online
$ sudo rstudio-server delete-node 1
Node 1 deleted.
The output from the rstudio-server list-nodes command above was shortened to improve readability.
When the command is run, the node's database status briefly changes to 'Deleting' and then the node is removed from the database. The node that the delete-node command was run from broadcasts a message to all other online nodes in the cluster that this node has been deleted. On receipt of this message, nodes stop routing requests to the deleted node.
Troubleshooting
If users are having difficulty accessing Workbench in a load balanced configuration it’s likely due to one of the load balancing requirements not being satisfied. This section describes several scenarios where a failure due to unsatisfied requirements might occur.
Node network instability
Some scenarios may cause Workbench to wait a long time for a node to respond due to network instability. You can limit this waiting period with the timeout option, which is set to 10 seconds by default. To disable this timeout and use the system defaults, set it to zero.
# /etc/rstudio/load-balancer
balancer=sessions
timeout=5
...
SSL
If one of the nodes is temporarily using a self-signed or otherwise functional but invalid certificate, the load balancer may fail to use that node. You can skip SSL certificate verification by disabling the option verify-ssl-certs, which is only applicable when connecting over HTTPS. For production use, you should always leave the default (enabled), but it can be disabled for testing purposes.
# /etc/rstudio/load-balancer
balancer=sessions
verify-ssl-certs=0
...
User accounts not synchronized
One of the load balancing requirements is that user accounts must be accessible from each node and usernames and user ids must be identical on all nodes. If a user has the same username but different user ids on different nodes then permissions problems will result when the same user attempts to access shared storage using different user-ids.
You can determine the ID for a given username via the id command. For example:
$ id -u jsmith
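To compare the numeric ID across nodes, the same check can be run remotely (the hostname is hypothetical); the values must match on every node:
$ id -u jsmith
$ ssh server2.example.com id -u jsmith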
NFS volume mounting problems
If NFS volumes containing shared storage are unmounted during a Workbench session, that session will become unreachable. Furthermore, unmounting can cause loss or corruption of file locks (see the section below). If you are having problems related to accessing user directories, then fully resetting the connections between Workbench nodes and NFS will often resolve them. To perform a full reset:
1. Stop the Workbench server on all nodes (sudo rstudio-server stop).
2. Fully unmount the NFS volume from all nodes.
3. Remount the NFS volume on all nodes.
4. Restart the Workbench server on all nodes (sudo rstudio-server start).
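On each node, this reset might look like the following (a sketch assuming the NFS volume is mounted at /home; substitute your actual mount point):
$ sudo rstudio-server stop
$ sudo umount /home
$ sudo mount /home
$ sudo rstudio-server start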
File locking problems
Shared user storage (e.g. NFS) must support file locking so that Workbench can synchronize access to sessions across the various nodes in the cluster. File locking will not work correctly if the clocks on all nodes in the cluster are not synchronized. This condition may be surfaced as 502 HTTP errors. You can verify that file locking is working correctly by following the instructions in the File locking section above.
Diagnostics
To troubleshoot more complicated load balancing issues, Workbench can output detailed diagnostic information about internal load balancing traffic and state. You can enable this with the diagnostics setting, as follows:
diagnostics=tmp
Set this on every server in the cluster, and restart the servers to apply the change. This writes a file, /tmp/rstudio-load-balancer-diagnostics, on each server containing the diagnostic information.
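You can then watch the diagnostics as requests are balanced, for example:
$ tail -f /tmp/rstudio-load-balancer-diagnostics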
The value stderr can be used in place of tmp to send diagnostics from the rserver process to standard error instead of a file on disk; this is useful if your Posit Workbench instance runs non-daemonized.