Creating Backups

This section explains how to back up your Posit Connect installation.

See the Restore section to restore from a backup.

We recommend backing up the following information:

  1. Variable data. This is the directory given by the Server.DataDir setting, which has a default location of /var/lib/rstudio-connect.

  2. SQLite data (when enabled). This is the directory given by the SQLite.Dir setting, and is located beneath Server.DataDir by default.

  3. Auxiliary database files. This is the directory given by the Database.Dir setting, and is located beneath Server.DataDir by default. This directory includes files that help Connect manage encrypted secrets. Losing this data prevents Connect from using encrypted configuration settings and encrypted records stored in the database.

  4. Posit Connect configuration. This is usually located at /etc/rstudio-connect/rstudio-connect.gcfg.

  5. PostgreSQL data (when enabled).
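
To confirm which directories your installation actually uses, you can look up the settings in the configuration file. The helper below is a minimal sketch and is not part of Posit Connect: connect_setting is a hypothetical function name, and it assumes the configuration uses [Section] headers followed by Key = Value lines. It prints nothing when a setting is absent, in which case the default described above applies.

```shell
# Print the value of KEY within [SECTION] of a gcfg-style file.
# Usage: connect_setting FILE SECTION KEY
# Names are compared case-insensitively as a convenience (an assumption).
connect_setting() {
  awk -v section="$2" -v key="$3" '
    /^[ \t]*[;#]/ { next }                 # skip comment lines
    /^[ \t]*\[/ {                          # section header, e.g. [Server]
      name = $0
      gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
      in_section = (tolower(name) == tolower(section))
      next
    }
    in_section {
      eq = index($0, "=")
      if (eq == 0) next                    # not a Key = Value line
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)       # trim whitespace around the key
      gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim whitespace around the value
      gsub(/^"|"$/, "", v)                 # drop surrounding double quotes
      if (tolower(k) == tolower(key)) print v
    }
  ' "$1"
}
```

For example, connect_setting /etc/rstudio-connect/rstudio-connect.gcfg Server DataDir prints the configured data directory, or nothing if the default of /var/lib/rstudio-connect is in effect.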

Warning

Do not perform a backup operation while Posit Connect is running, as this will not capture a consistent record of the system.

Stopping the Posit Connect service during a backup prevents modifications to the Connect database and its variable data while their state is captured. Create database and file-system backups during the same window.

Prerequisites

  1. Determine your file-system backup procedure.

    The steps to back up your file-system depend on your infrastructure. Some systems offer snapshots, which record the state of the file-system at a particular point in time.

    If you are unable to use file-system snapshots, your archive MUST preserve file ownership, permissions, and special files like symbolic links.

    The rsync command, with its --archive / -a option, can create the backup archive. Other commands, like cp and tar, have similar options that preserve file attributes.

    # Example archival commands (alternatives; choose one)
    sudo rsync -a /var/lib/rstudio-connect/ /mnt/backups/rstudio-connect
    sudo cp -a /var/lib/rstudio-connect/ /mnt/backups/rstudio-connect
    sudo tar cvpzf /mnt/backups/rstudio-connect.tar.gz /var/lib/rstudio-connect

    Running commands like these requires root or sudo access.

    Your backup must include all of the Server.DataDir, Database.Dir, and SQLite.Dir directories, the /etc/rstudio-connect/rstudio-connect.gcfg Connect configuration, and other resources referenced by the configuration.

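  2. Determine your database backup procedure.

    When using SQLite, its data files are captured by your file-system backup, because they are located beneath Server.DataDir by default. If you have customized SQLite.Dir or Database.Dir to a location outside Server.DataDir, add those directories to your file-system backup.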

    The steps to back up PostgreSQL depend on your infrastructure and how PostgreSQL is running. Some services, such as Amazon RDS, can perform database snapshots on your behalf.

    If you are running your own PostgreSQL server, you may need to use pg_dump:

    # Run on the PostgreSQL server as the database superuser,
    # which is often the "postgres" account.
    sudo -u postgres pg_dump connect > connect-backup.sql
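
Redirecting pg_dump output with > leaves a file behind even when the dump fails. As one possible refinement, the sketch below writes a custom-format archive to a temporary name and renames it only after pg_dump succeeds and has produced a non-empty file. The names dump_connect_db, PG_DUMP, and CONNECT_DB are hypothetical and introduced only for this example; the database name connect is taken from the command above.

```shell
# Dump one database to a timestamped, custom-format archive in OUTDIR and
# print the archive path. Usage: dump_connect_db OUTDIR
# PG_DUMP and CONNECT_DB are hypothetical override variables for this sketch.
dump_connect_db() {
  outdir=$1
  db=${CONNECT_DB:-connect}
  out="$outdir/$db-$(date +%Y%m%dT%H%M%S).dump"
  # Write to a temporary name so a failed dump never looks like a good one.
  if "${PG_DUMP:-pg_dump}" --format=custom --file="$out.partial" "$db" \
      && [ -s "$out.partial" ]; then
    mv "$out.partial" "$out"
    printf '%s\n' "$out"
  else
    rm -f "$out.partial"
    return 1
  fi
}
```

Define and call the function in a shell running as an account that can read the database. A custom-format archive is restored with pg_restore rather than by feeding it to psql.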

Workflow

Step 1: Stop Connect

Stop the Connect service on each server in the target environment.

sudo systemctl stop rstudio-connect

Step 2: Create the backup

Run the appropriate steps to back up your file-system and database.

Your file-system and database may offer quick “snapshot” facilities; the specific commands needed vary depending on your infrastructure and provider.

Your backup MUST preserve file ownership and permissions, as these attributes are necessary for content execution.
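
One way to check that a copy kept these attributes is to compare a listing of owner, group, mode, file type, and symbolic-link target for the source and the backup. The following is a sketch only: it assumes GNU find (for the -printf option), and list_attrs and same_attrs are hypothetical helper names. It does not compare file contents.

```shell
# List path, owner, group, octal mode, type, and symlink target for every
# entry beneath a directory. Requires GNU find (for -printf).
list_attrs() {
  (cd "$1" && find . -printf '%P\t%u\t%g\t%m\t%y\t%l\n' | LC_ALL=C sort)
}

# Succeed only when both trees have identical attribute listings;
# otherwise print the differing entries and return diff's exit status.
# Usage: same_attrs SOURCE_DIR BACKUP_DIR
same_attrs() {
  a=$(mktemp) && b=$(mktemp) || return 2
  list_attrs "$1" > "$a"
  list_attrs "$2" > "$b"
  diff "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}
```

For example, same_attrs /var/lib/rstudio-connect /mnt/backups/rstudio-connect prints nothing when the attributes match. Run it from a root shell if your account cannot read every directory.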

Step 3: Start Connect

Start the Connect service on each server in the target environment.

sudo systemctl start rstudio-connect
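
The three steps can be combined so that Connect is restarted even when the backup command fails. This is a minimal single-server sketch, not an official script: stop_connect, start_connect, and backup_with_connect_stopped are hypothetical function names that wrap the systemctl commands shown above. With multiple servers, you would still need to stop and start the service on each one.

```shell
# Thin wrappers around the service commands from Steps 1 and 3.
stop_connect()  { sudo systemctl stop rstudio-connect; }
start_connect() { sudo systemctl start rstudio-connect; }

# Run "$@" (your backup command) while Connect is stopped.
# Connect is restarted whether or not the backup succeeds, and the
# backup command's exit status is returned to the caller.
backup_with_connect_stopped() {
  stop_connect || return 1
  "$@"
  status=$?
  start_connect || return 1
  return "$status"
}
```

For example: backup_with_connect_stopped sudo rsync -a /var/lib/rstudio-connect/ /mnt/backups/rstudio-connect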

Data Integrity

Posit Connect has two data sources that are kept consistent with each other during normal operation: the database and the variable data directory. Not all forms of backup guarantee consistency; consistent backups of these data sources are required to perform a restore.

In a single node configuration, the variable data directory is usually a local directory on the Connect node and the database is a SQLite file located within the variable data directory.

In a multiple-node configuration, or in configurations deployed with containers, the variable data directory is located on a networked file system and Connect uses a PostgreSQL database that is external to the Connect nodes.

When backing up a single node, you will stop the Connect service on that node, copy the variable data directory (including the database files), then restart the Connect service.

With multiple nodes, you can perform the same type of operation: stop the Connect service on each node, archive the variable data directory and the database, then restart the Connect services across all nodes. This approach is the only way to guarantee that your Posit Connect environment can be restored without any inconsistency.

Tip

Create consistent backups by stopping the Connect service, then performing the database and file-system backups during the same window.

In order to reduce concerns about day-to-day data loss, some installations may perform incremental snapshots of the variable data directory and the database while Connect is running.

Warning

Incremental snapshots performed while Connect is running are not guaranteed to be consistent. You should not attempt to restore a Posit Connect installation with them.

Some of the problems with incremental snapshots include:

  • Depending on the timing of the two snapshots, the database may have records that aren’t reflected in your data directory snapshot, or vice versa.

  • Because the networked filesystem may lock certain files that are currently in use, the snapshot process may finish much later than it began, leading to a snapshot that isn’t consistent with any state.

    • Example: Imagine app X v0.1 was copied successfully, and app Y is locked on NFS. While the backup script waits for app Y to unlock, app X v0.2 is uploaded. Then app Z is created. The snapshot would reflect apps X v0.1, Y, and Z, even though the only consistent states were “app X v0.1, Y” and “app X v0.2, Y, Z”.

  • Copying your data directory and snapshotting your database will create contention on both the networked filesystem and the database, which could cause service slowness or interruption for users. Consider warning users and/or performing incremental snapshots during times of low user activity.

Because you will not be able to restore the entire backup, it may be useful to snapshot only those parts of the data directory or database that you want to retain. For example, you could use rsync to find new bundles added since the last backup was performed, and copy them to a snapshot directory.
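
The rsync approach could be sketched with the --compare-dest option, which skips files that are already present and unchanged in a reference directory. The function name snapshot_new_bundles is hypothetical, and the location of the bundle directory beneath Server.DataDir is an assumption you should verify for your installation. As warned above, the result is a partial snapshot and not a restorable backup.

```shell
# Copy files that are absent from (or differ from) the last full backup
# into a separate snapshot directory.
# Usage: snapshot_new_bundles LIVE_DIR LAST_BACKUP_DIR SNAPSHOT_DIR
# Use absolute paths: rsync resolves a relative --compare-dest against
# the destination directory.
snapshot_new_bundles() {
  src=$1    # live bundle directory beneath Server.DataDir (assumed location)
  last=$2   # the same directory inside the last full backup
  snap=$3   # where newly added bundles are placed
  mkdir -p "$snap" || return 1
  # -a preserves ownership, permissions, times, and symbolic links.
  # Directories whose files are all unchanged may appear empty in SNAPSHOT_DIR.
  rsync -a --compare-dest="$last/" "$src/" "$snap/"
}
```

Because rsync compares file size and modification time by default, the last full backup must itself have been created with options that preserve timestamps, such as rsync -a or cp -a.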