Upgrade and/or Migrate Posit Team — Phase 2, Migrate content and data
Don’t proceed with Phase 2 migration activities until acceptance testing in Phase 1 is complete.
To preserve consistent state between the content data and product database, the production environment must be shut down when snapshots are taken for migration purposes. The production environment can be turned back on following snapshots. Be aware that any changes made after this point aren’t captured in the new environment. Enact a separate plan to capture any new work or deployments in production occurring during this time.
Package Manager migration tasks
Do I need to migrate Package Manager?
Deciding whether to migrate your production Posit Package Manager data and database to the upgraded staging Package Manager server requires close communication with end users. It’s important to understand that the migrated data and content includes (if present):
- Repositories and sources, including:
- Any packages served from Git-based sources.
- Any internally developed packages.
- Specialized repositories, such as curated, subset, blended, or date-locked.
- Repository rules such as allow-lists, block-lists, or vulnerability reporting.
- Metrics and usage history.
- Hashes identifying specific frozen repositories, which end users and deployed content could rely on for reproducibility. These unique hashes are created for any date-based snapshot of a repository that’s not a CRAN source or a PyPI source.
After conferring with your end users, you could determine that it’s acceptable to configure Package Manager’s repositories and sources anew, rather than migrate. If this is the case, refer to the Package Manager Quick Start to configure repositories for your users’ requirements. At a minimum, we recommend configuring CRAN and PyPI repositories. Your organization could have additional requirements for security or reproducibility that would necessitate additional repositories. Refer to the Package Manager Quick Start for more information.
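If you do configure repositories anew, a minimal setup can be sketched with the `rspm` CLI on the Package Manager host. The repository names below are examples, and the exact flags may vary by version, so check the Quick Start for the commands your installation supports:

```
# Create a CRAN repository and subscribe it to the built-in CRAN source
rspm create repo --name=cran --description='CRAN mirror'
rspm subscribe --repo=cran --source=cran

# Create a PyPI repository and subscribe it to the built-in PyPI source
rspm create repo --name=pypi --type=python --description='PyPI mirror'
rspm subscribe --repo=pypi --source=pypi
```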
Package Manager pre-migration
Follow the steps to upgrade Package Manager to the latest supported version.
Communicate anticipated downtime to end users.
Run the `rspm evict --type=clear` command to trigger an immediate cache eviction. This should limit the number of files that need to be migrated by `rsync` in the steps below.
Stop the production Package Manager service and perform a backup. It’s necessary to stop the service to ensure a consistent state between the content data and the metadata in the product database.
If production Package Manager uses PostgreSQL for the product database, follow your organization’s protocols for backing up PostgreSQL. This likely employs the `pg_dump` utility.
Stop the staging Package Manager service and perform a backup of the data directory and product database.
If you restart the production Package Manager service following the backup, any changes made after this point aren’t captured in the new staging environment. Enact a separate plan to capture any new data in production occurring during this time.
Package Manager migration
Acceptance testing conducted in Phase 1 could have populated data in the staging product database and staging data directory. The migration steps that follow replace any data in the staging environment with production data.
Copy the production Data Directory to the staging environment. When copying, preserve the permissions and file ownership. The default Data Directory location is `/var/lib/rstudio-pm`; however, you can change the location by editing the `Server.DataDir` setting in the production `rstudio-pm.gcfg` configuration file.
Posit recommends using `rsync` with the `-a` flag to copy the data. This syncs directories recursively and preserves symbolic links, groups, ownership, and permissions. The Data Directory includes files with restrictive read and write permissions, so you need to run `rsync` as root to copy the entire directory.
Copy the production product database to the staging database location:
If the production product is using SQLite, the default location is within the Data Directory at `/var/lib/rstudio-pm`. However, you can change the default location by editing the `SQLite.Dir` setting in the production `rstudio-pm.gcfg` configuration file.
If the production product is using PostgreSQL, follow your organization’s protocols for restoring the production database in the staging database environment. This likely employs the `pg_restore` utility.
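As a sketch of the PostgreSQL path, with hypothetical hostnames, database names, and roles (your organization’s backup and restore protocols take precedence):

```
# Dump the production database in custom format
# (hostnames, database names, and roles are placeholders)
pg_dump -Fc -h prod-db.internal -U rspm rspm > rspm.dump

# Restore into the staging database, dropping existing objects first
pg_restore -h staging-db.internal -U rspm -d rspm --clean --if-exists rspm.dump
```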
Connect migration tasks
Do I need to migrate Connect?
Deciding whether to migrate your production Posit Connect data and database to the upgraded staging Connect server requires close communication with end users. It’s important to understand that the migrated data and content includes:
All content deployed to the Connect server, including rendered variants and previous versions.
Users and groups.
Content metadata and settings, including:
- Content access control lists.
- Scheduled job settings.
- Notification preferences.
Metrics, logs, audit, instrumentation, and usage history.
If the new staging environment is running a different Linux OS, clear and rebuild the content runtime caches where packages required for content are stored. Complete this as part of the migration.
After conferring with your end users, you could determine that it’s acceptable to redeploy content to the new staging server rather than migrate. The feasibility of this depends on the amount of content deployed to the production server. Weigh the burden of redeploying and reestablishing content-level settings against the effort of migration and cache rebuilding.
Connect pre-migration
Communicate anticipated downtime to end users.
Identify any content that could be using a deprecated version of Python. Redeploy that content with a newer version of Python, either pre-migration or post-migration.
Stop the production Connect service and perform a backup. It’s necessary to stop the service to ensure consistent state between the content data and metadata in the product database.
If production Connect uses PostgreSQL for the product database, follow your organization’s protocols for backing up PostgreSQL. This likely employs the `pg_dump` utility.
Stop the staging Connect service and perform a backup of the data directory and product database.
If you restart the production Connect service following the backup, the new staging environment doesn’t capture any changes made after this point. Enact a separate plan to capture any new data in production occurring during this time.
Connect migration
Acceptance testing conducted in Phase 1 could have populated data in the staging product database and staging data directory. The migration steps that follow replace any data in the staging environment with production data.
Depending on your configuration, additional data may need migration as well. In general, we can’t enumerate everything that’s important on your server, but this may include data directories, service account details, database connection credentials, or, if users are provisioned on the system, user home directories. This may need to happen prior to other migration steps so that the system has the appropriate configuration.
Posit recommends using `rsync` with the `-a` flag to copy the data. This syncs directories recursively and preserves symbolic links, groups, ownership, and permissions.
Copy the production Data Directory to the staging environment. When copying, preserve the permissions and file ownership. The default Data Directory location is `/var/lib/rstudio-connect`; however, you can change the location by editing the `Server.DataDir` setting in the production `rstudio-connect.gcfg` configuration file.
As with Package Manager, Posit recommends using `rsync` with the `-a` flag to copy the data. The Data Directory includes files with restrictive read and write permissions, so run `rsync` as root to copy the entire directory.
Copy the production product database to the staging database location:
If the production product is using SQLite, the default location is within the Data Directory at `/var/lib/rstudio-connect`. However, you can change it by editing the `SQLite.Dir` setting in the production `rstudio-connect.gcfg` configuration file.
If the production product is using PostgreSQL, follow your organization’s protocols for restoring the production database in the staging database environment. This likely employs the `pg_restore` utility.
Rebuild content runtimes
If the staging server runs a different Linux distribution or different system libraries, the content runtime caches containing the Python and R package dependencies for the production content deployed on Connect are incompatible with the new environment.
If you have changed the Linux OS, you must clear these runtime caches. After clearing the runtime caches, content is automatically rebuilt when it’s first run, however Posit recommends that you rebuild the cache for your most important content. A preemptive rebuild means that visitors don’t experience a delayed load. The rebuild is useful for interactive applications and APIs. Refer to Runtime Cache Workflow for instructions.
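If you use the `rsconnect-python` CLI, a preemptive rebuild can be sketched as follows. The server URL, API-key variable, and content GUID are placeholders, and the `content build` subcommands assume reasonably recent versions of rsconnect-python and Connect:

```
# Queue a high-priority content item for rebuild (GUID is a placeholder)
rsconnect content build add \
  --server https://staging-connect.example.com \
  --api-key "$CONNECT_API_KEY" \
  --guid 154bd2af-0000-0000-0000-000000000000

# Run all queued builds
rsconnect content build run \
  --server https://staging-connect.example.com \
  --api-key "$CONNECT_API_KEY"
```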
Connect post-migration activities
Issues can arise in rebuilding content. Common ones include:
- Content builds under a now-deprecated version of Python or R.
- A package is no longer available, or isn’t available for the new OS.
- A package was installed from a source that’s no longer available.
These issues depend on the application code. Resolve these issues by working directly with the application developers.
If the Linux OS changed and the content runtime cache wasn’t cleared, content fails to render.
Workbench migration tasks
Do I need to migrate Workbench?
Consider the migration of the product database and user data independent of one another for Workbench.
For Posit Workbench, review the Data Dictionary to understand what information the product database contains. Much of the database contents are short-lived, such as active user info and temporary authentication keys. Upon a user’s next login, this information is recreated. However, the database also includes user status information such as `locked` state or `admin` role. Considering this information, you could determine that it’s acceptable to forgo database migration.
User data is stored in users’ home directories. Confer with your end users to scope how much content they store in their home directories, and whether it’s backed by a remote version control system such as GitHub, GitLab, or Bitbucket. Your users could find it acceptable to forgo migration of their user data if their critical project work is stored elsewhere, or if it’s recoverable using a version control system.
Importantly, if the new staging environment is running a different Linux OS, clear and rebuild the system-level, user-level, and any project-level libraries (e.g., those created with `venv` or `renv`) where Python and R packages are stored. Do this as part of the post-migration process detailed below.
Workbench pre-migration
Communicate anticipated downtime to end users.
Stop the production Workbench service: `sudo rstudio-server stop`.
Suspend all sessions on the production Workbench server: `sudo rstudio-server suspend-all`.
Perform a backup, including the user data stored in each user’s home directory (`~/` for each user) and any shared project data locations. It’s necessary to stop the service to ensure a consistent state between the content data and the metadata in the product database.
If production Workbench uses PostgreSQL for the product database, follow your organization’s protocols for backing up PostgreSQL. This likely employs the `pg_dump` utility.
If you use a shared site library for packages, identify any Python and R packages installed at the system level on the production server:
For each version of Python installed, identify and record the packages installed in the site library. The default location for these packages is in `/opt/python/<PYTHON_VERSION>/`.
For each version of R installed, identify and record the packages installed in the site library. The default location for these packages is in `/opt/R/<R_VERSION>/`.
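One way to record these, sketched under the assumption of the default `/opt` install prefixes (adjust the paths if your interpreters live elsewhere):

```
# Record site-library contents for each installed interpreter version
for RV in /opt/R/*/; do
  "${RV}bin/Rscript" -e 'write.csv(installed.packages()[, c("Package","Version")])' \
    > "r-site-$(basename "$RV").csv"
done
for PV in /opt/python/*/; do
  "${PV}bin/python" -m pip freeze > "python-site-$(basename "$PV").txt"
done
```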
If you restart the production Workbench service following the backup, the new staging environment doesn’t capture any changes made after this point. Enact a separate plan to capture any new data in production occurring during this time.
Workbench migration
Acceptance testing conducted in Phase 1 could have populated data in the staging product database and staging data directory. The migration steps that follow replace any data in the staging environment with production data.
Copy the production user data to the staging environment. Most user data in Workbench is present in user home directories. Make sure those users are provisioned on the staging server. When copying, preserve the permissions and file ownership.
Posit recommends using `rsync` with the `-a` flag to copy the data. This syncs directories recursively and preserves symbolic links, groups, ownership, and permissions.
Sometimes, there are additional drives mounted on the production server or additional shared project directories, both of which contain user data as well. These directories include files with restrictive read and write permissions, so run `rsync` as root to copy them in their entirety.
Copy the production product database to the staging database location:
If the production product is using SQLite, the default location is under the `/var/lib/rstudio-server` directory. However, you can change the default location by editing the directory configuration in the production `/etc/rstudio/database.conf` configuration file.
If the production product is using PostgreSQL, follow your organization’s protocols for restoring the production database in the staging database environment. This likely employs the `pg_restore` utility.
Workbench post-migration
If the new staging environment is running a different Linux OS, clear and rebuild the system-level, user-level, and any project-level libraries (e.g., those created with `venv` or `renv`) where R and Python packages are stored.
Packages shouldn’t be copied from one environment to another, because they were built specifically for the OS on which they were installed.
Rebuild site libraries
If your team uses a shared site library for packages, install the packages identified in the pre-migration tasks into the staging environment’s site libraries.
Remove user libraries
Remove the libraries installed in users’ home directories because they were built under the old operating system.
Delete the package libraries located in:
~/R
~/.local/lib/python3.*
These libraries repopulate as users work in their new environment and download packages anew. By configuring Workbench to use binary packages from Package Manager, installation is faster and less prone to issues than installing packages from source.
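A minimal sketch of the deletion, run as each user (the paths are the defaults listed above):

```shell
# Remove the user-level R library and Python user site libraries;
# they are rebuilt automatically as packages are reinstalled.
# rm -rf succeeds even if the directories don't exist.
rm -rf ~/R
rm -rf ~/.local/lib/python3.*
```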
Rebuild `renv` and `venv` libraries
If users have project-level libraries installed using `venv` (Python) or `renv` (R), they must reinstall these libraries because the libraries were built under the old operating system.
An `renv` project library is rebuilt by the user with these commands, run in the R console:
# Delete the existing libraries
unlink("renv/library", recursive=TRUE)
# Restart R session
.rs.restartR()
# Re-install libraries
renv::restore(rebuild = TRUE)
A `venv` project library is rebuilt by the user with these commands, run in the terminal:
# Activate the existing venv
source .venv/bin/activate
# Make note of all installed packages
python -m pip freeze > requirements-freeze.txt
# Deactivate the venv and delete
deactivate
rm -rf .venv/
# Create a new virtual environment
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip wheel setuptools
python -m pip install -r requirements-freeze.txt
Next Step:
Proceed to Rollout