Upgrade and/or Migrate Posit Team — Phase 2, Migrate content and data

Don’t proceed with Phase 2 migration activities until acceptance testing in Phase 1 is complete.

Notice

To preserve consistent state between the content data and product database, the production environment must be shut down when snapshots are taken for migration purposes. The production environment can be turned back on following snapshots. Be aware that any changes made after this point aren’t captured in the new environment. Enact a separate plan to capture any new work or deployments in production occurring during this time.

Package Manager migration tasks

Do I need to migrate Package Manager?

Deciding whether to migrate your production Posit Package Manager data and database to the upgraded staging Package Manager server requires close communication with your end users. Keep in mind that the migrated data and content include (if present):

  • Repositories and sources, including:
    • Any packages served from Git-based sources.
    • Any internally developed packages.
    • Specialized repositories, such as curated, subset, blended, or date-locked.
    • Repository rules such as allow-lists, block-lists, or vulnerability reporting.
    • Metrics and usage history.
    • Hashes identifying specific frozen repositories, which end users and deployed content could rely on for reproducibility. These unique hashes are created for any date-based snapshot of a repository that’s not a CRAN source or a PyPI source.

After conferring with your end users, you could determine that it’s acceptable to configure Package Manager’s repositories and sources anew, rather than migrate. If this is the case, refer to the Package Manager Quick Start to configure repositories for your users’ requirements. At a minimum, we recommend configuring CRAN and PyPI repositories; your organization could have additional security or reproducibility requirements that necessitate additional repositories.
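
As a minimal sketch of configuring repositories anew with the rspm CLI (the repository names prod-cran and pypi are illustrative; check the exact flags against your version’s rspm documentation):

# Create and subscribe a CRAN repository
rspm create repo --name=prod-cran --description='Access CRAN packages'
rspm subscribe --repo=prod-cran --source=cran

# Create and subscribe a PyPI repository
rspm create repo --name=pypi --type=python --description='Access PyPI packages'
rspm subscribe --repo=pypi --source=pypi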

Package Manager pre-migration

  1. Follow the steps to upgrade Package Manager to the latest supported version.

  2. Communicate anticipated downtime to end users.

  3. Run the rspm evict --type=clear command to trigger an immediate cache eviction. This limits the number of files that rsync needs to migrate in the steps below.

  4. Stop the production Package Manager service and perform a backup. It’s necessary to stop the service to ensure consistent state between the content data and metadata in the product database.

  5. If production Package Manager uses PostgreSQL for its product database, follow your organization’s protocols for backing up PostgreSQL. This likely employs the pg_dump utility; see the sketch after this list.

  6. Stop the staging Package Manager service and perform a backup of the data directory and product database.

  7. If you restart the production Package Manager service following the backup, any changes made after this point aren’t captured in the new staging environment. Enact a separate plan to capture any new data in production occurring during this time.
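
For step 5, a minimal pg_dump sketch, assuming an illustrative host and a database named packagemanager; substitute your organization’s connection details:

# Dump the production product database in pg_restore's custom format
pg_dump --format=custom --host=localhost --username=postgres \
    --file=packagemanager.dump packagemanager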

Package Manager migration

Acceptance testing conducted in Phase 1 could have populated data in the staging product database and staging data directory. The migration steps that follow replace any data in the staging environment with production data.

  1. Copy the production Data Directory to the staging environment. When copying, preserve the permissions and file ownership. The location of the default Data Directory is: /var/lib/rstudio-pm. However, you can change the location by editing the Server.DataDir configuration in the production rstudio-pm.gcfg configuration file:

    1. Posit recommends the use of rsync with the -a flag to copy the data. This syncs directories recursively and preserves symbolic links, groups, ownership, and permissions.

    2. The Data Directory includes files with restrictive read and write permissions. You need to use rsync in root mode to move over the entirety of the directory, as shown in the sketch after this list.

  2. Copy the production product database to the staging database location:

    1. If the production product is using SQLite, the default location for this is within the Data Directory at /var/lib/rstudio-pm. However, you can change the default location by editing the SQLite.Dir configuration in the production rstudio-pm.gcfg configuration file.

    2. If the production product is using PostgreSQL, follow your organization’s protocols for restoring the production database in the staging database environment. This likely employs the pg_restore utility.
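
A minimal sketch of both steps, assuming default paths, a PostgreSQL product database, and an illustrative staging host:

# Step 1: copy the data directory as root so restrictive permissions and
# ownership survive; --delete replaces any Phase 1 test data left in staging
sudo rsync -a --delete /var/lib/rstudio-pm/ root@staging.example.com:/var/lib/rstudio-pm/

# Step 2: restore the production dump into the staging PostgreSQL instance
pg_restore --clean --if-exists --host=staging-db.example.com \
    --username=postgres --dbname=packagemanager packagemanager.dump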

Connect migration tasks

Do I need to migrate Connect?

Deciding whether to migrate your production Posit Connect data and database to the upgraded staging Connect server requires close communication with your end users. Keep in mind that the migrated data and content include:

  • All content deployed to the Connect server, including rendered variants and previous versions.

  • Users and groups.

  • Content metadata and settings, including:

    • Content access control lists.
    • Scheduled job settings.
    • Notification preferences.
  • Metrics, logs, audit, instrumentation, and usage history.

If the new staging environment is running a different Linux OS, clear and rebuild the content runtime caches where packages required for content are stored. Complete this as part of the migration.

After conferring with your end users, you could determine that it’s acceptable to redeploy content to the new staging server rather than migrate. This decision often depends on the amount of content deployed to the production server. Weigh the burden of redeploying content and reestablishing content-level settings against the effort of migration and cache rebuilding.

Connect pre-migration

  1. Communicate anticipated downtime to end users.

  2. Identify any content that could be using a deprecated version of Python, and redeploy that content with a newer version of Python either pre-migration or post-migration; see the sketch after this list.

  3. Stop the production Connect service and perform a backup. It’s necessary to stop the service to ensure consistent state between the content data and metadata in the product database.

  4. If production Connect uses PostgreSQL for its product database, follow your organization’s protocols for backing up PostgreSQL. This likely employs the pg_dump utility.

  5. Stop the staging Connect service and perform a backup of the data directory and product database.

  6. If you restart the production Connect service following the backup, the new staging environment doesn’t capture any changes made after this point. Enact a separate plan to capture any new data in production occurring during this time.
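
For step 2, one hedged approach uses the Connect Server API to list each content item’s Python version. The server URL is illustrative, an administrator API key is required, and jq is assumed to be installed; check the response fields against your Connect version’s API documentation:

# List the GUID and Python version of every Python-backed content item
curl -s -H "Authorization: Key ${CONNECT_API_KEY}" \
    https://connect.example.com/__api__/v1/content |
    jq -r '.[] | select(.py_version != null) | "\(.guid) \(.py_version)"'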

Connect migration

Acceptance testing conducted in Phase 1 could have populated data in the staging product database and staging data directory. The migration steps that follow replace any data in the staging environment with production data.

  1. Depending on your configuration, additional data could need migration as well. We can’t enumerate everything that’s important on your server, but this could include data directories, service account details, database connection credentials, or, if users are provisioned on the system, users’ home directories. This migration might need to happen before other migration steps so that the system has the appropriate configuration.

    1. Posit recommends the use of rsync with the -a flag to copy the data. This syncs directories recursively and preserves symbolic links, groups, ownership, and permissions.
  2. Copy the production Data Directory to the staging environment. When copying, preserve the permissions and file ownership. The location of the default Data Directory is: /var/lib/rstudio-connect. However, you can change the location by editing the Server.DataDir configuration in the production rstudio-connect.gcfg configuration file:

    1. Posit recommends the use of rsync with the -a flag to copy the data. This syncs directories recursively and preserves symbolic links, groups, ownership, and permissions.

    2. The Data Directory includes files with restrictive read and write permissions. Use rsync in root mode to move over the entirety of the directory, as shown in the sketch after these steps.

  3. Copy the production product database to the staging database location:

    1. If the production product is using SQLite, the default location for this is within the Data Directory at /var/lib/rstudio-connect. However, you can change the default location by editing the SQLite.Dir configuration in the production rstudio-connect.gcfg configuration file.

    2. If the production product is using PostgreSQL, follow your organization’s protocols for restoring the production database in the staging database environment. This likely employs the pg_restore utility.
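
A minimal sketch of these steps, assuming default paths, a PostgreSQL product database, and an illustrative staging host:

# Copy the data directory as root so restrictive permissions and ownership
# survive; --delete replaces any Phase 1 test data left in staging
sudo rsync -a --delete /var/lib/rstudio-connect/ root@staging.example.com:/var/lib/rstudio-connect/

# Restore the production dump into the staging PostgreSQL instance
pg_restore --clean --if-exists --host=staging-db.example.com \
    --username=postgres --dbname=connect connect.dump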

Rebuild content runtimes

If the staging server runs a different Linux distribution or different system libraries, the content runtime caches containing the Python and R package dependencies for the production content deployed on Connect are incompatible with the new environment.

If you have changed the Linux OS, you must clear these runtime caches. After clearing the runtime caches, content is automatically rebuilt when it’s first run; however, Posit recommends that you preemptively rebuild the cache for your most important content so that visitors don’t experience a delayed load. This is especially useful for interactive applications and APIs. Refer to Runtime Cache Workflow for instructions.
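
One way to drive a preemptive rebuild is with the rsconnect-python CLI’s content build commands. This is a hedged sketch: the server URL is illustrative, the GUID and API key are placeholders, and the flags should be checked against the rsconnect-python documentation:

# Mark a high-priority content item for rebuild, then run the build
rsconnect content build add --guid <CONTENT_GUID> \
    --server https://connect.example.com --api-key <ADMIN_API_KEY>
rsconnect content build run \
    --server https://connect.example.com --api-key <ADMIN_API_KEY>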

Connect post-migration activities

Issues can arise in rebuilding content. Common ones include:

  • Content builds under a now-deprecated version of Python or R.
  • A package is no longer available, or isn’t available for the new OS.
  • A package was installed from a source that’s no longer available.

These issues depend on the application code. Resolve these issues by working directly with the application developers.

If the Linux OS changed and the content runtime cache wasn’t cleared, content fails to render.

Workbench migration tasks

Do I need to migrate Workbench?

For Workbench, consider migrating the product database and the user data independently of one another.

For Posit Workbench, review the Data Dictionary to understand what information the product database contains. Much of the database contents are short-lived, such as active user info and temporary authentication keys. Upon a user’s next login, this information is recreated. However, the database also includes user status information such as locked state or admin role. Considering this information, you could determine that it’s acceptable to forgo database migration.

Users’ home directories store user data. Confer with your end users to scope how much content they store in their home directories, and whether it’s backed by a remote version control host such as GitHub, GitLab, or Bitbucket. Your users could find it acceptable to forgo migration of their user data if their critical project work is stored elsewhere, or if it’s recoverable using a version control system.

Importantly, if the new staging environment is running a different Linux OS, clear and rebuild the system-level, user-level, and any project-level libraries (e.g., those created with venv, renv) where Python and R packages are stored. Do this as part of the post-migration process detailed below.

Workbench pre-migration

  1. Communicate anticipated downtime to end users.

  2. Stop the production Workbench service: sudo rstudio-server stop.

  3. Suspend all sessions on the production Workbench server: sudo rstudio-server suspend-all.

  4. Perform a backup, including the user data stored in each user’s home directory (~/ for each user) and any shared project data locations. It’s necessary to stop the service to ensure consistent state between the content data and metadata in the product database.

  5. If production Workbench uses PostgreSQL for its product database, follow your organization’s protocols for backing up PostgreSQL. This likely employs the pg_dump utility.

  6. If you use a shared site library for packages, identify any Python and R packages installed at the system level on the production server (see the sketch after this list):

    1. For each version of Python installed, identify and record the packages installed in the site library. The default location for these packages is in /opt/Python/<PYTHON_VERSION>/.

    2. For each version of R installed, identify and record the packages installed in the site library. The default location for these packages is in /opt/R/<R_VERSION>/

  7. If you restart the production Workbench service following the backup, the new staging environment doesn’t capture any changes made after this point. Enact a separate plan to capture any new data in production occurring during this time.
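
A hedged sketch for step 6; the interpreter versions and paths are illustrative and should match what’s installed on your production server:

# Record the Python packages in one version's site library
/opt/Python/3.11.4/bin/python -m pip freeze > python-3.11.4-site-library.txt

# Record the R packages in one version's site library
/opt/R/4.3.2/bin/Rscript -e 'write.csv(installed.packages()[, c("Package", "Version")], "r-4.3.2-site-library.csv", row.names = FALSE)'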

Workbench migration

Acceptance testing conducted in Phase 1 could have populated data in the staging product database and staging data directory. The migration steps that follow replace any data in the staging environment with production data.

  1. Copy the production user data to the staging environment. Most user data in Workbench is present in user home directories. Make sure those users are provisioned on the staging server. When copying, preserve the permissions and file ownership.

    1. Posit recommends the use of rsync with the -a flag to copy the data. This syncs directories recursively and preserves symbolic links, groups, ownership, and permissions.
  2. Sometimes, additional drives are mounted on the production server, or there are additional shared project directories; both can contain user data as well. These directories include files with restrictive read and write permissions. You need to use rsync in root mode to move over the entirety of each directory; see the sketch after these steps.

  3. Copy the production product database to the staging database location:

    1. If the production product is using SQLite, the default location for this is under the /var/lib/rstudio-server directory. However, you can change the default location by editing the directory configuration in the production /etc/rstudio/database.conf configuration file.

    2. If the production product is using PostgreSQL, follow your organization’s protocols for restoring the production database in the staging database environment. Likely, this employs the pg_restore utility.
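
A minimal sketch of these steps, assuming provisioned users, an illustrative staging host and shared path, and the default SQLite product database location:

# Copy home directories and any shared project mounts as root so ownership
# and restrictive permissions are preserved
sudo rsync -a /home/ root@staging.example.com:/home/
sudo rsync -a /shared/projects/ root@staging.example.com:/shared/projects/

# With the SQLite provider, copy the directory named by the directory
# setting in /etc/rstudio/database.conf (default shown)
sudo rsync -a /var/lib/rstudio-server/ root@staging.example.com:/var/lib/rstudio-server/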

Workbench post-migration

If the new staging environment is running a different Linux OS, clear and rebuild the system-level, user-level, and any project-level libraries (e.g., those created with venv or renv) where R and Python packages are stored.

Don’t copy packages from one environment to another, because they were built specifically for the OS on which they were installed.

Rebuild site libraries

If your team uses a shared site library for packages, reinstall the packages identified in the pre-migration tasks into the staging environment’s site libraries.

Remove user libraries

Remove the libraries installed in users’ home directories because they were built under the old operating system.

Delete the package libraries located in:

  • ~/R
  • ~/.local/lib/python3.*

These libraries repopulate as the users work in their new environment and download packages anew. If you configure Workbench to use binary packages from Package Manager, installation is faster and less prone to issues than installing packages from source. A configuration sketch follows.
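
As a hedged sketch of that configuration (the Package Manager URL and the distribution segment, jammy here, are illustrative; use the repository URLs your Package Manager instance advertises):

# In Rprofile.site, point R at Package Manager's Linux binary packages
options(repos = c(CRAN = "https://packagemanager.example.com/cran/__linux__/jammy/latest"))

# In /etc/pip.conf, point pip at Package Manager's PyPI repository
[global]
index-url = https://packagemanager.example.com/pypi/latest/simple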

Rebuild renv and venv libraries

If users have project-level libraries installed using venv (Python) or renv (R), they must reinstall these libraries because they were built under the old operating system.

Users rebuild an renv project library themselves, using the following commands run in the R console:

# Delete the existing libraries
unlink("renv/library", recursive=TRUE)

# Restart R session
.rs.restartR()

# Re-install libraries
renv::restore(rebuild = TRUE)

Users rebuild a venv project library themselves, using the following commands run in the terminal:

# Activate the existing venv
source .venv/bin/activate

# Make note of all installed packages
python -m pip freeze > requirements-freeze.txt

# Deactivate the venv and delete
deactivate
rm -rf .venv/

# Create a new virtual environment
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip wheel setuptools
python -m pip install -r requirements-freeze.txt


Next Steps

Proceed to Rollout.
