Repositories and Sources#

graph TD
    A(CRAN-type Repository) --> |Sources| B(cran)
        B --> L(A3)
        B --> |Packages|M(shiny)
        B --> N(covr)

Repositories are the primary vehicle for organizing and distributing packages. These are views of the data that your client (R, BiocManager, pip, etc.) uses to find and install packages. For more information on packages and repositories see the Package Ecosystem section.

In Posit Package Manager, repositories include one or more sources. A source is a collection of package files from one place. For example, CRAN and PyPI are places where a source may draw files from.

The documentation in this chapter outlines the types and structure of repositories and sources.

Repository types#

Package Manager supports repositories of three types:

  • R - A CRAN-like repository for R packages. This is the default repository type.
  • Bioconductor - A repository for Bioconductor R packages that supports use by BiocManager.
  • Python - A repository that provides the Simple Repository API specified by PEP 503. Pip and other tools that draw from PyPI use this interface.

Repository types may only include specific types of sources. For example, an R repository cannot include a PyPI source.

Note

A repository and a source may have the same name. It is common to have a repo named "cran" along with the pre-defined source named "cran." If this confuses you, be sure to name your repository something other than the source you intend to subscribe to.

Repository Structure#

A repository appears to the end user as just a set of files served from disk. Package Manager does not create repositories on disk. Instead, Package Manager maintains a single copy of each source and binary package, and uses a database and specialized web server to handle HTTP requests from R and Python. Although a repository may include many sources, it appears as one collection of packages to the client. Some example requests that can be served by the Package Manager are in the appendix.

Most importantly, these request types provide the interfaces that client software expects for installing packages. A CRAN-like repository means users can access and install packages using their regular R functions: install.packages, available.packages, renv::install, and devtools::install. A Bioconductor repository allows installing packages with BiocManager::install. A Python repository allows using pip, pipenv, poetry, or other clients for the PEP 503 interface that PyPI provides.
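The PEP 503 interface mentioned above identifies each project by a normalized name. The following is a minimal sketch of that normalization rule and of how a client might build a per-project index URL. The host `ppm.example.com` and the `/simple/` path under the repository URL are assumptions made for illustration; they are not taken from this guide.

```python
import re


def normalize_project_name(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    # Runs of '-', '_' and '.' collapse to a single '-'; the result is lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_index_url(repo_url: str, project: str) -> str:
    """Build a per-project index URL under a hypothetical repository URL."""
    # Assumption: the repository exposes its index under '<repo_url>/simple/'.
    return f"{repo_url.rstrip('/')}/simple/{normalize_project_name(project)}/"


if __name__ == "__main__":
    # Hypothetical server and repository names.
    print(simple_index_url("https://ppm.example.com/pypi/latest", "Flask_SQLAlchemy"))
```

Clients such as pip apply this normalization themselves, so users normally never need to do it by hand.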

Repository Authentication

You cannot use Package Manager to restrict repositories to specific users. Instead, we suggest handling this at the policy level by asking each group to use only specific repository URLs. Reference the Allow Users to Optionally Add Additional Repos section for steps.

Repository Versioning#

Repository versioning provides a way to reference several sources in one repository at particular snapshot points for each source. This is a useful feature for reproducibility: you can obtain a view of the packages that were available on some past date and ensure that you have the same packages that you used when you ran a particular analysis. Software updates may change interfaces or other behavior, so controlling your environment can be critically important to keeping your code working. Versioning is available for all repository types except Bioconductor, since Bioconductor includes its own versioning scheme.

Note

To reiterate, for reproducibility, we recommend configuring R to use a repository URL frozen to a particular date.

For repositories that support it, the web UI provides a simple interface to choose a particular date to freeze to. A guide to this UI is in the User Guide.

Sources#


graph TD

cranlike("/custom-cran")
curated(curated cran source)
local(internal R packages)

curated & local --- cranlike

pypilike("/custom-pypi")
curatedpy(curated pypi source)
localpy(internal python packages)

curatedpy & localpy --- pypilike

Package Manager repositories include one or more sources. Sources are collections of files from a single location (e.g. CRAN or PyPI). Package Manager tracks every change to files in a source and groups each set of changed files as a snapshot. Together, these snapshots create a full versioned history of each source. There are currently ten types of sources:

  • cran source - A single cran source is automatically created. This source contains metadata and packages from the Posit Package Service. The source can be used directly in a repository to give users access to all CRAN packages, or it can be used indirectly by curated-cran sources.
  • pypi source - A single PyPI source is automatically created. This source contains metadata and packages from the Posit Package Service. The source can be used directly in a Python repository to give users access to all PyPI packages.
  • bioconductor source - Bioconductor sources combine one or more internal sources (such as "bioc", "data/annotation", etc.) into a "meta-source" to represent a given bioconductor version. These meta-sources behave similarly to cran sources. See the Bioconductor Source section for more information.
  • curated-cran source (Enhanced, Advanced) - A curated CRAN source exposes an approved set of CRAN packages. Administrators can add packages to the set and update the set. See the Curated CRAN Source section for more information.
  • cran-snapshot source (Enhanced, Advanced) - A CRAN source that is pinned to a specific CRAN snapshot. Administrators can periodically update the snapshot to which the cran-snapshot source is pinned. See the CRAN Snapshot Source section for more information.
  • curated-pypi source (Enhanced, Advanced) - A curated PyPI source exposes an approved set of PyPI packages defined by a requirements.txt file. See the Curated PyPI Sources documentation for more information.
  • local source - A local source is used as a mechanism to distribute locally developed packages or other packages without native support in Package Manager. Administrators add packages to local sources by specifying a path to a package's tar file.
  • local-python source - A local Python source is used to distribute internally-developed Python packages within Package Manager. Administrators can upload their packages with the CLI or with twine. See Local Python Packages for more information.
  • git source (Advanced) - A git source allows Package Manager to automatically make packages in Git available to R users through install.packages (without requiring devtools). Git sources work for internal packages as well as external sites such as GitHub. Packages can be automatically updated on each commit or when a new Git tag is pushed.
  • git-python source (Advanced) - A git-python source allows Package Manager to automatically make packages in Git available to Python users. Git Python sources work for internal packages as well as external sites such as GitHub. Packages can be automatically updated on each commit or when a new Git tag is pushed.

Note

While the CRAN and PyPI sources are created automatically, an administrator must use the CLI before any metadata or packages are downloaded to Package Manager. Refer to the CLI section for more information on making CRAN available through Package Manager.

Snapshot Identifiers#

Package Manager has two ways to specify a snapshot associated with a repository:

  • Transaction ID: the internal numeric identifier for the action of importing the snapshot information. All sources share the same table, so transaction IDs are never duplicated between sources.
  • Date alias: A date in YYYY-MM-DD format, which obtains the latest snapshot that occurred on the specified date.

All source types support specifying snapshots using transaction IDs, but not all sources support date aliases. Currently, only the CRAN and PyPI sources support date aliases.

Other source types do not currently support date aliases because of the difficulty in making date information relevant across multiple Package Manager installations. For example, the date associated with curated CRAN and CRAN snapshot sources is the date when the source was created or updated using the rspm update command. For Git sources, the date is when a Git package was built and updated, not the date of the commit or release that was built. For local sources, the date is when a package was added, updated, or removed. All of these dates and times are highly dependent on each local installation.

These identifiers are very important when considering reproducibility of referenced content. A transaction ID is highly unlikely to mean the same thing across two different Package Manager installations (for instance, staging and production). A date alias will generally be robust, and facilitates handling snapshot references across different Package Manager installations. The date aliases are the default behavior where possible, but you can choose to freeze to transaction IDs instead. Please refer to the User Guide for more information.
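As an illustration of the two identifier styles, here is a minimal sketch of a helper that builds a repository URL pinned to a snapshot. The URL shape `<base>/<repo>/<identifier>` is inferred from the `/all/latest/...` example in this chapter; that a date alias or transaction ID takes the place of `latest` is an assumption, as are the host name and the function name.

```python
from datetime import date
from typing import Union


def frozen_repo_url(base_url: str, repo: str, snapshot: Union[date, int, str]) -> str:
    """Return a repository URL pinned to a snapshot identifier.

    `snapshot` may be a date (date alias), an int (transaction ID),
    or the string "latest". The URL layout is an assumption.
    """
    if isinstance(snapshot, date):
        ident = snapshot.strftime("%Y-%m-%d")  # date alias in YYYY-MM-DD format
    elif isinstance(snapshot, int) and not isinstance(snapshot, bool):
        ident = str(snapshot)  # numeric transaction ID
    elif snapshot == "latest":
        ident = "latest"
    else:
        raise ValueError(f"unsupported snapshot identifier: {snapshot!r}")
    return f"{base_url.rstrip('/')}/{repo}/{ident}"


if __name__ == "__main__":
    # Hypothetical server; the date alias is portable across installations,
    # while the transaction ID is only meaningful on the server that issued it.
    print(frozen_repo_url("https://ppm.example.com", "cran", date(2023, 1, 15)))
    print(frozen_repo_url("https://ppm.example.com", "cran", 1234))
```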

Note

When a date alias without a matching snapshot date is used in a URL, Package Manager attempts to find the nearest-matching snapshot that is older than the date in the URL. If a nearest-matching snapshot is found, Package Manager will serve the content from that snapshot. Package Manager will not serve content if the URL references a snapshot date that is older than the oldest available snapshot or newer than the newest available snapshot.
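The lookup rule in this note can be sketched as follows. This is only an illustration of the described behavior, not Package Manager's implementation; it simplifies each snapshot to a single calendar date, and the function name is hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional, Sequence


def resolve_date_alias(snapshot_dates: Sequence[date], requested: date) -> Optional[date]:
    """Return the snapshot date to serve for `requested`, or None if out of range.

    `snapshot_dates` must be sorted in ascending order.
    """
    if not snapshot_dates:
        return None
    # Dates before the oldest or after the newest snapshot are not served.
    if requested < snapshot_dates[0] or requested > snapshot_dates[-1]:
        return None
    # bisect_right returns the insertion point after any exact match, so the
    # element just before it is the latest snapshot on or before `requested`.
    idx = bisect_right(snapshot_dates, requested)
    return snapshot_dates[idx - 1]


if __name__ == "__main__":
    snapshots = [date(2023, 1, 2), date(2023, 1, 5), date(2023, 1, 9)]
    print(resolve_date_alias(snapshots, date(2023, 1, 7)))
```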

Repositories with Multiple Sources#

A repository can have more than one source. To serve both local packages and CRAN packages from a single repository, create a single repository that subscribes to multiple sources. For example:

  • all (a repository)
    • internal (local source)
    • cran (CRAN source)

The "all" repository above gives users access to both local and CRAN packages, and its PACKAGES list could be accessed, for example, at https://packagemanager.posit.co/all/latest/src/contrib/PACKAGES. A repository subscribes to sources, which means that changes to a source will be reflected in the repository. For example, if an administrator adds a new package to the internal source, users will automatically be able to access the new package via the all repository.

Note

Source types within a repository must be compatible. For example, you can combine CRAN-type sources in a CRAN-type repository, such as cran, local and git sources, but you cannot include those sources in a python-type repository.

Package Conflicts Between Sources#

If a repository has multiple sources and a package with the same name exists in both sources, Package Manager eliminates duplicates, giving preference in the order the sources are subscribed. In the example repository above, if a package named "plumber" exists in both the "cran" and "internal" sources, the "plumber" package from the "internal" source would be served and listed since it is the first source for the repository. The same conflict resolution occurs as sources change. For example, in the sample above, even if a new package is added to CRAN with the same name as an internal package, the internal package will continue to be served. The precedence is also maintained during updates. In the example above, the internal version of plumber will continue to be served even if the CRAN version of plumber is updated. The order of sources within a repository can be re-arranged using the reorder command.
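The precedence rule above can be modeled as a first-match-wins merge over the sources in subscription order. The sketch below is illustrative only: the data structures, function name, and version numbers are made up and do not reflect how Package Manager stores its data.

```python
from typing import Dict, List, Tuple


def merge_sources(sources: List[Tuple[str, Dict[str, str]]]) -> Dict[str, Tuple[str, str]]:
    """Merge package listings, letting earlier sources win on name conflicts.

    `sources` is ordered by subscription order; each entry is
    (source_name, {package_name: version}).
    Returns {package_name: (source_name, version)}.
    """
    merged: Dict[str, Tuple[str, str]] = {}
    for source_name, packages in sources:
        for pkg, version in packages.items():
            # setdefault keeps the entry from the first source that provided it.
            merged.setdefault(pkg, (source_name, version))
    return merged


if __name__ == "__main__":
    # "internal" is subscribed first, so its plumber shadows the one from "cran".
    repo = [
        ("internal", {"plumber": "0.1.0"}),
        ("cran", {"plumber": "1.2.1", "shiny": "1.8.0"}),
    ]
    print(merge_sources(repo))
```

Because the merge is recomputed from the ordered list, updating the CRAN version of plumber does not change which package is served; only reordering the sources does.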

What is the Posit Package Service?#

flowchart TD

BioConductor & CRAN & PyPI --- sync

ppm[Posit Package Manager]
sync[Posit Package Service] --- ppm

subgraph private[Private Network]

ppm --- Workbench[Posit Workbench] & Connect[Posit Connect]

end

Package Manager doesn't download packages directly from CRAN, Bioconductor, or PyPI. Instead, Posit maintains a curated S3 bucket that contains metadata about CRAN, Bioconductor, and PyPI, in addition to package tar files. The metadata is used to track day-to-day changes.

Reference the Air-Gapped Package Manager section if your environment does not have access to the Posit Package Service.

During a sync, the metadata is downloaded to Package Manager. The metadata is compared against the Package Manager database to determine what changes need to be applied. Package tarballs are then downloaded to the cache on demand.
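The comparison step can be pictured as a diff between the downloaded metadata and what the database already records. The following is a conceptual sketch only: the real metadata format and database schema are not described in this guide, so plain name-to-version dictionaries stand in for both, and all names are hypothetical.

```python
from typing import Dict, NamedTuple, Set


class SyncPlan(NamedTuple):
    added: Set[str]
    updated: Set[str]
    removed: Set[str]


def plan_sync(local: Dict[str, str], remote: Dict[str, str]) -> SyncPlan:
    """Compare remote metadata against the local records and list the changes.

    Both arguments map package name to version. No package files are
    downloaded here; only the metadata is compared.
    """
    added = set(remote) - set(local)
    removed = set(local) - set(remote)
    updated = {name for name in set(remote) & set(local) if remote[name] != local[name]}
    return SyncPlan(added=added, updated=updated, removed=removed)


if __name__ == "__main__":
    local_db = {"A3": "1.0.0", "covr": "3.6.3", "oldpkg": "0.9"}
    downloaded = {"A3": "1.0.0", "covr": "3.6.4", "shiny": "1.8.0"}
    print(plan_sync(local_db, downloaded))
```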

Reference the Package Security section for details about the security measures that are in place for the Posit Package Service.

Publishing Snapshots to the Posit Package Service#

The Package Manager team evaluates CRAN, Bioconductor, and PyPI each business day and publishes new snapshots when updates are available. Package Manager installations then sync these snapshots as follows:

  • CRAN and Bioconductor use their configured schedules.
  • PyPI is synchronized automatically as updates become available.

For example, suppose a CRAN package gets updated on Saturday. A new snapshot will be published to the Posit Package Service sometime on Monday, usually Monday afternoon (roughly US Central Time). Then, the user's default CRAN sync will pick up the change on Tuesday at 12:00am UTC. Occasionally there are very important updates, and an extra snapshot is generated to make the updates available sooner. The timing of each snapshot varies based on the number of changes and the number of dependencies involved.

To be sure Package Manager synchronizes new snapshots as soon as possible, consider setting your sync schedule to occur more than once per day.

Why is the newest package version not available yet?

The latest version of a package from CRAN, Bioconductor, or PyPI may take several days to appear in Package Manager due to our snapshotting process.

Updates are evaluated each business day. However, this schedule is subject to change without notice. Large updates, external repository errors or inconsistencies, and other unanticipated situations may cause delays.

Package Fetching#

Package Manager fetches packages on demand as they are requested by end users. Package Manager synchronizes CRAN, Bioconductor, and PyPI metadata to serve as the source of truth for package availability. The benefit of on-demand fetching is a smaller footprint in terms of network bandwidth and disk space.

Package Caching#

Package Manager downloads each version of a package only once, and always checks the local cache to see if the required tar file is already available.
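On-demand fetching combined with a download-once cache can be sketched as below. This is a generic illustration of the pattern, not Package Manager's code: the class name, the file naming scheme, and the injected `fetch` callable are all hypothetical.

```python
from pathlib import Path
from typing import Callable


class PackageCache:
    """Serve package files from a local directory, fetching each one at most once."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str, str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch  # hypothetical downloader: (name, version) -> file bytes
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def get(self, name: str, version: str) -> Path:
        """Return the cached file path, downloading only on the first request."""
        path = self.cache_dir / f"{name}_{version}.tar.gz"
        if not path.exists():
            # Cache miss: fetch on demand and store for later requests.
            path.write_bytes(self.fetch(name, version))
        return path


if __name__ == "__main__":
    import tempfile

    def fake_fetch(name: str, version: str) -> bytes:
        print(f"downloading {name} {version}")
        return b"tarball bytes"

    with tempfile.TemporaryDirectory() as tmp:
        cache = PackageCache(Path(tmp) / "cache", fake_fetch)
        cache.get("shiny", "1.8.0")  # triggers a download
        cache.get("shiny", "1.8.0")  # served from the cache
```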

Synchronizing with the Posit Package Service#

The CRAN and Bioconductor sources are synchronized according to a schedule set using the SyncSchedule properties in the Package Manager configuration file. These properties accept a string in crontab format. Refer to the appendix section for the relevant source for details.

By default, the configuration file includes crontabs that will cause Package Manager to sync once a day (early morning in the server's timezone), if any of the following conditions have been met:

  • For CRAN:
    • Any repository subscribes to the cran source.
    • A "curated-cran" source is used by any repo.
    • A manual sync has been run with the sync --type=cran command.
  • For Bioconductor:
    • Any repository subscribes to a Bioconductor source.
    • A Bioconductor repo has been created.

A sync schedule will not be applied if the above conditions are not met. If you only want manual syncs, change the configuration file to have a blank value for SyncSchedule:

/etc/rstudio-pm/rstudio-pm.gcfg
[CRAN]
SyncSchedule = ""

[Bioconductor]
SyncSchedule = ""

Note

The PyPI source is automatically synchronized. For more information, reference the PyPI Automatic Synchronization section.

The SyncSchedule property does not necessarily determine when a repository will make updated packages available to users. For example, if a repository subscribes directly to the cran source, users will see updates according to the sync schedule. In contrast, if the repository subscribes to a curated CRAN source, an administrator must explicitly update the source for updates to become available.

In addition, updating the repository does not automatically push updated packages to R clients. A repository specifies what packages are available, but the R user is in control of when and how to update the packages used by a project.

Refer to the section on Managing Change Control for more information.

Package Manager keeps track of old versions of packages as well. Old versions of packages are available in the repository's archive, and are listed in the Package Manager web UI. This allows users to roll back updates if necessary or install packages as they existed at a prior time.

Synchronizing More Frequently#

Source snapshots are retrieved only while your server is running and only according to the configured SyncSchedule, which can cause a delay between a new snapshot becoming available and your server downloading it. If you need the latest snapshots as soon as possible, set SyncSchedule to run every few hours. For example, the crontab 0 */4 * * * would run every four hours.
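To see which times of day a schedule such as 0 */4 * * * selects, the sketch below expands the minute and hour fields of a crontab string. It is a deliberately small illustration that handles only `*`, `*/N`, and single numbers, ignores the day, month, and weekday fields, and is not the parser Package Manager uses.

```python
from typing import List


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand a cron field of the form '*', '*/N', or a single number."""
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        if step <= 0:
            raise ValueError(f"invalid step in field: {field!r}")
        # '*/N' matches every N-th value, starting from the lowest allowed value.
        return list(range(low, high + 1, step))
    value = int(field)
    if not low <= value <= high:
        raise ValueError(f"value out of range in field: {field!r}")
    return [value]


def daily_run_times(crontab: str) -> List[str]:
    """List the HH:MM times selected by the minute and hour fields."""
    minute, hour, *_rest = crontab.split()
    minutes = expand_field(minute, 0, 59)
    hours = expand_field(hour, 0, 23)
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


if __name__ == "__main__":
    print(daily_run_times("0 */4 * * *"))
```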

Source Type Details#

The CRAN Source#

This content has moved to a dedicated source details page.

Bioconductor Sources#

This content has moved to a dedicated source details page.

The PyPI Source#

This content has moved to a dedicated source details page.

Curated CRAN Sources#

Enhanced Advanced

This content has moved to a dedicated source details page.

CRAN Snapshot Sources#

Enhanced Advanced

This content has moved to a dedicated source details page.

Curated PyPI Sources#

Enhanced Advanced

This content is available at Curated PyPI Sources.

Git Sources#

Advanced

This content has moved to the Building Git Packages page.

Local Sources#

This content is available at Local Sources.

Local Python Sources#

This content is available at Local Python Sources.