Curated PyPI Sources

Enhanced Advanced

This section explains what curated-pypi sources are, why they are useful, and how to use them. If you'd just like to get started, see the Quick Start section to create your first curated subset of PyPI.

Overview

Curated PyPI sources are based on a current mirror of PyPI. To understand how the Posit Package Manager PyPI source works, see the PyPI Mirror documentation. Because PyPI has over 400,000 packages, it can be useful to include only certain packages and versions in a source. This is especially helpful for package security, where only verified sets of packages are allowed.

Creating a Curated PyPI Source

Terminal
$ rspm create source --name=pypi-subset --type=curated-pypi
<< Source 'pypi-subset':
<<  Type:  Curated PyPI

Curated PyPI sources don't need to be pinned to a specific snapshot date at the time of creation; any date can be picked when adding packages with rspm update (described below). Once the source has been created, be sure to subscribe a repository to the source to make the packages available to users:

Terminal
# Create a repository:
$ rspm create repo --name=pypi --type=python --description='Access curated PyPI packages'

# Subscribe a repository to the curated-pypi source:
$ rspm subscribe --repo=pypi --source=pypi-subset

Including Packages in a Curated PyPI Source

Packages are included in a curated-pypi source by uploading a requirements.txt definition with rspm update. This section explains how a requirements file is defined and how to use one to include packages in a curated-pypi source.

Requirements Files

A requirements.txt can be created from scratch, or you can use a pre-existing file that an organization or team already uses to define local environments. Package Manager reads each line of requirements.txt in the following format:

Requirements Format
[package name] [optionally: version constraints]

As an example, a requirements.txt file could look like:

requirements.txt
shiny
tensorflow >= 2.4.0, < 2.5, != 2.4.2
numpy == 1.24.2

-r requirements2.txt

This fetches and installs:

  • All available versions of shiny.
  • All versions of tensorflow greater than or equal to 2.4.0 and less than 2.5, explicitly excluding 2.4.2.
  • Only numpy version 1.24.2.
  • All packages from requirements2.txt.

As shown in the example above, a package doesn't need to have any version constraints defined. It can also have as many version constraints as needed. The versions made available to Package Manager will depend on what is available at the snapshot date specified when updating the source.

All Python version parsing and matching criteria are based on PEP 440. Refer to the PEP 440 documentation for information on version formatting and constraints. For more information on the Requirements File Format, refer to pip's documentation.

Note

Not everything defined in the Requirements File Format specification is supported in Package Manager. The curated-pypi source only parses package names, version ranges, and recursive file references. Any other definitions (e.g., extras, option flags, environment markers) within an uploaded requirements.txt file are ignored.
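To preview which parts of an existing file fall inside that supported subset, a small script can help. The following is only a sketch: supported_parts is a hypothetical helper, not Package Manager's parser, and it assumes that "ignored" means the unsupported parts are dropped while the package name and version constraints are kept.

```python
import re

# Hypothetical helper for previewing a file before upload. This is NOT
# Package Manager's parser; it only mirrors the subset described above.
REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(.*)$")


def supported_parts(line):
    """Return the parts of one requirements.txt line that this sketch keeps."""
    line = line.split("#", 1)[0].strip()      # drop comments and whitespace
    if not line:
        return None
    if line.startswith("-r "):                # recursive file reference
        return ("file", line[3:].strip())
    if line.startswith("-"):                  # any other option flag
        return None
    line = line.split(" --", 1)[0]            # drop per-requirement options
    line = line.split(";", 1)[0].strip()      # drop environment markers
    match = REQUIREMENT.match(line)
    if match is None:
        return None
    name, constraints = match.groups()        # extras are matched but not kept
    clauses = [c.strip() for c in constraints.split(",") if c.strip()]
    return ("package", name, ", ".join(clauses))


if __name__ == "__main__":
    sample = [
        "shiny",
        "requests[security] >= 2.0 ; python_version < '3.8'",
        "--index-url https://example.invalid/simple",
        "-r requirements2.txt",
    ]
    for entry in sample:
        print(repr(entry), "->", supported_parts(entry))
```

Lines that return None here are the ones this sketch assumes contribute nothing to the source, so they are worth reviewing before upload.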

The requirements.txt file also supports declaring multiple references of the same package with different version constraints:

requirements.txt
tensorflow == 2.4.2
...
tensorflow == 2.4.3

This will be treated as an OR operator, leading the curated-pypi source to evaluate the defined version constraints as:

tensorflow == 2.4.2 || tensorflow == 2.4.3

In this example, Package Manager will pull in both version 2.4.2 and version 2.4.3. This can be helpful when combining requirements.txt files from multiple sources, because it ensures that every version you expect is included.

Note

Be careful when referencing a package multiple times when using a != constraint. As an example:

requirements.txt
tensorflow >= 2.0.0, < 3.0.0
...
tensorflow != 2.4.2

This will still include version 2.4.2 because it is being evaluated as:

tensorflow >= 2.0.0, < 3.0.0 || tensorflow != 2.4.2

To guarantee that version 2.4.2 is excluded, include all version constraints on a single line so Package Manager evaluates all constraints together:

requirements.txt
tensorflow >= 2.0.0, < 3.0.0, != 2.4.2
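The difference between constraints on one line (combined with AND) and constraints on separate lines (combined with OR) can be checked with a short script. This is a minimal sketch under stated assumptions: it handles only plain numeric versions such as 2.4.2 and the six basic comparison operators, not the whole of PEP 440, and the function names are illustrative rather than anything Package Manager exposes.

```python
import operator

# Comparison operators handled by this sketch (a subset of PEP 440).
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}


def parse_version(text, width=4):
    """Turn '2.4' into (2, 4, 0, 0) so that 2.4 and 2.4.0 compare equal."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def split_clause(clause):
    """Split '>= 2.0.0' into its comparison function and version text."""
    # Two-character operators are tried before one-character ones.
    for symbol in ("==", "!=", ">=", "<=", ">", "<"):
        if clause.startswith(symbol):
            return OPS[symbol], clause[len(symbol):]
    raise ValueError(f"unsupported constraint: {clause!r}")


def line_accepts(version, line):
    """AND: every comma-separated constraint on a single line must hold."""
    for clause in (c.strip() for c in line.split(",")):
        if not clause:
            continue
        compare, wanted = split_clause(clause)
        if not compare(parse_version(version), parse_version(wanted)):
            return False
    return True


def included(version, lines):
    """OR: a version is included if any line for the package accepts it."""
    return any(line_accepts(version, line) for line in lines)


if __name__ == "__main__":
    candidates = ["2.4.1", "2.4.2", "2.4.3"]
    split_lines = [">= 2.0.0, < 3.0.0", "!= 2.4.2"]
    single_line = [">= 2.0.0, < 3.0.0, != 2.4.2"]
    # 2.4.2 survives when the != constraint sits on its own line.
    print([v for v in candidates if included(v, split_lines)])
    # 2.4.2 is excluded when all constraints share one line.
    print([v for v in candidates if included(v, single_line)])
```

For full PEP 440 behavior (pre-releases, wildcards, the ~= operator), the packaging library's SpecifierSet is the more complete tool; the sketch above only illustrates the AND/OR evaluation order.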

Using Pipfiles Instead

If you already have a Pipfile or Pipfile.lock defined, then you may prefer to use that. Although Package Manager doesn't support uploading the Pipfile directly, there are a few methods to convert them to the requirements.txt format.

One method is to run pip freeze from within the defined pipenv environment:

Terminal
$ pipenv run pip freeze > requirements.txt

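Another option is to use jq to parse the Pipfile.lock file and turn it into a requirements.txt file. Each version field in Pipfile.lock already includes the == operator, so joining the package name and version yields a valid requirement line: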

Terminal
$ jq -r '.default
        | to_entries[]
        | .key + .value.version' \
    Pipfile.lock > requirements.txt

Either method gets your package specifications into a format that Package Manager can handle.
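Where jq is not available, the same conversion can be sketched with Python's standard library. The function name lock_to_requirements is illustrative, and the sketch assumes the usual Pipfile.lock layout in which each entry under "default" carries a "version" string such as "==1.24.2"; entries without one (for example, packages installed from a VCS URL) are skipped.

```python
import json


def lock_to_requirements(lock_text, section="default"):
    """Convert the text of a Pipfile.lock into requirements.txt content."""
    lock = json.loads(lock_text)
    lines = []
    for name, meta in sorted(lock.get(section, {}).items()):
        version = meta.get("version")  # e.g. "==1.24.2"; may be absent
        if version:
            lines.append(f"{name}{version}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reads Pipfile.lock from the current directory and writes requirements.txt.
    with open("Pipfile.lock", encoding="utf-8") as handle:
        content = lock_to_requirements(handle.read())
    with open("requirements.txt", "w", encoding="utf-8") as handle:
        handle.write(content)
```

Sorting the entries keeps the generated file stable between runs, which makes later changes easier to review.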

Updating a Curated PyPI Source

To make packages available in a Curated PyPI source, run rspm update with a requirements file and a PyPI snapshot date. Without the --commit flag, the command performs a dry run so you can review the changes before they are applied to the source:

Terminal
# Do a dry run to preview the changes to the source before applying them
$ rspm update --source=pypi-subset --file-in=/path/to/requirements.txt --snapshot=2023-03-24

A preview of the changes is presented:

Output
Packages from 'requirements.txt' to update source 'pypi-subset' at PyPI snapshot date '2023-03-24':

Name       Version
numpy      1.24.2
shiny      0.1, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.10
tensorflow 2.4.0, 2.4.1, 2.4.3, 2.4.4

If the output above looks correct, execute this command again with the --commit flag to update the source with the new set of packages.

Note

If your requirements.txt file includes more than 1,000 packages, the output of the update command is simplified for performance purposes.

To commit the changes, repeat the command, adding the --commit flag:

Terminal
# Now commit the changes to the source:
$ rspm update --source=pypi-subset --file-in=/path/to/requirements.txt --snapshot=2023-03-24 --commit

The finalized contents of the source are then printed:

Output
Successfully updated source 'pypi-subset' at PyPI snapshot date '2023-03-24' with the following packages from 'requirements.txt':

Name       Version
numpy      1.24.2
shiny      0.1, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.10
tensorflow 2.4.0, 2.4.1, 2.4.3, 2.4.4

Note

Running rspm update on a Curated PyPI source will overwrite the source with only the packages defined in your requirements.txt file. However, previous snapshots of the source are still available with a pinned repo URL.
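Because a committed update overwrites the source, it can be useful to script the preview-then-commit sequence so the commit never runs without a review. The sketch below only assembles the flags shown above (--source, --file-in, --snapshot, --commit). The function names are hypothetical, and the sketch assumes snapshot dates use the year-month-day form from the examples and that rspm exits with a non-zero status on failure.

```python
import subprocess
from datetime import datetime


def build_update_command(source, file_in, snapshot, commit=False):
    """Assemble the rspm update command line as an argument list."""
    # Raises ValueError if the snapshot is not a year-month-day date.
    datetime.strptime(snapshot, "%Y-%m-%d")
    command = [
        "rspm", "update",
        f"--source={source}",
        f"--file-in={file_in}",
        f"--snapshot={snapshot}",
    ]
    if commit:
        command.append("--commit")
    return command


def update_with_preview(source, file_in, snapshot, confirm=input):
    """Run the dry run, ask for confirmation, then commit if approved."""
    subprocess.run(build_update_command(source, file_in, snapshot), check=True)
    answer = confirm("Commit these changes? [y/N] ")
    if answer.strip().lower() != "y":
        return False
    subprocess.run(
        build_update_command(source, file_in, snapshot, commit=True),
        check=True,
    )
    return True
```

Calling update_with_preview("pypi-subset", "/path/to/requirements.txt", "2023-03-24") would print the dry-run preview first and commit only after an explicit "y".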

To update the source to a different snapshot date, use the update command again:

Terminal
# Update packages in a curated-pypi source:
$ rspm update --source=pypi-subset --file-in=/path/to/requirements.txt --snapshot=2021-02-03 --commit

Curated PyPI sources can be pinned to any date for which Posit has a PyPI snapshot (typically, once per weekday). The date you choose does not depend on previously used snapshot dates. If the source was initially set to 2021-02-03, it can then be set to a later date with --snapshot=2022-06-01. If you later want to pin it back to the original date, run rspm update again with --snapshot=2021-02-03.

Tip

This allows you to set the Curated PyPI source to any date where a PyPI snapshot has been taken on our servers. If you are trying to pin to a version of a package that doesn't exist on PyPI anymore, try pinning to a date when it existed.