PyPI Mirroring and Local Python Packages#

Package Manager offers two methods for distributing Python packages:

  1. Mirroring the public PyPI repository
  2. Adding internally-built Python packages locally

PyPI Mirroring#

Package Manager supports creating a Python repository with a source that mirrors the Python Package Index (PyPI).

Adding a PyPI repository to a Package Manager installation will:

  • Provide a full mirror of all packages available on PyPI
  • Enable fully reproducible dependency management through historic PyPI snapshots
  • Locally cache all downloaded Python packages for quicker installs

System Requirements#

In addition to the recommended system requirements, supporting Python packages will require additional disk storage depending on the number of packages being used.

Warning

The entirety of PyPI currently requires about 10 TB of storage. Your actual storage needs will depend on your usage. Deep learning packages, such as TensorFlow and PyTorch, are notoriously large, with hundreds of gigabytes needed for each project's collection of files. If you do not anticipate using deep learning packages, a starting storage size of 50 GB is likely adequate. If you do intend to use deep learning packages, plan for 500 GB or more.

Quickstart#

The quickest way to make PyPI packages available for your Package Manager installation is by running these commands:

Terminal
$ rspm create repo --name=pypi --type=python --description='Access PyPI packages'
$ rspm subscribe --repo=pypi --source=pypi
$ rspm sync --type=pypi

For more information about these commands, see the Creating a Python PyPI Repository section below.

Note

If Package Manager is served from a subdirectory like /rspm, the Server.Address configuration option needs to be set in the configuration file for PyPI to generate URLs properly.

User Configuration#

Once a Python repository has been successfully created and synced with the Posit Package Service, users need to configure their local system and pip to install from Package Manager.

To find instructions specific to your Package Manager installation:

  1. Follow the Quickstart or Creating a Python PyPI Repository instructions.
  2. Navigate to the Package Manager homepage.
  3. Select the relevant Python repository from the sidebar.
  4. Click the Setup button at the top of the page.

In general, users can either install from Package Manager on a one-off basis:

Terminal
$ pip install --index-url http(s)://[HOST:PORT]/latest/simple PACKAGE-TO-INSTALL

or configure pip to use Package Manager in a persistent manner:

Terminal
$ pip config set global.index-url http(s)://[HOST:PORT]/latest/simple
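
The persistent setting above ends up in pip's configuration file as an index-url key under a [global] section. The following is a minimal sketch, not part of Package Manager, that builds the index URL and renders that file content so it can be templated for many users. The function names and the example host are hypothetical, and the file location (for example pip.conf or pip.ini) depends on the platform.

```python
import configparser
import io


def simple_index_url(host_and_port, scheme="https"):
    """Build the index URL in the form shown above: scheme://HOST:PORT/latest/simple."""
    return f"{scheme}://{host_and_port}/latest/simple"


def render_pip_config(index_url):
    """Return pip configuration file text that sets global.index-url."""
    # interpolation=None keeps any '%' characters in the URL literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

Calling render_pip_config(simple_index_url("pm.example.com:4242")) returns text that can be written to the user's pip configuration file in place of running pip config set.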

Note

If you use HTTP, pip will ignore your repository by default. Using only the configuration above, pip will show a warning message like this:

WARNING: The repository located at [HOST] is not a trusted or secure host and is being ignored. If this repository is available via HTTPS we recommend you use HTTPS instead, otherwise you may silence this warning and allow it anyway with '--trusted-host [HOST]'.

To configure pip to use the unencrypted HTTP Package Manager server, you must use the --trusted-host flag or configuration option.

Terminal
$ pip install --trusted-host [HOST] --index-url http://[HOST:PORT]/latest/simple PACKAGE-TO-INSTALL

or configure pip to use Package Manager in a persistent manner:

Terminal
$ pip config set global.index-url http://[HOST:PORT]/latest/simple
$ pip config set global.trusted-host [HOST]

Note

If you use HTTPS but do not provide your Package Manager installation with a valid SSL certificate, pip will throw SSL: CERTIFICATE_VERIFY_FAILED errors when installing packages, because it attempts to verify proper HTTPS configuration by default. To configure pip to ignore these errors, you need to use the --trusted-host flag or configuration option.

Terminal
$ pip install --trusted-host [HOST] --index-url https://[HOST:PORT]/latest/simple PACKAGE-TO-INSTALL

or configure pip to use Package Manager in a persistent manner:

Terminal
$ pip config set global.index-url https://[HOST:PORT]/latest/simple
$ pip config set global.trusted-host [HOST]
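
The two notes above come down to one rule: pass --trusted-host when the index is served over plain HTTP, or when it is served over HTTPS without a valid certificate. The sketch below assembles the pip command line accordingly. The function name and the trust_host parameter are hypothetical, and the host is taken from the URL without its port, matching the [HOST] placeholder used above.

```python
from urllib.parse import urlsplit


def pip_install_command(index_url, package, trust_host=False):
    """Return the pip argument list for installing one package from index_url.

    --trusted-host is added for http:// URLs, or when trust_host is True
    (for example, HTTPS with a certificate that pip cannot verify).
    """
    parts = urlsplit(index_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) index URL: {index_url!r}")
    command = ["pip", "install"]
    if parts.scheme == "http" or trust_host:
        command += ["--trusted-host", parts.hostname]
    command += ["--index-url", index_url, package]
    return command
```

The returned list can be passed to subprocess.run, or joined with shlex.join for display. Trusting a host disables certificate verification for it, so prefer a valid certificate where possible.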

Creating a Python PyPI Repository#

The Quickstart commands above perform the following operations:

  • Create a Python repository with a description:
Terminal
$ rspm create repo --name=pypi --type=python --description='Access PyPI packages'
<< Repository: pypi - Python
  • Subscribe the repository to the preconfigured PyPI source:
Terminal
$ rspm subscribe --repo=pypi --source=pypi
<< Repository: pypi
<< Sources:
<< --pypi (Python)
  • Ensure that Package Manager has the appropriate metadata using the sync command. Package Manager pulls packages and metadata from the Posit Package Service.
Terminal
$ rspm sync --type=pypi
<< Initiated PyPI synchronization for pypi. Depending on how much data has been previously synchronized, this could take a while. Actions will appear in the Package Manager UI as they are completed.

<< Snapshots for pypi: 0 / 34 [----------------------------------------------------------------------------------------------------------------------------------]
<< Packages in pypi snapshot: 14127 / 231734 [======>-------------------------------------------------------------------------------------------------------] 5m9

Note

If you try subscribing a non-Python type repository to a Python source, you'll get the error source type must be compatible with repository type.

The PyPI Source#

After syncing with the Posit Package Service, the local Package Manager installation has the metadata for every package on PyPI. A package file itself is retrieved from the Posit Package Service only when it is requested, for example by pip.
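
The fetch-on-request behavior can be pictured with a small conceptual sketch. It is not Package Manager's implementation. The class name and the fetch_remote callable are hypothetical stand-ins for "download this file from the upstream service".

```python
class OnDemandCache:
    """Conceptual model: nothing is downloaded until a file is first requested."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # hypothetical upstream download function
        self._files = {}                   # filename -> cached content

    def get(self, filename):
        # The first request triggers the upstream fetch; later requests hit the cache.
        if filename not in self._files:
            self._files[filename] = self._fetch_remote(filename)
        return self._files[filename]
```

In this model the first install of a package pays the download cost and every later install of the same file is served locally, which is consistent with the storage guidance in System Requirements depending on the packages actually used.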

Scheduled Synchronization#

By default, Package Manager will sync with the Posit Package Service once daily. This schedule can be configured using the PyPI SyncSchedule option, for example:

/etc/rstudio-pm/rstudio-pm.gcfg
[PyPI]
SyncSchedule = 0 1 * * *
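
Assuming SyncSchedule follows the standard five-field crontab layout, as the example suggests, the value 0 1 * * * means minute 0 of hour 1 on every day, that is, 01:00 daily. The helper below is a hypothetical sketch that only labels the fields to make a schedule easier to review. It does not validate ranges or special syntax.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def label_schedule(expression):
    """Split a five-field crontab expression into named fields."""
    fields = expression.split()
    if len(fields) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(fields)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, fields))
```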

Note

Although Package Manager automatically syncs daily, the Posit Package Service may not update packages every day.

Local Python Packages#

This section covers in depth the options and configurations available when using local-python sources and Twine. The quick-start guide also includes a section for getting started quickly with internally-built Python packages.

Enabling Remote Authentication#

To upload Python packages to Package Manager, the remote authentication configuration option must be enabled. In your configuration file, add the following lines:

/etc/rstudio-pm/rstudio-pm.gcfg
[Authentication]
APITokenAuth = true

This enables generating and using API tokens. To learn more about the configuration option, see APITokenAuth.

Remote Token Authentication#

An authentication token must be generated before a user can upload Python packages to a source. To create the token, run the following command on the server:

Terminal
rspm create token --description="Local Python token" --sources=local-python-src --expires=30d
<< Generated an access token. Be sure to record this token immediately since you will not be able to retrieve it later.
<< eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJwYWNrYWdlbWFuYWdlciIsImV4cCI6MTY1ODU4MjA3OCwianRpIjoiYmM5ZTg1NGYtNGNlNy00Zjc4LTlhMmMtZDliYzRlYTQ0NGVkIiwiaWF0IjoxNjU1OTkwMDc4LCJpc3MiOiJodHRwOi8vbG9jYWxob3N0OjQyNDIvIiwic2NvcGVzIjp7InNvdXJjZXMiOiIzYTI4NjFhYi0xNWYwLTRjM2MtODZlMy0xNjNkMTY0ZDE0ZDYifX0.BWJXLworo44Nvfrh5a2hm_NIqgUoXTLjQlxyy7uaSWk
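
The sample token above has the three dot-separated segments of a JSON Web Token. Assuming generated tokens keep that format, the expiry set with --expires can be read back from the standard exp claim. The sketch below decodes the payload without verifying the signature, so it is suitable only for inspecting a token you already trust. The function names are hypothetical.

```python
import base64
import json
from datetime import datetime, timezone


def token_claims(token):
    """Decode the payload segment of a JWT-style token (signature is NOT verified)."""
    segments = token.split(".")
    if len(segments) != 3:
        raise ValueError("expected three dot-separated segments")
    payload = segments[1]
    # base64url segments omit padding; restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def token_expiry(token):
    """Return the token's 'exp' claim as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(token_claims(token)["exp"], tz=timezone.utc)
```

Checking token_expiry before an automated upload gives a clearer failure than an authentication error from the server when a 30-day token has lapsed.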

Local Python Sources#

Similar to the local source for R packages, Package Manager has a local-python source type that can be used for uploading your local Python packages. Any repository of the python type can subscribe to local-python sources, enabling you to create combined repositories that subscribe to both PyPI and your internally-built Python packages. For example:

Terminal
<< # Create a local Python source:
$ rspm create source --name=local-python-src --type=local-python
<< Source 'local-python-src':
<<  Type:  Local Python
<< # Create a Python repository:
$ rspm create repo --name=python-repo --type=python --description='Access Python packages'
<< Repository: python-repo - Access Python packages - Python

<< # Subscribe the repository to the local Python source:
$ rspm subscribe --repo=python-repo --source=local-python-src
<< Repository: python-repo
<< Sources:
<< --local-python-src (Local Python)

<< # Sync and subscribe the repository to the PyPI source:
$ rspm sync --type=pypi
$ rspm subscribe --repo=python-repo --source=pypi
<< Repository: python-repo
<< Sources:
<< --local-python-src (Local Python)
<< --pypi (PyPI)

Note

The priority order of sources in a repository is important. If a package exists in both PyPI and the local Python source, and the repository is subscribed to both, the source that was subscribed to the repository first takes precedence by default. The order can be changed with the rspm reorder command. To find out more, see the command description in the CLI appendix.
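
The precedence rule can be illustrated with a short sketch. It models the behavior described in the note and is not Package Manager's code. The function name and the package names in the usage note are hypothetical.

```python
def resolve_source(package, subscribed_sources):
    """Return the name of the first subscribed source that contains the package.

    subscribed_sources is a list of (source_name, package_names) pairs in
    subscription order, which is the default priority order.
    """
    for source_name, package_names in subscribed_sources:
        if package in package_names:
            return source_name
    return None
```

With the subscription order from the example above, [("local-python-src", {"internal-tools", "requests"}), ("pypi", {"requests", "numpy"})], a request for requests resolves to local-python-src, because that source was subscribed first.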

Twine#

Twine is a tool that can be used to upload Python packages that have been built locally. To learn more about its full feature set, you can read the Twine documentation.

Note

Other Python upload tools like Poetry and Flit should work since Package Manager's API attempts to maintain compatibility with the PyPI Warehouse upload endpoint. However, Twine is the officially recommended and documented method of uploading Python packages to Package Manager.

To start, install Twine on the system that will be uploading packages. To do this, run:

Terminal
pip install twine

Twine Configuration#

There are several ways that Twine can be used to remotely authenticate with Package Manager:

  1. Using the TWINE_* environment variables
  2. Configuring a .pypirc file
  3. Authenticating with keyring

Using the TWINE_* environment variables#

The three environment variables Twine uses are:

  1. TWINE_REPOSITORY_URL
  2. TWINE_USERNAME
  3. TWINE_PASSWORD

The TWINE_REPOSITORY_URL environment variable is the address that is used to upload to Package Manager remotely, ending with the endpoint /upload/pypi/{source-name}. Since we are using token authentication, TWINE_USERNAME will always be set to __token__. Finally, TWINE_PASSWORD is set to the token that was generated above. To put this all together, the following is an example of what the variables could be set to:

Terminal
export TWINE_REPOSITORY_URL=https://packagemanager.posit.co/upload/pypi/local-python-src
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJwYWNrYWdlbWFuYWdlciIsImV4cCI6MTY1ODU4MjA3OCwianRpIjoiYmM5ZTg1NGYtNGNlNy00Zjc4LTlhMmMtZDliYzRlYTQ0NGVkIiwiaWF0IjoxNjU1OTkwMDc4LCJpc3MiOiJodHRwOi8vbG9jYWxob3N0OjQyNDIvIiwic2NvcGVzIjp7InNvdXJjZXMiOiIzYTI4NjFhYi0xNWYwLTRjM2MtODZlMy0xNjNkMTY0ZDE0ZDYifX0.BWJXLworo44Nvfrh5a2hm_NIqgUoXTLjQlxyy7uaSWk

With the environment variables set, you can easily upload with Twine:

Terminal
twine upload dist/*
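
In automation, the same three variables can be set for a single Twine invocation instead of being exported into the shell. The following is a minimal sketch with hypothetical function names. It assumes the twine executable is on the PATH and that the token is supplied by the caller, for example from a CI secret.

```python
import glob
import os
import subprocess


def twine_environment(server_url, source_name, token):
    """Return a copy of the current environment with the TWINE_* variables set."""
    env = dict(os.environ)
    env["TWINE_REPOSITORY_URL"] = f"{server_url.rstrip('/')}/upload/pypi/{source_name}"
    env["TWINE_USERNAME"] = "__token__"  # fixed value for token authentication
    env["TWINE_PASSWORD"] = token
    return env


def upload(pattern, env):
    """Run 'twine upload' on every file matching pattern, e.g. 'dist/*'."""
    files = sorted(glob.glob(pattern))
    if not files:
        raise FileNotFoundError(f"no distributions match {pattern!r}")
    subprocess.run(["twine", "upload", *files], env=env, check=True)
```

Because the variables live only in the child process environment, the token does not remain in the shell session after the upload.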

Configuring a .pypirc file#

An alternative to the Twine environment variables is to set up a .pypirc file. Twine looks for this file at ~/.pypirc by default. To use a .pypirc file stored elsewhere, pass its path with Twine's --config-file flag.

The .pypirc looks like the following:

~/.pypirc
[distutils]
index-servers =
   package-manager

[package-manager]
repository = https://packagemanager.posit.co/upload/pypi/local-python-src
username = __token__
password = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdWQiOiJwYWNrYWdlbWFuYWdlciIsImV4cCI6MTY1ODU4MjA3OCwianRpIjoiYmM5ZTg1NGYtNGNlNy00Zjc4LTlhMmMtZDliYzRlYTQ0NGVkIiwiaWF0IjoxNjU1OTkwMDc4LCJpc3MiOiJodHRwOi8vbG9jYWxob3N0OjQyNDIvIiwic2NvcGVzIjp7InNvdXJjZXMiOiIzYTI4NjFhYi0xNWYwLTRjM2MtODZlMy0xNjNkMTY0ZDE0ZDYifX0.BWJXLworo44Nvfrh5a2hm_NIqgUoXTLjQlxyy7uaSWk

Then to use this file to upload with Twine:

Terminal
twine upload -r package-manager dist/*

A benefit of the .pypirc file is that it makes uploading to several sources easy. Each additional repository definition in the .pypirc file becomes available through Twine's -r flag.
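
When several sources are involved, the file can be generated rather than edited by hand. The sketch below renders .pypirc text with one section per repository, in the same layout as the example above. The function name and the repository names in the usage note are hypothetical, and each repository is given its own token on the assumption that tokens are scoped per source.

```python
import configparser
import io


def render_pypirc(repositories):
    """Render .pypirc text.

    repositories maps a section name (used with 'twine upload -r NAME')
    to a (upload_url, token) pair.
    """
    # interpolation=None keeps '%' characters in URLs or tokens literal.
    parser = configparser.ConfigParser(interpolation=None)
    # index-servers is a multi-line value: one repository name per line.
    parser["distutils"] = {"index-servers": "\n" + "\n".join(repositories)}
    for name, (upload_url, token) in repositories.items():
        parser[name] = {
            "repository": upload_url,
            "username": "__token__",
            "password": token,
        }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

For example, passing {"package-manager": (url_a, token_a), "package-manager-dev": (url_b, token_b)} yields a file in which both names work with twine upload -r. Because the file contains tokens in plain text, restrict its permissions to the owning user.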

Authenticating with keyring#

An alternative approach to putting the username/password directly into a .pypirc file or environment variables is to store them securely in a keyring. Keyring is automatically installed with Twine, so it is available to use out-of-the-box.

Note

From keyring: macOS keychain support for macOS 11 (Big Sur) and later requires Python 3.8.7 or later with the “universal2” binary.

If a user is using the .pypirc method, their file would now look like:

~/.pypirc
[distutils]
index-servers =
   package-manager

[package-manager]
repository = https://packagemanager.posit.co/upload/pypi/local-python-src

If a user is instead using the Twine environment variables, now all they need to set is:

Terminal
export TWINE_REPOSITORY_URL=https://packagemanager.posit.co/upload/pypi/local-python-src

Then, to store the username and password for the repository, a user can save them with keyring:

Terminal
keyring set https://packagemanager.posit.co/upload/pypi/local-python-src __token__
<< Password for '__token__' in 'https://packagemanager.posit.co/upload/pypi/local-python-src':

The password is now stored securely in the keyring, where Twine can find it. On upload, Twine prompts for the username and then checks keyring for a matching password. If one exists, Twine uploads the distributions:

Terminal
cd directory/of/internal/example_package
twine upload -r package-manager dist/*
Enter your username: __token__
<< Uploading distributions to https://packagemanager.posit.co/upload/pypi/local-python-src
<< Uploading example_package.whl
<< 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.1/8.1 MB • 00:02 • 3.5 MB/s

Twine Package Signing#

Twine and Package Manager also support uploading signed packages. Package Manager detects when a signature file is uploaded and stores it with the Python package. To confirm this, navigate to the package in the UI and click the specific distribution. A green SIGNED label means the .asc file was uploaded successfully with the Python distribution. To download the signature file, append .asc to the URL for that specific distribution.

There are two methods for uploading signed files:

  1. Manually uploading pre-signed files
  2. Automatically signing with Twine

Manually uploading pre-signed files#

If you already have a package pre-signed, the Twine command is:

Terminal
twine upload example_package.whl example_package.whl.asc

Note

Simply uploading with twine upload dist/* will automatically upload any .asc files that exist for the packages in that directory.
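
Because signatures are matched by file name (the distribution name plus .asc), a quick pre-upload check can show which distributions in a directory would be uploaded without a signature. This is a hypothetical helper, not a Twine feature. It works on a list of file names such as the result of os.listdir("dist").

```python
def split_by_signature(filenames):
    """Return (signed, unsigned) distribution names.

    A distribution counts as signed when 'NAME.asc' is also present.
    The .asc files themselves are not listed in either result.
    """
    names = set(filenames)
    signed, unsigned = [], []
    for name in sorted(names):
        if name.endswith(".asc"):
            continue
        if name + ".asc" in names:
            signed.append(name)
        else:
            unsigned.append(name)
    return signed, unsigned
```

A release script could refuse to run twine upload dist/* when the unsigned list is not empty.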

Automatically signing with Twine#

Alternatively, the --sign flag instructs Twine to sign the files before uploading:

Terminal
$ twine upload --sign example_package.whl
<< Uploading distributions to https://packagemanager.posit.co/upload/pypi/local-python-src
<< Signing example_package.whl
<< Uploading example_package.whl
<< 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.1/8.1 MB • 00:02 • 3.5 MB/s