Pins for Python

The Python pins library provides a way to easily share data sets, models, and other Python objects. Your resources can be text files (CSV, JSON, etc.), Arrow/Feather files, or any other format you use to share data. Pinned objects can be stored on a variety of boards, including local folders (to share on a networked drive or with Dropbox), Posit Connect, Amazon S3, and more.

Sharing data is useful in many situations, for example:

  1. Multiple pieces of content require the same input data. Rather than copying that data, each piece of content references a single source of truth hosted on Connect.

  2. Content depends on data or model objects that need to be regularly updated. Rather than redeploying the content each time the data changes, use a pinned resource and update only the data. The data update can occur using a scheduled Jupyter Notebook document. Your content reads the newest data on each run.

  3. You need to share resources that aren’t structured for traditional tools like databases. For example, models saved as Python objects aren’t easy to store in a database. Rather than using email or file systems to share data files, use Connect to host these resources as pins.

Pins and large data sets

An important factor in determining whether or not to use a pin is the size of the data or object in use. As a general rule of thumb, we do not recommend using pins with files over 500 MB. If you find yourself routinely pinning data larger than this, you might need to reconsider your data engineering pipeline.
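
The size guideline above can be checked before a file is pinned. The following is a minimal sketch using only the standard library. The helper name is_pin_sized and the constant MAX_PIN_BYTES are illustrative and not part of pins, and reading "500 MB" as 500 * 1024**2 bytes is an assumption.

```python
import os

# Assumption: "500 MB" is interpreted as 500 * 1024**2 bytes.
MAX_PIN_BYTES = 500 * 1024**2


def is_pin_sized(path, limit=MAX_PIN_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

A check like this could run at the start of a scheduled job so that an unexpectedly large export is flagged before it is written to a board.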

Create a pin board

Connect can serve as a board for pinning Python objects. Create a board with board = board_connect(). This function takes server_url and api_key arguments, which determine how you authenticate to Connect. If they are not specified, pins attempts to read them from the CONNECT_SERVER and CONNECT_API_KEY environment variables.

Note

To read pickle files from a pin board, you must set the allow_pickle_read=True argument in board_connect(). The pickle module is not secure, so only read files that you trust. For more information, see the Python documentation.

import os
from pins import board_connect

API_KEY = os.getenv('CONNECT_API_KEY')
SERVER = os.getenv('CONNECT_SERVER')

board = board_connect(server_url=SERVER, api_key=API_KEY)

Connect automatically sets these environment variables for deployed content at run time. This means that there is no need to hard-code them in your source (which is never good practice) or to specify them in the Vars pane, unless your server administrator has disabled this feature.

Note

The automatic generation of these environment variables can be disabled for security reasons. Reach out to your Posit Connect server administrator or review the Admin Guide for additional details.
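
When the variables are not set (for example, during local development, or when automatic generation is disabled), a missing value is easier to diagnose if it is reported before the board is created. The following is a sketch under that assumption. The helpers connect_settings and make_board are hypothetical names, not part of pins; only board_connect and its server_url and api_key arguments come from the text above.

```python
import os


def connect_settings(env=os.environ):
    """Collect the Connect server URL and API key from environment variables.

    Raises RuntimeError naming whichever variable is missing or empty.
    """
    variables = {"server_url": "CONNECT_SERVER", "api_key": "CONNECT_API_KEY"}
    settings = {arg: env.get(var) for arg, var in variables.items()}
    missing = [variables[arg] for arg, value in settings.items() if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return settings


def make_board(env=os.environ):
    """Create a Connect board from the settings found in `env`."""
    # Imported lazily so the settings check can run without pins installed.
    from pins import board_connect

    return board_connect(**connect_settings(env))
```

Passing the environment as an argument keeps the check easy to exercise with a plain dictionary, without touching the real process environment.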

Read and write pins

Once you have a pin board, you can write data to it with .pin_write(). It requires three arguments: an object, a name, and a pin type:

from pins.data import mtcars
board.pin_write(mtcars.head(), "hadley/mtcars", type="csv")

The first argument is the object to save, the second argument gives the name of the pin, and type sets the file format used to store it. On Connect, this name is used along with your username to retrieve or read data from the pin. Running the code above should yield a success message that looks something like this: Writing to pin 'hadley/mtcars'.

The username you provide (e.g., ‘hadley’) must match the API key used to establish the board and authenticate to Connect. If the username does not match, you will receive an error.

After you’ve pinned an object, you can read it back with .pin_read():

board.pin_read("hadley/mtcars")
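
The write-then-read sequence above can be wrapped in a small function that accepts any board. This is a minimal sketch: round_trip is a hypothetical helper, and it relies only on the .pin_write() and .pin_read() calls already shown.

```python
def round_trip(board, obj, name, pin_type="csv"):
    """Write `obj` to the pin `name` on `board`, then read it back."""
    board.pin_write(obj, name, type=pin_type)
    return board.pin_read(name)
```

Assuming pins and pandas are installed, the same function can be tried without a server by passing a temporary local board created with board_temp() from pins. On Connect, the name still needs the username prefix described above.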

Pin metadata

Every pin is accompanied by some metadata that you can access with .pin_meta(). This returns the metadata generated by default, which includes:

  • A title, a brief textual description of the dataset.

  • An optional description, where you can provide more details.

  • The date-time when the pin was created.

  • The file_size, in bytes, of the underlying files.

  • A unique pin_hash that you can supply to .pin_read() to ensure that you are reading exactly the data that you expect.
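
The fields listed above can be gathered into a plain dictionary for logging or display. The following is a sketch that assumes the object returned by .pin_meta() exposes them as attributes named title, description, created, file_size, and pin_hash. The helper summarize_pin is hypothetical.

```python
def summarize_pin(board, name):
    """Return selected metadata fields for the pin `name` as a dict."""
    meta = board.pin_meta(name)
    fields = ("title", "description", "created", "file_size", "pin_hash")
    # Assumption: these attribute names match the metadata object.
    # getattr with a default keeps the sketch tolerant of a missing field.
    return {field: getattr(meta, field, None) for field in fields}
```

The pin_hash value from this summary is the one that can be supplied to .pin_read() to confirm that the data has not changed.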

When creating the pin, you can override the default description or provide additional metadata that is stored with the data:

board.pin_write(
    mtcars,
    name="hadley/mtcars2",  # on Connect, include the username in the pin name
    type="csv",
    description="Data extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).",
    metadata={
        "source": "Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411."
    },
)

Learn more about Pin Metadata.

Using a pin

Once a pin has been deployed, it is easy to share the pin with colleagues.

You can manage content settings for deployed pins just like you would for other content types. For example, you can manage access controls to pins to determine who should be able to view and utilize the resource.

Connect provides a preview of pinned data objects, their metadata, and a direct download link, all of which can be accessed at the content URL:

Example of a Python Pin on Posit Connect.

Updating a pin

Pins are data objects: they are not backed by source code and cannot be scheduled directly. A common pattern for updating pinned data on a schedule is to run .pin_write() inside a scheduled Jupyter Notebook. Writing to the same pin multiple times creates a version history, which can be accessed under the More button drop-down menu.
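
The scheduled-update pattern can be sketched as a function that a notebook cell calls on each run. The names refresh_pin and load_data and the refreshed_at metadata key are illustrative assumptions. Only .pin_write() and its metadata argument come from the sections above.

```python
from datetime import datetime, timezone


def refresh_pin(board, name, load_data, pin_type="csv"):
    """Load fresh data, write it to the pin `name`, and return the UTC time."""
    data = load_data()
    refreshed_at = datetime.now(timezone.utc).isoformat()
    board.pin_write(
        data,
        name,
        type=pin_type,
        # Hypothetical metadata key recording when this refresh ran.
        metadata={"refreshed_at": refreshed_at},
    )
    return refreshed_at
```

Each call writes to the same pin name, so repeated runs build up the version history described above. In pins, board.pin_versions(name) can be used to list those versions from code.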