Pins
It can be a struggle to find a home for the mid-process artifacts of a data analysis.
Think about some of your workflows. Are you:
- Using read.csv() to bring in emailed data?
- Saving .Rds or .RData objects to be called later?
- Sharing your model or data across multiple apps?
- Redeploying your app every time the supporting data is updated?
A yes to any of the above indicates a solid use case for pins. Posit developed pins to make discovering, caching, and sharing resources simpler, all to promote efficient data workflows.
What are pins?
Just as you would pin a note to a physical cork board, pins lets you pin an R or Python object, or a file, to a virtual board where you and others can access it. The virtual board can be on Posit Connect, Amazon S3, Azure storage, Microsoft 365 (OneDrive and SharePoint), or Google Cloud, to name a few. Pins can be automatically versioned so you can track changes, re-run analyses on historical data, and undo mistakes.
Pins are best suited for objects up to a few hundred megabytes in size. Often they’re made of lightweight or ephemeral data, and multiple assets might rely on them. You can pin datasets, models, plots, files, and more.
When the data lives elsewhere, pins takes care of downloading and caching so that it’s only re-downloaded when needed. That said, most boards transmit pins over HTTP, which is slow and possibly unreliable for very large pins. As a general rule of thumb, we don’t recommend using pins with files over 500 MB. If you find yourself routinely pinning data larger than this, you might need to reconsider your data engineering pipeline.
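As a rough sketch of that rule of thumb, the helper below checks a file's size before you pin it. The function name is_pin_sized() and its limit_mb argument are hypothetical and not part of pins; the 500 MB default simply mirrors the guidance above.

```r
# Hypothetical helper (not part of pins): TRUE when a file on disk is
# at or below a size limit given in megabytes.
is_pin_sized <- function(path, limit_mb = 500) {
  size_mb <- file.size(path) / 1024^2
  # file.size() returns NA for a missing file, which counts as FALSE here.
  !is.na(size_mb) && size_mb <= limit_mb
}

# Example: write a small data set to a temporary file and check it.
tmp <- tempfile(fileext = ".csv")
write.csv(faithful, tmp, row.names = FALSE)
small_enough <- is_pin_sized(tmp)
```

A check like this could run just before a pin is written, so an oversized file is flagged before it is sent over HTTP.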
It is typically much easier (and safer) to share a pin across multiple assets or across your team than to email files around. With Connect, you can manage who can access your pins.
In this how-to, you learn how to pin and retrieve a resource on Connect and how to schedule updates to pins so downstream analyses are always current without re-deployment.
Requirements
To utilize pins with Connect, make sure you:
- Are a Publisher or Administrator on your Connect server [1]
- Have Connect v1.7.8 or higher [2]
- Have a current version of pins. The pins package is available on CRAN, and was at version 1.0.0 at the time of this writing. Install pins with:
install.packages("pins")
Pinning to Connect
Recall that virtual cork board metaphor? Every pin lives on a board, so your first step is to create a board object so you can call it. There are multiple board types, but for this example, use a board on Connect with the following:
board <- pins::board_connect()
Note that you didn’t pass any arguments to board_connect(). This function is designed to just work for most cases.
If you run into issues, you can specify the auth method to control how you authenticate to Connect. Besides the default of auth = "auto", you can specify auth = "manual" and provide the server and key arguments; auth = "envvar" if you have already defined "CONNECT_SERVER" and "CONNECT_API_KEY" as environment variables [3]; or auth = "rsconnect" to use the server registered using the rsconnect package.
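The sketch below illustrates the auth = "envvar" route by checking that both variables are set before creating the board. The helper names missing_connect_vars() and connect_board_from_env() are hypothetical, and the server URL in the commented alternative is a placeholder.

```r
# Hypothetical helper (not part of pins): names of the Connect
# environment variables that are currently unset or empty.
missing_connect_vars <- function() {
  vars <- c("CONNECT_SERVER", "CONNECT_API_KEY")
  vars[!nzchar(Sys.getenv(vars))]
}

# Hypothetical wrapper: create the board from environment variables,
# failing early with a readable message if either one is missing.
connect_board_from_env <- function() {
  missing <- missing_connect_vars()
  if (length(missing) > 0) {
    stop("Set these environment variables first: ",
         paste(missing, collapse = ", "))
  }
  pins::board_connect(auth = "envvar")
}

# Alternative: pass credentials explicitly (placeholder values shown).
# board <- pins::board_connect(
#   auth   = "manual",
#   server = "https://connect.example.com",
#   key    = "paste key value"
# )
```

Calling connect_board_from_env() in place of a bare board_connect() turns an unset variable into an immediate, readable error.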
Now that you have your board, it’s time to pin something on it. Select an object and pin it to the Connect board with:
my_data <- faithful # for example
pins::pin_write(board,
  my_data,
  name = "faithful_data")
Think of name like a filename. Avoid slashes and spaces.
You can optionally specify type in the pin_write() function. The default is type = "rds", but you can also pin "csv", "json", "arrow" for arrow/feather files, or "qs", which provides more efficient compression than RDS for faster reads and writes.
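To illustrate the type argument, here is a minimal sketch that stores the same data frame in two formats. It uses board_temp(), a throwaway local board from pins, so that it runs without a Connect server; the pin names are made up for the example.

```r
library(pins)

# board_temp() creates a throwaway local board, used here so the
# example runs without a Connect server.
board <- board_temp()

# Store the same data frame in two formats by naming the type explicitly.
pin_write(board, faithful, name = "faithful_csv", type = "csv")
pin_write(board, faithful, name = "faithful_rds", type = "rds")

# pin_read() picks the right reader based on the stored type.
from_csv <- pin_read(board, "faithful_csv")
from_rds <- pin_read(board, "faithful_rds")
```

On Connect the calls are the same, with the board created by board_connect() instead.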
With this, you’ve created your first pin! Congratulations!
Be sure to adjust the Access Settings on your pin in the Connect content dashboard if you want to share this pin with others. For more information, see the Access section of the Connect User Guide.
Retrieving a pin from Connect
When you view your pin on Connect, notice that there is some header information included for pin retrieval. Copy that code into your analysis to retrieve the pin.
Replace the code section below with the sample from your own pin:
library(pins)
board <- board_connect()
my_data <- pin_read(
  board,
  "your_username/my_data")
Now you, or anyone you have shared the pin with, can access this information in a secure, versioned manner.
Pins seek to make it both easy and fast to share data. Pins automatically cache remote pins locally so it’s fast to access data, but they always check to ensure that it’s up-to-date so you’re never using stale data.
Schedule updates to your pin
Up to this point, you’ve found a home for your mid-process artifacts and learned how to share them as pins. Now it’s time to put your pins on an update schedule and bask in glory as your analyses automatically refer to the most current data without requiring redeployment.
To do this, create an R Markdown document that pulls your data, does any needed processing, and then writes your pin to Connect. This document is the supporting ETL (extract, transform, and load) step in your pipeline. Publish it to Connect so that each time it renders, it refreshes the pin.
If you run into publishing issues at this step, it’s likely an environment variable issue [4].
To finish this, click the Schedule button in Connect and establish a schedule for your R Markdown ETL (and resulting pin) to refresh. Now you can point your downstream data analysis at this pin to always have a fresh source of data behind it.
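As a minimal sketch, the code chunk inside such an ETL document might look like the following. The built-in faithful data set stands in for your real source, the filter step and the pin name faithful_processed are illustrative assumptions, and board_temp() is used only so the sketch runs anywhere.

```r
library(pins)

# Stand-in board so the sketch runs anywhere; in the published
# R Markdown document you would use board_connect() instead.
board <- board_temp()

# Extract: a built-in data set stands in for your real source
# (a database query, an API call, a file drop, and so on).
raw <- faithful

# Transform: an illustrative cleaning step that keeps longer eruptions.
processed <- raw[raw$eruptions > 3, ]

# Load: write the result to the board under a stable name so
# downstream content can keep reading the same pin.
pin_write(board, processed, name = "faithful_processed", type = "rds")
```

Because the pin name stays the same on every run, downstream apps and reports that call pin_read() pick up the refreshed data without being redeployed.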
Next steps
In this example, pin_write() and pin_read() are used to pin and retrieve a data frame. This works for any serializable R object (e.g., data frames, model files, CSVs, feather files). You can use the more general pin_upload() and pin_download() to pin files from disk and share types of data that pins would otherwise not support. A great use case here is parquet files.
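The following sketch shows the file-based round trip. It uses a CSV file rather than parquet so that it needs no extra packages, and a throwaway board_temp() board in place of Connect; the file and pin names are made up for the example.

```r
library(pins)

board <- board_temp()  # throwaway local board for illustration

# Create a file on disk; pin_upload() stores files as-is.
path <- file.path(tempdir(), "faithful.csv")
write.csv(faithful, path, row.names = FALSE)

# Upload the file, then download it again.
pin_upload(board, paths = path, name = "faithful_file")
cached_path <- pin_download(board, "faithful_file")

# pin_download() returns file paths, so you choose how to read them.
from_file <- read.csv(cached_path)
```

With a parquet file the pattern would be the same, with the final step swapped for whichever parquet reader you use.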
Need details about that pin?
Every pin has metadata that you can access with pin_meta(). It’s particularly helpful to reference the date of the pin using metadata, or even supply your own custom metadata as an argument in the pin_write() or pin_upload() functions.
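Here is a small sketch of writing and reading custom metadata, again on a throwaway board_temp() board. The metadata fields source and owner are invented for the example.

```r
library(pins)

board <- board_temp()

# Attach custom metadata as a named list when writing the pin.
pin_write(
  board, faithful,
  name = "faithful_data",
  type = "rds",
  metadata = list(source = "datasets::faithful", owner = "analytics-team")
)

meta <- pin_meta(board, "faithful_data")
meta$created       # when the pin was written
meta$user$source   # custom metadata is returned under `user`
```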
Want to keep track of versions?
Specify versioned = TRUE in pin_write() or pin_upload() and each write creates a new version of the pin. List available versions with pin_versions(), and retrieve a specific version by passing its identifier to the version argument of pin_read() or pin_download().
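A minimal sketch of working with versions follows. It turns versioning on for a whole throwaway board with board_temp(versioned = TRUE) rather than per write, and the two subsets of faithful are only there to give the versions different contents.

```r
library(pins)

# Turn versioning on for every pin on this throwaway board.
board <- board_temp(versioned = TRUE)

# Two writes with different contents produce two versions.
pin_write(board, head(faithful, 10), name = "faithful_data", type = "rds")
pin_write(board, head(faithful, 20), name = "faithful_data", type = "rds")

# One row per stored version, including its identifier.
versions <- pin_versions(board, "faithful_data")

# Read one specific version by its identifier.
one_version <- pin_read(board, "faithful_data", version = versions$version[1])
```

Omitting the version argument returns the most recent version, which is what downstream content usually wants.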
Additional information
At this point, you know what a pin is, whether pins are useful for your workflow, and how to implement them. What next?
Go try pins on your own!
The pins website has a comprehensive Getting Started guide, as well as specific information on pinning to Connect: https://pins.rstudio.com/.
Looking for inspiration? See this content collection that uses a pinned model and datasets as part of a pipeline to support a Shiny app in Production. The underlying data in the pin is refreshed on a schedule, keeping the Shiny app current: https://solutions.posit.co/example/bike_predict/.
See the Connect User Guide section about pins.
Any issues? Let us know here: https://github.com/rstudio/pins-r/issues.
Last but not least, let us know how you get on with pins! Speak to your dedicated Posit Customer Success contact, or send a note to us at sales@posit.co.
Footnotes
[1] Viewers can only retrieve pins.
[2] If you’re not yet at this version but are keen to get started with pins, talk to your R Admin and refer to the Upgrade section of the Admin Guide.
[3] Easily set environment variables with usethis::edit_r_environ() to open your .Renviron for editing, and then insert CONNECT_API_KEY="paste key value" and CONNECT_SERVER="paste server value" on separate lines. Using environment variables is a best practice, but remember, if you’re using git, it’s a good idea to add your .Renviron to your .gitignore to ensure you’re not publishing your API key to your version control system.
[4] If you are using Connect v1.8.8 or higher, Connect automatically provides these environment variables for you when you publish. If you are on an older version of Connect, or if this feature has been disabled by your admin, you should reference the User Guide to learn how to create an API key and how to input environment variables into the Vars pane of Connect.