Environment Management

Immutable data science environments

The following cookbook example uses the Python and R environment management features described in the Posit Connect Admin Guide. It also runs Connect in off-host execution mode to demonstrate how to create an immutable, reproducible data science environment.

This example demonstrates how to use the official rstudio/r-session-complete image, which is suitable for running content on Posit Workbench, when deploying content to Posit Connect. Reusing this image ensures that the exact packages used when developing our content in Posit Workbench are also used when executing our content on Connect.

The source code for the content used in this example can be found in our Python examples.

Prerequisites

Completing this cookbook example requires the following:

  • a Connect installation configured to use off-host execution
  • a Connect API key with the administrator role
  • push access to a container registry

Create the image

First, we define our image. The same image is used to develop our content on Workbench and, later, to execute our content on Connect. We use the r-session-complete image as the base and install the additional Python and R packages that our content requires.

Dockerfile
FROM ghcr.io/rstudio/r-session-complete:jammy-2023.06.1--cd1a0c5

ARG GIT_SHA="4e4be3f59f0fbcf3ccecc724a00b0da7a4ad6f07"
ARG CRAN_MIRROR="https://p3m.dev/cran/__linux__/jammy/latest"
ARG PYPI_MIRROR="https://p3m.dev/pypi/latest/simple"

# Install the Python packages
#   This command installs the Python packages defined in requirements.txt,
#   which pins the package versions and provides an immutable set of Python dependencies.
RUN pip install --upgrade pip && \
    curl -sSfL https://raw.githubusercontent.com/sol-eng/python-examples/${GIT_SHA}/reticulated-image-classifier/requirements.txt \
    -o /tmp/requirements.txt && \
    pip install --default-timeout=1000 --index-url=${PYPI_MIRROR} -r /tmp/requirements.txt && \
    rm /tmp/requirements.txt

# Install the R packages
ENV RENV_PATHS_LIBRARY=renv/library
RUN R -e "install.packages('renv', repos = c(CRAN = '${CRAN_MIRROR}'))" && \
    curl -sSfL https://raw.githubusercontent.com/sol-eng/python-examples/${GIT_SHA}/reticulated-image-classifier/renv.lock \
    -o /tmp/renv.lock && \
    R -e "renv::restore(lockfile='/tmp/renv.lock', repos = c(CRAN = '${CRAN_MIRROR}'))" && \
    rm /tmp/renv.lock

Build the image with:

# use a container registry that you have push access to
CONTAINER_REGISTRY="myorg/myrepo"

# build the image
docker build . -t ${CONTAINER_REGISTRY}/image-classifier:jammy

# push it to your registry
docker push ${CONTAINER_REGISTRY}/image-classifier:jammy
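
Before registering the image with Connect, it can help to confirm that the R and Python interpreters exist in the image at the paths the execution environment will declare. The following is a minimal sketch rather than part of the official workflow. The helper name image_interpreter_versions is hypothetical, and the sketch assumes that docker is available locally and that each interpreter accepts --version. It overrides the image entrypoint so that the check does not depend on how the base image starts.

```shell
# Hypothetical helper (not part of Connect or Docker): print the version
# reported by each interpreter path inside the image.
# Usage: image_interpreter_versions <image> <path>...
image_interpreter_versions() {
  local image="$1"
  shift
  local path
  for path in "$@"; do
    # --entrypoint runs the interpreter directly instead of the image's default command
    docker run --rm --entrypoint "${path}" "${image}" --version || return 1
  done
}

# Example call (paths are assumed to match the layout of the image):
# image_interpreter_versions "${CONTAINER_REGISTRY}/image-classifier:jammy" \
#   /opt/R/4.2.3/bin/R /opt/python/3.9.14/bin/python
```

If either command fails, the path or version is probably not present in the image and should be corrected before the environment is created.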

Add the execution environment

Next, we use the Connect Server API POST /v1/environments endpoint to create a new execution environment. This execution environment can then be used by content.

Note

The environment is created with matching set to exact. This means that the environment is used only when a piece of content explicitly requests it. Connect never chooses this environment during automatic selection.

Warning

Creating an environment via the /v1/environments API endpoint requires the administrator role.

curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/environments \
--data '{
  "title": "Custom image classifier",
  "description": "My custom image classifier environment",
  "cluster_name": "Kubernetes",
  "name": "'${CONTAINER_REGISTRY}'/image-classifier:jammy",
  "matching": "exact",
  "r": {
    "installations": [
      {
        "version": "4.2.3",
        "path": "/opt/R/4.2.3/bin/R"
      }
    ]
  },
  "python": {
    "installations": [
      {
        "version": "3.9.14",
        "path": "/opt/python/3.9.14/bin/python"
      }
    ]
  }
}'
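
The request above splices a shell variable into a single-quoted JSON string, which is easy to get wrong when editing. As an alternative sketch, the same payload can be generated by a small function and passed to curl with --data. The helper name environment_payload is hypothetical. It assumes that the interpreters live under /opt/R/<version>/bin/R and /opt/python/<version>/bin/python, as in the example above, and that none of the values contain characters that need JSON escaping.

```shell
# Hypothetical helper: print the JSON payload for POST /v1/environments.
# Usage: environment_payload <image> <r_version> <python_version>
environment_payload() {
  local image="$1" r_version="$2" py_version="$3"
  # Unquoted heredoc delimiter so the shell expands the variables below
  cat <<EOF
{
  "title": "Custom image classifier",
  "description": "My custom image classifier environment",
  "cluster_name": "Kubernetes",
  "name": "${image}",
  "matching": "exact",
  "r": {
    "installations": [
      { "version": "${r_version}", "path": "/opt/R/${r_version}/bin/R" }
    ]
  },
  "python": {
    "installations": [
      { "version": "${py_version}", "path": "/opt/python/${py_version}/bin/python" }
    ]
  }
}
EOF
}

# Example call, using the same variables as the request above:
# curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" \
#   "${CONNECT_SERVER}/__api__/v1/environments" \
#   --data "$(environment_payload "${CONTAINER_REGISTRY}/image-classifier:jammy" 4.2.3 3.9.14)"
```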

Deploy the content

First, create a new content item using the Posit Connect Server API. The request payload specifies initial values for default_image_name, default_r_environment_management, and default_py_environment_management. Setting default_image_name when the content item is created ensures that Connect uses our custom image the first time the content builds during deployment. Setting both default_r_environment_management and default_py_environment_management to false has two effects: Connect does not attempt to install any Python or R packages during the first build, and when the content executes, it uses the packages installed in the image instead of looking for packages in the Python/R package cache.

curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/content \
--data '{
  "name": "my-image-classifier-app",
  "default_image_name": "'${CONTAINER_REGISTRY}'/image-classifier:jammy",
  "default_r_environment_management": false,
  "default_py_environment_management": false
}'

Make a note of the guid in the server response. We use this as our CONTENT_GUID later when we deploy our application.
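
Rather than copying the value by hand, the response can be piped through a small filter that extracts the field. The sketch below uses only sed, and the helper name json_string_field is hypothetical. It is deliberately naive: it handles simple string values only and returns the first match, so a real JSON tool such as jq is the more robust choice where one is installed.

```shell
# Hypothetical helper: read JSON on stdin and print the value of a
# top-level string field. Naive by design: no escape handling, no nesting.
# Usage: json_string_field <field>
json_string_field() {
  local field="$1"
  # Match "field": "value" and print only the value; keep the first hit
  sed -n "s/.*\"${field}\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Example: capture the guid when creating the content item
# (payload.json is an assumed file holding the request body shown above).
# CONTENT_GUID="$(curl -sS -XPOST -H "Authorization: key ${CONNECT_API_KEY}" \
#   "${CONNECT_SERVER}/__api__/v1/content" --data @payload.json | json_string_field guid)"
```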

Next, we need to clone the content to our workstation and create a content bundle so that we can publish it to the Connect server.

# clone the repo and check out the pinned commit
git clone https://github.com/sol-eng/python-examples.git
cd python-examples
git checkout -b connect-custom-execution-env 4e4be3f59f0fbcf3ccecc724a00b0da7a4ad6f07

# create the content bundle
tar czvf bundle.tar.gz -C ./reticulated-image-classifier ./

# upload the content bundle to Posit Connect
curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/content/${CONTENT_GUID}/bundles \
--data-binary @"bundle.tar.gz"

Make a note of the id in the server response. We use this as our BUNDLE_ID in the next step.
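
If Connect rejects the upload or the later deployment, one thing worth checking is the layout of the archive. Connect expects the bundle's manifest.json at the top level of the archive, which is why the tar command above uses -C to archive the contents of the application directory rather than the directory itself. The following sketch checks for that. The helper name bundle_has_root_manifest is hypothetical, and the sketch assumes that the example content ships a manifest.json.

```shell
# Hypothetical helper: succeed only if manifest.json sits at the root of
# the gzipped tar archive. Entries appear as "./manifest.json" when the
# archive was created with "-C <dir> ./".
# Usage: bundle_has_root_manifest <bundle.tar.gz>
bundle_has_root_manifest() {
  tar -tzf "$1" | grep -qx -e 'manifest.json' -e './manifest.json'
}

# Example:
# bundle_has_root_manifest bundle.tar.gz || echo "manifest.json is not at the bundle root" >&2
```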

Now we can activate the bundle to complete the content deployment.

curl -XPOST -H "Authorization: key ${CONNECT_API_KEY}" ${CONNECT_SERVER}/__api__/v1/content/${CONTENT_GUID}/deploy \
--data '{
  "bundle_id": "'${BUNDLE_ID}'"
}'

The server logs should indicate that the content requests our custom image and that there is no package installation required for this deployment:

time="2023-09-05T20:38:17.782Z" level=info msg="Bundle created with R version 4.2.3 and Python version 3.9.14 is compatible with environment Kubernetes::myorg/myrepo/image-classifier:jammy with R version 4.2.3 from /opt/R/4.2.3/bin/R and Python version 3.9.14 from /opt/python/3.9.14/bin/python " bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
time="2023-09-05T20:38:17.782Z" level=info msg="Bundle requested no R environment restore; Connect will not perform any R package installation." bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
time="2023-09-05T20:38:17.782Z" level=info msg="Bundle requested no Python environment restore; Connect will not perform any Python package installation." bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
time="2023-09-05T20:38:17.785Z" level=info msg="Launching Shiny application..." bundle_id=24 content_guid=3578a80e-3150-417d-b24f-8c56b9a8beae content_id=20 correlation_id=e062c25c-7f18-403f-b28f-72e9d128492d
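
When the logs are long, the two relevant messages can be checked mechanically. The sketch below reads log lines on standard input and succeeds only if both "no environment restore" messages shown above are present. The helper name no_package_install_logged is hypothetical, and how the logs are obtained is left open because it depends on the installation.

```shell
# Hypothetical helper: read Connect log lines on stdin and succeed only if
# both the R and the Python "no environment restore" messages appear.
no_package_install_logged() {
  local logs
  logs="$(cat)"
  printf '%s\n' "${logs}" | grep -Fq 'Bundle requested no R environment restore' &&
    printf '%s\n' "${logs}" | grep -Fq 'Bundle requested no Python environment restore'
}

# Example (connect.log is an assumed file name):
# no_package_install_logged < connect.log && echo "no package installation was performed"
```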

The image classifier application should now be fully published and available through the Connect dashboard.