Finding Public Content

Problem

You need to identify all content on your Connect server with access permissions set to Anyone - no login required. For each piece of public access content, you need to know the URL, content owner, and when it was last updated.

Solution

In Python, use the Posit SDK to get all content from the server, keeping only content whose access_type is all. Next, prepare the data. The owner column holds a record with multiple fields for each item, and we are only interested in the username. We also need to convert the last_deployed_time field to a date.

from posit import connect
import pandas as pd

client = connect.Client()

# Get public access content
content = pd.DataFrame(client.content.find())
# Copy the filtered rows so new columns can be added without a SettingWithCopyWarning
content = content[content["access_type"] == "all"].copy()

# Prepare the data
# Extract the username from each owner record, keeping the filtered row index
content["Owner"] = content["owner"].apply(lambda owner: owner["username"])
content["Updated"] = pd.to_datetime(content["last_deployed_time"]).dt.date
content = content[["title", "Owner", "content_url", "Updated"]]
content = content.rename(columns={"title": "Title", "content_url": "URL"})

The recipe produces a DataFrame with the required data.

>>> print(content)
                Title       Owner                                                URL      Updated
5      Cool Slideshow  john_admin  https://connect.example.org/content/bcb281fc-a...  2024-06-28
6     Sales Dashboard  john_admin  https://connect.example.org/content/83c54639-6...  2024-06-28
7          Manuscript  john_admin  https://connect.example.org/content/de5bd4e2-6...  2024-06-28
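
To make the preparation steps easier to follow in isolation, here is a minimal sketch that applies the same filter and field extraction to hand-written records using only the standard library. The `summarize_public` helper, the sample records, and their URLs are hypothetical; real records come from `client.content.find()` and contain many more fields.

```python
from datetime import date, datetime


def summarize_public(records):
    """Return Title/Owner/URL/Updated rows for records whose access_type is 'all'."""
    rows = []
    for item in records:
        if item.get("access_type") != "all":
            continue
        deployed = item.get("last_deployed_time")
        updated = None
        if deployed:
            # Replace a trailing "Z" so fromisoformat also accepts it on older Pythons.
            updated = datetime.fromisoformat(deployed.replace("Z", "+00:00")).date()
        rows.append(
            {
                "Title": item.get("title"),
                # The owner field is a nested record; keep only the username.
                "Owner": (item.get("owner") or {}).get("username"),
                "URL": item.get("content_url"),
                "Updated": updated,
            }
        )
    return rows


# Hypothetical sample records containing only the fields used in this recipe.
sample = [
    {
        "title": "Internal Report",
        "access_type": "acl",
        "owner": {"username": "jane_publisher"},
        "content_url": "https://connect.example.org/content/example-guid-1/",
        "last_deployed_time": "2024-06-27T10:00:00Z",
    },
    {
        "title": "Cool Slideshow",
        "access_type": "all",
        "owner": {"username": "john_admin"},
        "content_url": "https://connect.example.org/content/example-guid-2/",
        "last_deployed_time": "2024-06-28T09:15:00Z",
    },
]

public_rows = summarize_public(sample)
```

Only the second record survives the filter, because its access type is all. Content that has never been deployed is assumed to have an empty last_deployed_time, in which case Updated is left as None.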

In R, use connectapi to get all content from the server, keeping only content whose access_type is all. Next, prepare the data. The owner column holds a record with multiple fields for each item, and we are only interested in the username. We also need to convert the last_deployed_time field to a date.

library(connectapi)
library(dplyr)
library(tidyr)

client <- connect()

# Get public access content
content <- get_content(client) |>
  filter(access_type == "all")

# Process the data
content <- content |>
  unnest_wider(col = c("owner"), names_sep = ".") |>
  mutate(last_deployed_time = as.Date(last_deployed_time)) |>
  select(c("Title" = title, "Owner" = owner.username, "URL" = content_url, "Updated" = last_deployed_time))

The recipe produces a data frame with the required data.

> content
# A tibble: 768 × 4
   Title           Owner      URL                                                                       Updated
   <chr>           <chr>      <chr>                                                                     <date>
 1  Cool Slideshow john_admin https://connect.example.org/content/bcb281fc-aa3b-4341-93eb-081814608c94/ 2024-06-28
 2 Sales Dashboard john_admin https://connect.example.org/content/83c54639-6030-4054-8066-5011b9fb796d/ 2024-06-28
 3      Manuscript john_admin https://connect.example.org/content/de5bd4e2-6ed5-49cb-99fa-8c6736cbcbaa/ 2024-06-28

Creating an HTML table

Transform the URL into an HTML link, and use itables to display an HTML table.

import itables

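def make_clickable(val):
    # Wrap the URL in an anchor tag that opens in a new tab
    return f'<a target="_blank" href="{val}">{val}</a>'

# Format the URL column as links; this returns a pandas Styler
content = content.style.format({'URL': make_clickable})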

itables.show(content)
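
When a value is inserted into HTML, characters such as & or < in the URL or in a label can break the markup. As a hedged variation on the link-building step, the sketch below escapes both before building the tag. The `make_link` helper and the sample URL are hypothetical and not part of the recipe or of itables.

```python
from html import escape


def make_link(url, label=None):
    """Build an anchor tag with the URL and the label HTML-escaped."""
    safe_url = escape(url, quote=True)
    # Fall back to the URL itself when no label is given.
    safe_label = escape(label if label is not None else url)
    return f'<a target="_blank" rel="noopener" href="{safe_url}">{safe_label}</a>'


# Hypothetical example: a URL with a query string and a label with angle brackets.
link = make_link("https://connect.example.org/content/abc/?a=1&b=2", "Sales <Q2>")
```

A helper like this could take the place of the link-building function when formatting the URL column, for example to show the content title as the link text.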

Use the gt package to create an HTML widget.

library(gt)

gt(content) |>
  fmt_url(columns = c("URL")) |>
  opt_interactive(use_search = TRUE)

Discussion

You can use the URLs in the list to audit network exposure. For example, you can check whether content marked as public access is actually reachable from outside your organization’s network.
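
One way to run such a check is to request each URL without credentials from a machine outside your network and record whether the request succeeds. The following is a minimal sketch using only the Python standard library. The `is_publicly_reachable` helper and its `fetch` parameter are hypothetical names introduced here; `fetch` exists only so the function can be exercised without a real network call.

```python
import urllib.request


def is_publicly_reachable(url, fetch=urllib.request.urlopen, timeout=10):
    """Return True if an unauthenticated GET to url ends with a 2xx status."""
    try:
        with fetch(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and HTTPError (for example 401, 403, 404) are OSError
        # subclasses, as are timeouts, DNS failures and refused connections.
        return False
```

Calling `is_publicly_reachable(url)` for each value in the URL column gives a rough reachability flag per content item. Treat the result as a hint rather than proof: `urlopen` follows redirects, so a server that redirects anonymous visitors to a login page returning 200 would also count as reachable under this check.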

You can use the Content usage recipe to check the usage of each content item.

The list above also identifies the owner of each content item, whom you can contact if the access settings need to change.