Finding Public Content
Problem
You need to identify all content on your Connect server with access permissions set to Anyone - no login required. For each piece of public access content, you need to know the URL, content owner, and when it was last updated.
Solution
Use the Posit SDK to get all content from the server, keeping only content with the all
access type. Next, prepare the data. The owner
column contains a JSON object with multiple fields, and we are only interested in the username
. We also need to convert the last_deployed_time
field to the correct data type.
from posit import connect
import pandas as pd
= connect.Client()
client
# Get public access content
= pd.DataFrame(client.content.find())
content = content[content['access_type']=='all']
content
# Prepare the data
"Owner"] = (pd.json_normalize(content["owner"]))["username"]
content["Updated"] = pd.to_datetime(content["last_deployed_time"]).dt.date
content[= content[["title","Owner","content_url","Updated"]]
content = content.rename(columns={"title": "Title", "content_url": "URL"}) content
The recipe produces a DataFrame
with the required data.
>>> print(content)
Title Owner URL Updated5 Cool Slideshow john_admin https://connect.example.org/content/bcb281fc-a... 2024-06-28
6 Sales Dashboard john_admin https://connect.example.org/content/83c54639-6... 2024-06-28
7 Manuscript john_admin https://connect.example.org/content/de5bd4e2-6... 2024-06-28
Use connectapi
to get all content from the server, keeping only content with the all
access type. Next, prepare the data. The owner
column contains a JSON object with multiple fields, and we are only interested in the username
. We also need to convert the last_deployed_time
field to the correct data type.
library(connectapi)
library(dplyr)
library(tidyr)
<- connect()
client
# Get public access content
<- get_content(client) |>
content filter(access_type == "all")
# Process the data
<- content |>
content unnest_wider(col = c("owner"), names_sep = ".") |>
mutate(last_deployed_time = as.Date(last_deployed_time)) |>
select(c("Title" = title, "Owner" = owner.username, "URL" = content_url, "Updated" = last_deployed_time))
The recipe produces a data frame with the required data.
> content
# A tibble: 768 × 4
Title Owner URL Updated<chr> <chr> <chr> <date>
1 Cool Slideshow john_admin https://connect.example.org/content/bcb281fc-aa3b-4341-93eb-081814608c94/ 2024-06-28
2 Sales Dashboard john_admin https://connect.example.org/content/83c54639-6030-4054-8066-5011b9fb796d/ 2024-06-28
3 Manuscript john_admin https://connect.example.org/content/de5bd4e2-6ed5-49cb-99fa-8c6736cbcbaa/ 2024-06-28
Creating an HTML table
Transform the URL into an HTML link, and use itable
to display an HTML table.
import itables
def make_clickable(val):
return f'<a target="_blank" href="{val}">{val}</a>'
= content.style.format({'URL': make_clickable})
content
itables.show(content)
Use the gt
package to create an HTML widget.
library(gt)
gt(content) |>
fmt_url(columns = c("URL")) |>
opt_interactive(use_search = TRUE)
Discussion
The URLs in the list can be used to audit the access within networks, i.e., you can check to see whether content marked as public access is reachable outside your organization’s network or not.
You can use the Content usage recipe to check the usage of each content item.
The list above also identifies the content publisher, who can be reached out to if there is a need to modify the access.