library(arrow)
# Collecting user data from a file
users <- arrow::open_dataset("v1/users/2024/05/")
Appendix: Using Chronicle data
Chronicle stores the data it produces in parquet files. The reports included with Chronicle are the easiest way for most users to access this data. If you want to enhance the reports, write your own reports, or use this data for other purposes, this section describes how to access the data that Chronicle stores. You can also refer to the code in the report QMD files.
Reading parquet data
While parquet files are similar in concept to csv files, they use a binary, columnar format optimized for read/write performance, so most text editors cannot display them without the help of plugins.
The RStudio IDE is a convenient place to read parquet files and to run the R and Python scripts below. If you do not already have them, install R and RStudio Desktop (open source edition).
If you are using VSCode, our team recommends the Parquet Explorer plugin to read and query parquet files directly in your editor.
Another common approach is to convert .parquet files into .csv files for easier viewing, using Python and the pandas library:
Terminal
>>> import pandas as pd
>>> df = pd.read_parquet('filename.parquet')
>>> df.to_csv('filename.csv')
Using R
These scripts have been tested with R version 4.2.3. You may need to upgrade your R version if you run into errors installing packages. In particular, arrow may run into issues if the R version is too old.
The examples in the Apache Arrow documentation on reading parquet files show how to read data stored locally or in S3 into an arrow table class.
Opening local Chronicle data with R
You can read the May 2024 partition of Chronicle's parquet data into an arrow table by passing the partition directory, for example v1/users/2024/05/, to arrow::open_dataset().
Opening Chronicle data in S3 with R
You can read the parquet contents of an S3 bucket for May 2024 into an arrow table with the following:
# Imports
library(arrow)
library(paws)
library(urltools)

# Set s3 bucket ----
s3_bucket <- "s3://{{YOUR_BUCKET_NAME}}"
svc <- s3(config = list(region = "us-east-2"))
bucket_str <- svc$list_objects(Bucket = urltools::domain(s3_bucket))

# Collecting user data
users_bucket <- paste0(s3_bucket, "/v1/users/2024/05")
users <- open_dataset(users_bucket,
                      hive_style = FALSE,
                      format = "parquet")
Querying Chronicle data with R
Once you have run one of the above to bring your users parquet data into an arrow table, you can begin querying it:
library(arrow)
library(tidyverse)

# Viewing user data
users_head <- head(users, 5) |>
  collect()
print(users_head)
Posit Connect Named Users Data Generated by Chronicle
Chronicle produces a daily snapshot of all named users in Posit Connect. These users generally represent the individuals that are consuming license seats for Posit Connect.
In order to generate the named users data, the Chronicle Agent must be configured with a valid administrator Connect API key.
The named users data is stored under /v1/users in the configured storage location. By default, this is /var/lib/posit-chronicle/data/v1/users.
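As a rough illustration of how this default location combines with the year/month layout seen in the example path v1/users/2024/05 used elsewhere in this appendix, the sketch below builds the directory for one month of named users data. The helper name users_partition_dir is hypothetical, and the YYYY/MM layout is inferred from that single example path rather than from a documented specification.

```python
from pathlib import PurePosixPath

# Default storage location quoted in this section; adjust if yours differs.
DEFAULT_DATA_ROOT = PurePosixPath("/var/lib/posit-chronicle/data")


def users_partition_dir(year, month, data_root=DEFAULT_DATA_ROOT):
    """Return the directory assumed to hold one month of named users data."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Zero-pad the month so May becomes "05", matching v1/users/2024/05.
    return PurePosixPath(data_root) / "v1" / "users" / f"{year:04d}" / f"{month:02d}"


print(users_partition_dir(2024, 5))
```

The resulting path can be handed to whichever parquet reader you prefer.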
The named users data includes the following columns:
Name | Description |
---|---|
cluster | Reserved for future use. This column is always empty. |
environment | A user-defined environment label set via CHRONICLE_ENVIRONMENT. See the Available Configuration Settings for the Agent appendix for setup instructions. |
service | The source of the metric. This is always connect. |
host | The host name where the Chronicle agent that reported the metric observation is running. |
os | Detailed operating system information for the host on which the Chronicle agent that reported this observation is running. |
id | The ID of the user. |
username | The username of the user (the name they use when logging in). |
email | The email address of the user. |
first_name | The first name of the user. |
last_name | The last name of the user. |
user_role | The role of the user (e.g., publisher, viewer). |
created_at | The timestamp when the user was created. |
updated_at | The timestamp when the user was most recently updated. |
last_active_at | The timestamp when the user was most recently active (logged in) in Posit Connect. |
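One possible use of these columns is finding named users who have not been active recently. The sketch below works on plain dictionaries keyed by the column names above. The helper name inactive_users, the 30-day default, and the choice to treat a missing last_active_at as inactive are all assumptions made for illustration, and the sample rows are made up.

```python
from datetime import datetime, timedelta, timezone


def inactive_users(rows, as_of, max_idle_days=30):
    """Return usernames whose last_active_at is missing or older than the cutoff."""
    cutoff = as_of - timedelta(days=max_idle_days)
    idle = []
    for row in rows:
        last_active = row.get("last_active_at")
        # Assumption: a user with no recorded activity counts as inactive.
        if last_active is None or last_active < cutoff:
            idle.append(row["username"])
    return idle


# Made-up rows for illustration only; real rows come from the users parquet files.
sample_rows = [
    {"username": "ada", "user_role": "publisher",
     "last_active_at": datetime(2024, 5, 20, tzinfo=timezone.utc)},
    {"username": "bob", "user_role": "viewer",
     "last_active_at": datetime(2024, 2, 1, tzinfo=timezone.utc)},
    {"username": "cy", "user_role": "viewer", "last_active_at": None},
]

idle = inactive_users(sample_rows, as_of=datetime(2024, 5, 31, tzinfo=timezone.utc))
print(idle)
```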
Metrics Generated by Chronicle
Metrics are gathered and processed on a scheduled basis. This means that you may not see metrics files immediately when first starting Chronicle. It also means that there is a delay before the latest data shows up in the refined metrics files.
By default, the agent retrieves metrics data once every sixty seconds. To reduce network traffic, the agent waits until it has collected 100 metrics observations or two minutes have elapsed, at which point the agent will send the metrics to the Chronicle server. The metrics data is processed into refined metrics once an hour. This process happens shortly after the top of the hour. The exact timing is not entirely predictable due to processing delays, but the refinement process typically completes by 15 minutes after the top of the hour.
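To make the delays above concrete, the following sketch estimates when an observation would typically be visible in the refined metrics. It assumes the worst-case two-minute batching delay, assumes the observation is picked up by the first refinement run that starts after it reaches the server, and uses the "typically completes by 15 minutes after the hour" figure as the finish time. The function name estimate_refined_by is hypothetical, and the result is an estimate rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone

MAX_BATCH_DELAY = timedelta(minutes=2)       # agent sends within two minutes
TYPICAL_REFINE_DONE = timedelta(minutes=15)  # refinement usually done by :15


def estimate_refined_by(observed_at):
    """Estimate when an observation is typically available in refined metrics."""
    # Latest time the agent would forward the observation to the server.
    sent_by = observed_at + MAX_BATCH_DELAY
    # Assumed: the next hourly run after arrival processes the observation.
    next_hour = sent_by.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return next_hour + TYPICAL_REFINE_DONE


observed = datetime(2024, 5, 1, 10, 59, 30, tzinfo=timezone.utc)
print(estimate_refined_by(observed).isoformat())
```

In this example the observation crosses the top of the hour while it is still waiting to be sent, so it is picked up one refinement cycle later than an observation recorded a few minutes earlier.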
The following is a non-exhaustive list of the product metrics Chronicle produces from Posit Connect, Posit Package Manager, and Posit Workbench. These metrics are stored in separate subfolders of the configured storage location, which is /var/lib/posit-chronicle/data by default.
All metrics files include the following columns:
Name | Description |
---|---|
timestamp | The time in UTC when the observation was recorded by the Chronicle agent. |
type | The metric type (gauge, sum, etc.). |
cluster | Reserved for future use. This column is always empty. |
environment | A user-defined environment label set via CHRONICLE_ENVIRONMENT. See the Available Configuration Settings for the Agent appendix for setup instructions. |
service | The source of the metric. One of connect, package-manager, or workbench. |
host | The host name where the Chronicle agent that reported the metric observation is running. |
os | Detailed operating system information for the host on which the Chronicle agent that reported this observation is running. |
Refined metrics
In addition to the columns present on all metrics described above, each refined metric includes a column called value, which contains the value observed regardless of the underlying numeric type.
Each of these refined metrics is stored in a separate subfolder named after the refined metric. For example, the data related to the connect_content_hits_total refined metric is stored in the v1/connect_content_hits_total subfolder of the configured Chronicle storage location.
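Because each dataset lives in its own subfolder, one quick way to see what is available on a given server is to list the folders under v1. The sketch below does that with the standard library only. The helper name available_datasets is hypothetical, and the demonstration runs against a throwaway directory tree rather than a real Chronicle installation.

```python
from pathlib import Path
import tempfile


def available_datasets(data_root):
    """Return the sorted names of the subfolders found under <data_root>/v1."""
    v1_dir = Path(data_root) / "v1"
    if not v1_dir.is_dir():
        return []
    return sorted(p.name for p in v1_dir.iterdir() if p.is_dir())


# Demonstration against a temporary tree, not a real Chronicle storage location.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("pwb_sessions_launched_total", "connect_content_hits_total"):
        (Path(tmp) / "v1" / name).mkdir(parents=True)
    found = available_datasets(tmp)

print(found)
```

On a real server you would pass the configured storage location instead of a temporary directory. Where a metric's listed Subfolder entry differs from its name, the Subfolder entry is the value to look for.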
connect_build_info
Build information for Connect. NOTE: The value for this metric is always 1.
- Subfolder: v1/pc_build_info
- Metric type: gauge
- Requirements: A valid administrator Connect API key.
- Additional columns:
  - version: the current version of Connect
  - build: the version with the build commit hash appended
connect_content
The current number of content items published in Connect.
- Subfolder: v1/connect_content
- Metric type: gauge
- Requirements: Connect 2024.02.0 or later with metrics enabled.
- Additional columns:
  - content_type: the content type of the item visited
connect_content_app_sessions_current
The current number of active user sessions on a given piece of Shiny content.
- Subfolder: v1/connect_content_app_sessions_current
- Metric type: gauge
- Requirements: Connect 2024.02.0 or later with metrics enabled; some columns (annotated with * below) also require a valid administrator Connect API key.
- Additional columns:
  - content_id: the internal ID of the content item visited
  - user_id: the internal ID of the user who visited the content item
  - content_name*: the internal name of the content item visited
  - content_title*: the user-visible title of the content item visited
  - content_type*: the content type of the item visited
  - user_name*: the username of the user who visited the content item
connect_content_hits_total
The running total of user visits to a specific piece of content.
- Subfolder: v1/connect_content_hits_total
- Metric type: sum
- Requirements: Connect 2024.02.0 or later with metrics enabled; some columns (annotated with * below) also require a valid administrator Connect API key.
- Additional columns:
  - content_id: the internal ID of the content item visited
  - user_id: the internal ID of the user who visited the content item
  - content_name*: the internal name of the content item visited
  - content_title*: the user-visible title of the content item visited
  - content_type*: the content type of the item visited
  - user_name*: the username of the user who visited the content item
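Since connect_content_hits_total is a running total, the number of visits in a given period comes from differences between successive values, not from the values themselves. A minimal sketch of that conversion follows. The helper name hits_per_interval is hypothetical, the input numbers are made up, and treating a drop in the total as a counter restart is an assumption, not documented Chronicle behavior.

```python
def hits_per_interval(running_totals):
    """Convert successive running totals into per-interval increases."""
    deltas = []
    for previous, current in zip(running_totals, running_totals[1:]):
        if current >= previous:
            deltas.append(current - previous)
        else:
            # Assumption: a drop means the counter restarted from zero,
            # so the new total is itself the increase for this interval.
            deltas.append(current)
    return deltas


# Made-up running totals for one content item, ordered by timestamp.
deltas = hits_per_interval([10, 14, 14, 3, 9])
print(deltas)
```

In practice you would group the rows by content_id (and user_id, if needed) and order them by timestamp before taking differences.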
connect_installed_versions_python
A count of the versions of Python which are currently installed.
- Subfolder: v1/connect_installed_versions_python
- Metric type: gauge
- Requirements: A valid administrator Connect API key.
- Additional columns:
  - versions: a list of the versions which are installed.
connect_installed_versions_r
A count of the versions of R which are currently installed.
- Subfolder: v1/connect_installed_versions_r
- Metric type: gauge
- Requirements: A valid administrator Connect API key.
- Additional columns:
  - versions: a list of the versions which are installed.
connect_license_active_users
The current number of users consuming license seats in Connect.
- Subfolder: v1/connect_license_active_users
- Metric type: gauge
- Requirements: A valid administrator Connect API key.
- Additional columns: None
connect_license_user_seats
The total number of licensed seats allowed in Connect.
- Subfolder: v1/connect_license_user_seats
- Metric type: gauge
- Requirements: A valid administrator Connect API key.
- Additional columns: None
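The two Connect license gauges are often most useful together, as a ratio of seats in use to seats allowed. The sketch below shows that calculation. The helper name seat_utilization is hypothetical and the input numbers are made up stand-ins for the latest value of connect_license_active_users and connect_license_user_seats.

```python
def seat_utilization(active_users, user_seats):
    """Return the fraction of licensed seats in use, or None if no seats are reported."""
    if user_seats <= 0:
        return None
    return active_users / user_seats


# Made-up values standing in for the latest observation of each metric.
utilization = seat_utilization(42, 50)
print(f"{utilization:.0%}")
```

The same calculation would apply to the Workbench license gauges.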
pwb_license_active_users
The current number of users consuming license seats in Workbench.
- Subfolder: v1/pwb_license_active_users
- Metric type: gauge
- Requirements: Workbench 2024.04.0 or later with metrics enabled.
- Additional columns: None
pwb_license_user_seats
The total number of licensed seats allowed in Workbench.
- Subfolder: v1/pwb_license_user_seats
- Metric type: gauge
- Requirements: Workbench 2024.04.0 or later with metrics enabled.
- Additional columns: None
pwb_build_info
Build information for RStudio Server/Workbench. NOTE: The value for this metric is always 1.
- Subfolder: v1/pwb_build_info
- Metric type: gauge
- Requirements: Workbench 2024.04.0 or later with metrics enabled.
- Additional columns:
  - version: the current version of Workbench
  - release_name: the release name of the Workbench version
pwb_session_startup_duration_seconds_bucket
A running total of counts of session startup durations. These counts are divided into buckets based on the startup duration. Each bucket has a duration threshold called a “limit”, and the value for a given limit indicates how many sessions started up in a duration less than or equal to that limit, and greater than the next smallest limit.
For example, if Workbench reported these 5 session startup durations:
- 8 seconds
- 3 seconds
- 42 seconds
- 4 seconds
- 325 seconds
The stored histogram bucket values would look like this:
value | limit |
---|---|
0 | 0.0 |
0 | 1.0 |
2 | 5.0 |
1 | 10.0 |
0 | 30.0 |
1 | 60.0 |
0 | 300.0 |
1 | Infinity |
The row with limit 5.0 reports a count of 2 as its value (representing the 3 and 4 second durations); the row with limit 10.0 reports a count of 1 (the 8 second duration), and so on.
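The bucketing rule described above can be reproduced in a few lines. This sketch assigns each duration to the smallest limit that is greater than or equal to it, using the limits from the example table. The helper name bucket_counts is hypothetical, and the code only illustrates the rule; it is not how Workbench or Chronicle computes the stored values.

```python
from bisect import bisect_left
import math

# Bucket limits taken from the example table above.
LIMITS = [0.0, 1.0, 5.0, 10.0, 30.0, 60.0, 300.0, math.inf]


def bucket_counts(durations, limits=LIMITS):
    """Count durations per bucket: previous limit < duration <= this limit."""
    counts = [0] * len(limits)
    for duration in durations:
        # bisect_left finds the first limit that is >= duration.
        counts[bisect_left(limits, duration)] += 1
    return list(zip(counts, limits))


for value, limit in bucket_counts([8, 3, 42, 4, 325]):
    print(value, limit)
```

Running this on the five example durations yields the same value column as the table: 0, 0, 2, 1, 0, 1, 0, 1.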
- Subfolder: v1/pwb_session_startup_duration_seconds_bucket
- Metric type: histogram
- Requirements: Workbench 2024.04.0 or later with metrics enabled.
- Additional columns:
  - limit: The time in seconds which is the upper bound of the associated bucket and the lower bound of the bucket with the next limit value.
  - session_type: The type of session (e.g., vscode, rstudio-pro, etc.) launched by the user.
pwb_session_startup_duration_seconds_count
A running total of the number of sessions launched in Workbench.
- Subfolder: v1/pwb_session_startup_duration_seconds_count
- Metric type: sum
- Requirements: Workbench 2024.04.0 or later with metrics enabled.
- Additional columns:
  - session_type: The type of session (e.g., vscode, rstudio-pro, etc.) launched by the user.
pwb_session_startup_duration_seconds_sum
A running total of all session startup time in Workbench.
- Subfolder: v1/pwb_session_startup_duration_seconds_sum
- Metric type: sum
- Requirements: Workbench 2024.04.0 or later with metrics enabled.
- Additional columns:
  - session_type: The type of session (e.g., vscode, rstudio-pro, etc.) launched by the user.
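Taken together, the _sum and _count running totals allow an average session startup time to be estimated for any window: divide the increase in the sum by the increase in the count over that window. The sketch below shows the arithmetic. The helper name mean_startup_seconds is hypothetical, the numbers are made up, and the approach assumes both totals only increase within the window.

```python
def mean_startup_seconds(sum_start, sum_end, count_start, count_end):
    """Average startup time of sessions launched between two observations."""
    launched = count_end - count_start
    if launched <= 0:
        # No new sessions in the window (or the totals were reset).
        return None
    return (sum_end - sum_start) / launched


# Made-up observations taken at the start and end of a window.
average = mean_startup_seconds(sum_start=120.0, sum_end=180.0,
                               count_start=20, count_end=32)
print(average)
```

Both observations should come from rows with the same session_type so that the totals being compared belong to the same series.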
pwb_sessions_launched_total
A running total of all sessions launched in Workbench.
- Subfolder: v1/pwb_sessions_launched_total
- Metric type: sum
- Requirements: Workbench 2024.04.0 or later with metrics enabled.
- Additional columns:
  - session_type: The type of session (e.g., vscode, rstudio-pro, etc.) launched by the user.