Appendix: Using Chronicle data

Chronicle stores the data it produces in parquet files. The reports included with Chronicle are the easiest way for most users to access this data. If you want to enhance the reports, write your own reports, or use the data for other purposes, this section describes how to access the data that Chronicle stores. You can also refer to the code in the report QMD files.

Reading parquet data

While parquet files are similar in concept to CSV files, they use a binary format optimized for read/write performance, so most text editors cannot display them without the help of plugins.

The RStudio IDE is a great place to read parquet files and to run the R and Python scripts below. Follow this link if you would like to install R and RStudio (open source Desktop edition).

If you are using VSCode, our team recommends the Parquet Explorer plugin to read and query parquet files directly in your editor.

Another common trick is to convert .parquet files into .csv files for easier viewing, using Python and the pandas library:

Terminal

import pandas as pd

# Read the parquet file into a DataFrame, then write it back out as CSV
df = pd.read_parquet('filename.parquet')
df.to_csv('filename.csv')

Using R

These scripts have been tested with R version 4.2.3. You may need to upgrade your R version if you are running into errors installing packages. In particular, arrow may run into issues if the R version is too old.

The examples in the Apache Arrow documentation on reading parquet files show how to read data stored locally or in S3 into an arrow table class.

Opening local Chronicle data with R

You can read the May 2024 partition of Chronicle’s parquet data into an arrow table with the following:

library(arrow)

# Collecting user data from a file
users <- arrow::open_dataset("v1/users/2024/05/")

Opening Chronicle data in S3 with R

You can read the May 2024 parquet contents of an S3 bucket into an arrow table with the following:

# Imports
library(arrow)
library(paws)
library(urltools)

# Set s3 bucket ----
s3_bucket <- "s3://{{YOUR_BUCKET_NAME}}"
svc <- s3(config = list(region = "us-east-2"))
bucket_str <- svc$list_objects(Bucket = urltools::domain(s3_bucket))

# Collecting user data
users_bucket <- paste0(s3_bucket, "/v1/users/2024/05")
users <- open_dataset(users_bucket,
                      hive_style = FALSE,
                      format = "parquet")

Querying Chronicle data with R

Once you have run one of the above to bring your users parquet data into an arrow table, you can begin querying it:

library(arrow)
library(tidyverse)

# Viewing user data
users_head <- head(users, 5) |>
  collect()
print(users_head)

Posit Connect Named Users Data Generated by Chronicle

Chronicle produces a daily snapshot of all named users in Posit Connect. These users generally represent the individuals that are consuming license seats for Posit Connect.

Note

In order to generate the named users data, the Chronicle Agent must be configured with a valid administrator Connect API key.

The named users data is stored under /v1/users in the configured storage location. By default, this is /var/lib/posit-chronicle/data/v1/users.

The named users data includes the following columns:

Name	Description
`cluster`	Reserved for future use. This column is always empty.
`environment`	A user-defined environment label set via `CHRONICLE_ENVIRONMENT`. See the Available Configuration Settings for the Agent appendix for setup instructions.
`service`	The source of the metric. This is always `connect`.
`host`	The host name where the Chronicle agent that reported the metric observation is running.
`os`	Detailed operating system information for the host on which the Chronicle agent that reported this observation is running.
`id`	The ID of the user.
`username`	The username of the user (the name they use when logging in).
`email`	The email address of the user.
`first_name`	The first name of the user.
`last_name`	The last name of the user.
`user_role`	The role of the user (e.g., publisher, viewer).
`created_at`	The timestamp when the user was created.
`updated_at`	The timestamp when the user was most recently updated.
`last_active_at`	The timestamp when the user was most recently active (logged in) in Posit Connect.
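
As an illustration of how the named users columns might be used, the following sketch lists users who have not been active recently. It assumes the rows have already been loaded into plain Python dictionaries keyed by the column names above, that the timestamp columns arrive as timezone-aware `datetime` values, and that a user who has never been active has an empty `last_active_at`. The `inactive_users` helper and the 90-day threshold are hypothetical and not part of Chronicle.

```python
from datetime import datetime, timedelta, timezone


def inactive_users(rows, now, max_idle_days=90):
    """Return sorted usernames with no activity in the last max_idle_days.

    rows: iterable of dicts with "id", "username", and "last_active_at".
    Daily snapshots repeat each user, so only the most recent activity
    seen for each user id is considered.
    """
    cutoff = now - timedelta(days=max_idle_days)
    latest = {}
    for row in rows:
        seen = latest.get(row["id"])
        ts = row["last_active_at"]
        if seen is None:
            latest[row["id"]] = row
        elif ts is not None and (
            seen["last_active_at"] is None or ts > seen["last_active_at"]
        ):
            latest[row["id"]] = row
    return sorted(
        r["username"]
        for r in latest.values()
        if r["last_active_at"] is None or r["last_active_at"] < cutoff
    )


# Made-up sample rows, for illustration only
sample_rows = [
    {"id": 1, "username": "alice",
     "last_active_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 1, "username": "alice",
     "last_active_at": datetime(2024, 5, 30, tzinfo=timezone.utc)},
    {"id": 2, "username": "bob",
     "last_active_at": datetime(2024, 1, 15, tzinfo=timezone.utc)},
    {"id": 3, "username": "carol", "last_active_at": None},
]
idle = inactive_users(sample_rows, now=datetime(2024, 5, 31, tzinfo=timezone.utc))
```

Because the snapshot is taken daily, the same user appears once per day; deduplicating by `id` before counting avoids overstating the number of users.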

Metrics Generated by Chronicle

Metrics are gathered and processed on a scheduled basis. This means that you may not see metrics files immediately when first starting Chronicle. It also means that there is a delay before the latest data shows up in the refined metrics files.

By default, the agent retrieves metrics data once every sixty seconds. To reduce network traffic, the agent waits until it has collected 100 metrics observations or two minutes have elapsed, at which point the agent will send the metrics to the Chronicle server. The metrics data is processed into refined metrics once an hour. This process happens shortly after the top of the hour. The exact timing is not entirely predictable due to processing delays, but the refinement process typically completes by 15 minutes after the top of the hour.
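
The default sending rule described above can be expressed as a small predicate. This is only a sketch of the rule as stated in the text, not the agent's actual implementation; the function and constant names are hypothetical.

```python
# Default thresholds taken from the description above
MAX_BUFFERED_OBSERVATIONS = 100
MAX_BUFFER_AGE_SECONDS = 120.0  # two minutes


def should_send(buffered_count, seconds_since_last_send):
    """Return True once either threshold for sending a batch is reached."""
    return (
        buffered_count >= MAX_BUFFERED_OBSERVATIONS
        or seconds_since_last_send >= MAX_BUFFER_AGE_SECONDS
    )
```

For example, a buffer holding 5 observations is held back at 60 seconds but sent at 120 seconds, while a buffer holding 100 observations is sent right away.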

The following is a non-exhaustive list of the product metrics Chronicle produces from Posit Connect, Posit Package Manager, and Posit Workbench. These metrics are stored in separate subfolders of the configured storage location, which is /var/lib/posit-chronicle/data by default.

All metrics files include the following columns:

Name	Description
`timestamp`	The time in UTC when the observation was recorded by the Chronicle agent.
`type`	The metric type (`gauge`, `sum`, etc.).
`cluster`	Reserved for future use. This column is always empty.
`environment`	A user-defined environment label set via `CHRONICLE_ENVIRONMENT`. See the Available Configuration Settings for the Agent appendix for setup instructions.
`service`	The source of the metric. One of `connect`, `package-manager`, or `workbench`.
`host`	The host name where the Chronicle agent that reported the metric observation is running.
`os`	Detailed operating system information for the host on which the Chronicle agent that reported this observation is running.

Refined metrics

In addition to the columns present on all metrics described above, each refined metric includes a column called value which contains the value observed regardless of the underlying numeric type.

Each of these refined metrics is stored in a separate subfolder named after the refined metric. For example, the data related to the connect_content_hits_total refined metric is stored in the v1/connect_content_hits_total subfolder of the configured Chronicle storage location.
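
The folder naming can be turned into a small path helper. The sketch below assumes that refined metrics are partitioned by year and month in the same way as the v1/users/2024/05 example earlier in this appendix; that layout is an assumption for metrics, and the helper names are hypothetical. `read_partition` additionally requires the pandas and pyarrow packages, which are imported only when it is called.

```python
from pathlib import Path

# Default storage location given in this appendix
DEFAULT_ROOT = Path("/var/lib/posit-chronicle/data")


def partition_dir(metric, year, month, root=DEFAULT_ROOT):
    """Build the folder for one month of a refined metric.

    Assumes a <root>/v1/<metric>/<YYYY>/<MM> layout.
    """
    return Path(root) / "v1" / metric / f"{year:04d}" / f"{month:02d}"


def read_partition(metric, year, month, root=DEFAULT_ROOT):
    """Load every parquet file under the partition folder into a DataFrame."""
    import pandas as pd  # imported lazily; needs pandas and pyarrow installed

    return pd.read_parquet(partition_dir(metric, year, month, root))


hits_dir = partition_dir("connect_content_hits_total", 2024, 5)
```

With the default root, `hits_dir` points at the May 2024 folder for connect_content_hits_total; pass a different `root` if Chronicle is configured with another storage location.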

`connect_build_info`

Build information for Connect. NOTE: The value for this metric is always 1.

Subfolder: v1/pc_build_info
Metric type: gauge
Requirements: A valid administrator Connect API key.
Additional columns:
- version: the current version of Connect
- build: the version with the build commit hash appended

`connect_content`

The current number of content items published in Connect.

Subfolder: v1/connect_content
Metric type: gauge
Requirements: Connect 2024.02.0 or later with metrics enabled.
Additional columns:
- content_type: the content type of the items counted

`connect_content_app_sessions_current`

The current number of active user sessions on a given piece of Shiny content.

Subfolder: v1/connect_content_app_sessions_current
Metric type: gauge
Requirements: Connect 2024.02.0 or later with metrics enabled; some columns (annotated with * below) also require a valid administrator Connect API key.
Additional columns:
- content_id: the internal ID of the content item visited
- user_id: the internal ID of the user who visited the content item
- content_name*: the internal name of the content item visited
- content_title*: the user-visible title of the content item visited
- content_type*: the content type of the item visited
- user_name*: the username of the user who visited the content item

`connect_content_hits_total`

The running total of user visits to a specific piece of content.

Subfolder: v1/connect_content_hits_total
Metric type: sum
Requirements: Connect 2024.02.0 or later with metrics enabled; some columns (annotated with * below) also require a valid administrator Connect API key.
Additional columns:
- content_id: the internal ID of the content item visited
- user_id: the internal ID of the user who visited the content item
- content_name*: the internal name of the content item visited
- content_title*: the user-visible title of the content item visited
- content_type*: the content type of the item visited
- user_name*: the username of the user who visited the content item
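
Because this metric is a running total, the number of visits within a period is the difference between consecutive observations. The sketch below shows one way to derive those increments in Python for a single series (one combination of host, content_id, and user_id) whose values are already sorted by timestamp. Treating a drop in the total as a counter restart is an assumption, not documented Chronicle behavior, and the `increments` helper is hypothetical.

```python
def increments(running_totals):
    """Convert a time-ordered list of running totals into per-interval deltas."""
    deltas = []
    previous = None
    for total in running_totals:
        if previous is None:
            pass  # first observation: nothing earlier to compare with
        elif total >= previous:
            deltas.append(total - previous)
        else:
            # Assumed counter restart: count everything since the restart
            deltas.append(total)
        previous = total
    return deltas


# Made-up running totals, for illustration only
visits = increments([10, 12, 12, 3, 7])
```

The result has one fewer element than the input, since the first observation has no predecessor.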

`connect_installed_versions_python`

A count of the versions of Python which are currently installed.

Subfolder: v1/connect_installed_versions_python
Metric type: gauge
Requirements: A valid administrator Connect API key.
Additional columns:
- versions: a list of the versions which are installed.

`connect_installed_versions_r`

A count of the versions of R which are currently installed.

Subfolder: v1/connect_installed_versions_r
Metric type: gauge
Requirements: A valid administrator Connect API key.
Additional columns:
- versions: a list of the versions which are installed.

`connect_license_active_users`

The current number of users consuming license seats in Connect.

Subfolder: v1/connect_license_active_users
Metric type: gauge
Requirements: A valid administrator Connect API key.
Additional columns: None

`connect_license_user_seats`

The total number of licensed seats allowed in Connect.

Subfolder: v1/connect_license_user_seats
Metric type: gauge
Requirements: A valid administrator Connect API key.
Additional columns: None

`pwb_license_active_users`

The current number of users consuming license seats in Workbench.

Subfolder: v1/pwb_license_active_users
Metric type: gauge
Requirements: Workbench 2024.04.0 or later with metrics enabled.
Additional columns: None

`pwb_license_user_seats`

The total number of licensed seats allowed in Workbench.

Subfolder: v1/pwb_license_user_seats
Metric type: gauge
Requirements: Workbench 2024.04.0 or later with metrics enabled.
Additional columns: None

`pwb_build_info`

Build information for RStudio Server/Workbench. NOTE: The value for this metric is always 1.

Subfolder: v1/pwb_build_info
Metric type: gauge
Requirements: Workbench 2024.04.0 or later with metrics enabled.
Additional columns:
- version: the current version of Workbench
- release_name: the release name of the Workbench version

`pwb_session_startup_duration_seconds_bucket`

A running total of counts of session startup durations. These counts are divided into buckets based on the startup duration. Each bucket has a duration threshold called a “limit”. The value for a given limit is the number of sessions whose startup duration was less than or equal to that limit and greater than the next smaller limit.

For example, if Workbench reported these 5 session startup durations:

8 seconds
3 seconds
42 seconds
4 seconds
325 seconds

The stored histogram bucket values would look like this:

value	limit
0	0.0
0	1.0
2	5.0
1	10.0
0	30.0
1	60.0
0	300.0
1	Infinity

The row with limit 5.0 reports a count of 2 as its value (representing the 3 and 4 second durations), the row with limit 10.0 reports a count of 1 (the 8 second duration), and so on.

Subfolder: v1/pwb_session_startup_duration_seconds_bucket
Metric type: histogram
Requirements: Workbench 2024.04.0 or later with metrics enabled.
Additional columns:
- limit: The upper bound, in seconds, of this bucket. It is also the lower bound of the bucket with the next larger limit.
- session_type: The type of session (e.g., vscode, rstudio-pro) launched by the user.
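
The bucketing rule in the worked example above can be reproduced with a short Python sketch. The limits are copied from the example table; the `bucket_counts` helper is hypothetical and only illustrates how each duration is assigned to a bucket, not how Workbench or Chronicle compute the metric.

```python
from bisect import bisect_left
import math

# Bucket limits copied from the example table above
LIMITS = [0.0, 1.0, 5.0, 10.0, 30.0, 60.0, 300.0, math.inf]


def bucket_counts(durations, limits=LIMITS):
    """Count durations per bucket.

    A duration lands in the first bucket whose limit is greater than or
    equal to it, so each count excludes the smaller buckets.
    """
    counts = [0] * len(limits)
    for duration in durations:
        counts[bisect_left(limits, duration)] += 1
    return list(zip(counts, limits))


# The five startup durations from the example
histogram = bucket_counts([8, 3, 42, 4, 325])
```

Each `(value, limit)` pair in `histogram` corresponds to one row of the example table.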

`pwb_session_startup_duration_seconds_count`

A running total of the number of sessions launched in Workbench.

Subfolder: v1/pwb_session_startup_duration_seconds_count
Metric type: sum
Requirements: Workbench 2024.04.0 or later with metrics enabled.
Additional columns:
- session_type: The type of session (e.g., vscode, rstudio-pro) launched by the user.

`pwb_session_startup_duration_seconds_sum`

A running total of all session startup time in Workbench.

Subfolder: v1/pwb_session_startup_duration_seconds_sum
Metric type: sum
Requirements: Workbench 2024.04.0 or later with metrics enabled.
Additional columns:
- session_type: The type of session (e.g., vscode, rstudio-pro) launched by the user.
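
Since both pwb_session_startup_duration_seconds_sum and pwb_session_startup_duration_seconds_count are running totals, an average startup time over a period can be estimated by dividing the change in the sum by the change in the count. The sketch below assumes you have already picked the `value` of each metric at the start and end of the period for a single host and session_type; the `mean_startup_seconds` helper is hypothetical.

```python
def mean_startup_seconds(sum_start, sum_end, count_start, count_end):
    """Average startup time for sessions launched between two observations.

    Returns None when no sessions were launched in the period.
    """
    launched = count_end - count_start
    if launched <= 0:
        return None
    return (sum_end - sum_start) / launched


# Made-up totals: 5 sessions launched, adding 382 seconds of startup time
average = mean_startup_seconds(100.0, 482.0, 10, 15)
```

With these illustrative numbers, the five sessions average 76.4 seconds each, which matches the five durations used in the histogram example (8 + 3 + 42 + 4 + 325 = 382 seconds).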

`pwb_sessions_launched_total`

A running total of all sessions launched in Workbench.

Subfolder: v1/pwb_sessions_launched_total
Metric type: sum
Requirements: Workbench 2024.04.0 or later with metrics enabled.
Additional columns:
- session_type: The type of session (e.g., vscode, rstudio-pro) launched by the user.