Using Chronicle data

Chronicle stores the data it produces in parquet files. For most users, the reports included with Chronicle are the easiest way to access this data. If you want to enhance those reports, write your own reports, or use the data for other purposes, this section describes how to access the data that Chronicle stores. You can also reference the code in the report QMD files.

Data Directory Structure

The Chronicle data directory is organized into a few subdirectories:

  • /var/lib/posit-chronicle/data
    • /private
    • /hourly
      • /v1
        • /<metric-name>
    • /daily
      • /v1
        • /<metric-name>

The private directory contains transient data. This data is short-lived and should not be accessed by users.

Every hour, the private data is processed and stored in the hourly directory. This data is minimally processed and relatively high volume. It includes “duplicate” values: observations where a metric does not change over a period of time. This data can be used for custom reporting, but reports must query it efficiently due to its volume.

Every day, the hourly data is further processed and aggregated into the daily directory. This processing eliminates duplicate values and significantly reduces the data volume. The specific nature of this aggregation varies by metric. The aggregation strategies are described below. The daily data is used by Chronicle reports, and it can also be used for custom reporting.

The structure within hourly and daily is identical. Each contains one or more top-level vN subdirectories that delineate different versions of Chronicle’s internal data schema for each metric. Individual metrics are stored under the appropriate version directory. Within each metric directory, data is organized by the date and time when it was gathered.

The following is a complete example. Note that daily data is stored as one file per day, while hourly data is stored as one file per hour.

├── daily
│   └── v1
│       ├── connect_content
│       │   └── 2024
│       │       └── 12
│       │           ├── 01
│       │           │   └── connect_content.parquet
│       │           ├── 02
│       │           │   └── connect_content.parquet
│       │           └── ...
│       └── connect_license_active_users
│           └── 2024
│               └── 12
│                   ├── 01
│                   │   └── connect_license_active_users.parquet
│                   ├── 02
│                   │   └── connect_license_active_users.parquet
│                   └── ...
└── hourly
    └── v1
        ├── connect_content
        │   └── 2024
        │       └── 12
        │           ├── 01
        │           │   ├── 00
        │           │   │   └── connect_content.parquet
        │           │   ├── 01
        │           │   │   └── connect_content.parquet
        │           │   ├── ...
        │           │   └── 23
        │           │       └── connect_content.parquet
        │           └── 02
        │               ├── 00
        │               │   └── connect_content.parquet
        │               ├── 01
        │               │   └── connect_content.parquet
        │               ├── ...
        │               └── 23
        │                   └── connect_content.parquet
        └── connect_license_active_users
            └── 2024
                └── 12
                    ├── 01
                    │   └── ...
                    └── ...

Reading parquet data

While parquet files are similar in concept to csv files, they are a binary format optimized for read/write performance, and are therefore unreadable in most text editors without the help of plugins.

The RStudio IDE is a great place to read parquet files and to run both the R and Python scripts below. If you would like to install R and RStudio (open source Desktop edition), see the installation instructions on the Posit website.

If you are using VS Code, our team recommends the Parquet Explorer extension to read and query parquet files directly in your editor.

Another common trick is to convert .parquet files into .csv files for easier viewing, using Python and the pandas library:

Python
>>> import pandas as pd
>>> df = pd.read_parquet('filename.parquet')
>>> df.to_csv('filename.csv')
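
Note that pandas requires a parquet engine such as pyarrow or fastparquet to be installed for read_parquet() to work.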

Using R

These scripts have been tested with R version 4.2.3. If you run into errors installing packages, you may need to upgrade your version of R; in particular, arrow may fail to install if the R version is too old.

The examples in the Apache Arrow documentation on reading parquet files show how to read data stored locally or in S3 into Arrow tables and datasets.

Opening local Chronicle data with R

You can read the May 2024 partition of Chronicle’s parquet data into an Arrow dataset with the following:

library(arrow)

# Open the hourly user data for May 2024 as a dataset
base_path <- '/var/lib/posit-chronicle/data/hourly'
users <- arrow::open_dataset(paste0(base_path, "/v1/users/2024/05/"))
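
The daily data uses the same layout, with one file per day instead of one per hour. A minimal sketch, assuming the connect_content metric and the December 2024 partition shown in the directory example above:

library(arrow)

# Daily data: one parquet file per day under <metric-name>/<year>/<month>/<day>
base_path <- '/var/lib/posit-chronicle/data/daily'
content <- arrow::open_dataset(paste0(base_path, "/v1/connect_content/2024/12/"))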

Opening Chronicle data in S3 with R

You can read parquet data for May 2024 from an S3 bucket into an Arrow dataset with the following:

# Imports
library(arrow)
library(paws)
library(urltools)

# Set the S3 bucket; use your bucket name and region ----
s3_bucket <- "s3://{{YOUR_BUCKET_NAME}}"
svc <- s3(config = list(region = "us-east-2"))

# Verify that the bucket is reachable by listing its objects
bucket_str <- svc$list_objects(Bucket = urltools::domain(s3_bucket))

# Collecting user data
users_bucket <- paste0(s3_bucket, "/hourly/v1/users/2024/05")
users <- open_dataset(users_bucket,
                      hive_style = FALSE,
                      format = "parquet")
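
The hive_style = FALSE argument tells arrow not to interpret the directory names as Hive-style key=value partitions, since Chronicle’s year/month/day directories are plain names.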

Querying Chronicle data with R

Once you have run one of the above to open your users parquet data as an Arrow dataset, you can begin querying it:

library(arrow)
library(tidyverse)

# Viewing user data
users_head <- head(users, 5) |>
  collect()
print(users_head)
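
Because open_dataset() reads data lazily, dplyr verbs are pushed down to the dataset and only the result is materialized. For example, a sketch that counts observations per host (the host column is included in all metrics files, as described below):

library(arrow)
library(dplyr)

# Count observations per host; the aggregation runs against the dataset
# and only the summarized result is pulled into memory by collect()
users |>
  group_by(host) |>
  summarise(n = n()) |>
  collect()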

Metrics Generated by Chronicle

Metrics are gathered and processed on a scheduled basis. This means that you may not see metrics files immediately when first starting Chronicle. It also means that there is a delay before the latest data shows up in the refined metrics files.

By default, the agent retrieves metrics data once every 60 seconds. The metrics data is processed into refined metrics once an hour. This process happens shortly after the top of the hour. The exact timing is not entirely predictable due to processing delays, but the refinement process typically completes by 15 minutes after the top of the hour.

Approaches to aggregation

Based on the type of metric represented in the data, metrics are aggregated according to one of the following strategies. Each metric listed below indicates which aggregation approach is applied to its data, or N/A if the metric is not currently aggregated.

The examples below reflect an aggregation of this data series (an R sketch reproducing both strategies follows the list):

Timestamp Value
01:00 12
01:01 12
01:02 12
01:03 13
01:04 15
01:05 15
01:06 15
01:07 15
01:08 16

  • Deduplication Aggregation: with this approach, a value is retained if it represents either the first or last observation in a run of consecutive identical values. With the example dataset above, this approach would aggregate the series to:
Timestamp Value
01:00 12
01:02 12
01:03 13
01:04 15
01:07 15
01:08 16

  • Delta Aggregation: with this approach, only the difference between consecutive values is considered, and an observation is retained only if that difference is not 0 (the first observation is always retained, with a delta of 0). With the example dataset above, this approach would aggregate the series to:
Timestamp Value
01:00 0
01:03 1
01:04 2
01:08 1
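
The following is a minimal sketch in R (using dplyr on the example series above; this is an illustration, not Chronicle’s internal implementation) that reproduces both strategies:

library(dplyr)

# The example data series from above
series <- tibble(
  timestamp = sprintf("01:%02d", 0:8),
  value = c(12, 12, 12, 13, 15, 15, 15, 15, 16)
)

# Deduplication Aggregation: keep an observation if it is the first or
# last in a run of consecutive identical values
dedup <- series |>
  mutate(run = cumsum(value != lag(value, default = first(value) - 1))) |>
  group_by(run) |>
  slice(unique(c(1, n()))) |>
  ungroup() |>
  select(timestamp, value)

# Delta Aggregation: replace each value with its difference from the
# previous value, dropping zero deltas (the first observation is kept
# with a delta of 0)
delta <- series |>
  mutate(delta = value - lag(value, default = first(value))) |>
  filter(row_number() == 1 | delta != 0) |>
  select(timestamp, value = delta)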

Available metrics

The following is a non-exhaustive list of the product metrics Chronicle produces from Posit Connect, Posit Package Manager, and Posit Workbench. These metrics are stored in separate subfolders of the configured storage location, which is /var/lib/posit-chronicle/data by default.

All metrics files include the following columns:

  • timestamp: The time in UTC when the observation was recorded by the Chronicle agent.
  • type: The metric type (gauge, sum, etc.).
  • cluster: Reserved for future use. This column is always empty.
  • environment: A user-defined environment label set via the agent configuration. See the Advanced Agent Configuration appendix for setup instructions.
  • service: The source of the metric. One of connect, package-manager, or workbench.
  • host: The host name where the Chronicle agent that reported the metric observation is running.
  • os: Detailed operating system information for the host on which the Chronicle agent that reported this observation is running.

Refined metrics

In addition to the columns described above, each refined metric includes a column called value, which contains the observed value regardless of the underlying numeric type.

Each of these refined metrics is stored in a separate subfolder named after the refined metric. For example, the data related to the connect_content_hits_total refined metric is stored in the v1/connect_content_hits_total subfolder of the configured Chronicle storage location.
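
Because every refined metric follows the same subfolder convention, the path construction can be wrapped in a small helper. A sketch, assuming daily data under the default storage location (open_metric is a hypothetical helper, not part of Chronicle):

library(arrow)

# Hypothetical helper: open one month of daily data for a refined metric
open_metric <- function(metric, year, month,
                        base = "/var/lib/posit-chronicle/data/daily") {
  arrow::open_dataset(file.path(base, "v1", metric, year, month))
}

hits <- open_metric("connect_content_hits_total", "2024", "12")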

connect_build_info

Build information for Connect. NOTE: The value for this metric is always 1.

  • Subfolder: v1/connect_build_info
  • Metric type: gauge
  • Requirements: A valid administrator Connect API key.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns:
    • version: the current version of Connect
    • build: the version with the build commit hash appended

connect_content

The current number of content items published in Connect.

  • Subfolder: v1/connect_content
  • Metric type: gauge
  • Requirements: Connect 2024.02.0 or later with metrics enabled.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns:
    • content_type: the content type of the content item

connect_content_app_sessions_current

The current number of active user sessions on a given piece of Shiny content.

  • Subfolder: v1/connect_content_app_sessions_current
  • Metric type: gauge
  • Requirements: Connect 2024.02.0 or later with metrics enabled; some columns (annotated with * below) also require a valid administrator Connect API key.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns:
    • content_id: the internal ID of the content item visited
    • user_id: the internal ID of the user who visited the content item
    • content_name*: the internal name of the content item visited
    • content_title*: the user-visible title of the content item visited
    • content_type*: the content type of the item visited
    • user_name*: the username of the user who visited the content item

connect_content_hits_total

The running total of user visits to a specific piece of content.

  • Subfolder: v1/connect_content_hits_total
  • Metric type: sum
  • Requirements: Connect 2024.02.0 or later with metrics enabled; some columns (annotated with * below) also require a valid administrator Connect API key.
  • Aggregation strategy: Delta Aggregation
  • Additional columns:
    • content_id: the internal ID of the content item visited
    • user_id: the internal ID of the user who visited the content item
    • content_name*: the internal name of the content item visited
    • content_title*: the user-visible title of the content item visited
    • content_type*: the content type of the item visited
    • user_name*: the username of the user who visited the content item

connect_installed_versions_python

A count of the versions of Python which are currently installed.

  • Subfolder: v1/connect_installed_versions_python
  • Metric type: gauge
  • Requirements: A valid administrator Connect API key.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns:
    • versions: a list of the versions which are installed.

connect_installed_versions_r

A count of the versions of R which are currently installed.

  • Subfolder: v1/connect_installed_versions_r
  • Metric type: gauge
  • Requirements: A valid administrator Connect API key.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns:
    • versions: a list of the versions which are installed.

connect_license_active_users

The current number of users consuming license seats in Connect.

  • Subfolder: v1/connect_license_active_users
  • Metric type: gauge
  • Requirements: A valid administrator Connect API key.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns: None

connect_license_user_seats

The total number of licensed seats allowed in Connect.

  • Subfolder: v1/connect_license_user_seats
  • Metric type: gauge
  • Requirements: A valid administrator Connect API key.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns: None

connect_users

A metric used to capture a list of users in Connect. The value of this metric is always 1.

  • Subfolder: v1/connect_users
  • Metric type: gauge
  • Requirements: A valid administrator Connect API key.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns:
    • id: The ID of the user.
    • username: The username of the user (the name they use when logging in).
    • email: The email address of the user.
    • first_name: The first name of the user.
    • last_name: The last name of the user.
    • role: The role of the user (e.g., publisher, viewer).
    • created_at: The timestamp when the user was created.
    • updated_at: The timestamp when the user was most recently updated.
    • last_active_at: The timestamp when the user was most recently active (logged in) in Posit Connect.

pwb_license_active_users

The current number of users consuming license seats in Workbench.

  • Subfolder: v1/pwb_license_active_users
  • Metric type: gauge
  • Requirements: Workbench 2024.04.0 or later with metrics enabled.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns: None

pwb_license_user_seats

The total number of licensed seats allowed in Workbench.

  • Subfolder: v1/pwb_license_user_seats
  • Metric type: gauge
  • Requirements: Workbench 2024.04.0 or later with metrics enabled.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns: None

pwb_build_info

Build information for RStudio Server/Workbench. NOTE: The value for this metric is always 1.

  • Subfolder: v1/pwb_build_info
  • Metric type: gauge
  • Requirements: Workbench 2024.04.0 or later with metrics enabled.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns:
    • version: the current version of Workbench
    • release_name: the release name of the Workbench version

pwb_session_startup_duration_seconds_bucket

A running total of counts of session startup durations. These counts are divided into buckets based on the startup duration. Each bucket has a duration threshold called a “limit”, and the value for a given limit indicates how many sessions started up in a duration less than or equal to that limit, and greater than the next smallest limit.

For example, if Workbench reported these 5 session startup durations:

  • 8 seconds
  • 3 seconds
  • 42 seconds
  • 4 seconds
  • 325 seconds

The stored histogram bucket values would look like this:

value limit
0 0.0
0 1.0
2 5.0
1 10.0
0 30.0
1 60.0
0 300.0
1 Infinity

The row with limit 5.0 reports a count of 2 as its value (representing the 3 and 4 second durations), the row with limit 10.0 reports a count of 1 (the 8 second duration), and so on.
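
Note that these buckets are not cumulative: each duration is counted in exactly one bucket, the one whose limit is the smallest limit greater than or equal to the duration. A short R sketch that reproduces the table above from the five example durations:

# The example durations and the bucket limits from the table above
durations <- c(8, 3, 42, 4, 325)
limits <- c(0, 1, 5, 10, 30, 60, 300, Inf)

# Assign each duration to the bucket whose limit is the smallest one
# greater than or equal to it, then count durations per bucket
buckets <- cut(durations, breaks = c(-Inf, limits), labels = as.character(limits))
table(buckets)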

  • Subfolder: v1/pwb_session_startup_duration_seconds_bucket
  • Metric type: histogram
  • Requirements: Workbench 2024.04.0 or later with metrics enabled.
  • Aggregation strategy: N/A
  • Additional columns:
    • limit: The time in seconds that is the upper bound of the bucket; it also serves as the lower bound of the bucket with the next larger limit value.
    • session_type: The type of session (e.g., vscode, rstudio-pro, etc.) launched by the user.

pwb_session_startup_duration_seconds_count

A running total of the number of sessions launched in Workbench.

  • Subfolder: v1/pwb_session_startup_duration_seconds_count
  • Metric type: sum
  • Requirements: Workbench 2024.04.0 or later with metrics enabled.
  • Aggregation strategy: N/A
  • Additional columns:
    • session_type: The type of session (e.g., vscode, rstudio-pro, etc.) launched by the user.

pwb_session_startup_duration_seconds_sum

A running total of all session startup time in Workbench.

  • Subfolder: v1/pwb_session_startup_duration_seconds_sum
  • Metric type: sum
  • Requirements: Workbench 2024.04.0 or later with metrics enabled.
  • Aggregation strategy: N/A
  • Additional columns:
    • session_type: The type of session (e.g., vscode, rstudio-pro, etc.) launched by the user.

pwb_sessions_launched_total

A running total of all sessions launched in Workbench.

  • Subfolder: v1/pwb_sessions_launched_total
  • Metric type: sum
  • Requirements: Workbench 2024.04.0 or later with metrics enabled.
  • Aggregation strategy: Delta Aggregation
  • Additional columns:
    • session_type: The type of session (e.g., vscode, rstudio-pro, etc.) launched by users.

pwb_jobs_launched_total

A running total of all jobs launched in Workbench.

  • Subfolder: v1/pwb_jobs_launched_total
  • Metric type: sum
  • Requirements: Workbench 2024.09.0 or later with metrics enabled.
  • Aggregation strategy: Delta Aggregation
  • Additional columns:
    • job_type: The type of job (e.g., r) launched by users.

pwb_users

A list of all users in Workbench. NOTE: The value for this metric is always 1.

  • Subfolder: v1/pwb_users
  • Metric type: gauge
  • Requirements: Workbench 2024.12.0 or later with a valid administrator API key.
  • Aggregation strategy: Deduplication Aggregation
  • Additional columns:
    • id: The UID of the Workbench user.
    • guid: The GUID of the Workbench user.
    • username: The username of the Workbench user.
    • email: The email address of the Workbench user.
    • status: The status of the Workbench user (Active or Inactive).
    • is_admin: True if the Workbench user is an administrator.
    • is_super_admin: True if the Workbench user is an administrator superuser.
    • role: The role of the Workbench user (User, Administrator, or Superuser).
    • last_active_at: Timestamp of the Workbench user’s last sign-in.
    • created_at: Timestamp when the Workbench user was created.