Understanding Chronicle data

Chronicle stores the data it produces in CSV and Parquet files. For most users, the reports included with Chronicle are the easiest way to access this data. If you want to enhance those reports, write your own, or use the data for other purposes, this section describes how Chronicle organizes its data and how to read it. The report source files are also a useful reference.

Data directory structure

The Chronicle data directory is organized into a few subdirectories:

/var/lib/posit-chronicle/data
- /private
- /hourly
  - /<server-id>
    - /v2
      - /<metric-name>
- /daily
  - /v2
    - /<metric-name>
- /curated
  - /v2
    - /connect
      - /<metric-name>
    - /workbench
      - /<metric-name>

The private directory contains transient, short-lived data. Do not access it directly.

The hourly directory contains minimally processed data, stored in CSV files, that Chronicle updates hourly. This data is relatively high volume and includes “duplicate” values where a metric does not change over a period of time. If observations for a given hour arrive late, Chronicle updates the contents of that hour's directory. Do not access this data directly.

Every day, the hourly data is further processed and deduplicated into Parquet files in the daily directory. This processing eliminates duplicate values and significantly reduces the data volume. The aggregation strategy varies by metric, and some metrics are not aggregated at all; the strategies are described in the Metrics Overview. Some metrics might still contain duplicate values when the same metric data is gathered from multiple hosts in a high availability (HA) cluster. The daily data can be used for reporting and is never updated after it has been created. The aggregation process happens shortly after 0600 UTC each day.

After the daily data is created, it is further processed into Parquet files in the curated directory. This processing includes additional transformations and aggregations to make the data ready for reporting. The curated data is stored in separate subdirectories for Connect and Workbench metrics. This processing happens shortly after 0600 UTC each day.
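Because daily and curated files appear on a schedule, a script that reads them may need to know which day is the newest one it can expect to find. The following is a minimal sketch, not part of Chronicle. It assumes that the run shortly after 0600 UTC processes the previous UTC day, which this documentation does not state explicitly, and it uses a hypothetical `grace` parameter to allow time for processing to finish.

```python
from datetime import date, datetime, time, timedelta, timezone


def latest_expected_day(now: datetime, grace: timedelta = timedelta(minutes=30)) -> date:
    """Return the most recent day whose daily/curated files should exist.

    ASSUMPTION: the run shortly after 0600 UTC processes the previous UTC day.
    `now` must be timezone-aware; `grace` is a hypothetical safety margin.
    """
    now = now.astimezone(timezone.utc)
    # Today's processing is treated as finished once 06:00 UTC plus the grace period has passed.
    cutoff = datetime.combine(now.date(), time(6, 0), tzinfo=timezone.utc) + grace
    days_back = 1 if now >= cutoff else 2
    return now.date() - timedelta(days=days_back)
```

For example, calling `latest_expected_day(datetime.now(timezone.utc))` before the cutoff returns the day before yesterday, because yesterday's files are not yet expected under this assumption.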

The directory structure within hourly and daily is the same, except that hourly data is first grouped under a <server-id> directory. Each contains one or more vN subdirectories that delineate different versions of Chronicle’s internal data schema for each metric. Individual metrics are stored under the appropriate version directory. Within each metric directory, data is organized by the date and time at which it was gathered. As noted above, the curated directory structure is slightly different, with separate subdirectories for Connect and Workbench metrics.

The following is a partial example of the directory structure. Note that curated and daily data are stored for each day, while hourly data is stored for each hour.

├── curated
│   └── v2
│       ├── connect
│       │   └── user_list
│       │       └── 2024
│       │           └── 12
│       │               ├── 01
│       │               │   └── chronicle-data-aggregate.parquet
│       │               ├── 02
│       │               │   └── chronicle-data-aggregate.parquet
│       │               └── ...
│       └── workbench
│           └── user_list
│               └── 2024
│                   └── 12
│                       ├── 01
│                       │   └── chronicle-data-aggregate.parquet
│                       └── 02
│                           └── chronicle-data-aggregate.parquet
├── daily
│   └── v2
│       ├── connect_build_info
│       │   └── 2024
│       │       └── 12
│       │           ├── 01
│       │           │   └── chronicle-data-aggregate.parquet
│       │           ├── 02
│       │           │   └── chronicle-data-aggregate.parquet
│       │           └── ...
│       └── connect_license_active_users
│           └── 2024
│               └── 12
│                   ├── 01
│                   │   └── chronicle-data-aggregate.parquet
│                   ├── 02
│                   │   └── chronicle-data-aggregate.parquet
│                   └── ...
└── hourly
    └── <server-id>
        └── v2
            ├── connect_build_info
            │   └── 2024
            │       └── 12
            │           ├── 01
            │           │   ├── 00
            │           │   │   └── chronicle-data-chunk-<timestamp>.csv
            │           │   ├── 01
            │           │   │   └── chronicle-data-chunk-<timestamp>.csv
            │           │   ├── ...
            │           │   └── 23
            │           │       └── chronicle-data-chunk-<timestamp>.csv
            │           └── 02
            │               ├── 00
            │               │   └── chronicle-data-chunk-<timestamp>.csv
            │               ├── 01
            │               │   └── chronicle-data-chunk-<timestamp>.csv
            │               ├── ...
            │               └── 23
            │                   └── chronicle-data-chunk-<timestamp>.csv
            └── connect_license_active_users
                └── 2024
                    └── 12
                        └── 01
                            └── ...
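The layout above maps directly onto a path built from the metric name and the date. The following is a minimal sketch of that mapping for daily and curated files. The helper names `daily_file` and `curated_file` are hypothetical and are not part of Chronicle; the root path and the `v2` version are taken from the layout shown above and may differ in your installation.

```python
from datetime import date
from pathlib import Path

# Data directory from the layout above; adjust for your installation.
DATA_ROOT = Path("/var/lib/posit-chronicle/data")
FILE_NAME = "chronicle-data-aggregate.parquet"


def _day_parts(day: date) -> Path:
    """Zero-padded year/month/day subpath, for example 2024/12/01."""
    return Path(f"{day:%Y}") / f"{day:%m}" / f"{day:%d}"


def daily_file(metric: str, day: date, root: Path = DATA_ROOT, version: str = "v2") -> Path:
    """Path to the daily Parquet file for one metric on one day."""
    return root / "daily" / version / metric / _day_parts(day) / FILE_NAME


def curated_file(product: str, metric: str, day: date,
                 root: Path = DATA_ROOT, version: str = "v2") -> Path:
    """Path to the curated Parquet file; product is "connect" or "workbench"."""
    return root / "curated" / version / product / metric / _day_parts(day) / FILE_NAME
```

For example, `daily_file("connect_build_info", date(2024, 12, 1))` points at the first daily file shown in the tree above. The functions only build paths; check `Path.is_file()` before reading, since a given metric may have no file for a given day.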

Reading Parquet data

There are examples of reading Chronicle data in the Curated data section of this documentation. There are also examples in the source code of the reports that are included with Chronicle, which can be found in the chronicle-reports repository.

While Parquet files are similar in concept to CSV files, they use a binary, columnar format optimized for read and write performance, so most text editors cannot display them without a plugin.

Both the RStudio IDE and Positron support viewing Parquet files without additional extensions.

If you are using VS Code, our team recommends the Parquet Explorer plugin to read and query Parquet files directly in your editor.

Another common trick is to convert .parquet files into .csv files for easier viewing, using Python and the pandas library:

Python

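import pandas as pd

# Requires a Parquet engine such as pyarrow or fastparquet.
df = pd.read_parquet('filename.parquet')
# index=False omits the DataFrame's row index from the CSV.
df.to_csv('filename.csv', index=False)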
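Because each day is stored in its own file, looking at a metric over time means reading several files and combining them. The following is a minimal sketch of one way to do that with pandas. The function names `files_in_range` and `load_range` are hypothetical, and the sketch assumes the year/month/day layout and file name shown in the directory example above.

```python
from datetime import date, timedelta
from pathlib import Path


def files_in_range(metric_dir, start: date, end: date) -> list:
    """Return the daily Parquet files that exist under metric_dir for start..end inclusive."""
    found = []
    day = start
    while day <= end:
        candidate = (Path(metric_dir) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
                     / "chronicle-data-aggregate.parquet")
        # Skip days with no file instead of failing.
        if candidate.is_file():
            found.append(candidate)
        day += timedelta(days=1)
    return found


def load_range(metric_dir, start: date, end: date):
    """Concatenate the daily files for a date range into one DataFrame."""
    import pandas as pd  # imported here so file discovery works without pandas

    files = files_in_range(metric_dir, start, end)
    if not files:
        return pd.DataFrame()
    return pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
```

A call such as `load_range("/var/lib/posit-chronicle/data/daily/v2/connect_build_info", date(2024, 12, 1), date(2024, 12, 7))` would return one DataFrame covering that week, assuming the files exist and a Parquet engine is installed.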

Chronicle Metrics

The Metrics appendix includes details on the data that Chronicle gathers and stores.