Capacity planning - content with non-default process settings

Problem

You want to identify content whose process settings override the server defaults, so that you can spot potential resource outliers and review capacity planning decisions.

Note

When a content item’s process settings are null, it uses the server-level defaults from the [Scheduler] configuration section in Posit Connect. Non-null values indicate that a publisher or administrator has explicitly overridden the defaults for that content item.
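
To make the null-versus-override distinction concrete, the following is a minimal sketch that resolves the settings a content item effectively runs with, falling back to a server default wherever the item's value is None. The SERVER_DEFAULTS values and the helper names effective_settings and overridden_keys are illustrative assumptions, not values or functions taken from Connect; substitute the values from your own [Scheduler] section.

```python
# Hypothetical defaults for illustration only; read the real values from
# the [Scheduler] section of your own Connect configuration.
SERVER_DEFAULTS = {
    "max_processes": 3,
    "min_processes": 0,
    "max_conns_per_process": 20,
    "load_factor": 0.5,
    "idle_timeout": 5,
}


def effective_settings(item, defaults=SERVER_DEFAULTS):
    """Return the settings an item runs with: overrides win, None falls back."""
    return {
        key: default if item.get(key) is None else item[key]
        for key, default in defaults.items()
    }


def overridden_keys(item, defaults=SERVER_DEFAULTS):
    """List the settings that the item explicitly overrides (non-None)."""
    return [key for key in defaults if item.get(key) is not None]


# A content item that overrides only idle_timeout.
risk_model = {"title": "Risk Model", "max_processes": None, "idle_timeout": 3600}
print(effective_settings(risk_model))
print(overridden_keys(risk_model))
```

An item is worth reviewing as soon as overridden_keys returns a non-empty list, which is the same condition the recipe below expresses as a DataFrame filter.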

Solution

Retrieve all content from the server, filter for items with non-default process settings (max_processes, min_processes, max_conns_per_process, load_factor, or idle_timeout), and cross-reference with usage data to assess whether the overrides are justified.

from posit import connect
import polars as pl

client = connect.Client()

# Retrieve all content items (include owner details)
all_content = client.content.find(include="owner")
content_df = pl.DataFrame(all_content, infer_schema_length=None)

# Define the process settings columns to check
process_settings = [
    "max_processes",
    "min_processes",
    "max_conns_per_process",
    "load_factor",
    "idle_timeout",
]

# Filter to content with at least one non-null process setting
content_with_overrides = (
    content_df
    .filter(
        pl.any_horizontal([pl.col(col).is_not_null() for col in process_settings])
    )
    .with_columns(
        pl.col("owner")
        .map_elements(lambda x: x["username"], return_dtype=pl.String)
        .alias("owner_username")
    )
    .select(
        ["guid", "title", "owner_username", "app_mode"]
        + process_settings
        + ["dashboard_url"]
    )
)

The resulting table lists all content with at least one customized process setting.

>>> content_with_overrides
shape: (4, 10)
┌──────────────────┬───────────────────┬────────────────┬──────────┬───────────────┬───────────────┬───────────────────────┬─────────────┬──────────────┬──────────────────┐
│ guid             ┆ title             ┆ owner_username ┆ app_mode ┆ max_processes ┆ min_processes ┆ max_conns_per_process ┆ load_factor ┆ idle_timeout ┆ dashboard_url    │
│ ---              ┆ ---               ┆ ---            ┆ ---      ┆ ---           ┆ ---           ┆ ---                   ┆ ---         ┆ ---          ┆ ---              │
│ str              ┆ str               ┆ str            ┆ str      ┆ i64           ┆ i64           ┆ i64                   ┆ f64         ┆ i64          ┆ str              │
╞══════════════════╪═══════════════════╪════════════════╪══════════╪═══════════════╪═══════════════╪═══════════════════════╪═════════════╪══════════════╪══════════════════╡
│ 5258049f-fe5e-…  ┆ Sales Dashboard   ┆ publisher1     ┆ shiny    ┆ 10            ┆ 2             ┆ null                  ┆ null        ┆ null         ┆ https://connect… │
│ 11471207-1059-…  ┆ Forecast API      ┆ publisher2     ┆ python-  ┆ 5             ┆ 1             ┆ 50                    ┆ 0.5         ┆ null         ┆ https://connect… │
│                  ┆                   ┆                ┆ api      ┆               ┆               ┆                       ┆             ┆              ┆                  │
│ deec1ee8-3f14-…  ┆ Risk Model        ┆ publisher1     ┆ python-  ┆ null          ┆ null          ┆ null                  ┆ null        ┆ 3600         ┆ https://connect… │
│                  ┆                   ┆                ┆ api      ┆               ┆               ┆                       ┆             ┆              ┆                  │
│ a1b2c3d4-e5f6-…  ┆ Analytics App     ┆ publisher3     ┆ shiny    ┆ 20            ┆ 5             ┆ 20                    ┆ 0.8         ┆ 900          ┆ https://connect… │
└──────────────────┴───────────────────┴────────────────┴──────────┴───────────────┴───────────────┴───────────────────────┴─────────────┴──────────────┴──────────────────┘

Cross-referencing with usage data

To assess whether the overrides are justified, cross-reference with content usage data. Rather than fetching all usage data from the server, query usage only for the content items that have overrides.

from datetime import datetime, timedelta, timezone

# Retrieve usage data from the last 90 days, only for content with overrides
since = datetime.now(timezone.utc) - timedelta(days=90)
override_guids = content_with_overrides["guid"].to_list()

usage_records = []
for guid in override_guids:
    records = client.metrics.usage.find(
        content_guid=guid, start=since.isoformat()
    )
    usage_records.extend(records)

usage_df = pl.DataFrame(usage_records, infer_schema_length=None)

# Count usage events per content item
usage_counts = (
    usage_df
    .group_by("content_guid")
    .agg(pl.len().alias("usage_count"))
)

# Join usage data with the overrides table
overrides_with_usage = (
    content_with_overrides
    .join(usage_counts, left_on="guid", right_on="content_guid", how="left")
    .with_columns(pl.col("usage_count").fill_null(0))
    .sort("usage_count", descending=True)
)

The resulting table shows which content items have process overrides alongside their usage volume.

>>> overrides_with_usage.select(["title", "max_processes", "min_processes", "usage_count"])
shape: (4, 4)
┌───────────────────┬───────────────┬───────────────┬─────────────┐
│ title             ┆ max_processes ┆ min_processes ┆ usage_count │
│ ---               ┆ ---           ┆ ---           ┆ ---         │
│ str               ┆ i64           ┆ i64           ┆ u32         │
╞═══════════════════╪═══════════════╪═══════════════╪═════════════╡
│ Sales Dashboard   ┆ 10            ┆ 2             ┆ 5230        │
│ Forecast API      ┆ 5             ┆ 1             ┆ 1843        │
│ Analytics App     ┆ 20            ┆ 5             ┆ 42          │
│ Risk Model        ┆ null          ┆ null          ┆ 0           │
└───────────────────┴───────────────┴───────────────┴─────────────┘
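
The count-then-left-join step can also be sketched without a DataFrame library, which makes the zero-fill behavior explicit. It also covers the case where no usage records come back at all: an empty list of records typically yields a frame with no content_guid column to group on. The helper name count_usage and the sample GUIDs and records are hypothetical.

```python
from collections import Counter


def count_usage(override_guids, usage_records):
    """Count usage events per GUID, reporting 0 for GUIDs with no records."""
    counts = Counter(record["content_guid"] for record in usage_records)
    # Start from the override GUIDs (the "left" side) so that items without
    # any usage still appear, mirroring the left join plus fill_null(0).
    rows = [(guid, counts.get(guid, 0)) for guid in override_guids]
    # Sort by usage count, highest first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample data for illustration.
guids = ["aaa", "bbb", "ccc"]
records = [
    {"content_guid": "bbb"},
    {"content_guid": "aaa"},
    {"content_guid": "bbb"},
]
print(count_usage(guids, records))
print(count_usage(guids, []))
```

Driving the result from the list of override GUIDs rather than from the usage records is what guarantees that unused content still shows up with a count of zero.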

Full example

from posit import connect
from datetime import datetime, timedelta, timezone
import polars as pl

client = connect.Client()

# Retrieve all content items (include owner details)
all_content = client.content.find(include="owner")
content_df = pl.DataFrame(all_content, infer_schema_length=None)

# Define the process settings columns to check
process_settings = [
    "max_processes",
    "min_processes",
    "max_conns_per_process",
    "load_factor",
    "idle_timeout",
]

# Filter to content with at least one non-null process setting
content_with_overrides = (
    content_df
    .filter(
        pl.any_horizontal([pl.col(col).is_not_null() for col in process_settings])
    )
    .with_columns(
        pl.col("owner")
        .map_elements(lambda x: x["username"], return_dtype=pl.String)
        .alias("owner_username")
    )
    .select(
        ["guid", "title", "owner_username", "app_mode"]
        + process_settings
        + ["dashboard_url"]
    )
)

# Retrieve usage data from the last 90 days, only for content with overrides
since = datetime.now(timezone.utc) - timedelta(days=90)
override_guids = content_with_overrides["guid"].to_list()

usage_records = []
for guid in override_guids:
    records = client.metrics.usage.find(
        content_guid=guid, start=since.isoformat()
    )
    usage_records.extend(records)

usage_df = pl.DataFrame(usage_records, infer_schema_length=None)

# Count usage events per content item
usage_counts = (
    usage_df
    .group_by("content_guid")
    .agg(pl.len().alias("usage_count"))
)

# Join usage data with the overrides table
overrides_with_usage = (
    content_with_overrides
    .join(usage_counts, left_on="guid", right_on="content_guid", how="left")
    .with_columns(pl.col("usage_count").fill_null(0))
    .sort("usage_count", descending=True)
)

library(connectapi)
library(dplyr)
library(tidyr)
library(purrr)

client <- connect()

# Retrieve all content items
content_df <- get_content(client)

# Define the process settings columns to check
process_settings <- c(
  "max_processes",
  "min_processes",
  "max_conns_per_process",
  "load_factor",
  "idle_timeout"
)

# Filter to content with at least one non-null process setting
content_with_overrides <- content_df |>
  filter(if_any(all_of(process_settings), \(x) !is.na(x))) |>
  hoist(owner, owner_username = "username") |>
  select(guid, title, owner_username, app_mode,
         all_of(process_settings), dashboard_url)

The resulting table lists all content with at least one customized process setting.

> content_with_overrides
# A tibble: 4 x 10
  guid            title            owner_username app_mode   max_processes min_processes max_conns_per_process load_factor idle_timeout dashboard_url
  <chr>           <chr>            <chr>          <chr>              <int>         <int>                 <int>       <dbl>        <int> <chr>
1 5258049f-fe5e-… Sales Dashboard  publisher1     shiny                 10             2                    NA        NA             NA https://connect…
2 11471207-1059-… Forecast API     publisher2     python-api             5             1                    50         0.5           NA https://connect…
3 deec1ee8-3f14-… Risk Model       publisher1     python-api            NA            NA                    NA        NA           3600 https://connect…
4 a1b2c3d4-e5f6-… Analytics App    publisher3     shiny                 20             5                    20         0.8          900 https://connect…

Cross-referencing with usage data

To assess whether the overrides are justified, cross-reference with content usage data. Rather than fetching all usage data from the server, query usage only for the content items that have overrides.

# Retrieve usage data from the last 90 days, only for content with overrides
since <- Sys.time() - as.difftime(90, units = "days")
override_guids <- content_with_overrides$guid

usage <- purrr::map(override_guids, \(guid) {
  shiny <- get_usage_shiny(client, content_guid = guid, from = since,
                           limit = Inf)
  static <- get_usage_static(client, content_guid = guid, from = since,
                             limit = Inf) |>
    rename(started = time)
  bind_rows(shiny, static)
}) |>
  list_rbind()

# Count usage events per content item
usage_counts <- usage |>
  count(content_guid, name = "usage_count")

# Join usage data with the overrides table
overrides_with_usage <- content_with_overrides |>
  left_join(usage_counts, by = c("guid" = "content_guid")) |>
  mutate(usage_count = replace_na(usage_count, 0)) |>
  arrange(desc(usage_count))

The resulting table shows which content items have process overrides alongside their usage volume.

> overrides_with_usage |> select(title, max_processes, min_processes, usage_count)
# A tibble: 4 x 4
  title            max_processes min_processes usage_count
  <chr>                    <int>         <int>       <int>
1 Sales Dashboard             10             2        5230
2 Forecast API                 5             1        1843
3 Analytics App               20             5          42
4 Risk Model                  NA            NA           0

Full example

library(connectapi)
library(dplyr)
library(tidyr)
library(purrr)

client <- connect()

# Retrieve all content items
content_df <- get_content(client)

# Define the process settings columns to check
process_settings <- c(
  "max_processes",
  "min_processes",
  "max_conns_per_process",
  "load_factor",
  "idle_timeout"
)

# Filter to content with at least one non-null process setting
content_with_overrides <- content_df |>
  filter(if_any(all_of(process_settings), \(x) !is.na(x))) |>
  hoist(owner, owner_username = "username") |>
  select(guid, title, owner_username, app_mode,
         all_of(process_settings), dashboard_url)

# Retrieve usage data from the last 90 days, only for content with overrides
since <- Sys.time() - as.difftime(90, units = "days")
override_guids <- content_with_overrides$guid

usage <- map(override_guids, \(guid) {
  shiny <- get_usage_shiny(client, content_guid = guid, from = since,
                           limit = Inf)
  static <- get_usage_static(client, content_guid = guid, from = since,
                             limit = Inf) |>
    rename(started = time)
  bind_rows(shiny, static)
}) |>
  list_rbind()

# Count usage events per content item
usage_counts <- usage |>
  count(content_guid, name = "usage_count")

# Join usage data with the overrides table
overrides_with_usage <- content_with_overrides |>
  left_join(usage_counts, by = c("guid" = "content_guid")) |>
  mutate(usage_count = replace_na(usage_count, 0)) |>
  arrange(desc(usage_count))

Discussion

Content items with high max_processes or min_processes values can consume significant server resources, and that cost is hardest to justify when usage is low. Cross-referencing with usage data helps you identify potential mismatches:

  • High max_processes with low max_conns_per_process: You might have over-provisioned the content. Consider load testing to determine whether the settings are justified.
  • High min_processes with low usage: Idle processes consume memory even when no users are active. Consider lowering min_processes or removing the override entirely.
  • idle_timeout set very high: Content processes stay alive longer, consuming resources during idle periods.
  • Low load_factor: Connect spawns new processes more aggressively, which may result in more concurrent processes than necessary.

Content with null values for all process settings uses the server-level defaults defined in the [Scheduler] section of the Connect configuration file.
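
The checks in the list above can be turned into a rough screening function. The following is a minimal sketch, assuming each content item is a plain dict with the process settings and a usage_count key. The threshold constants and the review_flags helper are hypothetical and need tuning for your own server; the flags are prompts for review, not verdicts.

```python
# Hypothetical thresholds for illustration; tune them to your own server.
HIGH_MAX_PROCESSES = 10
LOW_MAX_CONNS = 5
LOW_USAGE = 100           # usage events in the review window
HIGH_IDLE_TIMEOUT = 1800  # assumed to be in seconds
LOW_LOAD_FACTOR = 0.25


def review_flags(item):
    """Return review notes for one content item; None settings are skipped."""
    flags = []
    max_p = item.get("max_processes")
    min_p = item.get("min_processes")
    max_conns = item.get("max_conns_per_process")
    load_factor = item.get("load_factor")
    idle_timeout = item.get("idle_timeout")
    usage = item.get("usage_count") or 0

    if (max_p is not None and max_conns is not None
            and max_p >= HIGH_MAX_PROCESSES and max_conns <= LOW_MAX_CONNS):
        flags.append("high max_processes, low max_conns_per_process")
    if min_p is not None and min_p > 0 and usage < LOW_USAGE:
        flags.append("min_processes set despite low usage")
    if idle_timeout is not None and idle_timeout >= HIGH_IDLE_TIMEOUT:
        flags.append("very high idle_timeout")
    if load_factor is not None and load_factor <= LOW_LOAD_FACTOR:
        flags.append("low load_factor")
    return flags


# Two rows from the example output above, written as dicts.
analytics_app = {"title": "Analytics App", "max_processes": 20, "min_processes": 5,
                 "max_conns_per_process": 20, "load_factor": 0.8,
                 "idle_timeout": 900, "usage_count": 42}
risk_model = {"title": "Risk Model", "max_processes": None, "min_processes": None,
              "max_conns_per_process": None, "load_factor": None,
              "idle_timeout": 3600, "usage_count": 0}

for item in (analytics_app, risk_model):
    print(item["title"], review_flags(item))
```

To apply it to the polars result, you could iterate over overrides_with_usage.to_dicts() and pass each row to review_flags.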

See also