Sizing and capacity planning

Note

There is no single answer to “I have x users, what resources do I need?”

Rather, resource requirements for Posit Package Manager depend on the number and type of repositories configured, the volume of package installs, and whether the deployment is connected or air-gapped. Use the recommendations on this page as starting points and adjust based on monitoring data.

What affects resource needs

Overall, disk space is the most important resource consideration for Package Manager. Four factors drive Package Manager resource requirements: product connectivity, repository types, package install volume, and whether you use binary or source packages.

Product connectivity

Note

While air-gapped deployments are discussed conceptually, this guide provides recommendations for connected deployments only. If you require an air-gapped environment, see the air-gapped environments appendix for storage requirements.

Product connectivity affects the amount of storage needed, and when it is needed.

  • In connected deployments, Package Manager fetches packages on demand and caches each version once. The cache grows gradually as users install packages, so connected deployments need less network bandwidth and disk space up front.
  • In air-gapped environments, you must download all packages upfront, which requires significantly more storage from the start.

Repository types

Storage needs also depend on which repository types you enable, how many sources you configure, and whether you use full mirrors or curated subsets.

A full CRAN or PyPI mirror gives users access to every package, but requires more storage and longer sync times. A curated CRAN source or curated PyPI source exposes only an approved set of packages, reducing storage and giving administrators control over what is available.

Package install volume

CPU and memory scale with the number of concurrent package requests. Each user can download dozens or hundreds of packages per day. Deployments that serve CI/CD pipelines or function as a build server for Git-based repositories have higher CPU requirements than those serving only interactive users.

Binary or source packages

Installing a binary package avoids compiling source code, which can be time-consuming and might require additional system dependencies.

Package Manager provides precompiled CRAN binaries for Windows, macOS, and Linux across five R versions and four years of historical releases. While enabling binary serving increases storage needs (each R version + distribution + architecture combination adds storage), it significantly reduces install times for users and lowers client-side CPU requirements.

Consequences of undersizing

Note

Disk size is the most common constraint. Insufficient storage causes user-visible failures immediately.

Resource too low | What happens
CPU | Slow package downloads, long Git build times, sync delays
Memory | Builds fail, sync operations crash
Disk | Package installs fail for users, sync cannot complete, binary builds cannot be stored

Determining initial size

Package Manager does not have a per-user sizing formula like Posit Workbench or Posit Connect. As discussed above, considerations like repository configuration and install volume drive resource needs, rather than user count.

The reference architectures provide tested configurations at two scales:

  • Single server: 1 million package installs per month (about 30,000 per day)
  • Load-balanced (2 nodes): 30 million installs per month (1,000,000 per day) with 100 concurrent Git builders

If you do not know your workload yet, start with the single-server configuration recommendations below.
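To compare your own workload with these reference scales, it can help to convert a monthly install count into daily and per-second averages. The sketch below assumes a 30-day month and an evenly spread load, neither of which comes from the reference architectures; real traffic is bursty, so treat the per-second figure as a floor rather than a peak. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def install_rates(installs_per_month, days_per_month=30):
    """Return (installs per day, average installs per second).

    Assumes installs are spread evenly across the month (an assumption,
    not a measured traffic profile).
    """
    per_day = installs_per_month / days_per_month
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


# The two reference scales quoted above.
for label, monthly in [("single server", 1_000_000), ("load-balanced", 30_000_000)]:
    per_day, per_second = install_rates(monthly)
    print(f"{label}: {per_day:,.0f} installs/day, {per_second:.2f} installs/s on average")
```

With a 30-day month, 1 million installs per month averages a little over 33,000 per day, which this page rounds to 30,000.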

Single server

Here are the recommendations for single server, connected deployments:

Resource | Minimum | Recommended starting point
CPU | 2 cores | 4 cores
RAM | 4 GB | 16 GB
Disk | 100 GB | 500 GB

Note: Database for single server

Single-server deployments use a SQLite database built into Package Manager by default. Package Manager stores it on the local disk sized above, and it needs no separate provisioning.

Load-balanced or HA

A load-balanced or high availability (HA) deployment distributes package requests across multiple nodes. Application process limits and settings apply per node. Package data lives on a shared storage layer, such as Network File System (NFS) or Amazon Simple Storage Service (S3), so disk growth is managed and scaled centrally rather than per node.

Per-node specs

These recommendations were tested at 30 million installs per month with 100 concurrent Git builders on the AWS load-balanced reference architecture (2x m6i.2xlarge).

Resource | Recommendation
CPU | 8 vCPU
RAM | 32 GB
Disk | Local disk for the OS, Package Manager application, and logs. Package data on shared storage (see Infrastructure requirements).

Infrastructure requirements

Component | Details
Shared storage | NFS or S3. Use NFS mount option lookupcache=pos. Azure NetApp Files recommended for high-throughput workloads.
Load balancer | Application Load Balancer (e.g., AWS ALB)
Clock sync | Network Time Protocol (NTP) across all nodes
Database | PostgreSQL

Database sizing

In load-balanced or HA deployments, Package Manager stores its metadata in a single PostgreSQL database. That database requires at least 1 GB of storage. If you run several Posit products against one PostgreSQL instance, give each product its own database.

That 1 GB figure is the minimum size of the database itself. The storage you provision for the PostgreSQL instance is a separate decision. The reference architectures provision 100 GiB of gp3 general-purpose solid-state drive (SSD) storage for the PostgreSQL instance on AWS, or 128 GiB on Azure, which leaves generous headroom. In testing, this architecture handled one million package installs per day while CPU utilization stayed between 10% and 20%.

Monitoring usage

Once your deployment is running, monitoring helps you understand whether your current resources are sufficient and when it is time to scale. Package Manager exposes metrics that track disk consumption, request load, and build performance, the same factors that drive capacity decisions.

Important

Enable monitoring at install time. You cannot backfill metrics that were not recorded.

What to enable

Package Manager exposes a Prometheus endpoint on port 2112 by default. Enable StorageAudit in the configuration to expose disk usage metrics.

See Operational metrics for setup.
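Once the endpoint is enabled, you can read the disk metrics with any client that understands the Prometheus text format. The following is a minimal sketch, not an official client. It parses a hand-written sample rather than live output, and the `path` label name and the byte values in `EXAMPLE` are assumptions made for illustration.

```python
import re

# One sample line in the Prometheus text format: name, optional {labels}, value.
# Simplification: label values that contain "}" are not handled.
SAMPLE_LINE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_metrics(text):
    """Map (metric name, raw label string) to the sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP, and TYPE lines
            continue
        match = SAMPLE_LINE.match(line)
        if match:
            name, labels, value = match.group(1), match.group(2) or "", match.group(3)
            samples[(name, labels)] = float(value)
    return samples


def storage_utilization(samples):
    """Return used/size per label set, pairing ppm_storage_used with ppm_storage_size."""
    result = {}
    for (name, labels), used in samples.items():
        if name != "ppm_storage_used":
            continue
        size = samples.get(("ppm_storage_size", labels))
        if size:  # skip missing or zero sizes
            result[labels] = used / size
    return result


# Hypothetical scrape output; the label name and values are illustrative only.
EXAMPLE = """\
# TYPE ppm_storage_used gauge
ppm_storage_used{path="/data"} 400
ppm_storage_size{path="/data"} 500
ppm_http_requests_inflight 3
"""

for labels, ratio in storage_utilization(parse_metrics(EXAMPLE)).items():
    print(f"{labels or '(no labels)'}: {ratio:.0%} used")
```

To run this against a live server, fetch the text from the metrics endpoint first, for example with `urllib.request.urlopen`. The host name and URL path depend on your deployment and are not specified here.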

Key metrics to watch

The following metrics are the most relevant for capacity planning. Together, they tell you whether disk, CPU, or network is becoming a bottleneck.

Metric | What it tells you
ppm_storage_used / ppm_storage_size | Disk utilization by storage path. Alert before you run out.
ppm_http_requests_inflight | Current concurrent requests. Sustained high values suggest the server is near capacity.
ppm_repo_source_sync_duration_seconds | How long repository syncs take. Increasing values might indicate disk I/O pressure or network issues.
ppm_git_build_duration_seconds | Git builder performance. Increasing values suggest CPU or I/O contention.
ppm_git_builds_failed_total | Failed Git builds. Non-zero sustained values need investigation.
ppm_binary_routing_fallback | Binary routing failures by reason. Rising values might indicate missing system dependencies.

For the full metric catalog, see Operational metrics.
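One way to act on the disk metrics before space runs out is to project when the volume will fill at the current growth rate. The sketch below fits a least-squares line to daily readings of `ppm_storage_used` and divides the remaining space by the slope. The linear-growth assumption, the function name, and the sample readings are all illustrative. Cache growth in a real deployment may be uneven, so treat the result as a rough planning aid.

```python
def days_until_full(samples, size):
    """Estimate days until storage is full, assuming linear growth.

    samples: list of (day, used) readings with at least two distinct days.
    size: total capacity in the same unit as the `used` readings.
    Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    spread = sum((day - mean_day) ** 2 for day, _ in samples)
    if spread == 0:
        raise ValueError("samples must cover more than one day")
    # Least-squares slope: growth in `used` per day.
    slope = sum((day - mean_day) * (used - mean_used) for day, used in samples) / spread
    if slope <= 0:
        return None
    _, latest_used = max(samples)  # the reading with the latest day
    return (size - latest_used) / slope


# Hypothetical daily readings in GB on a 500 GB volume.
readings = [(0, 200), (1, 210), (2, 220), (3, 230)]
print(days_until_full(readings, 500))
```

Here usage grows by 10 GB per day with 270 GB remaining, so the estimate is 27 days.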

When to resize

The metrics above establish a baseline for your deployment. When those numbers start trending in the wrong direction, it might be time to resize. This section walks through the warning signs, your options for scaling, and configuration changes you can try before adding infrastructure.

Signals

Use the metrics from the previous section to identify which resource is under pressure. The table below maps common patterns to their likely cause and recommended response.

What you see | What it means | What to do
ppm_storage_used approaching ppm_storage_size | Disk filling up | Expand storage, move to S3, or reduce repository scope
ppm_http_requests_inflight sustained high | Server overloaded with requests | Scale up (more CPU or RAM) or scale out (add nodes)
ppm_repo_source_sync_duration_seconds increasing | Sync jobs slowing down | Check disk I/O and network throughput
ppm_git_build_duration_seconds increasing | Git builds backing up | Add CPU or reduce concurrent Git builder count
ppm_git_builds_failed_total climbing | Builds failing | Check disk space, memory, and build logs
Package installs timing out for users | Server cannot serve requests fast enough | Scale up or scale out

Scaling options

If the signals above confirm that your current resources are no longer sufficient, the following options are listed from least to most complex. Start with the simplest option that addresses the bottleneck.

  1. Tune settings (see What to adjust first): No infrastructure change required.
  2. Scale vertically: Increase CPU, RAM, or disk on the existing server. This is the most common path, because storage is usually the primary constraint.
  3. Scale horizontally: Add nodes behind a load balancer. Requires at least the Enhanced license tier, NFS or S3 shared storage, and PostgreSQL. See Load-balanced or HA for sizing and High availability for setup.
  4. Kubernetes: Run the Package Manager server in K8s for orchestration and scaling. Requires the Advanced license tier. See the Kubernetes reference architecture for details.

What to adjust first

Before adding infrastructure, these configuration changes can often relieve pressure without any hardware changes:

  1. Repository scope: Reduce the number of curated sources or repository types if storage or sync performance is a concern.
  2. Storage class separation: Move large storage classes (binaries, PyPI) to faster or larger volumes.
  3. Sync scheduling: If syncs coincide with peak user activity, adjust sync timing.
  4. Git builder concurrency: Reduce concurrent Git builders if they compete with user-facing requests for CPU.