Sizing and Capacity Planning
There is no single answer to the question "I have x users; what resources do I need?"
Resource requirements for Posit Connect depend on the type of content deployed, the number of concurrent content processes, and how those processes are configured.
Use the recommendations on this page as starting points and adjust based on monitoring data.
What affects resource needs
Three factors drive Connect resource requirements: content types, concurrent content processes, and runtime settings.
Content types
Connect hosts three categories of content, each with a different resource profile:
- Interactive applications (Shiny, Streamlit, Dash): Hold CPU and memory for the duration of a user’s session. A single application can spawn multiple worker processes depending on how many users are connected simultaneously.
- APIs (Plumber, Flask, FastAPI): Consume CPU on demand when requests arrive. Individual API processes are typically lighter than interactive applications, but high request volume can require many concurrent workers.
- Static or rendered content (Quarto, R Markdown, Jupyter Notebooks): Consume CPU and memory at render time, then release resources once rendering completes. If pre-rendered, they require minimal resources to serve.
The mix of content types on your server is the primary driver of resource needs. A deployment that serves mostly rendered scripts and reports needs far less capacity than one running dozens of concurrent interactive Shiny applications.
Concurrent content processes
Connect runs each piece of content as one or more operating system processes. The total number of concurrent processes, across all content items and all users, determines how much CPU and memory the server needs at any given time.
An interactive application serving five simultaneous users might run anywhere from one to five worker processes, depending on its MaxProcesses and MaxConnsPerProcess settings. An API handling periodic requests might run a single long-lived process. Scheduled reports spin up processes at render time and release them when finished.
The default MinProcesses setting of 0 allows Connect to terminate all worker processes once they become idle. A worker process is terminated when it has no active connections, has passed its IdleTimeout period, and terminating it would not violate the MinProcesses constraint. If MinProcesses is set to 1 or higher (either as the server-wide Scheduler default or on an individual content item), at least that many processes remain running even with no traffic.
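The termination rule above can be expressed as a small predicate over a pool of workers. This is a minimal sketch, not Connect's scheduler code: the Worker type and the reapable_workers function are hypothetical, and idle time is assumed to be measured in seconds since a worker's last connection closed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    """Hypothetical view of one worker process (not a Connect API type)."""
    active_connections: int
    idle_seconds: float  # time since the last connection closed


def reapable_workers(workers: List[Worker], min_processes: int,
                     idle_timeout: float) -> List[Worker]:
    """Workers that may be terminated under the rule described above."""
    # Conditions 1 and 2: no active connections and idle past IdleTimeout.
    candidates = [w for w in workers
                  if w.active_connections == 0 and w.idle_seconds >= idle_timeout]
    # Condition 3: never drop below MinProcesses running workers.
    removable = max(0, len(workers) - min_processes)
    return candidates[:removable]


pool = [Worker(0, 300.0), Worker(0, 300.0), Worker(2, 0.0)]
print(len(reapable_workers(pool, min_processes=0, idle_timeout=60)))  # 2
print(len(reapable_workers(pool, min_processes=2, idle_timeout=60)))  # 1
```

With MinProcesses at 0 both idle workers can be reclaimed; at 2, one idle worker has to stay alive alongside the busy one, which is exactly the resource cost of a non-zero MinProcesses.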
Runtime settings
The Connect process configuration settings control how processes are created, how long they live, and how many can run simultaneously. These settings can be customized per content item. The server-wide defaults are defined in the Scheduler configuration. These settings directly affect resource consumption:
- MaxProcesses: Sets the upper bound on worker processes per content item.
- MinProcesses: Keeps processes alive even when idle, reducing startup latency at the cost of holding resources.
- IdleTimeout: Determines how long an idle process stays alive before being reclaimed.
- MaxConnsPerProcess and LoadFactor: Control when the scheduler starts new workers to handle additional connections.
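To see how these settings interact, the sketch below estimates how many workers a single content item runs for a given number of connected users. It assumes a simplified model in which a new worker starts once each existing worker holds roughly LoadFactor times MaxConnsPerProcess connections; Connect's actual scheduler may behave differently, and estimate_workers and connection_ceiling are hypothetical helpers.

```python
import math


def estimate_workers(users: int, max_processes: int, min_processes: int,
                     max_conns_per_process: int, load_factor: float) -> int:
    """Rough worker count for one content item under a simplified model.

    Assumption: a new worker starts once each existing worker holds about
    load_factor * max_conns_per_process connections (at least one).
    """
    per_worker = max(1.0, load_factor * max_conns_per_process)
    wanted = math.ceil(users / per_worker)
    # In this model the count stays between MinProcesses and MaxProcesses.
    return max(min_processes, min(max_processes, wanted))


def connection_ceiling(max_processes: int, max_conns_per_process: int) -> int:
    """Most simultaneous connections one content item can hold in this model."""
    return max_processes * max_conns_per_process


# The five-user example: one connection per worker vs. many per worker.
print(estimate_workers(5, max_processes=5, min_processes=0,
                       max_conns_per_process=1, load_factor=1.0))   # 5
print(estimate_workers(5, max_processes=5, min_processes=0,
                       max_conns_per_process=20, load_factor=0.5))  # 1
```

The two calls reproduce the one-to-five range described under Concurrent content processes: the same five users need five workers or one, depending only on MaxConnsPerProcess and LoadFactor.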
Choosing conservative values for these settings is as important as sizing hardware. See What to adjust first for tuning guidance.
Consequences of undersizing
Running Connect with insufficient resources affects content availability and user experience.
| Resource too low | What happens |
|---|---|
| CPU | Slow application startup, long report render times, API timeouts |
| Memory | Content processes killed by out-of-memory (OOM) errors, user sessions lost, new content fails to launch |
| Disk | Deployments fail, report rendering cannot complete, bundle storage fills up |
These symptoms can also result from runtime settings rather than insufficient hardware. See process configurations for common scenarios and tuning guidance.
Determining initial size
The reference architectures overview provides guidelines and a formula to estimate your sizing needs for each resource:
- CPU: (1 CPU per concurrent processing task) + (1 CPU for background tasks)
- RAM: (250 MB to 2 GB) x number of concurrently open applications
- Disk: 100 GB minimum
A concurrent processing task is a running worker process on the server, not a user or a content item. A single content item can run multiple worker processes depending on its settings and the number of connected users.
These formulas give a rough starting point, but the actual values depend heavily on your content mix. A deployment serving lightweight APIs needs less per-process memory than one running data-intensive Shiny applications.
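The formulas above translate directly into arithmetic. This sketch assumes each concurrently open application corresponds to one worker process and that you pick a single per-process memory figure from the 250 MB to 2 GB range; initial_size is a hypothetical helper.

```python
import math
from typing import Dict


def initial_size(concurrent_processes: int, gb_per_process: float,
                 min_disk_gb: int = 100) -> Dict[str, int]:
    """Starting-point estimate from the reference-architecture formulas.

    Assumes one worker process per concurrently open application and a single
    per-process memory figure chosen from the 250 MB to 2 GB range.
    """
    if not 0.25 <= gb_per_process <= 2.0:
        raise ValueError("the formula assumes 250 MB to 2 GB per process")
    cpu = concurrent_processes + 1  # +1 CPU for background tasks
    ram_gb = math.ceil(concurrent_processes * gb_per_process)
    return {"cpu": cpu, "ram_gb": ram_gb, "disk_gb": min_disk_gb}


print(initial_size(12, 1.0))  # {'cpu': 13, 'ram_gb': 12, 'disk_gb': 100}
```

For example, 12 concurrent processes at about 1 GB each gives 13 CPUs, 12 GB of RAM, and the 100 GB disk minimum. Round up to an available instance size and revisit the numbers once monitoring data is available.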
Single server
If you do not know your workload yet, start with the single-server configuration below. These recommendations follow the AWS single-server reference architecture, which was tested at 1,189 requests per second with 250 virtual users and 0% error rate.
| Resource | Minimum | Recommended starting point |
|---|---|---|
| CPU | 4 cores | 8 cores |
| RAM | 16 GB | 32 GB |
| Disk | 100 GB | 200 GB |
Load-balanced or HA
A load-balanced or high availability (HA) deployment distributes content processes across multiple nodes. See Scheduled document rendering for details on how process limits and scheduling work across nodes.
Per-node specs
These recommendations follow the AWS load-balanced reference architecture (2x m6i.2xlarge), which was tested at 2,329 requests per second with 400 virtual users and 0% error rate.
| Resource | Recommendation |
|---|---|
| CPU | 8 vCPU (virtual cores) |
| RAM | 32 GB |
| Disk | Local disk for the OS, Connect application, and logs. Content data on shared storage (see Infrastructure requirements). |
Infrastructure requirements
| Component | Details |
|---|---|
| Shared storage | Network File System (NFS) (v3 or v4) or AWS Elastic File System (EFS) for the Connect data directory. EFS auto-scales with usage. Azure NetApp Files recommended for high-throughput workloads. |
| Load balancer | Application Load Balancer (for example, AWS ALB) with sticky sessions enabled |
| Clock sync | Network Time Protocol (NTP) across all nodes |
| Database | PostgreSQL |
For the full list of prerequisites, see the HA checklist.
Database sizing
In a load-balanced or HA configuration, you are required to provision a PostgreSQL database for storing Connect application metadata. That database requires at least 1 GB of storage. If you run several Posit products against one PostgreSQL instance, give each product its own database.
These figures describe the Connect database itself; sizing the PostgreSQL instance you provision is a separate question. The reference architectures provision a minimum of 15 GB of storage for the PostgreSQL instance on AWS (a db.m5.large instance) or 64 GB on Azure (a Standard_D4ds_v4 instance). Scale up for more demanding workloads.
Off-host execution (Kubernetes)
Connect supports off-host execution on Kubernetes, which runs content processes as pods on a Kubernetes cluster rather than on the Connect server itself.
When off-host execution is enabled, the Connect server handles request routing and content management while the cluster handles compute. This is most relevant when you need per-content resource isolation or elastic scaling for variable workloads.
Off-host execution shifts the capacity planning question from “how large should the Connect server be?” to “how large should the Kubernetes cluster be?” The Connect server itself still needs resources for routing and management, but compute-intensive work runs on the cluster.
For Kubernetes deployment details, see the Kubernetes reference architecture.
Monitoring usage
Once your deployment is running, monitoring helps you understand whether your current resources are sufficient and when it is time to scale. Connect exposes metrics that track content process utilization, request handling, queue health, and infrastructure performance.
Enable monitoring at install time. You cannot backfill metrics that were not recorded.
What to enable
Connect offers a number of ways to monitor a running deployment. For infrastructure-level metrics, OpenTelemetry and Prometheus are complementary, so enable both for the most complete picture:
- OpenTelemetry (OTel): Exports metrics, traces, and logs to any OTel-compatible backend (Datadog, Grafana, New Relic). OTel provides the most complete picture of how your infrastructure is handling requests, including the capacity planning metrics listed in Key metrics to watch. See the OpenTelemetry guide for setup.
- Prometheus: Exposes a scrape endpoint with metrics that supplement OTel, including active user counts by role, active content sessions per user, and job queue depth by queue type. These metrics are useful for tracking how many users and content items are driving load. See Operational metrics for setup.
- Posit Chronicle: Scrapes the Connect Prometheus endpoint to collect session telemetry, audit events, and runtime metrics. Chronicle is an embedded service that aggregates these metrics and lets you query them over time. See Enabling Chronicle for setup.
Key metrics to watch
The following OTel metrics are the most relevant for capacity planning decisions. Together, they tell you whether worker capacity, request handling, or background job processing is becoming a bottleneck.
| Metric | What it tells you |
|---|---|
| worker.pool.utilization | Ratio of busy workers to total workers (0 to 1). Values near 1.0 mean the worker pool is saturated. |
| requests.rejected | Requests turned away, broken down by reason (capacity, license, auth). Any sustained capacity rejections indicate insufficient resources. |
| queue.items.age | How long jobs sit in the queue before processing starts. Growing values mean the queue is backing up. |
| http.server.request.duration | Request latency. Thresholds: under 0.5 seconds is fast, 0.5 to 2 seconds is acceptable, over 2 seconds is slow. |
| job.completion with job.status=failure | Content execution failures. Sustained failures, especially with exit code 137 (OOM), indicate memory pressure. |
| process.count | Running content processes. Drops might indicate crash loops. Steady rises might indicate processes not terminating. |
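If you post-process exported metrics, the latency thresholds, the utilization ratio, and the exit-code check from the table can be encoded as small helpers. This is a sketch: classify_latency, is_sigkill_exit, and pool_utilization are hypothetical names, and exit code 137 only tells you the process received SIGKILL (128 + 9); the Linux OOM killer is the usual sender, but not the only one.

```python
# Shells and container runtimes report a process killed by a signal as
# 128 + the signal number. SIGKILL is 9 on Linux; it is hard-coded here so
# the sketch does not depend on the platform's signal module.
SIGKILL = 9


def classify_latency(seconds: float) -> str:
    """Bucket one request duration using the thresholds from the metrics table."""
    if seconds < 0.5:
        return "fast"
    if seconds <= 2.0:
        return "acceptable"
    return "slow"


def is_sigkill_exit(exit_code: int) -> bool:
    """True for exit code 137, the usual signature of an out-of-memory kill."""
    return exit_code == 128 + SIGKILL


def pool_utilization(busy_workers: int, total_workers: int) -> float:
    """Busy workers / total workers, from 0 to 1; an empty pool reports 0."""
    return busy_workers / total_workers if total_workers else 0.0


print(classify_latency(1.3), is_sigkill_exit(137), pool_utilization(3, 4))
```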
For the complete metric catalog and pre-built dashboards, see the OTel signal reference and alerting recommendations.
When to resize
The metrics above establish a baseline for your deployment. When those numbers start trending in the wrong direction, it might be time to resize. This section walks through the warning signs, your options for scaling, and configuration changes you can try before adding infrastructure.
Signals
Use the metrics from the previous section to identify which resource is under pressure. The table below maps common patterns to their likely cause and recommended response.
| What you see | What it means | What to do |
|---|---|---|
| worker.pool.utilization sustained near 1.0 | Worker pool saturated | Increase MaxProcesses (if resources allow) or add capacity |
| requests.rejected with reason capacity | Server turning away requests | Scale up (more CPU or RAM), scale out (add nodes), or reduce process count per content item |
| queue.items.age growing | Background jobs backing up | Increase ScheduleConcurrency (if CPU allows) or add nodes |
| http.server.request.duration climbing | Content processes responding slowly | Check CPU (contention) and shared storage latency (EFS or NFS) |
| job.completion failures with exit code 137 | Content processes killed (OOM) | Increase server RAM or set per-content MemoryLimit to prevent runaway processes from affecting others |
| System CPU sustained above 70% | CPU saturated | Scale up (more cores) or scale out (add nodes) |
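The signals table can also be read as a set of rules over a metrics snapshot. In the sketch below, the Snapshot fields, the 0.95 cut-off standing in for "near 1.0", and the diagnose helper are all assumptions; a growing queue age and a climbing latency are reduced to a boolean flag and a single latency figure for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """Hypothetical summary of one review window; field names are illustrative."""
    pool_utilization: float   # worker.pool.utilization, 0 to 1
    capacity_rejections: int  # requests.rejected with reason "capacity"
    queue_age_growing: bool   # queue.items.age trending upward
    latency_s: float          # representative http.server.request.duration
    oom_failures: int         # job.completion failures with exit code 137
    cpu_percent: float        # sustained system CPU utilization


# Responses taken from the signals table, keyed by a short finding name.
ACTIONS = {
    "worker_pool_saturated": "Increase MaxProcesses (if resources allow) or add capacity",
    "capacity_rejections": "Scale up, scale out, or reduce process count per content item",
    "queue_backing_up": "Increase ScheduleConcurrency (if CPU allows) or add nodes",
    "slow_responses": "Check CPU contention and shared storage latency (EFS or NFS)",
    "oom_kills": "Increase server RAM or set per-content MemoryLimit",
    "cpu_saturated": "Scale up (more cores) or scale out (add nodes)",
}


def diagnose(s: Snapshot, saturation: float = 0.95,
             slow_latency_s: float = 2.0, cpu_limit: float = 70.0) -> List[str]:
    """Return the findings whose conditions hold, in table order."""
    findings = []
    if s.pool_utilization >= saturation:  # assumed cut-off for "near 1.0"
        findings.append("worker_pool_saturated")
    if s.capacity_rejections > 0:
        findings.append("capacity_rejections")
    if s.queue_age_growing:
        findings.append("queue_backing_up")
    if s.latency_s > slow_latency_s:      # "over 2 seconds is slow"
        findings.append("slow_responses")
    if s.oom_failures > 0:
        findings.append("oom_kills")
    if s.cpu_percent > cpu_limit:         # "sustained above 70%"
        findings.append("cpu_saturated")
    return findings


snap = Snapshot(pool_utilization=0.98, capacity_rejections=12,
                queue_age_growing=False, latency_s=0.4,
                oom_failures=0, cpu_percent=85.0)
for finding in diagnose(snap):
    print(f"{finding}: {ACTIONS[finding]}")
```

Keeping the findings as short keys separate from the action text makes it easy to feed them into whatever alerting or ticketing system you already use.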
Scaling options
If the signals above confirm that your current resources are no longer sufficient, the following options go from least to most complex. Start with the simplest option that addresses the bottleneck.
- Tune settings (see What to adjust first): No infrastructure change required.
- Scale vertically: Increase CPU, RAM, or disk on the existing server. This is the most common first step when the bottleneck is compute, rather than concurrent process count.
- Scale horizontally: Add nodes behind a load balancer. This requires at least the Enhanced license tier. See Load-balanced or HA on this page for sizing and the High availability section for setup.
- Off-host execution: Move content execution to a Kubernetes cluster. This requires the Advanced license tier. See Off-host execution on this page for an overview and the Kubernetes reference architecture for setup details.
What to adjust first
Before adding infrastructure, start by understanding your current usage patterns:
- Review System > Usage and System > Scheduled Content in the Connect dashboard to identify which content items are in high demand or are rendered frequently.
- Review the runtime settings across your content and look for outliers, such as content items with unusually high MaxProcesses or MinProcesses values. For a step-by-step approach, see Capacity planning - content with non-default process settings, which lists content with overridden process settings and cross-references it with usage data.
- Adjust settings on individual content items first. Reducing MinProcesses on idle content can free significant resources without affecting the rest of your deployment.
- Consider global defaults only after addressing content-level outliers. The server-wide Scheduler settings (MaxProcesses, MinProcesses, IdleTimeout, ScheduleConcurrency, MinProcessesLimit, MaxProcessesLimit) apply to all content that has not been individually configured. To limit disk usage from stored bundles, see BundleRetentionLimit.
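Part of the outlier review can be automated once you have each content item's process settings and usage counts. The sketch assumes you have already exported those values however you prefer (the ContentSettings fields and the 10-visit threshold are hypothetical) and flags content that keeps idle workers alive through MinProcesses despite low usage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContentSettings:
    """Hypothetical export of one content item; None means the server default applies."""
    name: str
    min_processes: Optional[int]
    max_processes: Optional[int]
    visits_last_30d: int


def overridden(items: List[ContentSettings]) -> List[str]:
    """Names of content items with any per-content process override."""
    return [c.name for c in items
            if c.min_processes is not None or c.max_processes is not None]


def idle_reservations(items: List[ContentSettings],
                      low_usage: int = 10) -> List[ContentSettings]:
    """Low-traffic content holding workers via MinProcesses > 0, largest first."""
    flagged = [c for c in items
               if c.min_processes is not None
               and c.min_processes > 0
               and c.visits_last_30d < low_usage]
    return sorted(flagged, key=lambda c: c.min_processes, reverse=True)


def reclaimable_workers(items: List[ContentSettings], low_usage: int = 10) -> int:
    """Upper bound on always-on workers freed by resetting flagged items to 0."""
    return sum(c.min_processes for c in idle_reservations(items, low_usage))


items = [
    ContentSettings("sales-dashboard", 2, 10, 5400),
    ContentSettings("old-prototype", 3, None, 2),
    ContentSettings("weekly-report", None, None, 40),
    ContentSettings("demo-api", 1, 4, 0),
]
print(overridden(items))
print([c.name for c in idle_reservations(items)])
print(reclaimable_workers(items))
```

In this example, resetting MinProcesses to 0 on the two low-traffic items would release up to four always-on workers, while the heavily used dashboard keeps its override.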