Kubernetes

Advanced

Posit Connect can be configured to run on a Kubernetes cluster within Azure. This architecture is designed for the reliability and scale that comes with a distributed, highly available deployment using modern container orchestration. This architecture is best suited for organizations that already use Kubernetes for production workloads or have specific needs that are not met by our high availability (HA) architecture.

Architectural overview

This deployment of Posit Connect uses Off-Host Execution and relies on the following Azure resources:

Architecture diagram

%%{init: {"look": "handDrawn"} }%%
graph TD
    Users(["Users"])

    subgraph VNet["Virtual Network (VNet)"]
        ING["NGINX Ingress<br/>Load Balancer"]
        subgraph AKS["AKS Cluster"]
            subgraph AZ1["Availability Zone 1"]
                subgraph Pub1["Public Subnet"]
                    ENI1["Ingress Endpoint"]
                end
                subgraph Priv1["Private Subnet"]
                    subgraph Node1["Azure VM Node"]
                        Connect1["Connect pod(s)"]
                        Content1["Content pod(s)"]
                    end
                end
            end
            subgraph AZ2["Availability Zone 2"]
                subgraph Pub2["Public Subnet"]
                    ENI2["Ingress Endpoint"]
                end
                subgraph Priv2["Private Subnet"]
                    subgraph Node2["Azure VM Node"]
                        Connect2["Connect pod(s)"]
                        Content2["Content pod(s)"]
                    end
                end
            end
        end
        PG[("Azure Database<br/>for PostgreSQL")]
        NF[("Azure NetApp Files")]
    end

    Users --> ING
    ING --> ENI1 --> Connect1
    ING --> ENI2 --> Connect2
    Connect1 & Connect2 --- PG
    Connect1 & Connect2 & Content1 & Content2 --- NF

Kubernetes cluster

The Kubernetes cluster can be provisioned using Azure Kubernetes Service (AKS).

Nodes

We recommend a user node pool with at least three nodes of instance type Standard_D8_v5, but both the number of nodes and the instance types can be increased for more demanding workloads. Your instance needs will depend on the size of your audience for Connect content as well as the compute and memory needs of your data scientists’ applications.

  • Node pools should be provisioned across more than one availability zone and within private subnets.
  • This reference architecture does not assume autoscaling node pools. It assumes you have a fixed number of nodes within your node pool.

To help distribute content pods evenly across worker nodes, follow standard Kubernetes scheduling practices: add pod topology spread constraints or set CPU and memory requests and limits on content pods. Topology spread constraints tell the kube-scheduler to spread matching pods across nodes or across availability zones. Resource requests give the scheduler each pod’s expected CPU and memory footprint, so it places new pods on nodes with available headroom rather than stacking them on whichever node it picks first; limits cap runtime usage so a single busy job cannot starve its neighbors. Without either mechanism, the scheduler may place many content pods on the same node, producing CPU and memory contention.
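For illustration, the following is a minimal sketch of the pod-spec fields involved: topology spread constraints across nodes and zones, plus resource requests and limits. The label selector and numeric values are hypothetical, and how these fields are injected into Connect's content pods (for example, through your job templating) depends on your configuration.

# Hypothetical pod-spec fragment; labels and values are placeholders.
# topologySpreadConstraints belongs in the pod spec; resources belongs in each container spec.
topologySpreadConstraints:
  - maxSkew: 1                                     # allow at most one pod of imbalance
    topologyKey: kubernetes.io/hostname            # spread across nodes
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: connect-content    # hypothetical content-pod label
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone       # spread across availability zones
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: connect-content
resources:
  requests:
    cpu: "500m"       # expected footprint the scheduler uses for placement
    memory: "1Gi"
  limits:
    cpu: "2"          # cap runtime usage so one busy job cannot starve its neighbors
    memory: "4Gi"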

Database

This architecture utilizes an Azure Database for PostgreSQL instance running on a Standard_D4ds_v4 compute size, provisioned with a minimum of 64 GB of storage and running the latest minor version of PostgreSQL 16 (see supported versions). Both the instance type and the storage can be scaled up for more demanding workloads.

  • The database instance should be configured with an empty PostgreSQL database for the Connect metadata.
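As a sketch only, assuming the Connect Helm chart's config section (which maps values into Connect's configuration file), the database connection might look like the following. The server host, database name, and user are placeholders, and the password should be supplied from a secret rather than hard-coded in values.yaml.

# Hypothetical values.yaml fragment; host, database name, and user are placeholders.
config:
  Database:
    Provider: "Postgres"
  Postgres:
    URL: "postgres://connect@example-server.postgres.database.azure.com:5432/connect_metadata?sslmode=require"
    # Supply the password separately, e.g., from a Kubernetes Secret,
    # rather than embedding it in values.yaml.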

Storage

This architecture utilizes encrypted Azure NetApp Files storage, configured with a 2 TiB capacity pool and a 100 GiB volume.
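Azure NetApp Files volumes are exported over NFS, so one common approach is a statically provisioned PersistentVolume and PersistentVolumeClaim that Connect and content pods can mount with ReadWriteMany access. The following is a hypothetical sketch; the server IP and export path come from your NetApp Files volume and are placeholders.

# Hypothetical static PersistentVolume/Claim for an NFS-exported NetApp Files volume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: connect-shared-storage
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany           # shared across Connect and content pods
  mountOptions:
    - vers=4.1
  nfs:
    server: 10.0.2.4          # NetApp Files volume mount IP (placeholder)
    path: /connect-volume     # export path (placeholder)
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: connect-shared-storage
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""        # bind statically to the PV above
  volumeName: connect-shared-storage
  resources:
    requests:
      storage: 100Gi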

Load balancer

This architecture utilizes the managed NGINX ingress controller from the AKS application routing add-on to provide public ingress and load balancing for the Connect service within AKS.

  • The ingress service must be configured with sticky sessions enabled.
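For illustration, a minimal Ingress with cookie-based session affinity might look like the following. It assumes the application routing add-on's webapprouting.kubernetes.azure.com ingress class and standard NGINX ingress annotations; the host name, Service name, port, and cookie name are placeholders.

# Hypothetical Ingress; host, Service name, port, and cookie name are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: connect
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"                    # enable sticky sessions
    nginx.ingress.kubernetes.io/session-cookie-name: "connect-affinity"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "86400"
spec:
  ingressClassName: webapprouting.kubernetes.azure.com
  rules:
    - host: connect.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: connect   # Connect Service name (placeholder)
                port:
                  number: 80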

Networking

The architecture is implemented in a virtual network that spans public and private subnets across multiple availability zones, providing high availability and fault tolerance for all deployed resources. The database instance, NetApp Files, and cluster node pools are located within the private subnets, and ingress to the cluster nodes is managed through the load balancer.

NAT Gateway

We recommend attaching a NAT Gateway to the AKS subnets to handle outbound traffic. Without a NAT Gateway, AKS uses the default Standard Load Balancer for outbound connections, which provides approximately 1,024 SNAT ports per node. When executing large numbers of jobs, Connect makes frequent connections to the Kubernetes API server, which can quickly exhaust the available SNAT ports and cause intermittent connection failures and degraded service.

A NAT Gateway provides 64,512 SNAT ports per assigned public IP address (a 16x increase over the Standard Load Balancer default) and allocates ports dynamically across all nodes, eliminating per-node starvation. If additional capacity is needed, up to 16 public IPs can be assigned (over one million ports total).

Configure AKS to use userAssignedNATGateway as the outboundType in the network profile, and associate the NAT Gateway with both the system and user node pool subnets. See Microsoft’s AKS NAT Gateway documentation for configuration details.

Configuration details

The required configuration details are outlined in the off-host execution installation & configuration steps.

Resiliency and availability

This implementation of Connect is resilient to AZ failures, but not full region failures. With worker nodes in separate availability zones and Connect pods running on each node, the failure of a single node or zone disrupts user sessions on the affected node but does not cause overall service downtime.

We recommend following your organization's backup and disaster recovery standards for the database instance and Azure NetApp Files volumes supporting this deployment. These two components, along with your Helm values.yaml file, are needed to restore Connect in the event of a cluster or regional failure.

Performance

The Connect team conducts performance testing on this architecture using the Grafana k6 tool. The workload consists of one virtual user (VU) repeatedly publishing an R-based Plumber application while the remaining VUs make API fetch requests to a Python-based Flask application.

The first test is a scalability test, where the number of VUs fetching the Flask app is increased steadily until the throughput is maximized. After noting the number of VUs needed to saturate the server, a second “load” test is run with that same number of VUs for 30 minutes, to accurately measure request latency when the server is fully utilized.

Below are the results for the load test:

  • Average requests per second: 2052 rps
  • Average request latency (fetch): 162 ms
  • Number of VUs: 400
  • Error rate: 0%

Note that k6 VUs are not equivalent to real-world users; they run without sleeps in order to maximize throughput. To approximate the number of real-world users, multiply the requests per second by 10.

Applications performing complex processing tasks will likely require nodes with more CPU and RAM to achieve the throughput and latency results above. We suggest running performance tests on your own applications to accurately determine hardware requirements.

FAQ

See the Architecture FAQs page for the general FAQ.