Using Amazon Elastic File System (EFS) with Posit Team

How to use Amazon EFS with Posit Team.
Important

The Amazon Elastic File System (EFS) has unique design characteristics that can make it challenging to use with Posit Team. To be successful with EFS, please be sure to read through this full document and adhere to its guidance.

Overview

This document explains how to set up Amazon EFS with Posit Team and details best practices for ongoing usage of EFS. Additionally, this document covers how we came up with these recommendations.

Amazon EFS is a managed shared file system that scales elastically with the amount of storage you use. Since it supports Network File System (NFS), it’s possible to use EFS as the shared file system with Posit products. However, in some situations relevant to Posit Professional Products, EFS can suffer from slower performance relative to Elastic Block Store (EBS).

The performance gap is most pronounced in workloads that are sensitive to latency rather than throughput. Specifically, EFS performs poorly when reading and writing thousands of small files. This can be a problem when managing R workloads on a server, because some R packages that contain C++ code ship a very large number of C++ header files. For example, the BH package on CRAN contains roughly 12,000 header files, so installing BH can be slow on EFS.
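To get a feel for the small-file penalty on a given mount, a minimal timing sketch along the following lines may help. It writes and reads the same number of bytes twice: once as many small files and once as a single large file. The function names, the default file count, and the file size are illustrative assumptions, and the script is only a rough probe, not a substitute for a proper benchmark.

```python
import os
import tempfile
import time


def time_small_files(directory, count=1000, size=1024):
    """Write, then read, `count` files of `size` bytes; return elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, f"part_{i}.bin"), "wb") as fh:
            fh.write(payload)
    for i in range(count):
        with open(os.path.join(directory, f"part_{i}.bin"), "rb") as fh:
            fh.read()
    return time.perf_counter() - start


def time_single_file(directory, count=1000, size=1024):
    """Write, then read, one file holding count * size bytes; return elapsed seconds."""
    payload = b"x" * (count * size)
    path = os.path.join(directory, "combined.bin")
    start = time.perf_counter()
    with open(path, "wb") as fh:
        fh.write(payload)
    with open(path, "rb") as fh:
        fh.read()
    return time.perf_counter() - start


if __name__ == "__main__":
    # By default this uses the system temp location. To probe an EFS mount,
    # pass dir="<your mount point>" to TemporaryDirectory (path is yours to supply).
    with tempfile.TemporaryDirectory() as workdir:
        small = time_small_files(workdir)
        single = time_single_file(workdir)
        print(f"many small files: {small:.3f}s, one large file: {single:.3f}s")
```

Running the script once against local storage and once against the EFS mount gives a quick relative comparison. Absolute numbers depend on caching and on the EFS configuration, so treat them as indicative only.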

On some rare occasions, this can also affect direct data science work in cases where the workflow requires reading many files. For example, performance could be poor when training a deep neural network that processes image files in bulk.

Warning

Product-specific limitations and recommendations

This document is relevant to Workbench, Connect, and Package Manager usage. We only have a few product-specific limitations and recommendations.

For Workbench:

  • The default lock type of link-based won’t work; use the advisory type instead.

  • Use RStudio Workbench 1.4 or later (including newer Posit Workbench versions).

    We highly recommend upgrading to version 2021-09.0 or later because that release includes several performance improvements for EFS.
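As a rough illustration of the lock-type recommendation, the Python sketch below checks whether a file-locking configuration selects advisory locks. It assumes the setting is written as a `lock-type=advisory` line in a key=value file such as `/etc/rstudio/file-locks`. The path, the key name, the default value `linkbased`, and the helper `configured_lock_type` are assumptions made for illustration and do not come from this document, so confirm them against the Workbench administration guide for your version.

```python
from pathlib import Path

# Assumed location of the file-locking configuration; verify for your version.
FILE_LOCKS_PATH = Path("/etc/rstudio/file-locks")


def configured_lock_type(config_text, default="linkbased"):
    """Return the lock-type value found in key=value configuration text.

    Blank lines and lines starting with '#' are ignored. If the key is
    absent, the assumed default is returned. The last occurrence wins.
    """
    lock_type = default
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "lock-type":
            lock_type = value.strip()
    return lock_type


if __name__ == "__main__":
    text = FILE_LOCKS_PATH.read_text() if FILE_LOCKS_PATH.is_file() else ""
    lock_type = configured_lock_type(text)
    if lock_type == "advisory":
        print("lock-type is advisory")
    else:
        print(f"lock-type is {lock_type!r}; advisory is recommended on EFS")
```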

For Connect:

  • If you configure Database.Dir, it must also point to the same shared location.

For Package Manager:

  • Use the lookupcache=pos mount option to prevent long service delays due to attribute caching. See the NFS documentation for more information.
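One way to confirm that the mount option is actually in effect is to inspect the mount table. The sketch below parses text in the `/proc/mounts` format and reports whether a given mount point lists `lookupcache=pos`. The mount point `/mnt/efs` and the helper names are hypothetical, and the check assumes a Linux host where the kernel reports NFS options in `/proc/mounts`.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of mount options for `mount_point`, or None if not listed.

    Expects text in /proc/mounts format:
        device mount_point fs_type options dump pass
    If the mount point appears more than once, the last entry wins.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))
    return found


def uses_positive_lookup_cache(mounts_text, mount_point):
    """True if `mount_point` is mounted with the lookupcache=pos option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "lookupcache=pos" in options


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as fh:
            mounts = fh.read()
    except OSError:
        mounts = ""  # not a Linux host, or the file is unreadable
    # "/mnt/efs" is a hypothetical mount point; substitute your own.
    print(uses_positive_lookup_cache(mounts, "/mnt/efs"))
```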

Special considerations

  • Operations that write many small files (thousands or more) don’t perform well in most EFS settings.

  • To prevent users from having to repeatedly install R packages, Posit recommends preinstalling them.

  • Consider adopting code patterns that prefer reading large files over splitting data between many small files.

  • When many users share an EFS file system, we recommend segmenting data files into user-specific directories as much as possible. Writing a large file blocks metadata operations in the same directory until the write completes, so keeping each user’s data in a separate directory minimizes the impact of large file operations on other users.

Since EFS performance depends largely on individual usage patterns, this document serves as a starting point rather than an absolute directive. Expect to tune your EFS configuration after monitoring user behavior, and to make further adjustments over time to maintain long-term performance.

Testing and benchmarking your configuration with fsbench

To collect data about file system performance on your own EFS installation, use fsbench, Posit’s benchmarking tool. It runs a suite of file operations to help characterize system behavior and compare it against other known configurations.

For information on how to configure and run benchmark testing, please refer to the fsbench package documentation.
