Today we are announcing support for Elastic Fabric Adapter (EFA) and NVIDIA GPUDirect Storage (GDS) in Amazon FSx for Lustre. EFA is a network interface for Amazon EC2 instances that lets you run applications requiring high levels of inter-node communication at scale. GDS is a technology that creates a direct data path between local or remote storage and GPU memory. With these enhancements, Amazon FSx for Lustre with EFA/GDS support delivers up to 12x higher per-client throughput (up to 1,200 Gbps) compared to previous FSx for Lustre versions.
FSx for Lustre lets you build and run the most demanding applications, including deep learning training, drug discovery, financial modeling, and autonomous vehicle development. As data sets grow and new technologies emerge, you can adopt increasingly powerful GPU and HPC instances, such as Amazon EC2 P5, Trn1, and Hpc7a. Until now, using traditional TCP networking to access FSx for Lustre file systems limited the throughput of individual client instances to 100 Gbps. Adopting these instances requires the file system to deliver enough performance to make full use of their growing network bandwidth when accessing large data sets.
With EFA and GDS support in FSx for Lustre, applications can now achieve up to 1,200 Gbps of throughput per client instance (12x more than before) when using P5 GPU instances and NVIDIA CUDA.
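To put the two limits side by side, here is a small arithmetic sketch. The 100 Gbps and 1,200 Gbps figures come from this post. The 1 TB data set size is a hypothetical value chosen only for illustration, and the transfer times are ideal upper bounds that ignore protocol and application overhead.

```python
# Per-client throughput limits quoted in this post, in gigabits per second.
TCP_LIMIT_GBPS = 100
EFA_GDS_LIMIT_GBPS = 1200


def gbps_to_gigabytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def ideal_transfer_seconds(dataset_gigabytes: float, gbps: float) -> float:
    """Best-case time to move a data set at the given line rate."""
    return dataset_gigabytes / gbps_to_gigabytes_per_second(gbps)


speedup = EFA_GDS_LIMIT_GBPS / TCP_LIMIT_GBPS

if __name__ == "__main__":
    dataset_gb = 1000  # hypothetical 1 TB data set
    print(f"Speedup: {speedup:.0f}x")
    for label, rate in (("TCP", TCP_LIMIT_GBPS), ("EFA/GDS", EFA_GDS_LIMIT_GBPS)):
        seconds = ideal_transfer_seconds(dataset_gb, rate)
        print(f"{label}: {gbps_to_gigabytes_per_second(rate):.1f} GB/s, "
              f"{seconds:.1f} s for {dataset_gb} GB")
```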
This new feature allows you to take full advantage of the network bandwidth of your most powerful compute instances and accelerate machine learning (ML) and HPC workloads. EFA improves performance by bypassing the operating system and optimizing data transfer using the AWS Scalable Reliable Datagram (AWS SRD) protocol. GDS further improves performance by enabling direct data transfer between the file system and GPU memory, bypassing the CPU and eliminating redundant memory copies.
Let’s see how this works in practice.
Creating an Amazon FSx for Lustre file system with EFA enabled
To get started, in the Amazon FSx console, I choose Create file system and then Amazon FSx for Lustre.
I enter a name for the file system. In the Deployment and storage type section, I choose Persistent, SSD and the new With EFA enabled option. In the Throughput per unit of storage section, I choose 1000 MB/s/TiB. With these settings, Storage capacity is 4.8 TiB, which is the minimum supported for these settings.
For networking, I use the default Virtual Private Cloud (VPC) and an EFA-enabled security group. I leave all other options at their default values.
I review all the options and proceed to create the file system. After a few minutes, the file system is ready to use.
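The same console choices could also be expressed as an API request. The sketch below builds the parameters for the boto3 `create_file_system` call on the FSx client. The `EfaEnabled` field name and the `PERSISTENT_2` deployment type are assumptions that should be checked against the current FSx API reference, which may also require fields not shown here. The subnet ID, security group ID, and file system name are placeholders.

```python
def build_efa_file_system_request(
    name: str,
    subnet_id: str,
    security_group_id: str,
    storage_capacity_gib: int = 4800,      # 4.8 TiB, the minimum quoted in this post
    throughput_mb_s_per_tib: int = 1000,   # throughput tier chosen in the console
) -> dict:
    """Build keyword arguments for an FSx create_file_system request."""
    return {
        "FileSystemType": "LUSTRE",
        "StorageType": "SSD",
        "StorageCapacity": storage_capacity_gib,
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": [security_group_id],
        "Tags": [{"Key": "Name", "Value": name}],
        "LustreConfiguration": {
            "DeploymentType": "PERSISTENT_2",
            "PerUnitStorageThroughput": throughput_mb_s_per_tib,
            "EfaEnabled": True,  # assumed field name; verify in the API reference
        },
    }


def create_file_system(request: dict) -> dict:
    """Send the request. Requires boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the builder works without boto3 installed

    return boto3.client("fsx").create_file_system(**request)


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    request = build_efa_file_system_request(
        "efa-demo", "subnet-0123456789abcdef0", "sg-0123456789abcdef0"
    )
    print(request)
```

Keeping the request builder separate from the call makes it possible to review the parameters before anything is created.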
Mounting an EFA-enabled Amazon FSx for Lustre file system on an Amazon EC2 instance
In the Amazon EC2 console, I choose Launch instance, enter a name for the instance, and select the Ubuntu Amazon Machine Image (AMI). For Instance type, I choose trn1.32xlarge.
In Network settings, I edit the preferences and select the same subnet used by the FSx for Lustre file system. In Firewall (security groups), I select three existing security groups: the EFA-enabled security group used by the FSx for Lustre file system, the default security group, and the security group that provides Secure Shell (SSH) access.
In Advanced network configuration, I choose ENA and EFA as the Interface type. Without this setting, the instance uses traditional TCP networking, and connection throughput with the FSx for Lustre file system is still limited to 100 Gbps.
To achieve more throughput, you can add more EFA network interfaces depending on the instance type.
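For readers who launch instances programmatically, the interface type can be set in the `NetworkInterfaces` parameter of the EC2 `run_instances` request, where `"efa"` is the value for an interface that combines ENA and EFA. The sketch below only builds that parameter. The device index layout for additional network cards is an assumption and should be checked against the EC2 documentation for your instance type. The subnet and security group IDs are placeholders.

```python
def build_network_interfaces(
    subnet_id: str,
    security_group_ids: list[str],
    efa_interface_count: int = 1,
) -> list[dict]:
    """Build the NetworkInterfaces list for an EC2 run_instances request."""
    if efa_interface_count < 1:
        raise ValueError("at least one network interface is required")

    interfaces = []
    for card_index in range(efa_interface_count):
        interfaces.append({
            "NetworkCardIndex": card_index,
            # Assumption: the primary interface uses device index 0 and
            # interfaces on additional network cards use device index 1.
            "DeviceIndex": 0 if card_index == 0 else 1,
            "SubnetId": subnet_id,
            "Groups": list(security_group_ids),
            "InterfaceType": "efa",  # ENA and EFA on the same interface
        })
    return interfaces


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    for interface in build_network_interfaces(
        "subnet-0123456789abcdef0",
        ["sg-0aaaaaaaaaaaaaaaa", "sg-0bbbbbbbbbbbbbbbb", "sg-0cccccccccccccccc"],
        efa_interface_count=2,
    ):
        print(interface)
```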
I launch the instance. When it is ready, I connect using EC2 Instance Connect and follow the instructions for installing the Lustre client and configuring the EFA client in the FSx for Lustre User Guide.
Then I follow the instructions for mounting the FSx for Lustre file system on the EC2 instance.
I create a folder to use as the mount point.
In the FSx console, I select the file system and look up its DNS name and Mount name. I use these values to mount the file system.
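As a sketch of these two steps, the helper below assembles the commands for creating the mount point and mounting the file system from the DNS name and mount name. It follows the general form of the Lustre mount command in the FSx for Lustre User Guide. The DNS name, mount name, and `/mnt/fsx` path are placeholders rather than values from this walkthrough, and the script only prints the commands instead of running them.

```python
import shlex


def build_mount_commands(
    dns_name: str,
    mount_name: str,
    mount_point: str = "/mnt/fsx",  # hypothetical mount point
) -> list[list[str]]:
    """Return the commands that create the mount point and mount the file system."""
    source = f"{dns_name}@tcp:/{mount_name}"
    return [
        ["sudo", "mkdir", "-p", mount_point],
        ["sudo", "mount", "-t", "lustre", "-o", "relatime,flock", source, mount_point],
    ]


if __name__ == "__main__":
    # Placeholder values; use the DNS name and Mount name shown in the FSx console.
    commands = build_mount_commands(
        "fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com", "abcdefgh"
    )
    for command in commands:
        print(shlex.join(command))
```

Returning argument lists rather than a single string lets the commands be passed to `subprocess.run` without shell quoting problems, if you choose to execute them.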
When you access an EFA-enabled file system from a client instance that supports EFA and uses Lustre version 2.15 or later, EFA is automatically enabled.
What you need to know
EFA and GDS support is available now at no additional cost on new Amazon FSx for Lustre file systems in all AWS Regions where the Persistent 2 deployment type is available. FSx for Lustre automatically uses EFA when customers access EFA-enabled file systems from client instances that support EFA, without any additional configuration. For a list of EC2 client instances that support EFA, see Supported Instance Types in the Amazon EC2 User Guide. The network specifications table describes network bandwidth and EFA support for instance types in the Accelerated Computing category.
To use an EFA-enabled instance with an FSx for Lustre file system, you must use the Lustre 2.15 client on Ubuntu 22.04 with kernel 6.8 or higher.
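A minimal sketch of checking these version requirements on a client follows. It compares only the major and minor components of the version strings and does not check the Ubuntu release. The function names are hypothetical, and the example version strings are illustrative rather than taken from a real instance.

```python
import re

MIN_LUSTRE_CLIENT = (2, 15)
MIN_KERNEL = (6, 8)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.15.4' or '6.8.0-1015-aws'."""
    numbers = re.findall(r"\d+", version)
    if len(numbers) < 2:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(numbers[0]), int(numbers[1])


def meets_efa_client_requirements(lustre_client_version: str, kernel_release: str) -> bool:
    """Return True when both versions satisfy the minimums stated in this post."""
    return (
        parse_major_minor(lustre_client_version) >= MIN_LUSTRE_CLIENT
        and parse_major_minor(kernel_release) >= MIN_KERNEL
    )


if __name__ == "__main__":
    import platform

    # platform.release() reports the running kernel. The Lustre client version
    # is a placeholder here; on a real client it would come from the installed package.
    print(meets_efa_client_requirements("2.15.4", platform.release()))
```

Comparing integer tuples avoids the string comparison pitfall where "6.10" would sort before "6.8".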
The client instance and the file system must be in the same subnet within your Amazon Virtual Private Cloud (Amazon VPC).
GDS is automatically supported on EFA-enabled file systems. Using GDS with an FSx for Lustre file system requires the NVIDIA Compute Unified Device Architecture (CUDA) package, the open source NVIDIA driver, and the NVIDIA GPUDirect Storage driver to be installed on the client instance. These packages come preinstalled on the AWS Deep Learning AMI. CUDA-enabled applications can then use GPUDirect Storage to transfer data between the file system and the GPU.
When planning your deployment, keep in mind that EFA-enabled file systems have larger minimum storage capacity increments than file systems that do not support EFA. For example, if you select the 1,000 MB/s/TiB throughput tier, the minimum storage capacity for an EFA-enabled file system starts at 4.8 TiB, compared to 1.2 TiB for FSx for Lustre file systems without EFA enabled. If you are migrating existing workloads, you can use AWS DataSync to move data from your existing file system to a new file system that supports EFA and GDS.
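To make the planning impact concrete, the sketch below compares the two minimum capacities at the 1,000 MB/s/TiB tier. It assumes that aggregate file system throughput is the storage capacity multiplied by the per-unit throughput, which is how the tier name reads, and it does not model pricing.

```python
# Minimum storage capacities quoted in this post for the 1,000 MB/s/TiB tier.
EFA_MIN_CAPACITY_TIB = 4.8
NON_EFA_MIN_CAPACITY_TIB = 1.2
PER_UNIT_THROUGHPUT_MB_S_PER_TIB = 1000


def aggregate_throughput_mb_s(capacity_tib: float, per_unit_mb_s_per_tib: float) -> float:
    """Assumed model: aggregate throughput scales linearly with provisioned capacity."""
    return capacity_tib * per_unit_mb_s_per_tib


if __name__ == "__main__":
    for label, capacity in (
        ("EFA-enabled", EFA_MIN_CAPACITY_TIB),
        ("non-EFA", NON_EFA_MIN_CAPACITY_TIB),
    ):
        throughput = aggregate_throughput_mb_s(capacity, PER_UNIT_THROUGHPUT_MB_S_PER_TIB)
        print(f"{label}: {capacity} TiB minimum, about {throughput:,.0f} MB/s aggregate")
    ratio = EFA_MIN_CAPACITY_TIB / NON_EFA_MIN_CAPACITY_TIB
    print(f"Minimum capacity ratio: about {ratio:.0f}x")
```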
For maximum flexibility, FSx for Lustre maintains compatibility with both EFA and non-EFA workloads. When accessing EFA-enabled file systems, traffic from non-EFA client instances automatically flows over traditional TCP/IP networking using the Elastic Network Adapter (ENA), providing seamless access to all workloads without additional configuration.
To learn more about EFA and GDS support in FSx for Lustre, including detailed setup instructions and best practices, see the Amazon FSx for Lustre documentation. Get started today and experience the fastest storage performance available for GPU instances in the cloud.
— Danilo
Update 11/27: Post updated to reflect 12x throughput.