AWS Storage Blog

Use HAQM FSx for Lustre to share HAQM S3 data across accounts

Update 4/9/2025: The cross-account bucket policy in the blog has been updated. It was missing a required principal: “arn:aws:iam::accountID:role/AWS-Signed-In-Console-Role.” This omission causes an access denied error.


As enterprises evolve their cloud governance practices, multiple teams working in separate accounts may need to share data. One team may oversee an enterprise data lake in one account, while a data science team develops a high-performance computing (HPC) use case in another account. Customers want to take advantage of low-cost object storage and be able to quickly consume this data from a high-performance file system to support HPC use cases without creating additional copies of the data.

HAQM FSx for Lustre has become a critical building block for customers accelerating machine learning (ML) and HPC use cases on AWS. HAQM FSx for Lustre offers a fully POSIX-compliant, high-performance file system that delivers sub-millisecond latencies, up to hundreds of gigabytes per second of throughput, and millions of IOPS. It integrates natively with HAQM Simple Storage Service (HAQM S3), offering cloud practitioners seamless access to their S3 datasets and cost efficiency for colder datasets.

In this blog post, we guide you through the process of seamlessly integrating an HAQM FSx for Lustre file system with an HAQM S3 data lake, where the HAQM FSx file system and HAQM S3 bucket reside in different AWS accounts in the same AWS Region. This solution will help you scale your AWS environment by allowing data to be shared from a centralized enterprise data lake to specialized team accounts consuming that data for ML and HPC use cases.

Solution overview

The solution architecture addresses two primary permissions issues. The first is authorizing HAQM FSx for Lustre to read from an HAQM S3 bucket in another account for the initial load. The second is authorizing the file system to receive bucket put notifications so that ongoing changes are replicated and the data stays synced.

Solution architecture

Prerequisites

To deploy the solution described in this blog, you will need the following:

  1. Two AWS accounts in the same AWS Region: one for the HAQM FSx for Lustre file system (ACCOUNT-A) and one for the HAQM S3 bucket (ACCOUNT-B)
  2. Permissions in each account to create and manage the resources used in this walkthrough: an HAQM FSx for Lustre file system and an HAQM EC2 instance in ACCOUNT-A, and an HAQM S3 bucket and bucket policy in ACCOUNT-B

Implement the solution

The following sections walk through integrating an HAQM FSx for Lustre file system in ACCOUNT-A with a Data Repository Association (DRA) HAQM S3 bucket in ACCOUNT-B.

Step 1: Create HAQM FSx file system
Step 2: Create source bucket
Step 3: Create data repository association
Step 4: Lock down bucket policy

Step 1: Create HAQM FSx file system

In ACCOUNT-A, confirm you are in the US East (N. Virginia) Region and navigate to the HAQM FSx console.

Confirm you are in US East (N. Virginia)

1. Click Create file system. On the next screen, you will be presented with different types of HAQM FSx file systems. Select HAQM FSx for Lustre and then click Next.

Select File System Type

2. Enter the File system name and Storage capacity, and set Data compression type to LZ4 to enable compression, as shown in the following image.

Enter file system name

3. In the Network & security section, choose the Virtual Private Cloud (VPC), VPC Security Groups, and a Subnet for the new file system.

Network and security

The selected security group must allow inbound access for HAQM FSx for Lustre traffic (TCP ports 988, 1018-1023) to enable HAQM EC2 instances in the same VPC to mount the HAQM FSx file system. For more information, see the documentation on file system access control with HAQM VPC in the FSx for Lustre User Guide.
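If you manage security group rules from the command line, the following sketch adds the required inbound rules; the security group ID (sg-0123456789abcdef0) is a placeholder for your own group, and here the source is the same group so member instances can reach the file system.

# Allow Lustre traffic on TCP port 988 from instances in the same security group.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 988 \
    --source-group sg-0123456789abcdef0

# Allow Lustre traffic on TCP ports 1018-1023 from the same security group.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 1018-1023 \
    --source-group sg-0123456789abcdef0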

HAQM FSx does not support backups on file systems linked to an HAQM S3 bucket, so we need to disable backups for our new file system.

4. Under the Backup and maintenance section, choose Disabled and then click Next.

Backup and maintenance

5. Review the options for accuracy and click Create file system. It will take a few minutes to initialize. When the file system is ready, the status will show Available.
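If you prefer to script this step, a minimal AWS CLI sketch follows; the subnet ID, security group ID, deployment type, and throughput values are placeholder assumptions that you should adjust for your own environment.

# Create a Lustre file system with LZ4 compression enabled (placeholder IDs and sizing).
aws fsx create-file-system \
    --file-system-type LUSTRE \
    --storage-capacity 1200 \
    --subnet-ids subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --lustre-configuration "DeploymentType=PERSISTENT_2,PerUnitStorageThroughput=125,DataCompressionType=LZ4" \
    --tags Key=Name,Value=new-lustre-file-system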

Step 2: Create source bucket

Create an HAQM S3 bucket in ACCOUNT-B. The detailed instructions for Creating a bucket can be found in the HAQM Simple Storage Service User Guide.

1. In our example, we choose the US East (N. Virginia) Region and name the bucket “new-lustre-file-system”. After we create the data repository association in the next section, we will return to further lock down the bucket policy.

2. In ACCOUNT-B, navigate to the HAQM S3 console and choose the bucket you created. Click on the Permissions tab, and in the Bucket policy section choose Edit. Replace the current policy with the policy below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:Get*",
                "s3:List*",
                "s3:PutBucketNotification"
            ],
            "Resource": [
                "arn:aws:s3:::new-lustre-file-system",
                "arn:aws:s3:::new-lustre-file-system/*"
            ],
            "Condition": {
                "StringLike": {
                    "aws:PrincipalArn": "arn:aws:iam::accountAID:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/AWSServiceRoleForFSxS3Access_fs-*"
                }
            }
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": [
                "s3:PutObject",
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::new-lustre-file-system",
                "arn:aws:s3:::new-lustre-file-system/*"
            ],
            "Condition": {
                "StringLike": {
                    "aws:PrincipalArn": "arn:aws:iam::accountAID:role/Your-AWS-Signed-In-Console-Role"
                }
            }
        }
    ]
}

3. In the aws:PrincipalArn condition values, replace accountAID with the ACCOUNT-A account ID and Your-AWS-Signed-In-Console-Role with the name of the role you use to sign in to ACCOUNT-A.

4. Replace “new-lustre-file-system” with your own bucket name. Click on Save changes.
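You can also apply the policy from the command line; a minimal sketch, assuming you saved the policy above locally as bucket-policy.json:

# Apply the cross-account bucket policy in ACCOUNT-B (bucket name is a placeholder).
aws s3api put-bucket-policy \
    --bucket new-lustre-file-system \
    --policy file://bucket-policy.json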

If you want HAQM FSx to encrypt data when writing to your S3 bucket, you need to set the default encryption on your S3 bucket to either SSE-S3 or SSE-KMS. For more information, refer to Working with server-side encrypted HAQM S3 buckets in the HAQM FSx for Lustre User Guide.

Step 3: Create data repository association

Now we will create a data repository association (DRA) to link the HAQM FSx for Lustre file system to our HAQM S3 bucket.

1. In ACCOUNT-A, navigate to the HAQM FSx console and select the file system we created. Select the Data repository tab and then choose Create data repository association.

Data Repository Association

2. Enter the File system path and the path to the HAQM S3 bucket. Note that for our example we used the entire bucket, but we could instead restrict the DRA to a specific prefix.

Data repository association information

3. Click Create. It will take a few minutes to initialize before the status shows Available.

File system "available"

4. When the DRA was created, HAQM FSx created a service-linked role for HAQM S3 access. Navigate to the AWS Identity and Access Management (IAM) console and search for the service role created for our new file system.

IAM console

5. Find the HAQM Resource Name (ARN) for the HAQM FSx for Lustre service-linked role and save this somewhere. We’ll need it for the bucket policy in the next section.

HAQM Resource Name
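If you'd rather retrieve the ARN from the command line, the service-linked role is created under a fixed IAM path, so you can list it directly; a sketch:

# List the HAQM FSx service-linked role(s) for S3 access and print their ARNs.
aws iam list-roles \
    --path-prefix /aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/ \
    --query "Roles[].Arn"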

Step 4: Lock down bucket policy

With the ARN from the previous section, we'll lock down the bucket policy on our HAQM S3 bucket. In ACCOUNT-B, navigate to the HAQM S3 console and choose the bucket you created. Click on the Permissions tab, and in the Bucket policy section choose Edit. Replace the current policy with the policy below.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Example permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::accountAID:role/Your-AWS-Signed-In-Console-Role",
                    "arn:aws:iam::accountAID:role/aws-service-role/s3.data-source.lustre.fsx.amazonaws.com/AWSServiceRoleForFSxS3Access_fs-XXXXXXX"
                ]
            },
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:Get*",
                "s3:List*",
                "s3:PutBucketNotification"
            ],
            "Resource": [
                "arn:aws:s3:::new-lustre-file-system",
                "arn:aws:s3:::new-lustre-file-system/*"
            ]
        }
    ]
}

Replace the values for the AWS principals with the ARN of the role you use to sign in to ACCOUNT-A and the ARN of the service-linked role (found in IAM) that was created by the data repository association in the previous section. Click on Save changes.

Testing the solution

Now we have an HAQM FSx for Lustre file system that is syncing with an HAQM S3 bucket in a different AWS account.

Step 1. Create the HAQM EC2 instance

To test the syncing, we need an HAQM EC2 instance so we can mount the file system.

In ACCOUNT-A, navigate to the HAQM EC2 console. Launch an instance using an HAQM Linux 2 AMI in the same VPC as your HAQM FSx file system. For instructions on how to launch an instance, refer to the documentation on launching your instance.
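As a command line alternative, a minimal launch sketch follows; the AMI ID, instance type, key pair, subnet, and security group are placeholders for your own values, and the subnet and security group should match the ones used for the file system.

# Launch a test instance in the same VPC and subnet as the file system (placeholder values).
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3.large \
    --key-name my-key-pair \
    --subnet-id subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0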

Step 2. Mount the file system

Connect to your Linux instance using one of several methods described in the documentation.

From the terminal window, mount the HAQM FSx file system. You can find instructions for how to mount your file system from the HAQM FSx console. Select the file system and then choose Attach.
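On HAQM Linux 2, the sequence generally looks like the sketch below; the file system DNS name and mount name are placeholders that the console's Attach dialog fills in for you, and the Lustre client package name can vary by kernel version, so check the FSx for Lustre User Guide for your AMI.

# Install the Lustre client on HAQM Linux 2 (package name may vary by kernel version).
sudo amazon-linux-extras install -y lustre

# Create a mount point and mount the file system (DNS name and mount name are placeholders).
sudo mkdir -p /fsx
sudo mount -t lustre -o relatime,flock fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com@tcp:/mountname /fsx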

Step 3. Create test files

After successfully mounting the file system, create a test file in the mounted directory /fsx/ns1/. We’ll call the file “file1.txt.”

test file in mounted directory

Switch to ACCOUNT-B, and check the HAQM S3 bucket you created. You should find file1.txt.

S3 bucket

Now upload another file directly to your HAQM S3 bucket. Let’s call it “file2.txt.”

Go back to the EC2 terminal and type ls -l. You should see file2.txt in /fsx/ns1/.

EC2 Terminal

You can repeat the testing process with deletes and updates.
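The whole round trip can also be scripted; a sketch of the sequence, assuming the placeholder bucket name and the /fsx/ns1 mount path from this walkthrough (allow a moment for import and export to propagate):

# On the EC2 instance: create a file on the file system.
echo "hello from lustre" > /fsx/ns1/file1.txt

# From ACCOUNT-B (or any principal with bucket access): confirm the file was exported to S3.
aws s3 ls s3://new-lustre-file-system/

# Upload a second file directly to S3 ...
echo "hello from s3" > file2.txt
aws s3 cp file2.txt s3://new-lustre-file-system/file2.txt

# ... and confirm it appears on the file system.
ls -l /fsx/ns1/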

Cleaning up

Now that we have tested the solution, complete the following four steps to delete the provisioned resources and avoid incurring unnecessary charges.

  1. Terminate the HAQM EC2 instance you used to mount and test the file system.
  2. Delete the HAQM FSx for Lustre file system you created in ACCOUNT-A.
  3. Delete the sample data and the HAQM S3 bucket you created in ACCOUNT-B.
  4. Delete the IAM service-linked role you created to provide HAQM S3 access to the HAQM FSx for Lustre file system.

Conclusion

HAQM FSx for Lustre’s native integration with S3 provides a proven, easy-to-deploy solution that combines the high performance of a scale-out Lustre file system with the benefits of a data lake built on HAQM S3. In this post, we demonstrated how to deploy a solution that keeps an HAQM FSx file system in sync with changes made to source data in an HAQM S3 bucket in a different AWS account. This solution helps enterprises scale their AWS environment by allowing data to be shared from a centralized enterprise data lake to specialized team accounts consuming that data for ML and HPC use cases.

Do you have other challenges serving data from an enterprise data lake to ML and HPC teams? Let us know in the comments if this approach improves your delivery times!

Justin Leto

Justin Leto is a Sr. Solutions Architect at HAQM Web Services with a specialization in machine learning. His passion is helping customers harness the power of machine learning and AI to drive business growth. Justin has presented at global conferences and lectured at universities. He leads the NYC machine learning and AI meetup. In his spare time, he enjoys offshore sailing and playing jazz. He lives in New York City with his wife and daughter.

Emad Tawfik

Emad Tawfik is a seasoned Senior Solutions Architect at HAQM Web Services, boasting more than a decade of experience. His specialization lies in the realm of Storage and Cloud solutions, where he excels in crafting cost-effective and scalable architectures for customers. Beyond the confines of his profession, Emad enjoys spending time with his family and the serenity of outdoor activities.

Juha Sarimaa

Juha Sarimaa is a Senior Solutions Architect Storage Specialist at AWS. He has 28 years of experience with enterprise-scale storage systems. He enjoys helping customers design and build petabyte-scale storage architectures to solve their biggest business challenges. Juha is originally from Finland, has lived in Australia, and currently resides in Connecticut. When not focusing on work, Juha is outside spending time with family and friends in the woods or on the water.