AWS Big Data Blog
Introducing shared VPC support on HAQM MWAA
In this post, we demonstrate automating deployment of HAQM Managed Workflows for Apache Airflow (HAQM MWAA) using customer-managed endpoints in a VPC, providing compatibility with shared, or otherwise restricted, VPCs.
Data scientists and engineers have made Apache Airflow a leading open source tool to create data pipelines due to its active open source community, familiar Python development as Directed Acyclic Graph (DAG) workflows, and extensive library of pre-built integrations. HAQM MWAA is a managed service for Airflow that makes it easy to run Airflow on AWS without the operational burden of having to manage the underlying infrastructure. For each Airflow environment, HAQM MWAA creates a single-tenant service VPC, which hosts the metadatabase that stores states and the web server that provides the user interface. HAQM MWAA further manages Airflow scheduler and worker instances in a customer-owned and managed VPC, in order to schedule and run tasks that interact with customer resources. Those Airflow containers in the customer VPC access resources in the service VPC via a VPC endpoint.
Many organizations choose to centrally manage their VPC using AWS Organizations, allowing a VPC in an owner account to be shared with resources in a different participant account. However, because creating a new route outside of a VPC is considered a privileged operation, participant accounts can’t create endpoints in owner VPCs. Furthermore, many customers don’t want to extend the security privileges required to create VPC endpoints to all users provisioning HAQM MWAA environments. In addition to VPC endpoints, customers also wish to restrict data egress via HAQM Simple Queue Service (HAQM SQS) queues, and HAQM SQS access is a requirement in the HAQM MWAA architecture.
Shared VPC support for HAQM MWAA adds the ability for you to manage your own endpoints within your VPCs, adding compatibility to shared and otherwise restricted VPCs. Specifying customer-managed endpoints also provides the ability to meet strict security policies by explicitly restricting VPC resource access to just those needed by your HAQM MWAA environments. This post demonstrates how customer-managed endpoints work with HAQM MWAA and provides examples of how to automate the provisioning of those endpoints.
Solution overview
Shared VPC support for HAQM MWAA allows multiple AWS accounts to create their Airflow environments in shared, centrally managed VPCs. The account that owns the VPC (owner) shares the two private subnets required by HAQM MWAA with other accounts (participants) that belong to the same organization in AWS Organizations. After the subnets are shared, the participants can view, create, modify, and delete HAQM MWAA environments in the subnets shared with them.
When users specify the need for a shared, or otherwise policy-restricted, VPC during environment creation, HAQM MWAA will first create the service VPC resources, then enter a pending state for up to 72 hours, with an HAQM EventBridge notification of the change in state. This allows owners to create the required endpoints on behalf of participants based on endpoint service information from the HAQM MWAA console or API, or programmatically via an AWS Lambda function and EventBridge rule, as in the example in this post.
After those endpoints are created on the owner account, the endpoint service in the single-tenant HAQM MWAA VPC will detect the endpoint connection event and resume environment creation. Should there be an issue, you can cancel environment creation by deleting the environment during this pending state.
This feature also allows you to remove the privileges to create, modify, and delete VPC endpoints from the AWS Identity and Access Management (IAM) principal creating HAQM MWAA environments, even when not using a shared VPC, because those permissions are instead required of the IAM principal creating the endpoint (the Lambda function in our example). Furthermore, the HAQM MWAA environment will provide the SQS queue HAQM Resource Name (ARN) used by the Airflow Celery Executor to queue tasks (the Celery Executor Queue), allowing you to explicitly enter those resources into your network policy rather than having to provide a more open and generalized permission.
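To make the idea of explicitly entering the Celery Executor Queue into a network policy more concrete, here is a minimal Python sketch that adds one queue ARN to the first Allow statement of an endpoint policy document. The function name, the policy shape, and the example ARNs are illustrative assumptions, not values taken from HAQM MWAA.

```python
import copy
import json


def allow_queue(policy_document: dict, queue_arn: str) -> dict:
    """Return a copy of the policy whose first Allow statement lists queue_arn."""
    updated = copy.deepcopy(policy_document)
    for statement in updated.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        resources = statement.get("Resource", [])
        if isinstance(resources, str):
            # A single resource may be written as a bare string.
            resources = [resources]
        # Leave the list alone if the queue (or a wildcard) is already allowed.
        if "*" not in resources and queue_arn not in resources:
            resources = resources + [queue_arn]
        statement["Resource"] = resources
        break
    return updated


if __name__ == "__main__":
    # Hypothetical ARNs, for illustration only.
    policy = {
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "sqs:*",
                "Resource": ["arn:aws:sqs:us-east-1:111122223333:existing-queue"],
            }
        ]
    }
    celery_queue = "arn:aws:sqs:us-east-1:111122223333:airflow-celery-example"
    print(json.dumps(allow_queue(policy, celery_queue), indent=2))
```

The function returns a modified copy rather than changing its input, so the original and updated documents can be compared before the new policy is applied to an endpoint.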
In this example, we create the VPC and HAQM MWAA environment in the same account. For shared VPCs across accounts, the EventBridge rule and Lambda function would exist in the owner account, and the HAQM MWAA environment would be created in the participant account. See Sending and receiving HAQM EventBridge events between AWS accounts for more information.
Prerequisites
You should have the following prerequisites:
- An AWS account
- An AWS user in that account, with permissions to create VPCs, VPC endpoints, and HAQM MWAA environments
- An HAQM Simple Storage Service (HAQM S3) bucket in that account, with a folder called dags
Create the VPC
We begin by creating a restrictive VPC using an AWS CloudFormation template, in order to simulate creating the necessary VPC endpoint and modifying the SQS endpoint policy. If you want to use an existing VPC, you can proceed to the next section.
- Download the CloudFormation template referenced in Option three: Creating an HAQM VPC network without Internet access.
- Extract the file cfn-vpc-private-bjs.yml from the downloaded ZIP archive.
- Now we edit our CloudFormation template to restrict access to HAQM SQS. In cfn-vpc-private-bjs.yml, edit the SqsVpcEndoint section to appear as follows:
This additional policy document entry prevents HAQM SQS egress to any resource not explicitly listed.
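As an illustration of a policy that allows only explicitly listed resources, the following Python sketch builds such an endpoint policy document. It is not the template snippet from this walkthrough; the function name, the choice of Principal and Action, and the example ARN are assumptions made for the example.

```python
import json


def restricted_sqs_endpoint_policy(allowed_queue_arns):
    """Build a policy document that allows SQS access only to the listed queues."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "sqs:*",
                # Anything not listed here is not matched by this Allow statement.
                "Resource": sorted(allowed_queue_arns),
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical queue ARN, for illustration only.
    arns = ["arn:aws:sqs:us-east-1:111122223333:example-queue"]
    print(json.dumps(restricted_sqs_endpoint_policy(arns), indent=2))
```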
Now we can create our CloudFormation stack.
- On the AWS CloudFormation console, choose Create stack.
- Select Upload a template file.
- Choose Choose file.
- Browse to the file you modified.
- Choose Next.
- For Stack name, enter MWAA-Environment-VPC.
- Choose Next until you reach the review page.
- Choose Submit.
Create the Lambda function
We have two options for self-managing our endpoints: manual and automated. In this example, we create a Lambda function that responds to the HAQM MWAA EventBridge notification. You could also use the EventBridge notification to send an HAQM Simple Notification Service (HAQM SNS) message, such as an email, to someone with permission to create the VPC endpoint manually.
First, we create a Lambda function to respond to the EventBridge event that HAQM MWAA will emit.
- On the Lambda console, choose Create function.
- For Name, enter mwaa-create-lambda.
- For Runtime, choose Python 3.11.
- Choose Create function.
- For Code, in the Code source section, for lambda_function, enter the following code:
- Choose Deploy.
- On the Configuration tab of the Lambda function, in the General configuration section, choose Edit.
- For Timeout, increase to 5 minutes, 0 seconds.
- Choose Save.
- In the Permissions section, under Execution role, choose the role name to edit the permissions of this function.
- For Permission policies, choose the link under Policy name.
- Choose Edit and add a comma and the following statement:
The complete policy should look similar to the following:
- Choose Next until you reach the review page.
- Choose Save changes.
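As a rough sketch of what a function like mwaa-create-lambda might do, the following Python reacts to a PENDING notification by looking up the VPC from the environment's subnets and creating an interface endpoint for each endpoint service. The event field names (status, networkConfiguration, databaseVpcEndpointService, webserverVpcEndpointService, webserverAccessMode) are assumptions about the notification payload, and the ec2 client is passed in as an argument so the logic can be exercised without AWS access. Updating the SQS endpoint policy is left out to keep the example short.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def endpoint_services(detail: dict) -> list:
    """List the endpoint services that need an endpoint in the customer VPC."""
    services = [detail["databaseVpcEndpointService"]]
    # Assumption: a web server endpoint is only needed for private web server access.
    if detail.get("webserverAccessMode") == "PRIVATE_ONLY":
        services.append(detail["webserverVpcEndpointService"])
    return services


def create_endpoints(detail: dict, ec2) -> list:
    """Create one interface endpoint per endpoint service and return their IDs."""
    network = detail["networkConfiguration"]
    subnet_ids = network["subnetIds"]
    # Assumption: the event carries subnet IDs, not a VPC ID, so look the VPC up.
    subnets = ec2.describe_subnets(SubnetIds=subnet_ids)["Subnets"]
    vpc_id = subnets[0]["VpcId"]
    created = []
    for service_name in endpoint_services(detail):
        logger.info("Creating endpoint to %s", service_name)
        response = ec2.create_vpc_endpoint(
            VpcEndpointType="Interface",
            VpcId=vpc_id,
            ServiceName=service_name,
            SubnetIds=subnet_ids,
            SecurityGroupIds=network["securityGroupIds"],
        )
        created.append(response["VpcEndpoint"]["VpcEndpointId"])
    return created


def lambda_handler(event, context):
    detail = event.get("detail", {})
    if detail.get("status") != "PENDING":
        logger.info("Ignoring status %s", detail.get("status"))
        return []
    import boto3  # imported lazily; provided by the Lambda Python runtime

    return create_endpoints(detail, boto3.client("ec2"))
```

With this shape, the execution role would need at least ec2:DescribeSubnets and ec2:CreateVpcEndpoint, because those are the only EC2 calls the sketch makes.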
Create an EventBridge rule
Next, we configure EventBridge to send the HAQM MWAA notifications to our Lambda function.
- On the EventBridge console, choose Create rule.
- For Name, enter mwaa-create.
- Select Rule with an event pattern.
- Choose Next.
- For Creation method, choose Use pattern form.
- Choose Edit pattern.
- For Event pattern, enter the following:
- Choose Next.
- For Select a target, choose Lambda function.
You may also specify an SNS notification in order to receive a message when the environment state changes.
- For Function, choose mwaa-create-lambda.
- Choose Next until you reach the final section, then choose Create rule.
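The same rule could also be set up programmatically. The sketch below is one way that might look in Python: it defines an event pattern, a simplified local matcher to show what the pattern selects, and a helper that calls put_rule and put_targets on an EventBridge client passed in by the caller. The source and detail-type values in the pattern are assumptions about how HAQM MWAA labels its status notifications, and the target ID is arbitrary.

```python
import json

RULE_NAME = "mwaa-create"

# Assumption: status notifications are published with this source and detail-type.
EVENT_PATTERN = {
    "source": ["aws.airflow"],
    "detail-type": ["MWAA Environment Status Change"],
}


def matches(event: dict, pattern: dict = EVENT_PATTERN) -> bool:
    """Simplified matcher: every top-level pattern key must list the event's value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def create_rule(events_client, lambda_arn: str) -> str:
    """Create the rule, point it at the Lambda function, and return the rule ARN."""
    rule = events_client.put_rule(
        Name=RULE_NAME,
        EventPattern=json.dumps(EVENT_PATTERN),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=RULE_NAME,
        Targets=[{"Id": "mwaa-create-lambda", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]
```

When a rule is created through the API rather than the console, the Lambda function also needs a resource-based permission that lets EventBridge invoke it; that step is not shown here.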
Create an HAQM MWAA environment
Finally, we create an HAQM MWAA environment with customer-managed endpoints.
- On the HAQM MWAA console, choose Create environment.
- For Name, enter a unique name for your environment.
- For Airflow version, choose the latest Airflow version.
- For S3 bucket, choose Browse S3 and choose your S3 bucket, or enter the HAQM S3 URI.
- For DAGs folder, choose Browse S3 and choose the dags/ folder in your S3 bucket, or enter the HAQM S3 URI.
- Choose Next.
- For Virtual Private Cloud, choose the VPC you created earlier.
- For Web server access, choose Public network (Internet accessible).
- For Security groups, deselect Create new security group.
- Choose the shared VPC security group created by the CloudFormation template.
Because the security groups of the AWS PrivateLink endpoints from the earlier step are self-referencing, you must choose the same security group for your HAQM MWAA environment.
- For Endpoint management, choose Customer managed endpoints.
- Keep the remaining settings as default and choose Next.
- Choose Create environment.
When your environment is available, you can access it via the Open Airflow UI link on the HAQM MWAA console.
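For readers who prefer the API to the console, the following Python sketch assembles roughly equivalent create_environment parameters, with EndpointManagement set to CUSTOMER. The helper name and all ARNs and IDs in the usage example are placeholders. Unlike the console flow above, the API call needs an execution role ARN to be supplied, which is an extra assumption here.

```python
def build_environment_request(name, bucket_arn, execution_role_arn,
                              subnet_ids, security_group_ids):
    """Assemble keyword arguments for the MWAA create_environment call."""
    if len(subnet_ids) != 2:
        # The walkthrough relies on the two private subnets required by MWAA.
        raise ValueError("expected exactly two private subnets")
    return {
        "Name": name,
        "SourceBucketArn": bucket_arn,
        "DagS3Path": "dags",
        "ExecutionRoleArn": execution_role_arn,
        "NetworkConfiguration": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        "WebserverAccessMode": "PUBLIC_ONLY",
        # Ask MWAA to wait for endpoints that we create ourselves.
        "EndpointManagement": "CUSTOMER",
    }


def create_environment(request: dict):
    import boto3  # imported lazily so the builder can be used without AWS access

    return boto3.client("mwaa").create_environment(**request)


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    request = build_environment_request(
        name="example-environment",
        bucket_arn="arn:aws:s3:::example-bucket",
        execution_role_arn="arn:aws:iam::111122223333:role/example-mwaa-role",
        subnet_ids=["subnet-aaaa", "subnet-bbbb"],
        security_group_ids=["sg-cccc"],
    )
    print(request)
```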
Clean up
Cleaning up resources that are not actively being used reduces costs and is a best practice. If you don’t delete your resources, you can incur additional charges. To clean up your resources, complete the following steps:
- Delete your HAQM MWAA environment, EventBridge rule, and Lambda function.
- Delete the VPC endpoints created by the Lambda function.
- Delete any security groups created, if applicable.
- After the above resources have completed deletion, delete the CloudFormation stack to ensure that you have removed all of the remaining resources.
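Deleting the VPC endpoints can also be scripted. As a minimal sketch, the helper below finds endpoints in a given VPC that point at a given set of endpoint service names and deletes them. The function name and arguments are hypothetical, the EC2 client is passed in by the caller, and pagination of the describe call is ignored for brevity.

```python
def delete_endpoints_for_services(ec2, vpc_id, service_names):
    """Delete endpoints in vpc_id whose service name is listed; return their IDs."""
    response = ec2.describe_vpc_endpoints(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "service-name", "Values": list(service_names)},
        ]
    )
    endpoint_ids = [e["VpcEndpointId"] for e in response["VpcEndpoints"]]
    if endpoint_ids:
        ec2.delete_vpc_endpoints(VpcEndpointIds=endpoint_ids)
    return endpoint_ids
```

A call might look like delete_endpoints_for_services(boto3.client("ec2"), vpc_id, [database_service_name]), where the service name is the one the endpoint was created against.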
Summary
This post described how to automate environment creation with shared VPC support in HAQM MWAA. This gives you the ability to manage your own endpoints within your VPC, adding compatibility to shared, or otherwise restricted, VPCs. Specifying customer-managed endpoints also provides the ability to meet strict security policies by explicitly restricting VPC resource access to just those needed by your HAQM MWAA environments. To learn more about HAQM MWAA, refer to the HAQM MWAA User Guide. For more posts about HAQM MWAA, visit the HAQM MWAA resources page.
About the author
John Jackson has over 25 years of software experience as a developer, systems architect, and product manager in both startups and large corporations and is the AWS Principal Product Manager responsible for HAQM MWAA.