AWS Big Data Blog
Best practices for least privilege configuration in HAQM MWAA
HAQM Managed Workflows for Apache Airflow (HAQM MWAA) provides a secure and managed environment to run Apache Airflow on AWS. Airflow is often used in highly regulated industries, such as finance and healthcare. To strengthen their security posture, these customers might want to restrict access and traffic beyond what the HAQM MWAA default configuration provides. This post covers some recommended practices.
The principle of least privilege is a fundamental tenet that should be followed diligently. When it comes to configuring AWS services, it’s essential to grant only the minimum required permissions to resources, avoiding overly broad or permissive policies.
In this post, we explore how to apply the principle of least privilege to your HAQM MWAA environment by tightening network security using security groups, network access control lists (ACLs), and virtual private cloud (VPC) endpoints. We also discuss the HAQM MWAA execution and deployment roles and their respective permissions.
Understanding the HAQM MWAA environment
When an HAQM MWAA environment is created, resources are created in an AWS managed service VPC and your customer managed VPC. In the customer VPC provided at environment creation, the necessary resources to run the Airflow environment are deployed, including schedulers and workers running on HAQM Elastic Container Service (HAQM ECS) clusters. These clusters are deployed in your VPC and they are assigned elastic network interfaces (ENIs) with private IP addresses in the customer account. These ENIs span private subnets across two Availability Zones to connect to the Airflow database and web server, which reside in the service-owned account (if in private access mode). The following diagram illustrates this architecture.
VPC security groups act as virtual firewalls that control network traffic at the ENI (instance) level. Security groups are stateful, meaning that return traffic for an allowed inbound connection is automatically permitted outbound, and vice versa. A newly created security group starts with no inbound rules and an outbound rule allowing all traffic. A security group with no inbound rules therefore denies all ingress traffic except responses to connections that were initiated through the 0.0.0.0/0 outbound rule.
HAQM MWAA offers two web server access modes inside the customer VPC: public and private. Public web server mode must have a way for traffic to access the web servers in the customer-owned VPC through the public internet. This requires routing to the public internet using public subnets and a NAT gateway. A NAT gateway can be used to provide internet access for resources in private subnets. With private access mode, the security group for the HAQM MWAA environment doesn’t need to allow traffic to and from the NAT gateway, only granting access to the Airflow UI to users with appropriate permissions from within the VPC. An Application Load Balancer is only provisioned in public mode to route traffic to the public web servers. The customer must provision the rest of the networking components.
If your HAQM MWAA environment needs to communicate with resources outside your VPC (such as external data sources or APIs), you might need to configure appropriate security group rules and routing to allow the necessary traffic. In such cases, you would typically use a NAT gateway or VPN connection to reach the external resources, and VPC endpoints to reach AWS resources.
For tighter security restrictions, you can run an environment with private routing and no internet access, apply finer-grained security group rules, and use VPC endpoint policies. Because this post is about least privilege, we focus on the minimum security requirements needed for an HAQM MWAA environment.
Security groups: Minimizing permissions
Your HAQM MWAA environment will have a security group associated with the environment's resources in your VPC. This security group is also used by the ENIs created by the interface VPC endpoint that is used to communicate with the database and web server. By default, security groups deny all inbound traffic, so each rule must explicitly state the ports and the source from which network traffic is allowed. At a minimum, the HAQM MWAA environment must allow traffic to and from the HAQM Aurora PostgreSQL-Compatible Edition metadata database that is owned and managed by HAQM MWAA. The metadata database is a crucial component of Airflow that acts as a centralized source of truth for task execution, configuration, and monitoring. Both the scheduler and workers require access to this database to perform their respective roles in orchestrating and running tasks. This database listens on TCP port 5432. Additionally, the web server traffic can be restricted to HTTPS through TCP port 443. At a minimum, the HAQM MWAA security group must have the two self-referencing inbound rules detailed in the following table.
Type | Protocol | Port Range | Source Type | Source |
Custom TCP | TCP | 5432 | Custom | sg-xxxxx / my-mwaa-vpc-security-group |
HTTPS | TCP | 443 | Custom | sg-xxxxx / my-mwaa-vpc-security-group |
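As a rough illustration, the two rules in the table can be expressed in the structure that the EC2 API expects for ingress permissions. This is a minimal sketch that only builds and prints the rule documents; the group ID is the placeholder from the table, and the helper name `self_referencing_rule` is hypothetical.

```python
import json

# Placeholder from the table above; replace with your environment's security group ID.
MWAA_SG_ID = "sg-xxxxx"


def self_referencing_rule(port: int, group_id: str, description: str) -> dict:
    """Build one TCP ingress rule whose source is a security group (hypothetical helper)."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": group_id, "Description": description}],
    }


ingress_rules = [
    self_referencing_rule(5432, MWAA_SG_ID, "Airflow metadata database"),
    self_referencing_rule(443, MWAA_SG_ID, "Airflow web server over HTTPS"),
]

print(json.dumps(ingress_rules, indent=2))

# Assuming boto3 is installed and credentials are configured, the rules could be applied with:
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId=MWAA_SG_ID, IpPermissions=ingress_rules)
```

Because the source of each rule is the security group itself, only resources that carry the same security group can reach ports 5432 and 443.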
Many customers have other AWS resources residing in VPCs, to which the HAQM MWAA workers need access. These resources can be granted network access in a private routing configuration using security groups as well. If the resource sits in the same security group, add an additional inbound rule with the port needed. For example, if an HAQM Redshift cluster sits in the same security group, add the following rule.
Type | Protocol | Port Range | Source Type | Source |
Custom TCP | TCP | 5439 | Custom | sg-xxxxx / my-mwaa-vpc-security-group |
If the Redshift cluster is in a different security group, change the source to the Redshift security group.
Type | Protocol | Port Range | Source Type | Source |
Custom TCP | TCP | 5439 | Custom | sg-xxxxx / redshift-security-group |
If the resources are in another VPC, then VPC peering must be enabled before referencing that other VPC’s security group. For resources that don’t reside in a subnet, a VPC endpoint will also provide private routing to and from the HAQM MWAA environment and those resources. For example, a VPC endpoint for HAQM Simple Storage Service (HAQM S3) can provide enhanced security, improved performance, and lower costs.
Network ACLs: Minimizing permissions
Network ACLs can manage (by allow or deny rules) inbound and outbound traffic at the subnet level. An ACL is stateless, which means that inbound and outbound rules must be specified separately and explicitly. It is used to specify the types of network traffic that are allowed in or out from the instances in a VPC network.
Every HAQM VPC has a default ACL that allows all inbound and outbound traffic, with a rule as follows.
Rule number | Type | Protocol | Port Range | Source | Allow/Deny |
100 | All IPv4 traffic | All | All | 0.0.0.0/0 | Allow |
* | All IPv4 traffic | All | All | 0.0.0.0/0 | Deny |
You can edit the default ACL rules or create a custom ACL and attach it to your subnets. A subnet can only have one ACL attached to it at any time, but one ACL can be attached to multiple subnets. To implement least privilege in your HAQM MWAA environment, restrict the inbound ACL to allow traffic from the metadata database and web server, and restrict the outbound ACL to allow traffic only to the clients in the private subnet. Note that the following examples use an example private IP range for the subnet.
Inbound NACL
Rule number | Type | Protocol | Port Range | Source | Allow/Deny | Comments |
100 | Custom TCP | TCP | 5432 | 10.192.21.0/24 | Allow | Allow inbound database traffic from private subnet |
110 | HTTPS | TCP | 443 | 10.192.21.0/24 | Allow | Allow inbound HTTPS traffic from private subnet |
* | All traffic | All | All | 0.0.0.0/0 | Deny | Denies all inbound IPv4 traffic not already handled by a preceding rule (not modifiable) |
Outbound NACL
Rule number | Type | Protocol | Port Range | Destination | Allow/Deny | Comments |
100 | Custom TCP | TCP | 1024-65535 | 10.192.21.0/24 | Allow | Allows outbound return IPv4 traffic to clients in private subnet |
* | All traffic | All | All | 0.0.0.0/0 | Deny | Denies all outbound IPv4 traffic not already handled by a preceding rule (not modifiable) |
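Because network ACLs are stateless and evaluated in rule-number order, it can help to model how a packet is matched. The following is a simplified sketch, not an AWS API: the `Rule` type and `is_allowed` function are hypothetical, and the rules assume the example subnet 10.192.21.0/24 for both directions.

```python
from ipaddress import ip_address, ip_network
from typing import NamedTuple


class Rule(NamedTuple):
    number: int
    protocol: str
    ports: range
    cidr: str
    allow: bool


# Example rules modeled on the inbound and outbound tables (assumed subnet: 10.192.21.0/24).
INBOUND = [
    Rule(100, "tcp", range(5432, 5433), "10.192.21.0/24", True),
    Rule(110, "tcp", range(443, 444), "10.192.21.0/24", True),
]
OUTBOUND = [
    Rule(100, "tcp", range(1024, 65536), "10.192.21.0/24", True),
]


def is_allowed(rules, protocol: str, port: int, peer_ip: str) -> bool:
    """Return the decision of the lowest-numbered matching rule; deny if none match."""
    for rule in sorted(rules, key=lambda r: r.number):
        if (
            rule.protocol == protocol
            and port in rule.ports
            and ip_address(peer_ip) in ip_network(rule.cidr)
        ):
            return rule.allow
    return False  # the implicit "*" rule denies everything else


print(is_allowed(INBOUND, "tcp", 5432, "10.192.21.15"))   # database traffic from the subnet
print(is_allowed(INBOUND, "tcp", 22, "10.192.21.15"))     # SSH is not listed
print(is_allowed(OUTBOUND, "tcp", 50000, "10.192.21.15")) # return traffic on an ephemeral port
```

The outbound rule covers only ephemeral ports because, unlike a security group, the ACL does not remember that the inbound connection was allowed; the return traffic needs its own rule.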
VPC endpoints: Minimizing permissions
When you create an HAQM MWAA environment, it is deployed within a VPC. This allows you to control the network access and security of your Airflow deployment. However, some customer workloads running in the HAQM MWAA environment might need to orchestrate tasks using other AWS services that reside outside of your VPC, such as HAQM S3 to access files, AWS Glue to start ETL (extract, transform, and load) jobs, or HAQM Redshift to run data warehouse queries. To establish a secure and private connection between your HAQM MWAA environment and these AWS services, you can use VPC endpoints. VPC endpoints are virtual devices that are provisioned within your VPC and act as an entry point for the specified AWS service, allowing your HAQM MWAA environment to communicate with the service using a private IP address, without needing to go through the public internet. The following diagram illustrates this architecture.
VPC endpoints allow you to keep your HAQM MWAA environment’s network traffic within the AWS network, reducing the exposure to the public internet and enhancing the overall security of your Airflow deployment. Although private VPC endpoints are automatically created for the database and web server, a least privileged environment without internet access needs additional VPC endpoints for the other services that HAQM MWAA requires: HAQM S3, HAQM Simple Queue Service (HAQM SQS), HAQM CloudWatch, and optionally AWS Key Management Service (AWS KMS). For more details, see Creating the required VPC service endpoints in an HAQM VPC with private routing. Outside of the necessary services, many customers run HAQM MWAA workflows that orchestrate additional AWS services, such as HAQM Redshift, HAQM EMR, and AWS Glue. Let’s look at an example VPC endpoint for HAQM Redshift, which Airflow DAGs commonly call through the Redshift operators when a workflow uses HAQM Redshift as a data warehouse. For more information on creating HAQM VPC interface endpoints, see Access an AWS service using an interface VPC endpoint.
Create a VPC endpoint
Complete the following steps to create a VPC endpoint using HAQM Virtual Private Cloud (HAQM VPC):
- On the HAQM VPC console, create a new VPC endpoint for the com.amazonaws.region.redshift service, where region is the AWS Region where your HAQM MWAA environment and Redshift cluster are located. Make sure that private DNS is enabled.
- Create a VPC endpoint policy. This can be used to limit access to the Redshift cluster only to the HAQM MWAA environment, preventing unauthorized access from other resources. A policy for this purpose has the following fields:
  - The Version field specifies the policy language version.
  - The Statement section contains a single statement that allows the specified actions on the Redshift cluster.
  - The Effect field is set to Allow, which means the policy grants the specified permissions.
  - The Principal field specifies the AWS Identity and Access Management (IAM) role used as your HAQM MWAA execution role, which is authorized to access the Redshift cluster.
  - The Action field lists the specific Redshift actions that the HAQM MWAA execution role is allowed to perform, such as describing the cluster, getting cluster credentials, and restoring from a snapshot.
  - The Resource field specifies the HAQM Resource Name (ARN) of the Redshift cluster that the policy applies to.
- Associate the VPC endpoint with the correct route table. This route table should be used by the subnets where your HAQM MWAA environment is deployed. If using a VPC interface endpoint, associate the endpoint with the two private subnets and security group used by HAQM MWAA.
- Make sure that the security groups associated with the HAQM MWAA environment and the Redshift cluster allow the necessary inbound and outbound traffic between them. This typically includes allowing access on the Redshift port (typically 5439) from the HAQM MWAA environment’s security group.
- In the Airflow UI of your HAQM MWAA environment, under Admin, Connections, update the Redshift connection details to use the VPC endpoint address instead of the public Redshift endpoint. This makes sure that the connection between HAQM MWAA and HAQM Redshift is secure and stays within the VPC.
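The endpoint policy described in the steps above could look like the following minimal sketch. The account ID, Region, role name, and cluster name are hypothetical placeholders, and the three actions are one plausible mapping of "describing the cluster, getting cluster credentials, and restoring from a snapshot" to Redshift IAM actions.

```python
import json

# Hypothetical placeholders; substitute your own values.
ACCOUNT_ID = "111122223333"
REGION = "us-east-1"
EXECUTION_ROLE_NAME = "my-mwaa-execution-role"
CLUSTER_NAME = "my-redshift-cluster"

endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Only the environment's execution role may use this endpoint.
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_ID}:role/{EXECUTION_ROLE_NAME}"},
            "Action": [
                "redshift:DescribeClusters",
                "redshift:GetClusterCredentials",
                "redshift:RestoreFromClusterSnapshot",
            ],
            # Some Redshift actions are authorized against other resource types
            # (for example database users); check the service authorization
            # reference before relying on this resource list.
            "Resource": f"arn:aws:redshift:{REGION}:{ACCOUNT_ID}:cluster:{CLUSTER_NAME}",
        }
    ],
}

print(json.dumps(endpoint_policy, indent=2))
```

The printed JSON can be pasted into the policy editor when creating the endpoint. An endpoint policy only limits what can pass through the endpoint; the execution role still needs matching IAM permissions.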
By configuring VPC endpoints for the AWS services your HAQM MWAA environment needs to access, you can provide secure, private, and efficient communication between your Airflow deployment and AWS resources.
Restricting traffic within AWS with customer managed endpoints for HAQM MWAA resources
As mentioned earlier, HAQM MWAA integrates with various AWS services, such as CloudWatch for logging, HAQM S3 for DAGs and requirements, HAQM SQS as a messaging middleware, and optionally AWS KMS for encryption. You can create VPC endpoints for these services to make sure traffic stays within the AWS network. Access to these endpoints can be restricted by allowing only the HAQM MWAA security group as the ingress source. For details on how to create these endpoints and policies, see Introducing shared VPC support on HAQM MWAA. If the HAQM MWAA environment was updated after April 2, 2024, it will be on AWS Fargate v1.4 and will not use HAQM Elastic Container Registry (HAQM ECR) and therefore you will not need to create a VPC endpoint for it.
Managing permissions to deploy an HAQM MWAA environment
To create and deploy an HAQM MWAA environment, you need to have the appropriate permissions granted to your IAM user or role. The required permissions can be granted through an IAM policy attached to your user or role. When you create an HAQM MWAA environment, you can specify an execution role that will be assumed by the Airflow workers to perform tasks. The execution role should have the necessary permissions to access the required AWS services and resources based on your workflow requirements. It’s important to follow the principle of least privilege when granting permissions to IAM roles and users. You should only grant the minimum permissions required for your HAQM MWAA environment and Airflow workflows to function correctly.
HAQM MWAA trust policy
HAQM MWAA needs to be able to assume the execution role in order to perform actions on your behalf. To do this, create a trust policy, allowing the HAQM MWAA service the ability to AssumeRole. To avoid the confused deputy problem, we add a condition to the trust policy, and replace the AWS account number and Region as needed. The following is an example policy:
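As a minimal sketch of such a trust policy, the following builds the document in Python and prints it as JSON. The account ID and Region are hypothetical placeholders, and the two service principals and the condition keys are assumptions based on the commonly documented pattern for this service.

```python
import json

# Hypothetical placeholders; replace with your account number and Region.
ACCOUNT_ID = "111122223333"
REGION = "us-east-1"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["airflow.amazonaws.com", "airflow-env.amazonaws.com"]
            },
            "Action": "sts:AssumeRole",
            # Confused deputy protection: the role can only be assumed on behalf
            # of environments in this account and Region.
            "Condition": {
                "ArnLike": {
                    "aws:SourceArn": f"arn:aws:airflow:{REGION}:{ACCOUNT_ID}:environment/*"
                },
                "StringEquals": {"aws:SourceAccount": ACCOUNT_ID},
            },
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

To tighten the policy further, the wildcard at the end of the source ARN could be replaced with a single environment name.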
VPC endpoint permissions for the deployer role
Although the service-linked role creates the VPC endpoints, the deployer role requires permissions to create VPC endpoints and perform a dry run. You can limit these permissions by allowing the ec2:CreateVpcEndpoint action and specifying resource ARNs for VPC endpoints, VPCs, subnets, and security groups. Additionally, you can use the aws:CalledVia condition key to restrict access to the airflow.amazonaws.com service.
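A statement along these lines could be sketched as follows. All resource IDs are hypothetical placeholders, and the `ForAnyValue:StringEquals` operator is an assumption chosen because aws:CalledVia is a multivalued condition key.

```python
import json

# Hypothetical placeholders; substitute the resources your environment uses.
ACCOUNT_ID = "111122223333"
REGION = "us-east-1"
VPC_ID = "vpc-0123456789abcdef0"
SUBNET_IDS = ["subnet-0aaaaaaaaaaaaaaa1", "subnet-0bbbbbbbbbbbbbbb2"]
SECURITY_GROUP_ID = "sg-0123456789abcdef0"

ec2_arn = f"arn:aws:ec2:{REGION}:{ACCOUNT_ID}"

deployer_statement = {
    "Effect": "Allow",
    "Action": "ec2:CreateVpcEndpoint",
    "Resource": [
        f"{ec2_arn}:vpc-endpoint/*",  # the endpoint ID is not known before creation
        f"{ec2_arn}:vpc/{VPC_ID}",
        *[f"{ec2_arn}:subnet/{subnet_id}" for subnet_id in SUBNET_IDS],
        f"{ec2_arn}:security-group/{SECURITY_GROUP_ID}",
    ],
    # Only honor the request when it is made through the HAQM MWAA service.
    "Condition": {
        "ForAnyValue:StringEquals": {"aws:CalledVia": ["airflow.amazonaws.com"]}
    },
}

print(json.dumps(deployer_statement, indent=2))
```

Listing the VPC, subnets, and security group explicitly means the deployer role cannot create endpoints in any other network, even through the service.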
HAQM MWAA execution role: Required permissions
When creating an HAQM MWAA environment, you need to specify an execution role that grants the necessary permissions for Airflow to interact with other AWS services. Instead of using a wildcard policy, you can create a custom policy with the minimum required permissions.
The following is an example of an execution role policy that allows HAQM MWAA to interact with various services using an AWS managed key:
This policy grants HAQM MWAA the necessary permissions to interact with CloudWatch Logs, HAQM S3, HAQM SQS, and AWS KMS when using the AWS managed key offering, while explicitly specifying the resources it can access. You can further refine this policy based on your specific requirements.
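To make the idea of explicitly scoped resources concrete, here is a trimmed sketch of an execution role policy for the AWS managed key case. It is not the complete policy: the names are hypothetical placeholders, the action lists are abbreviated, and the resource patterns (log group prefix, queue name, and the NotResource form of the KMS statement) are assumptions based on the commonly documented pattern.

```python
import json

# Hypothetical placeholders; substitute your own values.
REGION = "us-east-1"
ACCOUNT_ID = "111122223333"
ENV_NAME = "my-mwaa-environment"
DAG_BUCKET = "my-mwaa-dag-bucket"

bucket_arns = [f"arn:aws:s3:::{DAG_BUCKET}", f"arn:aws:s3:::{DAG_BUCKET}/*"]

execution_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Read DAGs, plugins, and requirements from one bucket only.
            "Effect": "Allow",
            "Action": ["s3:GetObject*", "s3:GetBucket*", "s3:List*"],
            "Resource": bucket_arns,
        },
        {   # Write and read logs in this environment's log groups only.
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:CreateLogGroup",
                       "logs:PutLogEvents", "logs:GetLogEvents"],
            "Resource": f"arn:aws:logs:{REGION}:{ACCOUNT_ID}:log-group:airflow-{ENV_NAME}-*",
        },
        {   # The Celery queue lives in a service-owned account, hence the account wildcard.
            "Effect": "Allow",
            "Action": ["sqs:ChangeMessageVisibility", "sqs:DeleteMessage",
                       "sqs:GetQueueAttributes", "sqs:GetQueueUrl",
                       "sqs:ReceiveMessage", "sqs:SendMessage"],
            "Resource": f"arn:aws:sqs:{REGION}:*:airflow-celery-*",
        },
        {   # Use the AWS managed key only through SQS, never keys in your own account.
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey*", "kms:Encrypt"],
            "NotResource": f"arn:aws:kms:*:{ACCOUNT_ID}:key/*",
            "Condition": {"StringLike": {"kms:ViaService": [f"sqs.{REGION}.amazonaws.com"]}},
        },
    ],
}

print(json.dumps(execution_policy, indent=2))
```

A production policy typically needs a few additional statements, such as permission to publish environment metrics, so treat this as a starting point to compare against the official reference.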
The following is an example of an execution policy that allows HAQM MWAA to interact with various services using a KMS customer managed key:
For the use case of using the customer managed key, attach the following JSON policy to the key to provide access to the Airflow logs in CloudWatch Logs:
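A key policy statement of this kind might look like the following sketch. The account ID and Region are hypothetical placeholders, and the action list and encryption context condition are assumptions based on the pattern CloudWatch Logs documents for encrypting log groups with a customer managed key.

```python
import json

# Hypothetical placeholders; substitute your own values.
REGION = "us-east-1"
ACCOUNT_ID = "111122223333"

logs_key_statement = {
    "Sid": "AllowCloudWatchLogsToUseTheKey",
    "Effect": "Allow",
    "Principal": {"Service": f"logs.{REGION}.amazonaws.com"},
    "Action": [
        "kms:Encrypt*",
        "kms:Decrypt*",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:Describe*",
    ],
    # In a key policy, "*" refers to the key the policy is attached to.
    "Resource": "*",
    # Restrict use of the key to log groups in this account and Region.
    "Condition": {
        "ArnLike": {
            "kms:EncryptionContext:aws:logs:arn": f"arn:aws:logs:{REGION}:{ACCOUNT_ID}:*"
        }
    },
}

print(json.dumps(logs_key_statement, indent=2))
```

This statement is added to the key's existing policy alongside the statements that grant key administrators and the execution role their access.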
You can attach multiple policies to the execution role as needed to allow your workers to access additional AWS resources. For example, let’s explore how to enable HAQM EMR access. You can create a JSON policy that contains the narrowest permissions you can configure, as in the following example:
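As one possible shape for a narrow HAQM EMR policy, the sketch below assumes the workflows only submit steps to a single existing cluster and poll their status. The cluster ID and other values are hypothetical placeholders; workflows that create or terminate clusters would need additional actions and role-passing permissions that are not shown here.

```python
import json

# Hypothetical placeholders; substitute your own values.
REGION = "us-east-1"
ACCOUNT_ID = "111122223333"
CLUSTER_ID = "j-1ABCDEFGHIJKL"

emr_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Submit steps to one cluster and read their status; nothing else.
            "Action": [
                "elasticmapreduce:AddJobFlowSteps",
                "elasticmapreduce:DescribeCluster",
                "elasticmapreduce:DescribeStep",
                "elasticmapreduce:ListSteps",
            ],
            "Resource": f"arn:aws:elasticmapreduce:{REGION}:{ACCOUNT_ID}:cluster/{CLUSTER_ID}",
        }
    ],
}

print(json.dumps(emr_policy, indent=2))
```

Keeping this in a separate policy makes it easy to detach HAQM EMR access from the execution role when the workflows no longer need it.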
Conclusion
In this post, we discussed best practices for least privilege configuration in HAQM MWAA. By following these approaches, you can adhere to the principle of least privilege and maintain a secure posture within your HAQM MWAA environment, without compromising functionality or relying on overly permissive policies. Security is always top priority; to learn more about security in HAQM MWAA, see Security in HAQM Managed Workflows for Apache Airflow and Security best practices on HAQM MWAA.
About the Authors
Elizabeth Davis is a Sr Solutions Architect at HAQM Web Services (AWS). She currently works with educational technology companies and has a passion for serverless and data orchestration technologies. She has been an HAQM MWAA subject matter expert (SME) for the last 3+ years.
Mark Richman is a Principal Solutions Architect at HAQM Web Services with 30 years of experience building complex web and enterprise software. He contributes to Apache Airflow, bringing his expertise in cloud computing and serverless technologies to the open-source platform. Mark is also an accomplished writer and speaker who has authored commercial publications and AWS courses while regularly presenting at industry events.