AWS Big Data Blog
Category: Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Amazon MWAA best practices for managing Python dependencies
Customers with data engineers and data scientists are using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as a central orchestration platform for running data pipelines and machine learning (ML) workloads. To support these pipelines, they often require additional Python packages, such as Apache Airflow Providers. For example, a pipeline may require the Snowflake provider […]
Disaster recovery strategies for Amazon MWAA – Part 2
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed orchestration service that makes it straightforward to run data processing workflows at scale. Amazon MWAA takes care of operating and scaling Apache Airflow so you can focus on developing workflows. However, although Amazon MWAA provides high availability within an AWS Region through features […]
Introducing Amazon MWAA support for the Airflow REST API and web server auto scaling
Apache Airflow is a popular platform for enterprises looking to orchestrate complex data pipelines and workflows. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service that streamlines the setup and operation of secure and highly available Airflow environments in the cloud. In this post, we’re excited to introduce two new features that […]
Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open source tool used to programmatically author, schedule, and monitor sequences of processes and tasks, referred to as workflows. […]
Dynamic DAG generation with YAML and DAG Factory in Amazon MWAA
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service that allows you to use a familiar Apache Airflow environment with improved scalability, availability, and security to enhance and scale your business workflows without the operational burden of managing the underlying infrastructure. In Airflow, Directed Acyclic Graphs (DAGs) are defined as Python code. […]
Introducing Amazon MWAA larger environment sizes
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service for Apache Airflow that streamlines the setup and operation of the infrastructure to orchestrate data pipelines in the cloud. Customers use Amazon MWAA to manage the scalability, availability, and security of their Apache Airflow environments. As they design more intensive, complex, and ever-growing […]
Introducing Amazon MWAA support for Apache Airflow version 2.8.1
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that makes it straightforward to set up and operate end-to-end data pipelines in the cloud. Organizations use Amazon MWAA to enhance their business workflows. For example, C2i Genomics uses Amazon MWAA in their data platform to orchestrate the validation […]
Disaster recovery strategies for Amazon MWAA – Part 1
In the dynamic world of cloud computing, ensuring the resilience and availability of critical applications is paramount. Disaster recovery (DR) is the process by which an organization anticipates and addresses technology-related disasters. For organizations implementing critical workload orchestration using Amazon Managed Workflows for Apache Airflow (Amazon MWAA), it is crucial to have a DR plan […]
Orchestrate Amazon EMR Serverless Spark jobs with Amazon MWAA, and data validation using Amazon Athena
As data engineering becomes increasingly complex, organizations are looking for new ways to streamline their data processing workflows. Many data engineers today use Apache Airflow to build, schedule, and monitor their data pipelines. However, as the volume of data grows, managing and scaling these pipelines can become a daunting task. Amazon Managed Workflows for Apache […]
Introducing shared VPC support on Amazon MWAA
In this post, we demonstrate automating deployment of Amazon Managed Workflows for Apache Airflow (Amazon MWAA) using customer-managed endpoints in a VPC, providing compatibility with shared, or otherwise restricted, VPCs. Data scientists and engineers have made Apache Airflow a leading open source tool to create data pipelines due to its active open source community, familiar […]