AWS Big Data Blog
Category: HAQM Managed Workflows for Apache Airflow (HAQM MWAA)
How HAQM GTTS runs large-scale ETL jobs on AWS using HAQM MWAA
The HAQM Global Transportation Technology Services (GTTS) team owns a set of products called INSITE (Insights Into Transportation Everywhere). These products are user-facing applications that solve specific business problems across different transportation domains: network topology management, capacity management, and network monitoring. As of this writing, GTTS serves around 10,000 customers globally on a monthly basis, […]
Integrate HAQM MWAA with Microsoft Entra ID using SAML authentication
HAQM Managed Workflows for Apache Airflow (HAQM MWAA) provides a fully managed solution for orchestrating and automating complex workflows in the cloud. HAQM MWAA offers two network access modes for accessing the Apache Airflow web UI in your environments: public and private. Customers often deploy HAQM MWAA in private mode and want to use existing […]
Migrate workloads from AWS Data Pipeline
After careful consideration, we have made the decision to close new customer access to AWS Data Pipeline, effective July 25, 2024. AWS Data Pipeline existing customers can continue to use the service as normal. AWS continues to invest in security, availability, and performance improvements for AWS Data Pipeline, but we do not plan to introduce […]
Introducing HAQM MWAA support for Apache Airflow version 2.9.2
HAQM Managed Workflows for Apache Airflow (HAQM MWAA) is a managed orchestration service for Apache Airflow that significantly improves security and availability, and reduces infrastructure management overhead when setting up and operating end-to-end data pipelines in the cloud. Today, we are announcing the availability of Apache Airflow version 2.9.2 environments on HAQM MWAA. Apache Airflow […]
Run Apache XTable on HAQM MWAA to translate open table formats
In this post, we show you how to get started with Apache XTable on AWS and how you can use it in a batch pipeline orchestrated with HAQM Managed Workflows for Apache Airflow (HAQM MWAA). To understand how XTable and similar solutions work, we start with a high-level background on metadata management in an open table format (OTF) and then dive deeper into XTable and its usage.
HAQM MWAA best practices for managing Python dependencies
Customers with data engineers and data scientists are using HAQM Managed Workflows for Apache Airflow (HAQM MWAA) as a central orchestration platform for running data pipelines and machine learning (ML) workloads. To support these pipelines, they often require additional Python packages, such as Apache Airflow Providers. For example, a pipeline may require the Snowflake provider […]
Disaster recovery strategies for HAQM MWAA – Part 2
HAQM Managed Workflows for Apache Airflow (HAQM MWAA) is a fully managed orchestration service that makes it straightforward to run data processing workflows at scale. HAQM MWAA takes care of operating and scaling Apache Airflow so you can focus on developing workflows. However, although HAQM MWAA provides high availability within an AWS Region through features […]
Introducing HAQM MWAA support for the Airflow REST API and web server auto scaling
Apache Airflow is a popular platform for enterprises looking to orchestrate complex data pipelines and workflows. HAQM Managed Workflows for Apache Airflow (HAQM MWAA) is a managed service that streamlines the setup and operation of secure and highly available Airflow environments in the cloud. In this post, we’re excited to introduce two new features that […]
Orchestrate an end-to-end ETL pipeline using HAQM S3, AWS Glue, and HAQM Redshift Serverless with HAQM MWAA
HAQM Managed Workflows for Apache Airflow (HAQM MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open source tool used to programmatically author, schedule, and monitor sequences of processes and tasks, referred to as workflows. […]
Dynamic DAG generation with YAML and DAG Factory in HAQM MWAA
HAQM Managed Workflows for Apache Airflow (HAQM MWAA) is a managed service that allows you to use a familiar Apache Airflow environment with improved scalability, availability, and security to enhance and scale your business workflows without the operational burden of managing the underlying infrastructure. In Airflow, Directed Acyclic Graphs (DAGs) are defined as Python code. […]
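In Airflow, a DAG's tasks and their upstream dependencies must form a directed acyclic graph. As a rough, library-free illustration of generating DAG structure from configuration, the sketch below assumes a YAML file has already been parsed into a Python dict. The `CONFIG` layout and the `build_execution_order` and `generate_dags` helpers are hypothetical; they are not DAG Factory's schema or API. The sketch validates each task graph with the standard library's `graphlib.TopologicalSorter` and prints one valid run order using Airflow's `>>` dependency notation.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical config, shaped like what a YAML file might contain after parsing
# (for example with yaml.safe_load). This is NOT DAG Factory's schema.
CONFIG = {
    "daily_sales": {
        "tasks": {
            "extract": {"depends_on": []},
            "transform": {"depends_on": ["extract"]},
            "load": {"depends_on": ["transform"]},
        }
    },
    "hourly_clicks": {
        "tasks": {
            "ingest": {"depends_on": []},
            "aggregate": {"depends_on": ["ingest"]},
        }
    },
}


def build_execution_order(tasks: dict) -> list[str]:
    """Return one valid run order for the tasks, or raise if the graph is invalid."""
    # Map each task to the set of tasks it depends on (its upstream tasks).
    graph = {name: set(spec.get("depends_on", [])) for name, spec in tasks.items()}
    # Reject references to tasks that are not declared in the config.
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"unknown upstream tasks: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(graph).static_order())


def generate_dags(config: dict) -> dict[str, list[str]]:
    """Map each DAG id in the config to a validated task order."""
    return {dag_id: build_execution_order(spec["tasks"]) for dag_id, spec in config.items()}


if __name__ == "__main__":
    for dag_id, order in generate_dags(CONFIG).items():
        print(dag_id, "->", " >> ".join(order))
```

In a real Airflow deployment, each entry would instead be turned into a `DAG` object with operators wired together; the point here is only that a declarative spec can be checked for cycles and missing tasks before any DAG is built.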