AWS Big Data Blog

Category: Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

Today, we are excited to announce an enhancement to the Amazon MWAA integration with the Airflow REST API. This improvement makes it simpler to access and manage your Airflow environments, integrate them with external systems, and interact with your workflows programmatically. The Airflow REST API supports a wide range of use cases, from centralizing and automating administrative tasks to building event-driven, data-aware data pipelines. In this post, we discuss the enhancement and present several use cases that it unlocks for your Amazon MWAA environment.

How ZS built a clinical knowledge repository for semantic search using Amazon OpenSearch Service and Amazon Neptune

In this blog post, we will highlight how ZS Associates used multiple AWS services to build a highly scalable, highly performant, clinical document search platform. This platform is an advanced information retrieval system engineered to assist healthcare professionals and researchers in navigating vast repositories of medical documents, medical literature, research articles, clinical guidelines, protocol documents, […]

How Kaplan, Inc. implemented modern data pipelines using Amazon MWAA and Amazon AppFlow with Amazon Redshift as a data warehouse

Kaplan, Inc. provides individuals, educational institutions, and businesses with a broad array of services, supporting our students and partners to meet their diverse and evolving needs throughout their educational and professional journeys. In this post, we discuss how the Kaplan data engineering team implemented data integration from the Salesforce application to Amazon Redshift. The solution uses Amazon Simple Storage Service as a data lake, Amazon Redshift as a data warehouse, Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as an orchestrator, and Tableau as the presentation layer.

Optimize cost and performance for Amazon MWAA

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service for Apache Airflow that allows you to orchestrate data pipelines and workflows at scale. With Amazon MWAA, you can design Directed Acyclic Graphs (DAGs) that describe your workflows without managing the operational burden of scaling the infrastructure. In this post, we provide guidance […]

How Amazon GTTS runs large-scale ETL jobs on AWS using Amazon MWAA

The Amazon Global Transportation Technology Services (GTTS) team owns a set of products called INSITE (Insights Into Transportation Everywhere). These products are user-facing applications that solve specific business problems across different transportation domains: network topology management, capacity management, and network monitoring. As of this writing, GTTS serves around 10,000 customers globally on a monthly basis, […]

Integrate Amazon MWAA with Microsoft Entra ID using SAML authentication

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) provides a fully managed solution for orchestrating and automating complex workflows in the cloud. Amazon MWAA offers two network access modes for accessing the Apache Airflow web UI in your environments: public and private. Customers often deploy Amazon MWAA in private mode and want to use existing […]

Migrate workloads from AWS Data Pipeline

After careful consideration, we have made the decision to close new customer access to AWS Data Pipeline, effective July 25, 2024. Existing AWS Data Pipeline customers can continue to use the service as normal. AWS continues to invest in security, availability, and performance improvements for AWS Data Pipeline, but we do not plan to introduce […]

Introducing Amazon MWAA support for Apache Airflow version 2.9.2

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that significantly improves security and availability, and reduces infrastructure management overhead when setting up and operating end-to-end data pipelines in the cloud. Today, we are announcing the availability of Apache Airflow version 2.9.2 environments on Amazon MWAA. Apache Airflow […]

Run Apache XTable on Amazon MWAA to translate open table formats

In this post, we show you how to get started with Apache XTable on AWS and how you can use it in a batch pipeline orchestrated with Amazon Managed Workflows for Apache Airflow (Amazon MWAA). To understand how XTable and similar solutions work, we start with a high-level background on metadata management in an open table format (OTF) and then dive deeper into XTable and its usage.