AWS Big Data Blog

Category: Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

Build unified pipelines spanning multiple AWS accounts and Regions with Amazon MWAA

In this blog post, we demonstrate how to use Amazon MWAA for centralized orchestration while distributing data processing and machine learning tasks across different AWS accounts and Regions for optimal performance and compliance.

Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. We were positioned in the Challengers Quadrant in 2023. This recognition, we feel, reflects our ongoing commitment to innovation and excellence in data integration, demonstrating our continued progress in providing comprehensive data management solutions.

Building and operating data pipelines at scale using CI/CD, Amazon MWAA and Apache Spark on Amazon EMR by Wipro

This blog post discusses how a programmatic data processing framework developed by Wipro can help data engineers overcome obstacles and streamline their organization’s ETL processes. The framework uses the performance-optimized Amazon EMR runtime for Apache Spark and integrates with AWS managed services.

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

ANZ Institutional Division has transformed its data management approach by implementing a federated data platform based on data mesh principles. This shift aims to unlock untapped data potential, improve operational efficiency, and increase agility. The new strategy empowers domain teams to create and manage their own data products, treating data as a valuable asset rather than a byproduct. This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division.

Introducing Amazon MWAA micro environments for Apache Airflow

Today, we’re excited to announce mw1.micro, the latest addition to Amazon MWAA environment classes. This offering is designed to provide an even more cost-effective solution for running Airflow environments in the cloud. With mw1.micro, we’re bringing the power of Amazon MWAA to teams who require a lightweight environment without compromising on essential features. In this post, we’ll explore mw1.micro characteristics, key benefits, ideal use cases, and how you can set up an Amazon MWAA environment based on this new environment class.

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

Today, we are excited to announce an enhancement to the Amazon MWAA integration with the Airflow REST API. This improvement streamlines how you access and manage your Airflow environments and integrate them with external systems, and it lets you interact with your workflows programmatically. The Airflow REST API supports a wide range of use cases, from centralizing and automating administrative tasks to building event-driven, data-aware data pipelines. In this post, we discuss the enhancement and present several use cases that it unlocks for your Amazon MWAA environment.

How ZS built a clinical knowledge repository for semantic search using Amazon OpenSearch Service and Amazon Neptune

In this blog post, we highlight how ZS Associates used multiple AWS services to build a highly scalable, highly performant clinical document search platform. This platform is an advanced information retrieval system engineered to assist healthcare professionals and researchers in navigating vast repositories of medical documents, medical literature, research articles, clinical guidelines, protocol documents, […]

How Kaplan, Inc. implemented modern data pipelines using Amazon MWAA and Amazon AppFlow with Amazon Redshift as a data warehouse

Kaplan, Inc. provides individuals, educational institutions, and businesses with a broad array of services, supporting our students and partners to meet their diverse and evolving needs throughout their educational and professional journeys. In this post, we discuss how the Kaplan data engineering team implemented data integration from the Salesforce application to Amazon Redshift. The solution uses Amazon Simple Storage Service (Amazon S3) as a data lake, Amazon Redshift as a data warehouse, Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as an orchestrator, and Tableau as the presentation layer.

Optimize cost and performance for Amazon MWAA

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service for Apache Airflow that allows you to orchestrate data pipelines and workflows at scale. With Amazon MWAA, you can design Directed Acyclic Graphs (DAGs) that describe your workflows without managing the operational burden of scaling the infrastructure. In this post, we provide guidance […]