AWS Big Data Blog
Category: Application Integration
Access private code repositories for installing Python dependencies on Amazon MWAA
This post demonstrates a method to selectively install Python dependencies based on the Amazon MWAA component type (web server, scheduler, or worker) from a Git repository only accessible from your virtual private cloud (VPC).
Enrich your serverless data lake with Amazon Bedrock
Organizations are collecting and storing vast amounts of structured and unstructured data like reports, whitepapers, and research documents. By consolidating this information, analysts can discover and integrate data from across the organization, creating valuable data products based on a unified dataset. This post shows how to integrate Amazon Bedrock with the AWS Serverless Data Analytics Pipeline architecture using Amazon EventBridge, AWS Step Functions, and AWS Lambda to automate a wide range of data enrichment tasks in a cost-effective and scalable manner.
How ZS built a clinical knowledge repository for semantic search using Amazon OpenSearch Service and Amazon Neptune
In this blog post, we will highlight how ZS Associates used multiple AWS services to build a highly scalable, highly performant, clinical document search platform. This platform is an advanced information retrieval system engineered to assist healthcare professionals and researchers in navigating vast repositories of medical documents, medical literature, research articles, clinical guidelines, protocol documents, […]
How Kaplan, Inc. implemented modern data pipelines using Amazon MWAA and Amazon AppFlow with Amazon Redshift as a data warehouse
Kaplan, Inc. provides individuals, educational institutions, and businesses with a broad array of services, supporting our students and partners to meet their diverse and evolving needs throughout their educational and professional journeys. In this post, we discuss how the Kaplan data engineering team implemented data integration from the Salesforce application to Amazon Redshift. The solution uses Amazon Simple Storage Service as a data lake, Amazon Redshift as a data warehouse, Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as an orchestrator, and Tableau as the presentation layer.
Optimize cost and performance for Amazon MWAA
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service for Apache Airflow that allows you to orchestrate data pipelines and workflows at scale. With Amazon MWAA, you can design Directed Acyclic Graphs (DAGs) that describe your workflows without managing the operational burden of scaling the infrastructure. In this post, we provide guidance […]
How Amazon GTTS runs large-scale ETL jobs on AWS using Amazon MWAA
The Amazon Global Transportation Technology Services (GTTS) team owns a set of products called INSITE (Insights Into Transportation Everywhere). These products are user-facing applications that solve specific business problems across different transportation domains: network topology management, capacity management, and network monitoring. As of this writing, GTTS serves around 10,000 customers globally on a monthly basis, […]
Integrate Amazon MWAA with Microsoft Entra ID using SAML authentication
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) provides a fully managed solution for orchestrating and automating complex workflows in the cloud. Amazon MWAA offers two network access modes for accessing the Apache Airflow web UI in your environments: public and private. Customers often deploy Amazon MWAA in private mode and want to use existing […]
Migrate workloads from AWS Data Pipeline
After careful consideration, we have made the decision to close new customer access to AWS Data Pipeline, effective July 25, 2024. AWS Data Pipeline existing customers can continue to use the service as normal. AWS continues to invest in security, availability, and performance improvements for AWS Data Pipeline, but we do not plan to introduce […]
Introducing Amazon MWAA support for Apache Airflow version 2.9.2
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that significantly improves security and availability, and reduces infrastructure management overhead when setting up and operating end-to-end data pipelines in the cloud. Today, we are announcing the availability of Apache Airflow version 2.9.2 environments on Amazon MWAA. Apache Airflow […]
Run Apache XTable on Amazon MWAA to translate open table formats
In this post, we show you how to get started with Apache XTable on AWS and how you can use it in a batch pipeline orchestrated with Amazon Managed Workflows for Apache Airflow (Amazon MWAA). To understand how XTable and similar solutions work, we start with a high-level background on metadata management in an open table format (OTF) and then dive deeper into XTable and its usage.