AWS Big Data Blog
Category: Application Integration
Disaster recovery strategies for HAQM MWAA – Part 1
In the dynamic world of cloud computing, ensuring the resilience and availability of critical applications is paramount. Disaster recovery (DR) is the process by which an organization anticipates and addresses technology-related disasters. For organizations implementing critical workload orchestration using HAQM Managed Workflows for Apache Airflow (HAQM MWAA), it is crucial to have a DR plan […]
Enable metric-based and scheduled scaling for HAQM Managed Service for Apache Flink
Thousands of developers use Apache Flink to build streaming applications to transform and analyze data in real time. Apache Flink is an open source framework and engine for processing data streams. It’s highly available and scalable, delivering high throughput and low latency for the most demanding stream-processing applications. Monitoring and scaling your applications is critical […]
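To give a flavor of the metric-based approach, here is a minimal sketch of an AWS Lambda handler that rescales a Managed Service for Apache Flink application through the UpdateApplication API, as a CloudWatch alarm action might invoke it; the application name and parallelism values are hypothetical placeholders.

```python
# Minimal sketch: adjust the parallelism of a Managed Service for Apache Flink
# application from a CloudWatch-alarm-triggered Lambda function.
# The application name and parallelism values are hypothetical.
import boto3

client = boto3.client("kinesisanalyticsv2")

def scale_application(app_name: str, new_parallelism: int) -> None:
    # UpdateApplication requires the current version ID for optimistic locking.
    desc = client.describe_application(ApplicationName=app_name)
    version_id = desc["ApplicationDetail"]["ApplicationVersionId"]

    client.update_application(
        ApplicationName=app_name,
        CurrentApplicationVersionId=version_id,
        ApplicationConfigurationUpdate={
            "FlinkApplicationConfigurationUpdate": {
                "ParallelismConfigurationUpdate": {
                    "ConfigurationTypeUpdate": "CUSTOM",
                    "ParallelismUpdate": new_parallelism,
                    "AutoScalingEnabledUpdate": False,
                }
            }
        },
    )

def lambda_handler(event, context):
    # A CloudWatch alarm (routed through SNS or EventBridge) could trigger this.
    scale_application("my-flink-app", new_parallelism=8)
```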
Orchestrate HAQM EMR Serverless Spark jobs with HAQM MWAA, and data validation using HAQM Athena
As data engineering becomes increasingly complex, organizations are looking for new ways to streamline their data processing workflows. Many data engineers today use Apache Airflow to build, schedule, and monitor their data pipelines. However, as the volume of data grows, managing and scaling these pipelines can become a daunting task. HAQM Managed Workflows for Apache […]
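As a sketch of the orchestration pattern, the following Airflow DAG runs a Spark job on EMR Serverless and then validates the output with an Athena query; the application ID, role ARN, and S3 locations are hypothetical placeholders.

```python
# Minimal sketch: an Airflow DAG that runs a Spark job on EMR Serverless and
# then validates the result with an Athena query. IDs, ARNs, and paths below
# are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.athena import AthenaOperator
from airflow.providers.amazon.aws.operators.emr import EmrServerlessStartJobOperator

with DAG(
    dag_id="emr_serverless_with_athena_validation",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    spark_job = EmrServerlessStartJobOperator(
        task_id="run_spark_job",
        application_id="00example-app-id",  # hypothetical
        execution_role_arn="arn:aws:iam::111122223333:role/emr-exec",  # hypothetical
        job_driver={
            "sparkSubmit": {"entryPoint": "s3://my-bucket/scripts/etl.py"}
        },
    )

    validate = AthenaOperator(
        task_id="validate_row_count",
        query="SELECT COUNT(*) FROM my_db.my_table",
        database="my_db",
        output_location="s3://my-bucket/athena-results/",
    )

    spark_job >> validate
```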
Introducing shared VPC support on HAQM MWAA
In this post, we demonstrate automating deployment of HAQM Managed Workflows for Apache Airflow (HAQM MWAA) using customer-managed endpoints in a VPC, providing compatibility with shared, or otherwise restricted, VPCs. Data scientists and engineers have made Apache Airflow a leading open source tool to create data pipelines due to its active open source community, familiar […]
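As a taste of what customer-managed endpoints look like in practice, here is a sketch (not a full deployment) of creating an environment with endpoint management set to CUSTOMER; all names, ARNs, and IDs are hypothetical placeholders.

```python
# Sketch: create an MWAA environment that uses customer-managed VPC endpoints,
# as needed for shared or otherwise restricted VPCs. All names, ARNs, and IDs
# are hypothetical placeholders.
import boto3

mwaa = boto3.client("mwaa")

mwaa.create_environment(
    Name="shared-vpc-env",
    AirflowVersion="2.7.2",
    ExecutionRoleArn="arn:aws:iam::111122223333:role/mwaa-execution",
    SourceBucketArn="arn:aws:s3:::my-mwaa-bucket",
    DagS3Path="dags",
    NetworkConfiguration={
        "SubnetIds": ["subnet-0abc", "subnet-0def"],
        "SecurityGroupIds": ["sg-0123"],
    },
    # With CUSTOMER endpoint management, MWAA does not create the VPC
    # endpoints for you; you create and manage them in the shared VPC.
    EndpointManagement="CUSTOMER",
)
```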
Introducing HAQM MWAA support for Apache Airflow version 2.7.2 and deferrable operators
Today, we are announcing the availability of Apache Airflow version 2.7.2 environments and support for deferrable operators on HAQM MWAA. In this post, we provide an overview of deferrable operators and triggers, including a walkthrough of an example showcasing how to use them. We also delve into some of the new features and capabilities of Apache Airflow, and how you can set up or upgrade your HAQM MWAA environment to version 2.7.2.
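For a quick illustration of the idea, here is a minimal DAG using TimeSensorAsync, the deferrable counterpart of TimeSensor; while deferred, the task hands its wait over to the triggerer process and releases its worker slot instead of blocking it.

```python
# Minimal sketch of a deferrable sensor. While deferred, TimeSensorAsync frees
# its worker slot and resumes via the triggerer when the target time arrives.
from datetime import datetime, time

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.sensors.time_sensor import TimeSensorAsync

with DAG(
    dag_id="deferrable_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Waits until noon (DAG timezone) without occupying a worker slot.
    wait = TimeSensorAsync(
        task_id="wait_until_noon",
        target_time=time(12, 0),
    )
    done = EmptyOperator(task_id="done")
    wait >> done
```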
Use Snowflake with HAQM MWAA to orchestrate data pipelines
This blog post is co-written with James Sun from Snowflake. Customers rely on data from different sources such as mobile applications, clickstream events from websites, historical data, and more to derive meaningful patterns that help them optimize their products, services, and processes. With a data pipeline, which is a set of tasks used to automate the movement […]
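As a sketch of the orchestration pattern, the following MWAA task issues a Snowflake query through the Snowflake provider; the connection ID, stage, and SQL are hypothetical, and the "snowflake_default" connection would be configured in Airflow beforehand.

```python
# Minimal sketch: run a Snowflake statement from an MWAA DAG using the
# Snowflake provider. Connection ID, stage, and SQL are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="snowflake_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    load_raw_events = SnowflakeOperator(
        task_id="load_raw_events",
        snowflake_conn_id="snowflake_default",  # configured in Airflow
        sql="COPY INTO raw.events FROM @my_s3_stage;",  # hypothetical stage
    )
```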
Simplify data transfer: Google BigQuery to HAQM S3 using HAQM AppFlow
In today’s data-driven world, the ability to effortlessly move and analyze data across diverse platforms is essential. HAQM AppFlow, a fully managed data integration service, has been at the forefront of streamlining data transfer between AWS services, software as a service (SaaS) applications, and now Google BigQuery. In this blog post, you explore the new Google BigQuery connector in HAQM AppFlow and discover how it simplifies transferring data from Google’s data warehouse to HAQM Simple Storage Service (HAQM S3). This brings significant benefits for data professionals and organizations, including the democratization of multi-cloud data access.
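Once a BigQuery-to-S3 flow has been defined in HAQM AppFlow (in the console or via CreateFlow), it can be triggered on demand with the SDK; a minimal sketch follows, with a hypothetical flow name.

```python
# Minimal sketch: trigger an on-demand run of an existing AppFlow flow that
# copies Google BigQuery data to HAQM S3. The flow name is hypothetical.
import boto3

appflow = boto3.client("appflow")

# Start an on-demand run of a previously defined flow.
response = appflow.start_flow(flowName="bigquery-to-s3-daily")
print(response["flowStatus"], response["executionId"])
```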
Build event-driven architectures with HAQM MSK and HAQM EventBridge
Based on immutable facts (events), event-driven architectures (EDAs) allow businesses to gain deeper insights into their customers’ behavior, unlocking more accurate and faster decision-making processes that lead to better customer experiences. In EDAs, modern event brokers, such as HAQM EventBridge and Apache Kafka, play a key role in publishing and subscribing to events. EventBridge is […]
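As a sketch of one way to bridge the two brokers, an HAQM EventBridge Pipe can read from an MSK topic and forward records to an EventBridge event bus; all ARNs and names below are hypothetical placeholders.

```python
# Sketch: an EventBridge Pipe that consumes an MSK topic and forwards records
# to an EventBridge event bus. All ARNs and names are hypothetical.
import boto3

pipes = boto3.client("pipes")

pipes.create_pipe(
    Name="msk-to-eventbridge",
    RoleArn="arn:aws:iam::111122223333:role/pipes-msk-role",
    Source="arn:aws:kafka:us-east-1:111122223333:cluster/my-cluster/abc-123",
    SourceParameters={
        "ManagedStreamingKafkaParameters": {
            "TopicName": "orders",
            "StartingPosition": "LATEST",
        }
    },
    Target="arn:aws:events:us-east-1:111122223333:event-bus/app-events",
)
```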
Set up fine-grained permissions for your data pipeline using MWAA and EKS
This blog post shows how to improve security in a data pipeline architecture based on HAQM Managed Workflows for Apache Airflow (HAQM MWAA) and HAQM Elastic Kubernetes Service (HAQM EKS) by setting up fine-grained permissions, using HashiCorp Terraform for infrastructure as code.
Simplify operational data processing in data lakes using AWS Glue and Apache Hudi
AWS has invested in native service integration with Apache Hudi and published technical content to enable you to use Apache Hudi with AWS Glue (for example, refer to Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 1: Getting Started). In AWS ProServe-led customer engagements, the use cases we work on usually come with technical complexity and scalability requirements. In this post, we discuss a common use case related to operational data processing and the solution we built using Apache Hudi and AWS Glue.
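For a flavor of the core operation, here is a minimal sketch of an upsert into a Hudi table from a Glue 4.0 Spark job (with native Hudi support enabled via the --datalake-formats hudi job parameter); the table name, key fields, and S3 paths are hypothetical placeholders.

```python
# Minimal sketch: upsert records into an Apache Hudi table from an AWS Glue
# 4.0 Spark job (--datalake-formats hudi). Names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.json("s3://my-bucket/incoming/orders/")

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

# Upsert: rows with an existing order_id are updated, new ones are inserted.
(df.write.format("hudi")
   .options(**hudi_options)
   .mode("append")
   .save("s3://my-bucket/lake/orders/"))
```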