AWS Big Data Blog

Category: Amazon Managed Workflows for Apache Airflow (Amazon MWAA)

Bolster security with role-based access control in Amazon MWAA

Amazon Studios invests in content that drives global growth of Amazon Prime Video and IMDb TV. Amazon Studios has a number of internal-facing applications that aim to streamline end-to-end business processes and information workflows for the entire content creation lifecycle. The Amazon Studios Data Infrastructure (ASDI) is a centralized, curated, and secure data lake that […]

Manage and process your big data workflows with Amazon MWAA and Amazon EMR on Amazon EKS

Many customers are gathering large amounts of data generated from different sources, such as IoT devices, clickstream events from websites, and more. To efficiently extract insights from the data, you have to perform various transformations and apply different business logic to your data. These processes require complex workflow management to schedule jobs and manage dependencies […]

Orchestrate AWS Glue DataBrew jobs using Amazon Managed Workflows for Apache Airflow

As the industry grows with more data volume, big data analytics is becoming a common requirement in data analytics and machine learning (ML) use cases. Analysts are building complex data transformation pipelines that include multiple steps for data preparation and cleansing. However, analysts may want a simpler orchestration mechanism with a graphical user interface that […]

Orchestrating analytics jobs on Amazon EMR Notebooks using Amazon MWAA

May 2024: This post was reviewed and updated with a new dataset. In a previous post, we introduced the Amazon EMR notebook APIs, which allow you to programmatically run a notebook on Amazon EMR Studio (preview) without accessing the AWS web console. With the APIs, you can schedule running EMR notebooks with cron scripts, chain multiple notebooks, […]

Building complex workflows with Amazon MWAA, AWS Step Functions, AWS Glue, and Amazon EMR

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that makes it easy to run open-source versions of Apache Airflow on AWS and build workflows to run your extract, transform, and load (ETL) jobs and data pipelines. You can use AWS Step Functions as a serverless function orchestrator to build scalable […]