AWS Big Data Blog
Category: Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Bolster security with role-based access control in Amazon MWAA
Amazon Studios invests in content that drives global growth of Amazon Prime Video and IMDb TV. Amazon Studios has a number of internal-facing applications that aim to streamline end-to-end business processes and information workflows for the entire content creation lifecycle. The Amazon Studios Data Infrastructure (ASDI) is a centralized, curated, and secure data lake that […]
Manage and process your big data workflows with Amazon MWAA and Amazon EMR on Amazon EKS
Many customers are gathering large amounts of data generated from different sources, such as IoT devices, clickstream events from websites, and more. To efficiently extract insights from the data, you have to perform various transformations and apply different business logic to your data. These processes require complex workflow management to schedule jobs and manage dependencies […]
Orchestrate AWS Glue DataBrew jobs using Amazon Managed Workflows for Apache Airflow
As data volumes across the industry grow, big data analytics is becoming a common requirement in data analytics and machine learning (ML) use cases. Analysts are building complex data transformation pipelines that include multiple steps for data preparation and cleansing. However, analysts may want a simpler orchestration mechanism with a graphical user interface that […]
Orchestrating analytics jobs on Amazon EMR Notebooks using Amazon MWAA
May 2024: This post was reviewed and updated with a new dataset. In a previous post, we introduced the Amazon EMR notebook APIs, which allow you to programmatically run a notebook on Amazon EMR Studio (preview) without accessing the AWS web console. With the APIs, you can schedule EMR notebook runs with cron scripts, chain multiple notebooks, […]
Building complex workflows with Amazon MWAA, AWS Step Functions, AWS Glue, and Amazon EMR
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a fully managed service that makes it easy to run open-source versions of Apache Airflow on AWS and build workflows to run your extract, transform, and load (ETL) jobs and data pipelines. You can use AWS Step Functions as a serverless function orchestrator to build scalable […]