AWS Big Data Blog

Unified scheduling for visual ETL flows and query books in HAQM SageMaker Unified Studio

SageMaker Unified Studio allows you to create ETL flows using a visual interface and write SQL analytics queries using query books. Today, we’re excited to introduce a new unified scheduling feature that simplifies automating both. In this post, we walk through how to schedule your visual ETL flows and query books with just a few clicks, explore the underlying architecture, and demonstrate how this feature can streamline your data workflow automation.

How Flutter UKI optimizes data pipelines with AWS Managed Workflows for Apache Airflow

In this post, we share how Flutter UKI transitioned from a monolithic HAQM Elastic Compute Cloud (HAQM EC2)-based Airflow setup to a scalable and optimized HAQM Managed Workflows for Apache Airflow (HAQM MWAA) architecture using features like Kubernetes Pod Operator, continuous integration and delivery (CI/CD) integration, and performance optimization techniques.
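For illustration, here is a minimal sketch of the Kubernetes Pod Operator pattern the post describes, assuming Airflow 2.4 or later on HAQM MWAA with a recent cncf.kubernetes provider installed; the DAG id, container image, namespace, and schedule are placeholders, not Flutter UKI's actual configuration.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Illustrative DAG: each task runs as an isolated Kubernetes pod instead of
# on the Airflow workers themselves. Image and namespace are hypothetical.
with DAG(
    dag_id="example_kubernetes_pod_task",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    transform = KubernetesPodOperator(
        task_id="run_transform",
        name="run-transform",
        namespace="airflow-workers",                # hypothetical namespace
        image="my-registry/transform-job:latest",   # hypothetical image
        cmds=["python", "-m", "jobs.transform"],
        get_logs=True,
    )
```

Running each task in its own pod is what lets the worker fleet stay small while heavyweight jobs scale out on the cluster.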

How BMW Group built a serverless terabyte-scale data transformation architecture with dbt and HAQM Athena

At the BMW Group, our Cloud Efficiency Analytics (CLEA) team has developed a FinOps solution to optimize costs across over 10,000 cloud accounts. This post explores our journey, from the initial challenges to our current architecture, and details the steps we took to achieve a highly efficient, serverless data transformation setup.
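As a hedged sketch of what a programmatic dbt invocation in such a setup can look like (assuming dbt-core 1.5 or later with the dbt-athena adapter configured in profiles.yml; the project directory and selector below are hypothetical, not CLEA's actual layout):

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

# Invoke dbt in-process (dbt-core >= 1.5). With the dbt-athena adapter
# configured, each model compiles to a query executed by HAQM Athena,
# so no long-running transformation servers are needed.
runner = dbtRunner()
result: dbtRunnerResult = runner.invoke(
    ["run", "--project-dir", "clea_transformations", "--select", "staging+"]
)

if not result.success:
    raise RuntimeError(f"dbt run failed: {result.exception}")
```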

Best practices for least privilege configuration in HAQM MWAA

In this post, we explore how to apply the principle of least privilege to your HAQM MWAA environment by tightening network security using security groups, network access control lists (ACLs), and virtual private cloud (VPC) endpoints. We also discuss the HAQM MWAA execution and deployment roles and their respective permissions.
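As one concrete example of tightening network access, HAQM MWAA requires a self-referencing security group rule, and least privilege means making that the environment's only inbound rule. The following boto3 sketch illustrates the setup; the VPC id is a placeholder.

```python
import boto3

ec2 = boto3.client("ec2")

# Create a dedicated security group for the MWAA environment.
sg = ec2.create_security_group(
    GroupName="mwaa-least-privilege-sg",
    Description="Self-referencing SG for HAQM MWAA",
    VpcId="vpc-0123456789abcdef0",  # hypothetical VPC id
)
sg_id = sg["GroupId"]

# Allow traffic only from the group itself (the self-referencing rule
# MWAA needs) and add no other inbound access.
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[
        {
            "IpProtocol": "-1",
            "UserIdGroupPairs": [{"GroupId": sg_id}],
        }
    ],
)
```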

Access your existing data and resources through HAQM SageMaker Unified Studio, Part 1: AWS Glue Data Catalog and HAQM Redshift

This series of posts demonstrates how you can onboard and access existing AWS data sources using SageMaker Unified Studio. This post focuses on onboarding existing AWS Glue Data Catalog tables and database tables available in HAQM Redshift.
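While onboarding in SageMaker Unified Studio is console-driven, the underlying grant can also be expressed through the AWS SDK. A minimal sketch, assuming a hypothetical project role ARN, database, and table:

```python
import boto3

lf = boto3.client("lakeformation")

# Grant SELECT and DESCRIBE on an existing Data Catalog table to a
# SageMaker Unified Studio project role. The role ARN, database, and
# table names below are placeholders for your own resources.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/datazone_usr_role_example"
    },
    Resource={
        "Table": {
            "DatabaseName": "sales_db",
            "Name": "orders",
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
)
```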

Access your existing data and resources through HAQM SageMaker Unified Studio, Part 2: HAQM S3, HAQM RDS, HAQM DynamoDB, and HAQM EMR

In this post, we discuss integrating additional vital data sources such as HAQM Simple Storage Service (HAQM S3) buckets, HAQM Relational Database Service (HAQM RDS), HAQM DynamoDB, and HAQM EMR clusters. We demonstrate how to configure the necessary permissions, establish connections, and effectively use these resources within SageMaker Unified Studio. Whether you’re working with object storage, relational databases, NoSQL databases, or big data processing, this post can help you seamlessly incorporate your existing data infrastructure into your SageMaker Unified Studio workflows.
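As one illustrative piece of that permission setup, the following sketch attaches an inline IAM policy that lets a hypothetical project role read a single existing S3 bucket; access to RDS, DynamoDB, or EMR would follow the same pattern with the respective service actions. All names below are placeholders.

```python
import json

import boto3

iam = boto3.client("iam")

# Minimal inline policy scoped to one existing bucket: read objects and
# list the bucket, nothing else.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::my-existing-data-bucket",
                "arn:aws:s3:::my-existing-data-bucket/*",
            ],
        }
    ],
}

iam.put_role_policy(
    RoleName="datazone_usr_role_example",  # hypothetical project role
    PolicyName="AllowExistingBucketRead",
    PolicyDocument=json.dumps(policy),
)
```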

Melting the ice — How Natural Intelligence simplified a data lake migration to Apache Iceberg

Natural Intelligence (NI) is a world leader in multi-category marketplaces. In this blog post, NI shares their journey, the innovative solutions they developed, and the key takeaways that can guide other organizations considering a similar path. This article details NI’s practical approach to this complex migration, focusing less on Apache Iceberg’s technical specifications and more on the real-world challenges and solutions encountered during the transition, a migration that many organizations are grappling with.

HAQM SageMaker Lakehouse now supports attribute-based access control

HAQM SageMaker Lakehouse now supports attribute-based access control (ABAC) with AWS Lake Formation, using AWS Identity and Access Management (IAM) principals and session tags to simplify data access, grant creation, and maintenance. In this post, we demonstrate how to get started with SageMaker Lakehouse with ABAC.
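A minimal sketch of the session-tag mechanism that ABAC relies on, assuming a hypothetical role and tag: Lake Formation grants written against the tag key then apply to any session carrying a matching value, rather than to individually named principals.

```python
import boto3

sts = boto3.client("sts")

# Assume a role with a session tag. The role's trust policy must allow
# sts:TagSession. Role ARN and tag values are placeholders.
session = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/analyst-role",
    RoleSessionName="abac-demo",
    Tags=[{"Key": "team", "Value": "marketing"}],
)
creds = session["Credentials"]

# Subsequent data access calls made with these credentials carry the tag,
# so ABAC grants keyed on team=marketing apply automatically.
tagged = boto3.client(
    "athena",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```

The payoff is maintenance: one tag-based grant replaces a grant per user or per role, so onboarding a new analyst means tagging their session, not editing permissions.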

Accelerate data pipeline creation with the new visual interface in HAQM OpenSearch Ingestion

Today, we’re launching a new visual interface for OpenSearch Ingestion that makes it simple to create and manage your data pipelines from the AWS Management Console. With this new feature, you can build pipelines in minutes without writing complex configurations manually. In this post, we walk through how these new features work and how you can use them to accelerate your data ingestion projects.
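For contrast with the visual editor, here is a rough sketch of what creating a pipeline through the SDK involves; the pipeline name, capacity, and Data Prepper configuration below are illustrative placeholders, and a production pipeline would also need IAM role settings on the sink.

```python
import boto3

osis = boto3.client("osis")

# Hand-authored Data Prepper configuration: an HTTP source feeding an
# OpenSearch index. Hosts and index name are hypothetical, and a real
# OSIS sink also requires an sts_role_arn for access to the domain.
pipeline_body = """
version: "2"
log-pipeline:
  source:
    http:
      path: /logs
  sink:
    - opensearch:
        hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
        index: application-logs
"""

osis.create_pipeline(
    PipelineName="log-pipeline",
    MinUnits=1,
    MaxUnits=2,
    PipelineConfigurationBody=pipeline_body,
)
```

The visual interface generates and validates this configuration for you, which is exactly the manual step the new feature removes.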

Read and write Apache Iceberg tables using AWS Lake Formation hybrid access mode

In this post, we demonstrate how to use Lake Formation for read access while continuing to use AWS Identity and Access Management (IAM) policy-based permissions for write workloads that update the schema and upsert (insert and update combined) data records into the Iceberg tables.
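A minimal sketch of the hybrid access pattern, assuming hypothetical role ARNs and table names: the reader role is granted SELECT and opted in to Lake Formation enforcement, while the writer role is left out of the opt-in and so keeps using its IAM policy-based permissions for schema updates and upserts.

```python
import boto3

lf = boto3.client("lakeformation")

# Placeholders for the Iceberg table and the read-side principal.
table = {"Table": {"DatabaseName": "iceberg_db", "Name": "events"}}
reader = {"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/reader-role"}

# Lake Formation-managed read access on the Iceberg table.
lf.grant_permissions(Principal=reader, Resource=table, Permissions=["SELECT"])

# Hybrid access mode: only opted-in principals are enforced by Lake
# Formation for this resource. The writer role is deliberately not opted
# in, so its writes continue to flow through IAM policies.
lf.create_lake_formation_opt_in(Principal=reader, Resource=table)
```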