AWS Big Data Blog

HAQM OpenSearch Service launches flow builder to empower rapid AI search innovation

The AI search flow builder is available in all AWS Regions that support OpenSearch 2.19+ on OpenSearch Service. In this post, we walk through a couple of scenarios to demonstrate the flow builder. First, we enable semantic search on an existing keyword-based OpenSearch application without client-side code changes. Next, we create a multimodal retrieval-augmented generation (RAG) flow to showcase how you can redefine image discovery within your applications.
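
For context, the following is a minimal sketch of what the client side can keep looking like once the flow builder has attached a semantic search pipeline to an index; the endpoint, index name, field, and pipeline are placeholders, and the pipeline itself is assumed to have already been created in the flow builder.

```python
# Minimal sketch (assumptions: a placeholder domain endpoint, an index named
# "products" with a text field "description", and a flow-builder-created
# search pipeline set as the index default; authentication options omitted).
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "my-domain-endpoint", "port": 443}], use_ssl=True)

# The client still sends a plain keyword match query; the default search
# pipeline rewrites it into a semantic (neural) query on the server side,
# so no client-side code changes are needed.
response = client.search(
    index="products",
    body={"query": {"match": {"description": "waterproof hiking boots"}}},
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("description"))
```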

Build end-to-end Apache Spark pipelines with HAQM MWAA, Batch Processing Gateway, and HAQM EMR on EKS clusters

This post shows how to enhance the multi-cluster solution by integrating HAQM Managed Workflows for Apache Airflow (HAQM MWAA) with BPG. By using HAQM MWAA, we add job scheduling and orchestration capabilities, enabling you to build a comprehensive end-to-end Spark-based data processing pipeline.
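
As a rough illustration of what the orchestration layer can look like, here is a minimal Airflow DAG sketch that submits a Spark job to BPG over HTTP; the gateway URL, API path, payload fields, and S3 script location are illustrative placeholders rather than BPG's exact API.

```python
# Minimal sketch (assumptions: BPG_URL, the /jobs path, and the payload
# fields are placeholders; adapt them to your BPG deployment's actual API).
from datetime import datetime

import requests
from airflow.decorators import dag, task

BPG_URL = "http://bpg.internal.example.com"  # hypothetical gateway endpoint


@dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
def spark_via_bpg():
    @task
    def submit_spark_job() -> str:
        # Submit the Spark application to BPG, which routes it to one of the
        # EMR on EKS clusters sitting behind the gateway.
        resp = requests.post(
            f"{BPG_URL}/jobs",
            json={
                "applicationName": "daily-aggregation",
                "mainApplicationFile": "s3://my-bucket/jobs/aggregate.py",
            },
            timeout=60,
        )
        resp.raise_for_status()
        return resp.text

    submit_spark_job()


spark_via_bpg()
```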

Unified scheduling for visual ETL flows and query books in HAQM SageMaker Unified Studio

Today, we’re excited to introduce a new unified scheduling feature that simplifies this process. SageMaker Unified Studio allows you to create ETL flows using a visual interface and write SQL analytics queries using query books. In this post, we walk through how to schedule your visual ETL flows and query books with just a few clicks, explore the underlying architecture, and demonstrate how this feature can streamline your data workflow automation.

How Flutter UKI optimizes data pipelines with HAQM Managed Workflows for Apache Airflow

In this post, we share how Flutter UKI transitioned from a monolithic HAQM Elastic Compute Cloud (HAQM EC2)-based Airflow setup to a scalable and optimized HAQM Managed Workflows for Apache Airflow (HAQM MWAA) architecture using features like Kubernetes Pod Operator, continuous integration and delivery (CI/CD) integration, and performance optimization techniques.
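
For readers unfamiliar with the pattern, here is a minimal sketch of running a task in its own pod with KubernetesPodOperator from an MWAA environment; the namespace, container image, and command are placeholders, and a recent version of the cncf.kubernetes provider package is assumed.

```python
# Minimal sketch (assumptions: the "data-jobs" namespace, the ECR image URI,
# and the command are placeholders; the apache-airflow-providers-cncf-kubernetes
# package must be available in the MWAA environment).
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="transform_in_pod",
    schedule="@daily",
    start_date=datetime(2025, 1, 1),
    catchup=False,
):
    transform = KubernetesPodOperator(
        task_id="run_transform",
        name="run-transform",
        namespace="data-jobs",
        image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/etl:latest",
        cmds=["python", "transform.py"],
        get_logs=True,  # stream container logs back into the Airflow task logs
    )
```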

How BMW Group built a serverless terabyte-scale data transformation architecture with dbt and HAQM Athena

At the BMW Group, our Cloud Efficiency Analytics (CLEA) team has developed a FinOps solution to optimize costs across over 10,000 cloud accounts. This post explores our journey, from the initial challenges to our current architecture, and details the steps we took to achieve a highly efficient, serverless data transformation setup.

Best practices for least privilege configuration in HAQM MWAA

In this post, we explore how to apply the principle of least privilege to your HAQM MWAA environment by tightening network security using security groups, network access control lists (ACLs), and virtual private cloud (VPC) endpoints. We also discuss the HAQM MWAA execution and deployment roles and their respective permissions.
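
As one concrete example of the network side, the following boto3 sketch adds the self-referencing inbound rule that the HAQM MWAA documentation recommends for the environment's security group, so only resources attached to that same group can reach each other; the security group ID and Region are placeholders.

```python
# Minimal sketch (assumption: sg-0123456789abcdef0 is the security group
# attached to the HAQM MWAA environment; the value and Region are placeholders).
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Self-referencing inbound rule: only members of this same security group
# (the MWAA scheduler, workers, and web server) can reach each other, and
# no other inbound traffic is allowed by this rule.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[
        {
            "IpProtocol": "-1",
            "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}],
        }
    ],
)
```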

Access your existing data and resources through HAQM SageMaker Unified Studio, Part 1: AWS Glue Data Catalog and HAQM Redshift

This series of posts demonstrates how you can onboard and access existing AWS data sources using SageMaker Unified Studio. This post focuses on onboarding existing AWS Glue Data Catalog tables and database tables available in HAQM Redshift.
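
Once a Data Catalog table is onboarded, querying it from a Unified Studio notebook can look like the following minimal sketch, which runs an Athena query through the AWS SDK for pandas (awswrangler); the database and table names are placeholders, and the notebook role is assumed to already hold the necessary Lake Formation permissions.

```python
# Minimal sketch (assumptions: the Glue database "sales_db" and table "orders"
# are placeholders for tables already onboarded to the project, and the
# execution role has been granted access to them).
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="SELECT order_id, order_total FROM orders LIMIT 10",
    database="sales_db",
)
print(df.head())
```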

Access your existing data and resources through HAQM SageMaker Unified Studio, Part 2: HAQM S3, HAQM RDS, HAQM DynamoDB, and HAQM EMR

In this post, we discuss integrating additional vital data sources such as HAQM Simple Storage Service (HAQM S3) buckets, HAQM Relational Database Service (HAQM RDS), HAQM DynamoDB, and HAQM EMR clusters. We demonstrate how to configure the necessary permissions, establish connections, and effectively use these resources within SageMaker Unified Studio. Whether you’re working with object storage, relational databases, NoSQL databases, or big data processing, this post can help you seamlessly incorporate your existing data infrastructure into your SageMaker Unified Studio workflows.

Melting the ice — How Natural Intelligence simplified a data lake migration to Apache Iceberg

Natural Intelligence (NI) is a world leader in multi-category marketplaces. In this blog post, NI shares their journey, the innovative solutions developed, and the key takeaways that can guide other organizations considering a similar path. This article details NI’s practical approach to this complex migration, focusing less on Apache Iceberg’s technical specifications and more on the real-world challenges and solutions encountered during the transition, a challenge that many organizations are grappling with.

HAQM SageMaker Lakehouse now supports attribute-based access control

HAQM SageMaker Lakehouse now supports attribute-based access control (ABAC) with AWS Lake Formation, using AWS Identity and Access Management (IAM) principals and session tags to simplify data access, grant creation, and maintenance. In this post, we demonstrate how to get started with SageMaker Lakehouse with ABAC.
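
To show where the session tags come from, here is a minimal boto3 sketch of assuming a role with a session tag that an ABAC grant can then evaluate; the role ARN, tag key, and tag value are placeholders, and the corresponding Lake Formation grant keyed on that tag is assumed to exist already.

```python
# Minimal sketch (assumptions: the role ARN and the "team" session tag are
# placeholders; an ABAC grant that matches this tag is assumed to be in place).
import boto3

sts = boto3.client("sts")

# Assume the analytics role and attach a session tag; access decisions can
# then be made against the tag instead of per-principal grants.
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/AnalyticsAccess",
    RoleSessionName="abac-demo",
    Tags=[{"Key": "team", "Value": "analytics"}],
)["Credentials"]

session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(session.client("sts").get_caller_identity()["Arn"])
```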