AWS Big Data Blog

Design patterns for implementing Hive Metastore for Amazon EMR on EKS

In this post, we explore the design patterns for implementing the Hive Metastore (HMS) with EMR on EKS with Spark Operator, each offering distinct advantages depending on your requirements. Whether you choose to deploy HMS as a sidecar container within the Apache Spark Driver pod, or as a Kubernetes deployment in the data processing EKS cluster, or as an external HMS service in a separate EKS cluster, the key considerations revolve around communication efficiency, scalability, resource isolation, high availability, and security.

Governing streaming data in Amazon DataZone with the Data Solutions Framework on AWS

In this post, we explore how AWS customers can extend Amazon DataZone to support streaming data such as Amazon Managed Streaming for Apache Kafka (Amazon MSK) topics. Developers and DevOps managers can use Amazon MSK, a popular streaming data service, to run Kafka applications and Kafka Connect connectors on AWS without becoming experts in operating it.

Top analytics announcements of AWS re:Invent 2024

AWS re:Invent 2024, the flagship annual conference, took place December 2–6, 2024, in Las Vegas, bringing together thousands of cloud enthusiasts, innovators, and industry leaders from around the globe. Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. In this post, we walk you through the top analytics announcements from re:Invent 2024 and explore how these innovations can help you unlock the full potential of your data.

Building and operating data pipelines at scale using CI/CD, Amazon MWAA and Apache Spark on Amazon EMR by Wipro

This blog post discusses how a programmatic data processing framework developed by Wipro can help data engineers overcome obstacles and streamline their organization’s ETL processes. The framework uses the Amazon EMR improved runtime for Apache Spark and integrates with AWS managed services.

Enhance your workload resilience with new Amazon EMR instance fleet features

Amazon EMR has introduced new features for instance fleets that address critical challenges in big data operations. This post explores how these innovations improve cluster resilience, scalability, and efficiency, enabling you to build more robust data processing architectures on AWS.

Amazon Redshift announces history mode for zero-ETL integrations to simplify historical data tracking and analysis

This post explores a brief history of zero-ETL and its importance for customers, and introduces an exciting new feature: history mode for Amazon Aurora PostgreSQL-Compatible Edition, Amazon Aurora MySQL-Compatible Edition, Amazon Relational Database Service (Amazon RDS) for MySQL, and Amazon DynamoDB zero-ETL integrations with Amazon Redshift.

Streamline AWS WAF log analysis with Apache Iceberg and Amazon Data Firehose

In this post, we demonstrate how to build a scalable AWS WAF log analysis solution using Firehose and Apache Iceberg. Firehose simplifies the entire process, from log ingestion to storage, by allowing you to configure a delivery stream that delivers AWS WAF logs directly to Apache Iceberg tables in Amazon S3. The solution requires no infrastructure setup, and you pay only for the data you process.

Migrate from Standard brokers to Express brokers in Amazon MSK using Amazon MSK Replicator

Creating a new cluster with Express brokers is straightforward, as described in Amazon MSK Express brokers. However, if you have an existing MSK cluster, you need to migrate to a new Express-based cluster. In this post, we discuss how to plan and perform the migration to Express brokers for your existing MSK workloads on Standard brokers. Express brokers offer a different user experience and a different shared responsibility boundary, so you cannot enable them on an existing cluster. However, you can use Amazon MSK Replicator to copy all data and metadata from your existing MSK cluster to a new cluster composed of Express brokers.

Foundational blocks of Amazon SageMaker Unified Studio: An admin’s guide to implement unified access to all your data, analytics, and AI

In this post, we discuss the foundational building blocks of SageMaker Unified Studio and how, by abstracting complex technical implementations behind user-friendly interfaces, organizations can maintain standardized governance while enabling efficient resource management across business units. This approach keeps infrastructure deployment consistent while leaving room for the flexibility that diverse business requirements demand.