AWS Big Data Blog

Category: Best Practices

Embracing event-driven architecture to enhance resilience of data solutions built on HAQM SageMaker

This post provides guidance on how you can use event-driven architecture to enhance the resiliency of data solutions built on the next generation of HAQM SageMaker, a unified platform for data, analytics, and AI. SageMaker is a managed service with high availability and durability.
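As a general illustration of the event-driven pattern (not the specific implementation from the post), the following minimal sketch publishes a custom event to HAQM EventBridge so downstream consumers can react asynchronously instead of being called directly; the bus name, event source, and detail payload are hypothetical.

```python
import json
import boto3

# Minimal sketch of an event-driven hand-off: a producer publishes a custom
# event to HAQM EventBridge rather than calling downstream systems directly.
# The bus name, source, and detail payload below are hypothetical examples.
events = boto3.client("events")

response = events.put_events(
    Entries=[
        {
            "EventBusName": "data-platform-bus",      # hypothetical custom event bus
            "Source": "com.example.datapipeline",     # hypothetical event source
            "DetailType": "DatasetRefreshCompleted",
            "Detail": json.dumps(
                {"dataset": "sales_daily", "status": "SUCCEEDED"}
            ),
        }
    ]
)

# put_events is best-effort per entry; check FailedEntryCount and retry if needed.
if response["FailedEntryCount"] > 0:
    print("Some events failed to publish:", response["Entries"])
```

Decoupling producers from consumers this way is what lets individual components fail or scale independently without breaking the end-to-end pipeline.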

Architecture patterns to optimize HAQM Redshift performance at scale

In this post, we show you five architecture patterns that you can consider to optimize your HAQM Redshift data warehouse performance at scale using features such as HAQM Redshift Serverless, HAQM Redshift data sharing, HAQM Redshift Spectrum, zero-ETL integrations, and HAQM Redshift streaming ingestion.
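To make the serverless pattern concrete, here is a minimal, hedged sketch of querying a Redshift Serverless workgroup through the Redshift Data API with boto3, which avoids managing persistent database connections; the workgroup name, database, and SQL are placeholders rather than anything prescribed by the post.

```python
import time
import boto3

# Minimal sketch: run a query against a Redshift Serverless workgroup via the
# Redshift Data API. Workgroup, database, and table names are placeholders.
client = boto3.client("redshift-data")

resp = client.execute_statement(
    WorkgroupName="analytics-wg",   # placeholder Redshift Serverless workgroup
    Database="dev",
    Sql="SELECT event_date, COUNT(*) FROM clickstream GROUP BY event_date LIMIT 10;",
)
statement_id = resp["Id"]

# Poll until the statement finishes, then fetch the result set.
status = client.describe_statement(Id=statement_id)["Status"]
while status in ("SUBMITTED", "PICKED", "STARTED"):
    time.sleep(1)
    status = client.describe_statement(Id=statement_id)["Status"]

if status == "FINISHED":
    rows = client.get_statement_result(Id=statement_id)["Records"]
    print(rows)
```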

Petabyte-scale data migration made simple: AppsFlyer’s best practice journey with HAQM EMR Serverless

In this post, we share how AppsFlyer successfully migrated their massive data infrastructure from self-managed Hadoop clusters to HAQM EMR Serverless, detailing their best practices, the challenges they overcame, and the lessons learned that can help guide other organizations through similar transformations.
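For readers new to EMR Serverless, the following minimal sketch shows how a Spark job is submitted to an existing EMR Serverless application with boto3; the application ID, IAM role ARN, and S3 paths are placeholders, not values from AppsFlyer's environment.

```python
import boto3

# Minimal sketch of submitting a Spark job to an existing HAQM EMR Serverless
# application. Application ID, role ARN, and S3 paths are placeholders.
emr_serverless = boto3.client("emr-serverless")

response = emr_serverless.start_job_run(
    applicationId="00f1example2345",  # placeholder application ID
    executionRoleArn="arn:aws:iam::123456789012:role/EMRServerlessJobRole",  # placeholder
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://my-bucket/jobs/etl_job.py",  # placeholder script
            "sparkSubmitParameters": "--conf spark.executor.memory=4g --conf spark.executor.cores=2",
        }
    },
    configurationOverrides={
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {"logUri": "s3://my-bucket/emr-serverless-logs/"}
        }
    },
)

print("Started job run:", response["jobRunId"])
```

Because the service provisions and scales workers per job, there are no long-running clusters to size or patch, which is a large part of the operational simplification described in the post.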

Introducing HAQM Q Developer in HAQM OpenSearch Service

Today, we introduced HAQM Q Developer support in HAQM OpenSearch Service. With this AI-assisted analysis, both new and experienced users can navigate complex operational data without training, analyze issues, and gain insights in a fraction of the time. In this post, we share how to get started with HAQM Q Developer in OpenSearch Service and explore some of its key capabilities.

Melting the ice — How Natural Intelligence simplified a data lake migration to Apache Iceberg

Natural Intelligence (NI) is a world leader in multi-category marketplaces. In this post, NI shares their journey, the innovative solutions they developed, and the key takeaways that can guide other organizations considering a similar path. The post details NI’s practical approach to this complex migration, focusing less on Apache Iceberg’s technical specifications and more on the real-world challenges and solutions encountered during the transition, a migration that many organizations are grappling with.

Architect fault-tolerant applications with instance fleets on HAQM EMR on EC2

In this post, we show how to optimize capacity by analyzing EMR workloads and implementing strategies tailored to your workload patterns. We walk through assessing a workload’s historical compute usage and using a combination of strategies to reduce the likelihood of InsufficientCapacityExceptions (ICE) when HAQM EMR launches specific EC2 instance types. We implement flexible instance fleet strategies to reduce dependency on specific instance types and use HAQM EC2 On-Demand Capacity Reservations (ODCRs) for predictable, steady-state workloads. Following this approach can help prevent job failures due to capacity limits while optimizing your cluster for cost and performance.
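As a hedged sketch of what such a flexible fleet can look like (the cluster name, subnets, instance types, and capacities are placeholders, not the post's exact configuration), the following boto3 call launches an EMR cluster whose core fleet spans several interchangeable instance types and prefers open On-Demand Capacity Reservations when they exist.

```python
import boto3

# Minimal sketch of an instance-fleet configuration that spreads the core fleet
# across several instance types to reduce the chance of insufficient-capacity
# errors, and draws from open ODCRs first for steady-state capacity.
# Names, subnet IDs, instance types, and capacities are placeholders.
emr = boto3.client("emr")

response = emr.run_job_flow(
    Name="fault-tolerant-fleet-cluster",  # placeholder cluster name
    ReleaseLabel="emr-7.0.0",
    Applications=[{"Name": "Spark"}],
    ServiceRole="EMR_DefaultRole",
    JobFlowRole="EMR_EC2_DefaultRole",
    Instances={
        "Ec2SubnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"],  # placeholder subnets
        "KeepJobFlowAliveWhenNoSteps": True,
        "InstanceFleets": [
            {
                "Name": "primary",
                "InstanceFleetType": "MASTER",
                "TargetOnDemandCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
            },
            {
                "Name": "core",
                "InstanceFleetType": "CORE",
                "TargetOnDemandCapacity": 4,
                # Several interchangeable instance types give EMR more pools to draw from.
                "InstanceTypeConfigs": [
                    {"InstanceType": "m5.2xlarge", "WeightedCapacity": 1},
                    {"InstanceType": "m5a.2xlarge", "WeightedCapacity": 1},
                    {"InstanceType": "m6i.2xlarge", "WeightedCapacity": 1},
                ],
                "LaunchSpecifications": {
                    "OnDemandSpecification": {
                        "AllocationStrategy": "lowest-price",
                        # Use matching open capacity reservations before on-demand pools.
                        "CapacityReservationOptions": {
                            "UsageStrategy": "use-capacity-reservations-first"
                        },
                    }
                },
            },
        ],
    },
)

print("Cluster ID:", response["JobFlowId"])
```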

Design patterns for implementing Hive Metastore for HAQM EMR on EKS

In this post, we explore design patterns for implementing the Hive Metastore (HMS) with HAQM EMR on EKS using the Spark Operator, each offering distinct advantages depending on your requirements. Whether you choose to deploy HMS as a sidecar container within the Apache Spark driver pod, as a Kubernetes deployment in the data processing EKS cluster, or as an external HMS service in a separate EKS cluster, the key considerations revolve around communication efficiency, scalability, resource isolation, high availability, and security.
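To illustrate the shape of one of these options, the following minimal sketch shows a Spark session whose driver points at an HMS exposed as a Kubernetes service (the "external service" style of deployment); the thrift URI, namespace, and warehouse location are hypothetical and would differ in your cluster.

```python
from pyspark.sql import SparkSession

# Minimal sketch of pointing a Spark application (for example, one launched by
# the Spark Operator on EMR on EKS) at a Hive Metastore exposed as a Kubernetes
# service. The thrift URI and warehouse path below are hypothetical.
spark = (
    SparkSession.builder
    .appName("hms-external-service-example")
    # Kubernetes DNS name of a hypothetical HMS Service in a "metastore" namespace.
    .config("hive.metastore.uris", "thrift://hive-metastore.metastore.svc.cluster.local:9083")
    .config("spark.sql.warehouse.dir", "s3://my-bucket/warehouse/")  # placeholder path
    .enableHiveSupport()
    .getOrCreate()
)

# Tables created through this session are registered in the shared metastore,
# so other Spark applications and query engines can discover them.
spark.sql("SHOW DATABASES").show()
```

The same session-level configuration applies whichever deployment pattern you pick; what changes is where the thrift endpoint lives and therefore how network latency, failure isolation, and access control are handled.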