AWS Big Data Blog
Category: Best Practices
Embracing event-driven architecture to enhance resilience of data solutions built on HAQM SageMaker
This post provides guidance on how you can use event-driven architecture to enhance the resiliency of data solutions built on the next generation of HAQM SageMaker, a unified platform for data, analytics, and AI. SageMaker is a managed service that provides high availability and durability.
Architecture patterns to optimize HAQM Redshift performance at scale
In this post, we show you five architecture patterns you can use to optimize HAQM Redshift data warehouse performance at scale, using features such as HAQM Redshift Serverless, data sharing, HAQM Redshift Spectrum, zero-ETL integrations, and streaming ingestion.
Best practices for upgrading HAQM MWAA environments
In this post, we explore best practices for upgrading your HAQM MWAA environment and provide a step-by-step guide to seamlessly transition to the latest version.
Enhancing data durability in HAQM EMR HBase on HAQM S3 with the HAQM EMR WAL feature
In this post, we dive deep into the new HAQM EMR WAL feature to help you understand how it works, how it enhances durability, and why it’s needed. We explore several scenarios that are well-suited for this feature.
Petabyte-scale data migration made simple: AppsFlyer’s best practice journey with HAQM EMR Serverless
In this post, we share how AppsFlyer successfully migrated their massive data infrastructure from self-managed Hadoop clusters to HAQM EMR Serverless, detailing their best practices, the challenges they overcame, and lessons learned that can help guide other organizations through similar transformations.
Introducing HAQM Q Developer in HAQM OpenSearch Service
Today, we introduced HAQM Q Developer support in OpenSearch Service. With this AI-assisted analysis, both new and experienced users can navigate complex operational data without training, analyze issues, and gain insights in a fraction of the time. In this post, we share how to get started using HAQM Q Developer in OpenSearch Service and explore some of its key capabilities.
Melting the ice — How Natural Intelligence simplified a data lake migration to Apache Iceberg
Natural Intelligence (NI) is a world leader in multi-category marketplaces. In this blog post, NI shares their journey, the innovative solutions they developed, and the key takeaways that can guide other organizations considering a similar path. The post details NI's practical approach to this complex migration, focusing less on Apache Iceberg's technical specifications and more on the real-world challenges and solutions encountered during the transition, a challenge that many organizations are grappling with.
Enhance Agentforce data security with Private Connect for Salesforce Data Cloud and HAQM Redshift – Part 3
In this post, we discuss how to create AWS endpoint services to improve data security with Private Connect for Salesforce Data Cloud.
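As a quick illustration of the endpoint-service side of this pattern, here is a minimal boto3 sketch that creates a VPC endpoint service and restricts which account can connect to it. The Network Load Balancer ARN, account IDs, and tag values are placeholders for this example, not values from the post.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an endpoint service backed by a Network Load Balancer (placeholder ARN).
# AcceptanceRequired=True means each connection request must be approved before traffic flows.
response = ec2.create_vpc_endpoint_service_configuration(
    AcceptanceRequired=True,
    NetworkLoadBalancerArns=[
        "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/net/example-nlb/abc123"
    ],
    TagSpecifications=[{
        "ResourceType": "vpc-endpoint-service",
        "Tags": [{"Key": "purpose", "Value": "salesforce-private-connect"}],
    }],
)
service = response["ServiceConfiguration"]

# Allow only the consumer account (placeholder) to create interface endpoints to this service.
ec2.modify_vpc_endpoint_service_permissions(
    ServiceId=service["ServiceId"],
    AddAllowedPrincipals=["arn:aws:iam::444455556666:root"],
)
print(service["ServiceName"])
```

The service name printed at the end is what the consumer side registers when requesting a private connection.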
Architect fault-tolerant applications with instance fleets on HAQM EMR on EC2
In this post, we show how to optimize capacity by analyzing EMR workloads and implementing strategies tailored to your workload patterns. We walk through assessing the historical compute usage of a workload and use a combination of strategies to reduce the likelihood of InsufficientCapacityExceptions (ICE) when HAQM EMR launches specific EC2 instance types. We implement flexible instance fleet strategies to reduce dependency on specific instance types and use HAQM EC2 On-Demand Capacity Reservations (ODCRs) for predictable, steady-state workloads. Following this approach can help prevent job failures due to capacity limits while optimizing your cluster for cost and performance.
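To make the fleet idea concrete, below is a minimal boto3 sketch of launching an EMR cluster whose core fleet spans several interchangeable instance types and draws on open ODCRs first. Release label, role names, subnet ID, and instance types are illustrative assumptions, not the configuration from the post.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# A core fleet that declares multiple instance types lets EMR fall back to whatever
# capacity is available, reducing the chance of InsufficientCapacityExceptions.
core_fleet = {
    "Name": "core-fleet",
    "InstanceFleetType": "CORE",
    "TargetOnDemandCapacity": 8,
    "InstanceTypeConfigs": [
        {"InstanceType": "r5.2xlarge", "WeightedCapacity": 8},
        {"InstanceType": "r5a.2xlarge", "WeightedCapacity": 8},
        {"InstanceType": "r6g.2xlarge", "WeightedCapacity": 8},
    ],
    "LaunchSpecifications": {
        "OnDemandSpecification": {
            "AllocationStrategy": "lowest-price",
            # Consume open On-Demand Capacity Reservations first for steady-state workloads.
            "CapacityReservationOptions": {
                "UsageStrategy": "use-capacity-reservations-first"
            },
        }
    },
}

response = emr.run_job_flow(
    Name="fault-tolerant-fleet-cluster",
    ReleaseLabel="emr-7.2.0",                  # placeholder release
    Applications=[{"Name": "Spark"}],
    ServiceRole="EMR_DefaultRole",             # placeholder role names
    JobFlowRole="EMR_EC2_DefaultRole",
    Instances={
        "Ec2SubnetIds": ["subnet-0123456789abcdef0"],  # placeholder subnet(s)
        "KeepJobFlowAliveWhenNoSteps": True,
        "InstanceFleets": [
            {
                "Name": "primary-fleet",
                "InstanceFleetType": "MASTER",
                "TargetOnDemandCapacity": 1,
                "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
            },
            core_fleet,
        ],
    },
)
print(response["JobFlowId"])
```

Listing several subnets across Availability Zones widens the capacity pool even further, since EMR can choose the zone where the requested instance types are available.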
Design patterns for implementing Hive Metastore for HAQM EMR on EKS
In this post, we explore design patterns for implementing the Hive Metastore (HMS) with HAQM EMR on EKS using the Spark Operator, each offering distinct advantages depending on your requirements. Whether you deploy HMS as a sidecar container within the Apache Spark driver pod, as a Kubernetes deployment in the data processing EKS cluster, or as an external HMS service in a separate EKS cluster, the key considerations revolve around communication efficiency, scalability, resource isolation, high availability, and security.
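Regardless of which pattern you choose, the Spark side of the wiring looks similar: the job points at an HMS endpoint over Thrift. Here is a minimal PySpark sketch of the external-service case; the metastore DNS name and warehouse bucket are placeholders assumed for illustration.

```python
from pyspark.sql import SparkSession

# Point Spark at an external Hive Metastore reachable over Thrift.
# The service DNS name below is a placeholder for an HMS exposed inside (or across) EKS clusters.
spark = (
    SparkSession.builder
    .appName("emr-on-eks-external-hms")
    .config("spark.hadoop.hive.metastore.uris",
            "thrift://hive-metastore.hms.svc.cluster.local:9083")
    .config("spark.sql.warehouse.dir", "s3://example-bucket/warehouse/")  # placeholder bucket
    .enableHiveSupport()
    .getOrCreate()
)

# Tables created here are registered in the shared metastore and visible to other jobs.
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
spark.sql("SHOW TABLES IN analytics").show()
```

In the sidecar pattern the URI would instead resolve to localhost inside the driver pod, which trades shared access for lower latency and simpler isolation.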