AWS Big Data Blog
Design patterns for implementing Hive Metastore for Amazon EMR on EKS
In this post, we explore the design patterns for implementing the Hive Metastore (HMS) with EMR on EKS with Spark Operator, each offering distinct advantages depending on your requirements. Whether you choose to deploy HMS as a sidecar container within the Apache Spark Driver pod, or as a Kubernetes deployment in the data processing EKS cluster, or as an external HMS service in a separate EKS cluster, the key considerations revolve around communication efficiency, scalability, resource isolation, high availability, and security.
Governing streaming data in Amazon DataZone with the Data Solutions Framework on AWS
In this post, we explore how AWS customers can extend Amazon DataZone to support streaming data such as Amazon Managed Streaming for Apache Kafka (Amazon MSK) topics. Developers and DevOps managers can use Amazon MSK, a popular streaming data service, to run Kafka applications and Kafka Connect connectors on AWS without becoming experts in operating it.
Amazon Prime Video advances search for sports using Amazon OpenSearch Service
In this post, we will walk you through how Prime Video used Amazon OpenSearch Service and its AI and machine learning (AI/ML) capabilities to build a more intuitive and enhanced sports search experience.
Top analytics announcements of AWS re:Invent 2024
AWS re:Invent 2024, the flagship annual conference, took place December 2–6, 2024, in Las Vegas, bringing together thousands of cloud enthusiasts, innovators, and industry leaders from around the globe. Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. In this post, we walk you through the top analytics announcements from re:Invent 2024 and explore how these innovations can help you unlock the full potential of your data.
Amazon Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools
Amazon Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. We were positioned in the Challengers Quadrant in 2023. This recognition, we feel, reflects our ongoing commitment to innovation and excellence in data integration, demonstrating our continued progress in providing comprehensive data management solutions.
Building and operating data pipelines at scale using CI/CD, Amazon MWAA and Apache Spark on Amazon EMR by Wipro
This blog post discusses how a programmatic data processing framework developed by Wipro can help data engineers overcome obstacles and streamline their organization’s ETL processes. The framework uses the Amazon EMR improved runtime for Apache Spark and integrates with AWS managed services.
Supercharge your RAG applications with Amazon OpenSearch Service and Aryn DocParse
In this post, we demonstrate how to use Amazon OpenSearch Service with purpose-built document ETL tools, Aryn DocParse and Sycamore, to quickly build a RAG application that relies on complex documents. We use over 75 PDF reports from the National Transportation Safety Board (NTSB) about aircraft incidents. These documents are complex, containing tables, images, section headings, and complicated layouts.
Improve search results for AI using Amazon OpenSearch Service as a vector database with Amazon Bedrock
In this post, you’ll learn how to use OpenSearch Service and Amazon Bedrock to build AI-powered search and generative AI applications. You’ll learn about how AI-powered search systems employ foundation models (FMs) to capture and search context and meaning across text, images, audio, and video, delivering more accurate results to users. You’ll learn how generative AI systems use these search results to create original responses to questions, supporting interactive conversations between humans and machines.
Enhance your workload resilience with new Amazon EMR instance fleet features
Amazon EMR has introduced new features for instance fleets that address critical challenges in big data operations. This post explores how these innovations improve cluster resilience, scalability, and efficiency, enabling you to build more robust data processing architectures on AWS.
Amazon Redshift announces history mode for zero-ETL integrations to simplify historical data tracking and analysis
This post explores a brief history of zero-ETL and its importance for customers, and introduces an exciting new feature: history mode for Amazon Aurora PostgreSQL-Compatible Edition, Amazon Aurora MySQL-Compatible Edition, Amazon Relational Database Service (Amazon RDS) for MySQL, and Amazon DynamoDB zero-ETL integrations with Amazon Redshift.