AWS Big Data Blog

Category: Analytics

Governing streaming data in HAQM DataZone with the Data Solutions Framework on AWS

In this post, we explore how AWS customers can extend HAQM DataZone to support streaming data such as HAQM Managed Streaming for Apache Kafka (HAQM MSK) topics. Developers and DevOps managers can use HAQM MSK, a popular streaming data service, to run Kafka applications and Kafka Connect connectors on AWS without becoming experts in operating it.

Top analytics announcements of AWS re:Invent 2024

AWS re:Invent 2024, the flagship annual conference, took place December 2–6, 2024, in Las Vegas, bringing together thousands of cloud enthusiasts, innovators, and industry leaders from around the globe. Analytics remained one of the key focus areas this year, with significant updates and innovations aimed at helping businesses harness their data more efficiently and accelerate insights. In this post, we walk you through the top analytics announcements from re:Invent 2024 and explore how these innovations can help you unlock the full potential of your data.

HAQM Web Services named a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools

HAQM Web Services (AWS) has been recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Integration Tools. We were positioned in the Challengers Quadrant in 2023. This recognition, we feel, reflects our ongoing commitment to innovation and excellence in data integration, demonstrating our continued progress in providing comprehensive data management solutions.

Building and operating data pipelines at scale using CI/CD, HAQM MWAA and Apache Spark on HAQM EMR by Wipro

This post discusses how a programmatic data processing framework developed by Wipro can help data engineers overcome obstacles and streamline their organization’s ETL processes. The framework uses the HAQM EMR runtime for Apache Spark and integrates with AWS managed services.

Supercharge your RAG applications with HAQM OpenSearch Service and Aryn DocParse

In this post, we demonstrate how to use HAQM OpenSearch Service with purpose-built document ETL tools, Aryn DocParse and Sycamore, to quickly build a RAG application that relies on complex documents. We use over 75 PDF reports from the National Transportation Safety Board (NTSB) about aircraft incidents. These documents are complex, containing tables, images, section headings, and complicated layouts.

Improve search results for AI using HAQM OpenSearch Service as a vector database with HAQM Bedrock

In this post, you’ll learn how to use OpenSearch Service and HAQM Bedrock to build AI-powered search and generative AI applications. AI-powered search systems employ foundation models (FMs) to capture and search context and meaning across text, images, audio, and video, delivering more accurate results to users. Generative AI systems then use these search results to create original responses to questions, supporting interactive conversations between humans and machines.
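As a rough illustration of the search half of that workflow, the following sketch builds a k-NN query body in the shape used by the OpenSearch k-NN plugin. The field name `embedding`, the index name, and the dummy vector are placeholder assumptions for this example; in practice, the vector would come from an embedding model invoked through HAQM Bedrock.

```python
def knn_query(vector, k=5, field="embedding"):
    """Build an OpenSearch k-NN query body over a vector field.

    The query shape follows the OpenSearch k-NN plugin: the top-k nearest
    documents to `vector` (by the index's configured distance metric)
    are returned.
    """
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }

# A short dummy vector stands in for a real embedding here.
body = knn_query([0.12, -0.08, 0.33], k=3)
# A real application would then run, e.g.:
# opensearch_client.search(index="docs", body=body)
```

The returned hits can then be passed as retrieved context to a generation model, which is the retrieval-augmented pattern the post describes.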

Enhance your workload resilience with new HAQM EMR instance fleet features

HAQM EMR has introduced new features for instance fleets that address critical challenges in big data operations. This post explores how these innovations improve cluster resilience, scalability, and efficiency, enabling you to build more robust data processing architectures on AWS.

HAQM Redshift announces history mode for zero-ETL integrations to simplify historical data tracking and analysis

This post explores a brief history of zero-ETL and its importance for customers, and introduces an exciting new feature: history mode for the HAQM Aurora PostgreSQL-Compatible Edition, HAQM Aurora MySQL-Compatible Edition, HAQM Relational Database Service (HAQM RDS) for MySQL, and HAQM DynamoDB zero-ETL integrations with HAQM Redshift.

Streamline AWS WAF log analysis with Apache Iceberg and HAQM Data Firehose

In this post, we demonstrate how to build a scalable AWS WAF log analysis solution using Firehose and Apache Iceberg. Firehose simplifies the entire process—from log ingestion to storage—by allowing you to configure a delivery stream that delivers AWS WAF logs directly to Apache Iceberg tables in HAQM S3. The solution requires no infrastructure setup, and you pay only for the data you process.
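To make the delivery-stream configuration concrete, here is a minimal sketch of the parameters such a setup might pass to the Firehose CreateDeliveryStream API with an Iceberg destination. All ARNs, the database, and the table names are placeholders, and the exact field names of the Iceberg destination configuration may differ by SDK version, so treat this as an assumption-laden outline rather than the post's actual configuration.

```python
def waf_iceberg_stream_params(stream_name, role_arn, bucket_arn,
                              catalog_arn, database, table):
    """Build CreateDeliveryStream parameters for an Iceberg destination."""
    return {
        # AWS WAF logging requires the stream name to start with "aws-waf-logs-"
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        "IcebergDestinationConfiguration": {
            "RoleARN": role_arn,
            "CatalogConfiguration": {"CatalogARN": catalog_arn},
            "S3Configuration": {"RoleARN": role_arn, "BucketARN": bucket_arn},
            "DestinationTableConfigurationList": [
                {"DestinationDatabaseName": database,
                 "DestinationTableName": table}
            ],
        },
    }

params = waf_iceberg_stream_params(
    "aws-waf-logs-analysis",
    "arn:aws:iam::123456789012:role/firehose-iceberg-role",
    "arn:aws:s3:::waf-log-bucket",
    "arn:aws:glue:us-east-1:123456789012:catalog",
    "waf_logs_db",
    "waf_logs",
)
# A real deployment would then create the stream, e.g.:
# boto3.client("firehose").create_delivery_stream(**params)
```

Once the stream exists, AWS WAF can be pointed at it as a logging destination, and Firehose handles buffering and committing records to the Iceberg table.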