AWS Big Data Blog

Category: Analytics


Build a real-time streaming generative AI application using HAQM Bedrock, HAQM Managed Service for Apache Flink, and HAQM Kinesis Data Streams

Data streaming enables generative AI to take advantage of real-time data and provide businesses with rapid insights. This post looks at how to integrate generative AI capabilities into a streaming architecture on AWS, using managed services such as HAQM Managed Service for Apache Flink and HAQM Kinesis Data Streams to process streaming data, and HAQM Bedrock for generative AI capabilities. We include a reference architecture, a step-by-step guide to setting up the infrastructure, and sample code for implementing the solution with the AWS Cloud Development Kit (AWS CDK). You can find the code to try it out yourself on the GitHub repo.

HAQM DataZone announces custom blueprints for AWS services

Last week, we announced the general availability of custom AWS service blueprints, a new feature in HAQM DataZone allowing you to customize your HAQM DataZone project environments to use existing AWS Identity and Access Management (IAM) roles and AWS services to embed the service into your existing processes. In this post, we share how this […]

Access HAQM Redshift data from Salesforce Data Cloud with Zero Copy Data Federation

This post is co-authored by Vijay Gopalakrishnan, Director of Product, Salesforce Data Cloud. In today’s data-driven business landscape, organizations collect a wealth of data across various touch points and unify it in a central data warehouse or a data lake to deliver business insights. This data is primarily used for analytical and machine learning purposes, […]

Perform reindexing in HAQM OpenSearch Serverless using HAQM OpenSearch Ingestion

In this post, we outline the steps to copy data between two indexes in the same OpenSearch Serverless collection using the new OpenSearch source feature of OpenSearch Ingestion. This is particularly useful for reindexing operations where you want to change your data schema. OpenSearch Serverless and OpenSearch Ingestion are both serverless services that enable you to seamlessly handle your data workflows, providing optimal performance and scalability.

Uncover social media insights in real time using HAQM Managed Service for Apache Flink and HAQM Bedrock

This post takes a step-by-step approach to showcase how you can use Retrieval Augmented Generation (RAG) to reference real-time tweets as a context for large language models (LLMs). RAG is the process of optimizing the output of an LLM so it references an authoritative knowledge base outside of its training data sources before generating a response. LLMs are trained on vast volumes of data and use billions of parameters to generate original output for tasks such as answering questions, translating languages, and completing sentences.
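The retrieve-then-generate flow that RAG describes can be illustrated with a small, self-contained sketch. This is not the method used in the post: the function names `retrieve` and `build_prompt` and the sample tweets are hypothetical, and a simple keyword-overlap score stands in for the embedding-based vector search a production RAG system would typically use. No LLM is called; the sketch only shows how retrieved context is placed into the prompt before generation.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents that share the most terms with the query.

    Keyword overlap is an illustrative stand-in (an assumption) for
    embedding similarity search against a vector store.
    """
    query_terms = set(tokenize(query))
    scored = []
    for index, doc in enumerate(documents):
        counts = Counter(tokenize(doc))
        score = sum(counts[term] for term in query_terms)
        scored.append((score, index, doc))
    # Highest score first; ties keep the original (for example, arrival) order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc for score, _, doc in scored[:k] if score > 0]


def build_prompt(query: str, context: list[str]) -> str:
    # Augment the question with retrieved context so the model answers from it.
    context_block = "\n".join(f"- {item}" for item in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


# Hypothetical real-time tweets acting as the external knowledge base.
tweets = [
    "Flink 1.18 support is now available in the managed service",
    "Great coffee this morning",
    "Our team migrated streaming jobs to Flink and cut latency",
]
question = "What changed with Flink?"
print(build_prompt(question, retrieve(question, tweets)))
```

In a streaming setup, the `tweets` list would be replaced by a continuously updated index, so the context passed to the model reflects the latest data rather than only what the model saw during training.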

Configure a custom domain name for your HAQM MSK cluster

HAQM Managed Streaming for Apache Kafka (HAQM MSK) is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data. It runs open-source versions of Apache Kafka. This means existing applications, tooling, and plugins from partners and the Apache Kafka community are supported without requiring changes to […]

Run Apache Spark 3.5.1 workloads 4.5 times faster with HAQM EMR runtime for Apache Spark

The HAQM EMR runtime for Apache Spark is a performance-optimized runtime that is 100% API compatible with open source Apache Spark. It offers faster out-of-the-box performance than Apache Spark through improved query plans, faster queries, and tuned defaults. HAQM EMR on EC2, HAQM EMR Serverless, HAQM EMR on HAQM EKS, and HAQM EMR on AWS […]


Stream multi-tenant data with HAQM MSK

AWS helps SaaS vendors by providing the building blocks needed to implement a streaming application with HAQM Kinesis Data Streams and HAQM Managed Streaming for Apache Kafka (HAQM MSK), and real-time processing applications with HAQM Managed Service for Apache Flink. In this post, we look at implementation patterns a SaaS vendor can adopt when using a streaming platform as a means of integration between internal components, where streaming data is not directly exposed to third parties. In particular, we focus on HAQM MSK.

Apply fine-grained access and transformation on the SUPER data type in HAQM Redshift

HAQM Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools. Tens of thousands of customers use HAQM Redshift to process exabytes of data per […]

Build multimodal search with HAQM OpenSearch Service

Multimodal search enables both text and image search capabilities, transforming how users access data through search applications. Consider building an online fashion retail store: you can enhance the users’ search experience with a visually appealing application that customers can use not only to search using text but also to upload an image depicting a […]