AWS Big Data Blog

Category: Amazon Managed Streaming for Apache Kafka (Amazon MSK)

How Zoom implemented streaming log ingestion and efficient GDPR deletes using Apache Hudi on Amazon EMR

In today’s digital age, logging is a critical aspect of application development and management, but efficiently managing logs while complying with data protection regulations can be a significant challenge. Zoom, in collaboration with the AWS Data Lab team, developed an innovative architecture to overcome these challenges and streamline their logging and record deletion processes. In […]

How SOCAR handles large IoT data with Amazon MSK and Amazon ElastiCache for Redis

This is a guest blog post co-written with SangSu Park and JaeHong Ahn from SOCAR. As companies continue to expand their digital footprint, the importance of real-time data processing and analysis cannot be overstated. The ability to quickly measure and draw insights from data is critical in today’s business landscape, where rapid decision-making is key. […]

Connect Kafka client applications securely to your Amazon MSK cluster from different VPCs and AWS accounts

You can now use Amazon Managed Streaming for Apache Kafka (Amazon MSK) multi-VPC private connectivity (powered by AWS PrivateLink) and cluster policy support for MSK clusters to simplify connectivity of your Kafka clients to your brokers. Amazon MSK is a fully managed service that makes it easy for you to build and run applications that […]

Connect to Amazon MSK Serverless from your on-premises network

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed, highly available, and secure Apache Kafka service. Amazon MSK reduces the work needed to set up, scale, and manage Apache Kafka in production. With Amazon MSK, you can create a cluster in minutes and start sending data. With Amazon MSK Serverless, you can […]

Accelerating revenue growth with real-time analytics: Poshmark’s journey

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. This post was co-written by Mahesh Pasupuleti and Gaurav Shah from Poshmark. Poshmark is a leading social marketplace for new and secondhand styles for women, men, kids, […]

How to choose the right Amazon MSK cluster type for you

March 2025: This post was reviewed and updated for accuracy. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is an AWS streaming data service that manages Apache Kafka infrastructure and operations, making it easy for developers and DevOps managers to run Apache Kafka applications and Kafka Connect connectors on AWS, without the need to become […]

Build an end-to-end change data capture with Amazon MSK Connect and AWS Glue Schema Registry

The value of data is time sensitive. Real-time processing makes data-driven decisions accurate and actionable in seconds or minutes instead of hours or days. Change data capture (CDC) refers to the process of identifying and capturing changes made to data in a database and then delivering those changes in real time to a downstream system. […]

Enhance operational insights for Amazon MSK using Amazon Managed Service for Prometheus and Amazon Managed Grafana

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is an event streaming platform that you can use to build asynchronous applications by decoupling producers and consumers. Monitoring of different Amazon MSK metrics is critical for efficient operations of production workloads. Amazon MSK gathers Apache Kafka metrics and sends them to Amazon CloudWatch, where you can […]

Create more partitions and retain data for longer in your MSK Serverless clusters

In April 2022, Amazon Managed Streaming for Apache Kafka (Amazon MSK) launched an exciting new capability, Amazon MSK Serverless. Amazon MSK is a fully managed service for Apache Kafka that makes it easier for developers to build and run highly available, secure, and scalable applications based on Apache Kafka. With MSK Serverless, developers can run […]

Build a serverless streaming pipeline with Amazon MSK Serverless, Amazon MSK Connect, and MongoDB Atlas

This post was co-written with Babu Srinivasan and Robert Walters from MongoDB. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed, highly available Apache Kafka service. Amazon MSK makes it easy to ingest and process streaming data in real time and use that data easily within the AWS ecosystem. With Amazon MSK […]