AWS Big Data Blog

Category: Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Increase Apache Kafka’s resiliency with a multi-Region deployment and MirrorMaker 2

Customers create business continuity plans and disaster recovery (DR) strategies to maximize resiliency for their applications, because downtime or data loss can result in losing revenue or halting operations. Ultimately, DR planning is all about enabling the business to continue running despite a Regional outage. This post explains how to make Apache Kafka resilient to […]

Securing Apache Kafka is easy and familiar with IAM Access Control for Amazon MSK

This is a guest blog post by AWS Data Hero Stephane Maarek. AWS launched IAM Access Control for Amazon MSK, a security option offered at no additional cost that simplifies cluster authentication and Apache Kafka API authorization by using AWS Identity and Access Management (IAM) roles or user policies to control access. This eliminates […]

How Goldman Sachs migrated from their on-premises Apache Kafka cluster to Amazon MSK

This is a guest post by Zachary Whitford, Associate, Richa Prajapati, Vice President, and Aldo Piddiu, Vice President, in the Global Investment Research engineering team at Goldman Sachs. To see how Goldman Sachs is innovating more with AWS, visit the Goldman Sachs Leading Cloud Innovator page. The Global Investment Research (GIR) division at Goldman Sachs delivers […]

Validate, evolve, and control schemas in Amazon MSK and Amazon Kinesis Data Streams with AWS Glue Schema Registry

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Data streaming technologies like Apache Kafka and Amazon Kinesis Data Streams capture and distribute data generated by thousands or millions of applications, websites, or machines. These technologies […]

Stream Twitter data into Amazon Redshift using Amazon MSK and AWS Glue streaming ETL

This post demonstrates how customers, system integrator (SI) partners, and developers can use the serverless streaming ETL capabilities of AWS Glue with Amazon Managed Streaming for Apache Kafka (Amazon MSK) to stream data to a data warehouse such as Amazon Redshift. We also show you how to view Twitter streaming data on Amazon QuickSight via Amazon Redshift.

Vortexa delivers real-time insights on Amazon MSK with Lenses.io

This post discusses how Vortexa harnesses the power of Apache Kafka to improve real-time data accuracy and accelerate time-to-market by using a combination of Lenses.io for greater observability and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to create clusters on demand.

Best practices from Delhivery on migrating from Apache Kafka to Amazon MSK

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. This is a guest post by Delhivery. In this post, we describe the steps Delhivery took to migrate from self-managed Apache Kafka running on Amazon Elastic Compute […]

Streaming web content with a log-based architecture with Amazon MSK

Content, such as breaking news or sports scores, requires updates in near-real-time. To stay up to date, you may be constantly refreshing your browser or mobile app. Building APIs to deliver this content at speed and scale can be challenging. In this post, I present an alternative to an API-based approach. I outline the concept […]

How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink

August 2023: Amazon MSK now offers a managed feature called multi-VPC private connectivity to simplify connectivity of your Kafka clients to your brokers. Refer to this blog to learn more. This guest post presents patterns for accessing an Amazon Managed Streaming for Apache Kafka cluster across your AWS account or Amazon Virtual Private Cloud (Amazon VPC) […]

Govern how your clients interact with Apache Kafka using API Gateway

In this blog post, we will show you how Amazon API Gateway can answer these questions as a component between your Amazon MSK cluster and your clients. Amazon MSK is a fully managed service for Apache Kafka that makes it easy to provision Kafka clusters with just a few clicks, without the need to provision servers, manage storage, or configure Apache ZooKeeper manually. Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications.