AWS Big Data Blog
Category: Amazon Managed Streaming for Apache Kafka (Amazon MSK)
Introducing Protocol buffers (protobuf) schema support in AWS Glue Schema Registry
AWS Glue Schema Registry now supports Protocol buffers (protobuf) schemas in addition to JSON and Avro schemas. This allows application teams to use protobuf schemas to govern the evolution of streaming data and centrally control data quality from data streams to data lake. AWS Glue Schema Registry provides an open-source library that includes Apache-licensed serializers […]
Back up and restore Kafka topic data using Amazon MSK Connect
This post is only meant to be used as a reference for backing up and restoring data for an Amazon MSK cluster. AWS does not offer any support for it. You can use Apache Kafka to run your streaming workloads. Kafka provides resiliency to failures and protects your data out of the box by replicating […]
Best practices for right-sizing your Apache Kafka clusters to optimize performance and cost
Apache Kafka is well known for its performance and tunability to optimize for various use cases. But sometimes it can be challenging to find the right infrastructure configuration that meets your specific performance requirements while minimizing the infrastructure cost. This post explains how the underlying infrastructure affects Apache Kafka performance. We discuss strategies on how […]
Create a low-latency source-to-data lake pipeline using Amazon MSK Connect, Apache Flink, and Apache Hudi
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. In recent years, there has been a shift from monolithic architectures to microservices. The microservices architecture makes applications easier to scale and quicker to develop, […]
Validate streaming data over Amazon MSK using schemas in cross-account AWS Glue Schema Registry
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Today's businesses face unprecedented growth in the volume of data. A growing portion of the data is generated in real time by IoT devices, websites, business […]
Evolve JSON Schemas in Amazon MSK and Amazon Kinesis Data Streams with the AWS Glue Schema Registry
Data is being produced, streamed, and consumed at an immense rate, and that rate is projected to grow exponentially in the future. In particular, JSON is the most widely used data format across streaming technologies and workloads. As applications, websites, and machines increasingly adopt data streaming technologies such as Apache Kafka and HAQM Kinesis Data […]
Now Available: Updated guidance on the Data Analytics Lens for AWS Well-Architected Framework
Nearly all businesses today require some form of data analytics processing, from auditing user access to generating sales reports. For all your analytics needs, the Data Analytics Lens for AWS Well-Architected Framework provides prescriptive guidance to help you assess your workloads and identify best practices aligned to the AWS Well-Architected Pillars: Operational Excellence, Security, Reliability, […]
Query your Amazon MSK topics interactively using Amazon Managed Service for Apache Flink Studio
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Amazon Managed Service for Apache Flink Studio makes it easy to analyze streaming data in real time and build stream processing applications powered by Apache Flink using […]
Power your Kafka Streams application with Amazon MSK and AWS Fargate
November 2024: This post was reviewed and updated for accuracy. Today, companies of all sizes across all verticals design and build event-driven architectures centered around real-time streaming and stream processing. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that makes it easy for you to build and run applications that […]
Secure connectivity patterns to access Amazon MSK across AWS Regions
August 2023: Amazon MSK now offers a managed feature called multi-VPC private connectivity to simplify connectivity of your Kafka clients to your brokers. Refer to this blog post to learn more. AWS customers often segment their workloads across accounts and Amazon Virtual Private Cloud (Amazon VPC) to streamline access management while being able to expand their footprint. […]