AWS Big Data Blog
Category: HAQM Kinesis
Modernize a legacy real-time analytics application with HAQM Managed Service for Apache Flink
In this post, we discuss challenges with relational databases when used for real-time analytics and ways to mitigate them by modernizing the architecture with serverless AWS solutions. We introduce you to HAQM Managed Service for Apache Flink Studio and get started querying streaming data interactively using HAQM Kinesis Data Streams. We walk through a call center analytics solution that provides insights into the call center’s performance in near-real time through metrics that determine agent efficiency in handling calls in the queue. Key performance indicators (KPIs) of interest for a call center from a near-real-time platform could be calls waiting in the queue, highlighted in a performance dashboard within a few seconds of data ingestion from call center streams.
Non-JSON ingestion using HAQM Kinesis Data Streams, HAQM MSK, and HAQM Redshift Streaming Ingestion
Organizations are grappling with the ever-expanding spectrum of data formats in today’s data-driven landscape. From Avro’s binary serialization to the efficient and compact structure of Protobuf, the landscape of data formats has expanded far beyond the traditional realms of CSV and JSON. As organizations strive to derive insights from these diverse data streams, the challenge […]
How Chime Financial uses AWS to build a serverless stream analytics platform and defeat fraudsters
This is a guest post by Khandu Shinde, Staff Software Engineer and Edward Paget, Senior Software Engineering at Chime Financial. Chime is a financial technology company founded on the premise that basic banking services should be helpful, easy, and free. Chime partners with national banks to design member first financial products. This creates a more […]
Perform HAQM Kinesis load testing with Locust
February 9, 2024: HAQM Kinesis Data Firehose has been renamed to HAQM Data Firehose. Read the AWS What’s New post to learn more. Building a streaming data solution requires thorough testing at the scale it will operate in a production environment. Streaming applications operating at scale often handle large volumes of up to GBs per […]
Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, HAQM Kinesis, AWS Glue streaming ETL, and data visualization using HAQM QuickSight
We recently announced support for streaming extract, transform, and load (ETL) jobs in AWS Glue version 4.0, a new version of AWS Glue that accelerates data integration workloads in AWS. AWS Glue streaming ETL jobs continuously consume data from streaming sources, clean and transform the data in-flight, and make it available for analysis in seconds. AWS also offers a broad selection of services to support your needs. A database replication service such as AWS Database Migration Service (AWS DMS) can replicate the data from your source systems to HAQM Simple Storage Service (HAQM S3), which commonly hosts the storage layer of the data lake. This post demonstrates how to apply CDC changes from HAQM Relational Database Service (HAQM RDS) or other relational databases to an S3 data lake, with flexibility to denormalize, transform, and enrich the data in near-real time.
HAQM Kinesis Data Streams on-demand capacity mode now scales up to 1 GB/second ingest capacity
HAQM Kinesis Data Streams is a serverless data streaming service that makes it easy to capture, process, and store streaming data at any scale. As customers collect and stream more types of data, they have asked for simpler, elastic data streams that can handle variable and unpredictable data traffic. In November 2021, HAQM Web Services […]
A side-by-side comparison of Apache Spark and Apache Flink for common streaming use cases
Apache Flink and Apache Spark are both open-source, distributed data processing frameworks used widely for big data processing and analytics. Spark is known for its ease of use, high-level APIs, and the ability to process large amounts of data. Flink shines in its ability to handle processing of data streams in real-time and low-latency stateful […]
Near-real-time analytics using HAQM Redshift streaming ingestion with HAQM Kinesis Data Streams and HAQM DynamoDB
HAQM Redshift is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, easy, and secure analytics at scale. Tens of thousands of customers rely on HAQM Redshift to analyze exabytes of data and run complex analytical queries, making it the widely used cloud data warehouse. You can run and […]
Migrate from HAQM Kinesis Data Analytics for SQL Applications to HAQM Managed Service for Apache Flink Studio
February 9, 2024: HAQM Kinesis Data Firehose has been renamed to HAQM Data Firehose. Read the AWS What’s New post to learn more. August 30, 2023: HAQM Kinesis Data Analytics has been renamed to HAQM Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. In this post, we […]
Stream VPC Flow Logs to Datadog via HAQM Kinesis Data Firehose
February 9, 2024: HAQM Kinesis Data Firehose has been renamed to HAQM Data Firehose. Read the AWS What’s New post to learn more. It’s common to store the logs generated by customer’s applications and services in various tools. These logs are important for compliance, audits, troubleshooting, security incident responses, meeting security policies, and many other […]