AWS Big Data Blog
Category: Amazon Kinesis
Stream data to an HTTP endpoint with Amazon Data Firehose
November 2024: This post was reviewed and updated for accuracy. The value of data is time sensitive. Streaming data services can help you move data quickly from data sources to new destinations for downstream processing. For example, Amazon Data Firehose can reliably load streaming data into data stores like Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon OpenSearch Service, and […]
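To give a rough idea of what "streaming to an HTTP endpoint" involves on the receiving side, here is a minimal sketch. It assumes the request/response shape described in the Firehose HTTP endpoint delivery specification: a JSON body carrying a `requestId`, a `timestamp`, and a `records` array of base64-encoded `data` fields, acknowledged with a JSON body that echoes the `requestId` and returns a millisecond `timestamp` with HTTP 200. The function name `handle_firehose_request` and the sample payload are hypothetical. A real endpoint should also verify the access key that Firehose sends in a request header (`X-Amz-Firehose-Access-Key`; check the current spec), which this sketch omits.

```python
import base64
import json
import time


def handle_firehose_request(body: str) -> tuple[list[bytes], str]:
    """Hypothetical handler: decode a Firehose HTTP delivery request body
    and build the JSON acknowledgement body.

    Assumes the request looks like:
    {"requestId": "...", "timestamp": 1700000000000,
     "records": [{"data": "<base64>"}, ...]}
    """
    request = json.loads(body)
    # Each record's payload arrives base64-encoded in its "data" field.
    records = [base64.b64decode(r["data"]) for r in request.get("records", [])]
    # The acknowledgement echoes the requestId; the timestamp is in milliseconds.
    response = {
        "requestId": request["requestId"],
        "timestamp": int(time.time() * 1000),
    }
    return records, json.dumps(response)


# Usage with a hand-built sample request (illustrative values only).
sample = json.dumps({
    "requestId": "example-request-id",
    "timestamp": 1700000000000,
    "records": [{"data": base64.b64encode(b'{"event": "click"}').decode()}],
})
records, ack = handle_firehose_request(sample)
print(records)  # [b'{"event": "click"}']
```

In a real service this function would sit behind whatever web framework you use, and the returned string would be sent with a 200 status and a JSON content type.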
Best practices from Delhivery on migrating from Apache Kafka to Amazon MSK
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. This is a guest post by Delhivery. In this post, we describe the steps Delhivery took to migrate from self-managed Apache Kafka running on Amazon Elastic Compute […]
How Wind Mobility built a serverless data architecture
We parse through millions of scooter and user events generated daily (over 300 events per second) to extract actionable insights. We selected AWS Glue to perform this task. Our primary ETL job reads the newly added raw event data from Amazon S3, processes it using Apache Spark, and writes the results to our Amazon Redshift data warehouse. AWS Glue plays a critical role in our ability to scale on demand. After careful evaluation and testing, we concluded that AWS Glue ETL jobs meet all our needs and free us from procuring and managing infrastructure.
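As a quick sanity check on the figures above, here is the daily volume implied by the quoted rate. This is illustrative only: it assumes the rate holds at exactly 300 events per second around the clock, which the post does not claim, and because the post says "over 300", the result is best read as a lower bound.

```python
# Back-of-the-envelope check of the quoted rate, assuming a constant
# 300 events per second sustained over a full day (an assumption).
EVENTS_PER_SECOND = 300
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
print(f"{events_per_day:,} events per day")  # prints "25,920,000 events per day"
```

Roughly 26 million events a day is consistent with the "millions of events daily" described above.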
Build an AWS Well-Architected environment with the Analytics Lens
Building a modern data platform on AWS enables you to collect data of all types, store it in a central, secure repository, and analyze it with purpose-built tools. Yet you may be unsure how to get started or what impact certain design decisions will have. To provide advice tailored to specific technology and application domains, AWS introduced the concept of Well-Architected lenses in 2017. AWS is now happy to announce the Analytics Lens for the AWS Well-Architected Framework. This post introduces its purpose, the topics it covers, common scenarios, and the services included.
Ingest streaming data into Amazon OpenSearch Service within the privacy of your VPC with Amazon Data Firehose
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service. See details. February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Today we are adding a new Amazon Data Firehose feature to set up VPC delivery to your […]
Build a cloud-native network performance analytics solution on AWS for wireless service providers
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. This post demonstrates a serverless, cloud-based approach to building a network performance analytics solution using AWS services that can provide flexibility and performance while keeping costs under control with pay-per-use AWS services. […]
Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Most businesses generate data […]
How FactSet automated exporting data from Amazon DynamoDB to Amazon S3 Parquet to build a data analytics platform
February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. This is a guest post by Arvind Godbole, Lead Software Engineer with FactSet and Tarik Makota, AWS Principal Solutions Architect. In their own words, “FactSet creates flexible, open data and software solutions […]
Under the hood: Scaling your Kinesis data streams
Real-time delivery of data and insights enables businesses to pivot quickly in response to changes in demand, user engagement, and infrastructure events, among many others. Amazon Kinesis offers a managed service that lets you focus on building your applications, rather than managing infrastructure. Scalability is provided out-of-the-box, allowing you to ingest and process gigabytes of […]
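The teaser does not give numbers, so here is a hedged sketch of the usual back-of-the-envelope sizing for a Kinesis data stream in provisioned capacity mode. It uses the per-shard write limits from the Kinesis Data Streams documentation: 1 MiB per second or 1,000 records per second, whichever is reached first. The function name `shards_needed` and the sample workload are hypothetical. The estimate assumes partition keys spread writes evenly across shards, and it ignores read-side limits and any headroom for traffic spikes.

```python
import math

# Per-shard write limits for provisioned-mode Kinesis Data Streams.
MAX_BYTES_PER_SEC_PER_SHARD = 1024 * 1024  # 1 MiB/s
MAX_RECORDS_PER_SEC_PER_SHARD = 1000


def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Hypothetical helper: minimum shard count for a sustained write load,
    assuming evenly distributed partition keys and no extra headroom."""
    by_throughput = (records_per_sec * avg_record_bytes) / MAX_BYTES_PER_SEC_PER_SHARD
    by_count = records_per_sec / MAX_RECORDS_PER_SEC_PER_SHARD
    # The binding constraint (bytes or records) determines the shard count.
    return max(1, math.ceil(max(by_throughput, by_count)))


# Hypothetical workload: 5,000 records/s averaging 2 KiB each.
print(shards_needed(5000, 2048))  # prints 10
```

Here throughput is the binding constraint: about 9.8 MiB/s needs 10 shards, while the record count alone would need only 5.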
Optimize downstream data processing with Amazon Data Firehose and Amazon EMR running Apache Spark
This blog post shows how to use Amazon Kinesis Data Firehose (now Amazon Data Firehose) to merge many small messages into larger ones for delivery to Amazon S3, which speeds up downstream processing with Amazon EMR running Spark. It also shows how to use Apache Spark to read compressed files in Amazon S3 that lack a proper file name extension, and how to write the results back to Amazon S3 in Parquet format.
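One way to handle compressed objects that lack a file name extension is to detect the compression from the content rather than the name. Here is a minimal sketch of that idea. It assumes GZIP compression, identified by its two magic bytes `1f 8b`, and it assumes the producer appended a newline to every JSON record, since Firehose concatenates records as-is. The function name `decode_object` is hypothetical, and this is not necessarily the approach the original post takes.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_object(raw: bytes) -> list[dict]:
    """Hypothetical helper: decompress an S3 object if its content is gzip
    (detected by magic bytes, not by file extension), then parse it as
    newline-delimited JSON records."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    # Skip blank lines, such as the one left by a trailing newline.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Usage with an in-memory, extension-less gzip blob.
blob = gzip.compress(b'{"id": 1}\n{"id": 2}\n')
print(decode_object(blob))  # [{'id': 1}, {'id': 2}]
```

In PySpark, a function like this could be applied to the `(path, bytes)` pairs returned by `SparkContext.binaryFiles`, for example with `flatMap`. The resulting rows could then be turned into a DataFrame and written out with `DataFrame.write.parquet`. Treat that wiring as a starting point to adapt, not a tested pipeline.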