HAQM Data Firehose features

Why HAQM Data Firehose?

HAQM Data Firehose is the easiest way to load streaming data into data stores and analytics tools. Data Firehose is a fully managed service that makes it easy to capture, transform, and load massive volumes of streaming data from hundreds of thousands of sources into HAQM S3, HAQM Redshift, HAQM OpenSearch Service, Snowflake, Apache Iceberg tables, HAQM S3 Tables (preview), generic HTTP endpoints, and service providers like Datadog, New Relic, MongoDB, and Splunk, enabling real-time analytics and insights.

Firehose streams


A Firehose stream is the underlying entity of Firehose. You use Firehose by creating a Firehose stream and then sending data to it.
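For example, once a Firehose stream exists, a producer can push records to it through the AWS SDK. The sketch below uses Python (boto3) with a hypothetical stream name and payload:

    import boto3

    firehose = boto3.client("firehose")

    # Send a single JSON record to a hypothetical stream named "clickstream-events".
    # Records are raw bytes; a trailing newline keeps delivered objects line-delimited.
    firehose.put_record(
        DeliveryStreamName="clickstream-events",
        Record={"Data": b'{"user_id": "u-123", "event": "page_view"}\n'},
    )

For higher throughput, the PutRecordBatch operation can send up to 500 records in a single call.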

Key features


You can launch HAQM Data Firehose and create a Firehose stream to load data into HAQM S3, HAQM Redshift, HAQM OpenSearch Service, Snowflake, Apache Iceberg tables, HAQM S3 Tables (preview), HTTP endpoints, Datadog, New Relic, MongoDB, or Splunk with just a few clicks in the AWS Management Console. You can send data to the Firehose stream by calling the Firehose API or by running the Linux agent we provide on the data source. Data Firehose then continuously loads the data into the specified destinations.
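As a minimal sketch of creating a stream through the API rather than the console, the Python (boto3) call below sets up a direct-put Firehose stream with an S3 destination; the bucket and IAM role ARNs are placeholders you would replace with your own:

    import boto3

    firehose = boto3.client("firehose")

    # Create a direct-put Firehose stream that delivers to an S3 bucket.
    # The role must grant Firehose permission to write to the bucket.
    firehose.create_delivery_stream(
        DeliveryStreamName="clickstream-events",
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            "BucketARN": "arn:aws:s3:::my-analytics-bucket",
        },
    )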

Once launched, your Firehose streams automatically scale up to handle gigabytes per second or more of input data rate, and maintain data latency at levels you specify for the stream, within the limits. No intervention or maintenance is needed.

You can specify a batch size or batch interval to control how quickly data is uploaded to destinations. For example, you can set the batch interval anywhere from zero seconds to 15 minutes. Additionally, you can specify whether data should be compressed. The service supports common compression algorithms, including GZIP, ZIP, Snappy, and Hadoop-compatible Snappy. Batching and compressing data before upload lets you control how quickly new data arrives at the destinations.
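For instance, the fragment below (Python, boto3) sets illustrative buffering hints and GZIP compression; it slots into the ExtendedS3DestinationConfiguration shown in the earlier sketch:

    # Buffer up to 64 MiB or 300 seconds, whichever is reached first, then deliver
    # a GZIP-compressed object. The values here are examples, not recommendations.
    buffering_and_compression = {
        "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",  # other values: ZIP, Snappy, HADOOP_SNAPPY, UNCOMPRESSED
    }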

Firehose reads data easily from 20+ data sources, including HAQM MSK and MSK Serverless clusters, HAQM Kinesis Data Streams, Databases (preview), HAQM CloudWatch Logs, HAQM SNS, AWS IoT Core, and more.
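As one example of a non-direct source, an existing Kinesis data stream can be attached when the Firehose stream is created; the ARNs below are placeholders (Python, boto3):

    import boto3

    firehose = boto3.client("firehose")

    # Read from an existing Kinesis data stream instead of direct PUT calls.
    # The source role must allow Firehose to read from the Kinesis stream.
    firehose.create_delivery_stream(
        DeliveryStreamName="clickstream-from-kinesis",
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/clickstream",
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-source-role",
        },
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            "BucketARN": "arn:aws:s3:::my-analytics-bucket",
        },
    )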

Firehose supports columnar data formats such as Apache Parquet and Apache ORC, which are optimized for cost-effective storage and analytics using services such as HAQM Athena, HAQM Redshift Spectrum, HAQM EMR, and other Hadoop-based tools. Firehose can convert incoming data from JSON to Parquet or ORC format before storing it in HAQM S3, so you can save on storage and analytics costs.
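A minimal sketch of that conversion configuration follows (Python, boto3). The Glue database and table names are assumptions, and the referenced schema must already exist in the AWS Glue Data Catalog:

    # Convert incoming JSON to Parquet before delivery to S3.
    # Slots into ExtendedS3DestinationConfiguration as
    # "DataFormatConversionConfiguration": data_format_conversion
    data_format_conversion = {
        "Enabled": True,
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},  # or {"OrcSerDe": {}}
        "SchemaConfiguration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            "DatabaseName": "analytics_db",     # hypothetical Glue database
            "TableName": "clickstream_events",  # hypothetical Glue table holding the schema
            "Region": "us-east-1",
        },
    }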

Dynamically partition your streaming data before delivery to S3 using static or dynamically defined keys such as “customer_id” or “transaction_id”. Firehose groups data by these keys and delivers it into key-unique S3 prefixes, making it easier for you to perform high-performance, cost-efficient analytics in S3 using Athena, EMR, and Redshift Spectrum.
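A sketch of a dynamically partitioned S3 destination is shown below (Python, boto3); the customer_id field, ARNs, and prefixes are assumptions:

    # S3 destination configuration that partitions delivered objects by a
    # "customer_id" field extracted from each record with a JQ expression.
    dynamic_partitioning_s3_config = {
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::my-analytics-bucket",
        "DynamicPartitioningConfiguration": {"Enabled": True},
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [{
                "Type": "MetadataExtraction",
                "Parameters": [
                    {"ParameterName": "MetadataExtractionQuery",
                     "ParameterValue": "{customer_id: .customer_id}"},
                    {"ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6"},
                ],
            }],
        },
        # Each distinct customer_id value is delivered to its own S3 prefix.
        "Prefix": "data/customer_id=!{partitionKeyFromQuery:customer_id}/",
        "ErrorOutputPrefix": "errors/",
    }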

You can configure HAQM Data Firehose to prepare your streaming data before it is loaded to data stores. Simply select an AWS Lambda function from the HAQM Data Firehose stream configuration tab in the AWS Management Console. HAQM Data Firehose will automatically apply that function to every input data record and load the transformed data to the destinations. HAQM Data Firehose provides pre-built Lambda blueprints for converting common data sources, such as Apache logs and system logs, to JSON and CSV formats. You can use these pre-built blueprints without any change, customize them further, or write your own functions. You can also configure HAQM Data Firehose to automatically retry failed jobs and back up the raw streaming data.
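The transformation function follows the Firehose record-transformation contract: it receives a batch of base64-encoded records and must return each one with the same recordId, a result status, and re-encoded data. Below is a minimal Python sketch with a placeholder transformation:

    import base64

    def lambda_handler(event, context):
        output = []
        for record in event["records"]:
            # Decode the incoming record, transform it, and re-encode it.
            payload = base64.b64decode(record["data"]).decode("utf-8")
            transformed = payload.upper()  # placeholder transformation
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",  # or "Dropped" / "ProcessingFailed"
                "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
            })
        return {"records": output}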

HAQM Data Firehose currently supports HAQM S3, HAQM Redshift, HAQM OpenSearch Service, Snowflake, Apache Iceberg tables, HAQM S3 Tables (preview), HTTP endpoints, Datadog, New Relic, MongoDB, and Splunk as destinations. You can specify the destination HAQM S3 bucket, HAQM Redshift table, HAQM OpenSearch Service domain, generic HTTP endpoint, or service provider where the data should be loaded.

HAQM Data Firehose gives you the option to have your data automatically encrypted after it is uploaded to the destination. As part of the Firehose stream configuration, you can specify an AWS Key Management Service (AWS KMS) encryption key.
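For example, the fragment below (Python, boto3) references a placeholder customer-managed KMS key; it slots into the S3 destination configuration shown in the earlier sketches:

    # Encrypt delivered objects with a customer-managed KMS key (placeholder ARN).
    # Slots into ExtendedS3DestinationConfiguration as
    # "EncryptionConfiguration": encryption_configuration
    encryption_configuration = {
        "KMSEncryptionConfig": {
            "AWSKMSKeyARN": "arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555"
        }
    }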

HAQM Data Firehose exposes several metrics through the console as well as HAQM CloudWatch, including the volume of data submitted, the volume of data delivered to the destination, the time from source to destination, Firehose stream limits, the number of throttled records, and the upload success rate.
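These metrics can also be pulled programmatically from the AWS/Firehose CloudWatch namespace; the sketch below (Python, boto3) sums incoming bytes for a hypothetical stream over the last hour:

    import boto3
    from datetime import datetime, timedelta, timezone

    cloudwatch = boto3.client("cloudwatch")

    # Sum the IncomingBytes metric for one Firehose stream over the past hour,
    # in 5-minute periods.
    now = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Firehose",
        MetricName="IncomingBytes",
        Dimensions=[{"Name": "DeliveryStreamName", "Value": "clickstream-events"}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,
        Statistics=["Sum"],
    )
    print(stats["Datapoints"])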

With HAQM Data Firehose, you pay only for the volume of data you transmit through the service, and if applicable, for data format conversion. You also pay for HAQM VPC delivery and data transfer when applicable. There are no minimum fees or upfront commitments. You don’t need staff to operate, scale, and maintain infrastructure or custom applications to capture and load streaming data.