Posted On: Aug 31, 2021

Today we announced Dynamic Partitioning in HAQM Kinesis Data Firehose. With Dynamic Partitioning, you can continuously partition streaming data in Kinesis Data Firehose using keys within the data, such as “customer_id” or “transaction_id”, and deliver the data grouped by these keys into corresponding HAQM Simple Storage Service (HAQM S3) prefixes. This makes it easier for you to run high-performance, cost-efficient analytics on streaming data in HAQM S3 using HAQM Athena, HAQM EMR, and HAQM Redshift Spectrum.

Partitioning your data minimizes the amount of data scanned, which optimizes performance, reduces the cost of your analytics queries on HAQM S3, and enables more granular access to data. Traditionally, customers use Kinesis Data Firehose delivery streams to capture and load their data streams into HAQM S3. To partition a streaming dataset for HAQM S3-based analytics, customers had to run partitioning applications between HAQM S3 buckets before making the data available for analysis, which could become complicated or costly.
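
As a minimal sketch of why this matters, suppose JSON records carry a “customer_id” field and are delivered under Hive-style prefixes such as s3://my-firehose-bucket/customer_id=1234/. With a table partitioned on that key, a query filtered by customer_id scans only the matching prefix instead of the whole dataset. The table name “streaming_events”, database “analytics”, and bucket names below are hypothetical placeholders, not values from this announcement:

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    # Because the data lands under per-customer prefixes, Athena can prune
    # every prefix except customer_id=1234/, so only that customer's data
    # is scanned (and billed).
    response = athena.start_query_execution(
        QueryString=(
            "SELECT event_time, transaction_id "
            "FROM streaming_events "
            "WHERE customer_id = '1234'"
        ),
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )
    print(response["QueryExecutionId"])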

Now, with Dynamic Partitioning, Kinesis Data Firehose continuously groups data in transit by dynamically or statically defined data keys, and delivers it to individual HAQM S3 prefixes by key. This reduces time-to-insight by minutes or hours, lowers costs, and simplifies architectures. Combined with the Apache Parquet and Apache ORC format conversion features, Dynamic Partitioning makes Kinesis Data Firehose the best place to capture, prepare, and load analytics-ready streaming data into HAQM S3.
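
As a hedged illustration of how such a delivery stream might be configured, the sketch below enables Dynamic Partitioning and uses a MetadataExtraction processor to pull a partitioning key out of each incoming JSON record with a JQ expression. The stream name, bucket, role ARN, and JQ query are placeholder assumptions for this example:

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    # All names and ARNs below are hypothetical placeholders.
    firehose.create_delivery_stream(
        DeliveryStreamName="dynamic-partitioning-demo",
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            "BucketARN": "arn:aws:s3:::my-firehose-bucket",
            # The partitionKeyFromQuery namespace routes each record to a
            # prefix built from the key extracted by the processor below.
            "Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/",
            "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
            # 64 MB is the minimum buffer size when Dynamic Partitioning
            # is enabled.
            "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
            "DynamicPartitioningConfiguration": {
                "Enabled": True,
                "RetryOptions": {"DurationInSeconds": 300},
            },
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [
                    {
                        # MetadataExtraction parses each JSON record and
                        # evaluates the JQ expression to produce the key.
                        "Type": "MetadataExtraction",
                        "Parameters": [
                            {
                                "ParameterName": "MetadataExtractionQuery",
                                "ParameterValue": "{customer_id: .customer_id}",
                            },
                            {
                                "ParameterName": "JsonParsingEngine",
                                "ParameterValue": "JQ-1.6",
                            },
                        ],
                    }
                ],
            },
        },
    )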

Visit the Kinesis Data Firehose user guide to get started with Dynamic Partitioning, or visit the pricing page to learn more about its on-demand pricing. Dynamic Partitioning can be used in all commercial AWS Regions where Kinesis Data Firehose is available.