Posted On: Oct 31, 2022
Amazon EMR release 6.8 now supports Apache Hudi 0.11.1 and Apache Iceberg 0.14.0. You can use these frameworks on Amazon EMR on EC2 and Amazon EMR on EKS, as well as on Amazon EMR Serverless.
Apache Hudi 0.11.1 on Amazon EMR 6.8 adds support for Spark 3.3.0. It introduces Multi-Modal Index support and Data Skipping with the Metadata Table, which let you add bloom filter and column stats indexes to tables and can significantly improve query performance. A new Async Indexer service lets users create different kinds of indexes (e.g., files, bloom filters, and column stats) in the metadata table without blocking ingestion. Spark SQL improvements add support for updating or deleting records in Hudi tables using non-primary-key fields, and for time travel queries via the TIMESTAMP AS OF syntax. Flink integration improvements add support for both Flink 1.13.x and 1.14.x and for complex data types such as Map and Array. In addition, Hudi 0.11.1 includes bug fixes over Hudi 0.11.0, which is available in Amazon EMR release 6.7. For more details, refer to the OSS Hudi release docs.
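As a rough illustration of the indexing and time travel features described above, the following Python sketch assembles the Hudi options and Spark SQL text involved. The helper names, the table name, and the timestamp are hypothetical and chosen only for this example; the option keys are the ones documented for Hudi 0.11, but verify them against the release docs for your setup.

```python
# Minimal sketch: build Hudi 0.11 write/read options and a time travel query.
# Helper names and the sample table/timestamp are illustrative assumptions.

def hudi_index_write_options(table_name: str) -> dict:
    """Writer options that enable the metadata table with column stats
    and bloom filter indexes (the Multi-Modal Index)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.metadata.enable": "true",
        "hoodie.metadata.index.column.stats.enable": "true",
        "hoodie.metadata.index.bloom.filter.enable": "true",
    }


def hudi_data_skipping_read_options() -> dict:
    """Reader options that let queries use the metadata table for data skipping."""
    return {
        "hoodie.metadata.enable": "true",
        "hoodie.enable.data.skipping": "true",
    }


def hudi_time_travel_query(table: str, timestamp: str) -> str:
    """Spark SQL text that reads a table as of a given commit timestamp."""
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


if __name__ == "__main__":
    print(hudi_index_write_options("trips"))
    print(hudi_data_skipping_read_options())
    print(hudi_time_travel_query("trips", "2022-10-31 00:00:00"))
```

With an existing SparkSession, these dictionaries would typically be passed through `.options(**opts)` on a DataFrame writer or reader that uses `format("hudi")`, and the query string through `spark.sql(...)`.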
Apache Iceberg 0.14.0 on Amazon EMR 6.8 adds support for Spark 3.3.0 and merge-on-read support for MERGE and UPDATE statements. It also adds support for rewriting partitions using Z-order, which reorganizes data so that queries with predicates on multiple columns run efficiently and similar data is kept together. The release includes several performance improvements for scan planning in Spark queries and adds support for row group skipping using Parquet bloom filters. For more details, refer to the OSS Iceberg release docs.
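The merge-on-read and Z-order features above are driven by table properties and a Spark stored procedure. The Python sketch below only builds the SQL text; the helper names, catalog name, table name, and column names are hypothetical. It assumes the table uses Iceberg format version 2, which row-level merge-on-read requires.

```python
# Minimal sketch: build Iceberg 0.14 Spark SQL statements as strings.
# Helper names and the sample catalog/table/columns are illustrative assumptions.
from typing import Sequence


def iceberg_merge_on_read_ddl(table: str) -> str:
    """ALTER TABLE text that switches MERGE and UPDATE to merge-on-read."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'format-version'='2', "
        "'write.merge.mode'='merge-on-read', "
        "'write.update.mode'='merge-on-read')"
    )


def iceberg_zorder_rewrite(catalog: str, table: str, columns: Sequence[str]) -> str:
    """CALL text that rewrites data files sorted by a Z-order over the columns."""
    if len(columns) < 2:
        # Z-ordering interleaves several columns; one column is a plain sort.
        raise ValueError("Z-order needs at least two columns")
    cols = ",".join(columns)
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', strategy => 'sort', "
        f"sort_order => 'zorder({cols})')"
    )


if __name__ == "__main__":
    print(iceberg_merge_on_read_ddl("my_catalog.db.events"))
    print(iceberg_zorder_rewrite("my_catalog", "db.events", ["user_id", "event_ts"]))
```

Each string would be run through `spark.sql(...)` in a session configured with the Iceberg SQL extensions and a catalog matching the name used here.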
Amazon EMR release 6.8 is generally available in all regions where Amazon EMR is available. See Regional Availability of Amazon EMR and our release notes for more details.