Posted On: Nov 29, 2022

Amazon EMR announces Amazon Redshift integration for Apache Spark. This integration helps data engineers build and run Spark applications that read data from and write data to an Amazon Redshift cluster. Starting with Amazon EMR 6.9, this integration is available across all three deployment models for EMR: EC2, EKS, and Serverless.

You can use this integration to build applications that write directly to Redshift tables as part of your ETL workflows, or that combine data in Redshift with data from other sources. Developers can load data from Redshift tables into Spark data frames or write data frames back to Redshift tables, without having to download open source connectors to connect to Redshift.
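As a rough illustration of loading a Redshift table into a Spark data frame and writing one back, here is a minimal PySpark-style sketch. It assumes the integration is exposed under the community connector's format name and uses that connector's `url`, `dbtable`, `tempdir`, and `aws_iam_role` options; the cluster endpoint, table name, S3 path, and role ARN are placeholders, and the helper function names are hypothetical. Check the EMR documentation for the exact settings that apply to your release.

```python
# Sketch only: endpoint, table, S3 path, and role ARN below are placeholders.
# Assumption: the connector is registered under the community format name.
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def redshift_options(jdbc_url, table, temp_dir, iam_role_arn):
    """Collect the connector options shared by reads and writes."""
    return {
        "url": jdbc_url,               # JDBC URL of the Redshift cluster
        "dbtable": table,              # table to read from or write to
        "tempdir": temp_dir,           # S3 location used to stage data
        "aws_iam_role": iam_role_arn,  # role Redshift assumes to access S3
    }


def read_table(spark, options):
    """Load a Redshift table into a Spark data frame."""
    return spark.read.format(REDSHIFT_FORMAT).options(**options).load()


def write_table(df, options, mode="append"):
    """Write a Spark data frame to a Redshift table."""
    df.write.format(REDSHIFT_FORMAT).options(**options).mode(mode).save()


opts = redshift_options(
    jdbc_url="jdbc:redshift://example-cluster:5439/dev",      # placeholder
    table="public.sales",                                     # placeholder
    temp_dir="s3://example-bucket/tmp/",                      # placeholder
    iam_role_arn="arn:aws:iam::123456789012:role/example",    # placeholder
)

# On an EMR cluster with an active SparkSession named `spark`:
#   sales = read_table(spark, opts)
#   write_table(sales.filter("amount > 0"), {**opts, "dbtable": "public.sales_clean"})
```

Keeping the options in one dictionary means the same settings can be reused for both the read and the write step of an ETL job.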

With Amazon Redshift integration for Apache Spark, applications on Amazon EMR that access Redshift data can run up to 10x faster than with existing Redshift-Spark connectors. The integration pushes down relational operations such as joins, aggregations, sorts, and scalar functions from Spark to Redshift to improve query performance. It also supports IAM-based roles to enable single sign-on capabilities, and integrates with AWS Secrets Manager for securely managing keys.
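The pushdown and Secrets Manager features are typically controlled through connector options. The sketch below assumes the option names `autopushdown` and `secret.id`; treat both as assumptions to confirm against the documentation for your EMR release. The helper name and all values are hypothetical placeholders.

```python
def with_secret_and_pushdown(options, secret_id, pushdown=True):
    """Return a copy of the connector options with credentials taken from
    AWS Secrets Manager and query pushdown switched on or off."""
    updated = dict(options)  # leave the caller's dictionary untouched
    # Assumed option name: identifier of the secret holding the credentials.
    updated["secret.id"] = secret_id
    # Assumed option name: lets joins, aggregations, and sorts run in Redshift.
    updated["autopushdown"] = "true" if pushdown else "false"
    return updated


base = {
    "url": "jdbc:redshift://example-cluster:5439/dev",  # placeholder
    "dbtable": "public.sales",                          # placeholder
    "tempdir": "s3://example-bucket/tmp/",              # placeholder
}
secured = with_secret_and_pushdown(base, secret_id="example/redshift/creds")
```

Turning pushdown off for a single job can be useful when comparing query plans, since the data frame logic stays the same and only the option changes.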

Amazon Redshift integration for Apache Spark is available in all Regions where Amazon EMR, Amazon EMR on EKS, and Amazon EMR Serverless are available. To get started, refer to our documentation for Amazon EMR, Amazon EMR on EKS, and Amazon EMR Serverless.