Posted On: Nov 22, 2022

HAQM EMR Serverless announces support for reading and writing data in HAQM DynamoDB with your Spark and Hive workflows. You can now export, import, query, and join tables in HAQM DynamoDB directly from your EMR Serverless Spark and Hive applications. HAQM DynamoDB is a fully managed NoSQL database that meets the latency and throughput requirements of highly demanding applications by providing single-digit millisecond latency and predictable performance with seamless throughput and storage scalability.

AWS users often need to process data stored in HAQM DynamoDB efficiently and at scale for downstream analytics. The HAQM EMR team built and open-sourced the emr-dynamodb-connector to help customers simplify access and configuration to HAQM DynamoDB from their Apache Spark and Apache Hive applications. This connector enables multiple analytics use cases, including efficiently processing data in HAQM DynamoDB or joining tables in HAQM DynamoDB with external tables in HAQM S3, HAQM RDS, or other data stores that can be accessed by HAQM EMR Serverless. With HAQM EMR release 6.9, you get all the benefits of the HAQM DynamoDB connector in your HAQM EMR Serverless applications. You can also access HAQM DynamoDB tables both across AWS Regions and across AWS accounts.
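As a sketch of how this looks in practice, the HiveQL below registers an external table backed by a DynamoDB table using the connector's DynamoDBStorageHandler, then joins it with an S3-backed table. The table names, column names, and column mapping are hypothetical placeholders for illustration.

```sql
-- Register an external Hive table backed by a (hypothetical) DynamoDB table "Orders".
CREATE EXTERNAL TABLE ddb_orders (
  order_id    string,
  customer_id string,
  total       double
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name"     = "Orders",
  "dynamodb.column.mapping" = "order_id:OrderId,customer_id:CustomerId,total:Total"
);

-- Join the DynamoDB-backed table with an existing external table in HAQM S3
-- (here assumed to be called s3_customers).
SELECT o.order_id, o.total, c.region
FROM ddb_orders o
JOIN s3_customers c ON o.customer_id = c.customer_id;
```

Queries against `ddb_orders` read directly from DynamoDB at run time, so the same table definition supports export, import, and ad hoc joins without copying data first.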

We are also delighted to share that EMR Serverless supports accessing specific HAQM S3 buckets in other AWS accounts from your Spark and Hive applications. AWS customers use multiple AWS accounts to better separate different projects or lines of business. Cross-account capabilities simplify securing and managing distributed data lakes across multiple accounts through a centralized approach. With cross-account access to HAQM S3, your EMR Serverless Spark or Hive application in one AWS account can process data stored in specific buckets owned by other AWS accounts.
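Cross-account access is typically granted with a bucket policy in the account that owns the data, allowing the EMR Serverless job execution role from the other account to read it. The sketch below uses hypothetical account IDs, role name, and bucket name; adapt them to your environment.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEmrServerlessJobRoleFromOtherAccount",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/EMRServerlessJobExecutionRole"
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-data-bucket",
        "arn:aws:s3:::example-data-bucket/*"
      ]
    }
  ]
}
```

The job execution role in the application's account also needs matching `s3:GetObject` and `s3:ListBucket` permissions on the same bucket in its own IAM policy; the bucket policy alone is not sufficient.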

These features are now available in all AWS Regions where HAQM EMR Serverless is available. To learn more, refer to the HAQM EMR Serverless documentation.