Posted On: Dec 22, 2017

You can now use Apache Spark 2.2.1, Apache Hive 2.3.2, and Amazon SageMaker integration with Apache Spark on Amazon EMR release 5.11.0. Spark 2.2.1 and Hive 2.3.2 include various bug fixes and improvements. Amazon SageMaker Spark is an open-source Spark library for Amazon SageMaker, a fully managed service for building, training, and deploying machine learning models at scale. The library lets you interleave Spark stages with stages that interact with Amazon SageMaker in your Spark ML Pipelines, so you can train models in Amazon SageMaker from Spark DataFrames using Amazon-provided ML algorithms such as K-Means clustering or XGBoost.

You can create an Amazon EMR cluster with release 5.11.0 by choosing release label “emr-5.11.0” from the AWS Management Console, AWS CLI, or SDK. Select Spark and Hive to install these applications on your cluster. The Amazon SageMaker Spark library is included automatically when you install Spark. Please visit the Amazon EMR documentation for more information about release 5.11.0, Spark 2.2.1, Hive 2.3.2, and using Amazon SageMaker with Spark.
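As a rough illustration of the SDK route, the sketch below uses the AWS SDK for Python (boto3) to request a cluster with the “emr-5.11.0” release label and the Spark and Hive applications. The cluster name, instance type, instance count, region, and the default role names (EMR_EC2_DefaultRole, EMR_DefaultRole) are assumptions chosen for the example, not values from this announcement; adjust them for your account.

```python
def build_cluster_request(name="spark-hive-5-11-0",
                          instance_type="m4.large",
                          instance_count=3):
    """Build keyword arguments for the EMR RunJobFlow API call.

    The name, instance type, and instance count are illustrative
    assumptions; only the release label and application names come
    from the announcement.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.11.0",
        # Selecting Spark and Hive installs both applications.
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after any steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumed: the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def create_cluster(region_name="us-east-1"):
    """Submit the request and return the new cluster (job flow) ID.

    Requires boto3 and valid AWS credentials; the region is an assumption.
    """
    import boto3  # imported here so the request builder works without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.run_job_flow(**build_cluster_request())
    return response["JobFlowId"]
```

Calling create_cluster() launches billable resources, so you may want to inspect the output of build_cluster_request() first and terminate the cluster when you are done.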

Amazon EMR release 5.11.0 is available in all supported regions for Amazon EMR.