AWS News Blog
Amazon EMR Update – Apache Spark 1.5.2, Ganglia, Presto, Zeppelin, and Oozie
My colleague Jon Fritz wrote the guest post below to introduce you to the newest version of Amazon EMR.
— Jeff;
Today we are announcing Amazon EMR release 4.2.0, which adds support for Apache Spark 1.5.2, Ganglia 3.6 for Apache Hadoop and Spark monitoring, and new sandbox releases for Presto (0.125), Apache Zeppelin (0.5.5), and Apache Oozie (4.2.0).
New Applications in Release 4.2.0
Amazon EMR provides an easy way to install and configure distributed big data applications in the Hadoop and Spark ecosystems on managed clusters of Amazon EC2 instances. You can create Amazon EMR clusters from the Create Cluster page in the AWS Management Console, from the AWS Command Line Interface (AWS CLI), or by using an SDK with the EMR API. In the latest release, we added support for new versions of several applications:
- Spark 1.5.2 – Spark 1.5.2 was released on November 9th, and we’re happy to give you access to it within two weeks of general availability. This version is a maintenance release, with improvements to Spark SQL, SparkR, and the DataFrame API, along with miscellaneous enhancements and bug fixes. The Spark documentation also now includes information on enabling wire encryption for the block transfer service. For a complete set of changes, view the JIRA. To learn more about Spark on Amazon EMR, click here.
- Ganglia 3.6 – Ganglia is a scalable, distributed monitoring system that you can install on your Amazon EMR cluster to display Amazon EC2 instance-level metrics, which are also aggregated at the cluster level. We also configure Ganglia to ingest and display Hadoop and Spark metrics alongside general resource utilization information from the instances in your cluster. Metrics are displayed across a variety of time spans, and you can view them in the Ganglia web UI on the master node of your Amazon EMR cluster. To learn more about Ganglia on Amazon EMR, click here.
- Presto 0.125 – Presto is an open-source, distributed SQL query engine designed for low-latency queries on large datasets in Amazon S3 and the Hadoop Distributed File System (HDFS). Presto 0.125 is a maintenance release, with optimizations to SQL operations, performance enhancements, and general bug fixes. To learn more about Presto on Amazon EMR, click here.
- Zeppelin 0.5.5 – Zeppelin is an open-source, interactive, and collaborative notebook for data exploration using Spark. You can use Scala, Python, SQL, or HiveQL to manipulate data and visualize results. Zeppelin 0.5.5 is a maintenance release that contains miscellaneous improvements and bug fixes. To learn more about Zeppelin on Amazon EMR, click here.
- Oozie 4.2.0 – Oozie is a workflow designer and scheduler for Hadoop and Spark. This version includes Spark and HiveServer2 actions, making it easier to incorporate Spark and Hive jobs in Oozie workflows. You can also create and manage your Oozie workflows using the Oozie Editor and Dashboard in Hue, an application that offers a web UI for Hive, Pig, and Oozie. Please note that in Hue 3.7.1, you must still use Shell actions to run Spark jobs. To learn more about Oozie on Amazon EMR, click here.
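The Spark item above mentions wire encryption for the block transfer service. As a rough sketch of how that might be requested at cluster creation, the Python snippet below builds a configuration list using the `spark-defaults` classification. The helper name `spark_wire_encryption_config` is hypothetical, and the property names `spark.authenticate` and `spark.authenticate.enableSaslEncryption` are taken from the Spark security documentation as we understand it; please verify them against the documentation for your Spark version before relying on them.

```python
import json


def spark_wire_encryption_config():
    """Build a configuration list for the spark-defaults classification.

    Hypothetical helper for illustration. The property names are assumptions
    based on the Spark security documentation and should be verified.
    """
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {
                # SASL encryption requires Spark authentication to be enabled.
                "spark.authenticate": "true",
                "spark.authenticate.enableSaslEncryption": "true",
            },
        }
    ]


def spark_wire_encryption_json():
    """Serialize the configuration list, e.g. to save it as a JSON file."""
    return json.dumps(spark_wire_encryption_config(), indent=2)
```

The resulting list could be supplied as the configurations for a new cluster, either as a JSON file given to the AWS CLI or as the `Configurations` parameter when using an SDK.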
Launch an Amazon EMR Cluster with Release 4.2.0 Today
To create an Amazon EMR cluster with 4.2.0, select release 4.2.0 on the Create Cluster page in the AWS Management Console, or use the release label emr-4.2.0 when creating your cluster from the AWS CLI or using an SDK with the EMR API.
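As an illustration of the SDK route, here is a minimal sketch in Python using boto3's EMR client and its `run_job_flow` call. Only the release label `emr-4.2.0` comes from this post. The cluster name, region, instance type, instance count, and the default role names are assumptions chosen for the example, and the helper names `build_emr_request` and `launch_cluster` are hypothetical. Only Spark and Ganglia are requested here; the sandbox applications use their own application names, which you should look up in the release documentation.

```python
# Applications to install; "Spark" and "Ganglia" are assumed application names.
APPLICATIONS = ["Spark", "Ganglia"]


def build_emr_request(name="emr-4-2-0-demo", instance_count=3):
    """Return keyword arguments for boto3's EMR run_job_flow call.

    All values except the release label are illustrative assumptions.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-4.2.0",  # release label from the announcement
        "Applications": [{"Name": app} for app in APPLICATIONS],
        "Instances": {
            "MasterInstanceType": "m3.xlarge",  # assumed instance type
            "SlaveInstanceType": "m3.xlarge",   # assumed instance type
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumes the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(region="us-east-1"):
    """Create the cluster and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the request builder works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**build_emr_request())
    return response["JobFlowId"]
```

Calling `launch_cluster()` starts real instances that incur charges, so it is worth inspecting the output of `build_emr_request()` first and terminating the cluster when you are done.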
— Jon Fritz, Senior Product Manager