AWS Big Data Blog

Tag: Amazon EMR

Deploying Cloudera’s Enterprise Data Hub on AWS

Karthik Krishnan is an AWS Solutions Architect. UPDATE April 6, 2015: The newest quickstart reference guide supports Cloudera Director 1.1.0. To manage your cluster with Cloudera Director 1.1.0, refer to the updated reference guide. Apache Hadoop is an open-source software framework for storing and processing large-scale data sets. In this post, we discuss the deployment of […]

Statistical Analysis with Open-Source R and RStudio on Amazon EMR

Markus Schmidberger is a Senior Big Data Consultant for AWS Professional Services. Big Data is on every CIO’s mind. It is synonymous with technologies like Hadoop and the ‘NoSQL’ class of databases. Another technology shaking things up in Big Data is R. This blog post describes how to set up R, RHadoop packages, and RStudio […]

Using Amazon EMR with SQL Workbench and other BI Tools

This is a guest post by Kyle Porter, a Sales Engineer at Simba Technologies. Jon Einkauf, a Senior Product Manager for Amazon Elastic MapReduce, and AWS Senior Technical Writer Jeff Slone also contributed to this post. Note: Ports have changed on EMR 4.x. Before walking through this post, please consult the EMR documentation to […]

Using Amazon EMR and Tableau to Analyze and Visualize Data

Rahul Bhartia is an AWS Solutions Architect. Hadoop provides a great ecosystem of tools for extracting value from data in various formats and sizes. Originally focused on large-batch processing with tools like MapReduce, Pig, and Hive, Hadoop now provides many tools for running interactive queries on your data, such as Impala, Drill, and Presto. […]

Getting Started with Amazon EMR Bootstrap Actions

Steve McPherson is a Senior Manager for Amazon Elastic MapReduce. Note: This post was updated 2/8/16. The Presto bootstrap action documented in the original post has been deprecated because EMR now offers a Presto-Sandbox as a full-fledged EMR application. For details, see the EMR sandbox. Amazon Elastic MapReduce (EMR) is a fully managed Hadoop-as-a-service platform […]