AWS Big Data Blog

Tag: Amazon EMR

Crunching Statistics at Scale with SparkR on Amazon EMR

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. This post is co-authored by Gopal Wunnava, a Senior Consultant with AWS Professional Services. SparkR is an R package that allows you to integrate complex statistical analysis with large datasets. In this blog post, we introduce you to running R with the […]

Analyze Your Data on Amazon DynamoDB with Apache Spark

Manjeet Chayel is a Solutions Architect with AWS. Every day, tons of customer data is generated, such as website logs, gaming data, advertising data, and streaming videos. Many companies capture this information as it’s generated and process it in real time to understand their customers. Amazon DynamoDB is a fast and flexible NoSQL database service […]

Submitting User Applications with spark-submit

Francisco Oliveira is a consultant with AWS Professional Services. Customers starting their big data journey often ask for guidelines on how to submit user applications to Spark running on Amazon EMR. For example, customers ask for guidelines on how to size memory and compute resources available to their applications and the best resource allocation model […]

Running an External Zeppelin Instance using S3 Backed Notebooks with Spark on Amazon EMR

Dominic Murphy is an Enterprise Solution Architect with Amazon Web Services. Apache Zeppelin is an open source GUI that creates interactive and collaborative notebooks for data exploration using Spark. You can use Scala, Python, SQL (using Spark SQL), or HiveQL to manipulate data and quickly visualize results. Zeppelin notebooks can be shared among several users, […]

Using BlueTalon with Amazon EMR

This is a guest post by Pratik Verma, Founder and Chief Product Officer at BlueTalon. Leonid Fedotov, Senior Solution Architect at BlueTalon, also contributed to this post. Amazon Elastic MapReduce (Amazon EMR) makes it easy to quickly and cost-effectively process vast amounts of data in the cloud. EMR gets used for log, financial, fraud, and […]