AWS Big Data Blog
Tag: Amazon EMR
Crunching Statistics at Scale with SparkR on Amazon EMR
Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. This post is co-authored by Gopal Wunnava, a Senior Consultant with AWS Professional Services. SparkR is an R package that allows you to integrate complex statistical analysis with large datasets. In this blog post, we introduce you to running R with the […]
Anomaly Detection Using PySpark, Hive, and Hue on Amazon EMR
Veronika Megler, Ph.D., is a Senior Consultant with AWS Professional Services. We are surrounded by more and more sensors, some of which we’re not even consciously aware of. As sensors become cheaper and easier to connect, they create an increasing flood of data that’s getting cheaper and easier to store and process. However, sensor readings […]
Import Zeppelin notes from GitHub or JSON in Zeppelin 0.5.6 on Amazon EMR
Jonathan Fritz is a Senior Product Manager for Amazon EMR. Many Amazon EMR customers use Zeppelin to create interactive notebooks to run workloads with Spark using Scala, Python, and SQL. These customers have found Amazon EMR to be a great platform for running Zeppelin because of its strong integration with other AWS services and the ability […]
Analyze Your Data on Amazon DynamoDB with Apache Spark
Manjeet Chayel is a Solutions Architect with AWS. Every day, tons of customer data is generated, such as website logs, gaming data, advertising data, and streaming videos. Many companies capture this information as it’s generated and process it in real time to understand their customers. Amazon DynamoDB is a fast and flexible NoSQL database service […]
Submitting User Applications with spark-submit
Francisco Oliveira is a consultant with AWS Professional Services. Customers starting their big data journey often ask for guidelines on how to submit user applications to Spark running on Amazon EMR. For example, customers ask how to size the memory and compute resources available to their applications and which resource allocation model works best […]
Turning Amazon EMR into a Massive Amazon S3 Processing Engine with Campanile
Michael Wallman is a senior consultant with AWS ProServ. Have you ever had to copy a huge Amazon S3 bucket to another account or region? Or create a list based on object name or size? How about mapping a function over millions of objects? Amazon EMR to the rescue! EMR allows you to deploy large […]
Running an External Zeppelin Instance using S3 Backed Notebooks with Spark on Amazon EMR
Dominic Murphy is an Enterprise Solution Architect with Amazon Web Services. Apache Zeppelin is an open source GUI that creates interactive and collaborative notebooks for data exploration using Spark. You can use Scala, Python, SQL (using Spark SQL), or HiveQL to manipulate data and quickly visualize results. Zeppelin notebooks can be shared among several users, […]
Securely Access Web Interfaces on Amazon EMR Launched in a Private Subnet
Ben Snively is a Solutions Architect with AWS. Private subnets allow you to limit access to deployed components and to control the security and routing of the system. You can also use a private subnet to connect an on-premises network to AWS through a VPN or AWS Direct Connect. Amazon EMR allows customers to launch […]
Analyze Data with Presto and Airpal on Amazon EMR
Songzhi Liu is a Professional Services Consultant with AWS. You can now launch Presto version 0.119 on Amazon EMR, allowing you to easily spin up a managed EMR cluster with the Presto query engine and run interactive analysis on data stored in Amazon S3. You can integrate with Spot instances, publish logs to an S3 […]
Using BlueTalon with Amazon EMR
This is a guest post by Pratik Verma, Founder and Chief Product Officer at BlueTalon. Leonid Fedotov, Senior Solution Architect at BlueTalon, also contributed to this post. Amazon Elastic MapReduce (Amazon EMR) makes it easy to quickly and cost-effectively process vast amounts of data in the cloud. EMR is used for log, financial, fraud, and […]