AWS Big Data Blog
Tag: Amazon EMR
Integrating IoT Events into Your Analytic Platform
AWS IoT makes it easy to integrate and control your devices from other AWS services for even more powerful IoT applications. In particular, IoT provides tight integration with AWS Lambda, Amazon Kinesis, Amazon S3, Amazon Machine Learning, Amazon DynamoDB, Amazon CloudWatch, and Amazon OpenSearch Service.
Processing VPC Flow Logs with Amazon EMR
In this post, I show you how to gain valuable insight into your network by using Amazon EMR and Amazon VPC Flow Logs. The walkthrough implements a pattern often found in network equipment called ‘Top Talkers’, an ordered list of the heaviest network users, but the model can also be used for many other types of network analysis.
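To make the ‘Top Talkers’ idea concrete, here is a minimal single-machine sketch in Python. It is not the EMR job from the post; it only illustrates the aggregation itself, assuming records in the default space-separated VPC Flow Logs format. The function name `top_talkers` and the sample records are made up for illustration.

```python
from collections import Counter

# Made-up sample records, assumed to follow the default flow log field order:
# version account-id interface-id srcaddr dstaddr srcport dstport
# protocol packets bytes start end action log-status
SAMPLE_LOGS = """\
2 123456789012 eni-0a1b2c3d 10.0.0.5 10.0.1.9 44321 443 6 10 8400 1469000000 1469000060 ACCEPT OK
2 123456789012 eni-0a1b2c3d 10.0.0.7 10.0.1.9 44322 443 6 4 1200 1469000000 1469000060 ACCEPT OK
2 123456789012 eni-0a1b2c3d 10.0.0.5 10.0.1.10 44323 80 6 20 16000 1469000000 1469000060 ACCEPT OK
2 123456789012 eni-0a1b2c3d - - - - - - - 1469000000 1469000060 - NODATA
"""


def top_talkers(lines, n=10):
    """Return the n source addresses that sent the most bytes, heaviest first.

    Hypothetical helper: sums the bytes field per srcaddr and sorts descending.
    """
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Skip malformed lines and records without traffic data (NODATA, SKIPDATA).
        if len(fields) != 14 or fields[13] != "OK":
            continue
        srcaddr, byte_count = fields[3], fields[9]
        totals[srcaddr] += int(byte_count)
    return totals.most_common(n)


if __name__ == "__main__":
    for address, total_bytes in top_talkers(SAMPLE_LOGS.splitlines(), n=2):
        print(address, total_bytes)
```

The same group-by-source, sum, and sort steps carry over to a distributed job; swapping the key (for example, destination port instead of source address) gives other kinds of network analysis.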
Building and Deploying Custom Applications with Apache Bigtop and Amazon EMR
This post shows you how to build a custom application for EMR for Apache Bigtop-based releases 4.x and greater. EMR nodes are based on the Amazon Linux AMI, so I will deploy with RPM packages and use Elasticsearch as the example application.
Use Spark 2.0, Hive 2.1 on Tez, and the latest from the Hadoop ecosystem on Amazon EMR release 5.0
Jonathan Fritz is a Senior Product Manager for Amazon EMR. We are excited to launch Amazon EMR release 5.0 today, giving customers the latest versions of 16 supported open-source applications in the big data ecosystem, including new major versions of Spark and Hive. Almost exactly a year ago, we shipped release 4.0, which brought significant […]
Installing and Running JobServer for Apache Spark on Amazon EMR
In this blog post, you will learn how to install JobServer on EMR using a bootstrap action (BA) derived from the JobServer GitHub repository. Then we’ll run JobServer using a sample dataset.
How SmartNews Built a Lambda Architecture on AWS to Analyze Customer Behavior and Recommend Content
This is a guest post by Takumi Sakamoto, a software engineer at SmartNews. SmartNews in their own words: “SmartNews is a machine learning-based news discovery app that delivers the very best stories on the Web for more than 18 million users worldwide.” Data processing is one of the key technologies for SmartNews. Every team’s workload […]
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
In this post, I discuss an alternate solution; namely, running separate CPU and GPU clusters, and driving the end-to-end modeling process from Apache Spark.
Use Sqoop to Transfer Data from Amazon EMR to Amazon RDS
In this post, I will show you how to transfer data using Apache Sqoop, which is a tool designed to transfer data between Hadoop and relational databases. Support for Apache Sqoop is available in Amazon EMR releases 4.4.0 and later.
Analyze Realtime Data from Amazon Kinesis Streams Using Zeppelin and Spark Streaming
This post shows you how you can use Spark Streaming to process data coming from Amazon Kinesis streams, build some graphs using Zeppelin, and then store the Zeppelin notebook in Amazon S3.
Apache Tez Now Available with Amazon EMR
Amazon EMR has added Apache Tez version 0.8.3 as a supported application in release 4.7.0. Tez is an extensible framework for building batch and interactive data processing applications on top of Hadoop YARN.