AWS Big Data Blog

Tag: Amazon EMR

Implementing Authorization and Auditing using Apache Ranger on Amazon EMR

Updated 3/30/2022: Amazon EMR has announced official support of Apache Ranger. Open-source plugin support will not be maintained moving forward, and compatibility with the latest versions will not be tested. We recommend that customers move to the Amazon EMR support for Apache Ranger. Ranger Presto plugin support on EMR has been deprecated. Updated 12/03/2020: Support for […]

Low-Latency Access on Trillions of Records: FINRA’s Architecture Using Apache HBase on Amazon EMR with Amazon S3

John Hitchingham is Director of Performance Engineering at FINRA. The Financial Industry Regulatory Authority (FINRA) is a private sector regulator responsible for analyzing 99% of the equities and 65% of the option activity in the US. To look for fraud, market manipulation, insider trading, and abuse, FINRA’s technology group has developed a robust […]

Dynamically Scale Applications on Amazon EMR with Auto Scaling

Jonathan Fritz is a Senior Product Manager for Amazon EMR. Customers running Apache Spark, Presto, and the Apache Hadoop ecosystem take advantage of Amazon EMR’s elasticity to save costs by terminating clusters after workflows are complete and by resizing clusters with low-cost Amazon EC2 Spot Instances. For instance, customers can create clusters for daily ETL or machine learning […]

Use Apache Flink on Amazon EMR

Today we are making it even easier to run Flink on AWS, as it is now natively supported in Amazon EMR 5.1.0. EMR supports running Flink-on-YARN, so you can create either a long-running cluster that accepts multiple jobs or a short-running Flink session in a transient cluster, which helps reduce your costs by charging you only for the time that you use.
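As a rough illustration of the long-running versus transient choice, the sketch below builds the keyword arguments for the boto3 EMR client's `run_job_flow` call without sending anything to AWS. It assumes a transient cluster is modeled with `KeepJobFlowAliveWhenNoSteps=False` plus a single Flink step submitted through `command-runner.jar`; the function name `build_flink_cluster_request`, the job JAR URI, the instance types, and the instance count are hypothetical placeholders, not values from the post.

```python
import json


def build_flink_cluster_request(
    transient: bool,
    job_jar: str = "s3://example-bucket/flink-job.jar",  # hypothetical JAR location
) -> dict:
    """Build (but do not send) keyword arguments for boto3's EMR run_job_flow."""
    request = {
        "Name": "flink-transient" if transient else "flink-long-running",
        "ReleaseLabel": "emr-5.1.0",  # release named in the post
        "Applications": [{"Name": "Flink"}],
        "Instances": {
            # Placeholder sizing; adjust for a real workload.
            "MasterInstanceType": "m4.large",
            "SlaveInstanceType": "m4.large",
            "InstanceCount": 3,
            # Transient: terminate once all steps finish.
            # Long-running: stay up and accept jobs submitted later.
            "KeepJobFlowAliveWhenNoSteps": not transient,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    if transient:
        # A transient cluster needs work to do before it shuts itself down.
        request["Steps"] = [
            {
                "Name": "flink-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    # Run the job as a per-job Flink session on YARN.
                    "Args": ["flink", "run", "-m", "yarn-cluster", job_jar],
                },
            }
        ]
    return request


if __name__ == "__main__":
    print(json.dumps(build_flink_cluster_request(transient=True), indent=2))
```

With credentials configured, the resulting dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`. Keeping the request as plain data makes it easy to inspect or diff the two cluster styles before launching anything.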

Encrypt Data At-Rest and In-Flight on Amazon EMR with Security Configurations

Customers running analytics, stream processing, machine learning, and ETL workloads on personally identifiable information, health information, and financial data have strict requirements for encrypting data at rest and in transit. The Apache Spark and Hadoop ecosystems lend themselves to these big data use cases, and customers have asked us for a quick and easy way to encrypt data at rest and data in transit between nodes in each execution framework.
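For a sense of what a security configuration contains, the sketch below assembles an EMR security configuration JSON document that turns on both at-rest and in-transit encryption. It is a minimal example under stated assumptions: S3 data uses SSE-S3, local disks use an AWS KMS key, and TLS certificates are supplied as a PEM zip in S3. The helper name `build_security_configuration`, the key ARN, and the S3 URI are hypothetical placeholders.

```python
import json


def build_security_configuration(kms_key_arn: str, pem_zip_s3_uri: str) -> str:
    """Return a security configuration JSON string enabling both encryption modes."""
    config = {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                # Data written to S3 through EMRFS: S3-managed keys.
                "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"},
                # Local volumes on cluster nodes: keys from AWS KMS.
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
            "InTransitEncryptionConfiguration": {
                # TLS between nodes, using PEM certificates zipped in S3.
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": pem_zip_s3_uri,
                },
            },
        }
    }
    return json.dumps(config)


if __name__ == "__main__":
    doc = build_security_configuration(
        "arn:aws:kms:us-east-1:111122223333:key/example-key-id",  # hypothetical ARN
        "s3://example-bucket/certs/my-certs.zip",  # hypothetical certificate bundle
    )
    print(doc)
```

The string could then be registered with `boto3.client("emr").create_security_configuration(Name=..., SecurityConfiguration=doc)` and referenced by name through the `SecurityConfiguration` parameter of `run_job_flow`, so the same encryption settings are reused across clusters.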