AWS Big Data Blog
Tag: Amazon EMR
Metadata classification, lineage, and discovery using Apache Atlas on Amazon EMR
This blog post was last reviewed and updated in April 2022. The code repositories used in this blog have been reviewed and updated to fix the solution. With the ever-evolving and growing role of data in today’s world, data governance is an essential aspect of effective data management. Many organizations use a data lake as a […]
Reduce costs by migrating Apache Spark and Hadoop to Amazon EMR
Apache Spark and Hadoop are popular frameworks for processing data for analytics, often at a fraction of the cost of legacy approaches, yet at scale they can still become expensive. This blog post discusses ways to reduce your total cost of ownership while also improving staff productivity. This can be […]
Best Practices for Securing Amazon EMR
This post walks you through some of the principles of Amazon EMR security. It also describes features that you can use in Amazon EMR to help you meet the security and compliance objectives for your business. We cover some common security best practices that we see in practice, and we show some sample configurations to get you started.
Dynamically scale up storage on Amazon EMR clusters
February 2025: The bootstrap action script in this blog post uses IMDS v1 for accessing EC2 instance metadata. The script does not support IMDS v2 and cannot be used in an AWS account which has IMDS v2 enforced across the account. Using the script in an IMDS v2 enabled account will cause issues and unexpected […]
Getting started: Training resources for Big Data on AWS
Whether you’ve just signed up for your first AWS account or you’ve been with us for some time, there’s always something new to learn as our services evolve to meet the ever-changing needs of our customers. To help ensure you’re set up for success as you build with AWS, we put together this quick reference guide for Big Data training and resources available here on the AWS site.
How to migrate a Hue database from an existing Amazon EMR cluster
This post describes the step-by-step process for migrating the Hue database from an existing EMR cluster.
Easily manage table metadata for Presto running on Amazon EMR using the AWS Glue Data Catalog
In this post, we will explore how the AWS Glue Data Catalog addresses discoverability and manageability for table metadata for Presto on Amazon EMR.
Build a Multi-Tenant Amazon EMR Cluster with Kerberos, Microsoft Active Directory Integration and IAM Roles for EMRFS
In this post, we will discuss what EMRFS authorization is (Amazon S3 storage-level access control) and show how to configure the role mappings with detailed examples.
Dynamically Create Friendly URLs for Your Amazon EMR Web Interfaces
This solution provides a serverless approach to automatically assigning a friendly name for your EMR cluster for easy access to popular notebooks and other web interfaces.
Use Kerberos Authentication to Integrate Amazon EMR with Microsoft Active Directory
This post walks you through the process of using AWS CloudFormation to set up a cross-realm trust and extend authentication from an Active Directory network into an Amazon EMR cluster with Kerberos enabled. By establishing a cross-realm trust, Active Directory users can use their Active Directory credentials to access an Amazon EMR cluster and run jobs as themselves.