AWS Big Data Blog

Category: Amazon EMR on EKS

Run Apache Spark with Amazon EMR on EKS backed by Amazon FSx for Lustre storage

September 2023: This post was reviewed and updated for accuracy to reflect recent improvements and changes. Traditionally, Spark workloads have been run on a dedicated setup like a Hadoop stack with YARN or Mesos as the resource manager. Starting with Apache Spark 2.3, Spark added support for Kubernetes as a resource manager. The new Kubernetes […]

Removing complexity to improve business performance: How Bridgewater Associates built a scalable, secure, Spark-based research service on AWS

This is a guest post co-written by Sergei Dubinin, Oleksandr Ierenkov, Illia Popov and Joel Thompson, from Bridgewater. Bridgewater’s core mission is to understand how the world works by analyzing the drivers of markets and turning that understanding into high-quality portfolios and investment advice for our clients. Within Bridgewater Technology, we strive to make our […]

Amazon EMR on EKS gets up to 19% performance boost running on AWS Graviton3 Processors vs. Graviton2

Amazon EMR on EKS is a deployment option that lets you easily run Spark workloads on Amazon Elastic Kubernetes Service (Amazon EKS). It allows you to innovate faster with the latest Apache Spark on Kubernetes architecture while benefiting from the performance-optimized Spark runtime powered by Amazon EMR. This deployment option elects Amazon EKS as […]


Design patterns to manage Amazon EMR on EKS workloads for Apache Spark

Amazon EMR on Amazon EKS enables you to submit Apache Spark jobs on demand on Amazon Elastic Kubernetes Service (Amazon EKS) without provisioning clusters. With EMR on EKS, you can consolidate analytical workloads with your other Kubernetes-based applications on the same Amazon EKS cluster to improve resource utilization and simplify infrastructure management. Kubernetes uses namespaces to provide isolation between […]

Stream Amazon EMR on EKS logs to third-party providers like Splunk, Amazon OpenSearch Service, or other log aggregators

Spark jobs running on Amazon EMR on EKS generate logs that are very useful for identifying issues with Spark processes and for viewing Spark output. You can access these logs from a variety of sources. On the Amazon EMR virtual cluster console, you can access logs from the Spark History UI. […]

Amazon EMR on Amazon EKS provides up to 61% lower costs and up to 68% performance improvement for Spark workloads

Amazon EMR on Amazon EKS is a deployment option offered by Amazon EMR that enables you to run Apache Spark applications on Amazon Elastic Kubernetes Service (Amazon EKS) in a cost-effective manner. It uses the EMR runtime for Apache Spark to increase performance so that your jobs run faster and cost less. In our benchmark […]

How SailPoint solved scaling issues by migrating legacy big data applications to Amazon EMR on Amazon EKS

This post is co-written with Richard Li from SailPoint. SailPoint Technologies is an identity security company based in Austin, TX. Its software as a service (SaaS) solutions support identity governance operations in regulated industries such as healthcare, government, and higher education. SailPoint distinguishes multiple aspects of identity as individual identity security services, including cloud governance, […]

Configure Amazon EMR Studio and Amazon EKS to run notebooks with Amazon EMR on EKS

Amazon EMR on Amazon EKS provides a deployment option for Amazon EMR that allows you to run analytics workloads on Amazon Elastic Kubernetes Service (Amazon EKS). This is an attractive option because it allows you to run applications on a common pool of resources without having to provision infrastructure. In addition, you can use Amazon […]

Reduce costs and increase resource utilization of Apache Spark jobs on Kubernetes with Amazon EMR on Amazon EKS

Amazon EMR on Amazon EKS is a deployment option for Amazon EMR that allows you to run Apache Spark on Amazon Elastic Kubernetes Service (Amazon EKS). If you run open-source Apache Spark on Amazon EKS, you can now use Amazon EMR to automate provisioning and management, and run Apache Spark up to three times faster. […]

Run and debug Apache Spark applications on AWS with Amazon EMR on Amazon EKS

Customers today want to focus more on their core business model and less on the underlying infrastructure and operational burden. As customers migrate to the AWS Cloud, they’re realizing the benefits of being able to innovate faster on their own applications by relying on AWS to handle big data platforms, operations, and automation. Many of […]