Amazon EMR on EKS | AWS Big Data Blog

Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit

Amazon EMR on EKS provides a deployment option for Amazon EMR that allows organizations to run open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). With EMR on EKS, Spark applications run on the Amazon EMR runtime for Apache Spark. This performance-optimized runtime offered by Amazon EMR makes your Spark jobs run fast […]

Improve reliability and reduce costs of your Apache Spark workloads with vertical autoscaling on Amazon EMR on EKS

Amazon EMR on Amazon EKS is a deployment option offered by Amazon EMR that enables you to run Apache Spark applications on Amazon Elastic Kubernetes Service (Amazon EKS) in a cost-effective manner. It uses the EMR runtime for Apache Spark to increase performance so that your jobs run faster and cost less. Apache Spark allows […]

Amazon EMR on EKS widens the performance gap: Run Apache Spark workloads 5.37 times faster and at 4.3 times lower cost

Amazon EMR on EKS provides a deployment option for Amazon EMR that allows organizations to run open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). With EMR on EKS, Spark applications run on the Amazon EMR runtime for Apache Spark. This performance-optimized runtime offered by Amazon EMR makes your Spark jobs run fast […]

Build event-driven data pipelines using AWS Controllers for Kubernetes and Amazon EMR on EKS

An event-driven architecture is a software design pattern in which decoupled applications can asynchronously publish and subscribe to events via an event broker. By promoting loose coupling between components of a system, an event-driven architecture leads to greater agility and can enable components in the system to scale independently and fail without impacting other services. […]

How SafeGraph built a reliable, efficient, and user-friendly Apache Spark platform with Amazon EMR on Amazon EKS

This is a guest post by Nan Zhu, Tech Lead Manager, SafeGraph, and Dave Thibault, Sr. Solutions Architect – AWS. SafeGraph is a geospatial data company that curates over 41 million global points of interest (POIs) with detailed attributes, such as brand affiliation, advanced category tagging, and open hours, as well as how people interact […]

Accelerate your data exploration and experimentation with the AWS Analytics Reference Architecture library

Organizations use their data to solve complex problems by starting small, running iterative experiments, and refining the solution. Although the power of experiments can’t be ignored, organizations have to be cautious about the cost-effectiveness of such experiments. If time is spent creating the underlying infrastructure for enabling experiments, it further adds to the cost. Developers […]

Run fault tolerant and cost-optimized Spark clusters using Amazon EMR on EKS and Amazon EC2 Spot Instances

Amazon EMR on EKS is a deployment option in Amazon EMR that allows you to run Spark jobs on Amazon Elastic Kubernetes Service (Amazon EKS). Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances save you up to 90% over On-Demand Instances, and are a great way to cost optimize the Spark workloads running on Amazon […]

Introducing ACK controller for Amazon EMR on EKS

AWS Controllers for Kubernetes (ACK) was announced in August 2020, and now supports 14 AWS service controllers as generally available with an additional 12 in preview. The vision behind this initiative was simple: allow Kubernetes users to use the Kubernetes API to manage the lifecycle of AWS resources such as Amazon Simple Storage Service (Amazon […]

Use Karpenter to speed up Amazon EMR on EKS autoscaling

Amazon EMR on Amazon EKS is a deployment option for Amazon EMR that allows organizations to run Apache Spark on Amazon Elastic Kubernetes Service (Amazon EKS). With EMR on EKS, the Spark jobs run on the Amazon EMR runtime for Apache Spark. This increases the performance of your Spark jobs so that they run faster […]

Get a quick start with Apache Hudi, Apache Iceberg, and Delta Lake with Amazon EMR on EKS

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. You can keep your data as is in your object store or file-based storage without having to first structure the data. Additionally, you can run different types of analytics against your loosely formatted data […]

Category: Amazon EMR on EKS