AWS Big Data Blog
Category: Amazon EMR on EKS
Amazon EMR on EKS gets up to 19% performance boost running on AWS Graviton3 Processors vs. Graviton2
Amazon EMR on EKS is a deployment option that enables you to run Spark workloads on Amazon Elastic Kubernetes Service (Amazon EKS) easily. It allows you to innovate faster with the latest Apache Spark on Kubernetes architecture while benefiting from the performance-optimized Spark runtime powered by Amazon EMR. This deployment option elects Amazon EKS as […]
Design patterns to manage Amazon EMR on EKS workloads for Apache Spark
Amazon EMR on Amazon EKS enables you to submit Apache Spark jobs on demand on Amazon Elastic Kubernetes Service (Amazon EKS) without provisioning clusters. With EMR on EKS, you can consolidate analytical workloads with your other Kubernetes-based applications on the same Amazon EKS cluster to improve resource utilization and simplify infrastructure management. Kubernetes uses namespaces to provide isolation between […]
Stream Amazon EMR on EKS logs to third-party providers like Splunk, Amazon OpenSearch Service, or other log aggregators
Spark jobs running on Amazon EMR on EKS generate logs that are useful both for identifying issues with Spark processes and for viewing Spark outputs. You can access these logs from a variety of sources. On the Amazon EMR virtual cluster console, you can access logs from the Spark History UI. […]
Amazon EMR on Amazon EKS provides up to 61% lower costs and up to 68% performance improvement for Spark workloads
Amazon EMR on Amazon EKS is a deployment option offered by Amazon EMR that enables you to run Apache Spark applications on Amazon Elastic Kubernetes Service (Amazon EKS) in a cost-effective manner. It uses the EMR runtime for Apache Spark to increase performance so that your jobs run faster and cost less. In our benchmark […]
How SailPoint solved scaling issues by migrating legacy big data applications to Amazon EMR on Amazon EKS
This post is co-written with Richard Li from SailPoint. SailPoint Technologies is an identity security company based in Austin, TX. Its software as a service (SaaS) solutions support identity governance operations in regulated industries such as healthcare, government, and higher education. SailPoint distinguishes multiple aspects of identity as individual identity security services, including cloud governance, […]
Configure Amazon EMR Studio and Amazon EKS to run notebooks with Amazon EMR on EKS
Amazon EMR on Amazon EKS provides a deployment option for Amazon EMR that allows you to run analytics workloads on Amazon Elastic Kubernetes Service (Amazon EKS). This is an attractive option because it allows you to run applications on a common pool of resources without having to provision infrastructure. In addition, you can use Amazon […]
Reduce costs and increase resource utilization of Apache Spark jobs on Kubernetes with Amazon EMR on Amazon EKS
Amazon EMR on Amazon EKS is a deployment option for Amazon EMR that allows you to run Apache Spark on Amazon Elastic Kubernetes Service (Amazon EKS). If you run open-source Apache Spark on Amazon EKS, you can now use Amazon EMR to automate provisioning and management, and run Apache Spark up to three times faster. […]
Run and debug Apache Spark applications on AWS with Amazon EMR on Amazon EKS
Customers today want to focus more on their core business model and less on the underlying infrastructure and operational burden. As customers migrate to the AWS Cloud, they’re realizing the benefits of being able to innovate faster on their own applications by relying on AWS to handle big data platforms, operations, and automation. Many of […]
Run a Spark SQL-based ETL pipeline with Amazon EMR on Amazon EKS
Increasingly, a business’s success depends on its agility in transforming data into actionable insights, which requires efficient and automated data processes. In the previous post, Build a SQL-based ETL pipeline with Apache Spark on Amazon EKS, we described a common productivity issue in a modern data architecture. To address the challenge, we demonstrated how a declarative approach can improve efficiency, which resulted in a faster time to value for businesses. Managing applications declaratively in Kubernetes is a widely adopted best practice. You can use the same approach to build and deploy Spark applications with open-source or in-house-built frameworks to achieve the same productivity goal.
Manage and process your big data workflows with Amazon MWAA and Amazon EMR on Amazon EKS
Many customers are gathering large amounts of data generated from different sources, such as IoT devices, clickstream events from websites, and more. To efficiently extract insights from the data, you have to perform various transformations and apply different business logic on your data. These processes require complex workflow management to schedule jobs and manage dependencies […]