AWS Cloud Operations Blog
Category: Amazon EMR
Monitoring Amazon EMR on EKS with Amazon Managed Prometheus and Amazon Managed Grafana
Apache Spark is an open-source, lightning-fast cluster computing framework built for distributed data processing. Combined with the cloud, Spark delivers high performance for both batch and real-time data processing at petabyte scale. Spark on Kubernetes is supported from Spark 2.3 onwards, and it has gained a lot of traction among enterprises for high performance and […]
Collecting Apache Flink metrics in the Amazon CloudWatch agent
Apache Flink is a distributed stream processing engine. You can run Flink on Amazon EMR as a YARN application. You can view Flink metrics through its web UI, but what if you want to react to them? In this blog post, I’ll show you how to use the CloudWatch agent to collect Flink metrics into […]
Using AWS Systems Manager Run Command to submit Spark/Hadoop jobs on Amazon EMR
Many customers use Amazon EMR with Apache Spark to build scalable big data pipelines. For large-scale production pipelines, a common use case is to read complex data from a variety of sources. This data must be transformed to make it useful to downstream applications, such as machine learning pipelines, analytics dashboards, and business reports. Such […]