AWS Big Data Blog
Category: Amazon Elastic Kubernetes Service
Use Batch Processing Gateway to automate job management in multi-cluster Amazon EMR on EKS environments
AWS customers often process petabytes of data using Amazon EMR on EKS. In enterprise environments with diverse workloads or varying operational requirements, customers frequently choose a multi-cluster setup due to the following advantages: Better resiliency and no single point of failure – If one cluster fails, other clusters can continue processing critical workloads, maintaining business […]
How ZS built a clinical knowledge repository for semantic search using Amazon OpenSearch Service and Amazon Neptune
In this blog post, we will highlight how ZS Associates used multiple AWS services to build a highly scalable, highly performant, clinical document search platform. This platform is an advanced information retrieval system engineered to assist healthcare professionals and researchers in navigating vast repositories of medical documents, medical literature, research articles, clinical guidelines, protocol documents, […]
Introducing Amazon EMR on EKS with Apache Flink: A scalable, reliable, and efficient data processing platform
AWS recently announced that Apache Flink is generally available for Amazon EMR on Amazon Elastic Kubernetes Service (EKS). Apache Flink is a scalable, reliable, and efficient data processing framework that handles real-time streaming and batch workloads (but is most commonly used for real-time streaming). Amazon EMR on EKS is a deployment option for Amazon EMR […]
Set up fine-grained permissions for your data pipeline using MWAA and EKS
This blog post shows how to improve security in a data pipeline architecture based on Amazon Managed Workflows for Apache Airflow (Amazon MWAA) and Amazon Elastic Kubernetes Service (Amazon EKS) by setting up fine-grained permissions, using HashiCorp Terraform for infrastructure as code.
Introducing Amazon EMR on EKS job submission with Spark Operator and spark-submit
Amazon EMR on EKS provides a deployment option for Amazon EMR that allows organizations to run open-source big data frameworks on Amazon Elastic Kubernetes Service (Amazon EKS). With EMR on EKS, Spark applications run on the Amazon EMR runtime for Apache Spark. This performance-optimized runtime offered by Amazon EMR makes your Spark jobs run fast […]
How SOCAR handles large IoT data with Amazon MSK and Amazon ElastiCache for Redis
This is a guest blog post co-written with SangSu Park and JaeHong Ahn from SOCAR. As companies continue to expand their digital footprint, the importance of real-time data processing and analysis cannot be overstated. The ability to quickly measure and draw insights from data is critical in today’s business landscape, where rapid decision-making is key. […]
Build event-driven data pipelines using AWS Controllers for Kubernetes and Amazon EMR on EKS
An event-driven architecture is a software design pattern in which decoupled applications can asynchronously publish and subscribe to events via an event broker. By promoting loose coupling between components of a system, an event-driven architecture leads to greater agility and can enable components in the system to scale independently and fail without impacting other services. […]
Run fault tolerant and cost-optimized Spark clusters using Amazon EMR on EKS and Amazon EC2 Spot Instances
Amazon EMR on EKS is a deployment option in Amazon EMR that allows you to run Spark jobs on Amazon Elastic Kubernetes Service (Amazon EKS). Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances save you up to 90% over On-Demand Instances, and are a great way to cost optimize the Spark workloads running on Amazon […]
Introducing ACK controller for Amazon EMR on EKS
AWS Controllers for Kubernetes (ACK) was announced in August 2020 and now supports 14 AWS service controllers as generally available with an additional 12 in preview. The vision behind this initiative was simple: allow Kubernetes users to use the Kubernetes API to manage the lifecycle of AWS resources such as Amazon Simple Storage Service (Amazon […]
Design patterns to manage Amazon EMR on EKS workloads for Apache Spark
Amazon EMR on Amazon EKS enables you to submit Apache Spark jobs on demand on Amazon Elastic Kubernetes Service (Amazon EKS) without provisioning clusters. With EMR on EKS, you can consolidate analytical workloads with your other Kubernetes-based applications on the same Amazon EKS cluster to improve resource utilization and simplify infrastructure management. Kubernetes uses namespaces to provide isolation between […]