Containers

Tag: Amazon EKS

Optimize your container workloads for sustainability

This blog was authored by Karthik Rajendran, Senior Solutions Architect (AWS) and Isha Dua, Senior Solutions Architect (AWS). The software architect’s job is mostly one of trade-offs, weighing the considerations of different approaches and then choosing the one that strikes the best balance. Some architects are surprised to find that, in the AWS Cloud at least, architecting […]

Monitoring and automating recovery from AZ impairments in Amazon EKS with Istio and ARC Zonal Shift

Introduction Running microservice-style architectures in the cloud can quickly become a complex operation. Teams must account for a growing number of moving pieces, such as multiple instances of independent workloads, along with their infrastructure dependencies. These components can then be distributed across different topology domains, such as multiple Amazon Elastic Compute Cloud (Amazon EC2) instances, […]

Amazon EKS and Kubernetes sessions at AWS re:Invent 2024

Introduction AWS re:Invent 2024, the annual Amazon Web Services conference, is fast approaching. This year’s event will feature a full track of sessions focused on Kubernetes and other cloud-native technologies. To help you navigate the extensive session catalog, we’ve compiled a list of sessions around Kubernetes and cloud-native related topics. They have been grouped by […]

Amazon EKS now supports Amazon Application Recovery Controller

Introduction Amazon Elastic Kubernetes Service (Amazon EKS) now supports Amazon Application Recovery Controller (ARC). ARC is an AWS service that allows you to prepare for and recover from AWS Region or Availability Zone (AZ) impairments. ARC provides two sets of capabilities: Multi-AZ recovery, which includes zonal shift and zonal autoshift, and multi-Region recovery, which includes routing […]

Amazon EKS optimized Amazon Linux 2023 accelerated AMIs now available

Introduction Earlier this year we announced support for Amazon EKS optimized AL2023 AMIs that provided many enhancements in terms of security and performance. Amazon Linux 2023 (AL2023) is the next generation of Amazon Linux from Amazon Web Services (AWS) and is designed to provide a secure, stable, and high-performance environment to develop and run your […]

Scaling a Large Language Model with NVIDIA NIM on Amazon EKS with Karpenter

Many organizations are building artificial intelligence (AI) applications using Large Language Models (LLMs) to deliver new experiences to their customers, from content creation to customer service and data analysis. However, the substantial size and intensive computational requirements of these models can make it challenging to configure, deploy, and scale them effectively on graphics processing units (GPUs). […]

Inside Pinterest’s Custom Spark Job logging and monitoring on Amazon EKS: Using AWS for Fluent Bit, Amazon S3, and ADOT

In Part 1, we explored Moka’s high-level design and logging infrastructure, showcasing how AWS for Fluent Bit, Amazon S3, and a robust logging framework ensure operational visibility and facilitate issue resolution. For more details, read Part 1 here. Introduction As we transition to the second part of our series, our focus shifts to […]

Automating custom Amazon EKS worker node builds using EC2 Image Builder

Customers who are building their “Golden Image” Amazon Machine Images (AMIs) using EC2 Image Builder may wish to extend their Image Builder pipelines to build out their Amazon Elastic Kubernetes Service (Amazon EKS) worker nodes as well. In this blog, we will show you how to do this and provide you with AWS CloudFormation templates […]

Powering the Next Generation of AI Workloads on Amazon EKS with Anyscale

Ray is an open-source framework that manages, executes, and optimizes compute needs for AI workloads. It is designed to make it easy to write parallel and distributed Python applications by providing a simple and intuitive API for distributed computing. Ray unifies infrastructure by leveraging any compute instance and accelerator on AWS via a single, flexible […]

Announcing AWS Neuron Helm Chart

Introduction We are pleased to announce the launch of the Neuron Helm Chart, which streamlines the deployment of AWS Neuron components on Amazon Elastic Kubernetes Service (Amazon EKS). With this new Helm Chart, you can now seamlessly install the necessary Kubernetes artifacts needed to run training and inference workloads on AWS Trainium and AWS Inferentia instances. Until now, […]