AWS Machine Learning Blog
Category: Amazon Managed Service for Prometheus
Open source observability for AWS Inferentia nodes within Amazon EKS clusters
This post walks you through the Open Source Observability pattern for AWS Inferentia, which shows you how to monitor the performance of ML chips used in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster whose data plane nodes are Amazon Elastic Compute Cloud (Amazon EC2) instances of type Inf1 and Inf2.