AWS Machine Learning Blog
Category: HAQM EC2
Accelerate NLP inference with ONNX Runtime on AWS Graviton processors
ONNX is an open source machine learning (ML) framework that provides interoperability across a wide range of frameworks, operating systems, and hardware platforms. ONNX Runtime is the runtime engine used for model inference and training with ONNX. AWS Graviton3 processors are optimized for ML workloads, including support for bfloat16, Scalable Vector Extension (SVE), and Matrix […]
Scale AI training and inference for drug discovery through HAQM EKS and Karpenter
This is a guest post co-written with the leadership team of Iambic Therapeutics. Iambic Therapeutics is a drug discovery startup with a mission to create innovative AI-driven technologies to bring better medicines to cancer patients, faster. Our advanced generative and predictive artificial intelligence (AI) tools enable us to search the vast space of possible drug […]
Large language model inference over confidential data using AWS Nitro Enclaves
This post discusses how Nitro Enclaves can help protect LLM model deployments, specifically those that use personally identifiable information (PII) or protected health information (PHI). This post is for educational purposes only and should not be used in production environments without additional controls.
Introducing three new NVIDIA GPU-based HAQM EC2 instances
HAQM Elastic Compute Cloud (HAQM EC2) accelerated computing portfolio offers the broadest choice of accelerators to power your artificial intelligence (AI), machine learning (ML), graphics, and high performance computing (HPC) workloads. We are excited to announce the expansion of this portfolio with three new instances featuring the latest NVIDIA GPUs: HAQM EC2 P5e instances powered […]
HAQM EC2 DL2q instance for cost-efficient, high-performance AI inference is now generally available
This is a guest post by A.K Roy from Qualcomm AI. HAQM Elastic Compute Cloud (HAQM EC2) DL2q instances, powered by Qualcomm AI 100 Standard accelerators, can be used to cost-efficiently deploy deep learning (DL) workloads in the cloud. They can also be used to develop and validate performance and accuracy of DL workloads that […]
Enable pod-based GPU metrics in HAQM CloudWatch
This post details how to set up container-based GPU metrics and provides an example of collecting these metrics from EKS pods.
Maximize Stable Diffusion performance and lower inference costs with AWS Inferentia2
Generative AI models have been experiencing rapid growth in recent months due to its impressive capabilities in creating realistic text, images, code, and audio. Among these models, Stable Diffusion models stand out for their unique strength in creating high-quality images based on text prompts. Stable Diffusion can generate a wide variety of high-quality images, including […]
Accelerate PyTorch with DeepSpeed to train large language models with Intel Habana Gaudi-based DL1 EC2 instances
Training large language models (LLMs) with billions of parameters can be challenging. In addition to designing the model architecture, researchers need to set up state-of-the-art training techniques for distributed training like mixed precision support, gradient accumulation, and checkpointing. With large models, the training setup is even more challenging because the available memory in a single […]
Optimized PyTorch 2.0 inference with AWS Graviton processors
New generations of CPUs offer a significant performance improvement in machine learning (ML) inference due to specialized built-in instructions. Combined with their flexibility, high speed of development, and low operating cost, these general-purpose processors offer an alternative to other existing hardware solutions. AWS, Arm, Meta and others helped optimize the performance of PyTorch 2.0 inference […]
Modulate makes voice chat safer while reducing infrastructure costs by a factor of 5 with HAQM EC2 G5g instances
This is a guest post by Carter Huffman, CTO and Co-founder at Modulate. Modulate is a Boston-based startup on a mission to build richer, safer, more inclusive online gaming experiences for everyone. We’re a team of world-class audio experts, gamers, allies, and futurists who are eager to build a better online world and make voice […]