Posted On: Apr 13, 2023

Today, AWS announces the general availability of HAQM Elastic Compute Cloud (HAQM EC2) Inf2 instances. These instances deliver high performance at the lowest cost in HAQM EC2 for generative AI models, including large language models (LLMs) and vision transformers. Inf2 instances are powered by up to 12 AWS Inferentia2 chips, the latest AWS-designed deep learning (DL) accelerator, and deliver up to four times higher throughput and up to 10 times lower latency than first-generation HAQM EC2 Inf1 instances.

You can use Inf2 instances to run popular applications such as text summarization, code generation, video and image generation, speech recognition, personalization, and more. Inf2 instances are the first inference-optimized instances in HAQM EC2 to introduce scale-out distributed inference supported by NeuronLink, a high-speed, nonblocking interconnect. You can now efficiently deploy models with hundreds of billions of parameters across multiple accelerators on Inf2 instances. Inf2 instances deliver up to three times higher throughput, up to eight times lower latency, and up to 40% better price performance than other comparable HAQM EC2 instances. To help you meet your sustainability goals, Inf2 instances offer up to 50% better performance per watt compared to other comparable HAQM EC2 instances.

Inf2 instances offer up to 2.3 petaflops of DL performance and up to 384 GB of total accelerator memory with 9.8 TB/s bandwidth. The AWS Neuron SDK integrates natively with popular machine learning frameworks such as PyTorch and TensorFlow, so you can continue using your existing frameworks and application code to deploy models on Inf2. Developers can get started with Inf2 instances using AWS Deep Learning AMIs, AWS Deep Learning Containers, or managed services such as HAQM Elastic Container Service (HAQM ECS), HAQM Elastic Kubernetes Service (HAQM EKS), and HAQM SageMaker.
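As a back-of-envelope illustration of what the 384 GB of total accelerator memory allows (a rough sketch, not an official sizing guide — the 175-billion-parameter model below is a hypothetical example, and weights alone ignore activation and runtime overhead):

```python
# Rough check: do the weights of a large model fit in the 384 GB of
# total accelerator memory on the largest Inf2 size?
# The 175B parameter count is an illustrative assumption, not a figure
# from the announcement.

def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    Assumes a 16-bit format (BF16/FP16, 2 bytes per parameter) and
    ignores KV cache, activations, and runtime overhead.
    """
    return num_params * bytes_per_param / 1e9

params = 175_000_000_000              # hypothetical 175B-parameter LLM
needed = weight_memory_gb(params)     # ~350 GB of weights
available = 384.0                     # GB of accelerator memory, per the announcement

print(f"weights: {needed:.0f} GB, available: {available:.0f} GB, "
      f"fits: {needed <= available}")
```

Under these assumptions the weights of such a model just fit within a single instance's accelerator memory, which is the kind of scale-out case the NeuronLink interconnect is designed to serve.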

Inf2 instances are available in four sizes (inf2.xlarge, inf2.8xlarge, inf2.24xlarge, and inf2.48xlarge) in the US East (N. Virginia) and US East (Ohio) AWS Regions as On-Demand Instances, Reserved Instances, or Spot Instances, or as part of a Savings Plan.
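As a minimal sketch of launching one of these sizes programmatically, the dictionary below shows the parameters you would pass to the EC2 RunInstances API (for example via boto3's `run_instances`). The AMI ID and key pair name are placeholders, not real values:

```python
# Sketch of an EC2 RunInstances request for an Inf2 instance.
# The ImageId and KeyName values are placeholders — substitute a real
# Deep Learning AMI with the Neuron SDK and your own key pair.

request = {
    "ImageId": "ami-0123456789abcdef0",  # placeholder: a Neuron-enabled Deep Learning AMI
    "InstanceType": "inf2.xlarge",       # smallest of the four Inf2 sizes
    "MinCount": 1,
    "MaxCount": 1,
    "KeyName": "my-key-pair",            # placeholder key pair name
}

# With boto3 (not imported here), launching would look like:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")  # US East (N. Virginia)
#   ec2.run_instances(**request)
print(request["InstanceType"])
```

The same request shape works for the larger sizes by changing `InstanceType`, and for Spot capacity by adding an `InstanceMarketOptions` entry.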

To learn more about Inf2 instances, see the HAQM EC2 Inf2 Instances webpage and the AWS Neuron Documentation.