AWS Machine Learning Blog

Tag: SageMaker Inference

Get started with NVIDIA NIM Inference Microservices on Amazon SageMaker

Accelerate Generative AI Inference with NVIDIA NIM Microservices on Amazon SageMaker

In this post, we provide a walkthrough of how customers can deploy generative artificial intelligence (AI) models and large language models (LLMs) using the NVIDIA NIM integration with SageMaker. We demonstrate how this integration works and how you can deploy these state-of-the-art models on SageMaker, optimizing their performance and cost.
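At a high level, hosting a NIM container on SageMaker follows the standard create-model, create-endpoint-config, create-endpoint flow. The sketch below builds the three request payloads you would pass to the boto3 SageMaker client for that flow; the image URI, role ARN, model name, and environment variable are illustrative placeholders, not values from the post.

```python
# Hypothetical sketch of the request payloads for hosting an NVIDIA NIM
# container on a SageMaker real-time endpoint. In practice you would pass
# these dicts to boto3's SageMaker client: create_model(**model_req),
# create_endpoint_config(**config_req), create_endpoint(**endpoint_req).
# All URIs, ARNs, and names below are placeholders.

NIM_IMAGE_URI = "<account>.dkr.ecr.<region>.amazonaws.com/nim-llm:latest"  # placeholder
EXECUTION_ROLE = "arn:aws:iam::<account>:role/SageMakerExecutionRole"      # placeholder

def build_nim_deployment(model_name: str, instance_type: str = "ml.g5.12xlarge"):
    """Return the three request payloads for a single-container NIM endpoint."""
    model_req = {
        "ModelName": model_name,
        "ExecutionRoleArn": EXECUTION_ROLE,
        "PrimaryContainer": {
            "Image": NIM_IMAGE_URI,
            # NIM containers are typically configured via environment variables;
            # the variable name here is an assumption for illustration.
            "Environment": {"NIM_MODEL_NAME": model_name},
        },
    }
    config_req = {
        "EndpointConfigName": f"{model_name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,   # GPU instance class for LLM inference
            "InitialInstanceCount": 1,
        }],
    }
    endpoint_req = {
        "EndpointName": f"{model_name}-endpoint",
        "EndpointConfigName": f"{model_name}-config",
    }
    return model_req, config_req, endpoint_req

model_req, config_req, endpoint_req = build_nim_deployment("llama3-8b-instruct")
```

Separating the payload construction from the API calls makes the deployment easy to review and reuse across models; only the model name and instance type change per deployment.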

Model hosting patterns in Amazon SageMaker, Part 3: Run and optimize multi-model inference with Amazon SageMaker multi-model endpoints

An Amazon SageMaker multi-model endpoint (MME) enables you to cost-effectively deploy and host multiple models behind a single endpoint and then horizontally scale that endpoint as traffic grows. As illustrated in the following figure, this is an effective technique to implement multi-tenancy of models within your machine learning (ML) infrastructure. We have seen software as a […]
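The multi-tenancy pattern described above works because every invocation of an MME names its target model artifact per request via the `TargetModel` parameter of the SageMaker runtime's `invoke_endpoint` API, so many tenants can share one endpoint. A minimal sketch, assuming a hypothetical endpoint name and artifact paths (in practice you would pass the returned dict to `boto3.client("sagemaker-runtime").invoke_endpoint(**req)`):

```python
import json

# Hypothetical sketch of per-request routing on a SageMaker multi-model
# endpoint (MME): one endpoint hosts many model artifacts under a shared
# S3 prefix, and each request selects its model via TargetModel.
# The endpoint name and artifact names are illustrative placeholders.

def build_mme_invocation(endpoint_name: str, target_model: str, features):
    """Build the keyword arguments for sagemaker-runtime invoke_endpoint."""
    return {
        "EndpointName": endpoint_name,    # one shared endpoint for all tenants...
        "TargetModel": target_model,      # ...routed per request to one artifact
        "ContentType": "application/json",
        "Body": json.dumps({"instances": [features]}),
    }

# Two tenants share the same endpoint; only TargetModel differs.
req_a = build_mme_invocation("shared-mme", "tenant-a/model.tar.gz", [1.0, 2.0])
req_b = build_mme_invocation("shared-mme", "tenant-b/model.tar.gz", [3.0, 4.0])
```

Because the endpoint is shared, instance capacity is pooled across all hosted models, which is what drives the cost savings relative to one endpoint per model.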