AWS Machine Learning Blog
Category: HAQM SageMaker
Use HAQM Bedrock tooling with HAQM SageMaker JumpStart models
In this post, we explore how to deploy AI models from SageMaker JumpStart and use them with HAQM Bedrock’s tooling. You can combine SageMaker JumpStart’s model hosting with Bedrock’s security and monitoring capabilities. We demonstrate the workflow using the Gemma 2 9B Instruct model as an example, showing how to deploy it and then use Bedrock’s tooling with the deployed endpoint.
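As a rough illustration of the JumpStart side of that workflow, the sketch below deploys a JumpStart model with the SageMaker Python SDK; the model ID is illustrative and should be confirmed against the current JumpStart catalog.

```python
# Minimal sketch: deploy a SageMaker JumpStart model with the SageMaker Python SDK.
# The model ID below is illustrative; confirm the current Gemma 2 9B Instruct ID
# in the JumpStart catalog before running.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-gemma-2-9b-instruct")
predictor = model.deploy(accept_eula=True)  # gated model, so the EULA must be accepted

response = predictor.predict({
    "inputs": "Summarize the benefits of managed model hosting.",
    "parameters": {"max_new_tokens": 128},
})
print(response)
```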
Build generative AI applications quickly with HAQM Bedrock in SageMaker Unified Studio
In this post, we show how anyone in your company can use HAQM Bedrock in SageMaker Unified Studio to quickly create a generative AI chat agent application that analyzes sales performance data. Through simple conversations, business teams can use the chat agent to extract valuable insights from both structured and unstructured data sources without writing code or managing complex data pipelines.
Scale ML workflows with HAQM SageMaker Studio and HAQM SageMaker HyperPod
The integration of HAQM SageMaker Studio and HAQM SageMaker HyperPod gives data scientists and ML engineers a comprehensive environment that supports the entire ML lifecycle, from development to deployment at scale. In this post, we walk you through the process of scaling your ML workloads using SageMaker Studio and SageMaker HyperPod.
Building Generative AI and ML solutions faster with AI apps from AWS partners using HAQM SageMaker
Today, we’re excited to announce that AI apps from AWS Partners are now available in SageMaker. You can now find, deploy, and use these AI apps privately and securely, all without leaving SageMaker AI, so you can develop performant AI models faster.
HAQM SageMaker launches the updated inference optimization toolkit for generative AI
Today, we are excited to announce updates to the HAQM SageMaker inference optimization toolkit, providing new functionality and enhancements to help you optimize generative AI models even faster. In this post, we discuss these new features of the toolkit in more detail.
Speed up your AI inference workloads with new NVIDIA-powered capabilities in HAQM SageMaker
At re:Invent 2024, we are excited to announce new capabilities to speed up your AI inference workloads with NVIDIA accelerated computing and software offerings on HAQM SageMaker. In this post, we will explore how you can use these new capabilities to enhance your AI inference on HAQM SageMaker. We’ll walk through the process of deploying NVIDIA NIM microservices from AWS Marketplace for SageMaker Inference. We’ll then dive into NVIDIA’s model offerings on SageMaker JumpStart, showcasing how to access and deploy the Nemotron-4 model directly in the JumpStart interface. This will include step-by-step instructions on how to find the Nemotron-4 model in the JumpStart catalog, select it for your use case, and deploy it with a few clicks.
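As a rough sketch of the Marketplace path, the example below deploys a subscribed model package (such as an NVIDIA NIM listing) to a SageMaker real-time endpoint; the ARN, role, endpoint name, and instance type are placeholders.

```python
# Minimal sketch: deploy a model package subscribed from AWS Marketplace
# (for example, an NVIDIA NIM listing) to a SageMaker real-time endpoint.
# The ARN, role, endpoint name, and instance type below are placeholders.
import sagemaker
from sagemaker import ModelPackage

role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder role

model = ModelPackage(
    role=role,
    model_package_arn=(
        "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-nim-package"
    ),
    sagemaker_session=sagemaker.Session(),
)
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # choose a GPU instance type supported by the listing
    endpoint_name="nim-example-endpoint",
)
```

Once the endpoint is in service, it can be invoked like any other SageMaker endpoint, for example through the SageMaker runtime client.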
Unlock cost savings with the new scale down to zero feature in SageMaker Inference
Today at AWS re:Invent 2024, we are excited to announce a new feature for HAQM SageMaker inference endpoints: the ability to scale SageMaker inference endpoints to zero instances. This long-awaited capability is a game changer for our customers running AI and machine learning (ML) inference workloads in the cloud.
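To give a sense of how this looks in practice, here is a minimal sketch that registers an inference component with Application Auto Scaling and allows it to scale down to zero copies; the component name and capacity limits are placeholders.

```python
# Minimal sketch: allow a SageMaker inference component to scale down to zero
# copies when idle, using Application Auto Scaling. The component name and
# capacity limits are placeholders.
import boto3

autoscaling = boto3.client("application-autoscaling")

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="inference-component/my-inference-component",
    ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
    MinCapacity=0,  # the new capability: zero copies when there is no traffic
    MaxCapacity=4,
)
```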
Supercharge your auto scaling for generative AI inference – Introducing Container Caching in SageMaker Inference
Today at AWS re:Invent 2024, we are excited to announce the new Container Caching capability in HAQM SageMaker, which significantly reduces the time required to scale generative AI models for inference. This innovation allows you to scale your models faster, with up to a 56% reduction in latency when scaling a new model copy and up to 30% when adding a model copy on a new instance. In this post, we explore the new Container Caching feature for SageMaker inference, addressing the challenges of deploying and scaling large language models (LLMs).
Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – Part 1
Today at AWS re:Invent 2024, we are excited to announce a new capability in HAQM SageMaker Inference that significantly reduces the time required to deploy and scale LLMs for inference using the Large Model Inference (LMI) container: Fast Model Loader. In this post, we delve into the technical details of Fast Model Loader, explore its integration with existing SageMaker workflows, discuss how you can get started with this powerful new feature, and share customer success stories.
Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – Part 2
In this post, we provide a detailed, hands-on guide to implementing Fast Model Loader in your LLM deployments. We explore two approaches: using the SageMaker Python SDK for programmatic implementation, and using the HAQM SageMaker Studio UI for a more visual, interactive experience. Whether you’re a developer who prefers working with code or someone who favors a graphical interface, you’ll learn how to take advantage of this powerful feature to accelerate your LLM deployments.
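As a rough preview of the SDK approach, the sketch below follows a ModelBuilder.optimize() path with a sharding configuration, which is how the post describes preparing a model for Fast Model Loader; the model ID, role, S3 path, instance type, and configuration values are illustrative assumptions rather than a definitive recipe.

```python
# Minimal sketch of the SDK path: prepare a model for Fast Model Loader with
# ModelBuilder.optimize() and a sharding configuration. The model ID, role,
# S3 path, instance type, and environment values are illustrative assumptions.
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

model_builder = ModelBuilder(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example Hugging Face model ID
    schema_builder=SchemaBuilder(sample_input="Hello", sample_output="Hi there"),
    role_arn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    instance_type="ml.g5.12xlarge",
)

optimized_model = model_builder.optimize(
    instance_type="ml.g5.12xlarge",
    output_path="s3://amzn-s3-demo-bucket/sharded-model/",  # placeholder bucket
    sharding_config={
        "OverrideEnvironment": {"OPTION_TENSOR_PARALLEL_DEGREE": "4"}
    },
    accept_eula=True,
)
# The returned optimized model can then be deployed to a SageMaker endpoint,
# where new model copies load from the pre-sharded weights.
```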