AWS Machine Learning Blog
Unlock cost-effective AI inference using HAQM Bedrock serverless capabilities with an HAQM SageMaker trained model
In this post, I show you how to use the fully managed, on-demand HAQM Bedrock API to run inference against a model you trained or fine-tuned in HAQM SageMaker.
HAQM Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and HAQM through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
Previously, if you wanted to use your own custom fine-tuned models in HAQM Bedrock, you either had to self-manage your inference infrastructure in SageMaker or train the models directly within HAQM Bedrock, which requires costly provisioned throughput.
With HAQM Bedrock Custom Model Import, you can use new or existing models that you trained or fine-tuned in SageMaker, for example with HAQM SageMaker JumpStart. You can import supported architectures into HAQM Bedrock and access them on demand through the fully managed HAQM Bedrock InvokeModel API.
Solution overview
At the time of writing, HAQM Bedrock supports importing custom models from the following architectures:
- Mistral
- Flan
- Meta Llama 2 and Llama 3
For this post, we use a Hugging Face Flan-T5 Base model.
In the following sections, I walk you through the steps to train a model in SageMaker JumpStart and import it into HAQM Bedrock. Then you can interact with your custom model through the HAQM Bedrock playgrounds.
Prerequisites
Before you begin, verify that you have an AWS account with HAQM SageMaker Studio and HAQM Bedrock access.
If you don’t already have an instance of SageMaker Studio, see Launch HAQM SageMaker Studio for instructions to create one.
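If you want to confirm programmatic access before you start, the following is a minimal sketch (assuming boto3 is installed and your AWS credentials and Region are configured) that lists the foundation models available to your account:

```python
import boto3

# Assumes your AWS credentials and default Region are already configured
bedrock = boto3.client("bedrock")

# If this call succeeds, your credentials can reach HAQM Bedrock
models = bedrock.list_foundation_models()
print(f"Found {len(models['modelSummaries'])} foundation models")
```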
Train a model in SageMaker JumpStart
Complete the following steps to train a Flan model in SageMaker JumpStart:
- Open the AWS Management Console and go to SageMaker Studio.
- In SageMaker Studio, choose JumpStart in the navigation pane.
With SageMaker JumpStart, machine learning (ML) practitioners can choose from a broad selection of publicly available FMs and pre-built ML solutions that can be deployed in a few clicks.
- Search for and choose the Hugging Face Flan-T5 Base model.
On the model details page, you can review a short description of the model, how to deploy it, how to fine-tune it, and what format your training data needs to be in to customize the model.
- Choose Train to begin fine-tuning the model on your training data.
Create the training job using the default settings, which populate the job with recommended values.
- The example in this post uses a prepopulated example dataset. When using your own data, enter its location in the Data section, making sure it meets the format requirements.
- Configure the security settings such as AWS Identity and Access Management (IAM) role, virtual private cloud (VPC), and encryption.
- Note the value for Output artifact location (S3 URI) to use later.
- Submit the job to start training.
You can monitor the job by choosing Training on the Jobs dropdown menu. When the training job status shows as Completed, the job has finished. With the default settings, training takes about 10 minutes.
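If you prefer working in a notebook over the console, the following is a minimal sketch of the same fine-tuning flow using the SageMaker Python SDK's JumpStartEstimator. The model ID and training data URI shown here are assumptions; substitute the values for your own model and dataset:

```python
from sagemaker.jumpstart.estimator import JumpStartEstimator

# Assumed values: replace with your JumpStart model ID and training data location
model_id = "huggingface-text2text-flan-t5-base"
training_data_uri = "s3://your-bucket/flan-t5/training-data/"

# Creates a training job with JumpStart's recommended defaults for this model
estimator = JumpStartEstimator(model_id=model_id)

# Fine-tune on your dataset; artifacts are written to the job's output S3 location
estimator.fit({"training": training_data_uri})

# Note the output artifact location to use during model import
print(estimator.model_data)
```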
Import the model into HAQM Bedrock
After the model has completed training, you can import it into HAQM Bedrock. Complete the following steps:
- On the HAQM Bedrock console, choose Imported models under Foundation models in the navigation pane.
- Choose Import model.
- For Model name, enter a recognizable name for your model.
- Under Model import settings, select HAQM SageMaker model, then select the radio button next to your model.
- Under Service access, select Create and use a new service role and enter a name for the role.
- Choose Import model.
The model import takes about 15 minutes to complete.
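You can also start the import programmatically with the boto3 create_model_import_job API. The following is a minimal sketch; the job name, model name, role ARN, and S3 URI are placeholders for your own values:

```python
import boto3

bedrock = boto3.client("bedrock")

# Placeholder values: use your own service role ARN and the Output
# artifact location (S3 URI) you noted from the training job
response = bedrock.create_model_import_job(
    jobName="flan-t5-import-job",
    importedModelName="flan-t5-fine-tuned",
    roleArn="arn:aws:iam::123456789012:role/BedrockModelImportRole",
    modelDataSource={
        "s3DataSource": {
            "s3Uri": "s3://your-bucket/flan-t5/output/"
        }
    },
)
print(response["jobArn"])
```

When the import finishes, you can try the model in the HAQM Bedrock text playground: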
- Under Playgrounds in the navigation pane, choose Text.
- Choose Select model.
- For Category, choose Imported models.
- For Model, choose your imported model (flan-t5-fine-tuned in this example).
- For Throughput, choose On-demand.
- Choose Apply.
You can now interact with your custom model. In the following screenshot, we use our example custom model to summarize a description about HAQM Bedrock.
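Outside the playground, you can call the same model through the InvokeModel API. The following is a minimal sketch; the model ARN is a placeholder, and the request body schema is an assumption that depends on your imported model's architecture, so adjust the payload fields to match what your model expects:

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime")

# Placeholder: use the ARN shown for your model on the Imported models page
model_arn = "arn:aws:bedrock:us-east-1:123456789012:imported-model/abc123"

# Assumed payload shape; the expected fields depend on the model architecture
body = json.dumps({"prompt": "Summarize: HAQM Bedrock is a fully managed service ..."})

response = runtime.invoke_model(modelId=model_arn, body=body)
print(json.loads(response["body"].read()))
```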
Clean up
Complete the following steps to clean up your resources:
- If you’re not going to continue using SageMaker, delete your SageMaker domain.
- If you no longer want to maintain your model artifacts, delete the HAQM Simple Storage Service (HAQM S3) bucket where your model artifacts are stored.
- To delete your imported model from HAQM Bedrock, on the Imported models page on the HAQM Bedrock console, select your model, and then choose the options menu (three dots) and select Delete.
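If you prefer to clean up programmatically, the following is a minimal sketch that deletes the imported model with the delete_imported_model API (the model name is a placeholder):

```python
import boto3

bedrock = boto3.client("bedrock")

# Placeholder: use the name or ARN of your imported model
bedrock.delete_imported_model(modelIdentifier="flan-t5-fine-tuned")
```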
Conclusion
In this post, we explored how the Custom Model Import feature in HAQM Bedrock enables you to use your own custom trained or fine-tuned models for on-demand, cost-efficient inference. By integrating SageMaker model training capabilities with the fully managed, scalable infrastructure of HAQM Bedrock, you now have a seamless way to deploy your specialized models and make them accessible through a simple API.
Whether you prefer the user-friendly SageMaker Studio console or the flexibility of SageMaker notebooks, you can train and import your models into HAQM Bedrock. This allows you to focus on developing innovative applications and solutions, without the burden of managing complex ML infrastructure.
As the capabilities of large language models continue to evolve, the ability to integrate custom models into your applications becomes increasingly valuable. With the HAQM Bedrock Custom Model Import feature, you can now unlock the full potential of your specialized models and deliver tailored experiences to your customers, all while benefiting from the scalability and cost-efficiency of a fully managed service.
To dive deeper into fine-tuning on SageMaker, see Instruction fine-tuning for FLAN T5 XL with HAQM SageMaker Jumpstart. To get more hands-on experience with HAQM Bedrock, check out our Building with HAQM Bedrock workshop.
About the Author
Joseph Sadler is a Senior Solutions Architect on the Worldwide Public Sector team at AWS, specializing in cybersecurity and machine learning. With public and private sector experience, he has expertise in cloud security, artificial intelligence, threat detection, and incident response. His diverse background helps him architect robust, secure solutions that use cutting-edge technologies to safeguard mission-critical systems.