AWS Machine Learning Blog

Category: HAQM SageMaker

Easily deploy and manage hundreds of LoRA adapters with SageMaker efficient multi-adapter inference

The new efficient multi-adapter inference feature of HAQM SageMaker unlocks exciting possibilities for customers using fine-tuned models. This capability integrates with SageMaker inference components to allow you to deploy and manage hundreds of fine-tuned Low-Rank Adaptation (LoRA) adapters through SageMaker APIs. In this post, we show how to use the new efficient multi-adapter inference feature in SageMaker.
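
To give a feel for the workflow, here is a minimal sketch using boto3. The endpoint and component names and the S3 URI are placeholders, and the exact Specification fields for adapter components should be verified against the SageMaker API reference; the invoke_endpoint call targets a specific adapter by its inference component name.

```python
import boto3

sm = boto3.client("sagemaker")
smr = boto3.client("sagemaker-runtime")

# Register a LoRA adapter as an inference component layered on an existing
# base-model inference component (names and fields are illustrative).
sm.create_inference_component(
    InferenceComponentName="my-lora-adapter",       # hypothetical adapter name
    EndpointName="my-multi-adapter-endpoint",       # hypothetical endpoint
    Specification={
        "BaseInferenceComponentName": "base-llm-component",  # assumed field for adapter components
        "Container": {
            "ArtifactUrl": "s3://my-bucket/adapters/my-lora-adapter.tar.gz",
        },
    },
)

# Route a request to that specific adapter at invocation time.
response = smr.invoke_endpoint(
    EndpointName="my-multi-adapter-endpoint",
    InferenceComponentName="my-lora-adapter",
    ContentType="application/json",
    Body=b'{"inputs": "Summarize our Q3 results."}',
)
print(response["Body"].read().decode())
```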

Embodied AI Chess with HAQM Bedrock

In this post, we demonstrate Embodied AI Chess with HAQM Bedrock, bringing a new dimension to traditional chess through generative AI capabilities. Our setup features a smart chess board that detects moves in real time, paired with two robotic arms that execute those moves. Each arm is controlled by a different foundation model (FM), either base or custom. This physical implementation lets you observe and experiment with how different generative AI models approach complex gaming strategies in real-world chess matches.
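
As a rough illustration of how a model can drive one of the arms, the sketch below asks a Bedrock model for a move through the Converse API. The prompt format, model choice, and move encoding are assumptions for illustration; the post's actual orchestration is more involved.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def next_move(model_id: str, fen: str, legal_moves: list[str]) -> str:
    """Ask a foundation model on HAQM Bedrock for one legal chess move."""
    prompt = (
        f"You are playing chess. Position (FEN): {fen}\n"
        f"Legal moves: {', '.join(legal_moves)}\n"
        "Reply with exactly one move from the list."
    )
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 10, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"].strip()

# Each robotic arm can be driven by a different model for comparison.
fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(next_move("anthropic.claude-3-sonnet-20240229-v1:0", fen, ["e2e4", "d2d4", "g1f3"]))
```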

Efficiently train models with large sequence lengths using HAQM SageMaker model parallel

In this post, we demonstrate how the HAQM SageMaker model parallel library (SMP) supports training models with large sequence lengths through new features such as 8-bit floating point (FP8) mixed-precision training for accelerated training performance and context parallelism for processing long input sequences, expanding its existing feature set.
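
As a hedged sketch of what enabling these features can look like with the SageMaker Python SDK, the estimator below passes SMP v2 parameters through the distribution argument. The script name, role, framework versions, and parallelism degrees are placeholders; FP8 mixed precision is typically enabled inside the training script itself.

```python
from sagemaker.pytorch import PyTorch

# Hypothetical training job; parameter names follow the SMP v2 configuration.
estimator = PyTorch(
    entry_point="train.py",  # your training script (FP8 is configured in-script)
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    instance_type="ml.p5.48xlarge",
    instance_count=4,
    framework_version="2.3.0",
    py_version="py311",
    distribution={
        "torch_distributed": {"enabled": True},
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {
                    "hybrid_shard_degree": 8,      # shard model states across GPUs
                    "context_parallel_degree": 2,  # split long sequences across GPUs
                },
            }
        },
    },
)
estimator.fit("s3://my-bucket/training-data/")  # placeholder data location
```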

Figure: Flow diagram of custom hallucination detection and mitigation. The user's question is fed to a search engine (with an optional LLM-based step that preprocesses it into a good search query). The documents or snippets returned by the search engine are inserted, together with the user's question, into a prompt template, and an LLM generates a final answer based on the retrieved documents. The final answer is evaluated against the reference answer from the dataset to produce a custom hallucination score; when the score crosses a predefined empirical threshold, a customer service agent is asked to join the conversation through an SNS notification.

Reducing hallucinations in large language models with custom intervention using HAQM Bedrock Agents

This post demonstrates how to use HAQM Bedrock Agents, HAQM Bedrock Knowledge Bases, and the RAGAS evaluation metrics to build a custom hallucination detector, and how to remediate detected hallucinations with a human-in-the-loop workflow. The agentic workflow can be extended to custom use cases through different hallucination remediation techniques, and it offers the flexibility to detect and mitigate hallucinations using custom actions.
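
Here is a minimal sketch of the scoring-and-escalation step from the flow above, assuming ragas with a configured evaluator LLM and a pre-created SNS topic; the metric choice, threshold, and ARN are illustrative.

```python
import boto3
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness

THRESHOLD = 0.7  # illustrative empirical threshold

# One RAG turn: question, generated answer, retrieved contexts, reference answer.
sample = Dataset.from_dict({
    "question": ["What is the warranty period?"],
    "answer": ["The warranty lasts 5 years."],
    "contexts": [["Our products carry a 2-year warranty."]],
    "ground_truth": ["The warranty period is 2 years."],
})

# ragas calls its configured evaluator LLM to score the answer.
score = evaluate(sample, metrics=[answer_correctness])["answer_correctness"]

if score < THRESHOLD:
    # Escalate to a human agent, as in the flow above.
    boto3.client("sns").publish(
        TopicArn="arn:aws:sns:us-east-1:111122223333:escalations",  # placeholder
        Message="Possible hallucination detected; agent assistance requested.",
    )
```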

Using LLMs to fortify cyber defenses: Sophos’s insight on strategies for using LLMs with HAQM Bedrock and HAQM SageMaker

In this post, SophosAI shares insights on using and evaluating an out-of-the-box LLM to enhance the productivity of a security operations center (SOC) with HAQM Bedrock and HAQM SageMaker. We use Anthropic's Claude 3 Sonnet on HAQM Bedrock to illustrate the use cases.
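
For a concrete starting point, the snippet below sends a raw alert to Anthropic's Claude 3 Sonnet through the Bedrock Converse API for triage; the alert payload and prompt are illustrative, not SophosAI's actual pipeline.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# A raw SOC alert (illustrative payload).
alert = '{"rule": "suspicious_powershell", "host": "WIN-42", "cmd": "powershell -enc ..."}'

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": f"Triage this security alert. Assess severity and suggest next steps:\n{alert}"}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```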

Apply HAQM SageMaker Studio lifecycle configurations using AWS CDK

This post serves as a step-by-step guide on how to set up lifecycle configurations for your HAQM SageMaker Studio domains. With lifecycle configurations, system administrators can apply automated controls to their SageMaker Studio domains and their users. We cover core concepts of SageMaker Studio and provide code examples of how to apply lifecycle configuration to […]
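
As a minimal AWS CDK (Python) sketch, the stack below defines a lifecycle configuration that installs a package when a JupyterLab app starts; the script content and names are placeholders, and the config still needs to be attached through the domain's default user settings.

```python
import base64
from aws_cdk import Stack, aws_sagemaker as sagemaker
from constructs import Construct

class StudioLifecycleStack(Stack):
    """Lifecycle config that runs on JupyterLab app startup (illustrative)."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Hypothetical startup script; replace with your own controls.
        script = "#!/bin/bash\npip install --quiet my-internal-package\n"

        sagemaker.CfnStudioLifecycleConfig(
            self,
            "AutoInstallConfig",
            studio_lifecycle_config_name="auto-install-packages",
            studio_lifecycle_config_app_type="JupyterLab",
            # The script content must be base64-encoded.
            studio_lifecycle_config_content=base64.b64encode(script.encode()).decode(),
        )
```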

Rad AI reduces real-time inference latency by 50% using HAQM SageMaker

This post is co-written with Ken Kao and Hasan Ali Demirci from Rad AI. Rad AI has reshaped radiology reporting, developing solutions that streamline the most tedious and repetitive tasks and save radiologists' time. Since 2018, using state-of-the-art proprietary and open source large language models (LLMs), our flagship product, Rad AI Impressions, has significantly reduced the […]

How Crexi achieved ML model deployment on AWS at scale and boosted efficiency

Commercial Real Estate Exchange, Inc. (Crexi) is a digital marketplace and platform designed to streamline commercial real estate transactions. In this post, we review how Crexi met its business needs and developed a versatile, powerful framework for AI/ML pipeline creation and deployment. This customizable and scalable solution allows its ML models to be deployed and managed efficiently to meet diverse project requirements.

Deploy Meta Llama 3.1 models cost-effectively in HAQM SageMaker JumpStart with AWS Inferentia and AWS Trainium

We’re excited to announce the availability of Meta Llama 3.1 8B and 70B inference support on AWS Trainium and AWS Inferentia instances in HAQM SageMaker JumpStart. Trainium and Inferentia, enabled by the AWS Neuron software development kit (SDK), offer high performance and lower the cost of deploying Meta Llama 3.1 by up to 50%. In this post, we demonstrate how to deploy Meta Llama 3.1 on Trainium and Inferentia instances in SageMaker JumpStart.
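
Deployment follows the usual JumpStart pattern; a minimal sketch with the SageMaker Python SDK is below. The model ID and Inferentia instance type are assumptions to verify against the JumpStart catalog.

```python
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="meta-textgeneration-llama-3-1-8b",  # assumption: check the JumpStart catalog ID
    instance_type="ml.inf2.48xlarge",             # AWS Inferentia2 instance
)
predictor = model.deploy(accept_eula=True)  # Meta Llama models require EULA acceptance

print(predictor.predict({
    "inputs": "Explain LoRA fine-tuning in one sentence.",
    "parameters": {"max_new_tokens": 64},
}))
```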

John Snow Labs Medical LLMs are now available in HAQM SageMaker JumpStart

Today, we are excited to announce that John Snow Labs' Medical LLM – Small and Medical LLM – Medium large language models (LLMs) are now available on HAQM SageMaker JumpStart. For medical doctors, these models summarize extensive documentation to provide a rapid understanding of a patient's medical journey, aiding timely and informed decision-making. This summarization capability not only boosts efficiency but also helps ensure that no critical details are overlooked, supporting optimal patient care and enhancing healthcare outcomes.
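
As a quick illustration of invoking one of these models after deploying it from JumpStart, the sketch below sends a clinical note for summarization; the endpoint name and payload schema are assumptions.

```python
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Assumes the Medical LLM was already deployed from JumpStart to this
# (hypothetical) endpoint; adjust the payload to the model's actual schema.
predictor = Predictor(
    endpoint_name="medical-llm-small",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

note = (
    "72 y/o male admitted with chest pain. Troponin elevated. "
    "Underwent PCI with stent to LAD. Discharged on DAPT."
)
print(predictor.predict({"inputs": f"Summarize this patient record:\n{note}"}))
```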