AWS Machine Learning Blog

Tag: Generative AI

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

In this post, we show how to extend HAQM Bedrock Agents to hybrid and edge services such as AWS Outposts and AWS Local Zones to build distributed Retrieval Augmented Generation (RAG) applications with on-premises data for improved model outcomes. With Outposts, we also cover a reference pattern for a fully local RAG application that requires both the foundation model (FM) and data sources to reside on premises.
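To make the pattern concrete, here is a minimal retrieve-then-generate sketch. The `local_search()` helper is a hypothetical stand-in for an on-premises vector store on Outposts, and the model ID is only an example:

```python
import boto3

# Minimal RAG sketch; local_search() is a hypothetical stand-in for a
# vector store whose data never leaves the premises.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def local_search(query: str) -> list[str]:
    """Hypothetical placeholder for querying an on-premises data source."""
    return ["<document snippet retrieved from the on-premises data source>"]

def answer(query: str) -> str:
    context = "\n\n".join(local_search(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    resp = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]
```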

Using natural language in HAQM Q Business: From searching and creating ServiceNow incidents and knowledge articles to generating insights

In this post, we demonstrate how to configure an HAQM Q Business application and add a custom plugin that lets users query real-time data and take actions in ServiceNow through the HAQM Q Business natural language interface.
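As a rough illustration, a custom plugin can also be registered programmatically with the CreatePlugin API. All IDs, ARNs, and the OpenAPI schema location below are placeholders; verify the field names against the current Boto3 documentation:

```python
import boto3

# Hedged sketch of registering a custom ServiceNow plugin with the
# HAQM Q Business CreatePlugin API; every value below is a placeholder.
qbusiness = boto3.client("qbusiness")

qbusiness.create_plugin(
    applicationId="<q-business-application-id>",
    displayName="servicenow-incidents",
    type="CUSTOM",
    serverUrl="http://<instance>.service-now.com",
    authConfiguration={
        "oAuth2ClientCredentialConfiguration": {
            "secretArn": "arn:aws:secretsmanager:<region>:<account>:secret:<servicenow-creds>",
            "roleArn": "arn:aws:iam::<account>:role/<plugin-secret-access-role>",
        }
    },
    customPluginConfiguration={
        "description": "Search and create ServiceNow incidents and knowledge articles",
        "apiSchemaType": "OPEN_API_V3",
        # OpenAPI schema describing the ServiceNow actions the plugin exposes
        "apiSchema": {"s3": {"bucket": "<schema-bucket>", "key": "servicenow-openapi.yaml"}},
    },
)
```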

Figure: High-level architecture of Tecton and HAQM SageMaker, showing the end-to-end feature lifecycle

Real value, real time: Production AI with HAQM SageMaker and Tecton

In this post, we discuss how HAQM SageMaker and Tecton work together to simplify the development and deployment of production-ready AI applications, particularly for real-time use cases like fraud detection. The integration enables faster time to value by abstracting away complex engineering tasks, allowing teams to focus on building features and use cases while providing a streamlined framework for both offline training and online serving of ML models.
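For flavor, here is a hedged sketch of the online serving path: fetch fresh features from a Tecton feature service, then score them on a SageMaker endpoint. The workspace, feature service, endpoint name, and join key are illustrative placeholders:

```python
import json
import boto3
import tecton

# Hypothetical workspace and feature service names
ws = tecton.get_workspace("prod")
fs = ws.get_feature_service("fraud_detection_service")

# Fetch real-time features for one entity key
features = fs.get_online_features(join_keys={"user_id": "user-123"}).to_dict()

# Score the feature vector on a deployed SageMaker endpoint (placeholder name)
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="fraud-model-endpoint",
    ContentType="application/json",
    Body=json.dumps(features),
)
print(json.loads(response["Body"].read()))
```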

Embodied AI Chess with HAQM Bedrock

In this post, we demonstrate Embodied AI Chess with HAQM Bedrock, bringing a new dimension to traditional chess through generative AI capabilities. Our setup features a smart chess board that can detect moves in real time, paired with two robotic arms executing those moves. Each arm is controlled by a different FM, either base or custom. This physical implementation allows you to observe and experiment with how different generative AI models approach complex gaming strategies in real-world chess matches.
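As a simplified illustration (not the post's robot-control code), asking a Bedrock FM for its next move might look like the following, with an example model ID:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def next_move(fen: str) -> str:
    """Ask a foundation model for the next chess move, given a FEN position."""
    resp = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
        messages=[{
            "role": "user",
            "content": [{"text": f"You are playing chess. Position (FEN): {fen}. "
                                 "Reply with your next move in UCI notation only."}],
        }],
        inferenceConfig={"temperature": 0.2},  # keep moves fairly deterministic
    )
    return resp["output"]["message"]["content"][0]["text"].strip()

# Opening position: ask the model for white's first move
print(next_move("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))
```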

Efficiently train models with large sequence lengths using HAQM SageMaker model parallel

In this post, we demonstrate how the HAQM SageMaker model parallel library (SMP) addresses this need with two new features that extend its existing capabilities: 8-bit floating point (FP8) mixed-precision training for faster training performance, and context parallelism for processing large input sequence lengths.
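A hedged sketch of launching such a job follows. The distribution parameter names track the SMP v2 documentation at the time of writing, the degrees, role, and script are placeholders, and FP8 mixed precision is typically switched on inside the training script itself:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                  # placeholder training script
    role="<sagemaker-execution-role-arn>",
    instance_type="ml.p5.48xlarge",
    instance_count=2,
    framework_version="2.3.0",
    py_version="py311",
    distribution={
        "torch_distributed": {"enabled": True},
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {
                    "hybrid_shard_degree": 8,      # FSDP-style sharding degree
                    "context_parallel_degree": 2,  # split long sequences across GPUs
                },
            }
        },
    },
)
estimator.fit("s3://<bucket>/<training-data>/")
```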

Figure: Flow of custom hallucination detection and mitigation. The user's question (optionally rewritten into a search query by an LLM) is sent to a search engine; the retrieved documents and the question are inserted into a prompt template, and an LLM generates the final answer. That answer is scored against a reference answer from the dataset to produce a custom hallucination score, and when the score crosses a pre-defined empirical threshold, a customer service agent is asked to join the conversation via an HAQM SNS notification.

Reducing hallucinations in large language models with custom intervention using HAQM Bedrock Agents

This post demonstrates how to use HAQM Bedrock Agents, HAQM Bedrock Knowledge Bases, and the RAGAS evaluation metrics to build a custom hallucination detector and remediate detected hallucinations with a human-in-the-loop workflow. The agentic workflow can be extended to custom use cases through different hallucination remediation techniques and offers the flexibility to detect and mitigate hallucinations using custom actions.
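As a rough sketch of the detect-then-escalate step, the open source RAGAS faithfulness metric can score an answer, and an HAQM SNS notification can page an agent when the score dips below an empirical threshold. The topic ARN and threshold are placeholders:

```python
import boto3
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness

# Toy evaluation sample: question, generated answer, retrieved contexts
sample = Dataset.from_dict({
    "question": ["What is the refund window?"],
    "answer": ["Refunds are accepted within 30 days of purchase."],
    "contexts": [["Our policy allows refunds within 30 days of purchase."]],
})

# Score faithfulness of the answer against its retrieved contexts
result = evaluate(sample, metrics=[faithfulness])
score = float(result.to_pandas()["faithfulness"].iloc[0])

THRESHOLD = 0.8  # pre-defined empirical threshold (placeholder)
if score < THRESHOLD:
    # Escalate to a human agent through HAQM SNS (placeholder topic ARN)
    boto3.client("sns").publish(
        TopicArn="arn:aws:sns:us-east-1:<account>:hallucination-alerts",
        Message=f"Low faithfulness score ({score:.2f}); agent review requested.",
    )
```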

Illustration of Semantic Cache

Build a read-through semantic cache with HAQM OpenSearch Serverless and HAQM Bedrock

This post presents a strategy for optimizing LLM-based applications. Given the increasing need for efficient and cost-effective AI solutions, we present a serverless read-through caching blueprint that takes advantage of repeated prompt patterns. With this cache, developers can store and retrieve responses to semantically similar prompts, improving their systems' efficiency and response times.
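A minimal sketch of the cache-lookup path, assuming an illustrative collection endpoint, index name, vector field, and similarity threshold: embed the prompt with HAQM Titan, then run a k-NN query against an OpenSearch Serverless vector index.

```python
import json
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

region = "us-east-1"
auth = AWSV4SignerAuth(boto3.Session().get_credentials(), region, "aoss")
client = OpenSearch(
    hosts=[{"host": "<collection-id>.us-east-1.aoss.amazonaws.com", "port": 443}],
    http_auth=auth, use_ssl=True, connection_class=RequestsHttpConnection,
)
bedrock = boto3.client("bedrock-runtime", region_name=region)

def embed(text: str) -> list[float]:
    """Embed text with HAQM Titan Text Embeddings V2."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def cache_lookup(prompt: str, threshold: float = 0.9):
    """Return a cached response for a semantically similar prompt, if any."""
    hits = client.search(index="semantic-cache", body={
        "size": 1,
        "query": {"knn": {"embedding": {"vector": embed(prompt), "k": 1}}},
    })["hits"]["hits"]
    if hits and hits[0]["_score"] >= threshold:
        return hits[0]["_source"]["response"]  # cache hit: reuse stored answer
    return None  # cache miss: call the LLM, then index the prompt + response
```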

Connect SharePoint Online to HAQM Q Business using OAuth 2.0 ROPC flow authentication

In this post, we explore how to integrate HAQM Q Business with SharePoint Online using the OAuth 2.0 ROPC flow authentication method. We provide both manual and automated approaches using PowerShell scripts for configuring the required Azure AD settings. Additionally, we demonstrate how to enter those details along with your SharePoint authentication credentials into the HAQM Q console to finalize the secure connection.
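For orientation, the token request at the heart of the ROPC grant looks roughly like this; all tenant, client, scope, and account values are placeholders:

```python
import requests

TENANT_ID = "<azure-tenant-id>"

# OAuth 2.0 Resource Owner Password Credentials (ROPC) token request
resp = requests.post(
    f"http://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "password",
        "client_id": "<azure-app-client-id>",
        "client_secret": "<azure-app-client-secret>",
        "scope": "http://<tenant>.sharepoint.com/.default",  # placeholder scope
        "username": "<sharepoint-user>@<tenant>.onmicrosoft.com",
        "password": "<sharepoint-password>",
    },
)
resp.raise_for_status()
access_token = resp.json()["access_token"]  # used to call SharePoint Online
```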

HAQM SageMaker Inference now supports G6e instances

G6e instances on SageMaker unlock the ability to deploy a wide variety of open source models cost-effectively. With greater memory capacity, enhanced performance, and strong price-performance, these instances are a compelling option for organizations looking to deploy and scale their AI applications. Their ability to handle larger models, support longer context lengths, and sustain high throughput makes G6e instances particularly valuable for modern AI applications.
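A hedged deployment sketch, assuming the Hugging Face TGI container and placeholder model ID, role, and versions:

```python
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# Deploy an open source LLM on a G6e instance with the TGI serving container
model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.0.2"),
    env={
        "HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct",  # example model
        "SM_NUM_GPUS": "1",
        "MAX_INPUT_LENGTH": "8192",
    },
    role="<sagemaker-execution-role-arn>",
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g6e.2xlarge",  # NVIDIA L40S, 48 GB GPU memory
)
print(predictor.predict({"inputs": "Hello, G6e!"}))
```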

Build generative AI applications on HAQM Bedrock with the AWS SDK for Python (Boto3)

In this post, we demonstrate how to use HAQM Bedrock with the AWS SDK for Python (Boto3) to programmatically incorporate FMs. We explore invoking a specific FM and processing the generated text, showcasing how developers can use these models in their applications for a variety of use cases.
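A minimal example along those lines, using the model-agnostic Converse API with an example model ID:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Invoke a foundation model and read back the generated text
response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user",
               "content": [{"text": "Summarize what RAG is in two sentences."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)
print(response["output"]["message"]["content"][0]["text"])
```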