.NET on AWS Blog
Build a .NET Context-Aware Generative AI Chatbot using HAQM Bedrock and LangChain
Generative AI is taking chatbots to the next level by empowering them to engage in human-like dialogues. These advanced conversational agents understand and respond to complex queries, provide personalized assistance, and even generate creative content. This blog post shows how to build a context-aware chatbot using HAQM Bedrock and LangChain in a .NET environment. The choice of .NET is relevant for organizations with a substantial portfolio of .NET applications that are looking to modernize and integrate advanced AI capabilities. HAQM Bedrock provides serverless access to foundation models from leading AI companies, including HAQM’s own models.
Large language models (LLMs), including those based on transformer architectures like generative pre-trained transformer (GPT), are stateless, which means they process each input independently. The model generates an output based solely on the current input. When a client sends a request to an LLM, the following steps occur:
- The client (user interface, application, or API) sends a prompt to the LLM.
- The LLM processes the prompt and generates a response based on the context provided by the input.
- The LLM does not keep any memory of previous interactions. It doesn’t know about previous requests or responses.
For applications requiring continuous and contextually relevant interactions, such as virtual assistants or customer support chatbots, this stateless nature poses a major challenge. It is also important to note that each LLM has a specific context window: the maximum amount of text (measured in tokens) that the model can process at once. For example, Anthropic’s Claude 3 Sonnet has a context window of 200,000 tokens, limiting how much combined text from prompts and conversation history it considers when generating responses. The context window defines the practical limits of maintaining conversation history in chatbot applications.
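To make the context window limit concrete, the following sketch (plain C#, not part of the sample application) trims the oldest conversation turns until the combined text fits a token budget. The four-characters-per-token estimate is a rough assumption for illustration only; real tokenizers, including Claude's, count differently.
using System;
using System.Collections.Generic;
using System.Linq;

internal static class ContextWindowDemo
{
    // Rough heuristic: ~4 characters per token for English text.
    // Real tokenizers differ; this is for illustration only.
    private static int EstimateTokens(string text) => text.Length / 4;

    // Drops the oldest history entries until history + new prompt fit the budget.
    public static List<string> TrimToBudget(List<string> history, string newPrompt, int tokenBudget)
    {
        var trimmed = new List<string>(history);
        while (trimmed.Count > 0 &&
               trimmed.Sum(EstimateTokens) + EstimateTokens(newPrompt) > tokenBudget)
        {
            trimmed.RemoveAt(0); // crop the earliest turn first
        }
        return trimmed;
    }
}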
Solution Overview
This solution builds a context-aware chatbot that retains the conversational context by integrating LangChain Conversational Memory with HAQM Bedrock models in a .NET application. LangChain is an open source framework for building applications with LLMs. It provides a set of abstractions and utilities that simplify working with LLMs, including text generation, question answering, and task completion. By integrating LangChain with HAQM Bedrock, developers can leverage AWS scalability and manageability while focusing on creating AI-driven applications. Note that LangChain for .NET is a community-driven effort.
LangChain Conversational Memory is a specialized module within the LangChain framework designed to manage the storage and retrieval of conversational data. It stores past conversations in various formats and retrieves relevant information from these conversations to provide a context for answering the current query. This allows the application to maintain a continuous context throughout the conversation.
Here are four methods that LangChain provides for managing conversation history:
- ConversationBufferMemory is the simplest form of memory. It stores the entire conversation history as a plain list of messages, with no filtering or modifications, allowing the model to remember the previous conversation within the context window of the LLM. If the conversation exceeds the context window size, the earlier parts are cropped as new content is appended.
- ConversationBufferWindowMemory is a type of short-term memory that remembers recent conversations based on the WindowSize parameter, which determines the number of previous exchanges to retain. For example, if WindowSize is set to 1, it remembers only the last exchange (see the sketch after this list).
- ConversationSummaryBufferMemory combines buffer storage with summarization. It maintains a buffer of recent interactions in memory, but instead of completely flushing old interactions, it uses an LLM to compile them into a summary. It uses token length, rather than the number of interactions, to decide when to summarize.
- ConversationSummaryMemory optimizes memory usage by storing only summaries of conversations rather than whole interactions. The model uses these summaries as context for subsequent queries.
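To illustrate the windowing idea behind ConversationBufferWindowMemory, here is a minimal sketch in plain C# (not the LangChain API; the WindowedChatBuffer type is hypothetical) that keeps only the last WindowSize exchanges:
using System.Collections.Generic;

// Keeps only the last 'windowSize' exchanges, mirroring how a
// windowed buffer memory discards older turns.
internal sealed class WindowedChatBuffer
{
    private readonly int windowSize;
    private readonly Queue<(string Human, string Ai)> turns = new();

    public WindowedChatBuffer(int windowSize) => this.windowSize = windowSize;

    public void AddTurn(string human, string ai)
    {
        turns.Enqueue((human, ai));
        while (turns.Count > windowSize)
            turns.Dequeue(); // oldest exchange falls out of the window
    }

    public IReadOnlyCollection<(string Human, string Ai)> Turns => turns;
}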
In our solution, we use ConversationBufferMemory to maintain chat history, which provides continuous and relevant dialogue with users.
Figure 1 illustrates the high-level architecture of the context-aware generative AI chatbot.

Figure 1: Generative AI chatbot application architecture
The solution architecture involves the following steps:
- The chat memory object stores records from previous interactions.
- A user submits a new question.
- The application appends the chat history to the user question to provide context for the current question.
- HAQM Bedrock processes the combined prompt and generates an appropriate response.
Prerequisites
Before proceeding, make sure you have:
- An AWS account with HAQM Bedrock access permissions.
- Install the latest version of the AWS Command Line Interface (AWS CLI) and configure the AWS CLI.
- Request Anthropic Claude 3 Sonnet model access from HAQM Bedrock.
- Install Microsoft Visual Studio (or your preferred .NET IDE).
- Install .NET 8.0 SDK.
Set up a .NET Application to Build the Generative AI Chatbot
While most generative AI chatbots are implemented as web applications, we will use a .NET console application to focus on LangChain integration with HAQM Bedrock and context management, without the complexity of a user interface.
Step 1: Integrate HAQM Bedrock using LangChain
- Create a .NET 8 console application in Visual Studio. Select Do not use top-level statements.
- Using a variable, define a prompt to test the LLM interaction. A prompt is a specific set of inputs from the user that guides an LLM on HAQM Bedrock to generate an appropriate response or output for a task or instruction.
private static readonly string prompt = "What are the first three colors of the rainbow?";
- Add the LangChain.Providers.HAQM.Bedrock (v0.17.0) NuGet package to the project. The package gives .NET applications the ability to interact with foundation models from HAQM Bedrock. The sample application uses Anthropic’s Claude 3 Sonnet model from HAQM Bedrock, though other models are also available.
private static readonly BedrockProvider provider = new BedrockProvider();
private static readonly Claude3SonnetModel llm = new(provider);
- Update the Program.cs file with the following code:
private static readonly string prompt = "What are the first three colors of the rainbow?";
private static readonly BedrockProvider provider = new BedrockProvider();
private static readonly Claude3SonnetModel llm = new(provider);

static async Task Main(string[] args)
{
    var response = await GenerateTextAsync(prompt);
    // Messages[0] is the user prompt; Messages[1] is the model's reply.
    var content = response.Messages.ToArray()[1].Content;
    Console.Write("AI: ");
    Console.WriteLine(content);
    Console.ReadLine();
}

// Sends the prompt to the Claude 3 Sonnet model and returns the full response.
private static async Task<ChatResponse> GenerateTextAsync(string prompt)
{
    var chatRequest = ChatRequest.ToChatRequest(Message.Human(prompt));
    var response = await llm.GenerateAsync(chatRequest);
    return response;
}
This code shows the integration between HAQM Bedrock and LangChain in .NET. The GenerateTextAsync method sends the user prompt to the Claude 3 Sonnet model and processes the response.
Running the .NET application generates the LLM response shown in Figure 2:

Figure 2: LLM response in .NET application
Step 2: Implement Streaming Response from LLM
The initial implementation receives responses from HAQM Bedrock as a single block of text. To instead receive the response in chunks as it is generated, similar to interactive chat interfaces, make the following modifications.
- Enable streaming responses through the AnthropicChatSettings parameter:
var chatSetting = new AnthropicChatSettings()
{
    UseStreaming = true
};
After enabling streaming, you must subscribe to the DeltaReceived event (handled by Llm_DeltaReceived below) to receive partial responses in chunks. Additionally, subscribe to the ResponseReceived event (handled by Llm_ResponseReceived) to be notified when the complete response has been generated.
Inference parameters play a crucial role in determining the quality and coherence of the generated text. By fine-tuning these parameters through AnthropicChatSettings, developers can customize the behavior of the chatbot to meet specific requirements.
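As a sketch of how such settings might be combined, the snippet below sets only UseStreaming, which this post's samples use; the commented-out property names are assumptions to verify against the AnthropicChatSettings members in the package version you install:
// Hypothetical sketch: UseStreaming comes from the sample above; the other
// property names are assumptions -- check the AnthropicChatSettings members
// in your installed LangChain.Providers.HAQM.Bedrock version.
var chatSetting = new AnthropicChatSettings()
{
    UseStreaming = true,
    // Temperature = 0.7,   // lower values -> more deterministic output
    // MaxTokens = 512,     // cap the response length
    // StopSequences = new[] { "\nHuman:" } // stop generation at a marker
};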
- Update the Program.cs file with the streaming implementation:
private static readonly string prompt = "What are the first three colors of the rainbow?";
private static readonly BedrockProvider provider = new BedrockProvider();
private static readonly Claude3SonnetModel llm = new(provider);

static async Task Main(string[] args)
{
    llm.DeltaReceived += Llm_DeltaReceived;
    llm.ResponseReceived += Llm_ResponseReceived;
    await GenerateTextAsync(prompt);
    Console.ReadLine();
}

// Fires for each partial chunk of the streamed response.
private static void Llm_DeltaReceived(object? sender, ChatResponseDelta e)
{
    Console.Write(e.Content);
}

// Fires once the full response has been generated.
private static void Llm_ResponseReceived(object? sender, ChatResponse e)
{
    Console.WriteLine("Complete Response Received");
}

private static async Task GenerateTextAsync(string prompt)
{
    var chatRequest = ChatRequest.ToChatRequest(Message.Human(prompt));
    var chatSetting = new AnthropicChatSettings()
    {
        UseStreaming = true
    };
    Console.Write("AI: ");
    await llm.GenerateAsync(chatRequest, chatSetting);
}
After running the .NET application, the response streams from the LLM, as shown in Figure 3:

Figure 3: LLM stream response in .NET application
This implementation creates a dynamic interaction where responses appear progressively, similar to popular AI chat interfaces.
Step 3: Implement Interactive User Prompts
Transform the single prompt application into an interactive chatbot that maintains ongoing conversations.
- Modify the Main method to handle continuous user input:
static async Task Main(string[] args)
{
    llm.DeltaReceived += Llm_DeltaReceived;
    llm.ResponseReceived += Llm_ResponseReceived;

    Console.WriteLine("Hey, I'm your virtual assistant, how may I help you?");
    Console.WriteLine("Enter 'exit' or hit Ctrl-C to end the conversation");

    while (true)
    {
        Console.Write("Human: ");
        var prompt = Console.ReadLine() ?? string.Empty;
        if (!string.IsNullOrEmpty(prompt))
        {
            if (prompt.ToLower() == "exit")
                break;
            await GenerateTextAsync(prompt);
        }
    }
}
- The code implements an interactive console application for continuous conversation with the LLM. Verify the implementation using these sequential prompts:
- What are the first three colors of the rainbow?
- What are the other three colors?
- My name is [your name]
- What is my name?
The output of these sequential prompts is shown in Figure 4:

Figure 4: LLM response with user prompts
While the chatbot successfully processes individual queries, it fails to maintain the context provided earlier in the conversation. LLMs are stateless: they treat each query in isolation and do not remember past conversations.
Step 4: Manage Context using LangChain
The LangChain memory module offers various methods to maintain context, each designed for specific use cases. This solution implements ConversationBufferMemory to store the chat history, enabling the chatbot to maintain continuous and relevant dialogue with users.
- Add the LangChain.Core (v0.17.0) NuGet package to the project. The package provides memory management components for maintaining conversation state between interactions.
- Initialize ConversationBufferMemory to maintain the context:
private static readonly ConversationBufferMemory memory = new();
- Add the user prompt to the chat history, then pass the chat history to the LLM in the GenerateTextAsync method:
private static async Task GenerateTextAsync(string prompt)
{
    await memory.ChatHistory.AddUserMessage(prompt);
    var chatRequest = ChatRequest.ToChatRequest(memory.ChatHistory.Messages);
    var chatSetting = new AnthropicChatSettings()
    {
        UseStreaming = true
    };
    Console.Write("AI: ");
    await llm.GenerateAsync(chatRequest, chatSetting);
}
- Add the LLM-generated response to the chat history:
private static async void Llm_ResponseReceived(object? sender, ChatResponse e)
{
    await memory.ChatHistory.AddAiMessage(e.LastMessageContent);
}
After adding the code changes for context management and submitting the same prompts, the chatbot now maintains the conversation context and provides relevant responses, as shown in Figure 5.

Figure 5: Context aware LLM response with user prompts
The complete source code for this implementation, including all components discussed in the previous steps, is as follows:
using LangChain.Providers.HAQM.Bedrock;
using LangChain.Providers;
using LangChain.Memory;

namespace ChatbotApp
{
    internal class Program
    {
        private static readonly BedrockProvider provider = new BedrockProvider();
        private static readonly Claude3SonnetModel llm = new(provider);
        // Stores the full conversation so each request carries prior context.
        private static readonly ConversationBufferMemory memory = new();

        private static async Task Main(string[] args)
        {
            llm.DeltaReceived += Llm_DeltaReceived;
            llm.ResponseReceived += Llm_ResponseReceived;

            Console.WriteLine("Hey, I'm your virtual assistant, how may I help you?");
            Console.WriteLine("Enter 'exit' or hit Ctrl-C to end the conversation");

            while (true)
            {
                Console.Write("Human: ");
                var prompt = Console.ReadLine() ?? string.Empty;
                if (!string.IsNullOrEmpty(prompt))
                {
                    if (prompt.ToLower() == "exit")
                        break;
                    await GenerateTextAsync(prompt);
                }
            }
        }

        // Prints each streamed chunk as it arrives.
        private static void Llm_DeltaReceived(object? sender, ChatResponseDelta e)
        {
            Console.Write(e.Content);
        }

        // Appends the completed AI response to the chat history.
        private static async void Llm_ResponseReceived(object? sender, ChatResponse e)
        {
            await memory.ChatHistory.AddAiMessage(e.LastMessageContent);
        }

        private static async Task GenerateTextAsync(string prompt)
        {
            // Add the new user message, then send the whole history for context.
            await memory.ChatHistory.AddUserMessage(prompt);
            var chatRequest = ChatRequest.ToChatRequest(memory.ChatHistory.Messages);
            var chatSetting = new AnthropicChatSettings()
            {
                UseStreaming = true
            };
            Console.Write("AI: ");
            await llm.GenerateAsync(chatRequest, chatSetting);
        }
    }
}
Clean-up
If you tested the sample application in your own AWS account, it is important to delete the created resources to avoid incurring charges. If you no longer need the foundation model, refer to Add or remove access to HAQM Bedrock foundation models and follow the steps to remove the model access.
Conclusion
We showed how to build a context-aware chatbot using .NET and LangChain with HAQM Bedrock. We used the LangChain memory module to store the conversation context in memory. Other modules are available to persist the context outside of the application, for example in Redis, so that the context data can be retrieved even after restarting the application. If you plan to build production-ready applications on HAQM Bedrock, we recommend reviewing the HAQM Bedrock pricing page to understand how pricing for HAQM Bedrock works.
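As a minimal sketch of that persistence idea, the snippet below saves chat turns to a local JSON file instead of Redis. The ChatHistoryStore type and file name are illustrative, not a LangChain API; a real application would rehydrate the memory's chat history from the stored messages at startup.
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Minimal sketch: persist chat turns to a local JSON file so context
// survives application restarts. A production app might use Redis or
// a database instead; this type and file name are illustrative only.
internal static class ChatHistoryStore
{
    private const string FilePath = "chat-history.json";

    public record StoredMessage(string Role, string Content);

    public static void Save(List<StoredMessage> messages) =>
        File.WriteAllText(FilePath, JsonSerializer.Serialize(messages));

    public static List<StoredMessage> Load() =>
        File.Exists(FilePath)
            ? JsonSerializer.Deserialize<List<StoredMessage>>(File.ReadAllText(FilePath))
              ?? new List<StoredMessage>()
            : new List<StoredMessage>();
}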