Enhance customer experience with an integrated AI assistant
As enterprises increasingly embrace the power of large language models (LLMs) and conversational AI, the need for contextually accurate, domain-specific responses has become paramount. However, traditional chatbots often struggle to provide accurate and contextually relevant answers, which undermines the user experience. This challenge can be addressed by combining the natural language processing capabilities of LLMs on HAQM Bedrock with the precision of enterprise-specific knowledge sources through Retrieval Augmented Generation (RAG).
In this post, we present a serverless AI assistant architecture that integrates private enterprise knowledge bases with the capabilities of HAQM Bedrock and AWS Lambda. Our solution empowers organizations to use their proprietary data domains, enabling LLMs to generate contextually accurate responses tailored to their specific business needs.
We have provided a sample code repository that guides you through the process of building the necessary infrastructure resources and backend pipeline. Additionally, we have included a preconfigured, deployable AWS CloudFormation template that streamlines the provisioning of required AWS services and sets up a continuous integration pipeline. This template provides a seamless deployment process, enabling you to focus on integrating your enterprise knowledge bases and fine-tuning the LLM for your specific use case.
Solution overview
Our AI assistant solution uses RAG, which combines the power of LLMs with the contextual accuracy of enterprise-specific knowledge sources. RAG-based AI assistants enable LLMs to generate responses with improved accuracy and adaptability within specific data domains, reducing the risk of hallucinations (incorrect or misleading results). A significant advantage of RAG is its ability to incorporate the latest knowledge into the response generation context, in addition to the facts learned during the model’s training. Our serverless RAG-based assistant architecture integrates the capabilities of enterprise knowledge bases and Lambda. After user authentication, a Lambda function triggered by a WebSocket API orchestrates the entire RAG process, delivering a contextual AI assistant experience by using enterprise datasets stored in an HAQM Simple Storage Service (HAQM S3) bucket and indexed in a vector database. The solution includes a continuous integration and continuous deployment (CI/CD) pipeline using AWS CodePipeline packaged in a deployable CloudFormation template, facilitating continuous development and maintenance of the RAG solution.
This AI assistant solution is built using the following architecture. In our solution, we use HAQM S3 as the data source and HAQM Kendra as the vector store.
A Lambda function written in TypeScript is the core of the solution. The solution workflow consists of the following steps:
- The user uploads source documents to an HAQM S3 data source. The data is then ingested and indexed into a vector database like HAQM Kendra or a custom vector store.
- The user interacts with the AI assistant from their browser, which goes through an HAQM API Gateway WebSocket API. API Gateway calls a Lambda authorizer to validate the JSON Web Token (JWT) generated in the frontend through user authentication.
- The API triggers a WebSocket Lambda function handler, which adds the user's active connection details to an HAQM DynamoDB table.
- When the user asks a new question, the Lambda function pushes it to an HAQM Simple Queue Service (HAQM SQS) queue.
- This SQS queue triggers the core Lambda function handler that orchestrates the entire RAG process. (By default, the maximum number of concurrent Lambda executions per AWS account is 1,000.)
- The function first interacts with a DynamoDB table to retrieve an existing thread or create a new one if it doesn’t exist, after which it invokes the vector database index. You can set up your own supported vector store to index the vector embeddings representation of your data.
- The query to the vector database index retrieves relevant information from the data source that's configured in the S3 bucket and indexed in the vector database. The retrieved information is then combined with the user's question to form a prompt for an LLM hosted on HAQM Bedrock (a code sketch of this orchestration follows this list).
- A configured Anthropic Claude family LLM initiates the streaming process to generate the final RAG-based response, which is sent back to the Lambda function as chunks.
- This answer is sent to the WebSocket API, and finally to the user.
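The following is a minimal TypeScript sketch of the core orchestration described in the preceding steps, assuming HAQM Kendra as the vector store and an Anthropic Claude model on HAQM Bedrock. The environment variable names and the SQS message shape are illustrative assumptions, not the exact code from the sample repository.

```typescript
// Hedged sketch of the core RAG orchestration Lambda handler (AWS SDK for JavaScript v3).
// Environment variable names and the SQS message shape are illustrative assumptions.
import { KendraClient, RetrieveCommand } from "@aws-sdk/client-kendra";
import {
  BedrockRuntimeClient,
  InvokeModelWithResponseStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";
import {
  ApiGatewayManagementApiClient,
  PostToConnectionCommand,
} from "@aws-sdk/client-apigatewaymanagementapi";
import type { SQSEvent } from "aws-lambda";

const kendra = new KendraClient({});
const bedrock = new BedrockRuntimeClient({ region: process.env.BEDROCK_MODEL_REGION });
const wsApi = new ApiGatewayManagementApiClient({ endpoint: process.env.WEBSOCKET_API_ENDPOINT });

export const handler = async (event: SQSEvent): Promise<void> => {
  for (const record of event.Records) {
    // The WebSocket handler is assumed to enqueue the question together with the connection ID.
    const { connectionId, question } = JSON.parse(record.body) as {
      connectionId: string;
      question: string;
    };

    // 1. Retrieve relevant passages from the HAQM Kendra index.
    const retrieval = await kendra.send(
      new RetrieveCommand({ IndexId: process.env.KENDRA_INDEX_ID!, QueryText: question })
    );
    const context = (retrieval.ResultItems ?? []).map((item) => item.Content).join("\n\n");

    // 2. Build a RAG prompt and stream the response from the Claude model on HAQM Bedrock.
    const response = await bedrock.send(
      new InvokeModelWithResponseStreamCommand({
        modelId: process.env.BEDROCK_MODEL_ID!, // for example, anthropic.claude-3-haiku-20240307-v1:0
        contentType: "application/json",
        accept: "application/json",
        body: JSON.stringify({
          anthropic_version: "bedrock-2023-05-31",
          max_tokens: 1024,
          messages: [
            {
              role: "user",
              content: `Answer using only this context:\n${context}\n\nQuestion: ${question}`,
            },
          ],
        }),
      })
    );

    // 3. Forward each generated chunk to the user's WebSocket connection as it arrives.
    for await (const chunkEvent of response.body ?? []) {
      if (!chunkEvent.chunk?.bytes) continue;
      const payload = JSON.parse(new TextDecoder().decode(chunkEvent.chunk.bytes));
      if (payload.type === "content_block_delta" && payload.delta?.text) {
        await wsApi.send(
          new PostToConnectionCommand({ ConnectionId: connectionId, Data: Buffer.from(payload.delta.text) })
        );
      }
    }
  }
};
```

Streaming each chunk to the WebSocket connection as it arrives keeps perceived latency low, because the user starts seeing the answer before generation completes.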
DynamoDB is used to maintain the conversational history of the AI assistant, with no predefined limits on the retention period. You can specify your desired retention period (TTL) based on your unique requirements. Additionally, we store user connection details in a DynamoDB table. After a user disconnects, we promptly delete their connection details from the table for data privacy and security. If you prefer for HAQM Bedrock to automatically create a vector index for you, refer to Create an HAQM Bedrock knowledge base.
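As an illustration of the retention behavior, the following hedged sketch stores a conversation turn with a TTL attribute. The table name, key schema, and attribute names are assumptions rather than the repository's actual schema.

```typescript
// Hedged sketch: persisting a conversation turn with a DynamoDB TTL attribute.
// Table name, key schema, and the "expiresAt" attribute name are illustrative assumptions.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const RETENTION_DAYS = 30; // choose a retention period that matches your requirements

export async function saveConversationTurn(threadId: string, question: string, answer: string) {
  await ddb.send(
    new PutCommand({
      TableName: process.env.CONVERSATION_TABLE_NAME!,
      Item: {
        threadId, // partition key
        createdAt: new Date().toISOString(), // sort key
        question,
        answer,
        // DynamoDB TTL expects an epoch timestamp in seconds.
        expiresAt: Math.floor(Date.now() / 1000) + RETENTION_DAYS * 24 * 60 * 60,
      },
    })
  );
}
```

Enable TTL on the chosen attribute in the table settings so DynamoDB deletes expired items automatically.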
When it comes to evaluating RAG AI assistant solutions, thorough assessment is crucial for achieving optimal performance and user satisfaction. Developers and organizations have several powerful tools at their disposal for this evaluation process. One notable option is the HAQM Bedrock knowledge base evaluation feature, which provides a comprehensive suite of metrics and testing capabilities specifically designed for RAG systems. This tool can help assess the accuracy of retrievals, the relevance of generated responses, and the overall coherence of the assistant’s outputs. Alternatively, for those seeking open source solutions, frameworks like Ragas offer a flexible and customizable approach to evaluation. Ragas provides a set of metrics and methodologies that can be tailored to specific use cases, allowing for detailed analysis of retrieval quality, answer relevance, and faithfulness to source material. By using these evaluation tools, teams can gain valuable insights into their RAG assistant’s performance, identify areas for improvement, and ultimately deliver more effective and reliable AI-powered conversational experiences.
Frontend and backend integration
The solution uses AWS Amplify and HAQM Cognito in the frontend application to integrate with the backend application. The following high-level workflow takes place from the time the user signs in to the application to the point when the request is sent to the API:
- HAQM Cognito provides built-in sign-in and sign-out pages for user authentication. Amplify interacts with the HAQM Cognito user pool by making API calls to handle authentication processes.
- Upon successful authentication, Amplify retrieves a session token in the form of a JWT, which is added to the authorization header when making an API call (see the frontend sketch later in this section).
- The API calls pass through API Gateway, which has a Lambda authorizer function. During this process, validation checks for the JWT are performed using an HAQM Cognito verifier (a sketch of such an authorizer follows this list).
- After the API Gateway Lambda authorization is successful, the requests go to the corresponding Lambda functions for performing the query.
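The following is a hedged sketch of such a Lambda authorizer using the aws-jwt-verify library to validate the HAQM Cognito JWT. The environment variable names and the query string token location are assumptions.

```typescript
// Hedged sketch of a Lambda authorizer validating an HAQM Cognito JWT with aws-jwt-verify.
import { CognitoJwtVerifier } from "aws-jwt-verify";
import type { APIGatewayRequestAuthorizerEvent, APIGatewayAuthorizerResult } from "aws-lambda";

const verifier = CognitoJwtVerifier.create({
  userPoolId: process.env.USER_POOL_ID!,
  tokenUse: "id",
  clientId: process.env.USER_POOL_CLIENT_ID!,
});

export const handler = async (
  event: APIGatewayRequestAuthorizerEvent
): Promise<APIGatewayAuthorizerResult> => {
  // For WebSocket APIs, the token is commonly passed as a query string parameter or header.
  const token = event.queryStringParameters?.token ?? event.headers?.Authorization ?? "";
  try {
    const claims = await verifier.verify(token);
    return buildPolicy(claims.sub, "Allow", event.methodArn);
  } catch {
    return buildPolicy("unauthorized", "Deny", event.methodArn);
  }
};

const buildPolicy = (
  principalId: string,
  effect: "Allow" | "Deny",
  resource: string
): APIGatewayAuthorizerResult => ({
  principalId,
  policyDocument: {
    Version: "2012-10-17",
    Statement: [{ Action: "execute-api:Invoke", Effect: effect, Resource: resource }],
  },
});
```

Returning an IAM policy that allows or denies execute-api:Invoke on the requested method ARN is the standard contract for an API Gateway Lambda authorizer.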
After the API is generated, you can integrate it with your frontend applications using various development platforms and services such as Amplify.
To learn more about HAQM Cognito and its authentication and authorization mechanisms, refer to Building fine-grained authorization using HAQM Cognito, API Gateway, and IAM and Configuring machine to machine Authentication with HAQM Cognito and HAQM API Gateway – Part 2.
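On the frontend side, the following hedged sketch shows one way to retrieve the HAQM Cognito JWT with Amplify v6 and pass it when opening the WebSocket connection. Because browsers don't allow custom headers on WebSocket connections, this sketch passes the token as a query string parameter; the parameter name must match what the Lambda authorizer expects and is an assumption here.

```typescript
// Hedged sketch: retrieving the HAQM Cognito JWT with Amplify v6 and passing it to the WebSocket API.
// The WebSocket URL and the "token" query string parameter name are assumptions.
import { fetchAuthSession } from "aws-amplify/auth";

export async function openAssistantSocket(websocketUrl: string): Promise<WebSocket> {
  const session = await fetchAuthSession();
  const token = session.tokens?.idToken?.toString();
  if (!token) {
    throw new Error("User is not authenticated");
  }
  // The Lambda authorizer validates this token before the connection is established.
  return new WebSocket(`${websocketUrl}?token=${encodeURIComponent(token)}`);
}
```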
In the following sections, we walk through the steps to deploy the pipeline, run the pipeline, and populate the data source (for this example, we use HAQM S3).
Prerequisites
Before getting started, verify that you have the following:
- An AWS account. If you don’t have one, you can sign up for one.
- Node.js version 18 or higher.
- The AWS Cloud Development Kit (AWS CDK) set up. For prerequisites and installation instructions, see Getting started with the AWS CDK.
- Access to Anthropic’s Claude 3 Haiku generative AI model. If you want to use a different model, see Evaluate, compare, and select the best foundation models for your use case in HAQM Bedrock. After you select a model, see Access HAQM Bedrock foundation models for deployment details.
- A knowledge base that has ingested the data from your source documents into a vector database.
Deploy the pipeline
This solution uses a CloudFormation template to automatically deploy and provision the code pipeline application stack, which includes the required services and components. In addition, if you need future updates to the pipeline, including a rollback, you can use the same CloudFormation template to redeploy the application stack and automate provisioning of resources. To deploy the code pipeline stack, complete the following steps:
- Clone the public GitHub repository to your local machine.
- On the AWS CloudFormation console, choose the deployment AWS Region.
- Choose Create stack.
- Select Choose an existing template.
- Choose Upload a template file and upload the cicd-template.yaml file from your local machine.
- Choose Next.
- Enter the name of the CloudFormation stack.
- Enter or select the values for the following parameters:
  - BedrockModelID (see Supported foundation models in HAQM Bedrock)
  - BedrockModelRegion (see Model support by AWS Region in HAQM Bedrock)
  - Project
- Choose Next.
- On the Review page, select I acknowledge that CloudFormation might create IAM resources.
- Choose Submit.
Wait a few minutes for the stack to deploy.
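If you prefer to script these console steps, the following hedged sketch creates the same stack with the AWS SDK for JavaScript v3. The stack name, Region, and parameter values are illustrative and should be replaced with your own.

```typescript
// Hedged sketch: creating the pipeline stack programmatically instead of through the console.
// The stack name, Region, and parameter values are illustrative.
import { readFileSync } from "node:fs";
import { CloudFormationClient, CreateStackCommand } from "@aws-sdk/client-cloudformation";

async function deployPipelineStack(): Promise<void> {
  const cfn = new CloudFormationClient({ region: "us-east-1" }); // your deployment Region

  await cfn.send(
    new CreateStackCommand({
      StackName: "ai-assistant-cicd",
      TemplateBody: readFileSync("cicd-template.yaml", "utf8"),
      Parameters: [
        { ParameterKey: "BedrockModelID", ParameterValue: "anthropic.claude-3-haiku-20240307-v1:0" },
        { ParameterKey: "BedrockModelRegion", ParameterValue: "us-east-1" },
        { ParameterKey: "Project", ParameterValue: "ai-assistant" },
      ],
      // Matches the console step where you acknowledge that IAM resources might be created.
      Capabilities: ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    })
  );
}

deployPipelineStack().catch(console.error);
```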
Run the pipeline
Now that you have deployed the CI/CD pipeline, you’re ready to deploy your backend infrastructure for the solution. The backend infrastructure includes Lambda functions, API Gateway, DynamoDB, and other required resources. This solution uses GitHub as the source code repository to run the pipeline. However, you can change the repository based on your needs. To change the repository, configure the Source stage to connect to your repository (for example, GitHub or GitLab). For details, see GitHub connections.
Update the pending connection
To update the pending connection, complete the following steps:
- On the Developer Tools console, choose Settings, then choose Connections.
- Choose the connection demo-GithubConnection.
- Choose Update pending connection.
In a few seconds, the GitHub connection will be available to use.
The pipeline might initially be in a failed state because of the pending connection.
- After the connection is accepted, release the pipeline and deploy the backend infrastructure.
When you use your own private source repository, you can enable automatic triggering when code changes occur. This streamlines the deployment process and helps keep your application and infrastructure consistent and up to date with the latest code changes. This automation reduces manual intervention, improves deployment consistency, and integrates the deployment process with your development workflow.
Populate the data source
After creating your knowledge base, you need to ingest or sync your data so that it can be queried. Whenever you add new data to the data source, you start the ingestion workflow of converting your HAQM S3 data into vector embeddings and inserting the embeddings into the vector database. Depending on the amount of data, this workflow can take some time.
On the data source details page, choose Sync to sync your data.
To automate the process and lessen the need for manual intervention, you can synchronize a data source in code using the start_ingestion_job API (see the sketch after the following references). In the provided sample code, HAQM Kendra is used as the vector database. However, you can set up your own supported vector store, such as HAQM Kendra or a custom vector database, to index the vector embeddings representation of your data. HAQM Bedrock now supports integrating HAQM Kendra GenAI Index with knowledge bases natively. With this integration, you can implement RAG by ingesting source documents directly into HAQM Kendra and invoking the HAQM Bedrock model to retrieve relevant information from the HAQM Kendra index. This further streamlines the process and eliminates the need for a separate vector database.
Refer to the following for more details on the Kendra GenAI Index integration with HAQM Bedrock Knowledge Bases:
- Introducing HAQM Kendra GenAI Index – Enhanced semantic search and retrieval capabilities
- HAQM Bedrock Knowledge Base with Kendra GenAI index
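The following hedged sketch shows the programmatic sync mentioned above using the StartIngestionJob API from the AWS SDK for JavaScript v3 (start_ingestion_job is the boto3 name for the same operation). The environment variable names are assumptions.

```typescript
// Hedged sketch: starting a knowledge base ingestion job in code instead of choosing Sync in the console.
// The environment variable names are illustrative assumptions.
import { BedrockAgentClient, StartIngestionJobCommand } from "@aws-sdk/client-bedrock-agent";

const bedrockAgent = new BedrockAgentClient({});

export async function syncDataSource(): Promise<string | undefined> {
  const response = await bedrockAgent.send(
    new StartIngestionJobCommand({
      knowledgeBaseId: process.env.KNOWLEDGE_BASE_ID!,
      dataSourceId: process.env.DATA_SOURCE_ID!,
    })
  );
  // The job runs asynchronously; poll GetIngestionJob if you need to wait for completion.
  return response.ingestionJob?.ingestionJobId;
}
```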
The following screenshot shows how your frontend AI assistant could appear when it uses the backend deployed with this solution. The solution doesn't provide frontend code or support for building the frontend.
Security best practices
Grant access to upload the HAQM S3 reference documents used for the solution's RAG functionality only to users or administrators who have the proper authority and permissions. Update the S3 bucket policy to align with best practices. For details, see Top 10 security best practices for securing data in HAQM S3.
When implementing AI solutions, it’s crucial to adhere to responsible AI practices to maintain the development of beneficial and ethical AI technologies. We encourage organizations to review the AWS Responsible AI Policy and align their practices with established guidelines.
Cleanup
To clean up the resources created by this solution, follow these steps:
- Delete the CloudFormation stacks, including the pipeline stack.
- If you created additional resources, such as S3 buckets or other AWS services, delete or remove them to avoid incurring unnecessary charges.
Conclusion
In this post, we demonstrated how to build an enterprise AI assistant solution that uses LLMs in HAQM Bedrock with the precision of enterprise knowledge bases using the RAG approach. By integrating AWS services such as Lambda and HAQM Bedrock, our solution enables organizations to securely access and retrieve proprietary data, providing contextually relevant and accurate responses. The RAG approach not only enhances the assistant’s ability to provide tailored responses within specific enterprise data domains, but also mitigates the risk of hallucinations. By injecting the latest enterprise proprietary knowledge into the response generation context, our solution makes sure that the assistant remains up-to-date and adaptable to evolving specific business needs. The sample code repository and CloudFormation template can enable organizations to streamline the development and deployment of their RAG-based AI assistant solutions.
Try out this solution to accelerate the prototyping and implementation of your own enterprise AI assistants, and empower your organization to deliver enhanced customer experiences and boost employee productivity.
About the authors
Mayuri Shinde is an experienced AWS DevOps Consultant, specializing in designing, implementing, and managing robust and scalable cloud-centered architectures. She excels in using cutting-edge DevOps methodologies to streamline operations. Outside of work, she enjoys reading and traveling.
Keshav Ganesh is an experienced DevOps Consultant at AWS. He specializes in implementing well-architected DevOps solutions in the cloud. He helps customers streamline their cloud journey by utilizing the latest DevOps offerings. Outside of work, he likes playing video games, watching movies, and traveling.
Takeshi Itoh is an ML Engineer at AWS. He focuses on enhancing customer business values using generative AI technologies and provides end-to-end consulting, from discovering business problems to designing and implementing robust solutions. Outside of work, he enjoys cooking and watching motorsports.
Shripad Deshpande is an AIOps leader. He oversees complex migration engagements from on-premises or other cloud environments to AWS, with a primary focus on driving the adoption of DevOps and AIOps practices. His team's mission is to solve challenges related to scalability, reliability, automation, and AIOps technologies.