AWS Public Sector Blog

High-level architecture and components for a generative AI-based RAG solution

In today’s highly competitive public sector landscape, the ability to quickly respond to Requests for Proposals (RFPs) with high-quality proposals can be the difference between winning and losing multi-million-dollar opportunities. As HAQM Web Services (AWS) Partners face increasing pressure to accelerate their go-to-market strategies while maintaining proposal quality, generative AI has emerged as a game-changing solution.

Drawing inspiration from the AWS journey, this post introduces partners to a practical approach for implementing Retrieval Augmented Generation (RAG) solutions that can transform their proposal development process. The power of RAG technology on AWS allows partners to significantly reduce the time spent on proposal research and initial drafts, allowing teams to focus on strategic activities that drive win rates. Whether you’re responding to Requests for Information (RFIs), RFPs, or creating unsolicited proposals, the right RAG solution can help you tap into your organization’s collective knowledge and past successes to generate high-quality responses and proposals in hours rather than days.

In this post, we walk through a step-by-step guide to building your own RAG solution on AWS, complete with open-source instructions and best practices. Learn how to harness the same technological foundations that power enterprise-grade proposal automation while maintaining security, compliance, and content quality.

Main components of a generative AI-based RAG solution

  1. Data ingestion: This component takes in data, processes it, and stores it in a format the foundation models can use to return relevant and appropriate responses. A robust solution should:
    • Support multiple data sources and formats: Ingest data in multiple formats from single or multiple files, cloud storage, network drives, and web crawling.
    • Offer versatile processing: Provide multiple embedding models suited for different data types (for example, French language models for data in French, or models for pictures and videos).
    • Include generic embeddings: Embeddings are mathematical representations of data that capture meaning and relationships between objects. Use generative AI frameworks that offer generic embeddings for ease of use when requirements are non-specific.
    • Storage options: Use simple index or vector-based storage options for the generated embeddings. For large, complex, high-dimensional datasets that need semantic search capabilities, we suggest a vector database. Conversely, for smaller datasets where simple keyword searches are sufficient, a simple index is more appropriate.
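To make the ingestion-and-retrieval idea concrete, here is a minimal sketch in pure Python. The bag-of-words "embeddings" and the sample documents are stand-ins for illustration only; a real pipeline would call an embedding model (for example, through HAQM Bedrock) and store vectors in a vector database rather than an in-memory list.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would call an
    # embedding model instead of counting words.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

# Ingest: store (chunk, embedding) pairs in a simple in-memory index.
documents = [
    "Our proposal for the state education cloud migration won in 2023.",
    "Security controls: encryption at rest, IAM least privilege, audit logging.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query, k=1):
    # Return the k chunks most semantically similar to the query.
    q = embed(query)
    return sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)[:k]

top = retrieve("What security controls do we propose?")
```

The same interface (embed, store, retrieve) carries over when the toy pieces are swapped for a production embedding model and a vector store.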
  2. Foundation models: An extensible solution should make it easy to use multiple models. Key features include:
    • HAQM Bedrock: Enables the use of multiple foundation models (FMs) from vendors such as Anthropic, Cohere, and AI21 Labs, as well as AWS (including the HAQM Nova models), through a simple API and without expensive infrastructure costs.
    • HAQM SageMaker: For specialized needs and security requirements where models must be hosted and fine-tuned, HAQM SageMaker provides scalable, elastic GPU-based servers to host FMs from vendors such as Meta, Hugging Face, AI21 Labs, and Stability AI, or custom models.
  3. Fine-tuning of responses from a general-purpose large language model
    • Response tuning: Parameters such as temperature (to control how varied or deterministic the answer should be and to help minimize hallucinations) and top-p (to sample only from the top tokens whose probabilities sum to a threshold) make sure that responses are relevant and appropriate. They control the randomness and diversity of the model’s output during inference. The ability to change the size of the input and output token limits, and thus the context window, is also a useful feature in any generative AI solution.
    • Domain-specific dataset: To enable accurate and relevant responses, a generative AI solution should be able to incorporate your own customized dataset. This allows you to use specific data when generating responses from a general-purpose model. For example, you can add previous RFx materials from a specific industry vertical and use that dataset to generate responses tailored for customers within that industry.
    • Prompt engineering: The process of guiding generative AI solutions to generate the desired output. The solution you choose should allow for prompt engineering.
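The effect of temperature and top-p can be illustrated with a small, self-contained sketch over a toy token distribution. The logits and token strings are made up for illustration; real models apply these same operations internally at each decoding step.

```python
import math

def apply_temperature(logits, temperature):
    # Softmax with temperature scaling: lower temperature sharpens the
    # distribution (more deterministic output); higher flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, tokens, p):
    # Keep the smallest set of highest-probability tokens whose cumulative
    # probability reaches p; sampling then happens only within this set.
    ranked = sorted(zip(tokens, probs), key=lambda t: t[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= p:
            break
    return kept

logits = [2.0, 1.0, 0.1, -1.0]          # toy model scores
tokens = ["cloud", "server", "lake", "banana"]
sharp = apply_temperature(logits, 0.5)   # low temperature: top token dominates
nucleus = top_p_filter(sharp, tokens, 0.9)
```

With a low temperature the first token dominates, and a top-p of 0.9 trims the candidate set to the two strongest tokens, which is why lowering both parameters tends to produce more focused, less hallucination-prone answers.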
  4. User interface
    • User application: A user-friendly interface, whether web-based or mobile, to interact with the system, enabling users to input queries and receive generated responses efficiently.
    • Access control: The user interface and solution should support access control to limit the generative AI solution and its dataset to authorized users. Integration with AWS Identity and Access Management (IAM), either through a provided authentication/authorization system or through federation from a centralized identity provider, helps provide access control for the generative AI solution.
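As a hypothetical sketch of IAM-based access control, the policy below allows invoking only one approved HAQM Bedrock model; the Region and model ID are example values you would replace with your own.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowInvokeApprovedModelOnly",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0"
    }
  ]
}
```

Attaching a policy like this to the roles your application assumes keeps users and services from calling models (or reading datasets) outside the approved set.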

How partners can get started

The following two options are provided to help AWS Partners quickly deploy solutions in partner or customer accounts to showcase multiple business use cases suited for respective partners or customers. These solutions can be operational within hours without needing deep technical knowledge to get started with RAG-based generative AI systems.

  1. AWS GenAI Chatbot: Deploying a multi-model and multi-RAG powered chatbot on AWS. This solution provides ready-to-use code so you can start experimenting with a variety of large language models (LLMs) and multimodal language models, settings, and prompts in your own AWS account. Supported model providers include the following:
    • HAQM Bedrock: Supports a wide range of models from AWS, Anthropic, Cohere, and Mistral, including the latest HAQM Nova models. See recent announcements for more details. The HAQM Nova family of models currently provides the best combination of accuracy, speed, and cost for a wide range of tasks, including RFx tasks.
    • HAQM SageMaker: Self-hosted foundation models from SageMaker JumpStart and Hugging Face.
    • Third-party providers through APIs, such as Anthropic, Cohere, AI21 Labs, and OpenAI. See the available LangChain integrations for a comprehensive list.

The guide for deploying the solution can be found in this GitHub link. The detailed architecture is shown in the following figure, and the source code can be found in this GitHub link.


Figure 1: Architecture representing a RAG-based solution that AWS Partners can deploy to accelerate their go-to-market.

  2. HAQM Bedrock in SageMaker Unified Studio
    Alternatively, Partners can use HAQM Bedrock in SageMaker Unified Studio, an integrated, governed, collaborative environment that enables developers to swiftly build and tailor generative AI applications. It provides an intuitive interface with access to high-performing HAQM Bedrock FMs and advanced customization capabilities, such as Knowledge Bases, Guardrails, Agents, and Flows. HAQM Bedrock in SageMaker Unified Studio streamlines the development of generative AI applications by providing an accessible experience for developers across all skill levels.

Example use case: RFx responses and requirements

Either solution offers a robust environment for organizations to develop and deploy generative AI applications tailored for RFx responses. Partners can use them to create new RFx applications that automate the generation, summarization, and customization of RFx responses, significantly reducing the time and effort needed for manual content creation. Single sign-on with the organization’s credentials lets the collaborative environment foster teamwork across departments. This allows for the seamless integration of proprietary data and workflows into the generative AI models, resulting in more consistent, compliant, and relevant RFx responses.

Using capabilities such as RAG to create knowledge bases from past proposals and proprietary assets helps partners make sure that generated content is tailored to their specific needs. The systems can effectively gather context from previous proposals, requirements, or the internet to help tailor responses for a customer. The industry and customer perspective can also be distilled across the different documents added to the workspace. Another benefit is that the application can store this context in memory, so users never start fresh and can continue to enrich the customer and industry context to get valuable insights and relevant responses.
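The "gather context, then generate" step above can be sketched as simple prompt assembly: retrieved passages are placed into a template before the model is called. The template wording and the past-proposal excerpts below are hypothetical; a production system would fill the context from its knowledge base retrieval step.

```python
PROMPT_TEMPLATE = """You are drafting a response to a public sector RFP.
Use only the context below; if it is insufficient, say so.

Context:
{context}

Requirement:
{requirement}

Draft response:"""

def build_rag_prompt(requirement, retrieved_chunks):
    # Stuff retrieved passages (e.g., from past proposals) into the prompt
    # so the model grounds its answer in the organization's own material.
    context = "\n---\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, requirement=requirement)

prompt = build_rag_prompt(
    "Describe your data encryption approach.",
    [
        "All data is encrypted at rest with AWS KMS.",   # hypothetical excerpt
        "TLS 1.2+ is enforced for data in transit.",     # hypothetical excerpt
    ],
)
```

Grounding the model in retrieved text this way, and instructing it to admit when context is missing, is what keeps responses consistent with prior proposals rather than improvised.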

This streamlined approach accelerates the development process and enhances the overall efficiency and effectiveness of RFx responses, leading to improved win rates and reduced cost of sales for public sector partners.

Considerations

Generative AI and RAG are still evolving technologies, and responses should always be evaluated by humans before being finalized. The concerns of bias, hallucinations, and other ways in which responses from a RAG system may be inappropriate are beyond the scope of this post. From a security standpoint, make sure the solution meets all data classification and security requirements. Grant access to knowledge bases only to users who need that data. For example, a customer can choose dedicated storage and processing with hosted models in their own account, with access limited to authorized personnel.

Conclusion

The implementation of RAG frameworks for proposal content generation represents a transformative approach for AWS Partners seeking to accelerate their public sector business development. Based on its own data, AWS estimates that RAG technology can reduce first draft creation time by up to 47%, enabling teams to focus on strategic activities rather than routine content generation. The benefits extend far beyond mere time savings.

First, RAG frameworks support accuracy and compliance by drawing from verified content repositories while maintaining proper sourcing, a critical requirement in public sector procurement. Second, the technology enables intelligent reuse of past successful proposals and marketing materials, helping partners apply their best work consistently. Third, RAG solutions can handle complex technical requirements while maintaining proposal quality through built-in quality assurance mechanisms.

For AWS Partners, this translates to concrete business advantages: faster response times to RFx opportunities, reduced cost of sales, and improved win rates through higher-quality proposals. The ability to automatically analyze customer requirements and generate contextually relevant responses, while maintaining security and compliance, positions RAG as an essential tool for scaling public sector business development efforts. Following the proven approach of AWS allows partners to implement similar solutions to transform their proposal development processes and accelerate their go-to-market motions.

Rishabh Doshi

Rishabh is a solutions architect manager based in the San Francisco Bay Area with over 18 years of experience in software development and architecture. He collaborates with Public Sector Consulting Partners. With expertise in enterprise solutions, migrations, and solutions architecture, Rishabh offers architectural solutions and oversight for government, education, and non-profit verticals.

Gautam Chhawchharia

Gautam is a principal solutions architect based in the NYC metropolitan area with over two decades of experience in digital transformation, security, and cloud architecture. He has led multiple high-impact projects focusing on security, IAM, fraud prevention, and compliance within the financial services, insurance, and startup space. He focuses on large public sector proposals with respect to security, solution architecture, and capture content.

Thomas Storck

Thomas is a solutions architect manager based in Southern California with 18 years of professional experience in business strategy and technology consulting. Thomas has focused his career on helping technology and services vendors improve their go-to-market strategies. He has also spent time helping enterprise decision makers make the best choices when it comes to procuring technology to drive their transformation and reach their business objectives.