AWS for Industries

Audi & Reply: Scaling a GenAI multi-agent devbot from Pilot to Production-Ready

In today’s fast-paced digital world, AUDI AG faces immense pressure to innovate rapidly. In 2023, the Audi Cloud Foundation Services (ACS) team, supported by Storm Reply, launched the first version of their generative AI chatbot solution. This solution was designed to reduce internal knowledge base retrieval time and improve employee productivity. For more details, see How Audi improved their chat experience with Generative AI on HAQM SageMaker.

By adopting a multi-agent architecture, Audi and Storm Reply have since expanded the initial generative AI pilot into a new production-ready solution called devbot. This innovative solution reduces cloud resource management effort and minimizes error-prone manual processes, streamlining daily activities through capabilities like pre-validated code building blocks for infrastructure provisioning, comprehensive cost optimization analyses for running workloads, security recommendations with their remediations, and architectural design validation.

The collaboration between Audi and Storm Reply effectively addressed the challenges of building a robust, production-ready solution under tight time constraints, unlocking additional features and enhancing devbot’s overall capabilities.

Key challenges in defining the devbot multi-agent architecture

One of the primary challenges in developing devbot was supporting the diverse and specialized tasks handled by the ACS team. Devbot needed to handle cross-domain functions such as security, cost optimization, infrastructure as code (IaC) building blocks, and architectural design, all while maintaining a seamless, unified user experience.

Another major challenge was ensuring the reliability and accuracy of AI-generated insights. For instance, when analyzing security configurations or making recommendations based on the AWS Well-Architected Framework—which helps organizations evaluate the pros and cons of architectural decisions—devbot had to provide actionable, precise, and verifiable outputs to earn user trust. Without such reliability, adoption of the tool would be limited and its value diminished.

Audi and Storm Reply also prioritized modularity, scalability, low latency, and ease of integration. Devbot needed to integrate seamlessly with Audi’s existing systems, including its on-premises internal documentation platform, Single-Sign-On (SSO) identity provider, and sources like the AWS Well-Architected Framework, while remaining flexible for future enhancements.

Devbot solution overview

The devbot solution marks a significant step forward from Audi’s initial generative AI pilot by introducing a new, robust serverless multi-agent architecture, as shown in Figure 1.

Figure 1. High-level design of devbot

As an all-in-one AI assistant, devbot helps streamline cloud security, cost optimization, IaC building blocks for infrastructure provisioning, and architectural guidance through a set of specialized AI agents, detailed below.

  1. Security Findings Agent: By integrating with AWS Security Hub, a service that helps automate AWS security checks and centralize security alerts, this agent identifies security issues within AWS accounts and provides actionable remediation steps. Its outputs include detailed reports with relevant security findings, remediation links, and natural-language explanations.
  2. Cost Optimization Agent: By analyzing spending patterns through AWS Cost Explorer, which helps to visualize, understand, and manage AWS costs and usage over time, this agent generates reports that highlight cost inefficiencies. It helps identify underutilized resources and high-cost services and suggests optimizations, such as rightsizing instances or switching to cost-effective instance families, all presented in a clear, natural-language format.
  3. Building Blocks Agent: This agent helps facilitate infrastructure provisioning by providing validated IaC snippets. When a user requests specific AWS resources, the agent queries the available validated building blocks and returns ready-to-deploy IaC.
  4. Well-Architected Agent: Acting as a virtual solutions architect, this agent answers ACS’s architectural queries and recommends actions aligned with the AWS Well-Architected Framework documentation and guidelines. It uses HAQM Bedrock Knowledge Bases to obtain the latest documentation, utilizing the HAQM Bedrock-provided HAQM S3 bucket source adapter to retrieve the most recent content, which is subsequently indexed in the OpenSearch vector store. A retrieval-augmented generation (RAG) workflow verifies that relevant context is used to provide precise and actionable responses to user queries.
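As a sketch of how the Well-Architected Agent's retrieve-and-generate step might look, the function below wraps a single HAQM Bedrock Knowledge Bases call. The client is injected (it is expected to behave like boto3's `bedrock-agent-runtime` client, which exposes `retrieve_and_generate`), and the knowledge base ID and model ARN values are hypothetical placeholders, not devbot's actual configuration.

```python
def ask_well_architected(client, kb_id: str, model_arn: str, question: str) -> dict:
    """Query a knowledge base with a single retrieve-and-generate call.

    `client` is expected to behave like boto3's "bedrock-agent-runtime"
    client; `kb_id` and `model_arn` are hypothetical placeholders.
    """
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )
    # The generated answer and its source citations come back together,
    # so the agent can show users where each recommendation originates.
    return {
        "answer": response["output"]["text"],
        "citations": response.get("citations", []),
    }
```

Injecting the client keeps the routing and response-shaping logic testable in isolation; in production the same function would simply receive a real boto3 client.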

At the core of its architecture is HAQM Bedrock—a fully managed service that offers high-performing foundation models (FMs) from leading AI companies. HAQM Bedrock enables seamless integration between large language models (LLMs) and AWS services, providing intelligent, context-aware automation for cloud management.

The foundation of devbot’s intelligence is Claude 3.5 Sonnet v1, selected after benchmarking against Claude 2.1. The transition was driven by improvements in reasoning accuracy, response speed, and operational cost efficiency. Continuous performance evaluations help ensure that responses remain relevant, precise, and reliable.

To mitigate hallucinations, devbot uses a Retrieval-Augmented Generation (RAG) workflow that integrates HAQM Bedrock Knowledge Bases and HAQM OpenSearch, helping ensure responses are always grounded in authoritative AWS documentation and best practices. Additionally, an automatic RAG evaluation mechanism, implemented through the RAGAS library, assesses faithfulness and context precision for internal evaluation purposes. This mechanism utilizes two key metrics: (a) Context Precision, which evaluates the performance of the retrieval model by measuring how accurately the retrieved information aligns with the context of the query; and (b) Faithfulness, which measures how factually consistent a response is with the retrieved context, ranging from 0 to 1, with higher scores indicating better consistency. A response is considered faithful if all its claims can be supported by the retrieved context. These evaluations are used to monitor changes in the prompt and other workflows, allowing the team to track the performance of the solution over time and determine which configurations yield better results.

The Building Blocks Agent, in particular, uses a hierarchical retrieval approach that begins with generating descriptive summaries for each code block using an LLM. These descriptions are then indexed in OpenSearch, with the S3 path of the original code block included in the metadata for each indexed document. The original code blocks are securely stored in an HAQM S3 bucket. During the retrieval process, the most similar description is fetched from OpenSearch, and the corresponding S3 path is extracted from the document’s metadata, allowing efficient retrieval of the relevant code block. This method helps enhance both the efficiency and accuracy of code access, streamlining the development process.
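In outline, this hierarchical retrieval can be sketched in a few lines. The example below is a toy illustration, not devbot's implementation: simple word overlap stands in for the vector similarity OpenSearch computes over embeddings, and the S3 paths and descriptions are hypothetical.

```python
def index_building_blocks(blocks: dict[str, str]) -> list[dict]:
    """Build a toy index: one document per code block, storing the
    LLM-generated description as searchable text and the S3 path of the
    original snippet in the document metadata (mirroring the OpenSearch
    layout described above)."""
    return [
        {"description": desc, "metadata": {"s3_path": path}}
        for path, desc in blocks.items()
    ]

def retrieve_block_path(index: list[dict], query: str) -> str:
    """Return the S3 path of the block whose description best matches the
    query. Word overlap is a stand-in for embedding similarity; only the
    small description documents are searched, never the code itself."""
    query_words = set(query.lower().split())
    best = max(
        index,
        key=lambda doc: len(query_words & set(doc["description"].lower().split())),
    )
    return best["metadata"]["s3_path"]

# Hypothetical catalog: descriptions are indexed, code stays in S3.
catalog = {
    "s3://blocks/vpc.tf": "Terraform module that provisions a VPC with private subnets",
    "s3://blocks/s3.tf": "Terraform module that creates an encrypted S3 bucket",
}
index = index_building_blocks(catalog)
print(retrieve_block_path(index, "encrypted S3 bucket"))  # s3://blocks/s3.tf
```

Keeping only short descriptions in the search index while the full snippets live in S3 is what makes the two-step lookup cheap: the similarity search runs over small documents, and the matched metadata points directly at the deployable code.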

Instead of relying on external LLM orchestration libraries like LangChain, which can introduce dependencies and debugging complexity, devbot uses a custom AWS Lambda-based orchestrator to dynamically route queries to the appropriate agent. This setup improves task execution and prioritization while maintaining full control over the workflow. Retrieval quality is further enhanced through reranking models that balance accuracy with computational efficiency.
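As a minimal sketch of such routing logic, the orchestrator below dispatches a query to one of the four agents. Keyword matching stands in for the LLM-based classification a real orchestrator would perform, and the keywords and agent names are hypothetical.

```python
# Hypothetical intent keywords; in practice an LLM would classify the
# query, and keyword matching here merely stands in for that step.
AGENT_KEYWORDS = {
    "security": ("finding", "vulnerability", "security"),
    "cost": ("cost", "spend", "billing"),
    "building_blocks": ("terraform", "snippet", "provision"),
    "well_architected": ("architecture", "design", "pillar"),
}

def route(query: str, agents: dict) -> str:
    """Dispatch a user query to the first agent whose keywords match,
    falling back to the Well-Architected agent for general questions."""
    lowered = query.lower()
    for name, keywords in AGENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return agents[name](query)
    return agents["well_architected"](query)
```

Because the dispatch table is plain data, adding an agent means adding one entry and one handler, which is one way a Lambda-based orchestrator can stay small while remaining fully debuggable.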

The architecture is designed for scalability, enabling dynamic agent expansion as needs evolve. User authentication and access control are managed through HAQM Cognito, integrated with Audi’s SSO, while HAQM CloudFront and AWS AppSync support low-latency response streaming for smooth user interactions.

Finally, HAQM DynamoDB and OpenSearch provide encrypted persistent storage for anonymized chat memory and retrievable documentation.

By combining efficient orchestration, intelligent retrieval, and modular scalability, devbot delivers secure, context-aware cloud insights—boosting operational efficiency, security, and cost optimization at scale.

Production-ready solution

Moving beyond a proof of concept, Audi and Storm Reply designed devbot as a production-ready solution from the outset. This required addressing several critical areas, starting with Audi’s security and compliance requirements. By leveraging HAQM Bedrock Guardrails—which provides configurable safeguards for building generative AI applications at scale—Audi could configure devbot to comply with its internal policies and stakeholder expectations. Comprehensive penetration testing further helped validate devbot’s security posture and helped confirm that sensitive data remained protected at all times.
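As a rough sketch of how a guardrail can be attached to a model invocation, the function below uses the Bedrock Converse API's `guardrailConfig` parameter. The client is injected (expected to behave like boto3's `bedrock-runtime` client), and the model and guardrail identifiers are hypothetical; this illustrates the mechanism, not devbot's actual implementation.

```python
def guarded_chat(client, model_id: str, guardrail_id: str,
                 guardrail_version: str, prompt: str) -> str:
    """Send a prompt through the Converse API with a guardrail attached.

    `client` should behave like boto3's "bedrock-runtime" client; the
    model and guardrail identifiers are hypothetical placeholders.
    """
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        # The guardrail screens both the incoming prompt and the generated
        # answer against the configured content policies.
        guardrailConfig={
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
        },
    )
    return response["output"]["message"]["content"][0]["text"]
```

Centralizing the guardrail configuration in one call site like this makes it straightforward to verify during penetration testing that every model invocation passes through the same safeguards.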

Performance optimizations also played a pivotal role in making devbot production-ready. Near-instant response streaming was implemented to provide prompt answers during interactions, while conversational memory preserved context across queries, enabling more natural and efficient user experiences.

In addition, automatic RAG evaluation shows that devbot is generally resistant to hallucinations and capable of providing reliable responses to users. The few hallucinated answers observed, while noteworthy, should be weighed against the fact that faithfulness scores are not 100% accurate, due to the probabilistic nature of LLM scoring, and reflect the complexity of evaluating responses against the retrieved context. To calculate the faithfulness score, all claims in a response are identified, checked against the retrieved context for verifiability, and scored using the following formula:

Faithfulness score = (Number of claims in the response supported by the retrieved context) / (Total number of claims in the response)

To improve precision, devbot retrieves a broad set of contexts to maximize the likelihood of including relevant knowledge based on the user query.
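In code, the faithfulness calculation above reduces to a simple ratio. The sketch below assumes claims have already been extracted from the response; the support check, which RAGAS delegates to an LLM judgment, is represented here by a hypothetical predicate.

```python
def faithfulness_score(claims: list[str], supported) -> float:
    """Fraction of claims in a response that the retrieved context supports.

    `supported` is a predicate (in RAGAS, an LLM judgment) that returns
    True when a claim can be verified from the retrieved context.
    """
    if not claims:
        # Sketch assumption: an empty response makes no unsupported claims.
        return 1.0
    return sum(1 for claim in claims if supported(claim)) / len(claims)
```

For example, a response containing three claims of which two are verifiable from the retrieved context scores 2/3, matching the formula above.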

Outcomes and insights

Beyond its technical capabilities, devbot’s user-friendly interface has played a key role in its success. Its intuitive design enables team members to quickly access the information they need, with minimal training required. The graphical interface—shown in Figure 2—now supports features such as streaming responses, conversational memory, multilingual support, and integrated knowledge bases, all of which enhance overall usability.

To enable multilingual capabilities, Claude 3.5 Sonnet v1 is used as the core LLM. It automatically detects the language of each user query, then passes this context to the appropriate agent to help ensure the response is generated in the same language. Internally, queries are translated into English for processing, and responses are translated back into the original language using the LLM in streaming mode. This enables a seamless, effective, and natural communication experience across languages.
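The multilingual flow described above can be sketched as a small pipeline. The detection, translation, and agent steps are injected callables standing in for the LLM invocations devbot makes; this illustrates the control flow only, not the actual implementation.

```python
def multilingual_answer(query: str, detect, translate, agent) -> str:
    """Detect the query language, process in English, and answer in the
    original language.

    `detect`, `translate`, and `agent` are injected stand-ins for LLM
    calls (in devbot, the back-translation runs in streaming mode).
    """
    lang = detect(query)
    # Non-English queries are translated into English before the agents
    # process them; English queries pass through unchanged.
    english_query = query if lang == "en" else translate(query, src=lang, dst="en")
    english_answer = agent(english_query)
    # The answer is translated back so the user reads their own language.
    return english_answer if lang == "en" else translate(english_answer, src="en", dst=lang)
```

Keeping the agents English-only while translating at the edges means each specialized agent needs just one set of prompts, regardless of how many user languages are supported.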

Figure 2. Devbot’s user-friendly interface

Solution data points

Devbot demonstrates significant performance improvements across various tasks. On average, each agent delivers a full response in approximately 15 seconds, with the first token arriving in under 6 seconds. This contrasts sharply with manual processes, which are often time-consuming and error-prone. Time savings were calculated by comparing how long it takes users to perform these tasks manually versus the time spent querying and receiving a response from devbot.

  • Well-Architected Agent: because the AWS Well-Architected Framework documentation is detailed and extensive, users may spend an average of 30 minutes consulting it when manually seeking guidelines and suggestions for AWS cloud architectures. In contrast, this agent retrieves relevant information with minimal latency, resulting in a direct time saving of nearly 99%.
  • Building Blocks Agent: manually searching for pre-validated IaC can take around 10 minutes. This agent reduces retrieval time by approximately 98%, helping ensure rapid access to validated configurations and minimizing the risk of errors.
  • Security Findings Agent: identifying and remediating security findings manually typically takes at least 5 minutes. This agent achieves an average time reduction of 95%, expediting the remediation process and enhancing security posture.
  • Cost Optimization Agent: generating cost-saving reports and remediations manually takes a conservative estimate of about 2 minutes. This agent saves around 88% of that time, allowing users to focus on higher-value tasks.

Considering a typical mix of these activities, which can occur multiple times a day, devbot provides significant efficiency improvements. For example, if users rely on the Well-Architected Agent twice a day and on each of the remaining agents once, they can save up to 76 minutes per day. This results in an impressive 16% improvement in daily efficiency.
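The daily figures can be reproduced from the per-agent numbers above. The short calculation below assumes an eight-hour workday as the baseline for the efficiency percentage.

```python
# (manual minutes per task, fraction of that time saved, uses per day),
# taken from the per-agent figures quoted above.
TASKS = {
    "well_architected": (30, 0.99, 2),
    "building_blocks": (10, 0.98, 1),
    "security_findings": (5, 0.95, 1),
    "cost_optimization": (2, 0.88, 1),
}

minutes_saved = sum(manual * saved * uses for manual, saved, uses in TASKS.values())
workday_minutes = 8 * 60  # assumed eight-hour workday
efficiency_gain = minutes_saved / workday_minutes

print(round(minutes_saved))      # 76
print(f"{efficiency_gain:.0%}")  # 16%
```

The sum comes to roughly 75.7 minutes, which rounds to the 76 minutes per day cited above and amounts to about 16% of an eight-hour workday.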

This efficiency can be further amplified by expanding the specialized agents to handle a broader range of tasks, further helping reduce reliance on manual activities and freeing time for more strategic work.

In addition to these measurable time savings, devbot also brings intangible benefits such as reduced errors, knowledge empowerment, enhanced solution reusability, and faster deployment times. While these factors are complex to quantify, they contribute to overall improvements in productivity and operational efficiency.

Conclusion

The collaboration between Audi and Storm Reply has demonstrated the power of generative AI in addressing time-consuming and error-prone manual daily tasks. Devbot is more than just an AI assistant—it’s a comprehensive solution that empowers the ACS team and their cloud projects to navigate the complexities of cloud management with confidence and ease. From upholding Audi’s security and compliance requirements to accelerating resource provisioning through validated building blocks and optimizing costs, the solution sets a new benchmark for AI-driven tools in enterprise environments, empowering users to work more efficiently.

Next steps

Looking ahead, there is immense potential to expand and enhance devbot’s capabilities. Future evolutions could introduce new agents tailored for specific use cases, such as automated deployments and advanced analytics. Integration with additional AWS services, such as AWS Control Tower—which helps set up and govern a secure, multi-account AWS environment—could further extend devbot’s functionalities. Additionally, Audi is committed to leveraging responsible AI through their devbot solution, making a significant contribution to the company’s AI journey and digital transformation.

AUDI AG is a German automotive manufacturer and part of the Volkswagen Group, with production facilities in several countries. With a strong focus on quality, design, and engineering excellence, Audi designs, engineers, produces, markets, and distributes luxury vehicles and has established itself as a leading brand in the global luxury car industry.

Storm Reply is the expert on HAQM Web Services (AWS) within the Reply network. As an AWS Premier Consulting Partner since 2014, Storm Reply was named the AWS System Integrator of the Year (EMEA) in 2022 and 2023. Drawing on several AWS competencies and extensive expertise, Storm Reply helps leading enterprises, upper mid-market companies, and digital natives make the most of the ever-growing AWS portfolio. With a proven track record in industry-tailored cloud adoption, migrations, and state-of-the-art software development, the company has driven generative artificial intelligence (AI) innovations across the DACH region and the global automotive industry.

Michael Pawelke

Michael Pawelke is the Product Owner for the Audi AWS Cloud Foundation team in Ingolstadt, Germany. With expertise in agile methodologies, he holds certifications as a Product Owner and Scrum Master, facilitating seamless project management. His technical vision extends to AWS cloud solutions, which is reinforced by his AWS Solutions Architect Associate certification. Michael’s impact is felt across the company as he streamlines AWS account provisioning for various projects. Outside of work, he’s a sports enthusiast who enjoys cycling and skiing, complementing his professional dedication with an active lifestyle.

Bernd Schuster

Bernd Schuster works for the Audi Cloud Foundation team as a Cloud Architect specializing in AWS. The team ensures easy, secure, and compliant access to AWS Cloud Services for the entire AUDI AG and its brands. This involves designing, developing, and offering a range of shared services and platforms that align with Audi's security and compliance requirements. Bernd has a passion for event-driven architecture and serverless computing. He is known for blending technical expertise with creativity, driven by a love for coding and crafting the perfect barista-style coffee.

Domenico Capano

Domenico Capano is a DevOps Engineer and Scrum Master for Reply. He is part of the Audi Cloud Foundation Services Platform Team, which provides a framework for using AWS in a secure and compliant way inside Audi, spanning more than 250 individual customer AWS accounts, more than 3,000 federated users, and 60 successfully hosted projects. Domenico is also part of the Community of Practice for AI-Powered Software Development at Reply. His expertise includes Requirements Engineering, Customer Management, Solution Architecture, and Generative AI. He has a strong customer focus and enjoys spending time traveling and learning about new technologies.

Farooq Khan

Farooq Khan is the Customer Solutions Management Leader for Global Automotive OEMs at HAQM Web Services. He and his team operate as the voice of the customer within AWS, supporting the most strategic automotive customers on their cloud journey and in their digital transformation. He has an industry background in connected vehicles, embedded/connected navigation, and embedded software development. Prior to AWS, Farooq held various roles at Harman, Volkswagen Infotainment, and BlackBerry across software and product development.

Francesco Ongaro

Francesco Ongaro is an Associate Partner at Storm Reply, based in Munich, Germany. He holds a degree with honors from the University of Bologna and was a visiting researcher at the Network Research Laboratory at University of California, Los Angeles (UCLA). With over a decade of experience in AWS, Francesco leads large teams and drives innovation and business growth for German and Italian companies, with a strong focus on customer success. Outside of work, he enjoys snowboarding, hiking, traveling with his wife, and spending time with close friends.

Matteo Lanati

Matteo Lanati is a Senior Consultant at Storm Reply Germany in Munich. He has an academic background in telecommunications and over ten years of experience as a system administrator / DevOps. He contributed to multiple projects on topics such as Infrastructure as Code, automation, migration to Kubernetes and architecture design based on AWS services. In his free time, he likes reading and climbing.

Mohmmad Kashif Akhtar

Mohmmad Kashif Akhtar is an AI Cloud Engineer at Storm Reply and is based in Munich, Germany. He graduated from Technical University of Munich (TUM) with a master’s in computer science (emphasis in machine learning and data science). He is a certified AWS data engineer with expertise in machine learning and generative AI. He has contributed to multiple projects involving generative AI and LLMs. He likes to work out and go hiking in his free time.

Timo Schmidt

Timo Schmidt is a Business Unit Manager and Principal Solutions Architect at Storm Reply, specializing in driving the widespread adoption of AWS. With a broad general knowledge of many different AWS services, he excels at advising customers on the most appropriate solution based on best practices and needs. Working with project owners and central cloud teams, Timo helps turn their cloud visions into reality by providing architectural guidance and expertise throughout the implementation of strategic cloud solutions.