AWS Public Sector Blog

Building an AI coding assistant on AWS: A guide for federal agencies


Imagine cutting hours off your software development time by automating tedious tasks like creating unit tests or deciphering legacy code. That’s the power of generative AI, which is transforming the entire software development lifecycle (SDLC) and reducing toil for developers across a range of software development tasks. By tackling these necessary but time-consuming tasks, generative AI can boost your development teams’ efficiency and productivity.

If you’re working in highly regulated industries like the federal government or national security, you face unique challenges—from managing complex legacy systems with accumulated technical debt to keeping pace with rapidly evolving technologies. Your need for accelerated software development must be balanced with stringent security and compliance requirements when adopting generative AI capabilities. This is especially true for system integrators (SIs) developing code for federal missions, who have limited options to integrate emerging AI capabilities while meeting National Institute of Standards and Technology (NIST) Special Publication 800-171 guidelines and Cybersecurity Maturity Model Certification (CMMC) requirements for secure code development.

In this post, we provide an overview of how to build an AI coding assistant that is compliant with Federal Risk and Authorization Management Program (FedRAMP) and the Department of Defense Cloud Computing Security Requirements Guide (DoD CC SRG), using open source development tools and Amazon Web Services (AWS) generative AI services in both AWS GovCloud (US) and standard AWS Regions.

It’s important to note that using open source software and securely deploying solutions on AWS is the customer’s responsibility under the AWS Shared Responsibility Model.

AI coding assistant on AWS

Foundation models (FMs), such as large language models (LLMs), are at the heart of AI-powered code generation, explanation, refactoring, and testing. AWS offers several ways for you to access and use these powerful models:

Amazon Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to help you build generative AI applications. Amazon Bedrock provides a comprehensive model lineup for development tasks, including:

  • Anthropic’s Claude, which is known for its exceptional code generation and agentic capabilities
  • Meta’s Llama, which excels in code completion and concept explanation
  • Amazon Nova, a recently introduced, next-generation FM that delivers industry-leading price-performance benefits

With Amazon Bedrock’s serverless architecture, there is no underlying infrastructure for you to manage, so you can quickly integrate and switch between models for testing and optimization. Developers can integrate Amazon Bedrock with their applications and open source tools, using these powerful AI models for code transformation, automated code reviews, and intelligent debugging suggestions without leaving their development environment. And with pay-as-you-go, token-based pricing, you only pay for what you use, with no time-based term commitments.
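To illustrate how this looks in practice, the following is a minimal sketch of prompting a Bedrock-hosted model from Python using boto3’s Converse API. The Region and model ID shown are placeholders; substitute values your organization has enabled, keeping in mind that model availability differs between standard and AWS GovCloud (US) Regions.

```python
# Minimal sketch: call an Amazon Bedrock model with the Converse API via boto3.
# The Region and model ID below are placeholders for what your account has enabled.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # or a GovCloud (US) Region

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
    messages=[
        {
            "role": "user",
            "content": [{"text": "Write a Python unit test for a function that reverses a string."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```

Because the same Converse call works across Bedrock models, switching models for testing is typically just a change to the modelId value.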

Amazon SageMaker AI

Amazon SageMaker AI is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost machine learning (ML) capabilities. You can streamline the process of building, training, and deploying ML models at scale without managing your own infrastructure.

To further enhance SageMaker AI’s accessibility and applicability, SageMaker JumpStart streamlines model deployment by offering a straightforward point-and-click interface or API that you can use to launch models, including LLMs. You can rapidly deploy production-ready endpoints from open source models designed for code generation, including Starcoder2, CodeLlama, and Mistral, for efficient, high-performance applications.

After deployment, data scientists and data engineers can fine-tune these models on their private code base and datasets. SageMaker AI provides dedicated infrastructure to deploy and host models, so you have full control over performance and cost optimization using a flexible, usage-based pricing model. This combination of no-code deployment and flexible fine-tuning capabilities enables you to quickly adapt a wide selection of proven models to your specific needs, reducing the time from concept to delivery.
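As a rough sketch of that workflow, the SageMaker Python SDK’s JumpStart classes can deploy a code-generation model to a real-time endpoint. The model ID, instance type, and request payload below are illustrative and vary by model, and the snippet assumes an execution role is available (for example, when run from SageMaker Studio).

```python
# Minimal sketch: deploy a JumpStart code model to a SageMaker AI endpoint and query it.
from sagemaker.jumpstart.model import JumpStartModel

# Placeholder model ID; browse SageMaker JumpStart for models available in your Region.
model = JumpStartModel(model_id="huggingface-llm-mistral-7b")

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # adjust to an instance type available to your account
)

# The request/response format depends on the model's serving container (TGI-style shown here).
result = predictor.predict({"inputs": "Write a Python function that parses a CSV file."})
print(result)

# Delete the endpoint when finished to stop accruing charges.
predictor.delete_endpoint()
```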

LLM inference solution for Amazon Dedicated Cloud (LISA)

LLM inference solution for Amazon Dedicated Cloud (LISA) is an infrastructure-as-code (IaC) solution that offers secure, scalable, and low-latency access to customers’ generative LLMs and embedding language models. Customers deploy LISA directly into an AWS account, integrate it with an identity provider (IdP), and “bring their own models” (BYOM) for self-hosting and inference. LISA complements Amazon Bedrock by supporting built-in configurability with Bedrock models and by offering additional capabilities.

Customers bring models to LISA for self-hosting and inference via Amazon Elastic Container Service (Amazon ECS). LISA supports models compatible with Hugging Face’s Text Generation Inference (TGI) and Text Embeddings Inference (TEI) images, along with vLLM. LISA also supports OpenAI’s API spec via the LiteLLM proxy, a popular Python library. LiteLLM standardizes interactions using OpenAI’s API format, translating inputs to match each model provider’s unique API requirements. This makes LISA compatible with over 100 models hosted by external providers, including Amazon Bedrock and SageMaker JumpStart. Using LISA as a model orchestration layer, customers can securely centralize and standardize communication across multiple model providers and switch seamlessly between self-hosted LLMs and externally hosted models.

LISA integrates with applications compatible with OpenAI’s API specification. This enables users to access any LISA-configured model for LLM prompting directly within their integrated development environment (IDE) for code generation. Developers can access all LISA-configured models through a single, centralized programmatic API and authenticate using temporary or long-lived API tokens from their IDE.
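For example, because LISA exposes an OpenAI-compatible API, any LISA-configured model can be reached with the standard openai Python client. The base URL, token, and model name below are hypothetical placeholders for values issued by a LISA administrator.

```python
# Minimal sketch: prompt a LISA-configured model through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://lisa.example.gov/api/v1",  # hypothetical LISA API endpoint
    api_key="YOUR_LISA_API_TOKEN",               # temporary or long-lived token from LISA
)

completion = client.chat.completions.create(
    model="starcoder2",  # any model registered in LISA's model management console
    messages=[{"role": "user", "content": "Generate a unit test for a function that validates email addresses."}],
)

print(completion.choices[0].message.content)
```

Because the interface follows the OpenAI specification, the same code works whether the model is self-hosted on Amazon ECS or proxied through LiteLLM to Amazon Bedrock.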

Alternatively, LISA includes a chatbot user interface (UI). Here, customers can prompt LLMs, modify prompt templates, change model arguments, manage their personal session history, upload files, and access other features. Administrators can configure models through LISA’s model management UI and manage other features like vector stores for Retrieval Augmented Generation (RAG).

Figure 1: LISA Model Management console with both Amazon ECS-hosted (for example, Starcoder2) and LiteLLM-hosted Amazon Bedrock (for example, Mistral Mixtral or Meta Llama 3) models.

Open source AI development tools

Open source AI-powered plugins such as Continue, Cline, and Aider.chat enhance the software development experience throughout the entire SDLC. These tools integrate with FMs hosted on fully managed services, such as HAQM Bedrock and SageMaker AI, or with self-hosted solutions such as LISA outlined earlier.

For example, the Continue plugin can be configured to quickly switch between different models directly within your development environment. User authentication is handled using AWS Identity and Access Management (IAM) for Amazon Bedrock and SageMaker AI, or API tokens for LISA.

The following screenshot shows the Visual Studio Code (VS Code) IDE displaying the Continue plug-in’s chat interface, model options, and configuration file.

Figure 2: VS Code IDE showing the Continue plug-in configuration file for different models hosted on Amazon Bedrock and LISA.
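For reference, a Continue configuration along these lines registers an Amazon Bedrock model and a LISA-hosted model side by side. The exact field names depend on the plug-in version, and the model IDs, endpoint, Region, and token below are placeholders.

```json
{
  "models": [
    {
      "title": "Claude 3.5 Sonnet (Amazon Bedrock)",
      "provider": "bedrock",
      "model": "anthropic.claude-3-5-sonnet-20240620-v1:0",
      "region": "us-gov-west-1"
    },
    {
      "title": "Starcoder2 (LISA)",
      "provider": "openai",
      "model": "starcoder2",
      "apiBase": "https://lisa.example.gov/api/v1",
      "apiKey": "YOUR_LISA_API_TOKEN"
    }
  ]
}
```

With both entries in place, developers can switch between providers from the plug-in’s model selector without leaving the IDE.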

Local model provider

To enable fast and responsive code generation with minimal network latency, developers and organizations can use open source tools such as Ollama. Ollama is open source software that you can use to host LLMs locally within a virtual private cloud (VPC), providing a flexible and secure solution.

Ollama can be deployed on Amazon Elastic Compute Cloud (Amazon EC2) or Amazon ECS, enabling auto scaling and resilience. Developers can then choose models from the Ollama model library or custom models that have been tuned and approved for their use cases. Through IDE extensions such as Continue, developers gain access to code autocompletion, embeddings, and chat functionality—all without requiring external connectivity or a third-party LLM provider.
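As an illustration, once an Ollama server is running (it listens on port 11434 by default), a short Python check like the following verifies that a hosted model responds. The model name is a placeholder for one pulled from the Ollama library and approved for use.

```python
# Minimal sketch: send a prompt to a locally hosted Ollama server and print the reply.
import json
import urllib.request

payload = {
    "model": "codellama",  # placeholder; substitute an approved model you have pulled
    "prompt": "Write a Python function that checks whether a string is a palindrome.",
    "stream": False,       # return one JSON object instead of a token stream
}

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])
```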

By scaling this setup, developers can rapidly deploy and test different models, evaluating model performance and identifying the best fit for their development projects. Furthermore, by deploying multiple instances of Ollama, developers can run multiple models concurrently without impacting the performance of other developer environments.

Secure developer environment

Several options are available for setting up a secure and compliant environment for development teams. These include physical or virtual desktops and AWS Cloud-based options for secure remote environments. Some examples include:

  • Amazon WorkSpaces – A managed, secure, and flexible desktop-as-a-service (DaaS) solution that allows provisioning of virtual Windows or Linux desktops for developers.
  • Amazon AppStream 2.0 – A fully managed, secure, and scalable application streaming service that enables streaming desktop applications from AWS to any device with a web browser or AppStream 2.0 client. AppStream 2.0 supports developer tools such as VS Code and JetBrains.

A robust DevSecOps pipeline can be implemented with open source tools and integrated into a continuous integration and continuous deployment (CI/CD) pipeline as referenced in Building end-to-end AWS DevSecOps CI/CD pipeline with open source SCA, SAST and DAST tools. This approach enables organizations to achieve a higher level of security assurance while maintaining the agility and speed required for DevOps workloads.

Security and compliance

At the core of this solution’s security is the integration with AWS Key Management Service (AWS KMS) for data-at-rest encryption. This includes the encryption of storage volumes used by Amazon WorkSpaces, Amazon EC2, and Amazon ECS. Furthermore, the integration of AWS PrivateLink provides a private, secure connection between the development environment and AWS services, ideal for controlled or air-gapped environments where internet access is limited or unavailable.
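As a sketch of the PrivateLink piece, the following boto3 call creates an interface VPC endpoint for the Bedrock runtime API. The VPC, subnet, and security group IDs are placeholders, and the service name should be confirmed for your Region.

```python
# Minimal sketch: create an interface VPC endpoint (AWS PrivateLink) for Bedrock runtime.
import boto3

ec2 = boto3.client("ec2", region_name="us-gov-west-1")  # placeholder Region

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                              # placeholder VPC ID
    ServiceName="com.amazonaws.us-gov-west-1.bedrock-runtime",  # confirm the name for your Region
    SubnetIds=["subnet-0123456789abcdef0"],                     # placeholder subnet ID
    SecurityGroupIds=["sg-0123456789abcdef0"],                  # placeholder security group ID
    PrivateDnsEnabled=True,  # resolve the public Bedrock API hostname to the private endpoint
)

print(response["VpcEndpoint"]["VpcEndpointId"])
```

With the endpoint in place, traffic from developer desktops and CI/CD runners to Bedrock stays on the AWS network rather than traversing the internet.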

To meet the stringent security requirements of US government customers, the AWS services outlined in this solution are FedRAMP (Moderate and High) and DoD CC SRG Impact Levels 2, 4, and 5 authorized for use in AWS GovCloud (US) and standard AWS US East/West Regions. This means that the solution can be deployed in the most sensitive and restricted environments, while meeting stringent government and security standards. For up-to-date service compliance, see the AWS Services in Scope by Compliance Program page.

Solution overview

The following reference architecture shows an overview of how AWS services and open source tools covered above can be used to build an AI coding assistant solution to help meet specific compliance requirements or unique use cases where greater flexibility and control are needed.

Figure 3: Architecture diagram of on-premises and remote developers (using Amazon WorkSpaces) with access to an AI coding assistant using open source development tools, such as Continue and Ollama, and different options for model providers and models such as Amazon Bedrock, SageMaker AI, and LISA.

The developer workflow consists of the following steps:

  • Secure developer environment – Developers access their physical or virtual developer desktops (for example, Amazon WorkSpaces) locally or remotely, with a supported IDE such as VS Code or JetBrains installed. The environment also needs access to a source code repository as part of the DevSecOps software development pipeline.
  • Open source AI development tools – Open source IDE plug-in extensions, such as Continue, are configured and connected with organization-approved model providers and models.
  • (Optional) Autocomplete and embeddings models – AI tools that support autocomplete or embeddings, such as Continue, can use small language models (SLMs) hosted on a private model provider such as Ollama running on Amazon ECS or Amazon EC2.
  • Chat models – Complex tasks or larger context sizes require access to larger, more powerful models. Developers have the option to quickly choose different models for specific tasks or compare results between models. Model providers that offer access to these models can be directly integrated into compatible IDEs based on the use cases below:
    • Amazon Bedrock – Choice of industry-leading FMs for code generation, accessed through a single API with pay-as-you-go pricing.
    • SageMaker AI – Choice of proprietary and open source models that have been pre-trained or fine-tuned, accessed through a dedicated SageMaker AI inference API.
    • LISA – Self-hosted proprietary and open source models on Amazon ECS. Additionally, LISA can proxy requests (through LiteLLM) with an OpenAI API-compatible front end to other model providers such as Amazon Bedrock and SageMaker AI for centralized model access and governance.
  • Data protection – Secure, private access using AWS PrivateLink and VPC endpoints.

Conclusion

In this post, we demonstrated how to build an AI coding assistant to accelerate software development. Using AWS AI services and solutions such as Amazon Bedrock, Amazon SageMaker AI, and LISA, customers have flexibility in model selection and deployment options, from fully managed services to self-hosted solutions. The architecture provides multiple layers of security controls, including encryption and private connectivity, and meets FedRAMP and DoD CC SRG compliance standards, making it suitable for even the most security-conscious customers.

Whether you choose to deploy models locally for low-latency operations or use cloud-based services for complex tasks, you maintain full control over your development environment and data security. This comprehensive approach enables highly regulated industries to confidently adopt AI-powered development tools while meeting stringent security and compliance requirements.

Although this use case focuses on AI coding assistants for federal customers, the same guidance and technologies can be applied to other commercial or public sector customers that are interested in building their own AI-powered coding assistant on AWS.

Kyong Pak

Kyong is a senior solutions architect at Amazon Web Services (AWS) working with US federal customers and partners to architect secure mission solutions on AWS with a focus on identity. In his free time, he’s busy being a father and husband, and looking for ways to travel both domestically and internationally.

Cindy Ames

Cindy is the senior technical product manager for LISA and MLSpace. She has more than 15 years of experience supporting the National Security and Department of Defense community. Outside of work, Cindy loves hiking and traveling.

Getnet Mekuriyaw

Getnet is a solutions architect at Amazon Web Services (AWS), with a primary focus on front-end web, mobile, and operational analytics domains. He is passionate about learning AWS technologies and guiding nonprofit and education technology (EdTech) customers and partner system integrators through their cloud journey challenges. Outside work, Getnet enjoys playing and watching soccer.

Jeffrey Chen

Jeffrey is a solutions architect at Amazon Web Services (AWS) specializing in security and application development. He primarily works with government technology customers to help them scale and modernize their applications. Outside of work, he loves to hang out with friends, play sports, and watch movies.

John Little

John is a solutions architect for Amazon Web Services (AWS) with a passion for helping customers build and architect DevSecOps solutions. On his time off, John enjoys spending time with his family and traveling to new places.

Ola Olanipekun

Ola is a solutions architect for Amazon Web Services (AWS), focusing on data analytics. He develops proofs of concept (POCs) to help customers drive cloud adoption and modernization. In his spare time, Ola enjoys spending time with family and friends and playing soccer and tennis.