How to estimate HAQM Bedrock costs for public sector applications
Generative AI holds the potential to transform the public sector landscape with innovative capabilities for content creation, decision support, and citizen services. HAQM Web Services (AWS) offers organizations seeking to use generative AI a wide range of tools, from the infrastructure required to build and train foundation models (FMs) to ready-to-use generative AI assistants.
HAQM Bedrock is a fully managed service provided by AWS to help organizations rapidly develop and deploy secure, cost-effective generative AI applications without specialized expertise or infrastructure investments. HAQM Bedrock provides access to multiple high-performance FMs through a single API, offering robust security and compliance capabilities that meet government requirements and delivering a pay-as-you-go pricing model that aligns with public sector budgeting constraints. This in turn enables public sector agencies to build and scale generative AI applications that can transform citizen services, streamline operations, and extract insights from vast data repositories. Agencies benefit from the ability to select the most appropriate models for their specific use cases, implement customizations that reflect their unique missions, take advantage of built-in guardrails for responsible AI use, and scale implementations as needed—all while maintaining control over their data and avoiding the significant costs of specialized hardware.
As state and federal organizations increasingly explore generative AI implementations using HAQM Bedrock, they face a critical challenge: accurately estimating the costs associated with these workloads. The public sector operates under unique constraints—strict budgeting cycles, procurement regulations, and transparency requirements—making thorough cost estimation not only financially prudent but legally necessary. Visibility into the cost components of HAQM Bedrock gives customers the flexibility to understand and estimate costs for their specific use cases. In this blog post, we walk through those components. Although we cover the main features to consider when estimating costs, we don't cover actual prices, which are detailed on the HAQM Bedrock pricing page.
HAQM Bedrock components
HAQM Bedrock provides end-to-end capabilities for generative AI development. These capabilities extend across three primary components:
- Inference options that balance performance and cost
- Model choices that range from general-purpose FMs to specialized models
- Tools for customization, orchestration, and development
The following figure shows this breakdown of capabilities.
Figure 1: The comprehensive capabilities of HAQM Bedrock for generative AI development
Each of these components carries different cost considerations that public sector organizations must evaluate. The following sections examine each one in more detail.
Inference
When you submit an input to an FM, the model predicts a probable sequence of tokens and returns that sequence as the output. This is known as inference. HAQM Bedrock lets you run inference with the FM of your choice, and model inference is the primary driver of costs incurred with HAQM Bedrock. Several inference options are available, and pricing varies across them:
- On-Demand – With this option, you get maximum flexibility since you only pay for what you use, without any long-term commitments. When using text models, you’re charged based on the tokens processed – both for input and output. For embedding models, you just pay for input tokens processed. If you’re generating images, the charge is per image.
- Batch processing – In batch processing, you submit multiple requests in a single input file and get all responses in one output file. These results are stored in your HAQM Simple Storage Service (HAQM S3) bucket for easy access later. Batch mode costs 50 percent less than On-Demand mode for select AI models from providers like Anthropic, Meta, Mistral AI, and HAQM.
- Provisioned Throughput – Provisioned Throughput allows you to purchase dedicated model units for your specific base or custom models, giving you guaranteed throughput when you need it most. Each model unit purchased provides a specific throughput capacity, measured by the maximum number of tokens (both input and output) that can be processed per minute. Pricing follows an hourly charging model, with added discounts for term commitments.
- Prompt caching – Prompt caching in HAQM Bedrock adds portions of your context to a cache. The model can use the cache to skip recomputation of inputs and reduce input token processing cost as well as latency. When using prompt caching, you’re charged at a reduced rate for tokens read from cache. Depending on the model, however, tokens written to cache might be charged at a rate that is higher than that of uncached input tokens.
- Latency-optimized inference – Latency-optimized inference in HAQM Bedrock delivers significantly faster response times for FMs without compromising accuracy. Developers can set the latency parameter to optimized in the HAQM Bedrock runtime API to access this capability. For pricing details, check each model's pricing for latency-optimized inference. The sketches following this list show how token usage and these options surface in the API.
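To see where On-Demand charges come from in practice, here's a minimal sketch, assuming Python with boto3, an example Anthropic model ID, and an example Region. It sends a single request through the Converse API, opts into latency-optimized inference via the performanceConfig setting, and prints the token counts that On-Demand pricing is based on. The cache-related usage fields reflect our reading of the Converse response and appear only when prompt caching is in use.

```python
import boto3

# Runtime client for model inference (Region is an example)
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize our permit renewal process in three sentences."}],
    }],
    inferenceConfig={"maxTokens": 512},
    # Request latency-optimized inference (supported models only; priced per model)
    performanceConfig={"latency": "optimized"},
)

# On-Demand pricing is driven by these token counts
usage = response["usage"]
print("Input tokens: ", usage["inputTokens"])
print("Output tokens:", usage["outputTokens"])
# With prompt caching (cachePoint blocks in the message content), cached
# tokens are reported separately and billed at different rates
print("Cache reads:  ", usage.get("cacheReadInputTokens", 0))
print("Cache writes: ", usage.get("cacheWriteInputTokens", 0))
```

Multiplying these counts by the per-token rates on the HAQM Bedrock pricing page gives a per-request estimate. A batch job follows a different path: you submit a job that reads prompts from HAQM S3 and writes responses back to HAQM S3. A sketch, with a placeholder role ARN and bucket names:

```python
import boto3

# Batch inference: many prompts in one job, at a discount for supported models
bedrock = boto3.client("bedrock", region_name="us-east-1")  # control-plane client

job = bedrock.create_model_invocation_job(
    jobName="nightly-document-summaries",
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    roleArn="arn:aws:iam::111122223333:role/BedrockBatchRole",  # placeholder
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://my-bucket/batch-input/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://my-bucket/batch-output/"}},
)
print(job["jobArn"])
```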
Models
As discussed in the previous section, the main driver of costs for FMs in HAQM Bedrock is inference. However, additional costs might be incurred depending on the model deployment type and the amount of customization required.
There are two model deployment options: serverless and server-based. Serverless models incur costs as discussed in the previous section on inference. Server-based models are provided through the HAQM Bedrock Marketplace; they are deployed on HAQM SageMaker AI instances and therefore incur additional charges for running those instances.
You can also customize HAQM Bedrock foundation models to improve their performance. HAQM Bedrock currently provides three customization methods: fine-tuning, continued pre-training, and model distillation. Customization incurs additional charges, and keep in mind that serving a customized model currently requires purchasing Provisioned Throughput.
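As a rough sketch of how a fine-tuning job is submitted, assuming Python with boto3 and with the job name, model names, role ARN, and HAQM S3 URIs all placeholders. The customization job itself is billed for the tokens processed during training:

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")  # control-plane client

# All names, ARNs, and S3 URIs below are placeholders for illustration
job = bedrock.create_model_customization_job(
    jobName="permits-assistant-finetune",
    customModelName="permits-assistant-v1",
    roleArn="arn:aws:iam::111122223333:role/BedrockCustomizationRole",
    baseModelIdentifier="example.base-model-v1",  # placeholder base model ID
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/customization-output/"},
    hyperParameters={"epochCount": "2"},  # model-specific hyperparameters
)
print(job["jobArn"])
```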
In some instances, you might want to use your own models, either pre-trained or fine-tuned according to specific business requirements. You can do this with the HAQM Bedrock Custom Model Import feature. After a model is imported, the cost to host it is based on Custom Model Units (CMUs), which are determined by factors such as model architecture, parameter size, and supported context length. Hosting is priced per CMU per minute of active inference, plus a monthly storage fee per CMU, making it important to factor CMU consumption into your overall cost planning. Check the model import pricing on the HAQM Bedrock pricing page.
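Because imported models are billed per CMU, a back-of-the-envelope calculation helps with budget planning. The sketch below uses hypothetical figures, not published rates, to show the shape of the arithmetic:

```python
# Hypothetical figures for illustration only; consult the HAQM Bedrock
# pricing page for actual per-CMU rates in your Region.
cmu_count = 2                  # CMUs required (depends on architecture, size, context length)
price_per_cmu_minute = 0.10    # placeholder hosting rate per CMU per minute of active inference
storage_per_cmu_month = 1.95   # placeholder monthly storage rate per CMU

active_minutes_per_month = 8 * 60 * 22   # e.g., 8 hours/day for 22 business days

hosting = cmu_count * price_per_cmu_minute * active_minutes_per_month
storage = cmu_count * storage_per_cmu_month
print(f"Estimated monthly hosting + storage: ${hosting + storage:,.2f}")
```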
Tools
HAQM Bedrock provides several additional tools that accelerate the deployment and maintenance of your application. These tools reduce undifferentiated heavy lifting so you can focus on business needs, but they can incur additional costs. Next, we review the most relevant tools from a cost standpoint.
Knowledge bases
HAQM Bedrock Knowledge Bases is a comprehensive solution for enhancing FMs with your specific organizational data. The following options within HAQM Bedrock Knowledge Bases affect costs:
- Document ingestion and processing – HAQM Bedrock Knowledge Bases offers multiple options to parse multimodal data, including figures, charts, and tables in .pdf files, in addition to .jpeg and .png image files. This is shown in the following screenshot.
Figure 2: Parsing options in HAQM Bedrock Knowledge Bases
The default parser doesn't incur any charges. However, you might need the other options, such as HAQM Bedrock Data Automation or FM parsing, to process visually rich documents. These options incur additional charges: HAQM Bedrock Data Automation uses per-page pricing, and FM parsers charge based on input and output tokens.
You can also use AWS Lambda functions to implement custom logic for transforming, cleaning, or enriching your data before and after model processing, which will incur costs for using AWS Lambda.
- Vector database integration – To interpret the data from a data source, Bedrock Knowledge Bases converts the data into vector embeddings, numerical representations of the data. These embeddings can be compared to the vector representation of a query to assess similarity and determine which sources to return during retrieval. Embeddings are generated by running inference on a type of FM called an embedding model, which incurs costs as discussed in the prior section on model costs. Bedrock Knowledge Bases integrates with several vector databases, including the vector engine for HAQM OpenSearch Serverless, HAQM Aurora, MongoDB Atlas, Pinecone, and Redis Enterprise Cloud. These databases also incur costs for storing and operating on embeddings.
- Retrieval Augmented Generation (RAG) – Bedrock Knowledge Bases helps you take advantage of RAG, a popular technique that draws information from a data store to augment the responses generated by the FM. This generation step is FM inference and incurs costs as discussed in the prior section on model costs. Bedrock Knowledge Bases also applies techniques that improve response accuracy; some of these, such as the number of source chunks retrieved and modifications to the base orchestration and generation prompt templates, affect the number of tokens sent to the FM and consequently the inference cost. The sketch after this list shows the source-chunk setting.
- HAQM Bedrock Data Automation – HAQM Bedrock Data Automation simplifies the process of extracting insights from unstructured multimodal content. It automatically handles various data types, preparing the data for ingestion into FMs without manual intervention. HAQM Bedrock Data Automation also provides both standard and custom output options, reducing the need for extensive prompt engineering. This service is priced based on the volume of data processed, making it a cost-effective option for handling diverse data sources.
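To show where these query-time costs arise, here's a minimal sketch, assuming Python with boto3, a placeholder knowledge base ID, and an example model ARN. It runs a RAG query against a knowledge base; the numberOfResults setting controls how many source chunks are sent to the model, which directly affects input-token cost:

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What documents are required for a small business license?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID123456",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-haiku-20240307-v1:0",  # example model
            "retrievalConfiguration": {
                # Fewer chunks means fewer input tokens sent to the FM
                "vectorSearchConfiguration": {"numberOfResults": 3}
            },
        },
    },
)
print(response["output"]["text"])
```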
Agents
HAQM Bedrock Agents enables generative AI applications to automate multistep tasks by seamlessly connecting with company systems, APIs, and data sources. There is no charge to invoke an agent. However, there are incidental charges that need to be considered.
An agent is configured with default base prompt templates, which outline how the agent constructs a prompt to send to the FM at each step of the agent sequence. These calls made to FMs incur charges. The charges are dependent on the model invoked, and the pricing is calculated based on the number of input and output tokens processed by the request. The prompt template consists of instructions with placeholders filled in with user input, the agent configuration, and context at runtime. Therefore, the charges incurred aren’t for the user input alone.
Agents can also be configured to use a knowledge base. So, there might be additional charges related to the storage and retrieval of information from that knowledge base, as detailed in the previous section.
Agents might also use Lambda for fulfillment of processes at each step, from preprocessing to postprocessing. This incurs separate charges for Lambda execution.
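To make these moving parts concrete, here's a sketch, assuming Python with boto3 and placeholder agent, alias, and session IDs. The invocation itself is free, but each FM call, knowledge base query, and Lambda execution the agent performs behind the scenes is billed as described above:

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.invoke_agent(
    agentId="AGENT12345",       # placeholder
    agentAliasId="ALIAS12345",  # placeholder
    sessionId="session-001",    # ties multi-turn context together
    inputText="Check the status of permit application 12345.",
)

# The response is an event stream; collect the text chunks
completion = ""
for event in response["completion"]:
    if "chunk" in event:
        completion += event["chunk"]["bytes"].decode("utf-8")
print(completion)
```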
Flows
HAQM Bedrock Flows is a workflow authoring and execution feature of HAQM Bedrock for generative AI applications. It accelerates the creation, testing, and deployment of user-defined generative AI workflows through an intuitive visual builder and a set of APIs.
Bedrock Flows counts a node transition each time a node in your workflow is executed. You’re charged for the total number of node transitions across all your flows.
You might incur additional charges if the execution of your application workflow uses other AWS services or transfers data. For example, if your workflow invokes an HAQM Bedrock Guardrails policy, you’ll be billed for the number of text units processed by the policy. For more on this, refer to the security and compliance section later in this post.
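As a sketch of how a flow is executed, assuming Python with boto3 and placeholder flow and alias identifiers (the node and output names must match your flow definition), every node the execution passes through counts as one billable node transition:

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.invoke_flow(
    flowIdentifier="FLOWID1234",       # placeholder
    flowAliasIdentifier="ALIASID123",  # placeholder
    inputs=[{
        "nodeName": "FlowInputNode",   # must match the flow's input node
        "nodeOutputName": "document",
        "content": {"document": "Draft a summary of this public comment."},
    }],
)

# Stream the output; each node executed along the way is a node transition
for event in response["responseStream"]:
    if "flowOutputEvent" in event:
        print(event["flowOutputEvent"]["content"]["document"])
```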
HAQM Bedrock Prompt Management
HAQM Bedrock Prompt Management accelerates the creation, testing, and running of prompts through an intuitive UI and a set of APIs. Prompt optimization in HAQM Bedrock automatically rewrites prompts for better performance and more concise responses for FMs. You’re charged based on the number of tokens in the input prompts and in the optimized prompts.
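Here's a minimal sketch of prompt optimization, assuming Python with boto3 and an example target model ID; the streamed event shape shown reflects our reading of the API, so treat the field names as assumptions. You're billed for the tokens in the original prompt and in the optimized version it returns:

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.optimize_prompt(
    input={"textPrompt": {"text": "Tell the user how to renew a driver's license."}},
    targetModelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
)

# The optimized prompt arrives as an event stream
for event in response["optimizedPrompt"]:
    if "optimizedPromptEvent" in event:  # field name per our reading of the API
        print(event["optimizedPromptEvent"])
```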
Security and compliance costs
With the increasing adoption of FMs and generative AI, public sector organizations in particular need to verify that these applications comply with regulations and frameworks such as FedRAMP, the NIST AI Risk Management Framework (RMF), the Health Insurance Portability and Accountability Act (HIPAA), and the General Data Protection Regulation (GDPR).
HAQM Bedrock provides a range of security features and compliance certifications suitable for the public sector, including data encryption, isolation, private connectivity, and FedRAMP authorization. It also offers access control, content safety guardrails, and monitoring capabilities.
Organizations need to estimate these costs when building generative AI applications. Although most of the costs are no different from those of a traditional application, some features are specific to generative AI. One such feature is HAQM Bedrock Guardrails, which provides safeguards that you can configure for your generative AI applications based on your use cases and responsible AI policies. Bedrock Guardrails provides content filters, denied topics, word filters, sensitive information filters, and contextual grounding checks to filter out model hallucinations. Charges are incurred based on the policy types used in the guardrail. Another feature is model invocation logging. Although there is no direct cost to log model invocations in HAQM Bedrock, you need to consider the storage costs of logs in HAQM S3 and any costs of building custom dashboards on those logs.
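The standalone ApplyGuardrail API makes the billing unit visible. Here's a sketch, assuming Python with boto3 and a placeholder guardrail ID and version; the usage block in the response reports the text units consumed, broken out by policy type:

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="gr-abc123",  # placeholder
    guardrailVersion="1",             # placeholder
    source="INPUT",  # evaluate user input; use "OUTPUT" for model responses
    content=[{"text": {"text": "What is the status of my tax refund?"}}],
)

print(response["action"])  # NONE or GUARDRAIL_INTERVENED
print(response["usage"])   # text units consumed, per guardrail policy type
```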
Conclusion
In this post, we covered the various components of HAQM Bedrock that public sector organizations need to consider when evaluating costs. By taking a strategic approach to cost evaluation that acknowledges the unique constraints and opportunities of public service, government organizations can successfully implement generative AI to improve services while remaining responsible stewards of public resources. As generative AI continues to transform public service delivery, this balanced approach to technology deployment, cost management, and public value creation will be essential for successful implementation. For further details on HAQM Bedrock and deep-dive sessions on managing your costs effectively, contact your AWS account team.