AWS Database Blog
Choose the right throughput strategy for HAQM DynamoDB applications
HAQM DynamoDB is a fully managed key-value and document database designed to deliver consistent single-digit millisecond performance at any scale. When getting started with DynamoDB, one of the first decisions you will make is choosing between two throughput modes: on-demand and provisioned. On-demand mode is the default and recommended throughput option because it simplifies building modern, serverless applications that can start small and scale to millions of requests per second. However, choosing the right throughput strategy requires evaluating your operational needs, development velocity, and application characteristics, with cost being a key consideration.
In this post, we examine both throughput modes in detail, exploring their characteristics, strengths, and ideal use cases. Regardless of which capacity mode or table class you choose, DynamoDB provides consistent single-digit millisecond latency at any scale, and your tables benefit from enterprise-ready features such as built-in fault tolerance, backup and restore, encryption at rest, and global tables.
Throughput capacity modes
DynamoDB offers two capacity modes to handle your application’s read and write throughput requirements. Each DynamoDB table’s throughput is configured independently, so you can have some tables in on-demand capacity mode and others in provisioned capacity mode. The following table shows a high-level comparison of the two modes across different features; the sections that follow expand on each in detail.
Key differences at a glance
| Feature | On-demand | Provisioned |
| --- | --- | --- |
| Capacity planning | Optional | Required |
| Cost | Based on the number of requests | Based on provisioned read and write capacity per hour, not utilization |
| Scale limits | Based on table and account quotas and maximum throughput (the default is 40,000 RCUs and 40,000 WCUs, but you can request a quota increase) | Based on provisioned capacity, table and account quotas, and maximum throughput |
| Cost optimization | Better for variable and unpredictable workloads | Better for steady and predictable workloads |
| Management overhead | Very low (fully managed) | Requires effort to identify the right provisioned settings, plus regular fine-tuning as the workload changes |
| Potential causes of throttling | Because it scales automatically, throttling errors are less likely | Capacity depends on the table’s provisioned throughput and auto-scaling configuration. Auto-scaling requires at least two consecutive minutes above the target utilization before it adds capacity, so throttling errors are more likely |
| Recommended use cases | Variable or unpredictable workloads, new applications, and serverless stacks | Steady, predictable workloads where you can maintain consistently high utilization |
On-demand capacity
On-demand capacity mode offers you a fully managed, serverless database experience that automatically scales in response to application traffic. It eliminates the need to predict capacity usage and pre-provision resources for day-to-day operations. DynamoDB automatically creates headroom (up to double the previous peak in traffic) for organically growing workloads, supporting up to millions of requests per second. You pay only for the actual reads and writes your application performs, with predictable billing that correlates directly with your usage, and you don’t have to worry about capacity planning.
On-demand mode provides additional capabilities. Configurable maximum throughput is an optional table-level setting that adds a layer of cost predictability and fine-grained control by letting you cap the read throughput, write throughput, or both for an on-demand table. Warm throughput, in turn, provides insight into the read and write operations your table can immediately support without throttling. Warm throughput values grow automatically as your usage increases, and you can also proactively set higher values through the pre-warming process.
For on-demand tables, warm throughput represents the minimum capacity your table is prepared to handle instantaneously. Pre-warming doesn’t provision capacity in advance but rather prepares the table’s internal partitioning structure to handle the expected load. This is an asynchronous, non-blocking operation that helps ensure your table can handle anticipated traffic surges without throttling.
New on-demand tables start with initial warm throughput values of 4,000 write requests and 12,000 read requests per second. Without pre-warming, exceeding these initial values may result in some throttling as DynamoDB gradually scales to accommodate the increased demand.
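A pre-warming request can be sketched as follows. The table name and throughput targets are illustrative assumptions, and the resulting dict would be passed to boto3’s `update_table` (the `WarmThroughput` parameter is assumed to be available in your SDK version):

```python
# Sketch: build an UpdateTable request that raises warm throughput ahead of
# an expected surge. Pass the dict to
# boto3.client("dynamodb").update_table(**params); the call is asynchronous
# and non-blocking, so issue it well before the anticipated traffic.

DEFAULT_WARM_READS = 12_000   # initial warm read throughput for new on-demand tables
DEFAULT_WARM_WRITES = 4_000   # initial warm write throughput

def build_prewarm_request(table_name: str, read_rps: int, write_rps: int) -> dict:
    """Build the UpdateTable parameters for pre-warming an on-demand table."""
    if read_rps < DEFAULT_WARM_READS or write_rps < DEFAULT_WARM_WRITES:
        raise ValueError("Warm throughput cannot be set below current values")
    return {
        "TableName": table_name,
        "WarmThroughput": {
            "ReadUnitsPerSecond": read_rps,
            "WriteUnitsPerSecond": write_rps,
        },
    }

# Hypothetical table preparing for a 50K read / 20K write per-second event.
params = build_prewarm_request("orders", read_rps=50_000, write_rps=20_000)
```

Because pre-warming only prepares the partition structure, the table continues to bill per request; the higher warm throughput itself carries no ongoing capacity charge the way provisioned throughput does.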
Provisioned capacity
In provisioned capacity mode, you must specify the number of reads and writes per second that you require for your application’s traffic. You are charged for the read and write capacity provisioned each hour, not for what was actually consumed. As a result, you pay for the total provisioned throughput even during periods of low or no activity. Provisioned capacity mode is ideal for workloads with steady, predictable, or cyclical traffic within a given hour or day.
With provisioned capacity, you can:
- Schedule scaling activities for known usage patterns
- Adjust capacity manually as needed
- Use auto-scaling to adjust capacity automatically
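Auto-scaling for a provisioned table is configured through Application Auto Scaling. The following is a minimal sketch of the two request shapes involved; the table name, capacity bounds, and 70% target are illustrative assumptions, and the dicts would be passed to `boto3.client("application-autoscaling")` via `register_scalable_target(**target)` and `put_scaling_policy(**policy)`:

```python
# Sketch: Application Auto Scaling configuration for a table's write capacity.
# Reads use the analogous dimension "dynamodb:table:ReadCapacityUnits" and
# metric "DynamoDBReadCapacityUtilization".

TABLE = "orders"  # hypothetical table name

# Registers the table's write capacity as a scalable target with a floor and ceiling.
target = {
    "ServiceNamespace": "dynamodb",
    "ResourceId": f"table/{TABLE}",
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "MinCapacity": 100,    # auto-scaling never goes below this
    "MaxCapacity": 4000,   # auto-scaling never goes above this
}

# Target-tracking policy: hold consumed/provisioned near the target utilization.
policy = {
    "PolicyName": f"{TABLE}-write-scaling",
    "ServiceNamespace": "dynamodb",
    "ResourceId": f"table/{TABLE}",
    "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,  # target utilization percent (20-90 allowed)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
        },
    },
}
```

Note that the `MaxCapacity` ceiling is a common source of unexpected throttling: if traffic outgrows it, auto-scaling stops adding capacity even though the policy is healthy.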
However, actual billing can vary with provisioned mode due to auto-scaling activities. When auto-scaling responds to traffic changes, your provisioned capacity (and thus costs) may remain elevated even after demand decreases. In contrast, on-demand mode provides more predictable billing as costs directly correlate with your actual request volume, without being affected by scaling behaviors.
With DynamoDB auto-scaling, you can configure utilization thresholds between 20% and 90% of your provisioned capacity. The service monitors your table’s consumption through HAQM CloudWatch metrics at one-minute intervals. When your consumed capacity exceeds the target utilization for two consecutive minutes, auto-scaling initiates a scale-up operation. There might be a short delay of up to a few minutes before triggering auto-scaling.
For scale-down operations, DynamoDB uses more conservative thresholds to protect your application’s performance. Utilization must drop at least 20% below the current target and stay there for 15 consecutive minutes. The CloudWatch alarms may take a few additional minutes to trigger the actual scaling.
Even with auto-scaling enabled, make sure to regularly monitor your capacity settings, auto-scaling configurations, and application requirements. This monitoring helps ensure your settings remain optimal for both performance and cost.
Managing throttling and traffic changes
The way your table responds to sudden traffic increases is a key factor in choosing between capacity modes. On-demand mode instantaneously accommodates requests to your table without the need to scale any resources up or down, making it ideal for rapid traffic changes. Provisioned mode, with auto-scaling, works well for traffic that increases gradually over several minutes.
While both capacity modes help manage throughput, they don’t prevent all types of throttling. Two important types of throttling can occur regardless of your capacity mode:
- Table-level throttling occurs when your application attempts to consume more capacity than your table can serve. For provisioned tables, this happens when you exceed your provisioned and burst capacity. To address this, you can increase the provisioned capacity, lower your auto-scaling target utilization, or modify your application to ramp traffic up gradually so that sufficient burst capacity remains available for spikes. For on-demand tables, table-level throttling is less common; you can use warm throughput to prepare for special events, such as a marketing campaign, that could drive large spikes in traffic. In both scenarios, it is important to validate your table and account limits.
- Hot-partition throttling happens when many requests target items with the same partition key, concentrating traffic on a few partitions. While DynamoDB can scale to support virtually unlimited throughput at the table level, individual partitions have throughput limits. These limits are adaptive based on your table’s overall throughput rather than fixed values.
When a partition receives a disproportionate amount of traffic compared to other partitions, requests may be throttled even if your table has unused capacity overall. This is known as a hot partition problem. Solving hot-partition throttling typically requires data modeling changes to better distribute your access patterns across multiple partitions.
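One common data modeling change for hot partitions is write sharding: spreading a single hot partition key across N suffixed keys so writes land on different partitions. The sketch below illustrates the pattern with an assumed key format and shard count; reads must then fan out across every shard key:

```python
# Sketch: write sharding for a hot partition key. The "#<n>" suffix format
# and shard count are illustrative choices, not a DynamoDB requirement.
import random

NUM_SHARDS = 10  # tune to the write rate a single key must absorb

def sharded_key(base_key: str) -> str:
    """Append a random shard suffix so writes spread across NUM_SHARDS partitions."""
    return f"{base_key}#{random.randrange(NUM_SHARDS)}"

def all_shard_keys(base_key: str) -> list:
    """Every shard key to query when reading the logical item back."""
    return [f"{base_key}#{i}" for i in range(NUM_SHARDS)]
```

The trade-off is read-side complexity: an aggregate for `base_key` now requires querying all `NUM_SHARDS` keys and merging the results in the application.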
Burst capacity and auto-scaling behavior (Provisioned mode only)
For tables in provisioned mode, DynamoDB provides burst capacity to handle brief traffic spikes. This burst capacity comes from unused throughput and helps prevent throttling during sudden, short-term increases in traffic. However, if high traffic persists, you might experience throttling after burst capacity is depleted and until auto-scaling adds more capacity.
Understanding utilization thresholds
For tables in provisioned mode, auto-scaling thresholds determine how much spare capacity your table maintains. A higher threshold (such as 80%) optimizes costs but provides less buffer for traffic spikes, while a lower threshold (such as 60%) costs more but offers better protection against sudden increases. When setting your threshold, consider how quickly your traffic typically increases, your application’s tolerance for throttling, and the balance between cost and performance requirements.
The following graph shows how different thresholds affect capacity allocation. The green line represents your actual WCU consumption. The orange line shows the allocated provisioned capacity at an 80% target utilization; notice how closely it follows consumption, offering cost efficiency but a higher throttling risk. At a 70% target utilization, the provisioned capacity line provides more buffer for sudden traffic increases. The most conservative approach shown is a 60% target utilization, offering additional protection against traffic spikes at the cost of unused capacity.
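The headroom each target leaves can be computed directly: auto-scaling aims to hold provisioned capacity at consumed divided by target. A short sketch, using an assumed steady-state consumption of 8,000 WCUs:

```python
# Sketch: provisioned capacity and headroom implied by each target utilization.
# provisioned = consumed / target, so lower targets provision more buffer.

def provisioned_for(consumed_wcu: float, target_utilization: float) -> float:
    """Capacity auto-scaling maintains so that consumed/provisioned == target."""
    return consumed_wcu / target_utilization

consumed = 8_000  # assumed steady-state WCU consumption
for target in (0.80, 0.70, 0.60):
    prov = provisioned_for(consumed, target)
    headroom = prov - consumed
    print(f"target {target:.0%}: provision {prov:,.0f} WCU, headroom {headroom:,.0f} WCU")
```

At 80% the table carries 2,000 WCUs of headroom; dropping the target to 60% raises that to roughly 5,333 WCUs, which is the unused capacity you pay for in exchange for spike protection.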
With on-demand mode, you pay only for the throughput you use, eliminating the need to track utilization.
Workload patterns and traffic changes
Understanding your workload characteristics is crucial for choosing between capacity modes. Two key factors to consider are your workload’s predictability and how it handles changes in application traffic.
Workload predictability affects which mode will serve you better. Provisioned capacity works best for workloads where throughput changes follow gradual, smooth patterns. Examples include enterprise applications with daily business-hour transitions, known batch processes that scale up and down on regular schedules, or systems that experience steady growth over time. Even if absolute volumes vary day-to-day, provisioned capacity remains effective as long as the changes follow expected patterns and give you time to gradually adjust capacity to adapt to the expected patterns.
For most modern applications, especially those with variable traffic patterns, on-demand capacity proves more effective. Ecommerce, media and entertainment, gaming, and Internet of Things (IoT) applications benefit from its ability to handle dynamic workload requirements without capacity planning. This becomes particularly important when traffic varies by the second or minute.
To illustrate these differences, consider a scenario where your application will experience a sudden increase in traffic as a result of a marketing event. This will be the highest traffic your table has ever experienced, and you are uncertain when exactly it will happen. Let’s examine how each mode handles this.
Both capacity modes can benefit from warm throughput to prepare resources in advance for the new volume. However, the operational requirements differ significantly between modes:
With on-demand mode, once your table is pre-warmed, it can instantly support the pre-warmed value without any user intervention, and you pay only for the actual read and write requests made.
Using provisioned capacity, even with warm throughput, you would still need to:
- Update the current auto-scaling settings and set a new maximum value that matches the expected load
- Lower the target utilization (perhaps to 30–40%) to maintain a substantial throughput buffer, because the table relies on burst capacity during the scaling period
- Monitor auto-scaling effectiveness
- Potentially accept throttling risks if burst capacity depletes before auto-scaling completes
The uncertainty of when the traffic increase will occur creates a challenging trade-off with provisioned capacity: setting a low target utilization means paying for unused capacity until the event happens, while setting it higher to control costs increases your risk of throttling when the traffic surge arrives.
This fundamental difference in handling traffic changes makes on-demand mode particularly valuable for applications where consistent performance is critical.
Cost and operational value
Cost considerations should factor in both absolute costs and operational overhead when choosing between on-demand and provisioned capacity modes. While on-demand mode has a fixed per-request price, its overall cost-effectiveness depends on your usage patterns. For workloads with significant traffic variations, as indicated in the next graph, on-demand can be more cost-effective as it automatically scales without the need for over-provisioning.
Provisioned capacity’s effective per-hour cost varies based on configured target utilization. At lower utilization rates, you may end up paying more per request than with on-demand mode. For example, if you need to maintain high capacity to handle periodic spikes, you’ll be paying for that capacity even during low-traffic periods that might make up the majority of your operating time. This over-provisioning can significantly increase your costs.
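The trade-off can be sketched with a rough break-even calculation. The prices below are placeholder assumptions, not current DynamoDB pricing; substitute the rates for your Region and table class:

```python
# Sketch: rough monthly cost comparison between modes for write traffic.
# Both price constants are hypothetical placeholders for illustration only.

PRICE_PER_MILLION_WRITE_REQUESTS = 1.25   # on-demand (assumed)
PRICE_PER_WCU_HOUR = 0.00065              # provisioned (assumed)

def monthly_on_demand_cost(writes_per_second: float, hours: int = 730) -> float:
    """On-demand bills per request actually made."""
    total_writes = writes_per_second * 3600 * hours
    return total_writes / 1e6 * PRICE_PER_MILLION_WRITE_REQUESTS

def monthly_provisioned_cost(provisioned_wcu: float, hours: int = 730) -> float:
    """Provisioned bills per WCU-hour provisioned, regardless of consumption."""
    return provisioned_wcu * PRICE_PER_WCU_HOUR * hours

# A steady 1,000 writes/sec held at a 70% target utilization:
steady_provisioned = monthly_provisioned_cost(1_000 / 0.70)
same_load_on_demand = monthly_on_demand_cost(1_000)
```

Under these assumed rates, the steady fully utilized workload is cheaper in provisioned mode; the picture reverses as the gap between peak and average traffic widens, because provisioned capacity must be sized for the peak.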
Additionally, provisioned capacity requires managing several indirect costs: paying for provisioned capacity whether used or not, time spent monitoring utilization, overhead of adjusting capacity levels to match demand, and the risk of over-provisioning to handle traffic spikes. On-demand mode eliminates this management overhead since there’s no capacity to monitor or adjust.
However, for predictable, high-utilization workloads as indicated in the following graph, provisioned capacity can be more cost-effective if managed efficiently. The key is to carefully analyze your usage patterns and operational needs when making the choice between these modes. You can use reserved provisioned capacity for workloads with highly predictable patterns and well-understood scaling needs, where you can save up to 77% over standard provisioned capacity rates.
The on-demand pay-per-use model can provide direct cost savings in several common scenarios:
- Applications with infrequent traffic spikes that would otherwise require constant over-provisioning
- Workloads with long periods of low activity between busy periods
- Systems with unpredictable usage patterns
Beyond direct costs, on-demand mode’s value proposition centers on operational simplicity and business agility in two key areas:
- Resource management: there is no need to invest in continuous monitoring and capacity planning, or to spend engineering time fine-tuning auto-scaling settings
- Risk mitigation: there is no need to over-provision as a risk mitigation strategy, or to absorb the business impact of potential throttling during traffic spikes
This operational simplicity aligns particularly well with modern applications where development speed and consistent performance directly impact business success.
Consider provisioned capacity only when you have predictable workloads with minimal variation and can maintain high utilization of your provisioned capacity consistently. Otherwise, the simplified operations and elimination of capacity management when using on-demand mode typically provide better overall value.
Monitoring overhead
Managing provisioned capacity requires ongoing attention from your engineering team. You need to monitor CloudWatch metrics regularly to understand usage patterns (consumed WCUs over provisioned WCUs for write utilization, and consumed RCUs over provisioned RCUs for read utilization), identify potential throttling issues (read and write throttle events), and fine-tune your auto-scaling configuration based on your application’s needs. This ongoing maintenance isn’t just about watching dashboards; it requires understanding complex usage patterns and making informed decisions about capacity adjustments.
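The utilization math behind those consumed-versus-provisioned metrics is simple enough to sketch. CloudWatch reports `ConsumedWriteCapacityUnits` as a `Sum` over each period, so dividing by the period length yields the average per-second rate; the sample numbers below are illustrative:

```python
# Sketch: derive write utilization from a CloudWatch datapoint.
# ConsumedWriteCapacityUnits arrives as a Sum per period; divide by the
# period length to get average WCU/sec, then compare with provisioned WCUs.

def write_utilization(consumed_sum: float, period_seconds: int,
                      provisioned_wcu: float) -> float:
    """Fraction of provisioned write capacity actually consumed."""
    avg_wcu_per_sec = consumed_sum / period_seconds
    return avg_wcu_per_sec / provisioned_wcu

# One 5-minute datapoint: 150,000 consumed units against 1,000 provisioned WCUs.
util = write_utilization(150_000, period_seconds=300, provisioned_wcu=1_000)
# 150,000 / 300 = 500 WCU/sec against 1,000 provisioned -> 50% utilization
```

The same calculation with `ConsumedReadCapacityUnits` and provisioned RCUs gives read utilization; sustained values near your auto-scaling target are the signal that settings need review.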
On-demand capacity eliminates most of this operational overhead, allowing you to monitor consumption through consumed WCU and consumed RCU metrics. However, both capacity modes require some preparation for significant events such as seasonal sales or marketing campaigns. For these high-traffic events, configure warm throughput to make sure that your table can handle the expected request volume. However, the preparation process differs significantly between modes.
Preparing for high-traffic events with provisioned capacity involves comprehensive planning: estimating and adjusting base capacity, configuring auto-scaling parameters, maintaining adequate buffer capacity, and planning for both scale-up and scale-down phases. You’ll need to monitor and adjust these settings before, during, and after the event to ensure optimal performance.
With on-demand capacity, preparation focuses on setting appropriate warm throughput levels and maximum throughput limits. This streamlined approach lets your team focus on your event’s success rather than capacity management, while still maintaining control over costs and downstream impact.
Auto-scaling fine tuning
If you’re using provisioned capacity, you must understand how auto-scaling timing works in practice. While on-demand tables adapt immediately to traffic changes, provisioned tables have specific scaling rules. For scaling up, DynamoDB requires two consecutive minutes of increased utilization over the current target before initiating a scale up operation. During this period, your table needs sufficient buffer capacity (burst capacity included) to handle the increased traffic.
Scaling down follows a more conservative pattern. DynamoDB waits 15 minutes after traffic drops to ensure it’s not just a temporary decrease. After confirming the lower traffic pattern, your table can scale down up to four times in the first hour, and once per hour thereafter. These timing constraints help prevent excessive scaling operations while maintaining stable performance.
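The two-consecutive-minute scale-up rule can be sketched as a small simulation over per-minute utilization samples. This mirrors the timing described above in simplified form; it is not the service’s actual algorithm:

```python
# Sketch: when would a scale-up be initiated, given per-minute utilization
# samples and a target? Scale-up requires two consecutive minutes above target.

def first_scale_up_minute(utilization_by_minute, target):
    """Return the index of the second of two consecutive minutes above
    target (when a scale-up would be initiated), or None if never."""
    consecutive = 0
    for minute, util in enumerate(utilization_by_minute):
        consecutive = consecutive + 1 if util > target else 0
        if consecutive >= 2:
            return minute
    return None

# Traffic breaches a 70% target at minutes 3 and 4 -> scale-up initiates at minute 4.
trigger = first_scale_up_minute([0.50, 0.60, 0.65, 0.80, 0.85], target=0.70)
```

Note what the simulation implies: a one-minute spike above target never triggers scaling at all, which is exactly the window where the table must absorb traffic from burst capacity alone.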
Setting appropriate utilization thresholds requires balancing between having enough buffer capacity for traffic increases and optimizing capacity usage during normal operations. This auto-scaling approach works well for applications with predictable, stable workloads where traffic patterns are well understood. For applications with variable workloads, you might need to maintain higher buffer capacity or accept occasional throttling during sudden traffic increases.
Conclusion
Both DynamoDB capacity modes offer the same performance characteristics and enterprise-ready features. However, on-demand capacity mode is the recommended choice for most modern applications due to its ability to handle unexpected traffic patterns, eliminate capacity planning overhead, and reduce operational complexity.
Consider on-demand capacity mode when you:
- Need immediate response to traffic changes
- Want to minimize operational overhead
- Have variable or unpredictable workloads
- Value development agility and operational simplicity
Provisioned capacity mode remains a viable option for specific use cases when you:
- Have predictable, steady-state workloads with consistent utilization above 50%.
- Can leverage reserved capacity purchases, scheduled scaling, and fine-tuned auto-scaling.
- Have the operational capacity to monitor and optimize capacity regularly.
To help you visualize the decision-making process, here’s a decision tree summarizing when to choose on-demand vs. provisioned capacity mode:
While on-demand mode offers operational simplicity for most modern applications, provisioned mode can still provide significant cost savings when you have tolerance for the additional overhead. However, carefully evaluate whether these potential savings outweigh the operational complexities of capacity management.
Remember, you can always start with on-demand mode to understand your usage patterns and switch to provisioned mode later if your workload characteristics justify the change.
In our next post, we’ll explore specific use cases and scenarios to help you make an informed decision about capacity modes for your particular needs.
About the author
Esteban Serna, Principal DynamoDB Specialist Solutions Architect, is a database enthusiast with 15 years of experience. From deploying contact center infrastructure to falling in love with NoSQL, Esteban’s journey led him to specialize in distributed computing. Today, he helps customers design massive-scale applications with single-digit millisecond latency using DynamoDB. Passionate about his work, Esteban loves nothing more than sharing his knowledge with others.