AWS Database Blog

Scale your connections with Amazon DocumentDB using mongobetween

Amazon DocumentDB (with MongoDB compatibility) is a fully managed native JSON document database that makes it easy and cost-effective to operate critical document workloads at virtually any scale without managing infrastructure. You can use the same application code, written with drivers and tools compatible with the MongoDB API (versions 3.6, 4.0, and 5.0), to run, manage, and scale workloads on Amazon DocumentDB. As a document database, Amazon DocumentDB makes it straightforward to store, query, and index JSON data.

Modern applications built for serverless deployments using AWS Lambda or AWS Fargate, or for containerized deployments using Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS), are built to scale on demand. During scale-up events, these applications may try to open a large number of connections to your Amazon DocumentDB cluster. During spiky workload periods, they also open and close database connections at a high rate.

Each open connection consumes memory and CPU resources on the Amazon DocumentDB instance. Each instance has a connection limit that scales with instance size, and the primary and each replica enforce their limits individually. It can be challenging to ensure that application scale-up events don't breach these limits. After an instance reaches its connection limit, Amazon DocumentDB rejects further connection attempts to it, and the application encounters connection exceptions. Frequently opening and closing connections over a sustained spiky period also puts pressure on instance CPU and memory, which shows up as fluctuating performance and latency from the Amazon DocumentDB cluster.

mongobetween, a lightweight MongoDB connection pooler written in Go, can help you better manage and stabilize connections to Amazon DocumentDB for such workloads. Its primary function is to accept a large number of incoming connections and multiplex them across a smaller connection pool to one or more Amazon DocumentDB clusters.
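To make the multiplexing idea concrete, the following Python sketch models a pooler as a semaphore guarding a fixed number of upstream connections. It is a toy model, not mongobetween's implementation (mongobetween is written in Go and relies on the MongoDB Go driver's pooling); `MultiplexingPoolModel`, `POOL_SIZE`, and `CLIENTS` are hypothetical names and values chosen for illustration.

```python
import asyncio

# Hypothetical sizes for illustration only; they are not mongobetween defaults.
POOL_SIZE = 10   # upstream connections the pooler keeps to the database
CLIENTS = 500    # concurrent requests arriving over application connections


class MultiplexingPoolModel:
    """Toy model of a pooler: many callers share at most `pool_size` upstream connections."""

    def __init__(self, pool_size: int) -> None:
        self._free_slots = asyncio.Semaphore(pool_size)
        self.in_use = 0        # upstream connections busy right now
        self.peak_in_use = 0   # highest value of in_use observed
        self.served = 0        # requests completed

    async def execute(self, op_seconds: float) -> None:
        # Callers beyond pool_size wait here instead of opening a new upstream connection.
        async with self._free_slots:
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
            await asyncio.sleep(op_seconds)  # stand-in for a database round trip
            self.in_use -= 1
            self.served += 1


async def simulate(clients: int = CLIENTS, pool_size: int = POOL_SIZE) -> MultiplexingPoolModel:
    pool = MultiplexingPoolModel(pool_size)
    await asyncio.gather(*(pool.execute(0.001) for _ in range(clients)))
    return pool


if __name__ == "__main__":
    result = asyncio.run(simulate())
    print(f"requests served: {result.served}, peak upstream connections: {result.peak_in_use}")
```

Running it serves all 500 simulated requests while never using more than 10 upstream connections at once. Requests beyond the pool size wait instead of opening new connections, which is the trade-off discussed under Latency considerations.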

In this post, we discuss how to configure mongobetween to scale connections beyond the connection limit of an Amazon DocumentDB instance.

Solution overview

In this post, we run mongobetween as a connection pooler on an Amazon Elastic Compute Cloud (Amazon EC2) instance. mongobetween is configured with a fixed connection pool size to limit the number of connections to Amazon DocumentDB. When an application needs more connections than the Amazon DocumentDB instance connection limits allow, it connects to mongobetween instead of directly to Amazon DocumentDB. mongobetween acts as a connection multiplexer, handling the many incoming connections from the applications and efficiently managing the smaller pool of connections to Amazon DocumentDB.
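The difference between the two connection paths shows up in the connection strings. The following sketch builds a direct Amazon DocumentDB URI with options commonly used for DocumentDB (TLS, replica set `rs0`, `secondaryPreferred` reads, `retryWrites=false`), plus the standard `maxPoolSize` option that caps a driver's pool per server. The listener argument is modeled on the `address=uri` form in the mongobetween README. Treat that form, the `:27016` port, the `global-bundle.pem` CA file name, and all hostnames and credentials as assumptions to check against your mongobetween version and the sample code.

```python
from typing import Optional
from urllib.parse import quote_plus, urlencode


def documentdb_uri(user: str, password: str, endpoint: str, port: int = 27017,
                   ca_file: str = "global-bundle.pem",
                   max_pool_size: Optional[int] = None) -> str:
    """Build a direct connection string to an Amazon DocumentDB cluster endpoint."""
    options = {
        "tls": "true",
        "tlsCAFile": ca_file,                     # assumed CA bundle file name
        "replicaSet": "rs0",
        "readPreference": "secondaryPreferred",
        "retryWrites": "false",                   # DocumentDB doesn't support retryable writes
    }
    if max_pool_size is not None:
        options["maxPoolSize"] = str(max_pool_size)  # caps pooled connections per server
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"  # escape characters such as '@'
    return f"mongodb://{credentials}@{endpoint}:{port}/?{urlencode(options)}"


def mongobetween_listener_arg(listen_address: str, upstream_uri: str) -> str:
    """Format an `address=uri` argument mapping a local listener to an upstream cluster (assumed form)."""
    return f"{listen_address}={upstream_uri}"


def application_uri(proxy_host: str = "127.0.0.1", proxy_port: int = 27016) -> str:
    """Connection string the application uses to reach mongobetween (hypothetical host and port)."""
    return f"mongodb://{proxy_host}:{proxy_port}/"


if __name__ == "__main__":
    upstream = documentdb_uri("appuser", "p@ssw0rd", "docdb-cluster.example.com", max_pool_size=100)
    print(mongobetween_listener_arg(":27016", upstream))
    print(application_uri())
```

In this sketch the application URI carries no credentials, on the assumption that mongobetween holds the upstream credentials and pool size; confirm how your deployment handles authentication before relying on that.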

The following diagram illustrates the architecture for this setup.

Prerequisites

Refer to the Prerequisites section in the sample code repository and complete the steps in the Setup mongobetween in the Amazon EC2 Instance section.

Create the test environment

To create the test environment, follow the Create the test environment section in the sample code.

Run the sample application

We discuss two methods to connect to Amazon DocumentDB: directly and using mongobetween.

Connect to Amazon DocumentDB directly

In this approach, as we increase the number of processes in the Python script, the number of connections to the Amazon DocumentDB instances keeps increasing and eventually reaches the maximum number of connections allowed for the instance type. Therefore, the application has limited scalability. To simulate the scenario where the application opens connections within the connection limits of the Amazon DocumentDB instances, refer to the section Run Python script with direct connection to DocumentDB cluster with 200 concurrent processes in the sample code. To simulate the scenario where Amazon DocumentDB starts rejecting connection attempts beyond its limits, refer to the section Run a Python script with a direct connection to a DocumentDB cluster with 900 concurrent processes in the sample code.
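The following toy model reproduces the shape of this experiment without a cluster. Each simulated process opens its own connections against an instance that refuses connections past a fixed limit. The limit of 500 and the two connections per process are hypothetical values chosen only to illustrate the effect. Real drivers can also open extra monitoring connections, and real limits depend on the instance class.

```python
class DirectInstanceModel:
    """Toy stand-in for one DocumentDB instance with a hypothetical connection limit."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self) -> None:
        # Past the limit, the instance refuses new connections outright.
        if self.open_connections >= self.max_connections:
            raise ConnectionRefusedError("connection limit reached")
        self.open_connections += 1


def run_direct(processes: int, conns_per_process: int, instance: DirectInstanceModel) -> int:
    """Every process opens its own connections straight to the instance; return how many were refused."""
    refused = 0
    for _ in range(processes):
        for _ in range(conns_per_process):
            try:
                instance.connect()
            except ConnectionRefusedError:
                refused += 1
    return refused


if __name__ == "__main__":
    LIMIT = 500  # hypothetical; look up the real limit for your instance class
    for procs in (200, 900):
        print(procs, "processes ->", run_direct(procs, 2, DirectInstanceModel(LIMIT)), "refused")
```

With these assumed numbers, 200 processes (400 connections) fit under the limit. 900 processes make 1,800 connection attempts, and 1,300 of them are refused.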

Connect to Amazon DocumentDB using mongobetween

In this approach, as we increase the number of processes in the Python script, the number of open connections to the Amazon DocumentDB instances stays constant because the mongobetween proxy controls it. Therefore, the application can keep scaling its connections to the proxy, while mongobetween distributes a fixed set of open connections across incoming requests. To simulate the scenario where the application connects to the mongobetween proxy instead of directly to Amazon DocumentDB, refer to the section Run test script with mongobetween connection pooling in the sample code.
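For comparison, the sketch below models the proxied path with the same kind of toy instance. The pooler opens `pool_size` upstream connections once and reuses them for every request. The round-robin reuse and the sizes (a limit of 500 and a pool of 100) are simplifying assumptions. mongobetween checks connections out of the Go driver's pool rather than rotating through a list, and this model runs requests sequentially.

```python
class PooledInstanceModel:
    """Toy stand-in for one DocumentDB instance with a hypothetical connection limit."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self) -> int:
        if self.open_connections >= self.max_connections:
            raise ConnectionRefusedError("connection limit reached")
        self.open_connections += 1
        return self.open_connections  # use the running count as a connection id


class PoolerModel:
    """Toy pooler: opens `pool_size` upstream connections once, then reuses them for every request."""

    def __init__(self, instance: PooledInstanceModel, pool_size: int) -> None:
        self.upstream = [instance.connect() for _ in range(pool_size)]
        self._next = 0
        self.requests_served = 0

    def handle(self, request: object) -> int:
        # Round-robin reuse is a simplification of a real pool's checkout logic.
        conn_id = self.upstream[self._next % len(self.upstream)]
        self._next += 1
        self.requests_served += 1
        return conn_id


def run_via_pooler(processes: int, requests_per_process: int,
                   instance: PooledInstanceModel, pool_size: int) -> tuple:
    """Send every process's requests through one pooler; return (requests served, instance connections)."""
    pooler = PoolerModel(instance, pool_size)
    for process in range(processes):
        for request in range(requests_per_process):
            pooler.handle((process, request))
    return pooler.requests_served, instance.open_connections


if __name__ == "__main__":
    for procs in (200, 900):
        served, upstream = run_via_pooler(procs, 2, PooledInstanceModel(500), pool_size=100)
        print(f"{procs} processes -> {served} requests served over {upstream} connections")
```

Whether 200 or 900 processes send requests, the instance sees only the 100 pooled connections, and no connection is refused.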

High availability deployment options

The previous section demonstrates how a single mongobetween process works, multiplexing incoming connections from our application to Amazon DocumentDB. However, in production environments you need high availability with no single point of failure, and multiple mongobetween instances running to meet the scale of your workload.

In this section, we discuss two common deployment approaches to make mongobetween highly available.

Sidecar deployment approach

You can run mongobetween as a sidecar to your containerized application. Compared to the approach we discuss next, this adds the least latency between your application code and mongobetween, and it doesn't need any complex networking setup between the two. However, each time the application scales out, the new replica's mongobetween sidecar opens another set of connections to Amazon DocumentDB. Therefore, you must make sure that the combined outbound connections from all sidecars don't exceed the Amazon DocumentDB connection limits during a scaling event. In this deployment mode, this may limit the scaling capacity of your application. The following diagram illustrates this workflow.
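Because every sidecar keeps its own upstream pool, the worst-case connection count on an instance grows linearly with the number of application replicas. The following arithmetic sketch estimates how far the application can scale before breaching the limit. It uses hypothetical numbers: a 500-connection limit, a per-sidecar pool of 20, and an 80 percent budget that leaves headroom for other clients. It also assumes each sidecar's pool can fill up against the same instance, as happens on the primary when all writes go there.

```python
def sidecar_connections_per_instance(app_replicas: int, sidecar_pool_size: int) -> int:
    """Worst case: every replica's sidecar fills its own pool against the same instance."""
    return app_replicas * sidecar_pool_size


def max_app_replicas(instance_limit: int, sidecar_pool_size: int, budget_percent: int = 80) -> int:
    """Largest replica count whose sidecars stay inside a share of the instance's connection limit.

    budget_percent is an assumed safety margin that leaves room for other clients,
    such as admin tools, monitoring, and driver monitoring connections.
    """
    budget = instance_limit * budget_percent // 100
    return budget // sidecar_pool_size


if __name__ == "__main__":
    LIMIT = 500  # hypothetical per-instance connection limit
    POOL = 20    # hypothetical maxPoolSize per sidecar
    print("max replicas within budget:", max_app_replicas(LIMIT, POOL))
    print("connections with 30 replicas:", sidecar_connections_per_instance(30, POOL))
```

Under these assumptions, 20 replicas use the 400-connection budget exactly, while 30 replicas could open 600 connections, more than the instance allows.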

Service-based deployment approach

The other method is to run mongobetween containers as a standalone service. Application containers connect to the mongobetween service, which routes each connection to one of the mongobetween pods; those pods hold the connections to Amazon DocumentDB. With this approach, the application can scale independently of mongobetween and of the Amazon DocumentDB connection limits. The following diagram illustrates this workflow.
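In the service-based layout, the number of connections that reach Amazon DocumentDB depends on the mongobetween pods, not on the application replicas. The sketch below makes that explicit. The three pods, 100 upstream connections per pod, and 10 connections per application replica are hypothetical sizing values.

```python
def service_mode_connections(app_replicas: int, conns_per_replica: int,
                             proxy_pods: int, proxy_pool_size: int) -> tuple:
    """Return (inbound, outbound) connection counts for a standalone mongobetween service.

    inbound: connections that terminate at the mongobetween service.
    outbound: worst-case connections the pods open to one DocumentDB instance.
    """
    inbound = app_replicas * conns_per_replica
    outbound = proxy_pods * proxy_pool_size
    return inbound, outbound


if __name__ == "__main__":
    # Hypothetical sizing: 3 mongobetween pods for availability, 100 upstream connections each.
    for replicas in (10, 100, 1000):
        inbound, outbound = service_mode_connections(replicas, 10, 3, 100)
        print(f"{replicas:>5} app replicas: {inbound:>6} inbound, {outbound} outbound")
```

As the application grows from 10 to 1,000 replicas, inbound connections to the service grow from 100 to 10,000, but the instance still sees at most 300 connections. The value to keep within the Amazon DocumentDB limit is therefore the number of mongobetween pods multiplied by each pod's pool size.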

Latency considerations

mongobetween acts only as a proxy: it accepts incoming connections without rejecting them when a limit is exceeded. However, it can only process as many concurrent requests from the application as it has outgoing connections to Amazon DocumentDB; the remaining requests wait in a queue for a connection to be freed up. Therefore, design the application to tolerate more latency than it would have when connecting directly to Amazon DocumentDB, and adjust connection timeouts accordingly. Exception handling with retries and exponential backoff is a good approach in the application design.
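A minimal retry wrapper with capped exponential backoff and full jitter might look like the following. `TransientDatabaseError` is a hypothetical exception. In a real application you would catch your driver's transient errors instead, such as PyMongo's `AutoReconnect` or `ServerSelectionTimeoutError`. Because Amazon DocumentDB doesn't support retryable writes, retry only operations that are safe to repeat. The base delay, cap, and attempt count are assumed values to tune alongside your driver timeouts.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientDatabaseError(Exception):
    """Hypothetical wrapper for retryable errors, such as timeouts waiting for a pooled connection."""


def call_with_backoff(operation: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 0.1, max_delay: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying transient failures with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The cap doubles each attempt (0.1 s, 0.2 s, 0.4 s, ...) up to max_delay;
            # sleeping a random time below the cap spreads retries from many clients apart.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_query() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientDatabaseError("timed out waiting for a pooled connection")
        return "ok"

    print(call_with_backoff(flaky_query), "after", calls["n"], "attempts")
```

The `sleep` parameter lets tests replace real waits. Randomizing each delay keeps many clients that time out together from retrying in lockstep against the proxy.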

Conclusion

In this post, we showed how you can configure mongobetween to scale connections beyond the connection limit of an Amazon DocumentDB instance. mongobetween acts as a connection multiplexer, handling the many incoming connections from the applications and efficiently managing the smaller pool of connections to Amazon DocumentDB. We also discussed high availability deployment options for mongobetween in a production environment.

If you have any feedback or questions, leave them in the comments section.


About the authors

Sourav Biswas is a Senior Amazon DocumentDB Specialist Solutions Architect at AWS. He has been helping Amazon DocumentDB customers successfully adopt the service and implement best practices around it. Before joining AWS, he worked extensively as an application developer and solutions architect for various NoSQL vendors.

Anshu Vajpayee is a Senior Amazon DocumentDB Specialist Solutions Architect at AWS. He has been helping customers adopt NoSQL databases and modernize applications using Amazon DocumentDB. Before joining AWS, he worked extensively with relational and NoSQL databases.