AWS Partner Network (APN) Blog

Improving MongoDB Atlas Search Elasticity with Amazon S3

By Seth Payne, Lead Product Manager at MongoDB
By Daniel Ernst, Search Engineering Lead at MongoDB
By Arjun Gurumurthy, Staff Engineer at MongoDB
By Archana Srinivasan, Sr. Technical Account Manager at AWS
By Rashmikant Vyas (RK), Sr. Technical Account Manager at AWS
By Sergio Ariel de la Campa, Sr. Technical Account Manager at AWS


MongoDB customers were seeking a way to handle Atlas Search index rebuilds triggered by problems with the underlying hardware or by scaling activities. This blog post discusses how MongoDB has enhanced the indexing capabilities of its Atlas Search service by leveraging Amazon S3. The approach reduces the time required to rebuild search indexes, achieving a 14x improvement over the previous method.

Background

MongoDB Atlas, a comprehensive cloud database platform, was launched on Amazon Web Services (AWS) in 2016. Following global expansion, the service is now available in 34 AWS Regions worldwide. With AWS, MongoDB Atlas offers a user-friendly, scalable, and secure cloud platform for deploying, managing, and monitoring MongoDB databases.

One of the key features of MongoDB Atlas is its integrated search and vector search capabilities. This eliminates the need to run a separate search system alongside your database. Initially, MongoDB ran the search and database processes on the same hardware, causing contention and scaling limitations. In 2023, MongoDB introduced dedicated search nodes on Amazon EC2 Graviton to address this. The remaining challenge is that MongoDB Atlas Search deployments require a complete index rebuild for each search process, causing operational overhead and longer deployment times.

Challenges with legacy approach

MongoDB Atlas (Full-Text) Search launched 5 years ago, with Atlas Vector Search added within the last couple of years. Both have become popular among developers because of the native experience within MongoDB and the evolving query capabilities and enhancements. They let MongoDB Atlas users leverage full-text and vector search within their transactional database systems, eliminating the need for complex ETL to keep search and transaction data in sync. As a result, MongoDB now supports complex search workloads with high demands for queries per second (QPS), data ingestion, and low latency.

Atlas Search and Vector Search are based on Apache Lucene. The initial architecture colocated the database process (mongod) and the search process (mongot) on the same hardware, which could create workload contention and scaling challenges. While this colocated approach can still be used, MongoDB released dedicated search nodes in 2023. The new architecture allows users to isolate their search (or vector search) processes from their database processes and scale them independently.

With the dedicated search node approach, the search process (mongot) runs on its own hardware on Amazon EC2 Graviton and communicates search results to the database process (mongod) over the network. This eliminates resource conflicts and allows efficient scaling for large-scale search applications handling high query loads, data ingestion, and storage. Dedicated search nodes are the optimal approach for any team running search or vector search processes in production.

When deploying or scaling search infrastructure, MongoDB Atlas clusters rebuild the indexes for each search process from the source MongoDB collection. Creating search indexes this way is predictable and reliable, but it means reading the entire source collection, which can take a long time. As a result, bringing up new search infrastructure is slow and adds read load to the MongoDB Atlas cluster.

Solution overview

This blog post details new Atlas Search indexing features that leverage Amazon Simple Storage Service (Amazon S3). Atlas Search in this context refers to both the full-text and vector search capabilities. MongoDB has improved the indexing experience for users with large datasets, using Amazon S3 to speed up index rebuilding during deployments, scaling, and recovery. The system periodically backs up its search indexes to Amazon S3 instead of always rebuilding them from the MongoDB source. Rebuilding an index on a fresh instance starts from the most recent Atlas Search file snapshot and then processes only the changes made since that snapshot was captured. This speeds up deployment and scaling while improving search availability, especially for large datasets. These benefits require no changes to end users’ current configuration.
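The snapshot-then-catch-up idea can be illustrated with a minimal sketch. It assumes a change log with monotonically increasing positions; the `Change` type, its `position` field, and the `restore` function are hypothetical names for illustration and do not describe mongot internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    position: int          # hypothetical, monotonically increasing log position
    doc_id: str
    text: Optional[str]    # None marks a deletion


def restore(snapshot_docs, snapshot_position, change_log):
    """Start from the snapshot and apply only the changes made after it."""
    index = dict(snapshot_docs)
    applied = 0
    for change in sorted(change_log, key=lambda c: c.position):
        if change.position <= snapshot_position:
            continue  # already reflected in the snapshot
        if change.text is None:
            index.pop(change.doc_id, None)
        else:
            index[change.doc_id] = change.text
        applied += 1
    return index, applied


snapshot = {"a": "red apple", "b": "blue bird"}
log = [
    Change(1, "a", "red apple"),
    Change(2, "b", "blue bird"),
    Change(3, "c", "green cat"),
    Change(4, "a", None),
]
index, applied = restore(snapshot, snapshot_position=2, change_log=log)
```

In this toy run only the two changes after position 2 are replayed, rather than all four, which is where the time saving comes from on a large collection.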

MongoDB Atlas Search, launched in 2019, and its recent Atlas Vector Search feature provide developers with native search capabilities within MongoDB. This eliminates the need for complex ETL processes to synchronize search and transactional data. The search solutions initially ran on the same hardware as the database process, causing resource contention and scaling limitations. MongoDB addressed these challenges in 2023 by introducing dedicated search nodes on Amazon EC2 Graviton hardware. However, a significant challenge remains: when deploying or scaling search infrastructure, MongoDB Atlas must rebuild indices for each search process by performing a complete read of the source collection. This creates operational burden and extends deployment times.

Architecture overview

The MongoDB Atlas Control Plane comprises services that handle search index management commands, update node configuration, retrieve index definitions, and provision mongot and mongod nodes. Atlas Search indexes are created by specifying a mongod collection as the data source, along with an index configuration. Whenever a customer creates or updates a search index, each mongot retrieves the index definition from MongoDB Atlas. It then builds the index and dynamically updates it to reflect changes in the collection’s documents.

These search indexes are built on Lucene and consist of a set of immutable files on disk. MongoDB Atlas Search takes an index snapshot, which is essentially the set of files corresponding to a valid, queryable Lucene index. The architecture shown in Figure 1 allows search indexes to be uploaded to Amazon S3 periodically and efficiently, and retrieved when an index rebuild is required. Index files that remain unchanged between snapshots don’t need to be re-uploaded, which optimizes storage and transfer.
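Because the index files are immutable, deciding what to upload can be reduced to a comparison of two file listings. The following is a minimal sketch of that comparison, assuming each snapshot is described by a mapping from file name to content digest; the function names and the sample file names are illustrative, not taken from the Atlas Search implementation.

```python
import hashlib


def fingerprint(files):
    """Map each file name to a SHA-256 digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def plan_upload(previous, current):
    """Return (names to upload, names no longer referenced by the new snapshot)."""
    to_upload = sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )
    unreferenced = sorted(name for name in previous if name not in current)
    return to_upload, unreferenced


# Illustrative file names only; contents are placeholders.
old = fingerprint({"_0.cfs": b"segment-0", "_1.cfs": b"segment-1", "segments_2": b"commit-2"})
new = fingerprint({"_1.cfs": b"segment-1", "_2.cfs": b"segment-2", "segments_3": b"commit-3"})
to_upload, unreferenced = plan_upload(old, new)
```

Here `_1.cfs` is shared by both snapshots and is skipped, so only the new files are transferred.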


Figure 1: Architecture of Atlas Search using Amazon S3

The design ensures that index uploads and downloads are atomic and resilient to mongot restarts and transient errors. If a download fails, the system gracefully falls back to building the search index from scratch.
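The fallback behavior can be sketched as a retry loop followed by a full rebuild. This is an illustration under stated assumptions, not mongot's code: `TransientError`, `FlakyDownloader`, `load_index`, and the retry counts and delays are all hypothetical.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


class FlakyDownloader:
    """Fault-injecting stand-in that fails a fixed number of times, then succeeds."""

    def __init__(self, failures, payload):
        self.failures = failures
        self.payload = payload
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise TransientError(f"injected failure {self.calls}")
        return self.payload


def load_index(download, rebuild, attempts=3, base_delay=0.01):
    """Try the snapshot with exponential backoff; fall back to a full rebuild."""
    for attempt in range(attempts):
        try:
            return download(), "snapshot"
        except TransientError:
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)
    return rebuild(), "rebuilt"


recovered = load_index(FlakyDownloader(2, "index-from-snapshot"), lambda: "index-from-source")
exhausted = load_index(FlakyDownloader(5, "index-from-snapshot"), lambda: "index-from-source")
```

The first call succeeds on its third attempt; the second exhausts its attempts and rebuilds from the source collection, so a snapshot problem slows recovery but never blocks it.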

The index upload process involves retrieving Amazon S3 credentials, capturing the latest index snapshot, uploading the index files in parallel, and storing index metadata in Amazon S3. For downloads, mongots retrieve credentials, query Amazon S3 for index metadata, and download files in parallel if a snapshot is available.
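The sequence above can be sketched with an in-memory stand-in for the object store. Writing the metadata only after every file has been uploaded is one common way to make a snapshot appear atomically to readers; whether mongot does exactly this is an assumption. `InMemoryStore`, the key layout, and the function names are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor


class InMemoryStore:
    """Stand-in for an object store; keys map to bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]  # raises KeyError if the key is absent


def upload_snapshot(store, index_id, files, workers=4):
    """Upload files in parallel, then publish the metadata last."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(store.put, f"{index_id}/files/{name}", data)
            for name, data in files.items()
        ]
        for future in futures:
            future.result()  # surface any upload error before publishing
    manifest = json.dumps({"files": sorted(files)}).encode()
    store.put(f"{index_id}/metadata.json", manifest)


def download_snapshot(store, index_id, workers=4):
    """Return {name: bytes}, or None when no snapshot metadata exists."""
    try:
        manifest = json.loads(store.get(f"{index_id}/metadata.json"))
    except KeyError:
        return None
    names = manifest["files"]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blobs = list(pool.map(lambda n: store.get(f"{index_id}/files/{n}"), names))
    return dict(zip(names, blobs))


store = InMemoryStore()
files = {"_0.cfs": b"segment-0", "segments_1": b"commit-1"}
upload_snapshot(store, "index-a", files)
restored = download_snapshot(store, "index-a")
```

A reader that finds no metadata object treats the snapshot as unavailable, which is the case in which the search process would build the index from the source collection instead.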

MongoDB uses the AWS Security Token Service (AWS STS) GetFederationToken API to vend credentials scoped per customer, ensuring that a customer’s data remains inaccessible to others. The testing strategy is rigorous, involving unit, integration, and end-to-end tests. MongoDB even employed LocalStack to create fault-injectable Amazon S3 clients, simulating various exceptions and errors to verify mongot behavior under different conditions.
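GetFederationToken accepts an inline session policy that narrows what the returned credentials can do. The sketch below builds such a policy for a single customer. The bucket name, the per-customer key prefix, and the `scoped_policy` helper are assumptions made for illustration; the post does not describe how MongoDB lays out its buckets or policies.

```python
import json


def scoped_policy(bucket, customer_prefix):
    """Build a session policy limited to one customer's key prefix (hypothetical layout)."""
    prefix = customer_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}/*"}},
            },
        ],
    }


# Placeholder names; the JSON string is what would be supplied as the
# Policy parameter of a GetFederationToken request.
policy = scoped_policy("example-search-snapshots", "customer-123")
policy_json = json.dumps(policy)
```

Credentials issued with this policy can read and write objects only under `customer-123/`, so a search process working for one customer cannot reach another customer's snapshots even though the bucket is shared in this sketch.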

Amazon S3 plays a crucial role in enhancing the speed and efficiency of Atlas Search’s index rebuilding process. It provides a reliable and scalable storage solution for periodic snapshots of search index states, which enables quick retrieval of the latest index data when rebuilding is necessary. Starting from the latest index snapshot in Amazon S3 speeds up rebuilding compared to reading everything from the MongoDB source.

Efficient storage management involves the asynchronous deletion of older snapshots once new ones are uploaded. MongoDB leverages Amazon S3 Lifecycle policies to automatically delete orphaned files, ensuring optimal use of storage resources and maintaining a clean, up-to-date snapshot repository.
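An S3 Lifecycle configuration is a list of rules, each pairing a filter with an action. The sketch below shows one plausible shape for such rules, written as the dictionary structure used by the S3 lifecycle configuration API. How MongoDB actually identifies orphaned files is not stated in this post, so the tag key and value, the rule IDs, and the retention periods are purely hypothetical.

```python
def lifecycle_rules(tag_key="snapshot-state", tag_value="orphaned", expire_days=7):
    """Return a lifecycle configuration with two illustrative cleanup rules."""
    return {
        "Rules": [
            {
                # Expire objects that some other process has tagged as orphaned.
                "ID": "expire-orphaned-snapshot-files",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
                "Expiration": {"Days": expire_days},
            },
            {
                # Clean up multipart uploads that were started but never completed.
                "ID": "abort-stale-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1},
            },
        ]
    }


config = lifecycle_rules()
```

The two actions are kept in separate rules because S3 does not allow the incomplete-multipart-upload action in a rule that filters by tag.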

The design also incorporates other critical safeguards: limiting resource utilization during S3 access preserves core search functionality, and distributed uploads across multiple search hosts prevent system overload.
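One simple way to keep several search hosts from uploading at the same moment is to give each host a stable offset inside the snapshot interval. The sketch below derives that offset from a hash of the host identifier. This is an assumption about how uploads could be spread out, not a description of the Atlas Search scheduler; `upload_offset_seconds` and the host names are hypothetical.

```python
import hashlib


def upload_offset_seconds(host_id, interval_seconds):
    """Derive a stable per-host start offset within the snapshot interval."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_seconds


hosts = ["search-host-1", "search-host-2", "search-host-3"]
offsets = {host: upload_offset_seconds(host, 3600) for host in hosts}
```

Because the offset depends only on the host identifier, each host keeps the same slot across restarts, and no coordination between hosts is needed to spread the load.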

Benefits

Traditionally, recreating MongoDB search indexes during scaling or node rebuilds takes hours, which limits scalability. With its new Amazon S3-powered feature, MongoDB Atlas Search now offers customers faster search index rebuild times. MongoDB’s internal tests have shown that rebuild time is reduced by 14x with Amazon S3. The feature is transparent to MongoDB users and requires no action from customers.

Decoupling the indexing layer with Amazon S3 allows for a scalable and evolving design. Previously, preventing concurrent index uploads required implementing concurrency control mechanisms. With the recently announced conditional writes support in Amazon S3, this is simpler: the native S3 feature can prevent concurrent uploads.
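S3 conditional writes let a PutObject request carry an `If-None-Match: *` header, so the write succeeds only if no object exists at that key; otherwise S3 responds with 412 Precondition Failed. The sketch below models that put-if-absent behavior in memory to show how it can act as an upload claim. The `ConditionalStore` class, the claim key layout, and `try_claim_upload` are hypothetical, and the post does not say that MongoDB uses conditional writes in this exact way.

```python
import threading


class ConditionalStore:
    """In-memory model of put-if-absent semantics."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, data):
        with self._lock:
            if key in self._objects:
                return False  # models a 412 Precondition Failed response
            self._objects[key] = data
            return True

    def get(self, key):
        return self._objects[key]


def try_claim_upload(store, index_id, snapshot_id, host_id):
    """Return True if this host won the right to upload the snapshot."""
    key = f"{index_id}/{snapshot_id}/upload.claim"
    return store.put_if_absent(key, host_id.encode())


store = ConditionalStore()
first = try_claim_upload(store, "index-a", "snap-42", "search-host-1")
second = try_claim_upload(store, "index-a", "snap-42", "search-host-2")
```

Only the first host's claim is written, so the second host can skip the upload without any separate lock service.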

Conclusion

MongoDB Atlas has improved its Search capabilities by implementing an Amazon S3-based approach to index rebuilding. Periodically saving copies of search indexes to Amazon S3 speeds up index rebuilding when deploying, scaling, or recovering. Rebuilding from the latest file-level Atlas Search snapshot in Amazon S3, and then catching up from the last capture point, is 14 times faster than the old method.

MongoDB customers using Atlas Search dedicated nodes experience faster scaling and node rebuilding, with index reconstruction time decreasing from hours to minutes. The Amazon S3-based approach serves as a foundation for future features.



MongoDB – AWS Partner Spotlight

MongoDB is an AWS Competency Partner. Its modern, general purpose database platform is designed to unleash the power of software and data for developers and the applications they build.

To learn more, refer to these getting started guides for Atlas Search or Atlas Vector Search for step-by-step instructions.
