HAQM DocumentDB

HAQM DocumentDB (with MongoDB compatibility) FAQs

Page topics

General
13
Performance
1
Pricing
5
Elastic Clusters
11
Hardware, scaling, and storage
3
Backup and restore
12
High availability and replication
12
Security and compliance
6
Major version upgrade
4
Machine learning
2
Generative AI and machine learning
6
Zero-ETL integration
7

General

Open all

What is HAQM DocumentDB (with MongoDB compatibility)?

HAQM DocumentDB (with MongoDB compatibility) is a fast, scalable, highly available, and fully managed enterprise document database service that supports native JSON workloads. As a document database, HAQM DocumentDB makes it easy to store, query, and index JSON data. Developers can use the same MongoDB application code, drivers, and tools as they do today to run, manage, and scale workloads on HAQM DocumentDB. Enjoy improved performance, scalability, and availability without worrying about managing the underlying infrastructure.

Customers can use AWS Database Migration Service (DMS) to easily migrate their on-premises or HAQM Elastic Compute Cloud (EC2) MongoDB non-relational databases to HAQM DocumentDB with virtually no downtime. There are no upfront investments required to use HAQM DocumentDB, and customers only pay for the capacity they use.

What use cases are well-suited for a document database like HAQM DocumentDB?

Document-oriented databases are one of the fastest growing categories of noSQL databases, with the primary reason being that document databases offer both flexible schemas and extensive query capabilities. The document model is a great choice for use cases with dynamic datasets that require ad-hoc querying, indexing, and aggregations. With the scale that HAQM DocumentDB provides, it is used by a wide variety of customers for use cases such as content management, personalization, catalogs, mobile and web applications, IoT, and profile management.

What does "MongoDB-compatible" mean?

“MongoDB compatible” means that HAQM DocumentDB interacts with the Apache 2.0 open source MongoDB 3.6, 4.0, and 5.0 APIs. As a result, you can use the same MongoDB drivers, applications, and tools with HAQM DocumentDB with little or no changes. While HAQM DocumentDB supports a vast majority of the MongoDB APIs that customers actually use, it does not support every MongoDB API. Our focus has been to deliver the capabilities that customer actually use and need.

Since launch, we have continued to work backwards from customers and have delivered an additional 80+ capabilities, including MongoDB 4.0 and 5.0 compatibility, transactions, and sharding. To learn more about the supported MongoDB APIs, see the compatibility documentation . To learn more about recent HAQM DocumentDB launches, see “HAQM DocumentDB Announcements” on the HAQM DocumentDB resources page.

Is HAQM DocumentDB restricted by the MongoDB SSPL license?

No. HAQM DocumentDB does not utilize any MongoDB SSPL code and thus is not restricted by this license. Instead, HAQM DocumentDB interacts with the Apache 2.0 open-source MongoDB 3.6, 4.0, and 5.0 APIs. We will continue to listen and work backward from our customers to deliver the capabilities that they need. To learn more about the supported MongoDB APIs, see the compatibility documentation . To learn more about recent HAQM DocumentDB launches, see “HAQM DocumentDB Announcements” on the HAQM DocumentDB resources page .

How can I migrate data from an existing MongoDB database to HAQM DocumentDB?

Customers can use AWS Database Migration Service (DMS) to easily migrate their on-premises or HAQM Elastic Compute Cloud (EC2) MongoDB databases to HAQM DocumentDB with virtually no downtime. With DMS, you can migrate from a MongoDB replica set or from a sharded cluster to HAQM DocumentDB. Additionally, you can use most existing tools to migrate data from a MongoDB database to HAQM DocumentDB, including mongodump/mongorestore, mongoexport/mongoimport , and third-party tools that support Change Data Capture (CDC) via the oplog. For more information, see Migrating to HAQM DocumentDB .

Do I need to change client drivers to use HAQM DocumentDB?

No, HAQM DocumentDB works with a vast majority of MongoDB drivers compatible with MongoDB 3.4+.

Does HAQM DocumentDB support ACID transactions?

Yes. With the launch of support for MongoDB 4.0 compatibility, HAQM DocumentDB supports the ability to perform atomicity, consistency, isolation, durability (ACID) transactions across multiple documents, statements, collections, and databases.

Is HAQM DocumentDB subject to MongoDB's end of life (EOL) schedule?

No, HAQM DocumentDB does not follow the same support lifecycles as MongoDB and MongoDB's EOL schedule does not apply to HAQM DocumentDB.

How do I access my HAQM DocumentDB cluster?

HAQM DocumentDB clusters are deployed within a customer's HAQM VPC (VPC) and can be accessed directly by HAQM Elastic Compute Cloud (EC2) instances or other AWS services that are deployed in the same VPC. Additionally, HAQM DocumentDB can be accessed by HAQM EC2 instances or other AWS services in different VPCs in the same region or other regions via VPC peering. Access to HAQM DocumentDB clusters must be done through the mongo shell or with MongoDB drivers. HAQM DocumentDB requires that you authenticate when connecting to a cluster. For additional options, see Connecting to an HAQM DocumentDB Cluster from Outside an HAQM VPC .

Why are HAQM RDS permissions and resources required to use HAQM DocumentDB?

For certain management features such as instance lifecycle management, encryption-at-rest with HAQM Key Management Service (KMS) keys and security groups management, HAQM DocumentDB leverages operational technology that is shared with HAQM Relational Database Service (RDS) and HAQM Neptune . When using the describe-db-instances and describe-db-clusters AWS CLI APIs, we recommend filtering for HAQM DocumentDB resources using the following parameter: "--filter Name=engine,Values=docdb".

What instances types does HAQM DocumentDB offer?

Please see the HAQM DocumentDB pricing page for current information on available instance types per region.

How do I try HAQM DocumentDB?

To try HAQM DocumentDB, please see the Getting Started guide.

Does HAQM DocumentDB have an SLA?

Yes. For more information, please see HAQM DocumentDB (with MongoDB compatibility) Service Level Agreement .

Performance

Open all

What type of performance can I expect from HAQM DocumentDB?

When writing to storage, HAQM DocumentDB only persists a write-ahead logs, and does not need to write full buffer page syncs. As a result of this optimization, which does not compromise durability, HAQM DocumentDB writes are typically faster than traditional databases. HAQM DocumentDB clusters can scale out to millions of reads per second with up to 15-read replicas .

Pricing

Open all

How much does HAQM DocumentDB cost and in which AWS regions is HAQM DocumentDB available?

Please see the HAQM DocumentDB pricing page for current information on regions and prices.

Does HAQM DocumentDB have a free tier and can you get started for free?

Yes, you can try HAQM DocumentDB for free using the 1-month free trial. If you have not used HAQM DocumentDB before, you are eligible for a one month free trial. Your organization gets 750 hours per month of t3.medium instance usage, 30 million IOs, 5 GB of storage, and 5 GB of backup storage for free for 30 days. Once your one month free trial expires or your usage exceeds the free allowance, you can shut down your cluster to avoid any charges, or keep it running at our standard on-demand rates . To learn more, refer to the DocumentDB free trial page .

Why should I use HAQM DocumentDB I/O-Optimized?

HAQM DocumentDB I/O-Optimized is the ideal choice when you need predictable costs or have I/O intensive applications. If you expect your I/O costs to exceed 25% of your total HAQM DocumentDB database costs, this option offers enhanced price performance. Refer to our HAQM DocumentDB I/O-Optimized documentation to learn more, including how to get started.

Can I switch back and forth between the I/O-Optimized and standard storage configurations?

You can switch your existing database clusters once every 30 days to HAQM DocumentDB I/O-Optimized. You can switch back to HAQM DocumentDB standard storage configurations at any time.

With HAQM DocumentDB I/O-Optimized, do I continue paying for the I/Os required for replicating data across regions with Global Clusters?

Yes, the charges for the I/O operations required to replicate data across regions continue to apply. HAQM DocumentDB I/O-Optimized does not charge for read and write I/O operations, which is different from data replication. Refer to our HAQM DocumentDB I/O-Optimized documentation to learn more.

Elastic Clusters

Open all

What is HAQM DocumentDB Elastic Clusters?

HAQM DocumentDB Elastic Clusters enables you to elastically scale your document database to handle millions of writes and reads, with petabytes of storage capacity. Elastic Clusters simplifies how customers interact with HAQM DocumentDB by automatically managing the underlying infrastructure and removing the need to create, remove, upgrade, or scale instances.

How do I get started with Elastic Clusters?

You can create an Elastic Clusters cluster using the HAQM DocumentDB API, SDK, CLI, CloudFormation (CFN), or the AWS console. When provisioning your cluster, you specify how many shards and the compute per shard that your workload needs. Once you have created your cluster, you are ready to start leveraging Elastic Clusters’ elastic scalability. Now, you can connect to the Elastic Clusters cluster and read or write data from your application. Elastic Clusters is elastic. Depending on your workload’s needs, you can add or remove compute by modifying your shard count and/or compute per shard using the AWS console, API, CLI, or SDK. Elastic Clusters will automatically provision/de-provision the underlying infrastructure and rebalance your data.

How does Elastic Clusters work?

Elastic Clusters uses sharding to partition data across HAQM DocumentDB’s distributed storage system. Sharding, also known as partitioning, splits large data sets into small data sets across multiple nodes enabling customers to scale out their database beyond vertical scaling limits of a single database. Elastic Clusters utilizes the separation of compute and storage in HAQM DocumentDB. Rather than re-partitioning collections by moving small chunks of data between compute nodes, Elastic Clusters can copy data efficiently within the distributed storage system.

What types of sharding does Elastic Clusters support?

Elastic Clusters supports hash-based partitioning.

How is Elastic Clusters different from MongoDB sharding?

With Elastic Clusters, you can easily scale out or scale in your workload on HAQM DocumentDB typically with little to no application downtime or impact to performance regardless of data size. A similar operation on MongoDB would impact application performance and take hours, and in some cases days. Elastic Clusters also offers differentiated management capabilities such as no impact backups and rapid point in time restore enabling customers to focus more time on their applications rather than managing their database.

Do I need to make any changes to my application to use Elastic Clusters?

No. You do not need to make any changes to your application to use Elastic Clusters.

Can I convert my existing HAQM DocumentDB cluster to an Elastic Clusters cluster?

No, in the near-term, you can leverage AWS Database Migration service (DMS) to migrate data from an existing HAQM DocumentDB cluster to an Elastic Clusters cluster.

How do I define a shard key?

Choosing an optimal shard key for Elastic Clusters is no different than other databases. A great shard key has two characteristics - high frequency and high cardinality. For example, if your application stores user_orders in DocumentDB, then generally you have to retrieve the data by the user. Therefore, you want all orders related to a given user to be in one shard. In this case, user_id would be a good shard key. Read more information .

What are the concepts associated with Elastic Clusters?

Elastic Clusters: An HAQM DocumentDB cluster that allows you to scale your workload’s throughput to millions of reads/writes per second and storage to petabytes. An Elastic Cluster cluster comprises of one or more shards for compute and a storage volume, and is highly available across multiple Availability Zones by default.
Shard: A shard provides compute for an elastic cluster. It will have a single writer instance and 0–15 read replicas. By default, a shard will have two instances: a writer and a single read replica. You can configure a maximum of 32 shards and each shard instance can have a maximum of 64 vCPUs.
Shard key: A shard key is a required field in your JSON documents in sharded collections that elastic clusters use to distribute read and write traffic to the matching shard.
Sharded collection: A sharded collection is a collection whose data is distributed across an elastic cluster in data partitions.

How does Elastic Clusters relate to other AWS services?

Elastic Clusters integrates with other AWS services in the same way DocumentDB does today. First, you can use AWS Database Migration Service (DMS) to migrate from MongoDB and other relational databases to Elastic Clusters. Second, you can monitor the health and performance of your Elastic Clusters cluster using HAQM CloudWatch. Third, you can set up authentication and authorization through AWS IAM users and roles and use AWS VPC for secure VPC-only connections. Last, you can use AWS Glue to import and export data from/to other AWS services such as S3, Redshift and OpenSearch.

Can I migrate my existing MongoDB sharded workloads to Elastic Clusters?

Yes. You can migrate your existing MongoDB sharded workloads to Elastic Clusters. You can either use the AWS Database Migration Service or native MongoDB tools, such as mongodump and mongorestore, to migrate your MongoDB workload to Elastic Clusters. Elastic Clusters also supports MongoDB’s commonly used APIs, such as shardCollection(), giving you the flexibility to reuse existing tooling and scripts with HAQM DocumentDB.

Hardware, scaling, and storage

Open all

What are the minimum and maximum storage limits of an HAQM DocumentDB cluster?

The minimum storage is 10 GB. Based on your cluster usage, your HAQM DocumentDB storage will automatically grow, up to 128 TiB in 10 GB increments with no impact on performance. With HAQM DocumentDB Elastic Clusters, storage will automatically grow up to 4 PiB in 10 GB increments. For either case, there is no need to provision storage in advance.

How does HAQM DocumentDB scale?

HAQM DocumentDB scales in two dimensions: storage and compute. HAQM DocumentDB's storage automatically scales from 10 GB to 128 TiB in Instance-based Clusters, and up to 4 PiB for HAQM DocumentDB Elastic Clusters. HAQM DocumentDB's compute capacity can be scaled up by creating larger instances and horizontally (for greater read throughput) by adding additional replica instances to the cluster.

How do I scale the compute resources associated with my HAQM DocumentDB cluster?

You can scale the compute resources allocated to your instance in the AWS Management Console by selecting the desired instance and clicking the “modify” button. Memory and CPU resources are modified by changing your instance class .

When you modify your instance class, your requested changes will be applied during your specified maintenance window. Alternatively, you can use the "Apply Immediately" flag to apply your scaling requests immediately. Both of these options will have an availability impact for a few minutes as the scaling operation is performed. Bear in mind that any other pending system changes will also be applied.

Backup and restore

Open all

How do I enable backups for my cluster?

Automated backups are always enabled on HAQM DocumentDB clusters. HAQM DocumentDB’s simple database backup capability enables point-in-time recovery for your clusters. You can increase your backup window for point-in-time restores up to 35 days. Backups do not impact database performance.

Can I take cluster snapshots and keep them around as long as I want?

Yes. Manual snapshots can be retained beyond the backup window and there is no performance impact when taking snapshots. Note that restoring data from cluster snapshots requires creating a new cluster.

If my instance fails, what is my recovery path?

HAQM DocumentDB automatically makes your data durable across three Availability Zones (AZs) within a Region and will automatically attempt to recover your instance in a healthy AZ with no data loss. In the unlikely event your data is unavailable within HAQM DocumentDB storage, you can restore from a cluster snapshot or perform a point-in-time restore operation to a new cluster. Note that the latest restorable time for a point-in-time restore operation can be up to five minutes in the past.

What happens to my automated backups and cluster snapshots if I delete my cluster?

You can choose to create a final snapshot when deleting your instance . If you do, you can use this snapshot to restore the deleted instance at a later date. HAQM DocumentDB retains this final user-created snapshot along with all other manually created snapshots after the instance is deleted. Only snapshots are retained after the instance is deleted (i.e., automated backups created for point-in-time restore are not kept).

What happens to my automated backups and cluster snapshots if I delete my account?

Deleting your AWS account will delete all automated backups and snapshot backups contained in the account.

Will I be billed for shared snapshots?

There is no charge for sharing snapshots between accounts. However, you may be charged for the snapshots themselves, as well as any clusters that you restore from shared snapshots.

Can I use HAQM DocumentDB snapshots outside of the service?

No. HAQM DocumentDB snapshots can only be used inside of the service.

What happens to my backups if I delete my cluster?

You can choose to create a final snapshot when deleting your cluster . If you do, you can use this snapshot to restore the deleted cluster at a later date. HAQM DocumentDB retains this final user-created snapshot along with all other manually created snapshots after the cluster is deleted.

High availability and replication

Open all

How does HAQM DocumentDB improve my cluster’s fault tolerance to disk failures?

HAQM DocumentDB automatically divides your storage volume into 10 GB segments spread across many disks. Each 10 GB chunk of your storage volume is replicated six ways, across three Availability Zones (AZs). HAQM DocumentDB is designed to transparently handle the loss of up to two copies of data without affecting write availability and up to three copies without affecting read availability. HAQM DocumentDB’s storage volume is also self-healing. Data blocks and disks are continuously scanned for errors and repaired automatically.

How does HAQM DocumentDB improve recovery time after a database crash?

Unlike other databases, after a database crash , HAQM DocumentDB does not need to replay the redo log from the last database checkpoint (typically five minutes) and confirm that all changes have been applied, before making the database available for operations. This reduces database restart times to less than 60 seconds in most cases. HAQM DocumentDB moves the cache out of the database process and makes it available immediately at restart time. This prevents you from having to throttle access until the cache is repopulated to avoid brownouts.

What kind of replicas does HAQM DocumentDB support?

HAQM DocumentDB supports read replicas , which share the same underlying storage volume as the primary instance. Updates made by the primary instance are visible to all HAQM DocumentDB replicas.

Feature: HAQM DocumentDB read replicas
Number of replicas: Up to 15
Replication Type: Asynchronous (typically milliseconds)
Performance impact on primary: Low
Act as failover target: Yes (no data loss)
Automated failover: Yes

Can I have cross-region replicas with HAQM DocumentDB?

Yes, you can replicate your data across regions using the Global Cluster feature . Global Clusters span across multiple AWS Regions. Global clusters replicate your data to clusters in up to five Regions with little to no impact on performance. Global clusters provide faster recovery from Region-wide outages and enable low-latency global reads. To learn more see our blog post .

Can I prioritize certain replicas as failover targets over others?

Yes. You can assign a promotion priority tier to each instance on your cluster. If the primary instance fails, HAQM DocumentDB will promote the replica with the highest priority to primary. If there are inconsistencies between two or more replicas in the same priority tier, then HAQM DocumentDB will promote the replica that is the same size as the primary instance.

Can I modify priority tiers for instances after they have been created?

You can modify the priority tier for an instance at any time. Simply modifying priority tiers will not trigger a failover .

Can I prevent certain replicas from being promoted to the primary instance?

You can assign lower priority tiers to replicas that you do not want promoted to the primary instance. However, if the higher priority replicas on the cluster are unhealthy or unavailable for some reason, then HAQM DocumentDB will promote the lower priority replica.

How does HAQM DocumentDB assure high availability of my cluster?

HAQM DocumentDB can be deployed in a high-availability configuration by using replica instances in multiple AWS Availability Zones as failover targets. In the event of a primary instance failure, a replica instance is automatically promoted to be the new primary with minimal service interruption.

How can I improve upon the availability of a single HAQM DocumentDB instance?

You can add additional HAQM DocumentDB replicas. HAQM DocumentDB replicas share the same underlying storage as the primary instance. Any HAQM DocumentDB replica can be promoted to become primary without any data loss and therefore can be used for enhancing fault tolerance in the event of a primary instance failure. To increase cluster availability, simply create one to 15 replicas, in multiple AZs, and HAQM DocumentDB will automatically include them in failover primary selection in the event of an instance outage.

What happens during failover and how long does it take?

Failover is automatically handled by HAQM DocumentDB so that your applications can resume database operations as quickly as possible without manual administrative intervention.

If you have an HAQM DocumentDB replica instance in the same or a different Availability Zone, when failing over, HAQM DocumentDB flips the canonical name record (CNAME) for your instance to point at the healthy replica, which is in turn promoted to become the new primary. Start-to-finish, failover typically completes within 30 seconds.
If you do not have an HAQM DocumentDB replica instance (i.e. a single instance cluster), HAQM DocumentDB will attempt to create a new instance in the same Availability Zone as the original instance. This replacement of the original instance is done on a best-effort basis and may not succeed, for example, if there is an issue that is broadly affecting the Availability Zone.

Your application should retry database connections in the event of connection loss.

Do I have a primary instance and an HAQM DocumentDB replica instance actively taking read traffic and a failover occurs, what happens?

HAQM DocumentDB will automatically detect a problem with your primary instance and begin routing your read/write traffic to an HAQM DocumentDB replica instance. On average, this failover will complete within 30 seconds. In addition, the read traffic that your HAQM DocumentDB replicas instances were serving will be briefly interrupted.

How far behind the primary will my replicas be?

Since HAQM DocumentDB replicas share the same data volume as the primary instance, there is virtually no replication lag. We typically observe lag times in the 10s of milliseconds.

Security and compliance

Open all

Can I use HAQM DocumentDB in HAQM Virtual Private Cloud (HAQM VPC)?

Yes. All HAQM DocumentDB clusters must be created in a VPC . With HAQM VPC, you can define a virtual network topology that closely resembles a traditional network that you might operate in your own datacenter. This gives you complete control over who can access your HAQM DocumentDB clusters.

Does HAQM DocumentDB support role-based access control (RBAC)?

HAQM DocumentDB supports RBAC with built-in roles. RBAC enables you to enforce least privilege as a best practice by restricting the actions that users are authorized to perform. For more information, see HAQM DocumentDB role-based access control .

How do the existing MongoDB authentication modes work with HAQM DocumentDB?

HAQM DocumentDB utilizes VPC’s strict network and authorization boundary. Authentication and authorization for HAQM DocumentDB management APIs is provided by IAM users , roles, and policies. Authentication to an HAQM DocumentDB database are done via standard MongoDB tools and drivers with Salted Challenge Response Authentication Mechanism (SCRAM), the default authentication mechanism for MongoDB.

Does HAQM DocumentDB support encrypting my data-at-rest?

Yes. HAQM DocumentDB allows you to encrypt your clusters using keys you manage through AWS Key Management Service (KMS) . On a cluster running with HAQM DocumentDB encryption, data stored at rest in the underlying storage is encrypted, as are its automated backups, snapshots, and replicas in the same cluster. Encryption and decryption are handled seamlessly. For more information about the use of KMS with HAQM DocumentDB, see the Encrypting HAQM DocumentDB Data at Rest .

Can I encrypt an existing unencrypted cluster?

Currently, encrypting an existing unencrypted HAQM DocumentDB cluster is not supported. To use HAQM DocumentDB encryption for an existing unencrypted cluster, create a new cluster with encryption enabled and migrate your data into it.

What compliance certifications does HAQM DocumentDB meet?

HAQM DocumentDB was designed to meet the highest security standards and to make it easy for you to verify our security and meet your own regulatory and compliance obligations. HAQM DocumentDB has been assessed to comply with PCI DSS , ISO 9001 , 27001 , 27017 , and 27018 , SOC 1, 2 and 3 , and Health Information Trust Alliance (HITRUST) Common Security Framework (CSF) certification , in addition to being HIPAA eligible . AWS compliance reports are available for download in AWS Artifact .

Major version upgrade

Open all

What is in-place major version upgrade?

In-place major version upgrade (MVU) lets you upgrade HAQM DocumentDB 3.6 or 4.0 clusters to HAQM DocumentDB 5.0 using the AWS Console, Software Development Kit (SDK), or Command Line Interface (CLI). With in-place MVU, there is no need to create new clusters or change your end points. In-place MVU is available in all regions where HAQM DocumentDB 5.0 is available. To get started with in-place MVU, please review in-place MVU documentation.

Why should I use in-place MVU?

In-place MVU lets you seamlessly upgrade your HAQM DocumentDB 3.6 or 4.0 clusters to version 5.0 without the need to perform backup and restore to another cluster and without using other data migration tools. In doing so, it reduces the time and effort associated with usual upgrade process which entail configuring the source and target end points, migrating indexes and data, changing application code, and more.

You won't need to change your endpoint in your applications post upgrade. Since the data stays in the same cluster, there is no additional cost to upgrade using feature.

What is the downtime when upgrading with in-place MVU?

Downtime can vary from cluster to cluster depending on number of collections, indexes, databases, and instances. Before running in-place major version upgrade on your production cluster, we strongly recommend running it in a lower environment to test downtime, performance, and also verify that your applications work as expected post upgrade.

You can also utilize HAQM DocumentDB’s fast clone feature to clone your cluster data for testing. Depending on the complexity of your HAQM DocumentDB implementation, you can engage our database solutions architect for additional help.

What engine versions does in-place MVU support today?

In-place MVU is only supported with HAQM DocumentDB 3.6 or 4.0 as a source and version 5.0 as target. It is not supported for HAQM DocumentDB Global Clusters or Elastic Clusters or with DocumentDB 4.0 as target.

Machine learning

Open all

How can I use my data in HAQM DocumentDB to build machine learning models?

HAQM DocumentDB integrates with HAQM SageMaker Canvas, making it easy to build machine learning (ML) models and customize foundation models using data stored in HAQM DocumentDB without writing a single line of code. You no longer need to develop custom data and ML pipelines between HAQM DocumentDB and SageMaker Canvas. You can launch SageMaker Canvas from within the HAQM DocumentDB console and add existing HAQM DocumentDB databases as a data source to start building your machine learning models. You can use your data in DocumentDB in SageMaker Canvas to build models to predict customer churn, detect fraud, predict maintenance failures, forecast financial metrics and sales, optimize inventory, summarize content, and generate content.

What is the cost associated with using HAQM DocumentDB as a data source in HAQM SageMaker Canvas to build machine learning models?

HAQM SageMaker Canvas offers a no-code interface to build machine learning models using data from various data sources including HAQM DocumentDB. You are charged for your use of SageMaker Canvas and for the resulting I/Os when SageMaker Canvas reads data from your HAQM DocumentDB instance. There is no additional charge to use DocumentDB as a data source in HAQM SageMaker Canvas. Visit the HAQM DocumentDB pricing page and SageMaker Canvas pricing page to learn more.

Generative AI and machine learning

Open all

What is vector search?

Vector search is a method used in machine learning (ML) to find similar data points to a given data point by comparing their vector representations using distance or similarity metrics. The closer the two vectors are in the vector space, the more similar the underlying items are considered to be. This technique helps capture the meaning or semantics of the data. This approach is useful in various applications, such as recommendation systems, natural language processing, and image recognition.

Why should I use vector search for HAQM DocumentDB?

Vector search for HAQM DocumentDB combines the flexibility and rich querying capability of a JSON-based document database with the power of vector search. You can use your existing HAQM DocumentDB data, or a flexible document data structure, to build machine learning and generative AI use cases such as semantic search experiences, product recommendations, personalization, chatbots, fraud detection, and anomaly detection. Visit the vector search for HAQM DocumentDB documentation to learn more.

Which versions of HAQM DocumentDB support vector search?

Vector search for HAQM DocumentDB is available on HAQM DocumentDB 5.0 instance-based clusters.

How does implementation of semantic search differ from keyword search with HAQM DocumentDB?

Vector search for HAQM DocumentDB enables the use of semantic search so you can capture the meaning, context, and intent behind your data. Keyword search finds the document based on the actual text or pre-defined synonym mappings. For example, in a traditional e-commerce application, a red dress might return products that have the words “red” and “dress” in their descriptions. Semantic search will retrieve results with dresses in different shades of red which can improve the user experience.

What is the cost associated with using vector search for HAQM DocumentDB?

There is no additional cost to use vector search for HAQM DocumentDB. Standard compute, I/O, storage, and backup charges will apply as you store, index, and search vectors in HAQM DocumentDB. Visit the HAQM DocumentDB pricing page to learn more.

Why should I use no-code machine learning with HAQM DocumentDB and HAQM SageMaker Canvas?

HAQM DocumentDB integrates with HAQM SageMaker Canvas making it easy to build generative artificial intelligence (AI) and machine learning (ML) applications using data stored in HAQM DocumentDB. You no longer need to develop custom data and ML pipelines between HAQM DocumentDB and SageMaker Canvas. The in-console integration removes the undifferentiated heavy lifting to connect and access data to accelerate ML development with a low code no code (LCNC) experience. You can launch SageMaker Canvas from within the HAQM DocumentDB console and add existing HAQM DocumentDB databases as a data source.

Zero-ETL integration

Open all

Why should I use the zero-ETL integration of HAQM DocumentDB with HAQM OpenSearch Service?

This zero-ETL integration with HAQM OpenSearch Service abstracts away the operational complexity in extracting, transforming, loading of data from an HAQM DocumentDB collection to HAQM OpenSearch managed cluster or serverless collection. With this integration, you no longer have to build or manage data pipelines nor transform data.

When I want to perform vector search for my generative AI use case, when should I use HAQM DocumentDB's native vector search capabilities versus zero-ETL integration with HAQM Opensearch service?

If you want to use MongoDB APIs, you should use the native database capabilities in HAQM DocumentDB to perform vector search on your documents. The HAQM DocumentDB zero-ETL integration with HAQM OpenSearch Service is well suited for searching across collections and for storing and indexing vectors with more than 2,000 dimensions.

How does this zero-ETL integration replicate data from HAQM DocumentDB to HAQM OpenSearch Service?

The zero-ETL integration of HAQM DocumentDB with HAQM OpenSearch Service uses HAQM OpenSearch Ingestion to seamlessly move operational data from HAQM DocumentDB to HAQM OpenSearch Service. To get started, you enable change stream functionality on the HAQM DocumentDB collection that needs to be replicated. The zero-ETL integration feature sets up an HAQM OpenSearch Ingestion pipeline in the your account that automatically replicates the data to an HAQM OpenSearch Service managed cluster or serverless collection.

HAQM OpenSearch Ingestion automatically understands the format of the data in HAQM DocumentDB collections and maps the data to HAQM OpenSearch Service to yield the most performant search results. You can synchronize data from multiple HAQM DocumentDB collections via multiple pipelines into one HAQM OpenSearch managed cluster or serverless collection to offer holistic insights across several applications. Optionally, you can specify custom data processors when defining the ingestion configuration in HAQM OpenSearch Service. Subsequent updates to the DocumentDB collections are also replicated to HAQM OpenSearch Service without any manual intervention.

How does data transformation work while moving data from HAQM DocumentDB to HAQM OpenSearch Service?

This zero-ETL leverages the native data transformational capabilities of HAQM OpenSearch Ingestion pipelines to aggregate and filter the data while it is in motion.

What options do I have if I don’t want to use the transformation logic provided by HAQM OpenSearch Ingestion?

You can also write custom transformation logic if you want bespoke transformational capability, and HAQM OpenSearch Ingestion will manage the transformation process. Alternatively, if want to move entire data from source to sink without customization, HAQM OpenSearch Ingestion provides out-of-the box blueprints so that youcan perform the integrations with just a few button clicks.

What security permissions are required for using the zero-ETL integration for HAQM OpenSearch?

In order to ensure that HAQM OpenSearch Ingestion has the necessary permissions to replicate data from HAQM DocumentDB, the zero-ETL integration feature creates an IAM role with the necessary permissions to read data from HAQM DocumentDB collection and write to an HAQM OpenSearch domain or collection. This role is then assumed by HAQM OpenSearch Ingestion pipelines to ensure that the right security posture is always maintained when moving the data from source to destination.

How can I monitor the state of my integration between HAQM DocumentDB and HAQM OpenSearch Service?

You can view all the metrics related to your zero-ETL integration with HAQM DocumentDB on the console dashboards provided by HAQM DocumentDB and OpenSearch Ingestion pipeline. You can also query real-time logs in HAQM CloudWatch and set up custom alerting using HAQM CloudWatch that are triggered when user-defined thresholds are breached.

Get started

Pricing

Learn more about pricing

Learn more

Console

HAQM DocumentDB (with MongoDB compatibility) FAQs

Page topics

General

What is HAQM DocumentDB (with MongoDB compatibility)?

What use cases are well-suited for a document database like HAQM DocumentDB?

What does "MongoDB-compatible" mean?

Is HAQM DocumentDB restricted by the MongoDB SSPL license?

How can I migrate data from an existing MongoDB database to HAQM DocumentDB?

Do I need to change client drivers to use HAQM DocumentDB?

Does HAQM DocumentDB support ACID transactions?

Is HAQM DocumentDB subject to MongoDB's end of life (EOL) schedule?

How do I access my HAQM DocumentDB cluster?

Why are HAQM RDS permissions and resources required to use HAQM DocumentDB?

What instances types does HAQM DocumentDB offer?

How do I try HAQM DocumentDB?

Does HAQM DocumentDB have an SLA?

Performance

What type of performance can I expect from HAQM DocumentDB?

Pricing

How much does HAQM DocumentDB cost and in which AWS regions is HAQM DocumentDB available?

Does HAQM DocumentDB have a free tier and can you get started for free?

Why should I use HAQM DocumentDB I/O-Optimized?

Can I switch back and forth between the I/O-Optimized and standard storage configurations?

With HAQM DocumentDB I/O-Optimized, do I continue paying for the I/Os required for replicating data across regions with Global Clusters?

Elastic Clusters

What is HAQM DocumentDB Elastic Clusters?

How do I get started with Elastic Clusters?

How does Elastic Clusters work?

What types of sharding does Elastic Clusters support?

How is Elastic Clusters different from MongoDB sharding?

Do I need to make any changes to my application to use Elastic Clusters?

Can I convert my existing HAQM DocumentDB cluster to an Elastic Clusters cluster?

How do I define a shard key?

What are the concepts associated with Elastic Clusters?

How does Elastic Clusters relate to other AWS services?

Can I migrate my existing MongoDB sharded workloads to Elastic Clusters?

Hardware, scaling, and storage

What are the minimum and maximum storage limits of an HAQM DocumentDB cluster?

How does HAQM DocumentDB scale?

How do I scale the compute resources associated with my HAQM DocumentDB cluster?

Backup and restore

How do I enable backups for my cluster?

Can I take cluster snapshots and keep them around as long as I want?

If my instance fails, what is my recovery path?

What happens to my automated backups and cluster snapshots if I delete my cluster?

What happens to my automated backups and cluster snapshots if I delete my account?

Can I share my snapshots with another AWS account?

Will I be billed for shared snapshots?

Can I automatically share snapshots?

Can I share my HAQM DocumentDB snapshots across different regions?

Can I share an encrypted HAQM DocumentDB snapshot?

Can I use HAQM DocumentDB snapshots outside of the service?

What happens to my backups if I delete my cluster?

High availability and replication

How does HAQM DocumentDB improve my cluster’s fault tolerance to disk failures?

How does HAQM DocumentDB improve recovery time after a database crash?

What kind of replicas does HAQM DocumentDB support?

Can I have cross-region replicas with HAQM DocumentDB?

Can I prioritize certain replicas as failover targets over others?

Can I modify priority tiers for instances after they have been created?

Can I prevent certain replicas from being promoted to the primary instance?

How does HAQM DocumentDB assure high availability of my cluster?

How can I improve upon the availability of a single HAQM DocumentDB instance?

What happens during failover and how long does it take?

Do I have a primary instance and an HAQM DocumentDB replica instance actively taking read traffic and a failover occurs, what happens?

How far behind the primary will my replicas be?

Security and compliance

Can I use HAQM DocumentDB in HAQM Virtual Private Cloud (HAQM VPC)?

Does HAQM DocumentDB support role-based access control (RBAC)?

How do the existing MongoDB authentication modes work with HAQM DocumentDB?

Does HAQM DocumentDB support encrypting my data-at-rest?

Can I encrypt an existing unencrypted cluster?

What compliance certifications does HAQM DocumentDB meet?

Major version upgrade

What is in-place major version upgrade?

Why should I use in-place MVU?

What is the downtime when upgrading with in-place MVU?

What engine versions does in-place MVU support today?

Machine learning

How can I use my data in HAQM DocumentDB to build machine learning models?