Read Isolation, Consistency, and Recency

Isolation Guarantees

Read Uncommitted

Depending on the read concern, clients can see the results of writes before the writes are durable:

  • Regardless of a write’s write concern, other clients using "local" or "available" read concern can see the result of a write operation before the write operation is acknowledged to the issuing client.
  • Clients using "local" or "available" read concern can read data which may be subsequently rolled back during replica set failovers.
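
The two points above can be illustrated with a toy model. The sketch below is not MongoDB code: `ToyReplicaSet`, its method names, and the three-member layout are hypothetical and exist only for illustration. It assumes a primary that applies a write locally before any secondary has it, so a "local" read sees the write immediately, a "majority" read does not, and a failover discards the unreplicated write.

```python
class ToyReplicaSet:
    """Toy model of a 3-member replica set (hypothetical, illustration only)."""

    def __init__(self):
        # Each member holds an ordered log of the writes it has applied.
        self.logs = {"primary": [], "secondary1": [], "secondary2": []}

    def write(self, value):
        # The primary applies the write before any secondary has it.
        self.logs["primary"].append(value)

    def replicate(self, member):
        # Copy the primary's log to one secondary.
        self.logs[member] = list(self.logs["primary"])

    def read_local(self):
        # "local": return whatever the primary has applied, durable or not.
        return list(self.logs["primary"])

    def read_majority(self):
        # "majority": return only writes present on a majority of members.
        majority = len(self.logs) // 2 + 1
        return [
            v for v in self.logs["primary"]
            if sum(v in log for log in self.logs.values()) >= majority
        ]

    def fail_over(self, new_primary):
        # The old primary rolls back anything the new primary never received.
        self.logs["primary"] = list(self.logs[new_primary])


rs = ToyReplicaSet()
rs.write("a")
rs.replicate("secondary1")  # "a" is now on 2 of 3 members
rs.write("b")               # "b" exists only on the primary

local_before = rs.read_local()        # sees the unreplicated write "b"
majority_before = rs.read_majority()  # does not see "b"
rs.fail_over("secondary1")            # "b" is rolled back
local_after = rs.read_local()

print(local_before, majority_before, local_after)
```

In this model the "local" read returned a value ("b") that no longer exists after the failover, which is the rollback exposure described above.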

For operations in a multi-document transaction, when a transaction commits, all data changes made in the transaction are saved and visible outside the transaction. That is, a transaction will not commit some of its changes while rolling back others.

Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.

However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern "local" can read the results of write 1 without seeing write 2.

Read uncommitted is the default isolation level and applies to mongod standalone instances as well as to replica sets and sharded clusters.

Read Uncommitted And Single Document Atomicity

Write operations are atomic with respect to a single document; i.e. if a write is updating multiple fields in the document, a read operation will never see the document with only some of the fields updated. However, although a client may not see a partially updated document, read uncommitted means that concurrent read operations may still see the updated document before the changes are made durable.

With a standalone mongod instance, a set of read and write operations to a single document is serializable. With a replica set, a set of read and write operations to a single document is serializable only in the absence of a rollback.

Read Uncommitted And Multiple Document Write

When a single write operation (e.g. db.collection.updateMany()) modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not atomic.

When performing multi-document write operations, whether through a single write operation or multiple write operations, other operations may interleave.
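
A minimal sketch of this interleaving, written as a toy model and not as driver code: the hypothetical `update_many` generator below updates one document per step and yields between documents, standing in for the points at which other operations may run. Each document changes atomically, but a concurrent reader can observe a mix of old and new documents.

```python
def update_many(docs, new_status):
    """Toy multi-document write: each document update is its own atomic step."""
    for doc in docs:
        # All fields of one document change together (atomic per document).
        doc.update({"status": new_status, "version": doc["version"] + 1})
        yield  # other operations may run here, between documents


docs = [{"_id": i, "status": "old", "version": 1} for i in range(3)]

writer = update_many(docs, "new")
next(writer)  # the write has modified only the first document so far

# A concurrent read interleaves with the write and sees a mix of states.
seen_midway = [d["status"] for d in docs]

for _ in writer:  # let the write finish
    pass
seen_after = [d["status"] for d in docs]

print(seen_midway)  # ['new', 'old', 'old']
print(seen_after)   # ['new', 'new', 'new']
```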

For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports multi-document transactions:

  • In version 4.0, MongoDB supports multi-document transactions on replica sets.
  • In version 4.2, MongoDB introduces distributed transactions, which add support for multi-document transactions on sharded clusters and incorporate the existing support for multi-document transactions on replica sets.

For details regarding transactions in MongoDB, see the Transactions page.

Important

In most cases, a multi-document transaction incurs a greater performance cost than single-document writes, and the availability of multi-document transactions should not be a replacement for effective schema design. For many scenarios, the denormalized data model (embedded documents and arrays) will continue to be optimal for your data and use cases. That is, for many scenarios, modeling your data appropriately will minimize the need for multi-document transactions.

For additional transactions usage considerations (such as runtime limit and oplog size limit), see also Production Considerations.

Without isolating the multi-document write operations, MongoDB exhibits the following behavior:

  1. Non-point-in-time read operations. Suppose a read operation begins at time t1 and starts reading documents. A write operation then commits an update to one of the documents at some later time t2. The reader may see the updated version of the document, and therefore does not see a point-in-time snapshot of the data.
  2. Non-serializable operations. Suppose a read operation reads a document d1 at time t1 and a write operation updates d1 at some later time t3. This introduces a read-write dependency such that, if the operations were to be serialized, the read operation must precede the write operation. But also suppose that the write operation updates document d2 at time t2 and the read operation subsequently reads d2 at some later time t4. This introduces a write-read dependency which would instead require the read operation to come after the write operation in a serializable schedule. There is a dependency cycle which makes serializability impossible.
  3. Reads may miss matching documents that are updated during the course of the read operation.
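
The dependency cycle in item 2 can be checked mechanically. The sketch below is an illustration under assumed names (`read_op`, `write_op`, and both helper functions are hypothetical): it encodes the four steps at t1 through t4, derives a "must precede" edge for every pair of conflicting accesses to the same document, and searches the resulting graph for a cycle.

```python
from itertools import combinations

# (time, operation, kind, document), following item 2 above.
schedule = [
    (1, "read_op", "r", "d1"),
    (2, "write_op", "w", "d2"),
    (3, "write_op", "w", "d1"),
    (4, "read_op", "r", "d2"),
]


def precedence_edges(schedule):
    """Return edges (a, b): a must precede b in any equivalent serial order."""
    edges = set()
    for first, second in combinations(sorted(schedule), 2):
        _, op1, kind1, doc1 = first
        _, op2, kind2, doc2 = second
        # Two accesses conflict when they touch the same document, belong to
        # different operations, and at least one of them is a write.
        if doc1 == doc2 and op1 != op2 and "w" in (kind1, kind2):
            edges.add((op1, op2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the precedence graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    def visit(node, path):
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in graph.get(node, ()))

    return any(visit(start, frozenset()) for start in graph)


edges = precedence_edges(schedule)
print(sorted(edges))     # [('read_op', 'write_op'), ('write_op', 'read_op')]
print(has_cycle(edges))  # True: no serial order satisfies both edges
```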

Cursor Snapshot

MongoDB cursors can return the same document more than once in some situations. As a cursor returns documents, other operations may interleave with the query. If one of these operations changes the indexed field on the index used by the query, then the cursor could return the same document more than once.

If your collection has a field or fields that are never modified, you can use a unique index on this field or these fields so that the query will return each document no more than once. Query with hint() to explicitly force the query to use that index.
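
The effect can be reproduced with a toy index scan. This is a simplified model and not how the server implements cursors: the hypothetical `index_scan` generator walks documents in order of an indexed field and remembers only the last key it returned. When an interleaved write moves a document to a later key on a mutable field, the scan returns it a second time; scanning on a field that never changes does not have this problem.

```python
def index_scan(collection, field):
    """Toy index scan: repeatedly return the entry just after the last position.

    The position is the (field value, _id) pair of the last returned document,
    mimicking a cursor that walks an index in key order.
    """
    position = None
    while True:
        entries = sorted((doc[field], doc["_id"]) for doc in collection.values())
        remaining = [e for e in entries if position is None or e > position]
        if not remaining:
            return
        position = remaining[0]
        yield position[1]


collection = {
    1: {"_id": 1, "price": 10},
    2: {"_id": 2, "price": 20},
}

# Scan on the mutable "price" field while another operation changes a price.
returned = []
for doc_id in index_scan(collection, "price"):
    returned.append(doc_id)
    if returned == [1]:
        collection[1]["price"] = 30  # moves document 1 ahead of the cursor

# Scan on the never-modified "_id" field under the same interleaved write.
collection[1]["price"] = 10
returned_by_id = []
for doc_id in index_scan(collection, "_id"):
    returned_by_id.append(doc_id)
    if returned_by_id == [1]:
        collection[1]["price"] = 30

print(returned)        # [1, 2, 1]
print(returned_by_id)  # [1, 2]
```

In a real deployment the second scan corresponds to forcing the query onto the index over the unchanging field with hint(), for example through the cursor's `hint()` method in PyMongo.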

Monotonic Writes

MongoDB provides monotonic write guarantees, by default, for standalone mongod instances and replica sets.

For monotonic writes and sharded clusters, see Causal Consistency.

Real Time Order

New in version 3.4.

For read and write operations on the primary, issuing read operations with "linearizable" read concern and write operations with "majority" write concern enables multiple threads to perform reads and writes on a single document as if a single thread performed these operations in real time; that is, the corresponding schedule for these reads and writes is considered linearizable.

Causal Consistency

New in version 3.6.

If an operation logically depends on a preceding operation, there is a causal relationship between the operations. For example, a write operation that deletes all documents based on a specified condition and a subsequent read operation that verifies the delete operation have a causal relationship.

With causally consistent sessions, MongoDB executes causal operations in an order that respects their causal relationships, and clients observe results that are consistent with the causal relationships.

Client Sessions and Causal Consistency Guarantees

To provide causal consistency, MongoDB 3.6 enables causal consistency in client sessions. A causally consistent session denotes that the associated sequence of read operations with "majority" read concern and write operations with "majority" write concern have a causal relationship that is reflected by their ordering. Applications must ensure that only one thread at a time executes these operations in a client session.

For causally related operations:

  1. A client starts a client session.

    Important

    Client sessions only guarantee causal consistency for:

    • Read operations with "majority" read concern; i.e. the returned data has been acknowledged by a majority of the replica set members and is durable.
    • Write operations with "majority" write concern; i.e. the write operations that request acknowledgement that the operation has been applied to a majority of the replica set’s voting members.

    For more information on causal consistency and various read and write concerns, see Causal Consistency and Read and Write Concerns.

  2. As the client issues a sequence of read operations (with "majority" read concern) and write operations (with "majority" write concern), the client includes the session information with each operation.

  3. For each read operation with "majority" read concern and write operation with "majority" write concern associated with the session, MongoDB returns the operation time and the cluster time, even if the operation errors. The client session keeps track of the operation time and the cluster time.

    Note

    MongoDB does not return the operation time and the cluster time for unacknowledged (w: 0) write operations. Unacknowledged writes do not imply any causal relationship.

    Although MongoDB returns the operation time and the cluster time for read operations and acknowledged write operations in a client session, only the read operations with "majority" read concern and write operations with "majority" write concern can guarantee causal consistency. For details, see Causal Consistency and Read and Write Concerns.

  4. The associated client session tracks these two time fields.

    Note

    Operations can be causally consistent across different sessions. MongoDB drivers and the mongo shell provide the methods to advance the operation time and the cluster time for a client session. So, a client can advance the cluster time and the operation time of one client session to be consistent with the operations of another client session.
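
The bookkeeping in steps 3 and 4 can be sketched as a toy class. This models the idea only and is not driver source: `ToySession`, `record_reply`, the reply dictionary keys, and the `(seconds, increment)` tuples are assumptions made for illustration. The point it shows is that both times only move forward, and that a second session can be advanced to the times of the first.

```python
class ToySession:
    """Toy client session that tracks the two time fields (illustration only)."""

    def __init__(self):
        self.operation_time = None  # (seconds, increment) of the last operation
        self.cluster_time = None    # highest cluster time seen so far

    def advance_operation_time(self, ts):
        # Times only move forward; an older value is ignored.
        if ts is not None and (self.operation_time is None or ts > self.operation_time):
            self.operation_time = ts

    def advance_cluster_time(self, ts):
        if ts is not None and (self.cluster_time is None or ts > self.cluster_time):
            self.cluster_time = ts

    def record_reply(self, reply):
        # Steps 3 and 4: remember the times returned with each operation.
        self.advance_operation_time(reply["operationTime"])
        self.advance_cluster_time(reply["clusterTime"])


s1 = ToySession()
s1.record_reply({"operationTime": (100, 1), "clusterTime": (100, 1)})
s1.record_reply({"operationTime": (100, 2), "clusterTime": (101, 1)})

# A second session catches up to the first one.
s2 = ToySession()
s2.advance_cluster_time(s1.cluster_time)
s2.advance_operation_time(s1.operation_time)
s2.advance_operation_time((99, 5))  # older time: no effect

print(s2.operation_time, s2.cluster_time)  # (100, 2) (101, 1)
```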

Causal Consistency Guarantees

The following list describes the causal consistency guarantees provided by causally consistent sessions for read operations with "majority" read concern and write operations with "majority" write concern.

  • Read your writes. Read operations reflect the results of write operations that precede them.

  • Monotonic reads. Read operations do not return results that correspond to an earlier state of the data than a preceding read operation.

    For example, if in a session:

    • write1 precedes write2,
    • read1 precedes read2, and
    • read1 returns results that reflect write2,

    then read2 cannot return results that reflect only write1.

  • Monotonic writes. Write operations that must precede other writes are executed before those other writes.

    For example, if write1 must precede write2 in a session, the state of the data at the time of write2 must reflect the state of the data after write1. Other writes can interleave between write1 and write2, but write2 cannot occur before write1.

  • Writes follow reads. Write operations that must occur after read operations are executed after those read operations. That is, the state of the data at the time of the write must incorporate the state of the data of the preceding read operations.

Read Preference

These guarantees hold across all members of the MongoDB deployment. For example, if, in a causally consistent session, you issue a write with "majority" write concern followed by a read that reads from a secondary (i.e. read preference secondary) with "majority" read concern, the read operation will reflect the state of the database after the write operation.
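
One way to picture why a secondary read still reflects the earlier write: the read carries the session's operation time, and the member does not answer until it has applied data up to that time. The sketch below is a toy model of that gate, with hypothetical names (`ToySecondary`, `after_time`) and integer times; it is not the server's replication logic.

```python
class ToySecondary:
    """Toy secondary that applies replicated writes one at a time."""

    def __init__(self):
        self.data = {}
        self.applied_time = 0
        self.pending = []  # replicated writes not yet applied: (time, key, value)

    def apply_next(self):
        time, key, value = self.pending.pop(0)
        self.data[key] = value
        self.applied_time = time

    def read(self, key, after_time=0):
        # A causally consistent read waits until the member has caught up
        # to the session's operation time before returning data.
        while self.applied_time < after_time:
            self.apply_next()  # stands in for "wait for replication"
        return self.data.get(key)


secondary = ToySecondary()
secondary.pending = [(1, "sku", "111"), (2, "sku", "nuts-111")]
secondary.apply_next()  # the secondary lags: only the first write is applied

session_operation_time = 2  # time of the session's latest write

stale = secondary.read("sku")  # no causal gate: returns the older value
causal = secondary.read("sku", after_time=session_operation_time)
print(stale, causal)  # 111 nuts-111
```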

Isolation

Operations within a causally consistent session are not isolated from operations outside the session. If a concurrent write operation interleaves between the session’s write and read operations, the session’s read operation may return results that reflect a write operation that occurred after the session’s write operation.

MongoDB Drivers

Tip

Applications must ensure that only one thread at a time executes these operations in a client session.

Clients require MongoDB drivers updated for MongoDB 3.6 or later:

Java 3.6+

Python 3.6+

C 1.9+

C# 2.5+

Node 3.0+

Ruby 2.5+

Perl 2.0+

PHPC 1.4+

Scala 2.2+

Examples

Important

Causally consistent sessions can only guarantee causal consistency for reads with "majority" read concern and writes with "majority" write concern.

Consider a collection items that maintains the current and historical data for various items. Only the historical data has a non-null end date. If the sku value for an item changes, the document with the old sku value needs to be updated with the end date, after which the new document is inserted with the current sku value. The client can use a causally consistent session to ensure that the update occurs before the insert.

    with client.start_session(causal_consistency=True) as s1:
        current_date = datetime.datetime.today()
        items = client.get_database(
            'test', read_concern=ReadConcern('majority'),
            write_concern=WriteConcern('majority', wtimeout=1000)).items
        items.update_one(
            {'sku': "111", 'end': None},
            {'$set': {'end': current_date}}, session=s1)
        items.insert_one(
            {'sku': "nuts-111", 'name': "Pecans",
             'start': current_date}, session=s1)
    
    // Example 1: Use a causally consistent session to ensure that the update occurs before the insert.
    ClientSession session1 = client.startSession(ClientSessionOptions.builder().causallyConsistent(true).build());
    Date currentDate = new Date();
    MongoCollection<Document> items = client.getDatabase("test")
            .withReadConcern(ReadConcern.MAJORITY)
            .withWriteConcern(WriteConcern.MAJORITY.withWTimeout(1000, TimeUnit.MILLISECONDS))
            .getCollection("items");
    
    items.updateOne(session1, eq("sku", "111"), set("end", currentDate));
    
    Document document = new Document("sku", "nuts-111")
            .append("name", "Pecans")
            .append("start", currentDate);
    items.insertOne(session1, document);
    
    $items = $client->selectDatabase(
        'test',
        [
            'readConcern' => new \MongoDB\Driver\ReadConcern(\MongoDB\Driver\ReadConcern::MAJORITY),
            'writeConcern' => new \MongoDB\Driver\WriteConcern(\MongoDB\Driver\WriteConcern::MAJORITY, 1000),
        ]
    )->items;
    
    $s1 = $client->startSession(
        [ 'causalConsistency' => true ]
    );
    
    $currentDate = new \MongoDB\BSON\UTCDateTime();
    
    $items->updateOne(
        [ 'sku' => '111', 'end' => [ '$exists' => false ] ],
        [ '$set' => [ 'end' => $currentDate ] ],
        [ 'session' => $s1 ]
    );
    $items->insertOne(
        [ 'sku' => 'nuts-111', 'name' => 'Pecans', 'start' => $currentDate ],
        [ 'session' => $s1 ]
    );
    
      async with await client.start_session(causal_consistency=True) as s1:
          current_date = datetime.datetime.today()
          items = client.get_database(
              'test', read_concern=ReadConcern('majority'),
              write_concern=WriteConcern('majority', wtimeout=1000)).items
          await items.update_one(
              {'sku': "111", 'end': None},
              {'$set': {'end': current_date}}, session=s1)
          await items.insert_one(
              {'sku': "nuts-111", 'name': "Pecans",
               'start': current_date}, session=s1)
    
    
     /* Use a causally-consistent session to run some operations. */
    
     wc = mongoc_write_concern_new ();
     mongoc_write_concern_set_wmajority (wc, 1000);
     mongoc_collection_set_write_concern (coll, wc);
    
     rc = mongoc_read_concern_new ();
     mongoc_read_concern_set_level (rc, MONGOC_READ_CONCERN_LEVEL_MAJORITY);
     mongoc_collection_set_read_concern (coll, rc);
    
     session_opts = mongoc_session_opts_new ();
     mongoc_session_opts_set_causal_consistency (session_opts, true);
    
     session1 = mongoc_client_start_session (client, session_opts, &error);
     if (!session1) {
        fprintf (stderr, "couldn't start session: %s\n", error.message);
        goto cleanup;
     }
    
     /* Run an update_one with our causally-consistent session. */
     update_opts = bson_new ();
     res = mongoc_client_session_append (session1, update_opts, &error);
     if (!res) {
        fprintf (stderr, "couldn't add session to opts: %s\n", error.message);
        goto cleanup;
     }
    
     query = BCON_NEW ("sku", "111");
     update = BCON_NEW ("$set", "{", "end",
          BCON_DATE_TIME (bson_get_monotonic_time ()), "}");
     res = mongoc_collection_update_one (coll,
    		       query,
    		       update,
    		       update_opts,
    		       NULL, /* reply */
    		       &error);
    
     if (!res) {
        fprintf (stderr, "update failed: %s\n", error.message);
        goto cleanup;
     }
    
     /* Run an insert with our causally-consistent session */
     insert_opts = bson_new ();
     res = mongoc_client_session_append (session1, insert_opts, &error);
     if (!res) {
        fprintf (stderr, "couldn't add session to opts: %s\n", error.message);
        goto cleanup;
     }
    
     insert = BCON_NEW ("sku", "nuts-111", "name", "Pecans",
          "start", BCON_DATE_TIME (bson_get_monotonic_time ()));
     res = mongoc_collection_insert_one (coll, insert, insert_opts, NULL, &error);
     if (!res) {
        fprintf (stderr, "insert failed: %s\n", error.message);
        goto cleanup;
     }
    
    
    using (var session1 = client.StartSession(new ClientSessionOptions { CausalConsistency = true }))
    {
        var currentDate = DateTime.UtcNow.Date;
        var items = client.GetDatabase(
            "test",
            new MongoDatabaseSettings
            {
                ReadConcern = ReadConcern.Majority,
                WriteConcern = new WriteConcern(
                        WriteConcern.WMode.Majority,
                        TimeSpan.FromMilliseconds(1000))
            })
            .GetCollection<BsonDocument>("items");
    
        items.UpdateOne(session1,
            Builders<BsonDocument>.Filter.And(
                Builders<BsonDocument>.Filter.Eq("sku", "111"),
                Builders<BsonDocument>.Filter.Eq("end", BsonNull.Value)),
            Builders<BsonDocument>.Update.Set("end", currentDate));
    
        items.InsertOne(session1, new BsonDocument
        {
            {"sku", "nuts-111"},
            {"name", "Pecans"},
            {"start", currentDate}
        });
    }
    
    my $s1 = $conn->start_session({ causalConsistency => 1 });
    $items = $conn->get_database(
        "test", {
            read_concern => { level => 'majority' },
            write_concern => { w => 'majority', wtimeout => 10000 },
        }
    )->get_collection("items");
    $items->update_one(
        {
            sku => '111',
            end  => undef
        },
        {
            '$set' => { end => $current_date}
        },
        {
            session => $s1
        }
    );
    $items->insert_one(
        {
            sku => "nuts-111",
            name  => "Pecans",
            start => $current_date
        },
        {
            session => $s1
        }
    );
    
    let s1 = client1.startSession(options: ClientSessionOptions(causalConsistency: true))
    let currentDate = Date()
    var dbOptions = MongoDatabaseOptions(
        readConcern: .majority,
        writeConcern: try .majority(wtimeoutMS: 1000)
    )
    let items = client1.db("test", options: dbOptions).collection("items")
    try items.updateOne(
        filter: ["sku": "111", "end": .null],
        update: ["$set": ["end": .datetime(currentDate)]],
        session: s1
    )
    try items.insertOne(["sku": "nuts-111", "name": "Pecans", "start": .datetime(currentDate)], session: s1)
    
    let s1 = client1.startSession(options: ClientSessionOptions(causalConsistency: true))
    let currentDate = Date()
    var dbOptions = MongoDatabaseOptions(
        readConcern: .majority,
        writeConcern: try .majority(wtimeoutMS: 1000)
    )
    let items = client1.db("test", options: dbOptions).collection("items")
    let result1 = items.updateOne(
        filter: ["sku": "111", "end": .null],
        update: ["$set": ["end": .datetime(currentDate)]],
        session: s1
    ).flatMap { _ in
        items.insertOne(["sku": "nuts-111", "name": "Pecans", "start": .datetime(currentDate)], session: s1)
    }
    

If another client needs to read all current sku values, you can advance the cluster time and the operation time to that of the other session to ensure that this client is causally consistent with the other session and read after the two writes:

    with client.start_session(causal_consistency=True) as s2:
        s2.advance_cluster_time(s1.cluster_time)
        s2.advance_operation_time(s1.operation_time)
    
        items = client.get_database(
            'test', read_preference=ReadPreference.SECONDARY,
            read_concern=ReadConcern('majority'),
            write_concern=WriteConcern('majority', wtimeout=1000)).items
        for item in items.find({'end': None}, session=s2):
            print(item)
    
    // Example 2: Advance the cluster time and the operation time to that of the other session to ensure that
    // this client is causally consistent with the other session and read after the two writes.
    ClientSession session2 = client.startSession(ClientSessionOptions.builder().causallyConsistent(true).build());
    session2.advanceClusterTime(session1.getClusterTime());
    session2.advanceOperationTime(session1.getOperationTime());
    
    items = client.getDatabase("test")
            .withReadPreference(ReadPreference.secondary())
            .withReadConcern(ReadConcern.MAJORITY)
            .withWriteConcern(WriteConcern.MAJORITY.withWTimeout(1000, TimeUnit.MILLISECONDS))
            .getCollection("items");
    
    for (Document item: items.find(session2, eq("end", BsonNull.VALUE))) {
        System.out.println(item);
    }
    
    $s2 = $client->startSession(
        [ 'causalConsistency' => true ]
    );
    $s2->advanceClusterTime($s1->getClusterTime());
    $s2->advanceOperationTime($s1->getOperationTime());
    
    $items = $client->selectDatabase(
        'test',
        [
            'readPreference' => new \MongoDB\Driver\ReadPreference(\MongoDB\Driver\ReadPreference::RP_SECONDARY),
            'readConcern' => new \MongoDB\Driver\ReadConcern(\MongoDB\Driver\ReadConcern::MAJORITY),
            'writeConcern' => new \MongoDB\Driver\WriteConcern(\MongoDB\Driver\WriteConcern::MAJORITY, 1000),
        ]
    )->items;
    
    $result = $items->find(
        [ 'end' => [ '$exists' => false ] ],
        [ 'session' => $s2 ]
    );
    foreach ($result as $item) {
        var_dump($item);
    }
    
      async with await client.start_session(causal_consistency=True) as s2:
          s2.advance_cluster_time(s1.cluster_time)
          s2.advance_operation_time(s1.operation_time)
    
          items = client.get_database(
              'test', read_preference=ReadPreference.SECONDARY,
              read_concern=ReadConcern('majority'),
              write_concern=WriteConcern('majority', wtimeout=1000)).items
          async for item in items.find({'end': None}, session=s2):
              print(item)
    
    
     /* Make a new session, session2, and make it causally-consistent
      * with session1, so that session2 will read session1's writes. */
     session2 = mongoc_client_start_session (client, session_opts, &error);
     if (!session2) {
        fprintf (stderr, "couldn't start session: %s\n", error.message);
        goto cleanup;
     }
    
     /* Set the cluster time for session2 to session1's cluster time */
     cluster_time = mongoc_client_session_get_cluster_time (session1);
     mongoc_client_session_advance_cluster_time (session2, cluster_time);
    
     /* Set the operation time for session2 to session1's operation time */
     mongoc_client_session_get_operation_time (session1, &timestamp, &increment);
     mongoc_client_session_advance_operation_time (session2,
    				 timestamp,
    				 increment);
    
     /* Run a find on session2, which should now find all writes done
      * inside of session1 */
     find_opts = bson_new ();
     res = mongoc_client_session_append (session2, find_opts, &error);
     if (!res) {
        fprintf (stderr, "couldn't add session to opts: %s\n", error.message);
        goto cleanup;
     }
    
     find_query = BCON_NEW ("end", BCON_NULL);
     read_prefs = mongoc_read_prefs_new (MONGOC_READ_SECONDARY);
     cursor = mongoc_collection_find_with_opts (coll,
                                                find_query,
                                                find_opts,
                                                read_prefs);
    
     while (mongoc_cursor_next (cursor, &result)) {
        json = bson_as_json (result, NULL);
        fprintf (stdout, "Document: %s\n", json);
        bson_free (json);
     }
    
     if (mongoc_cursor_error (cursor, &error)) {
        fprintf (stderr, "cursor failure: %s\n", error.message);
        goto cleanup;
     }
    
    
    using (var session2 = client.StartSession(new ClientSessionOptions { CausalConsistency = true }))
    {
        session2.AdvanceClusterTime(session1.ClusterTime);
        session2.AdvanceOperationTime(session1.OperationTime);
    
        var items = client.GetDatabase(
            "test",
            new MongoDatabaseSettings
            {
                ReadPreference = ReadPreference.Secondary,
                ReadConcern = ReadConcern.Majority,
                WriteConcern = new WriteConcern(WriteConcern.WMode.Majority, TimeSpan.FromMilliseconds(1000))
            })
            .GetCollection<BsonDocument>("items");
    
        var filter = Builders<BsonDocument>.Filter.Eq("end", BsonNull.Value);
        foreach (var item in items.Find(session2, filter).ToEnumerable())
        {
            // process item
        }
    }
    
    my $s2 = $conn->start_session({ causalConsistency => 1 });
    $s2->advance_cluster_time( $s1->cluster_time );
    $s2->advance_operation_time( $s1->operation_time );
    
    $items = $conn->get_database(
        "test", {
            read_preference => 'secondary',
            read_concern => { level => 'majority' },
            write_concern => { w => 'majority', wtimeout => 10000 },
        }
    )->get_collection("items");
    $cursor = $items->find( { end => undef }, { session => $s2 } );
    
    for my $item ( $cursor->all ) {
        say join(" ", %$item);
    }
    
    try client2.withSession(options: ClientSessionOptions(causalConsistency: true)) { s2 in
        // The cluster and operation times are guaranteed to be non-nil since we already used s1 for operations above.
        s2.advanceClusterTime(to: s1.clusterTime!)
        s2.advanceOperationTime(to: s1.operationTime!)
    
        dbOptions.readPreference = .secondary
        let items2 = client2.db("test", options: dbOptions).collection("items")
        for item in try items2.find(["end": .null], session: s2) {
            print(item)
        }
    }
    
    let options = ClientSessionOptions(causalConsistency: true)
    let result2: EventLoopFuture<Void> = client2.withSession(options: options) { s2 in
        // The cluster and operation times are guaranteed to be non-nil since we already used s1 for operations above.
        s2.advanceClusterTime(to: s1.clusterTime!)
        s2.advanceOperationTime(to: s1.operationTime!)
    
        dbOptions.readPreference = .secondary
        let items2 = client2.db("test", options: dbOptions).collection("items")
    
        return items2.find(["end": .null], session: s2).flatMap { cursor in
            cursor.forEach { item in
                print(item)
            }
        }
    }
    

Limitations

The following operations that build in-memory structures are not causally consistent:

Operation                              Notes
collStats
$collStats with latencyStats option
$currentOp                             Returns an error if the operation is associated with a causally consistent client session.
createIndexes
dbHash                                 Starting in MongoDB 4.2
dbStats
getMore                                Returns an error if the operation is associated with a causally consistent client session.
$indexStats
mapReduce                              Starting in MongoDB 4.2
ping                                   Returns an error if the operation is associated with a causally consistent client session.
serverStatus                           Returns an error if the operation is associated with a causally consistent client session.
validate                               Starting in MongoDB 4.2