- Reference >
- Operators >
- Aggregation Pipeline Stages >
- $merge (aggregation)
$merge (aggregation)¶
On this page
Definition¶
Note
The following page discusses the $merge
stage, which
outputs the aggregation pipeline results to a collection. For a
discussion of the $mergeObjects
operator which merges
documents into a single document, see $mergeObjects
instead.
-
$merge
¶ New in version 4.2.
Writes the results of the aggregation pipeline to a specified collection. The
$merge
operator must be the last stage in the pipeline.The
$merge
stage:- Can output to a collection in the same or different database.
- Starting in MongoDB 4.4,
$merge
can output to the same collection that is being aggregated. For more information, see Output to the Same Collection that is Being Aggregated.
- Starting in MongoDB 4.4,
- Creates a new collection if the output collection does not already exist.
- Can incorporate results (insert new documents, merge documents, replace documents, keep existing documents, fail the operation, process documents with a custom update pipeline) into an existing collection.
- Can output to a sharded collection. Input collection can also be sharded.
For a comparison with the
$out
stage which also outputs the aggregation results to a collection, see Comparison with $out.- Can output to a collection in the same or different database.
On-Demand Materialized Views
$merge
can incorporate the pipeline results into an
existing output collection rather than perform a full replacement of
the collection. This functionality allows users to create on-demand
materialized views, where the content of the output collection is
incrementally updated when the pipeline is run.
For more information on this use case, see On-Demand Materialized Views as well as the examples on this page.
Materialized views are separate from read-only views. For information on creating read-only views, see read-only views.
Syntax¶
$merge
has the following syntax:
For example:
If using all default options for $merge
(including writing
to a collection in the same database), you can use the simplified form:
The $merge
takes a document with the following fields:
Field | Description | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
into | The output collection. Specify either:
Note
|
||||||||||
on | Optional. Field or fields that act as a unique identifier for a document. The identifier determines if a results document matches an already existing document in the output collection. Specify either:
For the specified field or fields:
The default value for on depends on the output collection:
|
||||||||||
whenMatched | Optional. The behavior of You can specify either:
|
||||||||||
let | Optional. Specifies variables accessible for use in the whenMatched pipeline Specify a document with the variable name and value expression:
If unspecified, defaults to Note Starting in MongoDB 4.2.2, the To access the |
||||||||||
whenNotMatched | Optional. The behavior of You can specify one of the pre-defined action strings:
|
Considerations¶
_id
Field Generation¶
If the _id
field is not present in a document from the
aggregation pipeline results, the $merge
stage generates
it automatically.
For example, in the following aggregation pipeline,
$project
excludes the _id
field from the documents
passed into $merge
. When $merge
writes these
documents to the "newCollection"
, $merge
generates a
new _id
field and value.
Create a New Collection if Output Collection is Non-Existent¶
The $merge
operation creates a new collection if the
specified output collection does not exist.
- The output collection is created when
$merge
writes the first document into the collection and is immediately visible. - If the aggregation fails, any writes completed by the
$merge
before the error will not be rolled back.
Note
For a replica set or a standalone, if the
output database does not exist, $merge
also creates
the database.
For a sharded cluster, the specified output database must already exist.
If the output collection does not exist, $merge
requires
the on identifier to be the _id
field. To use a
different on
field value for a collection that does not exist, you
can create the collection first by creating a unique index on the
desired field(s) first. For example, if the output collection
newDailySales201905
does not exist and you want to specify the
salesDate
field as the on identifier:
Output to a Sharded Collection¶
The $merge
stage can output to a sharded collection.
When the output collection is sharded, $merge
uses
the _id
field and all the shard key fields as the default on identifier. If you override the default, the on identifier must include all the shard key fields:
For example, in a database that has sharding enabled
, use the sh.shardCollection()
method
to create a new sharded collection newrestaurants
with the
postcode
field as the shard key.
The newrestaurants
collection will contain documents with
information on new restaurant openings by month (date
field) and
postcode (shard key); specifically, the on
identifier is ["date", "postcode"]
(the ordering of the fields
does not matter). Because $merge
requires a unique,
index with keys that correspond to the on identifier fields, create the unique index
(the ordering of the fields do not matter): [1]
With the sharded collection restaurants
and the unique index
created, you can use $merge
to output the aggregation
results to this collection, matching on [ "date", "postcode" ]
as in the following example:
[1] | The In the example above, because the |
Replace Documents ($merge
) vs Replace Collection ($out
)¶
$merge
can replace an existing document in the output
collection if the aggregation results contain a document or
documents that match based on the on
specification. As such, $merge
can replace all documents
in the existing collection if the aggregation results include
matching documents for all existing documents in the collection and
you specify “replace” for
whenMatched.
However, to replace an existing collection regardless of the
aggregation results, use $out
instead.
Existing Documents and _id
and Shard Key Values¶
The $merge
errors if the $merge
results in a
change to an existing document’s _id
value.
Tip
To avoid this error, if the on field does not
include the _id
field, remove the _id
field in the
aggregation results to avoid the error, such as with a preceding
$unset
stage, etc.
Additionally, for a sharded collection, $merge
also
errors if it results in a change to the shard key value of an
exising document.
Any writes completed by the $merge
before the error will
not be rolled back.
Unique Index Constraints¶
If the unique index used by $merge
for on field(s) is dropped mid-aggregation, there is no
guarantee that the aggregation will be killed. If the aggregation
continues, there is no guarantee that documents do not have
duplicate on
field values.
If the $merge
attempts to write a document that violates
any unique index on the output collection, the operation errors; for
example:
- Insert a non-matching document that violates a unique index other than the index on the on field(s).
- Fail if there is a matching document in the collection. Specifically, the operation attempts to insert the matching document which violates the unique index on the on field(s).
- Replace an existing document with a new document that violates a unique index other than the index on the on field(s).
- Merge the matching documents that results in a document that violates a unique index other than the index on the on field(s).
whenMatched
Pipeline Behavior¶
Starting in MongoDB 4.2.2, if all of the following are true for a
$merge
stage:
- The value of whenMatched is an aggregation pipeline,
- The value of whenNotMatched is
insert
, and - There is no match for a document in the output collection,
$merge
inserts the document directly into the output
collection.
Prior to MongoDB 4.2.2, when these conditions for a
$merge
stage are met, the pipeline specified in the
whenMatched field is executed with an
empty input document. The resulting document from the pipeline is
inserted into the output collection.
See also
Comparison with $out
¶
$merge
and $out
Comparison
With the introduction of $merge
, MongoDB provides two
stages, $merge
and $out
, for writing the
results of the aggregation pipeline to a collection. The following
summarizes the capabilities of the two stages:
$merge |
$out |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
Output to the Same Collection that is Being Aggregated¶
Warning
When $merge
outputs to the same collection that is being
aggregated, documents may get updated multiple times or the operation
may result in an infinite loop. This behavior occurs when the update
performed by $merge
changes the physical location of
documents stored on disk. When the physical location of a document
changes, $merge
may view it as an entirely new document,
resulting in additional updates. For more information on this
behavior, see Halloween Problem.
Starting in MongoDB 4.4, $merge
can output to the same
collection that is being aggregated. You can also output to a
collection which appears in other stages of the pipeline, such as
$lookup
.
Versions of MongoDB prior to 4.4 did not allow $merge
to
output to the same collection as the collection being aggregated.
Restrictions¶
Restrictions | Description |
---|---|
Transactions | An aggregation pipeline cannot use $merge inside a
transaction. |
Separate from materialized view
|
View definition cannot include the
$merge stage. If the view definition includes nested
pipeline (e.g. the view definition includes $facet
stage), this $merge stage restriction applies to the
nested pipelines as well. |
$lookup stage |
$lookup stage’s nested pipeline cannot include the
$merge stage. |
$facet stage |
$facet stage’s nested pipeline cannot include the
$merge stage. |
$unionWith stage |
$unionWith stage’s nested pipeline cannot include the
$merge stage. |
"linearizable" read concern |
The $merge stage cannot be used in conjunction with read
concern "linearizable" . That is, if you specify
"linearizable" read concern for
db.collection.aggregate() , you cannot include the
$merge stage in the pipeline. |
Examples¶
On-Demand Materialized View: Initial Creation¶
If the output collection does not exist, the $merge
creates
the collection.
For example, a collection named salaries
in the zoo
database
is populated with the employee salary and department history:
You can use the $group
and $merge
stages to
initially create a collection named budgets
(in the reporting
database) from the data currently in the salaries
collection:
Note
For a replica set or a standalone deployment, if the output
database does not exist, $merge
also creates the
database.
For a sharded cluster deployment, the specified output database must already exist.
$group
stage to group the salaries by thefiscal_year
anddept
.$merge
stage writes the output of the preceding$group
stage to thebudgets
collection in thereporting
database.
To view the documents in the new budgets
collection:
The budgets
collection contains the following documents:
See also
On-Demand Materialized View: Update/Replace Data¶
The following example refers to the collections from the previous example.
The example salaries
collection contains the
employee salary and department history:
The example budgets
collection contains the cumulative yearly
budgets:
During the current fiscal year (2019
in this example), new employees
are added to the salaries
collection and new head counts are
pre-allocated for the next year:
To update the budgets
collection to reflect the new
salary information, the following aggregation pipeline uses:
$match
stage to find all documents withfiscal_year
greater than or equal to2019
.$group
stage to group the salaries by thefiscal_year
anddept
.$merge
to write the result set to thebudgets
collection, replacing documents with the same_id
value (in this example, a document with the fiscal year and dept). For documents that do not have matches in the collection,$merge
inserts the new documents.
After the aggregation is run, view the documents in the budgets
collection:
The budgets
collection incorporates the new salary data for fiscal
year 2019 and adds new documents for fiscal year 2020:
See also
Only Insert New Data¶
To ensure that the $merge
does not overwrite existing data
in the collection, set whenMatched to
keepExisting or fail.
The example salaries
collection in the zoo
database contains
the employee salary and department history:
A collection orgArchive
in the reporting
database
contains historical departmental organization records for the past
fiscal years. Archived records should not be modified.
The orgArchive
collection has a unique compound index
on the fiscal_year
and dept
fields; i.e. there should be at
most one record for the same fiscal year and department combination:
At the end of current fiscal year (2019
in this example), the
salaries
collection contain the following documents:
To update the orgArchive
collection to include the fiscal
year 2019
that has just ended, the following aggregation pipeline
uses:
$match
stage to find all documents withfiscal_year
equal to2019
.$group
stage to group the employees by thefiscal_year
anddept
.$project
stage to suppress the_id
field and add separatedept
andfiscal_year
field. When the documents are passed to$merge
,$merge
automatically generates a new_id
field for the documents.$merge
to write the result set toorgArchive
.The
$merge
stage matches documents on thedept
andfiscal_year
fields andfails
when matched. That is, if a document already exists for the same department and fiscal year, the$merge
errors.
After the operation, the orgArchive
collection contains the following
documents:
If the orgArchive
collection already contained a document for
2019 for department "A"
and/or "B"
, the aggregation
fails because of the duplicate key error. However, any document inserted
before the error will not be rolled back.
If you specify keepExisting for the matching document, the aggregation does not affect the matching document and does not error with duplicate key error. Similarly, if you specify replace, the operation would not fail; however, the operation would replace the existing document.
Merge Results from Multiple Collections¶
By default, if a document in the aggregation results matches a
document in the collection, the $merge
stage
merges the documents.
An example collection purchaseorders
is populated with the
purchase order information by quarter and regions:
Another example collection reportedsales
is populated with the
reported sales information by quarter and regions:
Assume that, for reporting purposes, you want to view the data by quarter in the following format:
You can use the $merge
to merge in results from the
purchaseorders
collection and the reportedsales
collection
to create a new collection quarterlyreport
.
To create the quarterlyreport
collection, you can use the
following pipeline:
- First stage:
The
$group
stage groups by the quarter and uses$sum
to add theqty
fields into a newpurchased
field. For example:- Second stage:
- The
merge
stage writes the documents to thequarterlyreport
collection in the same database. If the stage finds an existing document in the collection that matches on the_id
field, the stage merges the matching documents. Otherwise, the stage inserts the document. For the initial creation, no documents should match.
To view the documents in the collection, run the following operation:
The collection contains the following documents:
Similarly, run the following aggregation pipeline against the
reportedsales
collection to merge the sales results into the
quarterlyreport
collection.
- First stage:
The
$group
stage groups by the quarter and uses$sum
to add theqty
fields into a newsales
field. For example:- Second stage:
- The
merge
stage writes the documents to thequarterlyreport
collection in the same database. If the stage finds an existing document in the collection that matches on the_id
field (the quarter), the stage merges the matching documents. Otherwise, the stage inserts the document.
To view the documents in the quarterlyreport
collection after
the data has been merged, run the following operation:
The collection contains the following documents:
Use the Pipeline to Customize the Merge¶
The $merge
can use a custom update pipeline when documents match. The
whenMatched pipeline can have
the following stages:
$addFields
and its alias$set
$project
and its alias$unset
$replaceRoot
and its alias$replaceWith
An example collection votes
is populated with the daily vote
tally. Create the collection with the following documents:s
Another example collection monthlytotals
has the up-to-date
monthly vote totals. Create the collection with the following
document:
At the end of each day, that day’s votes is inserted into the
votes
collection:
You can use $merge
with an custom pipeline to update the
existing document in the collection monthlytotals
:
- First stage:
The
$match
stage finds the specific day’s votes. For example:- Second stage:
The
$project
stage sets the_id
field to a year-month string. For example:- Third stage:
The
merge
stage writes the documents to themonthlytotals
collection in the same database. If the stage finds an existing document in the collection that matches on the_id
field, the stage uses a pipeline to add thethumbsup
votes and thethumbsdown
votes.- This pipeline cannot directly accesses the fields from the
results document. To access the
thumbsup
field and thethumbsdown
field in the results document, the pipeline uses the$$new
variable; i.e.$$new.thumbsup
and$new.thumbsdown
. - This pipeline can directly accesses the
thumbsup
field and thethumbsdown
field in the existing document in the collection; i.e.$thumbsup
and$thumbsdown
.
The resulting document replaces the existing document.
- This pipeline cannot directly accesses the fields from the
results document. To access the
To view documents in the monthlytotals
collection after the merge
operation, run the following operation:
The collection contains the following document: