How to import only non-existing documents?


2

I am using mongoimport to import a bunch of JSON files, and I am looking for a way to import only the records that don't already exist (this can be checked by oid). I tried --upsert, but it updates the record, and I want to ignore it completely if it's already there.

Any ideas...?

2012-04-04 20:14
by Joly


9

By default, mongoimport should not overwrite existing documents. In the JS shell, I created a document in the collection "testimport":

> db.testimport.save({_id:1, x:"a"})
> db.testimport.find()
{ "_id" : 1, "x" : "a" }
> 

Here are the contents of the file import.json. It contains 2 documents, one with a unique _id, and one with a duplicate _id.

import.json
{_id:1, x:"b"}
{_id:2, x:"b"}

In a new terminal window, mongoimport is run:

$ ./mongoimport -d test -c testimport import.json -vvvvv 
Wed Apr  4 19:03:48 creating new connection to:127.0.0.1
Wed Apr  4 19:03:48 BackgroundJob starting: ConnectBG
Wed Apr  4 19:03:48 connected connection!
connected to: 127.0.0.1
Wed Apr  4 19:03:48 ns: test.testimport
Wed Apr  4 19:03:48 filesize: 29
Wed Apr  4 19:03:48 got line:{_id:1, x:"b"}
Wed Apr  4 19:03:48 got line:{_id:2, x:"b"}
imported 2 objects
$

Even though the output of mongoimport says that two objects were imported, the document with _id:1 has not been overwritten.

> db.testimport.find()
{ "_id" : 1, "x" : "a" }
{ "_id" : 2, "x" : "b" }
>

If the --upsert flag is used, then the document with _id:1 will be updated:

$ ./mongoimport -d test -c testimport import.json -vvvvv --upsert
Wed Apr  4 19:14:26 creating new connection to:127.0.0.1
Wed Apr  4 19:14:26 BackgroundJob starting: ConnectBG
Wed Apr  4 19:14:26 connected connection!
connected to: 127.0.0.1
Wed Apr  4 19:14:26 ns: test.testimport
Wed Apr  4 19:14:26 filesize: 29
Wed Apr  4 19:14:26 got line:{_id:1, x:"b"}
Wed Apr  4 19:14:26 got line:{_id:2, x:"b"}
imported 2 objects
$

In the JS shell, we can see that the document with _id:1 has been updated:

> db.testimport.find()
{ "_id" : 1, "x" : "b" }
{ "_id" : 2, "x" : "b" }
>

Is this not the behavior that you are experiencing? The above was tested with version 2.1.1-pre, but I do not believe that the mongoimport code has changed for a while.
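
To make the difference between the two runs concrete, here is a minimal sketch that models them in plain JavaScript. It is only a toy: the collection is an in-memory Map keyed by _id, and importDocs is a hypothetical helper written for this illustration, not mongoimport's actual code.

```javascript
// Toy in-memory model of the two runs above; NOT mongoimport's real implementation.
// The "collection" is a Map keyed by _id (an assumption made for illustration).
function importDocs(collection, docs, { upsert = false } = {}) {
  for (const doc of docs) {
    if (collection.has(doc._id) && !upsert) {
      continue; // plain import: duplicate _id, the stored document stays as-is
    }
    collection.set(doc._id, doc); // new _id, or upsert replacing the stored document
  }
  return collection;
}

// Same two documents as import.json.
const incoming = [{ _id: 1, x: "b" }, { _id: 2, x: "b" }];

// Both runs start from a collection holding { _id: 1, x: "a" }.
const plain = importDocs(new Map([[1, { _id: 1, x: "a" }]]), incoming);
const upserted = importDocs(new Map([[1, { _id: 1, x: "a" }]]), incoming, { upsert: true });

console.log([...plain.values()]);    // _id 1 keeps x: "a"
console.log([...upserted.values()]); // _id 1 now has x: "b"
```

In both cases the document with _id:2 is added; only the handling of the duplicate _id:1 differs.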

2012-04-04 23:24
by Marc
It actually does work as you say, not sure why I didn't see it earlier (doh!). Thanks - Joly 2012-04-05 07:41
How can I make it so that, without upserting, duplicates are detected by a field other than _id? For example: don't import if {name : "example1"} exists. I know of --upsertFields, but I don't want to change the existing doc at all. The default mongoimport checks by _id, I think - jack blank 2016-11-30 16:32
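
One possible approach to the follow-up about checking a field other than _id is to pre-filter the file before running mongoimport, so that documents whose field value already exists never reach the import. The sketch below is a rough illustration under several assumptions: filterNewLines is a hypothetical helper, the file holds one strict-JSON document per line (quoted keys, unlike the shell-style import.json above), the field values are strings or numbers, and the existing values were fetched beforehand, for example with db.testimport.distinct("name").

```javascript
// Hypothetical helper: keep only the lines whose `field` value is not already present.
// Assumes strict JSON lines and primitive (string/number) field values.
function filterNewLines(lines, existingValues, field) {
  const seen = new Set(existingValues);
  const kept = [];
  for (const line of lines) {
    if (line.trim() === "") continue; // skip blank lines
    const value = JSON.parse(line)[field];
    if (seen.has(value)) continue;    // already in the collection (or earlier in the file)
    seen.add(value);                  // also drops repeats inside the file itself
    kept.push(line);
  }
  return kept;
}

// Assumed input: values already in the collection, fetched separately.
const existingNames = ["example1"];
const fileLines = [
  '{"name": "example1", "x": "b"}',
  '{"name": "example2", "x": "b"}',
];

// Only the example2 line survives; write the result to a new file and import that.
const toImport = filterNewLines(fileLines, existingNames, "name");
console.log(toImport);
```

The existing documents are never touched, because the duplicates are dropped before mongoimport sees them.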