To my surprise, importing the same file into the same MongoDB (3.0) server is much slower (more than 20 times) with mongoimport 3.0 than with mongoimport 2.6.4. Has anyone else seen this problem, and how can it be fixed?
Here are the details:
mongoimport 2.6.4 loads around 16K rows per second from the same JSON file:
-logbash-3.2$ mongoimport --host --db media --collection media --upsert --upsertFields _id --type json --file /data/xxx.json
connected to:
2015-10-08T15:24:02.007-0700 Progress: 8860712/5024041951 0%
2015-10-08T15:24:02.007-0700 54900 18300/second
2015-10-08T15:24:05.004-0700 Progress: 15590853/5024041951 0%
2015-10-08T15:24:05.004-0700 96900 16150/second
Here is the 3.0 run:
-logbash-3.2$ mongoimport30 --version
mongoimport version: 3.0.6
git version: 7588eb887549bd5d2fc7bbc08f7c62d4b29b9d75
-logbash-3.2$ mongoimport30 --host --db media --collection media --upsertFields _id --type json --file /data/mediaingestor2.json --numInsertionWorkers 20000 -v
2015-10-08T15:53:04.393-0700 using upsert fields: [_id]
2015-10-08T15:53:04.393-0700 filesize: 5024041951 bytes
2015-10-08T15:53:04.393-0700 using fields:
2015-10-08T15:53:04.396-0700 connected to:
2015-10-08T15:53:04.396-0700 ns:
2015-10-08T15:53:04.396-0700 connected to node type: replset
2015-10-08T15:53:04.397-0700 using write concern: w='majority', j=false, fsync=false, wtimeout=0
2015-10-08T15:53:04.397-0700 using write concern: w='majority', j=false, fsync=false, wtimeout=0
2015-10-08T15:53:07.393-0700 [........................] 1.5 MB/4.7 GB (0.0%)
2015-10-08T15:53:10.393-0700 [........................] 1.5 MB/4.7 GB (0.0%)
2015-10-08T15:53:13.393-0700 [........................] 1.5 MB/4.7 GB (0.0%)
2015-10-08T15:53:16.393-0700 [........................] 1.5 MB/4.7 GB (0.0%)
2015-10-08T15:53:19.393-0700 [........................] 1.5 MB/4.7 GB (0.0%)
On the MongoDB side, mongostat shows only around 400 updates per second, far fewer than the ~16K rows per second from the 2.6.4 run above. Note that I also tried --numInsertionWorkers 20000, which is supposed to make the import faster, but it performs about the same as without the option. Could the build (git version) I am using be a bad one?
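
For a rough sense of scale, the figures from the two runs can be turned into a slowdown factor and an estimated total import time. This is only a back-of-the-envelope sketch, not a measurement: the 400 updates per second is the approximate mongostat reading and is assumed to stay steady, rows are assumed to be roughly uniform in size, and all variable names are made up for illustration.

```python
# Figures copied from the logs in this post.
FILE_SIZE_BYTES = 5024041951   # total size of the JSON file
BYTES_READ = 15590853          # bytes consumed at the second 2.6.4 progress sample
ROWS_IMPORTED = 96900          # rows imported at that same sample
RATE_26 = 16150                # rows/second reported by mongoimport 2.6.4
RATE_30 = 400                  # approximate updates/second seen in mongostat
                               # (assumption: this rate stays steady)

# How many times slower the 3.0 run is.
slowdown = RATE_26 / RATE_30

# Assumption: rows are roughly uniform in size, so the total row count can be
# extrapolated from the bytes-per-row ratio of the sample.
bytes_per_row = BYTES_READ / ROWS_IMPORTED
est_rows = FILE_SIZE_BYTES / bytes_per_row

# Estimated wall-clock time for the whole file at each rate.
fast_seconds = est_rows / RATE_26
slow_seconds = est_rows / RATE_30

print(f"slowdown: about {slowdown:.0f}x")
print(f"estimated rows: {est_rows:,.0f}")
print(f"2.6.4: about {fast_seconds / 60:.0f} minutes")
print(f"3.0.6: about {slow_seconds / 3600:.1f} hours")
```

With these inputs the ratio comes out near 40x, so the "more than 20 times" figure is on the conservative side if the mongostat reading is representative.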