mongoimport 3.0 is slower than 2.6.4

To my surprise, importing the same file into the same MongoDB server (3.0) is much slower (more than 20 times) with mongoimport 3.0 than with 2.6.4. Has anyone else seen this problem, and how can it be fixed?

Here are the details:

  1. mongoimport 2.6.4 loads around 16K rows per second for the same JSON file:

    -logbash-3.2$ mongoimport --host mcp-mongo-dev-1201.sea2.rhapsody.com:27017 --db media 
            --collection media --upsert --upsertFields _id --type json --file /data/xxx.json
    

    connected to: mcp-mongo-dev-1201.sea2.rhapsody.com:27017
    2015-10-08T15:24:02.007-0700            Progress: 8860712/5024041951    0%
    2015-10-08T15:24:02.007-0700                    54900   18300/second
    2015-10-08T15:24:05.004-0700            Progress: 15590853/5024041951   0%
    2015-10-08T15:24:05.004-0700                    96900   16150/second
    
  2. Here is the 3.0 run:

    -logbash-3.2$ mongoimport30 --version
    
    mongoimport version: 3.0.6
    git version: 7588eb887549bd5d2fc7bbc08f7c62d4b29b9d75
    
    -logbash-3.2$ mongoimport30 --host mcp-mongo-dev-1201.sea2.rhapsody.com:27017 --db media 
          --collection media --upsertFields _id --type json --file /data/mediaingestor2.json  --numInsertionWorkers 20000 -v
    

    2015-10-08T15:53:04.393-0700    using upsert fields: [_id]
    2015-10-08T15:53:04.393-0700    filesize: 5024041951 bytes
    2015-10-08T15:53:04.393-0700    using fields: 
    2015-10-08T15:53:04.396-0700    connected to: mcp-mongo-dev-1201.sea2.rhapsody.com:27017
    2015-10-08T15:53:04.396-0700    ns: media.media
    2015-10-08T15:53:04.396-0700    connected to node type: replset
    2015-10-08T15:53:04.397-0700    using write concern: w='majority', j=false, fsync=false, wtimeout=0
    2015-10-08T15:53:04.397-0700    using write concern: w='majority', j=false, fsync=false, wtimeout=0
    2015-10-08T15:53:07.393-0700    [........................] media.media  1.5 MB/4.7 GB (0.0%)
    2015-10-08T15:53:10.393-0700    [........................] media.media  1.5 MB/4.7 GB (0.0%)
    2015-10-08T15:53:13.393-0700    [........................] media.media  1.5 MB/4.7 GB (0.0%)
    2015-10-08T15:53:16.393-0700    [........................] media.media  1.5 MB/4.7 GB (0.0%)
    2015-10-08T15:53:19.393-0700    [........................] media.media  1.5 MB/4.7 GB (0.0%)
    

On the MongoDB side, mongostat showed around 400 updates per second, far fewer than the ~16K per second from the 2.6.4 run above. I also tried --numInsertionWorkers 20000, which is supposed to make the import faster, but the result appears to be the same as without the option. Could the build (git version) I am using be a bad one?

Marxmarxian answered 8/10, 2015 at 22:56 Comment(2)
wow - my math was wrong. question still stands but ingestion rates are: mongoimport 2 @ 10k docs/sec and mongoimport 3 @ 700 docs/sec ... still 2 orders of magnitude but closer to 1... (Telescopium)
if i use --writeConcern '{w: 1, j: false}' with mongoimport 3 i can get about 1400 docs/sec ... w:0 doesn't seem to improve performance over w:1 for mongoimport 3 either. (Telescopium)

Running mongoimport with 20,000 numInsertionWorkers is excessive. The application may be losing performance to the heavy context switching needed to support so many threads. The right number of workers is going to be closer to the number of cores on the machine that you're running mongoimport on. You can find the right number through testing. Start with a single worker, monitor the performance, and then double the number in each successive test [1,2,4,8,16,...]. You'll eventually reach a number at which performance no longer improves. At that point you have gone past the right number of workers, so use the previous value.
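The doubling test described above can be sketched in Python. This is only an illustration under stated assumptions: `find_worker_plateau`, `mongoimport_args`, the `measure` callback, and the 5% `min_gain` threshold are hypothetical names and values chosen for the example, not part of mongoimport. `measure` stands for whatever you use to time one import run and return documents per second.

```python
from typing import Callable, List, Tuple


def find_worker_plateau(
    measure: Callable[[int], float],
    max_workers: int = 64,
    min_gain: float = 0.05,
) -> Tuple[int, List[Tuple[int, float]]]:
    """Double the worker count until throughput stops improving.

    measure(n) must run one import with n workers and return docs/second.
    Returns the last worker count that still gave a meaningful improvement,
    plus the full (workers, rate) history of the runs performed.
    """
    history: List[Tuple[int, float]] = []
    best_workers, best_rate = 0, 0.0
    workers = 1
    while workers <= max_workers:
        rate = measure(workers)
        history.append((workers, rate))
        # Stop once the gain over the best run so far falls below min_gain
        # (min_gain is an assumed threshold, tune it for your noise level).
        if best_workers and rate < best_rate * (1.0 + min_gain):
            break
        best_workers, best_rate = workers, rate
        workers *= 2
    return best_workers, history


def mongoimport_args(host: str, db: str, collection: str,
                     path: str, workers: int) -> List[str]:
    """Build the command line for one test run, using flags from the question."""
    return [
        "mongoimport", "--host", host, "--db", db,
        "--collection", collection, "--type", "json", "--file", path,
        "--numInsertionWorkers", str(workers),
    ]
```

One way to supply `measure` would be to time `subprocess.run(mongoimport_args(...))` against a fixed sample file and divide the document count by the elapsed seconds, resetting the target collection between runs so every test starts from the same state.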

When comparing performance between versions or processes, it is important to make sure the conditions haven't changed between test runs. It will be difficult to get a meaningful comparison between the two processes if either the servers or the network have changed from test to test.

Check that the database itself is in an identical state. For instance, there will be performance differences if your import workload is run against a database that already has data and preexisting indexes versus a database that is empty.
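A minimal sketch of how the starting state could be recorded before each run, so that two tests can be compared. The function names are hypothetical, and `collection` is assumed to behave like a pymongo `Collection` (`estimated_document_count()` needs pymongo 3.7 or later).

```python
from typing import Any, Dict


def snapshot_collection_state(collection: Any) -> Dict[str, Any]:
    """Record document count and index names before an import run.

    `collection` is assumed to be a pymongo Collection or an object
    exposing the same two methods.
    """
    return {
        "documents": collection.estimated_document_count(),
        "indexes": sorted(collection.index_information().keys()),
    }


def same_starting_state(run_a: Dict[str, Any], run_b: Dict[str, Any]) -> bool:
    """True when both runs started from the same document count and index set."""
    return run_a == run_b
```

Taking a snapshot right before the 2.6.4 run and another right before the 3.0 run makes it easy to spot a comparison where one import hit an empty collection and the other hit a populated, indexed one.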

Check that the file system and OS configurations are set properly. Our documentation lists a set of system configurations you should set for best performance. http://docs.mongodb.org/manual/administration/production-notes/

Check that the server on which you are running mongoimport is not saturated. Look for any competing processes which may be consuming resources such as CPU, memory and network bandwidth in competition with mongoimport. Similarly, check the server on which you are running mongod for competing processes which contend for server resources.

Check the number of queued readers and writers in mongostat. A low number of queued operations can indicate that the mongoimport process is the bottleneck. I suspect that the mongoimport process is bottlenecking upstream of the database.
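The same queue figures can also be read programmatically from the `serverStatus` command, under `globalLock.currentQueue`. The sketch below assumes that document layout; the helper names and the `queue_threshold` default are hypothetical, and the classification is a rough heuristic rather than a diagnosis.

```python
from typing import Any, Dict, Tuple


def queued_operations(server_status: Dict[str, Any]) -> Tuple[int, int]:
    """Return (queued readers, queued writers) from a serverStatus document."""
    queue = server_status.get("globalLock", {}).get("currentQueue", {})
    return int(queue.get("readers", 0)), int(queue.get("writers", 0))


def likely_bottleneck(server_status: Dict[str, Any],
                      queue_threshold: int = 1) -> str:
    """Rough heuristic: queued writers suggest mongod is the limit;
    an empty write queue suggests the limit is upstream of the database.

    queue_threshold is an assumed cutoff, not a documented value.
    """
    _, writers = queued_operations(server_status)
    return "mongod" if writers >= queue_threshold else "upstream of mongod"
```

With pymongo, the input document could be fetched while the import is running via `MongoClient(host).admin.command("serverStatus")`, sampled several times rather than once.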

Malm answered 9/10, 2015 at 5:25 Comment(0)
