mongoimport stuck at same point while importing a JSON file

OS: Ubuntu 14.04 LTS
MongoDB version: 3.0.7 (output of rs0:PRIMARY> db.version())
Storage engine: WiredTiger

I am importing a JSON file (13 GB, one JSON document per line) with this command:

$ mongoimport --db InfDB --collection SFTest --file InfMapRed.json

This command used to work fine on 2.6 with the prior storage engine, but now it does not progress beyond 0.2%. The program keeps printing the lines below over and over, and the collection shows only 1000 records via .count().

    2015-10-24T06:11:41.799+0000    connected to: localhost
    2015-10-24T06:11:44.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:11:47.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:11:50.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:11:53.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:11:56.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:11:59.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:02.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:05.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:08.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:11.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:14.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:17.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:20.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
    2015-10-24T06:12:23.788+0000    [........................] InfDB.SFTest       20.5 MB/13.0 GB (0.2%)
....

I wrote a simple Python script that reads the file and inserts the documents line by line, and that works fine (a sketch of that approach is shown below).
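
A minimal sketch of that kind of script, assuming pymongo is installed; the database, collection, and file names match the command above, and the batch size is arbitrary:

    # Read one JSON document per line and insert in small batches.
    import json
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    collection = client["InfDB"]["SFTest"]

    batch = []
    with open("InfMapRed.json") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            batch.append(json.loads(line))
            if len(batch) >= 1000:
                collection.insert_many(batch)
                batch = []
    if batch:
        collection.insert_many(batch)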

Phaedrus answered 24/10, 2015 at 6:32 Comment(3)
Did you ever find a solution to this? – Absorptivity
Nope, used my script. I've not been able to reproduce it since, and mongoimport works fine for me with the latest tools. There are certain APIs that don't behave the same under WiredTiger, so maybe they updated the tool? Honestly, I'm not sure, since I was able to work around it at the time. – Phaedrus
Just remembered this when faced with the problem again. What worked was using '--batchSize 100' (the default is 10000). It seems that for large documents you need to reduce the batch size. – Phaedrus

Using a smaller batch size solved this.

mongoimport --db InfDB --collection SFTest --file InfMapRed.json --batchSize 100

This is useful when importing large documents; the default batch size is 10000.

Phaedrus answered 11/4, 2016 at 13:59 Comment(0)

I've had this issue with large JSON files; --batchSize did not fix it, but --numInsertionWorkers did.

This option works for MongoDB 3.x only.

In your case, with 1 worker you were able to insert 0.2% of the data, so 100 / 0.2 = 500: you would need 500 insertion workers to load the data in one go.

mongoimport --db InfDB --collection SFTest --file InfMapRed.json --numInsertionWorkers 500       

reference: https://docs.mongodb.com/manual/reference/program/mongoimport/#cmdoption--numInsertionWorkers

Aimo answered 6/5, 2016 at 22:19 Comment(0)

Check the secondaries. When I had this problem in v3.0.8, the secondaries were stuck in the RECOVERING state, and the logs showed why:

2015-11-19T00:35:01.271+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:42:16.360+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:45:01.410+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:52:16.496+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-11-19T00:55:01.551+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up

By default, mongoimport operates with "majority" write concern. Since my secondaries were stale, they couldn't replicate the import operations, and the primary was waiting around for replication that could never occur.

After performing a manual resync on the secondaries, I attempted a mongoimport again with success. Alternatively, if only one of your secondaries is RECOVERING, you could set the write concern to a low number with the --writeConcern option.
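
For example, a run that waits only for acknowledgement from the primary might look like the line below; the "{w: 1}" value is illustrative, so adjust it to the number of members you actually want to wait for:

mongoimport --db InfDB --collection SFTest --file InfMapRed.json --writeConcern "{w: 1}"

You can check the state of each replica set member beforehand with rs.status() in the mongo shell.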

Synonym answered 21/3, 2016 at 15:29 Comment(0)
