Importing the Date datatype using mongoimport

I have many GB of data stored in a PostgreSQL database, and I need to import it into MongoDB. I did this using a CSV export and mongoimport.

There are date columns like '2011-06-25' in that CSV, and they have been imported as strings, not as MongoDate values, so I cannot search by date effectively.

I've found this: http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-Example%3AImportingInterestingTypes but the example says I need to use a JSON-structured file. Do I really need to export a JSON file from PostgreSQL?

If I do, how?

If I don't, how do I export a "MongoDate" through CSV?

Perkoff answered 25/6, 2011 at 5:11 Comment(1)
I guess it wouldn't be as fast as the native mongoimport tool, which is a crucial factor with this large amount of data. – Perkoff

Your options:

  • import the data as CSV and convert the date strings to Date() after the import (either in the mongo console or with a script written in your favorite language)

  • import your data as JSON and use the $date descriptor to turn the date strings into Date instances, as sketched below
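
A minimal sketch of the JSON route (the database, collection, and field names here are placeholders): each line of the import file is one document, and the $date descriptor tells mongoimport to store the field as a date. Depending on your mongoimport version, $date takes either an ISO-8601 string (as below) or milliseconds since the epoch.

{ "title" : "example", "created" : { "$date" : "2011-06-25T00:00:00Z" } }

Imported with:

mongoimport --db yourdbname --collection yourcollectionname --file data.json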

Conveyor answered 25/6, 2011 at 6:3 Comment(0)

Actually, the first option is pretty fast even with huge data sets. Here is an example using the mongo console:

/usr/bin/mongo yourdbname --eval "db.yourcollectionname.find().forEach(function(doc){ doc.yourdatefield = new ISODate(doc.yourdatefield); db.yourcollectionname.save(doc); });"
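
On MongoDB 4.2 or newer, the same conversion can be done server-side with a pipeline update and $toDate, which avoids pulling every document into the shell (a sketch using the same placeholder names; the $type filter makes it safe to re-run):

/usr/bin/mongo yourdbname --eval 'db.yourcollectionname.updateMany({ yourdatefield: { $type: "string" } }, [ { $set: { yourdatefield: { $toDate: "$yourdatefield" } } } ]);'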
Vallecula answered 14/2, 2013 at 13:55 Comment(5)
Looks like this is going to take a couple of hours on my collection of 65 million documents. I guess it is "pretty fast". – Buddie
Now it looks like it is going to take something like 10 or 20 hours. Not so fast. – Buddie
@Buddie I was wondering if this approach would be practical for me, but my collection has ~130 million documents. Sigh... – Resuscitate
As I recall, we ended up importing the documents from CSV, which was really fast, and then ran a query in MongoDB to loop through all documents and convert the string dates into real Date objects. That took a while, but it is a lot easier to work on the data once it is in the database, at least compared to an error-prone regex/sed command chain fired at the CSV before the import. – Buddie
This is what I ended up doing; it was a lot faster than iterating through in JS code. – Ghent

Write a script in your favorite language to import the data. This lets you control the data types and apply structural changes to the data as well.

Generally you will find that the mapping is not one-to-one when converting tabular data to a document database.

If you write a script, you may also want to support running multiple copies (or threads) of it to improve the import speed; a single-threaded sketch follows.
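
A minimal sketch of such a script in Node.js with the official mongodb driver (the file, database, collection, and column names are placeholders; a real importer would batch the inserts and handle quoted fields in the CSV):

const fs = require('fs');
const { MongoClient } = require('mongodb');

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const col = client.db('yourdbname').collection('yourcollectionname');

  // Parse a headerless two-column CSV: id,created (created like '2011-06-25').
  const docs = fs.readFileSync('data.csv', 'utf8')
    .split('\n')
    .filter(line => line.length > 0)
    .map(line => {
      const [id, created] = line.split(',');
      // Store the date as a real Date so range queries work after the import.
      return { id: parseInt(id, 10), created: new Date(created) };
    });

  await col.insertMany(docs); // one round trip for the batch, not one per row
  await client.close();
}

main().catch(console.error);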

Tamarisk answered 25/6, 2011 at 14:35 Comment(0)

When possible, re-importing from the source CSV should be faster. For a relatively large dataset (130M objects, source CSV ~6 GB), a mongoimport run took 40 minutes, but the forEach suggested by @Vallecula took 24h+ (at which point I canceled it).

To import from CSV and convert your date, I ran

mongoimport --db your-db-name --type csv --file your-file.csv --collection your-collection --fields timestamp.date\(2006-01-02\ 15:04:05.00000+00\),count.int32\(\) --columnsHaveTypes

The .date() part is tricky: you have to write that particular reference date (Jan 2nd, 2006, 15:04:05) in your source format. My timestamps looked like "2018-02-03 12:09:23.00000+00", which meant formatting the reference date in exactly that shape (as in the command above). See this for reference.
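
In other words, the format string is the reference date written in the same shape as the data:

source value:  2018-02-03 12:09:23.00000+00
format string: 2006-01-02 15:04:05.00000+00

Each component of the reference date (year 2006, month 01, day 02, hour 15, minute 04, second 05) stands in for the corresponding component of the real timestamps.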

Agreeable answered 11/4, 2018 at 11:31 Comment(4)
You actually put a date string into the date(...) field? – Ghent
Also, I think your answer is cut off? – Ghent
@Ghent Yeah, you have to put that exact date (Jan 2nd, 2006, 15:04:05) in the date field, formatted to match what the dates look like in your dataset. It's quite bizarre. Which part is cut off? The command section should scroll horizontally. – Agreeable
Re the cutoff: oh, maybe there's an odd wraparound. OK, I'll try that. Weird :/ – Ghent

I'm a little late, but the forEach version above can be made faster by orders of magnitude by using bulkWrite instead of save: about 0.25 ms per document instead of 1.2 ms.

Note: my data needed its strings cleaned up first (the .replace below), but that has no effect on the write timings.

Without bulkWrite, i.e. save:

mongo> i=0;
0
mongo> start= new Date();
ISODate("2023-05-15T07:20:30.231Z")
mongo> db.myCol.find().forEach( d => {
     if (typeof(d.timeslot)!="string"){ return; };
     d.timeslot= new ISODate(d.timeslot.replace(' UTC',''));
     db.myCol.save(d);
     i+= 1;
     if (i%1000==0) {
       end=new Date();
       diff= (end.valueOf()-start.valueOf());
       printjson({
         ms: diff,
         n: 1000,
         avg_ms: diff/1000
       });
       start=new Date();
     }
   });
{ "ms" : 12722, "n" : 1000, "avg_ms" : 12.722 }
{ "ms" : 1163, "n" : 1000, "avg_ms" : 1.163 }
{ "ms" : 1208, "n" : 1000, "avg_ms" : 1.208 }
{ "ms" : 1183, "n" : 1000, "avg_ms" : 1.183 }
{ "ms" : 1168, "n" : 1000, "avg_ms" : 1.168 }
:

With bulkWrite:

mongo> blk=[];
[ ]
mongo> start= new Date();
ISODate("2023-05-15T07:27:16.882Z")
mongo> db.myCol.find().forEach( d => {
     if (typeof(d.timeslot)!="string"){ return; };
     blk.push({ updateOne: { filter: {_id:d._id }, update: { $set: { timeslot: new ISODate(d.timeslot.replace(' UTC','')) } } } });
     if (blk.length > 0 && blk.length%1000==0) {
       res=db.myCol.bulkWrite(blk);
       end=new Date();
       diff= (end.valueOf()-start.valueOf());
       printjson({
         ms: diff,
         n: res.matchedCount,
         avg_ms: diff/res.matchedCount
       });
       start=new Date();
       blk=[];
     }
   });
mongo> db.myCol.bulkWrite(blk);
{ "ms" : 9745, "n" : 1000, "avg_ms" : 9.745 }
{ "ms" : 252, "n" : 1000, "avg_ms" : 0.252 }
{ "ms" : 231, "n" : 1000, "avg_ms" : 0.231 }
{ "ms" : 213, "n" : 1000, "avg_ms" : 0.213 }
{ "ms" : 209, "n" : 1000, "avg_ms" : 0.209 }
:
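
The trailing bulkWrite call flushes whatever is left in blk, since the last batch will usually hold fewer than 1000 updates; skip it if blk is empty, as bulkWrite rejects an empty operations array. If update order doesn't matter, passing {ordered: false} as a second argument to bulkWrite may speed things up further.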
Gi answered 15/5, 2023 at 7:34 Comment(0)
