Convert to date MongoDB via mongoimport
Asked Answered
M

3

10

I have downloaded huge chunks of data in the format in csv. I am using mongoimport to enter the data into MongoDB for processing. How do I get the date into date format recognized by MongoDB?

sample data with header

Date, Open Price, High Price, Low Price, Last Traded Price , Close Price, Total Traded Quantity, Turnover (in Lakhs)
04-Apr-2014,901,912,889.5,896.75,892.85,207149,1867.08
03-Apr-2014,908,918,897.65,900,900.75,156260,1419.9
02-Apr-2014,916,921.85,898,900.7,900.75,175990,1591.97
Menswear answered 6/4, 2014 at 4:43 Comment(3)
Do you actually mean mongoimport or are referring to some other tool?Soloman
I mean mongoimport. Its works fine if there is someway I can do it after getting it into the Db. Or if there is entirely different way to get the data into the Db with the date in its recognizable format.Menswear
This answer https://mcmap.net/q/959946/-importing-date-datatype-using-mongoimport to a similar question includes a one-line mongo command to convert strings to dates.Hooker
I
20

As far as I know, there is no way to do this with mongoimport.

But this is achievable by importing the data and then running the following script (note that there is no point of all this hastle with a monthes as in Neil's Lunn script, because mongo can properly convert your date by doing this new Date('04-Apr-2014')):

db.collName.find().forEach(function(el){
    el.dateField = new Date(el.dateField);
    db.collName.save(el)
});

PS If timezone is so important (I assume that it is not, if there are only dates without time information), you can just change timezone on your local machine and then run the query. (Thanks to Neil Lunn for clarification regarding this)

Imogeneimojean answered 6/4, 2014 at 5:27 Comment(1)
It's so irritating that mongoexport in CSV mode throws away type information like this! (facepalm) Basically, it's useless unless your data consists of just strings and numbers.Fanatic
S
7

As of Mongo version 3.4, you can use --columnsHaveTypes option to specify the type of your field while using mongoimport to import your data. here is the link for reference.

Sample mongoimport syntax below:

mongoimport --db XYZ --collection abc --type tsv --fields id.int32(),client_name.string(),app_name.auto(),date.date() --columnsHaveTypes --file "abc.tsv" --verbose
Stump answered 29/3, 2017 at 6:19 Comment(2)
For large datasets (mine was 130M objects), re-importing from the source using this was faster (50min) vs the forEach response below (20h). You have to specify the date format though, see this post.Sufi
Note, this only works for TSV and CSV filesAbducent
S
6

You basically have three options here as though you can import CSV directly using mongoimport, it has no idea how to convert dates from this format.

  1. Convert your CSV input to JSON format by whatever means. For your date values you can use the extended JSON syntax form that will be recognized by the tool. The resulting JSON you produce can then be passed to mongoimport.

  2. Write your own program to import the data by reading your CSV input and doing the correct conversions.

  3. Import the CSV content as is, and then manipulate the data directly in your MongoDB collection using your language of choice.

One take on the third option would be to loop the results and update the dates accordingly:

var months = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun", 
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
];

db.collection.find({ },{ "Date": 1 }).forEach(function(doc){

    var splitDate = doc.Date.split("-");

    var mval = months.indexOf( splitDate[1] );
    mval = ( mval < 10 ) ?  "0" + mval : mval

    var newDate = new Date( splitDate[2] + "-" + mval + "-" + splitDate[0] );

    db.collection.update(
        { _id: doc._id },
        { "$set": { "Date": newDate } }
    );

})

And that would make sure your dates are then converted to the correct BSON date format with the same matching date values you are expected.

Beware of "local" timezone conversions, you will want to be storing as UTC time.

Soloman answered 6/4, 2014 at 5:23 Comment(4)
I think that all this manipulation with months is redundant. Because in shell you can do the following new Date('04-Apr-2014') and this will give you a valid ISODate : ISODate("2014-04-04T07:00:00Z"). If anything, check my answer.Imogeneimojean
@SalvadorDali Actually that will be timezone independent. Notice the hour value in your date you show. Hence the manipulationSoloman
If timezone is so important (I assume that it is not, if there are only dates without time information), you can just change timezone on your local machine and then run the query. But thank you for showing me about time issues in mongo. I did not know this.Imogeneimojean
@SalvadorDali Not a good idea. You should always expect the data stored be be as UTC. Local conversions should be done "client" (to database) side as you could be representing to multiple timezones. If you have a date as the 1st of April, then that should be the UTC representation, not something that unexpedly offsets.Soloman

© 2022 - 2024 — McMap. All rights reserved.