Iterating through PyMongo cursor throws InvalidBSON: year is out of range
Asked Answered
G

4

5

I am using PyMongo to simply iterate over a Mongo collection, but I'm struggling with handling large MongoDB date objects.

For example, if I have some data in a collection that looks like this:

"bad_data" : [ 
            {
                "id" : "id01",
                "label" : "bad_data",
                "value" : "exist",
                "type" : "String",
                "lastModified" : ISODate("2018-06-01T10:04:35.000Z"),
                "expires" : Date(9223372036854775000)
            }
        ]

I will do something like:

from pymongo import MongoClient

client = MongoClient('localhost')
db = client['db1']
db.authenticate('user', 'pass', source='admin')
collection = db['collection']
for i in collection.find():
    # do something with i

and get the error InvalidBSON: year 292278994 is out of range

Is there any way I can handle dealing with this ridiculous Date() object without bson falling over? I realise that having such a date in MongoDB is crazy, but there is nothing I can do about this as it's not my data.

Goins answered 6/11, 2018 at 18:10 Comment(0)
L
7

There actually is a section in the PyMongo FAQ about this very topic:

Why do I get OverflowError decoding dates stored by another language’s driver?

PyMongo decodes BSON datetime values to instances of Python’s datetime.datetime. Instances of datetime.datetime are limited to years between datetime.MINYEAR (usually 1) and datetime.MAXYEAR (usually 9999). Some MongoDB drivers (e.g. the PHP driver) can store BSON datetimes with year values far outside those supported by datetime.datetime.

So the basic constraint here is on the datetime.datetime type that the driver maps BSON dates onto, and though it might be "ridiculous", it is perfectly valid for other languages' drivers to create such a date value.

As the FAQ points out, your general workarounds are:

  1. Deal with the offending BSON Date. Whilst valid to store, it possibly was not the "true" intention of whoever/whatever stored it in the first place.

  2. Add a "date range" condition to your code to filter "out of range" dates:

    from datetime import datetime

    result = db['collection'].find({
      'expires': { '$gte': datetime.min, '$lte': datetime.max }
    })
    for i in result:
      # do something with i
    
  3. Omit the offending date field in projection if you don't need the data in further processing:

    result = db['collection'].find({}, projection={ 'expires': False })
    for i in result:
      # do something with i
    

Certainly the name 'expires' suggests the original intent of the value was a date so far into the future that it would never arrive, with the original author of that data ( and very possibly current code still writing it ) not being aware of the "Python" date constraint. So it's probably quite safe to "lower" that number in all documents, and wherever any code is still writing it.

Lezley answered 6/11, 2018 at 20:17 Comment(4)
Thanks for this. I have also figured out a way you can iterate over the result and skip any results that return InvalidBSON by calling the __getitem__ function.Goins
@Goins - any chance you have an example of this check?Hebel
This is good but what if the field is dynamic. Would be nice to have a way to omit problem fields unknown or create a custom decoder.Burhans
The link is broken here. The FAQ is now here: pymongo.readthedocs.io/en/stable/…Peridium
V
1

I had the same issue with a PyMongo query. When I tried to query my data I got this error:

year 53427 is out of range

because some of my values were in Unix time and some in ISO time.

My solution:

  1. First, mark the Unix-time values with 1 and the normal dates with 0, using a condition for dates greater than a future date (2023):
{$addFields: {
    Unix_time_index: {$cond: 
        { if: { $gt: [ "$mytime", ISODate("2023-12-12T00:00:00.000Z") ] },
          then: 1,
          else: 0 }},
        }},
  2. then change the time to a Long-type number (only for the Unix-time values):
{$addFields:{
    mytime: {$cond:
        { if: { $eq: [ "$Unix_time_index", 1 ] },
          then: {$toLong: "$mytime"},
          else: "$mytime" }},
        }},
  3. then divide the long Unix time by 1000:
{$addFields:{
    mytime: {$cond:
        { if: { $eq: [ "$Unix_time_index", 1 ] },
          then: { $divide: [ "$mytime",1000 ] },
          else: "$mytime" }},
                }},
  4. and at the end, convert the long Unix number back into a regular date:
{$addFields:{
mytime: {$cond: 
        { if: { $eq: [ "$Unix_time_index", 1 ] },
          then: {$toDate: "$mytime"},
          else: "$mytime" }}
                }},
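The four `$addFields` stages above can also be collapsed into a single stage. Here is a sketch of the same logic written as a PyMongo-style pipeline (field name `mytime` and the 2023 cutoff taken from the answer; run it with `collection.aggregate(pipeline)` against a live collection):

```python
from datetime import datetime

cutoff = datetime(2023, 12, 12)

# One $addFields stage: if mytime is past the cutoff, it was really a
# Unix-epoch number stored as a date (microseconds interpreted as
# milliseconds), so $toLong recovers the raw number, $divide by 1000
# yields true milliseconds, and $toDate rebuilds a sane BSON date.
pipeline = [
    {'$addFields': {
        'mytime': {'$cond': {
            'if': {'$gt': ['$mytime', cutoff]},
            'then': {'$toDate': {'$divide': [{'$toLong': '$mytime'}, 1000]}},
            'else': '$mytime',
        }},
    }},
]
print(pipeline)
```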

results:

/* 1 */
{
    "_id" : ObjectId("606b5d48ab86b4002130f944"),
    "mytime" : ISODate("2021-06-18T21:52:51.000Z"),
    "before" : Date(1624053171000000),
    "Unix_time_index" : 1.0
}

/* 2 */
{
    "_id" : ObjectId("606b5d48ab86b4002130f944"),
    "mytime" : ISODate("2021-06-18T21:52:47.000Z"),
    "before" : Date(1624053167000000),
    "Unix_time_index" : 1.0
}

/* 3 */
{
    "_id" : ObjectId("606b5d48ab86b4002130f944"),
    "mytime" : ISODate("2022-01-27T23:44:46.000Z"),
    "before" : ISODate("2022-01-27T23:44:46.000Z"),
    "Unix_time_index" : 0.0
}
Varnish answered 31/3, 2022 at 15:37 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Ahl
U
0

I've just had the same issue:

As Neil mentioned, and I quote:

PyMongo decodes BSON datetime values to instances of Python’s datetime.datetime. Instances of datetime.datetime are limited

After looking closely at my documents, I noticed some date values look like this: 0000-12-30T00:00:00

In my case, the best approach is to validate documents prior to inserting them into the collection: if the date's year is 0, setting it to 1970 solved the issue.
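A sketch of that validation in Python (hypothetical helper; it assumes the raw dates arrive as ISO-8601-style strings before insertion):

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

def parse_or_epoch(raw):
    """Parse a timestamp string; dates with year 0 (which Python's
    datetime cannot represent) fall back to the Unix epoch."""
    try:
        return datetime.strptime(raw, '%Y-%m-%dT%H:%M:%S')
    except ValueError:
        return EPOCH

print(parse_or_epoch('0000-12-30T00:00:00'))  # 1970-01-01 00:00:00
print(parse_or_epoch('2018-06-01T10:04:35'))  # 2018-06-01 10:04:35
```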


Unbar answered 28/12, 2020 at 16:34 Comment(0)
C
0

This was driving me crazy. I was running a similar script on Amazon Linux and kept getting

year 20245 is out of range

Then I ran it on Ubuntu WSL on Windows 11 and got the same error, but with some useful information:

year 20245 is out of range (Consider Using CodecOptions(datetime_conversion=DATETIME_AUTO) or MongoClient(datetime_conversion='DATETIME_AUTO')). See: https://pymongo.readthedocs.io/en/stable/examples/datetimes.html#handling-out-of-range-datetimes

And indeed, creating the MongoClient with the option datetime_conversion="DATETIME_AUTO" solved the problem for me:

client = MongoClient(CONNECTION_STRING, datetime_conversion="DATETIME_AUTO")
Ciapas answered 29/4 at 8:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.