Python json.loads shows ValueError: Extra data

Asked 11/1, 2014 at 5:36 Answered 19/9, 2023 at 20:10

230

I am getting some data from a JSON file "new.json", and I want to filter some data and store it into a new JSON file. Here is my code:

import json
with open('new.json') as infile:
    data = json.load(infile)
for item in data:
    iden = item.get("id")
    a = item.get("a")
    b = item.get("b")
    c = item.get("c")
    if c == 'XYZ' or "XYZ" in item["text"]:
        filename = 'abc.json'
    try:
        outfile = open(filename,'ab')
    except:
        outfile = open(filename,'wb')
    obj_json={}
    obj_json["ID"] = iden
    obj_json["VAL_A"] = a
    obj_json["VAL_B"] = b

And I am getting an error, the traceback is:

  File "rtfav.py", line 3, in <module>
    data = json.load(infile)
  File "/usr/lib64/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/lib64/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 88 column 2 - line 50607 column 2 (char 3077 - 1868399)

Here is a sample of the data in new.json; there are about 1500 more such dictionaries in the file:

{
    "contributors": null, 
    "truncated": false, 
    "text": "@HomeShop18 #DreamJob to professional rafter", 
    "in_reply_to_status_id": null, 
    "id": 421584490452893696, 
    "favorite_count": 0, 
    "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Mobile Web (M2)</a>", 
    "retweeted": false, 
    "coordinates": null, 
    "entities": {
        "symbols": [], 
        "user_mentions": [
            {
                "id": 183093247, 
                "indices": [
                    0, 
                    11
                ], 
                "id_str": "183093247", 
                "screen_name": "HomeShop18", 
                "name": "HomeShop18"
            }
        ], 
        "hashtags": [
            {
                "indices": [
                    12, 
                    21
                ], 
                "text": "DreamJob"
            }
        ], 
        "urls": []
    }, 
    "in_reply_to_screen_name": "HomeShop18", 
    "id_str": "421584490452893696", 
    "retweet_count": 0, 
    "in_reply_to_user_id": 183093247, 
    "favorited": false, 
    "user": {
        "follow_request_sent": null, 
        "profile_use_background_image": true, 
        "default_profile_image": false, 
        "id": 2254546045, 
        "verified": false, 
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "profile_sidebar_fill_color": "171106", 
        "profile_text_color": "8A7302", 
        "followers_count": 87, 
        "profile_sidebar_border_color": "BCB302", 
        "id_str": "2254546045", 
        "profile_background_color": "0F0A02", 
        "listed_count": 1, 
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", 
        "utc_offset": null, 
        "statuses_count": 9793, 
        "description": "Rafter. Rafting is what I do. Me aur mera Tablet.  Technocrat of Future", 
        "friends_count": 231, 
        "location": "", 
        "profile_link_color": "473623", 
        "profile_image_url": "http://pbs.twimg.com/profile_images/413952088880594944/rcdr59OY_normal.jpeg", 
        "following": null, 
        "geo_enabled": false, 
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/2254546045/1388065343", 
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", 
        "name": "Jayy", 
        "lang": "en", 
        "profile_background_tile": false, 
        "favourites_count": 41, 
        "screen_name": "JzayyPsingh", 
        "notifications": null, 
        "url": null, 
        "created_at": "Fri Dec 20 05:46:00 +0000 2013", 
        "contributors_enabled": false, 
        "time_zone": null, 
        "protected": false, 
        "default_profile": false, 
        "is_translator": false
    }, 
    "geo": null, 
    "in_reply_to_user_id_str": "183093247", 
    "lang": "en", 
    "created_at": "Fri Jan 10 10:09:09 +0000 2014", 
    "filter_level": "medium", 
    "in_reply_to_status_id_str": null, 
    "place": null
}

Cereal answered 11/1, 2014 at 5:36 Comment(4)

This is the error you get whenever the input JSON has more than one object per line. Many of the answers here assume there is only one object per line, or construct examples obeying that, but would break if that wasn't the case. – Kuebbing 3/1, 2020 at 14:17

@Kuebbing: Can you explain what you mean by "more than one object per line"? – Quintin 18/2, 2020 at 9:11

@Kuebbing I think you meant "more than one line per object"? – Anu 4/2, 2023 at 8:21

Yes, "more than one line per object", silly me... – Kuebbing 7/2, 2023 at 23:47

208

As you can see in the following example, json.loads (and json.load) does not decode multiple JSON objects:

>>> json.loads('{}')
{}
>>> json.loads('{}{}') # == json.loads(json.dumps({}) + json.dumps({}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 368, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 3 - line 1 column 5 (char 2 - 4)

If you want to dump multiple dictionaries, wrap them in a list and dump the list (instead of dumping the dictionaries one at a time):

>>> dict1 = {}
>>> dict2 = {}
>>> json.dumps([dict1, dict2])
'[{}, {}]'
>>> json.loads(json.dumps([dict1, dict2]))
[{}, {}]
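
Applied to a file, the same idea might look like the sketch below. The file name `tweets_list.json` and the sample records are placeholders rather than anything from the question, and a temporary directory is used only to keep the example self-contained.

```python
import json
import os
import tempfile

# Hypothetical sample records standing in for the real tweets.
records = [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]

# Placeholder output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tweets_list.json")

# Dump the whole list once, so the file holds a single JSON document.
with open(path, "w") as outfile:
    json.dump(records, outfile)

# json.load can now read the file back in one call.
with open(path) as infile:
    loaded = json.load(infile)
```

Because the file contains exactly one top-level value (the list), `json.load` has no trailing data to complain about.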

Spiracle answered 11/1, 2014 at 5:39 Comment(18)

Can you please explain again with reference to the code I gave above? I am a newbie, and at times take long to grasp such things. – Cereal 11/1, 2014 at 5:43

@ApoorvAshutosh, It seems that new.json contains one JSON document followed by additional data. json.load and json.loads can only decode a single JSON document; they raise a ValueError when they encounter additional data, as you saw. – Spiracle 11/1, 2014 at 5:48

Have pasted a sample from new.json, and I am filtering out some data from it, so I don't get where I am getting extra data from – Cereal 11/1, 2014 at 5:50

@ApoorvAshutosh, You said 1500 more such dictionaries in the edited question. That's the additional data. If you're the one who made a new.json, just put a single json in a file. – Spiracle 11/1, 2014 at 5:51

@ApoorvAshutosh, If you need to dump multiple dictionaries as json, wrap them in a list, and dump the list. – Spiracle 11/1, 2014 at 5:53

the issue here is not about loading into a JSON file, that has already happened. Can you tell me how to retrieve data from there? I already have a file that has dictionaries in it. I now have to retrieve each of those dictionaries. https://mcmap.net/q/119922/-python-json-parser-closed – Cereal 11/1, 2014 at 7:19

@ApoorvAshutosh, BTW, trailing ',' is missing in the json (in the new question). (at the line "x": []) => invalid json. – Spiracle 11/1, 2014 at 7:27

sure, asap. And could you just look into one more thing, as I said, about how to read from a file with multiple dictionaries – Cereal 11/1, 2014 at 7:27

@ApoorvAshutosh, I'm researching that issue. I will post an answer there once I'm done. – Spiracle 11/1, 2014 at 7:28

That's just a sample, I mentioned it in a comment – Cereal 11/1, 2014 at 7:28

@ApoorvAshutosh, Please post a valid sample! – Spiracle 11/1, 2014 at 7:29

@ApoorvAshutosh, No, I mean the sample in the new question. – Spiracle 11/1, 2014 at 7:31

It's for this very sample, the structure of the dictionaries is basically the same. However, I'll edit that question with this very sample – Cereal 11/1, 2014 at 7:32

@ApoorvAshutosh, I posted an answer that works around the issue. Check it out. – Spiracle 11/1, 2014 at 7:49

Can I ask why it still works when I use json.dump instead of json.dumps? I am using Python 3.5.2 – Sera 23/9, 2016 at 0:30

@ShuruiLiu, Please post a separate question. – Spiracle 23/9, 2016 at 17:0

I have an issue like this with JSON from a web scrape. I ran it through a linter to check whether it is valid JSON, and it seems to be, so why would this error still occur? – Coarse 15/7, 2017 at 18:56

I tried this option, but I found another useful way to get all items: file.readlines(), which returns a list of lines. – Choreograph 4/2, 2021 at 20:31

209

Iterate over the file, loading each line as JSON in the loop:

import json

tweets = []
with open('tweets.json', 'r') as file:
    for line in file:
        # each line must hold one complete JSON object
        tweets.append(json.loads(line))

This avoids storing intermediate Python objects. As long as the file was written with one complete tweet per line, this should work.
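
For reference, a file in that one-object-per-line shape could be produced and read back roughly as follows. This is only a sketch: the sample records and the `tweets.jsonl` path are made up for illustration, and the temporary directory is there just so the snippet runs on its own.

```python
import json
import os
import tempfile

# Hypothetical records; the real data would be the tweets.
tweets_to_write = [{"id": 1, "text": "a\nb"}, {"id": 2, "text": "c"}]

# Placeholder path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")

# Write one complete JSON object per line. json.dumps (without indent)
# escapes newlines inside strings, so each object stays on a single line.
with open(path, "a") as outfile:
    for tweet in tweets_to_write:
        outfile.write(json.dumps(tweet) + "\n")

# Read it back line by line, skipping any blank lines.
tweets = []
with open(path) as infile:
    for line in infile:
        if line.strip():
            tweets.append(json.loads(line))
```

Note that pretty-printing with `indent=` would spread each object over several lines and break the line-by-line reader.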

Lighten answered 28/3, 2015 at 1:27 Comment(3)

The accepted answer addresses how to fix the source of the problem if you control the process of exporting, but if you are using someone else's data and you just have to deal with it, this is a great low-overhead method. – Duralumin 12/3, 2017 at 2:57

Many datasets (e.g. the Yelp dataset) are nowadays provided as a "set" of JSON objects, and your approach is a convenient way to load them. – Traditional 25/1, 2018 at 0:15

This only works for inputs that have one complete JSON object per line. That is a common input format (it is not JSON, but a related format sometimes called either JSONL or NDJSON), but it is not what is shown in the OP. – Anu 4/2, 2023 at 8:19
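
For input shaped like the OP's, where each object is pretty-printed across many lines and the objects simply follow one another, one possible approach is the standard library's `json.JSONDecoder.raw_decode`, which decodes one value and reports where it ended. The helper name `iter_json_objects` and the sample string below are hypothetical; this is a sketch that assumes the whole file fits in memory as a single string.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value found in `text`, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the decoded value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Two pretty-printed objects back to back (made-up sample data).
sample = '{\n    "id": 1,\n    "text": "XYZ"\n}\n{\n    "id": 2,\n    "text": "abc"\n}\n'
objects = list(iter_json_objects(sample))
```

With a real file, the text would come from something like `open('new.json').read()`, and each yielded dictionary could then be filtered as in the question.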
