Deserialize json array stream one item at a time

5

39

I serialize an array of large objects to a JSON HTTP response stream. Now I want to deserialize these objects from the stream one at a time. Are there any C# libraries that will let me do this? I've looked at Json.NET, but it seems I'd have to deserialize the complete array of objects at once.

[{large json object},{large json object},...]

Clarification: I want to read one json object from the stream at a time and deserialize it.

Glimpse answered 4/12, 2013 at 11:26 Comment(3)
Why exactly do you want such a behavior? – Ceresin
Because I don't want to keep the entire array of large objects in memory. – Glimpse
You'll probably need to use JsonTextReader from Json.NET and read the tokens in manually. – Marinate
62

In order to read the JSON incrementally, you'll need to use a JsonTextReader in combination with a StreamReader. But you don't necessarily have to read all the JSON manually from the reader: you should be able to leverage the LINQ-to-JSON API to load each large object from the reader so that you can work with it more easily.

For a simple example, say I had a JSON file that looked like this:

[
  {
    "name": "foo",
    "id": 1
  },
  {
    "name": "bar",
    "id": 2
  },
  {
    "name": "baz",
    "id": 3
  }
]

Code to read it incrementally from the file might look something like the following. (In your case you would replace the FileStream with your response stream.)

using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// ...

using (FileStream fs = new FileStream(@"C:\temp\data.json", FileMode.Open, FileAccess.Read))
using (StreamReader sr = new StreamReader(fs))
using (JsonTextReader reader = new JsonTextReader(sr))
{
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Load each object from the stream and do something with it
            JObject obj = JObject.Load(reader);
            Console.WriteLine(obj["id"] + " - " + obj["name"]);
        }
    }
}

Output of the above would look like this:

1 - foo
2 - bar
3 - baz
Resurrectionism answered 4/12, 2013 at 21:24 Comment(4)
This is essentially the same solution I came up with, except I do a new JsonSerializer().Deserialize<T>(reader); instead of JObject.Load. I'm not exactly sure how JsonTextReader manages the stream data, though. – Glimpse
You can always peruse the source code to find out. – Resurrectionism
This got me going in the right direction, but I had to do JsonSerializer.Create().Deserialize<T>(reader, desiredType); since JObject.Load never worked (nor threw an error -- it was most bizarre). – Mra
Remember that JObject.Load is not performant compared to serializer.Deserialize, so always use Deserialize. – Rebbecarebbecca
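As the comments above discuss, each array element can also be deserialized directly into a strongly typed object instead of an intermediate JObject. A minimal sketch, assuming a hypothetical Item class matching the sample JSON (and an in-memory string standing in for the response stream):

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

// Hypothetical type matching the sample JSON above.
public class Item
{
    public string name { get; set; }
    public int id { get; set; }
}

class Program
{
    static void Main()
    {
        // In practice, wrap the HTTP response stream in a StreamReader instead.
        var json = "[{\"name\":\"foo\",\"id\":1},{\"name\":\"bar\",\"id\":2}]";
        var serializer = new JsonSerializer();

        using (var sr = new StringReader(json))
        using (var reader = new JsonTextReader(sr))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    // Deserializes only the current object; the reader is left
                    // positioned after it, ready for the next array element.
                    Item item = serializer.Deserialize<Item>(reader);
                    Console.WriteLine(item.id + " - " + item.name);
                }
            }
        }
    }
}
```

This skips building the JToken tree, so only one typed object is materialized at a time.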
4

I have simplified one of the samples/tests of my parser/deserializer to address this question's use case more directly.

Here's the test data:

https://github.com/ysharplanguage/FastJsonParser/tree/master/JsonTest/TestData

(cf. fathers.json.txt)

And here's the sample code:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;

    // Our stuff (note: this is FastJsonParser's namespace,
    // not Microsoft's later System.Text.Json)
    using System.Text.Json;

//...

    public class FathersData
    {
        public Father[] fathers { get; set; }
    }

    public class Someone
    {
        public string name { get; set; }
    }

    public class Father : Someone
    {
        public int id { get; set; }
        public bool married { get; set; }
        // Lists...
        public List<Son> sons { get; set; }
        // ... or arrays for collections, that's fine:
        public Daughter[] daughters { get; set; }
    }

    public class Child : Someone
    {
        public int age { get; set; }
    }

    public class Son : Child
    {
    }

    public class Daughter : Child
    {
        public string maidenName { get; set; }
    }

//...

    static void FilteredFatherStreamTestSimplified()
    {
        // Get our parser:
        var parser = new JsonParser();

        // (Note this will be invoked thanks to the "filters" dictionary below)
        Func<object, object> filteredFatherStreamCallback = obj =>
        {
            Father father = (obj as Father);
            // Output only the individual fathers that the filters decided to keep (i.e., when obj.Type equals typeof(Father)),
            // but don't output (even once) the resulting array (i.e., when obj.Type equals typeof(Father[])):
            if (father != null)
            {
                Console.WriteLine("\t\tId : {0}\t\tName : {1}", father.id, father.name);
            }
            // Do not project the filtered data in any specific way otherwise,
            // just return it deserialized as-is:
            return obj;
        };

        // Prepare our filter, and thus:
        // 1) we want only the last five (5) fathers (array index in the resulting "Father[]" >= 29,995),
        // (assuming we somehow have prior knowledge that the total count is 30,000)
        // and for each of them,
        // 2) we're interested in deserializing them with only their "id" and "name" properties
        var filters = 
            new Dictionary<Type, Func<Type, object, object, int, Func<object, object>>>
            {
                // We don't care about anything but these 2 properties:
                {
                    typeof(Father), // Note the type
                    (type, obj, key, index) =>
                        ((key as string) == "id" || (key as string) == "name") ?
                        filteredFatherStreamCallback :
                        JsonParser.Skip
                },
                // We want to pick only the last 5 fathers from the source:
                {
                    typeof(Father[]), // Note the type
                    (type, obj, key, index) =>
                        (index >= 29995) ?
                        filteredFatherStreamCallback :
                        JsonParser.Skip
                }
            };

        // Read, parse, and deserialize fathers.json.txt in a streamed fashion,
        // and using the above filters, along with the callback we've set up:
        using (var reader = new System.IO.StreamReader(FATHERS_TEST_FILE_PATH))
        {
            FathersData data = parser.Parse<FathersData>(reader, filters);

            System.Diagnostics.Debug.Assert
            (
                (data != null) &&
                (data.fathers != null) &&
                (data.fathers.Length == 5)
            );
            foreach (var i in Enumerable.Range(29995, 5))
                System.Diagnostics.Debug.Assert
                (
                    (data.fathers[i - 29995].id == i) &&
                    !String.IsNullOrEmpty(data.fathers[i - 29995].name)
                );
        }
        Console.ReadKey();
    }

The rest of the bits are available here:

https://github.com/ysharplanguage/FastJsonParser

HTH,

Arawak answered 19/5, 2014 at 0:29 Comment(0)
1

I know that the question is old, but it comes up in Google searches, and I needed the same thing recently. Another way to deal with stream deserialization is to use System.Text.Json's JsonSerializer.DeserializeAsyncEnumerable.

Usage looks like:

await using (var readStream = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    await foreach (T item in JsonSerializer.DeserializeAsyncEnumerable<T>(readStream))
    {                        
        // do something with the item
    }
}
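For a self-contained sketch of the above (assuming .NET 6 or later, with a hypothetical Item record standing in for T, and an in-memory stream standing in for the file or HTTP response stream):

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical element type standing in for T.
public record Item(string name, int id);

class Program
{
    static async Task Main()
    {
        // Simulated response stream; in practice this could be
        // e.g. await httpClient.GetStreamAsync(url).
        var json = "[{\"name\":\"foo\",\"id\":1},{\"name\":\"bar\",\"id\":2}]";
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));

        // Elements are yielded one at a time as they are parsed,
        // so the full array is never held in memory at once.
        await foreach (Item item in JsonSerializer.DeserializeAsyncEnumerable<Item>(stream))
        {
            Console.WriteLine($"{item.id} - {item.name}");
        }
    }
}
```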
Heyday answered 24/1, 2023 at 9:24 Comment(0)
0

This is my solution (combined from different sources, but mainly based on Brian Rogers' solution) to convert a huge JSON file (an array of objects) to an XML file for any generic object.

JSON looks like this:

{
    "Order": [
        { order object 1 },
        { order object 2 },
        { ... },
        { order object 10000 }
    ]
}

Output XML:

<Order>...</Order>
<Order>...</Order>
<Order>...</Order>

C# code:

using System;
using System.IO;
using System.Xml;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// ...

XmlWriterSettings xws = new XmlWriterSettings { OmitXmlDeclaration = true };
using (StreamWriter sw = new StreamWriter(xmlFile))
using (FileStream fs = new FileStream(jsonFile, FileMode.Open, FileAccess.Read))
using (StreamReader sr = new StreamReader(fs))
using (JsonTextReader reader = new JsonTextReader(sr))
{
    //sw.Write("<root>");
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartArray)
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    JObject obj = JObject.Load(reader);
                    XmlDocument doc = JsonConvert.DeserializeXmlNode(obj.ToString(), "Order");
                    sw.Write(doc.InnerXml); // a line of XML code <Order>...</Order>
                    sw.Write("\n");
                    //this approach produces not strictly valid XML document
                    //add root element at the beginning and at the end to make it valid XML                                
                }
            }
        }
    }
    //sw.Write("</root>");
}
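As the commented-out sw.Write calls in the snippet note, the output is a sequence of fragments rather than a single valid XML document. A minimal self-contained sketch with the root-element wrapper enabled (using a hypothetical in-memory JSON string in place of the files):

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical in-memory input; in practice, use the file streams above.
        var json = "{\"Order\":[{\"id\":1},{\"id\":2}]}";
        var sb = new StringBuilder();

        using (var sw = new StringWriter(sb))
        using (var sr = new StringReader(json))
        using (var reader = new JsonTextReader(sr))
        {
            sw.Write("<root>"); // the root element makes the output well-formed XML
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartArray)
                {
                    while (reader.Read())
                    {
                        if (reader.TokenType == JsonToken.StartObject)
                        {
                            // Convert one array element at a time to an <Order> fragment.
                            JObject obj = JObject.Load(reader);
                            XmlDocument doc = JsonConvert.DeserializeXmlNode(obj.ToString(), "Order");
                            sw.Write(doc.InnerXml);
                        }
                    }
                }
            }
            sw.Write("</root>");
        }

        Console.WriteLine(sb.ToString());
    }
}
```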
Happenstance answered 18/5, 2016 at 20:28 Comment(0)
0

With Cinchoo ETL, an open-source library, you can parse large JSON efficiently with a low memory footprint, since the objects are constructed and returned in a stream-based pull model.

using (var p = new ChoJSONReader(** YOUR JSON FILE **))
{
    foreach (var rec in p)
    {
        Console.WriteLine($"Name: {rec.name}, Id: {rec.id}");
    }
}

For more information, please visit the CodeProject article.

Hope it helps.

Tungting answered 15/2, 2019 at 23:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.