iPad - Parsing an extremely large JSON file (between 50 and 100 MB)

I'm trying to parse an extremely big JSON file on an iPad. The file size will vary between 50 and 100 MB (there is an initial file, and there will be one new full set of data every month, which will be downloaded, parsed and saved into Core Data).

I'm building this app for a company as an enterprise solution - the JSON file contains sensitive customer data and needs to be saved locally on the iPad so the app works even offline. It worked when the file was below 20 MB, but now the data set has become bigger and I really need to parse it. I'm receiving memory warnings during parsing, and after the third warning the app just crashes. I have several different Core Data entities and I'm just setting all the values coming from the JSON file (when the app is launched for the first time), and after everything is done, I'm doing the [context save].

I was hoping somebody could give me some advice on how to handle such huge files. I was thinking about splitting the JSON file up into several smaller JSON files and maybe parsing them in multiple threads, but I don't know if that's the right approach. I guess one big problem is that the whole file is being held in memory - maybe there's some way of "streaming" it into memory or something like that?

I'm using JSONKit (https://github.com/johnezang/JSONKit) for parsing the file, since I have read that it's the fastest one (maybe there's a slower one which goes easier on memory?).

Thanks in advance.

Ichthyic answered 10/4, 2013 at 17:26 Comment(7)
It would probably be best if the data were transferred in parts, rather than one big JSON string. Your basic size limitation is the space required for all the JSON objects.Blather
How about writing all the data into an sqlite file or write the core data persistence using a Mac tool and copy that into the app before signing it instead of offloading this to the device?Danforth
If you have control over the API to the server, I would recommend an API that would take in an Offset parameter and a Count parameter. The Offset specifies the offset into the results and Count indicates how many records to fetch. So subsequent calls to the API would increment the Offset by the count value.Powdery
@Kerni, I have been thinking about something like that, but I will need to parse a new file every month after the app is done, therefore I need to do it all on the iPad.Ichthyic
@rajagp, unfortunately I don't have any control over the server side. All I initially got was an extremely large XML file, which was exported from some Oracle DB in a flat XML structure. The file was 2.5 GB, but I managed to shrink it with XSLT and then converted it to JSON, which resulted in about 90 MB... I might try Hot Licks' solution though.Ichthyic
This is why you need to keep up with getting the newest model of iPad Pro. The regular iPad for regular users is not suitable for this sort of use case.Briggs
@zsong that's not true. And as you can see, there indeed was a working solution for this problem :-).Ichthyic

1) Write your data to a file, then use NSData's dataWithContentsOfFile:options:error: and specify the NSDataReadingMappedAlways and NSDataReadingUncached flags. This will tell the system to use mmap() to reduce the memory footprint, and not to burden the file system cache with blocks of memory (that makes it slower, but much less of a burden to iOS).

2) You can use the YAJL SAX style JSON parser to get objects as they decode.

Note: I have not done 2) but have used the techniques embodied in 1).

3) I ended up needing such a thing myself, and wrote SAX-JSON-Parser-ForStreamingData, which can be tied to any asynchronous downloader (including my own).
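
For point 1), here is a rough C sketch of what memory-mapping a file looks like at the POSIX level, which is what the answer says the NSData mapped-read flag asks the system to do. It is only an illustration under assumptions: the names `mapped_file`, `mapped_file_open` and `mapped_file_close` are made up for this example, it does not show NSData's actual implementation, and it makes no attempt to reproduce the uncached flag.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type: a read-only view of a file on disk.
 * `bytes` is NULL whenever nothing is mapped. */
typedef struct {
    const unsigned char *bytes;
    size_t length;
} mapped_file;

/* Map `path` read-only. Returns 0 on success, -1 on failure
 * (missing file, empty file, or mmap() error). */
int mapped_file_open(mapped_file *out, const char *path) {
    assert(out != NULL && path != NULL);
    out->bytes = NULL;
    out->length = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    /* Pages are loaded from the file on demand and can be dropped again
     * by the kernel, instead of the whole file being copied up front. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;

    out->bytes = p;
    out->length = (size_t)st.st_size;
    return 0;
}

/* Unmap the file and reset the struct. Safe to call on an unmapped struct. */
void mapped_file_close(mapped_file *mf) {
    assert(mf != NULL);
    if (mf->bytes != NULL) munmap((void *)mf->bytes, mf->length);
    mf->bytes = NULL;
    mf->length = 0;
}
```

Note that mapping only keeps the raw file bytes out of regular heap memory; the objects a DOM-style parser builds from those bytes still have to fit in RAM.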

Macaluso answered 10/4, 2013 at 17:51 Comment(11)
David beat me to it. Totally agree with him. I've used both approaches and had some success with both on files larger than 100 MB and records numbering in the hundreds of thousands - though ultimately I went with a paging approach and had our server people break it up for me. That solved all of my issues. It's ideal if you can just use something like NSJSONSerialization or JSONKit since they're really fast; however, you have to work with what you get.Piggin
So I will have to switch from JSONKit to YAJL, is this correct?Ichthyic
JSONKit won't let you get pieces back, so you either need enough memory to hold the whole result, or it's going to fail. I have no experience with YAJL but it seems @MattLong does, and it would seem a viable option, if not the only option you have. That said, you might find a C or C++ parser, but I imagine that would be even more work on your part.Macaluso
If you want to use a stream parser, then yes, get YAJL. However, I would try David's approach #1 first. It will be less hassle if you can get it working with your data set. YAJL I suggest as a last resort.Piggin
Ok, thank you a lot for the very useful answers. I will go ahead and try it out now. I'll give feedback as soon as I'm done :-)Ichthyic
I have now used your way of solving this problem and it worked. It takes about 10 minutes on the iPad to parse and save everything into Core Data, which is quite a lot of time, but since this task is only done once each month, it's not too horrible. I will see how the company reacts and then maybe come back to CouchDeveloper's solution if needed. Anyway, thank you guys for the great support :-).Ichthyic
If this was my app, then I'd try to find something the user could do for 10 min - pop a webview and let them just surf the web while you do the processing in the background. When done, pop an action sheet letting them know the processing is done and they can quit surfing. Don't make them sit there for 10 min watching a spinner. Or, when you get the new file, do the processing in the background and only announce it's finished when all processing is done. You can use the techniques in github.com/dhoerl/ScrollWatcher to increase/decrease the background activity priority (UIWebView too).Macaluso
@DavidH , can I use YAJL with Swift, please?Lepton
Apologies @DavidH, really need a stream parser for a large JSON file and my app is in Swift.Lepton
@Lepton You should be able to use either YAJL or my project mentioned above for your task. You can add the Objective-C to your project and call it from Swift. With my project, you would first give the class a delegate, then start sending it small chunks of data read from the file (you can do this using POSIX read()). As the JSON is decoded, the delegate will be handed fully decoded pieces. This of course assumes that your JSON is an array of objects that can be decoded separately.Macaluso
@DavidH, thanks. I am currently trying out YAJL. Thanks for replying.Lepton

Given the current memory constraints on a mobile device, it's likely impossible to parse 100 MB of JSON text and then create a representation as Foundation objects, which itself will take roughly 10 times as much RAM as the source JSON text.

That is, your JSON result would take about 1 GByte of RAM just to allocate the space required for the Foundation objects.

So, there is likely no way to create one gigantic JSON representation - no matter how you get and read and parse the input. You need to split it into many smaller ones. This may require a modification on the server side, though.

Another solution, though a much more elaborate one, is this:

Use a SAX style parser, which takes the huge JSON as input via a streaming API and outputs several smaller JSON texts (the inner parts). The SAX style parser may use a blocks API (dispatch lib) to pass its results - the smaller JSONs - asynchronously to another JSON parser. That is, the smaller JSONs are fed to a regular JSON parser which produces the JSON representations, which in turn are fed to your Core Data model generator.

You can even make it possible to download the huge JSON and parse it simultaneously with the SAX style parser, while simultaneously creating smaller JSONs and simultaneously storing them into Core Data.
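
To make the splitting step more concrete, here is a small C sketch of the idea only; it is not JPJson, YAJL or any other library's API, and it is not a real SAX parser. It assumes the huge document is a single top-level array whose elements are objects or arrays (bare scalars at the top level are skipped), does no validation, and all names (`splitter`, `splitter_feed`, `tally_element`, ...) are invented for this example. Input can be fed in arbitrary chunks, for instance as they arrive from a download.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Called once per complete top-level element; `json` is NOT NUL-terminated. */
typedef void (*element_cb)(const char *json, size_t len, void *ctx);

typedef struct {
    int depth;      /* 0 = outside, 1 = inside the outer array, >= 2 = inside an element */
    int in_string;  /* currently inside a string literal */
    int escaped;    /* previous character was a backslash inside a string */
    char *buf;      /* text of the element being collected */
    size_t len, cap;
    element_cb on_element;
    void *ctx;
} splitter;

void splitter_init(splitter *s, element_cb cb, void *ctx) {
    assert(s != NULL && cb != NULL);
    memset(s, 0, sizeof *s);
    s->on_element = cb;
    s->ctx = ctx;
}

void splitter_free(splitter *s) {
    free(s->buf);
    s->buf = NULL;
    s->len = s->cap = 0;
}

/* Append one character to the current element, growing the buffer as needed. */
static int append(splitter *s, char c) {
    if (s->len == s->cap) {
        size_t cap = s->cap ? s->cap * 2 : 256;
        char *p = realloc(s->buf, cap);
        if (p == NULL) return -1;
        s->buf = p;
        s->cap = cap;
    }
    s->buf[s->len++] = c;
    return 0;
}

/* Feed the next chunk of input. Returns 0 on success, -1 if allocation fails.
 * Only one element is buffered at a time, never the whole document. */
int splitter_feed(splitter *s, const char *chunk, size_t n) {
    for (size_t i = 0; i < n; i++) {
        char c = chunk[i];
        if (s->in_string) {
            /* Brackets inside strings must not change the depth. */
            if (s->depth >= 2 && append(s, c) != 0) return -1;
            if (s->escaped) s->escaped = 0;
            else if (c == '\\') s->escaped = 1;
            else if (c == '"') s->in_string = 0;
        } else if (c == '"') {
            s->in_string = 1;
            if (s->depth >= 2 && append(s, c) != 0) return -1;
        } else if (c == '{' || c == '[') {
            s->depth++;
            if (s->depth >= 2 && append(s, c) != 0) return -1;
        } else if (c == '}' || c == ']') {
            if (s->depth >= 2) {
                if (append(s, c) != 0) return -1;
                if (--s->depth == 1) {          /* element just closed */
                    s->on_element(s->buf, s->len, s->ctx);
                    s->len = 0;
                }
            } else if (s->depth > 0) {
                s->depth--;                     /* end of the outer array */
            }
        } else if (s->depth >= 2) {
            if (append(s, c) != 0) return -1;
        }
    }
    return 0;
}

/* Example handler (hypothetical): counts elements and keeps a copy of the last one. */
typedef struct { size_t count; char last[64]; } tally;

void tally_element(const char *json, size_t len, void *ctx) {
    tally *t = ctx;
    size_t n = len < sizeof t->last - 1 ? len : sizeof t->last - 1;
    t->count++;
    memcpy(t->last, json, n);
    t->last[n] = '\0';
}
```

In a real app the callback would hand each small JSON text to an ordinary parser and then to the Core Data import, instead of just counting.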

What you need is a JSON parser with a SAX style API that can parse chunks of input text, performs fast, and can create a representation of Foundation objects.

I know only one JSON library which has this feature set, and there are even examples given which can partly show how you can accomplish exactly this: JPJson on GitHub. The parser is also very fast - on ARM it's faster than JSONKit. Caveat: its implementation is in C++ and requires a few steps to install it on a developer machine. It has a well documented Objective-C API, though.

Would like to add that I'm the author ;) An update will be available soon, which utilizes the latest C++11 compiler and C++11 library features, resulting in even faster code (25% faster on ARM than JSONKit and twice as fast as NSJSONSerialization).

To give you some facts about the speed: the parser is able to download (over WiFi) and parse 25 MByte of data containing 1000 JSONs (25 kByte each) in 7 seconds on WiFi 802.11g, and in 4 seconds on WiFi 802.11n, including creating and releasing the 1000 representations, on an iPad 2.

Dodge answered 11/4, 2013 at 17:13 Comment(10)
Thanks for your answer. I will definitely take a look at this tomorrow in the office. So the idea would be to use the JPJson parser to split the huge JSON file into smaller JSON strings (or files) and then use some different JSON parser (DOM-style?) to parse those split parts?Ichthyic
Alternatively, separate the huge JSON into many smaller ones on the server, and send these many JSONs in one huge byte stream over one connection. The JPJson parser can parse a data stream consisting of multiple JSON docs. Whenever it finishes one JSON representation it calls a block (or callback) passing the result as a parameter. That way, you can use existing classes, no need to code something yourself. (see JPAsyncJsonParser, linkDodge
Ok, thanks. The problem is, I don't really have much control over how the JSON file will be passed every month. There will be one initial JSON which has to be parsed on the first app launch and that's what I'm trying to do right now. I'm gonna try different things out and give feedback as soon as I have any useful results.Ichthyic
OK, then you have to go with a SAX style parser, which requires you to code the handlers yourself. If you encounter an issue or a missing feature in JPJson please post it in the issue list on GitHub. :)Dodge
Thanks. So, just for clarification - JPJson doesn't parse any JSON, it just splits it into a couple of smaller JSON-files, is this correct?Ichthyic
The parser can parse any JSON. It can also produce a JSON representation - which is a hierarchy of Foundation objects. The basic API is similar to most other JSON parsers. It uses a built-in "semantic actions" class when creating a Foundation hierarchy. However, you can define your own semantic actions objects which would - for example just validate the JSON, or produce a different kind of representation, and much more. A "Semantic Actions" class and the parser interface through the SAX like API. You can set your own semantic actions object as a parameter when invoking the parser.Dodge
I have now used David H's solution along with the YAJL parser. Your solution seems to be great, but for now it's too time-consuming for me to do everything again with your parser. I will probably come back to your solution soon, since it looks very promising. Thank you for your help, Sir.Ichthyic
Ok, great! Just want to add that I now did update all projects for most recent Xcode. And probably of interest for you - added an example that shows how you can split up a huge JSON Array containing many JSON Objects (using a custom parser) into a series of JSON Objects, and then parse these JSON Objects separately using a normal parser which also creates the representation of each.Dodge
Cool, nice to know. Which one is the example you're talking about? Is it one of the examples in this folder: JPJson / Examples / FoundationExamples / ?Ichthyic
You can find it in sample10 in folder FoundationExamples.Dodge
