Parsing a JSON string that is bigger than the memory

The platform I'm working on has pretty tight memory constraints, and I'm trying to find a way to parse big JSON strings without ever loading more than a few hundred bytes into memory at a time. The JSON string is stored in a file on a much bigger chip (flash memory).
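Staying within a budget of a few hundred bytes mostly comes down to how the file is read. A minimal sketch, assuming a file object whose read(n) method returns up to n bytes and nil at the end of the file (as standard Lua file handles do); bufferedSource and the 64-byte default are hypothetical choices, and the platform's own file API (the _file object used in the code below) may behave differently. It holds one chunk at a time instead of seeking before every single character.

```lua
-- Sketch: a character source that reads a file in fixed-size chunks, so at most
-- chunkSize bytes of the file are held in memory at once.
-- bufferedSource is a hypothetical helper; file only needs a read(n) method that
-- returns up to n bytes, or nil at the end (standard Lua file handles behave so).
local function bufferedSource(file, chunkSize)
  chunkSize = chunkSize or 64
  local buf, pos = '', 1
  return function()
    if pos > #buf then              -- current chunk used up: fetch the next one
      buf = file:read(chunkSize)
      pos = 1
      if buf == nil then            -- end of file
        buf = ''
        return nil
      end
    end
    local c = buf:sub(pos, pos)
    pos = pos + 1
    return c
  end
end

-- Stand-in for a file handle so the demo runs anywhere; io.open(path, 'rb')
-- would return an object with the same read(n) method.
local fakeFile = { data = '{"aa":[1,2]}', at = 1 }
function fakeFile:read(n)
  if self.at > #self.data then return nil end
  local chunk = self.data:sub(self.at, self.at + n - 1)
  self.at = self.at + n
  return chunk
end

local nextChar = bufferedSource(fakeFile, 4)
local out = {}
for c in nextChar do out[#out + 1] = c end  -- the loop stops at the first nil
print(table.concat(out))                    -- {"aa":[1,2]}
```

Any parser that pulls one character at a time can use such a source in place of a seek plus read(1) per character.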

There are two things that I can't really find a good solution for:

  1. Access a certain value by specifying a "path" like foo["bar"][2].
    (And if the value turns out to be an array/object then we should only return the fact that it is an array/object and maybe also if it's empty or not.)
  2. Iterate over any object/array within the JSON.

So basically I need functions that, when called, parse the JSON step by step and only keep the parts that are actually needed to continue parsing.
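A minimal sketch of such step-by-step functions, assuming the input is available through a nextChar() function that returns one character at a time (or nil at the end). The names lookup, skipValue, readString, trim and stringSource are hypothetical. Only the key currently being compared and the final value are kept in memory; skipped values are consumed character by character with just a nesting counter. Escape sequences are kept literally rather than decoded, numeric path elements are 1-based like Lua arrays, and containers are reported only as 'object' or 'array' plus whether they are empty.

```lua
-- Sketch: follow a path such as {'aa', 2, 'gg'} through streamed JSON.
-- nextChar() is assumed to return the next character, or nil at the end.
-- All function names here are hypothetical.

-- Skip whitespace starting at c; returns the first non-space character (or nil).
local function trim(nextChar, c)
  while c and c:find('^%s') do c = nextChar() end
  return c
end

-- Consume the rest of a string (opening quote already read). Characters are only
-- stored when keep is true; an escape keeps the escaped character, it is not decoded.
local function readString(nextChar, keep)
  local parts = {}
  while true do
    local c = nextChar()
    if c == nil or c == '"' then return keep and table.concat(parts) or nil end
    if c == '\\' then c = nextChar() or '' end
    if keep then parts[#parts + 1] = c end
  end
end

-- Skip one value whose first character is c; returns the character after it.
local function skipValue(nextChar, c)
  if c == '"' then
    readString(nextChar, false)
    return nextChar()
  elseif c == '{' or c == '[' then
    local depth = 1 -- only a nesting counter is kept, not the content
    while depth > 0 do
      c = nextChar()
      if c == nil then return nil end
      if c == '"' then readString(nextChar, false)
      elseif c == '{' or c == '[' then depth = depth + 1
      elseif c == '}' or c == ']' then depth = depth - 1 end
    end
    return nextChar()
  end
  while c and not c:find('^[%s,%]}]') do c = nextChar() end -- number, true, false, null
  return c
end

-- Read the value starting at c and return kind, value. For 'object'/'array'
-- the value only says whether the container is empty.
local function readValue(nextChar, c)
  if c == '"' then return 'string', readString(nextChar, true) end
  if c == '{' or c == '[' then
    local close = (c == '{') and '}' or ']'
    return (c == '{') and 'object' or 'array', trim(nextChar, nextChar()) == close
  end
  local raw = {}
  while c and not c:find('^[%s,%]}]') do raw[#raw + 1] = c; c = nextChar() end
  raw = table.concat(raw)
  if raw == 'true' or raw == 'false' then return 'boolean', raw == 'true' end
  if raw == 'null' then return 'null', nil end
  if tonumber(raw) then return 'number', tonumber(raw) end
  return nil
end

-- Returns kind, value; kind is nil when the path does not exist.
local function lookup(nextChar, path)
  local c = trim(nextChar, nextChar())
  for _, key in ipairs(path) do
    if type(key) == 'string' then
      if c ~= '{' then return nil end
      c = trim(nextChar, nextChar())
      local found = false
      while c == '"' do
        local k = readString(nextChar, true) -- only the current key is held in memory
        if trim(nextChar, nextChar()) ~= ':' then return nil end
        c = trim(nextChar, nextChar()) -- first character of the value
        if k == key then found = true; break end
        c = trim(nextChar, skipValue(nextChar, c))
        if c ~= ',' then break end
        c = trim(nextChar, nextChar())
      end
      if not found then return nil end
    else -- numeric key: 1-based index, as in Lua
      if c ~= '[' or key < 1 then return nil end
      c = trim(nextChar, nextChar())
      if c == ']' then return nil end -- empty array
      for _ = 2, key do
        c = trim(nextChar, skipValue(nextChar, c))
        if c ~= ',' then return nil end -- index out of range
        c = trim(nextChar, nextChar())
      end
    end
  end
  return readValue(nextChar, c)
end

-- Demo source over an in-memory string; on the device it would read the file instead.
local function stringSource(s)
  local i = 0
  return function()
    i = i + 1
    if i <= #s then return s:sub(i, i) end
  end
end
local doc = '{"aa":["qq",{"k1":23,"gg":"YAY","list":[]}]}'
print(lookup(stringSource(doc), {'aa', 2, 'gg'}))   -- returns 'string', 'YAY'
print(lookup(stringSource(doc), {'aa', 2, 'list'})) -- returns 'array', true (it is empty)
```

Iterating over an object or array fits the same pattern: read a key (or count a comma), hand it to the caller, then call skipValue to move on to the next entry.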

For the interface, I don't think it is possible to have exactly exampleJson["aa"][2]["gg"], but I managed to get really close to that: exampleJson["aa"][2]["gg"](). The trailing call invokes a function that can then easily access {'aa',2,'gg'} and read/parse the JSON from the file.
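The same interface can also be built so that every index returns a new proxy carrying its own copy of the path. Then a partially built expression such as local aa = exampleJson["aa"] cannot leak keys into the next lookup, which can happen when one shared path table is appended to. A sketch, with makeProxy and the resolve callback as hypothetical names; resolve is where the file would actually be read.

```lua
-- Sketch: record a path like example["aa"][2]["gg"] and pass it to a resolver on ().
-- makeProxy and resolve are hypothetical names; resolve is where the file would be read.
local function makeProxy(resolve, path)
  path = path or {}
  return setmetatable({}, {
    -- every index returns a new proxy with its own copy of the path
    __index = function(_, key)
      local newPath = {}
      for i = 1, #path do newPath[i] = path[i] end
      newPath[#path + 1] = key
      return makeProxy(resolve, newPath)
    end,
    -- calling the proxy hands the finished path to the resolver
    __call = function()
      return resolve(path)
    end,
  })
end

-- Demo resolver that just joins the path instead of parsing a file.
local example = makeProxy(function(path) return table.concat(path, '/') end)
local aa = example["aa"]
print(aa[2]["gg"]()) -- aa/2/gg
print(aa["x"]())     -- aa/x (the earlier lookup did not leak into this path)
```

Copying the path costs a few bytes per index step, which is small compared with the file itself.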

This is my code so far, but I really don't know how to continue:
https://repl.it/HfwS/2

-- Looks complicated, but is pretty simple. Using metatables we create a JSON interface that can almost be accessed as if it were a Lua table.
-- E.g. example["aa"][2]["gg"]() ; the only difference is that we have to use parentheses at the end
-- The problematic part starts where it says `THIS IS WHERE THE JSON PARSING WOULD HAPPEN`
json = {}
setmetatable(json, {
    __call = function(self, filePath) -- __call receives the called table first, then the arguments
        local jsonFile = _file.open(filePath)
        local fileLen = jsonFile:stat().size

        local patternTable = {} -- Will store `{'aa',2,'gg'}` for `example['aa'][2]['gg']()`

        local fakeJson = {}
        setmetatable(fakeJson, { 
            __index = function (t, k)
                patternTable[#patternTable+1] = k
                return fakeJson
            end;
            __call = function()

                -- THIS IS WHERE THE JSON PARSING WOULD HAPPEN --

                -- The patternTable contains {'aa',2,'gg'} at this point 

                -- Loop through the json file char by char
                local valueToReturn = ''
                local filePos = 0
                for i=1, fileLen do
                    jsonFile:seek("set", filePos)
                    local currentChar = jsonFile:read(1) -- read character at current position
                    filePos = filePos + 1
                    -- print(currentChar)

                    -- Now the question is, how do we parse the json?
                    print('Magic to parse the json')
                    -- valueToReturn = ?
                end

                patternTable = {} -- Reset the patternTable
                return valueToReturn
            end;
        })
        return fakeJson
    end;
})


local fakeParsedJson = json('example.json')
local value = fakeParsedJson["aa"][2]["gg"]() -- Notice the `()` in the end

print(value)
Caryophyllaceous answered 3/5, 2017 at 15:31

I spent some more time thinking about how this could be accomplished and finally managed to pull it off. Retrieving values and iterating over arrays/objects works like a charm. If you know of a better way to do it, please tell me. (I'm not too happy with the code; it seems like it could be a lot cleaner.) But hey, it works.

If you want to try it here's a fiddle: https://repl.it/HfwS/31

json = {}
setmetatable(json, {
    __call = function(self, filePath) -- __call receives the called table first, then the arguments
        local jsonFile = _file.open(filePath)
        local fileLen = jsonFile:stat().size

        local jsonPath = {} -- Would store `{'aa',2,'gg'}` for `example['aa'][2]['gg']()`

        local fakeJson = {}
        setmetatable(fakeJson, { 
            __index = function (t, k)
                jsonPath[#jsonPath+1] = k
                return fakeJson
            end;
            __call = function()

                -- THIS IS WHERE THE JSON PARSING WOULD HAPPEN --

                -- The jsonPath contains {'aa',2,'gg'} at this point 

                local brcStack = {} -- will be used to push/pop braces/brackets
                local jsonPathDim = 1 -- table dimension (['a'] ==  1; ['a']['b'] == 2; ...)
                -- Loop through the json file char by char
                local valueToReturn
                local filePos = 0
                local nextChar = function()
                    jsonFile:seek("set", filePos)
                    filePos = filePos + 1
                    local char = jsonFile:read(1)
                    --print(char)
                    return char
                end
                local jsonValid = true
                for o=1, #jsonPath + 1 do -- one pass per path element, plus one to extract the value
                    if jsonPathDim > #jsonPath then -- jsonPath followed. Now we can extract the value.
                        while true do
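                            local currentChar = nextChar()
                            if currentChar == nil or string.find(currentChar, '[%]%},]') then
                                break -- end of file, or no value here (e.g. index 1 of an empty array)
                            elseif currentChar == '"' then -- string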
                                valueToReturn = ''
                                for i=1, fileLen do
                                    currentChar = nextChar()
                                    if currentChar == '"' then
                                        break
                                    elseif currentChar == nil then
                                        jsonValid = false
                                        break
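                                    elseif currentChar == '\\' then
                                        -- keep the escaped character itself (\" gives ", \\ gives \); \n, \uXXXX etc. are not decoded
                                        valueToReturn = valueToReturn .. (nextChar() or '')
                                    else
                                        valueToReturn = valueToReturn .. currentChar
                                    end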
                                end
                                break
                            elseif string.find(currentChar, '[%d%.%-]') then -- numbers: 0.3, .3, -7, 99, 1e5 etc
                                local rawValue = ''
                                if currentChar == '.' then
                                    rawValue = '0' -- tolerate a leading dot such as .3
                                end
                                -- collect characters until a delimiter or the end of the file
                                while currentChar ~= nil and not string.find(currentChar, '[%s,%]%}]') do
                                    rawValue = rawValue .. currentChar
                                    currentChar = nextChar()
                                end
                                valueToReturn = tonumber(rawValue)
                                break
                            elseif currentChar == 't' then -- true
                                valueToReturn = true
                                break
                            elseif currentChar == 'f' then -- false
                                valueToReturn = false
                                break
                            elseif currentChar == 'n' then -- null
                                valueToReturn = nil -- Lua has no separate null value, so JSON null is returned as nil
                                break
                            elseif currentChar == '{' then -- object: return a table of its keys (each mapped to 0)
                                valueToReturn = {}
                                brcStack[#brcStack+1] = '{'
                                local origBrcLvl = #brcStack
                                local expectKey = true -- the next string at this level is a key, not a value
                                while true do
                                    currentChar = nextChar()
                                    if currentChar == nil then
                                        jsonValid = false
                                        break
                                    elseif currentChar == '"' then
                                        -- read the whole string so brackets and commas inside it are ignored;
                                        -- its characters are only stored when it is a key of this object
                                        local isKey = (origBrcLvl == #brcStack and expectKey)
                                        local str = ''
                                        while true do
                                            currentChar = nextChar()
                                            if currentChar == nil or currentChar == '"' then
                                                break
                                            end
                                            if currentChar == '\\' then
                                                currentChar = nextChar() or '' -- keep the escaped character itself
                                            end
                                            if isKey then
                                                str = str .. currentChar
                                            end
                                        end
                                        if isKey then
                                            valueToReturn[str] = 0
                                            expectKey = false
                                        end
                                    elseif origBrcLvl == #brcStack and currentChar == ',' then
                                        expectKey = true
                                    elseif currentChar == '[' or currentChar == '{' then
                                        brcStack[#brcStack+1] = currentChar
                                    elseif currentChar == ']' or currentChar == '}' then
                                        local opening = (currentChar == ']') and '[' or '{'
                                        if brcStack[#brcStack] ~= opening then
                                            jsonValid = false
                                            break
                                        end
                                        brcStack[#brcStack] = nil
                                        if #brcStack < origBrcLvl then
                                            break -- closing brace of this object reached
                                        end
                                    end
                                end
                                break
                            elseif currentChar == '[' then -- array: return a table with one 0 per element
                                brcStack[#brcStack+1] = '['
                                valueToReturn = {}
                                local origBrcLvl = #brcStack
                                while true do
                                    currentChar = nextChar()
                                    if currentChar == nil then
                                        jsonValid = false
                                        break
                                    end
                                    -- the first non-blank character at this level starts element 1
                                    if origBrcLvl == #brcStack and #valueToReturn == 0 and not string.find(currentChar, '[%s%]]') then
                                        valueToReturn[1] = 0
                                    end
                                    if currentChar == '"' then
                                        -- skip the whole string so brackets and commas inside it are ignored
                                        repeat
                                            currentChar = nextChar()
                                            if currentChar == '\\' then
                                                nextChar()
                                            end
                                        until currentChar == nil or currentChar == '"'
                                    elseif origBrcLvl == #brcStack and currentChar == ',' then
                                        valueToReturn[#valueToReturn+1] = 0
                                    elseif currentChar == '[' or currentChar == '{' then
                                        brcStack[#brcStack+1] = currentChar
                                    elseif currentChar == ']' or currentChar == '}' then
                                        local opening = (currentChar == ']') and '[' or '{'
                                        if brcStack[#brcStack] ~= opening then
                                            jsonValid = false
                                            break
                                        end
                                        brcStack[#brcStack] = nil
                                        if #brcStack < origBrcLvl then
                                            break -- closing bracket of this array reached
                                        end
                                    end
                                end
                                break
                            end
                        end
                        break
                    end
                    local currentKey = jsonPath[jsonPathDim]
                    if type(currentKey) == 'string' then -- Parsing { object
                        while true do
                            local currentChar = nextChar()
                            if currentChar == '{' then
                                brcStack[#brcStack+1] = '{'
                                local origBrcLvl = #brcStack
                                local keyFound = false
                                local expectKey = true -- the next string at this level is a key, not a value
                                while true do -- loop over keys until we find it
                                    currentChar = nextChar()
                                    if currentChar == nil then
                                        jsonValid = false
                                        break
                                    elseif currentChar == '"' then
                                        -- read the whole string so brackets and commas inside it are ignored;
                                        -- its characters are only stored when it is a key of this object
                                        local isKey = (origBrcLvl == #brcStack and expectKey)
                                        local str = ''
                                        while true do
                                            currentChar = nextChar()
                                            if currentChar == nil or currentChar == '"' then
                                                break
                                            end
                                            if currentChar == '\\' then
                                                currentChar = nextChar() or '' -- keep the escaped character itself
                                            end
                                            if isKey then
                                                str = str .. currentChar
                                            end
                                        end
                                        if isKey then
                                            expectKey = false
                                            if str == currentKey then
                                                repeat -- skip to the ':' so the value is read next
                                                    currentChar = nextChar()
                                                until currentChar == ':' or currentChar == nil
                                                keyFound = (currentChar == ':')
                                                break
                                            end
                                        end
                                    elseif origBrcLvl == #brcStack and currentChar == ',' then
                                        expectKey = true
                                    elseif currentChar == '[' or currentChar == '{' then
                                        brcStack[#brcStack+1] = currentChar
                                    elseif currentChar == ']' or currentChar == '}' then
                                        local opening = (currentChar == ']') and '[' or '{'
                                        if brcStack[#brcStack] ~= opening then
                                            jsonValid = false
                                            break
                                        end
                                        brcStack[#brcStack] = nil
                                        if #brcStack < origBrcLvl then
                                            break -- end of this object: the key is not present
                                        end
                                    end
                                end
                                if keyFound then
                                    jsonPathDim = jsonPathDim+1
                                else
                                    filePos = fileLen -- skip to the end so no later object is matched by mistake
                                end
                                break
                            elseif currentChar == nil then
                                jsonValid = false
                                break
                            elseif not string.find(currentChar, '%s') then
                                filePos = fileLen -- the value here is not an object, so the path does not exist
                                break
                            end
                        end
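                    elseif type(jsonPath[jsonPathDim]) == 'number' then -- Parsing [ array
                        while true do
                            local currentChar = nextChar()
                            if currentChar == '[' then
                                brcStack[#brcStack+1] = '['
                                local origBrcLvl = #brcStack
                                local currentIndex = 1
                                local indexFound = (currentKey == 1) -- element 1 starts right after the '['
                                while not indexFound do
                                    currentChar = nextChar()
                                    if currentChar == nil then
                                        jsonValid = false
                                        break
                                    elseif currentChar == '"' then
                                        -- skip the whole string so brackets and commas inside it are ignored
                                        repeat
                                            currentChar = nextChar()
                                            if currentChar == '\\' then
                                                nextChar()
                                            end
                                        until currentChar == nil or currentChar == '"'
                                    elseif origBrcLvl == #brcStack and currentChar == ',' then
                                        currentIndex = currentIndex + 1
                                        indexFound = (currentIndex == currentKey)
                                    elseif currentChar == '[' or currentChar == '{' then
                                        brcStack[#brcStack+1] = currentChar
                                    elseif currentChar == ']' or currentChar == '}' then
                                        local opening = (currentChar == ']') and '[' or '{'
                                        if brcStack[#brcStack] ~= opening then
                                            jsonValid = false
                                            break
                                        end
                                        brcStack[#brcStack] = nil
                                        if #brcStack < origBrcLvl then
                                            break -- end of this array: the index is out of range
                                        end
                                    end
                                end
                                if indexFound then
                                    jsonPathDim = jsonPathDim+1
                                else
                                    filePos = fileLen -- skip to the end so no later array is matched by mistake
                                end
                                break
                            elseif currentChar == nil then
                                jsonValid = false
                                break
                            elseif not string.find(currentChar, '%s') then
                                filePos = fileLen -- the value here is not an array, so the path does not exist
                                break
                            end
                        end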
                    else
                        jsonValid = false
                        break -- Invalid json
                    end
                end
                jsonPath = {} -- Reset the jsonPath
                return valueToReturn
            end;
        })
        return fakeJson
    end;
})



local example =  json('example.json')

-- Read a value
local value = example["aa"][2]['k1']()
print(value)

-- Loop over a key value table and print the keys and values
for key, value in pairs(example["aa"][2]()) do
    print('key: ' .. key, 'value: ' .. tostring(example["aa"][2][key]())) -- tostring: nested arrays/objects come back as tables
end

JSON validation could be better, but if you supply invalid JSON data you shouldn't expect anything anyway.
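If some validation is wanted without loading the file, a stack of open brackets that also skips over strings catches mismatched or unclosed brackets, which is roughly what brcStack tracks above. This is only a structural check: bracketsBalanced and stringSource are hypothetical names, and commas, colons, numbers and literals are not checked at all.

```lua
-- Sketch: structural check of streamed JSON with a stack of open brackets.
-- Verifies only that brackets nest correctly outside strings and that strings
-- are terminated; commas, colons, numbers and literals are NOT checked.
-- bracketsBalanced and stringSource are hypothetical names.
local function bracketsBalanced(nextChar)
  local stack = {}                       -- grows with nesting depth only
  local opener = { [']'] = '[', ['}'] = '{' }
  while true do
    local c = nextChar()
    if c == nil then
      return #stack == 0                 -- everything opened was closed
    elseif c == '"' then
      repeat                             -- skip the string, respecting \" escapes
        c = nextChar()
        if c == '\\' then nextChar() end
      until c == nil or c == '"'
      if c == nil then return false end  -- unterminated string
    elseif c == '[' or c == '{' then
      stack[#stack + 1] = c
    elseif opener[c] then
      if stack[#stack] ~= opener[c] then return false end
      stack[#stack] = nil
    end
  end
end

-- Demo source over an in-memory string (the file would be read in chunks instead).
local function stringSource(s)
  local i = 0
  return function()
    i = i + 1
    if i <= #s then return s:sub(i, i) end
  end
end

print(bracketsBalanced(stringSource('{"a":[1,{"b":"]"}]}'))) -- true
print(bracketsBalanced(stringSource('{"a":[1,2}')))         -- false
```

Memory use is bounded by the nesting depth rather than the file size, so it can run over the file in one pass before any lookups.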

Caryophyllaceous answered 5/5, 2017 at 12:12

If you want to decode a single JSON element (object, array, etc.) instead of decoding the whole JSON, you need a JSON library with two features:

  • "traverse" functionality (dry-run decoding without creating Lua objects)
  • the ability to pass the JSON as a sequence of small parts (instead of preloading the whole JSON as one huge Lua string).

Example: how to partially decode JSON using this module:

-- This is content of data.txt file:
-- {"aa":["qq",{"k1":23,"gg":"YAY","Fermat_primes":[3, 5, 17, 257, 65537]}]}
-- We want to extract as Lua values only "Fermat_primes" array and "gg" string
local json = require('json')

-- Open file
local file = assert(io.open('data.txt', 'r'))

-- Define loader function which will read the file in 64-byte chunks
local function my_json_loader()
   return file:read(64)
end

local FP, gg
-- Prepare callback function for traverse with partial decode
local function my_callback (path, json_type, value)
   path = table.concat(path, '/')
   if path == "aa/2/Fermat_primes" then
      FP = value
      return true  -- we want to decode this array instead of traverse through it
   elseif path == "aa/2/gg" then 
      gg = value
   end
end

json.traverse(my_json_loader, my_callback)

-- Close file
file:close()

-- Display the results
print('aa.2.gg = '..gg)
print('aa.2.Fermat_primes:')
for k, v in ipairs(FP) do print(k, v) end

Output:

 aa.2.gg = YAY
 aa.2.Fermat_primes:
 1  3
 2  5
 3  17
 4  257
 5  65537
Flack answered 6/5, 2017 at 3:55
