Convert Lua data to JSON

This EPGP World of Warcraft addon outputs an epgp.lua database file.

I wrote a plugin to convert the Lua data into a JSON object for display on a guild website. It was working in older versions of the addon, but now I'm having trouble trying to get it to convert the file properly. Here are two snippets that show the conversion problem - see this demo.

The first works great at forming a nested array:

["roster_info"] = {
    {
        "Agantica", -- [1]
        "ROGUE", -- [2]
        "09/03-2013", -- [3]
    }, -- [1]
    {
        "Intikamim", -- [1]
        "PALADIN", -- [2]
        "17/02-2013", -- [3]
    }, -- [2]
},

becomes

"roster_info" : [
    [
        "Agantica",
        "ROGUE",
        "09/03-2013"
    ],
    [
        "Intikamim",
        "PALADIN",
        "17/02-2013"
    ]
]

But the string replacement treats this next snippet as a nested array when it should be an array of objects:

["bonus_loot_log"] = {
    {
        ["player"] = "Magebox",
        ["timestamp"] = "2013-03-07 13:44:00",
        ["coinsLeft"] = "-1",
        ["reward"] = "|cffa335ee|Hitem:86815:0:0:0:0:0:0:632235520:90:0:445|h[Attenuating Bracers]|h|r",
    }, -- [1]
    {
        ["player"] = "Lîutasila",
        ["coinsLeft"] = "-1",
        ["timestamp"] = "2013-03-07 13:47:00",
    }, -- [2]
},

becomes

"bonus_loot_log" : [
    [
        "player" : "Magebox",
        "timestamp" : "2013-03-07 13:44:00",
        "coinsLeft" : "-1",
        "reward" : "|cffa335ee|Hitem:86815:0:0:0:0:0:0:632235520:90:0:445|h[Attenuating Bracers]|h|r"
    ],
    [
        "player": "Lîutasila",
        "coinsLeft": "-1",
        "timestamp": "2013-03-07 13:47:00"
    ]
]

Here is the string conversion script that only works on the first snippet.

lua_string
    .replace(/\[(.*)\]\s\=\s/g,'$1:')     // change equal to colon & remove outer brackets
    .replace(/[\t\r\n]/g,'')              // remove tabs & returns
    .replace(/\}\,\s--\s\[\d+\]\}/g,']]') // replace sets ending with a comment with square brackets
    .replace(/\,\s--\s\[\d+\]/g,',')      // remove close subgroup and comment
    .replace(/,(\}|\])/g,'$1')            // remove trailing comma
    .replace(/\}\,\{/g,'],[')             // replace curly bracket set with square brackets
    .replace(/\{\{/g,'[[')                // change double curlies to square brackets
    .replace(/EPGP_DB\s\=/,'');

So, I need some help getting the Lua to convert properly with an array of objects (second example).

Jacquard answered 27/4, 2013 at 18:56 Comment(5)
How is the epgp.lua generated? If it is a lua code generating this output, you can edit that code and use the LuaJSON library/module.Marketa
It's generated by the addon when you log out of World of Warcraft. All you do is upload the raw data file to your site.Jacquard
This is because of your "replace sets ending with a comment with square brackets" and "change double curlies to square brackets" lines. Double curlies do not necessarily mean an array inside an array. An object inside an array is also written with double curlies in Lua.Gopherwood
@EgorSkriptunoff could you please update the demo with what you are describing. The issue I have is differentiating an object from an array inside of an array, or maybe there is a better method I haven't thought of?Jacquard
You can use npmjs.com/package/luaparseNunley
G
1
// convert EPGP_DB from LUA to JSON
var str = document.getElementsByTagName('data')[0].innerHTML;
var diff;
do {  // replace curlies around arrays with square brackets
    diff = str.length;
    str = str.replace(/\{(((\n\t*)\t)\S.*(\2.*)*)\,\s--\s\[\d+\]\3\}/g,'[$1$3]');
    diff = diff - str.length;
} while (diff > 0);
str = str
.replace(/EPGP_DB\s=\s/, '')         // remove variable definition
.replace(/\s--\s\[\d+\](\n)/g, '$1') // remove comment
.replace(/\,(\n\t*\})/g, '$1')       // remove trailing comma
.replace(/\[(.*?)\]\s\=\s/g,'$1:')   // change equal to colon, remove brackets
.replace(/[\t\r\n]/g,'');            // remove tabs & returns
console.log(str);
var json = JSON.parse(str);          // declare the result instead of leaking an implicit global
console.log(json);
document.getElementById('result').innerText = json.global.last_version;
Gopherwood answered 28/4, 2013 at 13:22 Comment(7)
+1 Nice answer, but sadly it works in webkit but not Firefox: jsfiddle.net/Mottie/MfncJ/4 (using full epgp.lua file) - could it be that Firefox doesn't support matching capture groups?Jacquard
@Jacquard - This string is too long for regex operations.Gopherwood
@Jacquard - Or too long for JSON to parse.Gopherwood
@Jacquard - IMO, my code is correct. The problem is on the JavaScript side, and I don't know how to solve it. Is it possible to rewrite the Lua->JSON converter in Lua instead of JavaScript?Gopherwood
I don't think this is the right solution for this problem. If you're looking to convert Lua Data into JSON you should use one of the available Lua modules that accomplish this and not rely on regex to do this.Disclimax
Also, it would be useful to have some amount of English text to accompany the code snippet -- introducing it, at a minimum. Not quite feeling giving this a downvote, but not giving an upvote either.Festatus
@Festatus - The answer is outdated. Currently I prefer to use a full-fledged converter (example) instead of fast-and-dirty regex solutions.Gopherwood
C
12

You generally cannot convert an arbitrary Lua table to JSON using string operations alone. The problem is that Lua uses tables for both arrays and dictionaries, while JSON has two distinct types for them (arrays and objects). There are other syntactic differences as well.

This is best solved by a module that converts between the Lua and JSON representations. Take a look at the Lua wiki page on JSON modules and find a Lua module to convert Lua to JSON. There are multiple modules, some of which are pure Lua and are therefore a good choice for embedding into WoW. They correctly detect whether a table represents an array or a dictionary and output the corresponding JSON.
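To make the array-versus-dictionary point concrete, here is a minimal JavaScript sketch of the kind of rule such a converter might apply. It is not taken from any particular module. The helper name luaEntriesToJs and the { key, value } entry shape are assumptions made for illustration: a table whose entries are all positional becomes an array, and anything with at least one key becomes an object.

```javascript
// Hypothetical helper (not from any library): `entries` is the list of items
// found in one Lua table constructor. Positional items have no `key`.
function luaEntriesToJs(entries) {
  // A table with only positional items maps naturally to a JSON array.
  const isArray = entries.every((e) => e.key === undefined);
  if (isArray) {
    return entries.map((e) => e.value);
  }
  // Otherwise build an object. Positional items in a mixed table keep
  // the 1-based index that Lua would assign to them.
  const obj = {};
  let nextIndex = 1;
  for (const e of entries) {
    if (e.key === undefined) {
      obj[nextIndex++] = e.value;
    } else {
      obj[e.key] = e.value;
    }
  }
  return obj;
}
```

Under this rule the roster_info rows from the question come out as arrays and the bonus_loot_log rows as objects. An empty table stays ambiguous and is emitted as an empty array here.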

Canst answered 28/4, 2013 at 8:52 Comment(1)
+1 for the nudge in the right direction. If the data consumer needs JSON, but you have a Lua table, the right answer is to produce JSON in the first place from the Lua code rather than attempting to do text replacements which can only be successful if a full Lua parser were used. Which really amounts to getting Lua to write JSON output in the first place, and is a solved problem.Mchale
D
4

It can be converted reliably by parsing the syntax rather than by string replacement. However, this is a very tedious process.

        $(function () {
            $("#run").on("click", function () {
                let src = $("#src").val();
                let [i, contents] = convert(src, 0, []);

                function isValue(element){
                    let idx = contents.indexOf(element) + 1
                    for(let i=idx; i < contents.length; i++){
                        if(["SPACE","TAB","RETURN"].indexOf(contents[i].type) > -1) continue;
                        if(contents[i].type == "SPLIT") return 0
                        if(contents[i].type == "BRKT_F") return 2
                        if(["BRKT_S","BRKT_W","BREAK","FBREAK"].indexOf(contents[i].type) > -1) return 1
                    }
                }
              

                let converted = "";
                contents.forEach((element, index) => {
                    switch(element.type){
                        case "NUMBER":{
                            converted += element.content
                            break;
                        }
                        case "UNKNOWN": {
                            if(isValue(element)==1){
                              if(element.content == "return"){
                              } else if(["true","false"].indexOf(element.content)>-1){
                                converted += element.content
                              } else {
                                converted += '"' + element.content + '"'
                              }
                            } else if(isValue(element)==2){
                                converted += element.content
                            } else {
                                converted += '"' + element.content + '"'
                            }
                            break;
                        }
                        case "STR_S":
                        case "STR_D":{
                            converted += element.content
                            break;
                        }
                        case "BRKT_S":{
                            converted += element.content
                            break;
                        }
                        case "BRKT_W":{
                            converted += element.content
                            break;
                        }
                        case "BRKT_F":{
                            converted += element.content
                            break;
                        }
                        case "SPACE":{
                            converted += element.content
                            break;
                        }
                        case "TAB":{
                            converted += element.content
                            break;
                        }
                        case "RETURN":{
                            converted += element.content
                            break;
                        }
                        case "BREAK":{
                            converted += ","
                            break;
                        }
                        case "FBREAK":{
                            converted += "."
                            break;
                        }
                        case "SPLIT":{
                            converted += ":"
                            break;
                        }
                    }
                });
                $("#result").val(converted)
            })
        })

      function getBracketSurfaceInner(contents, element){
        if(["BRKT_S", "BRKT_W", "BRKT_F"].indexOf(element.type) == -1 || "]})".indexOf(element.content) == -1) return "";
        let idx = contents.indexOf(element)
        let innerElements = [];
        let nest = 1;
        for(let i=idx-1; i>=1; i--){
          if(["BRKT_S", "BRKT_W", "BRKT_F" ].indexOf(contents[i].type)>=0){
            if("]})".indexOf(contents[i].content)>=0){ nest ++ }
            if("[{(".indexOf(contents[i].content)>=0){ nest -- }
          }
          if(nest==0 && contents[i].type == element.type){
            return innerElements;
          }
          if(nest == 1) {
            innerElements.unshift(contents[i]);
          }
        }
        return innerElements;
      }


      function removeLastCamma(contents, element){
        let idx = contents.indexOf(element)
        let last = -1;
        for(let i=idx-1; i>=1; i--){
          if(["NUMBER", "UNKNOWN", "STR_S", "STR_D"].indexOf(contents[i].type)>=0) return;
          if(contents[i].type == "BREAK"){
              last = i;
              break;
          }
        }
        contents.splice(last, 1);
      }

        function convert(text, pos, contents) {

            let MODE = undefined;
            // NUMBER
            // UNKNOWN
            // SPLIT
            // BREAK
            // FBREAK
            // STR_S
            // STR_D
            // BRKT_S
            // BRKT_W
            // BRKT_F
            // CTRL
            // RETURN
            let MODES = [MODE];

            let content = "", currentElement;

            let i, c

            function PUSH_BEFORE(replace) {
                if (content.length > 1) {
                    contents.push({
                        type: MODE,
                        content: content.slice(0, content.length - 1),
                    });
                }
                content = "" + (replace ? replace : c)
                currentElement = contents[contents.length-1];
                MODE = MODES.shift()
            }

            function PUSH_AFTER(replace) {
                if (content.length > 0) {
                    let str = (replace ? content.slice(0, content.length - 1) + replace : content.slice(0, content.length));
                    contents.push({
                        type: MODE,
                        content: str,
                    });
                }
                content = ""
                currentElement = contents[contents.length-1];
                MODE = MODES.shift()
            }


            for (i = pos; i < text.length; i++) {
                c = text.charAt(i)
                content = content + c

                if (MODE == "ESCAPE") {
                    MODE = MODES.shift()
                } else
                if (MODE == "STR_S") {
                    if (c == "'") {
                        PUSH_AFTER('"')
                    }
                } else
                if (MODE == "STR_D") {
                    if (c == '"') {
                        PUSH_AFTER()
                    }
                } else
                if (MODE == "BRKT_S") {
                    if (c == ']') {
                        PUSH_BEFORE()
                    }
                } else
                if (MODE == "BRKT_F") {
                    if (c == ')') {
                        PUSH_BEFORE()
                    }
                } else
                if (MODE == "BRKT_W") {
                    if (c == '}') {
                        PUSH_BEFORE()
                    }
                } else {

                    switch (c) {
                        case "{":{
                            PUSH_BEFORE()
                            MODE = "BRKT_W"
                            let begin_idx = contents.length; 
                            contents.push({
                                type: MODE,
                                content: c,
                            });
                            MODES.push(MODE)
                            let [f, innerContents] = convert.call(this, text, i + 1, contents)
                            removeLastCamma(contents, contents[contents.length-1]);
                            
                            let surface = getBracketSurfaceInner(innerContents, innerContents[innerContents.length-1]);
                            let d = 0;
                            for(let l=0; l<surface.length; l++){
                                if(surface[l].type == "SPLIT") { d = 1; break; }
                            }
                            i = f
                            content = ""
                            if(d==0){
                                contents[begin_idx].type = "BRKT_S";
                                contents[begin_idx].content = "[";
                                contents[contents.length-1].type = "BRKT_S";
                                contents[contents.length-1].content = "]";
                                MODE = MODES.shift() | "BRKT_S"
                            } else {
                                MODE = MODES.shift() | "BRKT_W"
                            }
                            break;
                        }
                        case "}":{
                            PUSH_BEFORE()
                            contents.push({
                                type: "BRKT_W",
                                content: c,
                            });
                            return [i, contents]
                            break;
                        }
                        case "[": {
                            PUSH_BEFORE()
                            MODE = "BRKT_S"
                            let begin_idx = contents.length;
                            contents.push({
                                type: MODE,
                                content: c,
                            });
                            MODES.push(MODE)
                            let [f, innerContents] = convert.call(this, text, i + 1, contents)
                            removeLastCamma(contents, contents[contents.length-1]);
                          
                            innerContents = getBracketSurfaceInner(contents, contents[contents.length-1]);
                            let d = 0;
                            for(let l=0; l<innerContents.length; l++){
                              if(["BREAK", "BRKT_F"].indexOf(innerContents[l].type)>-1) {d = 1; break; }
                            }
                            if(d==0){
                                contents[begin_idx].type = "NOP";
                                contents[begin_idx].content = "";
                                contents[contents.length-1].type = "NOP";
                                contents[contents.length-1].content = "";
                            }
                          
                            i = f
                            content = ""
                            MODE = MODES.shift() | "BRKT_S"
                            break;
                        }
                        case "]": {
                            PUSH_BEFORE()
                            contents.push({
                                type: "BRKT_S",
                                content: c,
                            });
                            return [i, contents]
                            break;
                        }
                        case "(": {
                            PUSH_BEFORE()
                            MODE = "BRKT_F"
                            let begin_idx = contents.length;
                            contents.push({
                                type: MODE,
                                content: c,
                            });
                            MODES.push(MODE)
                            let [f, innerContents] = convert.call(this, text, i + 1, contents)
                            removeLastCamma(contents, contents[contents.length-1]);
                          
                            innerContents = getBracketSurfaceInner(contents, contents[contents.length-1]);

                            contents[begin_idx].type = "BRKT_F";
                            contents[begin_idx].content = "(";
                            contents[contents.length-1].type = "BRKT_F";
                            contents[contents.length-1].content = ")";
                          
                            i = f
                            content = ""
                            MODE = MODES.shift() | "BRKT_F"
                            break;
                        }
                        case ")": {
                            PUSH_BEFORE()
                            contents.push({
                                type: "BRKT_F",
                                content: c,
                            });
                            return [i, contents]
                            break;
                        }
                        case "'": {
                            if(MODE=="STR_D") {
                              break;
                            }
                            PUSH_BEFORE('"')
                            MODE = "STR_S"
                            break;
                        }
                        case '"': {
                            if(MODE=="STR_S") {
                              break;
                            }
                            PUSH_BEFORE()
                            MODE = "STR_D"
                            break;
                        }
                        case "\\": {
                            MODES.push(MODE)
                            MODE = "ESCAPE"
                            break;
                        }
                        case ",": {
                            PUSH_BEFORE()
                            MODE = "BREAK"
                            break;
                        }
                        case ".": {
                            PUSH_BEFORE()
                            MODE = "FBREAK"
                            break;
                        }
                        case "=": {
                            PUSH_BEFORE(":")
                            MODE = "SPLIT"
                            break;
                        }
                        case ":": {
                            PUSH_BEFORE()
                            MODE = "SPLIT"
                            break;
                        }
                        case " ": {
                            if (MODE != "SPACE") {
                                PUSH_BEFORE()
                                MODE = "SPACE"
                            }
                            break;
                        }
                        case "\t": {
                            if (MODE != "TAB") {
                                PUSH_BEFORE()
                                MODE = "TAB"
                            }
                            break;
                        }
                        case "\n": {
                            PUSH_BEFORE()
                            MODE = "RETURN"
                            break;
                        }
                        default: {
                            if (" SPACE TAB RETURN BREAK FBREAK SPLIT ".indexOf(" " + MODE + " ") > -1) {
                                PUSH_BEFORE()
                            }

                            if (!isNaN(content)) {
                                MODE = "NUMBER"
                            }
                            else {
                                MODE = "UNKNOWN"
                            }

                            break;
                        }
                    }
                }
            }
            return [i, contents]
        }
#src {
  width: 400px;
  height: 200px;
}
#result {
  width: 400px;
  height: 200px;
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
Source:<br><textarea id="src"></textarea>
<button id="run">Convert</button><br>
Result:<br><textarea id="result"></textarea>
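For comparison, the same parsing idea can be sketched much more compactly as a small recursive-descent parser. This is not the code above, only an illustration under stated assumptions: the input is a data-only table literal with double-quoted strings, decimal numbers, true/false/nil, ["key"] = value or name = value fields, and "--" line comments. The function name parseLuaTable is hypothetical.

```javascript
// Sketch: parse a data-only Lua table literal into a JavaScript value.
// Assumed subset: double-quoted strings, decimal numbers, true/false/nil,
// ["key"] = value or name = value fields, nested tables, "--" line comments.
// Not handled: single-quoted or long-bracket strings, decimal escapes (\ddd).
function parseLuaTable(src) {
  let pos = 0;
  const IDENT = /[A-Za-z_]\w*(?=\s*=)/y; // bare field name followed by "="
  const ATOM = /-?\d+(?:\.\d+)?|true|false|nil/y;

  // Try a sticky regex at the current position; consume and return the match.
  function match(re) {
    re.lastIndex = pos;
    const m = re.exec(src);
    if (m === null) return null;
    pos += m[0].length;
    return m[0];
  }

  // Skip whitespace and "--" line comments such as "-- [1]".
  function skip() {
    for (;;) {
      while (pos < src.length && /\s/.test(src[pos])) pos++;
      if (!src.startsWith("--", pos)) return;
      while (pos < src.length && src[pos] !== "\n") pos++;
    }
  }

  function expect(ch) {
    skip();
    if (src[pos] !== ch) {
      throw new SyntaxError('expected "' + ch + '" at offset ' + pos);
    }
    pos++;
  }

  function parseString() {
    let out = "";
    pos++; // opening quote
    while (pos < src.length && src[pos] !== '"') {
      if (src[pos] === "\\") {
        pos++;
        const c = src[pos];
        out += c === "n" ? "\n" : c === "t" ? "\t" : c;
      } else {
        out += src[pos];
      }
      pos++;
    }
    if (pos >= src.length) throw new SyntaxError("unterminated string");
    pos++; // closing quote
    return out;
  }

  function parseValue() {
    skip();
    if (src[pos] === "{") return parseTable();
    if (src[pos] === '"') return parseString();
    const tok = match(ATOM);
    if (tok === null) throw new SyntaxError("unexpected token at offset " + pos);
    if (tok === "true") return true;
    if (tok === "false") return false;
    if (tok === "nil") return null;
    return Number(tok);
  }

  function parseTable() {
    expect("{");
    const items = [];  // positional values
    const fields = {}; // keyed values
    let keyed = false;
    for (;;) {
      skip();
      if (pos >= src.length) throw new SyntaxError("unterminated table");
      if (src[pos] === "}") { pos++; break; }
      let key;
      if (src[pos] === "[") {
        pos++;
        key = parseValue();
        expect("]");
        expect("=");
      } else {
        const name = match(IDENT);
        if (name !== null) { key = name; expect("="); }
      }
      const value = parseValue();
      if (key === undefined) {
        items.push(value);
      } else {
        keyed = true;
        fields[key] = value;
      }
      skip();
      if (src[pos] === "," || src[pos] === ";") pos++;
    }
    if (!keyed) return items; // no keys at all: emit a JSON array
    items.forEach((v, i) => { fields[i + 1] = v; }); // mixed: keep Lua's 1-based slots
    return fields;
  }

  return parseValue();
}
```

To use it on a saved-variables file, the leading variable assignment (for example "EPGP_DB = ") would have to be stripped first, and the result passed to JSON.stringify. Because strings are consumed as whole tokens, brackets inside values such as "[Attenuating Bracers]" do not confuse the array/object decision.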
Duer answered 6/5, 2022 at 12:3 Comment(0)
