How to transform a JSON-like string with nested arrays into a structured object using parslet

I have a problem transforming a parsed JSON-like string, which contains nested arrays, into a structured object. I am using parslet to do this.

I created the parser and transformer presented below, but I can't handle the nested-array case.

require 'parslet'
require 'ostruct' # OpenStruct is used by the transform below
class JSONLikeDataParser < Parslet::Parser
  rule(:l_map_paren) {str('{')}
  rule(:r_map_paren) {str('}')}
  rule(:l_list_paren) {str('[')}
  rule(:r_list_paren) {str(']')}
  rule(:map_entry_delimiter) {str(':')}
  rule(:val_delimiter) {str(',')}
  rule(:quote) {str('"')}

  rule(:simple_val) {match('[^",:\.\{\}\[\]]').repeat(1)}
  rule(:quoted_val) {quote >> (quote.absnt? >> any).repeat(0) >> quote}


  rule(:map) {l_map_paren >> map_entries.maybe.as(:map) >> r_map_paren}
  rule(:map_entries) {map_entry >> (val_delimiter >> map_entry).repeat}
  rule(:map_entry) {map_key >> map_entry_delimiter >> object}
  rule(:map_key) {(match('[A-Za-z_]').repeat(1) >> match('[A-Za-z0-9_]').repeat).as(:key)}

  rule(:list) {l_list_paren >> list_values.maybe.as(:list) >> r_list_paren}
  rule(:list_values) {object >> (val_delimiter >> object).repeat}

  rule(:object) {map | (simple_val | quoted_val).as(:value) | list }

  root(:object)
end

# TODO: doesn't properly handle nested arrays, e.g. [[[1,2],[3]]]
class JSONLikeDataTransform < Parslet::Transform
  rule(map: subtree(:s)) do
    ret = {}
    if (s.is_a?(Hash))
      ret[s[:key]] = s[:value]
    else
      s.each do |h|
        ret.merge!(h)
      end
    end
    OpenStruct.new(ret)
  end
  rule(key: simple(:k), value: simple(:v)) {{k.str => v.str}}
  rule(key: simple(:k), list: simple(:v)) {{k.str => [v]}}
  rule(key: simple(:k), list: sequence(:v)) {{k.str => v}}

  rule(map: simple(:s)) {s ? OpenStruct.new(s) : OpenStruct.new}

  rule(list: subtree(:s)) {[s]}
  rule(list: sequence(:s)) {s}
  rule(list: simple(:s)) {s ? [s] : []}

  rule(value: subtree(:s)) {s}
  rule(value: sequence(:s)) {s}
  rule(value: simple(:s)) {s.str}
end

puts JSONLikeDataTransform.new.apply(JSONLikeDataParser.new.parse("[[[1],[2,3]]]")).inspect

The problematic string is "[[[1],[2,3]]]". I expect to receive a properly nested structure, but what I get is "[[[[1],[2,3]]]]", which has one bracket too many.
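One way to see where the extra bracket can come from, without parslet at all: the three `list:` rules have to decide, purely from the shape of an already-transformed value, whether it is a single element or the whole element sequence. The plain-Ruby sketch below mimics that decision. The `wrap_list` helper is hypothetical (it is not parslet code), and it assumes that a one-element list reaches the transform as a bare value while a longer list arrives as an Array, which is consistent with the symptom described above.

```ruby
# Hypothetical stand-in for the three `list:` transform rules (not parslet code).
# Assumption: a one-element list arrives as a bare value, a longer list as an Array.
def wrap_list(content)
  return [] if content.nil?                               # empty list
  return [content] unless content.is_a?(Array)            # one scalar element
  return content if content.none? { |e| e.is_a?(Array) }  # flat sequence of scalars
  [content]                                               # anything else gets wrapped
end

# "[[[1],[2,3]]]" transformed bottom-up, innermost lists first.
inner_a = wrap_list('1')                 # "[1]"
inner_b = wrap_list(%w[2 3])             # "[2,3]"
middle  = wrap_list([inner_a, inner_b])  # "[[1],[2,3]]" is wrapped once too often here
outer   = wrap_list(middle)              # "[[[1],[2,3]]]"

p outer # => [[[["1"], ["2", "3"]]]]

# The opposite mistake: "[[a,b]]" loses a bracket.
p wrap_list(wrap_list(%w[a b])) # => ["a", "b"]
```

Once the inner lists have become plain Arrays, "one element that is itself a list" and "several elements" look the same, so no ordering of shape-based rules can get both cases right.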

Egregious answered 25/6, 2019 at 8:9 Comment(8)
There is zero chance somebody will get through this wall of code; could you please provide an MCVE so we can recognize and reproduce the issue on our side? - Concave
GobbleUp is not defined. - Killie
I'll assume the GobbleUp rule can be replaced with "rule(:quoted_val) {quote >> (quote.absnt? >> any).repeat(0) >> quote}" - Killie
@AlekseiMatiushkin I believe this is an MCVE. Anyone who knows parslet will see that what I included is the Parser (first stage) and the Transformation rules (second stage). Parsing works properly, but I had to attach the parser, as without it the Transformation code would be incomprehensible. - Egregious
@NigelThorne yep, GobbleUp works as you expect. It is strange that it is not included in the parslet gem; I can see it on GitHub in the accelerator section, but indeed it is not present in the downloaded gem. I will change it to its slower counterpart. - Egregious
I can make the example pass, but I don't know what else it should pass. - Killie
Well, basically this is a JSON-like structure. The difference is that values/keys don't have to be quoted. They are quoted only if needed, e.g. when they contain a special char (":,.[]{}), and it is also possible to quote "3" just to indicate it is a string, but AFAIC this is handled properly by the parser. Only the transformation was not really working in some cases, like the one presented. A representative sample input would be {key:[],key1:[[[1],[]]], key2:{key1:1,key2:"1"}, key3:"dog,{:ca]t,",key3:{}}. The expected output is structured data built from OpenStruct and Array structures. - Egregious
@MateuszFryc if GobbleUp is faster... that's interesting. That means code could scan your parser, look for the slower version, replace it with the faster version, and your parser would go faster. This reminds me of regex optimisation... damn, another todo for my backlog. - Killie

Thanks everyone for sharing, especially @NigelThorne, whose answer in another thread led me to the actual solution of this problem. I introduced some additional classes (Map, Value, Arr) so that I can recognize whether a particular array was created by the Transform framework or is the result of list matching.
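To illustrate the idea in isolation, here is a minimal plain-Ruby sketch that does not use parslet. The `ListNode` class is hypothetical and only plays the role that `Arr` plays in my code: every parsed list is wrapped in a marker object, so a plain Array can only mean "several elements collected by the framework", and unwrapping is deferred until the whole tree is built.

```ruby
# Hypothetical marker class: wraps the raw content of one parsed list.
class ListNode
  def initialize(content)
    @content = content
  end

  # Unwrap recursively into plain Ruby arrays.
  def to_a
    items =
      case @content
      when nil   then []          # empty list
      when Array then @content    # several elements (framework-made Array)
      else            [@content]  # exactly one element
      end
    items.map { |item| item.is_a?(ListNode) ? item.to_a : item }
  end
end

# "[[[1],[2,3]]]" built bottom-up, the way a transform would see it.
inner_a = ListNode.new('1')
inner_b = ListNode.new(%w[2 3])
middle  = ListNode.new([inner_a, inner_b])
outer   = ListNode.new(middle)

p outer.to_a # => [[["1"], ["2", "3"]]]
```

Because a nested list is a `ListNode` and never a bare Array, the single-element case and the multi-element case can no longer be confused.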

Below is the working code and some tests for future reference.

require 'parslet'
require 'parslet/convenience'
require 'ostruct'


module JSONLikeDataModule
  module Parsing

    class GobbleUp < Parslet::Atoms::Base
      def initialize absent, min_chars = 0
        @absent = absent
        @min_chars = min_chars
      end

      def try(source, context, consume_all)
        excluding_length = source.chars_until(@absent)

        if excluding_length >= @min_chars
          return succ(source.consume(excluding_length))
        else
          return context.err(self, source, "No such string in input: #{@absent.inspect}.")
        end
      end

      def to_s_inner(prec)
        "until('#{@absent}')"
      end
    end

    class JSONLikeDataParser < Parslet::Parser
      rule(:l_map_paren) {str('{')}
      rule(:r_map_paren) {str('}')}
      rule(:l_list_paren) {str('[')}
      rule(:r_list_paren) {str(']')}
      rule(:map_entry_delimiter) {str(':')}
      rule(:val_delimiter) {str(',')}
      rule(:quote) {str('"')}

      rule(:simple_val) {match('[^",:\.\{\}\[\]]').repeat(1)}
      rule(:quoted_val) {quote >> GobbleUp.new('"').as(:value) >> quote}


      rule(:map) {l_map_paren >> map_entries.maybe.as(:map) >> r_map_paren}
      rule(:map_entries) {map_entry >> (val_delimiter >> map_entry).repeat}
      rule(:map_entry) {map_key >> map_entry_delimiter >> object}
      rule(:map_key) {(match('[A-Za-z_]').repeat(1) >> match('[A-Za-z0-9_]').repeat).as(:key)}

      rule(:list) {l_list_paren >> list_values.maybe.as(:list) >> r_list_paren}
      rule(:list_values) {object >> (val_delimiter >> object).repeat}

      rule(:object) {map.as(:value) | simple_val.as(:value) | quoted_val | list.as(:value)}

      root(:object)
    end

    class JSONLikeDataTransform < Parslet::Transform
      rule(key: simple(:key), value: simple(:value)) {{builder.value(key) => builder.value(value)}}

      rule(map: subtree(:s)) do
        ret = {}
        next builder.map(ret) unless s

        to_transform = s
        if to_transform.is_a?(Hash)
          to_transform = [to_transform]
        end

        to_transform.each do |h|
          ret.merge!(h)
        end
        builder.map(ret)
      end

      rule(list: simple(:list)) {builder.list(list)}
      rule(list: sequence(:list)) {builder.list(list)}
      rule(list: subtree(:list)) {builder.list(list)}

      rule(value: simple(:value)) {builder.value(value)}
      rule(value: sequence(:value)) {value.map {|val| builder.value(val)}}
      rule(value: subtree(:value)) {builder.value(value)}

    end



    class Builder
      def map(h)
        Map.new(h)
      end

      def list(l)
        Arr.new(l)
      end

      def value(v)
        Value.new(v)
      end

      class Arr
        def initialize(val)
          @val = val
        end

        def val
          return [] unless @val
          return @val.map(&:val) if @val.is_a?(Array)
          return [@val.val]
        end
      end

      class Map
        def initialize(val)
          @val = val
        end

        def val
          return OpenStruct.new unless @val
          @val.inject(OpenStruct.new) do |ostruct, (k,v)|
            ostruct[k.val] = v.val
            ostruct
          end
        end
      end

      class Value
        def initialize(val)
          @val = val
        end

        def val
          @val.respond_to?(:str) ? @val.str : @val.val
        end
      end
    end
  end
end

module JSONLikeDataModule
  class JSONLikeDataFactory
    @@flag_data_parser ||= Parsing::JSONLikeDataParser.new
    @@flag_data_transform ||= Parsing::JSONLikeDataTransform.new

    class << self
      private :new

      def create(flag_data_str)
        parsed_tree = @@flag_data_parser.parse_with_debug(flag_data_str)
        ret = @@flag_data_transform.apply(parsed_tree, :builder => Parsing::Builder.new)
        ret.val
      end
    end
  end
end

Tests

require 'minitest/autorun'
class JSONLikeFactoryTest < Minitest::Test
  include JSONLikeDataModule
  describe "JSONLikeDataFactory" do

    subject do
      JSONLikeDataFactory
    end

    it "should create string val" do
      subject.create('_S').must_equal "_S"
    end

    it "should create empty array" do
      subject.create('[]').must_equal []
    end

    it "should create empty nested array" do
      subject.create('[[[]]]').must_equal [[[]]]
    end

    it "should create not empty nested array" do
      subject.create('[[[1],[2,3]]]').must_equal [[['1'],['2','3']]]
    end

    it "should create empty OpenStruct" do
      subject.create('{}').must_equal OpenStruct.new
    end

    it "should create filled 1level OpenStruct" do
      subject.create('{key1:val,key2:"val"}').must_equal OpenStruct.new(key1: "val", key2: "val")
    end

    it "should create filled 2levels OpenStruct" do
      subject.create('{key1:val,key2:"val",key3:{},key4:{key1:val},key5:[1,2],key6:[[1,2,3],{},1,{key1:"[]{}:,."}]}').must_equal o(key1: "val", key2: "val", key3: o, key4: o(key1: "val"), key5: %w(1 2), key6: [%w(1 2 3), o, '1', o(key1: '[]{}:,.')])
    end

    def o(h= {})
      OpenStruct.new(h)
    end
  end
end
Egregious answered 25/6, 2019 at 15:6 Comment(0)
