YAML: error parsing a string containing a square bracket as its first character

I'm parsing a YAML file in Ruby and some of the input is causing a Psych syntax error:

require 'yaml'

example = "my_key: [string] string"
YAML.load(example)

Resulting in:

Psych::SyntaxError: (<unknown>): did not find expected key
          while parsing a block mapping at line 1 column 1
from [...]/psych.rb:456:in `parse'

I received this YAML from an external API that I do not control. Quoting the value to force it to be parsed as a string, as in my_key: '[string] string' (see "Do I need quotes for strings in YAML?"), fixes the issue, but I cannot change how the input is sent.

Is there a way to force the input to be parsed as a string for some keys such as my_key? Is there a workaround to successfully parse this YAML?

Sanjuana answered 2/1, 2020 at 12:32 Comment(5)
You may want to paste the result correctly. - Binette
Just to understand the problem: what do you expect? The string '[string] string' or the string 'string'? Obviously you aren't getting valid YAML, so maybe the API you use has a description of its format. - Bruckner
It's weird that an API would return a result in YAML that isn't actually valid YAML :/ But couldn't you just pre-process the response before reading it as YAML? - Salubrious
You may not control how the string is received, but you do have control over it immediately prior to parsing it, so munging it isn't out of the question. I'd do it in a small piece of code separate from the parsing code, following all the appropriate cautionary steps of backing up the original until you know your code has successfully parsed it. - Glacialist
I ran into this scenario with a tool that had a bug choking on parsing <, > in YAML strings, even when escaped. It's a bit of a hack, but I ended up successfully using the HTML-escaped versions instead (&lt;, &gt;). - Canopy

One approach would be to process the response before reading it as YAML. Assuming it's a string, you could use a regex to replace the problematic pattern with something valid. For example:

resp_str = "---\nmy_key: [string] string\n"
re = /(: )(\[[a-z]*?\] [a-z]*?)(\n)/
# Use backreferences here: "#{$1}" would be interpolated before gsub! runs the match
resp_str.gsub!(re, "\\1'\\2'\\3")
#=> "---\nmy_key: '[string] string'\n"

Then you can do

YAML.load(resp_str)
#=> {"my_key"=>"[string] string"}
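A slightly more defensive variant of this pre-processing step can be sketched as follows. It is only an illustration: quote_bracket_values and load_lenient are hypothetical helper names, not part of Psych, and the sketch assumes that the affected keys are known in advance and that each value fits on a single line. The input is only rewritten when the normal parse fails.

```ruby
require 'yaml'

# Hypothetical helper (not part of Psych): wraps the value of the given
# keys in single quotes when that value starts with "[".
def quote_bracket_values(yaml_text, keys)
  key_pattern = Regexp.union(keys)
  yaml_text.gsub(/^([ \t]*(?:#{key_pattern}):[ \t]+)(\[.*?)[ \t]*$/) do
    prefix = Regexp.last_match(1)
    value  = Regexp.last_match(2)
    # Inside a single-quoted YAML scalar, a literal ' is written as ''.
    "#{prefix}'#{value.gsub("'", "''")}'"
  end
end

# Hypothetical wrapper: parse as-is first, and only fall back to the
# rewritten text when Psych reports a syntax error.
def load_lenient(yaml_text, keys)
  YAML.load(yaml_text)
rescue Psych::SyntaxError
  YAML.load(quote_bracket_values(yaml_text, keys))
end

raw   = "---\nmy_key: [string] string\nother: [a, b]\n"
fixed = quote_bracket_values(raw, ['my_key'])
data  = load_lenient(raw, ['my_key'])
```

Restricting the rewrite to named keys leaves genuine arrays such as other: [a, b] untouched.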
Salubrious answered 2/1, 2020 at 21:54 Comment(1)
I've had to do that too many times and agree. It's often the shortest path back to sanity. - Glacialist

It does not work because square brackets have a special meaning in YAML, denoting arrays:

YAML.load "my_key: [string]"
#⇒ {"my_key"=>["string"]}

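and [foo] bar is not a valid value, because further text after the closing bracket of an array is not allowed. One workaround is to escape the square brackets with backslashes, although the backslashes then remain in the parsed string: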

YAML.load "my_key: \\[string\\] string"
#⇒ {"my_key"=>"\\[string\\] string"}

Alternatively, one might implement a custom Psych parser.
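The difference between the three forms discussed here can be checked with a short sketch. try_load is a hypothetical helper introduced only for this illustration; it returns :syntax_error instead of raising when Psych rejects the input.

```ruby
require 'yaml'

# Hypothetical helper: returns the parsed value, or :syntax_error
# when Psych cannot parse the input.
def try_load(text)
  YAML.load(text)
rescue Psych::SyntaxError
  :syntax_error
end

sequence = try_load("my_key: [string]")           # flow sequence, parsed as an array
broken   = try_load("my_key: [string] string")    # text after the closing bracket
quoted   = try_load("my_key: '[string] string'")  # quoted scalar, parsed as a string

p sequence  # {"my_key"=>["string"]}
p broken    # :syntax_error
p quoted    # {"my_key"=>"[string] string"}
```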

Outsert answered 2/1, 2020 at 12:41 Comment(4)
Thanks, I understand that. However, I don't control how this input is sent, as it comes from an external API. - Sanjuana
Then the only way to go would be to implement a custom Psych parser. Here is my blog post describing how to accomplish that. - Outsert
I'm looking forward to the day I can casually drop my blog post in answer to a question here. - Shuffle
Sometimes it's necessary to pre-process an input file to fix known errors prior to passing it to a YAML, JSON, or even an XML/HTML parser. It's the nature of the internet that if someone can implement a standard wrong, someone will, usually because they had a "bright idea". - Glacialist

There is a very simple, native solution. If you want the value to be treated as a string, you can always put quotes around it:

YAML.load "my_key: '[string]'"
#=> {"my_key"=>"[string]"}
Poseur answered 2/1, 2020 at 13:5 Comment(0)