Reading unescaped backslashes in JSON into R

I'm trying to read some data from the Facebook Graph API Explorer into R to do some text analysis. However, it looks like there are unescaped backslashes in the JSON feed, which is causing rjson to barf. The following is a minimal example of the kind of input that's causing problems.

library(rjson)
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'
fromJSON(txt)

(Note that the double backslashes at \\" and \\video will convert to single backslashes after parsing, which is what's in my actual data.)

I also tried the RJSONIO package which also gave errors, and even crashed R at times.

Has anyone come across this problem before? Is there a way to fix this short of manually hunting down every error that crops up? There's potentially megabytes of JSON being parsed, and the error messages aren't very informative about where exactly the problematic input is.

Word answered 19/11, 2013 at 9:3 Comment(0)

Just replace backslashes that aren't escaping double quotes, tabs or newlines with double backslashes.

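In the pattern, '\\\\' matches a single literal backslash: two levels of escaping are needed, one for R string literals and one for the regular expression engine. For the same reason, the replacement '\\\\\\\\' inserts two backslashes. We need the perl regex engine in order to use the negative lookahead (?![tn"]).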

library(stringr)
txt2 <- str_replace_all(txt, perl('\\\\(?![tn"])'), '\\\\\\\\')
fromJSON(txt2)
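
If perl() is not available in your version of stringr, the same substitution can be sketched in base R. This is only an illustration of the idea above, run on the sample string from the question; the name txt2 is reused for convenience.

```r
# Sample input from the question: contains \" (valid) and \v (invalid in JSON)
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'

# Double any backslash that is not followed by t, n or a double quote.
# perl = TRUE selects the PCRE engine, which supports the (?!...) lookahead.
txt2 <- gsub('\\\\(?![tn"])', '\\\\\\\\', txt, perl = TRUE)

# Print the raw characters to check which backslashes were doubled
cat(txt2, "\n")
```

Like the stringr version, this only whitelists \t, \n and \". Other escapes that JSON allows (such as \\, \/ or \uXXXX) would be doubled as well, so it is a rough repair rather than a general one.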
Tullis answered 19/11, 2013 at 9:12 Comment(5)
Thanks. That doesn't work though, since there are also characters like \" to denote escaped quote literals. IOW, sometimes the backslashes are correct, and sometimes they need to be modified. - Word
I've modified my example to clarify. - Word
@HongOoi OK, I've updated my answer. The best solution depends upon how consistently wrong the JSON is. If the backslashes are randomly single or double, you'll probably need to do some manual correction. - Tullis
Yeah, I just found the following snippet: "message":"Ok thank you :)\" Note the unescaped \ right before the ending quote. What a mess. Looks like there's no getting around manual correction. - Word
Have you tried using unexpected.escape = "keep", to at least prevent errors and get something read into R? - Tullis

The problem is that you are trying to parse invalid JSON:

library(jsonlite)
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'
validate(txt)

The problem is the picture\\video part because \v is not a valid JSON escape sequence, even though it is a valid escape sequence in R and some other languages. Perhaps you mean:

library(jsonlite)
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\/video"}]}'
validate(txt)
fromJSON(txt)

Either way, the problem is at the JSON data source, which is generating invalid JSON. If this data really comes from Facebook, you found a bug in their API. But more likely you are not retrieving it correctly.
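
If the input is large and the parser's error message does not say where the bad escape is, one option is to list every backslash that does not start an escape the JSON grammar allows (\" \\ \/ \b \f \n \r \t \uXXXX). The following is a rough base R sketch, not part of jsonlite; the names valid_escape, invalid_backslash and txt_bad are made up for this illustration.

```r
# Escapes permitted by the JSON grammar: \" \\ \/ \b \f \n \r \t \uXXXX
valid_escape <- '\\\\(?:["\\\\/bfnrt]|u[0-9A-Fa-f]{4})'

# PCRE trick: consume and discard every valid escape with (*SKIP)(*F),
# so that the final alternative only matches the remaining, invalid backslashes.
invalid_backslash <- paste0(valid_escape, '(*SKIP)(*F)|\\\\')

# The invalid sample from the question
txt_bad <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'

# Character offsets of the offending backslashes (gregexpr returns -1 if none)
pos <- gregexpr(invalid_backslash, txt_bad, perl = TRUE)[[1]]
pos <- pos[pos > 0]

# A few characters of context around each hit, to help locate it by eye
context <- substring(txt_bad, pmax(pos - 10, 1), pos + 10)
print(data.frame(pos = pos, context = context))
```

This only flags backslashes followed by a character JSON does not accept. It cannot detect a stray backslash directly before a closing quote, because \" is itself a valid escape, so that kind of error would still need to be found by other means.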

Rendering answered 25/7, 2014 at 11:8 Comment(0)
