JSON: why are forward slashes escaped?
T

5

477

The reason for this "escapes" me.

JSON escapes the forward slash, so a hash {a: "a/b/c"} is serialized as {"a":"a\/b\/c"} instead of {"a":"a/b/c"}.

Why?

Talapoin answered 16/10, 2009 at 21:54 Comment(5)
FWIW I've never seen forward slashes escaped in JSON, I just noticed it with the Java library at code.google.com/p/json-simpleTalapoin
PHP's json_encode() escapes forward slashes by default, but has the JSON_UNESCAPED_SLASHES option starting from PHP 5.4.0 (March 2012)Mannered
Here's a PHP code that will not escape every slash, only in '</': echo str_replace('</', '<\/', json_encode($obj, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES));Camilia
Does the code include the '</': or does it start at echo? Because starting at echo fails for me. I simply don't get anything. Yes, I replaced $obj with my variable :)Rambler
JSON doesn't escape or serialize anything... your JSON serializer does. Which one are you using?Impassion
C
369

JSON doesn't require you to do that; it allows you to do that. It also allows you to use "\u0061" for "a", but it's not required, like Harold L points out:

The JSON spec says you CAN escape forward slash, but you don't have to.

Harold L answered Oct 16 '09 at 21:59
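
A quick way to check this is to feed both spellings to a parser. The sketch below assumes a JavaScript environment with the built-in JSON.parse (the answer itself names no particular library) and shows that the escaped and unescaped forms decode to the same string:

```javascript
// The doubled backslashes are JavaScript string-literal escaping;
// the JSON text being parsed contains a single backslash.
const escapedSlash = JSON.parse('"a\\/b\\/c"');  // JSON text: "a\/b\/c"
const plainSlash = JSON.parse('"a/b/c"');        // JSON text: "a/b/c"
const escapedLetter = JSON.parse('"\\u0061"');   // JSON text: "\u0061"

console.log(escapedSlash === plainSlash); // true
console.log(escapedLetter === "a");       // true
```

Since the decoded values are identical, a consumer never needs to know which spelling the serializer chose.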

Allowing \/ helps when embedding JSON in a <script> tag, whose content under HTML 4 could not contain </ at all (HTML5 narrowed this to </script), like Seb points out:

This is because HTML does not allow a string inside a <script> tag to contain </, so in case that substring's there, you should escape every forward slash.

Seb answered Oct 16 '09 at 22:00
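
As a rough illustration of how the optional escape can be used for this, here is a minimal sketch that escapes only the </ sequence rather than every slash. The helper name toScriptSafeJson is hypothetical, and the sketch assumes JavaScript's built-in JSON.stringify, which does not escape slashes on its own. It handles only this one sequence and is not a complete XSS defence.

```javascript
// Hypothetical helper: serialize a value for use inside an inline <script>.
function toScriptSafeJson(value) {
  // In JSON.stringify output, "<" can only occur inside string literals,
  // so rewriting "</" as "<\/" keeps the text valid JSON ("\/" is a legal escape).
  return JSON.stringify(value).replace(/<\//g, "<\\/");
}

const payload = { searchTerm: "</script><b>x</b>", url: "http://example.com/a/b" };
const json = toScriptSafeJson(payload);

console.log(json);
console.log(json.includes("</"));          // false
console.log(JSON.parse(json).searchTerm);  // the original string round-trips
```

Slashes that do not follow a "<", such as those in the URL, are left alone, so the output only grows where the risky sequence actually occurs.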

Some of Microsoft's ASP.NET Ajax/JSON API's use this loophole to add extra information, e.g., a datetime will be sent as "\/Date(milliseconds)\/". (Yuck)
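
To make that date convention concrete, here is a sketch of a JSON.parse reviver that recognises it. The names MS_DATE and reviveMsDates and the sample payload are made up for illustration; this is not Microsoft's implementation. One limitation is built in: once parsed, "\/" has already become "/", so the sketch cannot tell whether the slashes were escaped in the source text.

```javascript
// Matches strings shaped like "/Date(1255730400000)/" after JSON decoding.
const MS_DATE = /^\/Date\((-?\d+)\)\/$/;

// Hypothetical reviver: converts matching strings to Date objects.
function reviveMsDates(key, value) {
  if (typeof value === "string") {
    const match = MS_DATE.exec(value);
    if (match) return new Date(Number(match[1]));
  }
  return value;
}

// Sample JSON text (illustrative): "created" uses the escaped-slash date form.
const text = '{"created":"\\/Date(1255730400000)\\/","path":"/Date/"}';
const parsed = JSON.parse(text, reviveMsDates);

console.log(parsed.created instanceof Date); // true
console.log(parsed.created.getTime());       // 1255730400000
console.log(parsed.path);                    // "/Date/" stays a string
```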

Cartogram answered 16/10, 2009 at 22:4 Comment(23)
Thanks for the answer. Never thought of that edge case. They should escape instances of </ with <\/, but not escape all the other slashes. :/Talapoin
That would be a good thing, escaping just </. Though JSON is not often embedded in script tags anyway.Cartogram
yeah, the hoops people have gone through for HTML... this is now the 2nd recent surprise for me re: JSON. The other one was that Infinity and NaN are not serialized. stackoverflow.com/questions/1423081Talapoin
JSON conversion is useful for generating inline scripts, e.g. var f = <%= xxx.to_json %>;. It should definitely not escape all forward slashes--it makes every JSON-encoded URL longer, instead of just the rare edge case.Distemper
See this blog post for the rationale for the ASP.NET JSON date format: weblogs.asp.net/bleroy/archive/2008/01/18/dates-and-json.aspxFumigator
Why doesn't it just escape the ( </ ) character pair instead of all forward slashes ( / ), then?Killarney
@GuyMontag: Probably because it is slightly more efficient / easier to implement when you don't have to remember which characters you have seen before to decide when to output an escape sequence. This way it's a simple per character substitution.Cartogram
...the only characters that need to be escaped in an encoding mechanism are the special characters used in the encoding mechanism structure itself( for JSON that would be ", {,},[,], etc.)...all other characters are payload and should be treated as such....if you break html because you send the wrong characters it is not the "structured data's encoding mechanism's responsibility to fix this....JSON needs to be replaced....it should be agnostic to client side language, server side language, and application, it is a payload delivery mechanism.Killarney
JSON needs to be replaced because a particular implementation of a JSON serializer outputs some JSON that (while being entirely valid JSON) has some extra characters so it can also be dropped into an HTML script element as a JS literal?! That isn't so much throwing the baby out with the bathwater as throwing the baby out because someone bought him a set of water wings.Lidia
I think instead of “Yuck”, the hack of using an escaped forward slash to mark a string as being more than a string is neat and much better than alternatives such as iterating through a deserialized JSON object and converting everything that matches a RegExp for ISO 8601 to a Date object or needing a separate key to indicate whether the serialized value is a pure string or a Date.Gargan
What I don't get, is why a JSON serializer would even care where the JSON ends up. On a web page, in an HTTP request, whatever. Let the final renderer do additional encoding, if it needs it.Orthopteran
@DanRoss And it can. Escaping / is not required, it is allowed, to ease the use of JSON. If you don't want to escape /, then don't.Woodshed
"when embedding JSON in a <script> tag, which doesn't allow </ inside strings" is incorrect. </ is just fine within a script tag. It's only </script> (perhaps with spaces in there) that terminates them.Povertystricken
@T.J.Crowder The HTML 3.2 and 4.01 specs explicitly forbid </ inside <script> (and <style>). Because, how should a browser treat <div><script>["</div>"]</script>? You could (should?) interpret this as <div><script>["</script></div><script>"]</script>. As this is how it should be parsed if you'd change <script> for <b>. It was only the HTML 5 spec that changed this to </script. In the good old days, we even used <script><!-- ... //--></script> (and other magic incantations) just to be absolutely sure the content of a script tag was not misinterpreted.Cartogram
@T.J.Crowder And JSON (2000) predates the HTML5 spec (2014). But these days, you're right. And it's just a lot faster to just escape all occurrences of </, without having to look ahead for script, so why bother.Cartogram
@Ruben: HTML 3.2 is ancient history, but where do you see that in even the HTML4 spec? In any case, the HTML5 spec codified what browsers had actually been doing for years. Separately: What makes you think replacing / is faster than replacing, say, </? It would at least minimize bloat. But in any case, it's handling the issue at the wrong level (JSON rather than the point at which you're using JSON in a script tag, if it happens you are).Povertystricken
@T.J.Crowder Section B.3.2 "Specifying non-HTML data" specifically deals with this issue. Secondly, handling </ is marginally slower than looking for just / because you need track what the previous character was. Blindly replacing / is just simpler, and that's what happens for all the reserved characters too. Besides, the JSON spec still allows you to not escape /. So if you don't like the bloat: use/write a JSON formatter that doesn't escape /. No-one is forcing you to do either way, as it's an optional encoding feature.Cartogram
@Ruben: Thanks for finding that for me, I always like to have that kind of arcane info. :-) I suspect you'll find if you actually test it that PHP will replace </ just as fast as /. But it's not particularly important, it's still handling it at the wrong level.Povertystricken
@Cartogram nowadays we use the even more complicated <script type="text/javascript"><!--//--><![CDATA[//><!--//--><!]]></script> form, which, using a CDATA, escapes (hah!) this problem as well. (To the curious: <style type="text/css"><!--/*--><![CDATA[/*><!--*//*]]>*/--></style> is the matching “proper” escape for CSS, and both are also XHTML/1.1 clean.)Dilettantism
Warning: users should not rely on this feature of json_encode() and should properly escape JSON embedded in an HTML document to avoid XSS. See Rule #3.1 of the OWASP cheatsheet.Monadelphous
@Monadelphous Rule 3.1 that you mention seems to indicate that < and > are unsafe characters, and <\/script> is thus inadequate encoding to prevent XSS. Are you aware of an exploit for this, as I can’t yet see how it might be exploited.Amon
@SimonEast See this post for examples. e.g. 1. it can be disabled at runtime or removed as default in a future version of PHP. 2. html entities, e.g. &quot; instead of ". 3. Text encoding assumed to be UTF-8 by json_encodeMonadelphous
Replacing </ is faster. < occurs less than /. It's also likely faster even if < appears a ton because you don't have to take any action other than setting and unsetting a flag in the vast majority of cases (i.e. unless </ appears instead of just < or /).Cathiecathleen
R
53

The JSON spec says you CAN escape a forward slash, but you don't have to. A reverse solidus must be escaped, but you do not need to escape a solidus. Section 9 of ECMA-404 says:

"All characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark (U+0022), reverse solidus (U+005C), and the control characters U+0000 to U+001F."

Rapid answered 16/10, 2009 at 21:59 Comment(0)
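
The mandatory-versus-optional split quoted from the spec can be observed with a serializer that does only the required escaping. This sketch assumes JavaScript's built-in JSON.stringify; the sample string is made up for illustration.

```javascript
// A string containing a solidus, quotation marks, a reverse solidus and a newline.
const raw = 'a/b "quoted" back\\slash\nnewline';
const encoded = JSON.stringify(raw);

console.log(encoded);
console.log(encoded.includes("\\/"));     // false: the solidus is left alone
console.log(encoded.includes('\\"'));     // true: quotation mark is escaped
console.log(encoded.includes("\\\\"));    // true: reverse solidus is escaped
console.log(encoded.includes("\\n"));     // true: control character is escaped
console.log(JSON.parse(encoded) === raw); // true
```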
A
28

PHP escapes forward slashes by default, which is probably why this appears so commonly. I suspect it's because embedding the string "</script>" inside a <script> tag is considered unsafe.

Example:

<script>
var searchData = <?= json_encode(['searchTerm' => $_GET['search'], ...]) ?>;
// Do something else with the data...
</script>

Based on this code, an attacker could append this to the page's URL:

?search=</script> <some attack code here>

Which, if PHP's protection was not in place, would produce the following HTML:

<script>
var searchData = {"searchTerm":"</script> <some attack code here>"};
...
</script>

Even though the closing script tag is inside a string, it will cause many (most?) browsers to exit the script tag and interpret the items following as valid HTML.

With PHP's protection in place, it will appear instead like this, which will NOT break out of the script tag:

<script>
var searchData = {"searchTerm":"<\/script> <some attack code here>"};
...
</script>

This functionality can be disabled by passing in the JSON_UNESCAPED_SLASHES flag, but most developers will not use it since the default output is already valid JSON.

Amon answered 22/1, 2018 at 2:30 Comment(4)
"is considered unsafe" -> it really is unsafe. Exploit: <script>let the = "bodies </script><script>alert("the floor");</script>";</script> Try it, the bodies will alert the floor rather than getting a variable called 'the' with script tags in its value. You can say "then don't embed it in a page", yeah, that's a possible workaround, but a lot of people do this anyway (so let's just make good escape functions because why not) and frankly I understand their point: it would make sense if it were safe to have JSON data with correctly escaped data values in JavaScript.Mantinea
Thanks @Mantinea - great example of why PHP has opted to escape slashes by default! Functions should be secure by default, and only insecure when you specifically want it that way.Amon
I beg to differ. PHP shouldn't encode forward slashes by default. If a frontend developer wants to echo a user-supplied value into HTML code, he should realize that it is always very dangerous, whether it is inside <script> or inside a <div>; and he should be the one to take a lot of precautions, by using htmlspecialchars(), for example. On the backend, it is perfectly safe to not escape forward slashes.Newsome
@DanielWu Unfortunately many developers are lazy, which is why “secure by default” is a good strategy. Developers can disable those additional slashes by adding the extra parameter if they understand the consequences. (Also I don’t think htmlspecialchars() will work in this scenario. The slashes are still required.)Amon
C
22

I asked the same question some time ago and had to answer it myself. Here's what I came up with:

It seems, my first thought [that it comes from its JavaScript roots] was correct.

'\/' === '/' in JavaScript, and JSON is valid JavaScript. However, why are the other ignored escapes (like \z) not allowed in JSON?

The key for this was reading http://www.cs.tut.fi/~jkorpela/www/revsol.html, followed by http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.2. The feature of the slash escape allows JSON to be embedded in HTML (as SGML) and XML.
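
The difference between JavaScript's lenient escapes and JSON's fixed list can be checked directly. This is a small sketch assuming a standard JavaScript engine; tryParse is a hypothetical helper written for this example.

```javascript
// In JavaScript string literals, an unrecognised escape simply yields the character.
console.log("\/" === "/"); // true
console.log("\z" === "z"); // true

// Hypothetical helper: report whether a piece of JSON text parses.
function tryParse(jsonText) {
  try {
    return { ok: true, value: JSON.parse(jsonText) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// JSON is stricter: "\/" is on its list of allowed escapes, "\z" is not.
const slash = tryParse('"\\/"'); // JSON text: "\/"
const zed = tryParse('"\\z"');   // JSON text: "\z"

console.log(slash.ok, slash.value); // true "/"
console.log(zed.ok, zed.error);     // false "SyntaxError"
```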

Catastrophism answered 16/3, 2012 at 10:18 Comment(2)
A structured data payload delivery mechanism should not be tied to language constructs..as this may change in the future...but this might explain the design decisions if there were any of the JSON creators.Killarney
'\/' === '/' So I don't need to unescape forward slashes when receiving my jsonp?Apocope
H
1

Yes, some JSON utility libraries do it for various good but mostly legacy reasons. But then they should also offer something like a setEscapeForwardSlashAlways method to turn this behaviour off.

In Java, org.codehaus.jettison.json.JSONObject does offer a method called

setEscapeForwardSlashAlways(boolean escapeForwardSlashAlways)

to switch this default behaviour off.

Hypotonic answered 10/10, 2022 at 12:14 Comment(0)
