How should I escape strings in JSON?
T

20

182

When creating JSON data manually, how should I escape string fields? Should I use something like Apache Commons Lang's StringEscapeUtils.escapeHtml or StringEscapeUtils.escapeXml, or should I use java.net.URLEncoder?

The problem is that StringEscapeUtils.escapeHtml doesn't escape quotes, and when I wrap the whole string in a pair of single quotes ('), malformed JSON is generated.

Tetra answered 11/6, 2010 at 3:54 Comment(2)
If you're wrapping the whole string in a pair of ', you're doomed from the start: JSON strings can only be surrounded with ". See ietf.org/rfc/rfc4627.txt.Fern
+1 for the StringEscapeUtils outline. It's pretty useful.Trombley
F
186

Ideally, find a JSON library in your language that you can feed some appropriate data structure to, and let it worry about how to escape things. It'll keep you much saner. If for whatever reason you don't have a library in your language, you don't want to use one (I wouldn't suggest this), or you're writing a JSON library, read on.

Escape it according to the RFC. JSON is pretty liberal: The only characters you must escape are \, ", and control codes (anything less than U+0020).

This structure of escaping is specific to JSON, so you'll need a JSON-specific function. All of the escapes can be written as \uXXXX, where XXXX is the UTF-16 code unit¹ for that character. There are a few shortcuts, such as \\, which work as well (and result in smaller, clearer output).
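The rules above can be sketched in Java using only the JDK. This is a minimal illustration, not a replacement for a library: the class name JsonStringEscaper and the method quote are made up for this example. It escapes only what the RFC requires and uses the short forms where they exist.

```java
final class JsonStringEscaper {
    private JsonStringEscaper() {}

    /** Returns s as a double-quoted JSON string literal (hypothetical helper). */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2);
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                // The two printable characters that must always be escaped.
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                // Control characters that have a short two-character escape.
                case '\b': sb.append("\\b"); break;
                case '\f': sb.append("\\f"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters: six-character hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        // Everything else, including '/', braces and non-ASCII text,
                        // may appear unescaped.
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
        return sb.toString();
    }
}
```

Calling JsonStringEscaper.quote on a string containing a tab yields a backslash followed by t, while characters such as /, {, [ and : pass through unchanged.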

For full details, see the RFC.

¹JSON's escaping is built on JS, so it uses \uXXXX, where XXXX is a UTF-16 code unit. For code points outside the BMP, this means encoding surrogate pairs, which can get a bit hairy. (Or, you can just output the character directly, since JSON's encoded form is Unicode text, and allows these particular characters.)
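To illustrate the surrogate-pair case from the footnote, here is a small Java sketch. The class and method names are hypothetical; it relies on Character.toChars from the JDK, which returns one UTF-16 code unit for BMP code points and a high/low surrogate pair otherwise.

```java
final class JsonUnicodeEscape {
    private JsonUnicodeEscape() {}

    /** Writes a code point as one or two six-character hex escapes (hypothetical helper). */
    static String escapeCodePoint(int codePoint) {
        StringBuilder sb = new StringBuilder();
        // Character.toChars yields 1 char for the BMP and 2 chars (a surrogate pair) above it.
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04x", (int) unit));
        }
        return sb.toString();
    }
}
```

For U+1F600, which lies outside the BMP, this produces two escapes (d83d followed by de00); for U+0041 it produces a single one.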

Fern answered 11/6, 2010 at 3:58 Comment(9)
Is it valid in JSON, like in JavaScript, to enclose strings in double quotes or single quotes? Or is it only valid to enclose them in double quotes?Tetra
@Sergei: The characters {[]}:? must not be escaped with a single backslash. (\:, for example, is not valid in a JSON string.) All of those can optionally be escaped using the \uXXXX syntax, at the waste of several bytes. See §2.5 of the RFC.Fern
I'm not sure how widely it's supported, but in my experience a call to JSON.stringify() did the job.Parbuckle
For my tiny brain, the RFC is a bit vague, when it states "... any UNICODE character...". Which encoding? utf-8, utf-16, shift-jis, ...? Big endian/little endian? The RFC also does not state the character encoding for the whole json string. Some clarification would be much appreciated. Maybe for Java programmers, the term "unicode" is enough to ring a bell, but for C/C++ programmers having a std::string etc, it is not enough information.Petras
@BitTickler: I'm not sure which part of the RFC you're referencing. If you mean what Unicode characters are valid in a string literal, then the production unescaped tells you what characters you can include without an escape sequence, and which must be escaped; when those two are combined, a string literal can contain any sequence of Unicode code points. String literals have no encoding: they are text. The JSON value as a whole (including embedded literals), however, might, if it needs to be stored as bytes; the standard mandates one of the UTF- encodings shall be used.Fern
@LS. stringify() can change the order of non-array object properties, as cited at developer.mozilla.org/en/docs/Web/JavaScript/Reference/…Provinciality
@Patanjali: The question and comments didn't specify order is important. Usually, order should not be important. From json.org: "An object is an unordered set of name/value pairs." If order is important to the consumer of JSON, it should be written in a way to preserve that order. For example, use property keys that would fall into the correct order when sorted.Parbuckle
@LS. Faerie's nuts. (Fair enough).Provinciality
@Petras A Unicode character is not vague at all -- it just means that it has a code point (or points) in the Unicode spec. When you use std::string, it is a bunch of Unicode characters. When you need to serialize it, let's say to a file or across the network, that's where 'which encoding' comes in. According to Thanatos, it seems they want you to use a UTF, but technically any encoding can be used as long as it can be reconstituted into Unicode characters.Florin
M
60

Extract From Jettison:

 public static String quote(String string) {
         if (string == null || string.length() == 0) {
             return "\"\"";
         }

         char         c = 0;
         int          i;
         int          len = string.length();
         StringBuilder sb = new StringBuilder(len + 4);
         String       t;

         sb.append('"');
         for (i = 0; i < len; i += 1) {
             c = string.charAt(i);
             switch (c) {
             case '\\':
             case '"':
                 sb.append('\\');
                 sb.append(c);
                 break;
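             case '/':
                 // Escaping '/' is optional in JSON; this version always escapes it
                 // (the "only after '<'" check below is commented out).
 //                if (b == '<') {
                     sb.append('\\');
 //                }
                 sb.append(c);
                 break;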
             case '\b':
                 sb.append("\\b");
                 break;
             case '\t':
                 sb.append("\\t");
                 break;
             case '\n':
                 sb.append("\\n");
                 break;
             case '\f':
                 sb.append("\\f");
                 break;
             case '\r':
                 sb.append("\\r");
                 break;
             default:
                 if (c < ' ') {
                     t = "000" + Integer.toHexString(c);
                     sb.append("\\u" + t.substring(t.length() - 4));
                 } else {
                     sb.append(c);
                 }
             }
         }
         sb.append('"');
         return sb.toString();
     }
Micron answered 20/5, 2013 at 15:15 Comment(6)
Well, this was the OP tagMicron
I don't understand why only c < ' ' is changed to \u. In my case, there is the character \uD38D, which is 54157 and greater than ' ', so it isn't changed to \u...Kinzer
@Kinzer Sounds like a new questionMicron
@Micron Thanks for your reply. I still don't know why, but in the end I changed the method to fix it, like this: if (c < ' ' || c > 0x7f) { t = "000" + Integer.toHexString(c).toUpperCase(); sb.append("\\u" + t.substring(t.length() - 4)); } else { sb.append(c); } }Kinzer
@Stony, all characters other than ", \ , and control characters (those before “ ”) are valid inside JSON strings as long as the output encoding matches. In other words, you do not need to encode “펍” as \uD38D as long as the UTF encoding is preserved.Meeks
Can someone help me with an input string for the above method that will hit the switch cases for '\b' and '\t'? I am unable to write a JUnit test for them. @Micron . Thanks.Leoine
R
39

Try this org.codehaus.jettison.json.JSONObject.quote("your string").

Download it here: http://mvnrepository.com/artifact/org.codehaus.jettison/jettison

Reamy answered 9/1, 2012 at 14:21 Comment(5)
Definitely the best solution! ThxGoatsucker
But this does not quote braces like [{Mg
@Sergei You don't have to escape braces inside of a JSON string.Drainage
Might be useful to show what this actually returns.Shephard
org.json.JSONObject.quote("your json string") also works fineReflector
C
26

There is now a StringEscapeUtils#escapeJson(String) method in the Apache Commons Text library.

The methods of interest are StringEscapeUtils.escapeJson(String) and StringEscapeUtils.unescapeJson(String).

This functionality was initially released as part of Apache Commons Lang version 3.2 but has since been deprecated and moved to Apache Commons Text. So if the method is marked as deprecated in your IDE, you're importing the implementation from the wrong library (both libraries use the same class name: StringEscapeUtils).

The implementation escapes more than JSON strictly requires. As per the Javadoc:

Escapes the characters in a String using Json String rules.

Escapes any values it finds into their Json String form. Deals correctly with quotes and control-chars (tab, backslash, cr, ff, etc.)

So a tab becomes the characters '\' and 't'.

The only difference between Java strings and Json strings is that in Json, forward-slash (/) is escaped.

See http://www.ietf.org/rfc/rfc4627.txt for further details.

Copyist answered 31/3, 2014 at 8:34 Comment(5)
This is the most practical answer for me. Most projects already use apache commons lang, so no need to add a dependency for one function. A JSON builder would probably be the best answer.Watters
As a follow-up, and because I can't figure out how to edit a comment I added a new one, I found javax.json.JsonObjectBuilder and javax.json.JsonWriter. Very nice builder/writer combination.Watters
This is deprecated in apache commons lang, you need to use apache commons text. Sadly, this library follows the optional/outdated spec by escaping / characters. This breaks lots of things including JSON with URLs in it. The original proposal had / as a special char to escape but this is no longer the case, as we can see in the latest spec at time of writingBasting
@Basting Thank you. I updated the answer.Copyist
I've since started using Google's Gson for JSON conversion. As with Apache Commons, there seem to be some caveats around its implementation, but I'm pushing on with it for now...Copyist
H
24

org.json.simple.JSONObject.escape() escapes quotes, \, /, \r, \n, \b, \f, \t and other control characters. It can be used to escape JavaScript code.

import org.json.simple.JSONObject;
String test =  JSONObject.escape("your string");
Hileman answered 23/7, 2012 at 10:34 Comment(3)
It depends on the json library you are using (JSONObject.escape, JSONObject.quote, ..) but it's always a static method doing the quoting job and simply should be reusedCataplasm
Which library is org.json part of? I don't have it on my classpath.Froemming
com.googlecode.json-simple see mvnrepository.com/artifact/com.googlecode.json-simple/…Hileman
S
13

org.json.JSONObject quote(String data) method does the job

import org.json.JSONObject;
String jsonEncodedString = JSONObject.quote(data);

Extract from the documentation:

Encodes data as a JSON string. This applies quotes and any necessary character escaping. [...] Null will be interpreted as an empty string

Sandhi answered 5/10, 2016 at 11:11 Comment(1)
org.apache.sling.commons.json.JSONObject also has this same thingHighchair
F
6

StringEscapeUtils.escapeJavaScript / StringEscapeUtils.escapeEcmaScript should do the trick too.

Flinty answered 26/9, 2013 at 18:39 Comment(1)
escapeJavaScript escapes single quotes as \', which is incorrect.Ryun
V
6

If you are using FasterXML Jackson, you can use the following (it returns the escaped content as a char[], without the surrounding quotes): com.fasterxml.jackson.core.io.JsonStringEncoder.getInstance().quoteAsString(input)

If you are using Codehaus Jackson, you can use the following: org.codehaus.jackson.io.JsonStringEncoder.getInstance().quoteAsString(input)

Voluptuary answered 6/10, 2016 at 18:6 Comment(0)
G
3

Not sure what you mean by "creating json manually", but you can use something like gson (http://code.google.com/p/google-gson/), and that would transform your HashMap, Array, String, etc, to a JSON value. I recommend going with a framework for this.

Gyve answered 11/6, 2010 at 4:3 Comment(5)
By manually I meant not by using a JSON library like Simple JSON, Gson, or XStream.Tetra
Just a matter of curiosity -- why wouldn't you want to use one of these APIs? It's like trying to escape URLs manually, instead of using URLEncode/Decode...Gyve
Not really the same. Those libraries come with a lot more than the equivalent of URLEncode/Decode: they include a whole serialization package to allow persistence of Java objects in JSON form, and sometimes you really only need to encode a short bunch of text.Enclave
Creating JSON manually makes sense if you don't want to include a library just for serializing small bits of data.Triboelectricity
I would ask to have a team member removed from any project I was on if they dared to create JSON manually where there existed a high quality library to do so.Vonvona
S
2

I have not spent the time to make 100% certain, but it worked for my inputs enough to be accepted by online JSON validators:

new org.apache.velocity.tools.generic.EscapeTool().java("input")

although it does not look any better than org.codehaus.jettison.json.JSONObject.quote("your string")

I simply use velocity tools in my project already - my "manual JSON" building was within a velocity template

Santiagosantillan answered 30/4, 2012 at 20:22 Comment(0)
H
2

For those who came here looking for a command-line solution, like me, cURL's --data-urlencode works fine:

curl -G -v -s --data-urlencode 'query={"type" : "/music/artist"}' 'https://www.googleapis.com/freebase/v1/mqlread'

sends

GET /freebase/v1/mqlread?query=%7B%22type%22%20%3A%20%22%2Fmusic%2Fartist%22%7D HTTP/1.1

, for example. Larger JSON data can be put in a file and you'd use the @ syntax to specify a file to slurp in the to-be-escaped data from. For example, if

$ cat 1.json 
{
  "type": "/music/artist",
  "name": "The Police",
  "album": []
}

you'd use

curl -G -v -s --data-urlencode query@1.json 'https://www.googleapis.com/freebase/v1/mqlread'

And now, this is also a tutorial on how to query Freebase from the command line :-)
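For comparison, here is a rough Java counterpart of what --data-urlencode does to the query value. It is a sketch that assumes Java 10 or later (for the URLEncoder.encode(String, Charset) overload); the class and method names are made up for illustration. Note that this is URL encoding of a whole JSON document for transport in a query string, which is a different job from escaping a string inside JSON.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryParamEncoder {
    private QueryParamEncoder() {}

    /** Percent-encodes a JSON document for use as a query parameter value (hypothetical helper). */
    static String encode(String json) {
        // URLEncoder emits '+' for spaces (form encoding). A literal '+' in the input
        // is already encoded as %2B, so every remaining '+' stands for a space and
        // can be rewritten as %20 to match the request shown above.
        return URLEncoder.encode(json, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

Passing the JSON from the first curl example, {"type" : "/music/artist"}, gives the same query value as in the GET line above.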

Hyponasty answered 4/11, 2014 at 3:38 Comment(0)
E
2

Use the StringEscapeUtils class in the Commons Lang API.

StringEscapeUtils.escapeJavaScript("Your JSON string");
Encephalitis answered 9/11, 2014 at 11:27 Comment(1)
Note that single quotes, for example, are handled differently when escaping to JavaScript or JSON. In commons.lang 3.4 StringEscapeUtils (commons.apache.org/proper/commons-lang/javadocs/api-3.4/org/…) has an escapeJson method which is different from the escapeJavaScript method in commons.lang 2: commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/…Forcier
C
2

Consider Moshi's JsonWriter class. It has a wonderful API and it reduces copying to a minimum: everything can be nicely streamed to a file, OutputStream, etc.

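OutputStream os = ...;
// JsonWriter.of(...) wraps an Okio BufferedSink; strings are escaped as they are written.
JsonWriter json = JsonWriter.of(Okio.buffer(Okio.sink(os)));
json.beginObject();
json.name("id").value(getId());
json.name("scores");
json.beginArray();
for (Double score : getScores()) {
  json.value(score);
}
json.endArray();
json.endObject();
json.flush(); // push buffered output through to the OutputStream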

If you want the string in hand:

Buffer b = new Buffer(); // okio.Buffer
JsonWriter writer = JsonWriter.of(b);
//...
String jsonString = b.readUtf8();
Comfrey answered 21/7, 2015 at 14:21 Comment(0)
R
1

If you need to escape JSON inside a JSON string, org.json.JSONObject.quote("your json string that needs to be escaped") seems to work well.

Reflector answered 16/10, 2016 at 4:24 Comment(0)
E
1

Apache commons-text now has a StringEscapeUtils.escapeJson(String).

Eurystheus answered 19/6, 2018 at 13:38 Comment(0)
J
0

The methods here that show the actual implementation are all faulty.
I don't have Java code, but just for the record, you could easily convert this C# code:

Courtesy of the mono-project @ https://github.com/mono/mono/blob/master/mcs/class/System.Web/System.Web/HttpUtility.cs

public static string JavaScriptStringEncode(string value, bool addDoubleQuotes)
{
    if (string.IsNullOrEmpty(value))
        return addDoubleQuotes ? "\"\"" : string.Empty;

    int len = value.Length;
    bool needEncode = false;
    char c;
    for (int i = 0; i < len; i++)
    {
        c = value[i];

        if (c >= 0 && c <= 31 || c == 34 || c == 39 || c == 60 || c == 62 || c == 92)
        {
            needEncode = true;
            break;
        }
    }

    if (!needEncode)
        return addDoubleQuotes ? "\"" + value + "\"" : value;

    var sb = new System.Text.StringBuilder();
    if (addDoubleQuotes)
        sb.Append('"');

    for (int i = 0; i < len; i++)
    {
        c = value[i];
        if (c >= 0 && c <= 7 || c == 11 || c >= 14 && c <= 31 || c == 39 || c == 60 || c == 62)
            sb.AppendFormat("\\u{0:x4}", (int)c);
        else switch ((int)c)
            {
                case 8:
                    sb.Append("\\b");
                    break;

                case 9:
                    sb.Append("\\t");
                    break;

                case 10:
                    sb.Append("\\n");
                    break;

                case 12:
                    sb.Append("\\f");
                    break;

                case 13:
                    sb.Append("\\r");
                    break;

                case 34:
                    sb.Append("\\\"");
                    break;

                case 92:
                    sb.Append("\\\\");
                    break;

                default:
                    sb.Append(c);
                    break;
            }
    }

    if (addDoubleQuotes)
        sb.Append('"');

    return sb.ToString();
}

This can be compacted into

    // https://github.com/mono/mono/blob/master/mcs/class/System.Json/System.Json/JsonValue.cs
public class SimpleJSON
{

    private static  bool NeedEscape(string src, int i)
    {
        char c = src[i];
        return c < 32 || c == '"' || c == '\\'
            // Broken lead surrogate
            || (c >= '\uD800' && c <= '\uDBFF' &&
                (i == src.Length - 1 || src[i + 1] < '\uDC00' || src[i + 1] > '\uDFFF'))
            // Broken tail surrogate
            || (c >= '\uDC00' && c <= '\uDFFF' &&
                (i == 0 || src[i - 1] < '\uD800' || src[i - 1] > '\uDBFF'))
            // To produce valid JavaScript
            || c == '\u2028' || c == '\u2029'
            // Escape "</" for <script> tags
            || (c == '/' && i > 0 && src[i - 1] == '<');
    }



    public static string EscapeString(string src)
    {
        System.Text.StringBuilder sb = new System.Text.StringBuilder();

        int start = 0;
        for (int i = 0; i < src.Length; i++)
            if (NeedEscape(src, i))
            {
                sb.Append(src, start, i - start);
                switch (src[i])
                {
                    case '\b': sb.Append("\\b"); break;
                    case '\f': sb.Append("\\f"); break;
                    case '\n': sb.Append("\\n"); break;
                    case '\r': sb.Append("\\r"); break;
                    case '\t': sb.Append("\\t"); break;
                    case '\"': sb.Append("\\\""); break;
                    case '\\': sb.Append("\\\\"); break;
                    case '/': sb.Append("\\/"); break;
                    default:
                        sb.Append("\\u");
                        sb.Append(((int)src[i]).ToString("x04"));
                        break;
                }
                start = i + 1;
            }
        sb.Append(src, start, src.Length - start);
        return sb.ToString();
    }
}
Judkins answered 16/9, 2016 at 8:38 Comment(1)
How is the quote() method described in other answers faulty?Intussusception
J
0

The \uXXXX syntax can solve this problem. To find XXXX, search for "UTF-16" together with the name of the character, for example: utf-16 double quote
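Instead of searching, the XXXX value can also be computed. A tiny Java sketch using only the JDK (the class and method names are made up for illustration):

```java
final class UnicodeEscapeLookup {
    private UnicodeEscapeLookup() {}

    /** Returns the six-character hex escape for one UTF-16 code unit (hypothetical helper). */
    static String escapeOf(char c) {
        // A Java char is a UTF-16 code unit, so its numeric value is the XXXX part.
        return String.format("\\u%04x", (int) c);
    }
}
```

For the double quote this gives 0022 as the XXXX part, and for the backslash 005c.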

Jigging answered 21/10, 2016 at 5:33 Comment(0)
W
0

I think the best answer in 2017 is to use the javax.json APIs. Use javax.json.JsonBuilderFactory to create your json objects, then write the objects out using javax.json.JsonWriterFactory. Very nice builder/writer combination.

Watters answered 28/6, 2017 at 14:30 Comment(0)
I
0

If you just want a manual one-off solution to escape some text on the fly and are using a Windows machine, this PowerShell solution will work from a clean install of the OS with no other tools:

[PSCustomObject] @{
   'foo' = 'Hello, World!'
   'bar' = 'Goodbye, World!'
} | ConvertTo-JSON

Result:

{
    "foo":  "Hello, World!",
    "bar":  "Goodbye, World!"
}

If you're not familiar with PowerShell, these single quotes work like UNIX hard quotes; the only escape you'll need is for more single quotes. Double quotes are often more convenient if your data has single quotes, but certain characters will need to be escaped with backticks:

[PSCustomObject] @{
   'foo' = 'two single quotes: '''' and two dollar signs: $$'
   'bar' = "two single quotes: '' and two dollar signs: `$`$"
} | ConvertTo-JSON

Result:

{
    "foo":  "two single quotes: \u0027\u0027 and two dollar signs: $$",
    "bar":  "two single quotes: \u0027\u0027 and two dollar signs: $$"
}

Piping to clip.exe will output the result to your clipboard so you can paste it somewhere else:

[PSCustomObject] @{
   'foo' = 'some quotes: """'
   'bar' = 'picket fence: /\/\/\'
} | ConvertTo-JSON | clip

No result is displayed, but this is now in the user's clipboard:

{
    "foo":  "some quotes: \"\"\"",
    "bar":  "picket fence: /\\/\\/\\"
}
Infuse answered 5/10, 2023 at 20:21 Comment(0)
L
0
public static string SerializeString(string str)
    {
        var builder = new StringBuilder(str.Length+4);
        builder.Append('\"');

        char[] charArray = str.ToCharArray();
        foreach (var c in charArray)
        {
            switch (c)
            {
                case '"': builder.Append("\\\""); break;
                case '\\': builder.Append("\\\\");break;
                case '\b': builder.Append("\\b"); break;
                case '\f': builder.Append("\\f"); break;
                case '\n': builder.Append("\\n"); break;
                case '\r': builder.Append("\\r"); break;
                case '\t': builder.Append("\\t"); break;
                
                default:
                    int codepoint = Convert.ToInt32(c);
                    if ((codepoint >= 32) && (codepoint <= 126))
                    {
                        builder.Append(c);
                    }
                    else
                    {
                        builder.Append("\\u");
                        builder.Append(codepoint.ToString("x4"));
                    }
                    break;
            }
        }

        builder.Append('\"');
        return builder.ToString();
    }

https://github.com/Jackyjjc/MiniJSON.cs/blob/master/MiniJSON.cs#L497

Leatherleaf answered 10/4 at 13:27 Comment(2)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Capitulate
Pardon me, but did you notice that the question is tagged "java"? Hence I don't know how much your C# code will help. In any case, answers that contain code only are generally considered poor quality. Can you edit your answer and explain how the code correctly escapes a JSON string?Autotype
