Don't escape single quotes with System.Text.Json

I would like .NET Core's System.Text.Json to leave the single quote character unescaped when serializing, but I just can't get it to work:

using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

var encoderSettings = new TextEncoderSettings();
encoderSettings.AllowRange(UnicodeRanges.BasicLatin);
encoderSettings.AllowCharacters('\u0027');

var options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(encoderSettings)
};

var jsonString = JsonSerializer.Serialize(new { text = "abc 'zorro' 123" }, options);

This will result in a string:

{"text":"abc \u0027zorro\u0027 123"}

whereas I would like it to be:

{"text":"abc 'zorro' 123"}

Any ideas? I just want the single quotes not to be escaped. I've also tried replacing the \u0027 with \'.

If I do it like this, it works:

options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};
jsonString = JsonSerializer.Serialize(new { text = "abc 'zorro' 123" }, options);

...but this also disables escaping of all other characters, including <, > and & (except for the characters the JSON standard requires to be escaped), which I also don't want.
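A minimal sketch of that trade-off, assuming .NET 6 or later with top-level statements (the sample string is made up for illustration): the default encoder escapes ', <, > and &, while UnsafeRelaxedJsonEscaping leaves all of them as-is.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical sample value: an apostrophe plus some HTML-sensitive characters.
var model = new { text = "<b>'zorro'</b> & co" };

// Default options use JavaScriptEncoder.Default, which escapes ', <, > and &.
string strict = JsonSerializer.Serialize(model);

// UnsafeRelaxedJsonEscaping leaves all of those characters unescaped.
string relaxed = JsonSerializer.Serialize(model, new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
});

Console.WriteLine(strict);
Console.WriteLine(relaxed);
```

There is no built-in setting in between: it is either all of the HTML-sensitive characters escaped or none of them.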

Berriman answered 13/10, 2022 at 15:28 Comment(4)
"...but of course I can't have it like that" But that's how it works. If you can't have it like that, then I guess you have to accept the output with the escaped single quotes. Or what exactly makes you not want to do it? - Caecilian
Hmm, okay? So you think it's OK to allow all HTML characters when I only need single quotes? The question is why my code doesn't work. What am I missing, or have I misunderstood the AllowCharacters feature? - Berriman
Please add this to the question. As written, the question doesn't make it clear that you only want single quotes to be left unescaped while keeping other unsafe characters escaped. - Caecilian
I also cannot make it work; the single quotes are still escaped with:

var encoderSettings = new TextEncoderSettings();
encoderSettings.AllowCharacters('\u0027'); // Allow Single Quotes
encoderSettings.AllowRange(UnicodeRanges.BasicLatin);
var options2 = new JsonSerializerOptions { Encoder = JavaScriptEncoder.Create(encoderSettings), WriteIndented = true };
var bodyJSON = JsonSerializer.Serialize(eventToSend, options2);

- Resistance

This behavior is documented in How to customize character encoding with System.Text.Json:

Block lists

The preceding sections show how to specify allow lists of code points or ranges that you don't want to be escaped. However, there are global and encoder-specific block lists that can override certain code points in your allow list. Code points in a block list are always escaped, even if they're included in your allow list.

Global block list

The global block list includes things like private-use characters, control characters, undefined code points, and certain Unicode categories, such as the Space_Separator category, excluding U+0020 SPACE. ... <snip>

Encoder-specific block lists

Examples of encoder-specific blocked code points include '<' and '&' for the HTML encoder, '\' for the JSON encoder, and '%' for the URL encoder. ... <snip>

So, as documented, JavaScriptEncoder.Create() may override your allowed characters and escape certain "blocked" characters. The full set of blocked characters is not documented, but according to the reference source, JavaScriptEncoder.Create(TextEncoderSettings settings) constructs an encoder that also blocks "HTML-sensitive" characters. These are defined in AllowedBmpCodePointsBitmap.cs and include the single quote ':

public void ForbidHtmlCharacters()
{
    ForbidChar('<');
    ForbidChar('>');
    ForbidChar('&');
    ForbidChar('\''); // can be used to escape attributes
    ForbidChar('\"'); // can be used to escape attributes
    ForbidChar('+'); // technically not HTML-specific, but can be used to perform UTF7-based attacks
}
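One way to see the block list in action, sketched under the assumption of a .NET 6+ console app with top-level statements: TextEncoder.WillEncode reports whether the encoder built from the question's settings would escape a given character, and it still returns true for ' and the other HTML-sensitive characters even though ' was explicitly allowed.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Same settings as in the question: Basic Latin plus an explicit allow for '.
var settings = new TextEncoderSettings();
settings.AllowRange(UnicodeRanges.BasicLatin);
settings.AllowCharacters('\'');
var encoder = JavaScriptEncoder.Create(settings);

// WillEncode tells whether the encoder would escape a character.
// The HTML-sensitive characters stay blocked despite the allow list;
// an ordinary letter such as 'a' passes through unchanged.
foreach (char c in "'<>&+\"a")
{
    Console.WriteLine($"{c}  WillEncode={encoder.WillEncode(c)}  Encode -> {encoder.Encode(c.ToString())}");
}
```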

If you don't want to use JavaScriptEncoder.UnsafeRelaxedJsonEscaping but also don't want ' to be escaped, you could create a custom JsonConverter<string> that manually assembles the encoded JSON string and then writes it out with Utf8JsonWriter.WriteRawValue() (introduced in .NET 6):

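using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Text.Unicode;

public class StringConverter : JsonConverter<string>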
{
    readonly static Lazy<JavaScriptEncoder> Encoder = new (() => 
        {
            var encoderSettings = new TextEncoderSettings();
            encoderSettings.AllowRange(UnicodeRanges.BasicLatin);
            encoderSettings.AllowCharacters('\u0027');

            return JavaScriptEncoder.Create(encoderSettings);
        });

    public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) => reader.GetString();

    public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
    {
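        var encoder = Encoder.Value;

        using var textWriter = new StringWriter();

        // Open the JSON string literal.
        textWriter.Write("\"");
        // Encode each run of text between apostrophes with the strict encoder,
        // and write the apostrophes themselves unescaped.
        foreach (var (startIndex, characterCount, final) in value.SplitIndices('\''))
        {
            encoder.Encode(textWriter, value, startIndex, characterCount);
            if (!final)
                textWriter.Write('\'');
        }
        // Close the JSON string literal.
        textWriter.Write("\"");

        // The text is already valid JSON, so input validation is skipped.
        writer.WriteRawValue(textWriter.ToString(), true);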
    }
}

public static class StringExtensions
{
    public static IEnumerable<(int startIndex, int characterCount, bool final)> SplitIndices(this string value, char separator)
    {
        if (value == null)
            throw new ArgumentNullException(nameof(value));
        int index = 0;
        int nextIndex;
        while ((nextIndex = value.IndexOf(separator, index)) >= 0)
        {
            yield return (index, nextIndex - index, false);
            index = nextIndex + 1;
        }
        yield return (index, value.Length - index, true);
    }
}

Then serialize as follows:

var model = new { text = "abc 'zorro' 123" };

var options = new JsonSerializerOptions
{
    Converters = { new StringConverter() },
};
var json = JsonSerializer.Serialize(model, options);

This results in {"text":"abc 'zorro' 123"}, as required. Demo fiddle here.
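The core idea of the converter, escaping each run of text between apostrophes and re-inserting the apostrophes raw, can also be tried outside the serializer. This is a simplified sketch assuming .NET 6+ with top-level statements; EncodeKeepingApostrophes is a hypothetical helper name, and it uses string.Split instead of the SplitIndices extension for brevity, at the cost of allocating the intermediate segments. It returns only the contents of the JSON string, without the surrounding quotes.

```csharp
using System;
using System.Linq;
using System.Text.Encodings.Web;
using System.Text.Unicode;

var settings = new TextEncoderSettings();
settings.AllowRange(UnicodeRanges.BasicLatin);
var encoder = JavaScriptEncoder.Create(settings);

// Hypothetical helper (not part of any library): escape each segment between
// apostrophes with the strict encoder, then rejoin the segments with a raw '.
// Since ' is the split character, it never reaches the encoder itself.
string EncodeKeepingApostrophes(string value) =>
    string.Join("'", value.Split('\'').Select(segment => encoder.Encode(segment)));

string encoded = EncodeKeepingApostrophes("abc 'zorro' <b>123</b>");
Console.WriteLine("\"" + encoded + "\"");
```

Characters such as <, " and \ are still escaped by the encoder, so the result remains a valid, HTML-safe JSON string body.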

You could also try creating your own JavaScriptEncoder subclass that ignores the global block lists, though that would likely be more involved than creating the custom converter.

Snorter answered 13/10, 2022 at 18:34 Comment(4)
All this just to keep the single quotes from being escaped? Is the solution then to switch to Newtonsoft instead? - Resistance
@OliverNilsen - if it fully meets your needs, then feel free to do so. It has been around for a long time and is well tested and reviewed. The Json.NET serializer does not support async serialization, though, so if you need that you will need to stick with STJ. - Snorter
I've tried to use the AllowCharacters method as the original poster mentioned, and also as shown on MS Docs, but it just doesn't work. - Resistance
@OliverNilsen - right, because AllowCharacters apparently doesn't override the block list. Only JavaScriptEncoder.UnsafeRelaxedJsonEscaping completely disables optional escaping. Or you can switch to Json.NET. Your choice. - Snorter

© 2022 - 2024 — McMap. All rights reserved.