This behavior is documented in How to customize character encoding with System.Text.Json:
Block lists
The preceding sections show how to specify allow lists of code points or ranges that you don't want to be escaped. However, there are global and encoder-specific block lists that can override certain code points in your allow list. Code points in a block list are always escaped, even if they're included in your allow list.
Global block list
The global block list includes things like private-use characters, control characters, undefined code points, and certain Unicode categories, such as the Space_Separator category, excluding U+0020 SPACE. ... <snip>
Encoder-specific block lists
Examples of encoder-specific blocked code points include '<' and '&' for the HTML encoder, '\' for the JSON encoder, and '%' for the URL encoder. ... <snip>
So, as documented, JavaScriptEncoder.Create() may override your allowed characters and escape certain "blocked" characters. While the full set of blocked characters is not documented, the reference source shows that JavaScriptEncoder.Create(TextEncoderSettings settings) constructs an encoder that blocks "HTML-sensitive" characters, which are defined in AllowedBmpCodePointsBitmap.cs and include the apostrophe ':
public void ForbidHtmlCharacters()
{
    ForbidChar('<');
    ForbidChar('>');
    ForbidChar('&');
    ForbidChar('\''); // can be used to escape attributes
    ForbidChar('\"'); // can be used to escape attributes
    ForbidChar('+'); // technically not HTML-specific, but can be used to perform UTF7-based attacks
}
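To confirm this is what happens, here is a quick check of my own (assuming .NET 6+ and the System.Text.Encodings.Web / System.Text.Unicode namespaces) showing that the apostrophe is escaped even though the settings explicitly allow it:

using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

var settings = new TextEncoderSettings();
settings.AllowRange(UnicodeRanges.BasicLatin);
settings.AllowCharacters('\u0027'); // explicitly allow the apostrophe
var encoder = JavaScriptEncoder.Create(settings);

// The apostrophe is still escaped because it sits on the encoder's block list,
// so this prints something like: abc \u0027zorro\u0027 123
Console.WriteLine(encoder.Encode("abc 'zorro' 123"));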
If you do not want to use JavaScriptEncoder.UnsafeRelaxedJsonEscaping but also don't want ' escaped, you could create a custom JsonConverter<string> that manually pieces together the required encoded JSON string, then writes it out using Utf8JsonWriter.WriteRawValue() (which was introduced in .NET 6):
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Text.Unicode;

public class StringConverter : JsonConverter<string>
{
    // Encoder that escapes everything outside BasicLatin; the apostrophe is handled manually below.
    static readonly Lazy<JavaScriptEncoder> Encoder = new(() =>
    {
        var encoderSettings = new TextEncoderSettings();
        encoderSettings.AllowRange(UnicodeRanges.BasicLatin);
        encoderSettings.AllowCharacters('\u0027');
        return JavaScriptEncoder.Create(encoderSettings);
    });

    public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) => reader.GetString();

    public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
    {
        var encoder = Encoder.Value;
        using var textWriter = new StringWriter();
        textWriter.Write("\"");
        // Encode each segment between apostrophes, then write the apostrophes back literally.
        foreach (var (startIndex, characterCount, final) in value.SplitIndices('\''))
        {
            encoder.Encode(textWriter, value, startIndex, characterCount);
            if (!final)
                textWriter.Write('\'');
        }
        textWriter.Write("\"");
        // The string is already fully encoded, so write it out as a raw JSON value.
        writer.WriteRawValue(textWriter.ToString(), skipInputValidation: true);
    }
}
public static class StringExtensions
{
    // Enumerates the (startIndex, characterCount) of each segment of value between occurrences
    // of separator, with final indicating the last segment.
    public static IEnumerable<(int startIndex, int characterCount, bool final)> SplitIndices(this string value, char separator)
    {
        if (value == null)
            throw new ArgumentNullException(nameof(value));
        int index = 0;
        int nextIndex;
        while ((nextIndex = value.IndexOf(separator, index)) >= 0)
        {
            yield return (index, nextIndex - index, false);
            index = nextIndex + 1;
        }
        yield return (index, value.Length - index, true);
    }
}
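For example (my own illustration), splitting the test string used below yields one tuple per segment between the apostrophes:

// "abc 'zorro' 123" has apostrophes at indices 4 and 10, so SplitIndices('\'') yields:
//   (0, 4, false)   -> "abc "
//   (5, 5, false)   -> "zorro"
//   (11, 4, true)   -> " 123"
foreach (var segment in "abc 'zorro' 123".SplitIndices('\''))
    Console.WriteLine(segment);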
Then serialize as follows:
var model = new { text = "abc 'zorro' 123" };
var options = new JsonSerializerOptions
{
    Converters = { new StringConverter() },
};
var json = JsonSerializer.Serialize(model, options);
Which results in {"text":"abc 'zorro' 123"} as required. Demo fiddle here.
You could also try to create your own JavaScriptEncoder subclass that ignores these block lists, though that would likely be more involved than creating the custom converter.
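For completeness, here is a rough, untested sketch of what such a subclass might look like: wrap an inner encoder and simply refuse to encode the apostrophe. The class name ApostropheFriendlyEncoder and the delegation approach are my own illustration rather than an established pattern; it requires unsafe code because TextEncoder's abstract members take char* pointers, and it makes no attempt to handle surrogate pairs carefully:

// Hypothetical sketch only: delegates to JavaScriptEncoder.Create(...) but never encodes '\''.
// Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file.
public sealed class ApostropheFriendlyEncoder : JavaScriptEncoder
{
    static readonly JavaScriptEncoder Inner = Create(new TextEncoderSettings(UnicodeRanges.BasicLatin));

    public override int MaxOutputCharactersPerInputCharacter => Inner.MaxOutputCharactersPerInputCharacter;

    // Never encode the apostrophe; defer to the inner encoder for everything else.
    public override bool WillEncode(int unicodeScalar) =>
        unicodeScalar != '\'' && Inner.WillEncode(unicodeScalar);

    public override unsafe int FindFirstCharacterToEncode(char* text, int textLength)
    {
        // Simplified scan that checks each UTF-16 code unit individually (no surrogate-pair handling).
        for (int i = 0; i < textLength; i++)
            if (WillEncode(text[i]))
                return i;
        return -1;
    }

    public override unsafe bool TryEncodeUnicodeScalar(int unicodeScalar, char* buffer, int bufferLength, out int numberOfCharactersWritten) =>
        Inner.TryEncodeUnicodeScalar(unicodeScalar, buffer, bufferLength, out numberOfCharactersWritten);
}

You could then set an instance of this as JsonSerializerOptions.Encoder, but since it does not override the UTF-8 virtual members it will go through the slower base-class code paths, and correctness around surrogates and validation is entirely on you, which is why the custom converter above is probably the simpler option.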