How can I get PreMailer.Net to not change the encoding of non-ascii characters?

I've also posted my problem as a GitHub issue on the official repo.

I am using PreMailer.Net to inline CSS into HTML documents. However, when I call MoveCssInline, it HTML-encodes characters such as '&'. For example:

<a href="http://www.website.com/page?param1=a&param2=b"></a>

Is changed to:

<a href="http://www.website.com/page?param1=a&amp;param2=b"></a>

I first assumed this behavior would be limited to attribute values such as href, but further testing shows that it also encodes text content (InnerHTML). For example:

<p>&</p>

This is valid HTML and should not be encoded, but PreMailer.Net will change this to:

<p>&amp;</p>

Does anyone have a fix or workaround for this? I do not have control over the HTML documents and am not allowed to change the URLs or content other than inlining the CSS.

Burden answered 22/2, 2020 at 4:21 Comment(2)
Would encodeURIComponent and decodeURIComponent work for this purpose? (Assuming you have access to JavaScript.) - Hutcheson
@GalaxyCat105 - I do not have access to JavaScript; this is all running server-side in an ASP.NET / C# environment. - Burden

Depending on your needs, and only as a rough guide, take a look at these lines:

        case Symbols.Ampersand: temp.Append("&amp;"); break;
        case Symbols.NoBreakSpace: temp.Append("&nbsp;"); break;
        case Symbols.GreaterThan: temp.Append("&gt;"); break;
        case Symbols.LessThan: temp.Append("&lt;"); break;

Update:

These lines are lines 132-139 of a source file in AngleSharp, the HTML parser that PreMailer.Net depends on.

As far as I can tell, the encoding is mandatory in AngleSharp, so it cannot be turned off with any setting in either AngleSharp or PreMailer.Net.

According to the following closed issue, this is by design and in accordance with the HTML spec. However, I believe there is still a bug, as it should only encode attribute values, not innerHTML content. I also don't think this is acceptable behavior for a CSS inliner, which should not be validating or sanitizing HTML, nor should the parser be making changes the client did not ask for.
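
To make the effect of those cases concrete, here is a minimal, self-contained sketch that mimics the escaping loop using only the standard library. It is not AngleSharp's actual code: the class name EscapeSketch is made up for illustration, and the Symbols.* constants are replaced by the literal characters I assume they stand for. It also shows that HTML-decoding the escaped string gives back the original, which is why a browser or parser reading the output still sees the same URL:

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical stand-in for the formatter's escaping loop (illustration only).
public static class EscapeSketch
{
    // Replaces the four characters from the excerpt with their entities
    // and copies every other character through unchanged.
    public static string EscapeText(string content)
    {
        var temp = new StringBuilder(content.Length);
        foreach (var c in content)
        {
            switch (c)
            {
                case '&': temp.Append("&amp;"); break;        // assumed Symbols.Ampersand
                case '\u00A0': temp.Append("&nbsp;"); break;  // assumed Symbols.NoBreakSpace
                case '>': temp.Append("&gt;"); break;         // assumed Symbols.GreaterThan
                case '<': temp.Append("&lt;"); break;         // assumed Symbols.LessThan
                default: temp.Append(c); break;
            }
        }
        return temp.ToString();
    }

    public static void Main()
    {
        var url = "http://www.website.com/page?param1=a&param2=b";
        var escaped = EscapeText(url);

        // Prints the URL with "&" written as "&amp;".
        Console.WriteLine(escaped);

        // Decoding the entity restores the original string.
        Console.WriteLine(WebUtility.HtmlDecode(escaped) == url);
    }
}
```

If the documents are only ever consumed by HTML parsers (browsers, mail clients), the escaped and unescaped forms should be read the same way. It only matters when something compares or processes the markup as raw text.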

Freesia answered 15/3, 2020 at 1:32 Comment(4)
Okay, where/how would I use those though? - Burden
I found this: github.com/AngleSharp/AngleSharp/blob/master/src/AngleSharp/… - Freesia
Actually, that looks very promising. I commented out some of those lines and it looks like it might fix my problem, assuming it doesn't have other undesirable side effects. - Burden
Glad I could help :) - Freesia

This issue has been discussed over here and fixed here.

You should use these options as mentioned in this file.

:input_encoding => 'ASCII-8BIT',
:output_encoding => nil,
Electrotherapy answered 25/2, 2020 at 14:32 Comment(1)
All of those are for Premailer, the Ruby implementation. I'm using PreMailer.Net, and it doesn't appear to have those options. - Burden
