Converting HTML entities to Unicode Characters in C#

Asked 21/11, 2012 at 11:40 Answered 25/9, 2018 at 13:39

Solved c# windows-runtime html-entities html-encode

I found similar questions and answers for Python and JavaScript, but not for C# or any other WinRT-compatible language.

I think I need it because I'm displaying text that I get from websites in a Windows 8 Store app. E.g. &eacute; should become é.

Or is there a better way? I'm not displaying websites or RSS feeds, just a list of websites and their titles.

Calaboose answered 21/11, 2012 at 11:40 Comment(3)

Duplicate: #5784317 – Latrena 21/11, 2012 at 11:49

Actually it's not. He had a different issue. – Calaboose 21/11, 2012 at 12:45

It is indeed a duplicate. That question just had an extra step at the end that you don't need. – Cacus 1/5, 2019 at 13:47

I recommend using System.Net.WebUtility.HtmlDecode and NOT HttpUtility.HtmlDecode.

This is because the System.Web reference does not exist in WinForms/WPF/Console applications, and you can get the exact same result using this class (which is already added as a reference in all those projects).

Usage:

string s = System.Net.WebUtility.HtmlDecode("&eacute;"); // Returns é
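
For the list of website titles described in the question, the same call could be wrapped as in the minimal sketch below. The names TitleDecoder and DecodeTitles are hypothetical and not part of any library; only System.Net.WebUtility.HtmlDecode comes from the framework.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper (an assumption, not from the answer above): decodes the
// entities in scraped page titles before they are shown in a list.
public static class TitleDecoder
{
    public static List<string> DecodeTitles(IEnumerable<string> rawTitles)
    {
        return rawTitles
            .Where(t => t != null)                   // skip missing titles
            .Select(t => WebUtility.HtmlDecode(t))   // &eacute; -> é, &amp; -> &
            .ToList();
    }
}
```

On desktop .NET, TitleDecoder.DecodeTitles(new[] { "Caf&eacute; &amp; Bar" }) should yield a single entry, "Café & Bar". Behaviour on other platforms may differ, as the comments and answers below discuss.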

Gassman answered 21/11, 2012 at 11:57 Comment(4)

"you can get the exact same result using this class" - INCORRECT. Only the HttpUtility implementation will correctly decode ' as an apostrophe on WP8. – Serotonin 23/12, 2013 at 7:18

In my case, HttpUtility.HtmlDecode did the right thing. – Marabout 14/3, 2014 at 17:39

Good solution, but the disadvantage with System.Net.WebUtility.HtmlDecode is you won't find it under .NET Framework 3.5 if you are coding for old Windows 7. – Wakerly 26/1, 2017 at 14:50

link is broken. – Wrong 4/4, 2023 at 5:22

Use HttpUtility.HtmlDecode(). Read more about it on MSDN.

decodedString = HttpUtility.HtmlDecode(myEncodedString);

Retinite answered 21/11, 2012 at 11:43 Comment(3)

Yep, note that for a WinForms or Console application you first have to add a reference to the System.Web assembly. – Referendum 21/11, 2012 at 11:48

Hi, I tried this solution but it doesn't decode characters like { :( – Intoxicated 26/7, 2014 at 16:36

@l19 Is that a recognized HTML entity? I can't find it in this list. I did manage to find it in a developmental W3C spec, though. That's probably why it isn't decoded yet. – Leralerch 25/3, 2015 at 20:14

This might be useful: it replaces all named entities (as far as my requirements go) with their numeric Unicode equivalents.

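    // Requires: System.Collections.Generic, System.Text.RegularExpressions, System.Web
    public string EntityToUnicode(string html) {
        var replacements = new Dictionary<string, string>();
        // Matches lower-case named entities of 2 to 5 letters, e.g. &amp; or &nbsp;
        var regex = new Regex("(&[a-z]{2,5};)");
        foreach (Match match in regex.Matches(html)) {
            if (!replacements.ContainsKey(match.Value)) {
                var unicode = HttpUtility.HtmlDecode(match.Value);
                // Only entities that decode to a single character are replaced
                if (unicode.Length == 1) {
                    // Build the numeric form, e.g. &amp; becomes &#38;
                    replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
                }
            }
        }
        foreach (var replacement in replacements) {
            html = html.Replace(replacement.Key, replacement.Value);
        }
        return html;
    }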

Burundi answered 1/7, 2014 at 16:34 Comment(2)

Worked for my case, but I changed the regex to var regex = new Regex("(&[a-z]{2,6};)"); There are a lot of HTML entities longer than 5 characters (like &eacute;). – Wedurn 11/4, 2016 at 21:7

I'd also suggest changing the regex to var regex = new Regex("(&[a-zA-Z]{2,7};)"); so that entities such as &Atilde; are included. – Elana 26/4, 2017 at 17:45

HTML entities and numeric character references are encoded and decoded differently in a Metro app and a WP8 app.

With Windows Runtime Metro App

{
    string inStr = "ó";
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr);
    // auxStr == &#243;
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr);
    // outStr == ó
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;");
    // outStr2 == ó
}

With Windows Phone 8.0

{
    string inStr = "ó";
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr);
    // auxStr == &#243;
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr);
    // outStr == &#243;
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;");
    // outStr2 == ó
}

To solve this on WP8, I implemented the table from the HTML ISO-8859-1 Reference and applied it before calling System.Net.WebUtility.HtmlDecode().
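
The table itself is not shown here. As a rough sketch of a different way to get the same effect, numeric references could be rewritten with a regex before HtmlDecode is called. This is an assumption, not the approach described above, and the names NumericEntityDecoder and Decode are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (an assumption, not the table-based fix described above):
// rewrites numeric character references such as &#243; or &#xF3; to the
// characters they name and leaves everything else untouched.
public static class NumericEntityDecoder
{
    // Group 1 captures hex digits (&#x...;), group 2 captures decimal digits (&#...;).
    private static readonly Regex NumericEntity =
        new Regex("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    public static string Decode(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        return NumericEntity.Replace(html, m =>
        {
            int codePoint = m.Groups[1].Success
                ? int.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture)
                : int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);

            // Keep the original text when the number is not a usable code point
            // (zero, a surrogate, or beyond the Unicode range).
            bool isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
            if (codePoint == 0 || codePoint > 0x10FFFF || isSurrogate) return m.Value;

            return char.ConvertFromUtf32(codePoint);
        });
    }
}
```

With this sketch, NumericEntityDecoder.Decode("&#243; &#xF3; &oacute;") returns "ó ó &oacute;". The named entity is deliberately left alone so that System.Net.WebUtility.HtmlDecode() can handle it afterwards.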

Negotiable answered 5/2, 2013 at 9:15 Comment(1)

The link is dead. – Pintail 15/7, 2020 at 12:40

This worked for me; it replaces both named and numeric (Unicode) entities.

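// Matches named entities (&quot;) and numeric ones (&#39;).
private static readonly Regex HtmlEntityRegex = new Regex("&(#)?([a-zA-Z0-9]*);");

// Extension method: must be declared inside a static class.
public static string HtmlDecode(this string html)
{
    // string.IsNullOrEmpty is the built-in check; html.IsNullOrEmpty() would need a custom extension.
    if (string.IsNullOrEmpty(html)) return html;
    return HtmlEntityRegex.Replace(html, x => x.Groups[1].Value == "#"
        // Numeric entity: decimal digits only; a hex form such as &#x27; makes int.Parse throw.
        ? ((char)int.Parse(x.Groups[2].Value)).ToString()
        // Named entity: delegate to HttpUtility.
        : HttpUtility.HtmlDecode(x.Groups[0].Value));
}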

[Test]
[TestCase(null, null)]
[TestCase("", "")]
[TestCase("&#39;fark&#39;", "'fark'")]
[TestCase("&quot;fark&quot;", "\"fark\"")]
public void should_remove_html_entities(string html, string expected)
{
    html.HtmlDecode().ShouldEqual(expected);
}

Alboin answered 29/9, 2016 at 18:53 Comment(0)

An improved version of Zumey's method (I can't comment there). The longest entity name is &exclamation; (11 characters). Upper-case letters are also possible in entity names, e.g. &Agrave; (source: wiki).

    public string EntityToUnicode(string html) {
        var replacements = new Dictionary<string, string>();
        var regex = new Regex("(&[a-zA-Z]{2,11};)");
        foreach (Match match in regex.Matches(html)) {
            if (!replacements.ContainsKey(match.Value)) {
                var unicode = HttpUtility.HtmlDecode(match.Value);
                if (unicode.Length == 1) {
                    replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
                }
            }
        }
        foreach (var replacement in replacements) {
            html = html.Replace(replacement.Key, replacement.Value);
        }
        return html;
    }

Firebird answered 25/9, 2018 at 13:39 Comment(0)
