Reference to undeclared entity exception while working with XML
P

9

26

I am trying to set the InnerXml of an XmlDocument but get the exception: Reference to undeclared entity

XmlDocument xmldoc = new XmlDocument();
string text = "Hello, I am text &alpha; &nbsp; &ndash; &mdash;";
xmldoc.InnerXml = "<p>" + text + "</p>";

This throws the exception:

Reference to undeclared entity 'alpha'. Line 2, position 2..

How would I go about solving this problem?

Pomeroy answered 11/11, 2008 at 18:4 Comment(0)
J
30

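XML, unlike HTML, does not define named entities (i.e. named references to Unicode characters) beyond a small predefined set, so &alpha;, &mdash; etc. are not translated to their corresponding characters. You must use the numeric character reference instead (for example &#945; for &alpha;). The only named entities you can use in XML without declaring them are &lt;, &gt;, &amp;, &quot; and &apos;.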

If you want to create HTML, use an HtmlDocument instead.
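
A minimal sketch of the numeric-reference approach, assuming only a handful of entities need handling. The `NumericEntities` class, its `Map` table and the method names are hypothetical, chosen for illustration; the table would need an entry for every entity your input actually uses.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

public static class NumericEntities
{
    // Hypothetical lookup table: HTML entity name -> Unicode code point.
    private static readonly Dictionary<string, int> Map = new Dictionary<string, int>
    {
        { "alpha", 945 }, { "nbsp", 160 }, { "ndash", 8211 }, { "mdash", 8212 }
    };

    // Rewrites known named references as numeric ones. Anything not in the
    // table (including the predefined &lt; and &amp;) is left untouched.
    public static string ToNumeric(string text)
    {
        return Regex.Replace(text, @"&([A-Za-z][A-Za-z0-9]*);", m =>
        {
            int codePoint;
            return Map.TryGetValue(m.Groups[1].Value, out codePoint)
                ? "&#" + codePoint + ";"
                : m.Value;
        });
    }

    // Builds the <p> document from the question after converting the entities.
    public static XmlDocument BuildParagraph(string text)
    {
        var doc = new XmlDocument();
        doc.InnerXml = "<p>" + ToNumeric(text) + "</p>";
        return doc;
    }
}
```

An entity that is missing from the table still reaches the parser unchanged and raises the same exception, which at least makes the gap visible.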

Juback answered 11/11, 2008 at 18:11 Comment(1)
HtmlDocument comes from the System.Windows.Forms namespace: j.mp/pSmv82. If you don't like its close association with the WebBrowser control, or that causes issues in your app, a pure HTML parser is available through the HTML Agility Pack: htmlagilitypack.codeplex.com/wikipage?title=Examples - Redon
G
15

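In .NET, you can use the System.Xml.XmlConvert class:

string text = XmlConvert.EncodeName("Hello &alpha;");

Note that EncodeName is meant for producing valid XML names: it replaces each character that is not allowed in a name (here the space, & and ;) with an _xHHHH_ escape, so it avoids the exception but does not turn &alpha; into α.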

Alternatively, you can declare the entities locally by putting the declarations between square brackets in a DOCTYPE declaration. Add the following header to your xml:

<!DOCTYPE documentElement[
<!ENTITY Alpha "&#913;">
<!ENTITY ndash "&#8211;">
<!ENTITY mdash "&#8212;">
]>

Search for "html character entities" to find the entity definitions.
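
As a rough sketch of how the local declarations could be applied from C#, the snippet below prepends a DOCTYPE with an internal subset to the markup from the question and loads it through an XmlReader. The `InlineDtdDemo` class and `LoadParagraph` method are hypothetical names, and the four declared entities are simply the ones the question uses. It assumes .NET Framework 4 or later, where the `DtdProcessing` setting is available.

```csharp
using System.IO;
using System.Xml;

public static class InlineDtdDemo
{
    public static XmlDocument LoadParagraph(string text)
    {
        // The DOCTYPE name ("p") matches the root element; the internal subset
        // declares only the entities this text is expected to contain.
        string xml =
            "<!DOCTYPE p [" +
            "<!ENTITY alpha \"&#945;\">" +
            "<!ENTITY nbsp \"&#160;\">" +
            "<!ENTITY ndash \"&#8211;\">" +
            "<!ENTITY mdash \"&#8212;\">" +
            "]>" +
            "<p>" + text + "</p>";

        // Readers created via XmlReader.Create prohibit DTDs by default,
        // so DTD parsing has to be enabled explicitly.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };

        var doc = new XmlDocument();
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

Enabling DTD processing is best limited to input you trust, since entity declarations in hostile documents can be abused.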

Grot answered 9/5, 2009 at 6:50 Comment(0)
K
7

Try replacing &alpha; with its numeric character reference (the capital &Alpha; would be &#913;):

  &#945;
Kern answered 11/11, 2008 at 18:6 Comment(0)
R
6

The preceding answer is right. Another alternative is to link your document to a DTD in which those character entities are defined, such as one of the standard XHTML DTDs (the HTML 4.01 DTDs are SGML and cannot be read by an XML parser). Your XML file should include a declaration like the following:

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Ripe answered 11/11, 2008 at 18:21 Comment(1)
For details on how to apply: azurator.blogspot.be/2012/03/parsing-html-into-xelement.html - Germann
N
3

Use string System.Net.WebUtility.HtmlDecode(string), which decodes all HTML entity-encoded characters to their Unicode equivalents. It is available from .NET Framework 4 onwards.
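
A small sketch of one way to combine HtmlDecode with XmlDocument. It assigns the decoded string through an element's InnerText rather than InnerXml, so that any bare & produced by decoding &amp; is escaped again instead of being parsed as markup. The `DecodeDemo` class and `BuildParagraph` method are hypothetical names used only for illustration.

```csharp
using System.Net;
using System.Xml;

public static class DecodeDemo
{
    public static XmlDocument BuildParagraph(string htmlText)
    {
        // "&alpha;" becomes "α", and "&amp;" becomes a bare "&".
        string decoded = WebUtility.HtmlDecode(htmlText);

        var doc = new XmlDocument();
        XmlElement p = doc.CreateElement("p");

        // InnerText treats the value as character data, so the bare "&"
        // is escaped again when the document is serialized.
        p.InnerText = decoded;
        doc.AppendChild(p);
        return doc;
    }
}
```

This only suits plain text: if the input contains child elements that must survive as markup, InnerText would escape their angle brackets as well.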

Nonrigid answered 28/2, 2014 at 9:13 Comment(1)
If you have an &amp; in your text, HtmlDecode will mess it up because it will turn it into & and then XML will choke on it. - Vincevincelette
C
1

A variant of the solution described at https://mcmap.net/q/515143/-reference-to-undeclared-entity-exception-while-working-with-xml is: declare the entities in a separate file, and then reference that file from the internal subset of the DOCTYPE declaration. Here is an example of how to use HTML entities in an XSLT stylesheet.

<!DOCTYPE xsl:stylesheet
[
<!ENTITY % htmlentities SYSTEM "html-entity-list.ent">
%htmlentities;
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"...>

The external file with entities is "html-entity-list.ent". I generated it from https://html.spec.whatwg.org/entities.json. An example entry in the generated file is this one:

<!ENTITY Auml "Ä">
Crosstree answered 22/2, 2021 at 9:28 Comment(0)
U
0

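You could also set the InnerText of the element to "Hello, I am text &alpha; &nbsp; &ndash; &mdash;", making the XmlDocument escape the ampersands automatically (the entity names then appear as literal text rather than as the characters they stand for). I think.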

Upgrade answered 11/11, 2008 at 18:24 Comment(0)
F
0

Using an HtmlDocument wasn't suitable in my situation. Our system already had a custom XmlUrlResolver, which we made use of for loading the XML.

//setup
public class CustomXmlResolver : XmlUrlResolver { /* ... */ }
String originalXml; //fetched xml with html entities in it

var doc = new XmlDocument();
//the custom resolver is used to resolve the DTD referenced below
doc.XmlResolver = new CustomXmlResolver();

//making use of a transitional dtd
doc.LoadXml("<!DOCTYPE html SYSTEM \"xhtml1-transitional.dtd\" > " + originalXml);
Fandango answered 17/2, 2010 at 22:43 Comment(0)
A
0

If you do want to use the HTML entity names you are used to, the W3C has got you covered and has produced "XML Entity Definitions for Characters" http://www.w3.org/TR/xml-entity-names/, which essentially is a list of named entities very similar to the ones that HTML has. But as mentioned above, this is not built into XML, and needs to be explicitly supported by XML applications that want to use these named entities.

Alpinist answered 4/1, 2016 at 8:36 Comment(0)
