I have looked around a lot but have not been able to find a built-in .NET method that escapes the special XML characters <, >, &, ' and " only when they are not part of a tag.
For example, take the following text:
Test& <b>bold</b> <i>italic</i> <<Tag index="0" />
I want it to be converted to:
Test&amp; <b>bold</b> <i>italic</i> &lt;<Tag index="0" />
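For comparison, a plain escape function shows why a generic built-in is not enough here. The sketch below uses Python's xml.sax.saxutils.escape, which behaves much like .NET's SecurityElement.Escape: it escapes every occurrence of the five characters, so the allowed tags are destroyed along with the stray & and <. The name escape_all is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the quote characters are passed in explicitly.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}

def escape_all(text: str) -> str:
    """Escape all five XML special characters, including those inside tags."""
    return escape(text, QUOTE_ENTITIES)

sample = 'Test& <b>bold</b> <i>italic</i> <<Tag index="0" />'
print(escape_all(sample))  # the <b>, <i> and <Tag> markup is escaped too
```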
Notice that the tags are not escaped. I need to assign this value to the InnerXml property of an XmlElement, so those tags must be preserved.
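The XmlElement.InnerXml setter throws an XmlException when the fragment is not well-formed. That is why the stray & and < have to be escaped while the tags stay intact. The check below is a rough Python analogue, assuming that wrapping the fragment in a dummy root element approximates what InnerXml accepts. The function name parses_as_fragment and the <root> wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET

def parses_as_fragment(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML content inside a dummy root."""
    try:
        # Hypothetical wrapper element so that mixed text and tags form one document.
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False

raw = 'Test& <b>bold</b> <i>italic</i> <<Tag index="0" />'
wanted = 'Test&amp; <b>bold</b> <i>italic</i> &lt;<Tag index="0" />'
print(parses_as_fragment(raw), parses_as_fragment(wanted))
```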
I have looked into implementing my own parser, using a StringBuilder to make it as fast as I can, but it gets pretty nasty.
I also know which tags are acceptable, which may simplify things: only br, b, i, u, blink, flash and Tag. These tags can be either self-closing (e.g. <u />) or container tags (e.g. <u>...</u>).
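Because the set of allowed tags is fixed, one option is to avoid a general parser altogether. A regular expression can recognise only the whitelisted tags, and everything between matches can be escaped as plain text. The sketch below is written in Python rather than C#, but the same single pass maps onto Regex.Matches plus a StringBuilder. Its assumptions are loud ones: attribute values must be quoted and must not contain < or &, so any tag that breaks those rules is treated as text and escaped. The names ALLOWED_TAGS, TAG_RE and escape_except_tags are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Whitelisted tag names from the question (case-sensitive, as in XML).
ALLOWED_TAGS = ("blink", "flash", "Tag", "br", "b", "i", "u")
_NAME = "|".join(ALLOWED_TAGS)
# One attribute: name="value" or name='value'; assumed not to contain < or &.
_ATTR = r"""\s+[A-Za-z_][\w.-]*\s*=\s*(?:"[^"<&]*"|'[^'<&]*')"""
# An opening or self-closing allowed tag, or a closing allowed tag.
TAG_RE = re.compile(rf"<(?:{_NAME})(?:{_ATTR})*\s*/?>|</(?:{_NAME})\s*>")
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}

def escape_except_tags(text: str) -> str:
    """Escape XML special characters everywhere except inside whitelisted tags."""
    parts = []
    pos = 0
    for m in TAG_RE.finditer(text):
        # Escape the plain text before this tag, then copy the tag unchanged.
        parts.append(escape(text[pos:m.start()], QUOTE_ENTITIES))
        parts.append(m.group(0))
        pos = m.end()
    # Escape whatever text follows the last tag.
    parts.append(escape(text[pos:], QUOTE_ENTITIES))
    return "".join(parts)

print(escape_except_tags('Test& <b>bold</b> <i>italic</i> <<Tag index="0" />'))
```

This only decides what counts as a tag. It does not check that container tags are balanced, so input such as <b>foo <i>bar</b> passes through unchanged and would still be rejected by InnerXml.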
<b>foo <i>bar</b> really <br></i>. You are in for plenty of fun if you want to do that yourself. As an option, consider using HtmlAgilityPack to parse the HTML into a reasonable tree and then carefully insert all the nodes into the XML... – Checkbook
Test Value is < 3 but > 1. – Aloeswood
< 3 isn't a valid start tag, so you could figure that out. But your point still stands: < and > are escaped to remove ambiguity in parsing. There are going to be cases where any reasonable parser would choose one path while you may have wanted another. – Cherianne
<3 but >1. Allowing only a known list of tags makes it much easier, though. – Aloeswood
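Checkbook's example of overlapping tags is the case a whitelist alone does not cover. The sketch below is not HtmlAgilityPack. It swaps in a simpler technique that only works because the tag list is known: walk the whitelisted tags with a stack and reject the input if a closing tag does not match the most recent open tag, or if anything is left open at the end. One possible policy is to fall back to escaping the whole string when the check fails, but that policy is an assumption. The names TAG_RE and tags_are_balanced are hypothetical, and the tag pattern repeats the same attribute assumptions as above.

```python
import re

# Same whitelist as the question; attribute values assumed quoted, without < or &.
ALLOWED_TAGS = ("blink", "flash", "Tag", "br", "b", "i", "u")
_NAME = "|".join(ALLOWED_TAGS)
_ATTR = r"""\s+[A-Za-z_][\w.-]*\s*=\s*(?:"[^"<&]*"|'[^'<&]*')"""
# Group 1: opening tag name, group 2: "/" if self-closing, group 3: closing tag name.
TAG_RE = re.compile(rf"<({_NAME})(?:{_ATTR})*\s*(/?)>|</({_NAME})\s*>")

def tags_are_balanced(text: str) -> bool:
    """Return True if every whitelisted container tag is closed in the right order."""
    stack = []
    for m in TAG_RE.finditer(text):
        open_name, self_closing, close_name = m.groups()
        if close_name is not None:
            # A closing tag must match the innermost open tag.
            if not stack or stack.pop() != close_name:
                return False
        elif not self_closing:
            stack.append(open_name)
    # Anything still open (e.g. an HTML-style <br>) is not well-formed XML.
    return not stack

print(tags_are_balanced("<b>foo <i>bar</b> really <br></i>"))
```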