Keep HTML tags in XML using LINQ to XML

I have an XML file from which I am extracting HTML using LINQ to XML. This is a sample of the file:

<?xml version="1.0" encoding="utf-8" ?>
<tips>
    <tip id="0">
        This is the first tip.
    </tip>
    <tip id="1">
        Use <b>Windows Live Writer</b> or <b>Microsoft Word 2007</b> to create and publish content.
    </tip>
    <tip id="2">
        Enter a <b>url</b> into the box to automatically screenshot and index useful webpages.
    </tip>
    <tip id="3">
        Invite your <b>colleagues</b> to the site by entering their email addresses.  You can then share the content with them!
    </tip>
</tips>

I am using the following query to extract a 'tip' from the file:

Tip tip = (from t in tipsXml.Descendants("tip")
           where t.Attribute("id").Value == nextTipId.ToString()
           select new Tip()
           {
               TipText = t.Value,
               TipId = nextTipId
           }).First();

The problem I have is that the HTML elements are being stripped out. I was hoping for something like InnerHtml to use instead of Value, but that doesn't seem to be there.

Any ideas?

Thanks all in advance,

Dave

Casemate answered 19/1, 2009 at 15:27 Comment(0)

Call t.ToString() instead of Value. That will return the XML as a string. You may want to use the overload taking SaveOptions to disable formatting. I can't check right now, but I suspect it will include the enclosing <tip> element tag (and its attributes), so you would need to strip this off.

Note that if your HTML isn't valid XML, you will end up with an invalid overall XML file.

Is the format of the XML file completely out of your control? It would be nicer for any HTML inside to be XML-encoded.
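To illustrate the XML-encoding suggestion above, here is a minimal sketch. It assumes a hypothetical variant of the tips file in which the HTML is escaped with entity references or wrapped in CDATA; the class and method names are made up for the example. In that case the parser sees plain text rather than child elements, so Value should hand the markup back unchanged.

```csharp
using System;
using System.Xml.Linq;

public static class EncodedTipDemo
{
    // Hypothetical variant of the tips file: the HTML is escaped (tip 2)
    // or wrapped in a CDATA section (tip 3), so it is text, not child elements.
    public const string Xml =
        "<tips>" +
        "<tip id=\"2\">Enter a &lt;b&gt;url&lt;/b&gt; into the box.</tip>" +
        "<tip id=\"3\"><![CDATA[Invite your <b>colleagues</b> to the site.]]></tip>" +
        "</tips>";

    // Returns the text of the tip with the given id, or null if there is none.
    public static string TipText(int id)
    {
        XElement tips = XElement.Parse(Xml);
        foreach (XElement t in tips.Elements("tip"))
        {
            if ((string)t.Attribute("id") == id.ToString())
            {
                // Value decodes entity references and includes CDATA text,
                // so the <b> tags come back as part of the string.
                return t.Value;
            }
        }
        return null;
    }
}
```

The trade-off is that the file is no longer pleasant to edit by hand, but arbitrary HTML (including HTML that is not well-formed XML) can be stored safely.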

EDIT: One way of avoiding getting the outer part might be to do something like this (in a separate method called from your query, of course):

StringBuilder builder = new StringBuilder();
foreach (XNode node in element.Nodes())
{
    builder.Append(node.ToString());
}

That way you'll get HTML elements with their descendants and interspersed text nodes. Basically it's the equivalent of InnerXml, I strongly suspect.
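Put together with the query from the question, the loop above could be wrapped up roughly like this. It is only a sketch: the Tip class shape, the TipReader and InnerXml names, the use of SaveOptions.DisableFormatting and the final Trim() are assumptions made for the example, not something taken from the original code.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Assumed shape of the Tip class used in the question.
public class Tip
{
    public string TipText { get; set; }
    public int TipId { get; set; }
}

public static class TipReader
{
    // Serializes every child node (text and nested elements) of the element,
    // leaving out the element's own start and end tags.
    public static string InnerXml(XElement element)
    {
        StringBuilder builder = new StringBuilder();
        foreach (XNode node in element.Nodes())
        {
            // DisableFormatting keeps the serializer from adding indentation.
            builder.Append(node.ToString(SaveOptions.DisableFormatting));
        }
        return builder.ToString();
    }

    public static Tip GetTip(XContainer tipsXml, int nextTipId)
    {
        return (from t in tipsXml.Descendants("tip")
                where (string)t.Attribute("id") == nextTipId.ToString()
                select new Tip
                {
                    // Trim() drops the line break and indentation around the tip text.
                    TipText = InnerXml(t).Trim(),
                    TipId = nextTipId
                }).First();
    }
}
```

Casting the attribute to string instead of reading .Value means a tip element without an id is simply skipped rather than throwing a NullReferenceException.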

Horsepower answered 19/1, 2009 at 15:36 Comment(1)
heh, snap on the edit. Encoding HTML inside XML is common and convenient for this kind of case; the alternative would be to use valid XHTML, declaring the XHTML xmlns as default and putting the tip/tips elements in a different namespace to avoid confusing the two. – Tuna

Just use string.Concat(tip.Nodes()) to get the content with HTML tags.

Amphicoelous answered 24/2, 2011 at 9:43 Comment(0)

TipText= t.Value,

XElement.Value returns only the text content of the element: the text of the element and of its descendants is concatenated, but the tags of nested elements - HTML or otherwise - are not included, and of course any &-entity-references will appear in their decoded form.

If you want the content as a string with markup you could call XElement.ToString(), possibly with SaveOptions.DisableFormatting. But note this includes the wrapping <tip> element - that is, in web browser DOM terms, it's the outerHTML not the innerHTML. To get the innerHTML you would have to join together the ToString() results of all the child nodes returned by XElement.Nodes().
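The three views described above can be compared side by side. This is a small sketch with made-up names (MarkupViews, TextOnly, Outer, Inner) and a shortened version of tip 2 from the question as sample data.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class MarkupViews
{
    // Shortened sample based on tip 2 from the question.
    public static readonly XElement Tip = XElement.Parse(
        "<tip id=\"2\">Enter a <b>url</b> into the box.</tip>");

    // Text content only: the <b> tags are dropped, their text is kept.
    public static string TextOnly()
    {
        return Tip.Value;
    }

    // "outerHTML": the markup including the wrapping <tip> element.
    public static string Outer()
    {
        return Tip.ToString(SaveOptions.DisableFormatting);
    }

    // "innerHTML": the serialized child nodes joined together.
    public static string Inner()
    {
        return string.Join("",
            Tip.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

For this sample, TextOnly() gives "Enter a url into the box.", Outer() gives the whole element including the id attribute, and Inner() gives "Enter a <b>url</b> into the box.".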

Tuna answered 19/1, 2009 at 15:51 Comment(0)

Just use:

string.Concat(element.Nodes()) 

to get the content with HTML tags.
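A short sketch of why this one-liner works, assuming .NET 4 or later, where string.Concat has an overload taking IEnumerable<T> that calls ToString() on each item. The ConcatDemo and InnerMarkup names are invented for the example. One thing worth knowing is that text nodes are serialized as XML, so characters such as & stay escaped, unlike with Value.

```csharp
using System;
using System.Xml.Linq;

public static class ConcatDemo
{
    // string.Concat<T>(IEnumerable<T>) calls ToString() on every node, and
    // XNode.ToString() returns the node's XML, so the tags are preserved.
    public static string InnerMarkup(XElement element)
    {
        return string.Concat(element.Nodes());
    }
}
```

For an element parsed from <tip>Fish &amp; <b>chips</b></tip>, InnerMarkup returns the string with &amp; still escaped, whereas Value would return "Fish & chips". If nested elements should not be re-indented, each node can be serialized with ToString(SaveOptions.DisableFormatting) before concatenating.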

Lifeboat answered 25/1, 2019 at 15:38 Comment(0)
