LINQ to XML ignores line breaks in attributes
According to this question:

Are line breaks in XML attribute values allowed?

line breaks in XML attributes are perfectly valid (although perhaps not recommended):

<xmltag1>
    <xmltag2 attrib="line 1
line 2
line 3">
    </xmltag2>
</xmltag1>

When I parse such XML using LINQ to XML (System.Xml.Linq), those line breaks are converted silently to space ' ' characters.

Is there any way to tell the XDocument.Load() parser to preserve those line breaks?

P.S.: The XML I'm parsing is written by third-party software, so I cannot change the way the line breaks are written.

Blat answered 13/7, 2012 at 8:37 Comment(1)
If you are writing attributes programmatically, look at this article, which shows different ways of escaping strings: weblogs.sqlteam.com/mladenp/archive/2008/10/21/… Keep in mind that line breaks are not the only characters that must be escaped. - Irrespective
I
9

If you want line breaks in attribute values to be preserved then you need to write them with character references e.g.

<foo bar="Line 1.&#10;Line 2.&#10;Line 3."/>

as otherwise the XML parser will normalize them to spaces, as required by the XML specification (http://www.w3.org/TR/xml/#AVNormalize).
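A minimal sketch of that difference, assuming a made-up `foo`/`bar` document with `\n` line breaks (the class and helper names are illustrative only): the literal line break is normalized to a space, while the `&#10;` character reference survives parsing.

```csharp
using System;
using System.Xml.Linq;

static class NormalizationDemo
{
    // Hypothetical helper: parse the XML and return the root's "bar" attribute value.
    public static string BarValue(string xml)
    {
        return XDocument.Parse(xml).Root.Attribute("bar").Value;
    }

    public static void Main()
    {
        // A literal line feed inside the attribute value.
        string literal = "<foo bar=\"Line 1.\nLine 2.\"/>";
        // The same line feed written as a character reference.
        string escaped = "<foo bar=\"Line 1.&#10;Line 2.\"/>";

        Console.WriteLine(BarValue(literal).Contains("\n")); // False: normalized to a space
        Console.WriteLine(BarValue(escaped).Contains("\n")); // True: the reference is kept as LF
    }
}
```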

[edit] If you want to avoid the attribute value normalization then loading the XML with a legacy XmlTextReader helps:

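using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// The attribute value contains literal line breaks.
string testXml = @"<foo bar=""Line 1.
Line 2.
Line 3.""/>";

XDocument test;
using (XmlTextReader xtr = new XmlTextReader(new StringReader(testXml)))
{
    // Turn off attribute value normalization so the line breaks are reported as-is.
    xtr.Normalization = false;
    test = XDocument.Load(xtr);
}
Console.WriteLine("|{0}|", test.Root.Attribute("bar").Value);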

That outputs

|Line 1.
Line 2.
Line 3.|
Inez answered 13/7, 2012 at 8:42 Comment(2)
Thank you, but as I wrote in my question, the XML is written by third-party software, so I cannot change this. Maybe I need some kind of RegEx replace which converts the line breaks to &#10; - Blat
I saw that note in your question, but in this case there is a clear specification and the result you get complies with it. So I wrote that answer to point out that the behaviour you get is the right one, even if it is not what you want in your case. I think a legacy XmlTextReader will, however, allow you to avoid the attribute value normalization, so I will edit my answer to show that. - Inez
P
1

According to MSDN:

Although XML processors preserve all white space in element content, they frequently normalize it in attribute values. Tabs, carriage returns, and spaces are reported as single spaces. In certain types of attributes, they trim white space that comes before or after the main body of the value and reduce white space within the value to single spaces. (If a DTD is available, this trimming will be performed on all attributes that are not of type CDATA.)

For example, an XML document might contain the following:

<whiteSpaceLoss note1="this is a note." note2="this
is
a
note.">

An XML parser reports both attribute values as "this is a note.", converting the line breaks to single spaces.

I can't find anything about preserving whitespace in attribute values, and judging by this explanation I suspect it may not be possible.

Pecker answered 13/7, 2012 at 9:16 Comment(0)
A
0

The line breaks are not turned into spaces (ASCII code 32) when parsed. If you step through each character, you will see that the apparent space ' ' is actually ASCII code 10, LF (line feed), so the line breaks are still present. If you need to, try replacing them with ASCII 13 in your code (Windows Forms text boxes do not show a bare LF as a line break).

Acrospire answered 13/7, 2012 at 8:51 Comment(3)
Thank you, I tested that before, and I really got two ASCII code 32 characters where the line breaks should be. I'm going to test that again to be sure. - Blat
I tested it again. Both '\r' and '\n' characters in the XML attribute are converted to ' ' spaces (ASCII code 32). - Blat
You're right, that applies to a CDATA section. I could not find a way to preserve the line breaks at the moment. Is replacing 32 32 with a line break an option for you? - Acrospire
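
The character-by-character check discussed in these comments can be sketched roughly as follows. The sample document and the `CharCodeCheck`/`Codes` names are assumptions made for illustration; the input uses a single `\n` line break and is parsed with the default `XDocument.Parse` settings.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class CharCodeCheck
{
    // Hypothetical helper: parse the XML and list the character codes of the "bar" attribute value.
    public static string Codes(string xml)
    {
        string value = XDocument.Parse(xml).Root.Attribute("bar").Value;
        return string.Join(" ", value.Select(c => ((int)c).ToString()));
    }

    public static void Main()
    {
        // 'a' is 97 and 'b' is 98; the line feed between them is reported as 32 (space), not 10.
        Console.WriteLine(Codes("<foo bar=\"a\nb\"/>")); // 97 32 98
    }
}
```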
