XmlReader newline \n instead of \r\n
When I use XmlReader.ReadOuterXml(), elements are separated by \n instead of \r\n. So, for example, if I have an XmlDocument representation of

<A>
<B>
</B>
</A>

I get

<A>\n<B>\n</B>\n</A>

Is there an option to specify the newline character? XmlWriterSettings has one, but XmlReader doesn't seem to.

Here is my code to read the XML. Note that XmlWriterSettings has NewLineHandling = Replace by default.

XmlDocument xmlDocument = <Generate some XmlDocument>
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;

// Use a memory stream because it accepts UTF8 characters.  If we use a 
// string builder the XML will be UTF16.
using (MemoryStream memStream = new MemoryStream())
{
    using (XmlWriter xmlWriter = XmlWriter.Create(memStream, settings))
    {
        xmlDocument.Save(xmlWriter);
    }

    //Set the pointer back to the beginning of the stream to be read
    memStream.Position = 0;
    using (XmlReader reader = XmlReader.Create(memStream))
    {
        reader.Read();
        string header = reader.Value;
        reader.MoveToContent();
        return "<?xml " + header + " ?>" + Environment.NewLine + reader.ReadOuterXml();
    }
}
Centreboard answered 25/11, 2009 at 0:22 Comment(1)
This is especially troublesome when the input XML has a mix of \r\n and \n and the downstream system is sensitive to the difference between the two, e.g. when the XML document is an intermediate state for an XSLT transform before the output is encoded in a flat file with specific delimiters. – Gerfen
L
17

XmlReader will automatically normalize \r\n to \n. Although this seems unusual on Windows, it is actually required by the XML Specification (http://www.w3.org/TR/2008/REC-xml-20081126/#sec-line-ends).

You can do a String.Replace:

string s = reader.ReadOuterXml().Replace("\n", "\r\n");
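
A minimal sketch of the behaviour and the workaround described above. ToWindowsNewlines is a hypothetical helper name, not a framework API. It collapses any existing \r\n to \n before expanding, so a string that already contains \r\n does not end up with \r\r\n.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper (not a framework API): convert every line ending to \r\n.
// Collapsing \r\n to \n first keeps existing \r\n pairs from becoming \r\r\n.
static string ToWindowsNewlines(string text) =>
    text.Replace("\r\n", "\n").Replace("\n", "\r\n");

string input = "<A>\r\n<B>\r\n</B>\r\n</A>";

string outerXml;
using (XmlReader reader = XmlReader.Create(new StringReader(input)))
{
    reader.MoveToContent();
    outerXml = reader.ReadOuterXml();   // line endings arrive normalized to \n
}

string restored = ToWindowsNewlines(outerXml);
Console.WriteLine(outerXml.Contains("\r"));    // is any \r left after reading?
Console.WriteLine(restored.Contains("\r\n"));  // are \r\n pairs back after the replace?
```

Keep in mind that this touches every \n in the string, including ones inside element text, which may or may not be what you want.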
Lalitta answered 25/11, 2009 at 0:42 Comment(1)
For the sake of cross-platform compatibility, I'd suggest .Replace("\n", Environment.NewLine), but if your environment is fixed this is functionally identical. – Picklock
Z
6

I had to write database data to an xml file and read it back from the xml file, using LINQ to XML. Some fields in a record were themselves xml strings complete with \r characters. These had to remain intact. I spent days trying to find something that would work, but it seems Microsoft was by design converting \r to \n.

The following solution works for me:

To write a loaded XDocument to the XML file keeping \r intact, where xDoc is an XDocument and filePath is a string:

XmlWriterSettings xmlWriterSettings = new XmlWriterSettings 
    { NewLineHandling = NewLineHandling.None, Indent = true };
using (XmlWriter xmlWriter = XmlWriter.Create(filePath, xmlWriterSettings))
{
    xDoc.Save(xmlWriter);
    xmlWriter.Flush();
}

To read an XML file into an XElement keeping \r intact:

using (XmlTextReader xmlTextReader = new XmlTextReader(filePath) 
   { WhitespaceHandling = WhitespaceHandling.Significant })
{
     xmlTextReader.MoveToContent();
     xDatabaseElement = XElement.Load(xmlTextReader);
}
Zaremski answered 16/8, 2011 at 16:37 Comment(1)
This is because XmlTextReader has a Normalization setting that defaults to false, unlike XmlReader.Create, which always normalizes newlines no matter what. See msdn.microsoft.com/en-us/library/… and the note towards the end of msdn.microsoft.com/en-us/library/… – Gerfen
B
4

Solution 1: Write entitized XML

Use a well-configured XmlWriter with the NewLineHandling.Entitize option so that the XmlReader will not normalize the line endings.

You can use such a custom XmlWriter even with XDocument:

using (var writer = XmlWriter.Create(fileName, new XmlWriterSettings { NewLineHandling = NewLineHandling.Entitize }))
    xDoc.Save(writer); // disposing the writer flushes the output to the file
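
To illustrate why entitizing helps, here is a small in-memory round trip. It is only a sketch: the element name "field" and the sample text are made up, and it writes to a StringWriter instead of a file. The carriage return is written as a character reference, so a normalizing XmlReader has no literal \r\n to collapse.

```csharp
using System;
using System.IO;
using System.Xml;

string original = "line1\r\nline2";   // made-up sample value

// Write the text with carriage returns entitized.
var settings = new XmlWriterSettings
{
    NewLineHandling = NewLineHandling.Entitize,
    OmitXmlDeclaration = true
};
var buffer = new StringWriter();
using (XmlWriter writer = XmlWriter.Create(buffer, settings))
{
    writer.WriteElementString("field", original);
}
string xml = buffer.ToString();

// Read it back with a normalizing reader.
string roundTripped;
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    reader.MoveToContent();
    roundTripped = reader.ReadElementContentAsString();
}

Console.WriteLine(xml);
Console.WriteLine(roundTripped == original);
```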

Solution 2: Read non-entitized XML without normalization

Solution 1 is the cleaner way; however, you may already have non-entitized XML whose creation you cannot change, and still want to prevent normalization. The accepted answer suggests a replace, but that blindly replaces every \n occurrence, even where that is not desirable. To retrieve all of the line endings as they are in the file, you can try the legacy XmlTextReader class, which does not normalize XML files by default. You can use it with XDocument, too:

var xDoc = XDocument.Load(new XmlTextReader(fileName));
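
A side-by-side sketch of the difference, assuming a made-up one-element document held in a string rather than a file. It reads the same text once through XmlReader.Create and once through XmlTextReader.

```csharp
using System;
using System.IO;
using System.Xml;

string xml = "<field>line1\r\nline2</field>";   // made-up sample input

// XmlReader.Create normalizes line endings in text content.
string normalized;
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    reader.MoveToContent();
    normalized = reader.ReadElementContentAsString();
}

// The legacy XmlTextReader leaves them alone (its Normalization property defaults to false).
string preserved;
using (var legacyReader = new XmlTextReader(new StringReader(xml)))
{
    legacyReader.MoveToContent();
    preserved = legacyReader.ReadElementContentAsString();
}

Console.WriteLine(normalized.Contains("\r"));
Console.WriteLine(preserved.Contains("\r\n"));
```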
Brae answered 24/1, 2017 at 14:39 Comment(0)
D
0

There's a quicker way if you're just trying to get to UTF-8. First create a writer:

public class EncodedStringWriter : StringWriter
{
    public EncodedStringWriter(StringBuilder sb, Encoding encoding)
        : base(sb)
    {
        _encoding = encoding;
    }

    private Encoding _encoding;

    public override Encoding Encoding
    {
        get
        {
            return _encoding;
        }
    }

}

Then use it:

XmlDocument doc = new XmlDocument();
doc.LoadXml("<foo><bar /></foo>");

StringBuilder sb = new StringBuilder();
XmlWriterSettings xws = new XmlWriterSettings();
xws.Indent = true;

using( EncodedStringWriter w = new EncodedStringWriter(sb, Encoding.UTF8) )
{
    using( XmlWriter writer = XmlWriter.Create(w, xws) )
    {
        doc.WriteTo(writer);
    }
}
string xml = sb.ToString();

Gotta give credit where credit is due.

Delly answered 25/11, 2009 at 2:37 Comment(0)
M
-2

XmlReader reads files; it does not write them. If you are getting \n in your reader, it is because that's what's in the file. Both \n and \r are whitespace and are semantically the same in XML, so the difference will not affect the meaning or content of the data.

Edit:

That looks like C#, not Ruby. As binarycoder says, ReadOuterXml is defined to return normalized XML. Typically this is what you want. If you want the raw XML you should use Encoding.UTF8.GetString(memStream.ToArray()), not XmlReader.
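
A rough sketch of that last suggestion, under a few assumptions of mine: the sample document is made up, NewLineChars is set explicitly instead of relying on the platform default, and the encoding is created without a byte order mark so the decoded string does not start with one.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<A><B /></A>");   // made-up sample document

var settings = new XmlWriterSettings
{
    Indent = true,
    NewLineChars = "\r\n",               // set explicitly rather than relying on the platform default
    Encoding = new UTF8Encoding(false)   // assumption: no byte order mark wanted in the string
};

string rawXml;
using (var memStream = new MemoryStream())
{
    using (XmlWriter writer = XmlWriter.Create(memStream, settings))
    {
        doc.Save(writer);
    }
    // Decode the bytes directly; no XmlReader is involved, so nothing is normalized.
    rawXml = Encoding.UTF8.GetString(memStream.ToArray());
}

Console.WriteLine(rawXml);
```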

Mindymine answered 25/11, 2009 at 0:32 Comment(1)
Dour, I added my code. If I use XmlWriter with NewLineHandling = Replace, shouldn't it write the correct string? – Centreboard
