XML Deserialization of string elements with newlines in C#

I can't figure out why this test doesn't pass.

The test is as follows. Given the following XML:

<?xml version="1.0" encoding="utf-8"?>
<foo>
<account>
 1234567890
</account>
<deptCode>
 ABCXYZ
</deptCode>
</foo>

and the following class:

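using System.Xml.Serialization;

// XmlSerializer only maps public members, and the XML root element is <foo>
[XmlRoot("foo")]
public class Foo
{
    [XmlElement(ElementName = "account", DataType = "normalizedString")]
    public string account;

    [XmlElement(ElementName = "deptCode", DataType = "normalizedString")]
    public string deptCode;
}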

when that XML is deserialized with:

XmlSerializer serializer = new XmlSerializer(typeof(Foo));
Foo myFoo = (Foo) serializer.Deserialize(xmlReader);

I get the following values:

Foo.account = "\r\n 1234567890 \r\n"
Foo.deptCode = "\r\n ABCXYZ \r\n"

instead of the expected values:

Foo.account = "1234567890"
Foo.deptCode = "ABCXYZ"

How can I make the deserialization process give me the expected results? I thought DataType="normalizedString" might do it, but it seems to have no effect. When I use XmlReaderSettings.IgnoreWhitespace, it only removes the "\r" character, leaving me with "\n 1234567890".

Meraree asked on 20/10/2011 at 15:36
Comments:
Fungous: What's your code that deserializes the objects? You probably need to set some options on the deserializer.
Meraree: I added the deserialization code to the question. The only relevant setting I could see was the aforementioned XmlReaderSettings.IgnoreWhitespace, which still leaves me with the newline.
Lancinate: What type is xmlReader? Are you using XmlTextReader?
Meraree: Neither XmlReader nor XmlTextReader seems to work.

It seems to be working as intended. From the IgnoreWhitespace documentation:

White space that is not considered to be significant includes spaces, tabs, and blank lines used to set apart the markup for greater readability.

Basically, when it is set to false, the reader preserves whitespace between elements, such as:

<Foo>

<bar>Text</bar>
</Foo>

The newline between <Foo> and <bar> will be returned by the reader. Set IgnoreWhitespace to true, and it won't be.
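A small sketch to see this in action, assuming the markup above is held in a string: it counts the whitespace-only nodes that XmlReader reports with IgnoreWhitespace off and on. CountWhitespaceNodes is a hypothetical helper written only for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class Program
{
    // Counts the whitespace-only nodes an XmlReader reports for the given setting.
    public static int CountWhitespaceNodes(string xml, bool ignoreWhitespace)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = ignoreWhitespace };
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Whitespace)
                {
                    count++;
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        string xml = "<Foo>\n\n<bar>Text</bar>\n</Foo>";
        // 2: one node before <bar>, one after </bar>
        Console.WriteLine(CountWhitespaceNodes(xml, false));
        // 0: both whitespace-only nodes are skipped
        Console.WriteLine(CountWhitespaceNodes(xml, true));
    }
}
```

Only whitespace-only nodes between elements are affected. A text node such as the question's "\n 1234567890\n" contains non-whitespace characters, so it is reported as text either way.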

To achieve your goal you'll have to trim the values programmatically, as Kirill mentioned. When you think about it, how is the reader supposed to know whether the whitespace in an element's pure string content (as in your example) is there for indentation or is actual content?
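One way to do that trimming, sketched under the assumption that you can change Foo: expose the elements as properties whose setters call Trim(), so XmlSerializer stores trimmed values as it deserializes. The property names Account and DeptCode are illustrative and not from the question.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical variant of the question's Foo: the serializer assigns through
// public properties whose setters strip leading/trailing whitespace.
[XmlRoot("foo")]
public class Foo
{
    private string account;
    private string deptCode;

    [XmlElement("account")]
    public string Account
    {
        get => account;
        set => account = value?.Trim();
    }

    [XmlElement("deptCode")]
    public string DeptCode
    {
        get => deptCode;
        set => deptCode = value?.Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string xml = "<foo>\n<account>\n 1234567890\n</account>\n<deptCode>\n ABCXYZ\n</deptCode>\n</foo>";
        var serializer = new XmlSerializer(typeof(Foo));
        using (var reader = new StringReader(xml))
        {
            var foo = (Foo)serializer.Deserialize(reader);
            Console.WriteLine($"[{foo.Account}] [{foo.DeptCode}]"); // prints [1234567890] [ABCXYZ]
        }
    }
}
```

Trim() with no arguments removes all leading and trailing Unicode whitespace, including \r and \n, and leaves whitespace inside the value alone.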

For more reading on ignoring whitespace, you may want to take a look here and here.

Martine answered on 20/10/2011 at 18:42

You can create a custom XmlTextReader class that trims the strings it returns:

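using System.IO;
using System.Xml;

// Trims leading/trailing whitespace from every string returned by ReadString().
// Note: this also strips whitespace that might be meaningful content.
public class CustomXmlTextReader : XmlTextReader
{
    public CustomXmlTextReader(Stream stream) : base(stream) { }

    public override string ReadString()
    {
        return base.ReadString().Trim();
    }
}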
Rumple answered on 20/10/2011 at 17:29

Try using XmlTextReader for deserialization with the WhitespaceHandling property set to WhitespaceHandling.None and Normalization = true.
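A minimal sketch of that configuration, assuming the XML is available as a string; ReadAccount is a hypothetical helper. Neither property removes whitespace that is part of an element's text content, which matches what the comments below report.

```csharp
using System;
using System.IO;
using System.Xml;

public static class Program
{
    // Reads <account> with the suggested settings and returns its raw text.
    public static string ReadAccount(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Skip whitespace-only nodes between elements.
            reader.WhitespaceHandling = WhitespaceHandling.None;
            // Normalize line endings in text and whitespace in attribute values (default: false).
            reader.Normalization = true;

            reader.ReadToFollowing("account");
            return reader.ReadElementString();
        }
    }

    public static void Main()
    {
        string xml = "<foo>\r\n<account>\r\n 1234567890\r\n</account>\r\n</foo>";
        string account = ReadAccount(xml);
        // The line breaks around the value are still part of the element's text.
        Console.WriteLine(account.Contains("\n")); // prints True
    }
}
```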

Lancinate answered on 20/10/2011 at 16:02
Comments:
Meraree: Unfortunately, XmlTextReader with WhitespaceHandling.None had no effect.
Lancinate: What about setting Normalization = true? It's false by default. I think this should convert \n to white space.
Meraree: Normalization = true converts "\r\n" to "\n" but leaves the newline in place. XmlReaderSettings.IgnoreWhitespace also removes the "\r", but I couldn't test the combination of the two, since I can only seem to apply an XmlReaderSettings instance to a plain XmlReader and not to an XmlTextReader (the constructor has no settings parameter, and the Settings property has no setter).