Hexadecimal value 0x00 is a invalid character loading XML document

I recently had an XML file that would not load. The error message was

Hexadecimal value 0x00 is a invalid character

It can be reproduced with this minimal code in LinqPad (C# statements):

var xmlDocument = new XmlDocument();
xmlDocument.Load(@"C:\Users\Thomas\AppData\Local\Temp\tmp485D.tmp");

I went through the XML with a hex editor but could not find a 0x00 character. I minimized the XML to

<?xml version="1.0" encoding="UTF-8"?>
<x>
</x>

In my hex editor it shows up as

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  FF FE 3C 00 3F 00 78 00 6D 00 6C 00 20 00 76 00  ÿþ<.?.x.m.l. .v.
00000010  65 00 72 00 73 00 69 00 6F 00 6E 00 3D 00 22 00  e.r.s.i.o.n.=.".
00000020  31 00 2E 00 30 00 22 00 20 00 65 00 6E 00 63 00  1...0.". .e.n.c.
00000030  6F 00 64 00 69 00 6E 00 67 00 3D 00 22 00 55 00  o.d.i.n.g.=.".U.
00000040  54 00 46 00 2D 00 38 00 22 00 3F 00 3E 00 0D 00  T.F.-.8.".?.>...
00000050  0A 00 3C 00 78 00 3E 00 0D 00 0A 00 3C 00 2F 00  ..<.x.>.....<./.
00000060  78 00 3E 00                                      x.>.

So it's very easy to see that there is no 00 00 character anywhere. All even columns contain values other than 00.

Why does it complain about invalid 0x00 character?

Delgado answered 12/10, 2014 at 22:17 Comment(2)
See here also #11037299 - Garlicky
I had the same problem. It was caused by me reading in the XML as bytes, converting each "batch" of bytes to a string, and then concatenating the strings. - Mash

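The problem is the encoding. The file starts with the byte order mark FF FE, which identifies UTF-16 (little-endian), but the XML declaration states encoding="UTF-8". In UTF-16 every ASCII character is followed by a 00 byte, so a parser that follows the declaration and decodes the bytes as UTF-8 reads each of those 00 bytes as a separate 0x00 character.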
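To make the mismatch concrete, here is a minimal sketch that decodes the same UTF-16 bytes twice, once correctly and once as UTF-8. It assumes that the parser ends up trusting the declared encoding, which is consistent with the error message but is not confirmed here; the class and method names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

public static class EncodingMismatchDemo
{
    // Counts the NUL characters (U+0000) that appear when the bytes
    // are decoded with the given encoding.
    public static int CountNulChars(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes).Count(c => c == '\0');
    }

    public static void Main()
    {
        // "<x>" as UTF-16 little-endian (no BOM): 3C 00 78 00 3E 00
        byte[] utf16Bytes = Encoding.Unicode.GetBytes("<x>");

        // Decoded as UTF-16, the 00 bytes are just the high halves of the code units.
        Console.WriteLine(CountNulChars(utf16Bytes, Encoding.Unicode)); // 0

        // Decoded as UTF-8, every 00 byte becomes its own NUL character.
        Console.WriteLine(CountNulChars(utf16Bytes, Encoding.UTF8));    // 3
    }
}
```

This is also why the hex dump looks harmless: there is no 00 00 pair, but under a UTF-8 interpretation a single 00 byte is already an invalid XML character.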

If you generate the XML yourself, there are two options:

a) actually write the file as UTF-8, so that it starts with the UTF-8 byte order mark EF BB BF instead of FF FE

b) define UTF-16 encoding: encoding="UTF-16"
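Either option amounts to keeping the bytes and the declaration in sync. One way to do that, sketched below, is to let XmlWriter produce both from a single XmlWriterSettings.Encoding value rather than writing the declaration by hand. The class name and the trivial document are placeholders; how the original file was produced is not known.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class ConsistentXmlWriter
{
    // Serializes a trivial <x></x> document. XmlWriter takes both the byte
    // encoding and the encoding="..." declaration from settings.Encoding,
    // so the two cannot drift apart.
    public static byte[] WriteDocument(Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding, Indent = true };
        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartDocument();
                writer.WriteElementString("x", "");
                writer.WriteEndDocument();
            }
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        // Option a): UTF-8 with a byte order mark.
        byte[] utf8 = WriteDocument(new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
        // Option b): UTF-16 little-endian.
        byte[] utf16 = WriteDocument(Encoding.Unicode);

        // Both variants should load without the 0x00 error.
        foreach (byte[] bytes in new[] { utf8, utf16 })
        {
            var doc = new XmlDocument();
            doc.Load(new MemoryStream(bytes));
            Console.WriteLine(doc.DocumentElement.Name);
        }
    }
}
```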

If you receive the XML from someone else, there are also two options:

A) tell the author to fix the XML according to a) or b)

B) sanitize the input in your application (not preferred)
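If option B) is unavoidable, one possible workaround is to decode the file to a string yourself and parse the string, since File.ReadAllText picks the encoding from the byte order mark and the parser does not re-decode text it receives as a string. The sketch below is only an illustration under those assumptions; the helper name is hypothetical, and it hides the producer's bug rather than fixing it.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class LenientXmlLoader
{
    // Reads the file as text (encoding detected from the byte order mark)
    // and parses the resulting string, so the declared encoding in the
    // XML declaration is not used to decode the bytes.
    public static XmlDocument LoadIgnoringDeclaredEncoding(string path)
    {
        string text = File.ReadAllText(path);
        var doc = new XmlDocument();
        doc.LoadXml(text);
        return doc;
    }

    public static void Main()
    {
        // Recreate the broken input: UTF-16 bytes with a BOM, declared as UTF-8.
        string path = Path.GetTempFileName();
        File.WriteAllText(
            path,
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<x>\r\n</x>",
            Encoding.Unicode);

        XmlDocument doc = LoadIgnoringDeclaredEncoding(path);
        Console.WriteLine(doc.DocumentElement.Name);

        File.Delete(path);
    }
}
```

This only helps when the file carries a byte order mark that matches its real encoding; a file with neither a correct mark nor a correct declaration still needs the encoding supplied from elsewhere.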

Delgado answered 12/10, 2014 at 22:17 Comment(3)
+1. Another hacky way to load it: read the file as a string first and then load the XML from that string (the parser will ignore the declared encoding in this case). - Culture
@AlexeiLevenkov Your hacky way didn't work for me. I'm doing an XDocument.Parse(data.getstring("datafieldname")) and it is producing the hex error. - Barehanded
@Barehanded You should ask a separate question showing a minimal reproducible example of your particular case (linking to this post would "show research effort"). See how this question demonstrated "minimal" data to reproduce the issue and consider doing something similar in your new question. - Culture
