How to make TXMLDocument (with the MSXML Implementation) always include the encoding attribute?

I have legacy code (I didn't write it) that always included the encoding attribute, but after recompiling it with D2010, TXMLDocument no longer includes the encoding. Because the XML data has accented characters in both tag names and data, TXMLDocument.LoadFromFile simply throws an EDOMParseError saying that an invalid character was found in the file. Relevant code:

  Doc := TXMLDocument.Create(nil);
  try
    Doc.Active := True;
    Doc.Encoding := XMLEncoding;
    RootNode := Doc.CreateElement('Test', '');
    Doc.DocumentElement := RootNode;
    <snip>
    //Result := Doc.XML.Text;
    Doc.SaveToXML(Result);    // Both lines give the same result

On older versions of Delphi, the following line is generated:

<?xml version="1.0" encoding="ISO-8859-1"?>

On D2010, this is generated:

<?xml version="1.0"?>

If I manually change that line, everything works as it has for years.

UPDATE: XMLEncoding is a constant, defined as follows:

  XMLEncoding = 'ISO-8859-1';
Buncombe answered 3/5, 2010 at 17:32
var
  XMLStream: TStringStream;
begin
  Doc := TXMLDocument.Create(nil);
  try
    Doc.Active := True;
    Doc.Encoding := XMLEncoding;
    RootNode := Doc.CreateElement('Test', '');
    Doc.DocumentElement := RootNode;
    <snip>
    // Unlike Doc.XML / SaveToXML, SaveToStream keeps the encoding attribute
    XMLStream := TStringStream.Create;
    try
      Doc.SaveToStream(XMLStream);
      Result := XMLStream.DataString;
    finally
      XMLStream.Free;  // free the stream even if saving raises
    end;

Following Ken's answer and the link to the MSXML article, I decided to investigate the XML property and the SaveToXML method. Both use the xml property of the MSXML DOM implementation, which, according to the article, does not include the encoding when read directly (see the "Creating New XML Documents with MSXML" section, right after the use of the CreateProcessingInstruction method).

UPDATE:

I found that accented characters were being corrupted in the resulting XML. When the program that processes that XML started throwing strange errors, we saw that the characters were being converted to numeric character constants (just as #13 is the numeric character constant for a carriage return). So I used a TStringStream to FINALLY get it right.

Buncombe answered 4/5, 2010 at 18:37

You'll want to look at IXMLDocument.CreateProcessingInstruction. I use OmniXML, but its syntax is similar and should get you started:

var
  FDoc: IXMLDocument;
  PI:  IXMLProcessingInstruction;
begin
  FDoc := OmniXML.CreateXMLDoc();
  PI := FDoc.CreateProcessingInstruction('xml', 'version="1.0" encoding="UTF-8"');
  FDoc.AppendChild(PI);
end;
Delmadelmar answered 3/5, 2010 at 20:08
That's exactly what Microsoft recommends for MSXML, too: msdn.microsoft.com/en-us/library/aa468560.aspx. However, the thing at the start of the document isn't technically a processing instruction. It's an XML declaration; the string "xml" isn't really allowed as the name of a processing instruction, so it appears the CreateProcessingInstruction method is doing double duty. - Halfwit
@Rob: That's probably why it took me a while, a couple of years ago, to figure it out (I didn't have the MSDN link you provided at the time). However, it could actually be considered a processing instruction, couldn't it, if it's telling the parser how to interpret the content? "This is XML, and it's in this character set; that will make it easier to figure out." - Delmadelmar
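For readers using TXMLDocument rather than OmniXML, a possible counterpart of the processing-instruction snippet is sketched below, combined with the SaveToStream approach from the accepted answer. It is a minimal sketch under stated assumptions, not a tested recipe: it assumes IXMLDocument.CreateNode with the ntProcessingInstr node type (XMLIntf/XMLDoc units) maps onto MSXML's createProcessingInstruction, that MSXML accepts "xml" as the name (as the MSDN link in the comments suggests), and that the default DOM vendor is MSXML. BuildXMLWithDeclaration and the console-program wrapper are hypothetical, added only to make the sketch self-contained.

```pascal
program XMLDeclarationDemo;

{$APPTYPE CONSOLE}

uses
  Classes, ActiveX, XMLIntf, XMLDoc;

const
  XMLEncoding = 'ISO-8859-1';

// Hypothetical helper: builds a document with an explicit <?xml ...?>
// declaration and serializes it through a stream so the encoding is kept.
function BuildXMLWithDeclaration: string;
var
  Doc: IXMLDocument;
  Decl: IXMLNode;
  XMLStream: TStringStream;
begin
  // Held through the interface, so reference counting frees the document
  Doc := TXMLDocument.Create(nil);
  Doc.Active := True;

  // Same idea as OmniXML's CreateProcessingInstruction: a node named 'xml'
  // of type ntProcessingInstr becomes the XML declaration (assumption)
  Decl := Doc.CreateNode('xml', ntProcessingInstr,
    'version="1.0" encoding="' + XMLEncoding + '"');
  Doc.ChildNodes.Add(Decl);

  Doc.DocumentElement := Doc.CreateElement('Test', '');

  // Per the accepted answer, Doc.XML and SaveToXML drop the encoding with
  // MSXML, while SaveToStream keeps it
  XMLStream := TStringStream.Create('');
  try
    Doc.SaveToStream(XMLStream);
    Result := XMLStream.DataString;
  finally
    XMLStream.Free;
  end;
end;

begin
  // MSXML is a COM library, so a console program must initialize COM itself
  CoInitialize(nil);
  try
    Writeln(BuildXMLWithDeclaration);
  finally
    CoUninitialize;
  end;
end.
```

If the question's Doc.Encoding := XMLEncoding line is kept, the explicit declaration node may be redundant; judging by the accepted answer, the step that matters with MSXML is serializing through a stream (or file) rather than reading the XML property.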