How to Decode XML Blob field in D7
Asked Answered
H

2

8

I'm having a problem trying to decode XML data returned by an instance of MS SQL Server 2014 to an app written in D7. (the version of Indy is the one which came with it, 9.00.10).

Update When I originally wrote this q, I was under the impression that the contents of the blob field needed to be Base64-decoded, but it seems that that was wrong. Having followed Remy Lebeau's suggestion, the blob stream contains recognisable text in the field names and field values before decoding but not afterwards.

In the code below, the SQL in the AdoQuery is simply

Select * from Authors where au_lname = 'White' For XML Auto

the Authors table being the one in the demo 'pubs' database. I've added the "Where" clause to restrict the size of the result set so I can show a hex dump of the returned blob.

According to the Sql Server OLH, the default type of the returned data when 'For XML Auto' is specified is 'binary base64-encoded format'. The data type of the single field of the AdoQuery is ftBlob, if I let the IDE create this field.

Executing the code below generates an exception "Uneven size in DecodeToStream". At the call to IdDecoderMIME.DecodeToString(S), the length of the string S is 3514, and 3514 mod 4 is 2, not 0 as it apparently should be, hence the exception. I've confirmed that the number of bytes in the field's value is 3514, so there's no difference between the size of the variant and the length of the string, i.e. nothing has gone awol in between.

procedure TForm1.FormCreate(Sender: TObject);
var
  SS : TStringStream;
  Output : String;
  S : String;
  IdDecoderMIME : TIdDecoderMIME;
begin
  SS := TStringStream.Create('');
  IdDecoderMIME := TIdDecoderMIME.Create(Nil);
  try
    AdoQuery1.Open;
    TBlobField(AdoQuery1.Fields[0]).SaveToStream(SS);
    S := SS.DataString;
    IdDecoderMIME.FillChar := #0;
    Output := IdDecoderMIME.DecodeToString(S);
    Memo1.Lines.Text := S;
  finally
    SS.Free;
    IdDecoderMIME.Free;
  end;
end;

I'm using this code:

procedure TForm1.FormCreate(Sender: TObject);
var
  SS : TStringStream;
  MS : TMemoryStream;
  Output : String;
begin
  SS := TStringStream.Create('');
  MS := TMemoryStream.Create;
  try
    AdoQuery1.Open;
    TBlobField(AdoQuery1.Fields[0]).SaveToStream(SS);
    SS.WriteString(#13#10);
    Output := SS.DataString;
    SS.Position := 0;
    MS.CopyFrom(SS, SS.Size);
    MS.SaveToFile(ExtractFilePath(Application.ExeName) + 'Blob.txt');
  finally
    SS.Free;
    MS.Free;
  end;
end;

A hex dump of the Blob.Txt file looks like this

00000000  44 05 61 00 75 00 5F 00 69 00 64 00 44 08 61 00  D.a.u._.i.d.D.a.
00000010  75 00 5F 00 6C 00 6E 00 61 00 6D 00 65 00 44 08  u._.l.n.a.m.e.D.
00000020  61 00 75 00 5F 00 66 00 6E 00 61 00 6D 00 65 00  a.u._.f.n.a.m.e.
00000030  44 05 70 00 68 00 6F 00 6E 00 65 00 44 07 61 00  D.p.h.o.n.e.D.a.
00000040  64 00 64 00 72 00 65 00 73 00 73 00 44 04 63 00  d.d.r.e.s.s.D.c.
00000050  69 00 74 00 79 00 44 05 73 00 74 00 61 00 74 00  i.t.y.D.s.t.a.t.
00000060  65 00 44 03 7A 00 69 00 70 00 44 08 63 00 6F 00  e.D.z.i.p.D.c.o.
00000070  6E 00 74 00 72 00 61 00 63 00 74 00 44 07 61 00  n.t.r.a.c.t.D.a.
00000080  75 00 74 00 68 00 6F 00 72 00 73 00 01 0A 02 01  u.t.h.o.r.s.....
00000090  10 E4 04 00 00 0B 00 31 37 32 2D 33 32 2D 31 31  .......172-32-11
000000A0  37 36 02 02 10 E4 04 00 00 05 00 57 68 69 74 65  76.........White
000000B0  02 03 10 E4 04 00 00 07 00 4A 6F 68 6E 73 6F 6E  .........Johnson
000000C0  02 04 0D E4 04 00 00 0C 00 34 30 38 20 34 39 36  .........408 496
000000D0  2D 37 32 32 33 02 05 10 E4 04 00 00 0F 00 31 30  -7223.........10
000000E0  39 33 32 20 42 69 67 67 65 20 52 64 2E 02 06 10  932 Bigge Rd....
000000F0  E4 04 00 00 0A 00 4D 65 6E 6C 6F 20 50 61 72 6B  ......Menlo Park
00000100  02 07 0D E4 04 00 00 02 00 43 41 02 08 0D E4 04  .........CA.....

As you can see, some of it is legible (field names and contents), some of it not. Does anyone recognise this format and know how to clean it up into the plain text I get from executing the same query in SS Management Studio, i.e. how do I successfully extract the XML from the result set?

Btw, I get the same result (including the contents of the Blob.Txt file) using both the MS OLE DB Provider for Sql Server and the Sql Server Native Client 11 provider, and using Delphi Seattle in place of D7.

Given that the code accesses an external database, this code is the closest I can get to an MCVE.

Update #2 The decoding problem vanishes if I change the Sql query to

select Convert(Text,
(select * from authors where au_lname = 'White' for xml AUTO
))

which gives the result (in SS) of

<authors au_id="172-32-1176" au_lname="White" au_fname="Johnson" phone="408 496-7223" address="10932 Bigge Rd." city="Menlo Park" state="CA" zip="94025" contract="1"/>

but I'm still interested to know how to get this to work without needing the Convert(). I've noticed that if I remove the Where clause from the Sql, what is returned is not well-formed XML - it contains a series of nodes, one per data row, but there is no enclosing root node.

Also btw, I realise that I can avoid this problem by not using "For XML Auto", I'm just interested in how to do it correctly. Also, I don't need any help parsing the XML once I've managed to extract it.

Hubby answered 5/12, 2016 at 21:32 Comment(2)
Does the base64 include padding characters? It should, to make it an even multiple of 4. If not, you could write your own padding characters to the end of your TStringStream to make its Size an even multiple (though you really should be using TDataSet.CreateBlobStream() instead of TBlobField.SaveToStream(), but then you can't add padding manually). Alternatively, consider upgrading to Indy 10, as its TIdDecoderMIME does not reject input that is not an even multiple of 4.Norward
@RemyLebeau: Thanks. I get the same result using TDataSet.CreateBlobStream. After trying the manual padding you suggested, it seems that Base64 decoding isn't actually necessary in that it converts partially-legible text to illegible- see updated q text.Hubby
B
4

Add the TYPE Directive to specify that you want XML returned.

select * 
from Authors 
where au_lname = 'White' 
for xml auto, type
Brooking answered 6/12, 2016 at 13:5 Comment(3)
Thanks. Unfortunately adding ", type" to the query has the effect that the AdoQuery1 object does not get any field objects created on the call to Open(), so it id not possible to retrieve any data at all. My work-around using Convert() in the query works fine, though.Hubby
@Hubby it does if you use the SQLOLEDB.1. If you use SQLNCLI11.1 you have to set DataTypeCompatibility=80 in the connection string.Brooking
So it does! The result still isn't well-formed XML, but I can deal with that.Hubby
K
2

You can't simply decode the binary blob into XML.

You can use TADOCommand and direct its output stream to an XML document object e.g.:

const
  adExecuteStream = 1024;
var
  xmlDoc, RecordsAffected: OleVariant;
  cmd: TADOCommand;

xmlDoc := CreateOleObject('MSXML2.DOMDocument.3.0'); // or CoDomDocument30.Create;
xmlDoc.async := False;

cmd := TADOCommand.Create(nil);    
// specify your connection string
cmd.ConnectionString := 'Provider=SQLOLEDB;Data Source=(local);...';
cmd.CommandType := cmdText;
cmd.CommandText := 'select top 1 * from items for xml auto';
cmd.Properties['Output Stream'].Value := xmlDoc;
cmd.Properties['XML Root'].Value := 'RootNode';
cmd.CommandObject.Execute(RecordsAffected, EmptyParam, adExecuteStream);

xmlDoc.save('d:\test.xml');
cmd.Free;

This results a well-formed XML with enclosing root node RootNode.

Kidder answered 6/12, 2016 at 13:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.