If you are using an up-to-date version of Indy 10, then the overloaded version of TIdHTTP.Post() that returns a String does decode the data to Unicode. However, the actual charset used for the decoding depends on what media type the HTTP Content-Type response header specifies:
1. If the media type is application/xml, application/xml-external-parsed-entity, or application/xml-dtd, or is not a text/... type but does end with +xml, then the charset specified in the encoding attribute of the XML's prolog is used. If no charset is specified there, UTF-8 is used.
2. Otherwise, if the Content-Type response header specifies a charset, then that charset is used.
3. Otherwise, if the media type is a text/... type, then:
   a. if the media type is text/xml or text/xml-external-parsed-entity, or ends with +xml, then us-ascii is used.
   b. otherwise, ISO-8859-1 is used.
4. Otherwise, Indy's default encoding (ASCII by default) is used.
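The selection order above can be modelled as a small decision function. This is a hedged Python sketch of the four rules exactly as listed, not Indy's source code; the function name choose_charset and its parameters are hypothetical, and it assumes the caller has already extracted the media type, the Content-Type charset parameter, and the prolog's encoding attribute.

```python
# Hypothetical model of the charset-selection rules listed above.
# NOT Indy's implementation: it assumes the media type, the Content-Type
# charset parameter and the XML prolog's encoding attribute were already
# extracted by the caller.

XML_APPLICATION_TYPES = {
    "application/xml",
    "application/xml-external-parsed-entity",
    "application/xml-dtd",
}

TEXT_XML_TYPES = {"text/xml", "text/xml-external-parsed-entity"}


def choose_charset(media_type, header_charset=None, prolog_encoding=None,
                   default_encoding="ascii"):
    """Return the charset name the four rules would pick."""
    mt = media_type.strip().lower()
    is_text = mt.startswith("text/")

    # Rule 1: XML media types use the prolog's encoding, else UTF-8.
    if mt in XML_APPLICATION_TYPES or (not is_text and mt.endswith("+xml")):
        return prolog_encoding or "utf-8"

    # Rule 2: an explicit charset in the Content-Type header is used.
    if header_charset:
        return header_charset

    # Rule 3: text/... types without a header charset.
    if is_text:
        if mt in TEXT_XML_TYPES or mt.endswith("+xml"):
            return "us-ascii"      # 3a
        return "iso-8859-1"        # 3b

    # Rule 4: fall back to the default encoding (ASCII unless changed).
    return default_encoding


for mt, cs in [("application/soap+xml", None), ("text/xml", None),
               ("text/html", None), ("text/html", "utf-8"),
               ("application/json", None)]:
    print(mt, cs, "->", choose_charset(mt, header_charset=cs))
```

Note that in this model a header charset is only consulted once rule 1 has not matched, following the order of the list.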
Without seeing the actual HTTP Content-Type header, it is hard to know which condition your situation falls into. It sounds like it is falling into either #2 or #3b, which would explain why the UTF-8 byte values are returned as-is (one character per byte) if ISO-8859-1 or a similar single-byte charset is being used for the decoding.
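To see why ISO-8859-1 hands back the UTF-8 bytes as-is, here is a small Python illustration (Python is used only because it makes the byte/character mapping easy to show; it is not Indy code). ISO-8859-1 maps every byte 0x00-0xFF to the code point with the same value, so each byte of a multi-byte UTF-8 sequence becomes a separate character.

```python
# UTF-8 bytes decoded with the wrong single-byte charset (ISO-8859-1).
original = "caf\u00e9"                         # "café"
utf8_bytes = original.encode("utf-8")          # 'é' is two bytes: C3 A9
wrong = utf8_bytes.decode("iso-8859-1")        # one character per byte

print(ascii(utf8_bytes))   # b'caf\xc3\xa9'
print(ascii(wrong))        # 'caf\xc3\xa9': code points equal the byte values

# ISO-8859-1 is a 1:1 byte <-> code point mapping, so the original bytes
# can be recovered and re-decoded correctly as UTF-8.
recovered = wrong.encode("iso-8859-1").decode("utf-8")
print(recovered == original)
```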
UTF8ToString() expects a UTF-8 encoded RawByteString as input, but you are passing it a UTF-16 encoded UnicodeString instead. In that situation, the RTL performs an implicit UTF-16 to Ansi conversion using the default Ansi charset. That is why you get the compiler warning: such a conversion can lose data.
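The data loss can be sketched in Python by modelling the implicit UTF-16 to Ansi step as an encode to a legacy code page. Windows-1252 (Python's cp1252 codec) is assumed here as the "default Ansi charset"; the real one depends on the system locale. The round trip only works if every character maps back to its original byte, which ISO-8859-1 guarantees but Windows-1252 does not.

```python
# Mojibake produced by decoding UTF-8 bytes as ISO-8859-1.
euro_utf8 = "\u20ac".encode("utf-8")            # euro sign: E2 82 AC
mojibake = euro_utf8.decode("iso-8859-1")      # U+00E2, U+0082, U+00AC

# Lossless path: re-encode with the same 1:1 charset, then decode as UTF-8.
print(ascii(mojibake.encode("iso-8859-1").decode("utf-8")))   # '\u20ac'

# Lossy path: converting to an Ansi code page (Windows-1252 is an
# assumption) cannot represent U+0082, so it is replaced by '?'.
ansi = mojibake.encode("cp1252", errors="replace")
print(ascii(ansi))

# The damaged bytes no longer decode back to the euro sign.
print(ansi.decode("utf-8", errors="replace") == "\u20ac")
```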
XML is really a binary data format, subject to charset encodings. An XML parser needs to know what the XML's encoding is and be able to parse the raw encoded bytes accordingly. That is why XML has an explicit encoding attribute right in the XML prolog. However, when TIdHTTP downloads XML as a String, it automatically decodes the data to Unicode but does not (yet) update the XML's prolog to match, so the encoding attribute may no longer describe the data you actually have.
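Here is a Python sketch of that stale-prolog problem, with xml.etree.ElementTree standing in for an XML parser (it is not TXMLDocument). The step that re-encodes the decoded string as UTF-8 is a hypothetical example of how the string might later be turned back into bytes; the point is only that the prolog no longer matches the data.

```python
import xml.etree.ElementTree as ET

# Raw bytes as a server might send them: ISO-8859-1 data, matching prolog.
raw = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
       '<name>caf\u00e9</name>').encode("iso-8859-1")

# Parsing the raw bytes: the parser reads the prolog and decodes correctly.
print(ascii(ET.fromstring(raw).text))      # 'caf\xe9'

# Decoding to a string leaves the prolog untouched: it still says ISO-8859-1.
text = raw.decode("iso-8859-1")

# Hypothetical later step: the string is re-encoded as UTF-8 bytes.
# The stale declaration makes the parser misread those bytes.
stale = text.encode("utf-8")
print(ascii(ET.fromstring(stale).text))    # 'caf\xc3\xa9'
```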
The real solution is to not download XML as a String in the first place. Download it as a TStream instead (TMemoryStream is a better choice than TStringStream) so your XML parser has access to the original bytes, the original charset declaration, etc. You can pass the TStream to the TXMLDocument.LoadFromStream() method, for instance.
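The same idea in Python terms: keep the response body as bytes in an in-memory stream and let the parser decode it. In this sketch io.BytesIO stands in for TMemoryStream and xml.etree.ElementTree for TXMLDocument; parse_xml_response is a hypothetical helper name. Because the parser receives the original bytes, it can honor each document's own encoding declaration (and byte order mark).

```python
import io
import xml.etree.ElementTree as ET


def parse_xml_response(body: bytes) -> ET.Element:
    """Parse a downloaded XML body from its original, undecoded bytes.

    io.BytesIO plays the role of the in-memory stream; the parser sees the
    untouched bytes together with the original encoding declaration.
    """
    stream = io.BytesIO(body)
    return ET.parse(stream).getroot()


# The same document sent in three different encodings.
samples = {
    "UTF-8": "utf-8",
    "ISO-8859-1": "iso-8859-1",
    "UTF-16": "utf-16",   # Python's utf-16 codec prepends a BOM
}
for declared, codec in samples.items():
    doc = f'<?xml version="1.0" encoding="{declared}"?><name>caf\u00e9</name>'
    root = parse_xml_response(doc.encode(codec))
    print(declared, ascii(root.text))
```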