I used the following function in Delphi 6 to detect Unicode BOMs.
const
//standard byte order marks (BOMs)
UTF8BOM: array [0..2] of AnsiChar = #$EF#$BB#$BF;
UTF16LittleEndianBOM: array [0..1] of AnsiChar = #$FF#$FE;
UTF16BigEndianBOM: array [0..1] of AnsiChar = #$FE#$FF;
UTF32LittleEndianBOM: array [0..3] of AnsiChar = #$FF#$FE#$00#$00;
UTF32BigEndianBOM: array [0..3] of AnsiChar = #$00#$00#$FE#$FF;
function FileHasUnicodeBOM(const FileName: string): Boolean;
var
Buffer: array [0..3] of AnsiChar;
Stream: TFileStream;
begin
Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite); // Allow other programs read access at the same time.
Try
FillChar(Buffer, SizeOf(Buffer), $AA);//fill with characters that we are not expecting then...
Stream.Read(Buffer, SizeOf(Buffer)); //...read up to SizeOf(Buffer) bytes - there may not be enough
//use Read rather than ReadBuffer so the no exception is raised if we can't fill Buffer
Finally
FreeAndNil(Stream);
End;
Result := CompareMem(@UTF8BOM, @Buffer, SizeOf(UTF8BOM)) or
CompareMem(@UTF16LittleEndianBOM, @Buffer, SizeOf(UTF16LittleEndianBOM)) or
CompareMem(@UTF16BigEndianBOM, @Buffer, SizeOf(UTF16BigEndianBOM)) or
CompareMem(@UTF32LittleEndianBOM, @Buffer, SizeOf(UTF32LittleEndianBOM)) or
CompareMem(@UTF32BigEndianBOM, @Buffer, SizeOf(UTF32BigEndianBOM));
end;
This will detect all the standard BOMs. You could use it to block such files if that's the behaviour you want.
You state that Delphi 6 TStringList
can load 16 bit encoded files if they do not have a BOM. Whilst that may be the case, you will find that, for characters in the ASCII range, every other character is #0
. Which I guess is not what you want.
If you want to detect that text is Unicode for files without BOMs then you could use IsTextUnicode
. However, it may give false positives. This is a situation where I suspect it is better to ask for forgiveness than permission.
Now, if I were you I would not actually try to block Unicode files. I would read them. Use the TNT Unicode library. The class you want is called TWideStringList
.