How do I read integers from a big-endian binary file if Windows/Delphi/IDE implies little-endian order?

I am very confused. I need to read binary files (.fsa extension, the ABIF format by Applied Biosystems) and I ran into a problem reading signed integers. I am doing everything according to this manual: https://drive.google.com/file/d/1zL-r6eoTzFIeYDwH5L8nux2lIlsRx3CK/view?usp=sharing So, for example, let's look at the fDataSize field in the header of this file: https://drive.google.com/file/d/1rrL01B_gzgBw28knvFit6hUIA5jcCDry/view?usp=sharing

I know that it is supposed to be 2688 (according to the manual, it is a signed integer of 32 bits), which is 00000000 00000000 00001010 10000000 in binary form. Indeed, when I read these 32 bits as an array of 4 bytes, I get [0, 0, 10, -128], which is exactly the same in binary form.

However, if I read it as Integer, it results in 16809994, which is 00000001 00000000 10000000 00001010 in bits.

As I understood from multiple forums, they use Swap and htonl functions to convert integers from little-endian order to big-endian. They also recommend using BSWAP EAX instruction for 32bit integers. But in this case they work in a kind of wrong way, specifically: Swap, applied to 16809994, returns 16779904 or 00000001 00000000 00001010 10000000, and BSWAP instruction converts 16809994 to 176160769, i.e. 00001010 10000000 00000000 00000001

As we can see, built-in functions do something different from what I need. Swap is likely to return the correct result, but, for some reason, reading these bits as an Integer changes the left-most byte. So, what is wrong and what do I do?

Upd. 1 For storing the header data I use the following record:

type
  TFasMainHeader = record
    fFrmt        : array[1..4]  of ansiChar;
    fVersion     : Word;
    fDir         : array[1..4] of ansiChar;
    fNumber      : array[1..4]  of Byte; //
    fElType      : Word;
    fElSize      : Word;
    fNumEls      : array[1..4]  of Byte; //
    fDataSize    : Integer;
    fDataOffset  : Integer;
    fDO : word;
    fDataHandle  : array[1..98]  of Byte;
  end;

Then upon the button click I perform the following:

aFileStream.Read(fas_main_header, SizeOf(TFasMainHeader));
with fas_main_header do begin
    if fFrmt <> 'ABIF' then raise Exception.Create('Not an ABIF file!');
    fVersion := Swap(fVersion);
    fElType := Swap(fElType);
    fElSize := Swap(fElSize);
...

Next I need to swap Int32 variables in the right way, but at this point fDataSize, for example, is 16809994. See the state of the record in detail during debugging:

[Screenshot: debugger view of the fas_main_header record]

It doesn't make sense to me since there shouldn't be a one-bit in the binary representation of fDataSize value (it also screws the BSWAP result).

See the binary structure of the beginning of the file (the fDataSize bytes are highlighted): [Screenshot: hex dump of the file header]

Bomke answered 25/11, 2020 at 12:48 Comment(6)
Did you see #3065835? You cannot use Swap because it swaps the two bytes of a 16-bit integer. - Moreville
@AndreasRejbrand I did not come across this particular topic, but I tried BSWAP EAX, which is mentioned in the OP post. It didn't really work, at least it didn't return what I need, and it also replaced one of the zero bytes with a one byte. - Bomke
Maybe I am doing something wrong from the very beginning? I assign 32 bits of a big-endian file to an integer variable and then apply BSWAP to this variable. Is that correct? - Bomke
2688 is 00000000 00000000 00001010 10000000 in binary (00 00 0A 80 in hex) and big endian. I just used a hex editor to create a file containing only this data and loaded these bytes into a Delphi Integer variable. This value is thus interpreted as -2146828288. Applying BSWAP, however, yields 2688, as expected. - Moreville
You might need to tell us exactly how the bytes look in the file (use a hex editor) and exactly how you "read" the file. - Moreville
@AndreasRejbrand I updated the OP post and included the information you needed. - Bomke

The unexpected value has nothing to do with endianness; it is caused by the way Delphi lays out records.

You have

type
  TFasMainHeader = record
    fFrmt        : array[1..4]  of ansiChar;
    fVersion     : Word;
    fDir         : array[1..4] of ansiChar;
    fNumber      : array[1..4]  of Byte; //
    fElType      : Word;
    fElSize      : Word;
    fNumEls      : array[1..4]  of Byte; //
    fDataSize    : Integer;
    fDataOffset  : Integer;
    fDO : word;
    fDataHandle  : array[1..98]  of Byte;
  end;

and you expect this record to overlay the bytes in your file, with fDataSize "on top of" 00 00 0A 80.

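But the Delphi compiler adds padding between the fields of a record to keep them naturally aligned. With the default alignment settings, an Integer field has to start at a multiple of 4 bytes. The fields before fDataSize occupy 22 bytes, so the compiler inserts 2 padding bytes and places fDataSize at offset 24 instead of 22. It therefore overlays 0A 80 plus the first two bytes of the next field, which is exactly the 16809994 ($0100800A) you see in the debugger.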
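To see the padding directly, here is a minimal sketch that prints the byte offset of fDataSize in a plain and a packed version of the record. The type names TAlignedHeader and TPackedHeader are made up for this illustration, and both records are cut off after fDataSize because the later fields do not affect its offset.

```pascal
program PaddingDemo;
{$APPTYPE CONSOLE}

type
  // Fields copied from the question, truncated after fDataSize.
  // Plain record: the compiler may insert padding before aligned fields.
  TAlignedHeader = record
    fFrmt     : array[1..4] of AnsiChar;
    fVersion  : Word;
    fDir      : array[1..4] of AnsiChar;
    fNumber   : array[1..4] of Byte;
    fElType   : Word;
    fElSize   : Word;
    fNumEls   : array[1..4] of Byte;
    fDataSize : Integer;
  end;

  // Packed record: fields are laid out back to back, as in the file.
  TPackedHeader = packed record
    fFrmt     : array[1..4] of AnsiChar;
    fVersion  : Word;
    fDir      : array[1..4] of AnsiChar;
    fNumber   : array[1..4] of Byte;
    fElType   : Word;
    fElSize   : Word;
    fNumEls   : array[1..4] of Byte;
    fDataSize : Integer;
  end;

var
  A: TAlignedHeader;
  P: TPackedHeader;
begin
  // Offset of a field = address of the field minus address of the record.
  WriteLn('plain record : ', PAnsiChar(@A.fDataSize) - PAnsiChar(@A));
  WriteLn('packed record: ', PAnsiChar(@P.fDataSize) - PAnsiChar(@P));
end.
```

With the default alignment settings, the plain record should report offset 24 and the packed one offset 22, which is where the field sits in the file. If the project is compiled with a different {$A} setting, the first number can differ.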

To fix this, use the packed keyword:

type
  TFasMainHeader = packed record
    fFrmt        : array[1..4]  of ansiChar;
    fVersion     : Word;
    fDir         : array[1..4] of ansiChar;
    fNumber      : array[1..4]  of Byte; //
    fElType      : Word;
    fElSize      : Word;
    fNumEls      : array[1..4]  of Byte; //
    fDataSize    : Integer;
    fDataOffset  : Integer;
    fDO : word;
    fDataHandle  : array[1..98]  of Byte;
  end;

Then the fields will be at the expected locations.

And then, of course, you can use any method you like to swap the byte order, preferably the BSWAP instruction.
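As an illustration of the byte swap itself, here is a small sketch in plain Pascal rather than assembler, so it does not depend on the calling convention or target CPU. The function name SwapEndian32 is an assumption of this example, not an RTL routine. The four bytes are the fDataSize bytes from the question.

```pascal
program SwapDemo;
{$APPTYPE CONSOLE}

// Reverses the byte order of a 32-bit value (same effect as BSWAP).
function SwapEndian32(Value: LongWord): LongWord;
begin
  Result := (Value shl 24) or
            ((Value and $0000FF00) shl 8) or
            ((Value and $00FF0000) shr 8) or
            (Value shr 24);
end;

const
  // fDataSize exactly as stored in the file: 2688 in big-endian order.
  Raw: array[0..3] of Byte = ($00, $00, $0A, $80);
var
  Value: Integer;
begin
  // Copy the raw file bytes into an Integer, as Stream.Read would do.
  Move(Raw, Value, SizeOf(Value));
  WriteLn(Value);   // little-endian interpretation: -2146828288

  // Reverse the bytes to get the value the file actually stores.
  Value := Integer(SwapEndian32(LongWord(Value)));
  WriteLn(Value);   // 2688
end.
```

The same function can be applied to fDataSize and fDataOffset after the packed record has been read, in the same way the question already applies Swap to the Word fields.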

Moreville answered 25/11, 2020 at 16:14 Comment(2)
Yeah, I expected that it was not due to the endianness. Thank you so much, Andreas, sincerely. Sorry if I wasted your time for such a trifle, I am just a beginner. Wish you all the best :) - Bomke
@endocringe: No problem. Glad I could help! - Moreville

Here is an example implementation in pure Pascal:

program FasDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes;

type
  TFasInt16 = packed record
    B0, B1 : Byte;
    function ToUInt32 : UInt32;
    function ToInt32  : Int32;
    class operator Implicit(A: TFasInt16): Integer;      // Implicit conversion of TFasInt16 to Integer
    class operator Implicit(A: Integer)  : TFasInt16;    // Implicit conversion of Integer   to TFasInt16
  end;
  TFasInt32 = packed record
    W0, W1 : TFasInt16;
    function ToUInt32 : UInt32;
    function ToInt32  : Int32;
    class operator Implicit(A: TFasInt32): Integer;      // Implicit conversion of TFasInt32 to Integer
    class operator Implicit(A: Integer)  : TFasInt32;    // Implicit conversion of Integer to TFasInt32
  end;


function TFasInt16.ToUInt32: UInt32;
begin
  Result := (B0 shl 8) + B1;
end;

function TFasInt16.ToInt32: Int32;
begin
  Result := Int16(B0 shl 8) + B1;
end;

class operator TFasInt16.Implicit(A: Integer): TFasInt16;
begin
  Result.B1 := Byte(A);
  Result.B0 := Byte(A shr 8);
end;

class operator TFasInt16.Implicit(A: TFasInt16): Integer;
begin
  Result := A.ToInt32;
end;

function TFasInt32.ToUInt32: UInt32;
begin
  Result := (W0.ToUInt32 shl 16) + W1.ToUInt32;
end;

function TFasInt32.ToInt32: Int32;
begin
  // Explicit cast so that negative values do not trip range checking ({$R+}).
  Result := Int32((W0.ToUInt32 shl 16) + W1.ToUInt32);
end;

class operator TFasInt32.Implicit(A: TFasInt32): Integer;
begin
  Result := A.ToInt32;
end;

class operator TFasInt32.Implicit(A: Integer): TFasInt32;
begin
  Result.W1 := Int16(A);
  Result.W0 := Int16(A shr 16);
end;

var
  Stream   : TFileStream;
  FasInt32 : TFasInt32;
  FasInt16 : TFasInt16;
  AInteger : Integer;
begin
  Stream := TFileStream.Create('C:\Users\fpiette\Downloads\A02-RD12-0002-35-0.5PP16-001.5sec.fsa', fmOpenRead);
  try
    Stream.Position := $16;
    Stream.Read(FasInt32, SizeOf(FasInt32));
    WriteLn(FasInt32.W1.ToUInt32, ' 0x', IntToHex(FasInt32.W1.ToUInt32, 8));
    WriteLn(FasInt32.W1.ToInt32,  ' 0x', IntToHex(FasInt32.W1.ToInt32,  8));
    WriteLn(FasInt32.ToUInt32,    ' 0x', IntToHex(FasInt32.ToUInt32,    8));
    WriteLn(FasInt32.ToInt32,     ' 0x', IntToHex(FasInt32.ToInt32,     8));

    WriteLn;
    WriteLn('Test implicit conversion 16 bits to integer ');
    AInteger := FasInt32.W1;
    WriteLn(AInteger,             ' 0x', IntToHex(AInteger,     8));

    WriteLn;
    WriteLn('Test implicit conversion 32 bits to integer ');
    AInteger := FasInt32;
    WriteLn(AInteger,             ' 0x', IntToHex(AInteger,     8));

    WriteLn;
    WriteLn('Test implicit conversion 16 bits from integer');
    FasInt16 := 1234;
    WriteLn(FasInt16.ToInt32,     ' 0x', IntToHex(FasInt16.ToInt32,  8));
    FasInt16 := -1234;
    WriteLn(FasInt16.ToInt32,     ' 0x', IntToHex(FasInt16.ToInt32,  8));

    WriteLn;
    WriteLn('Test implicit conversion 32 bits from integer');
    FasInt32 := 12345678;
    WriteLn(FasInt32.ToInt32,     ' 0x', IntToHex(FasInt32.ToInt32,     8));
    FasInt32 := -12345678;
    WriteLn(FasInt32.ToInt32,     ' 0x', IntToHex(FasInt32.ToInt32,     8));

    ReadLn;
  finally
    FreeAndNil(Stream);
  end;
end.

If your Delphi version supports it, you can add the inline directive to these methods.

I made the conversions to/from Integer implicit using operator overloading. That way the types can be used without calling conversion routines explicitly: the compiler does the job for us!

Of course, other operator overloads can be added; you get the idea.

To access the FAS header and other structures, you can use the types TFasInt32 and TFasInt16 instead of Word and Integer. The rest of the code will look just as if the data were not big-endian! The compiler will automatically convert back and forth to native (little-endian) integers.
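For older Delphi versions that lack record methods and operator overloading, the same byte-assembly idea can be sketched with plain functions. The helper names ReadBE16 and ReadBE32 are hypothetical, and the sample bytes are made up for the demonstration, except for 00 00 0A 80, which is the fDataSize value from the question.

```pascal
program ReadBigEndianDemo;
{$APPTYPE CONSOLE}
uses
  Classes;  // TStream, TMemoryStream

// Hypothetical helper: reads an unsigned big-endian 16-bit value.
function ReadBE16(Stream: TStream): Word;
var
  B: array[0..1] of Byte;
begin
  Stream.ReadBuffer(B, SizeOf(B));
  // First byte in the file is the most significant one.
  Result := (Word(B[0]) shl 8) or B[1];
end;

// Hypothetical helper: reads a signed big-endian 32-bit value.
function ReadBE32(Stream: TStream): Integer;
var
  B: array[0..3] of Byte;
begin
  Stream.ReadBuffer(B, SizeOf(B));
  Result := Integer((LongWord(B[0]) shl 24) or
                    (LongWord(B[1]) shl 16) or
                    (LongWord(B[2]) shl 8) or
                     LongWord(B[3]));
end;

const
  // Made-up sample: a 16-bit value $0065 followed by the 32-bit value 2688.
  Sample: array[0..5] of Byte = ($00, $65, $00, $00, $0A, $80);
var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(Sample, SizeOf(Sample));
    Stream.Position := 0;
    WriteLn(ReadBE16(Stream));  // 101
    WriteLn(ReadBE32(Stream));  // 2688
  finally
    Stream.Free;
  end;
end.
```

Reading field by field like this avoids any dependence on record layout, at the cost of one small read per field.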

Endocarp answered 25/11, 2020 at 13:54 Comment(6)
And BSWAP doesn't work, because ... ? (IMHO, that's the "actual" question) - Moreville
@Endocarp Thank you so much, it works! Can you please explain what the following line does? "Result := (B0 shl 8) + B1;" - Bomke
@Endocarp If you don't mind, I won't mark your answer as the solution for a little while, until Andreas and I figure out what is wrong with my initial straightforward approach. - Bomke
@endocringe: (B0 shl 8) + B1 is the sum of the integers B0 shl 8 and B1. B0 shl 8 is the integer obtained by moving the bits in B0 8 binary digits to the left (SHift Left), adding new zeros from the right (so if B0 is 1111 1101 then B0 shl 8 is 1111 1101 0000 0000). In other words, (B0 shl 8) + B1 is the word with B0 as the first byte and B1 as the second. (It is arguably better to write this (B0 shl 8) or B1.) - Moreville
@Bomke I updated my answer to add more conversion functions and operator overloading for implicit conversion to/from integer. Using my code, you'll use big-endian integer data just as you do with native integer data. - Endocarp
@fpiette: The OP is using Delphi 7, so operator overloading hasn't been invented yet. - Moreville
