Isn't it dangerous to use the Longint count with the Int64 size in Stream.read?
Asked Answered
B

1

9

I was examining the TMemoryStream class and found the following routine:

procedure TMemoryStream.LoadFromStream(Stream: TStream);
var
  Count: Longint;
begin
  Stream.Position := 0;
  Count := Stream.Size; // <-- assigning Int64 to Longint
  SetSize(Count);
  if Count <> 0 then Stream.ReadBuffer(FMemory^, Count);
end;

I have seen this pattern a lot where an Int64 is assigned to a Longint.

My understanding is that Longint is four bytes and Int64 is eight bytes in both 32-bit and 64-bit Windows, so if my file size is $1 FFFF FFFF == 8589934591 == 8 GB then this routine will simply fail to read because the final count will be $ FFFF FFFF == -1.

I do not understand how this is allowed and maybe not taken into consideration (maybe not many people are trying to read an 8+ GB file).

Bali answered 31/12, 2017 at 1:27 Comment(19)
FWIW: Longint only has a range to + ~2GB when in "4 byte" mode.Opportune
@Opportune exactly a plus than 2Gb will fail also by this routine. it is just confusing what I see here.Bali
I'd call it a bug. I noticed that "unsafe typecast" warning is set to False by default. But even after setting it to True, I got no warning for the clearly unsafe assignment (even with explicit typecast). In my experience, Delphi has always been dodgy with it's warnings. I know var and out parameters wreaked havoc with "uninitialised warnings" in older versions. And it doesn't warn about with :PTameika
TMemoryStream doesn't support allocating more than 2GB, not even in a 64bit application. So, even though TStream.Size is an Int64, TMemoryStream limits it to 2GB. So you can't load a >2GB file into a TMemoryStream. If you need that (why?), you will have to write your own stream class that supports >2GB, or find a 3rd party stream class that does.Diadelphous
@RemyLebeau in modern systems where mining is taking all over the place, machines that have more than 16GB of Ram (one of the PCs we have here has 32GB) are more frequent than ever, so loading a 2.1GB file to RAM for better performance does not seem as a bad idea. FWIW I was building my own stream and wanted to see how Emba implemented this functionality.Bali
@NasreddineAbdelillahGalfout TMemoryStream was never updated to support >2GB even after the RTL memory manager was updated to support 64bit allocations. Feel free to file a bug report with Embarcadero.Diadelphous
I wouldn't do it anyway. I would use file mapping.Hornbill
@NasreddineAbdelillahGalfout Loading an entire large file into memory is an extremely inefficient way to process it. It is much more efficient to process as you read. This should go without saying if you're implementing a process that doesn't need any backtracking. But even if earlier data is needed at times in processing: it's usually better to have placed that into more meaningful internal structures at first pass, rather than reprocess raw data. Placing an entire file in memory is not justifiable, because it's simply not scalable. File size is likely to grow much faster system RAM.Tameika
@RemyLebeau OP used TMemoryStream as an example. But the problem stands for any combination of Int64 and Longint. Failure to pick up these discrepancies leads to surprise production bugs. The point of a strongly typed language is to try avoid them. A suitable compiler warning is not an unreasonable expectation.Tameika
@NasreddineAbdelillahGalfout Expanding on my previous comment. DVD files are often > 1 GB. Wouldn't it get frustrating if you always had to load the entire file into memory before you can start watching the movie? Therein lies a big clue. ;) I.e. Plan to process on the fly from the start. Your file structure should not in any way expect you to load the entire file in order to process it. This will be a much "friendlier" design.Tameika
The lack of compiler warning is one reason why so many pointer truncation bugs still exist so long after the 64 bit compiler was born. That said, in this specific instance, that routine alone is fine because the rest of the class is constrained by its use of 32 bit types for size and position. On of the great anti patterns in Delphi is the belief that files are read by copying them to a memory stream. Avoid the anti pattern and the problem dissolves.Windowpane
@CraigYoung a warning would have been great (with an optional enable/disable option). as for what you suggested 1) you are correct 2) I read an article about a company who kept all of the servers data structures on RAM and managed to process 6M request per second in one thread. and when I started to design my stream the articale got stuck in my head and I wanted to keep the option of loading the hall data structure and keep the processing in RAM (I know most of the risks this type of design would lead to, it is just fun to see where are your limits).Bali
@DavidHeffernan there is no guarantee that a consumer of stream would know that it is by pattern to not load a hall file to memory (this is not mentioned in the documentation) and that Int64 size is misleading. I mean if the size of a buffer can reach 8 GB. why can't I load more than 2GB even if it was anti-pattern.(if the size was in longint I would have been hinted and protected when I try to load more than 2GB)Bali
@RemyLebeau I noticed a pattern and I did not find any thing about it in documentation so I asked here about it. I do not think a sane person would actually load the hall file in memory and then process it each time it wants to read something from it (as Craig Young said in his comment). but it is still something to consider and to be careful about.Bali
@Hornbill I couldn't agree with you more.Bali
@nas clearly it's a defect, but it's trivial to create your own fixed version, but it should affect you because you shouldn't load arbitrarily large files into memoryWindowpane
@Nasreddine Re: the article you read about keeping all server data in memory. There's a huge difference between keeping all data in memory in structured object and keeping a raw file in memory.Tameika
@DavidHeffernan Size and position aren't limited to 32 bit types when compiling for 64 bit, only when compiling in 32 bit, because they are declared as NativeInt. In 64 bit you will get access violations if the size of the memory stream is set to larger than 2GB. We ran into this exact issue because we had some 3GB images that we loaded.Confirmatory
@Confirmatory Well they are, certainly in the XE7 code I am looking at. That constraint is implied by procedure SetSize(NewSize: Longint); override;. Declaring FSize, FPosition: NativeInt; is no use if the value you feed in is truncated to Longint. That same problem is there in Seattle.Windowpane
C
1

I logged a ticket for this and it has apparently been fixed in Tokyo 10.2. This is an issue for 64 bit compilation.

https://quality.embarcadero.com/browse/RSP-19094

There are problems with large (>2GB) files in both TCustomMemoryStream and TMemoryStream. In TMemoryStream the issues are simple as the local variables need to be declared as NativeInt instead of LongInt's and Capacity needs to be changed to an NativeInt. In TCustomMemoryStream they are more subtle because both TCustomMemoryStream.Read methods assign the result of an Int64 - Int64 calculation directly to a LongInt. This will overflow even when the result of this calculation isn't larger than a LongInt.

If you want to fix this in Seattle then you will need to either do a code hook, replace the System.Classes unit or roll out your own replacement class for TMemoryStream. Bear in mind that for the last option, you will need to also replace TBytesStream and TStringStream because these descend from TMemoryStream.

The other problem with the the last option is that third party components won't have your "fixes". For us, we only had a couple of places that needed to work with files larger than 2GB so we switched those across.

The fixes for TCustomMemoryStream.Read (must be to both methods) will look something like this:

function TCustomMemoryStream.Read(var Buffer; Count: Longint): Longint;
{ These 2 lines are new }
var
  remaining: Int64;
begin
  if (FPosition >= 0) and (Count >= 0) then
  begin
    remaining{Result} := FSize - FPosition;
    if remaining{Result} > 0 then
    begin
      if remaining{Result} > Count then 
        Result := Count
      else
        Result := remaining;
      Move((PByte(FMemory) + FPosition)^, Buffer, Result);
      Inc(FPosition, Result);
      Exit;
    end;
  end;
  Result := 0;
end;
Confirmatory answered 9/1, 2018 at 2:22 Comment(3)
Good to hear about this. tokyo is still buggy from what I heard. it will take some time before I upgrade. I have created my stream fix and it is stable after the tests made. As for third party components I think it is always a good idea to avoid using something you do not know the inside of.Bali
@NasreddineAbdelillahGalfout Make sure you also fix both TCustomMemoryStream.Read methods because that will sometimes overflow when reading in data, depending on the amount of data you read.Confirmatory
yep, the tests took care of that.Bali

© 2022 - 2024 — McMap. All rights reserved.