Check the file-size without opening file in C++?

5

30

I'm trying to get the size of a large file (12 GB+) without opening it, as I assume opening it would consume a lot of resources. Is there a good API for this? I'm in a Windows environment.

Nobile answered 24/1, 2012 at 17:20 Comment(0)
54

You should call GetFileSizeEx which is easier to use than the older GetFileSize. You will need to open the file by calling CreateFile but that's a cheap operation. Your assumption that opening a file is expensive, even a 12GB file, is false.

You could use the following function to get the job done:

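#include <windows.h>

__int64 FileSize(const wchar_t* name)
{
    // CreateFileW is the explicit wide-character variant, so this compiles
    // whether or not UNICODE is defined.
    HANDLE hFile = CreateFileW(name, GENERIC_READ,
        FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL, NULL);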
    if (hFile==INVALID_HANDLE_VALUE)
        return -1; // error condition, could call GetLastError to find out more

    LARGE_INTEGER size;
    if (!GetFileSizeEx(hFile, &size))
    {
        CloseHandle(hFile);
        return -1; // error condition, could call GetLastError to find out more
    }

    CloseHandle(hFile);
    return size.QuadPart;
}

There are other API calls that will return you the file size without forcing you to create a file handle, notably GetFileAttributesEx. However, it's perfectly plausible that this function will just open the file behind the scenes.

__int64 FileSize(const wchar_t* name)
{
    WIN32_FILE_ATTRIBUTE_DATA fad;
    // The explicit W variant matches the wchar_t path regardless of UNICODE.
    if (!GetFileAttributesExW(name, GetFileExInfoStandard, &fad))
        return -1; // error condition, could call GetLastError to find out more
    LARGE_INTEGER size;
    size.HighPart = fad.nFileSizeHigh;
    size.LowPart = fad.nFileSizeLow;
    return size.QuadPart;
}

If you are compiling with Visual Studio and want to avoid calling Win32 APIs then you can use _wstat64.

Here is a _wstat64 based version of the function:

#include <sys/types.h>
#include <sys/stat.h>

__int64 FileSize(const wchar_t* name)
{
    struct __stat64 buf;
    if (_wstat64(name, &buf) != 0)
        return -1; // error, could use errno to find out more

    return buf.st_size;
} 

If performance ever became an issue for you then you should time the various options on all the platforms that you target in order to reach a decision. Don't assume that the APIs that don't require you to call CreateFile will be faster. They might be but you won't know until you have timed it.
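As a rough illustration of timing the options, here is a minimal sketch of a timing harness. MeanNanosecondsPerCall is a hypothetical helper, not part of any API mentioned above; it assumes the thing being measured is wrapped in a callable that returns an integer, such as one of the FileSize variants.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (an assumption for illustration): calls `query`
// `iterations` times and returns the mean wall-clock cost per call in
// nanoseconds.
template <typename Query>
double MeanNanosecondsPerCall(Query query, int iterations)
{
    using Clock = std::chrono::steady_clock;
    volatile std::int64_t sink = 0; // keeps the compiler from discarding the calls

    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = sink + query();
    const auto stop = Clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

It could be called as MeanNanosecondsPerCall([] { return FileSize(L"C:\\Foo\\Bar.ext"); }, 10000), once per variant. Repeated queries against the same file are likely to be served from cache, so the figures say little about cold access or slow media.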

Expulsive answered 24/1, 2012 at 17:23 Comment(15)
Of course, CreateFile() can be rather slow if you're opening the file on slow media like network drives, but the slowness would be due to storage access latencies and not because of the fact that the file is huge. - Cram
@Insilico Or tape drives! But I believe opening the file is the only way to find the file size, at least on Windows. - Expulsive
@DavidHeffernan: No! The file size is in the header and thus in the directory. The FindFirstFile() as shown below will read that information without having to open the file. - Pseudoscope
@Alexis Read Raymond's article to learn the details. The metadata contains a copy of the size but it can be out of date. The true size is in the file. blogs.msdn.com/b/oldnewthing/archive/2011/12/26/10251026.aspx - Expulsive
Floppy drives and damaged CDs are also slow media. Moreover, you may be enumerating thousands of not-massive files and having to open and close each one to get the size is cumbersome, especially since the size is already stored in the directory entry which could/should be cached in memory; another reason that FAT(32) and CDFS are still good. - Snaky
@Snaky There may be perf reasons on different file systems, but certainly on NTFS the file size in the dir entry may not be accurate. - Expulsive
Yes, that’s why I said FAT is still good (I know a lot of people have moved to NTFS, but this is just another reason that I like to use FAT32 for everything, other than the Windows drive which now requires NTFS). - Snaky
Everyone claiming that opening a file is a cheap operation should test this statement with 10'000 or 100'000 files and enjoy the result. - Habiliment
@Anton The question asks about one file and the asker thinks that opening large files is more expensive than opening small files. - Expulsive
Please take std::wstring arguments by const reference... you're doing memory copies on each call :S - Udo
@EmilyL. Apparently there is debate over that issue: #10231849 In any case, I don't think it's worth getting too exercised at this question, it being really about winapi. Thanks! - Expulsive
@DavidHeffernan Winapi or not, I'd be happy if you showed good practice to new (or copy paste) C++ programmers who see your example code and are likely to copy it verbatim... Regarding that debate, this is not one of the cases where it is advisable to pass by value. If anything you should pass by const wchar_t* as all you really want is to call .c_str() anyway; let the user decide where and if they want a memcpy. - Udo
@Emily OK. I'm really not an expert on C++ and am somewhat busy right now. Perhaps you could edit. - Expulsive
So File Explorer has to open every file it displays the size of? Even as you scroll through thousands of file names? - Commissure
Does GetCompressedFileSize have to open the file too, even though that takes a file name and not a file handle? - Commissure
40

I've also lived with the fear of the price paid for opening a file and closing it just to get its size, so I decided to ask the performance counter and see how expensive the operations really are.

This is the number of cycles it took to execute one file size query on the same file with each of the three methods. I tested on two files, 150 MB and 1.5 GB, and got +/- 10% fluctuations, so the results don't seem to be affected by the actual file size. (Obviously this depends on the CPU, but it gives you a good vantage point.)

  • 190 cycles - CreateFile, GetFileSizeEx, CloseHandle
  • 40 cycles - GetFileAttributesEx
  • 150 cycles - FindFirstFile, FindClose

The Gist with the code used is available here.

As we can see from this highly scientific :) test, the slowest is actually the file opener, the second slowest is the file finder, and the winner is the attributes reader. In terms of reliability, CreateFile should be preferred over the other two. But I still don't like the concept of opening a file just to read its size... Unless I'm doing size-critical work, I'll go for the attributes.

PS: When I have time, I'll try reading the sizes of files that are open and being written to. But not right now...

Prat answered 4/8, 2013 at 5:35 Comment(3)
With regard to your P.S.: It appears that GetFileAttributesEx() does in fact return the correct file size while the file is still being updated by another process, making it the fastest (correct file size) choice. If it only had the last file changed time (not to be confused with the last write time), as well, this function would be perfect! - Buskus
@MichaelGoldshteyn What exactly is the last file changed time you mentioned in the above comment? Is there another API to get this time? - Pigweed
It's great to see some figures, but I suspect the real question is how much I/O each one involves. It's not clear whether they differ in that respect. - Rubescent
13

Another option is to use the FindFirstFile function:

#include "stdafx.h"
#include <windows.h>
#include <tchar.h>
#include <stdio.h>

int _tmain(int argc, _TCHAR* argv[])
{
   WIN32_FIND_DATA FindFileData;
   HANDLE hFind;
   LPCTSTR  lpFileName = TEXT("C:\\Foo\\Bar.ext");

   hFind = FindFirstFile(lpFileName , &FindFileData);
   if (hFind == INVALID_HANDLE_VALUE) 
   {
      // GetLastError returns a DWORD (unsigned long), hence %lu.
      printf ("File not found (%lu)\n", GetLastError());
      return -1;
   } 
   else 
   {
      // Combine the two 32-bit halves into one 64-bit size.
      ULONGLONG FileSize = FindFileData.nFileSizeHigh;
      FileSize <<= sizeof( FindFileData.nFileSizeHigh ) * 8; 
      FileSize |= FindFileData.nFileSizeLow;
      // %llu matches the 64-bit ULONGLONG; %u would read only 32 bits.
      _tprintf (TEXT("file size is %llu\n"), FileSize);
      FindClose(hFind);
   }
   return 0;

}
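
The step that joins nFileSizeHigh and nFileSizeLow can also be isolated into a small helper. This is only a sketch: CombineFileSize is a hypothetical name, and it assumes both halves are 32-bit unsigned values, as the DWORD fields of WIN32_FIND_DATA are.

```cpp
#include <cstdint>

// Hypothetical helper (an assumption for illustration): joins the high and
// low 32-bit halves of a file size into a single 64-bit value. The high half
// is widened before shifting so no bits are lost.
constexpr std::uint64_t CombineFileSize(std::uint32_t high, std::uint32_t low)
{
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

With the structure above it would be called as CombineFileSize(FindFileData.nFileSizeHigh, FindFileData.nFileSizeLow). Keeping the arithmetic free of Win32 types makes it easy to check in isolation.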
Publicize answered 24/1, 2012 at 17:58 Comment(4)
Use ULARGE_INTEGER instead of twiddling the ULONGLONG bits manually, eg: ULARGE_INTEGER ul; ul.LowPart = FindFileData.nFileSizeLow; ul.HighPart = FindFileData.nFileSizeHigh; ULONGLONG FileSize = ul.QuadPart;. Also, %u expects a 32-bit unsigned int on Windows, you need to use %Lu instead for a 64-bit integer. - Winded
I believe FindFirstFile retrieves the file size as recorded in the directory entry. Note that under some circumstances this may not be accurate, e.g., if the file is hard linked and was modified via a different hard link, or if another application has the file open and has modified it. See blogs.msdn.com/b/oldnewthing/archive/2011/12/26/10251026.aspx - Deibel
Presumably the issue that Harry points to is why the Delphi RTL stopped using FindFirstFile in its file size sys function. - Expulsive
This method doesn't work for symbolic links; it returns zero. - Afterclap
5

As of C++17, there is std::filesystem::file_size in the standard library. (Then the implementor gets to decide how to do it efficiently!)

Pemba answered 18/9, 2017 at 6:18 Comment(0)
0

What about the GetFileSize function?

Falmouth answered 24/1, 2012 at 17:22 Comment(4)
That requires opening the file, which the OP said is not desirable. - Winded
@remy But the file is where the size is stored, so the two requests in the question are contradictory. - Expulsive
Actually no, the file itself does not store the size. The filesystem stores it. GetFileSize() requires the file to be opened first, then it uses that handle to determine where the file is located in the filesystem so it can grab the size. If you use FindFirstFile() instead, it queries the filesystem without needing to open the file. - Winded
@Remy Not according to Raymond: blogs.msdn.com/b/oldnewthing/archive/2011/12/26/10251026.aspx Also, if you don't use <at>name then there won't be a notification so you just end up talking to yourself! - Expulsive
