How to get the file size in bytes with C++17

2

102

Are there pitfalls on specific operating systems that I should know of?

There are many duplicates (1, 2, 3, 4, 5) of this question, but they were answered decades ago. The highly voted answers to many of these questions are wrong today.

Methods from other (old) Q&As on .sx:

  • stat.h (wrapper sprintstatf), uses syscall

  • tellg() returns, by definition, a position, which is not necessarily a byte count. Its return type is not int.

Hygrograph answered 30/6, 2019 at 20:33 Comment(9)
Starter for 10: en.cppreference.com/w/cpp/header/filesystem – Scissors
How, exactly, do those answers go wrong? – Pageantry
@L.F.: Well, the first question has been closed as a duplicate of the second, which explains why the accepted answer in the first is wrong. The third one is asking about similar tellg problems. The only one worth bothering with is the fourth one, and that one's not great, since it talks too much about ofstream, in both the question and its answers. This one is far better at expressing the intent than the others (except for the first, which is oddly closed). – Correy
Please stop adding irrelevant information to your question and the question title. The year is irrelevant; the technologies are relevant. – Enamelware
What's wrong with stat(2) anyways? Has it grown too old or what? – Definitive
@LorinczyZsigmond "What's wrong with stat(2)?" It's not part of the language standard. – Canara
@LorinczyZsigmond and Andrew: Thank you for the question. I have added a line to the question. – Hygrograph
@TedLyngmo The question already explains why it should not be marked as a duplicate. I linked the mentioned question and explained why it makes sense to ask this question again with the scope of C++17. Please remove the duplicate tag. – Hygrograph
@JonasStein OK, I apparently got a downvote on the answer I gave there after I marked this as a duplicate. Why, I don't know, since it was a good answer to the question asked. I marked this as a duplicate since the answer contains two parts, one pre-C++17 and one for C++17 where <filesystem> is used, but as Nicol implied, it was perhaps too embedded in that question's rotating-log functionality to be of much use here. – Chicago
138

<filesystem> (added in C++17) makes this very straightforward.

#include <cstdint>
#include <filesystem>

// ...

std::uintmax_t size = std::filesystem::file_size("c:\\foo\\bar.txt");

As noted in comments, if you're planning to use this function to decide how many bytes to read from the file, keep in mind that...

...unless the file is exclusively opened by you, its size can be changed between the time you ask for it and the time you try to read data from it.
– Nicol Bolas
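
One way to act on that caveat is to treat the reported size only as a hint and let the stream itself decide where the file ends. The following is a minimal sketch, not part of the original answer; the helper name read_whole_file is hypothetical, and the approach assumes it is acceptable to return an empty string when the file cannot be opened.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not from the answer): reads a file into a string.
// file_size() is used only to pre-reserve memory; the real length is
// whatever the stream delivers before end-of-file, so the result stays
// correct even if the file grows or shrinks after the size query.
std::string read_whole_file(const fs::path& p)
{
    std::string data;

    std::error_code ec;
    const std::uintmax_t hint = fs::file_size(p, ec);  // non-throwing overload
    if (!ec && hint <= data.max_size())
        data.reserve(static_cast<std::size_t>(hint));

    std::ifstream in(p, std::ios::binary);
    char buf[4096];
    // read() fails on the last, partial chunk; gcount() still reports how
    // many bytes were delivered, so that chunk is appended as well.
    while (in.read(buf, sizeof buf) || in.gcount() > 0)
        data.append(buf, static_cast<std::size_t>(in.gcount()));

    return data;  // empty if the file could not be opened
}
```

A call such as read_whole_file("c:\\foo\\bar.txt") then returns whatever was actually on disk at read time, regardless of what file_size reported a moment earlier.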

But answered 30/6, 2019 at 20:37 Comment(23)
A little off-topic: is there a world where std::uintmax_t will be able to hold greater values than std::size_t? If not, why not use std::size_t, which arguably is more recognisable? +1 on the answer, btw. – Meatball
@Meatball I used it just because that's the type file_size returns. Looks slightly weird to me too. – But
@Meatball std::size_t is only required to hold the max size of in-memory objects. Files can be considerably larger. – Scissors
@RichardCritten so the answer to the first question of my comment is "yes"? – Meatball
@Meatball Well, on 32-bit Windows (and I assume on most modern 32-bit platforms) size_t is 32 bits, and uintmax_t is 64 bits. – But
What does file_size return in case of an error? Is it 0xFFF..FF, because it is uint? I think the en.cppreference.com/w/cpp/filesystem/file_size page has 3 contradicting answers to this question. – Hygrograph
@JonasStein: "What does file_size return in case of an error?" The value is meaningless because the function errored out. That error means you're not supposed to use it, and checking errors isn't optional. That being said, that page (and the standard) only has one answer for this case: -1, cast to a uintmax_t. So I don't see the "3 contradicting answers". – Correy
@HolyBlackCat: It would be good to say something about the fact that the filesystem is global, and thus unless the file is exclusively opened by you, its size can be changed between the time you ask for it and the time you try to read data from it. – Correy
@JonasStein The overload mentioned throws an exception on error. The other overload returns static_cast<uintmax_t>(-1) and stores the corresponding error in ec. – Pageantry
Can't we just agree on auto size = std::filesystem::file_size("c:\\foo\\bar.txt");? – Most
This edit seems pedantic. The stat function and friends are strictly only correct from the time the kernel interrogates the logical FS, and may have already changed once the file status struct has been made available to the caller. Suggestions on file locking, like m[un]lock for memory, would be more constructive. – Brandebrandea
@BrettHale I don't know much about file locks unfortunately, that's why I added Nicol's comment to the answer without going into further details. – But
@NicolBolas Even "exclusively opened by you" is misleading. What if you wrote both the code that writes your logs to the file in the background, and the code that prints the most recent few lines on demand? – Mccrae
@NicHartley: Then you have broken your own code, and therefore you on some level both know you've broken it and have the tools to fix it. With other processes coming along and changing the file size behind your back, neither of those is the case. So we're talking about very different scenarios. – Correy
@NicolBolas - this is only a response to the 'self-own' :) – Brandebrandea
@NicolBolas My point was that just because you wrote the code doesn't mean it's safe, which is what you implied. As long as that file isn't modified between reading the size and depending on it, it's safe; if it's accessed outside, even by your own code, it's not. That outside code might be outside of that block in another thread, in a library you're calling, in another process, whatever. – Mccrae
@Meatball I believe there is: systems with 32-bit size_t and large-file support. What I really wonder is, why not use off_t? Isn't this exactly what it's for? – Menchaca
@Meatball Also of historical interest: 16-bit implementations and those with segmented memory. – Menchaca
@Menchaca off_t is not defined in standard C and I suspect not in C++ either. See https://mcmap.net/q/52922/-where-to-find-the-complete-definition-of-off_t-type/2410359 – Lionize
@chux You're right, so I guess they went with the maximum type because it would be guaranteed as wide as any implementation specific off_t, unsigned long long long int, etc. – Menchaca
If I'm updating/writing to a file with fstream and iteratively call this at the top of a loop to determine size so I can find the new first and last records, will it work? – Emanuele
@Emanuele Hard to say anything without seeing the code. I suggest asking a separate question about it, with the code. – But
I was reading in objects from a file for read and also writing. These objects are sequentially numbered so I decided to push them on to a vector as they're read one by one and then do vector.size(). Then the new object I'm creating in writing to file would be numbered vector.size()+1. – Emanuele
31

C++17 brings std::filesystem, which streamlines a lot of tasks on files and directories. Not only can you quickly get a file's size and attributes, but you can also create new directories, iterate through files, and work with path objects.

The new library gives us two functions that we can use:

std::uintmax_t std::filesystem::file_size( const std::filesystem::path& p );

std::uintmax_t std::filesystem::directory_entry::file_size() const;

The first is a free function in std::filesystem; the second is a member function of directory_entry.

Each of them has two overloads: one throws an exception on failure, the other reports the failure through a std::error_code output parameter. Below is more detailed code covering both cases for the free function.

#include <filesystem>
#include <iostream>
#include <system_error>

namespace fs = std::filesystem;

int main()
{
    // Throwing overload: failures are reported as fs::filesystem_error.
    try
    {
        const auto fsize = fs::file_size("a.out");
        std::cout << fsize << '\n';
    }
    catch (const fs::filesystem_error& err)
    {
        std::cerr << "filesystem error! " << err.what() << '\n';
        if (!err.path1().empty())
            std::cerr << "path1: " << err.path1().string() << '\n';
        if (!err.path2().empty())
            std::cerr << "path2: " << err.path2().string() << '\n';
    }
    catch (const std::exception& ex)
    {
        std::cerr << "general exception: " << ex.what() << '\n';
    }

    // Non-throwing overload: on failure the error is stored in ec and
    // the returned size is static_cast<std::uintmax_t>(-1).
    std::error_code ec{};
    const auto size = fs::file_size("a.out", ec);
    if (!ec)
        std::cout << "size: " << size << '\n';
    else
        std::cout << "error when accessing test file, size is: "
                  << size << " message: " << ec.message() << '\n';
}
Fondly answered 30/6, 2019 at 20:38 Comment(2)
What exactly is "this"? Can you explain what all this code is used for, especially when the accepted answer uses much less code? – Blancablanch
The accepted answer ignores exceptions, which can be thrown when the file doesn't exist, which is pretty common. – Flavourful
