Why does std::filesystem provide so many non-member functions?

Consider, for example, file_size. To get the size of a file, we would use:

std::filesystem::path p = std::filesystem::current_path();
// ... usual "does this exist && is this a file" boilerplate
auto n = std::filesystem::file_size(p);

Nothing wrong with that if it were plain ol' C, but having been taught that C++ is an OO language [I do know it's multi-paradigm, apologies to our language lawyers :-)], this just feels so ... imperative (shudder) to me, when I have come to expect the object-ish

auto n = p.file_size();

instead. The same holds for other functions, such as resize_file, remove and probably more.

Do you know of any rationale why Boost, and consequently std::filesystem, chose this imperative style instead of the object-ish one? What is the benefit? Boost mentions the rule (at the very bottom), but gives no rationale for it.

I was thinking about inherent issues such as p's state after remove(p), or error flags (overloads with an additional argument), but neither approach solves these any less elegantly than the other.


You can observe a similar pattern with iterators, where nowadays we can (are supposed to?) do begin(it) instead of it.begin(), but here I think the rationale was to be more in line with the non-modifying next(it) and such.

Prelature answered 27/3, 2017 at 17:57 Comment(8)
Do you also think that std::string should have more member functions? – Edla
IIUC P0317R1 makes it possible to write directory_entry(p).file_size(). – Raccoon
@KerrekSB std::string to me is a good example of the object-ish style. Those may be two screenfuls of member functions, but they all feel canonic to me, and only few things are non-member. So yeah, it's surely not good to overdo it, but there are worse things than std::string out there, right? ;-) – Prelature
Whoever taught you that C++ is an OO language is wrong. Also, see #5990234 and drdobbs.com/cpp/how-non-member-functions-improve-encapsu/… and #1692584 and gotw.ca/gotw/084.htm – Admiral
The members of std::string operate on the string. The functions you are suggesting should be members of path do not operate on paths (i.e. the names of files), they operate on the filesystem (i.e. the real, physical files themselves). That's completely different. – Admiral
@JonathanWakely "Whoever taught you that C++ is an OO language is wrong" - I think that's a bold assertion. Evidently C++ is not exclusively OO, but saying it is not OO at all seems equally wrong to me (after all, Bjarne used to call it "C with classes", and classes are a pivotal OO concept). – Prelature
It's entirely possible to write large C++ programs that never use virtual functions and dynamic polymorphism, or even inheritance. And simply making everything a member function of a class isn't "OO" anyway! So although you can use C++ to write OO code, the language itself is not an OO language, and your understanding of OO seems flawed. Using member functions != OOP. Finally, take note of the last paragraph of stroustrup.com/bs_faq.html#oop which says 'Please note that object-oriented programming is not a panacea. "OOP" does not simply mean "good"' – Admiral
auto n = std::filesystem::file_size(p); should read auto n = file_size(p); ADL is guaranteed to find file_size and consider it. – Swamper

There are a couple of good answers already posted, but they do not get to the heart of the matter: all other things being equal, if you can implement something as a free, non-friend function, you always should.

Why?

Because free, non-friend functions do not have privileged access to state. Testing classes is much harder than testing functions, because you have to convince yourself that the class's invariants are maintained no matter which member functions are called, and in which combinations. The more member/friend functions you have, the more work you have to do.

Free functions can be reasoned about and tested standalone. Because they don't have privileged access to class state, they cannot possibly violate any class invariants.

I don't know the details of which invariants path has and what privileged access it allows, but evidently they were able to implement a lot of functionality as free functions, and they made the right choice in doing so.

Scott Meyers' brilliant article on this topic gives the "algorithm" for deciding whether to make a function a member or not.

Here's Herb Sutter bemoaning the massive interface of std::string. Why? Because much of string's interface could have been implemented as free functions. That may be a bit more unwieldy to use on occasion, but it is easier to test and reason about, improves encapsulation and modularity, and opens up opportunities for code reuse that were not there before.

Jury answered 27/3, 2017 at 19:38 Comment(2)
Food for thought - that reasoning never occurred to me before, but it makes a lot of sense. – Prelature
Has anyone ever produced the Herb version of std::string, and would it really be easier to use? – Barbarize

The Filesystem library has a very clear separation between the filesystem::path type, which represents an abstract path name (that doesn't even have to be the name of a file that exists), and operations that access the actual physical filesystem, i.e. read and write data on disks.

You even pointed to the explanation of that:

The design rule is that purely lexical operations are supplied as class path member functions, while operations performed by the operating system are provided as free functions.

This is the reason.

It's theoretically possible to use a filesystem::path on a system with no disks. The path class just holds a string of characters and allows manipulating that string, converting between character sets and using some rules that define the structure of filenames and pathnames on the host OS. For example, it knows that directory names are separated by / on POSIX systems and by \ on Windows. Manipulating the string held in a path is a "lexical operation", because it just performs string manipulation.

The non-member functions that are known as "filesystem operations" are entirely different. They don't just work with an abstract path object that is just a string of characters, they perform the actual I/O operations that access the filesystem (stat system calls, open, readdir etc.). These operations take a path argument that names the files or directories to operate on, and then they access the real files or directories. They don't just manipulate strings in memory.

Those operations depend on the API provided by the OS for accessing files, and they depend on hardware that might fail in completely different ways to in-memory string manipulations. Disks might be full, or might get unplugged before an operation completes, or might have hardware faults.

Looked at like that, of course file_size isn't a member of path, because it's nothing to do with the path itself. The path is just a representation of a filename, not of an actual file. The function file_size looks for a physical file with the given name and tries to read its size. That's not a property of the file name, it's a property of a persistent file on the filesystem. Something that exists entirely separately from the string of characters in memory that holds the name of a file.

Put another way, I can have a path object that contains complete nonsense, like filesystem::path p("hgkugkkgkuegakugnkunfkw") and that's fine. I can append to that path, or ask if it has a root directory etc. But I can't read the size of such a file if it doesn't exist. I can have a path to files that do exist, but I don't have permission to access, like filesystem::path p("/root/secret_admin_files.txt"); and that's also fine, because it's just a string of characters. I'd only get a "permission denied" error when I tried to access something in that location using the filesystem operation functions.

Because path member functions never touch the filesystem they can never fail due to permissions, or non-existent files. That's a useful guarantee.

You can observe a similar pattern with iterators, where nowadays we can (are supposed to?) do begin(it) instead of it.begin(), but here I think the rationale was to be more in line with the non-modifying next(it) and such.

No, it was because it works equally well with arrays (which can't have member functions) and class types. If you know the range-like thing you are dealing with is a container not an array then you can use x.begin() but if you're writing generic code and don't know whether it's a container or an array then std::begin(x) works in both cases.

The reasons for both of these things (the filesystem design and the non-member range access functions) are not some anti-OO preference; they are far more sensible, practical reasons. It would have been poor design to base either of them on what feels better to people who like OO, or to people who don't.

Also, there are things you can't do when everything's a member function:

struct ConvertibleToPath {
  operator const std::filesystem::path& () const;
  // ...
};

ConvertibleToPath c;
auto n = std::filesystem::file_size(c);  // works fine

But if file_size were a member of path:

c.file_size();   // wouldn't work
static_cast<const std::filesystem::path&>(c).file_size(); // yay, feels object-ish!
Admiral answered 27/3, 2017 at 19:21 Comment(1)
Valid point that Boost already gives the answer - expanding it was helpful nevertheless, since I completely missed the point about what path is (and is not) from the Boost doc. – Prelature

Several reasons (somewhat speculative though, I don't follow the standardization process very closely):

  1. Because it's based on boost::filesystem, which is designed that way. Now, you could ask "Why is boost::filesystem designed that way?", which would be a fair question, but given that it was, and that it's seen a lot of mileage the way it is, it was accepted into the standard with very few changes. So were some other Boost constructs (although sometimes there are some changes, under the hood mostly).

  2. A common principle when designing classes is: "if a function doesn't need access to a class's protected/private members, and can instead be implemented using existing members, don't make it a member as well." While not everyone subscribes to that, it seems the designers of boost::filesystem do.

    See a discussion of (and an argument for) this in the context of std::string, a "monolith" class with a zillion methods, by C++ luminary Herb Sutter, in Guru of the Week #84.

  3. It was expected that C++17 might already have Uniform Call Syntax (see Bjarne Stroustrup's highly readable proposal). If that had been accepted into the standard, calling

    p.file_size();
    

    would have been equivalent to calling

    file_size(p);
    

    so you could have used whichever form you liked. Basically.

Edict answered 27/3, 2017 at 19:29 Comment(0)

Just to add to what others have already said: one of the reasons people are unhappy with the non-member approach is the need to type std::filesystem:: in front of every API call, or to add using directives. But you don't actually have to; you can simply omit the namespace in the call, like this:

#include <iostream>
#include <filesystem>

int main()
{
    auto p = std::filesystem::path{"/bin/cat"};
    //notice file_size below has no namespace qualifiers
    std::cout << "Binary size for your /bin/cat is " << file_size(p);
}

works perfectly fine, because unqualified function names are also looked up in the namespaces associated with their arguments' types (argument-dependent lookup, ADL).

(live sample https://wandbox.org/permlink/JrFz8FJG3OdgRwg9)

Wintry answered 1/11, 2018 at 17:13 Comment(0)
