What is the most elegant way to read a text file with c++?

I'd like to read the whole content of a text file into a std::string object with C++.

With Python, I can write:

text = open("text.txt", "rt").read()

It is very simple and elegant. I hate ugly stuff, so I'd like to know - what is the most elegant way to read a text file with C++? Thanks.

Instantly answered 12/10, 2008 at 10:40 Comment(4)
If you hate ugly stuff, you better not use C++ :P - Tremblay
A note about the elegance: even though the most elegant iostream solution will still seem ugly to you, you can just encapsulate it in a beautiful function so that it won't hurt your eyes ;) - Tremblay
Regarding the 'ugly stuff' argument: while(ugly()) encapsulate_more(); - Chasidychasing
"It is very simple and elegant" - and dangerous: if the file happens to be 1 TB in size, your system will most likely start swapping heavily (or the infamous OOM killer will kill any process except the guilty one). Files are NOT strings (at least not unless you checked the size first)! - Headdress
G
139

There are many ways; pick whichever you find most elegant.

Reading into char*:

ifstream file ("file.txt", ios::in|ios::binary|ios::ate);
if (file.is_open())
{
    // opened with ios::ate, so the read position is already at the end
    streamsize size = file.tellg();
    char *contents = new char [size + 1];   // +1 for the terminating '\0'
    file.seekg (0, ios::beg);
    file.read (contents, size);
    contents[size] = '\0';
    file.close();
    //... do something with it
    delete [] contents;
}

Into std::string:

std::ifstream in("file.txt");
std::string contents((std::istreambuf_iterator<char>(in)), 
    std::istreambuf_iterator<char>());

Into vector<char>:

std::ifstream in("file.txt");
std::vector<char> contents((std::istreambuf_iterator<char>(in)),
    std::istreambuf_iterator<char>());

Into string, using stringstream:

std::ifstream in("file.txt");
std::stringstream buffer;
buffer << in.rdbuf();
std::string contents(buffer.str());

file.txt is just an example; everything works fine for binary files as well, just make sure you pass ios::binary to the ifstream constructor.
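
As a rough illustration of the binary-mode remark, here is a sketch of the stringstream approach with ios::binary and a check that the file opened. The function name read_binary and its "empty string on failure" policy are assumptions made for this example, not part of the answer:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the raw bytes of the file at `path`.
// std::ios::binary disables newline translation, so "\r\n" pairs and
// embedded '\0' bytes come back unchanged.
// Returns an empty string if the file cannot be opened.
inline std::string read_binary(const char* path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in.is_open())
        return std::string();

    std::stringstream buffer;
    buffer << in.rdbuf();   // copy the whole file into the stringstream
    return buffer.str();
}
```

Note that an empty file and a missing file are indistinguishable with this return convention; a real wrapper would probably report the failure separately.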

Gorges answered 12/10, 2008 at 11:6 Comment(16)
I like your answer even better than mine, which is not something I say often. Good job! +1 - Hardigg
you actually need an extra set of parentheses around the first argument to contents' constructor with istreambuf_iterator<> to prevent it from being treated as a function declaration. - Mika
delete [] missing from char* version? - Roop
memblock in the first version should probably be contents. - Colley
@Shadow2531: I figured it should not be deleted until you're done doing something with it. - Course
FWIW, I think ios_base format flags like 'binary' and 'ate' etc. should be referenced by ios_base::binary and ios_base::ate etc. I think using ios::binary and ios::ate etc. is deprecated. - Roop
@Shadow2531: I tried with fairly recent GCC (4.2.3) but it does not give any deprecation warning. Care to give some URL that talks about it? - Course
'Deprecated' might not be the correct term. But, I've been told that ios_base::binary is the proper way and that ios::binary is a left-over pre-Standardization. - Roop
To find out for sure, I think you'd have to look in a copy of ISO/IEC 14882. However, fwiw, binary and such is defined under the ios_base class in include\c++\4.2.1-dw2\bits\ios_base.h - Roop
Great examples! I'm confused about the second & third examples. If I remove what looks like an extra set of parentheses around the first parameter to the string constructor, it fails to compile. Why are they necessary? - Otiliaotina
@Ferruccio: please see above comment by Greg Rogers. - Course
Use std::vector<char> contents(size) rather than char* contents; - Melodeemelodeon
To improve your answer, I guess it would be nice to point out the methods loading up the whole file in memory, and those reading it iteratively. I guess the first case is applied to all of the methods you've pointed, right? - Vitiated
I got this warning from icpc: remark #981: operands are evaluated in unspecified order vector<char> vm((std::istreambuf_iterator<char>(vmifs)), std::istreambuf_iterator<char>()); - Regnant
The first solution has two problems. First, you don't give the type of size (should be int, I assume?), and second the char* isn't terminated properly. Here's a tested version: ifstream file("..\\TESAITest\\data\\BlueprintManagerTestData.json", ios::in | ios::binary | ios::ate); if (file.is_open()) { file.seekg(0, ios::end); int size = file.tellg(); char *contents = new char[size+1]; file.seekg(0, ios::beg); file.read(contents, size); file.close(); contents[size] = '\0'; // do something delete[] contents; } - Tacklind
Make sure to #include <fstream> - Quadrivium
D
13

There's another thread on this subject.

My solutions from this thread (both one-liners):

The nice (see Milan's second solution):

string str((istreambuf_iterator<char>(ifs)), istreambuf_iterator<char>());

and the fast:

string str(static_cast<stringstream const&>(stringstream() << ifs.rdbuf()).str());
Daly answered 12/10, 2008 at 14:36 Comment(6)
actually, the first is faster because it operates on the istream buffer directly, and the latter relies on the first but adds some failure status bits. - Sable
@Sable The first uses a very inefficient copy to construct the string without prior allocation, which leads to a lot of re-allocations. The second pre-allocates a buffer of the required size. - Daly
I just tested it with VC++10. It actually depends on the file size: the first is faster for smaller files and the second is faster for larger files, which seems to prove what you said. - Sable
string str((istreambuf_iterator<char>(ifs))); works fine for me, what is the problem with that? - Agitate
@ThomasE It uses a non-existent constructor overload. No idea why it works on your compiler, or what exactly it does. - Daly
#195823 - Quadrivium
S
3

You seem to speak of elegance as a definite property of "little code". This is of course subjective to some extent. Some would say that omitting all error handling isn't very elegant. Some would say that clear and compact code you understand right away is elegant.

Write your own one-liner function/method which reads the file contents, but make it rigorous and safe underneath the surface and you will have covered both aspects of elegance.
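
One possible reading of that advice, as a sketch only: the function name read_text_file, the chunk size, and the choice of std::runtime_error for failures are all assumptions made for this example. The call site stays a one-liner while the checks live inside.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: whole file into a std::string, throwing on failure.
inline std::string read_text_file(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open file: " + path);

    std::string contents;
    char chunk[4096];
    // read() fails on the last, short chunk, but gcount() still reports
    // how many bytes were actually delivered, so they are appended too.
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0)
        contents.append(chunk, static_cast<std::size_t>(in.gcount()));

    // badbit (as opposed to plain end-of-file) signals a real read error.
    if (in.bad())
        throw std::runtime_error("error while reading file: " + path);
    return contents;
}
```

With this in place the caller's side is close to the Python version: std::string text = read_text_file("text.txt");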

All the best

/Robert

Shadbush answered 12/10, 2008 at 11:6 Comment(1)
Corollary: Elegance is as elegance does; notions of elegant code differ between languages and paradigms. What a C++ programmer might consider elegant could be horrific for a Ruby or Python programmer, and vice-versa. - Cariecaries
R
2

I like Milan's char* way, but with std::string.


#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>
using namespace std;

string& getfile(const string& filename, string& buffer) {
    ifstream in(filename.c_str(), ios_base::binary | ios_base::ate);
    in.exceptions(ios_base::badbit | ios_base::failbit | ios_base::eofbit);
    buffer.resize(in.tellg());
    in.seekg(0, ios_base::beg);
    in.read(&buffer[0], buffer.size());
    return buffer;
}

int main(int argc, char* argv[]) {
    if (argc != 2) {
        cerr << "Usage: this_executable file_to_read\n";
        return EXIT_FAILURE;
    }
    string buffer;
    cout << getfile(argv[1], buffer).size() << "\n";
}

(with or without the ios_base::binary, depending on whether you want newlines translated or not. You could also change getfile to just return a string so that you don't have to pass a buffer string in. Then, test to see if the compiler optimizes the copy out when returning.)

However, this might look a little better (and be a lot slower):


#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>
#include <iterator>
using namespace std;

string getfile(const string& filename) {
    ifstream in(filename.c_str(), ios_base::binary);
    in.exceptions(ios_base::badbit | ios_base::failbit | ios_base::eofbit);
    return string(istreambuf_iterator<char>(in), istreambuf_iterator<char>());
}

int main(int argc, char* argv[]) {
    if (argc != 2) {
        cerr << "Usage: this_executable file_to_read\n";
        return EXIT_FAILURE;
    }
    cout << getfile(argv[1]).size() << "\n";
}
Roop answered 12/10, 2008 at 14:24 Comment(0)
C
2

But beware: a C++ string (or, more concretely, an STL string) is no more capable than a C string of holding a string of arbitrary length!

Take a look at the member max_size(), which gives you the maximum number of characters a string can contain. This is an implementation-defined number and may not be portable across platforms. Visual Studio gives a value of about 4 GB for strings, others might give you only 64 KB, and on 64-bit platforms it might give you something really huge! It depends, and of course you will normally run into a bad_alloc exception due to memory exhaustion long before reaching the 4 GB limit...

BTW: max_size() is a member of other STL containers as well! It gives you the maximum number of elements of a certain type (the one for which you instantiated the container) that the container will (theoretically) be able to hold.

So, if you're reading from a file of unknown origin you should:
- Check its size and make sure it's smaller than max_size()
- Catch and process bad_alloc-exceptions
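
A sketch of those two checks, under stated assumptions: the function name read_checked, its bool return convention, and the limit parameter (defaulting to max_size()) are invented for this example.

```cpp
#include <fstream>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads `path` into `out`, refusing files larger than
// `limit` bytes (by default, whatever std::string can hold at all).
// Returns false if the file cannot be opened or read, is too large,
// or memory runs out.
inline bool read_checked(const char* path, std::string& out,
                         std::string::size_type limit = std::string().max_size())
{
    std::ifstream in(path, std::ios::in | std::ios::binary | std::ios::ate);
    if (!in)
        return false;

    const std::streamoff size = in.tellg();   // opened at the end: file size
    if (size < 0 || static_cast<unsigned long long>(size) > limit)
        return false;

    try {
        out.resize(static_cast<std::string::size_type>(size));
    } catch (const std::bad_alloc&) {
        return false;                          // not enough memory
    } catch (const std::length_error&) {
        return false;                          // larger than max_size()
    }

    in.seekg(0, std::ios::beg);
    if (size > 0 && !in.read(&out[0], size))
        return false;
    return true;
}
```

In practice a much smaller, application-specific limit is usually more useful than max_size(), since memory exhaustion tends to hit first.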

And another point: why are you keen on reading the file into a string? I would expect you to process it further, by incrementally parsing it or something, right? So instead of reading it into a string you might as well read it into a stringstream (which basically is just some syntactic sugar for a string) and do the processing there. But then you could do the processing directly from the file as well, because if properly programmed the stringstream can seamlessly be replaced by a filestream, i.e. by the file itself. Or by any other input stream as well: they all share the same members and operators and can thus be seamlessly interchanged!

And for the processing itself: there's also a lot you can have the compiler automate! E.g. let's say you want to tokenize the string. With a properly defined template, the following actions:
- Reading from a file (or a string or any other input stream)
- Tokenizing the content
- Pushing all found tokens into an STL container
- Sorting the tokens alphabetically
- Eliminating any duplicate values
can all(!!) be achieved in one single(!) line of C++ code (leaving aside the template itself and the error handling)! It's just a single call of the function std::copy()! Just google for "token iterator" and you'll get an idea of what I mean. So this appears to me to be even more "elegant" than just reading from a file...

Coe answered 23/9, 2010 at 11:41 Comment(1)
Of note, max_size() is defined relative to the size of size_t, which is relative to the bit size of your platform. It's defined this way to allow for a string to be as large as your platform can address. - Laylalayman
