Why is this C code faster than this C++ code? Getting the biggest line in a file
I have two versions of a program that do basically the same thing: finding the longest line in a file. The file has about 8,000 lines. My C code is a little more primitive (of course!) than my C++ code, yet the C program takes about 2 seconds to run while the C++ program takes 10 seconds (same input file in both cases). Why? I expected the C++ version to take about the same time, or a little longer, but not 8 seconds more!

my code in C:

#include <stdio.h>
#include <stdlib.h> 
#include <string.h>

#if _DEBUG
    #define DEBUG_PATH "../Debug/"
#else
    #define DEBUG_PATH ""
#endif

const char FILE_NAME[] = DEBUG_PATH "data.noun";

int main()
{   
    int sPos = 0;
    int maxCount = 0;
    int cPos = 0;
    int ch;
    FILE *in_file;              

    in_file = fopen(FILE_NAME, "r");
    if (in_file == NULL) 
    {
        printf("Cannot open %s\n", FILE_NAME);
        exit(8);
    }       

    while (1) 
    {
        ch = fgetc(in_file);
        if(ch == 0x0A || ch == EOF) // '\n' or end of file
        {           
            if ((cPos - sPos) > maxCount)
                maxCount = (cPos - sPos);

            if(ch == EOF)
                break;

            sPos = cPos;
        }
        else
            cPos++;
    }

    fclose(in_file);

    printf("Max line length: %i\n",  maxCount); 

    getchar();
    return (0);
}

my code in C++:

#include <iostream>
#include <fstream>
#include <stdio.h>
#include <string>

using namespace std;

#ifdef _DEBUG
    #define FILE_PATH "../Debug/data.noun"
#else
    #define FILE_PATH "data.noun"
#endif

int main()
{
    string fileName = FILE_PATH;
    string s = "";
    ifstream file;
    int size = 0;

    file.open(fileName.c_str());
    if(!file)
    {
        printf("could not open file!");
        return 0;
    }

    while(getline(file, s) )
            size = (s.length() > size) ? s.length() : size;
    file.close();

    printf("biggest line in file: %i", size);   

    getchar();
    return 0;
}
Wile answered 13/1, 2012 at 15:22 Comment(11)
The programs are far from being equivalent. You actually copy the whole line into memory in C++ version. Equivalent C++ version would be C version with file >> ch; instead of ch = fgetc(in_file);Psychopathist
What does getline() in the C++ version do? I've seen ifstream.getline() but not a getline function that takes ifstream...?Defeatist
Btw, both 2 and 10 seconds seems very slow for just a few thousand lines. What kind of computer are you using?Phoneme
@JoachimIsaksson It's global version of getline(...). It takes std::string & unlike istream::getline that takes char *.Psychopathist
@AzzA Thanks, I learn something new every day... :)Defeatist
as @ThomasPadron-McCarthy says, it seems slow... It would be useful to know how big the file is or how long the lines are in average. I have a feeling that longer lines will make the C++ go slower.Herra
Turn on optimizationsPhlegm
In the future, you should compare against '\n' instead of 0x0A. It's much clearer.Entirety
If you claim C is faster than C++, your question will get serious attention from C++ programmers. ;)Gallion
If you were to run this on Mac OS X, you could use the 'sample' command line program, or the time profile instrument in instruments to see exactly why one version is faster than the other.Lupulin
Is it bad that I read the title and want to write max(open(".../f"), key=len)?Complice
14

The C++ version constantly allocates and deallocates instances of std::string. Memory allocation is a costly operation, and on top of that the constructors and destructors are executed.

The C version, however, uses constant memory and does just what is necessary: it reads single characters and, at each newline, updates the line-length counter if the current line is longer.

Panthia answered 13/1, 2012 at 15:26 Comment(7)
I find it hard to believe that the cost of allocation is so expensive compared to the actual disk I/O cost.Peripeteia
Allocating and freeing small chunks of memory in a single-threaded application is not that expensive. It can't add 8 seconds to the run time.Mealymouthed
The C++ version almost doesn't reallocate memory. Each iteration reuses the memory of s from the previous line. So if the longest line is N characters long, then there will be at most O(log N) reallocation. Most probably the OP didn't enable optimizations.Race
ybungalobill is right. There are only two string variables declared. Memory is not reallocated each time s gets a new value - a reallocation is only done if the existing buffer isn't big enough. When a reallocation is done, the buffer grows by some constant factor (not a constant addition) - e.g. doubling. This buffer doesn't shrink again even if a much smaller line occurs in the file. In practice, I'd be surprised if you manage even 10 reallocations in a run - that's more like 8 microseconds than 8 seconds. Constructors, destructors and heap allocation are irrelevant here.Nevski
This is all good and fair, but the dynamic allocation of memory is really the only difference I can see. Of course C stdio is buffered as well, so this only a weak argument. However in the OP's case I see no other reason for the observed behaviour, than bad or no optimization of the C++ overhead I mentioned.Panthia
@Panthia - getline is an instantiation of a template. Has to be - there's more than one kind of string in C++, and more than one kind of stream - for example, strings and streams of wchar_t. That getline call may not be to an already-highly-optimised supplied-with-the-compiler static library implementation. Disabling optimisation may affect the internals of getline (and other functions) in the C++ example in ways that can't apply to the C case.Nevski
@Steve314: This just adds to what I was suggesting. The getline instantiation, in its unoptimized form, may introduce a lot of allocation/deallocation overhead.Panthia
75

My guess is that it is a problem with the compiler options you are using, the compiler itself, or the file system. I just now compiled both versions (with optimizations on) and ran them against a 92,000 line text file:

c++ version:  113 ms
c version:    179 ms

I suspect the C++ version is faster because fgetc is most likely slower. fgetc does use buffered I/O, but it makes a function call to retrieve every character. I've tested this before, and fgetc is not as fast as reading an entire line in one call (e.g., with fgets).
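A sketch of the line-at-a-time idea with fgets (function name and buffer size are illustrative): one library call fetches a whole buffer of characters instead of one call per character.

```cpp
#include <cstdio>
#include <cstring>

// Longest-line scan reading whole chunks with fgets. Assumes the file
// has no embedded NUL bytes, since strlen finds the chunk length.
long longest_line_fgets(std::FILE *in) {
    char buf[4096];
    long longest = 0, current = 0;
    while (std::fgets(buf, sizeof buf, in)) {
        std::size_t n = std::strlen(buf);
        if (n > 0 && buf[n - 1] == '\n') {    // chunk completes a line
            current += static_cast<long>(n) - 1;
            if (current > longest) longest = current;
            current = 0;
        } else {                   // line continues into the next chunk
            current += static_cast<long>(n);
        }
    }
    if (current > longest) longest = current; // last line may lack '\n'
    return longest;
}
```
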

Mealymouthed answered 13/1, 2012 at 15:40 Comment(1)
Note: fgetc is required to lock the stream per each call according to POSIX. Most systems, including MSVC CRT, follow this guideline. The C++ version in MSVC locks the stream once per a call to getline, although this is not required by any standard.Race
29

So in a few comments I echoed people's answers that the problem was likely the extra copying done by your C++ version, where it copies the lines into memory in a string. But I wanted to test that.

First I implemented the fgetc and getline versions and timed them. I confirmed that in debug mode the getline version is slower: about 130 ms vs. 60 ms for the fgetc version. This is unsurprising given the conventional wisdom that iostreams are slower than stdio. However, in my experience iostreams get a significant speed-up from optimization. This was confirmed when I compared my release-mode times: about 20 ms using getline and 48 ms with fgetc.

The fact that using getline with iostreams is faster than fgetc, at least in release mode, runs counter to the reasoning that copying all that data must be slower than not copying it. I'm not sure what the optimizer is able to avoid, and I didn't really look for an explanation, but it would be interesting to understand what's being optimized away. (Edit: when I looked at the programs with a profiler, it wasn't obvious how to compare their performance because the different methods looked so different from each other.)

Anyway, I wanted to see if I could get a faster version by avoiding the copying, using the get() method on the fstream object to do exactly what the C version is doing. When I did this, I was quite surprised to find that fstream::get() was quite a bit slower than both the fgetc and getline methods, in both debug and release: about 230 ms in debug and 80 ms in release.

To narrow down whatever the slow-down is, I did another version, this time using the streambuf attached to the fstream object and the snextc() method on it. This version is by far the fastest: 25 ms in debug and 6 ms in release.

I'm guessing that what makes the fstream::get() method so much slower is that it constructs a sentry object for every call. Though I haven't tested this, I can't see that get() does much beyond getting the next character from the streambuf, apart from these sentry objects.

Anyway, the moral of the story is that if you want fast I/O, you're probably best off using the high-level iostream functions rather than stdio, and for really fast I/O, accessing the underlying streambuf. (Edit: this moral may only apply to MSVC; see the update at the bottom for results from a different toolchain.)

For reference:

I used VS2010 and chrono from boost 1.47 for timing. I built 32-bit binaries (seems required by boost chrono because it can't seem to find a 64 bit version of that lib). I didn't tweak the compile options but they may not be completely standard since I did this in a scratch vs project I keep around.

The file I tested with was the 1.1 MB 20,000 line plain text version of Oeuvres Complètes de Frédéric Bastiat, tome 1 by Frédéric Bastiat from Project Gutenberg, http://www.gutenberg.org/ebooks/35390

Release mode times

fgetc time is: 48150 microseconds
snextc time is: 6019 microseconds
get time is: 79600 microseconds
getline time is: 19881 microseconds

Debug mode times:

fgetc time is: 59593 microseconds
snextc time is: 24915 microseconds
get time is: 228643 microseconds
getline time is: 130807 microseconds

Here's my fgetc() version:

{
    auto begin = boost::chrono::high_resolution_clock::now();
    FILE *cin = fopen("D:/bames/automata/pg35390.txt","rb");
    assert(cin);
    unsigned maxLength = 0;
    unsigned i = 0;
    int ch;
    while(1) {
        ch = fgetc(cin);
        if(ch == 0x0A || ch == EOF) {
            maxLength = std::max(i,maxLength);
            i = 0;
            if(ch==EOF)
                break;
        } else {
            ++i;
        }
    }
    fclose(cin);
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "fgetc time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

Here's my getline() version:

{
    auto begin = boost::chrono::high_resolution_clock::now();
    std::ifstream fin("D:/bames/automata/pg35390.txt",std::ios::binary);
    unsigned maxLength = 0;
    std::string line;
    while(std::getline(fin,line)) {
        maxLength = std::max(line.size(),maxLength);
    }
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "getline time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

the fstream::get() version

{
    auto begin = boost::chrono::high_resolution_clock::now();
    std::ifstream fin("D:/bames/automata/pg35390.txt",std::ios::binary);
    unsigned maxLength = 0;
    unsigned i = 0;
    while(1) {
        int ch = fin.get();
        if((fin.good() && ch == 0x0A) || fin.eof()) {
            maxLength = std::max(i,maxLength);
            i = 0;
            if(fin.eof())
                break;
        } else {
            ++i;
        }
    }
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "get time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

and the snextc() version

{
    auto begin = boost::chrono::high_resolution_clock::now();
    std::ifstream fin("D:/bames/automata/pg35390.txt",std::ios::binary);
    std::filebuf &buf = *fin.rdbuf();
    unsigned maxLength = 0;
    unsigned i = 0;
    while(1) {
        int ch = buf.snextc();
        if(ch == 0x0A || ch == std::char_traits<char>::eof()) {
            maxLength = std::max(i,maxLength);
            i = 0;
            if(ch == std::char_traits<char>::eof())
                break;
        } else {
            ++i;
        }
    }
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "snextc time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

update:

I reran the tests using clang (trunk) on OS X with libc++. The results for the iostream-based implementations stayed relatively the same (with optimization turned on): fstream::get() much slower than std::getline(), which is much slower than filebuf::snextc(). But the performance of fgetc() improved relative to the getline() implementation and became faster. Perhaps the copying done by getline() becomes an issue with this toolchain, whereas it wasn't with MSVC? Or maybe Microsoft's CRT implementation of fgetc() is bad?

Anyway, here are the times (I used a much larger file, 5.3 MB):

using -Os

fgetc time is: 39004 microseconds
snextc time is: 19374 microseconds
get time is: 145233 microseconds
getline time is: 67316 microseconds

using -O0

fgetc time is: 44061 microseconds
snextc time is: 92894 microseconds
get time is: 184967 microseconds
getline time is: 209529 microseconds

-O2

fgetc time is: 39356 microseconds
snextc time is: 21324 microseconds
get time is: 149048 microseconds
getline time is: 63983 microseconds

-O3

fgetc time is: 37527 microseconds
snextc time is: 22863 microseconds
get time is: 145176 microseconds
getline time is: 67899 microseconds
Luciano answered 13/1, 2012 at 17:4 Comment(2)
+1 for testing and providing nice examples. More answers should be like this.Nole
Added -O2 and -O3 times. They're not much different. The platform vendor recommends -Os since small executable size has its own performance benefits. I would also show -O4 (link time optimization) times, but I would have to use an older compiler to do that because the installed linker isn't updated for the latest version of my compiler.Luciano
11

You are not comparing apples to apples. Your C program does no copying of data from the FILE* buffer into your program's memory; it operates on the buffered stream directly.

Your C++ program needs to traverse the length of each string several times - once in the stream code to know when to terminate the string that it returns to you, once in the constructor of std::string, and once in your code's call to s.length().

It is possible that you could improve the performance of your C program, for example by using getc_unlocked if it is available to you. But the biggest win comes from not having to copy your data.
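A sketch of the getc_unlocked variant (POSIX, not standard C; the function name here is illustrative). It behaves like the question's fgetc loop but skips the per-call stream lock, which is safe when only one thread reads the stream:

```cpp
#include <cstdio>

// Same algorithm as the question's C version, but using POSIX
// getc_unlocked to avoid locking the stream on every character.
long longest_line_unlocked(std::FILE *in) {
    long longest = 0, current = 0;
    int ch;
    while ((ch = getc_unlocked(in)) != EOF) {
        if (ch == '\n') {
            if (current > longest) longest = current;
            current = 0;
        } else {
            ++current;
        }
    }
    if (current > longest) longest = current; // final unterminated line
    return longest;
}
```
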

EDIT: edited in response to a comment by bames53

Thilda answered 13/1, 2012 at 15:28 Comment(13)
The call to s.length() is constant time. But otherwise, yeah copying is probably the real performance issue.Luciano
This is my favorite answer because, indeed, the algorithms used in each version are different.Guiscard
But... the C version is copying file content into memory. Visibly it does it one char at a time (lots of call overheads, aas Mark Wilkins points out). In the background, it uses a buffer. Also, there is no repeated construction/destruction of strings in the C++ version. The string s is modified by getline, but that doesn't cause s to be re-constructed. It will only cause a reallocation (of a behind-the-scenes buffer - not of the std::string instance itself) if the new line is sufficiently bigger than any previous one.Nevski
@Steve314 I am talking only about the things that C++ program does in addition to what the C program does. Of course there is buffering going on in the file I/O layers, and I mention it in the answer (see the FILE* buffer part), but it is the same for both programs. Looking for '\n' to decide when to terminate the line returned from getline, however, is unique to the OP's C++ solution: his C version does it only once in his while loop. As far as allocation/deallocation goes, my answer does not mention it at all.Thilda
@dasblinkenlight - you say "Your C++ program needs to traverse the length of each string several times" ... "once in the constructor of std::string". The constructor of std::string is called precisely twice in the program, neither of them in any way connected to the traversal of each line. As the std::string constructor cannot be relevant in the way you suggest, it seemed sensible to consider if reallocation could be an issue in any way related to the "traversal of each line", which can only mean the getline call.Nevski
@Steve314 "neither of them in any way connected to the traversal of each line." Of course it does! How else do you think the constructor copies the content passed into it into the memory "owned" by the string without going through the data character-by-character (the process also known as "traversal")?Thilda
@dasblinkenlight - that copying isn't done by a constructor. There are many ways to copy data into an std::string instance. That getline function will not create an extra std::string instance just as something to copy from - that would be crazy. For a start, the data would have to be copied to that extra instance first anyway. If getline uses another std::string instance at all, it will use the swap method (or similar) to exchange buffers instead, avoiding that redundant extra copy from one std::string to the other.Nevski
@Steve314 It absolutely does not matter what particulat piece of code does the copying: the point is, the data needs to get from the buffer into the memory owned by an instance of a string, and in order to do that, it must be copied. There is simply no way around that. Constructor or not, there is an act of transfering data from the buffer of the stream into the string passed in, that is happening inside the getline method. My point is, that copying takes time, and since it is not happening in the C program, it could be part of the reason why the C++ code is slower. What is your point?Thilda
@dasblinkenlight - on the copying, absolutely true. In a comment elsewhere, I already pointed out that the internals of getline may not be optimised. The issue may be things like operator[] call overheads within getline. But that has nothing to do with the std::string constructor. My point is that at the very least you're explaining your point badly, using the wrong words, and therefore misleading people. Reword it and I'll be happy to delete these comments.Nevski
@Steve314 re "the C version is copying file content into memory" reading the bytes and sticking them in a variable one by one could easily be done without any store instructions. The C++ version is writing the data out to memory while the C version only has to store the value in a register for the comparison.Luciano
@Luciano - I was referring to the fact that the C runtime has buffers for I/O, because literally reading one character at a time is inefficient. Similar buffers also exist in C++, but using std::string as another intermediate buffer adds a layer of buffering that isn't present in the C version. This should at worst double the running time (probably much less) assuming that memory bandwidth is an issue, but without compiler optimisation, memory bandwidth probably isn't the issue.Nevski
@Luciano - Reallocation of memory is also an extra cost in the C++ version as the std::string gets resized, but it's a tiny extra cost as the reallocation happens very rarely - NOT for each line. The most likely problem with the C++ version, as I already said, is that because getline is a template it is compiled along with the program, with the same disabled optimisation options - no inlining of trivial method calls such as operator[], no moving of related calculations out of the inner loops, etc. As a rule, unoptimised C++ is much less efficient than unoptimised C.Nevski
@Steve314 In my test the getline version takes about a 70% performance hit (the -Os results in my answer), though I'm not sure I isolated the performance cost of just the extra copying.Luciano
5

2 seconds for just 8,000 lines? I don't know how long your lines are, but chances are you are doing something very wrong.

This trivial Python program executes almost instantly with El Quijote downloaded from Project Gutenberg (40006 lines, 2.2MB):

import sys
print max(len(s) for s in sys.stdin)

The timing:

~/test$ time python maxlen.py < pg996.txt
76

real    0m0.034s
user    0m0.020s
sys     0m0.010s

You could improve your C code by buffering the input rather than reading char by char.

As to why the C++ is slower than the C: it is probably related to building the string objects and then calling the length method. In C you are just counting the characters as you go.
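The buffering suggestion above can be sketched like this (chunk size and function name are illustrative): read big blocks with fread and let memchr find the newlines.

```cpp
#include <cstdio>
#include <cstring>

// Longest-line scan over large fread chunks; memchr does the
// per-character work inside an optimized library routine.
long longest_line_fread(std::FILE *in) {
    char buf[1 << 16];
    long longest = 0, current = 0;
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, in)) > 0) {
        const char *p = buf;
        const char *end = buf + n;
        const char *nl;
        while ((nl = static_cast<const char *>(
                    std::memchr(p, '\n', end - p)))) {
            current += nl - p;
            if (current > longest) longest = current;
            current = 0;
            p = nl + 1;
        }
        current += end - p;              // partial line at chunk end
    }
    if (current > longest) longest = current;
    return longest;
}
```
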

Herra answered 13/1, 2012 at 15:33 Comment(3)
The input is already buffered by stdio, so I don't think you'll gain much by additional buffering. But I agree that the stated times seem slow.Phoneme
On my fairly new computer, and with 10 million lines, the C program runs in about 3 seconds -- and the C++ program in less than one second. (GCC on Linux, though.) So I agree that "something very wrong" seems likely to be going on here.Phoneme
but still reasonable to think that something is weird... I'm looking forward to see how big our op's lines are :-)Herra
5

I tried compiling and running your programs against 40K lines of C++ source and they both completed in about 25 ms or so. I can only conclude that your input file has extremely long lines, possibly 10K-100K characters per line. In that case the C version doesn't take a performance hit from the long line length, while the C++ version has to keep growing the string and copying the old data into the new buffer. If it had to grow a sufficient number of times, that could account for the excessive performance difference.

The key here is that the two programs don't do the same thing so you can't really compare their results. If you were able to provide the input file we might be able to provide additional details.

You could probably use tellg and ignore to do this faster in C++.

Peripeteia answered 13/1, 2012 at 15:52 Comment(0)
3

The C++ program builds string objects of the lines, while the C program just reads characters and looks at the characters.

EDIT:

Thanks for the upvotes, but after the discussion I now think this answer is wrong. It was a reasonable first guess, but in this case it seems that the different (and very slow) execution times are caused by other things.

Phoneme answered 13/1, 2012 at 15:25 Comment(0)
-1

I'm alright with the theory folks. But let's get empirical.

I generated a text file with 13 million lines to work with.

~$ for i in {0..1000}; do cat /etc/* | strings; done &> huge.txt

The original code, edited to read from stdin (which shouldn't affect performance much), took almost 2 minutes.

C++ code:

#include <iostream>
#include <stdio.h>

using namespace std;

int main(void)
{
    string s = "";
    int size = 0;

    while (cin) {
        getline(cin, s);
        size = (s.length() > size) ? s.length() : size;
    }
    printf("Biggest line in file: %i\n", size);

    return 0;
}

C++ time:

~$ time ./cplusplus < huge.txt
real    1m53.122s
user    1m29.254s
sys     0m0.544s

A 'C' version:

#include <stdio.h>
int main(void)
{
    char *line = NULL;
    size_t len = 0;
    ssize_t read;
    int max = 0;

    while ((read = getline(&line, &len, stdin)) != -1)
        if (max < read)
            max = read -1;
    printf("Biggest line in file %d\n", max);

    return 0;
}

C performance:

~$ time ./ansic < huge.txt
real    0m4.015s
user    0m3.432s
sys     0m0.328s

Do your own math...

Holstein answered 27/6, 2012 at 18:35 Comment(1)
As you're using cin instead of fstreams, you should add cin.sync_with_stdio(false);. With disabled synchronisation, the performance is about the same. By the way: getline is a POSIX function, it's not in the C Standard; <stdio.h> is a C header, the corresponding C++ header is <cstdio> (or just use cout << "Biggest line in file: " << size;). For getline and string, you should include <string>.Noenoel
