I had been struggling for weeks with a poorly performing translator I had written. On the following simple benchmark
#include <stdio.h>

int main()
{
    int x;
    char buf[2048];
    FILE *test = fopen("test.out", "wb");
    setvbuf(test, buf, _IOFBF, sizeof buf);
    for (x = 0; x < 1024 * 1024; x++)
        fprintf(test, "%04d", x);
    fclose(test);
    return 0;
}
we see the following result
bash-3.1$ gcc -O2 -static test.c -o test
bash-3.1$ time ./test
real 0m0.334s
user 0m0.015s
sys 0m0.016s
However, the moment the "-std=c99" flag is added, performance comes crashing down:
bash-3.1$ gcc -O2 -static -std=c99 test.c -o test
bash-3.1$ time ./test
real 0m2.477s
user 0m0.015s
sys 0m0.000s
The compiler I'm using is gcc 4.6.2 mingw32.
The file generated is about 12M, so this is a difference of about 21MB/s between the two.
Running diff shows that the generated files are identical.
I assumed this has something to do with file locking in fprintf, of which the program makes heavy use, but I haven't been able to find a way to switch that off in the C99 version.
I tried flockfile on the stream I use at the beginning of the program, and a corresponding funlockfile at the end, but was greeted with compiler errors about implicit declarations, and linker errors claiming undefined references to those functions.
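For reference, here is a minimal sketch of what that attempt looks like on a system that does provide these functions. flockfile and funlockfile come from POSIX rather than from C99, which would be consistent with the errors above. The helper name write_numbers_locked is hypothetical, and the feature-test macro is an assumption about what a POSIX libc needs under a strict -std=c99 build.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: exposes flockfile/funlockfile under -std=c99 */
#include <stdio.h>

/* Hypothetical helper: writes `count` zero-padded numbers while holding the
   stream lock for the whole loop. POSIX only. Returns the number of
   characters written, or -1 on an output error. */
long write_numbers_locked(FILE *out, int count)
{
    long total = 0;
    int x;

    flockfile(out);                      /* take the stream lock once */
    for (x = 0; x < count; x++) {
        int n = fprintf(out, "%04d", x);
        if (n < 0) {
            total = -1;
            break;
        }
        total += n;
    }
    funlockfile(out);                    /* release it after the loop */
    return total;
}
```

Note that the stream lock is recursive, so fprintf still acquires it on every call inside the loop; holding it from outside mainly guarantees that no other thread interleaves its output.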
Could there be another explanation for this problem, and more importantly, is there any way to use C99 on Windows without paying such an enormous performance price?
Edit:
After looking at the code generated by these options, it looks like in the slow versions, mingw sticks in the following:
_fprintf:
LFB0:
.cfi_startproc
subl $28, %esp
.cfi_def_cfa_offset 32
leal 40(%esp), %eax
movl %eax, 8(%esp)
movl 36(%esp), %eax
movl %eax, 4(%esp)
movl 32(%esp), %eax
movl %eax, (%esp)
call ___mingw_vfprintf
addl $28, %esp
.cfi_def_cfa_offset 4
ret
.cfi_endproc
In the fast version, this simply does not exist; otherwise, both are exactly the same. I assume __mingw_vfprintf is the slowpoke here, but I have no idea what behavior it needs to emulate that makes it so slow.
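To make the listing easier to read, here is a rough C equivalent of that wrapper as I understand it: it takes the address of the first variadic argument and forwards the stream, the format string and that argument list to a v-style function. The name wrapped_fprintf is hypothetical, and standard vfprintf stands in for __mingw_vfprintf so the sketch builds on any C99 toolchain.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for the _fprintf wrapper in the listing.
   The real one calls __mingw_vfprintf; vfprintf is used here instead. */
int wrapped_fprintf(FILE *stream, const char *format, ...)
{
    va_list args;
    int written;

    va_start(args, format);                    /* leal 40(%esp), %eax */
    written = vfprintf(stream, format, args);  /* call ___mingw_vfprintf */
    va_end(args);
    return written;
}
```

The wrapper itself only shuffles three values onto the stack, so whatever extra time is spent should be inside the function it calls, not in the wrapper.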
printf I broke the cardinal rule of this site. It didn't even occur to me the problem would be elsewhere. – Dora
_fprintf does not exist in the fast version. But then, what does the main loop look like? – Mullinax
fprintf the main loops are identical. I'm trying to look at the mingw source now to see if there's anything that seems bottleneck-ey. – Dora
fprintf() function from Microsoft C Runtime, which is not C99 compliant, so it's overloaded by mingw32's one – Zeal