I know that similar questions have been asked before, but:
- In regards to gprof produces empty output I am using GCC 10.2.0 from MSYS2 on Windows 10, i.e. it's a MingW64 distribution. I have also added
-no-pie
to the linker arguments with no results still. The version of GNU Prof is 2.36.1 - In regards to gprof gives no output I am executing this on a program that explicitly takes time to execute.
The program is the code from this tutorial which is replicated below:
//test_gprof.c
#include<stdio.h>
void new_func1(void)
{
printf("\n Inside new_func1()\n");
int i = 0;
for(;i<0xffffffee;i++);
return;
}
void func1(void)
{
printf("\n Inside func1 \n");
int i = 0;
for(;i<0xffffffff;i++);
new_func1();
return;
}
static void func2(void)
{
printf("\n Inside func2 \n");
int i = 0;
for(;i<0xffffffaa;i++);
return;
}
int main(void)
{
printf("\n Inside main()\n");
int i = 0;
for(;i<0xffffff;i++);
func1();
func2();
return 0;
}
I'm building with Code::Blocks. I've tried with both the 32bit and 64bit versions of MingW from MSYS2. I've upgraded everything to the latest versions of the packages available.
Output from the Code::Blocks build log:
g++.exe -pg -c C:\Users\david\Documents\GameDev\CPPTESTS\main.cpp -o obj\Debug\CPPTESTS\main.o
g++.exe -o CPPTESTS-d.exe obj\Debug\CPPTESTS\main.o -pg -lgmon -no-pie
The program compiles and executes successfully. After execution, a gmon.out file is generated. Executing gprof to interpret this file:
gprof CPPTests-d.exe gmon.out > gprofoutput.txt
produces the following in gprofoutput.txt
:
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
% the percentage of the total running time of the
time program used by this function.
cumulative a running sum of the number of seconds accounted
seconds for by this function and those listed above it.
self the number of seconds accounted for by this
seconds function alone. This is the major sort for this
listing.
calls the number of times this function was invoked, if
this function is profiled, else blank.
self the average number of milliseconds spent in this
ms/call function per call, if this function is profiled,
else blank.
total the average number of milliseconds spent in this
ms/call function and its descendents per call, if this
function is profiled, else blank.
name the name of the function. This is the minor sort
for this listing. The index shows the location of
the function in the gprof listing. If the index is
in parenthesis it shows where it would appear in
the gprof listing if it were to be printed.
Copyright (C) 2012-2021 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.
Call graph (explanation follows)
granularity: each sample hit covers 4 byte(s) no time propagated
index % time self children called name
This table describes the call tree of the program, and was sorted by
the total amount of time spent in each function and its children.
Each entry in this table consists of several lines. The line with the
index number at the left hand margin lists the current function.
The lines above it list the functions that called this function,
and the lines below it list the functions this one called.
This line lists:
index A unique number given to each element of the table.
Index numbers are sorted numerically.
The index number is printed next to every function name so
it is easier to look up where the function is in the table.
% time This is the percentage of the `total' time that was spent
in this function and its children. Note that due to
different viewpoints, functions excluded by options, etc,
these numbers will NOT add up to 100%.
self This is the total amount of time spent in this function.
children This is the total amount of time propagated into this
function by its children.
called This is the number of times the function was called.
If the function called itself recursively, the number
only includes non-recursive calls, and is followed by
a `+' and the number of recursive calls.
name The name of the current function. The index number is
printed after it. If the function is a member of a
cycle, the cycle number is printed between the
function's name and the index number.
For the function's parents, the fields have the following meanings:
self This is the amount of time that was propagated directly
from the function into this parent.
children This is the amount of time that was propagated from
the function's children into this parent.
called This is the number of times this parent called the
function `/' the total number of times the function
was called. Recursive calls to the function are not
included in the number after the `/'.
name This is the name of the parent. The parent's index
number is printed after it. If the parent is a
member of a cycle, the cycle number is printed between
the name and the index number.
If the parents of the function cannot be determined, the word
`<spontaneous>' is printed in the `name' field, and all the other
fields are blank.
For the function's children, the fields have the following meanings:
self This is the amount of time that was propagated directly
from the child into the function.
children This is the amount of time that was propagated from the
child's children to the function.
called This is the number of times the function called
this child `/' the total number of times the child
was called. Recursive calls by the child are not
listed in the number after the `/'.
name This is the name of the child. The child's index
number is printed after it. If the child is a
member of a cycle, the cycle number is printed
between the name and the index number.
If there are any cycles (circles) in the call graph, there is an
entry for the cycle-as-a-whole. This entry shows who called the
cycle (as parents) and the members of the cycle (as children.)
The `+' recursive calls entry shows the number of function calls that
were internal to the cycle, and the calls entry for each member shows,
for that member, how many times it was called from other members of
the cycle.
Copyright (C) 2012-2021 Free Software Foundation, Inc.
Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.
Index by function name
which you can see it's completely empty. Further, if I open gmon.out
in a hex editor, it's mostly blank inside, here's the first bit:
F0 14 C5 00 18 8A C5 00 B4 3A 00 00 79 18 05 00 64 00 00 00 96 1A C5 00 A7 16 C5 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 05 00 FE 00 5D 00 E2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07 00 C6 01 A5 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 06 00 E5 00 77 00 F5 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
and then the end is the following:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E4 15 C5 00 6C 89 C5 00 01 00 00 00 14 16 C5 00 6C 89 C5 00 01 00 00 00 2C 16 C5 00 DB 15 C5 00 01 00 00 00 48 16 C5 00 6C 89 C5 00 01 00 00 00 84 16 C5 00 6C 89 C5 00 01 00 00 00 A0 16 C5 00 0A 16 C5 00 01 00 00 00 A4 16 C5 00 3C 16 C5 00 01 00 00 00
The rest is a few kilobytes of 00
.
I have tried everything I can find in google about getting this to work. Originally I was having trouble executing it on actual code for a project, but have resorted to using the code above as there seems to be something I'm doing wrong which I just can't figure out.
Any help would be appreciated.
EDIT
I managed to produce results in gmon.out that were parsed correctly with gprof and gave me timing for all the functions in my test program. Making no changes, not even closing the MSYS2 terminal (I abandoned Code::Blocks to be sure it wasn't the issue) , I then recompiled the program with the same command, executed it again, and back to having an empty results.
It's unclear to me if it's something during compilation (hard to believe, it was compiled using the last command in the terminal history), or something on my machine when I execute it. Or if it's a problem with gprof translating the gmon.out file.
EDIT 2
I've executed the program on a different machine (it's a similar Windows 10 Surface Pro) that has never been used for development. It's produced gmon.out also seems to have no results when parsed through gprof.
EDIT 3
I've tired with both gcc and g++ with the same results. I've tried separate compilation and link steps or doing both together. I've tried linking to -lgmon
or not as well. All gives same empty results when gprof is parsed.
I haven't been able to get results again since the one random time that it worked.
EDIT 4
I compiled the program with both -g
and -pg
and then ran it through VTune. I can see within a Hotspots analysis that function calls related to the profiler are being called.
Here's the entire callee stack. You can see the profile-related function calls profthr_func
and get_thrpc
being called.
Callees CPU Time: Total CPU Time: Self
BaseThreadInitThunk 100.0% 0s
_tmainCRTStartup 99.1% 0s
main 99.1% 0.016s
func1 65.8% 20.521s
new_func1 32.8% 20.456s
func2 33.2% 20.695s
profthr_func 0.6% 0.009s
WaitForSingleObject 0.4% 0.231s
get_thrpc 0.2% 0s
SwitchToThread 0.2% 0.116s
TlsGetValue 0.0% 0.002s
GetLastError 0.0% 0.001s
SwitchToThread 0.0% 0.001s
GetLastError 0.0% 0.001s
[Outside any known module] 0.0% 0.001s
func@0x4b308730 0.0% 0.000s
TlsGetValue 0.0% 0.000s
get_thrpc 0.4% 0s
GetThreadContext 0.3% 0.214s
GetLastError 0.0% 0.006s
TlsGetValue 0.0% 0.000s
[Outside any known module] 0.0% 0.002s
SuspendThread 0.0% 0.001s
[Unknown stack frame(s)] 0.0% 0s
profthr_func 0.0% 0s
WaitForSingleObject 0.0% 0.000s
And then all the functions and their sources etc:
Function CPU Time: Total CPU Time: Self Module Function (Full) Source File Start Address
BaseThreadInitThunk 100.0% 0s kernel32.dll BaseThreadInitThunk [Unknown] 0x6b81fa10
func@0x4b2e7a03 100.0% 0s ntdll.dll func@0x4b2e7a03 [Unknown] 0x4b2e7a03
func@0x4b2e7a1f 100.0% 0s ntdll.dll func@0x4b2e7a1f [Unknown] 0x4b2e7a1f
_tmainCRTStartup 99.1% 0s CPPTESTS-d.exe _tmainCRTStartup crtexe.c 0x401170
main 99.1% 0.016s CPPTESTS-d.exe main main.cpp 0x40165c
func1 65.8% 20.521s CPPTESTS-d.exe func1(void) main.cpp 0x4015fd
func2 33.2% 20.695s CPPTESTS-d.exe func2 main.cpp 0x40162f
new_func1 32.8% 20.456s CPPTESTS-d.exe new_func1(void) main.cpp 0x4015d0
profthr_func 0.6% 0.009s CPPTESTS-d.exe profthr_func profil.c 0x408620
get_thrpc 0.5% 0s CPPTESTS-d.exe get_thrpc profil.c 0x4085c0
WaitForSingleObject 0.4% 0.231s KernelBase.dll WaitForSingleObject [Unknown] 0x10110440
GetThreadContext 0.3% 0.214s KernelBase.dll GetThreadContext [Unknown] 0x101b61a0
SwitchToThread 0.2% 0.116s KernelBase.dll SwitchToThread [Unknown] 0x100f5b40
GetLastError 0.0% 0.007s kernel32.dll GetLastError [Unknown] 0x6b81e010
[Outside any known module] 0.0% 0.002s [Unknown] [Outside any known module] [Unknown] 0
TlsGetValue 0.0% 0.002s KernelBase.dll TlsGetValue [Unknown] 0x101193a0
SuspendThread 0.0% 0.001s kernel32.dll SuspendThread [Unknown] 0x6b834e90
SwitchToThread 0.0% 0.001s kernel32.dll SwitchToThread [Unknown] 0x6b817cc0
GetLastError 0.0% 0.001s KernelBase.dll GetLastError [Unknown] 0x10110580
TlsGetValue 0.0% 0.000s kernel32.dll TlsGetValue [Unknown] 0x6b81df20
func@0x4b308730 0.0% 0.000s ntdll.dll func@0x4b308730 [Unknown] 0x4b308730
[Unknown stack frame(s)] 0.0% 0s [Unknown] [Unknown stack frame(s)] [Unknown] 0
I'm not sure if this information is of any help in sorting out my problem. The gmon.out file once again, when parsed through gprof, features no information.
I've read this answer to a question about how gprof works, so I would expect that the calls to main
, func1
, func2
, and new_func1
would have featured some kind of instrument call?
I ran VTune and checked the Threading option and noticed that gprof seems to do it's thing via a new thread, which is different to my understanding above. However, it doesn't explain why I did once get some output and have no been able to recreate it.
But, in Vtune, I see no calls to mcount
which I thought was how gprof worked. It does show me bits and pieces (as you can see in the call stack and function list above) that profiling stuff from the -pg
flag is present.
EDIT 5
I've uploaded my compiled program and it's gmon.out in case it helps anyone help answer the question.
EDIT 6
I reinstalled MSYS2 in the hopes that that would solve the problem. It didn't.
EDIT 7
I have tried linking with -static -static-libgcc -static-libstdc++
. It had no effect.
EDIT 8
I've executed gprof with -d
flag. Debug output can be found here. But it's mostly just gibberish to me.
EDIT 9
I've installed Mingw-Builds
directly outside of MSYS2 and tested version 8.1.0, both Dwarf and SJLJ in both i686 and 86_64 versions. After compiling and linking with this version of G++, a gmon.out file is generated, but gprof then gives the following error (or are variation of the number):
BFD: Dwarf Error: Could not find abbrev number 108.
But the generated output works. I've also checked that an 8.1.0 compiled programs gmon.out can be parsed by the newer MSYS2 gprof without issue. So it appears the problem is with more recent versions of Mingw64.
Gprof reports version 2.30, G++ reports version 8.1.0.
EDIT 10
I've tried compiling and running gprof on a clean Windows 7 Virtual Box machine and get the same results. So I don't it's my OS (Windows 10) or my particular computer.
Doesn't look like this question is getting answered here. If I ever figure it out I will answer the question. Until then, I've created issues on both the MSYS2 tracker and Mingw64 tracker.
EDIT 11
I've been working now on the following code:
#include <iostream>
bool is_prime(const int& number)
{
if(number == 0 || number == 1)
return false;
else
{
for(int i = 2; i <= number / 2; ++i)
{
if(number % i == 0)
{
return false;
}
}
}
return true;
}
int main()
{
int low = 0;
int high = 300000;
while(low < high)
{
if(is_prime(low))
std::cout << low << ", ";
++low;
}
std::cout << std::endl;
return 0;
}
using the same compiler flags as previously g++ -g -pg -O0 main.cpp -o CPPTESTS-d.exe
objdump -d CPPTESTS-d.exe
provides me with the following from the main
and is_prime
sections:
004015d0 <__Z8is_primeRKi>:
4015d0: 55 push %ebp
4015d1: 89 e5 mov %esp,%ebp
4015d3: 83 ec 10 sub $0x10,%esp
4015d6: e8 a5 15 00 00 call 402b80 <_mcount>
4015db: 8b 45 08 mov 0x8(%ebp),%eax
4015de: 8b 00 mov (%eax),%eax
4015e0: 85 c0 test %eax,%eax
4015e2: 74 0a je 4015ee <__Z8is_primeRKi+0x1e>
4015e4: 8b 45 08 mov 0x8(%ebp),%eax
4015e7: 8b 00 mov (%eax),%eax
4015e9: 83 f8 01 cmp $0x1,%eax
4015ec: 75 07 jne 4015f5 <__Z8is_primeRKi+0x25>
4015ee: b8 00 00 00 00 mov $0x0,%eax
4015f3: eb 3b jmp 401630 <__Z8is_primeRKi+0x60>
4015f5: c7 45 fc 02 00 00 00 movl $0x2,-0x4(%ebp)
4015fc: 8b 45 08 mov 0x8(%ebp),%eax
4015ff: 8b 00 mov (%eax),%eax
401601: 89 c2 mov %eax,%edx
401603: c1 ea 1f shr $0x1f,%edx
401606: 01 d0 add %edx,%eax
401608: d1 f8 sar %eax
40160a: 39 45 fc cmp %eax,-0x4(%ebp)
40160d: 7f 1c jg 40162b <__Z8is_primeRKi+0x5b>
40160f: 8b 45 08 mov 0x8(%ebp),%eax
401612: 8b 00 mov (%eax),%eax
401614: 99 cltd
401615: f7 7d fc idivl -0x4(%ebp)
401618: 89 d0 mov %edx,%eax
40161a: 85 c0 test %eax,%eax
40161c: 75 07 jne 401625 <__Z8is_primeRKi+0x55>
40161e: b8 00 00 00 00 mov $0x0,%eax
401623: eb 0b jmp 401630 <__Z8is_primeRKi+0x60>
401625: 83 45 fc 01 addl $0x1,-0x4(%ebp)
401629: eb d1 jmp 4015fc <__Z8is_primeRKi+0x2c>
40162b: b8 01 00 00 00 mov $0x1,%eax
401630: c9 leave
401631: c3 ret
00401632 <_main>:
401632: 8d 4c 24 04 lea 0x4(%esp),%ecx
401636: 83 e4 f0 and $0xfffffff0,%esp
401639: ff 71 fc push -0x4(%ecx)
40163c: 55 push %ebp
40163d: 89 e5 mov %esp,%ebp
40163f: 51 push %ecx
401640: 83 ec 24 sub $0x24,%esp
401643: e8 38 15 00 00 call 402b80 <_mcount>
401648: e8 73 1a 00 00 call 4030c0 <__monstartup>
40164d: e8 be 01 00 00 call 401810 <___main>
401652: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%ebp)
401659: c7 45 f4 e0 93 04 00 movl $0x493e0,-0xc(%ebp)
401660: 8b 45 f0 mov -0x10(%ebp),%eax
401663: 39 45 f4 cmp %eax,-0xc(%ebp)
401666: 7e 3d jle 4016a5 <__fu0___ZSt4cout+0x27>
401668: 8d 45 f0 lea -0x10(%ebp),%eax
40166b: 89 04 24 mov %eax,(%esp)
40166e: e8 5d ff ff ff call 4015d0 <__Z8is_primeRKi>
401673: 84 c0 test %al,%al
401675: 74 23 je 40169a <__fu0___ZSt4cout+0x1c>
401677: 8b 45 f0 mov -0x10(%ebp),%eax
40167a: 89 04 24 mov %eax,(%esp)
40167d: b9 .byte 0xb9
I can see that mcount
is called at the start of both. So it appears that the compiled code is instrumented properly, but there's still no output in gprof.
-no-pie
, and I got the expected result, just like in your linked tutorial. So it's hard to say why it doesn't work for you. – Etemgmon.out
properly. – Procopius