GCC .obj file output is not deterministic (.debug_info, PROGBITS section)
My compile command is
C:\work\PROJ-test\QNX_SDK\host\win32\x86/usr/bin/qcc -c -Wc,-frandom-seed="sadfsasafssadsa" -Wc,-MP,-MT,C:/work/PROJ-test/N_Manag/src/bld/N_Manag//armle-v7/release/nav_event_rcv.cpp.o,-MMD,C:/work/PROJ-test/N_Manag/src/bld/N_Manag//armle-v7/release/nav_event_rcv.cpp.d -Vgcc_ntoarmv7le -w9 -shared -O3 -ggdb3 -DBUILD_VERSION= -DPASLOGOPTIONS=0x02 -DPASLOGAPPZONES=31,23,30,9,8,3 -DNS1_5PORT -DBOARD_TYPE=PRODUCTION C:/work/PROJ-test/N_Manag/src/nav_event_rcv.cpp -o C:/work/PROJ-test/N_Manag/src/bld/N_Manag//armle-v7/release/nav_event_rcv.cpp.o

When I run this command twice in a row, the two .obj files are different and not just a few bytes from a timestamp.

We're switching build systems, so we want the old and new builds to produce binary-identical output. The vast majority of my object files are binary identical. A few that use the __DATE__ and __TIME__ macros differ by a few bytes, but this one is wildly different!
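A minimal sketch of how such a comparison could be scripted, in case it helps others reproduce the check. The function names are hypothetical, and it reads both files fully into memory, which is assumed to be acceptable for object files of this size.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def diff_summary(path_a, path_b):
    """Return (identical, differing_byte_count, first_differing_offset)."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    if a == b:
        return True, 0, None
    common = min(len(a), len(b))
    differing = [i for i in range(common) if a[i] != b[i]]
    # Bytes past the end of the shorter file also count as differences.
    count = len(differing) + abs(len(a) - len(b))
    first = differing[0] if differing else common
    return False, count, first
```

Calling diff_summary on the outputs of two consecutive compiles gives a quick way to tell a timestamp-sized difference (a handful of bytes) from a large one, and the first differing offset can be matched against the section offsets reported by an ELF dump.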

I used an elf-dump utility and found the section that is wildly different between two compiles is this

  [544] .debug_info
       PROGBITS        00000000 047d70 1021ed 00   0   0  1
       [00000000]: 

But I don't know what PROGBITS contains or why it contains different items for consecutive compiles. This site just states that PROGBITS is an attribute but not what it indicates (and why it'd be different for consecutive compiles).

QUESTION

How do I make the generation of the .obj binary deterministic ?

THOUGHTS

Somehow, the code being compiled is actually modifying the .debug_info section of the .obj. This .cpp uses a bunch of boost libraries; is it possible that's the cause?

UPDATE

I looked at the assembly files being generated and they are different. Makes sense that the resulting .objs would be different.
Still doesn't make sense why this is happening.

UPDATE The qcc command above is not the actual compiler command executed: qcc is a compiler "redirector" in that it will call the one that matches the -V argument. The "real" compiler call is this:

C:/work/Proj/QNX_SDK/host/win32/x86/usr/lib/gcc/arm-unknown-nto-qnx6.5.0eabi/4.4.2/cc1plus -Wall -O3 -ggdb3 -DBUILD_VERSION= -DPASLOGOPTIONS=0x02 -DPASLOGAPPZONES=31,23,30,9,8,3 -DNS1_5PORT -DBOARD_TYPE=PRODUCTION -quiet -fno-builtin -fpic -march=armv7-a -mfloat-abi=softfp -mfpu=vfpv3-d16 -mlittle-endian -nostdinc -nostdinc++ -D__cplusplus -D__QNX__ -D__QNXNTO__ -D__GNUC__=4 -D__GNUC_MINOR__=4 -D__GNUC_PATCHLEVEL__=2 -D__NO_INLINE__ -D__DEPRECATED -D__EXCEPTIONS -D__unix__ -D__unix -D__ELF__ -fpic -DPIC=1 -D__ARM__ -D__arm__ -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=softfp -D__LITTLEENDIAN__ -D__ARMEL__ -U__ARMEB__ -frandom-seed=sadfsasafssadsa -MP -MT C:/work/Proj/N_Manag/src/bld/N_Manag//armle-v7/release/nav_event_rcv.cpp.o -MMD C:/work/Proj/N_Manag/src/bld/N_Manag//armle-v7/release/nav_event_rcv.cpp.d -isystem C:/work/Proj/QNX_SDK/target/qnx6/usr/include -isystem C:/work/Proj/QNX_SDK/host/win32/x86/usr/lib/gcc/arm-unknown-nto-qnx6.5.0eabi/4.4.2/include -isystem C:/work/Proj/QNX_SDK/target/qnx6/usr/include/cpp/c -isystem C:/work/Proj/QNX_SDK/target/qnx6/usr/include/cpp C:/work/Proj/N_Manag/src/nav_event_rcv.cpp -dumpbase C:/work/Proj/N_Manag/src/nav_event_rcv.cpp -o C:\work\Proj\nav_event_rcv.s

UPDATE

I think it'd be worthwhile to look at the .s assembly output since there are major differences there.

Remember, I'm using -frandom-seed.

The .s file is 1.05mil lines and it's at line ~900k that the differences start.

Left:

.LASF17345:
.ascii "_ZN5boost6detail7variant21make_initializer_node5app"
.ascii "lyINS_3mpl4pairINS3_INS5_INS3_INS5_INS3_INS5_INS3_I"
.ascii "NS5_INS3_INS5_INS3_INS5_INS3_INS5_INS3_INS5_INS3_IN"
.ascii "S5_INS3_INS5_INS3_INS5_INS3_INS5_INS3_INS5_INS3_INS"
.ascii "5_INS3_INS5_INS3_INS5_INS3_INS5_INS3_INS5_INS1_16in"
.ascii "itializer_rootEN4mpl_4int_ILi0EEEEENS4_6l_iterINS4_"
...

Right:

.LASF17764:
.ascii "_ZNKSt8numpunctIcE13decimal_pointEv\000"
.LASF10304:
.ascii "cAlpha0\000"
.LASF10222:
.ascii "usWeek\000"
.LASF14117:
.ascii "_ZN5boost10shared_ptrI27TnRespTravelEstimationEvent"
.ascii "EaSERKS2_\000"
...

It goes on for several hundred bytes.

Now that I examine my Beyond Compare diff closely, all the differing sections are due to boost::detail::variant::make_initializer_node. Does that boost function generate different code each time?

RESOLUTION

Turns out it's a gcc bug. I compiled my .cpp with all permutations of -O<X> -ggdb<Y> and for Y>=2, the assembly files .s and the objects .obj are non-deterministic.
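A rough sketch of how such a sweep over flag permutations could be automated. The level lists are assumptions about which values to try, and compile_fn is a hypothetical callback that runs the compiler with the given flags and returns the output bytes. The actual compiler invocation is left out because it is toolchain specific.

```python
import hashlib
from itertools import product

OPT_LEVELS = ["0", "1", "2", "3", "s"]  # assumption: the -O levels to sweep
DEBUG_LEVELS = ["0", "1", "2", "3"]     # assumption: the -ggdb levels to sweep


def flag_matrix(opt_levels=OPT_LEVELS, debug_levels=DEBUG_LEVELS):
    """Return every (-O<X>, -ggdb<Y>) pair as a list of tuples."""
    return [(f"-O{o}", f"-ggdb{g}") for o, g in product(opt_levels, debug_levels)]


def find_nondeterministic(compile_fn, combos):
    """Compile twice per flag pair and return the pairs whose outputs differ.

    compile_fn(flags) is a hypothetical callback returning the output bytes.
    """
    unstable = []
    for flags in combos:
        first = hashlib.sha256(compile_fn(flags)).hexdigest()
        second = hashlib.sha256(compile_fn(flags)).hexdigest()
        if first != second:
            unstable.append(flags)
    return unstable
```

Two runs per combination only show that a combination can differ. A combination that happens to match twice is not proven deterministic, so repeating the sweep a few times is a reasonable precaution.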

I found a gcc bug report that describes this issue: GCC PR65015 (gcc.gnu.org/bugzilla/show_bug.cgi?id=65015).


I had to delete the other post for . . . reasons.

Pearlene answered 27/1, 2017 at 19:42 Comment(14)
PROGBITS is the attribute, the section name is .debug_info.Julianjuliana
@Julianjuliana why is the generation of .debug_info non-deterministic?Pearlene
You can use readelf's --debug-dump=info to dump the contents of .debug_info in textual form and hopefully get a better understanding of the differences. Or --dwarf=info for objdump.Overpass
@Overpass I have the assembly file(s). Isn't that better than elf-dump?Pearlene
@Overpass ok did it. The differences are in the second hex number as in: <7d162> DW_AT_name : (indirect string, offset: 0x373c94): operator=Pearlene
"Isn't that better than elf-dump" - probably not. Debug info in asm files is encoded in unreadable hex numbers (at least that's how it's done on Linux).Overpass
This post does not explain the reasons for differences but suggests that -frandom-seed often helps. Usually differences in debuginfo are plain bugs (e.g. PR65015) which need to be fixed.Overpass
Any resolution/answer does not belong in the question. Please edit your question not to include the answer, and then feel free to answer your own question.Leucocyte
@KubaOber i need to delete the question but it won't let me.Pearlene
That's because you are not supposed to delete the question. Please edit the question to cut the answer. Then answer your own question and paste the resolution there. And there's no such thing as "deleting the post for reasons". You had a real question, there's a real answer, it belongs here. There's nothing wrong with the question itself.Leucocyte
@KubaOber I need to delete it for legal reasons.Pearlene
Contact support about that. Comments aren't a place for it. Also, internet never forgets.Leucocyte
@Adrian Well, afaik, according to the Terms of Service (ToS) you have already irrevocably licensed your content under CC BY-SA 3.0, so I don't think it matters much if you attempt to delete the question or the edit history. If you have illegally posted stuff here, you're most likely in violation of the ToS.Bachelorism
@Bachelorism may not matter to you.Pearlene

Causes for non-determinism

The usual culprits are the macros __DATE__, __TIME__ and __TIMESTAMP__. The compiler expands __DATE__ and __TIME__ to the date and time of compilation, and __TIMESTAMP__ to the last-modification time of the current source file.

One possibility is that the debug info generated for the binary is written in a non-deterministic manner. This could happen, for example, when the in-memory layout of the debug info in the compiler process is not deterministic. I don't know the internals of GCC. But I guess something like this can happen when

The latter source of non-determinism is usually considered to be a bug in the compiler (e.g. GCC PR65015)

Mitigation

To force reproducible expansions of the __DATE__, __TIME__ and __TIMESTAMP__ macros, you have to fake the system time seen by the compiler (e.g. by using libfaketime/faketime). The -Wdate-time command-line option to GCC can be used to warn whenever these predefined macros are used.
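If the compiler in use does not accept -Wdate-time (which may be the case for an old toolchain such as the 4.4.2 from the question, though that is an assumption), a plain text scan can serve as a rough substitute. The sketch below uses a hypothetical helper name and is purely textual: it does not preprocess the file, so it also reports uses inside comments and strings and misses uses hidden in included headers.

```python
import re

# Matches the three time-dependent predefined macros as whole identifiers.
DATE_MACROS = re.compile(r"\b__(DATE|TIME|TIMESTAMP)__\b")


def find_time_macros(source_text):
    """Return (line_number, macro_name) for each textual use of a time macro."""
    hits = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        for match in DATE_MACROS.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Running this over the sources whose objects differ by only a few bytes is a quick way to confirm that the difference comes from these macros and not from something else.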

To force reproducible "randomness" for GUIDs and mangling, you could try to compile with -frandom-seed=<somestring> where <somestring> is a unique string for your build (e.g. the hash of the contents of the source file you're compiling should do it).
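As a sketch of the hash-based seed idea, the flag could be derived from the source file's contents like this. The helper name and the choice of SHA-256 truncated to 16 hex characters are assumptions for illustration. Any stable string that differs per source file would serve the same purpose.

```python
import hashlib


def random_seed_flag(source_bytes):
    """Build a -frandom-seed flag from the contents of the source file.

    The same contents always give the same seed, so repeated builds of an
    unchanged file pass an identical flag to the compiler.
    """
    digest = hashlib.sha256(source_bytes).hexdigest()
    # Assumption: 16 hex characters are enough to keep seeds distinct.
    return f"-frandom-seed={digest[:16]}"
```

With a wrapper like qcc from the question, the resulting flag would be passed through with -Wc, in the same way as the hard-coded seed in the original command line.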

Alternatively you can try to compile without debug information (e.g. without the -ggdb etc flags) or use some strip tool to remove the debug information section later.


Bachelorism answered 30/1, 2017 at 20:29 Comment(6)
This is just guesswork and does not really answer anything.Overpass
Removing the debug info is not an option because, as, @Overpass said, the point is figure out why this is happening. I have hundreds of .obj files and it's only a few that have this issue. I will try using the -frandom-seed suggestion and report back.Pearlene
@Overpass I added the -frandom-seed="obj" and still got different .s filesPearlene
@Adrian Please, let's be clear now. You got different generated assembly files, but did you also get different object files? Your original question was about object files only. Btw, GCC 4.4 is very old, so it most likely contains some bugs which result in the difference.Bachelorism
@Bachelorism yes. See my answer below.Pearlene
@Bachelorism turned out to be this thing that you mentioned gcc.gnu.org/bugzilla/show_bug.cgi?id=65015Pearlene
