Same Program code with same compiler leads to different binaries

I have an issue with my code that has some very strange symptoms.

  1. The code is compiled on my computer with the following versions:

    a. GCC Version: 4.4.2

    b. CMake version: 2.8.7

    c. QNX (operating system) version: 6.5.0

The code segfaults while freeing some memory on exit from a function (it doesn't die on any particular line of code, just on the return from the function).

The weird things about this are:

  1. The crash happens in release mode but not in debug mode:

    a. The code is threaded so this indicates a race condition.

    b. I cannot debug it by building in debug mode.

  2. The code, when compiled on a workmate's machine with the same versions of everything, does not have this problem.

    a. The weird things about this are that the workmate's build works, and also that the binary compiled on his machine from the same code is about 6 MB bigger.

Annoyingly, I cannot post the code because it is too big and belongs to work. But can anyone point me along a path to fixing this?

Since I am using QNX, my debugging tools are limited: Valgrind is not supported on QNX, and GDB doesn't really help.

I am looking for anyone who has had a similar/same problem and what the cause was and how they fixed it.

EDIT:

Sooo... I found out what it was, but I'm still a bit confused about how it happened.

The culprit code was this:

Eigen::VectorXd msBb = data.modelSearcher->getMinimumBoundingBox();

where the definition for getMinimumBoundingBox is this:

Eigen::VectorXd ModelSearcher::getMinimumBoundingBox();

and it returns a VectorXd which is always initialised as VectorXd output(6, 1). So my first thought was that msBb must not be getting initialised, and I changed the call to this:

Eigen::VectorXd msBb(6, 1); msBb = data.modelSearcher->getMinimumBoundingBox();

But this didn't work. In fact, I had to fix it by changing the definition of the function to this:

void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd& input);

and the call to this:

Eigen::VectorXd msBb(6, 1); data.modelSearcher->getMinimumBoundingBox(msBb);

So now the new question:

What the hell? Why didn't the first change work while the second did? Why do I have to pass by reference? Oh, and the big question: how the hell didn't this break when my co-worker compiled it and I ran it? It's a straight-out memory error; surely it shouldn't depend on which computer compiles it, especially since the compiler and all the other important things are the same!!??

Thanks for your help guys.

Juvenilia asked 16/11/2012 at 1:21. Comments:
Sounds like a memory issue. Valgrind in debug mode is still worth a try; it will help decide whether it is memory or not. – Michaeline
"1.The code does it in release mode but not debug mode:" - that is a not uncommon symptom of a memory issue (due to footprint differences). – Highsounding
@Michaeline Unfortunately I can't use Valgrind because I am running on QNX, which it doesn't support. I can try to take the function out and into Linux (Ubuntu or Fedora), but that will take ages, so I want to try other things first. – Juvenilia
Look for an uninitialized pointer. In debug mode variables generally get zeroed out. Since it is happening with multi-threading, it may be a pointer that normally gets set, but due to a race condition is left with its initial value, and thus causes a segfault when it is freed/deleted. – Zigmund
Typically optimizations and symbols are different compiler flags. Have you tried compiling with optimizations on in debug? Have you used a diff program on the two binaries? Have you tried moving the binary between systems? Have you checked what is different about the two systems? Is it consistent? – Runic
@Yakk The systems are essentially the same, and I don't think diff on a binary file will be very useful. The binary from my workmate's computer works on all the computers we have tried; the one from mine works on none of them. I'm pretty sure all of the compile flags are set the same, but I will check. – Juvenilia
Shouldn't you guys add these as answers? They are all good advice and I would happily upvote them. Since the answer is pretty broad and I can't provide code, these would pass as answers and not just comments. – Juvenilia
Your workmate's machine might have different libraries. Have you compared your tool installation with his? – Tarrel
I have not. How do I do that? – Juvenilia
You could start with a tree compare (for example, WinMerge: winmerge.org). For binaries, it will show which files differ (but won't actually diff them, which is fine). If you can't access his machine over the network, just zip up everything on one machine and unzip it on the other. – Tarrel
For the first fix, you are initializing something and then immediately replacing its value via assignment, so it's not surprising that this had no effect. (I don't have enough information to answer the rest of your question.) – Tarrel
Sounds like you didn't follow the rule of three, and that copies are breaking the object. – Janey

... the binary created from compiling on his machine, which is the same, is about 6mB bigger

It's worth figuring out what the difference is (even if it's just the case that his build hides, while yours exposes, a real bug):

  • double-check you're compiling exactly the same code (no uncommitted local changes, no extra headers in the include search path, etc.)
    • triple-check by adding a -E switch to your gcc arguments in CMake, so it will preprocess your files with the same include path as regular compilation; then diff the preprocessor output
  • compare output from nm or objdump or whatever you have for your two linked executables: if some system or 3rd-party library is a different version on one box, it may show up here
  • compare output from ldd if it's dynamically linked, make sure they're both getting the same library versions
    • compare the library versions it actually gets at runtime too, if possible. Hopefully you can do one of: run pldd, compare the .so entries in /proc/<pid>/maps, or run the process under strace/dtrace/truss and compare the runtime linker activity
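If the nm (or objdump) dumps are long, a set difference quickly shows which symbols exist in only one build. This is a minimal sketch, assuming you have already saved each build's nm output to a text file; read_lines and only_in are hypothetical helper names, and diff or comm on sorted output does the same job where those tools are available.

```cpp
// Sketch: list lines present in one text dump but not in the other.
// Assumes each build's `nm` output has already been saved to a file.
#include <fstream>
#include <set>
#include <string>
#include <vector>

// Read every line of a file into a sorted set (duplicate lines collapse).
std::set<std::string> read_lines(const std::string& path) {
    std::set<std::string> lines;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        lines.insert(line);
    }
    return lines;
}

// Return the lines that appear in `a` but not in `b`, in sorted order.
std::vector<std::string> only_in(const std::set<std::string>& a,
                                 const std::set<std::string>& b) {
    std::vector<std::string> out;
    for (const auto& l : a) {
        if (b.count(l) == 0) {
            out.push_back(l);
        }
    }
    return out;
}
```

For example, only_in(read_lines("mine_nm.txt"), read_lines("theirs_nm.txt")) lists the symbol lines that appear only in your build; swap the arguments for the other direction.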

As for the code ... if this doesn't work:

Eigen::VectorXd ModelSearcher::getMinimumBoundingBox();
// ...
Eigen::VectorXd msBb(6, 1); msBb = data.modelSearcher->getMinimumBoundingBox();

and this does:

void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd& input);
// ...
Eigen::VectorXd msBb(6, 1); data.modelSearcher->getMinimumBoundingBox(msBb);

you presumably have a problem with the assignment operator. If it does a shallow copy and there is dynamically-allocated memory in the vector, you'll end up with two vectors holding the same pointer, and they'll both free/delete it.

Note that if the operator isn't defined at all, the default is to do this shallow copy.
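A minimal sketch of the rule of three for a class that owns heap memory, assuming a hypothetical Buffer class (Eigen's VectorXd already performs deep copies itself, so this illustrates the general failure mode rather than Eigen's implementation). Removing the copy constructor and copy assignment operator below would bring back the compiler-generated shallow copies, and two Buffer objects would then delete[] the same pointer.

```cpp
// Sketch of the rule of three. `Buffer` is a hypothetical stand-in class.
#include <algorithm>
#include <cstddef>
#include <utility>

class Buffer {
public:
    // Allocate n doubles, value-initialized to 0.0.
    explicit Buffer(std::size_t n) : size_(n), data_(new double[n]()) {}

    // 1. Destructor: releases the memory this object owns.
    ~Buffer() { delete[] data_; }

    // 2. Copy constructor: allocates fresh memory and copies the elements,
    //    so the two objects never share a pointer.
    Buffer(const Buffer& other)
        : size_(other.size_), data_(new double[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // 3. Copy assignment (copy-and-swap): `other` is already a deep copy;
    //    after the swap, its destructor frees our old array exactly once.
    Buffer& operator=(Buffer other) {
        swap(other);
        return *this;
    }

    void swap(Buffer& other) noexcept {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
    }

    std::size_t size() const { return size_; }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }

private:
    std::size_t size_;
    double* data_;
};
```

Taking the assignment argument by value is a design choice: the copy happens before anything is modified, so self-assignment and freeing the old array need no special-case code.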

Rottenstone answered 20/11/2012 at 10:11. Comments:
Great answer. The second part explains the problem with the memory error. I will look into the first part and see what I find. – Juvenilia

You said you had to change it to:

void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd& input);

What was it before?

If it was:

void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd input);

and the copy constructor or assignment operator wasn't implemented properly, that might have caused the problem.
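To see why switching between a by-value and a by-reference parameter changes which code runs, here is a small sketch with hypothetical names (Tracked, byValue, byReference) that counts copy-constructor calls: passing an lvalue by value invokes the copy constructor once per call, while a reference parameter binds to the caller's object and copies nothing.

```cpp
// Sketch: count copy constructions for by-value vs by-reference parameters.
// All names here are hypothetical and used only for illustration.
#include <cstddef>

struct Tracked {
    static std::size_t copies;  // number of copy constructions so far
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked& operator=(const Tracked&) = default;
};
std::size_t Tracked::copies = 0;

// The argument is copied into `t`, so Tracked's copy constructor runs.
void byValue(Tracked t) { (void)t; }

// `t` binds directly to the caller's object; no copy is made.
void byReference(Tracked& t) { (void)t; }
```

If the copy constructor were broken (for example, a shallow copy of an owned pointer), only the by-value version would exercise it.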

Please do check how they are both implemented. Here's some info that might help.

Mckinney answered 20/11/2012 at 9:07. Comments:
It says in the question what it was before. This is not the problem. Thanks for the help though. – Juvenilia
It must be. You did not follow the rule of three. – Janey
@LightnessRacesinOrbit Nope, it wasn't; that was not the input before. – Juvenilia
