Why does this calculation give a different result in boost::thread than in std::thread?
M

4

14

When this floating-point calculation is executed in a boost::thread, it gives a different result than when it is executed in a std::thread or in the main thread.

void print_number()
{
    double a = 5.66;
    double b = 0.0000001;
    double c = 500.4444;
    double d = 0.13423;
    double v = std::sin(d) * std::exp(0.4 * a + b) / std::pow(c, 2.3);

    printf("%llX\n%0.25f\n", *reinterpret_cast<unsigned long long*>(&v), v);
}

This seems to happen because boost::thread by default uses 53-bit internal precision for floating-point math, while the main thread uses 64-bit precision. If the FPU state is reset with _fpreset() after the boost::thread has been created, the result is the same as in the main thread.

I am using Embarcadero C++ Builder 10.1 (compiler bcc32c version 3.3.1) and Boost 1.55.0. My environment is Windows 7, and I am building for 32-bit Windows target.

Working example:

#include <tchar.h>
#include <thread>
#include <boost/thread.hpp>
#include <cstdio>
#include <cstring>  // memcpy
#include <cmath>
#include <cfloat>

namespace boost { void tss_cleanup_implemented() {} }

void print_number()
{
    double a = 5.66;
    double b = 0.0000001;
    double c = 500.4444;
    double d = 0.13423;
    double v = std::sin(d) * std::exp(0.4 * a + b) / std::pow(c, 2.3);

    // Edit:
    // Avoid the undefined behaviour caused by the reinterpret_cast, as
    // mentioned in some answers and comments.
    unsigned long long x;
    memcpy(&x, &v, sizeof(x));

    printf("%llX\n%0.25f\n", x, v);
}

void print_number_2()
{
    // Reset FPU precision to default
    _fpreset();
    print_number();
}

int _tmain(int argc, _TCHAR* argv[])
{
    print_number();

    std::thread t1(&print_number);
    t1.join();

    boost::thread t2(&print_number);
    t2.join();

    boost::thread t3(&print_number_2);
    t3.join();

    getchar();
    return 0;
}

Output:

3EAABB3194A6E99A
0.0000007966525939409087744
3EAABB3194A6E99A
0.0000007966525939409087744
3EAABB3194A6E999
0.0000007966525939409087488
3EAABB3194A6E99A
0.0000007966525939409087744
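
The two bit patterns above differ only in the last hex digit. A minimal sketch for measuring that gap in units in the last place (ULPs) follows. The helper names double_to_bits, bits_to_double and ulp_distance are made up for this illustration, and the code assumes that double is an IEEE 754 binary64 value, which holds on Win32.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Copy the object representation of a double into a 64-bit integer.
// Assumption: double is IEEE 754 binary64.
inline std::uint64_t double_to_bits(double value)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "double must be 64 bits wide");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Inverse of double_to_bits().
inline double bits_to_double(std::uint64_t bits)
{
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Number of representable-double steps separating a and b.
// Only meaningful for finite values that have the same sign.
inline std::uint64_t ulp_distance(double a, double b)
{
    assert(std::isfinite(a) && std::isfinite(b));
    assert(std::signbit(a) == std::signbit(b));
    const std::uint64_t x = double_to_bits(a);
    const std::uint64_t y = double_to_bits(b);
    return x > y ? x - y : y - x;
}
```

Calling ulp_distance(bits_to_double(0x3EAABB3194A6E99AULL), bits_to_double(0x3EAABB3194A6E999ULL)) returns 1, so the result from boost::thread is one ULP away from the others.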

Question:

  • Why does this happen? Isn't a new thread supposed to inherit the floating-point environment from its parent thread?
  • Is this a bug in the compiler or in Boost, or are my expectations wrong?
Meuser answered 14/7, 2016 at 10:17 Comment(7)
Completely unrelated, but I gotta say, your question presentation is absolutely stellar.Pingpingpong
Just out of curiosity, which Operating System did you use?Toronto
@Toronto I am using Windows 7.Meuser
I can confirm this, using C++Builder 10.1 Berlin, 32 bit target, on Windows 7. Obviously the _fpreset() makes the difference. I assume that boost::thread doesn't do it, and std::thread does.Farman
FWIW, to make it compile I had to add a line with #define BOOST_THREAD_USE_LIB before #include <boost/thread.hpp>. I assume this was defined externally?Farman
@Ville-ValtteriTiittanen: the difference was indeed the fact that Boost doesn't do a _fpreset(), which is a C++Builder specific thing, and the std library was obviously modified to make that call. See my answer.Farman
FWIW, I edited your print_number() function to use memcpy() to stop the complaints about undefined behaviour, since that was irrelevant to the question.Farman
F
2

The difference seems to be the fact that the std::thread implementation does an _fpreset(), while boost::thread obviously doesn't. If you change the line

namespace boost { void tss_cleanup_implemented() { } }

to (formatted a little for clarity):

namespace boost 
{ 
    void tss_cleanup_implemented() 
    { 
        _fpreset(); 
    }
}

You will see that all values are exactly the same now (3EAABB3194A6E99A). That tells me that Boost doesn't do an _fpreset(). This call is necessary because some Windows API calls mess up the standard FPU settings C++Builder (32 bit) uses and don't set them back to what they were (this is a problem you can encounter in Delphi as well).

Both std::thread and boost::thread use Win32 API calls to handle threads.

Something tells me that you expected this already, hence the test with print_number_2() which does an _fpreset().

Farman answered 15/7, 2016 at 19:38 Comment(6)
Is this still an issue when using a 64-bit version of Windows? I would only hope that they stop using the x87 FP stack.Sideboard
I didn't try this on 64 bit yet, but I would guess it isn't. I'll check.Farman
@Tim: no, it is not an issue. The same value in each test, but now it is 3EAABB3194A6E998! I guess because for Win64, the C++Builder compiler uses SSE, and that does not have 80 bit intermediates.Farman
This answer correctly identifies the problem, so I marked it as accepted. However, tss_cleanup_implemented() seems to be the wrong place to add this. It is only called for the first boost::thread, which can be seen by setting a breakpoint in the debugger. If you add more boost threads that are started like t2, they will still give incorrect results.Meuser
I know about tss_cleanup_implemented() and why it exists, but for this sample project, it was the easiest way to insert _fpreset(). In real life, it should be in the Boost sources, with a conditional for the Embarcadero compiler.Farman
I see. I have made a bug report to Boost: svn.boost.org/trac/boost/ticket/12330Meuser
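
Since the reset has to run in every new thread, one possible workaround until the library handles it is to wrap each thread function so that it resets the FPU state before doing any work. The sketch below is only an illustration: reset_fp_state, fp_reset_wrapper and with_fp_reset are hypothetical names, not part of Boost or the C++Builder runtime. It calls _fpreset() on Windows and, as an assumption about a reasonable equivalent, std::fesetenv(FE_DFL_ENV) elsewhere.

```cpp
#include <cfenv>
#include <utility>
#if defined(_WIN32)
#include <float.h>  // _fpreset()
#endif

// Put the calling thread's floating-point state back to the runtime default.
inline void reset_fp_state()
{
#if defined(_WIN32)
    _fpreset();
#else
    std::fesetenv(FE_DFL_ENV);  // assumed stand-in for other platforms
#endif
}

// Callable adapter: resets the FPU state, then runs the wrapped callable.
template <typename Callable>
class fp_reset_wrapper
{
public:
    explicit fp_reset_wrapper(Callable callable)
        : callable_(std::move(callable))
    {
    }

    void operator()()
    {
        reset_fp_state();
        callable_();
    }

private:
    Callable callable_;
};

// Helper that deduces the callable type.
template <typename Callable>
fp_reset_wrapper<Callable> with_fp_reset(Callable callable)
{
    return fp_reset_wrapper<Callable>(std::move(callable));
}
```

With this in place, a thread from the question would be started as boost::thread t2(with_fp_reset(&print_number)); so the reset runs inside the new thread, much as print_number_2() does by hand.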
F
5

This: *reinterpret_cast<unsigned long long*>(&v) is undefined behaviour, as v is not an unsigned long long. If you want to copy the binary representation of a double to an integral type, use memcpy(). Note that, even with memcpy(), what the binary representation looks like is implementation-defined, but you're guaranteed that you can 'load back what you've saved'. Nothing more AFAIK.
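
A minimal sketch of the memcpy() approach, written as a small generic helper. The name copy_bits is invented for this example; C++20 offers std::bit_cast for the same purpose, but that is not available with the compiler discussed here.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copy the object representation of `source` into a value of type To.
// Unlike dereferencing a reinterpret_cast'ed pointer, this does not
// violate the aliasing rules.
template <typename To, typename From>
To copy_bits(const From& source)
{
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination must have the same size");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To destination;
    std::memcpy(&destination, &source, sizeof destination);
    return destination;
}
```

For example, copy_bits<std::uint64_t>(v) yields the integer to print with %llX (after a cast to unsigned long long), and copy_bits<double>() applied to that integer gives back exactly the original v.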

Flesh answered 14/7, 2016 at 13:29 Comment(10)
And why are the values different when the double value is printed in decimal form? In this example I used the internal representation just to show that this is not a bug in printf.Meuser
@Ville-ValtteriTiittanen This answer is only pointing out a bug in your code, it doesn't mean that this bug is the source of your problem. Fixing the bug is a prerequisite for any further investigation, even if it appears not to "change" anything.Memnon
If this answer is only pointing out a bug in the code, then it should have been a comment. The bug is not responsible for the difference in the results (in 32 bit C++Builder, an unsigned long long is the same size as a double), so this does not answer the question.Farman
@RudyVelthuis: being the same size doesn't mean it's not UB. It's still UB. Also, in the second part I explicitly wrote that you've only got the 'load back what you've saved' warranty, nothing else. In particular, for doubles there's no warranty that your implementation will use the same precision everywhere or that the result of two mathematically equivalent calculations will be the same. That's for Std. C++.Flesh
I agree it is UB. But in this implementation, it does not change the result and it does not explain the difference. I just tried with memcpy() from v to an unsigned long long, and the result is the same. So this UB is not the cause for the difference between the uses of boost::thread and std::thread.So again, I think this should have been a comment, not an answer.Farman
@RudyVelthuis: please read my last two sentences as well (both in the answer and in the previous comment) Thanks.Flesh
@lorro: I did. Did you see how I changed the question to use memcpy() as you suggested and how this did not change anything? I ran the code here, both using reinterpret_cast<>() and memcpy() and I see the same results. The behaviour may have been undefined, but not undetermined and it was completely irrelevant to the problem. The problem is that you see different results in boost::thread and in std::thread. The hex display was merely to demonstrate how much the results differ (1 ulp), and could have been omitted altogether. I still think this should not be an answer, as it isn't.Farman
@RudyVelthuis: 'In particular, for doubles there's no warranty that your implementation will use the same precision everywhere or that the result of two mathematically equivalent calculations will be the same. That's for Std. C++.', post: 'Note that, even with memcpy(), it's implementation defined how the binary representation will look like, but you're guaranteed that you can 'load back what you've saved'. Nothing more AFAIK.'Flesh
This is not about "everywhere", so that doesn't matter. This is a question about the new, Clang-based 32 bit compiler for C++Builder. In Win32, the format of a double is known to be IEEE 754 compliant, a little-endian 64 bit type. The problem probably does not appear in other compilers, so "everywhere" does not count.Farman
Anyway, pointing to UB should have been a comment, not an answer. It certainly does not answer the question of why there is a difference in results, which would also exist if the hex display was not given at all.Farman
B
4

This isn't a difference between 64 and 53 bit precision FPU calculations, it is a difference in ROUNDING. The only difference between the two results is in the least significant bit of the answer. It looks like boost's thread start code is not properly initializing the FPU flags, and the default rounding mode is down or chop, rather than nearest.

If this is the case, then it could be a bug in boost::thread. It could also come about if another library is changing the FPU flags (via _controlfp_s or a similar function), or if the new thread is part of a thread pool, a previous user of the thread changed the flags, and the pool did not reset them before reusing the thread.

Barner answered 14/7, 2016 at 16:13 Comment(1)
Not a previous use of the thread. Win32 API functions tend to use different FPU settings from those in C++Builder (and Delphi). Thread implementations on Win32 use such API functions. The C++Builder STL knows this, and resets the FPU. Boost does not. Hence the differences.Farman
S
1

To whit, you need a better compiler.


This seems to happen because boost::thread by default uses 53-bit internal precision for floating-point math, while the main thread uses 64-bit precision. If the FPU state is reset with _fpreset() after the boost::thread has been created, the result is the same as in the main thread.

This is insane. If your compiler is using a different FP unit (i.e., x87 vs SSE) for different regions of code, you should burn that compiler with the biggest fire you can find.

Running this code under g++-6.1 and clang++-3.8 on Linux Mint 17.3, gives identical results for each thread type.

#include <thread>
#include <boost/thread.hpp>
#include <cstdio>
#include <cmath>

void print_number() {
    double a = 5.66;
    double b = 0.0000001;
    double c = 500.4444;
    double d = 0.13423;
    double v = std::sin(d) * std::exp(0.4 * a + b) / std::pow(c, 2.3);

    printf("%llX\n%0.25f\n", *reinterpret_cast<unsigned long long*>(&v), v);
}

int main() {
    print_number();

    std::thread t1(&print_number);
    t1.join();

    boost::thread t2(&print_number);
    t2.join();
}

CXX -std=c++14 -O3 -o test test.c -pthread -lboost_thread -lboost_system

3EAABB3194A6E999
0.0000007966525939409086685

3EAABB3194A6E999
0.0000007966525939409086685

3EAABB3194A6E999
0.0000007966525939409086685


As @lorro noted in his/her answer, you are breaking the aliasing rules in the reinterpret_cast.

Sideboard answered 14/7, 2016 at 14:8 Comment(11)
Re: "To whit [sic], you need a better compiler." Since std::thread works as expected and boost::thread doesn't, it seems much more likely that boost::thread is doing something peculiar than that the compiler is.Repletion
Yet, using two of the three top-tier C++ compilers (the third being MSVC which I don't have access to) provide consistent results across implementations of std::thread.Sideboard
I don't see what your point is. Yes, std::thread works fine everywhere, as far as has been reported. It's boost::thread that doesn't work. There's nothing here that justifies blaming the compiler.Repletion
By switching compilers and using the same boost implementation, I see consistent results between std::thread (at least two different implementations) and boost::thread indicating that it is not the Boost implementation that is at fault.Sideboard
Non sequitur. Even if there's something wrong in what the compiler does (and there's nothing here to indicate that that's the case), it's the responsibility of the library to work around it. Since floating-point settings seem to be wrong for boost::thread, it's not doing what it's supposed to do.Repletion
I have not tested other compilers, but if this is Boost bug, some things could hide the problem from you: 1) you are using -O3, maybe your computer does the whole calculation compile-time and prints a constant value, 2) there might be some compiler or OS specific code in Boost headers that causes this.Meuser
@Ville-ValtteriTiittanen I can rule out (1) because I looked at the emitted assembly and the cmath routines are being called. This makes sense as they are not constexpr. (2) is an open possibility. For example, I didn't consider if the OP is using a 32-bit environment.Sideboard
@PeteBecker Embarcadero claims to be based on Clang. Yet Clang++-3.8 has no issue building correct code for Boost::Thread on my system. As I noted I don't have access to a Windows machine, so I will hold out judgement on that platform until someone can build it there with clang++-3.8. "Since floating-point settings seem to be wrong for boost::thread" Conclusion assumed without evidence. If this were true, then my results would have been the same as the OP's (unless Clang has a yet-unknown Windows-specific bug for Boost::Thread).Sideboard
Go for it, Tim. You have not presented any evidence that the compiler is at fault, just hand-waving arguments that it must be the case because Boost works okay with some other compiler. And, again, even if there's a bug in the Embarcadero compiler, it's the responsibility of the library to work around it. Your made-up explanations for why the compiler is insane simply don't hold water.Repletion
it's the responsibility of the library to work around it That's a strange view.Carbamate
It is not really a bug in Boost, it is an omission of calling _fpreset(), which is specific to C++Builder on Win32, since it uses different defaults for the FPU control word and Win32 API calls tend to change this. The STL for C++Builder was enhanced to call this function, the general Boost code has no knowledge of this.Farman
