double or float, which is faster? [duplicate]
Asked Answered
N

8

69

I am reading "Accelerated C++". I came across a sentence stating that "sometimes double is faster in execution than float in C++". After reading it, I'm confused about how float and double behave. Please explain this point to me.

Neddie answered 3/1, 2011 at 13:2 Comment(7)
Almost the same as: https://mcmap.net/q/95295/-float-vs-double-performanceGreff
@Devendra: That's C#, not C++.Choong
If you are reading "accelerated C++", the last thing you should be worrying about is which type is faster - focus on the concepts and when you have a real problem, then worry about it...Henton
@Hippo: Are you sure that the language make a difference?Greff
The float range is 1.175494351E−38 to 3.402823466E+38, while the double range is 2.2250738585072014E−308 to 1.7976931348623158E+308. The size and length vary accordingly. It has nothing to do with the language one is using.Greff
That other question is limited to intel CPUs. This is a more general question. So it's not exactly a duplicate.Projectile
Sure, the language makes a big difference.Greybeard
P
84

Depends on what the native hardware does.

  • If the hardware is (or is like) x86 with legacy x87 math, float and double are both extended (for free) to an internal 80-bit format, so both have the same performance (except for cache footprint / memory bandwidth).

  • If the hardware implements both natively, like most modern ISAs (including x86-64 where SSE2 is the default for scalar FP math), then usually most FPU operations are the same speed for both. Double division and sqrt can be slower than float, as well as of course being significantly slower than multiply or add. (Float being smaller can mean fewer cache misses. And with SIMD, twice as many elements per vector for loops that vectorize).

  • If the hardware implements only double, then float will be slower if conversion to/from the native double format isn't free as part of float-load and float-store instructions.

  • If the hardware implements float only, then emulating double with it will cost even more time. In this case, float will be faster.

  • If the hardware implements neither, both have to be emulated in software. In this case, both will be slow, but double will be slightly slower (more load and store operations at the least).

The quote you mention probably refers to the x86 platform, where the first case applied. But this doesn't hold true in general.

Also beware that x * 3.3 + y for float x,y will trigger promotion to double for both variables. This is not the hardware's fault, and you should avoid it by writing 3.3f to let your compiler make efficient asm that actually keeps numbers as floats if that's what you want.
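The promotion pitfall in the last paragraph can be demonstrated directly. This is a minimal sketch (the function names are made up for illustration) showing that the type of the expression, and therefore the code the compiler must generate, differs depending on whether the literal is 3.3 or 3.3f:

```cpp
#include <type_traits>

// x * 3.3 + y: the double literal 3.3 promotes the whole expression to
// double, so the compiler must widen x and y and narrow the result back.
float scale_promoting(float x, float y) {
    return x * 3.3 + y;   // computed in double, then rounded to float
}

// x * 3.3f + y: the computation stays in float throughout.
float scale_float(float x, float y) {
    return x * 3.3f + y;
}

// The expression type differs even though both functions return float:
static_assert(std::is_same<decltype(1.0f * 3.3  + 1.0f), double>::value,
              "double literal promotes the expression");
static_assert(std::is_same<decltype(1.0f * 3.3f + 1.0f), float>::value,
              "float literal keeps it in float");
```

The two versions can even produce results that differ in the last bit, because the promoting version rounds once in double and again when narrowing back to float.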

Projectile answered 3/1, 2011 at 13:13 Comment(10)
AFAIK x86 actually has 80bit registers, not floats nor doubles.Unsteady
Additionally, it depends on the amount of data you are processing. With large matrices or arrays, the cache can start to have an effect on the performance.Crossway
@Bart, I've done tests before and basically double tends to win against float, even with large data sets. If you want to be sure you should do a benchmark, but basically float rarely wins on x86.Cottony
Even on x86, it's not quite that simple. The old x87 FPU uses 80-bit registers internally, which means that a conversion is required for both floats and doubles. But if you use SSE/SSE2, the CPU no longer uses 80-bit precision internally, so both floats and doubles are computed at their native precision.Manual
Whether you can actually use the 80 bit extended register depends, among other things, on your OS (Windows specifically makes you jump through some hoops). I recommend forgetting about this aspect, and choosing the data type by other criteria, like: What precision do you actually need? Leave the implementation details of this to compiler and optimiser unless you have a really good reason to hand-optimise these things yourself. (The only case I've ever had to was speed-optimised FFT on embedded hardware).Projectile
Addendum regarding 80 bit support: "Intel started discouraging the use of x87 with the introduction of the P4 in late 2000. AMD deprecated x87 since the K8 in 2003, as x86-64 is defined with SSE2 support; VIA’s C7 has supported SSE2 since 2005. In 64-bit versions of Windows, x87 is deprecated for user-mode, and prohibited entirely in kernel-mode." quoted from realworldtech.com/physx87/4Projectile
What about in terms of modern intel CPUs?Deloris
@Petah: If you need 80-bit floating-point, x86-64 still has hardware support for x87, even when running in 64-bit mode. There was misinformation going around for a while that 64-bit Windows didn't support MMX or x87 in 64-bit processes, but that is not the case. (And of course Linux / OS X have no problem either). MSVC might not help you create x87 code (they define long double as a 64-bit type), but if you need more than 64-bit precision but don't need 128-bit double-double, 80-bit long double is by far the fastest option.Tomfool
@foo: x86-64 has SIMD float (SSE) and double (SSE2). SSE2 is baseline for x86-64. Modern x86 CPUs have SIMD with the same performance per vector for float or double add/mul/FMA (thus twice the FLOPS for float because of twice the elements per vector). Mysticial has a detailed answer on How do I achieve the theoretical maximum of 4 FLOPs per cycle?. double division / sqrt is slower than float Floating point division vs floating point multiplicationTomfool
@YakovGalka: I finally got around to fixing this answer (which was completely wrong for x86 in both reason and conclusion, and omitted the other common case of HW that natively supports both). I (or someone else) should have done that years ago instead of just commenting! (Vote totals before my edit: 66 up / 5 down, including mine which I'm about to remove :/)Tomfool
G
38

You can find a complete answer in this article:

What Every Computer Scientist Should Know About Floating-Point Arithmetic

This is a quote from a previous Stack Overflow thread, about how float and double variables affect memory bandwidth:

If a double requires more storage than a float, then it will take longer to read the data. That's the naive answer. On a modern IA32, it all depends on where the data is coming from. If it's in L1 cache, the load is negligible, provided the data comes from a single cache line. If it spans more than one cache line there's a small overhead. If it's from L2, it takes a while longer; if it's in RAM, longer still; and if it's on disk, a huge time. So the choice of float or double is less important than the way the data is used.

If you want to do a small calculation on lots of sequential data, a small data type is preferable. Doing a lot of computation on a small data set would allow you to use bigger data types without any significant effect. If you're accessing the data very randomly, then the choice of data size is unimportant: data is loaded in pages / cache lines. So even if you only want a byte from RAM, you could get 32 bytes transferred (this is very dependent on the architecture of the system).

On top of all of this, the CPU/FPU could be superscalar (aka pipelined). So, even though a load may take several cycles, the CPU/FPU could be busy doing something else (a multiply, for instance) that hides the load time to a degree.
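The bandwidth point in that quote can be made concrete with a small sketch (the names here are illustrative, not from any library): in a streaming pass over a large array, the arithmetic per element is trivial, so the runtime is dominated by bytes moved, and double moves exactly twice as many as float:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Streaming sum: one trivial add per element. For arrays much larger than
// cache, the time is dominated by how many bytes must come in from RAM.
template <typename T>
double streaming_sum(const std::vector<T>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);  // accumulate in double
}

// Bytes that must cross the memory bus for n elements of the given size.
std::size_t bytes_moved(std::size_t n, std::size_t elem_size) {
    return n * elem_size;
}
```

For n elements, bytes_moved(n, sizeof(double)) is twice bytes_moved(n, sizeof(float)), which is why memory-bound loops often run close to twice as fast with float while compute-bound ones see little difference.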

Gabby answered 3/1, 2011 at 13:6 Comment(0)
V
21

Short answer is: it depends.

CPU with x87 will crunch floats and doubles equally fast. Vectorized code will run faster with floats, because SSE can crunch 4 floats or 2 doubles in one pass.

Another thing to consider is memory speed. Depending on your algorithm, your CPU could be idling a lot while waiting for the data. Memory intensive code will benefit from using floats, but ALU limited code won't (unless it is vectorized).

Vaisya answered 3/1, 2011 at 14:23 Comment(0)
V
8

I can think of three basic cases when doubles are faster than floats:

  1. Your hardware supports double operations but not float operations, so floats will be emulated in software and will therefore be slower.

  2. You really need the precision of doubles. If you use floats anyway, you will have to combine pairs of floats to reach similar precision, and emulating a true double with float pairs is slower than using native doubles in the first place.

  3. You do not necessarily need doubles, but your numeric algorithm converges faster thanks to the enhanced precision of doubles. Doubles might also offer enough precision to use a faster but numerically less stable algorithm.

For completeness' sake, I'll also give some reasons for the opposite case of floats being faster. You can see for yourself which reasons dominate in your case:

  1. Floats are faster than doubles when you don't need double's precision and you are memory-bandwidth bound and your hardware doesn't carry a penalty on floats.

  2. They conserve memory-bandwidth because they occupy half the space per number.

  3. There are also platforms that can process more floats than doubles in parallel.
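Case 2 above (emulating extra precision with pairs of floats) can be illustrated with the classic error-free "two-sum" building block; this is Knuth's formulation, shown here only as a sketch of why such emulation is expensive. A single extended-precision addition already costs six float additions where native double hardware needs one:

```cpp
#include <utility>

// Knuth's TwoSum: computes s = fl(a + b) together with the exact rounding
// error e, so that s + e == a + b exactly (barring overflow). Six float
// additions replace the single add that native double hardware would do --
// the core cost of float-pair emulation.
std::pair<float, float> two_sum(float a, float b) {
    float s  = a + b;
    float ap = s - b;        // a as actually represented inside s
    float bp = s - ap;       // b as actually represented inside s
    float da = a - ap;       // rounding error contributed by a
    float db = b - bp;       // rounding error contributed by b
    return {s, da + db};     // (sum, error term)
}
```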

Vicegerent answered 3/1, 2011 at 13:6 Comment(1)
Because I am repeatedly getting uncommented downvotes I decided to amend my answer. The new stuff is in the first part of the answer.Vicegerent
N
8

On Intel, the coprocessor (nowadays integrated) will handle both equally fast, but as some others have noted, doubles result in higher memory bandwidth which can cause bottlenecks. If you're using scalar SSE instructions (default for most compilers on 64-bit), the same applies. So generally, unless you're working on a large set of data, it doesn't matter much.

However, parallel SSE instructions will allow four floats to be handled in one instruction, but only two doubles, so here float can be significantly faster.
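The four-versus-two ratio follows directly from the register width; a minimal compile-time sketch (the helper name is made up for illustration):

```cpp
#include <cstddef>

// An SSE register is 128 bits = 16 bytes, so one packed instruction
// processes 16 / sizeof(T) elements at once.
constexpr std::size_t kSseRegisterBytes = 16;

template <typename T>
constexpr std::size_t lanes_per_sse_register() {
    return kSseRegisterBytes / sizeof(T);
}

static_assert(lanes_per_sse_register<float>()  == 4, "four floats per instruction");
static_assert(lanes_per_sse_register<double>() == 2, "two doubles per instruction");
```

(With AVX the registers are 32 bytes, so the counts become 8 and 4, but the two-to-one ratio in favor of float stays.)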

Nesmith answered 3/1, 2011 at 13:19 Comment(0)
R
6

In an experiment adding 3.3 two billion (2,000,000,000) times, the results were:

Summation time in s: 2.82 summed value: 6.71089e+07 // float
Summation time in s: 2.78585 summed value: 6.6e+09 // double
Summation time in s: 2.76812 summed value: 6.6e+09 // long double

So double is faster and is the default in C and C++. It's more portable and the default across all C and C++ library functions. Also, double has significantly higher precision than float.

Even Stroustrup recommends double over float:

"The exact meaning of single-, double-, and extended-precision is implementation-defined. Choosing the right precision for a problem where the choice matters requires significant understanding of floating-point computation. If you don't have that understanding, get advice, take the time to learn, or use double and hope for the best."

Perhaps the only case where you should use float instead of double is on 64-bit hardware with a modern gcc, because float is smaller: double is 8 bytes and float is 4 bytes.
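Note that the float result above (6.71089e+07 instead of ~6.6e+09) is a precision artifact, not a timing one: a float sum of 3.3 stalls at 2^26. A small sketch (the helper name is illustrative) shows the stall:

```cpp
// A float has a 24-bit significand. At 2^26 (67108864 ~= 6.71089e+07) the
// spacing between adjacent floats is 8.0, so adding 3.3f rounds back to the
// same value and the running sum stops growing -- exactly the float result
// printed above. A double sum keeps growing toward the correct ~6.6e9.
bool addition_stalls(float sum, float addend) {
    return sum + addend == sum;  // true once addend is below half a ULP of sum
}
```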

Roughdry answered 18/3, 2012 at 18:41 Comment(6)
well let's hope for the best thenCephalad
Double has higher accuracy than float and uses more memory: double is 8 bytes and float is 4 bytes. Float is fastest for memory writes. I don't know what your test looks like, but timing is noisy.Excide
This experiment would need to be run on all of our machines, since the other answers show that it's hardware-dependent.Frenchpolish
Also, you should consider using a hypothesis test to demonstrate that the differences are statistically significant. (Will this ordering always be the same? Will you get the same timings if you re-ran it?)Frenchpolish
I suspect your CPU wasn't up to full clock speed for the float test, probably because you didn't include any "warm up" in your benchmark. Idiomatic way of performance evaluation?. float should be the same speed as double on a normal C++ implementation on x86 or ARM or whatever. Unless you do it wrong and do the float version in a way that has to convert to double and back because in C++ 3.3 is a double constant, unlike 3.3f. But if that was the case, you'd expect a bigger slowdown.Tomfool
Not that you shouldn't use double, just that being faster is not a reason. (Unless you misuse C++ and make the compiler convert to double and back by writing things like x * 3.3 + y.)Tomfool
T
1

float is usually faster. double offers greater precision. However, performance may vary in some cases if special processor extensions such as 3DNow! or SSE are used.

Tiffanytiffi answered 3/1, 2011 at 13:8 Comment(1)
If you can use SIMD (like SSE or whatever), float is definitely going to be even more faster: more work per instruction, as well as smaller cache footprint / lower memory bandwidth.Tomfool
E
1

There is only one reason 32-bit floats can be slower than 64-bit doubles (or 80-bit 80x87), and that is alignment. Other than that, floats take less memory, which generally means faster access and better cache performance. It also takes fewer cycles to process 32-bit instructions. And even when the (co)processor has no 32-bit instructions, it can perform them on 64-bit registers at the same speed. It is probably possible to create a test case where doubles are faster than floats, and vice versa, but my measurements of real statistics algorithms didn't show a noticeable difference.

Enameling answered 3/1, 2011 at 16:24 Comment(2)
You seem to assume memory access would cost no time. But from my experience (and the data sheets of all hardware I've seen), it does.Projectile
80-bit x87 loads/stores are significantly slower than 32-bit or 64-bit float/double fld / fstp. e.g. Skylake fstp tbyte is 7 uops, with throughput of 1 per 5 cycles vs. 1 uop and 1 per clock for normal float/double stores. See this answer for moreTomfool
