Approximate Number of CPU Cycles for Various Operations
R

4

12

I am trying to find a reference for approximately how many CPU cycles various operations require.

I don't need exact numbers (as this is going to vary between CPUs) but I'd like something relatively credible that gives ballpark figures that I could cite in discussion with friends.

As an example, we all know that floating point division takes more CPU cycles than, say, a bit shift.

I'd guess that a division takes around 100 cycles, whereas a shift takes 1, but I'm looking for something to cite to back that up.

Can anyone recommend such a resource?

Rainband answered 23/4, 2010 at 22:50 Comment(0)
E
4

For x86 processors, see Intel® 64 and IA-32 Architectures Optimization Reference Manual, probably Appendix C.

However, it is not at all easy to work out how many cycles an instruction takes to execute on a modern x86 processor. The answer depends heavily on factors such as whether the data is in cache, whether the access is aligned, whether branch prediction fails, whether the instruction pipeline stalls, and quite a lot of other things.

Elyn answered 23/4, 2010 at 23:27 Comment(0)
W
3

I built a small, very approximate app to test this, using SynthMaker free edition. In the table below, "e" is for empty, and the numbers are very approximate cycle counts.

  divide|e:115|10
    mult|e: 48|10
     add|e: 48|10
    subs|e: 50|10
compare>|e: 50|10
     sin|e:135|10

The readings in the cycle analyser vary wildly from 50 to 100, usually single or double the expected amount, so the figures above are averages. The cycle analyser is a very rough tool, but it gives fair results. For example, a user-made workaround exponent coded in ASM, which calculates both the exponent and the base at audio rate, comes to around 800 cycles, so I'd say the figures above are accurate to within roughly 50 percent. I thought the divide would cost far more! It seems to be only about twice as much. If you want the file I made, to run in the SM free version, mail me. I was going to save an exe, which is why I built it, but you can't save in the free version, silly me! I am not going to code it from square one in version 1.17 :/ ant.stewart at the place yahoo dotty com.

Wagner answered 24/4, 2011 at 10:11 Comment(1)
Why is MULT around the same latency as ADD? I heard integer MULT is usually 3 times slower than ADD in modern CPUs. However, floating point MULT might have the same speed as floating point ADD. Vivienne
T
2

There is the research done by Agner Fog:

  1. Instruction tables

Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD, and VIA CPUs.

Last updated 2022-11-04

Therianthropic answered 27/4, 2021 at 20:55 Comment(0)
A
1

This is going to be hardware-dependent. The best thing to do is to run some benchmarks on the particular hardware you want to test.

A benchmark would go roughly like this:

  • Run a primitive operation a million times (say, adding two integers)
  • Record the time it took to run (say, in seconds)
  • Multiply by the number of cycles your machine executes per second - this will give you the total number of cycles spent.
  • Divide 1000000 by the number from the previous step - this will give you the number of operations per cycle (invert it to get cycles per operation). Keep in mind that with pipelining and superscalar execution, the cycles per operation can come out below 1.
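
The steps above could be sketched roughly as follows. This is only an illustration of the arithmetic, not a precise measurement tool: the 3 GHz clock rate is an assumed placeholder (substitute your own machine's frequency, and note that frequency scaling makes it non-constant), the function names are made up for this example, and in Python the interpreter overhead dwarfs the cost of a single ADD instruction. Subtracting the time of an empty loop removes some of the loop overhead, but for instruction-level figures you would want C or assembly and a hardware cycle counter.

```python
import time

# Assumption: a 3 GHz core. Replace with your machine's actual clock rate.
ASSUMED_CLOCK_HZ = 3.0e9


def seconds_to_cycles(seconds, clock_hz=ASSUMED_CLOCK_HZ):
    """Step 3: elapsed seconds times cycles per second gives total cycles."""
    return seconds * clock_hz


def cycles_per_op(seconds, n_ops, clock_hz=ASSUMED_CLOCK_HZ):
    """Step 4, inverted: total cycles divided by the number of operations."""
    return seconds_to_cycles(seconds, clock_hz) / n_ops


def time_add_loop(n):
    """Steps 1 and 2: run an integer add n times and return elapsed seconds."""
    a, b = 3, 4
    start = time.perf_counter()
    for _ in range(n):
        c = a + b  # the operation under test
    return time.perf_counter() - start


def time_empty_loop(n):
    """Baseline: the same loop with no operation, to estimate loop overhead."""
    start = time.perf_counter()
    for _ in range(n):
        pass
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    total = time_add_loop(n)
    baseline = time_empty_loop(n)
    # Timing noise can make the difference negative, so clamp at zero.
    net = max(total - baseline, 0.0)
    print(f"raw:      {cycles_per_op(total, n):.1f} cycles per iteration")
    print(f"baseline: {cycles_per_op(baseline, n):.1f} cycles per iteration")
    print(f"net:      {cycles_per_op(net, n):.1f} cycles per add (very rough)")
```

Repeating the run several times and taking the minimum usually gives a steadier figure than a single run, since background activity only ever adds time.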
Asti answered 23/4, 2010 at 23:12 Comment(1)
How can you eliminate the time used to run the 1000000 cycles, and the register allocation used for storing the number of times, branch prediction, etc.? Acree
