I actually did a study on this a few years ago. The answer depends on what exactly your question is:
In today's processors, power consumption is determined less by the type of instruction (scalar vs. SIMD) than by everything else around it, such as:
- Memory/cache
- Instruction decoding
- Out-of-order execution (OOE), register file
- And lots of other things.
So if the question is:
All other things being equal, does a SIMD instruction consume more power than a scalar instruction?
For this, I dare say yes.
One of my graduate school projects eventually became this answer: a side-by-side comparison of SSE2 (2-way SIMD) and AVX (4-way SIMD) did in fact show that AVX had noticeably higher power consumption and higher processor temperatures. (I don't remember the exact numbers, though.)
This is because the code was identical between the SSE2 and AVX versions; only the width of the instructions was different, so the AVX version did double the work per instruction.
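To make the "same code, only wider" point concrete, here's a minimal sketch of the kind of kernel I mean (not the actual project code), assuming double-precision adds since that's what the 2-way/4-way widths above correspond to:

```c
#include <immintrin.h>   /* SSE2 and AVX intrinsics; compile the AVX path with -mavx */
#include <stddef.h>

/* Hypothetical kernel: c[i] = a[i] + b[i] over n doubles.
 * n is assumed to be a multiple of 4 so neither loop needs a tail. */

void add_sse2(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i += 2) {           /* 2 doubles per iteration */
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(c + i, _mm_add_pd(va, vb));
    }
}

void add_avx(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {           /* 4 doubles per iteration */
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(c + i, _mm256_add_pd(va, vb));
    }
}
```

The two loops have the same structure instruction for instruction; the AVX one just covers the array in half as many iterations.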
But if the question is:
Will vectorizing my code to use SIMD make it consume more power than a scalar implementation?
There are numerous factors involved here, so I'll avoid a direct answer:
Factors that reduce power consumption:
- We need to remember that the point of SIMD is to improve performance. If you can improve performance, your app takes less time to run, which saves you power.
- Depending on the application and the implementation, SIMD will reduce the number of instructions needed to do a certain task, because you're doing several operations per instruction.
Factors that increase power consumption:
- As mentioned earlier, SIMD instructions do more work and can use more power than scalar equivalents.
- Use of SIMD introduces overhead not present in scalar code, such as shuffle and permute instructions, which also need to go through the instruction execution pipeline (see the example below this list).
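As an illustration of that kind of overhead (a made-up example, not from the study above), here is a simple array sum: the scalar loop is nothing but adds, while the SSE2 version needs an extra shuffle-type instruction at the end to combine the partial sums sitting in one register:

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical example of SIMD-only overhead: summing an array of doubles. */

double sum_scalar(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

double sum_sse2(const double *a, size_t n)   /* n assumed to be a multiple of 2 */
{
    __m128d acc = _mm_setzero_pd();
    for (size_t i = 0; i < n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(a + i));

    /* Horizontal step: move the high half of acc down and add it to the
     * low half. This instruction has no counterpart in the scalar loop. */
    __m128d hi = _mm_unpackhi_pd(acc, acc);
    return _mm_cvtsd_f64(_mm_add_sd(acc, hi));
}
```

For a long array that one extra instruction is negligible, but for short loops, or for code that needs shuffles inside the hot loop, the overhead is real.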
Breaking it down:
- Fewer instructions -> less overhead for issuing and executing them -> less power
- Faster code -> run less time -> less power
- SIMD takes more power to execute -> more power
So SIMD saves you power by making your app take less time. But while it's running, it consumes more power per unit time. Which effect wins depends on the situation.
In my experience, for applications that get a worthwhile speedup from SIMD (or any other method), the former usually wins and the power consumption goes down.
That's because run time tends to be the dominant factor in energy consumption for modern PCs (laptops, desktops, servers). The reason is that most of the power draw is not in the CPU, but in everything else: motherboard, RAM, hard drives, monitors, idle video cards, etc., most of which have a relatively fixed power draw.
For my computer, just keeping it on (idle) already draws more than half of what it can draw under an all-core SIMD load such as Prime95 or Linpack. So if I can make an app 2x faster by means of SIMD/parallelization, I've almost certainly saved power.
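To put some made-up but representative numbers on that: say the whole machine draws 100 W idle, 150 W during a scalar run, and 180 W during a SIMD run. If the scalar version takes 60 seconds, that's 150 W × 60 s = 9000 J. If the SIMD version takes 30 seconds, that's 180 W × 30 s = 5400 J, plus another 100 W × 30 s = 3000 J of idle for the rest of the minute, or 8400 J over the same one-minute window. The SIMD run comes out ahead even before you count racing to sleep or powering the machine off sooner.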