Virtually any "code optimization" done by a compiler that computes the answer more quickly than the non-optimized code is "energy saving". (As another poster observed, avoiding cache misses is a big win.) So the real question is, "what optimizations are explicitly intended to save energy, as opposed to reducing execution time?" (Note: some "optimizations" reduce code footprint size by abstracting repeated sequences of code into subroutines, etc.; this may actually cost more energy, because the extra call/return instructions themselves take time and energy to execute.)
An unusual one, which I have not seen in any compiler, is changing the representation of the data. It turns out that the cost of storing/transmitting a zero bit differs from the cost of storing a one bit. (My experience with TTL and CMOS is that "zeros" are more expensive, because they are implemented in hardware as a kind of "active pull-down" through a resistor from the power supply, causing current flow and thus heat, whereas "ones" are implemented by letting the signal "float high" through the same pull-down.) If there is such a bias, then one should arrange the program code and data to maximize the number of one bits rather than zero bits.
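As a quick sketch of how one might check whether such a bias is worth exploiting, here is a small measurement of the fraction of one bits in a data buffer; `bit_bias` is an illustrative name, not an existing API:

```python
def bit_bias(data: bytes) -> float:
    """Fraction of bits that are one across a buffer of bytes.

    If this is well below 0.5, and one bits really are cheaper to
    store/transmit, complementing the representation would pay off.
    """
    total_bits = len(data) * 8
    ones = sum(bin(b).count("1") for b in data)
    return ones / total_bits if total_bits else 0.0

# Small integers are almost all zero bits, so the bias is very low:
print(bit_bias(bytes([0, 1, 2, 3, 4])))  # 5 one bits out of 40 -> 0.125
```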
For data, this should be relatively straightforward to do. See this paper for a very nice survey and analysis of the values found in memory; it contains some pretty wonderful charts. A common theme is that a large number of memory locations are occupied by members of a small set of distinct values. In fact, a very small number of values (up to 8) occupy up to 48% of memory locations, and they are often very small numbers (the paper shows for some programs that a significant fraction of the data transfers are for small values, e.g., 0 to 4, with zero being the most common value). If zeros are truly more expensive to store/transfer than ones, such small common values suggest storing values in ones' complement format. This is a pretty easy optimization to implement. Since the frequent values are not always the smallest N naturals, one could replace the Nth most frequent value in memory with N, store the complement of N, and do a lookup of the actual value closer to the processor. (The paper's author suggests a hardware "value reuse" cache, but that's not a compiler optimization.)
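A minimal sketch of the two ideas above, ones'-complement storage plus a frequency-ranked value table, assuming 8-bit storage cells for concreteness; the names and widths are illustrative, not from the paper:

```python
from collections import Counter

WIDTH = 8
MASK = (1 << WIDTH) - 1

def complement_store(value: int) -> int:
    """Store the ones' complement, so frequent small values become mostly ones."""
    return value ^ MASK

def complement_load(stored: int) -> int:
    """Complement again to recover the original value."""
    return stored ^ MASK

def rank_table(memory, k=8):
    """The k most frequent values; the Nth most frequent maps to index N."""
    return [v for v, _ in Counter(memory).most_common(k)]

memory = [0, 0, 0, 4, 4, 1, 200, 0]
table = rank_table(memory, k=4)

# 0 is the most frequent value, so its table index is 0, and the
# complemented index 0xFF (all one bits) is what actually hits the wire.
# A real scheme needs an escape for values missing from the table.
stored = complement_store(table.index(0))
assert stored == 0xFF
assert table[complement_load(stored)] == 0
```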
This is a bit harder to arrange for program code, since the instruction set determines what you can say, and the instruction set was usually designed independently of any energy measurements. Yet one could choose among different instruction sequences (that's what optimizers do) and maximize the number of one bits in the instruction stream. I doubt this is very effective on conventional instruction set opcodes. One certainly could place variables at addresses containing large numbers of one bits, and prefer registers with higher numbers over lower ones (on the x86, EAX is binary register number 000 and EDI is register number 111). One could go so far as to design an instruction set according to instruction execution frequencies, assigning opcodes with larger numbers of one bits to the most frequently executed instructions.
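A toy illustration of picking between equivalent encodings by popcount: the two byte sequences below are real x86 encodings of `xor r32, r32` (the register numbers live in the ModRM byte), but the selection scheme itself is hypothetical.

```python
def popcount(encoding: bytes) -> int:
    """Number of one bits in an instruction's byte encoding."""
    return sum(bin(b).count("1") for b in encoding)

# Two equivalent ways to zero a register, with their x86 encodings.
# "xor eax, eax" is 31 C0 (reg 000); "xor edi, edi" is 31 FF (reg 111):
# the EDI form carries many more one bits on the instruction bus.
candidates = {
    "xor eax, eax": bytes([0x31, 0xC0]),
    "xor edi, edi": bytes([0x31, 0xFF]),
}
best = max(candidates, key=lambda name: popcount(candidates[name]))
print(best)  # the EDI form wins under a "maximize one bits" metric
```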