Virtually any "code optimization" done by a compiler that computes the answer more quickly than the non-optimized code is "energy saving". (As another poster observed, avoiding cache misses is a big win.) So the real question is, "what optimizations are explicitly intended to save energy, as opposed to reducing execution time?" (Note: some "optimizations" reduce code footprint size by abstracting repeated sequences of code into subroutines, etc.; this may actually cost more energy, because the extra call/return instructions themselves take time and energy to execute.)
An unusual one, which I have not seen in any compiler, is changing the representation of the data. It turns out that the cost of storing/transmitting a zero bit differs from the cost of storing a one bit. (My experience with TTL and CMOS is that "zeros" are more expensive, because they are implemented in hardware as a kind of "active pull-down" through a resistor from the power supply, causing current flow and thus heat, whereas "ones" are implemented by letting the signal "float high" through the same pull-down.) If there is such a bias, then one should arrange the program code and data to maximize the number of one bits rather than zero bits.
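As a quick sketch of how one might check whether such a bias is worth exploiting, here is a small measurement of the fraction of one bits in a data buffer; `bit_bias` is an illustrative name, not an existing API:

```python
def bit_bias(data: bytes) -> float:
    """Fraction of bits that are one across a buffer of bytes.

    If this is well below 0.5, and one bits really are cheaper to
    store/transmit, complementing the representation would pay off.
    """
    total_bits = len(data) * 8
    ones = sum(bin(b).count("1") for b in data)
    return ones / total_bits if total_bits else 0.0

# Small integers are almost all zero bits, so the bias is very low:
print(bit_bias(bytes([0, 1, 2, 3, 4])))  # 5 one bits out of 40 -> 0.125
```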
For data, this should be relatively straightforward to do. See this paper for a very nice survey and analysis of the values found in memory; it contains some pretty wonderful charts. A common theme is that a large number of memory locations are occupied by members of a small set of distinct values. In fact, a very small number of values (up to 8) occupy up to 48% of memory locations, and they are often very small numbers (the paper shows for some programs that a significant fraction of the data transfers are for small values, e.g., 0 to 4, with zero being the most common value). If zeros are truly more expensive to store/transfer than ones, such small common values suggest storing values in ones' complement format. This is a pretty easy optimization to implement. Since the frequent values are not always the smallest N naturals, one could replace the Nth most frequent value in memory with N, store the complement of N, and do a lookup of the actual value closer to the processor. (The paper's author suggests a hardware "value reuse" cache, but that's not a compiler optimization.)
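A minimal sketch of the two ideas above, ones'-complement storage plus a frequency-ranked value table, assuming 8-bit storage cells for concreteness; the names and widths are illustrative, not from the paper:

```python
from collections import Counter

WIDTH = 8
MASK = (1 << WIDTH) - 1

def complement_store(value: int) -> int:
    """Store the ones' complement, so frequent small values become mostly ones."""
    return value ^ MASK

def complement_load(stored: int) -> int:
    """Complement again to recover the original value."""
    return stored ^ MASK

def rank_table(memory, k=8):
    """The k most frequent values; the Nth most frequent maps to index N."""
    return [v for v, _ in Counter(memory).most_common(k)]

memory = [0, 0, 0, 4, 4, 1, 200, 0]
table = rank_table(memory, k=4)

# 0 is the most frequent value, so its table index is 0, and the
# complemented index 0xFF (all one bits) is what actually hits the wire.
# A real scheme needs an escape for values missing from the table.
stored = complement_store(table.index(0))
assert stored == 0xFF
assert table[complement_load(stored)] == 0
```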
This is a bit harder to arrange for program code, since the instruction set determines what you can say, and the instruction set was usually designed independently of any energy measurements. Yet one could choose among different instruction sequences (that's what optimizers do) and maximize the number of one bits in the instruction stream. I doubt this is very effective on conventional instruction set opcodes. One certainly could place variables at addresses containing large numbers of one bits, and prefer registers with higher numbers over lower ones (on the x86, EAX is binary register number 000 and EDI is register number 111). One could go so far as to design an instruction set according to instruction execution frequencies, assigning opcodes with larger numbers of one bits to the most frequently executed instructions.
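A toy illustration of picking between equivalent encodings by popcount: the two byte sequences below are real x86 encodings of `xor r32, r32` (the register numbers live in the ModRM byte), but the selection scheme itself is hypothetical.

```python
def popcount(encoding: bytes) -> int:
    """Number of one bits in an instruction's byte encoding."""
    return sum(bin(b).count("1") for b in encoding)

# Two equivalent ways to zero a register, with their x86 encodings.
# "xor eax, eax" is 31 C0 (reg 000); "xor edi, edi" is 31 FF (reg 111):
# the EDI form carries many more one bits on the instruction bus.
candidates = {
    "xor eax, eax": bytes([0x31, 0xC0]),
    "xor edi, edi": bytes([0x31, 0xFF]),
}
best = max(candidates, key=lambda name: popcount(candidates[name]))
print(best)  # the EDI form wins under a "maximize one bits" metric
```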