How much should I worry about the Intel C++ compiler emitting suboptimal code for AMD?

We've always been an Intel shop. All the developers use Intel machines, the recommended platform for end users is Intel, and if end users want to run on AMD it's their lookout. Maybe the test department had an AMD machine somewhere to check we didn't ship anything completely broken, but that was about it.

Up until a few years ago we just used the MSVC compiler, and since it doesn't really offer a lot of processor tuning options beyond SSE level, no one worried too much about whether the code might favour one x86 vendor over another. However, more recently we've been using the Intel compiler a lot. Our stuff definitely gets some significant performance benefits from it (on our Intel hardware), and its vectorization capabilities mean less need to go to asm/intrinsics. However, people are starting to get a bit nervous about whether the Intel compiler may actually not be doing such a good job for AMD hardware. Certainly if you step into the Intel CRT or IPP libraries you see a lot of cpuid queries, apparently to set up jump tables to optimised functions. It seems unlikely Intel goes to much trouble to do anything good for AMD's chips, though.
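For reference, the pattern I'm describing looks roughly like the sketch below. This is not Intel's actual dispatcher, just a simplified illustration of how a CPUID-driven jump table can be keyed either off feature bits or off the vendor string; all the function names are made up.

    // Simplified illustration of CPUID-based dispatch (NOT Intel's actual code).
    // Uses the MSVC/ICC __cpuid intrinsic from <intrin.h>.
    #include <intrin.h>
    #include <cstddef>
    #include <cstdio>
    #include <cstring>

    struct CpuInfo {
        char vendor[13];   // "GenuineIntel", "AuthenticAMD", ...
        bool sse2;
        bool sse41;
    };

    CpuInfo query_cpu() {
        CpuInfo info = {};
        int regs[4];                          // EAX, EBX, ECX, EDX
        __cpuid(regs, 0);                     // leaf 0: vendor string in EBX, EDX, ECX
        std::memcpy(info.vendor + 0, &regs[1], 4);
        std::memcpy(info.vendor + 4, &regs[3], 4);
        std::memcpy(info.vendor + 8, &regs[2], 4);
        info.vendor[12] = '\0';
        __cpuid(regs, 1);                     // leaf 1: feature flags
        info.sse2  = (regs[3] & (1 << 26)) != 0;   // EDX bit 26
        info.sse41 = (regs[2] & (1 << 19)) != 0;   // ECX bit 19
        return info;
    }

    // Two placeholder implementations of the same routine.
    void copy_generic(char* d, const char* s, std::size_t n) { std::memcpy(d, s, n); }
    void copy_sse41  (char* d, const char* s, std::size_t n) { std::memcpy(d, s, n); /* imagine an SSE4.1 path */ }

    typedef void (*CopyFn)(char*, const char*, std::size_t);

    // Feature-based dispatch: any vendor with SSE4.1 gets the fast path.
    CopyFn pick_by_feature(const CpuInfo& c) {
        return c.sse41 ? copy_sse41 : copy_generic;
    }

    // Vendor-gated dispatch: the pattern people worry about -- a non-Intel part
    // falls through to the generic path even if it supports SSE4.1.
    CopyFn pick_by_vendor(const CpuInfo& c) {
        if (std::strcmp(c.vendor, "GenuineIntel") == 0 && c.sse41) return copy_sse41;
        return copy_generic;
    }

    int main() {
        CpuInfo c = query_cpu();
        std::printf("vendor=%s sse2=%d sse4.1=%d\n", c.vendor, (int)c.sse2, (int)c.sse41);
    }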

Can anyone with any experience in this area comment on whether it's a big deal or not in practice? (We've yet to actually do any performance testing on AMD ourselves.)

Update 2010-01-04: Well, the need to support AMD never became concrete enough for me to do any testing myself. There are some interesting reads on the issue here, here, and here, though.

Update 2010-08-09: It seems the Intel-FTC settlement has something to say about this issue - see the "Compilers and Dirty Tricks" section of this article.

Shemeka answered 8/5, 2009 at 12:55 Comment(5)
Is it your business's prerogative to be CPU-agnostic?Rachealrachel
I have been scouring the internet for the past several minutes looking for any evidence that Intel has actually done something to make code for AMD run slowly... however, I can find nothing but an endless flood of garbage articles from tech bloggers that have no idea what they are talking about. Not one benchmark in sight.Parapet
@Meta: Bear in mind all the real fuss about this was around 2004-2005. See links at the end of agner.org/optimize/blog/read.php?i=49#49 for links to things like groups.google.com/forum/?hl=en#!topic/comp.arch/… which reports a 22% boost to Intel from the "feature".Shemeka
@Shemeka - sure the big fuss was around 2004-2005 because that's when the issue was revealed and litigated in court. However, Intel didn't really back away from their strategy of checking the vendor string (rather than CPU feature detection) - they were just forced to put a disclaimer that their compilers optimize for Intel only, etc, etc - which you can find today as a footer across many of their websites. So as far as I know, even today in modern ICC they still often dispatch on the vendor string and only go down the fast paths for Intel chips.Clavicembalo
When I was deposed at the FTC after AMD and Intel had settled their lawsuit, the FTC lawyer demanded repeatedly that I agree Intel should never release a compiler, since AMD would always prevent Intel from testing new CPUs under development at AMD; we went past 7pm PST on this. This in spite of the fact that AMD had been most successful when judged on the performance of applications built in the past, before any of the current CPUs were available. If you micro-optimized for your favorite CPU of any brand, your extra work would be down the drain after 2 or 3 years.Dolichocephalic

Buy an AMD box and run it on that. That seems like the only responsible thing to do, rather than trusting strangers on the internet ;)

Apart from that, I believe part of AMD's lawsuit against Intel is based on the claim that Intel's compiler specifically produces code that runs inefficiently on AMD processors. I don't know whether that's true or not, but AMD seems to believe so.

But even if they don't willfully do that, there's no doubt that Intel's compiler optimizes specifically for Intel processors and nothing else.

That said, I doubt it'd make a huge difference. AMD CPUs would still benefit from all the auto-vectorization and other clever features of the compiler.
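For concreteness, the kind of code that benefits is a plain dense loop like the sketch below (names are illustrative). The vectorized instructions themselves run the same on either vendor; whether the vectorized path is actually reached on an AMD part is exactly the dispatch question discussed elsewhere in this thread.

    // The sort of loop an auto-vectorizer handles well; nothing here is written
    // against a specific vendor. (Illustrative sketch only.)
    #include <cstddef>

    void saxpy(float* y, const float* x, float a, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];    // candidate for SSE/AVX vectorization
    }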

Revetment answered 8/5, 2009 at 13:10 Comment(0)

I'm surely stating the obvious: if performance is crucial for your application, then you'd better do some testing, on all combinations of hardware and compiler. There are no guarantees. As outsiders, we can only give you our guesses and biases. Your software may have unique characteristics that are unlike what we've seen.

My experience:

I used to work at Intel, and developed an in-house (C++) application where performance was critical. We tried to use Intel's C++ compiler, and it always underperformed gcc, even after doing profile runs, recompiling using the profiled information (which icc supposedly uses to optimize), and re-running on the exact same dataset (this was in 2005-2007; things may be different now). So, based on my experience, you might want to try gcc (in addition to icc and MSVC); it's possible you will get better performance that way and side-step the question. It shouldn't be too hard to switch compilers (if your build process is reasonable).
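If you do compare compilers, a small self-contained harness like the sketch below is enough to get comparable numbers from MSVC, icc, and gcc on both Intel and AMD boxes. The dot-product workload is just a placeholder; swap in something representative of your own hot code.

    // Minimal compiler-agnostic timing harness (sketch; the workload is a stand-in).
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    double dot(const std::vector<double>& a, const std::vector<double>& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    int main() {
        const std::size_t n = 1 << 22;
        std::vector<double> a(n, 1.5), b(n, 2.5);

        double sink = 0.0;                       // keeps the loop from being optimized away
        auto t0 = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 100; ++rep) sink += dot(a, b);
        auto t1 = std::chrono::steady_clock::now();

        std::chrono::duration<double, std::milli> ms = t1 - t0;
        std::printf("checksum=%g time=%.1f ms\n", sink, ms.count());
    }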

Now I work at a different company, and the IT folks do extensive hardware testing. For a while Intel and AMD hardware was relatively comparable, but the latest generation of Intel hardware significantly outperformed the AMD offerings. As a result, I believe they purchased significant amounts of Intel CPUs and recommend the same for our customers who run our software.

But, back to the question of whether the Intel compiler specifically targets AMD hardware to make it run slowly: I doubt Intel bothers with that. It could be that certain optimizations that rely on knowledge of the internals of Intel CPU architectures or chipsets run slower on AMD hardware, but I doubt they specifically target AMD hardware.

Prerogative answered 8/5, 2009 at 17:8 Comment(1)
These days, clang does some things better than gcc. (But not everything). I definitely agree with the recommendation to try your whole codebase on each of the major compilers.Nu

What we have seen is that wherever the Intel compiler must make a runtime choice about the available instruction set, if it does not recognize an Intel CPU it falls back to their "standard" code path (which, as you might expect, may not be optimal).

Note that even though I used the word "compiler" above, this mainly happens in their supplied (pre-compiled) libraries and in the intrinsics that check the instruction set and call the best code.
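If you control the dispatch yourself, you can key it off feature flags rather than the vendor, so an AMD part that supports the instruction set still gets the fast path. Here is a minimal sketch using the GCC/Clang x86 builtins (illustrative only; MSVC and icc would need a __cpuid-based equivalent):

    // Vendor-neutral dispatch on feature flags (GCC/Clang, x86 targets only).
    #include <cstdio>

    void kernel_generic() { std::puts("generic path"); }
    void kernel_avx()     { std::puts("AVX path");     }

    int main() {
        __builtin_cpu_init();                     // populate CPU feature data
        if (__builtin_cpu_supports("avx"))        // check the feature, not the vendor
            kernel_avx();
        else
            kernel_generic();
    }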

Fidgety answered 5/1, 2010 at 17:2 Comment(0)

Sorry if you hit my general button.

This is on the subject of low-level optimization, so it only matters for code that 1) the program counter spends much time in, and 2) the compiler actually sees. For example, if the PC spends most of its time in library routines that you don't compile, it shouldn't matter very much.

Whether or not conditions 1 & 2 are met, here's my experience of how optimization goes:

Several iterations of sampling and fixing are done. In each of these, a problem is identified and most often it is not about where the program counter is. Rather it is that there are function calls at mid-levels of the call stack that, since performance is paramount, could be replaced. To find them quickly, I do this.

Keep in mind that if there is a function call instruction that is on the stack for a significant fraction of execution time, whether in a few long invocations or a great many short ones, that call is responsible for that fraction of time, so removing it or executing it less often can save a lot of time. And those savings far exceed anything low-level optimization can deliver.

The program can now be many times faster than it was to begin with. I've never seen any good-sized program, no matter how carefully written, that could not benefit from this process. If the process has not been done, it should not be assumed that low-level optimization is the only way to speed up the program.

After this process has been done to the point where it simply can't be done any more, and if samples show that the PC is in code that the compiler sees, then the low-level optimization can make a difference.

Unsound answered 11/5, 2009 at 11:51 Comment(2)
I don't see that this is applicable, though. The question is how code compiled with the Intel compiler works on AMD processors, not how to optimize by hand.Shalom
@David: timday's question was "how much should he worry", so that is what I was trying to answer. Yes for hotspot-type code he should worry. (agner.org/optimize/blog/read.php?i=49) I was trying to convey that hotspot-type code is rarer than one might think.Unsound

At the time this thread was started, Microsoft C++ defaulted to code generation that was good in some cases for AMD and bad for Intel. Their more recent compilers default to the blend option, which is good for both, particularly after both brands of CPU had worked out their peculiar performance bugs. When I first worked at Intel, their compilers reserved some optimizations for Intel-specific architecture settings. I guess that might have been a topic of some FTC depositions, although it didn't come up in my 10 hours of testimony, and the practice was already on the way out due to the convergence of optimization requirements between up-to-date CPU models and the need for more productive use of compiler development time. If you used one of those obsolete compilers on an up-to-date Intel CPU, you might see some of the same performance deficiencies.

Dolichocephalic answered 10/2, 2016 at 0:45 Comment(1)
This answer is wrong. The practice wasn't and isn't "on the way out" - even modern Intel compilers reserve the fastest code paths only for Intel chips based on the vendor string/CPU family/model, even when AMD chips support the same instruction set and have largely the same performance (i.e., would benefit from the fast path). See for example this thread which indicates even ICC 18 behaves this way. Finally, Intel has disclaimers across all of their software stating that they do this.Clavicembalo

It's pointless to worry if you can't act. The possible actions are: not buying AMD, or using a different compiler. So the obvious things to do are:

(1) Buy one AMD box, and measure the speed of the code compiled with the Intel compiler. Is it fast enough? If yes, you're done, you can buy AMD, don't worry.

(2) If no: Compile the code with a different compiler and run it on the AMD box. Is it fast enough? If no, you're done, you can't buy AMD, don't worry.

(3) If yes: Run the same code on an Intel box. Is it fast enough? If yes, you're done, you can buy AMD but have to switch compilers, don't worry.

(4) If no: Possibilities are: Don't buy AMD, throw all Intel computers out, or compile with two different compilers. Pick one.

Handling answered 14/7, 2014 at 15:2 Comment(0)

I have directly experienced purposeful crippling of technology when a vendor attempted to prevent a Lotus product from reaching market before their offering. A working technology was available, but Lotus was forbidden to use it. Ah well...

A few years back there were blog posts showing that patching a single byte in the Intel compiler caused it to emit "optimal" code that was not crippled when run on AMD. I have not looked for those blog entries in years.

I am inclined to believe that such competitive behavior continues. I have no other evidence to offer.

Clarissa answered 29/9, 2015 at 22:17 Comment(0)
