How can Intel and AMD be different but still compatible?
As I have always understood it, AMD built their CPUs by reverse engineering Intel's instruction set and now pays Intel to use it, while Intel does the same for AMD's 64-bit instructions.

This is how Windows can be installed on both types of CPU without needing a specific build (such as a version compiled for ARM), and why all apps, games etc. work the same way, running interchangeably across CPUs...

However lately some things have been making me question some of this...

Firstly, I've noticed some games have been a bit laggy on my system (AMD), and after reading around it turns out the game is optimised for Intel CPUs...

Also, OS X is sold only on Intel CPUs, but after discovering the Hackintosh community it turns out it is possible, though very hard, to get OS X running on AMD. This is because, again, OS X is designed for Intel...

After these things...

What does it mean to be optimised for Intel or AMD? How can it be possible to be different / optimised for one but not the other, if they are meant to be drop-in replacements for each other? I.e. both support the same instructions etc.

Flannel answered 23/6, 2016 at 0:57 Comment(3)
This site is for programming (code) and programmers tools related questions. What specific programming question can we help you with?Sensitive
AMD never "reverse engineered" anything, that's complete and utter nonsense -- internet BS. This myth is repeated over and over again -- without citing any credible source. Reverse engineering hardware like that would be highly illegal in many countries; in fact, if it were true, Intel would've already sued the crap out of AMD, probably even bought the entire company after suing it into bankruptcy. Just because some criminal activities might be possible doesn't make them legal. AMD created their own solutions whilst building upon shared knowledge. End of story.Shielashield
See also #38517323Protoplasm
They implement the same ISA, but with different performance characteristics because the microarchitecture is different.

e.g. see Agner Fog's microarch pdf for details, and other links from the tag wiki. e.g. David Kanter's Haswell microarchitecture writeup vs. his writeup of AMD Bulldozer.

Agner Fog's instruction tables also show you exactly how fast each instruction is on each CPU. e.g. imul r64, r/m64, imm32 is 6 cycle latency / one per 4c throughput on AMD Bulldozer-family. On Intel SnB-family, it's 3c latency with one per 1c throughput.

So when tuning for AMD, it would be worth replacing a 64-bit multiply by a constant with a couple of shifts/adds if possible. On Intel, it's maybe only worth it if you can get the job done in one or two shift/LEA instructions.

AMD's designs also have a notably weaker cache hierarchy, and lower single-threaded throughput due to using pairs of cores that are permanently split instead of Intel's Hyperthreading dynamic sharing of resources between two hardware threads on the same core. IIRC, AMD is planning to change that for their next microarchitecture. Some of this is stuff you can't really "optimize for", it's just AMD being slower. :(


So they run the same code, because that's what it means to be the same architecture.

Some CPUs support ISA extensions (new instructions) that the other doesn't. e.g. XOP is AMD-only, while AVX2 and BMI2 are (so far) Intel-only, so code that wants to use more than a common baseline has to check for support at runtime.

Wikipedia's AMD Excavator article is not very up to date. Hardware has been out for a while now, but the article still says it's "expected to have" AVX2 and BMI2. Agner Fog hasn't tested it and updated his instruction tables yet, either.

Protoplasm answered 23/6, 2016 at 2:19 Comment(1)
Also en.wikipedia.org/wiki/… / What is the compatible subset of Intel's and AMD's x86-64 implementations? / What EXACTLY is the difference between intel's and amd's ISA, if any? - not all the differences are performance and extensions supported, especially if you're doing OS development.Protoplasm
When I first saw this question it had more downvotes than upvotes. But I think it is a reasonable question related to system performance and the differences between AMD and Intel processors. I think there are a couple of points worth addressing.

ISA Licensing

As I have always understood it, AMD built their CPUs by reverse engineering Intel's instruction set and now pay Intel to use their instruction set, and Intel do the same for AMDs 64-bit instructions.

I don't know the full history of the AMD and Intel license agreement for x86, but this is a bit of an oversimplification. Currently, AMD and Intel have a cross-licensing agreement that allows both of them to implement the same ISA. For instance, the 64-bit extensions to the x86 ISA were developed by AMD back when Intel was pushing the Itanium ISA. Regardless, it is true that both AMD and Intel support the same core x86 ISA now, and their extensions to it are generally compatible with each other.

Overall performance

Firstly, I've noticed some games have been a bit laggy on my system (AMD) and after reading it turns out the game is optimised for Intel CPUs...

The overall performance of program execution depends on three basic things: the number of instructions that need to be executed, the frequency of the CPU (clock speed), and the number of instructions executed per cycle (IPC). Currently, high-end Intel CPUs tend to have better overall performance than AMD CPUs, even when executing the exact same application binary with no vendor-specific optimizations. So if the game is slow on your system, it's likely just that the CPU is slower, rather than that the game is optimized for a particular microarchitecture. There could also be other factors (the GPU tends to matter most for gaming), but debugging the performance of a game isn't going to be on-topic for Stack Overflow unless you are a game developer trying to understand a specific coding problem.

CPU Specific Optimizations

What does it mean to be optimised for Intel or AMD? How can it be possible to be different / optimised for one but not the other, if they are meant to be slot in replacements for each other? I.e both support same instructions etc.

Although Intel and AMD both develop CPUs that run x86 applications, the internal microarchitecture of the CPUs is different. And there is not simply one Intel microarchitecture or one AMD microarchitecture. Instead, each company develops various families of CPUs, each under a specific microarchitecture. So a program could be optimized for Skylake (an Intel microarchitecture) or Bulldozer (an AMD microarchitecture).

When the compiler is generating code it can make very minor tweaks that might benefit one microarchitecture more than another. If a developer doesn't know what the target CPU family is, then it might make sense not to target a specific microarchitecture and simply generate code that is expected to perform best overall. But if the developer knows which microarchitecture the program will run on, then it can be possible to get a slight performance improvement by specializing for that microarchitecture.

Usually these performance gains are pretty small compared to the baseline optimization. One exception is when a new feature like SSE4 is available. In that case it could make a big difference for certain workloads that are able to take advantage of the new feature. But even then the optimization is more specific to that feature than a specific processor vendor since both AMD and Intel support SSE4 now.

Ascariasis answered 24/6, 2016 at 18:12 Comment(5)
re: last paragraph: On AMD Bulldozer-family, 256b AVX typically doesn't gain you anything. The execution units are still only 128b wide, so ymm instructions decode to two m-ops. IIRC, only Piledriver and later can decode a 2-2 pattern of m-ops (instead of 2-1-1 or something), so a sequence of 2 m-op instructions is the worst case for decode throughput on Bulldozer. Piledriver has a horrible 256b store performance bug. Anyway, according to Agner Fog's guide, using 256b instructions doesn't make sense if you're tuning specifically for AMD. (He doesn't have results for Excavator yet, though)Protoplasm
re: first paragraph. I voted to migrate this to Superuser (not to close as off-topic for a re-ask on SU, since there are many answers). The question could have been on-topic here if it was asked differently, but the focus on "why is my AMD slow for games" makes it an SU question.Protoplasm
Interesting point about Bulldozer and AVX. I think my high-level point remains, but maybe something like SSE4 would have been a better example if AMD doesn't have good performance with AVX right now. As for whether this fits for SO I can see arguments against it. But I think a lot of computer architecture conceptual questions are borderline for SO. I agree figuring out why a game is slow would not be a good fit, but I took that as more of an example of performance differences rather than the focus of the question.Ascariasis
Yes, right idea, wrong example. SSE4.1 would be a perfect example. It's still not 100% universally supported, so most software can't use it except with runtime dispatching. But it has some very useful instructions for vectorizing stuff with 32bit integers, and roundps, and immediate blends. SSSE3 added pshufb, which is a game-changer (the first variable shuffle). IDK if games require it as a baseline, though. AMD K10 doesn't have it, so even PhenomII CPUs don't have it.Protoplasm
I edited my answer to use SSE4 instead of AVX since it sounds like that is a better example for now.Ascariasis
Software compatibility with processors is ensured by the fact that they can be queried for availability of certain well-defined instructions or instruction groups. (The instruction sets are extremely volatile these days; this can be a nightmare for developers.)

So even among the Intel family, programs can run at quite different performance, depending on what the processor supports and how the software exploits it.

Livelong answered 23/6, 2016 at 8:40 Comment(0)
Basically, there is a difference in the processing. AMD and Intel pay each other fees to use the other's patents. That does not mean both have the same design. The base instruction set is the same, but each has additional instructions specific to its CPUs, which are (at least mostly) emulated on the other's CPUs. As a result, software using Intel-specific (optimized) instructions might run slower on AMD than the other way round. Additionally, it is not guaranteed that all instructions are emulated on both CPUs; there can be slight differences.

Hope this clarifies it a little ;-)

Hockey answered 23/6, 2016 at 1:13 Comment(1)
No CPUs "emulate" any important instructions; if an instruction runs at all, it has hardware support. (Actually, bsf/bsr are much slower on AMD than Intel, but those instructions have existed since 386. So they actually are emulated in microcode to some degree).Protoplasm
SIMD instructions are very different, and for some tasks (like games) they can make a difference. See this answer for specific example: https://mcmap.net/q/22888/-fast-counting-the-number-of-set-bits-in-__m128i-register

If you really want to, you can create several versions of your inner-loop algorithms and use cpuid at runtime to select the best implementation for the platform. Some people do just that; e.g. the people developing the x264 video codec definitely do:

int x264_intra_satd_x9_4x4_ssse3( uint8_t *, uint8_t *, uint16_t * ); // Intel 2006+, AMD 2011+
int x264_intra_satd_x9_4x4_sse4( uint8_t *, uint8_t *, uint16_t * ); // Both around 2006 but slightly different instructions
int x264_intra_satd_x9_4x4_avx( uint8_t *, uint8_t *, uint16_t * ); // Intel 2011, AMD around 2012
int x264_intra_satd_x9_4x4_xop( uint8_t *, uint8_t *, uint16_t * ); // AMD only

For many projects, doing that, i.e. optimizing for all of them, is prohibitively expensive. So software tends to get optimized only for the most popular architecture[s].

This page http://store.steampowered.com/hwsurvey?platform=pc (click on Other Settings) shows that:

  • 99.95% have SSE3
  • 91.04% have SSSE3
  • 84.76% have SSE4.1
  • 81.60% SSE4.2
  • 67.56% AVX (mostly Intel, I think)
  • 22.05% SSE4a (that’s AMD only)

If you’re managing a project, and you have a choice how to spend your budget: would you specifically optimize your software for 67% users who have AVX or for 22% of users who have SSE4a?


AMD implemented SSE4a before it implemented SSSE3. 22.83% of users use AMD, and since 22.05% of users have SSE4a, it's safe to say nearly all AMD users have SSE4a. I think we can conclude that the majority of users without SSSE3 are AMD K10 users. This is the main reason that SSE3, and not SSSE3, is becoming the baseline.

Dune answered 23/6, 2016 at 1:54 Comment(4)
AMD Bulldozer-family supports AVX. Unfortunately the lowest-end Intel chips (Pentium/Celeron) still don't have AVX. i.e. a Skylake Pentium doesn't have AVX, so it's still going to be ages before we can assume AVX as a baseline. Of course, Silvermont doesn't have it either, so crappy Skylake Pentiums aren't the only thing holding back progress.Protoplasm
The most interesting observation that you are missing here is that only 91% of users have SSSE3. I think these are all AMD users that have SSE4a but not SSSE3.Invar
So the base line is SSE3. It would be SSSE3 if it were not for AMD.Invar
Also 22.83% of the users use AMD and of those 22.05% have SSE4a. It's safe to say that nearly all AMD users have SSE4a but a lot of them still don't have SSSE3.Invar
