Concise SSE and MMX instruction reference with latencies and throughput

3

12

I am trying to optimize some arithmetic by using the MMX and SSE instruction sets with inline assembly. However, I have been unable to find good references for the timings and usages of these enhanced instruction sets. Could you please help me find references that contain information about the throughput, latency, operands, and perhaps short descriptions of the instructions?

So far, I have found:

Intel instruction references: Intel 64 and IA-32 Architectures Developer's Manual, Vol. 2A and Vol. 2B

Intel Optimization Guide: http://www.intel.com/Assets/PDF/manual/248966.pdf

Timings of integer operations: http://gmplib.org/~tege/x86-timing.pdf

Capo answered 2/6, 2010 at 22:6 Comment(0)
10

I would have thought the Intel Instruction Reference is an adequate guide to what these instructions actually do. It has pseudocode for each one, a description of its operation, and in some cases even a little diagram of a representative case.

For timings, there's no official guide that I'm aware of. Agner Fog's page is the standard reference:

http://www.agner.org/optimize/

Tiflis answered 2/6, 2010 at 22:12 Comment(1)
The Agner guide is great. Exactly what I need. - Capo
7

Intel's Intrinsics Guide (linked at the bottom left of the AVX page) is a well-organized, searchable tool where you can narrow results by SSE version and/or instruction type, e.g., FP arithmetic or integer logical.

For each instruction, it also shows a latency/throughput table by CPU and by parameters.
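
As a rough illustration of how the guide's entries relate to actual instructions, here is a minimal sketch in C using SSE intrinsics. The function name `add4` is made up for this example; the instruction names in the comments are the ones the guide lists for each intrinsic, though the compiler is free to pick a different encoding.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper: adds four pairs of floats in one SIMD operation.
   No alignment is assumed, so the unaligned load/store forms are used. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va  = _mm_loadu_ps(a);      /* MOVUPS: load 4 floats */
    __m128 vb  = _mm_loadu_ps(b);      /* MOVUPS: load 4 floats */
    __m128 sum = _mm_add_ps(va, vb);   /* ADDPS: 4 single-precision adds */
    _mm_storeu_ps(out, sum);           /* MOVUPS: store 4 floats */
}
```

Looking up `_mm_add_ps` in the guide gives the corresponding instruction (`addps`) together with its latency and throughput entries.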

Sonora answered 13/7, 2011 at 7:15 Comment(1)
It's the best I've found for seeing what instructions might do what you want. Other guides are great for detailing exactly what each ASM instruction does, but that takes so much space that you can't get an overview. I was hoping there'd be something similar to the intrinsics guide, but for asm directly. Still, it's mostly a 1:1 mapping. - Dendriform
1

The timings are in the "Intel Optimization Guide"; see Appendix C for the throughput and latency of each instruction per CPU architecture.
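
To show why the tables list both numbers, here is a small sketch in C (function names are hypothetical, and the actual speed difference depends on the CPU and compiler, so measure before relying on it). With a single accumulator, each add must wait for the previous one, so the loop is limited by the add's latency. With two independent accumulators, the adds can overlap and the loop moves closer to the throughput limit.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Adds the four lanes of v into one float. */
static float hsum4(__m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}

/* One dependency chain: each ADDPS needs the result of the previous one.
   Assumes n is a multiple of 4. */
float sum_one_chain(const float *x, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));
    return hsum4(acc);
}

/* Two independent chains: the two ADDPS per iteration do not depend on
   each other, so the CPU can overlap them. Assumes n is a multiple of 8. */
float sum_two_chains(const float *x, size_t n)
{
    __m128 acc0 = _mm_setzero_ps();
    __m128 acc1 = _mm_setzero_ps();
    for (size_t i = 0; i < n; i += 8) {
        acc0 = _mm_add_ps(acc0, _mm_loadu_ps(x + i));
        acc1 = _mm_add_ps(acc1, _mm_loadu_ps(x + i + 4));
    }
    return hsum4(_mm_add_ps(acc0, acc1));
}
```

Note that the two versions add the floats in a different order, so for general inputs the results can differ slightly due to rounding.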

Cassondra answered 13/7, 2011 at 8:30 Comment(0)
