Does Intel have a separate instruction set for its GPU?

Assume I'm using my Intel x64-based laptop with no dedicated GPU. I must have some GPU onboard; otherwise my screen wouldn't work, right?

  1. Are onboard GPUs typically embedded into the CPU?
  2. Does Intel have a separate instruction set for its GPU? If so, is there a doc?
  3. Do GPU instructions greatly differ from CPU instructions? For example, do GPUs have shift, add, load, and store instructions as well? What other instructions do they have that regular CPUs don't?
  4. Is there a difference between the instruction set/pipeline of an onboard GPU and a dedicated one? Or is the difference just the number of extra cores and the dedicated RAM?
  5. On a machine with a dedicated GPU, how do the instructions generated from C++ OpenGL code get executed on the GPU rather than on the regular CPU?
Jelsma answered 7/5, 2020 at 19:2 Comment(7)
The GPU cores are not x86 cores at all; they have a totally separate instruction set. The onboard GPU is on the same physical silicon chip as the CPU cores, e.g. on Intel it is connected to the same ring bus as the cores, memory controllers, and system agent. See anandtech.com/show/3922/… for a diagram of a quad-core Sandy Bridge. realworldtech.com/sandy-bridge/8 is part of a detailed writeup that focuses on the IA cores (Intel Architecture = x86 = CPU cores). - Concealment
And yes, different GPU vendors have their own instruction sets. This is why OpenGL shaders in a program have to be compiled from a portable language by the GPU driver for the specific GPU that will run them. - Concealment
So what happens if I compile something on my laptop and run it on another Intel x64 machine with a dedicated NVIDIA GPU? It won't use the NVIDIA GPU, I guess? - Jelsma
Graphics drivers compile shader programs at runtime, every time you run. That's how games and whatnot can distribute a single version of the program that works on any OpenGL system, including on future GPUs that didn't exist when the game was written. And conversely, it's how graphics drivers can expose a stable, portable API / ABI. - Concealment
Say I write some C++ code with OpenGL and compile it. I'm guessing the generated code makes system calls at runtime which get handled by the graphics drivers, which then generate the actual instructions to do the graphical work? - Jelsma
Not even system calls, just library function calls to OpenGL functions, the implementation of which depends on the graphics driver. But yes. - Concealment
Oh, so each graphics driver (Intel, AMD, NVIDIA, etc.) has to have its own implementation of OpenGL? There are other alternatives to OpenGL; does that mean each driver has to implement them all? - Jelsma

Full hardware reference

Full documentation of Intel's graphics controllers is available at 01.org:

Hardware Specification - PRMs Published by: Paul Parenteau Last modification: Jun 15, 2020

To answer question 2: yes, there is a separate set of assembly instructions, as described below (from "Introduction to GEN assembly").

General form of Intel GPU assembly

Typically, all instructions have the following form:

[(pred)] opcode (exec-size|exec-offset) dst src0 [src1] [src2]

(pred) is the optional predicate. We are going to skip it for now.

opcode is the symbol of the instruction, like add or mov (we have a full table of opcodes below).

exec-size is the SIMD width of the instruction, which on our architecture can be 1, 2, 4, 8, or 16. In SIMD32 compilation, typically two instructions of execution size 8 or 16 are grouped into one.

exec-offset tells the EU which part of the ARF registers to read from or write to; e.g. (8|M24) consults bits 24-31 of the execution mask. When emitting SIMD16 or SIMD32 code like the following:

mov  (8|M0)   r11.0<1>:q   r5.0<8;8,1>:d   // id:1
mov  (8|M8)   r13.0<1>:q   r6.0<8;8,1>:d   // id:1
mov  (8|M16)  r15.0<1>:q   r9.0<8;8,1>:d   // id:1
mov  (8|M24)  r17.0<1>:q   r10.0<8;8,1>:d  // id:1

(mov instructions of SIMD32 assembly)

the compiler has to emit four 8-wide operations due to a limitation of how many bytes can be accessed per operand in the GRF.
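To make the (exec-size|exec-offset) pair more concrete, here is a small Python sketch (not Intel tooling) that models the execution mask as a 32-bit integer and pulls out the bits a given instruction would consult. The function name channel_bits and the example mask value are made up for illustration.

```python
def channel_bits(exec_mask: int, exec_size: int, exec_offset: int) -> int:
    """Return the exec_size mask bits starting at bit exec_offset.

    Models (exec_size|M<exec_offset>): e.g. (8|M24) looks at bits 24-31.
    """
    return (exec_mask >> exec_offset) & ((1 << exec_size) - 1)


# Hypothetical 32-channel execution mask: channels 0-27 enabled, 28-31 disabled.
exec_mask = 0x0FFFFFFF

# The four 8-wide mov instructions of the SIMD32 listing use these offsets.
for offset in (0, 8, 16, 24):
    bits = channel_bits(exec_mask, 8, offset)
    print(f"(8|M{offset}): {bits:08b}")
```

With this mask, the first three instructions would see all eight of their channels enabled, while the (8|M24) instruction would see only its lower four.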

dst is a destination register

src0 is a source register

src1 is an optional source register. Note that it could also be an immediate value, like 0x3F000000:f (0.5) or 0x2A:ud (42).

src2 is an optional source register.
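The two immediates mentioned for src1 can be checked by hand. The following Python sketch is only an illustration, not part of any Intel tool; decode_immediate is a hypothetical helper that handles just the two type suffixes quoted above, assuming :f means a 32-bit IEEE 754 float and :ud an unsigned 32-bit integer.

```python
import struct


def decode_immediate(literal: str):
    """Decode a literal such as '0x3F000000:f' or '0x2A:ud'.

    Only the two type suffixes from the text are handled here:
    'f' (assumed 32-bit IEEE 754 float) and 'ud' (assumed unsigned 32-bit int).
    """
    bits_text, type_suffix = literal.split(":")
    bits = int(bits_text, 16)
    if type_suffix == "f":
        # Reinterpret the 32-bit pattern as a float.
        return struct.unpack("<f", struct.pack("<I", bits))[0]
    if type_suffix == "ud":
        return bits
    raise ValueError(f"unsupported type suffix: {type_suffix}")


print(decode_immediate("0x3F000000:f"))  # 0.5
print(decode_immediate("0x2A:ud"))       # 42
```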

General Register File (GRF) Registers

Each thread has a dedicated space of 128 registers, r0 through r127. Each register is 256 bits or 32 bytes.
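These sizes allow some quick arithmetic, which also suggests why the SIMD32 mov listing earlier in this answer is split into four 8-wide instructions. The Python sketch below rests on assumptions that should be checked against the PRMs: that :d elements are 4 bytes and :q elements 8 bytes, and that a single operand may span at most two adjacent GRF registers. The helper registers_needed is hypothetical.

```python
GRF_REGISTERS_PER_THREAD = 128   # r0 .. r127 (from the text above)
GRF_REGISTER_BYTES = 32          # 256 bits

# Assumed element sizes for the type suffixes used in the listing.
TYPE_BYTES = {"d": 4, "q": 8}

# Assumption: one operand may span at most two adjacent GRF registers.
MAX_REGISTERS_PER_OPERAND = 2


def registers_needed(exec_size: int, type_suffix: str) -> int:
    """Number of GRF registers a packed operand of exec_size elements occupies."""
    total_bytes = exec_size * TYPE_BYTES[type_suffix]
    return -(-total_bytes // GRF_REGISTER_BYTES)  # ceiling division


print(GRF_REGISTERS_PER_THREAD * GRF_REGISTER_BYTES)  # 4096 bytes of GRF per thread
print(registers_needed(8, "d"))    # 1
print(registers_needed(8, "q"))    # 2
print(registers_needed(16, "q"))   # 4, more than MAX_REGISTERS_PER_OPERAND
```

Under these assumptions, an 8-wide :q destination fills exactly two registers, which is consistent with the destinations in the listing advancing by two (r11, r13, r15, r17), while a 16-wide :q destination would not fit in a single operand.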

Architecture Register File (ARF) Registers

The ARF holds special-purpose registers. One of them, the null register, is typically used as the destination of send instructions that write data or signal the end of a thread. Here is a full table of the other architecture registers:

[Image: table of Architecture Register File (ARF) registers]

Available GEN (general) Assembly Instructions

[Images: tables of available GEN (general) assembly instructions, parts 1 and 2]

Shon answered 19/1, 2021 at 5:8 Comment(0)
