Does Intel have a separate instruction set for its GPU?

Assume I'm using my Intel x64-based laptop with no dedicated GPU. I must have some GPU onboard, otherwise my screen wouldn't work, right?

1. Are onboard GPUs typically embedded into the CPU?
2. Does Intel have a separate instruction set for its GPU? If so, is there documentation for it?
3. Do GPU instructions differ greatly from CPU instructions? For example, do GPUs have shift, add, load, and store instructions as well? What other instructions do they have that regular CPUs don't?
4. Is there a difference between the instruction set/pipeline of an onboard GPU and that of a dedicated one? Or is the difference just the number of extra cores and the dedicated RAM?
5. On a machine with a dedicated GPU, how do the instructions generated from C++ OpenGL code get executed on the GPU rather than on the regular CPU?

Full hardware reference

Full documentation of Intel's graphics controllers is available at 01.org:

Hardware Specification - PRMs Published by: Paul Parenteau Last modification: Jun 15, 2020

To answer question 2: yes, there are separate assembly instructions, as described below (from "Introduction to GEN assembly").

General form of Intel GPU assembly

Typically, all instructions have the following form:

[(pred)] opcode (exec-size|exec-offset) dst src0 [src1] [src2]

(pred) is the optional predicate. We are going to skip it for now.

opcode is the mnemonic of the instruction, like add or mov (we have a full table of opcodes below).

exec-size is the SIMD width of the instruction, which on this architecture can be 1, 2, 4, 8, or 16. In SIMD32 compilation, typically two instructions of execution size 8 or 16 are grouped into one.

exec-offset tells the EU which part of the ARF registers to read from or write to, e.g. (8|M24) consults bits 24-31 of the execution mask. When emitting SIMD16 or SIMD32 code like the following:

mov  (8|M0)   r11.0<1>:q   r5.0<8;8,1>:d   // id:1
mov  (8|M8)   r13.0<1>:q   r6.0<8;8,1>:d   // id:1
mov  (8|M16)  r15.0<1>:q   r9.0<8;8,1>:d   // id:1
mov  (8|M24)  r17.0<1>:q   r10.0<8;8,1>:d  // id:1

(mov instructions of SIMD32 assembly)

the compiler has to emit four 8-wide operations because of a limit on how many bytes each operand can access in the GRF.

dst is the destination register.

src0 is a source register.

src1 is an optional source register. Note that it could also be an immediate value, like 0x3F000000:f (0.5) or 0x2A:ud (42).

src2 is an optional source register.
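To make the field layout above concrete, here is a minimal Python sketch that splits one line of GEN assembly text into those fields and decodes the two immediate forms mentioned. The helper names (parse_instruction, decode_immediate) and the regular expression are my own assumptions, derived only from the sample lines in this answer; this is not Intel's grammar or any official tool, and it does not handle every syntax the PRMs describe.

```python
import re
import struct

# Assumed pattern, based only on the sample lines in this answer:
# [(pred)] opcode (exec-size|M<exec-offset>) dst src0 [src1] [src2] [// comment]
INSTR_RE = re.compile(
    r"^\s*(?:\((?P<pred>[^)|]+)\)\s+)?"        # optional predicate
    r"(?P<opcode>[A-Za-z_][\w.]*)\s+"          # opcode, e.g. mov or add
    r"\((?P<size>\d+)\|M(?P<offset>\d+)\)\s+"  # (exec-size|exec-offset)
    r"(?P<operands>.*?)\s*(?://.*)?$"          # operands, then optional comment
)


def parse_instruction(line):
    """Split one instruction line into the fields described above."""
    m = INSTR_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized instruction: {line!r}")
    size, offset = int(m["size"]), int(m["offset"])
    operands = m["operands"].split()
    return {
        "pred": m["pred"],
        "opcode": m["opcode"],
        "exec_size": size,
        "exec_offset": offset,
        # Execution-mask bits consulted: offset .. offset + size - 1.
        "mask_bits": (offset, offset + size - 1),
        "dst": operands[0],
        "srcs": operands[1:],
    }


def decode_immediate(text):
    """Decode an immediate such as '0x3F000000:f' or '0x2A:ud'."""
    value, type_suffix = text.split(":")
    bits = int(value, 16)
    if type_suffix == "f":
        # Reinterpret the 32-bit pattern as an IEEE 754 single-precision float.
        return struct.unpack("<f", struct.pack("<I", bits))[0]
    if type_suffix == "ud":
        return bits  # unsigned doubleword: the bit pattern is the value
    raise ValueError(f"type suffix not handled in this sketch: {type_suffix!r}")


if __name__ == "__main__":
    print(parse_instruction("mov  (8|M24)  r17.0<1>:q   r10.0<8;8,1>:d  // id:1"))
    print(decode_immediate("0x3F000000:f"), decode_immediate("0x2A:ud"))
```

For the (8|M24) line, mask_bits comes out as (24, 31), matching the "bits 24-31" description of exec-offset.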

General Register File (GRF) Registers

Each thread has a dedicated space of 128 registers, r0 through r127. Each register is 256 bits or 32 bytes.
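The register sizes above also explain the register numbering in the SIMD32 mov sample. The sketch below is plain arithmetic under two assumptions of mine: that the type suffixes mean :d = 32-bit integer and :q = 64-bit integer, and that the elements are packed contiguously (stride 1), as in the sample's destination operands. The function names are hypothetical.

```python
GRF_COUNT = 128  # r0 through r127, per the text above
GRF_BYTES = 32   # 256 bits per register

# Assumed element sizes in bytes for the type suffixes seen in this answer.
TYPE_BYTES = {"d": 4, "ud": 4, "f": 4, "q": 8}


def grf_bytes_per_thread():
    """Total GRF space available to one thread, in bytes."""
    return GRF_COUNT * GRF_BYTES


def registers_spanned(exec_size, type_suffix):
    """Registers covered by exec_size packed elements of the given type."""
    total_bytes = exec_size * TYPE_BYTES[type_suffix]
    return -(-total_bytes // GRF_BYTES)  # ceiling division


if __name__ == "__main__":
    print(grf_bytes_per_thread())     # 128 * 32 bytes
    print(registers_spanned(8, "q"))  # destination of each 8-wide mov
    print(registers_spanned(8, "d"))  # source of each 8-wide mov
```

Under these assumptions, each 8-wide :q destination covers two registers, which is consistent with the destinations in the sample stepping r11, r13, r15, r17, while each 8-wide :d source fits in a single register.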

Architecture Register File (ARF) Registers

In the assembly code above, we only saw one of these special registers, the null register, which is typically used as a destination for send instructions used for writing and indicating end of thread. Here is a full table of other architecture registers:
