Let us assume that I have two pointers to unrelated addresses that are not cached, so both dereferences will have to go all the way to main memory.
int load_and_add(int *pA, int *pB)
{
    int a = *pA; // will most likely miss in cache
    int b = *pB; // will most likely miss in cache
    // ... some code that does not use a or b
    int c = a + b;
    return c;
}
If out-of-order execution allows the unrelated code to execute before the value of c is computed, how will the fetching of a and b proceed on a modern Intel processor?
Are the potentially-pipelined memory accesses completely serialized, or is there some sort of fetch overlap performed by the CPU's memory controller?
In other words, if we assume that hitting main memory costs 300 cycles, will fetching a and b cost 600 cycles, or does out-of-order execution enable some overlap of the two fetches, perhaps costing fewer cycles?