Does a one cycle instruction take one cycle, even if RAM is slow?
I am using an embedded RISC processor. There is one basic thing I have a problem figuring out.

The CPU manual clearly states that the instruction ld r1, [p1] (in C: r1 = *p1) takes one cycle. The size of register r1 is 32 bits, but the memory bus is only 16 bits wide. So how can the CPU fetch all the data in one cycle?

Osmometer asked 10/10, 2009 at 1:16 Comment(0)
The cycle counts in the manual assume full-width, zero-wait-state memory. One clock cycle is the time the core itself spends executing that instruction.

There was a time when each instruction took a different number of clock cycles. Memory was relatively fast then too, usually zero wait state. There was also a time before pipelines, when you had to burn a clock cycle fetching, then a clock cycle decoding, then a clock cycle executing, plus extra clock cycles for variable-length instructions and extra clock cycles if the instruction included a memory operation.

Today clock speeds are high and chip real estate is relatively cheap, so a one-clock-cycle add or multiply is the norm, as are pipelines and caches. Processor clock speed is no longer the determining factor for performance. Memory is relatively expensive and slow. So the caches (their configuration, number, and size), bus width, memory speed, and peripheral speed determine the overall performance of a system. Normally, increasing the processor clock speed without speeding up the memory or peripherals shows minimal if any performance gain; in some cases it can make things slower.

Memory size and wait states are not part of the execution-time spec in the reference manual; it describes only what the core itself costs you, in clocks, for each instruction. If it is a Harvard architecture, where the instruction and data buses are separate, then one clock including the memory cycle is possible. The fetch of the instruction happened at least one clock cycle earlier, if not before that, so at the beginning of the cycle the instruction is already available; decode and execute (the memory read) happen during that one clock, and at the end of it the result of the read is latched into the register. If the instruction and data buses are shared, you could argue that the load still finishes in one clock cycle, but you do not get to fetch the next instruction at the same time, so there is a bit of a stall; they might cheat and still call that one clock cycle.

Eliseoelish answered 10/10, 2009 at 2:35 Comment(5)
Please can you confirm: does a cache miss, either in the instruction cache (so in the fetch stage) or in the data cache (so in the memory stage), cause a pipeline stall whose length is the memory latency?Caban
Sure, if a fetch misses and there are no other transactions in flight, then the memory controller waits for that fetch to complete. That doesn't mean the pipeline stalls, and if it does stall, that doesn't mean it stalls for a specific amount of time. Your question was about a data cycle, not an instruction (fetch) cycle, and you didn't mention a cache or caches. With a data cache you probably need an MMU or some other scheme, and an MMU adds clock cycles as well, for fetches or data cycles. Again, the pipe is not necessarily stalled, and if it is, not necessarily by the length of the memory cycles.Eliseoelish
Thanks for the reply, but in fact I am not at all clear on the difference between a data cycle and an instruction (fetch) cycle. My knowledge of CPU architecture is very limited, and I reached this question while having problems with an application that gets faster when I add code to it.Caban
By adding code you are changing the mixture of things in the cache; that may be why it is faster. Sometimes you can add or remove nops in the startup code to move the whole program around and see performance change as various things line up with cache lines.Eliseoelish
If you have a cache, and in fact separate instruction and data caches, then instruction fetches are...cached...by the instruction cache, while data reads and writes use the data cache. Instruction cycles do not affect the data cache and do not affect data performance. Having a cache implies you have a slow memory system, so your code's performance can and will be sensitive to how the code/instructions and the cache interact. I am guessing from your responses that you need to do a little study and research at Wikipedia or elsewhere on this stuff.Eliseoelish
My understanding is: when they say an instruction takes one cycle, that does not mean the instruction finishes in one cycle. We should take the instruction pipeline into account. Suppose your CPU has a 5-stage pipeline; that instruction would take 5 cycles if it were executed on its own, from start to finish.

Accumulate answered 10/10, 2009 at 1:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.