What is pipelining? How does it increase the speed of execution?

I believe that no question is silly if it is bugging you. I have this question about pipelining.

What is pipelining?

Theory says that: "With pipelining, the CPU begins executing a second instruction before the first instruction is completed. Pipelining results in faster processing because the CPU does not have to wait for one instruction to complete the machine cycle."

My question is: considering I am working on a uniprocessor system, where only one instruction can be executed at a time, how is it possible for the next instruction to be fetched simultaneously while my CPU is busy? If I am lacking conceptual clarity, please throw some light on it. If there is separate hardware that makes this simultaneous processing happen, what is it? Kindly explain.

Daman answered 4/3, 2012 at 1:40 Comment(2)
This is exactly what Henry Ford did 100 years ago. You don't have to wait for one car (or instruction) to be completed before starting to work on the next one.Chambers
I'm surprised that no one has mentioned it, but pipelining increases throughput, which in turn achieves better IPC and hence performance.Nashom

There is indeed separate hardware for fetching. In fact, there are several separate pieces of hardware, arranged in a pipeline, and each part executes one part of a different instruction simultaneously. On every clock edge, the results of one stage get passed down to the next.
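
If it helps to see that in motion, here is a minimal sketch in Python (purely illustrative; the stage names and instruction labels are made up, and real hardware is not software). The pipeline is modeled as a row of latches, and on every clock edge each stage hands its instruction to the next, so several instructions are in flight at once:

    # Illustrative 4-stage pipeline: one latch per stage, all shifted
    # on every clock edge. Stage and instruction names are placeholders.
    STAGES = ["fetch", "decode", "execute", "writeback"]

    def run(instructions):
        pipeline = [None] * len(STAGES)      # the per-stage latches
        pending = list(instructions)
        # n instructions through k stages take n + k - 1 clocks total.
        for clock in range(1, len(instructions) + len(STAGES)):
            # Clock edge: each stage passes its work to the next one,
            # and the fetch stage pulls in the next instruction.
            pipeline = [pending.pop(0) if pending else None] + pipeline[:-1]
            busy = ", ".join(f"{s}:{i}" for s, i in zip(STAGES, pipeline) if i)
            print(f"clock {clock}: {busy}")

    run(["add r1", "sub r2", "mul r3"])

From clock 2 onward, more than one instruction is in the pipe at the same time, even though each stage only ever does its own small job.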

Eiland answered 4/3, 2012 at 1:43 Comment(2)
Thanks. I would like more clarity on how it all happens. Any useful links would be of real help.Daman
@happy2Help: Maybe something like this? en.wikipedia.org/wiki/Classic_RISC_pipelineEiland

Pipelining has nothing to do with uni- versus multi-processor systems. It has to do with thinking hard about the steps taken in executing a single instruction on a machine, in hardware.

Imagine you want to implement the MIPS "add-immediate" instruction, addi $d, $s, $t, which adds an integer stored in the register named by $s to an integer $t directly encoded in the instruction, and stores the result in the register named by $d. Think about the steps you'd need to take to do that. Here's one way of breaking it down (for example only, this doesn't necessarily correspond to real hardware):

  1. Parse out the (binary-encoded) instruction to find out which instruction it is.
  2. Once you recognize that it is an addi instruction, parse out the source and destination registers and the literal integer to add.
  3. Read the appropriate register, and compute the sum of its value and the immediate integer.
  4. Write the result into the named result register.

Now remember, all this needs to be built in hardware, meaning there are physical circuits associated with each of these things. And if you executed one instruction at a time, three fourths of these circuits would be sitting idle, doing nothing all the time. Pipelining takes advantage of this observation: If the processor needs to execute two addi instructions in a row, then it can:

  1. Identify the first one
  2. Parse the first one, and identify the second one with circuits that would otherwise be idle
  3. Add the first one, and parse the second
  4. Write out the first one, and add the second
  5. Write out the second one

So now, even though each instruction takes 4 processing rounds, the processor has finished two instructions in just 5 rounds total.

This gets complicated because sometimes you have to wait for one instruction to finish before you know what to do in the next one (or even what the next one is), but that's the basic idea.
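
Here is a rough sketch of that schedule in Python, under two simplifying assumptions: one instruction enters the pipe per round, and an instruction may not reach the "add" step until every register it reads has been written (a stalled instruction is simply shown entering the pipe late, rather than waiting inside it). The register names and the addi# labels are made up for illustration:

    # Round-by-round schedule for the four steps above, with a naive
    # data-hazard rule: no "add" until all source registers are written.
    STEPS = ["identify", "parse", "add", "write"]
    ADD, WRITE = STEPS.index("add"), STEPS.index("write")

    def schedule(program):
        """program: list of (name, destination register, source registers)."""
        written = {}     # register -> round in which its value is written
        start = 0        # round in which the previous instruction started
        rows = []
        for name, dest, srcs in program:
            start += 1                   # normally enter one round later
            for r in srcs:               # stall until sources are ready
                if r in written:
                    start = max(start, written[r] - ADD + 1)
            rows += [(start + s, name, step) for s, step in enumerate(STEPS)]
            written[dest] = start + WRITE
        for rnd, name, step in sorted(rows):
            print(f"round {rnd}: {name} -> {step}")

    # Two independent addi's finish in 5 rounds, matching the list above;
    # change the second one to read $t0 instead and a stall appears.
    schedule([("addi#1", "$t0", ["$s0"]),
              ("addi#2", "$t1", ["$s1"])])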

Powel answered 4/3, 2012 at 2:12 Comment(1)
One more thing. With pipelining, the functional units (fetch/decode/execute/writeback) get more independence from each other, so the units themselves can be more compact than the whole block would be. This enables higher clock speeds because the hardware propagation delays are reduced (signals need time to travel through the silicon and switch the transistors).Leu

Rather than try to cram a year-long university course into this text box, I'll point you at a textbook that explains this whole subject in clear detail:

Hennessy, John L., and Patterson, David A. Computer Architecture: A Quantitative Approach, Fifth Edition. Morgan Kaufmann.

Goldner answered 4/3, 2012 at 2:4 Comment(5)
I still remember the 2nd Ed. from university, predicting the imminent heat-death of x86. Excellent book regardless.Moreover
It turns out the book I actually had in class was not Hennessy and Patterson, but Patterson and Hennessy's "Computer Organization and Design: The Hardware/Software Interface", 2nd ed. amazon.com/Computer-Organization-Design-Fourth-Architecture/dp/… That's super confusing!Goldner
Can I directly read the 5th edition, or do I have to read its previous editions? Asking so that I can buy the book accordingly. Please reply.Daman
@happy2Help You can start with the current edition; the previous editions are just older versions of the same book. The authors must update their textbook to keep up with new computers, after all! The same authors also have another textbook that covers mostly the same subject but with a different style: amazon.com/Computer-Organization-Design-Fourth-Architecture/dp/… You can take a look at both to see which suits you better. The "Hardware/Software" interface is the one I actually used in class.Goldner
Hey, thanks. I read chapter 1 of CA online and found it really good, so I ordered the book. Will read and let you know. Thanks for suggesting a good book.Daman

Think about those How It's Made or other TV shows where you see a factory in action. Think about what you may have read or seen about a car factory. The car moves through the factory starting as a frame or body, and things are added to it as it moves. If you sat outside the building, you would see tires and paint cans and rolls of wire and steel go into the building and a steady stream of cars coming out. Just because it is a single (uniprocessor) factory doesn't mean it can't have an assembly line (pipeline). A uniprocessor with a pipeline is not actually, necessarily executing one instruction at a time any more than the car in the factory is built one car at a time. A little bit of the construction of that car happens at each station it passes through; likewise, the execution of your program happens a little bit at each station in the pipeline.

The typical simple stages in the pipe are fetch, decode, and execute. With three stages in the pipe, it takes a minimum of three clocks to execute one instruction (usually many more, due to I/O being slow). While instruction A is in the execute phase, instruction B is being decoded and instruction C is being fetched. Back to the auto factory: they might produce "one car every 7 minutes". That doesn't mean it takes 7 minutes to make a car; it might take a week to make a car, but they start a new one every 7 minutes, and the average time at each station is such that they can roll one out the door every 7 minutes. Same here: a pipeline doesn't mean you can fetch, decode, and execute all three steps in one clock of the processor. Like the factory, it is more of an average thing. If you can feed each of the stages in the pipeline at the processor clock rate, then it will complete one instruction per clock (if designed to do that). These days you can't feed the data/instructions that fast, and there are pipeline stalls, etc., which cause you to have to start over or discard some of the progress and back up some.

Pipelining is simply taking an assembly line approach to executing instructions in a processor.
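
The factory numbers also make the latency-versus-throughput distinction easy to put into arithmetic. A back-of-the-envelope sketch in Python, with made-up numbers in the spirit of the 7-minute figure above: with k stages and one clock per stage, a single instruction still takes k clocks to finish (latency), but a full pipeline retires one instruction per clock (throughput), so n instructions take k + n - 1 clocks instead of n * k:

    # Latency vs. throughput, back-of-the-envelope. Numbers are made up.
    def total_clocks(n_instructions, n_stages):
        pipelined = n_stages + n_instructions - 1   # fill once, then 1/clock
        sequential = n_stages * n_instructions      # one at a time
        return pipelined, sequential

    # e.g. 1000 instructions through a 3-stage pipe:
    p, s = total_clocks(1000, 3)
    print(f"pipelined:  {p} clocks")    # 1002 -> ~1 instruction per clock
    print(f"sequential: {s} clocks")    # 3000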

Cheese answered 5/3, 2012 at 0:34 Comment(0)

I thought it was used when there are branches in the code: the logic predicts which branch will be taken and preloads the instructions for that branch into a cache. If the prediction proves to be false, then it needs to throw away those instructions and load the alternative, resulting in a performance loss. But I believe there are patterns in code that make the prediction true more often than not, especially with modern compilers that repeat patterns over and over.

I'm not up on the actual implementation, but I don't really think that additional hardware is necessarily required, although it is useful for optimum speed.

Tao answered 4/3, 2012 at 1:47 Comment(5)
Nope, that's just branch prediction. While it's conceptually similar to a part of pipelining (doing memory access for some instructions before they are reached), it's a quite different beast AFAIK.Hyperpituitarism
Yeah, this is something different.Eiland
I looked it up and found a (too?) simple animation. But I was wondering... is this technique only used for RISC processors, or is that just where it started out?Tao
@Marty: By the time the processor executes a branch, it will have filled its pipeline with the instructions immediately following the branch. So if it takes the branch, all the stuff in the pipeline can't be used and the processor has to throw it all away and start loading instructions at the branch destination instead. So branch prediction is where the processor tries to guess which instruction to execute after the branch so it doesn't have to flush its pipeline. And you'd be hard pressed to find a processor today that doesn't use pipelining.Phiphenomenon
Thanks; I guess I'm a bit behind on my hardware theory. Hard enough to just keep up with new programming techniques. Back when I started out, I was on top of the hardware, which is probably why it's been downhill ever since. :) Now, 8080, that was easy!Tao
