What is pipelining and how can we increase throughput using
pipelining?

Answer Posted / ankit

Pipelining is a technique used to improve the execution throughput of a CPU by using the processor resources in a more efficient manner.

The basic idea is to split the processing of each instruction into a series of small, independent stages. Each stage is designed to perform one part of the instruction. At a very basic level, these stages can be broken down into:

Fetch Unit: Fetch an instruction from memory
Decode Unit: Decode the instruction to be executed
Execute Unit: Execute the instruction
Write Unit: Write the result back to a register or memory
[Figure: CPU pipelining diagram, http://static.digitalinternals.com/wp-content/uploads/2009/02/pipelining.png]

There will be a dedicated CPU module for each of the stages mentioned above.

On a non-pipelined CPU, while an instruction is being processed at one stage, the other stages sit idle, which is very inefficient. If you look at the diagram, while the 1st instruction is being decoded, the Fetch, Execute and Write Units of the CPU are not being used, and it takes 8 clock cycles to execute the 2 instructions.

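On the other hand, on a pipelined CPU, all the stages work in parallel, each on a different instruction. While the 1st instruction is being decoded by the Decode Unit, the 2nd instruction is being fetched by the Fetch Unit. It only takes 5 clock cycles to execute 2 instructions on a pipelined CPU: 4 cycles for the 1st instruction to pass through all the stages, plus 1 more cycle for the 2nd instruction to finish right behind it.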
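The 8-cycle and 5-cycle figures can be reproduced with a small sketch. This is only an idealized model, assuming every stage takes exactly one clock cycle and nothing ever stalls; the function names and the STAGES tuple are made up for illustration.

```python
# Idealized 4-stage model: one cycle per stage, no stalls (an assumption).
STAGES = ("Fetch", "Decode", "Execute", "Write")


def non_pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Each instruction goes through every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """The first instruction fills the pipeline; each later one finishes one cycle after."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def pipeline_timeline(num_instructions, stages=STAGES):
    """For each clock cycle, map stage name -> instruction number occupying it."""
    timeline = []
    for cycle in range(pipelined_cycles(num_instructions, len(stages))):
        busy = {}
        for stage_index, stage_name in enumerate(stages):
            # Instruction i (0-based) is in stage s during cycle i + s.
            instruction = cycle - stage_index
            if 0 <= instruction < num_instructions:
                busy[stage_name] = instruction + 1
        timeline.append(busy)
    return timeline


if __name__ == "__main__":
    print("non-pipelined:", non_pipelined_cycles(2), "cycles")
    print("pipelined:    ", pipelined_cycles(2), "cycles")
    for cycle, busy in enumerate(pipeline_timeline(2), start=1):
        print("cycle", cycle, busy)
```

In the timeline for 2 instructions, the second clock cycle shows instruction 1 in Decode while instruction 2 is in Fetch, which is the overlap described above.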

Note that increasing the number of stages in the pipeline will not always result in an increase of the execution throughput. An instruction that takes only 3 cycles on a non-pipelined CPU could take 4 cycles on a pipelined CPU because of the different stages involved. Therefore, a single instruction might require more clock cycles to execute on a pipelined CPU, even though a sequence of multiple instructions completes sooner. So there needs to be a balance between the two.
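To make the trade-off concrete, here is a rough worked comparison using the 3-cycle and 4-cycle figures above. It assumes the non-pipelined CPU runs one instruction at a time, and that the pipelined CPU starts a new instruction every cycle with no stalls; T_np and T_p are just labels chosen here for the total cycle counts over n instructions.

```latex
\begin{align*}
T_{\mathrm{np}}(n) &= 3n \\
T_{\mathrm{p}}(n)  &= 4 + (n - 1) = n + 3 \\
T_{\mathrm{p}}(n) < T_{\mathrm{np}}(n)
  &\iff n + 3 < 3n
  \iff n > \tfrac{3}{2}
\end{align*}
```

So under these assumptions the pipelined CPU is slower for a single instruction (4 cycles versus 3) but already ahead from the second instruction onward (5 cycles versus 6).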

One of the major complications with deep pipelining (e.g., the 31-stage pipeline used in some of the Intel Pentium 4 processors) arises when a conditional branch instruction is being executed. The processor cannot determine the location of the next instruction until the branch is resolved, so it has to wait for the branch instruction to finish, and the whole pipeline may need to be flushed as a result. If a program has many conditional branch instructions, pipelining could have a negative effect on the overall performance. To alleviate this problem, branch prediction can be used, but this too can have a negative effect if the branches are predicted wrongly.
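A back-of-the-envelope way to see the effect of branches is the usual average cycles-per-instruction (CPI) estimate. The sketch below is only an illustration: the function name, the 20% branch fraction, the 30-cycle flush penalty and the 5% misprediction rate are all assumed numbers, not measurements from a Pentium 4 or any other real CPU.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when some branches flush the pipeline.

    branch_fraction:      share of instructions that are conditional branches
    mispredict_rate:      share of those branches that cause a flush
    flush_penalty_cycles: cycles lost per flush (assumed here to grow with pipeline depth)
    base_cpi:             ideal CPI with a full pipeline and no stalls
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 30 cycles lost per flush.
    no_prediction = effective_cpi(0.20, 1.00, 30)    # every branch stalls the pipeline
    good_predictor = effective_cpi(0.20, 0.05, 30)   # 5% of branches mispredicted
    print("CPI when every branch costs a flush:", round(no_prediction, 2))
    print("CPI with a 5% misprediction rate:   ", round(good_predictor, 2))
```

With the same assumed numbers, a shallower pipeline (smaller flush_penalty_cycles) loses fewer cycles per wrong guess, which is why branch-heavy code hurts deep pipelines the most.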

Due to the different ways AMD and Intel implement pipelining in their CPUs, comparing their CPUs purely on clock speed is not accurate.


