Pipelining
Suppose the propagation delays are the same as before: 300 ps for the combinational logic and 20 ps for the register
- But now, let's break up the combinational logic into 3 pieces, each with 100 ps of propagation delay (uniform partitioning)
Then we can overlap (pipeline) the execution of 3 successive instructions by adding a register after each piece A, B, and C.
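The latency is now $3 \times (100 + 20) = 360$ ps, up from $300 + 20 = 320$ ps without pipelining
Minimum clock cycle time is now $100 + 20 = 120$ ps, so throughput rises to $1/120\,\text{ps} \approx 8.33$ GIPS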
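A minimal Python sketch of the timing model behind these numbers, assuming each stage is its combinational delay followed by one 20 ps register and that one instruction completes per clock cycle. The function name `pipeline_timing` is hypothetical, chosen for illustration.

```python
def pipeline_timing(stage_delays_ps, reg_ps=20):
    """Return (cycle_ps, latency_ps, throughput_gips) for a pipeline.

    Assumes every stage is its logic delay followed by one register.
    """
    cycle_ps = max(stage_delays_ps) + reg_ps       # slowest stage sets the clock
    latency_ps = cycle_ps * len(stage_delays_ps)   # one full cycle per stage
    throughput_gips = 1000 / cycle_ps              # 1 instr/cycle; 1000 ps = 1 ns
    return cycle_ps, latency_ps, throughput_gips


for name, stages in [("unpipelined", [300]), ("3 uniform stages", [100, 100, 100])]:
    c, lat, tp = pipeline_timing(stages)
    print(f"{name}: cycle={c} ps, latency={lat} ps, throughput={tp:.2f} GIPS")
```

For the three-stage split this reports a 120 ps cycle, 360 ps latency, and 8.33 GIPS: throughput improves even though each instruction takes longer.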
Non-Uniform Partitioning
If the partitioning is non-uniform, then the gain from pipelining is limited by the speed of the slowest stage (the bottleneck)
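The latency is now $3 \times 170 = 510$ ps, since each of the three stages must be clocked at the bottleneck's pace
The minimum clock cycle is now 170 ps (the slowest stage's 150 ps plus the 20 ps register)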
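The slide gives only the resulting 170 ps cycle, which implies a slowest stage of 150 ps. The sketch below assumes a hypothetical split of 50, 100, and 150 ps (any split whose slowest stage is 150 ps gives the same clock) and shows how much time the faster stages sit idle each cycle waiting for the bottleneck.

```python
REG_PS = 20  # register delay from the slides

# Hypothetical split of the 300 ps logic; only the 150 ps slowest stage
# is implied by the slides (170 ps cycle minus 20 ps register).
stages_ps = {"A": 50, "B": 100, "C": 150}

bottleneck = max(stages_ps, key=stages_ps.get)   # stage with the largest delay
cycle_ps = stages_ps[bottleneck] + REG_PS
latency_ps = cycle_ps * len(stages_ps)

# Slack per cycle: how long each stage's logic waits on the bottleneck
idle_ps = {name: cycle_ps - REG_PS - d for name, d in stages_ps.items()}

print(f"bottleneck={bottleneck}, cycle={cycle_ps} ps, latency={latency_ps} ps")
print(idle_ps)
```

Stage A wastes 100 ps and stage B 50 ps of every cycle, which is exactly the throughput lost to non-uniform partitioning.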
Diminishing Returns of Deep Pipelining
Suppose we partition the logic into 6 pieces, each of 50 ps latency
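The latency is now $6 \times (50 + 20) = 420$ ps
Minimum clock cycle time is now $50 + 20 = 70$ ps
- Max clock frequency is $1/70\,\text{ps} \approx 14.29$ GHz
Throughput is 14.29 GIPS
Speedup over the unpipelined design is $320/70 \approx 4.57$, versus $320/120 \approx 2.67$ for three stages: doubling the depth raises throughput by only $120/70 \approx 1.71\times$. Diminishing returns.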
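A short Python check of the speedup figures, assuming uniform partitioning of the 300 ps logic into k stages with one 20 ps register per stage. The helper name `cycle_ps` is hypothetical.

```python
UNPIPELINED_PS = 300 + 20  # 300 ps logic + 20 ps register, no pipelining


def cycle_ps(k, logic_ps=300, reg_ps=20):
    """Clock period when the logic is split uniformly into k stages."""
    return logic_ps / k + reg_ps


for k in (1, 3, 6):
    c = cycle_ps(k)
    print(f"k={k}: cycle={c:.0f} ps, {1000 / c:.2f} GIPS, "
          f"speedup={UNPIPELINED_PS / c:.2f}")
```

For k = 3 the speedup is about 2.67 and for k = 6 about 4.57, well short of the 6x that the stage count alone would suggest, because the fixed 20 ps register delay becomes a larger fraction of each shrinking cycle.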
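Uniform Pieces
Split the 300 ps of logic into $k$ uniform pieces of $300/k$ ps each
Latency would be $k \times (300/k + 20) = 300 + 20k$ ps
Throughput would be $\dfrac{1}{300/k + 20\,\text{ps}}$, i.e. $\dfrac{1000}{300/k + 20}$ GIPS
As $k \to \infty$, throughput approaches $1/20\,\text{ps} = 50$ GIPS, capped by the register delay, while latency grows linearly with $k$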
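A small Python sweep over the stage count k, assuming the same uniform model (300 ps logic, 20 ps register per stage); the helper names are hypothetical. It shows throughput creeping toward the 50 GIPS ceiling while latency keeps growing.

```python
LOGIC_PS, REG_PS = 300, 20


def latency_ps(k):
    """End-to-end latency with k uniform stages: 300 + 20k ps."""
    return k * (LOGIC_PS / k + REG_PS)


def throughput_gips(k):
    """One instruction per cycle; 1000 ps per ns gives GIPS."""
    return 1000 / (LOGIC_PS / k + REG_PS)


for k in (1, 3, 6, 15, 30, 150):
    print(f"k={k:>3}: latency={latency_ps(k):6.0f} ps, "
          f"throughput={throughput_gips(k):5.2f} GIPS")

print(f"limit as k -> infinity: {1000 / REG_PS:.0f} GIPS")
```

Going from 30 to 150 stages multiplies latency by more than 3.5x (900 ps to 3300 ps) while throughput rises only from about 33 to 45 GIPS.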