Pipelining

Suppose the propagation delays are the same as before: 300 ps for the combinational logic and 20 ps for the register.

  • But now, let's break up the combinational logic into 3 pieces, each with 100 ps of propagation delay (uniform partitioning)

Then, we could overlap (pipeline) the execution of 3 successive instructions by adding registers after each piece A, B, and C.
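The overlap can be pictured as a table of which instruction occupies each stage in each clock cycle. The sketch below is a minimal illustration, assuming one new instruction enters stage A every cycle and nothing ever stalls; the names `STAGES` and `occupancy` are hypothetical, not from the slides.

```python
# Sketch: stage occupancy of a k-stage pipeline.
# ASSUMPTION: one instruction enters per cycle and there are no stalls.
STAGES = ["A", "B", "C"]

def occupancy(num_instrs, stages=STAGES):
    """Return one dict per cycle mapping each stage name to the
    instruction it holds (I1, I2, ...) or '-' if the stage is empty."""
    k = len(stages)
    rows = []
    for cycle in range(num_instrs + k - 1):
        row = {}
        for s, name in enumerate(stages):
            i = cycle - s  # instruction that has advanced to stage s by this cycle
            row[name] = f"I{i + 1}" if 0 <= i < num_instrs else "-"
        rows.append(row)
    return rows

if __name__ == "__main__":
    for c, row in enumerate(occupancy(3), start=1):
        print(f"cycle {c}: " + "  ".join(f"{s}={row[s]}" for s in STAGES))
```

In cycle 3 all three stages are busy at once, each working on a different instruction; that simultaneous use of A, B, and C is where the throughput gain comes from.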

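The latency is now $3 \times (100 + 20) = 360$ ps: each instruction passes through three stages, and each stage costs 100 ps of logic plus 20 ps for its register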

Minimum clock cycle time is now $100 + 20 = 120$ ps, giving a throughput of $1 / (120\,\text{ps}) \approx 8.33$ GIPS
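The arithmetic above can be checked with a small helper. This is a sketch under the slides' simplified model: every stage is followed by one register, the clock period is set by the slowest stage, and clock skew is ignored. The function name `pipeline_timing` is hypothetical.

```python
# Sketch: cycle time, latency, and throughput of a pipeline.
# ASSUMPTION: each stage is followed by one register of fixed delay,
# the clock period is set by the slowest stage, and skew is ignored.

def pipeline_timing(stage_delays_ps, reg_delay_ps):
    """Return (cycle_ps, latency_ps, throughput_gips)."""
    cycle_ps = max(stage_delays_ps) + reg_delay_ps
    latency_ps = len(stage_delays_ps) * cycle_ps
    # One instruction completes per cycle; 1000 / cycle_ps is instructions per ns (GIPS).
    throughput_gips = 1000.0 / cycle_ps
    return cycle_ps, latency_ps, throughput_gips

if __name__ == "__main__":
    for label, stages in [("unpipelined", [300]), ("3 stages", [100, 100, 100])]:
        cycle, lat, tput = pipeline_timing(stages, 20)
        print(f"{label}: cycle={cycle} ps, latency={lat} ps, throughput={tput:.2f} GIPS")
```

The comparison makes the trade-off visible: pipelining raises latency from 320 ps to 360 ps but cuts the cycle from 320 ps to 120 ps.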

Non-Uniform Partitioning

If the partitioning is non-uniform, the gain from pipelining is limited by the speed of the slowest stage (the bottleneck), because the clock period must be long enough for that stage's logic plus its register

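The minimum clock cycle is now 170 ps: the slowest stage's 150 ps of logic plus the 20 ps register

Assuming the logic is still split into three stages, the latency is now $3 \times 170 = 510$ ps, because every stage, fast or slow, occupies a full clock cycle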
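One way to see the bottleneck is to measure how long each stage sits idle per cycle. The slides do not give the individual stage delays, so the split 50/150/100 ps below is a hypothetical choice consistent with 300 ps of total logic and the 170 ps cycle. The function `stage_slack` is illustrative only.

```python
# Sketch: per-stage idle time (slack) in a non-uniform pipeline.
# ASSUMPTION: the 50/150/100 ps split is hypothetical; only the 170 ps
# cycle comes from the slides. Register delay is 20 ps as before.

def stage_slack(stage_delays_ps, reg_delay_ps):
    """Return (cycle_ps, slack) where slack[i] is how long stage i's
    result waits for the next clock edge; the slowest stage has zero slack."""
    cycle_ps = max(stage_delays_ps) + reg_delay_ps
    slack = [cycle_ps - (d + reg_delay_ps) for d in stage_delays_ps]
    return cycle_ps, slack

if __name__ == "__main__":
    stages = [50, 150, 100]  # hypothetical non-uniform split of 300 ps
    cycle, slack = stage_slack(stages, 20)
    print(f"cycle = {cycle} ps, latency = {len(stages) * cycle} ps")
    for name, s in zip("ABC", slack):
        print(f"stage {name}: idle {s} ps per cycle")
```

Only the slowest stage works for the whole cycle; the time the other stages spend idle is throughput lost to the unbalanced split.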

Diminishing Returns of Deep Pipelining

Suppose we partition the logic into 6 pieces, each with 50 ps of propagation delay

The latency is now $6 \times (50 + 20) = 420$ ps

Minimum clock cycle time is now $50 + 20 = 70$ ps

  • Max clock frequency is $1 / (70\,\text{ps}) \approx 14.29$ GHz

Throughput is 14.29 GIPS

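Speedup over the unpipelined design is $320 / 70 \approx 4.57$, well short of the ideal factor of 6, and doubling from 3 to 6 stages improves throughput by only $120 / 70 \approx 1.71$. Diminishing returns: as stages get shorter, the fixed 20 ps register delay takes up a larger share of each cycle ($20 / 70 \approx 29\%$).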
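The 6-stage figures can be reproduced directly from the slides' numbers (300 ps of logic, 20 ps per register). This is a minimal sketch assuming perfectly uniform stages; `LOGIC_PS`, `REG_PS`, and `uniform_cycle_ps` are hypothetical names.

```python
# Sketch: why 6 uniform 50 ps stages fall short of a 6x speedup.
# ASSUMPTION: stages are perfectly uniform; numbers are the slides' example.
LOGIC_PS = 300
REG_PS = 20

def uniform_cycle_ps(k):
    """Clock period when the logic is split into k equal stages."""
    return LOGIC_PS / k + REG_PS

if __name__ == "__main__":
    base = uniform_cycle_ps(1)   # unpipelined: 320 ps
    three = uniform_cycle_ps(3)  # 120 ps
    six = uniform_cycle_ps(6)    # 70 ps
    print(f"6-stage clock: {1000 / six:.2f} GHz")
    print(f"speedup vs unpipelined: {base / six:.2f} (ideal 6)")
    print(f"speedup vs 3 stages:    {three / six:.2f} (ideal 2)")
    print(f"register share of cycle: {REG_PS / six:.0%}")
```

Only the logic part of the cycle shrinks with more stages, which is why the register share keeps growing.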

Uniform Pieces

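If combinational logic with total delay $T$ is split into $k$ uniform pieces, each followed by a register with delay $R$, the latency would be $k\left(\frac{T}{k} + R\right) = T + kR$

Throughput would be $\dfrac{1}{T/k + R}$

As $k \to \infty$, throughput approaches $1/R$ (50 GIPS for $R = 20$ ps), while latency $T + kR$ grows without bound, since each extra stage adds another register delay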
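Sweeping the number of stages makes both limits concrete. The sketch assumes total logic delay $T = 300$ ps and register delay $R = 20$ ps from the earlier example, with $k$ perfectly uniform stages; `latency_ps` and `throughput_gips` are hypothetical helper names.

```python
# Sketch: latency T + k*R and throughput 1 / (T/k + R) for k uniform stages.
# ASSUMPTION: T = 300 ps of logic and R = 20 ps per register (slides' example).
T_PS = 300.0  # total combinational delay T
R_PS = 20.0   # register delay R per stage

def latency_ps(k):
    return T_PS + k * R_PS

def throughput_gips(k):
    # 1000 / cycle_ps is instructions per ns, i.e. GIPS
    return 1000.0 / (T_PS / k + R_PS)

if __name__ == "__main__":
    print(f"throughput bound 1/R = {1000.0 / R_PS:.1f} GIPS")
    for k in (1, 3, 6, 15, 30, 100, 1000):
        print(f"k={k:5d}: latency={latency_ps(k):8.0f} ps, "
              f"throughput={throughput_gips(k):6.2f} GIPS")
```

Throughput creeps toward the 50 GIPS bound but never reaches it, while latency keeps growing linearly in $k$.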

Pipelining with Feedback