Intro to Pipelining

Pipelining

Suppose the propagation delays are the same as before: 300 ps for the combinational logic and 20 ps for the register, so the unpipelined cycle time is 320 ps (about 3.125 GIPS)

But now, let's break up the combinational logic into 3 pieces, each with 100 ps of propagation delay (uniform partitioning)

Then we can overlap (pipeline) the execution of 3 successive instructions by adding a register after each of the pieces A, B, and C.

The latency is now $3 \times (100 + 20) ps = 360 ps$

Minimum clock cycle time is now $(100 + 20) ps = 120 ps$
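The arithmetic above can be checked with a minimal Python sketch. The function name `uniform_pipeline` and the constant names are illustrative assumptions, not part of the original notes; the delays are the 300 ps and 20 ps figures given above, and the model assumes one instruction completes per clock cycle once the pipeline is full.

```python
# Delays in picoseconds, taken from the text above.
LOGIC_PS = 300      # total combinational logic delay
REGISTER_PS = 20    # delay of one pipeline register


def uniform_pipeline(stages):
    """Return (clock_ps, latency_ps, throughput_gips) for equal-sized stages."""
    # Each stage holds an equal share of the logic plus one register.
    clock_ps = LOGIC_PS / stages + REGISTER_PS
    # An instruction passes through every stage, one cycle per stage.
    latency_ps = stages * clock_ps
    # One instruction per cycle; 1 instruction per ps equals 1000 GIPS.
    throughput_gips = 1000.0 / clock_ps
    return clock_ps, latency_ps, throughput_gips


for stages in (1, 3):
    clock, latency, gips = uniform_pipeline(stages)
    print(f"{stages} stage(s): clock {clock:.0f} ps, "
          f"latency {latency:.0f} ps, throughput {gips:.2f} GIPS")
```

For the 3-stage case this gives a 120 ps clock and 360 ps latency, which corresponds to a throughput of about 8.33 GIPS.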

If the partitioning is non-uniform, then the gain from pipelining is limited by the speed of the slowest stage (the bottleneck)

With stage delays of 50, 150, and 100 ps, the latency is now $3 \times (\max(50, 150, 100) + 20) ps = 510 ps$

The minimum clock cycle is now 170 ps
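The bottleneck effect can be illustrated with a short sketch that takes an arbitrary list of stage delays. The helper name `pipeline_timing` is a hypothetical choice for illustration; the only modelling assumption is that all stages share one clock, so the slowest stage plus its register sets the period.

```python
REGISTER_PS = 20  # pipeline register delay in ps, from the text


def pipeline_timing(stage_delays_ps):
    """Return (clock_ps, latency_ps) for a pipeline with the given stage delays."""
    # Every stage shares one clock, so the slowest stage sets the period.
    clock_ps = max(stage_delays_ps) + REGISTER_PS
    # Each instruction spends one full cycle in every stage.
    latency_ps = len(stage_delays_ps) * clock_ps
    return clock_ps, latency_ps


print(pipeline_timing([100, 100, 100]))  # uniform split
print(pipeline_timing([50, 150, 100]))   # non-uniform split, 150 ps bottleneck
```

Both splits cover the same 300 ps of logic, but the non-uniform one runs at a 170 ps clock instead of 120 ps because the 50 ps and 100 ps stages sit idle waiting for the 150 ps stage.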

Suppose we partition the logic into 6 pieces, each of 50 ps latency

The latency is now $6 \times (50 + 20) ps = 420 ps$

Minimum clock cycle time is now $(50 + 20) ps = 70 ps$

Throughput is 14.29 GIPS

Speedup is $\frac{14.29}{3.125} \approx 4.57$, well short of the ideal 6. Diminishing returns.
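As a worked check, the speedup can also be computed directly from cycle times rather than from rounded throughputs. The symbols $T_1$ and $T_6$ are introduced here for illustration and denote the unpipelined and 6-stage cycle times:

$$\text{speedup} = \frac{T_1}{T_6} = \frac{(300 + 20)\ \text{ps}}{(50 + 20)\ \text{ps}} = \frac{320}{70} \approx 4.57$$

Without register overhead the ratio would be $\frac{300}{50} = 6$; the fixed 20 ps per stage accounts for the gap.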

In general, with $k$ uniform stages, latency would be $k \cdot (\frac{300}{k} + 20) ps = (300 + 20 k) ps$

Throughput would be $\frac{1 \text{ instruction}}{(\frac{300}{k} + 20) \times 10^{-12} \text{ s}} = \frac{1000 k}{300 + 20 k}$ GIPS

As $k \to \infty$: latency $\to \infty$, throughput $\to 50$ GIPS

Speedup $\to \frac{50}{3.125} = 16$
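The limiting behaviour can be seen numerically by sweeping the stage count. This is a minimal sketch under the same assumptions as the formulas above (perfectly uniform partitioning, a fixed 20 ps register per stage); the function name `metrics` and the chosen values of `k` are illustrative only.

```python
LOGIC_PS = 300                               # total combinational delay (ps)
REGISTER_PS = 20                             # per-stage register delay (ps)
BASELINE_CLOCK_PS = LOGIC_PS + REGISTER_PS   # unpipelined cycle time: 320 ps


def metrics(k):
    """Return (latency_ps, throughput_gips, speedup) for k uniform stages."""
    clock_ps = LOGIC_PS / k + REGISTER_PS
    latency_ps = LOGIC_PS + REGISTER_PS * k   # equals k * clock_ps
    throughput_gips = 1000.0 / clock_ps       # 1 instruction per ps = 1000 GIPS
    speedup = BASELINE_CLOCK_PS / clock_ps    # relative to the unpipelined design
    return latency_ps, throughput_gips, speedup


for k in (1, 3, 6, 30, 300, 3000):
    latency, gips, speedup = metrics(k)
    print(f"k={k:5d}  latency={latency:7.0f} ps  "
          f"throughput={gips:6.2f} GIPS  speedup={speedup:5.2f}")
```

Latency grows linearly with `k`, while throughput and speedup flatten out below 50 GIPS and 16 respectively, because the register delay becomes the dominant part of each cycle.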