Explain processor pipelines and the pros/cons of shorter or longer pipelines.

What a pipeline is

A CPU pipeline breaks instruction execution into stages so that multiple instructions are in flight at once, like an assembly line. The classic RISC 5-stage pipeline is IF (instruction fetch) → ID (decode/register read) → EX (execute/ALU) → MEM (memory access) → WB (write-back). With one instruction in each stage, the CPU can ideally complete one instruction per clock even though each instruction takes several cycles end to end. Pipelining raises instruction throughput, not the speed of any single instruction: each stage does less work per cycle (a shorter critical path), so the chip can run at a higher clock frequency, while every instruction still passes through all stages and the pipeline registers between stages add a little latency.
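As a minimal sketch, assuming an ideal pipeline with no hazards and one new instruction entering per cycle, the timing works out to k + n - 1 cycles for n instructions on a k-stage pipeline, versus k·n if each instruction had to finish all k steps before the next one started. The function names below are illustrative only.

```python
def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles to finish n instructions on an ideal k-stage pipeline (no hazards)."""
    if n_instructions == 0:
        return 0
    # The first instruction needs k cycles to fill the pipeline;
    # after that, one instruction completes every cycle.
    return n_stages + n_instructions - 1


def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles if each instruction must finish all k steps before the next starts."""
    return n_stages * n_instructions


def stage_diagram(n_instructions: int, stages: list[str]) -> list[str]:
    """One text row per instruction showing which stage it occupies in each cycle."""
    total = pipelined_cycles(n_instructions, len(stages))
    rows = []
    for i in range(n_instructions):
        cells = []
        for cycle in range(total):
            s = cycle - i  # instruction i enters the first stage in cycle i
            cells.append(stages[s] if 0 <= s < len(stages) else "..")
        rows.append(f"I{i}: " + " ".join(f"{c:>3}" for c in cells))
    return rows


if __name__ == "__main__":
    stages = ["IF", "ID", "EX", "MEM", "WB"]
    for row in stage_diagram(3, stages):
        print(row)
    n, k = 1000, len(stages)
    print(pipelined_cycles(n, k), unpipelined_cycles(n, k))
```

For 1,000 instructions this gives 1,004 versus 5,000 cycles, so the speedup approaches the stage count k as n grows. The comparison counts cycles of the same length in both cases; a real unpipelined design would instead use one longer clock cycle per instruction, so this shows the throughput idea rather than exact wall-clock time.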

Hazards limit the ideal one-per-cycle rate:

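  • Structural hazards: two instructions in different stages need the same hardware resource in the same cycle (e.g., a single memory port shared by IF and MEM); mitigated by duplicating the resource, such as separate instruction and data caches, or by stalling.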
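  • Data hazards: an instruction needs a result that has not yet been written back; mitigated by forwarding/bypassing and, failing that, stalls (bubbles). In the classic 5-stage pipeline, a load followed immediately by an instruction that uses its result still needs a one-cycle stall even with forwarding.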
  • Control hazards: branches change the PC, so instructions fetched after a branch may be wrong; mitigated by branch prediction, branch delay slots (some ISAs), and speculative execution. A mispredict forces a pipeline flush.
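The data-hazard rules can be made concrete with a small stall counter. This is a hypothetical model (the `Instr` and `count_stalls` names are made up for illustration), assuming an in-order classic 5-stage pipeline, a register file written in the first half of WB and read in the second half of ID, and full forwarding from the end of EX (ALU results) or the end of MEM (load results). Structural and control hazards are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]        # register written, or None
    srcs: tuple[str, ...]      # registers read
    is_load: bool = False      # loads produce their value in MEM, not EX


def count_stalls(program: list[Instr], forwarding: bool) -> int:
    """Stall cycles for an in-order IF/ID/EX/MEM/WB pipeline under the stated assumptions."""
    ex_cycle: list[int] = []          # cycle in which each instruction is in EX
    producer: dict[str, int] = {}     # register -> index of its most recent writer
    for i, ins in enumerate(program):
        # Without hazards, instruction 0 is in EX in cycle 3 and each later one a cycle after.
        earliest = 3 if i == 0 else ex_cycle[-1] + 1
        for r in ins.srcs:
            if r not in producer:
                continue
            p = producer[r]
            if forwarding:
                # Bypass from the end of EX (ALU op) or the end of MEM (load) into our EX.
                delay = 2 if program[p].is_load else 1
            else:
                # Producer writes in WB (its EX + 2); we read it in ID, then enter EX a cycle later.
                delay = 3
            earliest = max(earliest, ex_cycle[p] + delay)
        ex_cycle.append(earliest)
        if ins.dest is not None:
            producer[ins.dest] = i    # update after reading, so "add r1, r1, r2" works
    ideal_last_ex = len(program) + 2  # EX cycle of the last instruction with no stalls
    return ex_cycle[-1] - ideal_last_ex if program else 0


if __name__ == "__main__":
    load_use = [Instr("r1", ("r2",), is_load=True),   # lw  r1, 0(r2)
                Instr("r3", ("r1", "r4"))]            # add r3, r1, r4
    alu_alu = [Instr("r1", ("r2", "r3")),             # add r1, r2, r3
               Instr("r4", ("r1", "r5"))]             # sub r4, r1, r5
    for name, prog in [("load-use", load_use), ("alu-alu", alu_alu)]:
        print(name, count_stalls(prog, forwarding=True), count_stalls(prog, forwarding=False))
```

Under these assumptions, a dependent ALU instruction right after its producer costs 2 stall cycles without forwarding and none with it, while a load followed immediately by a use of its result still costs 1 stall even with forwarding.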

Shorter pipelines — pros/cons

Pros:

  • Lower branch-mispredict (flush) penalty: fewer stages to flush, so wrong-path recovery costs fewer cycles.
  • Lower latency per instruction, simpler hazard logic, smaller area/power, easier to verify.
  • More predictable timing, which suits hard real-time work and is valued in microcontrollers (e.g., Cortex-M0+ has a 2-stage pipeline; Cortex-M3/M4 have a 3-stage one).

Cons:

  • Lower maximum clock frequency (each stage does more work → longer critical path), so potentially lower peak throughput.
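To put rough numbers on the flush-penalty and predictability trade-offs above, the sketch below adds mispredict cost to an ideal CPI of 1 (CPI = 1 + branch fraction × mispredict rate × flush penalty) and compares the best case (every branch predicted) with the worst case (every branch mispredicted). The stage counts, penalties, branch mix, and mispredict rate are all hypothetical, chosen only to illustrate the arithmetic.

```python
def effective_cpi(branch_fraction: float, mispredict_rate: float,
                  flush_penalty: int, base_cpi: float = 1.0) -> float:
    """Average cycles per instruction once branch flushes are added to the ideal CPI."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


def cycle_bounds(n_instr: int, n_branches: int, n_stages: int,
                 flush_penalty: int) -> tuple[int, int]:
    """Best case (every branch predicted) and worst case (every branch mispredicted)."""
    best = n_stages + n_instr - 1             # ideal fill + one completion per cycle
    worst = best + n_branches * flush_penalty  # every branch pays the full flush
    return best, worst


if __name__ == "__main__":
    # Hypothetical pipelines running the same workload: 1000 instructions, 200 branches.
    for name, stages, penalty in [("short", 3, 2), ("deep", 20, 18)]:
        cpi = effective_cpi(0.2, 0.05, penalty)
        best, worst = cycle_bounds(1000, 200, stages, penalty)
        print(f"{name}: CPI {cpi:.2f}, cycles {best}..{worst}")
```

With these made-up numbers, the short pipeline's worst case is about 1.4× its best case, while the deep one's is about 4.5×; that spread is what makes worst-case timing analysis harder on deeper pipelines. The comparison is in cycles only; the deeper pipeline would normally also run at a faster clock.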

Longer (deeper) pipelines — pros/cons

Pros:

  • Higher achievable clock frequency (less logic per stage), enabling higher peak throughput on well-predicted, streaming code.
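  • Leaves room for the extra stages (register rename, dispatch, issue, retire) that aggressive superscalar/out-of-order designs need, without lengthening the clock period.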

Cons:

  • Higher mispredict penalty: a flush throws away many in-flight instructions (a deep pipeline like the old Pentium 4 "NetBurst," ~20–31 stages, paid a heavy branch-mispredict cost) → requires very good branch prediction.
  • More complex forwarding, hazard, and speculation logic; more power; harder to make timing-deterministic (bad for hard real-time).
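  • Diminishing returns: every pipeline register adds a fixed latch and clock-skew overhead that does not shrink with depth, so frequency gains flatten while stall and flush penalties keep growing; beyond some depth, net performance falls.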
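The diminishing-returns argument can be sketched with a toy analytic model (an assumption for illustration, not a description of any real CPU): total logic delay L is split evenly over k stages, each stage adds a fixed latch overhead c, and a mispredict costs about k cycles. Time per instruction is then T(k) = (L/k + c)(1 + b·m·k), where b is the branch fraction and m the mispredict rate; setting dT/dk = 0 gives k* = sqrt(L / (c·b·m)). All parameter values below are hypothetical.

```python
import math


def cycle_time_ns(depth: int, logic_ns: float, latch_ns: float) -> float:
    """Clock period when the logic delay is split evenly over `depth` stages.

    Each stage still pays a fixed pipeline-register overhead, so the period
    approaches latch_ns, not zero, as depth grows.
    """
    return logic_ns / depth + latch_ns


def time_per_instruction_ns(depth: int, logic_ns: float, latch_ns: float,
                            branch_fraction: float, mispredict_rate: float) -> float:
    """Average time per instruction for a toy in-order pipeline.

    Assumption: a mispredict flushes about `depth` cycles of work; other hazards are ignored.
    """
    cpi = 1.0 + branch_fraction * mispredict_rate * depth
    return cpi * cycle_time_ns(depth, logic_ns, latch_ns)


def best_depth(max_depth: int, **params: float) -> int:
    """Depth in 1..max_depth that minimizes time per instruction."""
    return min(range(1, max_depth + 1),
               key=lambda k: time_per_instruction_ns(k, **params))


if __name__ == "__main__":
    # Hypothetical technology and workload numbers, for illustration only.
    params = dict(logic_ns=2.0, latch_ns=0.25, branch_fraction=0.2, mispredict_rate=0.08)
    for k in (1, 5, 10, 20, 40, 60):
        t = time_per_instruction_ns(k, **params)
        f_ghz = 1.0 / cycle_time_ns(k, params["logic_ns"], params["latch_ns"])
        print(f"depth {k:2d}: {f_ghz:4.2f} GHz, {t:.3f} ns/instr")
    print("best depth (sweep):", best_depth(60, **params))
    a = params["branch_fraction"] * params["mispredict_rate"]
    print("analytic optimum:", math.sqrt(params["logic_ns"] / (params["latch_ns"] * a)))
```

With these numbers the clock keeps rising with depth but levels off toward 1/c = 4 GHz, while time per instruction bottoms out near 22 stages and then worsens. Real designs resolve branches before the last stage and suffer other depth-dependent stalls, so the actual optimum depends heavily on the workload and the circuit technology.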

Bottom line: deeper pipelines trade per-instruction latency and predictability for higher clock speed and throughput; shorter pipelines trade peak speed for low latency, low power, and determinism. That is why MCUs favor short pipelines, while high-performance application and desktop CPUs use deeper ones, balanced with strong branch prediction.