Explain processor pipelines and the pros and cons of shorter or longer pipelines.
What a pipeline is
A CPU pipeline breaks instruction execution into stages so that multiple instructions are in flight at once, like an assembly line. A classic RISC 5-stage pipeline: IF (instruction fetch) → ID (decode/register read) → EX (execute/ALU) → MEM (memory access) → WB (write-back). With one instruction in each stage, the CPU can (ideally) complete one instruction per clock even though each instruction takes several cycles end-to-end. Pipelining raises instruction throughput, not the speed of any single instruction: it lets the chip run at a higher clock frequency because each stage does less work per cycle (shorter critical path), while each instruction still has to pass through every stage.
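The throughput claim comes down to simple cycle arithmetic. The sketch below assumes an idealized, hazard-free pipeline and the same clock period for the pipelined and unpipelined cases. The function names are illustrative, not taken from any tool.

```python
def pipelined_cycles(n_instructions: int, stages: int) -> int:
    """Cycles to finish n instructions on an ideal, hazard-free pipeline."""
    if n_instructions < 1 or stages < 1:
        raise ValueError("need at least one instruction and one stage")
    # The first instruction needs `stages` cycles to drain through;
    # every later instruction completes one cycle after its predecessor.
    return stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    """Cycles if each instruction finishes all stages before the next starts."""
    if n_instructions < 1 or stages < 1:
        raise ValueError("need at least one instruction and one stage")
    return n_instructions * stages


def ideal_speedup(n_instructions: int, stages: int) -> float:
    """Cycle-count speedup of pipelined over unpipelined execution."""
    return unpipelined_cycles(n_instructions, stages) / pipelined_cycles(
        n_instructions, stages
    )


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, pipelined_cycles(n, 5), round(ideal_speedup(n, 5), 2))
```

For long instruction streams the speedup approaches the number of stages, while a single instruction still takes all five cycles.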
Hazards limit the ideal one-per-cycle rate:
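- Structural hazards: two instructions in different stages need the same hardware resource in the same cycle (e.g., a single memory port shared by IF and MEM).
- Data hazards: an instruction needs a result that an earlier instruction has not yet written back (mitigated by forwarding/bypassing and, failing that, stalls/bubbles).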
- Control hazards: branches change the PC, so instructions fetched after a branch may be wrong; mitigated by branch prediction, branch delay slots (some ISAs), and speculative execution. A mispredict forces a pipeline flush.
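To make the cost of hazards concrete, here is a toy cycle counter for an in-order 5-stage pipeline. It is a sketch under stated assumptions, not a model of any real core: full forwarding is assumed, so only a load followed immediately by a consumer stalls (one bubble), and a mispredicted branch is assumed to cost a fixed two-cycle flush. Structural hazards are not modeled. The `Instr` class and both penalty constants are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

LOAD_USE_STALL = 1      # assumption: full forwarding, load data ready after MEM
MISPREDICT_PENALTY = 2  # assumption: branch resolved in EX, two fetches squashed


@dataclass(frozen=True)
class Instr:
    op: str                       # toy opcode class: "load", "alu", or "branch"
    dest: Optional[str] = None    # destination register, if any
    srcs: Tuple[str, ...] = ()    # source registers
    mispredicted: bool = False    # only meaningful for branches


def count_cycles(program: Sequence[Instr], stages: int = 5) -> int:
    """Ideal pipeline cycles plus load-use bubbles and mispredict flushes."""
    if not program:
        return 0
    cycles = stages + len(program) - 1  # hazard-free baseline
    # Data hazard: a load's result is needed by the very next instruction.
    for prev, cur in zip(program, program[1:]):
        if prev.op == "load" and prev.dest in cur.srcs:
            cycles += LOAD_USE_STALL
    # Control hazard: each mispredicted branch flushes wrong-path work.
    for instr in program:
        if instr.op == "branch" and instr.mispredicted:
            cycles += MISPREDICT_PENALTY
    return cycles


if __name__ == "__main__":
    program = [
        Instr("load", dest="r1", srcs=("r2",)),
        Instr("alu", dest="r3", srcs=("r1", "r4")),   # load-use: one bubble
        Instr("alu", dest="r5", srcs=("r3", "r1")),   # forwarded: no stall
        Instr("branch", srcs=("r5",), mispredicted=True),
        Instr("alu", dest="r6", srcs=("r6", "r6")),
    ]
    total = count_cycles(program)
    print(total, total / len(program))
```

With these assumptions the five-instruction example takes 12 cycles instead of the hazard-free 9.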
Shorter pipelines — pros/cons
Pros:
- Lower branch mispredict / flush penalty: fewer stages to flush, so wrong-path recovery costs fewer cycles → more deterministic, better for hard real-time.
- Lower latency per instruction in cycles (fewer stages to traverse), simpler hazard and forwarding logic, smaller area and power, and easier verification.
- More predictable timing — valued in microcontrollers (e.g., Cortex-M0+ has a 2-stage pipeline; M3/M4 a 3-stage).
Cons:
- Lower maximum clock frequency (each stage does more work → longer critical path), so potentially lower peak throughput.
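The determinism argument can be illustrated by bounding the execution time of a code block. This is a rough sketch: it assumes the only source of timing variation is branch misprediction and that every flush costs a fixed number of cycles. The penalty values in the example (2 and 20 cycles) are illustrative assumptions, not figures for any specific core.

```python
from typing import Tuple


def execution_bounds(base_cycles: int, branches: int, flush_penalty: int) -> Tuple[int, int]:
    """Return (best, worst) cycle counts for a block of code.

    best:  every branch is predicted correctly.
    worst: every branch is mispredicted and pays the full flush penalty.
    """
    best = base_cycles
    worst = base_cycles + branches * flush_penalty
    return best, worst


def timing_jitter(base_cycles: int, branches: int, flush_penalty: int) -> int:
    """Spread between worst-case and best-case execution, in cycles."""
    best, worst = execution_bounds(base_cycles, branches, flush_penalty)
    return worst - best


if __name__ == "__main__":
    # Same code block (100 base cycles, 10 branches) on two hypothetical designs.
    print("short pipeline:", execution_bounds(100, 10, flush_penalty=2))
    print("deep pipeline: ", execution_bounds(100, 10, flush_penalty=20))
```

The smaller the flush penalty, the tighter the gap between best and worst case, which is what a hard real-time budget has to cover.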
Longer (deeper) pipelines — pros/cons
Pros:
- Higher achievable clock frequency (less logic per stage), enabling higher peak throughput on well-predicted, streaming code.
- Leaves room for the extra stages (e.g., register renaming, scheduling/issue) that aggressive superscalar/out-of-order designs need.
Cons:
- Higher mispredict penalty: a flush throws away many in-flight instructions (the old Pentium 4 "NetBurst" design, with about 20 stages in early cores and 31 in Prescott, paid a heavy branch-mispredict cost) → requires very good branch prediction.
- More complex forwarding, hazard, and speculation logic; more power; harder to make timing-deterministic (bad for hard real-time).
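- Diminishing returns: beyond a point, added pipelining yields little net performance because stall/flush overhead grows and the fixed per-stage latch overhead (register setup time, clock skew) takes up a growing share of each cycle.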
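The depth trade-off can be sketched with a simple first-order model. It assumes the cycle time is the total logic delay divided by the stage count plus a fixed per-stage latch overhead, and that the average flush/stall cost per instruction grows linearly with depth. All parameter values below are made up for illustration and are not measurements of any real CPU.

```python
def cycle_time_ns(stages: int, logic_ns: float = 10.0, latch_ns: float = 0.2) -> float:
    """Clock period: logic split evenly across stages plus fixed latch overhead."""
    return logic_ns / stages + latch_ns


def cpi(stages: int, flush_cost_per_stage: float = 0.05) -> float:
    """Cycles per instruction: ideal 1.0 plus a penalty that scales with depth."""
    return 1.0 + flush_cost_per_stage * stages


def time_per_instruction_ns(stages: int) -> float:
    """Average time per instruction = clock period * CPI."""
    return cycle_time_ns(stages) * cpi(stages)


def best_depth(max_stages: int = 100) -> int:
    """Stage count that minimizes average time per instruction in this model."""
    return min(range(1, max_stages + 1), key=time_per_instruction_ns)


if __name__ == "__main__":
    for k in (1, 5, 10, 20, 40, 80):
        print(k, round(1.0 / cycle_time_ns(k), 2), "GHz",
              round(time_per_instruction_ns(k), 3), "ns/instr")
    print("best depth in this model:", best_depth())
```

In this model the clock keeps rising with depth, but the time per instruction flattens out and then gets worse once flush cost and latch overhead dominate.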
Bottom line: deeper pipelines trade single-instruction latency and predictability for higher clock speed and throughput; shorter pipelines trade peak speed for low latency, low power, and determinism — which is why MCUs favor short pipelines and high-performance application/desktop CPUs use deeper ones (balanced with strong branch prediction).
