What is "Duff's device"? Have you ever used it?

Duff's device is a loop-unrolling technique (attributed to Tom Duff) that interleaves a switch statement with a do/while loop so that the loop body is unrolled N times and the leftover iterations (when the count isn't a multiple of N) are handled by jumping into the middle of the unrolled body. The surprising part is that C allows case labels to label statements nested inside the loop body, not just at the top level of the switch, and execution falls through from the entry case onward.

A canonical version that copies count words to a memory-mapped register *to (Duff's original wrote to a single memory-mapped output register, hence no to++):

```c
void send(volatile int *to, const int *from, int count) {
    if (count <= 0) return;      /* canonical Duff's device assumes count > 0 */
    int n = (count + 7) / 8;     /* number of 8-iteration chunks (rounded up) */
    switch (count % 8) {         /* jump into the unrolled body for the remainder */
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
```

How it works: Suppose count = 11. Then count % 8 == 3, so control jumps to case 3, executes 3 copies, and falls through to the bottom of the loop. Since n = (11 + 7) / 8 = 2, the test --n > 0 succeeds exactly once, so the do/while runs one more full pass of 8 (total 3 + 8 = 11) and then exits. In other words, n is the total number of loop passes, counting the partial first one. The switch is entered exactly once; thereafter the do/while cycles normally.
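To check a trace like this by hand, it can help to instrument the control flow. The following is a minimal sketch, not part of the original routine: `duff_trace` and `struct duff_stats` are hypothetical names, and each copy statement is replaced by a counter so no hardware register is needed.

```c
#include <assert.h>

/* Hypothetical bookkeeping type, introduced only for this illustration. */
struct duff_stats {
    int writes;  /* how many times the copy statement would have run */
    int passes;  /* how many times the bottom of the do/while was reached */
};

/* Same switch/do-while shape as send(), with each copy replaced by a counter. */
struct duff_stats duff_trace(int count) {
    struct duff_stats s = {0, 0};
    if (count <= 0) return s;
    int n = (count + 7) / 8;      /* total passes, including a partial first one */
    switch (count % 8) {          /* entered once: picks the entry point */
    case 0: do { s.writes++;
    case 7:      s.writes++;
    case 6:      s.writes++;
    case 5:      s.writes++;
    case 4:      s.writes++;
    case 3:      s.writes++;
    case 2:      s.writes++;
    case 1:      s.writes++;
                 s.passes++;      /* one pass finished (partial or full) */
            } while (--n > 0);
    }
    assert(s.writes == count);    /* every requested element handled exactly once */
    return s;
}
```

For count = 11, `duff_trace(11)` reports 11 writes over 2 passes: a partial pass of 3 followed by one full pass of 8.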

Why it was done: It amortizes the loop overhead (the decrement-and-branch) across 8 useful operations instead of 1, which mattered on older machines and in tight copy loops, while still handling arbitrary (non-multiple-of-8) counts without a separate cleanup loop.
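For contrast, here is roughly what the "separate cleanup loop" alternative looks like. This is a sketch under the same assumptions as the routine above (a fixed destination register, an 8x unroll); `send_unrolled` is a hypothetical name, and this is the conventional approach that Duff's device folds into a single loop.

```c
#include <assert.h>

/* Hypothetical alternative: the same 8x unrolling, but the remainder is
   handled by its own loop instead of by jumping into the unrolled body. */
void send_unrolled(volatile int *to, const int *from, int count) {
    if (count <= 0) return;
    assert(to != 0 && from != 0);

    int chunks = count / 8;   /* full groups of 8 */
    int rem    = count % 8;   /* leftover iterations */

    while (chunks-- > 0) {    /* one decrement-and-branch per 8 copies */
        *to = *from++;
        *to = *from++;
        *to = *from++;
        *to = *from++;
        *to = *from++;
        *to = *from++;
        *to = *from++;
        *to = *from++;
    }
    while (rem-- > 0) {       /* cleanup loop: one branch per copy */
        *to = *from++;
    }
}
```

Both forms write the same values in the same order; the difference is that this version needs a second loop, whereas Duff's device reuses the unrolled body for the remainder.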

Practical note: On modern systems it is usually counterproductive: memcpy/memmove are heavily optimized, compilers auto-unroll, and manual unrolling can hurt the instruction cache and confuse the optimizer. Its legitimate niche today is copying to/from a single fixed hardware register/FIFO (where you must not advance the destination pointer, as above) on constrained targets. It's mostly of historical and educational value as a demonstration of C's fall-through and statement-interleaving semantics.
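As a rough illustration of what one would likely write today, here is a sketch of both cases. The names `copy_words` and `send_simple` are hypothetical; the only library call used is the standard `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Ordinary memory-to-memory copy: defer to the library routine.
   The buffers must not overlap (use memmove if they might). */
void copy_words(int *dst, const int *src, size_t count) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, count * sizeof *src);
}

/* Fixed register or FIFO: a plain loop with a non-advancing destination.
   The volatile qualifier keeps every store; unrolling is left to the compiler. */
void send_simple(volatile int *to, const int *from, int count) {
    for (int i = 0; i < count; i++) {
        *to = from[i];
    }
}
```

Whether hand-unrolling beats either of these on a particular constrained target is something to measure rather than assume.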