
If you create a circular buffer, what buffer size might let optimized code execute slightly faster? Why?


A power-of-two sized circular buffer (16, 32, 64, 256, 1024, …) can be slightly faster. The reason is index wrap-around.

In a circular (ring) buffer, after you increment the head/tail index you must wrap it back to the start when it reaches the end. The general way is:

c
index = (index + 1) % size; // modulo / division

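The modulo operator generally requires a division, which is comparatively expensive. Many small embedded processors (low-end MCUs, some DSPs) have no hardware divide instruction at all, so % becomes a slow software routine. A compiler can often avoid the true division when size is a compile-time constant, but it generally cannot when size is only known at run time.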

If size is a power of two, the wrap reduces to a single cheap bitwise AND with size − 1:

c
index = (index + 1) & (size - 1); // mask, e.g. & 0xFF for size 256

Because size − 1 is a mask of all 1s in the low bits (e.g., 256 − 1 = 0xFF), ANDing keeps only the low bits and naturally discards the overflow, giving the same result as modulo but in a single fast instruction. This avoids division entirely, which is the speed win.
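To make the masking idea concrete, here is a minimal sketch of a byte ring buffer that wraps with AND instead of modulo. The names (`ring_t`, `ring_push`, `ring_pop`, `RING_SIZE`) are hypothetical and chosen for illustration, and the design assumes one slot is kept free to distinguish "full" from "empty", which is one common convention among several.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 256u              /* assumed capacity; must be a power of two */
#define RING_MASK (RING_SIZE - 1u)  /* 0xFF for size 256 */

/* Compile-time check (C11): a power of two has exactly one bit set. */
_Static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

typedef struct {
    uint8_t  data[RING_SIZE];
    unsigned head;  /* next slot to write */
    unsigned tail;  /* next slot to read  */
} ring_t;

static void ring_init(ring_t *r) { r->head = 0; r->tail = 0; }

static bool ring_is_empty(const ring_t *r) { return r->head == r->tail; }

/* Full when advancing head would land on tail (one slot stays unused). */
static bool ring_is_full(const ring_t *r) {
    return ((r->head + 1u) & RING_MASK) == r->tail;
}

static bool ring_push(ring_t *r, uint8_t value) {
    if (ring_is_full(r)) return false;
    r->data[r->head] = value;
    r->head = (r->head + 1u) & RING_MASK;  /* wrap with a mask, no division */
    return true;
}

static bool ring_pop(ring_t *r, uint8_t *out) {
    assert(out != NULL);
    if (ring_is_empty(r)) return false;
    *out = r->data[r->tail];
    r->tail = (r->tail + 1u) & RING_MASK;  /* same wrap on the read side */
    return true;
}
```

The indices are unsigned so that the AND always yields a value in the range 0 to RING_SIZE − 1. With this convention the buffer holds at most RING_SIZE − 1 elements.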

(Additional minor benefits: address calculation can use shifts, and some DSPs offer hardware circular/modulo addressing that also requires power-of-two, or at least aligned, buffer sizes. The trade-off is that power-of-two sizing may waste some memory if your natural capacity isn't a power of two.)
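As a rough illustration of the memory trade-off, the sketch below rounds a requested capacity up to the next power of two so the wasted slots can be counted. The helper name `next_pow2` is hypothetical, and the code assumes a 32-bit capacity between 1 and 2^31.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the nearest power of two.
 * Assumes 1 <= n <= 2^31 so the result fits in 32 bits. */
static uint32_t next_pow2(uint32_t n) {
    assert(n >= 1u && n <= 0x80000000u);
    n--;           /* so an exact power of two maps to itself */
    n |= n >> 1;   /* smear the highest set bit into all lower bits */
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    return n + 1u;
}
```

For example, a natural capacity of 100 elements rounds up to 128, leaving 28 slots that are allocated but not strictly needed.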