Explain fixed-point math. How do you convert a number into a fixed-point, and back again? Have you ever written any C functions or algorithms that used fixed-point math? Why did you?
Concept. Fixed-point represents fractional numbers using ordinary integers by mentally placing a binary "point" at a fixed bit position. A value is stored as v = round(x * 2^F), where F is the number of fractional bits. The real number it represents is x = v / 2^F. Unlike floating-point, the position of the radix point never moves, so the range and resolution are fixed and known at compile time.
Q-format notation. Qm.f means m integer bits and f fractional bits. For example, Q16.16 stored in a signed 32-bit integer has 16 integer bits (counting the sign bit in this convention) and 16 fractional bits: the resolution is 2^-16 ≈ 0.0000153, and the range is -32768 to about +32767.99998.
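As a quick check on those numbers, here is a minimal sketch that computes the resolution and range of a signed 32-bit format from its fractional bit count. The helper names `q_resolution`, `q_max`, and `q_min` are hypothetical, and the code assumes the sign bit is counted among the integer bits, as in the Q16.16 example.

```c
#include <stdint.h>

/* Hypothetical helpers: properties of a signed 32-bit Q format
   with frac_bits fractional bits (0 <= frac_bits <= 31). */

/* Smallest representable step: 2^-frac_bits. */
static inline double q_resolution(int frac_bits) {
    return 1.0 / (double)((int64_t)1 << frac_bits);
}

/* Largest representable value: INT32_MAX steps above zero. */
static inline double q_max(int frac_bits) {
    return (double)INT32_MAX * q_resolution(frac_bits);
}

/* Most negative representable value: INT32_MIN steps. */
static inline double q_min(int frac_bits) {
    return (double)INT32_MIN * q_resolution(frac_bits);
}
```

For `frac_bits = 16` this gives a step of 1/65536, a minimum of exactly -32768, and a maximum just below 32768.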
Conversion to and from fixed-point:

    #include <stdint.h>
    #include <math.h>

    #define FRAC_BITS 16
    #define FIXED_ONE (1 << FRAC_BITS)   /* 1.0 in Q16.16 */

    typedef int32_t fixed_t;

    /* float -> fixed: multiply by 2^F and round to nearest. */
    static inline fixed_t to_fixed(double x) {
        return (fixed_t)lround(x * (double)FIXED_ONE);
    }

    /* fixed -> float: divide by 2^F. */
    static inline double from_fixed(fixed_t v) {
        return (double)v / (double)FIXED_ONE;
    }

Rounding to nearest (lround, or adding 0.5 before truncating a non-negative value) rather than truncating halves the worst-case conversion error, from just under 1 LSB to 0.5 LSB.
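To make the difference concrete, the sketch below compares a truncating conversion with a rounding one that also saturates out-of-range inputs. The names `to_fixed_trunc` and `to_fixed_sat` are hypothetical, and both the saturation and the NaN-to-zero policy are assumptions added for illustration. The rounding is done by adding half an LSB away from zero so that negative inputs are handled too.

```c
#include <stdint.h>

#define FRAC_BITS 16
#define FIXED_ONE ((int32_t)1 << FRAC_BITS)   /* 1.0 in Q16.16 */

typedef int32_t fixed_t;

/* Truncating conversion: the cast discards the fraction (toward zero).
   The caller must keep x inside the Q16.16 range. */
static inline fixed_t to_fixed_trunc(double x) {
    return (fixed_t)(x * (double)FIXED_ONE);
}

/* Round-to-nearest conversion that clamps instead of overflowing. */
static inline fixed_t to_fixed_sat(double x) {
    double scaled = x * (double)FIXED_ONE;
    if (scaled != scaled) return 0;                 /* NaN: assumed policy */
    if (scaled >= 2147483647.0) return INT32_MAX;   /* clamp high */
    if (scaled <= -2147483648.0) return INT32_MIN;  /* clamp low */
    /* Add half an LSB away from zero; the cast then truncates toward zero. */
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    return (fixed_t)scaled;
}
```

For x = 0.1 the scaled value is about 6553.6, so truncation stores 6553 while rounding stores 6554, which is closer to the true value.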
Arithmetic rules:
- Add / subtract: operands with the same Q-format add directly; the format is preserved.

      fixed_t fixed_add(fixed_t a, fixed_t b) { return a + b; }

- Multiply: multiplying two QF values yields a Q(2F) value, so you must shift right by F to restore the format. Use a wider intermediate to avoid overflow (Q16.16 × Q16.16 needs 64 bits):

      fixed_t fixed_mul(fixed_t a, fixed_t b) {
          int64_t tmp = (int64_t)a * (int64_t)b;   /* result is Q32.32 */
          return (fixed_t)(tmp >> FRAC_BITS);      /* back to Q16.16 */
      }

  For round-to-nearest on the multiply, add half an LSB before shifting:

      fixed_t fixed_mul_round(fixed_t a, fixed_t b) {
          int64_t tmp = (int64_t)a * (int64_t)b;
          tmp += (int64_t)1 << (FRAC_BITS - 1);    /* half an LSB of the result */
          return (fixed_t)(tmp >> FRAC_BITS);
      }

- Divide: pre-shift the numerator left by F (in a wider type) before dividing so the quotient lands back in QF. The divisor must be nonzero:

      fixed_t fixed_div(fixed_t a, fixed_t b) {
          /* Multiply rather than <<: left-shifting a negative value is undefined in C. */
          int64_t tmp = (int64_t)a * FIXED_ONE;
          return (fixed_t)(tmp / b);
      }

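The rules above leave overflow and division by zero to the caller. One common way to deal with them is to saturate; the sketch below is an illustration under that assumption, not part of the functions above. The names `fixed_clamp`, `fixed_add_sat`, `fixed_mul_sat`, and `fixed_div_sat` are hypothetical, and the choice to return the largest or smallest value on division by zero is an assumed policy.

```c
#include <stdint.h>

#define FRAC_BITS 16
#define FIXED_ONE ((int32_t)1 << FRAC_BITS)   /* 1.0 in Q16.16 */

typedef int32_t fixed_t;

/* Clamp a 64-bit intermediate into the 32-bit Q16.16 range. */
static inline fixed_t fixed_clamp(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (fixed_t)v;
}

/* Add in 64 bits so the sum cannot wrap, then clamp. */
static inline fixed_t fixed_add_sat(fixed_t a, fixed_t b) {
    return fixed_clamp((int64_t)a + (int64_t)b);
}

/* Multiply: Q16.16 * Q16.16 = Q32.32; shift back by F, then clamp.
   Assumes >> on a negative value is an arithmetic shift, which is
   implementation-defined in C but usual on mainstream compilers. */
static inline fixed_t fixed_mul_sat(fixed_t a, fixed_t b) {
    int64_t prod = (int64_t)a * (int64_t)b;
    return fixed_clamp(prod >> FRAC_BITS);
}

/* Divide: scale the numerator by 2^F first, then clamp the quotient. */
static inline fixed_t fixed_div_sat(fixed_t a, fixed_t b) {
    if (b == 0) {
        return (a >= 0) ? INT32_MAX : INT32_MIN;   /* assumed policy */
    }
    return fixed_clamp(((int64_t)a * FIXED_ONE) / b);
}
```

As a worked example, 1.5 is stored as 98304 and 2.25 as 147456. Their raw product is 3.375 × 2^32, and shifting right by 16 leaves 221184, which is 3.375 in Q16.16. Dividing 221184 by 98304 with the pre-scaled numerator returns 147456, that is 2.25.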
Why use fixed-point:
- No FPU: many small MCUs (Cortex-M0/M0+, older 8/16-bit parts) have no hardware floating-point; software float is large and slow, whereas fixed-point uses native integer ALU ops.
- Speed: integer add, multiply, and shift typically take one cycle or close to it; this matters in DSP inner loops (filters, PID controllers, audio, sensor fusion).
- Determinism: integer results are bit-exact and reproducible across builds and devices, with no rounding-mode surprises. This is important for control loops, lockstep systems, and golden-output tests.
- Code/data size: avoids pulling in the soft-float library.
Typical applications: digital IIR/FIR filters, PID control loops, audio sample scaling/mixing, sensor calibration and unit conversion, and rendering math on graphics-limited MCUs.
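As an illustration of the filter case, here is a minimal sketch of a first-order IIR low-pass (an exponential moving average) in Q16.16, computing y += alpha * (x - y). The type `lowpass_t` and the function `lowpass_step` are hypothetical names, and the code assumes 0 < alpha <= 1.0 so the output always stays between the previous output and the new input.

```c
#include <stdint.h>

#define FRAC_BITS 16

typedef int32_t fixed_t;

typedef struct {
    fixed_t alpha;   /* smoothing factor in Q16.16, assumed 0 < alpha <= 1.0 */
    fixed_t state;   /* previous output y[n-1] in Q16.16 */
} lowpass_t;

/* One filter step: y[n] = y[n-1] + alpha * (x[n] - y[n-1]). */
static inline fixed_t lowpass_step(lowpass_t *f, fixed_t x) {
    int64_t diff = (int64_t)x - (int64_t)f->state;   /* cannot wrap in 64 bits */
    /* Q16.16 * Q16.16 = Q32.32; dividing by 2^F restores Q16.16
       and truncates toward zero. */
    int64_t delta = (diff * f->alpha) / ((int64_t)1 << FRAC_BITS);
    f->state = (fixed_t)(f->state + delta);
    return f->state;
}
```

With alpha = 0.5 (32768) and a starting state of 0, feeding a constant input of 1.0 (65536) produces 0.5 and then 0.75 on the first two steps. Because the update truncates, the output can stop a few LSBs short of the input for small alpha; keeping extra fractional bits in the state is one way to reduce that.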
