Explain fixed-point math. How do you convert a number into a fixed-point, and back again? Have you ever written any C functions or algorithms that used fixed-point math? Why did you?

Concept. Fixed-point represents fractional numbers using ordinary integers by mentally placing a binary "point" at a fixed bit position. A value is stored as v = round(x * 2^F), where F is the number of fractional bits. The real number it represents is x = v / 2^F. Unlike floating-point, the position of the radix point never moves, so the range and resolution are fixed and known at compile time.

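Q-format notation. Qm.f means m integer bits and f fractional bits. For example, Q16.16 uses a signed 32-bit integer with 16 fractional bits; here the 16 integer bits include the sign bit (some conventions count the sign bit separately, so check which one a given library uses). Its resolution is 2^-16 ≈ 0.0000153, and its range is -32768 to about +32767.99998.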
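To sanity-check those figures, here is a minimal sketch that computes the resolution and range of a signed 32-bit format for a given number of fractional bits. The helper names q_resolution, q_min, and q_max are invented for this illustration and are not part of any standard library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: properties of a signed 32-bit fixed-point format
   with frac_bits fractional bits (Q16.16 when frac_bits == 16). */

/* 2^frac_bits as a double; exact, because it is a power of two. */
static double q_scale(int frac_bits) {
    assert(frac_bits >= 0 && frac_bits <= 31);
    return (double)((int64_t)1 << frac_bits);
}

/* Smallest step between two representable values: 2^-frac_bits. */
static double q_resolution(int frac_bits) {
    return 1.0 / q_scale(frac_bits);
}

/* Most negative representable value: INT32_MIN / 2^frac_bits. */
static double q_min(int frac_bits) {
    return (double)INT32_MIN / q_scale(frac_bits);
}

/* Most positive representable value: INT32_MAX / 2^frac_bits. */
static double q_max(int frac_bits) {
    return (double)INT32_MAX / q_scale(frac_bits);
}
```

With frac_bits = 16 these return 2^-16, -32768.0, and 2147483647/65536 (just under 32768), which matches the Q16.16 numbers above.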

Conversion to and from fixed-point:

c
#include <stdint.h>
#include <math.h>
#define FRAC_BITS 16
#define FIXED_ONE (1 << FRAC_BITS) /* 1.0 in Q16.16 */
typedef int32_t fixed_t;
/* float -> fixed: multiply by 2^F and round to nearest. */
static inline fixed_t to_fixed(double x) {
    return (fixed_t)lround(x * (double)FIXED_ONE);
}
/* fixed -> float: divide by 2^F. */
static inline double from_fixed(fixed_t v) {
    return (double)v / (double)FIXED_ONE;
}

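Rounding to nearest (lround, or adding 0.5 before truncating a non-negative value) rather than truncating halves the worst-case conversion error, from just under one LSB to half an LSB. Neither conversion checks the range: values outside roughly ±32768 do not fit in Q16.16, so the caller must clamp or reject them.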
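The effect of rounding, and one way to handle out-of-range inputs, can be sketched as follows. to_fixed_trunc and to_fixed_sat are hypothetical names, and clamping to the int32_t limits is an assumption about what the caller wants, not the only reasonable policy. The definitions are repeated so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 16
#define FIXED_ONE (1 << FRAC_BITS) /* 1.0 in Q16.16 */
typedef int32_t fixed_t;

/* Truncating conversion: the cast drops the fraction (rounds toward zero),
   so the error can approach one full LSB. No range check. */
static fixed_t to_fixed_trunc(double x) {
    return (fixed_t)(x * FIXED_ONE);
}

/* Hypothetical saturating, round-to-nearest conversion: out-of-range
   inputs clamp to the int32_t limits instead of overflowing the cast. */
static fixed_t to_fixed_sat(double x) {
    assert(x == x); /* reject NaN, which has no fixed-point equivalent */
    double scaled = x * FIXED_ONE;
    if (scaled >= (double)INT32_MAX) return INT32_MAX;
    if (scaled <= (double)INT32_MIN) return INT32_MIN;
    /* Push by half an LSB away from zero so the truncating cast
       rounds to nearest. */
    return (fixed_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* fixed -> double: divide by 2^F. */
static double from_fixed(fixed_t v) {
    return (double)v / FIXED_ONE;
}
```

For x = 0.1 the scaled value is 6553.6, so truncation stores 6553 and round-to-nearest stores 6554.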

Arithmetic rules:

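  • Add / subtract: operands in the same Q-format add and subtract directly, and the result stays in that format. The sum can still overflow like any integer add (signed overflow is undefined behavior in C), so keep headroom or saturate.
    c
    fixed_t fixed_add(fixed_t a, fixed_t b) { return a + b; }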
  • Multiply: the product of two values with F fractional bits has 2F fractional bits, so shift right by F to restore the format. Use a wider intermediate to avoid overflow (Q16.16 × Q16.16 needs 64 bits). The final cast is not range-checked, so the result must fit in Q16.16:
    c
    fixed_t fixed_mul(fixed_t a, fixed_t b) {
        int64_t tmp = (int64_t)a * (int64_t)b; /* result is Q32.32 */
        /* >> on a negative value is implementation-defined in C; mainstream
           compilers shift arithmetically, which rounds toward -infinity. */
        return (fixed_t)(tmp >> FRAC_BITS); /* back to Q16.16 */
    }
    For round-to-nearest on the multiply, add half an LSB before shifting:
    c
    fixed_t fixed_mul_round(fixed_t a, fixed_t b) {
        int64_t tmp = (int64_t)a * (int64_t)b;
        tmp += (int64_t)1 << (FRAC_BITS - 1); /* half an LSB of the result */
        return (fixed_t)(tmp >> FRAC_BITS);
    }
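  • Divide: scale the numerator by 2^F (in a wider type) before dividing so the quotient lands back in QF. Multiply rather than shift, because left-shifting a negative value is undefined behavior in C. The caller must guarantee a nonzero divisor:
    c
    fixed_t fixed_div(fixed_t a, fixed_t b) {
        /* Caller must ensure b != 0. Multiplying by 2^F matches a << F
           but stays defined when a is negative. */
        int64_t tmp = (int64_t)a * ((int64_t)1 << FRAC_BITS);
        return (fixed_t)(tmp / b);
    }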
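As a worked check of the rules above, the sketch below repeats the multiply and divide in a self-contained form and adds a saturating add, which is one common way to deal with overflow. fixed_add_sat is a hypothetical name and saturation is an assumed policy; the right shift assumes the compiler shifts negative values arithmetically.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 16
typedef int32_t fixed_t;

/* Hypothetical saturating add: sum in 64 bits, then clamp to int32_t. */
static fixed_t fixed_add_sat(fixed_t a, fixed_t b) {
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (fixed_t)sum;
}

/* Q16.16 multiply: 64-bit product, then drop the extra F fraction bits.
   Assumes an arithmetic right shift for negative products. */
static fixed_t fixed_mul(fixed_t a, fixed_t b) {
    int64_t tmp = (int64_t)a * (int64_t)b;
    return (fixed_t)(tmp >> FRAC_BITS);
}

/* Q16.16 divide: scale the numerator by 2^F first, then divide. */
static fixed_t fixed_div(fixed_t a, fixed_t b) {
    assert(b != 0); /* division by zero is the caller's error */
    int64_t num = (int64_t)a * ((int64_t)1 << FRAC_BITS);
    return (fixed_t)(num / b);
}
```

In Q16.16, 1.5 is 98304 and 2.25 is 147456. fixed_mul(98304, 147456) gives 221184, which is 3.375, and fixed_div(221184, 147456) gives back 98304.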

Why use fixed-point:

  • No FPU: many small MCUs (Cortex-M0/M0+, older 8/16-bit parts) have no hardware floating-point; software float is large and slow, whereas fixed-point uses native integer ALU ops.
  • Speed: integer add, multiply, and shift take one or a few cycles on most cores; this matters in DSP inner loops (filters, PID controllers, audio, sensor fusion).
  • Determinism: integer results are bit-exact and reproducible across builds and devices, with no rounding-mode surprises. That matters for control loops, lockstep systems, and golden-output tests.
  • Code/data size: avoids pulling in the soft-float library.

Typical applications: digital IIR/FIR filters, PID control loops, audio sample scaling/mixing, sensor calibration and unit conversion, and rendering math on graphics-limited MCUs.
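To make one of these concrete, here is a minimal sketch of a first-order IIR low-pass filter (an exponential moving average) in Q16.16. The lowpass_t type and lowpass_step function are hypothetical, and the scaling step divides rather than shifts so that it rounds toward zero regardless of compiler.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 16
typedef int32_t fixed_t;

/* Hypothetical filter state for y[n] = y[n-1] + alpha * (x[n] - y[n-1]). */
typedef struct {
    fixed_t alpha; /* smoothing factor in Q16.16, 0 < alpha <= 1.0 */
    fixed_t y;     /* previous output in Q16.16 */
} lowpass_t;

/* Feed one Q16.16 sample and return the updated Q16.16 output. */
static fixed_t lowpass_step(lowpass_t *f, fixed_t x) {
    assert(f->alpha > 0 && f->alpha <= (1 << FRAC_BITS));
    /* Widen first so neither the difference nor the product overflows. */
    int64_t diff = (int64_t)x - (int64_t)f->y;
    /* diff * alpha has 2F fraction bits; divide by 2^F to get back to
       Q16.16. Truncation toward zero means the output can settle a few
       LSBs short of the input for small alpha. */
    int64_t delta = (diff * f->alpha) / ((int64_t)1 << FRAC_BITS);
    f->y = (fixed_t)(f->y + delta);
    return f->y;
}
```

With alpha = 0.5 (32768) and a constant input of 1.0 (65536), the output steps through 0.5, 0.75, and 0.875 (32768, 49152, 57344) as it converges toward the input.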