What flags optimize for size, and what are the trade-offs?
The primary size-optimization flag is -Os, which enables the -O2 optimizations except those that tend to grow code (mainly alignment of functions, loops, and jump targets, plus some block reordering) and tunes decisions such as inlining for size rather than speed. For most embedded firmware, -Os is the usual starting point. GCC 12 and later also accept -Oz (long available in Clang), which optimizes for size even more aggressively at the expense of speed.
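Since -Oz is only accepted by newer compilers, a build script may want to probe for it and fall back to -Os. The following is a minimal sketch, not a canonical recipe: the CC default and the cc_accepts helper name are assumptions for illustration, and you would point CC at your cross compiler (for example arm-none-eabi-gcc).

```shell
#!/bin/sh
# CC is an assumption; override it with your cross compiler if needed.
CC="${CC:-cc}"

# cc_accepts (hypothetical helper): succeed if $CC accepts the given flag
# when compiling an empty C translation unit read from stdin.
cc_accepts() {
    tmp="$(mktemp)" || return 1
    # -Werror turns "unknown option" warnings into errors on compilers
    # that would otherwise only warn.
    if "$CC" -Werror "$1" -x c -c - -o "$tmp" </dev/null >/dev/null 2>&1; then
        rm -f "$tmp"
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Prefer -Oz when available, otherwise fall back to -Os.
if cc_accepts -Oz; then
    OPT=-Oz
else
    OPT=-Os
fi
echo "size optimization level: $OPT"
```

Whether -Oz actually beats -Os on a given firmware image is something to measure rather than assume.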
Three extremely effective additions:
-ffunction-sections -fdata-sections + -Wl,--gc-sections: this trio puts each function and each global in its own section, then lets the linker garbage-collect any section that is not reachable from the entry point and not marked KEEP in the linker script. Often saves 10-30% Flash on typical firmware with zero source changes, provided sections that nothing references directly (such as the interrupt vector table) are wrapped in KEEP. The trade-off is slightly larger object files and a marginally slower link.
--specs=nano.specs (newlib-nano): switches to a stripped-down libc with smaller printf, simpler malloc, etc. Typically saves 15-30 KB on hello-world-class firmware. Trade-off: float printf is off by default (re-enable with -u _printf_float if needed).
-flto (Link-Time Optimization): defers final code generation until link, when the compiler can see all translation units together. Enables cross-TU inlining and dead-code elimination. Often saves 5-15% Flash AND improves runtime performance. Trade-offs: longer link times, occasionally incompatible with hand-tuned inline assembly, and harder-to-debug stack traces because functions get inlined across files.
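To show where each of these flags goes, here is a sketch of a compile and link sequence. The compiler name, the Cortex-M4 target flags, and the file names (main.c, uart.c, firmware.elf) are placeholders chosen for illustration, and a real link step would also need a linker script passed with -T. The script only prints the commands, so it runs without a toolchain installed.

```shell
#!/bin/sh
# Placeholder toolchain and target; adjust for your part.
CC="arm-none-eabi-gcc"
TARGET="-mcpu=cortex-m4 -mthumb"

# Compile flags: size optimization, one section per function/global, LTO.
# --specs=nano.specs is passed at both steps here; the library swap itself
# happens at link time.
CFLAGS="$TARGET -Os -ffunction-sections -fdata-sections -flto --specs=nano.specs"

# With LTO, final code generation happens at link time, so the link step
# repeats the compile flags and adds section garbage collection.
LDFLAGS="$CFLAGS -Wl,--gc-sections"

# Uncomment only if the firmware formats floats with printf:
# LDFLAGS="$LDFLAGS -u _printf_float"

# Print the commands instead of running them.
for src in main.c uart.c; do
    echo "$CC $CFLAGS -c $src -o ${src%.c}.o"
done
echo "$CC $LDFLAGS main.o uart.o -o firmware.elf"
```

Reusing the compile flags at link time is a deliberate choice in this sketch: GCC recommends compiling and linking with the same options when LTO is enabled.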
The general principle: optimize for size first (-Os), then add section-level GC, then evaluate LTO, measuring Flash usage after each step. Don't reach for -O3 on Flash-constrained targets: it grows code aggressively for marginal speed gains.
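Evaluating each step means comparing Flash usage before and after. On typical firmware, Flash holds the text section plus the initial values of the data section, so text + data from the size tool is a reasonable figure to track. The helper below is a small sketch under that assumption; the flash_bytes name and the sample numbers are made up for illustration.

```shell
#!/bin/sh
# flash_bytes (hypothetical helper): read Berkeley-format `size` output on
# stdin and print text + data, i.e. the bytes that end up in Flash.
flash_bytes() {
    awk 'NR == 2 { print $1 + $2 }'
}

# Made-up sample in the layout that `size` prints by default.
sample='   text	   data	    bss	    dec	    hex	filename
  18432	    256	   2048	  20736	   5100	firmware.elf'

printf '%s\n' "$sample" | flash_bytes   # prints 18688
```

In a real build you would pipe the tool's output in directly, for example arm-none-eabi-size firmware.elf | flash_bytes, once per flag combination, and keep a step only if the number actually drops.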
Source: Build Systems Q&A
