Quick Cap
C++ templates and constexpr bring zero-cost abstractions to embedded firmware -- the compiler resolves types and computes values at build time, producing code as tight as hand-written C but with dramatically stronger type safety. These features let you build register maps that catch address errors at compile time, physical-unit types that prevent mixing volts with milliamps, and lookup tables that live in flash without any runtime initialization.
Interviewers test whether you understand the compile-time vs runtime boundary, can spot template bloat on flash-constrained MCUs, and know when CRTP is preferable to virtual dispatch.
Key Facts:
- Function templates replace type-unsafe C macros like MAX(a,b) with type-checked, single-evaluation alternatives -- the compiler generates code only for types actually used.
- constexpr moves computation to build time: a constexpr variable must be initialized with a constant expression, so CRC tables, sine lookup tables, and pin configurations are computed by the compiler and placed directly in flash as constants.
- CRTP (Curiously Recurring Template Pattern) provides static polymorphism -- interface enforcement and code reuse without a vtable pointer or virtual dispatch overhead.
- Template bloat is the main risk: CircularBuffer<uint8_t, 64> and CircularBuffer<uint16_t, 64> generate two complete copies of every method in flash.
- if constexpr (C++17) enables compile-time branching -- dead branches are discarded entirely, replacing #ifdef platform switches with type-safe alternatives.
- Template specialization lets you provide optimized implementations for specific types or hardware variants while keeping a generic default.
Deep Dive
At a Glance
| Concept | Detail |
|---|---|
| Function template | template<typename T> T clamp(T val, T lo, T hi) -- one definition, works for any type |
| Class template | CircularBuffer<T, N> -- type and size are compile-time parameters |
| constexpr function | Evaluated at compile time when used where a constant is required (e.g., initializing a constexpr variable); may run at runtime otherwise |
| constexpr variable | Must be initialized at compile time -- replaces #define constants with type safety |
| CRTP | class Derived : public Base<Derived> -- static polymorphism, no vtable |
| Full specialization | template<> class Driver<STM32F4> { ... } -- hardware-specific implementation |
| if constexpr | Compile-time branch -- discarded path generates zero code |
Function Templates vs Macros
Function templates solve the main problems of function-like C macros -- double evaluation, no type checking, no debugger support -- while typically producing the same machine code once the optimizer inlines them.
| Aspect | C macro #define MAX(a,b) | C++ template max(a,b) |
|---|---|---|
| Type safety | None -- text substitution | Full -- compiler checks both arguments match |
| Side effects | MAX(++x, y) evaluates ++x twice | Arguments evaluated exactly once |
| Debugging | Cannot set breakpoint inside macro | Full step-through in debugger |
| Error messages | Refers to expanded text, not macro definition | Points to template definition with type context |
| Code generation | Always expanded inline | Compiler decides: inline small functions, call large ones |
| Overload resolution | Not possible | Works with overloads and ADL |
The key insight: templates generate code per type used, not per call site. If you only call max<int>() and max<float>(), only two instantiations exist in the binary. A macro expands at every call site regardless.
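The double-evaluation row of the table is easy to reproduce, and even check at compile time. A minimal sketch (C++14 or later, for constexpr functions with local variables); max_of is a hypothetical name chosen to avoid clashing with std::max:

```cpp
// Classic function-like macro: an argument may be evaluated twice.
#define MAX(a, b) ((a) > (b) ? (a) : (b))

// Function template: each argument is evaluated once, and both must have type T.
template<typename T>
constexpr T max_of(T a, T b) { return a > b ? a : b; }

// How many times does one MAX(++x, y) call increment x?
constexpr int increments_via_macro() {
    int x = 5, y = 3;
    (void)MAX(++x, y);   // expands to ((++x) > (y) ? (++x) : (y))
    return x - 5;        // 2: ++x runs in the condition and again in the chosen branch
}

constexpr int increments_via_template() {
    int x = 5, y = 3;
    (void)max_of(++x, y);  // ++x is evaluated once, before the call
    return x - 5;          // 1
}

// max_of(3, 2.5);          // error: conflicting deduced types (int vs double)
// max_of<double>(3, 2.5);  // OK: explicit T, 3 converts to double
```

Because both functions are constexpr, a static_assert on their results turns the macro's side-effect hazard into something the build itself can demonstrate.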
Class Templates: Type-Safe Hardware Abstractions
Class templates let you parameterize data structures and hardware abstractions by type and size, with everything resolved at compile time.
Compile-time-sized circular buffer -- the size N is a template parameter, so the array is stored inside the object itself (no heap allocation; a global instance lands in .bss, a local one on the stack), and the compiler can turn the modulo into a cheap bit mask for power-of-two sizes:
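```cpp
#include <cstddef>
#include <cstdint>

template<typename T, std::size_t N>
class CircularBuffer {
    T buf_[N];
    std::size_t head_ = 0, tail_ = 0;
public:
    bool push(const T& val) {
        std::size_t next = (head_ + 1) % N;  // N is a compile-time constant: % N becomes a mask for power-of-two N
        if (next == tail_) return false;     // full (one slot stays empty to tell full from empty)
        buf_[head_] = val;
        head_ = next;
        return true;
    }
    // ... pop(), empty(), full()
};

// Usage -- no heap, no runtime size parameter
CircularBuffer<std::uint8_t, 64> uart_rx;  // 64-slot UART buffer (holds up to 63 bytes)
CircularBuffer<CanFrame, 16> can_queue;    // 16-slot CAN queue (holds up to 15 frames); CanFrame is your frame struct
```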
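The power-of-two remark can be made explicit in the type. A sketch, not a drop-in driver: PowerOfTwoRing is a hypothetical name, it rejects non-power-of-two sizes with static_assert, replaces % with a mask, and uses free-running indices so all N slots are usable. As written it is not interrupt-safe; an ISR producer with a thread-mode consumer would need atomic index handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template<typename T, std::size_t N>
class PowerOfTwoRing {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
    T buf_[N];
    std::size_t head_ = 0;  // total pushes (free-running, wraps naturally)
    std::size_t tail_ = 0;  // total pops   (free-running, wraps naturally)
public:
    bool push(const T& v) {
        if (head_ - tail_ == N) return false;  // full: all N slots in use
        buf_[head_ & (N - 1)] = v;             // mask instead of % N
        ++head_;
        return true;
    }
    bool pop(T& out) {
        if (head_ == tail_) return false;      // empty
        out = buf_[tail_ & (N - 1)];
        ++tail_;
        return true;
    }
    std::size_t size() const { return head_ - tail_; }
};

// Host-side usage check: a 4-slot ring really holds 4 items, in FIFO order.
inline void ring_usage_check() {
    PowerOfTwoRing<std::uint8_t, 4> r;
    for (std::uint8_t i = 0; i < 4; ++i) {
        bool ok = r.push(i);
        assert(ok);
    }
    bool overflow = r.push(99);
    assert(!overflow && r.size() == 4);
    std::uint8_t v = 0;
    bool got = r.pop(v);
    assert(got && v == 0);
}
```

Because the indices only ever increase and N divides the range of std::size_t, head_ - tail_ stays correct even after the counters wrap.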
Type-safe register access -- a template wrapping a memory-mapped register prevents accidentally writing to a read-only register or reading from a write-only one at compile time, something raw #define macros cannot do.
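A minimal sketch of such a wrapper, assuming 32-bit registers and three access policies. ReadOnly, WriteOnly, ReadWrite and Reg are hypothetical names, and the address in the target comment is a placeholder to be taken from the reference manual. It relies on the fact that member functions of a class template are only instantiated when called, so the static_assert fires only if code actually writes a read-only register (or reads a write-only one).

```cpp
#include <cassert>
#include <cstdint>

// Access-policy tags (hypothetical names for this sketch).
struct ReadOnly  { static constexpr bool readable = true;  static constexpr bool writable = false; };
struct WriteOnly { static constexpr bool readable = false; static constexpr bool writable = true;  };
struct ReadWrite { static constexpr bool readable = true;  static constexpr bool writable = true;  };

template<typename Access>
class Reg {
public:
    explicit Reg(volatile std::uint32_t* addr) : addr_(addr) {}

    std::uint32_t read() const {
        static_assert(Access::readable, "register is write-only");
        return *addr_;
    }
    void write(std::uint32_t value) const {
        static_assert(Access::writable, "register is read-only");
        *addr_ = value;
    }
private:
    volatile std::uint32_t* addr_;
};

// On target (placeholder address):
//   Reg<ReadOnly> status{reinterpret_cast<volatile std::uint32_t*>(0x40011000)};
//   status.write(0);  // compile error: "register is read-only"

// Host-side usage check: an ordinary variable stands in for the hardware register.
inline void reg_usage_check() {
    volatile std::uint32_t fake = 0;
    Reg<ReadWrite> ctrl{&fake};
    ctrl.write(0x55u);
    assert(ctrl.read() == 0x55u);
    Reg<ReadOnly> status{&fake};
    assert(status.read() == 0x55u);
}
```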
Constexpr: Compile-Time Computation
constexpr tells the compiler "this can be evaluated at compile time." When the result is required as a constant -- for example, to initialize a constexpr variable -- the computation happens during compilation and the result is placed directly in flash: zero runtime cost, zero startup initialization.
Compile-time CRC table -- instead of computing 256 CRC values at startup (wasting cycles and RAM), constexpr computes them at build time:
```cpp
#include <array>
#include <cstdint>

constexpr uint32_t crc32_for_byte(uint32_t byte) {
    uint32_t crc = byte;
    for (int i = 0; i < 8; ++i)
        crc = (crc >> 1) ^ (0xEDB88320 & -(crc & 1));  // -(crc & 1) is 0 or 0xFFFFFFFF
    return crc;
}

constexpr auto make_crc_table() {
    std::array<uint32_t, 256> table{};
    for (uint32_t i = 0; i < 256; ++i)
        table[i] = crc32_for_byte(i);  // constexpr non-const std::array::operator[] needs C++17
    return table;
}

// Entire 1 KB table computed by compiler, stored in .rodata (flash)
constexpr auto crc_table = make_crc_table();
```
This replaces the common C pattern of either a hand-written 256-entry table (error-prone, hard to verify) or a startup-time computation loop (wastes boot time on resource-constrained MCUs). constexpr alone does not prove the table is correct, but it lets the compiler check it: a static_assert against a known CRC-32 check value turns a wrong polynomial or loop into a build failure.
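A sketch of such a compile-time self-test, assuming the standard reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF), whose published check value for the ASCII string "123456789" is 0xCBF43926. The table code repeats the CRC example in this section so the block stands alone; crc32() is a hypothetical helper name, and the block needs C++17.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t crc32_for_byte(std::uint32_t byte) {
    std::uint32_t crc = byte;
    for (int i = 0; i < 8; ++i)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    return crc;
}

constexpr std::array<std::uint32_t, 256> make_crc_table() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i)
        table[i] = crc32_for_byte(i);
    return table;
}

constexpr auto crc_table = make_crc_table();

// Table-driven CRC-32: init 0xFFFFFFFF, one table lookup per byte, final XOR 0xFFFFFFFF.
constexpr std::uint32_t crc32(const char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i)
        crc = (crc >> 8) ^ crc_table[(crc ^ static_cast<std::uint8_t>(data[i])) & 0xFFu];
    return crc ^ 0xFFFFFFFFu;
}

// If the polynomial, shift direction, or table loop were wrong, these would fail the build.
static_assert(crc_table[1] == 0x77073096u, "unexpected CRC-32 table entry");
static_assert(crc32("123456789", 9) == 0xCBF43926u, "CRC-32 check value mismatch");
```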
Pin configuration -- constexpr functions can encode GPIO pin settings into register values at compile time, catching invalid configurations as compiler errors rather than runtime faults.
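One way to get that behavior is to carry the pin number and mode as template parameters, check them with static_assert, and combine pins with constexpr functions. A sketch, assuming an STM32-style MODER layout with two mode bits per pin and 16 pins per port; PinConfig, moder_bits(), moder_mask() and the pin assignments are hypothetical, and the fold expressions need C++17.

```cpp
#include <cstdint>

// Assumed layout: 2 mode bits per pin, pins 0-15, MODER-style encoding.
enum class PinMode : std::uint32_t { Input = 0, Output = 1, AltFunc = 2, Analog = 3 };

template<unsigned Pin, PinMode Mode>
struct PinConfig {
    static_assert(Pin < 16, "GPIO port has only pins 0-15");
    static constexpr std::uint32_t shift = Pin * 2u;
    static constexpr std::uint32_t mask  = 0x3u << shift;
    static constexpr std::uint32_t bits  = static_cast<std::uint32_t>(Mode) << shift;
};

// OR together the bits and masks of several pins (C++17 fold expressions).
template<typename... Pins>
constexpr std::uint32_t moder_bits() { return (0u | ... | Pins::bits); }

template<typename... Pins>
constexpr std::uint32_t moder_mask() { return (0u | ... | Pins::mask); }

using StatusLed = PinConfig<5, PinMode::Output>;   // hypothetical board wiring
using UartTx    = PinConfig<9, PinMode::AltFunc>;

constexpr std::uint32_t kModerBits = moder_bits<StatusLed, UartTx>();
constexpr std::uint32_t kModerMask = moder_mask<StatusLed, UartTx>();

// At startup, one read-modify-write using the precomputed constants (GPIOA from the vendor header):
//   GPIOA->MODER = (GPIOA->MODER & ~kModerMask) | kModerBits;
// using BadPin = PinConfig<16, PinMode::Output>;
// BadPin::bits;  // compile error: "GPIO port has only pins 0-15"
```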
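const means "I promise not to modify this" -- but the value can still be computed at runtime. constexpr on a variable means its initializer must be a constant expression, so the value is fixed when the program is built. For embedded, that is the stronger guarantee: a namespace-scope constexpr object is typically placed in .rodata (flash) and never computed during startup, whereas a const object with a runtime initializer is filled in by startup code before main().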
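The difference shows up directly in what the compiler accepts. A minimal sketch, assuming a 12-bit ADC with a 3300 mV reference; read_adc() here is a hypothetical stand-in for a real ADC read, and counts_to_mv() is a hypothetical helper:

```cpp
#include <cstdint>

// Hypothetical stand-in for a runtime ADC read (not constexpr).
std::uint32_t read_adc() { return 1234; }

// Assumed: 12-bit ADC (0-4095 counts), 3300 mV reference.
constexpr std::uint32_t counts_to_mv(std::uint32_t counts) { return counts * 3300u / 4095u; }

const std::uint32_t boot_sample = read_adc();                // const, but initialized at runtime
constexpr std::uint32_t full_scale_mv = counts_to_mv(4095);  // computed by the compiler: 3300
// constexpr std::uint32_t bad = read_adc();                 // error: not a constant expression

// The same constexpr function also runs at runtime when its argument is not a constant.
std::uint32_t battery_mv() { return counts_to_mv(read_adc()); }
```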
If Constexpr (C++17): Compile-Time Branching
if constexpr evaluates its condition at compile time and discards the branch not taken. Inside a template, the discarded branch is not instantiated: it generates no code and does not have to be valid for the type in question, so there is zero runtime overhead. This makes it a type-safe alternative to many #ifdef platform switches.
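```cpp
#include <type_traits>

// Helper: a false that depends on T, so the static_assert fires only if that branch is instantiated
template<typename T>
inline constexpr bool always_false = false;

template<typename Platform>
void init_uart() {
    if constexpr (std::is_same_v<Platform, STM32F4>) {
        // STM32 register writes -- only instantiated for STM32F4
        USART1->BRR = compute_brr(115200);
    } else if constexpr (std::is_same_v<Platform, NRF52>) {
        // Nordic register writes -- only instantiated for NRF52
        NRF_UARTE0->BAUDRATE = 0x01D7E000;
    } else {
        static_assert(always_false<Platform>, "Unsupported platform");
    }
}
```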
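Unlike #ifdef, the compiler still parses both branches and checks any code that does not depend on the template parameter, catching typos even in the branch you are not currently building. The flip side is that non-dependent names such as USART1 and NRF_UARTE0 must be declared in every build, so both vendors' register definitions have to be visible (or be reached through something that depends on Platform).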
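A host-testable variant routes register access through an object whose type is the template parameter, so each branch is dependent and only the matching one is ever checked. A sketch: Stm32Uart, Nrf52Uart and set_115200() are hypothetical stand-ins, and the divider assumes an STM32-style USART with 16x oversampling, where BRR = f_pclk / baud rounded to the nearest integer.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for two vendors' UART register blocks.
struct Stm32Uart { std::uint32_t BRR = 0; };
struct Nrf52Uart { std::uint32_t BAUDRATE = 0; };

template<typename T>
inline constexpr bool always_false = false;

template<typename Uart>
void set_115200(Uart& uart, std::uint32_t pclk_hz) {
    if constexpr (std::is_same_v<Uart, Stm32Uart>) {
        // Nrf52Uart has no BRR member; fine, because this branch is discarded for it.
        uart.BRR = (pclk_hz + 115200u / 2) / 115200u;  // rounded divider, 16x oversampling assumed
    } else if constexpr (std::is_same_v<Uart, Nrf52Uart>) {
        uart.BAUDRATE = 0x01D7E000u;  // same 115200-baud constant as the example above
    } else {
        static_assert(always_false<Uart>, "Unsupported UART");
    }
}

// Host-side usage check.
inline void uart_usage_check() {
    Stm32Uart st;
    set_115200(st, 84000000u);
    assert(st.BRR == 729u);               // 84 MHz / 115200 = 729.17, rounds to 729
    Nrf52Uart nrf;
    set_115200(nrf, 0u);                  // clock argument unused on this path
    assert(nrf.BAUDRATE == 0x01D7E000u);
}
```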
CRTP: Static Polymorphism Without Vtables
The Curiously Recurring Template Pattern (CRTP) gives you interface enforcement and code reuse -- the two main benefits of virtual functions -- without a vtable pointer (4-8 bytes per object) and without virtual dispatch overhead (indirect call + pipeline stall).
The pattern: a base class template takes the derived class as its template parameter, then casts this to the derived type to call its methods.
```cpp
template<typename Derived>
class SensorBase {
public:
    int read_filtered() {
        // Call derived class's read_raw() -- resolved at compile time
        int raw = static_cast<Derived*>(this)->read_raw();
        int filtered = (raw + last_) / 2;  // simple two-sample averaging filter
        last_ = raw;                       // remember this sample for the next call
        return filtered;
    }
private:
    int last_ = 0;
};

// read_adc() and the channel constants come from the board's HAL (not shown)
class Accelerometer : public SensorBase<Accelerometer> {
public:
    int read_raw() { return read_adc(ACCEL_CHANNEL); }
};

class Thermistor : public SensorBase<Thermistor> {
public:
    int read_raw() { return read_adc(TEMP_CHANNEL); }
};
```
The compiler resolves static_cast<Derived*>(this)->read_raw() at compile time -- no indirect call, no vtable lookup. With optimization enabled the call is typically inlined, so the generated assembly matches calling read_adc() directly. If a derived class forgets to implement read_raw(), the compiler emits an error as soon as read_filtered() is instantiated for it (that is, when it is called), giving contract enforcement similar to a pure virtual function.
When to use CRTP vs virtual: CRTP when you know all types at compile time and need zero overhead (ISR-called sensors, tight control loops). Virtual when you need runtime polymorphism (plugin architectures, dynamically loaded drivers).
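Shared code can take the CRTP base by reference and still dispatch statically. A host-testable sketch: FakeAccel, FakeThermistor and poll_twice() are hypothetical stand-ins that return fixed readings instead of calling read_adc(), and the filter stores each raw sample for the next call. The sizeof check illustrates the missing vtable pointer; a virtual version would add a pointer (plus padding) per object.

```cpp
#include <cassert>

template<typename Derived>
class SensorBase {
public:
    int read_filtered() {
        int raw = static_cast<Derived*>(this)->read_raw();  // static dispatch, no vtable
        int filtered = (raw + last_) / 2;
        last_ = raw;
        return filtered;
    }
private:
    int last_ = 0;
};

// Hypothetical fake sensors with fixed readings.
class FakeAccel : public SensorBase<FakeAccel> {
public:
    int read_raw() { return 100; }
};
class FakeThermistor : public SensorBase<FakeThermistor> {
public:
    int read_raw() { return 40; }
};

// Generic code accepts any SensorBase<D>; each D gets its own direct-call instantiation.
template<typename D>
int poll_twice(SensorBase<D>& sensor) {
    sensor.read_filtered();         // primes the filter: (raw + 0) / 2
    return sensor.read_filtered();  // (raw + raw) / 2 == raw
}

static_assert(sizeof(FakeAccel) == sizeof(int), "no vtable pointer, only last_");

// Host-side usage check.
inline void crtp_usage_check() {
    FakeAccel accel;
    FakeThermistor temp;
    assert(poll_twice(accel) == 100);
    assert(poll_twice(temp) == 40);
}
```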
Template Specialization for Hardware Variants
Full specialization lets you provide a completely different implementation for a specific type while keeping a generic default. Partial specialization lets you specialize on patterns (e.g., all pointer types, all arrays of size N).
This is how embedded frameworks support multiple MCU families from a single codebase: the generic template defines the interface, and specializations provide hardware-specific register access.
| Specialization Type | Syntax | Use Case |
|---|---|---|
| Primary template | template<typename MCU> class SpiDriver { ... } | Generic/default implementation |
| Full specialization | template<> class SpiDriver<STM32F4> { ... } | STM32F4-specific register layout |
| Partial specialization | template<typename T> class Buffer<T*> { ... } | Specialized for all pointer types |
This replaces the C approach of #ifdef STM32F4 scattered throughout the codebase. Template specialization keeps platform-specific code in one place, and adding a new MCU family means adding one specialization file -- not modifying existing code.
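The three rows of the table in compilable form. A sketch: the tag types and the uses_hw_peripheral / backend() / holds_pointers members are hypothetical placeholders marking where real register access or storage logic would go.

```cpp
// Hypothetical MCU tag types.
struct GenericMcu {};
struct STM32F4 {};

// Primary template: portable fallback (e.g., bit-banged SPI).
template<typename MCU>
struct SpiDriver {
    static constexpr bool uses_hw_peripheral = false;
    static constexpr const char* backend() { return "bit-bang"; }
};

// Full specialization: everything STM32F4-specific lives here.
template<>
struct SpiDriver<STM32F4> {
    static constexpr bool uses_hw_peripheral = true;
    static constexpr const char* backend() { return "SPI peripheral"; }
};

// Primary template for a generic buffer.
template<typename T>
struct Buffer {
    static constexpr bool holds_pointers = false;
};

// Partial specialization: chosen for every pointer type T*.
template<typename T>
struct Buffer<T*> {
    static constexpr bool holds_pointers = true;
};
```

Adding another MCU family means adding another SpiDriver<NewMcu> specialization; the primary template and existing specializations stay untouched.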
Code Bloat: The Flash-Size Trade-Off
Every unique template instantiation generates its own copy of the code in flash. This is the primary cost of templates on embedded targets.
CircularBuffer<uint8_t, 64>, CircularBuffer<uint16_t, 64>, and CircularBuffer<uint32_t, 64> produce three complete copies of push(), pop(), full(), and every other method. On a 64 KB flash MCU, this adds up fast.
Mitigation strategies:
| Strategy | How It Works | Trade-Off |
|---|---|---|
| Thin template wrapper over void* | Template provides type safety; implementation uses void* and sizeof(T) internally | Slight complexity; some optimizations lost |
| Limit instantiation count | Use uint32_t everywhere instead of uint8_t, uint16_t, uint32_t | Wastes RAM on small types |
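| Extern template | extern template class Buffer<uint8_t>; suppresses implicit instantiation in every TU that sees it | Must explicitly instantiate in one .cpp file; the linker already discards duplicate copies, so the main gain is build time and object size |
| Factor out non-dependent code | Move code that does not depend on T into a non-template base class | Requires careful design |
| Link-time optimization (LTO) + identical code folding | With whole-program visibility, identical function bodies can be merged across TUs (e.g., GCC -flto together with -fipa-icf, which -O2 enables) | Longer build times; support varies by toolchain |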
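A minimal sketch of the first row, a thin template over a void*-based core. ByteQueue and Queue are hypothetical names: the core is compiled once, and each Queue<T, N> adds only inline forwarding calls plus its storage. The memcpy-based core assumes trivially copyable element types, and its runtime % capacity_ is an example of the "some optimizations lost" trade-off.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Non-template core: compiled once and shared by every element type.
class ByteQueue {
public:
    ByteQueue(void* storage, std::size_t elem_size, std::size_t capacity)
        : storage_(static_cast<std::uint8_t*>(storage)), elem_size_(elem_size), capacity_(capacity) {}

    bool push(const void* elem) {
        if (count_ == capacity_) return false;  // full
        std::memcpy(storage_ + head_ * elem_size_, elem, elem_size_);
        head_ = (head_ + 1) % capacity_;        // runtime modulo: capacity is not a compile-time constant here
        ++count_;
        return true;
    }
    bool pop(void* out) {
        if (count_ == 0) return false;          // empty
        std::memcpy(out, storage_ + tail_ * elem_size_, elem_size_);
        tail_ = (tail_ + 1) % capacity_;
        --count_;
        return true;
    }
private:
    std::uint8_t* storage_;
    std::size_t elem_size_, capacity_;
    std::size_t head_ = 0, tail_ = 0, count_ = 0;
};

// Thin typed wrapper: type safety at the API, no duplicated queue logic.
template<typename T, std::size_t N>
class Queue {
    static_assert(std::is_trivially_copyable<T>::value, "memcpy-based core needs trivially copyable T");
    alignas(T) std::uint8_t storage_[N * sizeof(T)];
    ByteQueue core_{storage_, sizeof(T), N};
public:
    Queue() = default;
    Queue(const Queue&) = delete;             // core_ points into storage_; a copy would alias it
    Queue& operator=(const Queue&) = delete;
    bool push(const T& value) { return core_.push(&value); }
    bool pop(T& out) { return core_.pop(&out); }
};

// Host-side usage check: two element types, one shared ByteQueue implementation.
inline void queue_usage_check() {
    Queue<std::uint16_t, 4> q16;
    Queue<std::uint32_t, 8> q32;
    bool ok = q16.push(0xBEEF) && q32.push(0xDEADBEEFu);
    std::uint16_t a = 0;
    std::uint32_t b = 0;
    ok = ok && q16.pop(a) && q32.pop(b);
    assert(ok && a == 0xBEEF && b == 0xDEADBEEFu);
}
```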
On a 256 KB flash Cortex-M4, a team instantiated std::function (a heavily templated type) with 12 different signatures. The template code alone consumed 38 KB -- 15% of total flash. Replacing std::function with function pointers and a void* context parameter saved 35 KB. Always check the linker map file, or list the largest symbols with arm-none-eabi-nm --print-size --size-sort -C on the ELF, after adding template-heavy code.
Type-Safe Physical Units
Templates can encode physical units into the type system, making it impossible to accidentally add volts to milliamps or pass a duration where a frequency is expected.
The idea: wrap a numeric value in a template parameterized by the unit. Arithmetic operations between incompatible units become compile-time errors.
```cpp
#include <cstdint>

template<typename Unit, typename Rep = std::int32_t>
struct Quantity {
    Rep value;
    constexpr explicit Quantity(Rep v) : value(v) {}
};

// Define unit tags
struct MillivoltTag {};
struct MilliampTag {};
struct MillisecondTag {};

using Millivolt = Quantity<MillivoltTag>;
using Milliamp  = Quantity<MilliampTag>;
using Duration  = Quantity<MillisecondTag>;

Millivolt read_battery() { return Millivolt{3300}; }
void set_led_current(Milliamp ma);  // LED driver (definition not shown)

void update_led() {  // calls must live inside a function
    // set_led_current(read_battery());  // COMPILE ERROR -- cannot convert Millivolt to Milliamp
    set_led_current(Milliamp{20});       // OK
}
```
This typically costs zero bytes at runtime -- the Quantity struct is the same size as a bare int32_t, the tag type exists only in the type system, and with optimization enabled the wrapper compiles away entirely. NASA's Mars Climate Orbiter was lost in 1999 because ground software reported thruster impulse in pound-force seconds where newton-seconds were expected -- exactly the kind of unit mismatch this pattern is designed to turn into a compile-time error.
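The Quantity struct above carries the unit but defines no arithmetic yet. A sketch of the missing piece, assuming unit-preserving addition and comparison are all that is needed (deriving new units, such as mV / mA giving ohms, takes more machinery). Because the operators only accept two quantities with the same Unit, Millivolt + Milliamp finds no matching overload and fails to compile. The cell_a, cell_b and pack names are illustrative.

```cpp
#include <cstdint>

template<typename Unit, typename Rep = std::int32_t>
struct Quantity {
    Rep value;
    constexpr explicit Quantity(Rep v) : value(v) {}
};

// Defined only for two quantities with the same Unit (and Rep).
template<typename Unit, typename Rep>
constexpr Quantity<Unit, Rep> operator+(Quantity<Unit, Rep> a, Quantity<Unit, Rep> b) {
    return Quantity<Unit, Rep>(static_cast<Rep>(a.value + b.value));
}

template<typename Unit, typename Rep>
constexpr bool operator<(Quantity<Unit, Rep> a, Quantity<Unit, Rep> b) {
    return a.value < b.value;
}

struct MillivoltTag {};
struct MilliampTag {};
using Millivolt = Quantity<MillivoltTag>;
using Milliamp  = Quantity<MilliampTag>;

constexpr Millivolt cell_a{1650};
constexpr Millivolt cell_b{1650};
constexpr Millivolt pack = cell_a + cell_b;  // OK: same unit
// auto bad = cell_a + Milliamp{20};         // error: no operator+ for Millivolt and Milliamp

static_assert(sizeof(Millivolt) == sizeof(std::int32_t), "unit tag adds no storage");
```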
Templates vs C Macros: Full Comparison
| Criterion | C Macros | C++ Templates |
|---|---|---|
| Type checking | None | Full compiler checking |
| Code generation | Every expansion site | Per unique instantiation |
| Debugging | Preprocessor output only | Full source-level debugging |
| Error messages | Refers to expanded text | Refers to template definition (can be verbose) |
| Side effects | Double evaluation | Arguments evaluated once |
| Constexpr computation | Not possible | Tables, CRCs, configs at compile time |
| Code bloat risk | Low (text substitution) | High (each instantiation = new code) |
| Portability | C89+ everywhere | Requires C++ compiler |
| Compile time | Fast (text substitution) | Slower (template instantiation + type checking) |
| MISRA compliance | Restricted (MISRA C:2012 Directive 4.9) | Allowed with AUTOSAR C++14 guidelines |
Debugging Story: The Template That Ate Flash
A team developing a wearable health monitor on a Cortex-M0+ with 128 KB flash used a template-based HAL library. Each peripheral driver was a class template parameterized by the peripheral instance: Uart<USART1>, Uart<USART2>, Spi<SPI1>, I2c<I2C1>, I2c<I2C2>. The design was clean, type-safe, and followed modern C++ best practices.
At 60% feature completion, the firmware hit the 128 KB flash limit. The linker map showed that Uart<USART1> and Uart<USART2> had nearly identical code -- the only difference was the base address constant. Two copies of init(), send(), receive(), handle_irq(), and every helper function, all duplicated.
The fix was the "thin template over common implementation" pattern: a non-template UartImpl class took the base address as a constructor parameter and contained all the logic. The Uart<Instance> template became a thin wrapper that only stored the constexpr base address and forwarded calls. Flash usage dropped by 18 KB, and the team finished the product without upgrading to a larger (more expensive) MCU.
Lesson: Templates provide excellent type safety, but on flash-constrained targets, always check the linker map after adding template-heavy code. Factor out type-independent logic into non-template base classes or void*-based implementations.
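A compressed sketch of the fix described in this story, under stated assumptions: UartImpl, Uart, the Instance::base() convention and the 0x04 data-register offset are all hypothetical, and only send() is shown. The point is the split: the logic lives in one non-template class, while each Uart<Instance> contributes only a base-address constant and an inlined forwarding call.

```cpp
#include <cassert>
#include <cstdint>

// Non-template core: compiled once, no matter how many UART instances exist.
class UartImpl {
public:
    explicit UartImpl(std::uintptr_t base) : base_(base) {}
    void send(std::uint8_t byte) const { *reg(kDataOffset) = byte; }
    // init(), receive(), handle_irq() would also live here, written against base_.
private:
    static constexpr std::uintptr_t kDataOffset = 0x04;  // hypothetical data-register offset
    volatile std::uint32_t* reg(std::uintptr_t offset) const {
        return reinterpret_cast<volatile std::uint32_t*>(base_ + offset);
    }
    std::uintptr_t base_;
};

// Thin template: per-instance code is a constant plus a forwarding call.
template<typename Instance>
class Uart {
public:
    void send(std::uint8_t byte) const { UartImpl(Instance::base()).send(byte); }
};

// On target, an instance tag might look like this (placeholder address):
//   struct Usart1 { static constexpr std::uintptr_t base() { return 0x40011000u; } };

// Host-side usage check: a static array stands in for the peripheral's registers.
volatile std::uint32_t fake_uart_regs[4] = {};
struct FakeUart {
    static std::uintptr_t base() { return reinterpret_cast<std::uintptr_t>(&fake_uart_regs[0]); }
};

inline void uart_wrapper_check() {
    Uart<FakeUart> uart;
    uart.send(0x41u);
    assert(fake_uart_regs[1] == 0x41u);  // offset 0x04 = second 32-bit register
}
```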
What interviewers want to hear: You can explain that templates and constexpr are zero-cost at runtime but have a compile-time and flash-size cost. You reach for constexpr to move computation from runtime to build time -- CRC tables, pin configs, lookup tables. You use CRTP when you need polymorphism in ISR or tight-loop contexts where vtable overhead matters. You know the code bloat problem, can describe mitigation strategies (thin wrappers, extern template, limiting instantiation count), and you check linker map files to verify. You understand that if constexpr replaces #ifdef with type-safe compile-time branching. You do not blindly apply templates everywhere -- you weigh type safety against flash cost for each use case.
Interview Focus
Classic Interview Questions
Q1: "What is the difference between constexpr and const in C++?"
Model Answer Starter: "const means the variable will not be modified after initialization, but its value can still be computed at runtime -- for example, const int x = read_adc(); is valid. constexpr means the value must be computable at compile time when given constant inputs. For embedded, constexpr is stronger: it guarantees the value is resolved by the compiler and placed directly in flash, not computed during startup. A constexpr function can also be called at runtime with non-constant arguments, in which case it behaves like a normal function."
Q2: "How does CRTP provide polymorphism without virtual functions?"
Model Answer Starter: "CRTP stands for Curiously Recurring Template Pattern. The base class is a template parameterized by the derived class: class Derived : public Base<Derived>. The base class can call derived-class methods by casting this to Derived* -- this cast is resolved at compile time, so there is no vtable pointer and no indirect call. The generated assembly is identical to a direct function call. I use CRTP for sensor drivers and protocol handlers called from ISRs where the virtual dispatch overhead -- an indirect call plus a potential pipeline stall -- is unacceptable."
Q3: "What is template code bloat and how do you mitigate it on embedded targets?"
Model Answer Starter: "Every unique template instantiation generates its own copy of the code in the binary. If I have Buffer<uint8_t>, Buffer<uint16_t>, and Buffer<uint32_t>, I get three copies of every method. On a 128 KB flash MCU, this adds up quickly. Mitigation strategies include: factoring type-independent logic into a non-template base class, using a thin type-safe template wrapper over a void* implementation, limiting the number of distinct instantiations, using extern template to control where instantiation happens, and enabling link-time optimization. I always check the linker map file after adding template-heavy code to catch bloat early."
Q4: "Give an example of using constexpr to move computation from runtime to compile time in firmware."
Model Answer Starter: "A classic example is a CRC-32 lookup table. In C, you either hand-write 256 entries or compute them in a startup function -- both are error-prone or waste boot time. With constexpr, I write the CRC computation as a constexpr function, call it in a constexpr array initializer, and the compiler computes all 256 entries at build time. The table goes directly into the .rodata section in flash. Zero runtime cost, zero RAM usage, and with a static_assert on the standard check value, the compiler becomes the test -- if the logic is wrong, the build fails. I use the same technique for sine tables, baud rate divisors, and GPIO pin configuration masks."
Q5: "When would you use templates vs C-style void* generics in embedded code?"
Model Answer Starter: "Templates when type safety matters and the number of instantiations is small -- a circular buffer used with two or three types, a register accessor parameterized by peripheral instance. void* when I need one shared implementation to save flash -- a generic queue used by an RTOS scheduler, a message-passing framework with many payload types. The hybrid approach works well too: a void*-based implementation for the logic, with a thin template wrapper that provides type-safe push(const T&) and T pop() methods. This gives type safety at the API boundary without code duplication."
Trap Alerts
- Don't say: "Templates have no cost" -- they have zero runtime cost but real flash cost. Each instantiation duplicates code. Interviewers expect you to acknowledge this trade-off.
- Don't forget: constexpr functions can also run at runtime when called with non-constant arguments -- they are not purely compile-time constructs.
- Don't ignore: the practical limit on template usage in embedded. Using std::variant, std::function, or deep template metaprogramming on a 64 KB flash MCU is usually impractical. Know where to draw the line.
Follow-up Questions
- "How would you verify that a constexpr function is actually being evaluated at compile time?"
- "What is the difference between static_assert and a runtime assertion, and when do you use each?"
- "Can you use CRTP with multiple levels of inheritance? What are the pitfalls?"
- "How does extern template help reduce compile time and binary size?"
- "What C++ standard level do most embedded toolchains support today (C++14, C++17, C++20)?"
For more C++ templates and constexpr questions in a flashcard-style format, see the C++ Concepts Interview Q&A page.
Practice
❓ What does `constexpr` guarantee that `const` does not?
❓ What is the main risk of using templates extensively on a flash-constrained MCU?
❓ How does CRTP achieve polymorphism without a vtable?
❓ What advantage does `if constexpr` (C++17) have over `#ifdef` for platform-specific code?
❓ Which mitigation strategy reduces template code bloat by keeping only one copy of the core logic?
Real-World Tie-In
Automotive Sensor Fusion -- An ADAS (Advanced Driver Assistance System) ECU used CRTP-based sensor drivers for radar, lidar, and camera interfaces. Each driver inherited from SensorBase<Derived> which provided a common filtering and timestamping pipeline. The CRTP design eliminated vtable pointers on 23 sensor objects (saving 92-184 bytes of RAM) and removed indirect call overhead from the 1 kHz fusion loop. The design passed AUTOSAR C++14 static analysis without deviations.
IoT Edge Device CRC Validation -- A LoRaWAN sensor node on a Cortex-M0+ with 64 KB flash used constexpr to generate its CRC-16 lookup table at compile time. The previous C implementation computed the table during main() startup, adding 3.2 ms to boot time and requiring 512 bytes of RAM for the table. The constexpr version moved the table to .rodata (flash), eliminated the startup delay, and freed 512 bytes of RAM for the application -- critical on a device with only 8 KB total.
Medical Device Type-Safe Units -- A patient infusion pump used template-based physical unit types (Milliliter, MicroliterPerHour, Milligram) to prevent dosage calculation errors. A code review caught a junior engineer's attempt to add a flow rate to a volume -- the compiler rejected it with a clear type error. The same class of bug had caused a recall on a competitor's product two years earlier. The template overhead was zero: the unit types compiled down to bare int32_t arithmetic.
