Templates & Constexpr

Quick Cap

C++ templates and constexpr bring zero-cost abstractions to embedded firmware -- the compiler resolves types and computes values at build time, producing code as tight as hand-written C but with dramatically stronger type safety. These features let you build register maps that catch address errors at compile time, physical-unit types that prevent mixing volts with milliamps, and lookup tables that live in flash without any runtime initialization.

Interviewers test whether you understand the compile-time vs runtime boundary, can spot template bloat on flash-constrained MCUs, and know when CRTP is preferable to virtual dispatch.

Key Facts:

Function templates replace type-unsafe C macros like MAX(a,b) with type-checked, single-evaluation alternatives -- the compiler generates code only for types actually used.
constexpr enables evaluation at compile time, and a constexpr variable requires it: CRC tables, sine lookup tables, and pin configurations are computed by the compiler and can be placed directly in flash as constants.
CRTP (Curiously Recurring Template Pattern) provides static polymorphism -- interface enforcement and code reuse without a vtable pointer or virtual dispatch overhead.
Template bloat is the main risk: CircularBuffer<uint8_t, 64> and CircularBuffer<uint16_t, 64> each generate their own copy of every member function you call, so flash usage grows with the number of distinct instantiations.
if constexpr (C++17) enables compile-time branching -- dead branches are discarded entirely, replacing #ifdef platform switches with type-safe alternatives.
Template specialization lets you provide optimized implementations for specific types or hardware variants while keeping a generic default.

Deep Dive

At a Glance

Concept	Detail
Function template	`template<typename T> T clamp(T val, T lo, T hi)` -- one definition, works for any type
Class template	`CircularBuffer<T, N>` -- type and size are compile-time parameters
`constexpr` function	Evaluated at compile time when inputs are constant; at runtime otherwise
`constexpr` variable	Must be initialized at compile time -- replaces `#define` constants with type safety
CRTP	`class Derived : public Base<Derived>` -- static polymorphism, no vtable
Full specialization	`template<> class Driver<STM32F4> { ... }` -- hardware-specific implementation
`if constexpr`	Compile-time branch -- discarded path generates zero code

Function Templates vs Macros

Function templates solve the main problems of function-like C macros -- double evaluation, no type checking, no debugger support -- and, once inlined, typically compile to the same machine code.

Aspect	C macro `#define MAX(a,b)`	C++ template `max(a,b)`
Type safety	None -- text substitution	Full -- compiler checks both arguments match
Side effects	`MAX(++x, y)` evaluates `++x` twice	Arguments evaluated exactly once
Debugging	Cannot set breakpoint inside macro	Full step-through in debugger
Error messages	Refers to expanded text, not macro definition	Points to template definition with type context
Code generation	Always expanded inline	Compiler decides: inline small functions, call large ones
Overload resolution	Not possible	Works with overloads and ADL

The key insight: templates generate code per type used, not per call site. If you only call max<int>() and max<float>(), only two instantiations exist in the binary. A macro expands at every call site regardless.
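The double-evaluation row in the table above can be reproduced on a host compiler. The following is a minimal sketch; the names MAX_MACRO, max_of, and demo_double_evaluation are made up for this illustration and do not come from any library.

```cpp
#include <cassert>

// Classic function-like macro: pure text substitution, no type checking.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// Template equivalent: each argument is evaluated once, before the call,
// and both arguments must deduce to the same type T.
template <typename T>
constexpr T max_of(T a, T b) {
    return (a > b) ? a : b;
}

// The template is also usable in constant expressions.
static_assert(max_of(3, 7) == 7, "max_of works at compile time");

inline void demo_double_evaluation() {
    const int y = 3;

    int x = 5;
    // Expands to ((++x) > (y) ? (++x) : (y)), so ++x runs twice.
    (void)MAX_MACRO(++x, y);
    assert(x == 7);

    int z = 5;
    // ++z is evaluated once, then its value is passed to the function.
    (void)max_of(++z, y);
    assert(z == 6);
}
```

Calling max_of(1, 2.0) is rejected because T cannot be deduced consistently, whereas the macro would silently mix the types.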

Class Templates: Type-Safe Hardware Abstractions

Class templates let you parameterize data structures and hardware abstractions by type and size, with everything resolved at compile time.

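Compile-time-sized circular buffer -- the size N is a template parameter, so the array is embedded in the object itself (in static storage or on the stack, depending on where the buffer is declared, with no heap allocation) and the compiler can reduce the modulo to a bit mask for power-of-two sizes: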

cpp

#include <cstddef>   // size_t
#include <cstdint>   // uint8_t

template<typename T, size_t N>
class CircularBuffer {
    T buf_[N];
    size_t head_ = 0, tail_ = 0;
public:
    bool push(const T& val) {
        size_t next = (head_ + 1) % N;
        if (next == tail_) return false;  // full
        buf_[head_] = val;
        head_ = next;
        return true;
    }
    // ... pop(), empty(), full()
};

// Usage -- no heap, no runtime size parameter
CircularBuffer<uint8_t, 64>  uart_rx;   // 64-byte UART buffer
CircularBuffer<CanFrame, 16> can_queue;  // 16-frame CAN queue

Type-safe register access -- a template wrapping a memory-mapped register prevents accidentally writing to a read-only register or reading from a write-only one at compile time, something raw #define macros cannot do.
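One way to get this compile-time access checking is to attach an access policy to the register type. The sketch below is an assumption about how such a wrapper could look, not a specific HAL's API: the names ReadOnly, WriteOnly, ReadWrite, Reg, and demo_registers are hypothetical, and ordinary variables stand in for memory-mapped addresses so the example runs on a host.

```cpp
#include <cassert>
#include <cstdint>

// Access policies (hypothetical names for this sketch).
struct ReadOnly  { static constexpr bool can_read = true;  static constexpr bool can_write = false; };
struct WriteOnly { static constexpr bool can_read = false; static constexpr bool can_write = true;  };
struct ReadWrite { static constexpr bool can_read = true;  static constexpr bool can_write = true;  };

template <typename Access>
class Reg {
public:
    explicit Reg(volatile std::uint32_t* addr) : addr_(addr) {}

    // Member functions of a class template are only instantiated when used,
    // so each static_assert fires only if the forbidden operation is called.
    std::uint32_t read() const {
        static_assert(Access::can_read, "register is write-only");
        return *addr_;
    }
    void write(std::uint32_t value) const {
        static_assert(Access::can_write, "register is read-only");
        *addr_ = value;
    }

private:
    volatile std::uint32_t* addr_;
};

// Host-side demo: plain variables replace real peripheral registers.
inline void demo_registers() {
    volatile std::uint32_t fake_status = 0x5u;
    volatile std::uint32_t fake_data = 0u;

    Reg<ReadOnly>  status(&fake_status);
    Reg<WriteOnly> data(&fake_data);

    assert(status.read() == 0x5u);
    data.write(0xABu);
    assert(fake_data == 0xABu);

    // status.write(1u);  // would not compile: "register is read-only"
    // data.read();       // would not compile: "register is write-only"
}
```

On a target, the pointer would typically come from a reinterpret_cast of the peripheral address taken from the vendor header.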

Constexpr: Compile-Time Computation

constexpr tells the compiler "this can be evaluated at compile time." When the result is required in a constant expression, for example as the initializer of a constexpr variable, the computation happens during compilation and the result can be placed directly in flash -- zero runtime cost, zero startup initialization.

Compile-time CRC table -- instead of computing 256 CRC values at startup (wasting cycles and RAM), constexpr computes them at build time:

cpp

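#include <array>
#include <cstdint>

// Requires C++17: std::array's non-const operator[] is constexpr from C++17 on
constexpr uint32_t crc32_for_byte(uint32_t byte) {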
    uint32_t crc = byte;
    for (int i = 0; i < 8; ++i)
        crc = (crc >> 1) ^ (0xEDB88320 & -(crc & 1));
    return crc;
}

constexpr auto make_crc_table() {
    std::array<uint32_t, 256> table{};
    for (uint32_t i = 0; i < 256; ++i)
        table[i] = crc32_for_byte(i);
    return table;
}

// Entire 1 KB table computed by compiler, stored in .rodata (flash)
constexpr auto crc_table = make_crc_table();

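This replaces the common C pattern of either a hand-written 256-entry table (error-prone, hard to verify) or a startup-time computation loop (wastes boot time on resource-constrained MCUs). The constexpr version is not automatically correct, but it is cheap to verify: constant evaluation rejects undefined behavior, and a static_assert against a few known table entries turns a wrong polynomial into a build failure.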
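A compile-time table can also be checked at compile time. The sketch below regenerates the table under different names (crc32_entry, make_table, kCrcTable, crc32 are all chosen for this example) and compares it against well-known reference values for the reflected CRC-32 polynomial 0xEDB88320, including the standard check value for the ASCII string "123456789". It assumes a C++17 compiler.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t crc32_entry(std::uint32_t index) {
    std::uint32_t crc = index;
    for (int bit = 0; bit < 8; ++bit)
        crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    return crc;
}

constexpr std::array<std::uint32_t, 256> make_table() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i)
        table[i] = crc32_entry(i);
    return table;
}

constexpr auto kCrcTable = make_table();

// Table-driven CRC-32 (initial value and final XOR of 0xFFFFFFFF).
constexpr std::uint32_t crc32(const char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        const auto byte = static_cast<std::uint8_t>(data[i]);
        crc = kCrcTable[(crc ^ byte) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;
}

// If any of these fail, the build stops: the table never reaches the target.
static_assert(kCrcTable[0]   == 0x00000000u, "table entry 0");
static_assert(kCrcTable[1]   == 0x77073096u, "table entry 1");
static_assert(kCrcTable[255] == 0x2D02EF8Du, "table entry 255");
static_assert(crc32("123456789", 9) == 0xCBF43926u, "CRC-32 check value");
```

The same crc32 function remains callable at runtime on received buffers, so the checks exercise exactly the code that ships.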

Pin configuration -- constexpr functions can encode GPIO pin settings into register values at compile time, catching invalid configurations as compiler errors rather than runtime faults.
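As an illustration of that idea, the sketch below encodes a 2-bit-per-pin mode field, loosely modeled on an STM32-style MODER register. The names PinMode, kPinsPerPort, mode_mask, mode_bits, and with_mode are hypothetical and the layout is an assumption for the example, not taken from a particular vendor header.

```cpp
#include <cstdint>

// Assumed encoding: 2 bits per pin, 16 pins per port.
enum class PinMode : std::uint32_t { Input = 0, Output = 1, Alternate = 2, Analog = 3 };

constexpr unsigned kPinsPerPort = 16;

// Mask covering the two mode bits of one pin.
template <unsigned Pin>
constexpr std::uint32_t mode_mask() {
    static_assert(Pin < kPinsPerPort, "pin index out of range for this port");
    return 0x3u << (Pin * 2);
}

// Mode value shifted into position for one pin.
template <unsigned Pin, PinMode Mode>
constexpr std::uint32_t mode_bits() {
    static_assert(Pin < kPinsPerPort, "pin index out of range for this port");
    return static_cast<std::uint32_t>(Mode) << (Pin * 2);
}

// Read-modify-write value: clear the pin's field, then set the new mode.
template <unsigned Pin, PinMode Mode>
constexpr std::uint32_t with_mode(std::uint32_t current) {
    return (current & ~mode_mask<Pin>()) | mode_bits<Pin, Mode>();
}

static_assert(mode_mask<5>() == 0x00000C00u, "bits 11:10");
static_assert(mode_bits<5, PinMode::Output>() == 0x00000400u, "01 at bits 11:10");
static_assert(with_mode<5, PinMode::Output>(0xFFFFFFFFu) == 0xFFFFF7FFu, "only pin 5 changes");

// mode_bits<16, PinMode::Output>();  // would not compile: pin index out of range
```

Passing the pin as a template argument is what turns an invalid pin number into a compiler error instead of a silently wrong register write.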

💡Constexpr vs Const

const means "I promise not to modify this" -- but the value can still be computed at runtime. constexpr on a variable means "this must be initialized by a constant expression"; on a function it means "this can be evaluated at compile time when given constant inputs." For embedded, constexpr is stronger: the value is fixed at build time rather than computed during startup, so the toolchain can place it in flash.
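The difference is easiest to see with one function used both ways. This is a small sketch; baud_divisor and read_clock_from_hardware are made-up names, and the latter simply stands in for any value that is only known at runtime.

```cpp
#include <cstdint>

// Rounded integer division, usable at compile time and at runtime.
constexpr std::uint32_t baud_divisor(std::uint32_t clock_hz, std::uint32_t baud) {
    return (clock_hz + baud / 2) / baud;
}

// Hypothetical stand-in for a runtime-only value (not constexpr on purpose).
inline std::uint32_t read_clock_from_hardware() { return 16000000u; }

// constexpr variable: the initializer must be a constant expression,
// so the compiler has to evaluate baud_divisor() itself.
constexpr std::uint32_t kDivisor = baud_divisor(16000000u, 115200u);
static_assert(kDivisor == 139, "computed during compilation");

// const variable: read-only after initialization, but here the value
// is computed at startup because the input is not a constant.
const std::uint32_t runtime_divisor =
    baud_divisor(read_clock_from_hardware(), 115200u);

// constexpr std::uint32_t bad =
//     baud_divisor(read_clock_from_hardware(), 115200u);
// ^ would not compile: the initializer is not a constant expression
```

A static_assert on the result is also a simple way to confirm that a given call really is evaluated at compile time.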

If Constexpr (C++17): Compile-Time Branching

Inside a template, if constexpr evaluates a condition at compile time and discards the false branch entirely: the discarded branch is not instantiated, so it generates no code, and code in it that depends on the template parameter does not have to be valid for the given type. This is a type-safe replacement for #ifdef platform switches.

cpp

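#include <type_traits>

// Dependent constant: the static_assert below fires only if the final
// else branch is actually instantiated for some Platform
template<typename>
inline constexpr bool always_false = false;

template<typename Platform>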
void init_uart() {
    if constexpr (std::is_same_v<Platform, STM32F4>) {
        // STM32 register writes -- only compiled for STM32F4
        USART1->BRR = compute_brr(115200);
    } else if constexpr (std::is_same_v<Platform, NRF52>) {
        // Nordic register writes -- only compiled for NRF52
        NRF_UARTE0->BAUDRATE = 0x01D7E000;
    } else {
        static_assert(always_false<Platform>, "Unsupported platform");
    }
}

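Unlike #ifdef, the compiler still parses every branch, so syntax errors and misspelled non-dependent names are caught even in the branch you are not currently building; only code that depends on the template parameter is left unchecked until instantiation. The flip side is that non-dependent names such as USART1 and NRF_UARTE0 must be declared in every build, so register definitions for all supported platforms have to be visible or be made dependent on Platform.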
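The rule about dependent code can be shown with a host-runnable sketch. Here the register block itself is the template parameter, so each branch touches a member that exists on only one of the types. MockStm32Uart, MockNordicUart, set_baud_code, and the register values are hypothetical placeholders, not vendor definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Dependent "false": only evaluated when the else branch is instantiated.
template <typename>
inline constexpr bool always_false = false;

// Mock register blocks with deliberately different member names.
struct MockStm32Uart  { std::uint32_t BRR = 0; };
struct MockNordicUart { std::uint32_t BAUDRATE = 0; };

template <typename Uart>
void set_baud_code(Uart& uart, std::uint32_t code) {
    if constexpr (std::is_same_v<Uart, MockStm32Uart>) {
        uart.BRR = code;       // discarded (never instantiated) for MockNordicUart
    } else if constexpr (std::is_same_v<Uart, MockNordicUart>) {
        uart.BAUDRATE = code;  // discarded (never instantiated) for MockStm32Uart
    } else {
        static_assert(always_false<Uart>, "Unsupported UART type");
    }
}

inline void demo_if_constexpr() {
    MockStm32Uart stm;
    MockNordicUart nordic;

    set_baud_code(stm, 0x8Bu);       // placeholder value
    set_baud_code(nordic, 0x1234u);  // placeholder value

    assert(stm.BRR == 0x8Bu);
    assert(nordic.BAUDRATE == 0x1234u);

    // int not_a_uart = 0;
    // set_baud_code(not_a_uart, 1u);  // would trip "Unsupported UART type"
}
```

With a plain if instead of if constexpr, both branches would be instantiated and uart.BRR would fail to compile for MockNordicUart.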

CRTP: Static Polymorphism Without Vtables

The Curiously Recurring Template Pattern (CRTP) gives you interface enforcement and code reuse -- the two main benefits of virtual functions -- without a vtable pointer (4-8 bytes per object) and without virtual dispatch overhead (indirect call + pipeline stall).

The pattern: a base class template takes the derived class as its template parameter, then casts this to the derived type to call its methods.

cpp

template<typename Derived>
class SensorBase {
public:
    int read_filtered() {
        // Call derived class's read_raw() -- resolved at compile time
        int raw = static_cast<Derived*>(this)->read_raw();
        last_ = (raw + last_) / 2;  // simple averaging filter; keep the result for the next call
        return last_;
    }
private:
    int last_ = 0;
};

class Accelerometer : public SensorBase<Accelerometer> {
public:
    int read_raw() { return read_adc(ACCEL_CHANNEL); }
};

class Thermistor : public SensorBase<Thermistor> {
public:
    int read_raw() { return read_adc(TEMP_CHANNEL); }
};

The compiler resolves static_cast<Derived*>(this)->read_raw() at compile time -- no indirect call, no vtable lookup -- so the call can be inlined like any direct call to read_adc(). If a derived class forgets to implement read_raw(), the compiler emits an error as soon as read_filtered() is instantiated for that class, which gives contract enforcement similar to a pure virtual function.

When to use CRTP vs virtual: CRTP when you know all types at compile time and need zero overhead (ISR-called sensors, tight control loops). Virtual when you need runtime polymorphism (plugin architectures, dynamically loaded drivers).
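The per-object cost of the vtable pointer can be checked directly with sizeof. This is a host-side sketch with invented class names (VirtualSensor, VirtualThermistor, CrtpSensor, CrtpThermistor); the exact sizes depend on the ABI, and the first static_assert assumes the empty base class takes no space, which holds on common toolchains.

```cpp
#include <cassert>

// Runtime polymorphism: every object carries a hidden vtable pointer.
class VirtualSensor {
public:
    virtual ~VirtualSensor() = default;
    virtual int read_raw() = 0;
    int read_scaled() { return read_raw() * 2; }  // indirect call
};

class VirtualThermistor final : public VirtualSensor {
public:
    int read_raw() override { return raw_; }
private:
    int raw_ = 21;
};

// Static polymorphism: the base knows the derived type at compile time.
template <typename Derived>
class CrtpSensor {
public:
    int read_scaled() {
        return static_cast<Derived*>(this)->read_raw() * 2;  // direct call
    }
};

class CrtpThermistor : public CrtpSensor<CrtpThermistor> {
public:
    int read_raw() { return raw_; }
private:
    int raw_ = 21;
};

static_assert(sizeof(CrtpThermistor) == sizeof(int), "no hidden pointer");
static_assert(sizeof(VirtualThermistor) > sizeof(CrtpThermistor),
              "the vtable pointer makes the virtual version larger");

inline void demo_sensor_dispatch() {
    VirtualThermistor v;
    VirtualSensor& as_base = v;  // usable through a common base reference
    CrtpThermistor c;            // no common base type exists for CRTP sensors

    assert(as_base.read_scaled() == 42);
    assert(c.read_scaled() == 42);
}
```

The demo also shows the trade-off from the paragraph above: only the virtual version can be stored or passed through a single base-class reference.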

Template Specialization for Hardware Variants

Full specialization lets you provide a completely different implementation for a specific type while keeping a generic default. Partial specialization lets you specialize on patterns (e.g., all pointer types, all arrays of size N).

This is how embedded frameworks support multiple MCU families from a single codebase: the generic template defines the interface, and specializations provide hardware-specific register access.

Specialization Type	Syntax	Use Case
Primary template	`template<typename MCU> class SpiDriver { ... }`	Generic/default implementation
Full specialization	`template<> class SpiDriver<STM32F4> { ... }`	STM32F4-specific register layout
Partial specialization	`template<typename T> class Buffer<T*> { ... }`	Specialized for all pointer types

This replaces the C approach of #ifdef STM32F4 scattered throughout the codebase. Template specialization keeps platform-specific code in one place, and adding a new MCU family means adding one specialization file -- not modifying existing code.
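The three rows of the table can be combined in a small traits-style sketch. Everything here is illustrative: the tag types Stm32F4 and Nrf52, the SpiTraits and ElementTraits templates, and the clock figures are placeholders chosen for the example rather than datasheet values.

```cpp
#include <cstdint>

// Tag types naming MCU families (hypothetical).
struct Stm32F4 {};
struct Nrf52 {};

// Primary template: generic, conservative default.
template <typename Mcu>
struct SpiTraits {
    static constexpr std::uint32_t max_clock_hz = 1000000u;  // placeholder
    static constexpr bool has_dma = false;
};

// Full specialization: replaces the whole definition for one MCU family.
template <>
struct SpiTraits<Stm32F4> {
    static constexpr std::uint32_t max_clock_hz = 42000000u;  // placeholder
    static constexpr bool has_dma = true;
};

// Partial specialization: matches a pattern (any pointer type).
template <typename T>
struct ElementTraits { static constexpr bool is_pointer = false; };

template <typename T>
struct ElementTraits<T*> { static constexpr bool is_pointer = true; };

// Generic code is written once against the traits interface.
template <typename Mcu>
constexpr std::uint32_t pick_spi_clock(std::uint32_t requested_hz) {
    return requested_hz > SpiTraits<Mcu>::max_clock_hz
               ? SpiTraits<Mcu>::max_clock_hz
               : requested_hz;
}

static_assert(pick_spi_clock<Stm32F4>(50000000u) == 42000000u, "clamped to the F4 limit");
static_assert(pick_spi_clock<Nrf52>(8000000u) == 1000000u, "Nrf52 falls back to the default");
static_assert(ElementTraits<int*>::is_pointer && !ElementTraits<int>::is_pointer,
              "pointer types select the partial specialization");
```

Supporting a new family in this scheme means adding one more specialization of SpiTraits, while pick_spi_clock stays untouched.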

Code Bloat: The Flash-Size Trade-Off

Every unique template instantiation generates its own copy of the code in flash. This is the primary cost of templates on embedded targets.

CircularBuffer<uint8_t, 64>, CircularBuffer<uint16_t, 64>, and CircularBuffer<uint32_t, 64> produce three separate copies of push(), pop(), full(), and every other member function the code actually calls. On a 64 KB flash MCU, this adds up fast.

Mitigation strategies:

Strategy	How It Works	Trade-Off
Thin template wrapper over void*	Template provides type safety; implementation uses `void*` and `sizeof(T)` internally	Slight complexity; some optimizations lost
Limit instantiation count	Use `uint32_t` everywhere instead of `uint8_t`, `uint16_t`, `uint32_t`	Wastes RAM on small types
Extern template	`extern template class Buffer<uint8_t>;` prevents implicit instantiation in each TU	Must explicitly instantiate in one `.cpp` file
Factor out non-dependent code	Move code that does not depend on `T` into a non-template base class	Requires careful design
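Link-time optimization (LTO) and identical code folding	LTO enables cross-TU inlining and dead-code removal; linkers that support identical code folding (ICF) can merge byte-identical function bodies	Longer build times; not all toolchains support both well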
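The first strategy in the table, a thin template wrapper over a void* core, might look like the sketch below. RawQueue, Queue, and demo_queue are names invented for this example. The core copies elements with memcpy, so the wrapper restricts itself to trivially copyable types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Non-template core: one copy of this logic, whatever the element type.
// On a real project these bodies would live in a single .cpp file.
class RawQueue {
public:
    RawQueue(void* storage, std::size_t elem_size, std::size_t capacity)
        : storage_(static_cast<unsigned char*>(storage)),
          elem_size_(elem_size), capacity_(capacity) {}

    bool push(const void* elem) {
        if (count_ == capacity_) return false;  // full
        std::memcpy(storage_ + head_ * elem_size_, elem, elem_size_);
        head_ = (head_ + 1) % capacity_;
        ++count_;
        return true;
    }

    bool pop(void* out) {
        if (count_ == 0) return false;          // empty
        std::memcpy(out, storage_ + tail_ * elem_size_, elem_size_);
        tail_ = (tail_ + 1) % capacity_;
        --count_;
        return true;
    }

private:
    unsigned char* storage_;
    std::size_t elem_size_;
    std::size_t capacity_;
    std::size_t head_ = 0, tail_ = 0, count_ = 0;
};

// Thin typed wrapper: only trivial forwarding code is generated per type.
template <typename T, std::size_t N>
class Queue {
    static_assert(std::is_trivially_copyable<T>::value,
                  "the memcpy-based core needs a trivially copyable T");
public:
    Queue() : core_(slots_, sizeof(T), N) {}
    Queue(const Queue&) = delete;             // core_ points into slots_
    Queue& operator=(const Queue&) = delete;

    bool push(const T& value) { return core_.push(&value); }
    bool pop(T& out)          { return core_.pop(&out); }

private:
    T slots_[N];      // declared before core_, so it exists when core_ is built
    RawQueue core_;
};

inline void demo_queue() {
    Queue<std::uint16_t, 2> q;
    const bool pushed_a = q.push(10);
    const bool pushed_b = q.push(20);
    const bool pushed_c = q.push(30);  // rejected: capacity is 2

    std::uint16_t first = 0, second = 0;
    const bool popped_a = q.pop(first);
    const bool popped_b = q.pop(second);

    assert(pushed_a && pushed_b && !pushed_c);
    assert(popped_a && popped_b && first == 10 && second == 20);
}
```

The cost named in the table shows up here as runtime element sizes: the core can no longer turn copies and index arithmetic into type-specific instructions.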

⚠️Template Bloat Is Real

On a 256 KB flash Cortex-M4, a team instantiated std::function (a heavily templated type) with 12 different signatures. The template code alone consumed 38 KB -- 15% of total flash. Replacing std::function with function pointers and a void* context parameter saved 35 KB. After adding template-heavy code, always check your linker map file, or list symbol sizes by running arm-none-eabi-nm --size-sort on the ELF.

Type-Safe Physical Units

Templates can encode physical units into the type system, making it impossible to accidentally add volts to milliamps or pass a duration where a frequency is expected.

The idea: wrap a numeric value in a template parameterized by the unit. Arithmetic operations between incompatible units become compile-time errors.

cpp

template<typename Unit, typename Rep = int32_t>
struct Quantity {
    Rep value;
    constexpr explicit Quantity(Rep v) : value(v) {}
};

// Define unit tags
struct MillivoltTag {};
struct MilliampTag {};
struct MillisecondTag {};

using Millivolt = Quantity<MillivoltTag>;
using Milliamp  = Quantity<MilliampTag>;
using Duration  = Quantity<MillisecondTag>;

Millivolt read_battery() { return Millivolt{3300}; }
void set_led_current(Milliamp ma);

void configure_led() {
    // set_led_current(read_battery());  // COMPILE ERROR -- cannot convert Millivolt to Milliamp
    set_led_current(Milliamp{20});       // OK
}

This typically costs zero bytes at runtime -- the Quantity struct is the same size as a bare int32_t, the tag type exists only in the type system, and an optimizing compiler reduces the wrapper to plain integer operations. NASA's Mars Climate Orbiter was famously lost because of a unit mismatch (pound-force seconds vs newton-seconds), the class of error this pattern is designed to catch at compile time.
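The Quantity struct above has no arithmetic yet. One possible extension, sketched here as an assumption rather than as part of the original design, is an operator+ that only accepts two quantities of the same unit. The struct is repeated so the block compiles on its own.

```cpp
#include <cstdint>
#include <type_traits>

template <typename Unit, typename Rep = std::int32_t>
struct Quantity {
    Rep value;
    constexpr explicit Quantity(Rep v) : value(v) {}
};

// Same-unit addition only: both operands must deduce the same Unit and Rep,
// so mixing units leaves no viable operator+ and the call fails to compile.
template <typename Unit, typename Rep>
constexpr Quantity<Unit, Rep> operator+(Quantity<Unit, Rep> a, Quantity<Unit, Rep> b) {
    return Quantity<Unit, Rep>(static_cast<Rep>(a.value + b.value));
}

template <typename Unit, typename Rep>
constexpr bool operator<(Quantity<Unit, Rep> a, Quantity<Unit, Rep> b) {
    return a.value < b.value;
}

struct MillivoltTag {};
struct MilliampTag {};

using Millivolt = Quantity<MillivoltTag>;
using Milliamp  = Quantity<MilliampTag>;

static_assert((Millivolt{3300} + Millivolt{200}).value == 3500, "same-unit sum");
static_assert(Millivolt{3000} < Millivolt{3300}, "same-unit comparison");
static_assert(sizeof(Millivolt) == sizeof(std::int32_t), "no size overhead");
static_assert(!std::is_convertible<Millivolt, Milliamp>::value, "no implicit unit conversion");

// Millivolt{3300} + Milliamp{20};  // would not compile: no matching operator+
```

Operations that legitimately change units, such as voltage divided by current, would need their own explicitly typed operators.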

Templates vs C Macros: Full Comparison

Criterion	C Macros	C++ Templates
Type checking	None	Full compiler checking
Code generation	Every expansion site	Per unique instantiation
Debugging	Preprocessor output only	Full source-level debugging
Error messages	Refers to expanded text	Refers to template definition (can be verbose)
Side effects	Double evaluation	Arguments evaluated once
Constexpr computation	Not possible	Tables, CRCs, configs at compile time
Code bloat risk	Low (text substitution)	High (each instantiation = new code)
Portability	C89+ everywhere	Requires C++ compiler
Compile time	Fast (text substitution)	Slower (template instantiation + type checking)
MISRA compliance	Function-like macros discouraged (MISRA C:2012 Dir 4.9)	Allowed with AUTOSAR C++14 guidelines

Debugging Story: The Template That Ate Flash

A team developing a wearable health monitor on a Cortex-M0+ with 128 KB flash used a template-based HAL library. Each peripheral driver was a class template parameterized by the peripheral instance: Uart<USART1>, Uart<USART2>, Spi<SPI1>, I2c<I2C1>, I2c<I2C2>. The design was clean, type-safe, and followed modern C++ best practices.

At 60% feature completion, the firmware hit the 128 KB flash limit. The linker map showed that Uart<USART1> and Uart<USART2> had nearly identical code -- the only difference was the base address constant. Two copies of init(), send(), receive(), handle_irq(), and every helper function, all duplicated.

The fix was the "thin template over common implementation" pattern: a non-template UartImpl class took the base address as a constructor parameter and contained all the logic. The Uart<Instance> template became a thin wrapper that only stored the constexpr base address and forwarded calls. Flash usage dropped by 18 KB, and the team finished the product without upgrading to a larger (more expensive) MCU.
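The shape of that fix can be sketched as follows. This is a reconstruction of the general pattern, not the team's actual code: UartRegs is a made-up register layout, and mock_usart1 and mock_usart2 are host-side objects standing in for peripheral base addresses. It assumes C++17 for the inline variables.

```cpp
#include <cassert>
#include <cstdint>

// Mock register block (hypothetical layout).
struct UartRegs {
    volatile std::uint32_t status = 0;
    volatile std::uint32_t data = 0;
};

// Non-template implementation: all real logic lives here, compiled once.
// In firmware these bodies would sit in one .cpp file, not in a header.
class UartImpl {
public:
    explicit UartImpl(UartRegs* regs) : regs_(regs) {}
    void send(std::uint8_t byte) { regs_->data = byte; }
    std::uint8_t receive() const {
        return static_cast<std::uint8_t>(regs_->data & 0xFFu);
    }
private:
    UartRegs* regs_;
};

// Thin wrapper: the instance is still a compile-time parameter, but each
// instantiation only contains forwarding calls.
template <UartRegs* Instance>
class Uart {
public:
    void send(std::uint8_t byte) { impl_.send(byte); }
    std::uint8_t receive() const { return impl_.receive(); }
private:
    UartImpl impl_{Instance};
};

// Host stand-ins for the two peripherals.
inline UartRegs mock_usart1;
inline UartRegs mock_usart2;

inline void demo_thin_uart() {
    Uart<&mock_usart1> uart1;
    Uart<&mock_usart2> uart2;

    uart1.send(0x41);
    uart2.send(0x42);

    assert(mock_usart1.data == 0x41u);
    assert(mock_usart2.data == 0x42u);
    assert(uart1.receive() == 0x41);
}
```

Uart<&mock_usart1> and Uart<&mock_usart2> remain distinct types, so passing the wrong instance to an API is still a compile-time error.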

Lesson: Templates provide excellent type safety, but on flash-constrained targets, always check the linker map after adding template-heavy code. Factor out type-independent logic into non-template base classes or void*-based implementations.

What interviewers want to hear: You can explain that templates and constexpr are zero-cost at runtime but have a compile-time and flash-size cost. You reach for constexpr to move computation from runtime to build time -- CRC tables, pin configs, lookup tables. You use CRTP when you need polymorphism in ISR or tight-loop contexts where vtable overhead matters. You know the code bloat problem, can describe mitigation strategies (thin wrappers, extern template, limiting instantiation count), and you check linker map files to verify. You understand that if constexpr replaces #ifdef with type-safe compile-time branching. You do not blindly apply templates everywhere -- you weigh type safety against flash cost for each use case.

Interview Focus

Classic Interview Questions

Q1: "What is the difference between constexpr and const in C++?"

Model Answer Starter: "const means the variable will not be modified after initialization, but its value can still be computed at runtime -- for example, const int x = read_adc(); is valid. A constexpr variable must be initialized by a constant expression, so its value is fixed at compile time. For embedded, constexpr is stronger: the value is resolved by the compiler rather than computed during startup, which lets the toolchain place it in flash. A constexpr function can also be called at runtime with non-constant arguments, in which case it behaves like a normal function."

Q2: "How does CRTP provide polymorphism without virtual functions?"

Model Answer Starter: "CRTP stands for Curiously Recurring Template Pattern. The base class is a template parameterized by the derived class: class Derived : public Base<Derived>. The base class can call derived-class methods by casting this to Derived* -- this cast is resolved at compile time, so there is no vtable pointer and no indirect call. The generated assembly is identical to a direct function call. I use CRTP for sensor drivers and protocol handlers called from ISRs where the virtual dispatch overhead -- an indirect call plus a potential pipeline stall -- is unacceptable."

Q3: "What is template code bloat and how do you mitigate it on embedded targets?"

Model Answer Starter: "Every unique template instantiation generates its own copy of the code in the binary. If I have Buffer<uint8_t>, Buffer<uint16_t>, and Buffer<uint32_t>, I get three copies of every method. On a 128 KB flash MCU, this adds up quickly. Mitigation strategies include: factoring type-independent logic into a non-template base class, using a thin type-safe template wrapper over a void* implementation, limiting the number of distinct instantiations, using extern template to control where instantiation happens, and enabling link-time optimization. I always check the linker map file after adding template-heavy code to catch bloat early."

Q4: "Give an example of using constexpr to move computation from runtime to compile time in firmware."

Model Answer Starter: "A classic example is a CRC-32 lookup table. In C, you either hand-write 256 entries, which is error-prone, or compute them in a startup function, which wastes boot time. With constexpr, I write the CRC computation as a constexpr function, call it in a constexpr array initializer, and the compiler computes all 256 entries at build time. The table goes directly into the .rodata section in flash. Zero runtime cost, zero RAM usage, and I can static_assert a few known table entries so that a wrong polynomial breaks the build. I use the same technique for sine tables, baud rate divisors, and GPIO pin configuration masks."

Q5: "When would you use templates vs C-style void* generics in embedded code?"

Model Answer Starter: "Templates when type safety matters and the number of instantiations is small -- a circular buffer used with two or three types, a register accessor parameterized by peripheral instance. void* when I need one shared implementation to save flash -- a generic queue used by an RTOS scheduler, a message-passing framework with many payload types. The hybrid approach works well too: a void*-based implementation for the logic, with a thin template wrapper that provides type-safe push(const T&) and T pop() methods. This gives type safety at the API boundary without code duplication."

Trap Alerts

Don't say: "Templates have no cost" -- they have zero runtime cost but real flash cost. Each instantiation duplicates code. Interviewers expect you to acknowledge this trade-off.
Don't forget: constexpr functions can also run at runtime when called with non-constant arguments -- they are not purely compile-time constructs.
Don't ignore: The practical limit on template usage in embedded. Using std::variant, std::function, or deep template metaprogramming on a 64 KB flash MCU is usually impractical. Know where to draw the line.

Follow-up Questions

"How would you verify that a constexpr function is actually being evaluated at compile time?"
"What is the difference between static_assert and a runtime assertion, and when do you use each?"
"Can you use CRTP with multiple levels of inheritance? What are the pitfalls?"
"How does extern template help reduce compile time and binary size?"
"What C++ standard level do most embedded toolchains support today (C++14, C++17, C++20)?"

💡Interview Q&A Practice

For more C++ templates and constexpr questions in a flashcard-style format, see the C++ Concepts Interview Q&A page.

Practice

❓ What does `constexpr` guarantee that `const` does not?

❓ What is the main risk of using templates extensively on a flash-constrained MCU?

❓ How does CRTP achieve polymorphism without a vtable?

❓ What advantage does `if constexpr` (C++17) have over `#ifdef` for platform-specific code?

❓ Which mitigation strategy reduces template code bloat by keeping only one copy of the core logic?

Real-World Tie-In

Automotive Sensor Fusion -- An ADAS (Advanced Driver Assistance System) ECU used CRTP-based sensor drivers for radar, lidar, and camera interfaces. Each driver inherited from SensorBase<Derived> which provided a common filtering and timestamping pipeline. The CRTP design eliminated vtable pointers on 23 sensor objects (saving 92-184 bytes of RAM) and removed indirect call overhead from the 1 kHz fusion loop. The design passed AUTOSAR C++14 static analysis without deviations.

IoT Edge Device CRC Validation -- A LoRaWAN sensor node on a Cortex-M0+ with 64 KB flash used constexpr to generate its CRC-16 lookup table at compile time. The previous C implementation computed the table during main() startup, adding 3.2 ms to boot time and requiring 512 bytes of RAM for the table. The constexpr version moved the table to .rodata (flash), eliminated the startup delay, and freed 512 bytes of RAM for the application -- critical on a device with only 8 KB of RAM in total.

Medical Device Type-Safe Units -- A patient infusion pump used template-based physical unit types (Milliliter, MicroliterPerHour, Milligram) to prevent dosage calculation errors. When a junior engineer attempted to add a flow rate to a volume, the compiler rejected the code with a clear type error. The same class of bug had caused a recall on a competitor's product two years earlier. The template overhead was zero: the unit types compiled down to bare int32_t arithmetic.