Debugging
Q: Compare JTAG and SWD for debugging embedded systems. When would you choose one over the other?
JTAG (Joint Test Action Group, IEEE 1149.1) is the original debug and boundary-scan interface. It uses at minimum four signals: TCK (clock), TMS (mode select), TDI (data in), and TDO (data out), plus an optional TRST (reset). JTAG supports daisy-chaining multiple devices on a single scan chain, which is essential for debugging complex boards with multiple ICs (MCU + FPGA + DSP). It also supports boundary scan testing — the ability to toggle and read individual pins of a device without running any code — making it invaluable for board-level manufacturing test and verifying solder connections.
SWD (Serial Wire Debug) is an ARM-specific two-pin debug protocol that uses SWDIO (bidirectional data) and SWCLK (clock). It provides the same CoreSight debug functionality as JTAG on ARM Cortex-M and Cortex-A cores — breakpoints, watchpoints, register inspection, flash programming, and real-time memory access — but with only two pins instead of four or five. SWD also supports the SWO (Serial Wire Output) pin for trace output (ITM stimulus ports), which JTAG does not natively provide.
Choose SWD for the vast majority of ARM Cortex-M projects: it uses fewer pins (critical on small QFN packages where every pin is precious), achieves the same debug speeds, and the SWO trace capability is a significant bonus. Choose JTAG when you need to debug non-ARM devices (MIPS, RISC-V, FPGAs), when you need boundary scan for production testing, or when you have a multi-device scan chain. On boards with both an ARM MCU and an FPGA, a common setup is to use the JTAG scan chain for the FPGA and break out a separate SWD header for the ARM core.
Q: How do you debug a hard fault on ARM Cortex-M? Walk through the process from the moment the fault triggers.
When a hard fault occurs on Cortex-M, the processor stacks eight registers onto the current stack (R0-R3, R12, LR, PC, xPSR) and vectors to the HardFault_Handler. The first step is to write a fault handler that captures this stacked frame. The handler must determine whether the MSP or PSP was active at the time of the fault by examining bit 2 of the EXC_RETURN value in the LR register: if bit 2 is 0, the main stack (MSP) was used; if 1, the process stack (PSP) was used. This distinction matters in RTOS environments where tasks use PSP.
Once you have the stacked frame, the stacked PC is the most critical value — it points to the instruction that was executing (or about to execute) when the fault occurred. Look up this address in your map file or disassembly to identify the exact function and line of code. Next, read the fault status registers: CFSR (Configurable Fault Status Register, at 0xE000ED28) combines three sub-registers — MemManage, BusFault, and UsageFault status bits. HFSR (HardFault Status Register, at 0xE000ED2C) tells you whether the hard fault was caused by a failed escalation from a lower-priority fault (FORCED bit) or a vector table read error. If the CFSR shows a MemManage or BusFault with the MMAR/BFAR valid bit set, read the corresponding address register to find the exact address that caused the fault.
Common root causes and their CFSR signatures: IMPRECISERR (BusFault) means a buffered write failed — the stacked PC will not point to the offending instruction because the write buffer deferred the access. Disable write buffering temporarily (set DISDEFWBUF in the Auxiliary Control Register, ACTLR) to make the fault precise and reproducible. INVSTATE (UsageFault) means the processor tried to execute an instruction with the Thumb bit cleared — usually caused by a corrupted function pointer or calling a function through a zero-initialized pointer. UNALIGNED (UsageFault) means an unaligned multi-byte access trapped with the UNALIGN_TRP bit enabled on M3/M4; note that Cortex-M0/M0+ has no UsageFault at all, so any unaligned access there escalates directly to a hard fault. A systematic approach is to always have a permanent fault handler in your firmware that logs the stacked PC, CFSR, and HFSR to a dedicated RAM region that survives reset, so you can diagnose field faults from crash logs.
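The decode steps above translate into a small amount of C that can run in a host test. The CFSR bit positions and the EXC_RETURN rule are architectural (ARMv7-M); the type and function names below are illustrative, not from any vendor SDK.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Frame that the hardware pushes before entering HardFault_Handler. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

/* Architectural CFSR bit positions (ARMv7-M): */
#define CFSR_IMPRECISERR (1u << 10)  /* BusFault: deferred (buffered) write failed */
#define CFSR_INVSTATE    (1u << 17)  /* UsageFault: execution without Thumb bit */
#define CFSR_UNALIGNED   (1u << 24)  /* UsageFault: unaligned access trap */

/* EXC_RETURN bit 2: 0 = frame stacked on MSP, 1 = frame stacked on PSP. */
static int frame_was_on_psp(uint32_t exc_return) {
    return (exc_return >> 2) & 1u;
}

/* First-pass classification of a captured CFSR value. */
static const char *classify_fault(uint32_t cfsr) {
    if (cfsr & CFSR_INVSTATE)    return "INVSTATE: jump without Thumb bit";
    if (cfsr & CFSR_IMPRECISERR) return "IMPRECISERR: buffered write failed";
    if (cfsr & CFSR_UNALIGNED)   return "UNALIGNED: unaligned access trap";
    return "unclassified";
}
```

On target, the handler reads EXC_RETURN from LR, picks MSP or PSP accordingly, and passes the stacked frame plus a CFSR snapshot into helpers like these.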
Q: Your embedded application crashes intermittently — describe your systematic debugging approach.
Intermittent crashes are the hardest class of embedded bugs because they resist simple breakpoint debugging. The systematic approach has three phases: contain, reproduce, isolate.
Contain and log: First, ensure your hard fault handler and watchdog reset handler both log diagnostic data (stacked PC, fault status registers, stack pointer values, and a monotonic timestamp) to a persistent region — either a dedicated section of flash, a battery-backed SRAM, or an unused region of RAM marked no-init in the linker script. Collect several crash instances. Look for patterns: does the stacked PC always point to the same function? Does it correlate with a particular peripheral operation, interrupt, or time of day? Also check whether the stack pointer at crash time is near the end of the stack region — stack overflow is the single most common cause of intermittent hard faults in embedded systems, and it manifests differently every time because the corrupted data varies.
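A crash-log region along these lines is sketched below. The `.noinit` section name and the magic constant are assumptions: the linker script must place that section in RAM and skip zero-initializing it, otherwise the record will not survive a reset.

```c
#include <assert.h>
#include <stdint.h>

#define CRASH_MAGIC 0xDEADC0DEu  /* assumed valid-record marker */

typedef struct {
    uint32_t magic;     /* CRASH_MAGIC when a record is present */
    uint32_t pc;        /* stacked PC at fault */
    uint32_t cfsr;      /* Configurable Fault Status Register */
    uint32_t hfsr;      /* HardFault Status Register */
    uint32_t sp;        /* stack pointer at fault (for overflow checks) */
    uint32_t uptime_ms; /* monotonic timestamp */
} crash_log_t;

/* On target this must land in a no-init RAM section; guarded so the
 * sketch still compiles on non-ELF hosts. */
#if defined(__GNUC__) && defined(__ELF__)
__attribute__((section(".noinit")))
#endif
static crash_log_t crash_log;

/* Called from the fault/watchdog handlers. */
void crash_log_store(uint32_t pc, uint32_t cfsr, uint32_t hfsr,
                     uint32_t sp, uint32_t uptime_ms) {
    crash_log = (crash_log_t){ CRASH_MAGIC, pc, cfsr, hfsr, sp, uptime_ms };
}

/* Called once at boot: returns 1 and copies the record if the previous
 * reset was a crash, consuming it so it is reported only once. */
int crash_log_take(crash_log_t *out) {
    if (crash_log.magic != CRASH_MAGIC) return 0;
    *out = crash_log;
    crash_log.magic = 0;
    return 1;
}
```

Checking `crash_log_take` early at boot, before logging restarts, is what turns intermittent field crashes into analyzable data.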
Reproduce: If the crash log shows a pattern, try to stress the triggering condition. If crashes correlate with heavy interrupt activity, artificially increase interrupt rates. If they correlate with memory allocation, run stress tests that fragment the heap. Use data watchpoints (available on Cortex-M3 and above via the DWT unit) to break when specific memory locations are written — this catches stack overflows (set a watchpoint at the bottom of the stack) and buffer overruns. Enable stack canaries or MPU-based stack guards to catch corruption immediately rather than after it propagates.
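As a sketch of the watchpoint idea, here is a DWT comparator configuration with the register block mocked as a plain struct so the logic can run in a host test. The field names and the 0x6 write-match encoding are architectural (ARMv7-M); `arm_stack_guard` is an illustrative helper, and on real hardware the comparator block sits in the DWT at 0xE0001000.

```c
#include <assert.h>
#include <stdint.h>

/* One DWT comparator, mocked as a plain struct for host testing. */
typedef struct {
    volatile uint32_t COMP;      /* address to match */
    volatile uint32_t MASK;      /* number of low address bits to ignore */
    volatile uint32_t FUNCTION;  /* watchpoint function encoding */
} dwt_comp_t;

#define DWT_FUNC_WRITE_MATCH 0x6u  /* ARMv7-M: debug event on data write */

/* Arm a data watchpoint at the lowest word of the stack region so the
 * debugger halts the moment a descending stack overflows into it. */
static void arm_stack_guard(dwt_comp_t *cmp, uint32_t stack_bottom) {
    cmp->COMP = stack_bottom;
    cmp->MASK = 2;  /* match any access within an aligned 4-byte window */
    cmp->FUNCTION = DWT_FUNC_WRITE_MATCH;
}
```

On target, `cmp` would point at the real comparator (CMSIS names it DWT->COMP0/MASK0/FUNCTION0) and the halt lands exactly on the overflowing store.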
Isolate: If no pattern emerges from logs, begin binary elimination. Disable subsystems one at a time: turn off DMA, reduce interrupt sources, simplify the RTOS task structure. If the crash disappears when a particular interrupt is disabled, the bug is in that ISR or in a shared-data race condition between the ISR and a task. Use static analysis tools (PC-lint, Polyspace) to scan for undefined behavior, uninitialized variables, and race conditions. Check every shared variable between ISR context and thread context for proper volatile qualification and atomic access. In RTOS environments, verify that every shared resource is protected by a mutex and that no task holds two mutexes in different order (deadlock potential that can manifest as a watchdog timeout).
Q: Printf debugging vs SWO/ITM trace — what are the tradeoffs?
Printf debugging routes formatted text through a UART peripheral to a serial terminal. It is universally available, requires no special hardware beyond a USB-to-serial adapter, and works on any MCU architecture. The fundamental problem is intrusion: printf calls are slow (formatting a single integer takes thousands of CPU cycles), they block on UART transmission unless you implement DMA-backed buffering, and they change the timing of your application. A printf that takes 500 microseconds can mask a race condition or shift an ISR deadline, making the bug disappear when instrumented and reappear when the printf is removed — the classic Heisenbug.
SWO (Serial Wire Output) with ITM (Instrumentation Trace Macrocell) is an ARM CoreSight feature that provides a dedicated trace output channel on the SWO pin (shared with the JTAG TDO pin). ITM has 32 stimulus ports, each of which can emit 1-4 bytes with a single 32-bit write to a memory-mapped register. The write takes only a few cycles (the ITM has a small FIFO), and the data is clocked out asynchronously on the SWO pin at a configurable baud rate (up to several MHz). The debug probe (J-Link, ST-Link) captures and displays the trace data on the host PC.
The tradeoffs are clear: SWO/ITM is orders of magnitude less intrusive — a 4-byte ITM write takes roughly 10 cycles versus thousands for printf. It does not require a UART peripheral, leaving all UARTs available for application use. It supports timestamped trace (the DWT can stamp each ITM packet with a cycle counter), enabling precise profiling of event timing. The downsides: SWO requires a debug probe that supports trace capture, it is ARM-specific (not available on RISC-V, MSP430, AVR, or PIC), and the SWO pin bandwidth is finite — at 2 MHz SWO clock, you get about 200 KB/s of trace data, which is insufficient if you try to dump large buffers. For production logging (field diagnostics), printf over UART is still the practical choice because you cannot ship a JTAG probe with every device. The ideal workflow is SWO/ITM during development for timing-sensitive debugging, and a lightweight ring-buffer logger for production diagnostics.
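A minimal stimulus-port write looks like the sketch below, with the port mocked as a plain union so it runs on the host. On real silicon the ports are memory-mapped at the ITM base (CMSIS exposes them as ITM->PORT[n]) and a read of 0 means the FIFO is full; `itm_putc` and `mock_port` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* An ITM stimulus port allows byte, halfword, or word writes; reading it
 * returns nonzero while the trace FIFO has room. Mocked here for a host test. */
typedef union {
    volatile uint8_t  u8;
    volatile uint32_t u32;
} itm_port_t;

static itm_port_t mock_port = { .u32 = 1 };  /* 1 = FIFO has room */

static void itm_putc(itm_port_t *port, uint8_t ch) {
    while (port->u32 == 0) { }  /* spin until the stimulus FIFO has room */
    port->u8 = ch;              /* single store: a few cycles, no UART involved */
}
```

That single store is the whole transmit path, which is why the intrusion is roughly 10 cycles instead of the thousands a printf costs.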
Testing
Q: How do you unit test embedded C code that directly accesses hardware registers?
The fundamental challenge in unit testing embedded C is that production code reads and writes hardware registers at fixed memory addresses — registers that do not exist on the host machine where tests run. The solution is hardware abstraction and mocking: separate the logic you want to test from the hardware access, then substitute fake hardware during testing.
The most practical approach uses a Hardware Abstraction Layer (HAL) with function pointers or link-time substitution. For example, instead of writing GPIOB->ODR |= (1 << 5) directly in application code, call gpio_write(GPIO_PORT_B, 5, HIGH), where gpio_write is implemented differently for target and test builds. In the target build, gpio_write accesses the real register. In the test build, it writes to a simulated register variable that the test can inspect. Frameworks like CMock (companion to Unity) auto-generate mock functions from your HAL header files, producing fake implementations that record calls and return preset values.
Popular frameworks for embedded C unit testing include Unity (pure C, minimal footprint, assert-based), CppUTest (C/C++ compatible, built-in memory leak detection, widely used in embedded — the "Test-Driven Development for Embedded C" book by James Grenning is built around it), and Google Test (C++ only, powerful matchers and parameterized tests, heavier dependency). All three run on the host machine (x86/x64) — you compile your application logic with a host compiler (GCC, Clang), link against mock HAL implementations, and run the test binary natively. This gives you fast iteration (millisecond test runs), debugger support, and CI integration.
The key architectural decision is where to draw the HAL boundary. Too low (mocking individual register accesses) makes tests brittle and tightly coupled to the hardware. Too high (mocking entire subsystems) leaves too much untested logic. The sweet spot is a thin HAL that abstracts peripheral operations: adc_read_channel(ch), timer_set_period_us(us), spi_transfer(tx, rx, len). This level is stable across MCU families, testable, and maps naturally to mock functions.
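A minimal sketch of the test-build side, using the `gpio_write` signature from the answer above: the fake register array and `led_update` are illustrative, and in the target build `gpio_write` would be replaced at link time by the real register-access implementation.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { GPIO_PORT_A, GPIO_PORT_B } gpio_port_t;
typedef enum { LOW, HIGH } gpio_level_t;

/* Test-build HAL: writes land in a simulated output register per port
 * that the test can inspect, instead of GPIOB->ODR. */
static uint32_t fake_odr[2];

void gpio_write(gpio_port_t port, int pin, gpio_level_t level) {
    if (level == HIGH) fake_odr[port] |=  (1u << pin);
    else               fake_odr[port] &= ~(1u << pin);
}

/* Application logic under test: drive an LED from a threshold decision.
 * This compiles and runs unchanged on host and target. */
void led_update(uint16_t adc_counts) {
    gpio_write(GPIO_PORT_B, 5, adc_counts > 2048 ? HIGH : LOW);
}
```

The point of the design is that `led_update` never knows which `gpio_write` it linked against, so the threshold logic gets millisecond-fast host tests.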
Q: What is Hardware-in-the-Loop (HIL) testing and when do you need it?
Hardware-in-the-Loop (HIL) testing places the real embedded device (the Device Under Test, or DUT) in a controlled test environment where its physical I/O — GPIO pins, ADC inputs, PWM outputs, communication buses — are connected to a test harness that simulates the real-world environment. The test harness generates stimuli (voltage levels, CAN messages, sensor signals) and monitors the DUT's responses, verifying correct behavior automatically. The "loop" is the closed feedback path: the DUT acts on simulated inputs, the harness observes the DUT's outputs, and the harness adjusts stimuli accordingly — just as the real environment would.
HIL testing fills the gap between host-based unit tests (which test logic but not real hardware interaction) and manual bench testing (which tests real hardware but is slow, unreproducible, and does not scale). You need HIL testing when: (1) safety certification requires it — standards like ISO 26262 (automotive) and DO-178C (avionics) mandate testing on the target hardware with documented coverage; (2) the real environment is expensive, dangerous, or slow to reproduce — you cannot crash a car to test the airbag controller, and you cannot wait for a rare sensor failure to test your fault-handling logic; (3) regression testing must be automated — manual testing does not scale when you have hundreds of test cases across multiple firmware versions.
A typical HIL setup for an automotive ECU includes: a real-time simulation computer (dSPACE, NI PXI, or a custom FPGA-based system) running a plant model (engine dynamics, vehicle kinematics), analog I/O boards to generate and measure sensor signals, CAN/LIN/Ethernet interfaces to communicate with the DUT on the vehicle buses, and a test automation framework (Python scripts or MATLAB/Simulink Test) that orchestrates test scenarios and evaluates pass/fail criteria. For simpler embedded products, a "poor man's HIL" uses an additional MCU or Raspberry Pi as the test harness, with GPIO and bus connections to the DUT — this is surprisingly effective for products like motor controllers, IoT sensors, and industrial PLCs.
Q: What code coverage metrics exist and which is required for safety certification? Explain the differences.
Code coverage measures how much of your source code is exercised by your test suite. The three primary metrics, in increasing order of rigor, are:
Statement coverage (C0) measures whether each executable statement (line of code) has been executed at least once. It is the weakest metric — 100% statement coverage can miss entire branches. For example, in if (a && b) { action(); }, executing the function once with both a and b true achieves 100% statement coverage but never tests the case where the condition is false. Statement coverage answers "was this line reached?" but not "were all paths through this line tested?"
Branch coverage (C1, also called decision coverage) measures whether each branch of every decision point (if/else, switch, loop entry/exit) has been taken at least once. This is significantly stronger — it requires testing both the true and false outcomes of every conditional. In the example above, branch coverage requires at least two test cases: one where (a && b) is true and one where it is false. Branch/decision coverage is the level typically demanded by IEC 61508 SIL 2, ISO 26262 ASIL B, and DO-178C Level B (Level C requires only statement coverage).
MC/DC (Modified Condition/Decision Coverage) is the most rigorous practical metric. It requires that every individual condition within a compound decision independently affects the outcome. For if (a && b), MC/DC requires test cases demonstrating that: (1) changing a alone changes the decision outcome (while b is true), and (2) changing b alone changes the decision outcome (while a is true). This typically requires N+1 test cases for a decision with N conditions. MC/DC is mandated by DO-178C Level A (flight-critical avionics software) and ISO 26262 ASIL D (highest automotive safety level). It catches masking bugs where one condition hides the effect of another — a real concern in complex boolean expressions controlling safety interlocks.
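The N+1 rule is easiest to see as concrete test vectors. Below, a hypothetical two-condition safety interlock needs exactly three cases: one baseline, plus one vector per condition that flips only that condition and thereby flips the outcome.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical safety interlock: a decision with N = 2 conditions. */
static bool interlock_open(bool door_closed, bool speed_zero) {
    return door_closed && speed_zero;
}

/* MC/DC vector set (N + 1 = 3):
 *   (T, T) -> T   baseline
 *   (F, T) -> F   door_closed alone changes the outcome
 *   (T, F) -> F   speed_zero alone changes the outcome
 * Note (F, F) adds nothing for MC/DC: neither condition independently
 * drives the result from that point, which is exactly the masking
 * effect the criterion is designed to expose. */
```

These three vectors also happen to give 100% branch coverage, which shows why MC/DC is strictly stronger: branch coverage alone would be satisfied by just (T, T) and (F, T), never exercising speed_zero's independent effect.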
Tools like gcov/lcov (GCC-based, free), BullseyeCoverage, and VectorCAST measure these metrics. For safety-certified projects, the coverage tool itself must often be qualified, meaning you must demonstrate that the tool accurately reports coverage — an additional engineering and documentation effort.
Power Profiling
Q: How do you measure the power consumption of an embedded device?
Power consumption measurement in embedded systems requires capturing current draw that varies over many orders of magnitude — from hundreds of milliamps during active radio transmission to single-digit microamps in deep sleep — often with transitions happening in microseconds. The measurement method must match the dynamic range and bandwidth of the current profile.
Shunt resistor method: Insert a low-value resistor (typically 1 to 100 ohms, depending on the current range) in series with the power supply line and measure the voltage drop across it. Current = V_shunt / R_shunt. Use an oscilloscope for time-domain visualization of current spikes, or a high-resolution multimeter for average current. The tradeoff is the resistor value: too high drops excessive voltage (affecting the DUT's operation), too low produces a signal buried in noise. A 10-ohm shunt works well for microamp-range sleep currents but drops 1V at 100 mA — unacceptable for a 3.3V system. Many engineers use switchable shunt resistors (high value for sleep, low value for active) or a logarithmic current amplifier.
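The dynamic-range tradeoff is plain arithmetic, shown here with the 10-ohm figures from the paragraph above (the helper name is illustrative):

```c
#include <assert.h>

/* Ohm's law for the shunt: V_shunt = I * R, with mA * ohm giving mV. */
static double shunt_drop_mv(double current_ma, double r_ohm) {
    return current_ma * r_ohm;
}
```

At 100 mA the drop across 10 ohms is a full volt; at 1 microamp it is only 10 microvolts, which is why a single shunt value cannot resolve both sleep and active currents without switching.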
Dedicated power analyzers: Tools like the Nordic Power Profiler Kit II (PPK2), Qoitech Otii Arc, and Joulescope combine a programmable power supply with a high-dynamic-range current measurement front end. They handle the shunt-switching internally, providing seamless measurement from nanoamps to hundreds of milliamps with microsecond time resolution. They also supply power to the DUT, eliminating the need for a separate supply. These tools typically include software that correlates current waveforms with GPIO trigger signals from the DUT firmware, enabling you to map current consumption to specific code sections (radio TX, sensor read, CPU active, sleep).
Software-based estimation: Some MCUs (notably Nordic nRF series and STM32) provide online power calculators that estimate consumption based on configured peripherals, clock speeds, and duty cycles. These are useful for initial budget estimation but are no substitute for actual measurement — peripheral leakage, external component quiescent currents, and PCB layout parasitics are not captured in simulation.
Q: Your device's battery life is 10x worse than calculated — how do you debug it?
A 10x discrepancy between calculated and actual battery life almost always means the device is not reaching its lowest power state, or something is drawing current that your power budget did not account for. The debugging approach is methodical current measurement combined with elimination of potential leakage sources.
Step 1 — Measure the actual sleep current. Connect a power analyzer (PPK2, Joulescope, or shunt resistor with oscilloscope) and put the device into its deepest sleep mode. Compare the measured sleep current against the MCU datasheet's specification for that mode. If the MCU datasheet says 2 microamps in Stop mode but you measure 500 microamps, the MCU itself is not fully asleep. Common causes: a peripheral clock was left enabled (a single enabled UART peripheral can draw 100+ microamps), the debug probe is still connected (JTAG/SWD keeps the debug domain powered, adding 500 microamps or more on many STM32 parts), the PLL or HSE oscillator was not shut down, or the voltage regulator is in main mode instead of low-power mode.
Step 2 — Check GPIOs. Floating (unconnected) GPIO pins are the most common hidden current drain. If an input pin is floating near the threshold voltage, the input buffer's CMOS transistors are both partially on, creating a shoot-through current of 10-100 microamps per pin. Multiply by 20 floating pins and you have milliamps. The fix: configure all unused GPIOs as analog inputs (disables the digital input buffer) or drive them to a known level. Also check GPIOs connected to external circuits — a pin configured as push-pull output driving into a pull-up or pull-down resistor creates a DC current path.
Step 3 — Isolate external components. Measure current with external peripherals removed one at a time: sensors, voltage regulators, LEDs, pull-up resistor networks, level shifters. Many sensors have quiescent currents of 1-10 microamps that are not negligible in a microamp-level power budget. LDO regulators have quiescent currents ranging from 1 microamp (modern ultra-low-power parts) to 5 milliamps (older general-purpose parts) — this alone can explain a 10x discrepancy. Also check whether the sensor's I2C or SPI bus pull-ups are drawing current when the bus is idle — a 4.7K pull-up to 3.3V on an I2C line that is low during sleep draws 700 microamps.
Step 4 — Verify the duty cycle. If sleep current checks out but battery life is still poor, the device is spending more time awake than intended. Use a GPIO toggle at sleep entry/exit and measure the duty cycle on an oscilloscope. Common causes: a periodic interrupt waking the device more frequently than designed (a misconfigured RTC timer period), a sensor interrupt that fires repeatedly (a noisy accelerometer triggering motion interrupts), or a communication stack that retries failed transmissions with aggressive backoff.
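A quick sanity check for Step 4 is to recompute the battery model from measured numbers. The sketch below is back-of-envelope: average current is the duty-cycle-weighted mix of sleep and active current, and all figures in the test are illustrative.

```c
#include <assert.h>

/* Duty-cycle-weighted average current in microamps.
 * active_fraction is the fraction of time spent awake (0.0 to 1.0). */
static double avg_current_ua(double sleep_ua, double active_ua,
                             double active_fraction) {
    return sleep_ua * (1.0 - active_fraction) + active_ua * active_fraction;
}

/* Battery life estimate: capacity in mAh, average draw in microamps. */
static double battery_life_hours(double capacity_mah, double avg_ua) {
    return capacity_mah * 1000.0 / avg_ua;
}
```

For example, 2 µA sleep with 10 mA active at 0.1% duty averages about 12 µA, and a 220 mAh cell at 22 µA average lasts 10,000 hours — so a 10x life shortfall maps directly to roughly a 10x error in either the measured sleep current or the measured duty cycle, which this model localizes quickly.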
Q: What power profiling tools do you use and how do they compare?
The three most common power profiling tools in embedded development each target different measurement needs and budgets:
Nordic Power Profiler Kit II (PPK2) is the best value option at around $100. It measures current from 200 nA to 1 A with a dynamic range that covers both deep sleep and active radio transmission. It operates in two modes: source mode (PPK2 supplies power to the DUT at a configurable voltage from 0.8V to 5.0V) and ampere meter mode (inline measurement with an external supply). The companion nRF Connect Power Profiler software provides real-time current visualization, average/peak statistics, and supports GPIO trigger lines to correlate current draw with firmware events. The main limitation is its sampling rate — approximately 100 kHz — which may miss very fast transients (sub-10 microsecond current spikes during flash writes or radio calibration).
Joulescope (JS110/JS220) is a professional-grade power analyzer priced around $500-1000. It offers a wider dynamic range (nanoamps to amps), higher bandwidth, and more precise measurements than the PPK2. The JS220 supports continuous streaming at 2 Msps (megasamples per second) with 18-bit resolution, capturing even the fastest current transients. Its software supports multi-channel correlation, energy accumulation, and scripting for automated test campaigns. Joulescope is the tool of choice when you need to accurately characterize energy-per-operation metrics (energy per radio TX packet, energy per sensor read) for precise battery life calculations. The downside is cost and the fact that it is measurement-only — it does not supply power, so you need a separate low-noise bench supply.
Shunt resistor with oscilloscope is the zero-cost baseline approach that every embedded engineer should know. A 10-ohm resistor in the power line converts current to voltage, which an oscilloscope displays as a time-domain waveform. This approach has the highest bandwidth (limited only by your oscilloscope), costs nothing beyond equipment most labs already have, and provides immediate visual feedback. The drawbacks are poor dynamic range (a single shunt value cannot simultaneously resolve 1 microamp sleep current and 100 mA active current without either saturating the ADC or losing the small signal in noise) and the voltage drop across the shunt perturbing the DUT's supply. For quick checks and debugging, it remains indispensable — especially when you need to correlate current waveforms with SPI traffic or GPIO events captured on other oscilloscope channels simultaneously.