Quick Cap
Debugging embedded systems requires a blend of hardware tools (JTAG/SWD probes, oscilloscopes, logic analyzers) and software techniques (GDB remote sessions, printf/trace logging, hard fault analysis). Unlike desktop software, embedded bugs often involve hardware-software interactions, timing-dependent failures, and faults that corrupt the very stack you need to diagnose. Interviewers test whether you can systematically narrow down a problem using the right tool at each stage.
Key Facts:
- JTAG uses 4 wires (TDI, TDO, TMS, TCK); SWD uses 2 wires (SWDIO, SWCLK) and is ARM Cortex specific
- GDB remote debugging connects GDB to a target via a debug probe (J-Link, ST-Link) through a GDB server
- Hard faults on Cortex-M can be decoded by reading CFSR, HFSR, and the stacked PC from the exception frame
- SWO/ITM trace provides low-overhead runtime logging without the timing distortion of UART-based printf
- Logic analyzers capture and decode digital protocols (SPI, I2C, UART); oscilloscopes show analog signal integrity
- Systematic methodology: reproduce, isolate, hypothesize, instrument, fix, verify, regression-test
Deep Dive
At a Glance
| Tool / Technique | Best For | Key Trade-off |
|---|---|---|
| JTAG | Boundary scan, multi-core, non-ARM targets | More pins (4-5), supports daisy-chaining |
| SWD | ARM Cortex-M/R/A debugging | Only 2 pins, but ARM-only |
| GDB remote | Setting breakpoints, stepping, inspecting memory | Halts CPU -- unusable for real-time observation |
| Printf (UART) | Quick-and-dirty logging | High latency, disturbs timing, blocks on UART TX |
| SWO / ITM trace | Low-overhead timestamped logging | Needs SWO pin routed; limited bandwidth |
| Logic analyzer | Protocol decoding (I2C, SPI, UART, CAN) | Shows digital levels only, no analog detail |
| Oscilloscope | Signal integrity, rise times, voltage levels | Fewer channels, harder to decode protocols |
JTAG vs SWD
| Feature | JTAG | SWD |
|---|---|---|
| Pins required | 4 (TDI, TDO, TMS, TCK) + optional TRST | 2 (SWDIO, SWCLK) |
| Standard | IEEE 1149.1 (cross-architecture) | ARM-specific (CoreSight) |
| Daisy-chaining | Yes -- multiple devices on one chain | No |
| Boundary scan | Yes | No |
| Debug speed | Similar | Similar (slightly lower wire overhead) |
| SWO output | Not usable in JTAG mode on most Cortex-M parts (SWO shares the TDO pin) | Optional third pin (SWO) on the same connector |
| Typical use | FPGA, multi-core SoCs, production test | Cortex-M development and debugging |
SWD is the default choice for Cortex-M work because it frees up pins (only 2 vs 4-5) and still provides full debug access -- breakpoints, watchpoints, memory read/write, and flash programming. JTAG remains essential when you need boundary scan for board-level test, daisy-chain multiple devices, or target non-ARM architectures.
GDB Remote Debugging Workflow
The typical embedded GDB session uses a three-layer stack:
    +----------+    TCP/IP or pipe    +---------------+     SWD/JTAG     +-----------+
    |   GDB    | <------------------> |  GDB Server   | <--------------> |  Target   |
    |  (host)  |    (RSP protocol)    |  (OpenOCD /   |  (probe wires)   |    MCU    |
    |          |                      |  J-Link GDB)  |                  |           |
    +----------+                      +---------------+                  +-----------+
- The debug probe (J-Link, ST-Link, CMSIS-DAP) connects to the target via SWD or JTAG.
- A GDB server (OpenOCD, JLinkGDBServer, pyOCD) translates the GDB Remote Serial Protocol (RSP) into probe commands.
- GDB on the host connects via TCP (typically port 3333 for OpenOCD, 2331 for J-Link) and provides breakpoints, stepping, register/memory inspection, and variable watches.
Key GDB commands for embedded work: target remote :3333 to connect, monitor reset halt to reset and halt the core (OpenOCD syntax; other GDB servers define their own monitor commands), load to program the ELF into the target, info registers to dump CPU state, x/16xw 0x20000000 to examine 16 words of SRAM, and bt for a backtrace.
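The Remote Serial Protocol mentioned above wraps every command in a small frame: a `$`, the payload, a `#`, and a two-digit hex checksum that is the modulo-256 sum of the payload bytes. The sketch below shows that framing; the function names `rsp_checksum` and `rsp_frame` are made up for illustration, and escaping of special characters inside the payload is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Modulo-256 sum of the payload bytes, which RSP uses as its checksum. */
uint8_t rsp_checksum(const char *payload)
{
    uint8_t sum = 0;
    for (const char *p = payload; *p != '\0'; ++p) {
        sum = (uint8_t)(sum + (uint8_t)*p);
    }
    return sum;
}

/*
 * Frame a payload as "$<payload>#<two hex digits>".
 * Returns the packet length, or -1 if the output buffer is too small.
 * Payloads containing '$', '#', or '}' would need escaping, which this
 * sketch does not implement.
 */
int rsp_frame(const char *payload, char *out, size_t out_size)
{
    size_t needed = strlen(payload) + 5; /* '$' + '#' + 2 hex digits + NUL */
    if (out_size < needed) {
        return -1;
    }
    return snprintf(out, out_size, "$%s#%02x", payload,
                    (unsigned)rsp_checksum(payload));
}
```

Framing the single-character payload `g` (read general registers) gives `$g#67`, because the ASCII code of `g` is 0x67. Seeing packets of this shape in a GDB server log is a quick way to confirm that the host-to-server link is alive before suspecting the probe wiring.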
Printf vs SWO/ITM Trace
| Aspect | Printf over UART | SWO / ITM Trace |
|---|---|---|
| Setup | Route UART TX pin, redirect _write | Route SWO pin, configure ITM stimulus ports |
| Overhead | High -- blocks on UART TX or DMA; 115200 baud is a common bottleneck | Low -- hardware serializes data via SWO at up to several MHz |
| Timing distortion | Severe at high log rates; can mask or create bugs | Minimal -- timestamps come from DWT cycle counter |
| Bandwidth | ~11.5 KB/s at 115200 baud | Hundreds of KB/s via Manchester or UART SWO |
| Formatting | Full printf string formatting on target (costs flash + cycles) | Typically raw 32-bit stimulus writes; formatting done on host |
| Production use | Often left disabled; UART pin may be shared | SWO pin often unused in production; easy to enable for field debug |
When you need quick visibility during early development, UART printf is fine. Once timing matters or log volume grows, switch to ITM stimulus port writes and let the host tool (Ozone, SWO Viewer, or OpenOCD) decode and timestamp the output.
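To put a number on the UART cost, the arithmetic can be sketched as below. It assumes 8N1 framing (10 bits on the wire per byte) and a blocking transmit with no buffering; the helper names and the 64 MHz core clock used in the usage note are illustrative assumptions, not values from any particular board.

```c
#include <stdint.h>

/* Bits on the wire per byte for 8N1 framing: 1 start + 8 data + 1 stop. */
#define UART_8N1_BITS_PER_BYTE 10u

/* Time in microseconds (rounded) to shift `len` bytes out at `baud`. */
uint32_t uart_tx_time_us(uint32_t len, uint32_t baud)
{
    uint64_t bits = (uint64_t)len * UART_8N1_BITS_PER_BYTE;
    return (uint32_t)((bits * 1000000u + baud / 2u) / baud);
}

/* CPU cycles that elapse while those bytes are on the wire (truncated). */
uint64_t uart_tx_cycles(uint32_t len, uint32_t baud, uint32_t cpu_hz)
{
    uint64_t bits = (uint64_t)len * UART_8N1_BITS_PER_BYTE;
    return bits * cpu_hz / baud;
}
```

A 32-byte log line at 115200 baud takes about 2778 microseconds, which at an assumed 64 MHz core clock is roughly 177,000 cycles of blocked execution. That is long enough to hide or create a race, which is the timing distortion the table refers to.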
Logic Analyzer vs Oscilloscope
| Criterion | Logic Analyzer | Oscilloscope |
|---|---|---|
| Signal type | Digital (threshold-based 0/1) | Analog (continuous voltage) |
| Channel count | 8-34+ typical | 2-4 typical |
| Protocol decode | Built-in (SPI, I2C, UART, CAN) | Available on some models (MSO) |
| Best for | Bus protocol debugging, timing verification | Signal integrity (overshoot, ringing, rise time) |
| Typical tool | Saleae Logic, sigrok | Rigol DS1054Z, Keysight, Tektronix |
Use a logic analyzer first when chasing a protocol-level bug (wrong SPI clock polarity, missing I2C ACK, garbled UART). Switch to an oscilloscope when you suspect an analog problem (insufficient pull-up strength, ringing due to impedance mismatch, ground bounce).
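For the pull-up case, a quick estimate often tells you whether the scope is worth getting out. The sketch below models the I2C bus as a simple RC circuit and uses the 30%-to-70% rise-time definition from the I2C specification, which gives t_r = ln(0.7/0.3) * R * C, or about 0.8473 * R * C. The function names are hypothetical, and the model ignores driver and trace effects.

```c
#include <stdbool.h>

/* ln(0.7 / 0.3): time constants needed for an RC edge to go from 30% to 70%. */
#define RC_30_70_FACTOR 0.8473

/* Estimated rise time in nanoseconds for a pull-up of `r_ohms` and a bus
 * capacitance of `c_pf` picofarads. */
double i2c_rise_time_ns(double r_ohms, double c_pf)
{
    /* ohms * picofarads = picoseconds; divide by 1000 for nanoseconds. */
    return RC_30_70_FACTOR * r_ohms * c_pf / 1000.0;
}

/* True if the estimate meets the given limit (1000 ns for standard mode,
 * 300 ns for fast mode). */
bool i2c_rise_time_ok(double r_ohms, double c_pf, double limit_ns)
{
    return i2c_rise_time_ns(r_ohms, c_pf) <= limit_ns;
}
```

With a 4.7 kΩ pull-up and 100 pF of bus capacitance the estimate is about 398 ns: fine for 100 kHz standard mode, but over the 300 ns fast-mode limit. Dropping to 2.2 kΩ brings it to roughly 186 ns. An oscilloscope measurement then confirms whether the real edge matches the estimate.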
Hard Fault Debugging on Cortex-M
When a Cortex-M processor takes a hard fault, the hardware pushes eight registers onto the current stack (R0-R3, R12, LR, PC, xPSR). For a precise fault, the stacked PC identifies the instruction that faulted; for an imprecise fault it only points somewhere after it. The fault cause is encoded in the Configurable Fault Status Register (CFSR) and Hard Fault Status Register (HFSR):
| Register | Address | What It Tells You |
|---|---|---|
| HFSR | 0xE000ED2C | Bit 30 (FORCED) = a configurable fault was escalated to hard fault |
| CFSR | 0xE000ED28 | Contains MemManage, BusFault, and UsageFault status bits |
| MMFAR | 0xE000ED34 | Faulting address for memory management faults |
| BFAR | 0xE000ED38 | Faulting address for bus faults |
Common CFSR bits: IMPRECISERR (bus fault on a buffered write -- the stacked PC may not point to the offending instruction), PRECISERR (bus fault at the stacked PC address), INVSTATE (attempted to execute in an invalid EPSR state, often a function pointer to even address), UNDEFINSTR (illegal opcode, common with corrupted stacks).
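Reading a raw CFSR value by eye is error-prone, so a small decoder helps. The sketch below maps a CFSR word to flag names using the ARMv7-M bit positions; the `cfsr_decode` function and its table are illustrative helpers, not part of CMSIS, and the code runs equally well on a host against a value copied out of the debugger.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct cfsr_bit {
    uint32_t mask;
    const char *name;
};

/* CFSR flag positions for ARMv7-M: MemManage 0-7, BusFault 8-15, UsageFault 16-31. */
static const struct cfsr_bit cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL" },   { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },  { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },    { 1u << 7,  "MMARVALID" },
    { 1u << 8,  "IBUSERR" },    { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" },{ 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },     { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },  { 1u << 16, "UNDEFINSTR" },
    { 1u << 17, "INVSTATE" },   { 1u << 18, "INVPC" },
    { 1u << 19, "NOCP" },       { 1u << 24, "UNALIGNED" },
    { 1u << 25, "DIVBYZERO" },
};

/*
 * Write the names of all set flags into `out`, separated by spaces.
 * Returns the number of flags found; the text is truncated if `out` is small.
 */
int cfsr_decode(uint32_t cfsr, char *out, size_t out_size)
{
    int found = 0;
    size_t used = 0;
    if (out_size > 0) {
        out[0] = '\0';
    }
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; ++i) {
        if ((cfsr & cfsr_bits[i].mask) == 0) {
            continue;
        }
        ++found;
        if (used < out_size) {
            int n = snprintf(out + used, out_size - used, "%s%s",
                             used ? " " : "", cfsr_bits[i].name);
            if (n > 0) {
                used += (size_t)n;
            }
        }
    }
    return found;
}
```

For example, a CFSR of 0x00008200 decodes to `PRECISERR BFARVALID`, which says the stacked PC is trustworthy and BFAR holds the offending address.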
A minimal hard fault handler that prints the stacked PC:
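    #include <stdint.h>

    void hard_fault_diag(uint32_t *frame);

    /* naked: no compiler prologue, so LR and SP are untouched when the asm runs */
    __attribute__((naked)) void HardFault_Handler(void) {
        __asm volatile (
            "TST LR, #4        \n"  /* EXC_RETURN bit 2 selects the stack */
            "ITE EQ            \n"
            "MRSEQ R0, MSP     \n"  /* bit clear: frame is on the main stack */
            "MRSNE R0, PSP     \n"  /* bit set: frame is on the process stack */
            "B hard_fault_diag \n"  /* R0 = frame pointer, the first C argument */
        );
    }

    void hard_fault_diag(uint32_t *frame) {
        volatile uint32_t pc   = frame[6];  /* stacked PC */
        volatile uint32_t cfsr = *(volatile uint32_t *)0xE000ED28;
        volatile uint32_t hfsr = *(volatile uint32_t *)0xE000ED2C;
        (void)pc; (void)cfsr; (void)hfsr;   /* silence unused-variable warnings */
        /* Set breakpoint here -- inspect pc, cfsr, hfsr in debugger */
        while (1);
    }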
The TST LR, #4 determines whether the exception frame is on MSP (main stack) or PSP (process stack). Once you have the stacked PC, use addr2line -e firmware.elf 0x<pc_value> to map it back to source.
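The same two ideas, the EXC_RETURN stack-select bit and the layout of the stacked frame, can be written as plain C that also runs on a host for unit testing. This is a sketch that assumes the basic 8-word frame (no FPU state stacked); the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Basic (no-FPU) exception frame, in the order the hardware stacks it. */
struct exception_frame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

/* EXC_RETURN bit 2: 0 = frame was pushed on MSP, 1 = frame was pushed on PSP. */
bool frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0;
}

/* Copy the eight raw stacked words into named fields. */
struct exception_frame read_frame(const uint32_t *stacked)
{
    struct exception_frame f = {
        .r0 = stacked[0],  .r1 = stacked[1], .r2 = stacked[2],
        .r3 = stacked[3],  .r12 = stacked[4], .lr = stacked[5],
        .pc = stacked[6],  .xpsr = stacked[7],
    };
    return f;
}
```

An EXC_RETURN of 0xFFFFFFF9 (thread mode, main stack) has bit 2 clear, while 0xFFFFFFFD (thread mode, process stack) has it set. Under an RTOS, where tasks run on PSP, the second value is the one you will usually see.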
Embedded Linux Debugging Tools
For Linux-based embedded platforms, a different tool set supplements JTAG/SWD:
| Tool | Purpose |
|---|---|
| dmesg | Kernel ring buffer -- shows driver probe failures, hardware errors |
| strace | Traces system calls for a user-space process |
| gdbserver | Runs on target; host GDB attaches remotely over TCP |
| ftrace / trace-cmd | Kernel function tracing with low overhead |
| perf | Performance profiling (CPU cycles, cache misses, branch prediction) |
| /proc, /sys | Runtime inspection of GPIO state, interrupt counts, device status |
Systematic Debugging Methodology
    +-----------+     +-----------+     +-------------+     +------------+
    | Reproduce | --> |  Isolate  | --> | Hypothesize | --> | Instrument |
    |  the bug  |     | the scope |     | root cause  |     | & measure  |
    +-----------+     +-----------+     +-------------+     +------------+
          ^                                                       |
          |               +--------+     +--------+               |
          +-------------- | Verify | <-- |  Fix   | <-------------+
                          | & test |     |   it   |
                          +--------+     +--------+
- Reproduce -- Get a reliable reproduction case. Intermittent bugs need persistent logging or triggered captures.
- Isolate -- Narrow the scope: is it hardware, software, or an interaction? Swap boards, swap firmware, bisect commits.
- Hypothesize -- Form a testable theory. "The hard fault occurs when the DMA callback fires during a context switch."
- Instrument -- Add the right probe: SWO trace, GPIO toggle measured on scope, logic analyzer capture, or conditional breakpoint.
- Fix -- Implement the correction.
- Verify -- Confirm the fix under the original failure conditions and run regression tests.
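For the "persistent logging" mentioned under Reproduce, one common approach is a small in-RAM event ring that keeps the last few events before a failure. The sketch below is one possible shape for it, assuming a single writer context; the type names, the capacity of 8, and the caller-supplied timestamp are all illustrative choices.

```c
#include <stdint.h>

#define TRACE_CAPACITY 8u   /* power of two so the index wraps with a mask */

struct trace_event {
    uint32_t timestamp;     /* e.g. a cycle counter value supplied by the caller */
    uint16_t id;            /* application-defined event code */
    uint16_t arg;
};

struct trace_log {
    struct trace_event events[TRACE_CAPACITY];
    uint32_t head;          /* total number of events ever written */
};

/* Record an event, overwriting the oldest entry once the buffer is full. */
void trace_put(struct trace_log *tl, uint32_t timestamp, uint16_t id, uint16_t arg)
{
    struct trace_event *e = &tl->events[tl->head & (TRACE_CAPACITY - 1u)];
    e->timestamp = timestamp;
    e->id = id;
    e->arg = arg;
    tl->head++;
}

/* Number of valid entries currently held. */
uint32_t trace_count(const struct trace_log *tl)
{
    return tl->head < TRACE_CAPACITY ? tl->head : TRACE_CAPACITY;
}

/* Fetch the i-th oldest surviving entry (0 = oldest); requires i < trace_count(). */
const struct trace_event *trace_get(const struct trace_log *tl, uint32_t i)
{
    uint32_t oldest = tl->head - trace_count(tl);
    return &tl->events[(oldest + i) & (TRACE_CAPACITY - 1u)];
}
```

After a crash, the buffer can be dumped from the debugger or from a fault handler to see what led up to the failure. If events are logged from both tasks and ISRs, `trace_put` would need a critical section or an atomic index, which this sketch omits.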
A connected debug probe can keep a Cortex-M out of its deepest sleep modes, both through the pull-ups on the SWD lines and because an active debug session keeps the debug logic powered. Always disconnect the probe (or tri-state SWDIO/SWCLK) when measuring sleep current. This is one of the most common causes of "10x expected sleep current" bug reports.
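When CFSR shows IMPRECISERR, the stacked PC does not point to the faulting instruction because the write was buffered and the fault was reported some instructions later. On Cortex-M3/M4, temporarily disable the write buffer (SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk in CMSIS) to turn it into a precise fault, then re-enable buffering once the bug is found, since running without it costs performance.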
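The write-buffer toggle can be sketched without CMSIS headers as a single bit operation. The version below assumes a Cortex-M3/M4, where DISDEFWBUF is bit 1 of ACTLR at 0xE000E008; it takes the register address as a parameter (a choice made here so the logic can be unit-tested on a host), and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACTLR bit 1 on Cortex-M3/M4: disable the default write buffer. */
#define ACTLR_DISDEFWBUF_BIT (1u << 1)

/*
 * Set or clear DISDEFWBUF in the register that `actlr` points to.
 * On target, pass (volatile uint32_t *)0xE000E008; other ACTLR bits
 * are left unchanged.
 */
void set_write_buffer_disabled(volatile uint32_t *actlr, bool disabled)
{
    if (disabled) {
        *actlr |= ACTLR_DISDEFWBUF_BIT;
    } else {
        *actlr &= ~ACTLR_DISDEFWBUF_BIT;
    }
}
```

Calling this early in startup, before the code path that faults, makes every store complete before the next instruction executes, so a bad write shows up as PRECISERR with a valid BFAR.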
Debugging Story: The Intermittent Crash
An IoT gateway running FreeRTOS crashed every 4-8 hours with a hard fault. The crash was never seen during development because it only occurred under sustained production traffic. The team connected a J-Link in SWO trace mode and added ITM writes at key points (task entry, ISR entry/exit, malloc calls). After an overnight capture, the SWO log showed the crash always followed a specific sequence: a UART RX DMA callback interrupted a malloc call in a low-priority task.
Reading the CFSR after the crash revealed IMPRECISERR. Disabling write buffering via ACTLR turned it into a PRECISERR, and the stacked PC pointed to a STR instruction inside the heap allocator. The root cause: the DMA callback called pvPortMalloc from ISR context without acquiring the heap lock, corrupting the free-list. The fix was to defer the allocation to a task via a queue, and the crash never returned.
The lesson: hard fault registers plus SWO trace are a powerful combination -- the registers tell you what happened, and the trace tells you why.
Interview Focus
Classic Interview Questions
Q1: "When would you choose SWD over JTAG, and vice versa?"
Model Answer Starter: "I default to SWD for all ARM Cortex-M work because it only needs two wires and provides full debug access. I switch to JTAG when I need boundary scan testing for board-level manufacturing test, daisy-chaining multiple devices on one debug chain, or targeting non-ARM architectures like MIPS or RISC-V where SWD is unavailable. On some Cortex-A SoCs with multiple cores, JTAG may also be required for multi-core debug topologies."
Q2: "Walk me through how you would debug a hard fault on a Cortex-M device."
Model Answer Starter: "First, I implement a hard fault handler that reads the link register to determine whether the exception frame is on MSP or PSP, then extracts the stacked PC, LR, and xPSR. I read CFSR and HFSR to classify the fault -- for example, IMPRECISERR means a buffered write failed and the stacked PC might not be exact, so I temporarily disable write buffering to get a precise fault. I use addr2line with the ELF file to map the stacked PC to a source line. If it is a stack overflow, the stacked frame itself may be corrupt, so I check the stack sentinel or MPU fault address."
Q3: "What are the trade-offs between printf debugging and SWO/ITM trace?"
Model Answer Starter: "Printf over UART is easy to set up but introduces significant timing distortion -- at 115200 baud each character takes about 87 microseconds on the wire, so a 20-character message costs roughly 1.7 ms if the call blocks. SWO/ITM writes a raw 32-bit value to a stimulus port in a few cycles, and the debug probe timestamps and streams the data to the host. SWO is better for timing-sensitive debugging and high-volume logging. The downside is that SWO requires the SWO pin to be routed to the debug connector and a probe that supports it."
Q4: "How do you decide whether to reach for a logic analyzer or an oscilloscope?"
Model Answer Starter: "I start with a logic analyzer when I suspect a protocol-level issue -- wrong clock polarity, missing ACK, incorrect byte order -- because it gives me decoded protocol frames directly. I switch to an oscilloscope when I suspect an analog problem -- slow rise times, ringing, voltage levels that do not meet logic thresholds, or ground bounce. For mixed issues I use a mixed-signal oscilloscope (MSO) that combines analog channels with digital decode."
Q5: "Describe your systematic approach to debugging an intermittent embedded bug."
Model Answer Starter: "I follow a structured methodology: first reproduce the bug as reliably as possible, even if that means running a stress test overnight. Then isolate the scope -- is it hardware, firmware, or a specific interaction? Next, I form a hypothesis and choose the minimum instrumentation to test it: SWO trace for execution flow, GPIO toggles measured on a scope for timing, or a logic analyzer for protocol issues. After identifying the root cause I implement a fix and verify it under the same conditions that triggered the original failure, then add a regression test."
Trap Alerts
- Don't say: "I just add printf statements until I find the bug" -- this suggests no systematic methodology
- Don't forget: Hard fault register analysis (CFSR/HFSR) -- it is the single fastest path to identifying Cortex-M crashes
- Don't ignore: The difference between precise and imprecise bus faults, and how write buffering affects root-cause accuracy
Follow-up Questions
- "How would you debug a crash that only occurs when the debug probe is disconnected?"
- "What is the role of ETM (Embedded Trace Macrocell) vs ITM, and when would you use each?"
- "How do you recover an MCU whose SWD pins have been repurposed as GPIO?"
- "Describe how you would use watchpoints to catch memory corruption."
Practice
❓ What does the CFSR bit IMPRECISERR indicate on a Cortex-M hard fault?
❓ How many pins does SWD require compared to JTAG?
❓ Why does printf debugging over UART disturb real-time behavior more than SWO/ITM trace?
❓ In a Cortex-M hard fault handler, what does TST LR, #4 determine?
❓ When should you use a logic analyzer instead of an oscilloscope?
Real-World Tie-In
Automotive ECU Field Failure -- A production ECU crashed sporadically during cold starts. The team could not reproduce it on the bench. They added a persistent hard fault logger that wrote the stacked PC, CFSR, and HFSR to a reserved flash sector before resetting. After collecting three crashes from the field, all three stacked PCs pointed to the same flash-read instruction. The root cause was a flash wait-state misconfiguration that only manifested below -20 °C, when the flash access time increased. Adding the correct wait-state setting for the low-temperature clock configuration eliminated the crash.
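A persistent fault logger of this kind needs a record format that can be told apart from erased flash or a half-written entry. The sketch below shows one possible layout with a magic word and an XOR checksum; the struct, the magic value, and the function names are assumptions made for illustration, and the actual flash write is left to the platform's driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAULT_RECORD_MAGIC 0xFA017C0Du  /* arbitrary marker chosen for this sketch */

struct fault_record {
    uint32_t magic;
    uint32_t pc;
    uint32_t cfsr;
    uint32_t hfsr;
    uint32_t checksum;   /* XOR of the four words above */
};

static uint32_t fault_record_sum(const struct fault_record *r)
{
    return r->magic ^ r->pc ^ r->cfsr ^ r->hfsr;
}

/* Build a record ready to be written to non-volatile storage. */
struct fault_record fault_record_make(uint32_t pc, uint32_t cfsr, uint32_t hfsr)
{
    struct fault_record r = { FAULT_RECORD_MAGIC, pc, cfsr, hfsr, 0 };
    r.checksum = fault_record_sum(&r);
    return r;
}

/* True if the record looks complete, i.e. not erased flash or a torn write. */
bool fault_record_valid(const struct fault_record *r)
{
    return r->magic == FAULT_RECORD_MAGIC && r->checksum == fault_record_sum(r);
}
```

On the next boot, firmware can scan the reserved sector, report every record for which `fault_record_valid` returns true, and skip the rest. An erased flash word reads as 0xFFFFFFFF, so blank slots fail the magic check.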
Wearable BLE Sensor Battery Drain -- A wearable device consumed 3 mA in sleep instead of the expected 300 uA. The team measured current with a Nordic PPK2 and saw periodic 2 mA spikes every 100 ms that should not have been there. Disconnecting the J-Link debug probe dropped the current to 400 uA -- the SWD pull-ups on SWDIO and SWCLK were preventing the MCU from fully entering System OFF mode. After disconnecting the probe, the remaining 100 uA excess was traced (via GPIO toggling and scope measurement) to an accidentally enabled internal pull-up on an unused GPIO pin. Disabling it brought sleep current to the expected 300 uA.