Debugging embedded systems

Quick Cap

Debugging embedded systems requires a blend of hardware tools (JTAG/SWD probes, oscilloscopes, logic analyzers) and software techniques (GDB remote sessions, printf/trace logging, hard fault analysis). Unlike desktop software, embedded bugs often involve hardware-software interactions, timing-dependent failures, and faults that corrupt the very stack you need to diagnose. Interviewers test whether you can systematically narrow down a problem using the right tool at each stage.

Key Facts:

JTAG uses 4 wires (TDI, TDO, TMS, TCK); SWD uses 2 wires (SWDIO, SWCLK) and is ARM Cortex specific
GDB remote debugging connects GDB to a target via a debug probe (J-Link, ST-Link) through a GDB server
Hard faults on Cortex-M can be decoded by reading CFSR, HFSR, and the stacked PC from the exception frame
SWO/ITM trace provides low-overhead runtime logging without the timing distortion of UART-based printf
Logic analyzers capture and decode digital protocols (SPI, I2C, UART); oscilloscopes show analog signal integrity
Systematic methodology: reproduce, isolate, hypothesize, instrument, fix, verify (including regression tests)

Deep Dive

At a Glance

Tool / Technique	Best For	Key Trade-off
JTAG	Boundary scan, multi-core, non-ARM targets	More pins (4-5), supports daisy-chaining
SWD	ARM Cortex-M/R/A debugging	Only 2 pins, but ARM-only
GDB remote	Setting breakpoints, stepping, inspecting memory	Halts CPU -- unusable for real-time observation
Printf (UART)	Quick-and-dirty logging	High latency, disturbs timing, blocks on UART TX
SWO / ITM trace	Low-overhead timestamped logging	Needs SWO pin routed; limited bandwidth
Logic analyzer	Protocol decoding (I2C, SPI, UART, CAN)	Shows digital levels only, no analog detail
Oscilloscope	Signal integrity, rise times, voltage levels	Fewer channels, harder to decode protocols

JTAG vs SWD

Feature	JTAG	SWD
Pins required	4 (TDI, TDO, TMS, TCK) + optional TRST	2 (SWDIO, SWCLK)
Standard	IEEE 1149.1 (cross-architecture)	ARM-specific (CoreSight)
Daisy-chaining	Yes -- multiple devices on one chain	No
Boundary scan	Yes	No
Debug speed	Similar	Similar (slightly lower wire overhead)
SWO output	Usually unavailable (SWO shares the JTAG TDO pin on most Cortex-M parts)	Available (TDO is unused in SWD mode, so the pin can carry SWO)
Typical use	FPGA, multi-core SoCs, production test	Cortex-M development and debugging

SWD is the default choice for Cortex-M work because it frees up pins (only 2 vs 4-5) and still provides full debug access -- breakpoints, watchpoints, memory read/write, and flash programming. JTAG remains essential when you need boundary scan for board-level test, daisy-chain multiple devices, or target non-ARM architectures.

GDB Remote Debugging Workflow

The typical embedded GDB session uses a three-layer stack:

  +----------+      TCP/IP or pipe       +---------------+       SWD/JTAG       +-----------+
  |   GDB    |  <--------------------->  |  GDB Server   |  <---------------->  |  Target   |
  |  (host)  |      (RSP protocol)       |  (OpenOCD /   |    (probe wires)     |    MCU    |
  |          |                           |  J-Link GDB)  |                      |           |
  +----------+                           +---------------+                      +-----------+

The debug probe (J-Link, ST-Link, CMSIS-DAP) connects to the target via SWD or JTAG.
A GDB server (OpenOCD, JLinkGDBServer, pyOCD) translates the GDB Remote Serial Protocol (RSP) into probe commands.
GDB on the host connects via TCP (typically port 3333 for OpenOCD, 2331 for J-Link) and provides breakpoints, stepping, register/memory inspection, and variable watches.

Key GDB commands for embedded work: target remote :3333 to connect, monitor reset halt to reset and halt (OpenOCD syntax; monitor commands are specific to each GDB server), load to flash the ELF, info registers to dump CPU state, x/16xw 0x20000000 to examine 16 words of SRAM, and bt for a backtrace.
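To make the RSP layer less abstract, here is a minimal sketch of how a single command is framed on the wire: a `$`, the payload, a `#`, and a two-digit hex checksum (the payload bytes summed modulo 256). The function names `rsp_checksum` and `rsp_frame` are made up for this illustration, and the sketch assumes the payload contains no characters that RSP requires to be escaped.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sum of all payload bytes modulo 256 (hypothetical helper name). */
static uint8_t rsp_checksum(const char *payload)
{
    uint8_t sum = 0;
    while (*payload != '\0') {
        sum = (uint8_t)(sum + (uint8_t)*payload++);
    }
    return sum;
}

/* Writes "$<payload>#<xx>" into out, NUL-terminated.
 * Returns the packet length, or 0 if out is too small.
 * Assumption: payload needs no escaping. */
static size_t rsp_frame(const char *payload, char *out, size_t out_size)
{
    size_t need = strlen(payload) + 4;  /* '$' + payload + '#' + 2 hex digits */
    if (out_size < need + 1) {
        return 0;
    }
    snprintf(out, out_size, "$%s#%02x", payload, (unsigned)rsp_checksum(payload));
    return need;
}
```

For example, the register-read command `g` is framed as `$g#67`, because the ASCII code of `g` is 0x67. Seeing raw packets like this in a GDB server log is often the quickest way to tell whether a stall is on the host side or the probe side.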

Printf vs SWO/ITM Trace

Aspect	Printf over UART	SWO / ITM Trace
Setup	Route UART TX pin, redirect `_write`	Route SWO pin, configure ITM stimulus ports
Overhead	High -- blocks on UART TX or DMA, 115200 baud is common bottleneck	Low -- hardware serializes data via SWO at up to several MHz
Timing distortion	Severe at high log rates; can mask or create bugs	Minimal -- timestamps come from DWT cycle counter
Bandwidth	~11.5 KB/s at 115200 baud	Hundreds of KB/s via Manchester or UART SWO
Formatting	Full `printf` string formatting on target (costs flash + cycles)	Typically raw 32-bit stimulus writes; formatting done on host
Production use	Often left disabled; UART pin may be shared	SWO pin often unused in production; easy to enable for field debug
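The UART bandwidth figure in the table follows from simple arithmetic, sketched below under the assumption of 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per byte). The helper names are illustrative only.

```c
#include <stdint.h>

/* Bits on the wire per byte for 8N1 framing: 1 start + 8 data + 1 stop. */
#define UART_BITS_PER_BYTE_8N1 10u

/* Payload throughput in bytes per second. */
static uint32_t uart_bytes_per_second(uint32_t baud)
{
    return baud / UART_BITS_PER_BYTE_8N1;
}

/* Wire time for n_bytes in microseconds, rounded down. */
static uint32_t uart_tx_time_us(uint32_t n_bytes, uint32_t baud)
{
    uint64_t bits = (uint64_t)n_bytes * UART_BITS_PER_BYTE_8N1;
    return (uint32_t)(bits * 1000000u / baud);
}
```

At 115200 baud this gives 11520 bytes per second, about 86 microseconds per character, and about 1736 microseconds for a 20-character log line. A blocking printf holds the caller for at least that long.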

When you need quick visibility during early development, UART printf is fine. Once timing matters or log volume grows, switch to ITM stimulus port writes and let the host tool (Ozone, SWO Viewer, or OpenOCD) decode and timestamp the output.
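A stimulus port write can be sketched as follows. The register addresses in the comments are the architectural ITM addresses on ARMv7-M; the `itm_channel_t` type and `itm_write32` name are hypothetical, and the registers are passed as pointers so the logic can also be exercised off-target. The sketch assumes trace has already been enabled and the SWO clock configured, which the debug tool usually does.

```c
#include <stdint.h>

/* Pointers to the ITM registers used by one stimulus port.
 * On target these would be the fixed addresses in the comments. */
typedef struct {
    volatile uint32_t *stim;  /* stimulus port n: 0xE0000000 + 4 * n */
    volatile uint32_t *ter;   /* trace enable register: 0xE0000E00   */
    volatile uint32_t *tcr;   /* trace control register: 0xE0000E80  */
    unsigned port;            /* stimulus port index, 0..31          */
} itm_channel_t;

/* Writes one raw 32-bit word; returns 0 if ITM or the port is disabled. */
static int itm_write32(const itm_channel_t *ch, uint32_t value)
{
    if ((*ch->tcr & 1u) == 0u) {                 /* ITMENA clear */
        return 0;
    }
    if ((*ch->ter & (1u << ch->port)) == 0u) {   /* port not enabled */
        return 0;
    }
    while ((*ch->stim & 1u) == 0u) {
        /* reading the port returns bit 0 = FIFO ready */
    }
    *ch->stim = value;
    return 1;
}
```

Returning early when the port is disabled means the same firmware image runs unchanged with no probe attached, which is a large part of why ITM logging is cheap to leave in.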

Logic Analyzer vs Oscilloscope

Criterion	Logic Analyzer	Oscilloscope
Signal type	Digital (threshold-based 0/1)	Analog (continuous voltage)
Channel count	8-34+ typical	2-4 typical
Protocol decode	Built-in (SPI, I2C, UART, CAN)	Available on some models (MSO)
Best for	Bus protocol debugging, timing verification	Signal integrity (overshoot, ringing, rise time)
Typical tool	Saleae Logic, sigrok	Rigol DS1054Z, Keysight, Tektronix

Use a logic analyzer first when chasing a protocol-level bug (wrong SPI clock polarity, missing I2C ACK, garbled UART). Switch to an oscilloscope when you suspect an analog problem (insufficient pull-up strength, ringing due to impedance mismatch, ground bounce).
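Why a logic analyzer hides analog problems can be illustrated with a toy model of its input stage: every sample is compared against a threshold and reduced to a single bit. The `digitize` function and the millivolt values below are illustrative assumptions, not a model of any particular instrument.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce analog samples (in millivolts) to logic levels using one threshold. */
static void digitize(const uint16_t *mv, size_t n, uint16_t threshold_mv, uint8_t *bits)
{
    for (size_t i = 0; i < n; i++) {
        bits[i] = (mv[i] >= threshold_mv) ? 1u : 0u;
    }
}
```

With a 1650 mV threshold, the samples 0, 3300, 2100, 3600, 3300 mV become 0, 1, 1, 1, 1. The dip to 2100 mV and the overshoot to 3600 mV disappear, which is the kind of detail only the oscilloscope shows.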

Hard Fault Debugging on Cortex-M

When a Cortex-M processor takes a hard fault, the hardware pushes eight registers onto the current stack (R0-R3, R12, LR, PC, xPSR). For a precise fault, the stacked PC identifies the instruction that faulted; for an imprecise fault it only points somewhere after it. The fault cause is encoded in the Configurable Fault Status Register (CFSR) and Hard Fault Status Register (HFSR):

Register	Address	What It Tells You
HFSR	`0xE000ED2C`	Bit 30 (`FORCED`) = a configurable fault was escalated to hard fault
CFSR	`0xE000ED28`	Contains MemManage, BusFault, and UsageFault status bits
MMFAR	`0xE000ED34`	Faulting address for memory management faults
BFAR	`0xE000ED38`	Faulting address for bus faults

Common CFSR bits: IMPRECISERR (bus fault on a buffered write -- the stacked PC may not point to the offending instruction), PRECISERR (bus fault at the stacked PC address), INVSTATE (attempted to execute in an invalid EPSR state, often a function pointer to even address), UNDEFINSTR (illegal opcode, common with corrupted stacks).
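Decoding CFSR by hand gets tedious, so a small lookup table helps. The sketch below covers a subset of the ARMv7-M CFSR bits (MemManage in bits 7:0, BusFault in bits 15:8, UsageFault in bits 31:16). The type and function names are hypothetical, and the raw register value is passed in so the decoder can run on the host against a value copied from a crash log.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_flag_t;

/* Subset of CFSR status bits; stacking errors and others are omitted. */
static const cfsr_flag_t cfsr_flags[] = {
    { 1u << 0,  "IACCVIOL"    },  /* MemManage: instruction access violation */
    { 1u << 1,  "DACCVIOL"    },  /* MemManage: data access violation        */
    { 1u << 8,  "IBUSERR"     },  /* BusFault: instruction fetch             */
    { 1u << 9,  "PRECISERR"   },  /* BusFault: stacked PC is the culprit     */
    { 1u << 10, "IMPRECISERR" },  /* BusFault: buffered write, PC is late    */
    { 1u << 16, "UNDEFINSTR"  },  /* UsageFault: illegal opcode              */
    { 1u << 17, "INVSTATE"    },  /* UsageFault: Thumb bit clear             */
    { 1u << 18, "INVPC"       },  /* UsageFault: bad EXC_RETURN              */
    { 1u << 19, "NOCP"        },  /* UsageFault: coprocessor (FPU) disabled  */
    { 1u << 24, "UNALIGNED"   },  /* UsageFault: unaligned access trap       */
    { 1u << 25, "DIVBYZERO"   },  /* UsageFault: divide-by-zero trap         */
};

/* Fills names[] with the set flags; returns how many were stored. */
static size_t cfsr_decode(uint32_t cfsr, const char **names, size_t max_names)
{
    size_t count = 0;
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if ((cfsr & cfsr_flags[i].mask) != 0u && count < max_names) {
            names[count++] = cfsr_flags[i].name;
        }
    }
    return count;
}

/* MMFAR and BFAR hold a meaningful address only when these bits are set. */
static int cfsr_mmfar_valid(uint32_t cfsr) { return (cfsr & (1u << 7)) != 0u; }
static int cfsr_bfar_valid(uint32_t cfsr)  { return (cfsr & (1u << 15)) != 0u; }
```

A CFSR value of 0x00008200, for instance, decodes to PRECISERR with BFAR valid, so BFAR can be trusted as the faulting address.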

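A minimal hard fault handler that captures the stacked PC for inspection in the debugger:

#include <stdint.h>

void hard_fault_diag(uint32_t *frame);

/* naked: no compiler prologue, so LR and SP still hold their exception-entry values.
   TST/ITE with these operands need Thumb-2 (Cortex-M3 and later). */
__attribute__((naked)) void HardFault_Handler(void) {
    __asm volatile (
        "TST   LR, #4         \n"  /* EXC_RETURN bit 2: 0 = MSP, 1 = PSP */
        "ITE   EQ             \n"
        "MRSEQ R0, MSP        \n"  /* R0 = frame pointer = first C argument */
        "MRSNE R0, PSP        \n"
        "B     hard_fault_diag\n"
    );
}

void hard_fault_diag(uint32_t *frame) {
    volatile uint32_t pc   = frame[6];                          /* stacked PC */
    volatile uint32_t cfsr = *(volatile uint32_t *)0xE000ED28;
    volatile uint32_t hfsr = *(volatile uint32_t *)0xE000ED2C;
    (void)pc; (void)cfsr; (void)hfsr;                           /* silence unused warnings */
    /* Set breakpoint here -- inspect pc, cfsr, hfsr in debugger */
    while (1);
}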

The TST LR, #4 determines whether the exception frame is on MSP (main stack) or PSP (process stack). Once you have the stacked PC, use addr2line -e firmware.elf 0x<pc_value> to map it back to source.
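The frame layout and the EXC_RETURN test can be written down in plain C, which makes the magic index 6 easier to remember. This is a sketch that assumes the basic 8-word frame (no floating-point context stacked); the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame in the order the hardware stacks it. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

/* Bit 2 of EXC_RETURN (the LR value inside the handler): 1 = frame is on PSP. */
static int exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0u;
}

/* Reads the stacked PC from a raw frame pointer without hard-coding index 6. */
static uint32_t frame_pc(const uint32_t *frame)
{
    return frame[offsetof(exc_frame_t, pc) / sizeof(uint32_t)];
}
```

The common EXC_RETURN values illustrate the test: 0xFFFFFFF9 (thread mode, MSP) has bit 2 clear, while 0xFFFFFFFD (thread mode, PSP) has it set, which is the usual case for a fault inside an RTOS task.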

Embedded Linux Debugging Tools

For Linux-based embedded platforms, a different tool set supplements JTAG/SWD:

Tool	Purpose
`dmesg`	Kernel ring buffer -- shows driver probe failures, hardware errors
`strace`	Traces system calls for a user-space process
`gdbserver`	Runs on target; host GDB attaches remotely over TCP
`ftrace` / `trace-cmd`	Kernel function tracing with low overhead
`perf`	Performance profiling (CPU cycles, cache misses, branch prediction)
`/proc`, `/sys`	Runtime inspection of GPIO state, interrupt counts, device status

Systematic Debugging Methodology

  +-----------+     +-----------+     +-------------+     +------------+
  | Reproduce | --> | Isolate   | --> | Hypothesize | --> | Instrument |
  | the bug   |     | the scope |     | root cause  |     | & measure  |
  +-----------+     +-----------+     +-------------+     +------------+
        ^                                                       |
        |               +--------+     +--------+               |
        +-------------- | Verify | <-- |  Fix   | <-------------+
                        | & test |     |  it    |
                        +--------+     +--------+

Reproduce -- Get a reliable reproduction case. Intermittent bugs need persistent logging or triggered captures.
Isolate -- Narrow the scope: is it hardware, software, or an interaction? Swap boards, swap firmware, bisect commits.
Hypothesize -- Form a testable theory. "The hard fault occurs when the DMA callback fires during a context switch."
Instrument -- Add the right probe: SWO trace, GPIO toggle measured on scope, logic analyzer capture, or conditional breakpoint.
Fix -- Implement the correction.
Verify -- Confirm the fix under the original failure conditions and run regression tests.

⚠️ Common Trap: Debug Probe Keeps MCU Awake

A connected debug probe can keep a Cortex-M out of deep sleep modes: the active debug session keeps the debug logic powered, and the probe's SWD pull-ups add leakage of their own. Always disconnect the probe (or tri-state SWDIO/SWCLK) when measuring sleep current. This is one of the most common causes of "10x expected sleep current" bug reports.

⚠️ Common Trap: Imprecise Bus Faults

When CFSR shows IMPRECISERR, the stacked PC does not point to the faulting instruction because the write was buffered. On Cortex-M3/M4, disable write buffering temporarily (SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk) to get a precise fault, then re-enable it after finding the bug, since disabling the buffer costs performance.
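A sketch of toggling the write buffer without depending on vendor headers is shown below. It assumes a Cortex-M3 or Cortex-M4, where DISDEFWBUF is bit 1 of ACTLR at 0xE000E008; other cores may not implement this bit. The function names are hypothetical, and the register is passed as a pointer so the logic can be checked off-target.

```c
#include <stdint.h>

#define ACTLR_ADDR        0xE000E008u   /* Auxiliary Control Register        */
#define ACTLR_DISDEFWBUF  (1u << 1)     /* assumption: Cortex-M3/M4 bit layout */

/* Disables default write buffering; returns the previous ACTLR value. */
static uint32_t write_buffer_disable(volatile uint32_t *actlr)
{
    uint32_t previous = *actlr;
    *actlr = previous | ACTLR_DISDEFWBUF;
    return previous;
}

/* Restores ACTLR once the faulting store has been identified. */
static void actlr_restore(volatile uint32_t *actlr, uint32_t previous)
{
    *actlr = previous;
}
```

On target, the pointer would be `(volatile uint32_t *)ACTLR_ADDR`, and the call belongs early in startup so that the suspect store runs with buffering already off.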

Debugging Story: The Intermittent Crash

An IoT gateway running FreeRTOS crashed every 4-8 hours with a hard fault. The crash was never seen during development because it only occurred under sustained production traffic. The team connected a J-Link in SWO trace mode and added ITM writes at key points (task entry, ISR entry/exit, malloc calls). After an overnight capture, the SWO log showed the crash always followed a specific sequence: a UART RX DMA callback interrupted a malloc call in a low-priority task.

Reading the CFSR after the crash revealed IMPRECISERR. Disabling write buffering via ACTLR turned it into a PRECISERR, and the stacked PC pointed to a STR instruction inside the heap allocator. The root cause: the DMA callback called pvPortMalloc from ISR context without acquiring the heap lock, corrupting the free-list. The fix was to defer the allocation to a task via a queue, and the crash never returned.
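The shape of that fix can be sketched without any RTOS: the ISR only records a request, and a task performs the allocation later in thread context. The ring buffer below stands in for the FreeRTOS queue used in the story and `malloc` stands in for `pvPortMalloc`. All names are hypothetical, and the sketch assumes a single producer (the ISR) and a single consumer (the task) on a single-core MCU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define REQ_SLOTS 8u  /* must be a power of two so index wrap-around stays consistent */

typedef struct {
    volatile uint32_t head;        /* written only by the ISR  */
    volatile uint32_t tail;        /* written only by the task */
    size_t sizes[REQ_SLOTS];       /* requested allocation sizes */
} alloc_queue_t;

/* ISR side: record the request and return. Never touches the heap. */
static bool alloc_request_from_isr(alloc_queue_t *q, size_t size)
{
    uint32_t head = q->head;
    if (head - q->tail == REQ_SLOTS) {
        return false;              /* queue full: caller must drop or count it */
    }
    q->sizes[head % REQ_SLOTS] = size;
    q->head = head + 1u;           /* publish after the slot is written */
    return true;
}

/* Task side: perform one pending allocation in thread context.
 * Returns NULL when nothing is pending (or if malloc itself fails). */
static void *alloc_service_one(alloc_queue_t *q)
{
    uint32_t tail = q->tail;
    if (tail == q->head) {
        return NULL;
    }
    size_t size = q->sizes[tail % REQ_SLOTS];
    q->tail = tail + 1u;
    return malloc(size);
}
```

The key property is that the heap is only ever entered from one context, so the allocator's free-list cannot be modified halfway through another allocation. On a multi-core part this sketch would additionally need memory barriers or atomics.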

The lesson: hard fault registers plus SWO trace are a powerful combination -- the registers tell you what happened, and the trace tells you why.

Interview Focus

Classic Interview Questions

Q1: "When would you choose SWD over JTAG, and vice versa?"

Model Answer Starter: "I default to SWD for all ARM Cortex-M work because it only needs two wires and provides full debug access. I switch to JTAG when I need boundary scan testing for board-level manufacturing test, daisy-chaining multiple devices on one debug chain, or targeting non-ARM architectures like MIPS or RISC-V where SWD is unavailable. On some Cortex-A SoCs with multiple cores, JTAG may also be required for multi-core debug topologies."

Q2: "Walk me through how you would debug a hard fault on a Cortex-M device."

Model Answer Starter: "First, I implement a hard fault handler that reads the link register to determine whether the exception frame is on MSP or PSP, then extracts the stacked PC, LR, and xPSR. I read CFSR and HFSR to classify the fault -- for example, IMPRECISERR means a buffered write failed and the stacked PC might not be exact, so I temporarily disable write buffering to get a precise fault. I use addr2line with the ELF file to map the stacked PC to a source line. If it is a stack overflow, the stacked frame itself may be corrupt, so I check the stack sentinel or MPU fault address."

Q3: "What are the trade-offs between printf debugging and SWO/ITM trace?"

Model Answer Starter: "Printf over UART is easy to set up but introduces significant timing distortion -- at 115200 baud each character takes about 87 microseconds on the wire, so a 20-character message occupies the UART for roughly 1.7 ms. SWO/ITM writes a raw 32-bit value to a stimulus port in a few cycles, and the debug probe timestamps and streams the data to the host. SWO is better for timing-sensitive debugging and high-volume logging. The downside is that SWO requires the SWO pin to be routed to the debug connector and a probe that supports it."

Q4: "How do you decide whether to reach for a logic analyzer or an oscilloscope?"

Model Answer Starter: "I start with a logic analyzer when I suspect a protocol-level issue -- wrong clock polarity, missing ACK, incorrect byte order -- because it gives me decoded protocol frames directly. I switch to an oscilloscope when I suspect an analog problem -- slow rise times, ringing, voltage levels that do not meet logic thresholds, or ground bounce. For mixed issues I use a mixed-signal oscilloscope (MSO) that combines analog channels with digital decode."

Q5: "Describe your systematic approach to debugging an intermittent embedded bug."

Model Answer Starter: "I follow a structured methodology: first reproduce the bug as reliably as possible, even if that means running a stress test overnight. Then isolate the scope -- is it hardware, firmware, or a specific interaction? Next, I form a hypothesis and choose the minimum instrumentation to test it: SWO trace for execution flow, GPIO toggles measured on a scope for timing, or a logic analyzer for protocol issues. After identifying the root cause I implement a fix and verify it under the same conditions that triggered the original failure, then add a regression test."

Trap Alerts

Don't say: "I just add printf statements until I find the bug" -- this suggests no systematic methodology
Don't forget: Hard fault register analysis (CFSR/HFSR) -- it is the single fastest path to identifying Cortex-M crashes
Don't ignore: The difference between precise and imprecise bus faults, and how write buffering affects root-cause accuracy

Follow-up Questions

"How would you debug a crash that only occurs when the debug probe is disconnected?"
"What is the role of ETM (Embedded Trace Macrocell) vs ITM, and when would you use each?"
"How do you recover an MCU whose SWD pins have been repurposed as GPIO?"
"Describe how you would use watchpoints to catch memory corruption."

Practice

❓ What does the CFSR bit IMPRECISERR indicate on a Cortex-M hard fault?

❓ How many pins does SWD require compared to JTAG?

❓ Why does printf debugging over UART disturb real-time behavior more than SWO/ITM trace?

❓ In a Cortex-M hard fault handler, what does TST LR, #4 determine?

❓ When should you use a logic analyzer instead of an oscilloscope?

Real-World Tie-In

Automotive ECU Field Failure -- A production ECU crashed sporadically during cold starts. The team could not reproduce it on the bench. They added a persistent hard fault logger that wrote the stacked PC, CFSR, and HFSR to a reserved flash sector before resetting. After collecting three crashes from the field, all three stacked PCs pointed to the same flash-read instruction. The root cause was a flash wait-state misconfiguration that only manifested below -20 °C when the flash access time increased. Adding the correct wait-state setting for the low-temperature clock configuration eliminated the crash.
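A persistent fault record of the kind described might look like the sketch below. The layout, the magic value, and the XOR check word are assumptions made for illustration; the story does not say how the team's logger was built, and a production logger would more likely use a CRC. The actual flash write is vendor-specific and omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAULT_MAGIC 0xFA17C0DEu   /* hypothetical marker for a written record */

typedef struct {
    uint32_t magic;
    uint32_t pc;      /* stacked PC from the exception frame */
    uint32_t cfsr;
    uint32_t hfsr;
    uint32_t check;   /* XOR of the fields above (weak, but cheap in a handler) */
} fault_record_t;

static uint32_t fault_check(const fault_record_t *r)
{
    return r->magic ^ r->pc ^ r->cfsr ^ r->hfsr;
}

/* Called from the fault handler before the record is programmed to flash. */
static void fault_record_fill(fault_record_t *r, uint32_t pc, uint32_t cfsr, uint32_t hfsr)
{
    r->magic = FAULT_MAGIC;
    r->pc    = pc;
    r->cfsr  = cfsr;
    r->hfsr  = hfsr;
    r->check = fault_check(r);
}

/* Called at boot: erased flash (all 0xFF) or a torn write fails this test. */
static bool fault_record_valid(const fault_record_t *r)
{
    return r->magic == FAULT_MAGIC && r->check == fault_check(r);
}
```

Checking validity at boot matters because a reset can interrupt the flash write itself, and a half-written record with a plausible PC would send the investigation in the wrong direction.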

Wearable BLE Sensor Battery Drain -- A wearable device consumed 3 mA in sleep instead of the expected 300 uA. The team measured current with a Nordic PPK2 and saw periodic 2 mA spikes every 100 ms that should not have been there. Disconnecting the J-Link debug probe dropped the current to 400 uA -- the SWD pull-ups on SWDIO and SWCLK were preventing the MCU from fully entering System OFF mode. After disconnecting the probe, the remaining 100 uA excess was traced (via GPIO toggling and scope measurement) to an accidentally enabled internal pull-up on an unused GPIO pin. Disabling it brought sleep current to the expected 300 uA.