Debugging, Testing & Tools
intermediate
Weight: 3/10

Debugging embedded systems

Master embedded debugging: JTAG/SWD, GDB remote debugging, hard fault analysis, printf vs trace, logic analyzers, and systematic debugging methodology.

Quick Cap

Debugging embedded systems requires a blend of hardware tools (JTAG/SWD probes, oscilloscopes, logic analyzers) and software techniques (GDB remote sessions, printf/trace logging, hard fault analysis). Unlike desktop software, embedded bugs often involve hardware-software interactions, timing-dependent failures, and faults that corrupt the very stack you need to diagnose. Interviewers test whether you can systematically narrow down a problem using the right tool at each stage.

Key Facts:

  • JTAG uses 4 wires (TDI, TDO, TMS, TCK); SWD uses 2 wires (SWDIO, SWCLK) and is ARM Cortex specific
  • GDB remote debugging connects GDB to a target via a debug probe (J-Link, ST-Link) through a GDB server
  • Hard faults on Cortex-M can be decoded by reading CFSR, HFSR, and the stacked PC from the exception frame
  • SWO/ITM trace provides low-overhead runtime logging without the timing distortion of UART-based printf
  • Logic analyzers capture and decode digital protocols (SPI, I2C, UART); oscilloscopes show analog signal integrity
  • Systematic methodology: reproduce, isolate, hypothesize, instrument, fix, verify (including regression tests)

Deep Dive

At a Glance

Tool / Technique | Best For | Key Trade-off
JTAG | Boundary scan, multi-core, non-ARM targets | More pins (4-5), supports daisy-chaining
SWD | ARM Cortex-M/R/A debugging | Only 2 pins, but ARM-only
GDB remote | Setting breakpoints, stepping, inspecting memory | Halts CPU -- unusable for real-time observation
Printf (UART) | Quick-and-dirty logging | High latency, disturbs timing, blocks on UART TX
SWO / ITM trace | Low-overhead timestamped logging | Needs SWO pin routed; limited bandwidth
Logic analyzer | Protocol decoding (I2C, SPI, UART, CAN) | Shows digital levels only, no analog detail
Oscilloscope | Signal integrity, rise times, voltage levels | Fewer channels, harder to decode protocols

JTAG vs SWD

Feature | JTAG | SWD
Pins required | 4 (TDI, TDO, TMS, TCK) + optional TRST | 2 (SWDIO, SWCLK)
Standard | IEEE 1149.1 (cross-architecture) | ARM-specific (CoreSight)
Daisy-chaining | Yes -- multiple devices on one chain | No
Boundary scan | Yes | No
Debug speed | Similar | Similar (slightly lower wire overhead)
SWO output | Typically unavailable in JTAG mode (SWO usually shares the TDO pin) | Optional third pin (SWO) on the same connector
Typical use | FPGA, multi-core SoCs, production test | Cortex-M development and debugging

SWD is the default choice for Cortex-M work because it frees up pins (only 2 vs 4-5) and still provides full debug access -- breakpoints, watchpoints, memory read/write, and flash programming. JTAG remains essential when you need boundary scan for board-level test, daisy-chain multiple devices, or target non-ARM architectures.

GDB Remote Debugging Workflow

The typical embedded GDB session uses a three-layer stack:

+----------+    TCP/IP or pipe     +---------------+      SWD/JTAG      +-----------+
|   GDB    | <-------------------> |  GDB Server   | <----------------> |  Target   |
|  (host)  |    (RSP protocol)     |  (OpenOCD /   |   (probe wires)    |    MCU    |
|          |                       |  J-Link GDB)  |                    |           |
+----------+                       +---------------+                    +-----------+

  1. The debug probe (J-Link, ST-Link, CMSIS-DAP) connects to the target via SWD or JTAG.
  2. A GDB server (OpenOCD, JLinkGDBServer, pyOCD) translates the GDB Remote Serial Protocol (RSP) into probe commands.
  3. GDB on the host connects via TCP (typically port 3333 for OpenOCD, 2331 for J-Link) and provides breakpoints, stepping, register/memory inspection, and variable watches.

Key GDB commands for embedded work: target remote :3333 to connect, monitor reset halt to reset and halt the core (OpenOCD syntax; monitor commands are specific to the GDB server in use), load to program the ELF into flash, info registers to dump CPU state, x/16xw 0x20000000 to examine 16 words of SRAM, and bt for a backtrace.

Printf vs SWO/ITM Trace

Aspect | Printf over UART | SWO / ITM Trace
Setup | Route UART TX pin, redirect _write | Route SWO pin, configure ITM stimulus ports
Overhead | High -- blocks on UART TX or DMA, 115200 baud is common bottleneck | Low -- hardware serializes data via SWO at up to several MHz
Timing distortion | Severe at high log rates; can mask or create bugs | Minimal -- timestamps come from DWT cycle counter
Bandwidth | ~11.5 KB/s at 115200 baud | Hundreds of KB/s via Manchester or UART SWO
Formatting | Full printf string formatting on target (costs flash + cycles) | Typically raw 32-bit stimulus writes; formatting done on host
Production use | Often left disabled; UART pin may be shared | SWO pin often unused in production; easy to enable for field debug

When you need quick visibility during early development, UART printf is fine. Once timing matters or log volume grows, switch to ITM stimulus port writes and let the host tool (Ozone, SWO Viewer, or OpenOCD) decode and timestamp the output.
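To show how cheap an ITM stimulus write is compared with a formatted printf, here is a minimal sketch. The register offsets follow the CMSIS ITM_Type layout. The names itm_regs_t and itm_write32 are hypothetical, and the function takes the register block as a pointer only so the logic can also be exercised off-target. It assumes the debugger (or startup code) has already enabled trace and configured the SWO clock.

```c
#include <stdint.h>

/* Register layout of the ITM block; offsets follow the CMSIS ITM_Type definition. */
typedef struct {
    volatile uint32_t PORT[32];        /* 0x000: stimulus ports 0..31 */
    uint32_t          RESERVED0[864];
    volatile uint32_t TER;             /* 0xE00: trace enable, one bit per port */
    uint32_t          RESERVED1[15];
    volatile uint32_t TPR;             /* 0xE40: trace privilege */
    uint32_t          RESERVED2[15];
    volatile uint32_t TCR;             /* 0xE80: trace control, bit 0 = ITMENA */
} itm_regs_t;

#define ITM_BASE_ADDR 0xE0000000u      /* ITM base address on Cortex-M3/M4/M7 */

/* Hypothetical helper: emit one raw 32-bit value on a stimulus port.
   Returns 1 if the value was written, 0 if tracing is off for that port. */
static int itm_write32(itm_regs_t *itm, unsigned port, uint32_t value)
{
    if (port >= 32u)                      return 0;
    if ((itm->TCR & 1u) == 0u)            return 0;  /* ITM not enabled */
    if ((itm->TER & (1u << port)) == 0u)  return 0;  /* this port not enabled */
    while (itm->PORT[port] == 0u) { }     /* port reads non-zero when the FIFO has room */
    itm->PORT[port] = value;              /* hardware serializes it out over SWO */
    return 1;
}
```

On target, a call would look like itm_write32((itm_regs_t *)ITM_BASE_ADDR, 0, event_id). Returning early when trace is disabled means the same firmware runs unchanged with no probe attached.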

Logic Analyzer vs Oscilloscope

Criterion | Logic Analyzer | Oscilloscope
Signal type | Digital (threshold-based 0/1) | Analog (continuous voltage)
Channel count | 8-34+ typical | 2-4 typical
Protocol decode | Built-in (SPI, I2C, UART, CAN) | Available on some models (MSO)
Best for | Bus protocol debugging, timing verification | Signal integrity (overshoot, ringing, rise time)
Typical tool | Saleae Logic, sigrok | Rigol DS1054Z, Keysight, Tektronix

Use a logic analyzer first when chasing a protocol-level bug (wrong SPI clock polarity, missing I2C ACK, garbled UART). Switch to an oscilloscope when you suspect an analog problem (insufficient pull-up strength, ringing due to impedance mismatch, ground bounce).
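To make the "wrong SPI clock polarity" case concrete, the sketch below mimics what a logic analyzer's protocol decoder does with captured digital samples: it watches SCK for edges and shifts in MOSI on the chosen edge. The sample_t type and spi_decode_byte function are illustrative assumptions, not the API of any real analyzer software.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* One captured sample: thresholded 0/1 levels of two channels. */
typedef struct {
    uint8_t sck;
    uint8_t mosi;
} sample_t;

/* Decode one MSB-first byte by sampling MOSI on the selected SCK edge.
   Returns the number of bits captured (8 when a full byte was found). */
static int spi_decode_byte(const sample_t *s, size_t n,
                           bool sample_on_rising, uint8_t *out)
{
    int bits = 0;
    uint8_t value = 0;

    for (size_t i = 1; i < n && bits < 8; i++) {
        bool rising  = (s[i - 1].sck == 0u) && (s[i].sck == 1u);
        bool falling = (s[i - 1].sck == 1u) && (s[i].sck == 0u);

        if (sample_on_rising ? rising : falling) {
            value = (uint8_t)((value << 1) | (s[i].mosi & 1u));
            bits++;
        }
    }
    *out = value;
    return bits;
}
```

Decoding the same capture once per edge and comparing the results with the byte the firmware intended to send is a quick way to spot a controller and peripheral that disagree on the SPI mode: the wrong edge yields a shifted, incorrect byte.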

Hard Fault Debugging on Cortex-M

When a Cortex-M processor takes a hard fault, the hardware pushes eight registers (R0-R3, R12, LR, PC, xPSR) onto the stack that was active when the fault occurred. For a precise fault, the stacked PC is the address of the instruction that faulted; for an imprecise bus fault it may point several instructions later. The fault cause is encoded in the Configurable Fault Status Register (CFSR) and Hard Fault Status Register (HFSR):

Register | Address | What It Tells You
HFSR | 0xE000ED2C | Bit 30 (FORCED) = a configurable fault was escalated to hard fault
CFSR | 0xE000ED28 | Contains MemManage, BusFault, and UsageFault status bits
MMFAR | 0xE000ED34 | Faulting address for memory management faults
BFAR | 0xE000ED38 | Faulting address for bus faults

Common CFSR bits: IMPRECISERR (bus fault on a buffered write -- the stacked PC may not point to the offending instruction), PRECISERR (bus fault at the stacked PC address), INVSTATE (attempt to execute with an invalid EPSR state, most often a branch through a function pointer with an even address, which clears the Thumb bit), UNDEFINSTR (illegal opcode, common when a corrupted stack sends execution into data).
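The bit names above can be turned into a small lookup so a crash log shows names instead of a raw hex word. This is a sketch under stated assumptions: the bit positions are the ARMv7-M CFSR and HFSR positions, while cfsr_decode and hfsr_forced are hypothetical helper names, and the table lists only the commonly seen bits.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    uint32_t    mask;
    const char *name;
} fault_bit_t;

/* CFSR = MMFSR (bits 0-7) | BFSR (bits 8-15) | UFSR (bits 16-31). */
static const fault_bit_t cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL"    }, { 1u << 1,  "DACCVIOL"   },
    { 1u << 3,  "MUNSTKERR"   }, { 1u << 4,  "MSTKERR"    },
    { 1u << 7,  "MMARVALID"   }, { 1u << 8,  "IBUSERR"    },
    { 1u << 9,  "PRECISERR"   }, { 1u << 10, "IMPRECISERR"},
    { 1u << 11, "UNSTKERR"    }, { 1u << 12, "STKERR"     },
    { 1u << 15, "BFARVALID"   }, { 1u << 16, "UNDEFINSTR" },
    { 1u << 17, "INVSTATE"    }, { 1u << 18, "INVPC"      },
    { 1u << 19, "NOCP"        }, { 1u << 24, "UNALIGNED"  },
    { 1u << 25, "DIVBYZERO"   },
};

/* Write the space-separated names of all set bits into buf.
   Returns how many known bits were set; output is truncated if buf is small. */
static int cfsr_decode(uint32_t cfsr, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;

    if (len > 0) buf[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if ((cfsr & cfsr_bits[i].mask) == 0u) continue;
        count++;
        if (used < len) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? " " : "", cfsr_bits[i].name);
            if (n > 0) used += (size_t)n;
        }
    }
    return count;
}

/* HFSR bit 30 (FORCED): a configurable fault escalated to hard fault. */
static int hfsr_forced(uint32_t hfsr)
{
    return (int)((hfsr >> 30) & 1u);
}
```

When hfsr_forced returns 1, the real cause is in CFSR, so decode that next. MMFAR and BFAR are only meaningful when MMARVALID or BFARVALID is set.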

A minimal hard fault handler that captures the stacked PC for inspection in the debugger:

c
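#include <stdint.h>

/* naked: no compiler prologue, so LR still holds EXC_RETURN and SP is untouched.
   TST/ITE need Thumb-2 (Cortex-M3 and later), not Cortex-M0/M0+. */
__attribute__((naked)) void HardFault_Handler(void) {
    __asm volatile (
        "TST LR, #4        \n"   /* EXC_RETURN bit 2: which stack holds the frame? */
        "ITE EQ            \n"
        "MRSEQ R0, MSP     \n"   /* bit clear: frame is on the main stack */
        "MRSNE R0, PSP     \n"   /* bit set: frame is on the process stack */
        "B hard_fault_diag \n"   /* R0 = frame pointer, the first C argument */
    );
}

/* Receives a pointer to the stacked frame: R0, R1, R2, R3, R12, LR, PC, xPSR. */
void hard_fault_diag(uint32_t *frame) {
    volatile uint32_t pc   = frame[6];                          /* stacked PC */
    volatile uint32_t cfsr = *(volatile uint32_t *)0xE000ED28;  /* CFSR */
    volatile uint32_t hfsr = *(volatile uint32_t *)0xE000ED2C;  /* HFSR */
    (void)pc; (void)cfsr; (void)hfsr;  /* silence unused-variable warnings */
    /* Set breakpoint here -- inspect pc, cfsr, hfsr in debugger */
    while (1);
}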

On exception entry LR holds the EXC_RETURN value, and TST LR, #4 tests its bit 2: 0 means the exception frame was pushed onto MSP (main stack), 1 means it was pushed onto PSP (process stack). Once you have the stacked PC, use addr2line -e firmware.elf 0x<pc_value> to map it back to source.
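The stack-selection test and the frame layout can also be written in plain C, which is handy for post-mortem tools that parse a RAM dump on the host. This is a sketch assuming the basic eight-word frame with no floating-point context stacked; exc_frame_t, frame_on_psp, and exc_frame_read are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

/* Basic exception frame as pushed by hardware, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

/* EXC_RETURN bit 2: 0 = frame is on MSP, 1 = frame is on PSP.
   This is the same test as TST LR, #4 in the assembly stub. */
static bool frame_on_psp(uint32_t exc_return)
{
    return (exc_return & (1u << 2)) != 0u;
}

/* Copy the eight stacked words starting at the selected stack pointer. */
static exc_frame_t exc_frame_read(const uint32_t *sp)
{
    exc_frame_t f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}
```

For example, an EXC_RETURN of 0xFFFFFFFD (thread mode, process stack) selects PSP, while 0xFFFFFFF9 and 0xFFFFFFF1 select MSP. The pc field of the resulting frame is the value to hand to addr2line.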

Embedded Linux Debugging Tools

For Linux-based embedded platforms, a different tool set supplements JTAG/SWD:

Tool | Purpose
dmesg | Kernel ring buffer -- shows driver probe failures, hardware errors
strace | Traces system calls for a user-space process
gdbserver | Runs on target; host GDB attaches remotely over TCP
ftrace / trace-cmd | Kernel function tracing with low overhead
perf | Performance profiling (CPU cycles, cache misses, branch prediction)
/proc, /sys | Runtime inspection of GPIO state, interrupt counts, device status

Systematic Debugging Methodology

+-----------+     +-----------+     +-------------+     +------------+
| Reproduce | --> |  Isolate  | --> | Hypothesize | --> | Instrument |
|  the bug  |     | the scope |     | root cause  |     | & measure  |
+-----------+     +-----------+     +-------------+     +------------+
      ^                                                       |
      |               +----------+     +--------+             |
      +-------------- |  Verify  | <-- |  Fix   | <-----------+
                      |  & test  |     |   it   |
                      +----------+     +--------+

  1. Reproduce -- Get a reliable reproduction case. Intermittent bugs need persistent logging or triggered captures.
  2. Isolate -- Narrow the scope: is it hardware, software, or an interaction? Swap boards, swap firmware, bisect commits.
  3. Hypothesize -- Form a testable theory. "The hard fault occurs when the DMA callback fires during a context switch."
  4. Instrument -- Add the right probe: SWO trace, GPIO toggle measured on scope, logic analyzer capture, or conditional breakpoint.
  5. Fix -- Implement the correction.
  6. Verify -- Confirm the fix under the original failure conditions and run regression tests.

⚠️ Common Trap: Debug Probe Keeps MCU Awake

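A connected debug probe can keep a Cortex-M out of its deepest sleep modes: an active debug session typically keeps the debug logic powered and clocked, and the probe's connection to SWDIO/SWCLK can add leakage current. Always disconnect the probe (or tri-state SWDIO/SWCLK) when measuring sleep current. This is one of the most common causes of "10x expected sleep current" bug reports.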

⚠️ Common Trap: Imprecise Bus Faults

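When CFSR shows IMPRECISERR, the stacked PC does not point to the faulting instruction because the write was buffered and the fault was reported some instructions later. On Cortex-M3/M4, temporarily disable write buffering (with CMSIS: SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk) to turn it into a precise fault, then re-enable buffering after finding the bug, since running without it costs performance.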
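For projects that do not use the CMSIS headers, the same toggle can be sketched against the raw register. The address and bit position below are the Cortex-M3/M4 values for the Auxiliary Control Register. The function names are hypothetical, and the register is passed as a pointer so the bit manipulation can be checked off-target.

```c
#include <stdint.h>

#define ACTLR_ADDR        0xE000E008u  /* Auxiliary Control Register (Cortex-M3/M4) */
#define ACTLR_DISDEFWBUF  (1u << 1)    /* 1 = disable default write buffering */
#define CFSR_IMPRECISERR  (1u << 10)   /* imprecise data bus error flag in CFSR */

/* True when CFSR reports an imprecise bus fault. */
static int bus_fault_is_imprecise(uint32_t cfsr)
{
    return (cfsr & CFSR_IMPRECISERR) != 0u;
}

/* Make bus faults precise while hunting the bug (slower stores). */
static void write_buffer_disable(volatile uint32_t *actlr)
{
    *actlr |= ACTLR_DISDEFWBUF;
}

/* Restore normal buffered writes once the bug is found. */
static void write_buffer_enable(volatile uint32_t *actlr)
{
    *actlr &= ~ACTLR_DISDEFWBUF;
}
```

On target the call would be write_buffer_disable((volatile uint32_t *)ACTLR_ADDR), placed early in startup and guarded by a debug build flag so it never ships enabled.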

Debugging Story: The Intermittent Crash

An IoT gateway running FreeRTOS crashed every 4-8 hours with a hard fault. The crash was never seen during development because it only occurred under sustained production traffic. The team connected a J-Link in SWO trace mode and added ITM writes at key points (task entry, ISR entry/exit, malloc calls). After an overnight capture, the SWO log showed the crash always followed a specific sequence: a UART RX DMA callback interrupted a malloc call in a low-priority task.

Reading the CFSR after the crash revealed IMPRECISERR. Disabling write buffering via ACTLR turned it into a PRECISERR, and the stacked PC pointed to a STR instruction inside the heap allocator. The root cause: the DMA callback called pvPortMalloc from ISR context. pvPortMalloc is not ISR-safe; it protects the heap by suspending the scheduler, which does not hold off interrupts, so the ISR's allocation interleaved with the task's and corrupted the free list. The fix was to defer the allocation to a task via a queue, and the crash never returned.

The lesson: hard fault registers plus SWO trace are a powerful combination -- the registers tell you what happened, and the trace tells you why.

Interview Focus

Classic Interview Questions

Q1: "When would you choose SWD over JTAG, and vice versa?"

Model Answer Starter: "I default to SWD for all ARM Cortex-M work because it only needs two wires and provides full debug access. I switch to JTAG when I need boundary scan testing for board-level manufacturing test, daisy-chaining multiple devices on one debug chain, or targeting non-ARM architectures like MIPS or RISC-V where SWD is unavailable. On some Cortex-A SoCs with multiple cores, JTAG may also be required for multi-core debug topologies."

Q2: "Walk me through how you would debug a hard fault on a Cortex-M device."

Model Answer Starter: "First, I implement a hard fault handler that reads the link register to determine whether the exception frame is on MSP or PSP, then extracts the stacked PC, LR, and xPSR. I read CFSR and HFSR to classify the fault -- for example, IMPRECISERR means a buffered write failed and the stacked PC might not be exact, so I temporarily disable write buffering to get a precise fault. I use addr2line with the ELF file to map the stacked PC to a source line. If it is a stack overflow, the stacked frame itself may be corrupt, so I check the stack sentinel or MPU fault address."

Q3: "What are the trade-offs between printf debugging and SWO/ITM trace?"

Model Answer Starter: "Printf over UART is easy to set up but introduces significant timing distortion -- at 115200 baud each character takes about 87 microseconds on the wire (10 bits per byte), so a blocking 20-character log line stalls the caller for roughly 1.7 ms. SWO/ITM writes a raw 32-bit value to a stimulus port in a few cycles, and the debug probe timestamps and streams the data to the host. SWO is better for timing-sensitive debugging and high-volume logging. The downside is that SWO requires the SWO pin to be routed to the debug connector and a probe that supports it."

Q4: "How do you decide whether to reach for a logic analyzer or an oscilloscope?"

Model Answer Starter: "I start with a logic analyzer when I suspect a protocol-level issue -- wrong clock polarity, missing ACK, incorrect byte order -- because it gives me decoded protocol frames directly. I switch to an oscilloscope when I suspect an analog problem -- slow rise times, ringing, voltage levels that do not meet logic thresholds, or ground bounce. For mixed issues I use a mixed-signal oscilloscope (MSO) that combines analog channels with digital decode."

Q5: "Describe your systematic approach to debugging an intermittent embedded bug."

Model Answer Starter: "I follow a structured methodology: first reproduce the bug as reliably as possible, even if that means running a stress test overnight. Then isolate the scope -- is it hardware, firmware, or a specific interaction? Next, I form a hypothesis and choose the minimum instrumentation to test it: SWO trace for execution flow, GPIO toggles measured on a scope for timing, or a logic analyzer for protocol issues. After identifying the root cause I implement a fix and verify it under the same conditions that triggered the original failure, then add a regression test."

Trap Alerts

  • Don't say: "I just add printf statements until I find the bug" -- this suggests no systematic methodology
  • Don't forget: Hard fault register analysis (CFSR/HFSR) -- it is the single fastest path to identifying Cortex-M crashes
  • Don't ignore: The difference between precise and imprecise bus faults, and how write buffering affects root-cause accuracy

Follow-up Questions

  • "How would you debug a crash that only occurs when the debug probe is disconnected?"
  • "What is the role of ETM (Embedded Trace Macrocell) vs ITM, and when would you use each?"
  • "How do you recover an MCU whose SWD pins have been repurposed as GPIO?"
  • "Describe how you would use watchpoints to catch memory corruption."

Practice

What does the CFSR bit IMPRECISERR indicate on a Cortex-M hard fault?

How many pins does SWD require compared to JTAG?

Why does printf debugging over UART disturb real-time behavior more than SWO/ITM trace?

In a Cortex-M hard fault handler, what does TST LR, #4 determine?

When should you use a logic analyzer instead of an oscilloscope?

Real-World Tie-In

Automotive ECU Field Failure -- A production ECU crashed sporadically during cold starts. The team could not reproduce it on the bench. They added a persistent hard fault logger that wrote the stacked PC, CFSR, and HFSR to a reserved flash sector before resetting. After collecting three crashes from the field, all three stacked PCs pointed to the same flash-read instruction. The root cause was a flash wait-state misconfiguration that only manifested below -20 °C when the flash access time increased. Adding the correct wait-state setting for the low-temperature clock configuration eliminated the crash.
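A persistent fault logger like the one in this story needs a record format that can be told apart from erased or half-written flash. The sketch below shows one possible shape. The magic value, the XOR checksum, and all names are assumptions made for illustration, and the vendor-specific flash write is deliberately left out.

```c
#include <stdint.h>
#include <stdbool.h>

#define FAULT_MAGIC 0x4641554Cu  /* ASCII "FAUL"; arbitrary marker chosen for this sketch */

typedef struct {
    uint32_t magic;     /* distinguishes a record from erased (0xFFFFFFFF) flash */
    uint32_t pc;        /* stacked PC from the exception frame */
    uint32_t cfsr;      /* Configurable Fault Status Register */
    uint32_t hfsr;      /* Hard Fault Status Register */
    uint32_t checksum;  /* inverted XOR of the four fields above */
} fault_record_t;

static uint32_t fault_checksum(const fault_record_t *r)
{
    return ~(r->magic ^ r->pc ^ r->cfsr ^ r->hfsr);
}

/* Build a record in RAM; the caller then writes it to the reserved sector. */
static fault_record_t fault_record_make(uint32_t pc, uint32_t cfsr, uint32_t hfsr)
{
    fault_record_t r = { FAULT_MAGIC, pc, cfsr, hfsr, 0u };
    r.checksum = fault_checksum(&r);
    return r;
}

/* Check a record read back after reboot before trusting its contents. */
static bool fault_record_valid(const fault_record_t *r)
{
    return r->magic == FAULT_MAGIC && r->checksum == fault_checksum(r);
}
```

At boot, the firmware would read the reserved sector, call fault_record_valid, and report the record over whatever field channel is available before erasing it.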

Wearable BLE Sensor Battery Drain -- A wearable device consumed 3 mA in sleep instead of the expected 300 uA. The team measured current with a Nordic PPK2 and saw periodic 2 mA spikes every 100 ms that should not have been there. Disconnecting the J-Link debug probe dropped the current to 400 uA -- the SWD pull-ups on SWDIO and SWCLK were preventing the MCU from fully entering System OFF mode. After disconnecting the probe, the remaining 100 uA excess was traced (via GPIO toggling and scope measurement) to an accidentally enabled internal pull-up on an unused GPIO pin. Disabling it brought sleep current to the expected 300 uA.