What are the rules for writing a safe ISR? What mistakes do you commonly see?
The fundamental rules for ISR safety on Cortex-M are:
(1) Clear the interrupt flag: early in the ISR, clear the pending flag in the peripheral's status register. If you forget, the ISR re-enters immediately after it returns because the NVIC still sees the request asserted, and the resulting loop starves the main context and every lower-priority interrupt. How the flag is cleared depends on the peripheral: for UART RXNE, it means reading the data register; for timer update interrupts, it means writing to the status register.
(2) No blocking calls: never call functions that wait, sleep, or spin on a condition. This includes HAL_Delay() (it polls a tick counter that the SysTick interrupt increments, so it spins forever if SysTick's priority is equal to or lower than that of the running ISR), RTOS osDelay(), busy-wait loops, and mutex locks.
(3) No heap operations: malloc(), free(), and new operate on the heap directly, and printf() and sprintf() may allocate internally depending on the C library. The default allocator is typically not reentrant, so if the main loop is mid-malloc when an ISR calls malloc, the heap metadata can be corrupted silently.
(4) Use volatile for shared variables: any variable written in an ISR and read in the main loop (or vice versa) must be declared volatile. Without it, the compiler may optimize away the read in the main loop, caching the value in a register and never seeing the ISR's update. This is one of the most common bugs in bare-metal ISR code, and it is hard to spot because it works in debug builds (optimizations disabled) and breaks in release builds. Note that volatile only forces the memory access; it does not make the access atomic.
(5) Minimize execution time: capture data and get out. Copy peripheral registers to a buffer, set a flag, and return. Parse, compute, and respond in the main context.
(6) Ensure atomicity for multi-byte shared data: an aligned 32-bit load or store on Cortex-M3/M4 is atomic, but a read-modify-write such as counter++, a 64-bit timestamp, or a multi-field struct is not. Use a critical section or a double-buffering scheme for non-atomic shared data.
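Rules (1), (4), and (5) can be sketched together as a receive ISR that clears its request, stores one byte in a ring buffer, and returns, leaving the parsing to the main context. This is a minimal host-compilable sketch, not a vendor driver: fake_uart_sr, fake_uart_dr, RX_FLAG, and sim_receive() are hypothetical stand-ins for the real memory-mapped registers and for the hardware raising the interrupt, and it assumes a single-core part with exactly one producer (the ISR) and one consumer (the main loop).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for peripheral registers so the sketch builds on a
   host. On a real MCU these come from the vendor's device header. */
static volatile uint32_t fake_uart_sr;      /* bit 0: receive data ready */
static volatile uint32_t fake_uart_dr;      /* receive data register */
#define RX_FLAG 0x1u

#define RX_BUF_SIZE 16u
static_assert((RX_BUF_SIZE & (RX_BUF_SIZE - 1u)) == 0u,
              "RX_BUF_SIZE must be a power of two for the index mask");

/* Shared between ISR and main loop, so every one of these is volatile. */
static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;           /* written only by the ISR */
static volatile uint32_t rx_tail;           /* written only by the main loop */

/* Models the hardware side effect: reading the data register clears the flag. */
static uint8_t read_data_register(void)
{
    uint8_t byte = (uint8_t)fake_uart_dr;
    fake_uart_sr &= ~RX_FLAG;
    return byte;
}

/* The ISR: clear the request, capture one byte, get out. No parsing here. */
void uart_rx_isr(void)
{
    if ((fake_uart_sr & RX_FLAG) == 0u) {
        return;                             /* nothing pending */
    }
    uint8_t byte = read_data_register();    /* also clears the request */

    uint32_t head = rx_head;
    if (head - rx_tail < RX_BUF_SIZE) {     /* room left in the buffer? */
        rx_buf[head & (RX_BUF_SIZE - 1u)] = byte;
        rx_head = head + 1u;                /* publish only after the data is stored */
    }
    /* Buffer full: the byte is dropped rather than blocking inside the ISR. */
}

/* Main-context consumer: returns false when no byte is waiting. */
bool uart_try_read(uint8_t *out)
{
    uint32_t tail = rx_tail;
    if (tail == rx_head) {
        return false;
    }
    *out = rx_buf[tail & (RX_BUF_SIZE - 1u)];
    rx_tail = tail + 1u;
    return true;
}

/* Hypothetical helper: pretend the hardware received a byte and fired the IRQ. */
void sim_receive(uint8_t byte)
{
    fake_uart_dr = byte;
    fake_uart_sr |= RX_FLAG;
    uart_rx_isr();
}
```

Because each index has exactly one writer and each is an aligned 32-bit variable, this layout needs no critical section on a single-core Cortex-M3/M4. With more than one producer or consumer that assumption no longer holds.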
Common mistakes in interviews and in real codebases: calling printf() for debug output inside an ISR (can crash or corrupt output in production), forgetting to clear the interrupt flag (the system appears to hang because the ISR re-enters continuously), using HAL_Delay() inside an ISR (deadlocks because SysTick cannot preempt the running ISR, so the tick never advances), sharing data between an ISR and the main loop without volatile (works in debug, fails in release), and using floating-point operations in an ISR on Cortex-M4F without ensuring the FPU context is saved (the lazy stacking mechanism handles this automatically on ARMv7-M, but older RTOS ports may not preserve FPU state across context switches initiated from ISRs).
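The shared-data mistake has a second half that volatile alone does not fix: a 64-bit value is not read in one atomic access on Cortex-M3/M4, so the main loop can observe a half-updated timestamp. Below is a minimal sketch of the critical-section approach from rule (6), assuming a host build: irq_save_and_disable(), irq_restore(), and fake_primask are hypothetical stand-ins for the CMSIS intrinsics __get_PRIMASK(), __disable_irq(), and __set_PRIMASK(), and timer_isr() and tick_us are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host model of PRIMASK: 0 = interrupts enabled, 1 = masked.
   On target, use __get_PRIMASK(), __disable_irq(), and __set_PRIMASK()
   from the CMSIS core header instead of these two functions. */
static uint32_t fake_primask;

uint32_t irq_save_and_disable(void)
{
    uint32_t previous = fake_primask;
    fake_primask = 1u;
    return previous;
}

void irq_restore(uint32_t previous)
{
    assert(previous <= 1u);                 /* PRIMASK is a single bit */
    fake_primask = previous;
}

uint32_t irq_masked(void)
{
    return fake_primask;
}

/* 64-bit timestamp written by the ISR. volatile forces the memory access
   but does not make a 64-bit access atomic on a 32-bit core. */
static volatile uint64_t tick_us;

/* Illustrative timer ISR: advances the timestamp by one millisecond. */
void timer_isr(void)
{
    tick_us += 1000u;
}

/* Main-context reader: mask interrupts only for the duration of the copy,
   then restore the previous state so the function is safe to call from
   code that already holds interrupts masked. */
uint64_t read_tick_us(void)
{
    uint32_t previous = irq_save_and_disable();
    uint64_t snapshot = tick_us;
    irq_restore(previous);
    return snapshot;
}
```

Saving and restoring the previous mask, rather than unconditionally re-enabling interrupts, keeps nested critical sections correct. The masked window is only a few instructions, which keeps the added interrupt latency small.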
Source: Interrupts & Priorities Q&A
