Concept Q&A
12 questions

Interrupts & Priorities — Interview Questions & Answers

Practice interview questions on NVIC, ISR design, priority schemes, nested interrupts, and shared data protection.


ISR Design & Basics

Q: What happens step by step when an interrupt fires on a Cortex-M processor?

When an interrupt request is asserted and its priority is higher than the current execution priority, the Cortex-M hardware initiates an automatic exception entry sequence. First, the processor takes the interrupt at the next instruction boundary; a long multi-register LDM/STM in flight is abandoned and restarted (or continued via the ICI bits) rather than run to completion, which keeps the worst-case entry latency bounded. Then the hardware automatically pushes eight registers onto the current stack (MSP or PSP): R0-R3, R12, LR, PC (the return address), and xPSR. This "stacking" happens in parallel with the vector table lookup, which is a key Cortex-M optimization — the processor fetches the ISR address from the vector table (at offset (IRQ_number + 16) * 4 from the table base, since the first 16 entries are reserved for system exceptions) simultaneously with the register save, reducing entry latency. On Cortex-M3/M4 with zero-wait-state memory, this takes 12 cycles from interrupt assertion to the first ISR instruction.

Once stacking is complete and the vector is fetched, the processor transitions to Handler mode, updates the link register with a special EXC_RETURN value (e.g., 0xFFFFFFF9 for return to Thread mode using the MSP), and begins executing the ISR. The ISR runs at the priority of the interrupt, meaning only higher-priority interrupts can preempt it. When the ISR returns (via BX LR with the EXC_RETURN value), the hardware automatically pops the eight stacked registers ("unstacking"), restores the processor state, and resumes the interrupted code. The entire save/restore mechanism is done in hardware with zero compiler cooperation, which is why Cortex-M ISRs are ordinary C functions — no special prologue/epilogue or __interrupt keywords needed.

A common interview trap is forgetting the parallel stacking and vector fetch. Candidates who describe them as sequential steps overestimate the entry latency. Another subtlety: if a higher-priority interrupt arrives during stacking, the processor completes the stack frame but redirects to the higher-priority ISR instead — the original pending interrupt is serviced after the higher one returns, with tail-chaining eliminating redundant stack operations.
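The EXC_RETURN encoding mentioned above can be decoded with plain bit tests. A minimal sketch — the struct and function names below are illustrative, not CMSIS APIs:

```c
#include <stdint.h>
#include <stdbool.h>

/* Decode the mode/stack information encoded in an ARMv7-M EXC_RETURN
 * value (the magic value loaded into LR on exception entry).
 * Bit 3: 1 = return to Thread mode, 0 = return to Handler mode.
 * Bit 2: 1 = restore frame from PSP, 0 = restore frame from MSP.
 * Bit 4: 0 = extended (FPU) frame, 1 = basic 8-register frame.     */
typedef struct {
    bool to_thread_mode;   /* returning to Thread (vs. Handler) mode */
    bool uses_psp;         /* stack frame lives on PSP (vs. MSP)     */
    bool fpu_frame;        /* extended frame with FPU registers      */
} exc_return_info;

exc_return_info decode_exc_return(uint32_t exc_return)
{
    exc_return_info info;
    info.to_thread_mode = (exc_return & (1u << 3)) != 0;
    info.uses_psp       = (exc_return & (1u << 2)) != 0;
    info.fpu_frame      = (exc_return & (1u << 4)) == 0;
    return info;
}
```

For example, 0xFFFFFFF9 decodes to Thread mode on the MSP with a basic frame, while 0xFFFFFFFD decodes to Thread mode on the PSP — the value an RTOS task's ISR typically returns with.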

Q: Why should ISRs be kept short? What do you do if an interrupt requires heavy processing?

An ISR runs at an elevated hardware priority level, meaning it blocks all interrupts of equal or lower priority for its entire duration. A long-running ISR directly increases interrupt latency for every other interrupt in the system. For example, if a UART RX ISR takes 200 microseconds to process a received packet, and a motor control timer ISR fires during that window, the motor update is delayed by up to 200 microseconds — potentially causing a missed control loop deadline, audible motor noise, or even a safety hazard. In a real-time system, the worst-case latency of every ISR contributes to the worst-case response time of every other ISR at the same or lower priority level.

Beyond latency, long ISRs cause stack pressure (nested interrupts each consume 32+ bytes of stack), increase the probability of priority inversion scenarios, and make the system harder to reason about. A general rule: ISRs should complete in microseconds, not milliseconds. Do the minimum work necessary — acknowledge the interrupt source by clearing the flag, capture time-critical data (read a register, copy a DMA result, sample a pin state), set a flag or enqueue data, and return immediately.

For heavy processing, the standard pattern is deferred processing. In a bare-metal super-loop, the ISR sets a volatile flag and the main loop checks it, performing the expensive work at base priority level where it cannot block other interrupts. In an RTOS, the ISR posts to a semaphore or message queue that wakes a dedicated task — this is the "bottom half" pattern borrowed from Linux. For example, a UART ISR copies received bytes into a ring buffer and signals a semaphore; the parser task wakes up, processes the complete message, and goes back to sleep. FreeRTOS provides xSemaphoreGiveFromISR() and xQueueSendFromISR() specifically for this. The key discipline is separating the time-critical capture (ISR) from the time-tolerant processing (task or main loop).
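The bare-metal flavor of this deferral pattern can be sketched as follows. The names (uart_rx_isr, main_loop_poll) are hypothetical, and the data-register read is simulated by a parameter so the sketch is self-contained:

```c
#include <stdint.h>
#include <stdbool.h>

/* Deferred-processing sketch: the ISR only captures data and sets a
 * flag; the expensive work runs in the main loop at base priority.   */
static volatile bool    rx_ready;   /* set by ISR, cleared by main    */
static volatile uint8_t rx_byte;    /* last captured byte             */

void uart_rx_isr(uint8_t data_register_value)  /* time-critical capture */
{
    rx_byte  = data_register_value;  /* grab the data...              */
    rx_ready = true;                 /* ...signal the main loop, done */
}

/* Called from the super-loop; returns true if a byte was processed. */
bool main_loop_poll(uint8_t *out)
{
    if (!rx_ready)
        return false;
    *out = rx_byte;                  /* copy before clearing the flag */
    rx_ready = false;                /* re-arm for the next byte      */
    /* ...heavy parsing/processing happens here, at base priority...  */
    return true;
}
```

In an RTOS build, the flag-and-poll pair would be replaced by xSemaphoreGiveFromISR() and a blocked task, but the division of labor is identical.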

Q: What are the rules for writing a safe ISR? What mistakes do you commonly see?

The fundamental rules for ISR safety on Cortex-M are: (1) Clear the interrupt flag — the very first thing the ISR should do is clear the pending flag in the peripheral's status register. If you forget, the ISR will re-enter immediately after return because the NVIC still sees the request asserted, creating an infinite loop that locks up the system. For level-triggered peripherals like UART RXNE, clearing means reading the data register; for timer update interrupts, it means writing to the status register. (2) No blocking calls — never call functions that wait, sleep, or spin on a condition. This includes HAL_Delay() (which polls SysTick and will deadlock if SysTick is at equal or lower priority), RTOS osDelay(), busy-wait loops, and mutex locks. (3) No heap operations — malloc(), free(), printf(), sprintf(), and new all touch the heap, which is not reentrant. If the main loop is mid-malloc when an ISR calls malloc, the heap metadata is corrupted silently.

(4) Use volatile for shared variables — any variable written in an ISR and read in the main loop (or vice versa) must be declared volatile. Without it, the compiler may optimize away the read in the main loop, caching the value in a register and never seeing the ISR's update. This is the single most common bug in bare-metal ISR code — it works in debug builds (optimizations disabled) and breaks in release builds. (5) Minimize execution time — capture data and get out. Copy peripheral registers to a buffer, set a flag, and return. Parse, compute, and respond in the main context. (6) Ensure atomicity for multi-byte shared data — a 32-bit write on Cortex-M3/M4 is atomic if word-aligned, but a 64-bit timestamp or a multi-field struct is not. Use a critical section or a double-buffering scheme for non-atomic shared data.

Common mistakes in interviews and in real codebases: calling printf() for debug output inside an ISR (crashes or corrupts output in production), forgetting to clear the interrupt flag (system appears to hang), using HAL_Delay() inside an ISR (deadlocks because SysTick cannot fire), modifying shared data without volatile (works in debug, fails in release), and enabling floating-point operations in an ISR on Cortex-M4F without ensuring the FPU context is saved (the lazy stacking mechanism handles this automatically on ARMv7-M, but older RTOS ports may not preserve FPU state across context switches initiated from ISRs).
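A minimal skeleton that obeys these rules, written against a hypothetical timer peripheral (on real hardware TIMX would be a fixed-address cast; here it is a plain struct so the sketch stands alone):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical memory-mapped timer, mimicking an STM32-style layout. */
typedef struct {
    volatile uint32_t SR;    /* status register: bit 0 = update flag  */
    volatile uint32_t CNT;   /* free-running counter to capture       */
} hypothetical_timer_t;

static hypothetical_timer_t timer_instance;
static hypothetical_timer_t *const TIMX = &timer_instance;

static volatile uint32_t captured_count; /* shared with main -> volatile */
static volatile bool     tick_pending;

void timer_update_isr(void)
{
    TIMX->SR &= ~1u;             /* rule 1: clear the source flag first    */
    captured_count = TIMX->CNT;  /* rule 5: capture time-critical data...  */
    tick_pending   = true;       /* ...signal the main loop, return at once */
    /* no printf, no malloc, no delays, no blocking calls here (rules 2-3) */
}
```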

Priority & Nesting

Q: Explain preemption priority vs sub-priority on the NVIC. How does PRIGROUP configure them?

The Cortex-M NVIC assigns each interrupt a priority value stored in the upper bits of an 8-bit priority register (most STM32 implementations use only 4 bits, giving 16 priority levels, where 0 is the highest priority). The PRIGROUP field in the Application Interrupt and Reset Control Register (AIRCR) splits these priority bits into two fields: preemption priority (also called group priority) and sub-priority (also called subgroup priority). The split is configurable — for example, with 4 implemented bits, PRIGROUP can be set to give 4 bits of preemption and 0 bits of sub-priority (16 preemption levels, no sub-priority), or 3 bits and 1 bit (8 preemption levels with 2 sub-priority levels each), or 2 and 2, and so on.

Preemption priority determines whether one interrupt can preempt (nest inside) another. A numerically lower preemption priority value means higher urgency. If a preemption-priority-1 interrupt fires while a preemption-priority-3 ISR is running, the hardware immediately preempts: it stacks the ISR context and begins executing the higher-priority handler. Interrupts at the same preemption priority level never preempt each other, regardless of sub-priority values. Sub-priority is only a tie-breaker: when two interrupts with the same preemption priority are pending simultaneously, the one with the lower sub-priority number is serviced first. Once the first one starts executing, the second one waits — it does not preempt.

In practice, most bare-metal projects use the default PRIGROUP setting that allocates all bits to preemption priority and zero bits to sub-priority (e.g., HAL_NVIC_SetPriorityGrouping(NVIC_PRIORITYGROUP_4) on STM32). This gives the maximum number of nesting levels and simplifies reasoning about preemption. Sub-priority only matters when you have interrupts that should be serviced in a defined order but must never preempt each other — a rare scenario. A common mistake is confusing the two and assuming that a lower sub-priority interrupt will preempt a higher sub-priority one at the same preemption level — it will not.
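The PRIGROUP split is pure bit arithmetic and can be illustrated without hardware. nvic_split_priority() below is an illustrative helper, not a CMSIS function:

```c
#include <stdint.h>

/* Split an 8-bit NVIC priority byte into preemption (group) priority
 * and sub-priority according to AIRCR.PRIGROUP (0-7).  PRIGROUP picks
 * the binary point: group = bits [7 : PRIGROUP+1], sub = bits
 * [PRIGROUP : 0].                                                    */
typedef struct {
    uint8_t preempt;   /* preemption (group) priority                 */
    uint8_t sub;       /* sub-priority (pending-order tie-breaker)    */
} nvic_split_t;

nvic_split_t nvic_split_priority(uint8_t prio_byte, uint8_t prigroup)
{
    nvic_split_t s;
    uint8_t sub_bits = (uint8_t)(prigroup + 1u);  /* bits right of point */
    s.sub     = (uint8_t)(prio_byte & ((1u << sub_bits) - 1u));
    s.preempt = (uint8_t)(prio_byte >> sub_bits);
    return s;
}
```

With an STM32-style 4-bit field, priority level 5 is stored as the byte 0x50; at PRIGROUP 3 it splits into preemption 5 / sub 0, while at PRIGROUP 4 the same byte splits into preemption 2 with the remainder in the sub-priority field.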

Q: What is tail-chaining and how does it improve interrupt throughput?

Tail-chaining is a hardware optimization in the Cortex-M NVIC that eliminates redundant stack operations when a pending interrupt is waiting while an ISR is completing. Normally, returning from an ISR requires unstacking eight registers (6-12 cycles) and then, if another interrupt is pending, re-stacking the same eight registers (6-12 cycles) to enter the next ISR — a total of 12-24 wasted cycles doing nothing but writing and reading the same values to and from the stack. With tail-chaining, the processor detects the pending interrupt during the exception return sequence, skips both the unstack and the re-stack, and directly begins fetching the next ISR's vector. On Cortex-M3/M4, a tail-chained ISR entry takes only 6 cycles compared to 12 cycles for a fresh entry.

This optimization is critical for high-frequency interrupt systems. Consider a DMA half-transfer and transfer-complete interrupt firing back-to-back on a system running at 72 MHz. Without tail-chaining, the gap between ISRs is approximately 24 cycles (333 ns). With tail-chaining, the gap is 6 cycles (83 ns) — a 4x improvement in transition speed. For systems handling dozens or hundreds of interrupts per millisecond (high-speed communication, motor control with multiple sensors), the cumulative cycle savings are substantial.

A related optimization is late-arriving: if a higher-priority interrupt arrives during the stacking phase of a lower-priority interrupt, the processor completes stacking but branches to the higher-priority ISR instead. When the higher-priority ISR finishes, it tail-chains into the original lower-priority ISR without any additional stacking. This means the processor always services the most urgent interrupt first, even if a lower-priority one triggered the initial context save. Both tail-chaining and late-arriving are automatic hardware behaviors — the programmer does not enable or configure them, but understanding them explains why measured ISR latencies are often shorter than the theoretical maximum.

Q: How does interrupt nesting work on Cortex-M? What are the stack implications?

Interrupt nesting occurs automatically on Cortex-M whenever a higher preemption priority interrupt fires while a lower-priority ISR is executing. The hardware pushes another stack frame (8 registers = 32 bytes on Cortex-M3/M4, or 26 words = 104 bytes on Cortex-M4F with FPU context if floating-point was used) onto the current stack and begins executing the higher-priority ISR. This can happen recursively: a priority-0 interrupt can preempt a priority-1 ISR that already preempted a priority-2 ISR, creating three nested stack frames on top of whatever the main application was using.

The stack implications are significant and often underestimated. Each nesting level consumes at least 32 bytes for the hardware-saved frame, plus whatever stack the ISR itself uses for local variables and function calls. On Cortex-M4F with lazy FPU stacking, if any ISR in the chain uses floating-point, an additional 72 bytes are reserved (18 FPU registers times 4 bytes) even if the FPU was not actually used in every frame — the hardware reserves space pessimistically during lazy stacking and fills it only if needed. With 4-5 nesting levels, the worst-case stack consumption from interrupts alone can exceed 500 bytes. If the system uses the MSP for both handler and thread mode (the default without an RTOS), this must be added to the application's own stack usage.

A practical approach to stack sizing: calculate the worst-case nesting depth (determined by how many distinct preemption priority levels you actually use), multiply by the per-level stack cost (frame size + ISR local usage), add the application's own worst-case stack depth, and then add a safety margin of 20-25%. Tools like the Keil stack analyzer or manual call-tree analysis help determine per-ISR stack usage. A common mistake is ignoring ISR stack contributions entirely and then seeing stack overflow corruption that manifests as random crashes under heavy interrupt load — a bug that is extremely difficult to reproduce and diagnose without a stack watermark or MPU guard region.
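The sizing recipe above reduces to simple arithmetic. A sketch with made-up per-ISR numbers — real figures come from a stack analyzer or call-tree analysis:

```c
#include <stdint.h>

/* Worst-case interrupt stack budget, following the recipe:
 * nesting depth x (hardware frame + ISR locals), plus a margin.     */
#define HW_FRAME_BASIC     32u  /* 8 registers x 4 bytes (no FPU)    */
#define HW_FRAME_EXTENDED 104u  /* 26 words with lazy FPU context    */

uint32_t interrupt_stack_budget(uint32_t nesting_levels,
                                uint32_t frame_size,
                                uint32_t worst_isr_locals,
                                uint32_t margin_percent)
{
    uint32_t per_level = frame_size + worst_isr_locals;
    uint32_t raw       = nesting_levels * per_level;
    return raw + (raw * margin_percent) / 100u;  /* add safety margin */
}
```

For instance, four nesting levels at 32-byte frames with 96 bytes of ISR locals each, plus a 25% margin, budgets 640 bytes of stack for interrupts alone.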

Shared Data Protection

Q: How do you safely share data between an ISR and the main loop?

Sharing data between an ISR and the main loop is one of the most error-prone areas in bare-metal firmware. The fundamental problem is that the ISR can fire at any point during the main loop's execution, creating a race condition if both contexts access the same variable. The first and most critical requirement is the volatile keyword: any shared variable must be declared volatile to prevent the compiler from caching its value in a register. Without volatile, the main loop might read the variable once into a register and never re-read it from memory, missing all subsequent ISR updates. This bug is invisible in debug builds (where optimizations are off) and only appears in release builds.

Beyond volatile, you must ensure atomicity. On Cortex-M3/M4/M7, a single aligned 32-bit read or write is atomic — the bus performs it as a single transfer that cannot be split by an interrupt. So a volatile uint32_t flag can be safely set in the ISR and read in the main loop. But a 64-bit variable, a struct, or even two related 32-bit variables that must be consistent with each other (like a timestamp and a sensor value) are not atomic. If the ISR updates both fields and the main loop reads between the two writes, it gets a half-old, half-new snapshot — a torn read.

For non-atomic shared data, the standard solutions are: (1) Critical sections — disable interrupts around the main-loop read (__disable_irq() / __enable_irq()), guaranteeing the ISR cannot fire mid-read. Keep these sections as short as possible. (2) Double buffering or sequence counters — the ISR writes to one buffer and flips an index; the main loop reads from the other buffer. No critical section needed, but the logic is more complex. (3) Lock-free ring buffers — ideal for streaming data (UART bytes, ADC samples). The ISR writes to the head, the main loop reads from the tail, and as long as both pointers are updated atomically and the buffer does not overflow, no locking is needed. This is the gold-standard pattern for ISR-to-main-loop data transfer in production firmware.
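A minimal single-producer/single-consumer ring buffer along these lines might look like the following. The single-core assumption matters: ISR/main-loop serialization on one core is what lets the volatile, individually atomic indices stand in for full memory barriers.

```c
#include <stdint.h>
#include <stdbool.h>

/* SPSC ring buffer: ISR writes at head, main loop reads at tail.
 * Power-of-two size so the index wrap is a cheap mask.  Each side
 * modifies only its own index; indices are aligned 32-bit values,
 * hence individually atomic on Cortex-M3/M4.                        */
#define RB_SIZE 16u                  /* must be a power of two        */
static uint8_t           rb_data[RB_SIZE];
static volatile uint32_t rb_head;    /* written only by the ISR       */
static volatile uint32_t rb_tail;    /* written only by the main loop */

bool rb_put(uint8_t byte)            /* producer (ISR context)        */
{
    uint32_t head = rb_head;
    if (head - rb_tail == RB_SIZE)   /* full: drop rather than block  */
        return false;
    rb_data[head & (RB_SIZE - 1u)] = byte;
    rb_head = head + 1u;             /* publish after the data write  */
    return true;
}

bool rb_get(uint8_t *out)            /* consumer (main loop)          */
{
    uint32_t tail = rb_tail;
    if (tail == rb_head)             /* empty                         */
        return false;
    *out = rb_data[tail & (RB_SIZE - 1u)];
    rb_tail = tail + 1u;
    return true;
}
```

Note the full-on-arrival policy: the producer drops data rather than blocking, which preserves the ISR's timing guarantees at the cost of requiring the buffer to be sized for the worst-case burst.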

Q: What is a critical section and how do you implement one on Cortex-M?

A critical section is a region of code that must execute atomically with respect to interrupts — no ISR can fire during the critical section, ensuring that shared data is read or modified without interference. On Cortex-M, the simplest implementation is to disable all interrupts at the start and re-enable them at the end using the CMSIS intrinsics __disable_irq() and __enable_irq(). These map directly to the CPSID I and CPSIE I instructions, which clear and set the PRIMASK register's I-bit. When PRIMASK is set, all interrupts with configurable priority are masked — only NMI and HardFault can still fire.

```c
__disable_irq();
// Read or modify shared data — no ISR can fire here
shared_timestamp = local_timestamp;
shared_value = local_value;
__enable_irq();
```

The critical caveat with this simple approach is that it unconditionally enables interrupts at the end, even if interrupts were already disabled before the critical section (e.g., if this code is called from another critical section or from an ISR). The correct pattern saves and restores the previous interrupt state:

```c
uint32_t primask = __get_PRIMASK();
__disable_irq();
// Critical region
__set_PRIMASK(primask); // Restore previous state
```

This is nestable — if interrupts were already disabled, they stay disabled after the restore. Every production-quality RTOS provides macros for this (taskENTER_CRITICAL() / taskEXIT_CRITICAL() in FreeRTOS), and bare-metal projects should adopt the same pattern. The most important rule: keep critical sections as short as possible. Every microsecond spent with interrupts disabled is a microsecond of added worst-case latency for every interrupt in the system. Copy data into local variables inside the critical section, then process it outside. A candidate who describes using __disable_irq() / __enable_irq() without mentioning the save/restore pattern or the latency impact is missing production-level understanding.

Q: What are the risks of calling printf from an ISR?

Calling printf() from an ISR is one of the most common and dangerous mistakes in embedded development, yet it is tempting because developers want debug output from interrupt context. The problems are multiple and severe. First, printf() is not reentrant — it uses internal static buffers and state. If the main loop is mid-printf when an ISR calls printf, the internal state (buffer pointers, format parsing position) is corrupted. The result is garbled output at best, a heap corruption or HardFault at worst. This is not theoretical — it is one of the most frequently encountered crash causes in embedded systems during development.

Second, printf() typically calls malloc() internally (for buffer management in many C library implementations) and malloc() is not reentrant either. If the main loop is inside a malloc call when the ISR triggers and calls printf, the heap's linked-list metadata is corrupted, causing either an immediate crash or a delayed, seemingly unrelated crash the next time any code touches the heap. Third, the underlying output mechanism (UART transmit, semihosting, SWO) involves blocking I/O. UART printf waits for the transmit buffer to drain, which can take milliseconds at 115200 baud for a long format string — an eternity in ISR context. During this time, all equal-and-lower-priority interrupts are blocked, latency guarantees are destroyed, and the system may miss time-critical events.

Safe alternatives for ISR debugging: (1) Toggle a GPIO pin and measure timing with an oscilloscope or logic analyzer — zero overhead, zero risk. (2) Write to a RAM-based circular log buffer in the ISR (just a pointer increment and a memcpy), then drain the buffer to UART in the main loop. (3) Use ITM/SWO trace output via the debug port, which is designed for this purpose and has minimal latency. (4) Set a volatile flag or error code that the main loop prints. The bottom line: printf in an ISR violates every ISR safety rule simultaneously — it is non-reentrant, uses the heap, blocks, and takes an unpredictable amount of time. Never use it, not even "temporarily" for debugging.
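Alternative (2) can be sketched as a fixed-size record log. The names and field choices are illustrative; on real hardware the timestamp might come from DWT->CYCCNT:

```c
#include <stdint.h>

/* ISR-safe debug logging: the ISR appends a fixed-size record to a
 * RAM ring (a few stores, no formatting, no blocking); the main loop
 * drains it and printf()s at leisure.                                */
#define LOG_SLOTS 32u                    /* power of two              */
typedef struct {
    uint32_t timestamp;                  /* cycle counter on real HW  */
    uint16_t event_id;
    uint16_t arg;
} log_entry_t;

static log_entry_t       log_ring[LOG_SLOTS];
static volatile uint32_t log_head;       /* ISR side                  */
static volatile uint32_t log_tail;       /* main-loop side            */

void log_from_isr(uint32_t ts, uint16_t id, uint16_t arg)
{
    uint32_t head = log_head;
    if (head - log_tail == LOG_SLOTS)    /* full: drop, never block   */
        return;
    log_entry_t *e = &log_ring[head & (LOG_SLOTS - 1u)];
    e->timestamp = ts;
    e->event_id  = id;
    e->arg       = arg;
    log_head = head + 1u;
}

int log_drain(log_entry_t *out)          /* main loop: 1 = got entry  */
{
    uint32_t tail = log_tail;
    if (tail == log_head)
        return 0;
    *out = log_ring[tail & (LOG_SLOTS - 1u)];
    log_tail = tail + 1u;
    return 1;
}
```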

Latency & Debugging

Q: What causes interrupt latency and how do you minimize it?

Interrupt latency is the time from when a peripheral asserts an interrupt request to when the first instruction of the ISR executes. On Cortex-M3/M4, the theoretical minimum is 12 clock cycles (stacking plus vector fetch), but real-world latency is always higher due to several factors. The most significant is a higher-priority ISR already running — the pending interrupt cannot execute until the current ISR returns (or tail-chains), so worst-case latency includes the entire execution time of every higher-priority ISR. This is why keeping all ISRs short is a system-wide discipline, not just a local optimization.

Other contributors: Critical sections where interrupts are globally disabled via PRIMASK add directly to latency — a 10-microsecond critical section adds 10 microseconds of worst-case latency to every interrupt. Flash memory wait states increase the vector fetch time; at 168 MHz on STM32F4 with 5 wait states, a flash read can take 6 cycles instead of 1. This is mitigated by the ART accelerator (prefetch and instruction cache), but a cache miss during vector lookup adds measurable delay. Bus contention from DMA transfers or other bus masters can stall the stacking operation by a few cycles. Multi-cycle instructions in flight when the interrupt fires add a small, bounded delay — on Cortex-M3/M4 an LDMIA loading 8 registers is abandoned and restarted (or continued via the ICI bits) rather than run to completion, so a single long instruction cannot stall entry indefinitely.

To minimize latency: (1) Assign correct preemption priorities — the most time-critical interrupt should have the highest (numerically lowest) priority so it preempts everything else. (2) Keep all ISRs as short as possible, especially higher-priority ones. (3) Minimize critical section duration — use the save/restore PRIMASK pattern and keep the protected region to just the data copy. (4) Place interrupt vector tables and critical ISR code in SRAM or TCM rather than flash to eliminate wait-state penalties (on Cortex-M7, placing code in ITCM gives single-cycle access). (5) Avoid using __disable_irq() entirely if possible — use BASEPRI instead to mask only lower-priority interrupts while leaving higher-priority ones enabled. __set_BASEPRI(priority_threshold) masks interrupts whose priority value is numerically greater than or equal to the threshold (same or lower urgency) while allowing more urgent ones through, preserving responsiveness for the most critical handlers.
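The BASEPRI masking rule can be captured as a small predicate. irq_masked_by_basepri() is an illustrative helper, assuming an STM32-style 4-bit priority field stored in the top of the byte:

```c
#include <stdint.h>
#include <stdbool.h>

/* BASEPRI semantics: with BASEPRI nonzero, an interrupt is masked
 * when its priority byte is numerically >= BASEPRI (i.e. the same or
 * lower urgency).  BASEPRI == 0 disables this masking entirely.      */
bool irq_masked_by_basepri(uint8_t irq_prio_byte, uint8_t basepri)
{
    if (basepri == 0u)
        return false;            /* BASEPRI = 0 means "no masking"    */
    return irq_prio_byte >= basepri;
}
```

So with BASEPRI set to level 3 (byte 0x30), a level-5 interrupt (0x50) is held off, while a level-2 interrupt (0x20) still fires — the selective masking that makes BASEPRI preferable to PRIMASK for critical sections.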

Q: What is the difference between edge-triggered and level-triggered interrupts? When does each matter?

An edge-triggered interrupt fires on a signal transition — rising edge (low to high), falling edge (high to low), or both. The interrupt controller captures the edge event in a pending flag, and the ISR must clear this flag to acknowledge it. An edge-triggered interrupt fires exactly once per transition, regardless of how long the signal stays at the new level. A level-triggered interrupt fires whenever the signal is at the active level (high or low). As long as the signal remains asserted, the interrupt keeps re-triggering — the ISR must resolve the condition that caused the level assertion (typically by reading a data register or clearing a status flag in the peripheral), or the ISR will re-enter immediately after return in an infinite loop.

Edge-triggered interrupts are appropriate for discrete events: button presses, encoder pulses, communication start-of-frame signals, or any signal where you care about the transition, not the sustained state. The risk with edge-triggered is missed edges: if two edges occur before the ISR can clear the pending flag, only one interrupt is generated. This matters for high-frequency pulse counting — if the ISR takes too long, edges are lost. The EXTI peripheral on STM32 is edge-triggered and captures edges in the PR (Pending Register), which latches until software clears it.

Level-triggered interrupts are appropriate for status conditions: a FIFO has data available (UART RXNE), a DMA transfer is complete, or an error condition persists. The level-triggered model naturally handles the case where the condition has not been resolved — the interrupt keeps firing until the ISR properly handles the source. The critical mistake with level-triggered interrupts is failing to clear the source condition. For example, if a UART RXNE interrupt fires and the ISR clears the NVIC pending bit but does not read the data register, RXNE remains asserted and the ISR re-enters immediately, creating an infinite loop that starves the rest of the system. The fix is always to address the root cause (read the data register to clear RXNE), not just the symptom (the NVIC pending bit).

Q: How would you debug a system where interrupts seem to be lost or not firing?

Lost interrupts are among the most frustrating embedded bugs because the symptom (nothing happens) gives little diagnostic information. A systematic approach works through the interrupt delivery chain from peripheral to application. Step 1: Verify the peripheral is generating the request. Read the peripheral's status register — is the interrupt flag (e.g., UART RXNE, TIM UIF, EXTI PR) actually set? If not, the peripheral is not generating the event. Check that the peripheral is configured correctly, clocked (RCC enable bit set), and that the trigger condition is actually occurring. Use a logic analyzer to confirm the external signal is present if dealing with GPIO-based interrupts.

Step 2: Verify the NVIC path. Even if the peripheral flag is set, the interrupt must pass through the peripheral's interrupt enable bit (e.g., TIM_DIER_UIE, USART_CR1_RXNEIE), then the NVIC enable register (NVIC_EnableIRQ()), and finally the global interrupt mask (PRIMASK must be clear). Check each link in this chain. A common bug is enabling the NVIC IRQ but forgetting the peripheral-level interrupt enable, or vice versa. Read NVIC->ISPR[] to see if the interrupt is pending but masked — this indicates the NVIC is disabled or a higher-priority ISR is blocking it. Check that the priority level is not accidentally masked by a BASEPRI setting.

Step 3: Check for ISR-level issues. Place a GPIO toggle as the very first instruction of the ISR and monitor it with an oscilloscope. If the GPIO toggles, the ISR is firing — the bug is in the ISR logic, not the interrupt delivery. If it does not toggle, verify the vector table: is the function name correct (spelling must match the startup file's weak alias exactly — TIM2_IRQHandler, not Timer2_IRQHandler), and is the vector table located at the correct address? On STM32, if you relocate the vector table via SCB->VTOR and get the offset wrong, interrupts jump to the wrong address and typically HardFault. Another frequent cause of "lost" interrupts: the ISR fires but does not clear the peripheral flag, so it re-enters infinitely and the system appears to hang rather than miss interrupts. Adding a breakpoint inside the ISR and checking the call count can distinguish between "never fires" and "fires too many times."