Quick Cap
Interrupts let hardware events (timers, communication peripherals, GPIO pins) break into normal program flow so the CPU can respond immediately. On ARM Cortex-M, the Nested Vectored Interrupt Controller (NVIC) manages all external interrupts with configurable priority levels and automatic context saving, making it the heart of every real-time embedded system.
Key Facts:
- NVIC handles up to 240 external interrupts on Cortex-M3/M4/M7, with 3-8 bits of configurable priority (Cortex-M0/M0+ support up to 32 interrupts and 2 priority bits)
- Lower number = higher priority: priority 0 preempts priority 1 (a constant source of confusion)
- Hardware context save: Cortex-M automatically pushes R0-R3, R12, LR, PC, xPSR on interrupt entry — no manual save needed
- Tail-chaining: back-to-back interrupts skip the context restore/save cycle, reducing latency to 6 cycles
- ISR golden rule: keep it short, set a flag, do heavy work in the main loop
- Shared data protection: volatile for visibility, critical sections or atomics for integrity
Deep Dive
At a Glance
| Characteristic | Detail |
|---|---|
| Controller | NVIC (Cortex-M), PLIC (RISC-V) |
| Max IRQs | 240 external + 16 system exceptions (Cortex-M) |
| Priority bits | 3-8 (vendor-configured, typically 4 on STM32) |
| Context save | Automatic (8 registers pushed to stack) |
| Latency | 12 cycles typical (Cortex-M3/M4), 16 cycles (M0), 15 cycles (M0+) |
| Tail-chaining | 6 cycles between back-to-back ISRs |
| Stack usage | ~32 bytes per nested interrupt level |
How an Interrupt Works
When an interrupt fires on Cortex-M, the hardware performs this sequence automatically:
1. Finish current instruction
2. Push 8 registers to current stack (R0-R3, R12, LR, PC, xPSR)
3. Load handler address from vector table
4. Switch to Handler mode (privileged)
5. Execute ISR code
6. Pop 8 registers (detected via special EXC_RETURN value in LR)
7. Resume interrupted code
The entire entry sequence takes 12 clock cycles on Cortex-M3/M4 — this is the minimum possible interrupt latency if no higher-priority interrupt is running.
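The eight stacked words can be pictured as a C struct, listed from the lowest stack address upward. This is a sketch for illustration only: the type name exception_frame_t and the helper exception_frame_bytes() are assumptions rather than CMSIS definitions, and the layout covers only the basic frame without floating-point state.

```c
#include <stdint.h>

/* Basic Cortex-M exception stack frame (no FPU state).
 * Hypothetical type name; fields run from the lowest stack address
 * upward, which is the layout a fault handler sees at the saved SP. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* return address: where execution resumes */
    uint32_t xpsr;  /* program status register */
} exception_frame_t;

/* Bytes of stack consumed by one basic frame. */
static inline uint32_t exception_frame_bytes(void) {
    return (uint32_t)sizeof(exception_frame_t);
}
```

A fault handler that receives the saved stack pointer can cast it to this type and read the pc field to see which instruction was interrupted.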
The Vector Table
The vector table is an array of 32-bit entries in Flash (or RAM if relocated via VTOR). Using the zero-based vector numbers from the table below: vector 0 holds the initial stack pointer; vector 1 is the Reset handler; vectors 2-15 are system exceptions (NMI, HardFault, SVCall, PendSV, SysTick); vector 16 onward are peripheral IRQs.
| Vector # | Exception | Typical Use |
|---|---|---|
| 0 | Initial SP | Stack pointer loaded on reset |
| 1 | Reset | Entry point after power-on / reset |
| 2 | NMI | Non-maskable interrupt (cannot be disabled) |
| 3 | HardFault | Unrecoverable error catch-all |
| 4-6 | MemManage, BusFault, UsageFault | Configurable fault handlers |
| 7-10 | Reserved | Not used on Cortex-M3/M4 |
| 11 | SVCall | Supervisor call (used by RTOS) |
| 14 | PendSV | Deferred context switch (used by RTOS) |
| 15 | SysTick | System timer tick |
| 16+ | IRQ0, IRQ1, ... | Peripheral interrupts (UART, Timer, GPIO...) |
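The numbering in the table maps directly to byte offsets from the table base, since each entry is one 32-bit address. A minimal sketch of that arithmetic follows; the helper names irq_to_vector() and vector_offset() are made up for this example.

```c
#include <stdint.h>

/* Number of vector slots that precede the peripheral IRQs
 * (initial SP plus exceptions 1..15). */
#define SYSTEM_VECTOR_SLOTS 16u

/* Vector number for a peripheral IRQ: IRQ0 is vector 16. */
static inline uint32_t irq_to_vector(uint32_t irqn) {
    return irqn + SYSTEM_VECTOR_SLOTS;
}

/* Byte offset of a vector from the table base; each entry is a
 * 32-bit address. */
static inline uint32_t vector_offset(uint32_t vector) {
    return vector * 4u;
}
```

For example, SysTick (vector 15) sits at offset 0x3C, and IRQ37 lands at offset 0xD4.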
NVIC Priority System
This is the single most confusing aspect of Cortex-M interrupts, and interviewers love testing it.
Priority bits are split into two fields:
| Field | Controls | Effect |
|---|---|---|
| Preemption priority (group priority) | Whether an ISR can interrupt another ISR | A running ISR with preemption priority 2 will be preempted by an IRQ with preemption priority 1 |
| Sub-priority | Tie-breaking when two IRQs pend simultaneously | Among IRQs with the same preemption priority, the one with lower sub-priority runs first — but it does NOT preempt |
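The split is configured globally via AIRCR.PRIGROUP. With 4 priority bits (common on STM32), the implemented bits are the upper bits [7:4] of each priority register, so the PRIGROUP values that change the split are 3 to 7 (values 0-2 behave like 3). The STM32 HAL constants NVIC_PRIORITYGROUP_4 down to NVIC_PRIORITYGROUP_0 name these rows by their number of preemption bits:
| PRIGROUP | Preemption bits | Sub-priority bits | Preemption levels | Sub-priority levels |
|---|---|---|---|---|
| 3 | 4 | 0 | 16 | 1 |
| 4 | 3 | 1 | 8 | 2 |
| 5 | 2 | 2 | 4 | 4 |
| 6 | 1 | 3 | 2 | 8 |
| 7 | 0 | 4 | 1 (no preemption) | 16 |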
Priority value 0 is the HIGHEST priority. Priority 15 is the LOWEST. This is the opposite of what most people intuit. When an interviewer asks "which interrupt runs first?", always check the numeric value — lower wins.
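The preemption rule can be made concrete with a small host-side model. This is a sketch under stated assumptions: 4 implemented priority bits in the upper bits of an 8-bit register, and helper names (encode_priority, preempt_priority, can_preempt) invented for this example. On a real target, CMSIS provides NVIC_EncodePriority() and NVIC_SetPriority() for the same job.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed for this sketch: 4 implemented priority bits, stored in the
 * upper bits [7:4] of the 8-bit priority register. */
#define PRIO_BITS 4u

/* Pack a preemption priority and sub-priority into the 8-bit register
 * value. sub_bits is the number of sub-priority bits (0..PRIO_BITS). */
static inline uint8_t encode_priority(uint32_t preempt, uint32_t sub,
                                      uint32_t sub_bits) {
    uint32_t sub_mask = (1u << sub_bits) - 1u;
    uint32_t field = (preempt << sub_bits) | (sub & sub_mask);
    return (uint8_t)((field << (8u - PRIO_BITS)) & 0xFFu);
}

/* Extract the preemption (group) priority from a register value. */
static inline uint32_t preempt_priority(uint8_t reg, uint32_t sub_bits) {
    return (uint32_t)reg >> (8u - PRIO_BITS + sub_bits);
}

/* True if a pending IRQ may preempt a running ISR: only a numerically
 * lower preemption priority preempts; sub-priority never does. */
static inline bool can_preempt(uint8_t pending, uint8_t running,
                               uint32_t sub_bits) {
    return preempt_priority(pending, sub_bits) <
           preempt_priority(running, sub_bits);
}
```

With a 2/2 split, register value 0x40 decodes to preemption priority 1 and 0x80 to preemption priority 2, so the first can preempt the second. Values 0x40 and 0x50 share preemption priority 1 and differ only in sub-priority, so neither preempts the other.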
ISR Design Rules
The golden rules for writing interrupt service routines:
| Do | Don't |
|---|---|
| Clear the interrupt flag first | Call printf(), malloc(), or any blocking function |
| Set a flag and return | Use delay_ms() or busy-wait loops |
| Use volatile for shared variables | Access complex data structures without protection |
| Keep execution under 10 us for most ISRs | Call non-reentrant library functions |
| Use ring buffers for ISR-to-main data transfer | Perform heavy computation |
A typical well-designed ISR follows this pattern:
```c
volatile uint8_t uart_rx_flag = 0;
volatile uint8_t uart_rx_byte;

void USART1_IRQHandler(void) {
    if (USART1->SR & USART_SR_RXNE) {
        uart_rx_byte = USART1->DR;  // Read clears the flag
        uart_rx_flag = 1;           // Signal main loop
    }
}
```
The main loop checks uart_rx_flag, processes the byte, and clears the flag. The ISR does almost nothing — just captures the data and gets out.
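The main-loop side of this handshake might look like the sketch below. The helper name uart_poll_byte() is hypothetical, and the two shared variables are redeclared here only so the fragment compiles on its own; on a host build the ISR can be simulated by writing them directly.

```c
#include <stdint.h>
#include <stdbool.h>

/* Shared with the ISR (same names as in the handler above). */
volatile uint8_t uart_rx_flag = 0;
volatile uint8_t uart_rx_byte;

/* Hypothetical helper: returns true and stores the byte in *out if the
 * ISR has signalled new data since the last call. */
bool uart_poll_byte(uint8_t *out) {
    if (!uart_rx_flag) {
        return false;
    }
    *out = uart_rx_byte;  /* copy the data first... */
    uart_rx_flag = 0;     /* ...then mark the mailbox as free */
    return true;
}
```

Copying the byte before clearing the flag keeps the window short, but a one-byte mailbox like this still loses data if a second byte arrives before the main loop polls. That limitation is the usual reason to move to a ring buffer.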
Nested Interrupts
Nesting happens automatically on Cortex-M when a higher-preemption-priority interrupt fires while a lower-priority ISR is running. The hardware pushes the running ISR's context onto the stack and starts the higher-priority handler.
```
Time ──────────────────────────────────────────────────────────►
Main code      ████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██████████
IRQ_Low (P3)           ██████░░░░░░░░░░░░░░██████████
IRQ_High (P1)                ██████████████
                       ^     ^             ^         ^
                       │     │             │         │
                       Low   High fires    High      Low
                       fires (preempts)    returns   returns
```
Each nesting level consumes ~32 bytes of stack (8 registers x 4 bytes). With 3 levels of nesting, you need at least 96 extra bytes of stack space beyond normal usage. This is why stack sizing must account for worst-case nesting depth.
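The 32-bytes-per-level figure covers only the hardware frame; each handler also uses stack for its own locals and any registers the compiler saves. A rough budget can be sketched as below, where nested_stack_bytes() is an illustrative helper and the per-ISR local sizes are estimates the caller must supply (for example from a compiler stack-usage report).

```c
#include <stdint.h>

#define BASIC_FRAME_BYTES 32u  /* 8 registers x 4 bytes */

/* Worst-case extra stack for nested interrupts: one hardware frame per
 * nesting level plus each handler's own locals and callee-saved pushes.
 * isr_local_bytes[i] is an estimate supplied by the caller. */
uint32_t nested_stack_bytes(const uint32_t *isr_local_bytes,
                            uint32_t levels) {
    uint32_t total = 0;
    for (uint32_t i = 0; i < levels; i++) {
        total += BASIC_FRAME_BYTES + isr_local_bytes[i];
    }
    return total;
}
```

With three levels and no handler locals this gives the 96 bytes quoted above; adding, say, 16, 24, and 8 bytes of locals raises it to 144.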
Tail-Chaining and Late Arrival
Tail-chaining: When an ISR finishes and another interrupt is already pending, the NVIC skips the full pop-then-push cycle. Instead, it takes only 6 cycles to switch directly to the next handler. This dramatically improves throughput when multiple interrupts fire in rapid succession.
Late arrival: If a higher-priority interrupt arrives during the context-save phase of a lower-priority interrupt (within the 12-cycle entry window), the NVIC diverts to the higher-priority handler instead. When that handler finishes, it tail-chains to the original lower-priority handler. No cycles are wasted.
Sharing Data Between ISR and Main Loop
This is where most bugs hide. Three rules:
Rule 1: volatile — The compiler cannot know that an ISR modifies a variable. Without volatile, the compiler may cache the variable in a register and never re-read it from memory, causing the main loop to spin forever on a stale value.
Rule 2: Atomicity — On a 32-bit Cortex-M, a single 32-bit read or write is atomic. But a 64-bit variable, a struct, or a read-modify-write sequence is NOT atomic. If the ISR fires between a read and a write in the main loop, data corruption occurs.
Rule 3: Critical sections — For non-atomic operations on shared data, temporarily disable interrupts:
```c
__disable_irq();
// Access shared multi-byte data safely
shared_struct.field1 = new_val1;
shared_struct.field2 = new_val2;
__enable_irq();
```
Keep critical sections as short as possible — every cycle with interrupts disabled adds to worst-case latency. For producer-consumer patterns (ISR produces, main loop consumes), a lock-free ring buffer is often better than disabling interrupts.
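One refinement worth knowing: an unconditional __enable_irq() turns interrupts back on even if the caller had already disabled them. A common alternative is to save and restore the previous mask state. The sketch below models that on a host with a simulated PRIMASK variable; sim_primask, get_primask(), set_primask(), critical_enter(), and critical_exit() are all stand-in names for this example. On a real target the two accessors would map to the CMSIS intrinsics __get_PRIMASK() and __set_PRIMASK().

```c
#include <stdint.h>

/* Simulated PRIMASK so the sketch runs off-target.
 * 0 = interrupts enabled, 1 = interrupts masked. */
static uint32_t sim_primask = 0;

/* Stand-ins for the CMSIS intrinsics __get_PRIMASK()/__set_PRIMASK(). */
static inline uint32_t get_primask(void)       { return sim_primask; }
static inline void     set_primask(uint32_t v) { sim_primask = v & 1u; }

/* Enter a critical section; returns the previous mask state. */
static inline uint32_t critical_enter(void) {
    uint32_t prev = get_primask();
    set_primask(1u);  /* equivalent to __disable_irq() on target */
    return prev;
}

/* Leave a critical section, restoring whatever state was saved. */
static inline void critical_exit(uint32_t prev) {
    set_primask(prev);
}
```

Because each exit restores the state saved by its matching enter, these sections nest safely: interrupts come back on only when the outermost section ends.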
A ring buffer with separate read and write indices (each written by only one side) can be completely lock-free on 32-bit architectures. The ISR writes data and advances the write index; the main loop reads data and advances the read index. No critical section needed, as long as both indices are volatile and their updates are single 32-bit writes.
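A minimal single-producer, single-consumer ring buffer along these lines is sketched below. The type and function names (ring_buffer_t, rb_push, rb_pop) and the 16-byte size are choices made for this example. The sketch relies on volatile accesses staying in program order, which is enough for an ISR and main loop sharing one Cortex-M core; sharing with another core or a DMA engine would need memory barriers on top.

```c
#include <stdint.h>
#include <stdbool.h>

#define RB_SIZE 16u  /* must be a power of two */

typedef struct {
    volatile uint8_t  data[RB_SIZE];
    volatile uint32_t head;  /* written only by the producer (ISR) */
    volatile uint32_t tail;  /* written only by the consumer (main loop) */
} ring_buffer_t;

/* Producer side: call from the ISR. Returns false if the buffer is full. */
static inline bool rb_push(ring_buffer_t *rb, uint8_t byte) {
    uint32_t head = rb->head;
    if (head - rb->tail == RB_SIZE) {
        return false;               /* full: drop rather than overwrite */
    }
    rb->data[head & (RB_SIZE - 1u)] = byte;
    rb->head = head + 1u;           /* publish after the data is stored */
    return true;
}

/* Consumer side: call from the main loop. Returns false if empty. */
static inline bool rb_pop(ring_buffer_t *rb, uint8_t *out) {
    uint32_t tail = rb->tail;
    if (tail == rb->head) {
        return false;
    }
    *out = rb->data[tail & (RB_SIZE - 1u)];
    rb->tail = tail + 1u;           /* free the slot after reading it */
    return true;
}
```

The indices run freely and are masked only when indexing the array, so full and empty are distinguishable without wasting a slot, and unsigned wraparound keeps the head minus tail arithmetic correct.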
Edge vs Level Triggered
| Trigger Type | Fires When | Missed-Interrupt Risk | Best For |
|---|---|---|---|
| Edge | Signal transitions (rising, falling, or both) | Yes — if edge occurs during ISR, the second edge is lost | Button presses, encoder pulses, one-shot events |
| Level | Signal is held at a given state | No — interrupt re-fires until source is cleared | Peripheral status flags (UART RX ready, DMA complete) |
Most internal MCU peripheral interrupts are effectively level-triggered — the interrupt stays pending as long as the status flag is set. Clearing the flag in the ISR deasserts the interrupt. If you forget to clear the flag, the ISR will fire again immediately upon return, creating an infinite loop.
External GPIO interrupts are typically configurable as edge or level. For mechanical buttons, use edge-triggered with debouncing to avoid multiple spurious interrupts from contact bounce.
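One simple way to debounce in software is to ignore edges that arrive too soon after the last accepted one. The sketch below assumes a millisecond tick is available to the ISR; the names debounce_t and debounce_accept() and the 20 ms window are illustrative choices, not values from any particular datasheet.

```c
#include <stdint.h>
#include <stdbool.h>

#define DEBOUNCE_MS 20u  /* assumed bounce window; tune per switch */

typedef struct {
    uint32_t last_accept_ms;  /* tick of the last accepted edge */
    bool     seen_any;        /* false until the first edge arrives */
} debounce_t;

/* Call from the edge ISR with the current millisecond tick. Returns true
 * if this edge should count as a new press, false if it falls inside the
 * bounce window of the previous accepted edge. */
static inline bool debounce_accept(debounce_t *d, uint32_t now_ms) {
    if (d->seen_any && (now_ms - d->last_accept_ms) < DEBOUNCE_MS) {
        return false;
    }
    d->last_accept_ms = now_ms;
    d->seen_any = true;
    return true;
}
```

The unsigned subtraction keeps the comparison correct when the tick counter wraps around.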
Interrupt Latency Breakdown
Total interrupt response time = hardware latency + software overhead:
| Component | Cycles (Cortex-M4) | Notes |
|---|---|---|
| Finish current instruction | 1-12 | Multi-cycle instructions (divide, load-multiple) take longer |
| Context save (push 8 regs) | 12 | Fixed by hardware |
| Fetch ISR address from vector table | Included | Part of the 12-cycle entry |
| ISR prologue (compiler-generated) | 0-4 | Stack frame setup if needed |
| Total minimum | 12 | Best case, no higher-priority ISR running |
Factors that increase latency:
- A higher-priority ISR already running (must wait for it to finish or yield)
- Flash wait states (ISR code in Flash with wait states adds stalls)
- Bus contention (DMA competing for bus access)
- Critical sections in main code (__disable_irq() blocks all maskable interrupts)
Every microsecond spent with interrupts disabled adds directly to worst-case interrupt latency. If your main loop disables interrupts for 50 us to update a data structure, then the worst-case response time of every interrupt is at least 50 us, even though the NVIC hardware latency is only 12 cycles. Minimize critical section duration or use lock-free data structures.
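The relationship between cycle counts and wall-clock latency can be sketched with two small helpers. Both names are invented for this example, and the bound is deliberately simplified: it assumes each higher-priority ISR runs at most once before the interrupt of interest is served.

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds at a given core clock. */
static inline uint32_t cycles_to_ns(uint32_t cycles, uint32_t core_hz) {
    return (uint32_t)(((uint64_t)cycles * 1000000000ull) / core_hz);
}

/* Rough worst-case response bound for one IRQ: longest interrupts-off
 * window, plus total run time of higher-priority ISRs that may run
 * first, plus the hardware entry latency. All inputs are in ns. */
static inline uint32_t worst_case_latency_ns(uint32_t max_irq_off_ns,
                                             uint32_t higher_prio_isr_ns,
                                             uint32_t hw_entry_ns) {
    return max_irq_off_ns + higher_prio_isr_ns + hw_entry_ns;
}
```

At an assumed 100 MHz core clock, the 12-cycle entry is 120 ns, so a 50 us critical section pushes the bound to about 50.12 us. The software term dominates the hardware term by more than two orders of magnitude.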
Debugging Story: The Disappearing UART Bytes
A team was developing a motor controller that received commands over UART at 115200 baud. During light motor loads, communication worked perfectly. Under heavy load, bytes were randomly dropped — about 1 in 50 commands was corrupted or lost.
The root cause: the motor control ISR (running at a higher priority than UART) took 120 us to execute in the worst case. At 115200 baud, with 10 bits per byte including start and stop bits, a new byte arrives every ~87 us. When the motor ISR ran for 120 us, it blocked the UART ISR long enough for the UART hardware FIFO (only 1 byte deep on this MCU) to overflow. The second byte overwrote the first before the UART ISR could read it.
The fix had two parts: (1) reduce the motor ISR execution time to under 20 us by moving the heavy computation (PID calculations, sine table lookups) to the main loop and only doing the time-critical PWM register update in the ISR, and (2) switch UART reception to DMA to decouple it from ISR timing entirely.
The lesson: ISR execution time does not just affect that interrupt — it affects the latency of every lower-priority interrupt in the system. Always analyze worst-case timing across all priority levels.
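The timing in this story can be checked with a few lines of arithmetic. The helpers below are illustrative (their names are not from any library) and assume 10 bits per character, which corresponds to 8N1 framing.

```c
#include <stdint.h>

/* Time for one UART character in microseconds, rounded down.
 * bits_per_char is 10 for 8N1 (start + 8 data + stop). */
static inline uint32_t uart_char_time_us(uint32_t baud,
                                         uint32_t bits_per_char) {
    return (bits_per_char * 1000000u) / baud;
}

/* With a single-byte receive buffer, a byte can be lost if the UART ISR
 * is held off for longer than one character time. Returns 1 if so. */
static inline int rx_overrun_possible(uint32_t baud, uint32_t blocked_us) {
    return blocked_us > uart_char_time_us(baud, 10u);
}
```

At 115200 baud the character time is about 86 us, so a 120 us blocking ISR can cause an overrun while a 20 us one cannot. At 460800 baud the budget shrinks to about 21 us.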
What Interviewers Want to Hear
- You understand the NVIC priority system (preemption vs sub-priority, lower number = higher priority)
- You can explain the full interrupt lifecycle: trigger, context save, vector lookup, ISR, context restore
- You know the ISR golden rules: short, no blocking, volatile for shared data, clear the flag
- You can analyze interrupt latency and identify what affects it
- You understand the difference between edge and level triggered
- You can describe how to safely share data between ISR and main loop (volatile + critical sections or lock-free structures)
Interview Focus
Classic Interview Questions
Q1: "What happens when an interrupt fires on a Cortex-M processor?"
Model Answer Starter: "The hardware finishes the current instruction, pushes eight registers (R0-R3, R12, LR, PC, xPSR) onto the active stack, loads the handler address from the vector table, and begins executing in Handler mode. The entire entry takes 12 cycles on Cortex-M3/M4. When the ISR returns via a special EXC_RETURN value in LR, the hardware pops the saved registers and resumes the interrupted code. If another interrupt is already pending, it tail-chains in 6 cycles instead of doing a full pop-and-push."
Q2: "How do you safely share data between an ISR and the main loop?"
Model Answer Starter: "Three rules: volatile so the compiler does not optimize away re-reads, atomicity for multi-byte data, and critical sections or lock-free structures for compound operations. A single 32-bit read or write is atomic on Cortex-M, but anything larger needs protection. For simple flags, a volatile uint32_t is sufficient. For streaming data, I use a lock-free ring buffer with separate read and write indices. For complex structures, I briefly disable interrupts with __disable_irq() / __enable_irq(), keeping the critical section as short as possible to minimize latency impact."
Q3: "Explain preemption priority vs sub-priority on NVIC."
Model Answer Starter: "Preemption priority determines whether one ISR can interrupt another — a lower preemption value will preempt a running ISR with a higher value. Sub-priority only matters as a tie-breaker when two interrupts with the same preemption priority pend simultaneously — the lower sub-priority runs first, but it cannot preempt the other. The split between preemption and sub-priority bits is configured globally via AIRCR.PRIGROUP. A common mistake is assuming sub-priority enables preemption — it does not."
Q4: "Why should ISRs be kept short?"
Model Answer Starter: "A long ISR blocks all lower-priority interrupts from running, increasing their worst-case latency. In a system with a motor control ISR taking 100 us, a lower-priority UART ISR can be delayed by up to 100 us even though the hardware latency is 12 cycles. I keep ISRs short by doing the minimum necessary: clear the flag, capture the data, set a flag for the main loop. Heavy processing like PID calculations, protocol parsing, or data logging goes in the main loop or a deferred task."
Q5: "How would you debug a system where interrupts seem to be lost?"
Model Answer Starter: "I would check several things systematically: (1) Is the interrupt flag being cleared in the ISR? If not, a level-triggered interrupt re-fires immediately and appears to work but may mask a second event. (2) Is a higher-priority ISR blocking for too long? Use a scope or GPIO toggle to measure ISR execution times. (3) Is the main loop disabling interrupts for extended periods? (4) For edge-triggered interrupts, is the edge happening during the ISR when the flag is already cleared? I would toggle a GPIO at ISR entry/exit and use a logic analyzer to correlate interrupt signals with ISR timing."
Trap Alerts
- Don't say: "All interrupts have the same priority" — Cortex-M has a configurable priority system; understanding it is fundamental
- Don't forget: volatile on every variable shared between ISR and main code; omitting it is the most common real-world ISR bug
- Don't ignore: The impact of ISR execution time on lower-priority interrupt latency; this is a system-level concern, not per-interrupt
Follow-up Questions
- "What is tail-chaining and how does it improve interrupt throughput?"
- "How would you handle priority inversion between two ISRs that share a resource?"
- "What is the difference between __disable_irq() and masking a specific interrupt with NVIC?"
- "How do you determine the right stack size when using nested interrupts?"
- "What happens if you forget to clear an interrupt flag in a level-triggered ISR?"
Practice
❓ On Cortex-M, which priority value represents the HIGHEST priority?
❓ What is tail-chaining in the NVIC?
❓ Why must variables shared between an ISR and the main loop be declared volatile?
❓ What happens if you forget to clear the interrupt flag in a level-triggered ISR?
❓ A Cortex-M4 system has 4 priority bits, split by PRIGROUP into 2 bits of preemption priority and 2 bits of sub-priority. IRQ_A has priority 0x40 and IRQ_B has priority 0x80. Can IRQ_A preempt a running IRQ_B?
Real-World Tie-In
Automotive Engine Control — An engine ECU uses interrupts at three priority levels: crankshaft position sensor (highest, sub-microsecond deadline for ignition timing), fuel injector control (medium, millisecond-level), and CAN bus communication (lowest). The crankshaft ISR does nothing but capture a timer value and set a flag; the actual ignition angle calculation runs in a high-priority RTOS task. This separation ensures the capture timestamp is always accurate regardless of system load.
Industrial Sensor Hub — A process monitoring system reads 8 analog sensors via DMA (triggered by timer interrupt) while simultaneously receiving Modbus commands over UART. The DMA-complete ISR simply swaps buffer pointers (double-buffering); the UART ISR feeds bytes into a ring buffer. All protocol parsing and sensor averaging happens in the main loop. Under worst-case load, the maximum interrupt-disabled time is 2 us (a single pointer swap in a critical section), ensuring no UART bytes are lost even at 460800 baud.