Quick Recap
Every embedded system is built around an MCU core (most commonly ARM Cortex-M) and a clock system that determines how fast it runs, how much power it consumes, and how its peripherals are timed. The clock tree distributes a base oscillator frequency through PLLs and prescalers to the CPU and every peripheral bus. Getting it right is fundamental — get it wrong and UART baud rates drift, timers are inaccurate, and power consumption explodes.
Key Facts:
- ARM Cortex-M dominates embedded: M0 (ultra-low-cost), M4 (DSP+FPU), M7 (cache+TCM), M33 (TrustZone)
- RISC-V is the open-source alternative, gaining traction in cost-sensitive and custom silicon designs (ESP32-C3, CH32V)
- Clock tree: oscillator source (HSI/HSE) feeds a PLL, which drives SYSCLK, which is divided down to AHB and APB buses
- PLL formula: SYSCLK = (Source / M) * N / P — a common interview calculation question
- Power modes: Run, Sleep, Stop, Standby — each trades wake-up latency for lower consumption
- Clock gating: disabling the clock to unused peripherals is the simplest and most effective power-saving technique
Deep Dive
At a Glance
| Characteristic | Detail |
|---|---|
| Dominant architecture | ARM Cortex-M (90%+ market share in 32-bit MCUs) |
| Instruction set | Thumb-2 (ARM), RV32IMAC (RISC-V embedded) |
| Typical clock range | 8 MHz (HSI) to 480 MHz (STM32H7) |
| Clock sources | HSI (internal RC), HSE (external crystal), LSI, LSE |
| PLL lock time | Typically 200-500 us |
| Power modes | Run, Sleep, Stop, Standby, Shutdown (vendor-specific names) |
| Clock accuracy | HSI: +/-1%, HSE crystal: +/-20 ppm |
ARM Cortex-M Family
Choosing the right core is one of the first decisions in any embedded project. Each variant is optimized for a different price/performance/power sweet spot:
| Core | Pipeline | FPU | DSP | Cache | TrustZone | Typical Use Case |
|---|---|---|---|---|---|---|
| M0 | 3-stage | No | No | No | No | Ultra-low-cost: toys, simple sensors, LED drivers |
| M0+ | 2-stage | No | No | No | No | Low-power wearables, basic IoT nodes |
| M3 | 3-stage | No | No | No | No | General purpose: industrial control, motor drives |
| M4 | 3-stage | SP | Yes | No | No | Signal processing: audio, sensor fusion, motor FOC |
| M7 | 6-stage | DP | Yes | I+D | No | High performance: graphics, networking, complex control |
| M33 | 3-stage | SP | Yes | No | Yes | Security-critical: payment, medical, automotive |
Key differences that interviewers test:
- M0 vs M3: M3 adds hardware divide, bit-banding, and full fault handling. M0 has only HardFault.
- M4 vs M7: M7 has I-cache + D-cache (critical for external Flash/RAM performance), tightly-coupled memory (TCM), and a 6-stage superscalar pipeline. M4 is single-issue.
- M33 vs M4: M33 adds TrustZone (hardware-enforced secure/non-secure partition) but with similar DSP performance.
RISC-V in Embedded
RISC-V is an open ISA (no licensing fees) with a modular extension system. In embedded, the common configuration is RV32IMAC: 32-bit base integer (I) + multiply/divide (M) + atomics (A) + compressed instructions (C, equivalent to Thumb).
| Aspect | ARM Cortex-M | RISC-V |
|---|---|---|
| Licensing | Per-chip royalty + upfront fee | Free and open |
| Ecosystem maturity | Decades of tools, IDEs, RTOS ports | Rapidly growing but still catching up |
| Interrupt controller | NVIC (standardized) | PLIC/CLIC (varies by implementation) |
| Debug interface | CoreSight (SWD/JTAG) | Implementation-specific |
| Real products | STM32, nRF52, LPC, SAM | ESP32-C3, GD32V, CH32V, SiFive |
| When to choose | Need mature ecosystem, proven silicon | Cost-sensitive, custom silicon, avoiding vendor lock-in |
For interviews, you should know RISC-V exists and its key advantages (open, customizable, no royalties) but ARM Cortex-M knowledge is far more commonly tested.
Clock Tree Architecture
The clock tree is the distribution network that takes a base frequency and delivers appropriately scaled clocks to every part of the MCU. Here is a typical STM32 clock tree:
```
                  ┌──────────┐
HSI (16 MHz) ────►│          │      ┌───────┐
                  │   MUX    ├─────►│  PLL  │
HSE (8 MHz)  ────►│          │      │ /M *N │     ┌───────┐
                  └──────────┘      │  /P   ├────►│ SYSCLK│
                                    └───────┘     │168 MHz│
                                                  └───┬───┘
                                                      │
                           ┌──────────────────────────┼───────────────┐
                           ▼                          ▼               ▼
                      ┌────────┐                 ┌────────┐      ┌────────┐
                      │  AHB   │                 │  APB1  │      │  APB2  │
                      │   /1   │                 │   /4   │      │   /2   │
                      │168 MHz │                 │ 42 MHz │      │ 84 MHz │
                      └───┬────┘                 └───┬────┘      └───┬────┘
                          │                          │               │
                     DMA, SRAM                   UART2/3          UART1
                     GPIO, USB                   SPI2/3           SPI1
                     Ethernet                    I2C1/2           TIM1/8
                                                 TIM2-7           ADC
```
Clock sources:
| Source | Full Name | Frequency | Accuracy | Startup | Use Case |
|---|---|---|---|---|---|
| HSI | High-Speed Internal | 8-16 MHz (vendor-dependent) | +/-1% | Instant | Default after reset, fast startup |
| HSE | High-Speed External | Typically 4-25 MHz crystal | +/-20 ppm | 2-10 ms | Precise timing (UART, USB, Ethernet) |
| LSI | Low-Speed Internal | ~32 kHz | +/-5% | Instant | Watchdog timer, rough timekeeping |
| LSE | Low-Speed External | 32.768 kHz crystal | +/-20 ppm | Up to 2 s | RTC, accurate timekeeping |
PLL Configuration
The PLL (Phase-Locked Loop) multiplies the input frequency to achieve a higher system clock. The formula (STM32F4 example):
```
VCO frequency = (Source / M) * N
SYSCLK        = VCO / P
USB clock     = VCO / Q   (must be 48 MHz for USB)
```
Example: 8 MHz HSE crystal, target 168 MHz SYSCLK:
- M = 8 (input becomes 1 MHz)
- N = 336 (VCO = 336 MHz)
- P = 2 (SYSCLK = 168 MHz)
- Q = 7 (USB = 48 MHz)
The VCO output must stay within a valid range (typically 100-432 MHz for STM32F4). If M, N, or P values push the VCO outside this range, the PLL will not lock and the MCU will silently run on the fallback HSI clock. This is a common cause of "everything runs but at the wrong speed" bugs.
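The calculation and the range check can be sketched as a small host-side helper. This is a sketch using the STM32F4 limits quoted above; the `pll_compute` name is illustrative, and other families use different VCO windows.

```c
#include <stdint.h>

/* Sketch of STM32F4-style PLL math: VCO = (source / M) * N,
 * SYSCLK = VCO / P, USB = VCO / Q. The 100-432 MHz VCO window is the
 * STM32F4 figure; if the VCO falls outside it, the PLL will not lock. */
typedef struct { uint32_t sysclk; uint32_t usbclk; int valid; } pll_out_t;

static pll_out_t pll_compute(uint32_t source_hz, uint32_t m, uint32_t n,
                             uint32_t p, uint32_t q)
{
    pll_out_t out = { 0, 0, 0 };
    uint64_t vco = (uint64_t)(source_hz / m) * n;
    if (vco < 100000000ULL || vco > 432000000ULL)
        return out;                 /* out of range: PLL would not lock */
    out.sysclk = (uint32_t)(vco / p);
    out.usbclk = (uint32_t)(vco / q);
    out.valid  = 1;
    return out;
}
```

With the values from the example, `pll_compute(8000000, 8, 336, 2, 7)` yields 168 MHz SYSCLK and 48 MHz USB; pushing N to 500 would put the VCO at 500 MHz, and the helper flags the configuration as invalid.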
Why PLL lock time matters: After enabling the PLL, you must wait for the PLLRDY flag before switching SYSCLK to the PLL output. This takes 200-500 us. If you switch before the PLL is locked, the system clock will be unstable and the MCU may hard-fault.
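The wait-for-lock sequence can be sketched as follows. The register and bit names mirror the STM32F4 CMSIS layout, but the RCC block and the lock event are mocked here so the sketch runs on a host; on target you would use the vendor header, busy-wait with a timeout, and fall back to HSI if the PLL never locks.

```c
#include <stdint.h>

/* Mocked subset of an STM32F4-style RCC block so this runs on a host;
 * on target these definitions come from the CMSIS device header. */
typedef struct { volatile uint32_t CR; volatile uint32_t CFGR; } rcc_t;

#define RCC_CR_PLLON     (1u << 24)
#define RCC_CR_PLLRDY    (1u << 25)   /* set by hardware once locked */
#define RCC_CFGR_SW_PLL  0x2u         /* SW bits: select PLL as SYSCLK */

/* Mock of the hardware's side: after PLLON, the PLL eventually locks,
 * and the SWS status bits echo whatever SW selected. */
static void mock_hw_step(rcc_t *rcc)
{
    if (rcc->CR & RCC_CR_PLLON)
        rcc->CR |= RCC_CR_PLLRDY;
    rcc->CFGR = (rcc->CFGR & ~(0x3u << 2)) | ((rcc->CFGR & 0x3u) << 2);
}

static void switch_sysclk_to_pll(rcc_t *rcc)
{
    rcc->CR |= RCC_CR_PLLON;            /* turn the PLL on */
    while (!(rcc->CR & RCC_CR_PLLRDY))  /* the 200-500 us lock wait */
        mock_hw_step(rcc);              /* on target: empty loop + timeout */
    /* Only now is it safe to select the PLL as the system clock */
    rcc->CFGR = (rcc->CFGR & ~0x3u) | RCC_CFGR_SW_PLL;
    mock_hw_step(rcc);
    /* Verify via the SWS status bits that the switch took effect */
    while (((rcc->CFGR >> 2) & 0x3u) != RCC_CFGR_SW_PLL)
        mock_hw_step(rcc);
}
```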
Clock Gating
Every peripheral on an MCU has a clock enable bit in the RCC (Reset and Clock Control) registers. After reset, most peripheral clocks are disabled by default. You must enable the clock before accessing any peripheral register — otherwise the write is silently ignored and reads return zero.
This is also the simplest power optimization technique: disable clocks to peripherals you are not using. Each active peripheral clock adds current draw, even if the peripheral itself is idle.
```c
// Enable GPIOA and USART1 clocks
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
RCC->APB2ENR |= RCC_APB2ENR_USART1EN;

// Disable unused peripheral clocks to save power
RCC->APB1ENR &= ~RCC_APB1ENR_CAN1EN;
```
When asked "how do you reduce power consumption?", always mention clock gating first. It is zero-effort (just disable unused RCC enable bits) and can reduce idle current by 20-40%. More advanced techniques (power modes, voltage scaling) come after.
Power Modes
MCUs offer multiple power modes that trade wake-up latency for lower consumption. The exact names vary by vendor, but the concept is universal:
| Mode | CPU | Oscillators | SRAM | Peripherals | Wake-up Latency | Typical Current (STM32L4) |
|---|---|---|---|---|---|---|
| Run | Active | All on | Retained | Active | N/A | 100 uA/MHz |
| Sleep | WFI/WFE | All on | Retained | Active | Instant (interrupt latency) | ~50% of Run |
| Stop | Off | HSI/HSE off, LSE optional | Retained | Off (except wake-up) | 5-20 us | 1-10 uA |
| Standby | Off | All off (except LSE optional) | Lost (except backup domain) | Off | 50-200 us (full reboot) | 0.3-1 uA |
| Shutdown | Off | All off | Lost | Off | Full reboot | 0.03 uA |
Key decisions:
- Sleep: Use when you need sub-microsecond wake-up and peripherals must stay active. CPU is just waiting for the next interrupt.
- Stop: Best balance for most battery-powered applications. RAM is retained, so you resume where you left off. HSE/PLL must be reconfigured after wake-up.
- Standby: Near-zero power but you lose all RAM contents. Wake-up is effectively a system reset. Use for long idle periods (hours/days) where state can be saved to Flash or backup registers.
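The Stop-mode entry and wake-up path can be sketched like this. `SCB_SCR_SLEEPDEEP` matches the Cortex-M name, but the register and the WFI instruction are mocked so the sketch runs on a host; on target you would also set the vendor's PWR register bits that distinguish Stop from Standby.

```c
#include <stdint.h>

/* Mocked Cortex-M SCB->SCR and WFI so the sketch runs on a host. */
#define SCB_SCR_SLEEPDEEP (1u << 2)
static uint32_t scb_scr;
static int wfi_count;
static void mock_wfi(void) { wfi_count++; }  /* stand-in for __WFI() */

/* Placeholder: a real project rebuilds the PLL and prescalers here. */
static void clock_restore_after_wakeup(void) { /* reconfigure PLL */ }

static void enter_stop_mode(void)
{
    scb_scr |= SCB_SCR_SLEEPDEEP;    /* deep sleep, not plain Sleep */
    mock_wfi();                      /* CPU halts here until a wake event */
    /* Execution resumes here on wake-up, running on HSI: the PLL is off */
    scb_scr &= ~SCB_SCR_SLEEPDEEP;
    clock_restore_after_wakeup();    /* back to full speed, fix baud rates */
}
```

Note that execution continues after the WFI rather than restarting from reset, which is exactly what distinguishes Stop from Standby.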
Dynamic Frequency Scaling
Instead of running at maximum clock speed all the time, you can adjust the CPU frequency based on workload:
| Workload | Strategy | Benefit |
|---|---|---|
| Sensor idle, waiting for timer | Drop to 8 MHz (HSI, no PLL) | Lowest active power |
| Light processing (UART parsing, flag checks) | 48 MHz (PLL with low N) | Good balance |
| Heavy computation (FFT, PID, encryption) | Full speed (168 MHz) | Maximum throughput |
| USB active | Must maintain 48 MHz USB clock | PLL Q divider constraint |
The tricky part: when you change the system clock, you must also update prescalers for any peripheral whose timing depends on the clock (UART baud rate, SPI clock, timer periods). Forget this and your UART will transmit at the wrong baud rate after a frequency change.
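The bookkeeping can be reduced to a pure function: derive the UART divider from whatever the APB clock currently is, and call it after every transition. With 16x oversampling the STM32 BRR value works out to the APB clock divided by the baud rate, which is the simplification this sketch uses; the function names are illustrative.

```c
#include <stdint.h>

/* With 16x oversampling, the STM32 BRR value works out to
 * apb_clk / baud; recompute it after every clock change. */
static uint32_t uart_brr(uint32_t apb_clk_hz, uint32_t baud)
{
    return (apb_clk_hz + baud / 2u) / baud;   /* round to nearest */
}

/* The baud rate you actually get from a given BRR value. */
static uint32_t uart_actual_baud(uint32_t apb_clk_hz, uint32_t brr)
{
    return apb_clk_hz / brr;
}
```

At 42 MHz APB1 and 115200 baud the divider is 365; keep that value after dropping to 8 MHz and the actual rate is about 21.9 kbaud, roughly a fifth of what the other end expects.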
Reset Sources
Understanding what caused a reset is important for field debugging and reliability:
| Reset Source | Flag (STM32) | What It Means |
|---|---|---|
| Power-On Reset (POR) | PORRSTF | First power-up or power dipped below threshold |
| Pin Reset | PINRSTF | NRST pin pulled low (debug probe, reset button) |
| Software Reset | SFTRSTF | NVIC_SystemReset() called in firmware |
| Watchdog Reset | IWDGRSTF / WWDGRSTF | Watchdog timer expired — firmware is stuck |
| Brown-out Reset (BOR) | BORRSTF | Supply voltage dropped below configured threshold |
| Low-Power Reset | LPWRRSTF | Error during Stop/Standby mode entry |
After reading the reset flags in RCC->CSR, clear them by setting RMVF so they are fresh for the next reset event. Logging the reset source at startup helps diagnose field failures ("this unit keeps watchdog-resetting every 30 minutes").
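A startup routine for this can be sketched as a pure decode step over a CSR snapshot. The bit positions follow the STM32F4 RCC_CSR layout; treat them as an assumption to check against the reference manual for other families, and the priority ordering is a judgment call since several flags can be set at once.

```c
#include <stdint.h>

/* Reset-flag bit positions as in the STM32F4 RCC_CSR register. */
#define CSR_BORRSTF   (1u << 25)
#define CSR_PINRSTF   (1u << 26)
#define CSR_PORRSTF   (1u << 27)
#define CSR_SFTRSTF   (1u << 28)
#define CSR_IWDGRSTF  (1u << 29)
#define CSR_WWDGRSTF  (1u << 30)
#define CSR_LPWRRSTF  (1u << 31)
#define CSR_RMVF      (1u << 24)   /* write 1 here afterwards to clear */

/* Map a CSR snapshot to the most specific cause; multiple flags set at
 * once is normal (e.g. a power-on also raises the pin-reset flag). */
static const char *reset_cause(uint32_t csr)
{
    if (csr & (CSR_IWDGRSTF | CSR_WWDGRSTF)) return "watchdog";
    if (csr & CSR_LPWRRSTF) return "low-power";
    if (csr & CSR_SFTRSTF)  return "software";
    if (csr & CSR_PORRSTF)  return "power-on";
    if (csr & CSR_BORRSTF)  return "brown-out";
    if (csr & CSR_PINRSTF)  return "pin";
    return "unknown";
}
```

A typical use is to call this once at boot, log the string, then write RMVF so the next reset starts from clean flags.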
Debugging Story: UART Baud Rate Wrong After Clock Change
A team was developing a data logger that used dynamic frequency scaling — running at 168 MHz during sensor acquisition and dropping to 8 MHz HSI between samples to save power. Everything worked perfectly until they noticed that UART communication became corrupted after the first frequency transition.
The root cause: the UART baud rate register (BRR) is calculated from the APB clock frequency. When the system switched from PLL (168 MHz SYSCLK, 42 MHz APB1) to HSI (8 MHz SYSCLK, 8 MHz APB1), the BRR value was still configured for 42 MHz. The UART was transmitting at 1/5th the correct baud rate.
The fix: recalculate and update the UART BRR register every time the system clock changes. They created a SystemClockUpdate() function that reconfigured all clock-dependent peripherals (UART, SPI, timers) after any frequency transition.
The lesson: Changing the system clock affects every peripheral that derives timing from it. Always audit which peripherals need reconfiguration after a clock switch, and centralize the update logic.
What Interviewers Want to Hear
- You can compare Cortex-M variants and justify choosing one over another for a specific project
- You understand the clock tree from oscillator source through PLL to peripheral buses
- You can walk through a PLL calculation (source / M * N / P)
- You know the tradeoffs between power modes (latency vs current vs state retention)
- You mention clock gating as a first-line power optimization technique
- You understand that changing the system clock affects all clock-dependent peripherals
Interview Focus
Classic Interview Questions
Q1: "Compare Cortex-M4 vs Cortex-M7 — when would you choose each?"
Model Answer Starter: "Both have DSP instructions, but M7 adds I-cache and D-cache (critical when running from external Flash or SDRAM), tightly-coupled memory (TCM) for deterministic access, and a 6-stage superscalar pipeline for roughly 2x throughput at the same clock speed. I choose M4 for cost-sensitive products that run entirely from internal Flash with moderate DSP needs — sensor fusion, motor control, audio. I choose M7 when I need high throughput, large code/data in external memory, or complex networking stacks. The M7's cache also introduces DMA coherency issues that M4 does not have."
Q2: "Walk me through configuring the clock tree from an 8 MHz crystal to 168 MHz SYSCLK."
Model Answer Starter: "Start with HSE at 8 MHz. Configure the PLL: M=8 divides the input to 1 MHz, N=336 multiplies to 336 MHz VCO, P=2 divides to 168 MHz SYSCLK. Set Q=7 for 48 MHz USB clock. Configure bus prescalers: AHB /1 for 168 MHz, APB1 /4 for 42 MHz, APB2 /2 for 84 MHz. Before switching, set Flash wait states to match the new frequency — at 168 MHz you typically need 5 wait states. Enable HSE, wait for HSERDY, configure PLL, enable PLL, wait for PLLRDY, then switch SYSCLK to PLL and verify with SWS bits."
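The wait-state step in that sequence can be checked numerically. This sketch uses the STM32F4 rule at 2.7-3.6 V of one additional Flash wait state per started 30 MHz of HCLK; the 30 MHz constant is an assumption to verify against the datasheet table for your exact part and supply voltage.

```c
#include <stdint.h>

/* STM32F4-style rule at 2.7-3.6 V: one Flash wait state per started
 * 30 MHz of HCLK. Other parts and voltage ranges use other tables. */
static uint32_t flash_wait_states(uint32_t hclk_hz)
{
    return (hclk_hz - 1u) / 30000000u;
}
```

At 168 MHz this gives the 5 wait states quoted in the answer; at the 8 MHz HSI default it gives 0, which is why code running straight from reset needs no latency setup.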
Q3: "How do power modes work? How do you wake from Stop mode?"
Model Answer Starter: "Stop mode turns off the CPU and the HSI/HSE oscillators but retains SRAM and register contents. The voltage regulator can be in main or low-power mode. Wake-up sources include any EXTI line (GPIO, RTC alarm, UART activity on some STM32s). On wake-up, the MCU resumes from the WFI instruction — it does NOT reset. However, the clock reverts to HSI since the PLL was off, so you must reconfigure the PLL and switch back to full speed in the wake-up handler. Stop mode typically draws 1-10 uA, making it ideal for battery-powered devices that wake periodically."
Q4: "What is clock gating and why does it matter?"
Model Answer Starter: "Clock gating means disabling the clock signal to a peripheral module via the RCC enable registers. When the clock is gated, the peripheral's flip-flops stop toggling, reducing dynamic power consumption to near zero for that module. It matters because even an idle peripheral consumes power if its clock is running — the transistors are still switching. On STM32, each peripheral has an enable bit in the appropriate RCC register (AHB1ENR, APB1ENR, etc.). After reset, most are disabled by default. It is the simplest and most effective power optimization."
Q5: "What's the difference between HSI and HSE? When does accuracy matter?"
Model Answer Starter: "HSI is the internal RC oscillator, typically 8-16 MHz with +/-1% accuracy. HSE is an external crystal oscillator, typically 4-25 MHz with +/-20 ppm accuracy. HSI is always available immediately after reset and requires no external components. HSE needs a crystal and 2-10 ms startup time. Accuracy matters for protocols with tight timing requirements: UART can tolerate about +/-2% clock error total (both ends), USB requires +/-0.25% (HSI is too inaccurate), CAN baud rates need crystal-grade accuracy for reliable multi-node communication. For applications that only use SPI, I2C, and GPIO, HSI is often sufficient."
Trap Alerts
- Don't say: "I just use the HAL clock configuration wizard" — interviewers want to see you understand the underlying clock tree
- Don't forget: Flash wait states must be configured BEFORE switching to a higher clock frequency, not after
- Don't ignore: That changing SYSCLK affects all peripheral baud rates and timer periods
Follow-up Questions
- "How do you calculate the correct Flash wait states for a given clock frequency?"
- "What happens if the HSE crystal fails while the PLL is running from it?"
- "How would you implement a clock-failure detection and fallback mechanism?"
- "What is the CSS (Clock Security System) and when would you enable it?"
- "How does voltage scaling relate to maximum clock frequency?"
Ready to test yourself? Head over to the MCU Cores & Clocking Interview Questions page for a full set of Q&A with collapsible answers — perfect for self-study and mock interview practice.
Practice
❓ On STM32F4, what is the PLL output frequency if HSE=8MHz, M=8, N=336, P=2?
❓ Which Cortex-M core adds instruction and data caches?
❓ After waking from Stop mode on STM32, what clock source is the system running on?
❓ Why must Flash wait states be configured BEFORE switching to a higher clock?
Real-World Tie-In
Battery-Powered Environmental Monitor — An IoT sensor node measures temperature and humidity every 15 minutes and transmits via LoRa. During the 50 ms measurement/transmit window, the MCU runs at 48 MHz from PLL. Between measurements, it enters Stop mode at 2 uA. The entire clock transition sequence (HSI to PLL, measure, transmit, PLL to HSI, enter Stop) is handled by a single RunMeasurementCycle() function that reconfigures the UART and SPI baud rates after every clock change. Battery life: 5+ years on 2x AA cells.
Automotive Sensor Fusion ECU — A Cortex-M7 running at 400 MHz fuses data from 6 CAN buses, 3 cameras, and 12 ultrasonic sensors. The clock tree feeds 200 MHz to the Ethernet MAC, 80 MHz to CAN peripherals, and 400 MHz to the CPU. The CSS (Clock Security System) monitors the HSE crystal — if it fails, the system automatically falls back to HSI and enters a safe degraded mode, logging the event to Flash for post-mortem analysis.