MCU & System Architecture
intermediate
Weight: 4/10

MCU cores and clocking

Understand ARM Cortex-M core families, RISC-V in embedded, clock tree architecture, PLL configuration, and power modes for optimal embedded system design.

Quick Cap

Every embedded system is built around an MCU core (most commonly ARM Cortex-M) and a clock system that determines how fast the core runs, how much power it draws, and what frequency each peripheral receives. The clock tree distributes a base oscillator frequency through PLLs and prescalers to the CPU and every peripheral bus. Get it wrong and UART baud rates are off, timers are inaccurate, and power consumption climbs.

Key Facts:

  • ARM Cortex-M dominates embedded: M0 (ultra-low-cost), M4 (DSP+FPU), M7 (cache+TCM), M33 (TrustZone)
  • RISC-V is the open-ISA alternative, gaining traction in cost-sensitive and custom silicon designs (ESP32-C3, CH32V)
  • Clock tree: oscillator source (HSI/HSE) feeds a PLL, which drives SYSCLK, which is divided down to AHB and APB buses
  • PLL formula: SYSCLK = (Source / M) * N / P — a common interview calculation question
  • Power modes: Run, Sleep, Stop, Standby — each trades wake-up latency for lower consumption
  • Clock gating: disabling the clock to unused peripherals is the simplest and most effective power-saving technique

Deep Dive

At a Glance

| Characteristic | Detail |
| --- | --- |
| Dominant architecture | ARM Cortex-M (90%+ market share in 32-bit MCUs) |
| Instruction set | Thumb-2 (ARM), RV32IMAC (RISC-V embedded) |
| Typical clock range | 8 MHz (HSI) to 480 MHz (STM32H7) |
| Clock sources | HSI (internal RC), HSE (external crystal), LSI, LSE |
| PLL lock time | Typically 200-500 us |
| Power modes | Run, Sleep, Stop, Standby, Shutdown (vendor-specific names) |
| Clock accuracy | HSI: +/-1%, HSE crystal: +/-20 ppm |

ARM Cortex-M Family

Choosing the right core is one of the first decisions in any embedded project. Each variant is optimized for a different price/performance/power sweet spot:

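| Core | Pipeline | FPU | DSP | Cache | TrustZone | Typical Use Case |
| --- | --- | --- | --- | --- | --- | --- |
| M0 | 3-stage | No | No | No | No | Ultra-low-cost: toys, simple sensors, LED drivers |
| M0+ | 2-stage | No | No | No | No | Low-power wearables, basic IoT nodes |
| M3 | 3-stage | No | No | No | No | General purpose: industrial control, motor drives |
| M4 | 3-stage | Single-precision | Yes | No | No | Signal processing: audio, sensor fusion, motor FOC |
| M7 | 6-stage | Double-precision | Yes | I+D | No | High performance: graphics, networking, complex control |
| M33 | 3-stage | Single-precision | Yes | No | Yes | Security-critical: payment, medical, automotive |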

Key differences that interviewers test:

  • M0 vs M3: M3 adds hardware divide, bit-banding, and full fault handling. M0 has only HardFault.
  • M4 vs M7: M7 has I-cache + D-cache (critical for external Flash/RAM performance), tightly-coupled memory (TCM), and a 6-stage superscalar pipeline. M4 is single-issue.
  • M33 vs M4: M33 adds TrustZone (hardware-enforced secure/non-secure partition) but with similar DSP performance.

RISC-V in Embedded

RISC-V is an open ISA (no licensing fees) with a modular extension system. In embedded, the common configuration is RV32IMAC: 32-bit base integer (I) + multiply/divide (M) + atomics (A) + compressed instructions (C, 16-bit encodings comparable to ARM's Thumb).

| Aspect | ARM Cortex-M | RISC-V |
| --- | --- | --- |
| Licensing | Per-chip royalty + upfront fee | Free and open |
| Ecosystem maturity | Decades of tools, IDEs, RTOS ports | Rapidly growing but still catching up |
| Interrupt controller | NVIC (standardized) | PLIC/CLIC (varies by implementation) |
| Debug interface | CoreSight (SWD/JTAG) | Implementation-specific |
| Real products | STM32, nRF52, LPC, SAM | ESP32-C3, GD32V, CH32V, SiFive |
| When to choose | Need mature ecosystem, proven silicon | Cost-sensitive, custom silicon, avoiding vendor lock-in |

For interviews, you should know RISC-V exists and its key advantages (open, customizable, no royalties) but ARM Cortex-M knowledge is far more commonly tested.

Clock Tree Architecture

The clock tree is the distribution network that takes a base frequency and delivers appropriately scaled clocks to every part of the MCU. Here is a typical STM32 clock tree:

                  ┌──────────┐
HSI (16 MHz) ────►│          │     ┌───────┐
                  │   MUX    ├────►│  PLL  │
HSE (8 MHz)  ────►│          │     │ /M *N │     ┌───────┐
                  └──────────┘     │  /P   ├────►│ SYSCLK│
                                   └───────┘     │168 MHz│
                                                 └───┬───┘
                                        ┌────────────┼────────────┐
                                        ▼            ▼            ▼
                                    ┌────────┐   ┌────────┐   ┌────────┐
                                    │  AHB   │   │  APB1  │   │  APB2  │
                                    │   /1   │   │   /4   │   │   /2   │
                                    │168 MHz │   │ 42 MHz │   │ 84 MHz │
                                    └───┬────┘   └───┬────┘   └───┬────┘
                                        │            │            │
                                    DMA, SRAM    UART2/3      UART1
                                    GPIO, USB    SPI2/3       SPI1
                                    Ethernet     I2C1/2       TIM1/8
                                                 TIM2-7       ADC
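
The prescaler arithmetic in the diagram can be sketched in a few lines of C. This is a minimal illustration, not vendor code: the struct and function names are hypothetical, and the divider values are simply the ones shown above. On STM32 parts the APB prescalers divide the AHB clock (HCLK), which equals SYSCLK when the AHB divider is 1, as it is here.

```c
#include <stdint.h>

/* Bus frequencies derived from SYSCLK (illustrative names, not a vendor API). */
typedef struct {
    uint32_t hclk_hz;   /* AHB clock  */
    uint32_t pclk1_hz;  /* APB1 clock */
    uint32_t pclk2_hz;  /* APB2 clock */
} bus_clocks_t;

/* Apply the AHB divider to SYSCLK, then the APB dividers to the AHB clock. */
static bus_clocks_t bus_clocks_from_sysclk(uint32_t sysclk_hz,
                                           uint32_t ahb_div,
                                           uint32_t apb1_div,
                                           uint32_t apb2_div)
{
    bus_clocks_t c;
    c.hclk_hz  = sysclk_hz / ahb_div;
    c.pclk1_hz = c.hclk_hz / apb1_div;
    c.pclk2_hz = c.hclk_hz / apb2_div;
    return c;
}
```

Calling `bus_clocks_from_sysclk(168000000u, 1, 4, 2)` reproduces the 168 / 42 / 84 MHz figures from the diagram.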

Clock sources:

| Source | Full Name | Frequency | Accuracy | Startup | Use Case |
| --- | --- | --- | --- | --- | --- |
| HSI | High-Speed Internal | 8-16 MHz (vendor-dependent) | +/-1% | Instant | Default after reset, fast startup |
| HSE | High-Speed External | Typically 4-25 MHz crystal | +/-20 ppm | 2-10 ms | Precise timing (UART, USB, Ethernet) |
| LSI | Low-Speed Internal | ~32 kHz | +/-5% | Instant | Watchdog timer, rough timekeeping |
| LSE | Low-Speed External | 32.768 kHz crystal | +/-20 ppm | Up to 2 s | RTC, accurate timekeeping |
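
To put the accuracy column in perspective, it helps to convert a tolerance into worst-case timekeeping drift. The helpers below are a small sketch with hypothetical names; they assume the tolerance is constant, which ignores temperature and ageing effects.

```c
#define SECONDS_PER_DAY 86400.0

/* Convert a percentage tolerance to ppm (1% = 10,000 ppm). */
static double percent_to_ppm(double percent)
{
    return percent * 1e4;
}

/* Worst-case drift per day for a clock with the given tolerance in ppm. */
static double drift_seconds_per_day(double tolerance_ppm)
{
    return SECONDS_PER_DAY * tolerance_ppm / 1e6;
}
```

With these, a +/-20 ppm crystal drifts by at most about 1.7 s per day, a +/-1% HSI by up to 864 s (over 14 minutes) per day, and a +/-5% LSI by up to 72 minutes per day. That is why RTCs use the LSE crystal rather than the LSI.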

PLL Configuration

The PLL (Phase-Locked Loop) multiplies the input frequency to achieve a higher system clock. The formula (STM32F4 example):

VCO frequency = (Source / M) * N
SYSCLK = VCO / P
USB clock = VCO / Q (must be 48 MHz for USB)

Example: 8 MHz HSE crystal, target 168 MHz SYSCLK:

  • M = 8 (input becomes 1 MHz)
  • N = 336 (VCO = 336 MHz)
  • P = 2 (SYSCLK = 168 MHz)
  • Q = 7 (USB = 48 MHz)
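
⚠️ Common Trap: VCO Range

The VCO output must stay within a valid range (typically 100-432 MHz for STM32F4), and the VCO input (Source / M) has its own limit (1-2 MHz on STM32F4). Only M and N set the VCO frequency; P and Q divide it afterwards. If M or N push the VCO outside its range, the PLL may fail to lock or run unreliably, and startup code that times out waiting for lock typically leaves the MCU on the HSI clock. This is a common cause of "everything runs but at the wrong speed" bugs.

Why PLL lock time matters: After enabling the PLL, you must wait for the PLLRDY flag before switching SYSCLK to the PLL output. This typically takes 200-500 us; check the datasheet for your part. If the switch happens before the PLL is locked, the system clock can be unstable and the MCU may fault. STM32 parts guard against this by deferring the switch until the selected source is ready, so also confirm that the switch took place by reading the SWS bits.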
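
The PLL arithmetic and the VCO range check can be combined in a small host-side helper. This is a sketch under stated assumptions, not a vendor API: the names `pll_result_t` and `pll_compute` are hypothetical, and the limits are the STM32F4-style figures mentioned in this section (VCO output 100-432 MHz) plus an assumed 1-2 MHz VCO input range and P restricted to 2, 4, 6, or 8. Confirm all of them against the reference manual for your part.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed STM32F4-style limits; confirm against your reference manual. */
#define VCO_IN_MIN_HZ    1000000u
#define VCO_IN_MAX_HZ    2000000u
#define VCO_OUT_MIN_HZ 100000000u
#define VCO_OUT_MAX_HZ 432000000u

typedef struct {
    uint32_t vco_hz;     /* (source / M) * N */
    uint32_t sysclk_hz;  /* VCO / P */
    uint32_t usb_hz;     /* VCO / Q */
    bool     valid;      /* false if any limit above is violated */
} pll_result_t;

static pll_result_t pll_compute(uint32_t src_hz, uint32_t m, uint32_t n,
                                uint32_t p, uint32_t q)
{
    pll_result_t r = {0, 0, 0, false};

    /* Reject zero dividers and P values other than 2, 4, 6, 8. */
    if (m == 0 || q == 0 || (p != 2 && p != 4 && p != 6 && p != 8)) {
        return r;
    }

    /* Integer division truncates if src_hz is not a multiple of m. */
    uint32_t vco_in = src_hz / m;
    /* 64-bit intermediate so vco_in * n cannot overflow. */
    uint64_t vco = (uint64_t)vco_in * n;

    if (vco_in < VCO_IN_MIN_HZ || vco_in > VCO_IN_MAX_HZ ||
        vco < VCO_OUT_MIN_HZ || vco > VCO_OUT_MAX_HZ) {
        return r;
    }

    r.vco_hz    = (uint32_t)vco;
    r.sysclk_hz = r.vco_hz / p;
    r.usb_hz    = r.vco_hz / q;
    r.valid     = true;
    return r;
}
```

For the worked example, `pll_compute(8000000u, 8, 336, 2, 7)` yields a 336 MHz VCO, 168 MHz SYSCLK, and 48 MHz USB clock. Raising N to 500 would put the VCO at 500 MHz, and the helper reports the configuration as invalid instead of letting it reach hardware.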

Clock Gating

On STM32, every peripheral has a clock enable bit in the RCC (Reset and Clock Control) registers; other vendors provide equivalent clock-gating registers. After reset, most peripheral clocks are disabled by default. You must enable the clock before accessing any peripheral register; otherwise writes are silently ignored and reads return zero.

This is also the simplest power optimization technique: disable clocks to peripherals you are not using. Each active peripheral clock adds current draw, even if the peripheral itself is idle.

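c
#include "stm32f4xx.h"  // CMSIS device header (STM32F4 assumed)

void peripheral_clocks_init(void) {  // illustrative wrapper name
    // Enable GPIOA (AHB1) and USART1 (APB2) clocks before touching their registers
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
    RCC->APB2ENR |= RCC_APB2ENR_USART1EN;
    // Disable unused peripheral clocks to save power
    RCC->APB1ENR &= ~RCC_APB1ENR_CAN1EN;
}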

💡 Interview Insight

When asked "how do you reduce power consumption?", always mention clock gating first. It takes almost no effort (just clear the unused RCC enable bits) and can reduce idle current by roughly 20-40%, depending on the part and how many peripherals were clocked. More advanced techniques (power modes, voltage scaling) come after.

Power Modes

MCUs offer multiple power modes that trade wake-up latency for lower consumption. The exact names vary by vendor, but the concept is universal:

| Mode | CPU | Oscillators | SRAM | Peripherals | Wake-up Latency | Typical Current (STM32L4) |
| --- | --- | --- | --- | --- | --- | --- |
| Run | Active | All on | Retained | Active | N/A | 100 uA/MHz |
| Sleep | WFI/WFE | All on | Retained | Active | Instant (interrupt latency) | ~50% of Run |
| Stop | Off | HSI/HSE off, LSE optional | Retained | Off (except wake-up) | 5-20 us | 1-10 uA |
| Standby | Off | All off (except LSE optional) | Lost (except backup domain) | Off | 50-200 us (full reboot) | 0.3-1 uA |
| Shutdown | Off | All off | Lost | Off | Full reboot | 0.03 uA |

Key decisions:

  • Sleep: Use when you need sub-microsecond wake-up and peripherals must stay active. CPU is just waiting for the next interrupt.
  • Stop: Best balance for most battery-powered applications. RAM is retained, so you resume where you left off. HSE/PLL must be reconfigured after wake-up.
  • Standby: Near-zero power but you lose all RAM contents. Wake-up is effectively a system reset. Use for long idle periods (hours/days) where state can be saved to Flash or backup registers.
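
The trade-off between modes becomes concrete once you compute the average current of a duty-cycled design. The sketch below uses hypothetical helper names and an idealized model: it ignores wake-up energy, regulator losses, and battery self-discharge, so treat its results as an upper bound on battery life.

```c
/* Average current of a node that is active for t_active_s out of every
   period_s and sleeps for the rest. */
static double average_current_ua(double active_ua, double sleep_ua,
                                 double t_active_s, double period_s)
{
    double t_sleep_s = period_s - t_active_s;
    return (active_ua * t_active_s + sleep_ua * t_sleep_s) / period_s;
}

/* Ideal battery life in years for a given capacity and average current. */
static double battery_life_years(double capacity_mah, double avg_ua)
{
    double hours = capacity_mah * 1000.0 / avg_ua;
    return hours / (24.0 * 365.0);
}
```

As an illustration with assumed numbers: a node drawing 5 mA for 50 ms every 15 minutes and 2 uA in Stop mode averages about 2.3 uA. Most of that comes from the sleep current, not the active burst, which is why the choice between Stop and Standby often matters more than shaving the run-mode current.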

Dynamic Frequency Scaling

Instead of running at maximum clock speed all the time, you can adjust the CPU frequency based on workload:

| Workload | Strategy | Benefit |
| --- | --- | --- |
| Sensor idle, waiting for timer | Drop to 8 MHz (HSI, no PLL) | Lowest active power |
| Light processing (UART parsing, flag checks) | 48 MHz (PLL with low N) | Good balance |
| Heavy computation (FFT, PID, encryption) | Full speed (168 MHz) | Maximum throughput |
| USB active | Must maintain 48 MHz USB clock | PLL Q divider constraint |

The tricky part: when you change the system clock, you must also update prescalers for any peripheral whose timing depends on the clock (UART baud rate, SPI clock, timer periods). Forget this and your UART will transmit at the wrong baud rate after a frequency change.

Reset Sources

Understanding what caused a reset is important for field debugging and reliability:

| Reset Source | Flag (STM32) | What It Means |
| --- | --- | --- |
| Power-On Reset (POR) | PORRSTF | First power-up or power dipped below threshold |
| Pin Reset | PINRSTF | NRST pin pulled low (debug probe, reset button) |
| Software Reset | SFTRSTF | NVIC_SystemReset() called in firmware |
| Watchdog Reset | IWDGRSTF / WWDGRSTF | Watchdog timer expired; firmware is likely stuck |
| Brown-out Reset (BOR) | BORRSTF | Supply voltage dropped below configured threshold |
| Low-Power Reset | LPWRRSTF | Stop/Standby entry was attempted while the option bytes are set to reset instead of entering the mode |

After reading the reset flags in RCC->CSR, clear them by setting RMVF so they are fresh for the next reset event. Logging the reset source at startup helps diagnose field failures ("this unit keeps watchdog-resetting every 30 minutes").
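
A minimal sketch of the decode step is shown below. It works on a snapshot of the register, so it can be exercised on a host. The bit positions are the ones documented for STM32F4's RCC_CSR and are an assumption for any other family; the macro and function names are hypothetical. Several flags can be set at once (the pin flag commonly accompanies other causes), so the checks run from most specific to least specific.

```c
#include <stdint.h>

/* RCC_CSR reset-flag bits as documented for STM32F4 (assumed; check your part). */
#define CSR_BORRSTF  (1u << 25)
#define CSR_PINRSTF  (1u << 26)
#define CSR_PORRSTF  (1u << 27)
#define CSR_SFTRSTF  (1u << 28)
#define CSR_IWDGRSTF (1u << 29)
#define CSR_WWDGRSTF (1u << 30)
#define CSR_LPWRRSTF (1u << 31)

/* Return a short name for the most specific reset cause in a CSR snapshot. */
static const char *reset_cause(uint32_t csr)
{
    if (csr & CSR_LPWRRSTF) return "LOW_POWER";
    if (csr & CSR_WWDGRSTF) return "WWDG";
    if (csr & CSR_IWDGRSTF) return "IWDG";
    if (csr & CSR_SFTRSTF)  return "SOFTWARE";
    if (csr & CSR_PORRSTF)  return "POWER_ON";
    if (csr & CSR_BORRSTF)  return "BROWN_OUT";
    if (csr & CSR_PINRSTF)  return "PIN";
    return "UNKNOWN";
}
```

On target, the snapshot would come from reading RCC->CSR once at startup, before setting RMVF, and the resulting string (or an equivalent enum) would go into the boot log.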

Debugging Story: UART Baud Rate Wrong After Clock Change

A team was developing a data logger that used dynamic frequency scaling — running at 168 MHz during sensor acquisition and dropping to 8 MHz HSI between samples to save power. Everything worked perfectly until they noticed that UART communication corrupted after the first frequency transition.

The root cause: the UART baud rate register (BRR) is calculated from the APB clock frequency. When the system switched from PLL (168 MHz SYSCLK, 42 MHz APB1) to HSI (8 MHz SYSCLK, 8 MHz APB1), the BRR value was still configured for 42 MHz. With the peripheral clock now 8/42 of its former value, the UART was transmitting at roughly 1/5th of the intended baud rate.

The fix: recalculate and update the UART BRR register every time the system clock changes. They created a SystemClockUpdate() function that reconfigured all clock-dependent peripherals (UART, SPI, timers) after any frequency transition.

The lesson: Changing the system clock affects every peripheral that derives timing from it. Always audit which peripherals need reconfiguration after a clock switch, and centralize the update logic.
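
The arithmetic behind the bug can be reproduced on a host. The sketch below assumes an STM32-style USART with 16x oversampling, where the BRR value is the peripheral clock divided by the baud rate; the function names and the 115200 baud figure are hypothetical, since the story does not state the configured rate.

```c
#include <stdint.h>

/* BRR for 16x oversampling: pclk / baud, rounded to the nearest integer. */
static uint32_t usart_brr(uint32_t pclk_hz, uint32_t baud)
{
    return (pclk_hz + baud / 2u) / baud;
}

/* Baud rate actually produced by a given BRR at a given peripheral clock
   (truncated to an integer). */
static uint32_t usart_actual_baud(uint32_t pclk_hz, uint32_t brr)
{
    return pclk_hz / brr;
}
```

At 42 MHz and 115200 baud the BRR is 365. Leaving that value in place after the clock drops to 8 MHz yields about 21917 baud, roughly one fifth of the target. Recomputing gives a BRR of 69 and about 115942 baud, an error of under 1%.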

What Interviewers Want to Hear

  • You can compare Cortex-M variants and justify choosing one over another for a specific project
  • You understand the clock tree from oscillator source through PLL to peripheral buses
  • You can walk through a PLL calculation (source / M * N / P)
  • You know the tradeoffs between power modes (latency vs current vs state retention)
  • You mention clock gating as a first-line power optimization technique
  • You understand that changing the system clock affects all clock-dependent peripherals

Interview Focus

Classic Interview Questions

Q1: "Compare Cortex-M4 vs Cortex-M7 — when would you choose each?"

Model Answer Starter: "Both have DSP instructions, but M7 adds I-cache and D-cache (critical when running from external Flash or SDRAM), tightly-coupled memory (TCM) for deterministic access, and a 6-stage superscalar pipeline that delivers noticeably higher throughput at the same clock speed (often quoted as up to roughly 2x, depending on the workload). I choose M4 for cost-sensitive products that run entirely from internal Flash with moderate DSP needs: sensor fusion, motor control, audio. I choose M7 when I need high throughput, large code/data in external memory, or complex networking stacks. The M7's cache also introduces DMA coherency issues that M4 does not have."

Q2: "Walk me through configuring the clock tree from an 8 MHz crystal to 168 MHz SYSCLK."

Model Answer Starter: "Start with HSE at 8 MHz. Configure the PLL: M=8 divides the input to 1 MHz, N=336 multiplies to 336 MHz VCO, P=2 divides to 168 MHz SYSCLK. Set Q=7 for 48 MHz USB clock. Configure bus prescalers: AHB /1 for 168 MHz, APB1 /4 for 42 MHz, APB2 /2 for 84 MHz. Before switching, set Flash wait states to match the new frequency — at 168 MHz you typically need 5 wait states. Enable HSE, wait for HSERDY, configure PLL, enable PLL, wait for PLLRDY, then switch SYSCLK to PLL and verify with SWS bits."

Q3: "How do power modes work? How do you wake from Stop mode?"

Model Answer Starter: "Stop mode turns off the CPU and the HSI/HSE oscillators but retains SRAM and register contents. The voltage regulator can be in main or low-power mode. Wake-up sources include any EXTI line (GPIO, RTC alarm, UART activity on some STM32s). On wake-up, the MCU resumes from the WFI instruction — it does NOT reset. However, the clock reverts to HSI since the PLL was off, so you must reconfigure the PLL and switch back to full speed in the wake-up handler. Stop mode typically draws 1-10 uA, making it ideal for battery-powered devices that wake periodically."

Q4: "What is clock gating and why does it matter?"

Model Answer Starter: "Clock gating means disabling the clock signal to a peripheral module via the RCC enable registers. When the clock is gated, the peripheral's flip-flops stop toggling, reducing dynamic power consumption to near zero for that module. It matters because even an idle peripheral consumes power if its clock is running — the transistors are still switching. On STM32, each peripheral has an enable bit in the appropriate RCC register (AHB1ENR, APB1ENR, etc.). After reset, most are disabled by default. It is the simplest and most effective power optimization."

Q5: "What's the difference between HSI and HSE? When does accuracy matter?"

Model Answer Starter: "HSI is the internal RC oscillator, typically 8-16 MHz with +/-1% accuracy. HSE is an external crystal oscillator, typically 4-25 MHz with +/-20 ppm accuracy. HSI is always available immediately after reset and requires no external components. HSE needs a crystal and 2-10 ms startup time. Accuracy matters for protocols with tight timing requirements: UART can tolerate about +/-2% clock error total (both ends), USB requires +/-0.25% (HSI is too inaccurate), CAN baud rates need crystal-grade accuracy for reliable multi-node communication. For applications that only use SPI, I2C, and GPIO, HSI is often sufficient."

Trap Alerts

  • Don't say: "I just use the HAL clock configuration wizard" — interviewers want to see you understand the underlying clock tree
  • Don't forget: Flash wait states must be configured BEFORE switching to a higher clock frequency, not after
  • Don't ignore: That changing SYSCLK affects all peripheral baud rates and timer periods
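
The ordering rule exists because Flash has a fixed access time: if the core speeds up first, it fetches instructions faster than Flash can deliver them. A minimal sketch of the lookup and the ordering decision follows. It assumes the STM32F405/407-class mapping at 2.7-3.6 V (one extra wait state per 30 MHz of HCLK); other supply ranges and families use different steps, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wait states for an assumed STM32F405/407-class part at 2.7-3.6 V:
   0 WS up to 30 MHz, 1 WS up to 60 MHz, and so on up to 5 WS at 168 MHz. */
static uint32_t flash_wait_states(uint32_t hclk_hz)
{
    if (hclk_hz == 0u) {
        return 0u;
    }
    return (hclk_hz - 1u) / 30000000u;
}

/* Going faster: raise the wait states before the switch.
   Going slower: switch first, then lower the wait states. */
static bool set_latency_before_switch(uint32_t old_hclk_hz, uint32_t new_hclk_hz)
{
    return new_hclk_hz > old_hclk_hz;
}
```

Under these assumptions, 168 MHz needs 5 wait states, 48 MHz needs 1, and 8 MHz needs none, which matches the "5 wait states at 168 MHz" figure in the model answer above.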

Follow-up Questions

  • "How do you calculate the correct Flash wait states for a given clock frequency?"
  • "What happens if the HSE crystal fails while the PLL is running from it?"
  • "How would you implement a clock-failure detection and fallback mechanism?"
  • "What is the CSS (Clock Security System) and when would you enable it?"
  • "How does voltage scaling relate to maximum clock frequency?"

Practice

On STM32F4, what is the PLL output frequency if HSE=8MHz, M=8, N=336, P=2?

Which Cortex-M core adds instruction and data caches?

After waking from Stop mode on STM32, what clock source is the system running on?

Why must Flash wait states be configured BEFORE switching to a higher clock?

Real-World Tie-In

Battery-Powered Environmental Monitor — An IoT sensor node measures temperature and humidity every 15 minutes and transmits via LoRa. During the 50 ms measurement/transmit window, the MCU runs at 48 MHz from PLL. Between measurements, it enters Stop mode at 2 uA. The entire clock transition sequence (HSI to PLL, measure, transmit, PLL to HSI, enter Stop) is handled by a single RunMeasurementCycle() function that reconfigures the UART and SPI baud rates after every clock change. Battery life: 5+ years on 2x AA cells.

Automotive Sensor Fusion ECU — A Cortex-M7 running at 400 MHz fuses data from 6 CAN buses, 3 cameras, and 12 ultrasonic sensors. The clock tree feeds 200 MHz to the Ethernet MAC, 80 MHz to CAN peripherals, and 400 MHz to the CPU. The CSS (Clock Security System) monitors the HSE crystal — if it fails, the system automatically falls back to HSI and enters a safe degraded mode, logging the event to Flash for post-mortem analysis.