Compare Cortex-M4 vs Cortex-M7 — when would you choose each?
The Cortex-M4 and Cortex-M7 are both high-performance Cortex-M cores with DSP instructions (single-cycle MAC, SIMD) and optional hardware floating-point units, but they differ significantly in microarchitecture and target applications. The M4 is a single-issue, in-order core with a 3-stage pipeline that achieves roughly 1.25 DMIPS/MHz. The core itself has no instruction or data cache and no tightly coupled memory (TCM), although chip vendors may add a flash accelerator around it. This simplicity makes it predictable: interrupt latency is 12 cycles when running from zero-wait-state memory, and there are no cache-related timing variations. The M4 is the sweet spot for cost-sensitive applications that need moderate DSP capability: motor control, sensor fusion, audio processing, and industrial control. Typical M4 parts (STM32F4, STM32G4) run at 84 to 180 MHz with 128 KB to 1 MB of Flash.
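To make the "single-cycle MAC, SIMD" point concrete, here is a minimal portable sketch of what a dual 16-bit multiply-accumulate does. The names `pack_q15`, `smlad_model`, and `dot_q15` are illustrative and not from the original answer. On a real M4 or M7 target the CMSIS-Core `__SMLAD` intrinsic would normally replace the model function. The sketch assumes the accumulator never overflows and that narrowing to `int16_t` wraps in two's complement, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit samples into one 32-bit word (lo in bits 0..15). */
uint32_t pack_q15(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Portable model of a dual 16-bit MAC: treat each operand as two signed
 * 16-bit lanes, multiply lane by lane, and add both products to acc.
 * ASSUMPTION: acc plus the two products fits in int32_t (no overflow). */
int32_t smlad_model(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t x_lo = (int16_t)(uint16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(uint16_t)(x >> 16);
    int16_t y_lo = (int16_t)(uint16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(uint16_t)(y >> 16);
    return acc + (int32_t)x_lo * y_lo + (int32_t)x_hi * y_hi;
}

/* Dot product of two 16-bit sample buffers, two samples per step.
 * n must be even; the result is the raw integer sum of products. */
int32_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n % 2u == 0u);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i += 2) {
        acc = smlad_model(pack_q15(a[i], a[i + 1]),
                          pack_q15(b[i], b[i + 1]), acc);
    }
    return acc;
}
```

Each loop iteration consumes two samples from each buffer, which is why SIMD MAC instructions can roughly halve the inner-loop count of a 16-bit FIR filter or dot product compared with one multiply-accumulate per sample.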
The M7 is a superscalar, dual-issue core with a 6-stage pipeline that achieves up to 2.14 DMIPS/MHz, about 1.7x the M4 per clock cycle. It adds optional instruction and data caches (typically 4 to 32 KB each; the core allows up to 64 KB), tightly coupled memories (TCM) with single-cycle deterministic access, branch prediction, and an optional double-precision FPU. These features make the M7 suitable for workloads that an M4 handles poorly: running from external QSPI Flash or SDRAM (where the caches mask the slow external memory latency), high-throughput signal processing, graphics rendering (on ST parts, assisted by the Chrom-ART DMA peripheral, which belongs to the chip rather than the core), and complex protocol stacks. Typical M7 parts (STM32F7, STM32H7) run at 216 to 550 MHz.
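As a rough worked example of the DMIPS/MHz figures quoted above, the following sketch multiplies the per-MHz rating by the clock frequency. The function name `dmips_estimate` and the times-100 fixed-point scaling are choices made for this illustration. Dhrystone numbers are a coarse upper bound that assumes no memory wait states, so treat the results as ballpark values rather than measured performance.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated Dhrystone throughput in whole DMIPS (fraction truncated).
 * dmips_per_mhz_x100 is the DMIPS/MHz rating scaled by 100 so the
 * arithmetic stays in integers: 125 for 1.25, 214 for 2.14. */
uint32_t dmips_estimate(uint32_t dmips_per_mhz_x100, uint32_t core_mhz)
{
    /* Guard against 32-bit overflow of the intermediate product. */
    assert(core_mhz == 0u || dmips_per_mhz_x100 <= UINT32_MAX / core_mhz);
    return (dmips_per_mhz_x100 * core_mhz) / 100u;
}
```

With the numbers from the text, `dmips_estimate(125, 180)` gives 225 for an M4 at 180 MHz, `dmips_estimate(214, 216)` gives 462 for an M7 at 216 MHz, and `dmips_estimate(214, 550)` gives 1177 for an M7 at 550 MHz. The gap comes from both the higher per-clock rating and the higher clock.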
The tradeoff is cost, power, and determinism. M7 parts are often 2-5x more expensive than comparable M4 parts, consume more dynamic power because of the deeper pipeline and caches, and have less deterministic interrupt latency because cache misses introduce variable delays. For real-time control loops where jitter matters (motor FOC at 20 kHz), the M4's fixed latency is an advantage; on an M7 you can recover that determinism by placing the ISR code in instruction TCM and its data in data TCM, which bypasses the caches entirely. Choose the M4 when your workload fits in internal Flash/RAM and you need predictable timing at low cost. Choose the M7 when you need external memory, high throughput, or the computational headroom that dual-issue provides.
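The 20 kHz control-loop case can be sketched as below: a helper that computes the cycle budget per loop period, and an ISR tagged for placement in instruction TCM. Everything named here is an assumption for illustration. The section name `.itcm_text`, the macro `ITCM_CODE`, and the ISR `foc_control_isr` are hypothetical, and the attribute only has an effect if your linker script maps that section into ITCM and the startup code copies it there. On non-Arm hosts the macro expands to nothing so the file still compiles.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: ".itcm_text" is a hypothetical section name that must match
 * an output section in the project's linker script. GCC/Clang syntax. */
#if defined(__GNUC__) && defined(__arm__)
#define ITCM_CODE __attribute__((section(".itcm_text")))
#else
#define ITCM_CODE
#endif

/* Core cycles available between two control-loop interrupts. */
uint32_t cycles_per_period(uint32_t core_hz, uint32_t loop_hz)
{
    assert(loop_hz != 0u);
    return core_hz / loop_hz;
}

/* Hypothetical counter updated by the ISR; stands in for loop state. */
volatile uint32_t foc_tick_count;

/* Hypothetical control-loop ISR. On an M7 the ITCM_CODE tag asks the
 * linker to place it in TCM so instruction fetches never miss in cache. */
ITCM_CODE void foc_control_isr(void)
{
    foc_tick_count++; /* placeholder for current sampling and FOC math */
}
```

At 20 kHz, `cycles_per_period(180000000u, 20000u)` gives a budget of 9000 cycles on a 180 MHz M4, and `cycles_per_period(216000000u, 20000u)` gives 10800 cycles on a 216 MHz M7. A 12-cycle interrupt entry is a small fraction of either budget, so what matters for jitter is whether that entry cost stays constant from one period to the next.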
Source: MCU Cores & Clocking Q&A
