Write-through vs write-back cache — tradeoffs in embedded?

Question

Accepted Answer

A write-through cache policy sends every CPU write to both the cache and main SRAM. SRAM always reflects the CPU's latest writes, so there are never dirty cache lines. This is inherently safe for DMA TX: when the DMA reads the buffer from SRAM, it always gets the latest data the CPU wrote. It does not help with DMA RX, though. The cache can still hold an old copy of the buffer, so the CPU must invalidate those lines before reading, just as with write-back. The downside is performance: every store instruction generates a bus transaction to SRAM, which takes multiple clock cycles and consumes bus bandwidth. On a Cortex-M7 running at 400 MHz with SRAM at 200 MHz, a write stalls the CPU until the bus transaction completes, unless a write buffer absorbs it temporarily. For write-heavy code (clearing a framebuffer, building a transmit packet, initializing a large struct), write-through can reduce throughput by 30-50% compared to write-back.

A write-back cache policy means CPU writes update only the cache and mark the cache line as "dirty." The data is written to SRAM later, either when the cache line is evicted to make room for new data or when software explicitly cleans it. This is significantly faster because multiple writes to the same cache line are absorbed without any bus traffic. The danger in embedded systems is that SRAM contents lag behind the cache. If DMA reads a TX buffer before the CPU cleans those cache lines, the DMA transmits stale data. Similarly, if DMA writes to an RX buffer, the CPU must invalidate those cache lines before reading, or it sees old cached values instead of the new DMA data.

The practical choice depends on the application. Write-through is simpler and safer: use it when DMA buffers are scattered throughout memory and you cannot easily track which buffers need explicit cache maintenance. Write-back is the default on most Cortex-M7 BSPs because it delivers better performance: use it when you can carefully manage cache maintenance calls around every DMA transfer, and when your DMA buffers are cache-line aligned and sized, so that invalidating a buffer never discards unrelated data that shares a line with it.
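The clean-before-TX and invalidate-after-RX rules for write-back memory can be sketched roughly as follows. This is a minimal illustration, not a drop-in driver: `cache_span_t`, `cache_span`, `cache_span_is_exact`, `dma_tx_prepare`, `dma_rx_complete`, and the `DMA_CACHE_USE_CMSIS` build flag are hypothetical names invented for this example. The only real API assumed is the pair of CMSIS-Core calls `SCB_CleanDCache_by_Addr` and `SCB_InvalidateDCache_by_Addr`, which are compiled in only on target (after your device header is included). The address arithmetic assumes the 32-byte D-cache line of the Cortex-M7. You would call `dma_tx_prepare(buf, len)` after filling a TX buffer and before starting the DMA, and `dma_rx_complete(buf, len)` after the DMA finishes and before the CPU reads the RX buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cortex-M7 D-cache line size in bytes. */
#define DCACHE_LINE_SIZE 32u

/* Line-aligned region that covers a buffer (hypothetical helper type). */
typedef struct {
    uintptr_t start; /* buffer address rounded down to a line boundary */
    size_t    size;  /* length in bytes, a multiple of DCACHE_LINE_SIZE */
} cache_span_t;

/* Expand [addr, addr + len) outward to whole cache lines. */
static inline cache_span_t cache_span(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    cache_span_t span;

    assert(len > 0u);
    span.start = addr & ~mask;
    span.size  = (size_t)(((addr + len + mask) & ~mask) - span.start);
    return span;
}

/* Returns 1 when the buffer starts and ends exactly on line boundaries,
 * i.e. no unrelated data shares a cache line with it; otherwise 0. */
static inline int cache_span_is_exact(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    return ((addr & mask) == 0u) && (((uintptr_t)len & mask) == 0u);
}

#ifdef DMA_CACHE_USE_CMSIS /* hypothetical flag: define on target only */

/* TX: push dirty lines out to SRAM so the DMA reads what the CPU wrote.
 * Cleaning a partially covered line is harmless, so round outward. */
static inline void dma_tx_prepare(const void *buf, size_t len)
{
    cache_span_t span = cache_span((uintptr_t)buf, len);
    SCB_CleanDCache_by_Addr((uint32_t *)span.start, (int32_t)span.size);
}

/* RX: drop cached copies so the CPU re-reads what the DMA wrote.
 * Invalidating a shared line would discard a neighbor's dirty data,
 * so insist on an exactly aligned and sized buffer. */
static inline void dma_rx_complete(void *buf, size_t len)
{
    assert(cache_span_is_exact((uintptr_t)buf, len));
    SCB_InvalidateDCache_by_Addr((uint32_t *)buf, (int32_t)len);
}

#endif /* DMA_CACHE_USE_CMSIS */
```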
Many production systems use a hybrid approach: configure most SRAM as write-back for performance, and use the MPU to mark a small region as non-cacheable or write-through specifically for DMA buffers.
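To make the hybrid setup concrete, here is a rough sketch of how the three memory policies map onto the attribute bits of the ARMv7-M `MPU_RASR` register. The helper `mpu_rasr_for`, the `mem_policy_t` enum, and the `RASR_*` macro names are hypothetical and exist only for this example; the field positions and TEX/C/B encodings follow the ARMv7-M MPU register layout. The sketch assumes a data-only region (execute-never), full read/write access, and the shareable bit left clear. It only computes the register value; it does not program the MPU.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the ARMv7-M MPU_RASR fields used here. */
#define RASR_XN_POS   28u /* execute never */
#define RASR_AP_POS   24u /* access permissions */
#define RASR_TEX_POS  19u /* type extension */
#define RASR_C_POS    17u /* cacheable */
#define RASR_B_POS    16u /* bufferable */
#define RASR_SIZE_POS  1u /* region size = 2^(SIZE + 1) bytes */
#define RASR_ENABLE    1u /* region enable, bit 0 */

/* Hypothetical policy names for this sketch. */
typedef enum {
    MEM_WRITE_BACK,    /* TEX=001 C=1 B=1: write-back, write-allocate */
    MEM_WRITE_THROUGH, /* TEX=000 C=1 B=0: write-through */
    MEM_NON_CACHEABLE  /* TEX=001 C=0 B=0: normal memory, not cached */
} mem_policy_t;

/* Build an MPU_RASR value for a data region of 2^size_log2 bytes. */
static inline uint32_t mpu_rasr_for(mem_policy_t policy, uint32_t size_log2)
{
    uint32_t tex, c, b;

    /* ARMv7-M regions range from 32 bytes (2^5) to 4 GB (2^32). */
    assert(size_log2 >= 5u && size_log2 <= 32u);

    switch (policy) {
    case MEM_WRITE_BACK:    tex = 1u; c = 1u; b = 1u; break;
    case MEM_WRITE_THROUGH: tex = 0u; c = 1u; b = 0u; break;
    default:                tex = 1u; c = 0u; b = 0u; break;
    }

    return (1u << RASR_XN_POS)   /* buffers hold data, never code */
         | (3u << RASR_AP_POS)   /* AP=011: full read/write access */
         | (tex << RASR_TEX_POS)
         | (c << RASR_C_POS)
         | (b << RASR_B_POS)
         | ((size_log2 - 1u) << RASR_SIZE_POS)
         | RASR_ENABLE;
}
```

For example, `mpu_rasr_for(MEM_NON_CACHEABLE, 15u)` describes a 32 KB non-cacheable DMA region. The region's base address goes in `MPU_RBAR` and must be aligned to the region size. In a real project you would more likely use the `ARM_MPU_RASR` and `ARM_MPU_SetRegion` helpers from the CMSIS MPU header than hand-rolled shifts; the manual encoding here is only meant to show which bits select each policy.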