How is DMA used with SPI, and what are the common pitfalls?

Question

Accepted Answer

SPI is one of the most natural peripherals to pair with DMA because its transactions are predictable, fixed-length shift operations. The DMA controller feeds bytes from a memory buffer into the SPI transmit data register and simultaneously moves received bytes from the SPI receive register into another memory buffer, both without CPU involvement. This frees the CPU during large transfers (flash reads, display updates, ADC FIFO drains), so it can do other work while the bus is busy.

A typical STM32 setup configures two DMA channels (or streams): one for TX (memory-to-peripheral) and one for RX (peripheral-to-memory). You set the buffer addresses and the transfer count, enable the DMA channels, and enable the SPI's TX and RX DMA requests. From then on the SPI peripheral raises a DMA request whenever its transmit register is empty or its receive register holds data. When the transfer completes, the DMA triggers an interrupt so firmware can process the result, de-assert CS, and start the next transaction.
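The sequence above can be sketched as a small host-side model. This is only an illustration under stated assumptions: `dma_channel_t`, `spi_dma_model_t`, `spi_dma_start`, and `spi_dma_tick` are hypothetical names, not STM32 registers or HAL calls, and the "slave" simply echoes each byte inverted so the result is easy to check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one DMA channel; NOT a real STM32 register layout. */
typedef struct {
    uint8_t *mem;        /* buffer in RAM */
    size_t   remaining;  /* transfer count still to go */
    bool     enabled;
} dma_channel_t;

/* Hypothetical model of the SPI peripheral and its two DMA channels. */
typedef struct {
    uint8_t       tx_dr;      /* transmit data register */
    uint8_t       rx_dr;      /* receive data register */
    dma_channel_t tx;         /* memory-to-peripheral */
    dma_channel_t rx;         /* peripheral-to-memory */
    bool          xfer_done;  /* set where the completion interrupt would fire */
} spi_dma_model_t;

/* Set buffer addresses and transfer count, then enable both channels. */
static void spi_dma_start(spi_dma_model_t *s, uint8_t *tx_buf,
                          uint8_t *rx_buf, size_t len)
{
    s->xfer_done = (len == 0);
    s->rx = (dma_channel_t){ rx_buf, len, len > 0 };  /* arm RX first */
    s->tx = (dma_channel_t){ tx_buf, len, len > 0 };
}

/* Simulate one byte time. The stand-in slave echoes each byte inverted. */
static void spi_dma_tick(spi_dma_model_t *s)
{
    if (!s->tx.enabled || !s->rx.enabled) {
        return;
    }
    s->tx_dr = *s->tx.mem++;        /* TX request: memory -> data register */
    s->rx_dr = (uint8_t)~s->tx_dr;  /* the shift: one byte out, one byte in */
    *s->rx.mem++ = s->rx_dr;        /* RX request: data register -> memory */

    s->tx.remaining--;
    if (--s->rx.remaining == 0) {   /* RX count exhausted: transfer complete */
        s->tx.enabled = false;
        s->rx.enabled = false;
        s->xfer_done = true;
    }
}
```

On real hardware the body of `spi_dma_tick` is what the SPI and DMA controller do on their own; firmware only performs the equivalent of `spi_dma_start` and then reacts to the completion interrupt.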

Common pitfalls include:

Forgetting the dummy TX buffer for read-only transfers. SPI is full-duplex: to clock data in, the master must clock data out. For a read-only DMA transfer, the TX DMA channel must still be active, sending dummy bytes (typically 0xFF or 0x00), unless the peripheral is switched into a dedicated receive-only mode. Some developers forget this and wonder why no clock is generated.
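One common way to avoid allocating a full-size dummy buffer is to point the TX channel at a single constant byte and turn off memory-address increment, so the same byte is sent for every clocked-in byte. The sketch below models that idea on the host; `tx_dma_cfg_t`, `spi_read_only_tx_cfg`, and `spi_master_model_read` are hypothetical names for illustration, not part of any vendor library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TX-channel settings; field names are illustrative only. */
typedef struct {
    const uint8_t *src;            /* where the DMA fetches TX bytes from */
    size_t         count;          /* number of bytes to send */
    bool           mem_increment;  /* false: resend the same byte each time */
} tx_dma_cfg_t;

static const uint8_t spi_dummy_byte = 0xFF;

/* TX configuration for a read-only transfer of len bytes. */
static tx_dma_cfg_t spi_read_only_tx_cfg(size_t len)
{
    tx_dma_cfg_t cfg = { &spi_dummy_byte, len, false };
    return cfg;
}

/* Host-side stand-in for the master: one byte is shifted in for every byte
 * the TX channel shifts out. Passing tx == NULL models a forgotten TX
 * channel. Returns the number of bytes actually received. */
static size_t spi_master_model_read(const tx_dma_cfg_t *tx,
                                    const uint8_t *slave_data,
                                    uint8_t *rx_buf, size_t len)
{
    size_t clocks = (tx != NULL) ? tx->count : 0;  /* no TX writes, no clock */
    size_t n = 0;
    while (n < clocks && n < len) {
        rx_buf[n] = slave_data[n];
        n++;
    }
    return n;
}
```

With the TX configuration present the model returns `len` bytes; with `NULL` it returns zero, which is the "no clock is generated" symptom described above.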

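Cache coherency on Cortex-M7. The M7's data cache means DMA may write received data to SRAM while the CPU still reads a stale cached copy. The reverse applies to transmit buffers: data the CPU has written may still sit in the cache when DMA reads SRAM. Either place DMA buffers in a non-cacheable memory region, or clean the D-cache for the TX buffer before starting the transfer and invalidate the D-cache for the RX buffer after the transfer completes, before reading it. Cache maintenance works on whole 32-byte lines, so align DMA buffers to 32 bytes and pad them to a multiple of 32 bytes to keep unrelated variables out of the affected lines.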
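Because by-address cache maintenance operates on whole cache lines, it helps to see which lines a given buffer actually touches. The following is a minimal host-runnable sketch of that arithmetic, assuming the Cortex-M7's 32-byte D-cache line; `cache_range_t`, `dcache_range_for`, and `dcache_buffer_is_exclusive` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u  /* Cortex-M7 D-cache line size in bytes */

typedef struct {
    uintptr_t start;  /* buffer start rounded down to a line boundary */
    size_t    size;   /* covered span, a multiple of the line size */
} cache_range_t;

/* Smallest line-aligned range that covers [addr, addr + len). */
static cache_range_t dcache_range_for(uintptr_t addr, size_t len)
{
    uintptr_t mask  = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + len + DCACHE_LINE_SIZE - 1u) & mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}

/* Nonzero if the buffer shares no cache line with other data, i.e. it is
 * safe to invalidate without discarding writes to neighbouring variables. */
static int dcache_buffer_is_exclusive(uintptr_t addr, size_t len)
{
    return (addr % DCACHE_LINE_SIZE == 0u) && (len % DCACHE_LINE_SIZE == 0u);
}
```

On target, a range like this is what would be handed to the CMSIS-Core routines `SCB_CleanDCache_by_Addr` (TX buffer, before starting) and `SCB_InvalidateDCache_by_Addr` (RX buffer, after completion); their exact parameter types differ between CMSIS versions, so check the headers in your toolchain. A 40-byte buffer at 0x20000010, for example, spans two lines (64 bytes from 0x20000000) and is not exclusive, so invalidating it could discard CPU writes to whatever else lives in those lines.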

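CS management. DMA handles the data transfer but does not manage chip select. Firmware must assert CS before starting the DMA and de-assert it only once the last byte has actually left the shift register. The RX DMA transfer-complete ISR is a safe place for this, because the final received byte implies the final clock has finished. The TX DMA transfer-complete interrupt is not: it fires when the last byte has been written to the SPI data register, while that byte may still be shifting out. If CS is de-asserted too early (e.g., immediately after starting the DMA, or on TX complete before the bus is idle), the slave aborts the transaction.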
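The CS ordering can be illustrated with a small host-side model. It is a sketch under assumptions, not driver code: `spi_bus_model_t` and all functions below are hypothetical, CS is modelled as a simple flag, and the RX-complete handler is used for de-assertion because the last received byte implies the last clock has finished.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side model of the bus as seen by the slave. */
typedef struct {
    bool   cs_asserted;
    size_t bytes_left;           /* bytes the DMA still has to move */
    size_t bytes_seen_by_slave;  /* bytes the slave accepted */
} spi_bus_model_t;

static void cs_assert(spi_bus_model_t *b)   { b->cs_asserted = true; }
static void cs_deassert(spi_bus_model_t *b) { b->cs_asserted = false; }

/* Assert CS first, then start the DMA transfer. */
static void transfer_start(spi_bus_model_t *b, size_t len)
{
    cs_assert(b);
    b->bytes_left = len;
    b->bytes_seen_by_slave = 0;
}

/* Stand-in for the RX DMA transfer-complete ISR: the only place CS is
 * released in the correct sequence. */
static void rx_dma_complete_isr(spi_bus_model_t *b)
{
    cs_deassert(b);
}

/* Simulate one byte time. The slave ignores bytes while CS is inactive. */
static void bus_tick(spi_bus_model_t *b)
{
    if (b->bytes_left == 0) {
        return;
    }
    if (b->cs_asserted) {
        b->bytes_seen_by_slave++;
    }
    if (--b->bytes_left == 0) {
        rx_dma_complete_isr(b);
    }
}
```

Calling `transfer_start` and then ticking until completion delivers every byte to the slave. Calling `cs_deassert` right after `transfer_start`, which is the early de-assert bug, leaves `bytes_seen_by_slave` at zero.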