Explain how DMA works. What are some of the issues that you need to worry about when using DMA?

How DMA works

Direct Memory Access lets a dedicated DMA controller move data between memory and peripherals (or memory-to-memory) without the CPU copying each byte, freeing the CPU for other work and greatly increasing throughput.

Typical flow:

  1. Configuration: The CPU programs a DMA channel with source address, destination address, transfer length, transfer width (byte/half/word), address-increment rules (which side increments), trigger/request source (e.g., UART RX request), and mode (single, block, circular/ring, scatter-gather, etc.).
  2. Trigger: The peripheral asserts a DMA request (DRQ) when it needs service (e.g., RX FIFO has data, TX FIFO has space), or the CPU starts a memory-to-memory transfer.
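  3. Bus arbitration: The DMA controller becomes bus master and arbitrates with the CPU for the memory bus. Depending on the mode, it steals individual bus cycles ("cycle stealing"), holds the bus for a whole burst (burst mode), or transfers only when the CPU is not using the bus (transparent mode).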
  4. Transfer: The controller moves data directly between endpoints, updating its address/count registers, until the count is exhausted (or continuously, in circular mode).
  5. Completion: The controller raises a transfer-complete interrupt (and/or half-transfer interrupt for double-buffering). The CPU's ISR then processes the buffer or queues the next transfer.
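
The five steps above can be sketched as a small host-side simulation. This is a minimal model under stated assumptions, not a real controller's register map: dma_channel, dma_configure, and dma_on_request are hypothetical names, and a plain variable stands in for a peripheral data register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of one DMA channel (not a real register map). */
typedef struct {
    const uint8_t *src;       /* current source address */
    uint8_t       *dst;       /* current destination address */
    size_t         remaining; /* elements left to move */
    size_t         width;     /* element size in bytes: 1, 2, or 4 */
    bool           src_inc;   /* advance source after each element */
    bool           dst_inc;   /* advance destination after each element */
    bool           enabled;
    bool           complete;  /* models the transfer-complete flag */
} dma_channel;

/* Step 1: the CPU programs addresses, count, width, and increment rules. */
static void dma_configure(dma_channel *ch, const void *src, void *dst,
                          size_t count, size_t width,
                          bool src_inc, bool dst_inc)
{
    assert(width == 1 || width == 2 || width == 4);
    ch->src = src;
    ch->dst = dst;
    ch->remaining = count;
    ch->width = width;
    ch->src_inc = src_inc;
    ch->dst_inc = dst_inc;
    ch->complete = false;
    ch->enabled = true;
}

/* Steps 2-4: each peripheral request moves one element and updates the
 * address/count state. Returns true if an element was moved. */
static bool dma_on_request(dma_channel *ch)
{
    if (!ch->enabled || ch->remaining == 0) {
        return false;
    }
    memcpy(ch->dst, ch->src, ch->width);
    if (ch->src_inc) ch->src += ch->width;
    if (ch->dst_inc) ch->dst += ch->width;
    if (--ch->remaining == 0) {
        ch->enabled = false;
        ch->complete = true; /* Step 5: real hardware would raise an IRQ here */
    }
    return true;
}
```

For a UART-style receive, the source would be the fixed data register (src_inc = false) and the destination a RAM buffer (dst_inc = true); after the configured number of requests, complete is set and further requests are ignored.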

Variants: scatter-gather / linked-list descriptors (one logical transfer spanning many noncontiguous buffers), circular/ping-pong buffers (continuous streaming), and DMA engines integrated with peripherals.
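
To make the scatter-gather idea concrete, here is a rough sketch of a linked descriptor chain that gathers several noncontiguous source buffers into one contiguous destination. The sg_desc layout and sg_gather function are assumptions for illustration; real controllers define their own descriptor formats and walk the chain in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: one source fragment plus a link to the next. */
typedef struct sg_desc {
    const void           *src;  /* start of this fragment */
    size_t                len;  /* fragment length in bytes */
    const struct sg_desc *next; /* NULL terminates the chain */
} sg_desc;

/* Software stand-in for the engine walking the chain.
 * Copies fragments back to back into dst and returns the bytes written.
 * Stops before any fragment that would overflow the destination. */
static size_t sg_gather(const sg_desc *d, uint8_t *dst, size_t dst_cap)
{
    size_t written = 0;
    for (; d != NULL; d = d->next) {
        if (d->len > dst_cap - written) {
            break;
        }
        memcpy(dst + written, d->src, d->len);
        written += d->len;
    }
    return written;
}
```

A typical use is transmitting a packet whose header, payload, and trailer live in separate buffers: the CPU builds three descriptors once and starts a single logical transfer instead of three.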

Issues to worry about

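  • Cache coherency (the big one): On systems with data caches, the CPU may be working on cached copies while the DMA controller reads and writes main memory directly, so the two can disagree about a buffer's contents.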
    • Before a memory→peripheral (TX) DMA, you must clean/flush the cache so DMA reads the latest data.
    • After a peripheral→memory (RX) DMA, you must invalidate the cache so the CPU reads fresh data instead of stale cached values.
    • Many MCUs lack caches (so this is moot), but on Cortex-M7/Cortex-A this is a frequent source of bugs. Solutions: cache maintenance operations, non-cacheable/MPU-marked DMA regions, or hardware cache-coherent interconnects.
  • Memory alignment & transfer width: Buffers often must be aligned to the transfer width/cache-line; misalignment causes faults or wrong behavior. Cache-line alignment also prevents a cache-line-straddling hazard where a clean/invalidate on a DMA buffer’s line corrupts an adjacent CPU variable sharing that line. (This is a data-corruption issue, distinct from “false sharing,” which is an SMP coherence performance effect.)
  • Buffer ownership / lifetime: The buffer must remain valid and untouched by the CPU for the entire transfer. Don't free it, don't put it on a stack that unwinds, and don't write to it mid-transfer. Use clear ownership handoff between CPU and DMA.
  • Address translation: DMA controllers typically work with physical (bus) addresses, not virtual addresses. On an MMU system you must translate and ensure pages are pinned/contiguous (or use an IOMMU/scatter-gather).
  • Memory region reachability: Some DMA engines can't reach all memory (e.g., can access SRAM but not certain TCM/flash, or are limited to a 32-bit address window). Verify the source/dest are DMA-accessible.
  • Synchronization / race conditions: Polling vs. interrupt completion, half-transfer handling, and the order in which you read the DMA count vs. peripheral flags all matter. Use the completion interrupt or the documented flag sequence rather than inferring completion from timing.
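  • volatile and compiler/memory ordering: Status flags and buffers changed by DMA or by an ISR must be accessed as volatile so the compiler re-reads them. volatile alone does not order other memory accesses, so insert barriers (e.g., DSB on Arm) so buffer and configuration writes complete before you enable the channel.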
  • Throughput / bus contention: DMA competing with the CPU for the bus can stall the CPU; choose burst vs. cycle-steal modes and channel priorities carefully for real-time deadlines.
  • Error handling: Handle DMA error/abort interrupts (bus errors, FIFO over/underrun), and watch for overrun if the peripheral outpaces the configured transfer.
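  • Endianness / element size mismatches: The source and destination may differ in element width or byte order, so check how the channel is configured to pack, unpack, or swap data between them.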
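
The cache-line alignment point can be illustrated with a little address arithmetic. The sketch below assumes 32-byte data-cache lines (as on Cortex-M7); CACHE_LINE, DMA_BUF_SIZE, cache_span_for, and shares_line_with_neighbors are hypothetical helpers, and the actual clean/invalidate call is left out because it is platform-specific (for example, the CMSIS-Core by-address cache functions on Cortex-M7).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u /* assumption: 32-byte D-cache lines */

/* Round a byte count up to a whole number of cache lines. */
#define DMA_BUF_SIZE(n) \
    (((size_t)(n) + CACHE_LINE - 1u) & ~(size_t)(CACHE_LINE - 1u))

typedef struct {
    uintptr_t start; /* line-aligned start address */
    size_t    len;   /* length in bytes, a multiple of CACHE_LINE */
} cache_span;

/* The range a by-address clean/invalidate really touches:
 * every cache line that overlaps [addr, addr + len). */
static cache_span cache_span_for(uintptr_t addr, size_t len)
{
    uintptr_t mask  = ~(uintptr_t)(CACHE_LINE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + len + CACHE_LINE - 1u) & mask;
    cache_span s = { start, (size_t)(end - start) };
    return s;
}

/* Nonzero if the buffer's first or last line may also hold unrelated data,
 * which an invalidate could discard or a clean could overwrite. */
static int shares_line_with_neighbors(uintptr_t addr, size_t len)
{
    return (addr % CACHE_LINE) != 0 || (len % CACHE_LINE) != 0;
}

/* A receive buffer that owns its cache lines outright:
 * aligned to a line and padded to a whole number of lines (C11). */
static _Alignas(CACHE_LINE) uint8_t rx_buf[DMA_BUF_SIZE(100)];
```

With these assumptions, an 8-byte buffer at address 0x101C spans two lines (0x1000 to 0x103F), so invalidating it would also drop any dirty CPU data elsewhere in those 64 bytes. Declaring buffers like rx_buf avoids that, at the cost of some padding.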