How do you ensure a DMA buffer is cache-line aligned and why does it matter?

On Cortex-M7, the D-Cache operates on cache lines of 32 bytes. Cache maintenance operations — SCB_InvalidateDCache_by_Addr() and SCB_CleanDCache_by_Addr() — work on entire cache lines, not individual bytes. If a DMA buffer shares a cache line with an unrelated variable, invalidating the buffer's cache line also discards the cached value of that adjacent variable. This causes silent data corruption: the unrelated variable reverts to whatever was last written to SRAM, losing any recent updates that were sitting in the cache.

To prevent this, DMA buffers must be aligned to 32-byte boundaries and sized as multiples of 32 bytes:

```c
#include <stdint.h>

// Correct: 32-byte aligned, size is a multiple of 32
__attribute__((aligned(32)))
static uint8_t dma_rx_buf[64]; // 2 cache lines

// Also correct: placed in a dedicated linker section
// (the linker script must define ".dma_buffer")
__attribute__((section(".dma_buffer"), aligned(32)))
static uint8_t dma_tx_buf[128]; // 4 cache lines
```

Without the alignment attribute, the compiler may place dma_rx_buf at any address that satisfies its natural alignment (1 byte for a uint8_t array). If the 64-byte buffer starts at address 0x20000014, it ends at 0x20000053 and spans three cache lines, starting at 0x20000000, 0x20000020, and 0x20000040. Invalidating the first cache line (0x20000000-0x2000001F) also invalidates bytes 0x20000000-0x20000013, which belong to other variables, and invalidating the last cache line (0x20000040-0x2000005F) does the same to bytes 0x20000054-0x2000005F. With aligned(32), the buffer starts at a cache line boundary, and because its size is a multiple of 32, invalidation affects only the buffer's own data.
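The address arithmetic in this example can be checked with a few small helpers. This is a minimal sketch that runs on any host compiler. The names DCACHE_LINE_SIZE, cache_line_base, cache_lines_spanned, foreign_bytes_before, and foreign_bytes_after are hypothetical and not part of CMSIS.

```c
#include <stdint.h>

/* Assumption: 32-byte D-Cache lines, as on Cortex-M7. */
#define DCACHE_LINE_SIZE 32u

/* Start address of the cache line that contains addr. */
static inline uint32_t cache_line_base(uint32_t addr)
{
    return addr & ~(DCACHE_LINE_SIZE - 1u);
}

/* Number of cache lines touched by [addr, addr + size). Requires size > 0. */
static inline uint32_t cache_lines_spanned(uint32_t addr, uint32_t size)
{
    uint32_t first = cache_line_base(addr);
    uint32_t last  = cache_line_base(addr + size - 1u);
    return (last - first) / DCACHE_LINE_SIZE + 1u;
}

/* Bytes in the buffer's first cache line that precede the buffer. */
static inline uint32_t foreign_bytes_before(uint32_t addr)
{
    return addr - cache_line_base(addr);
}

/* Bytes in the buffer's last cache line that follow the buffer. Requires size > 0. */
static inline uint32_t foreign_bytes_after(uint32_t addr, uint32_t size)
{
    uint32_t end      = addr + size; /* one past the last buffer byte */
    uint32_t line_end = cache_line_base(end - 1u) + DCACHE_LINE_SIZE;
    return line_end - end;
}
```

For a 64-byte buffer at 0x20000014, cache_lines_spanned reports 3 lines, with 20 foreign bytes before the buffer and 12 after it. The same buffer at 0x20000020 spans 2 lines with no foreign bytes on either side.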

The size requirement is equally important. A 50-byte buffer aligned to 32 bytes occupies two cache lines: bytes 0-31 and bytes 32-63, counted from the buffer's start. The buffer itself ends at byte 49, so bytes 50-63 of the second cache line may hold other variables, and invalidating that line corrupts them. The solution is to round the buffer size up to the next multiple of 32 (64 bytes in this case). Some teams define a macro, #define CACHE_ALIGN_SIZE(x) (((x) + 31) & ~31), to automate this.
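One way to apply the rounding macro and have the compiler enforce the size rule is sketched below. It assumes GCC or Clang (for __attribute__((aligned))) and C11 (for static_assert). DCACHE_LINE_SIZE, RX_PAYLOAD_SIZE, and rx_buf_padded are hypothetical names, and the macro is a variant of the one above that takes the line size from a named constant.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte D-Cache lines, as on Cortex-M7. */
#define DCACHE_LINE_SIZE 32u

/* Round x up to the next multiple of the cache line size. */
#define CACHE_ALIGN_SIZE(x) \
    (((x) + (DCACHE_LINE_SIZE - 1u)) & ~(size_t)(DCACHE_LINE_SIZE - 1u))

/* Hypothetical payload size that is not a multiple of 32. */
#define RX_PAYLOAD_SIZE 50u

/* The buffer is padded to 64 bytes so it owns both of its cache lines. */
__attribute__((aligned(32)))
static uint8_t rx_buf_padded[CACHE_ALIGN_SIZE(RX_PAYLOAD_SIZE)];

/* Fail the build if someone later changes the size to break the rule. */
static_assert(sizeof(rx_buf_padded) % DCACHE_LINE_SIZE == 0,
              "DMA buffer size must be a multiple of the cache line size");
static_assert(sizeof(rx_buf_padded) >= RX_PAYLOAD_SIZE,
              "DMA buffer must be large enough for the payload");
```

The DMA transfer length can still be set to RX_PAYLOAD_SIZE. The padding exists only so that no other variable can be placed in the buffer's last cache line.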

Source: CPU Fundamentals Q&A