What issues are a concern for algorithms that read/write data to DRAM instead of SRAM?

Question

Accepted Answer

With SRAM, every access has essentially the same low, deterministic latency, so the access pattern rarely matters for performance. DRAM is very different: latency depends on which bank and row the previous accesses touched, so an algorithm tuned for SRAM can perform poorly against DRAM. Key concerns:

Refresh overhead and jitter. DRAM rows must be periodically refreshed; refresh cycles steal bandwidth and can momentarily stall an access, hurting real-time determinism.
Row activation / precharge latency and open-row locality. Reading a DRAM cell requires activating (opening) its row into the sense-amplifier "row buffer," then a column access, then precharge before another row in that bank can open. Accesses that hit the already-open row are fast; accesses that force closing one row and opening another (a "row miss" / conflict) pay a precharge-then-activate penalty (≈ tRP + tRCD + CAS latency; tRAS is the row’s minimum active-time constraint, not an additive term). So sequential / row-local access is fast; random scatter is slow.
Bank conflicts. DRAM is organized into multiple banks. Interleaving accesses across different banks hides latency (one bank precharges while another is accessed), but repeatedly hammering the same bank with different rows serializes and stalls. A layout or stride that keeps mapping your hot data to different rows of the same bank is pathological, because every access then pays the full row-conflict penalty.
Burst-oriented transfer. DRAM (and the cache hierarchy in front of it) is optimized for cache-line/burst transfers. Reading one byte costs nearly as much as reading a whole burst, so algorithms should consume full cache lines/bursts and avoid touching one word per line.
Caching effects. Because DRAM usually sits behind caches, performance depends heavily on cache locality, prefetcher behavior, and avoiding cache thrashing. None of these matter for on-chip SRAM or tightly-coupled memory that is accessed directly. Measured timing is non-uniform and depends on the address sequence and on what was accessed before.
Non-deterministic timing. All of the above make worst-case latency much larger and harder to bound, which matters for hard real-time code.
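
To make the row-hit versus row-conflict difference concrete, here is a minimal toy model in C. It is only a sketch: the timing values, the 8-bank / 2 KiB-row geometry, the row-granular bank interleaving, and all names (`dram_model`, `dram_access`, `dram_walk`) are assumptions for illustration, not taken from any datasheet or controller. It assumes an open-page policy and ignores refresh, tRAS, and overlap between banks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed timings in memory-clock cycles (illustrative, not from a datasheet). */
enum { T_RP = 14, T_RCD = 14, T_CL = 14 };

/* Assumed geometry: 8 banks, 2 KiB rows, consecutive rows-worth of
 * addresses rotate across banks. Real controllers use other mappings. */
enum { NUM_BANKS = 8, ROW_BYTES = 2048 };

typedef struct {
    int32_t open_row[NUM_BANKS]; /* -1 means the bank is precharged */
} dram_model;

void dram_init(dram_model *m) {
    assert(m != NULL);
    for (int b = 0; b < NUM_BANKS; b++)
        m->open_row[b] = -1;
}

/* Returns the modeled latency of one access and updates the open row. */
unsigned dram_access(dram_model *m, uint32_t addr) {
    assert(m != NULL);
    uint32_t chunk = addr / ROW_BYTES;
    uint32_t bank = chunk % NUM_BANKS;
    int32_t row = (int32_t)(chunk / NUM_BANKS);
    unsigned cycles;

    if (m->open_row[bank] == row)
        cycles = T_CL;                /* row hit: column access only */
    else if (m->open_row[bank] < 0)
        cycles = T_RCD + T_CL;        /* bank idle: activate, then read */
    else
        cycles = T_RP + T_RCD + T_CL; /* conflict: precharge, activate, read */

    m->open_row[bank] = row;
    return cycles;
}

/* Total modeled cycles for n accesses at a fixed byte stride,
 * starting from a fully precharged device. */
unsigned long dram_walk(uint32_t base, uint32_t stride, unsigned n) {
    dram_model m;
    unsigned long total = 0;
    dram_init(&m);
    for (unsigned i = 0; i < n; i++)
        total += dram_access(&m, base + i * stride);
    return total;
}
```

Under these assumed numbers, `dram_walk(0, 64, 32)` stays inside one open row and models to 462 cycles, while `dram_walk(0, ROW_BYTES * NUM_BANKS, 32)` lands on a new row of the same bank every time and models to 1330 cycles. A real controller overlaps bank operations and reorders requests, so treat the ratio as directional only.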

Design guidance: maximize spatial and temporal locality. Access data sequentially or in row order, process it in cache-line-sized blocks, and block/tile large traversals so each block is fully used before moving on. Prefer streaming over random pointer-chasing and scatter-gather, and lay data out so that concurrent streams fall in different banks. Keep latency-critical, small, or hard-real-time data in SRAM/tightly-coupled memory.
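
As one example of blocking/tiling, here is a sketch of a square matrix transpose in C, naive and tiled. The tile edge of 16 elements is an assumption (16 `uint32_t` values fill a 64-byte line, which is itself an assumed line size), and the function names are made up for this example. The right tile size depends on your cache and DRAM geometry, so measure before settling on one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tile edge: 16 uint32_t = 64 bytes, one assumed cache line. */
enum { TILE = 16 };

/* Naive version: reads src sequentially but writes dst with a stride of
 * n elements, so for large n each write touches a different line. */
void transpose_naive(uint32_t *dst, const uint32_t *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Tiled version: works on TILE x TILE blocks so the lines fetched for a
 * block are reused before the traversal moves on. Out-of-place only. */
void transpose_tiled(uint32_t *dst, const uint32_t *src, size_t n) {
    assert(dst != src);
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t jj = 0; jj < n; jj += TILE) {
            /* Clamp the block at the matrix edge when n is not a multiple of TILE. */
            size_t i_end = (ii + TILE < n) ? ii + TILE : n;
            size_t j_end = (jj + TILE < n) ? jj + TILE : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce the same result; only the order in which memory is touched differs. For small matrices that fit in cache the two should perform about the same, and the tiled form is expected to help only once the working set spills to DRAM.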