Build Systems
advanced
Weight: 5/10

Memory Layout & Startup

How VMA/LMA, .data copy, .bss zero, and the C runtime preparation chain make C code work — Reset_Handler from Flash to main().

build-systems
memory-layout
startup-code
Reset_Handler
VMA
LMA
.data
.bss

Quick Cap

After the hardware loads SP and PC from the vector table, firmware is responsible for setting up the C runtime: copying .data from Flash to RAM, zeroing .bss, optionally calling C++ static constructors, and finally jumping to main(). This is what the linker script's _sdata / _edata / _sidata symbols exist to support, and what the Reset_Handler uses them for. Interviewers test whether you understand the why: why .data needs both a Flash and RAM address, why .bss is free in Flash, and why touching globals before this code runs gives garbage.

Key Facts:

  • VMA vs LMA: Virtual Memory Address (where CPU accesses) vs Load Memory Address (where it lives in the binary). .data has both because it must be RAM-writable but Flash-resident.
  • .data is copied from Flash (LMA) to RAM (VMA) by Reset_Handler. Costs Flash AND RAM.
  • .bss is just zeroed in RAM by Reset_Handler. Costs RAM only — zero Flash.
  • Linker symbols _sidata, _sdata, _edata, _sbss, _ebss mark the boundaries the startup code uses.
  • Stack grows downward from top of RAM (_estack); heap grows upward from end of .bss.
  • Until Reset_Handler has copied .data and zeroed .bss, no global variable holds a valid value. Code that runs earlier, including SystemInit in startup files that call it before the copy, cannot rely on globals.

Note: This page covers the C-runtime side of boot. The hardware side — reset sources, vector table, VTOR, clock-tree bring-up, peripheral init order — lives in MCU & System Architecture → Boot, vector table, and clock bring-up.

Deep Dive

At a Glance

| Concept | What it is | Where defined | Where used |
|---|---|---|---|
| VMA | Runtime address (RAM) | Linker script >RAM | CPU at runtime |
| LMA | Storage address (Flash) | Linker script AT>FLASH | Flash image; copied at boot |
| _sdata | Start of .data in RAM | Linker symbol | Startup code copy loop |
| _edata | End of .data in RAM | Linker symbol | Startup code copy loop |
| _sidata | Start of .data init values in Flash | Linker symbol | Startup code copy source |
| _sbss / _ebss | Start/end of .bss in RAM | Linker symbol | Startup code zero loop |
| _estack | Top of RAM (initial SP) | Linker symbol | Vector table entry 0 |

.data, .bss, .rodata: The Three Globals Sections

Every global / static variable in C goes into one of three sections, decided by the compiler based on initialization and const:

| Section | Goes where | What it holds | Flash cost | RAM cost |
|---|---|---|---|---|
| .data | RAM (runtime) | Initialized non-const globals (int x = 42;) | Yes (init values stored at LMA) | Yes (variable lives here) |
| .bss | RAM (runtime) | Uninitialized or zero-initialized globals (int y; or int z = 0;) | No | Yes (zero-filled at boot) |
| .rodata | Flash | const globals (const int k = 10;, string literals) | Yes | No (read directly from Flash) |

The most common RAM-wasting mistake an embedded engineer makes is forgetting const on a large lookup table:

c
static float sine_table[8192] = { 0.0f, 0.00077f, ... }; // 32 KB Flash + 32 KB RAM
static const float sine_table[8192] = { 0.0f, 0.00077f, ... }; // 32 KB Flash, 0 RAM

The first costs both 32 KB Flash (for initial values) AND 32 KB RAM (for the runtime copy). The second costs 32 KB Flash only — the CPU reads it directly. Interviewers test this exact scenario.
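The arithmetic behind those numbers can be written down as a small sketch. The `footprint_t` type and the three helper functions are hypothetical names invented for this illustration; they only encode the section table above and assume a 4-byte `float`.

```c
#include <assert.h>
#include <stddef.h>

#define SINE_TABLE_LEN 8192u /* entry count taken from the example above */

/* Hypothetical helper type: footprint of one global, split by memory. */
typedef struct {
    size_t flash_bytes;
    size_t ram_bytes;
} footprint_t;

/* Initialized non-const global (.data): init image in Flash plus a RAM copy. */
footprint_t footprint_data(size_t object_bytes) {
    footprint_t f = { object_bytes, object_bytes };
    return f;
}

/* const global (.rodata): read in place from Flash, no RAM copy. */
footprint_t footprint_rodata(size_t object_bytes) {
    footprint_t f = { object_bytes, 0u };
    return f;
}

/* Uninitialized or zero-initialized global (.bss): RAM only. */
footprint_t footprint_bss(size_t object_bytes) {
    footprint_t f = { 0u, object_bytes };
    return f;
}
```

With `SINE_TABLE_LEN * sizeof(float)` as the object size, `footprint_data` reports 32 KB in both memories, while `footprint_rodata` reports 32 KB of Flash and no RAM.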

Why .data Needs Both VMA and LMA

.data contains writable variables with non-zero initial values. The constraints conflict:

  • Variables must be in RAM at runtime because the program may write to them, and Flash is read-mostly.
  • Initial values must exist somewhere persistent because RAM loses its contents when power is removed.

The solution is two addresses:

  • LMA in Flash — where the initial values are stored in the binary
  • VMA in RAM — where the variable lives during execution

Linker script:

text
.data : {
    _sdata = .;                /* . is the VMA (RAM) here */
    *(.data*)
    _edata = .;
} >RAM AT>FLASH                /* VMA in RAM, LMA in FLASH */
_sidata = LOADADDR(.data);     /* the LMA: Flash address of init values */

At boot, Reset_Handler copies (_edata - _sdata) bytes, starting at _sidata in Flash, into the range from _sdata to _edata in RAM. After that, the CPU sees initialized variables at their VMA addresses.

Why .bss Is Free in Flash

.bss contains uninitialized (or zero-initialized) globals. The C standard guarantees these are zero at program start, but zero is a single value — there's no need to store a Flash image of all-zeros. The startup code just memsets the .bss region to 0:

text
.bss : {
    _sbss = .;
    *(.bss*)
    *(COMMON)
    _ebss = .;
} >RAM                         /* RAM only, no AT> */

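A 100 KB uninitialized buffer (uint8_t buf[102400];) costs zero Flash because it's in .bss. The same buffer with = {0} lands in .bss on GCC and Clang by default, but some toolchains or options (for example GCC's -fno-zero-initialized-in-bss) place it in .data, where it costs 100 KB of Flash for all-zero initial values. Leaving large buffers without an initializer is the portable way to keep them in .bss.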

Reset_Handler Walkthrough

This is the canonical sequence (Cortex-M, in C; some toolchains use assembly):

c
#include <stdint.h>

extern uint32_t _sidata, _sdata, _edata;  // from linker script
extern uint32_t _sbss, _ebss;
extern void SystemInit(void);             // from CMSIS
extern int main(void);

void __attribute__((noreturn)) Reset_Handler(void) {
    /* 1. Copy .data from Flash (LMA) to RAM (VMA) */
    const uint32_t *src = &_sidata;
    uint32_t *dst = &_sdata;
    while (dst < &_edata) {
        *dst++ = *src++;
    }
    /* 2. Zero .bss */
    dst = &_sbss;
    while (dst < &_ebss) {
        *dst++ = 0;
    }
    /* 3. Call SystemInit (clock tree, Flash wait states) */
    SystemInit();
    /* 4. Run C++ static constructors (if any) */
    /* __libc_init_array() typically does this; bare-metal C may skip */
    /* 5. Jump to main */
    main();
    /* 6. main() should not return; if it does, hang */
    while (1) { }
}

A few subtleties:

  • Steps 1 and 2 must complete before any code reads a global. In the sequence above SystemInit runs after them, so it can use globals. Many vendor startup files call SystemInit first, and in that arrangement it cannot rely on initialized globals.
  • On Cortex-M the hardware loads SP from vector table entry 0 before it fetches Reset_Handler, so the stack is usable on entry and the handler can be an ordinary C function. GCC's naked attribute is intended for functions whose body consists only of basic asm, so it is left out here; startup files that need full control over the prologue write Reset_Handler in assembly instead.
  • The noreturn attribute tells the compiler that Reset_Handler itself never returns, so it can omit the epilogue. It says nothing about main; the trailing infinite loop is what handles a main that returns.
  • Some startup files call __libc_init_array() between SystemInit and main. This runs C++ static constructors (.init_array / .preinit_array). Bare-metal C without C++ can omit it.
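The copy and zero steps can be tried on a host machine by passing the boundaries in as parameters instead of taking them from linker symbols. This is a minimal sketch under that assumption: `crt_copy_data` and `crt_zero_bss` are hypothetical names, and ordinary arrays stand in for Flash and RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the .data init image word by word from its load address to its
 * runtime address. src stands in for &_sidata, dst and dst_end for
 * &_sdata and &_edata. */
void crt_copy_data(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end) {
    assert(src != NULL && dst <= dst_end);
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Zero the .bss range. dst and dst_end stand in for &_sbss and &_ebss. */
void crt_zero_bss(uint32_t *dst, const uint32_t *dst_end) {
    assert(dst <= dst_end);
    while (dst < dst_end) {
        *dst++ = 0u;
    }
}
```

Filling a test array with a junk pattern first, then running both functions over it, makes the "garbage before init, valid after" behaviour visible without any hardware.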

The Memory Map at Runtime

After Reset_Handler completes, RAM looks like this (simplified):

Diagram: RAM Layout After Reset_Handler
 RAM (e.g., 0x20000000 – 0x20020000 = 128 KB)
 ┌─────────────────────────────────────┐ 0x20020000   ← _estack
 │                                     │
 │  Stack                              │ ↓ grows down
 │                                     │
 ├─────────────────────────────────────┤
 │                                     │
 │  Free space                         │
 │                                     │
 ├─────────────────────────────────────┤
 │  Heap                               │ ↑ grows up via malloc
 ├─────────────────────────────────────┤ _ebss
 │  .bss (zeroed by Reset_Handler)     │
 ├─────────────────────────────────────┤ _sbss / _edata
 │  .data (copied from Flash)          │
 └─────────────────────────────────────┘ _sdata = 0x20000000
Stack at top, heap above .bss, .data and .bss copied/zeroed at boot.

And Flash:

Diagram: Flash Layout
 Flash (0x08000000 – 0x08080000 = 512 KB)
 ┌─────────────────────────────────────┐
 │  .isr_vector (vector table)         │
 ├─────────────────────────────────────┤
 │  .text (code)                       │
 ├─────────────────────────────────────┤
 │  .rodata (const globals)            │
 ├─────────────────────────────────────┤
 │  .data init values (LMA)            │  ← copied to RAM at boot
 └─────────────────────────────────────┘
Vector table first, then code, read-only data, and the LMA copy of .data init values.

Stack and Heap Sizing

Stack holds local variables, function call frames, and interrupt context saves.

| Setting | Typical |
|---|---|
| Bare-metal main stack | 2-4 KB starting; measure with stack painting |
| RTOS task minimum | 256 bytes |
| RTOS task with printf | 512 bytes – 2 KB |
| Per-nesting interrupt overhead | ~32 bytes |

Stack painting is the standard technique: fill the stack region with a sentinel pattern (0xDEADBEEF) at startup, then later scan from the bottom to find the high-water mark. The .map file (see ELF, Map & Binary Inspection) shows declared stack and heap sizes; runtime tools confirm actual usage.
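A minimal sketch of stack painting, assuming a descending stack whose lowest address is passed in as `stack_low`. The function names are hypothetical, and on a real target the painting has to happen before the stack is in active use (or only below the current SP).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu

/* Fill the whole stack region with the sentinel pattern. */
void stack_paint(uint32_t *stack_low, size_t words) {
    assert(stack_low != NULL);
    for (size_t i = 0; i < words; i++) {
        stack_low[i] = STACK_PAINT;
    }
}

/* Scan upward from the lowest address. The first overwritten word marks
 * the deepest point the stack has reached. Returns peak usage in bytes. */
size_t stack_high_water_bytes(const uint32_t *stack_low, size_t words) {
    assert(stack_low != NULL);
    size_t untouched = 0;
    while (untouched < words && stack_low[untouched] == STACK_PAINT) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

The scan under-reports if the program happens to push a value equal to the sentinel at the deepest point, so the result is best treated as a lower bound with some margin added.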

Heap is used by malloc. In embedded:

  • Avoid dynamic allocation in production code when possible — fragmentation and non-deterministic timing are real
  • Use memory pools or fixed-block allocators instead of the standard heap
  • If you must use malloc, size the heap generously and instrument allocation tracking

⚠️ Stack-heap collision is silent

If the stack grows down into the heap (or vice versa), the system corrupts memory with no fault. On Cortex-M3+ with MPU, configure a small no-access region between stack bottom and heap top to convert silent corruption into a MemManage fault. On M0 (no MPU), stack painting + periodic runtime checks are your only defense.
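Where no MPU is available, the periodic runtime check can be as simple as a few guard words at the lowest stack addresses. The sketch below assumes that layout; `GUARD_WORD` and both function names are made up for illustration, and the check only detects an overflow after it has happened.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_WORD 0xC0DEC0DEu /* arbitrary pattern chosen for this sketch */

/* Write the guard pattern into the lowest words of the stack region. */
void stack_guard_arm(volatile uint32_t *guard, size_t words) {
    assert(guard != NULL);
    for (size_t i = 0; i < words; i++) {
        guard[i] = GUARD_WORD;
    }
}

/* Returns true while every guard word still holds the pattern. */
bool stack_guard_intact(const volatile uint32_t *guard, size_t words) {
    assert(guard != NULL);
    for (size_t i = 0; i < words; i++) {
        if (guard[i] != GUARD_WORD) {
            return false;
        }
    }
    return true;
}
```

Calling `stack_guard_intact` from a low-rate timer or the idle loop and halting or logging on failure turns silent corruption into something observable.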

Common Boot Failures Tied to Memory Layout

| Symptom | Cause | Fix |
|---|---|---|
| HardFault during init, before main() | .data + .bss exceed RAM, copy loop overruns into stack region | Reduce globals; check .map; consider relocating large buffers to .bss (uninitialized) |
| Globals start with garbage values | Reset_Handler not running or _sdata/_edata symbols mismatch | Verify Reset_Handler is in vector table; check linker script symbol names match startup code |
| Globals start with garbage AFTER OTA update | New firmware's _sidata doesn't match new flash layout | Verify linker script and startup code regenerated for new firmware version |
| Large lookup table consumes RAM | Missing const; table went in .data | Add const; verify it moved to .rodata via nm |
| printf hangs | newlib's _sbrk (heap allocator stub) not properly defined | Implement _sbrk to return heap memory between _end and stack bottom |
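The last row can be sketched as a bump allocator over a bounded region. This is a host-testable illustration rather than a drop-in newlib stub: `heap_region_t` and `heap_sbrk` are hypothetical names, and on a target the bounds would come from the linker symbol that ends .bss and from the lowest address reserved for the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap descriptor for this sketch. */
typedef struct {
    uint8_t *start; /* lowest heap address (end of .bss on target) */
    uint8_t *brk;   /* current program break */
    uint8_t *limit; /* first address the heap must not reach */
} heap_region_t;

/* sbrk-style bump: return the previous break and move it by incr bytes,
 * or (void *)-1 when the request would leave the region. */
void *heap_sbrk(heap_region_t *heap, ptrdiff_t incr) {
    assert(heap != NULL);
    assert(heap->start <= heap->brk && heap->brk <= heap->limit);
    ptrdiff_t room_up = heap->limit - heap->brk;   /* >= 0 */
    ptrdiff_t room_down = heap->start - heap->brk; /* <= 0 */
    if (incr > room_up || incr < room_down) {
        return (void *)-1;
    }
    uint8_t *prev = heap->brk;
    heap->brk += incr;
    return prev;
}
```

A real `_sbrk` for newlib would keep this state in static variables and also set `errno` to `ENOMEM` on failure; the refusal to grow past `limit` is the part that prevents the heap from walking into the stack.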

Debugging Story: Hard Fault on Boot After Adding a Lookup Table

A team added a 32 KB sine lookup table:

c
static float sine_table[8192] = { 0.0f, 0.00077f, ... }; // 32 KB

Build succeeded. Board hard-faulted on power-up, before main() was reached. JTAG showed PC stuck inside the Reset_Handler .data copy loop.

The .map file revealed .data had grown to 35 KB (32 KB lookup table plus normal globals). RAM was 64 KB total, with 4 KB reserved for the stack at the top. Together with .bss, the statically allocated data now extended into that stack region, so Reset_Handler's initialization loops were overwriting the live stack and corrupting the state that the call into SystemInit depended on.

The one-keyword fix:

c
static const float sine_table[8192] = { 0.0f, 0.00077f, ... }; // 32 KB

const moves the table into .rodata (Flash only). RAM usage drops to normal, .data copy completes, board boots.

The lesson: Always const your large lookup tables. Watch the .map file when adding any sizable initialized globals: they cost twice (Flash + RAM) and can blow your RAM budget without any build error.
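The budget check that would have caught this can be sketched in a few lines. `ram_budget_fits` and the `KIB` macro are hypothetical helpers for this illustration; in a real project the same comparison is usually expressed as an ASSERT in the linker script so the build fails instead of the board.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KIB(n) ((size_t)(n) * 1024u)

/* True when static data plus the reserved heap and stack fit in RAM. */
bool ram_budget_fits(size_t data, size_t bss, size_t heap, size_t stack,
                     size_t ram_size) {
    /* Subtract step by step so the running total cannot wrap around. */
    const size_t parts[] = { data, bss, heap, stack };
    size_t remaining = ram_size;
    for (size_t i = 0; i < sizeof parts / sizeof parts[0]; i++) {
        if (parts[i] > remaining) {
            return false;
        }
        remaining -= parts[i];
    }
    return true;
}
```

As an example with one invented figure: 35 KB of .data, an assumed 28 KB of .bss, and a 4 KB stack do not fit in 64 KB, while the same layout with the table moved out of .data (3 KB left) does.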

What Interviewers Want to Hear

  • You can explain VMA vs LMA and why .data needs both
  • You can walk through Reset_Handler's three steps (.data copy, .bss zero, jump to main) without notes
  • You know .bss is free in Flash and can give the example (uninitialized buffer)
  • You can explain why globals are invalid before Reset_Handler completes
  • You know to const large lookup tables
  • You can connect linker symbols (_sdata, _sidata, _sbss) to the code that uses them

Interview Focus

Classic Interview Questions

Q1: "Walk me through what Reset_Handler does between the hardware reset and the call to main()."

Model Answer Starter: "Three core steps. First, copy the .data section from its Load Memory Address in Flash to its Virtual Memory Address in RAM — this is what gives initialized globals their values. The linker provides three symbols: _sidata (Flash source), _sdata (RAM destination start), _edata (RAM destination end). The copy is a tight word-by-word loop. Second, zero the .bss section using _sbss and _ebss — these are uninitialized globals which the C standard says start as zero. Third, call SystemInit() from CMSIS to bring up the clock tree and Flash wait states, then call main(). If main returns, hang in an infinite loop. Anything before step 1 cannot use C globals because their values aren't yet in place."

Q2: "Why does .data need to be copied from Flash to RAM, but .bss doesn't?"

Model Answer Starter: ".data contains globals with non-zero initial values like int x = 42. Those values must exist in Flash so they survive power-off, but the variables must be in RAM because the program may modify them. So .data has both a Load Memory Address in Flash and a Virtual Memory Address in RAM, and the startup code copies the values from one to the other. .bss contains uninitialized globals — the C standard says they're zero. Since zero is a single value, we don't need to store anything in Flash; the startup code just memsets the .bss region to zero in RAM. Practically, this means a 100 KB uninitialized buffer costs zero Flash but 100 KB RAM."

Q3: "What's the Flash and RAM cost of these three declarations?"

c
uint32_t a[1000]; // (1)
uint32_t b[1000] = { 0 }; // (2)
const uint32_t c[1000] = { /* ... */ }; // (3)

Model Answer Starter: "(1) goes in .bss: zero Flash, 4 KB RAM. (2) is interesting because it depends on the toolchain. The C standard says nothing about sections; GCC and Clang place all-zero initialized objects in .bss by default, but some toolchains or settings (for example GCC's -fno-zero-initialized-in-bss) place them in .data, costing 4 KB Flash and 4 KB RAM. (3) is .rodata because it's const: 4 KB Flash, zero RAM, and the CPU reads it directly from Flash. The lesson: always use const for lookup tables, and don't redundantly initialize buffers to zero."

Q4: "What's the difference between VMA and LMA, and where do you specify them in the linker script?"

Model Answer Starter: "VMA is Virtual Memory Address — where the CPU accesses the section at runtime. LMA is Load Memory Address — where the section physically lives in the binary. For most sections (.text, .rodata, .bss) VMA equals LMA: text lives in Flash and runs from Flash; bss lives in RAM and runs from RAM. .data is the special case where they differ: VMA is in RAM (variables must be writable), LMA is in Flash (initial values must persist). The linker script syntax is >VMA_REGION AT>LMA_REGION, so >RAM AT>FLASH for .data. The startup code uses LOADADDR(.data) (typically captured into _sidata) to find the LMA and copies from there."

Q5: "Globals in my firmware have garbage values when I read them in main(). What's wrong?"

Model Answer Starter: "Reset_Handler isn't running the .data copy or .bss zero, or its symbols don't match the linker script. First check: is Reset_Handler actually being executed? Vector table entry 1 should hold its address with bit 0 set (the Thumb bit); on a part whose Flash starts at 0x08000000, verify by reading address 0x08000004 in the debugger. Second: do _sdata/_edata/_sidata/_sbss/_ebss in the startup code match the names in the linker script? Toolchain mismatches (e.g., copying a startup file from one project to another with different symbol naming) silently break this. Third: did the build accidentally use a custom Reset_Handler that does nothing? Set a breakpoint on the copy loop and step through. If _sdata holds an unexpected address, the startup file and the linker script disagree about that symbol."
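The first check in that answer can be expressed as a small helper. This is a sketch with a hypothetical function name; the addresses used to exercise it are invented, and the only assumptions are the Cortex-M conventions that entry 0 is the initial SP, entry 1 is the reset vector, and bit 0 of a vector marks Thumb state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compare vector table entry 1 against the expected handler address. */
bool reset_vector_matches(const uint32_t *vector_table, uint32_t handler_addr) {
    assert(vector_table != 0);
    uint32_t entry = vector_table[1];
    /* Bit 0 must be set (Thumb state) and is not part of the address. */
    bool thumb_bit_set = (entry & 1u) == 1u;
    bool same_address = (entry & ~1u) == (handler_addr & ~1u);
    return thumb_bit_set && same_address;
}
```

In a debugger session the same comparison is done by eye: read the word at the table base plus 4 and compare it, minus the low bit, with the Reset_Handler address from the .map file.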

Trap Alerts

  • Don't say: "main() is the first code that runs" — there's a whole startup chain before it
  • Don't forget: const for lookup tables — the #1 RAM-budget sinkhole
  • Don't ignore: Stack/heap collision — silent and devastating; configure MPU guard if available

Follow-up Questions

  • "What goes wrong if you call malloc before SystemInit completes?"
  • "How would you implement _sbrk for newlib in a bare-metal project?"
  • "What is __libc_init_array and when is it needed?"
  • "How do you put a section in CCM RAM that startup code should leave alone (not zero)?"
  • "What's the difference between .preinit_array, .init_array, and .fini_array?"
  • "How would you debug a Reset_Handler that hangs?"

💡 Cross-link: the hardware side of boot

This page covered the C-runtime side of boot. The hardware side — reset sources, vector table layout, VTOR, clock-tree bring-up, peripheral init order — lives in Boot, vector table, and clock bring-up under MCU & System Architecture. A complete boot answer connects both halves.

💡 Practice Build Systems Interview Questions

Ready to test yourself? Head over to the Build Systems Interview Questions page for a full set of Q&A with collapsible answers — perfect for self-study and mock interview practice.

Practice

A 10 KB uninitialized buffer declared as `static uint8_t buf[10240];` — how much Flash does it consume?

Why must .data have both a VMA and an LMA?

What is the Reset_Handler responsible for, in order?

What's the difference between `int arr[100];` and `const int arr[100] = { ... };` in terms of Flash and RAM cost?

What is `_sidata` in a typical Cortex-M linker script and startup code?

Real-World Tie-In

RAM Budget Audit Before Adding ML Inference — A team adding a TinyML inference engine ran arm-none-eabi-size firmware.elf and inspected the .map file before integration. .bss was at 38 KB of 64 KB; the ML model's input/output buffers would push it over. They moved several large status arrays from .data (initialized to specific values once at startup) to .bss with explicit init in main(), freeing 8 KB of RAM and making the integration fit.

Bootloader / Application Symbol Independence — A dual-image firmware uses separate linker scripts for bootloader and application. Each has its own _sdata/_sidata/_sbss/_ebss because each has its own .data and .bss in their respective Flash regions. The bootloader's Reset_Handler initializes only the bootloader's RAM; when it jumps to the application, the application's Reset_Handler initializes the application's RAM.

Field Failure Traced to Missing const — A field-deployed sensor showed 1-in-1000 boards failing to boot. Investigation revealed a recently-added 4 KB calibration table without const, which had pushed the .data copy slightly past the stack guard on chips with manufacturing-variation timing. Adding const moved the table to .rodata (Flash-only), eliminated the RAM pressure, and restored 100% boot success.
