Quick Cap
When an embedded MCU powers on, it does not jump straight to main(). The hardware loads the initial stack pointer and reset vector from the vector table, then executes startup code that copies initialized data from Flash to RAM, zeros out uninitialized data, and calls SystemInit() before finally jumping to main(). The linker script controls where all of this ends up in memory — which sections go in Flash, which go in RAM, and where the stack and heap live. Understanding this chain is essential for debugging hard faults on boot, optimizing memory usage, and implementing bootloaders.
Key Facts:
- First two words in Flash: Initial Stack Pointer (address 0x00) and Reset_Handler address (address 0x04)
- Startup code copies .data from Flash to RAM and zeros .bss; this is why global variables have correct initial values
- Linker script defines MEMORY regions (Flash, RAM) and SECTIONS (.text, .data, .bss, stack, heap)
- .data vs .bss: .data has initial values stored in Flash; .bss is just zeroed, so no Flash space is wasted
- Stack grows downward from top of RAM; heap grows upward from end of .bss
- VTOR register allows relocating the vector table to RAM (used by bootloaders and RTOS)
Deep Dive
At a Glance
| Characteristic | Detail |
|---|---|
| Reset vector address | 0x00000004 (Cortex-M), mapped to Flash at 0x08000004 on STM32 |
| Hardware-provided init | SP loaded from vector[0], PC loaded from vector[1] |
| Startup code language | Assembly (startup_stm32.s) or C (Reset_Handler) |
| Linker script format | GNU LD syntax (.ld file) |
| Key linker symbols | _sdata, _edata, _sidata, _sbss, _ebss, _estack |
| Typical stack size | 1-8 KB (bare metal), 256 B - 4 KB per RTOS task |
| Vector table alignment | Must be aligned to a power-of-2 boundary no smaller than the table size (at least 128 bytes) |
The Boot Sequence
Here is what happens from power-on to main() on a Cortex-M:
    Power On / Reset
                  │
                  ▼
    ┌─────────────────────────────────┐
    │ 1. Hardware loads SP from 0x00  │ ← Vector table word 0
    │ 2. Hardware loads PC from 0x04  │ ← Vector table word 1 (Reset_Handler)
    └─────────────┬───────────────────┘
                  │
                  ▼
    ┌─────────────────────────────────┐
    │ 3. Reset_Handler() executes     │
    │    a. Copy .data: Flash → RAM   │ ← Initialized globals get their values
    │    b. Zero .bss in RAM          │ ← Uninitialized globals become 0
    │    c. Call SystemInit()         │ ← Clock tree, Flash wait states
    │    d. Call main()               │
    └─────────────┬───────────────────┘
                  │
                  ▼
    ┌─────────────────────────────────┐
    │ 4. main() runs                  │
    │    (if main returns → hang)     │
    └─────────────────────────────────┘
The critical insight: until Reset_Handler has finished the copy and zero steps, no C global variable is valid. The .data section has not been copied, the .bss section has not been zeroed, and the clock is running on the default HSI (no PLL). Some vendor startup files call SystemInit() before the .data/.bss initialization rather than after it, which is why SystemInit() should not rely on any global state and should work purely with register-level operations.
Memory Map
On a typical Cortex-M MCU (e.g., STM32F4 with 512 KB Flash, 128 KB RAM):
    Flash (0x08000000)                  RAM (0x20000000)
    ┌──────────────────┐                ┌──────────────────┐
    │ Vector Table     │ ← SP, Reset,   │ .data            │ ← Copied from Flash
    │ (IRQ addresses)  │   NMI, faults  │ (initialized     │   by startup code
    ├──────────────────┤                │  globals)        │
    │ .text            │                ├──────────────────┤
    │ (program code)   │                │ .bss             │ ← Zeroed by
    │                  │                │ (uninitialized   │   startup code
    │                  │                │  globals)        │
    ├──────────────────┤                ├──────────────────┤
    │ .rodata          │                │ Heap ↓           │ ← malloc() grows toward
    │ (const strings,  │                │                  │   higher addresses
    │  lookup tables)  │                │ (free space)     │
    ├──────────────────┤                │                  │
    │ .data init       │ ← Initial      ├──────────────────┤
    │ values           │   values for   │ Stack ↑          │ ← Grows toward lower
    │ (LMA, not VMA)   │   .data copy   │ (from _estack)   │   addresses (toward heap)
    └──────────────────┘                └──────────────────┘
Why .data and .bss Are Different
This is a classic interview question:
| Section | Contains | Stored in Flash? | Initialized by | Why? |
|---|---|---|---|---|
| .data | int x = 42; | Yes (initial values stored at LMA) | Startup code copies Flash LMA to RAM VMA | Variables need their compile-time values at runtime |
| .bss | int y; (uninitialized) | No (zeros take no Flash space) | Startup code memsets to 0 | C standard guarantees uninitialized globals are zero |
Why this matters: If you have a 100 KB array declared as uint8_t buffer[102400]; (uninitialized), it costs zero Flash because it goes in .bss. But if you declare it as uint8_t buffer[102400] = {0};, some toolchains may place it in .data, consuming 100 KB of Flash for all-zero initial values. The fix: declare it without an initializer and rely on the .bss zero-fill.
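As a rough illustration of the distinction, the sketch below shows three declarations together with the section a typical GCC-style toolchain is expected to choose for each. The variable names are hypothetical, and actual placement depends on the compiler and its flags, so the map file remains the authority.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
int boot_count = 42;               /* initialized, writable: typically .data (Flash + RAM) */
static uint8_t rx_buffer[1024];    /* no initializer: typically .bss (RAM only, zero-filled) */
const uint16_t crc_seed = 0xFFFF;  /* const: typically .rodata (Flash only) */

/* Returns 1 if every byte of rx_buffer is zero, as the C standard guarantees
 * for objects with static storage duration and no initializer. */
int rx_buffer_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof rx_buffer; i++) {
        if (rx_buffer[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

On a bare-metal target, rx_buffer_is_zeroed() only returns 1 because the startup code has zeroed .bss before main() runs.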
Linker Script Anatomy
A linker script has two main blocks: MEMORY defines the physical memory regions, and SECTIONS maps code and data into those regions.
    MEMORY
    {
      FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
      RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
    }

    _estack = ORIGIN(RAM) + LENGTH(RAM);  /* Stack starts at top of RAM */

    SECTIONS
    {
      .isr_vector : { KEEP(*(.isr_vector)) } >FLASH

      .text : { *(.text*) *(.rodata*) } >FLASH

      _sidata = LOADADDR(.data);          /* Flash address of .data init values */

      .data : {
        _sdata = .;
        *(.data*)
        _edata = .;
      } >RAM AT>FLASH                     /* VMA in RAM, LMA in Flash */

      .bss : {
        _sbss = .;
        *(.bss*) *(COMMON)
        _ebss = .;
      } >RAM
    }
Key concepts:
- VMA (Virtual Memory Address) = where the CPU accesses the data at runtime (RAM)
- LMA (Load Memory Address) = where the data physically lives in the binary (Flash)
- >RAM AT>FLASH = VMA is in RAM, LMA is in Flash. Startup code bridges the gap.
- _sidata, _sdata, _edata = linker-generated symbols that startup code uses to know what to copy and where
- KEEP() = prevents the linker from discarding the section during garbage collection (critical for the vector table)
The Startup Code
The startup code (typically Reset_Handler in startup_stm32.s or a C equivalent) does the absolute minimum to make C code work:
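    #include <stdint.h>

    /* Symbols defined by the linker script; only their addresses are meaningful. */
    extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;

    void SystemInit(void);
    int main(void);

    void Reset_Handler(void) {
        /* 1. Copy .data from Flash to RAM */
        uint32_t *src = &_sidata;
        uint32_t *dst = &_sdata;
        while (dst < &_edata) *dst++ = *src++;

        /* 2. Zero .bss */
        dst = &_sbss;
        while (dst < &_ebss) *dst++ = 0;

        /* 3. Configure clocks (still on HSI at this point) */
        SystemInit();

        /* 4. Jump to main */
        main();

        /* 5. If main returns, hang */
        while (1);
    }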
This code runs before the C runtime is fully initialized. Anything that depends on global variables (including some library functions) cannot be called until steps 1 and 2 have completed: step 1 for initialized globals, step 2 for zero-initialized ones.
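One way to convince yourself the two loops behave as described is to run them on a host against ordinary arrays. The sketch below is a generic form of the copy and zero steps; the function name and parameters are hypothetical, and on a target the arguments would be the addresses of the linker symbols.

```c
#include <stdint.h>

/* Generic form of the two startup loops. On target the arguments would be
 * &_sidata, &_sdata, &_edata, &_sbss and &_ebss from the linker script. */
void init_data_and_bss(const uint32_t *load_src,   /* LMA of .data (Flash) */
                       uint32_t *data_start, uint32_t *data_end,
                       uint32_t *bss_start, uint32_t *bss_end)
{
    /* Copy initial values word by word from the load address to RAM. */
    for (uint32_t *dst = data_start; dst < data_end; ) {
        *dst++ = *load_src++;
    }
    /* Zero-fill the .bss range. */
    for (uint32_t *dst = bss_start; dst < bss_end; ) {
        *dst++ = 0u;
    }
}
```

Passing a small const array as the fake Flash image and a pre-filled array as fake RAM makes it easy to confirm that the .data words arrive intact and the .bss words end up zero.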
Vector Table and VTOR
The vector table is an array of 32-bit addresses at the start of Flash. The Cortex-M hardware reads entries from it automatically:
| Index | Content | Used When |
|---|---|---|
| 0 | Initial SP value | Loaded into SP on reset |
| 1 | Reset_Handler address | CPU jumps here on reset |
| 2 | NMI_Handler | Non-maskable interrupt |
| 3 | HardFault_Handler | Unrecoverable error |
| 4-6 | MemManage, BusFault, UsageFault | Configurable fault handlers (entries 7-10 are reserved on Cortex-M3/M4/M7) |
| 11 | SVC_Handler | Supervisor call (RTOS uses this) |
| 14 | PendSV_Handler | Deferred context switch (RTOS) |
| 15 | SysTick_Handler | System timer tick |
| 16+ | IRQ0_Handler, IRQ1_Handler, ... | Peripheral interrupts |
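In C, the table above is usually just a const array. The following is a minimal host-buildable sketch of the first 16 entries: the union type, the stub handlers and fake_estack are assumptions made for illustration, and a real project would take _estack from the linker script and the handlers from the vendor startup file.

```c
#include <stdint.h>

typedef void (*handler_t)(void);

/* One table slot: word 0 holds the initial SP, every other word a handler. */
typedef union {
    void     *initial_sp;
    handler_t handler;
} vector_entry_t;

/* Stub handlers so the sketch links on a host. */
void Reset_Handler(void)   { }
void Default_Handler(void) { }

/* Stand-in for the linker symbol _estack (assumption for host builds). */
static uint32_t fake_estack;

/* On target this array would also carry
 * __attribute__((section(".isr_vector"), used)) so that the linker script's
 * KEEP(*(.isr_vector)) places it at the start of Flash. */
const vector_entry_t vector_table[16] = {
    [0]  = { .initial_sp = &fake_estack },
    [1]  = { .handler = Reset_Handler },
    [2]  = { .handler = Default_Handler },  /* NMI */
    [3]  = { .handler = Default_Handler },  /* HardFault */
    [11] = { .handler = Default_Handler },  /* SVCall */
    [14] = { .handler = Default_Handler },  /* PendSV */
    [15] = { .handler = Default_Handler },  /* SysTick */
};
```

Entries left out of the initializer are zero-filled, which matches the convention of storing 0 in reserved slots.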
VTOR (Vector Table Offset Register): By default, the vector table is at address 0x00000000 (aliased to Flash on most STM32). The VTOR register lets you relocate it — essential for:
- Bootloaders: The bootloader has its own vector table; the application has another at a different Flash offset
- RAM execution: Copy the vector table to RAM for faster interrupt response or to allow runtime modification of ISR addresses
- RTOS: Some RTOS implementations modify interrupt vectors at runtime
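Before writing a new address to VTOR, it is worth checking that the address satisfies the alignment rule from the At a Glance table. The helper names below are hypothetical; the rule they encode is that the table size (number of entries times 4 bytes) is rounded up to the next power of two, with a floor of 128 bytes.

```c
#include <stdint.h>

/* Required alignment, in bytes, for a vector table with `num_vectors`
 * 32-bit entries (16 system exceptions plus the device's IRQs). */
uint32_t vector_table_alignment(uint32_t num_vectors)
{
    uint32_t size_bytes = num_vectors * 4u;
    uint32_t align = 128u;              /* minimum alignment */
    while (align < size_bytes) {
        align <<= 1;                    /* round up to the next power of two */
    }
    return align;
}

/* Returns 1 if `addr` is a legal VTOR value for a table of that size. */
int vtor_address_ok(uint32_t addr, uint32_t num_vectors)
{
    return (addr & (vector_table_alignment(num_vectors) - 1u)) == 0u;
}
```

For example, a part with 82 peripheral interrupts has 98 entries (392 bytes), so its table must sit on a 512-byte boundary; an application placed at 0x08008000 qualifies, while 0x08008100 does not.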
Stack and Heap Sizing
Stack is used for local variables, function call frames, and interrupt context saves. Too small causes hard-to-debug crashes; too large wastes RAM.
Rules of thumb for sizing:
- Bare-metal main stack: Start with 2-4 KB, measure with stack painting
- Each RTOS task: 256 B (minimal) to 2 KB (with printf/floating-point)
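- Interrupt nesting: Add at least 32 bytes per potential nesting level for the hardware-stacked exception frame (8 words; 26 words, or 104 bytes, when FPU context is also stacked), plus whatever each handler uses itself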
- Recursive functions: Avoid in embedded, or carefully bound recursion depth
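These rules of thumb can be combined into a first-pass estimate. The sketch below is only a back-of-the-envelope calculation under stated assumptions: the function and parameter names are hypothetical, it uses the 32-byte frame without FPU context, and the result is a starting point to be confirmed by measurement.

```c
#include <stddef.h>

#define BASIC_FRAME_BYTES 32u   /* 8 words pushed by hardware on exception entry */

/* First-pass main-stack estimate: thread-mode usage plus, for each level of
 * interrupt nesting, one hardware frame and that handler's own locals,
 * scaled by a safety margin given in percent. */
size_t stack_budget_bytes(size_t thread_bytes, unsigned nesting_levels,
                          size_t isr_local_bytes, unsigned margin_pct)
{
    size_t per_level = BASIC_FRAME_BYTES + isr_local_bytes;
    size_t worst = thread_bytes + (size_t)nesting_levels * per_level;
    return worst * (100u + margin_pct) / 100u;
}
```

With 2048 bytes of thread-mode usage, three nesting levels at 64 bytes of locals each and a 25% margin, the estimate is (2048 + 3 × 96) × 1.25 = 2920 bytes.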
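Stack painting is a debugging technique: fill the entire stack region with a known pattern (e.g., 0xDEADBEEF) at startup. Later, scan upward from the bottom of the region (its lowest address); the first word that no longer holds the pattern marks the high-water mark. The linker map file (-Wl,-Map=output.map) complements this by showing the exact sizes of .bss and .data and the space reserved for the stack, although it cannot show how deep the stack actually grows at runtime.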
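A minimal sketch of stack painting, written so it can be exercised on a host against a plain array. The function names are hypothetical; on a target, the region bounds would come from linker symbols, and the paint routine must stop below the current stack pointer so it does not overwrite live frames.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_PAINT 0xDEADBEEFu

/* Fill the whole stack region with the sentinel. `lowest` is the lowest
 * address of the region; the stack grows down toward it. */
void stack_paint(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_PAINT;
    }
}

/* Scan up from the lowest address; the first overwritten word marks the
 * deepest point the stack has reached. Returns peak usage in bytes. */
size_t stack_peak_bytes(const uint32_t *lowest, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && lowest[untouched] == STACK_PAINT) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

The scan can under-report if the application happens to write the sentinel value itself, so treat the result as a measurement to be padded with margin rather than an exact bound.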
Heap is used by malloc(), which is generally discouraged in embedded systems due to fragmentation and non-deterministic allocation time. If you need dynamic allocation, prefer a fixed-block allocator or memory pool over the standard C heap; if you must use malloc(), size the heap generously.
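For reference, a fixed-block pool can be very small. The sketch below is one possible shape, not a production allocator: block size, block count and all names are hypothetical, and it is not interrupt-safe or thread-safe as written.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  32u   /* bytes per block (hypothetical) */
#define POOL_BLOCK_COUNT 8u    /* number of blocks (hypothetical) */

/* A free block stores the pointer to the next free block in its own bytes. */
typedef union pool_block {
    union pool_block *next;
    uint8_t payload[POOL_BLOCK_SIZE];
} pool_block_t;

static pool_block_t pool_storage[POOL_BLOCK_COUNT];
static pool_block_t *pool_free_list;

/* Thread every block onto the free list. Call once before first use. */
void pool_init(void)
{
    pool_free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool_storage[i].next = pool_free_list;
        pool_free_list = &pool_storage[i];
    }
}

/* O(1) allocation: pop the head of the free list, or NULL if exhausted. */
void *pool_alloc(void)
{
    pool_block_t *block = pool_free_list;
    if (block != NULL) {
        pool_free_list = block->next;
    }
    return block;
}

/* O(1) release: push the block back. `ptr` must come from pool_alloc(). */
void pool_free(void *ptr)
{
    pool_block_t *block = (pool_block_t *)ptr;
    block->next = pool_free_list;
    pool_free_list = block;
}
```

Because every block is the same size, the pool cannot fragment, and both operations take constant time, which addresses the two objections to malloc() raised above.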
If the stack grows into the heap (or vice versa), the system corrupts memory silently — no fault is generated. On Cortex-M3/M4/M7, enable the MPU to create a guard region between stack and heap. On M0 (no MPU), use stack painting and periodic runtime checks.
Placing Code or Data at Fixed Addresses
Sometimes you need to place a variable or function at a specific address — for example, a firmware version at a fixed Flash offset so a bootloader can read it, or a DMA buffer at a specific RAM alignment:
    /* Place a version struct at a fixed Flash address */
    __attribute__((section(".fw_version")))
    const struct { uint32_t major, minor, patch; } fw_version = {1, 2, 3};
Then in the linker script:
.fw_version 0x0800FF00 : { KEEP(*(.fw_version)) } >FLASH
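On the reading side, a bootloader or host tool only needs the agreed offset. The sketch below assumes the three-word layout shown above and a Flash base of 0x08000000; the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t major, minor, patch;
} fw_version_t;

#define FW_VERSION_OFFSET 0xFF00u   /* 0x0800FF00 minus the Flash base 0x08000000 */

/* Copy the version record out of a Flash image. On target, `flash_base`
 * would be (const uint8_t *)0x08000000; on a host it can be any buffer
 * holding the image. memcpy avoids alignment and aliasing assumptions. */
fw_version_t read_fw_version(const uint8_t *flash_base)
{
    fw_version_t v;
    memcpy(&v, flash_base + FW_VERSION_OFFSET, sizeof v);
    return v;
}
```

Taking the base pointer as a parameter lets the same function run in the bootloader and in a PC-side tool that inspects a .bin file.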
When asked "how do you place a variable at a fixed address?", mention both the __attribute__((section())) in C and the corresponding linker script entry. Many candidates know only one half.
Debugging Story: Hard Fault on Boot After Adding a Large Array
A team added a 32 KB lookup table to their firmware:
static float sine_table[8192] = { 0.0f, 0.00077f, ... }; /* 32 KB */
The firmware compiled and linked without errors, but the device hard-faulted immediately on boot, before main() was even reached. The .map file showed that Flash usage was fine (the init values fit), but .data in RAM had grown until it ran into the space the stack needed at the top of the 64 KB SRAM. The startup code was trying to copy 32 KB of float values from Flash to RAM, overwriting the stack region.
The fix: declare the table as const:
static const float sine_table[8192] = { 0.0f, 0.00077f, ... };
Now it lives in .rodata (Flash only) — zero RAM cost. The CPU reads it directly from Flash.
The lesson: Always use const for large lookup tables and constant data. Without const, initialized data goes to .data and consumes both Flash (for initial values) and RAM (for the runtime copy). With const, it stays in Flash only.
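The arithmetic behind this failure is easy to check once the section sizes are read from the map file. In the sketch below the function name is hypothetical, and the sizes in the usage note are made-up illustrations, apart from the 32 KB table and the 64 KB SRAM taken from the story.

```c
#include <stdint.h>

/* Returns the bytes left for the stack after .data, .bss and the heap are
 * placed from the bottom of RAM, or -1 if they already exceed it. */
int64_t ram_left_for_stack(uint32_t ram_size, uint32_t data_size,
                           uint32_t bss_size, uint32_t heap_size)
{
    uint64_t used = (uint64_t)data_size + bss_size + heap_size;
    if (used > ram_size) {
        return -1;
    }
    return (int64_t)(ram_size - used);
}
```

With a hypothetical 40 KB of .data (including the 32 KB table), 20 KB of .bss and a 2 KB heap in 64 KB of SRAM, only 2 KB remains for the stack. Moving the table to .rodata drops .data to 8 KB and leaves 34 KB.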
What Interviewers Want to Hear
- You can walk through the complete boot sequence: power-on, SP/PC load from vector table, startup code, SystemInit, main
- You understand why .data is copied and .bss is zeroed, and the Flash cost implications
- You can read a basic linker script and explain MEMORY, SECTIONS, VMA vs LMA
- You know what VTOR does and why bootloaders need it
- You can explain how to size the stack and detect overflow
- You understand the difference between .data and .rodata (const matters)
Interview Focus
Classic Interview Questions
Q1: "Describe what happens from power-on until main() is called."
Model Answer Starter: "On Cortex-M, the hardware reads two words from the start of Flash: the initial stack pointer (loaded into SP) and the Reset_Handler address (loaded into PC). Reset_Handler is the first code to execute. It copies the .data section from Flash to RAM so that initialized globals have their values, then zeros the .bss section for uninitialized globals. Next it calls SystemInit() to configure the clock tree and Flash wait states. Finally it calls main(). Until Reset_Handler has finished the copy and zero steps, no C global variable is valid because the memory initialization has not happened yet."
Q2: "Why does .data need to be copied from Flash to RAM? Why doesn't .bss?"
Model Answer Starter: ".data contains variables with initial values like int x = 42. Those values must exist in Flash (they are part of the binary), but the variables must be writable at runtime, so they must live in RAM. Startup code copies the initial values from Flash (LMA) to RAM (VMA). .bss contains uninitialized variables — the C standard says they are zero. Since zero is a single value, we do not need to store anything in Flash; startup code just memsets the .bss region to zero. This saves Flash space: a 10 KB uninitialized buffer costs zero Flash in .bss but would cost 10 KB in .data."
Q3: "What does a linker script define? Walk me through the key sections."
Model Answer Starter: "A linker script has two main parts. MEMORY defines physical regions — typically FLASH with rx permissions and RAM with rwx. SECTIONS maps code and data into those regions: .text and .rodata go to Flash, .data has its VMA in RAM but its LMA in Flash (using the AT> directive), .bss goes to RAM only. The linker generates symbols like _sdata, _edata, _sidata that startup code uses to know what to copy. The script also defines _estack at the top of RAM for the initial stack pointer."
Q4: "How would you detect a stack overflow on a bare-metal system?"
Model Answer Starter: "Several approaches. Stack painting: fill the stack with a sentinel value (0xDEADBEEF) at startup and periodically check how much has been overwritten — the first untouched word is the high-water mark. MPU guard region: on Cortex-M3+ with MPU, configure a small no-access region between the stack bottom and heap top; any stack overflow triggers a MemManage fault. Hardware watchpoints: set a data watchpoint at the stack limit address. In production, I combine stack painting for development-time measurement with an MPU guard for runtime protection."
Q5: "How do you place a function or variable at a fixed memory address?"
Model Answer Starter: "Two parts: in C, use __attribute__((section(".my_section"))) to assign the symbol to a custom section. In the linker script, place that section at a fixed address: .my_section 0x0800FF00 : { KEEP(*(.my_section)) } >FLASH. Use KEEP to prevent the linker from discarding it during garbage collection. This is commonly used for firmware version headers at fixed offsets, bootloader shared data, or DMA buffers that need specific alignment."
Trap Alerts
- Don't say: "The boot process is simple — it just runs main()" — this misses the entire memory initialization chain that makes C work
- Don't forget: const on large lookup tables; without it, they consume both Flash AND RAM
- Don't ignore: That SystemInit runs before main but after .data/.bss init; clock configuration happens at this specific point
Follow-up Questions
- "What is VTOR and when would you relocate the vector table?"
- "What does SystemInit() typically do?"
- "How do you determine stack and heap sizes for your application?"
- "What happens if the stack and heap collide?"
- "How do you implement a dual-bank bootloader with separate vector tables?"
Practice
❓ What are the first two values the Cortex-M hardware reads from the vector table on reset?
❓ A 10 KB uninitialized buffer declared as 'static uint8_t buf[10240];' — how much Flash does it consume?
❓ Why must .data have both a VMA and an LMA?
❓ What is the most likely cause of a hard fault that occurs during startup, before main() is reached?
Real-World Tie-In
OTA-Capable IoT Device — A LoRa sensor node uses a dual-bank Flash layout: bank A holds the running firmware, bank B receives OTA updates. Each bank has its own vector table at a known offset. The bootloader at the start of Flash validates the CRC of each bank, selects the valid one, sets VTOR to point to that bank's vector table, and jumps to its Reset_Handler. If the new firmware crashes (watchdog reset), the bootloader falls back to the previous bank on next boot.
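Alongside the CRC, a bootloader like this one often runs a cheap plausibility check on a bank's first two vector-table words before jumping. The sketch below is one possible check, not a description of any particular product: the function name is hypothetical, and the memory bounds assume the 512 KB Flash / 128 KB RAM part used in the Memory Map example.

```c
#include <stdint.h>

#define FLASH_BASE_ADDR 0x08000000u
#define FLASH_SIZE      (512u * 1024u)
#define RAM_BASE_ADDR   0x20000000u
#define RAM_SIZE        (128u * 1024u)

/* Plausibility check on the first two words of a candidate vector table:
 * word 0 (initial SP) must be word-aligned and point into RAM, where the
 * top-of-RAM address is allowed because the stack is full-descending;
 * word 1 (reset vector) must point into Flash with the Thumb bit set. */
int app_vectors_plausible(const uint32_t *vectors)
{
    uint32_t sp    = vectors[0];
    uint32_t reset = vectors[1];

    int sp_ok = sp > RAM_BASE_ADDR
                && sp <= RAM_BASE_ADDR + RAM_SIZE
                && (sp & 3u) == 0u;
    int reset_ok = (reset & 1u) == 1u
                   && reset >= FLASH_BASE_ADDR
                   && reset < FLASH_BASE_ADDR + FLASH_SIZE;
    return sp_ok && reset_ok;
}
```

An erased bank reads back as 0xFFFFFFFF in both words and is rejected immediately, which saves running a CRC over an empty image.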
Safety-Critical Motor Controller — An automotive ECU reserves a fixed Flash region (0x0800FF00) for a firmware metadata struct containing version, build date, CRC, and safety certification ID. The linker script places this at a fixed address so the bootloader and diagnostic tools can read it without knowing the firmware layout. Stack is sized at 4 KB with an MPU guard region, and stack painting during factory test verifies worst-case usage stays under 3.2 KB with 25% margin.