MCU & System Architecture
intermediate
Weight: 4/10

Boot, startup code, and linker scripts

Understand what happens from power-on to main(), how startup code initializes memory, and how linker scripts control the memory layout of embedded systems.

mcu
boot
startup
linker
memory-map
vector-table
initialization

Quick Cap

When an embedded MCU powers on, it does not jump straight to main(). The hardware loads the initial stack pointer and reset vector from the vector table, then executes startup code that copies initialized data from Flash to RAM, zeros out uninitialized data, and calls SystemInit() before finally jumping to main(). The linker script controls where all of this ends up in memory — which sections go in Flash, which go in RAM, and where the stack and heap live. Understanding this chain is essential for debugging hard faults on boot, optimizing memory usage, and implementing bootloaders.

Key Facts:

  • First two words in Flash: Initial Stack Pointer (address 0x00) and Reset_Handler address (address 0x04)
  • Startup code copies .data from Flash to RAM and zeros .bss — this is why global variables have correct initial values
  • Linker script defines MEMORY regions (Flash, RAM) and SECTIONS (.text, .data, .bss, stack, heap)
  • .data vs .bss: .data has initial values stored in Flash; .bss is just zeroed — no Flash space wasted
  • Stack grows downward from top of RAM; heap grows upward from end of .bss
  • VTOR register allows relocating the vector table to RAM (used by bootloaders and RTOS)

Deep Dive

At a Glance

Characteristic | Detail
Reset vector address | 0x00000004 (Cortex-M), mapped to Flash at 0x08000004 on STM32
Hardware-provided init | SP loaded from vector[0], PC loaded from vector[1]
Startup code language | Assembly (startup_stm32.s) or C (Reset_Handler)
Linker script format | GNU LD syntax (.ld file)
Key linker symbols | _sdata, _edata, _sidata, _sbss, _ebss, _estack
Typical stack size | 1-8 KB (bare metal), 256 B - 4 KB per RTOS task
Vector table alignment | Must be aligned to a power-of-2 boundary at least as large as the table (minimum 128 bytes)

The Boot Sequence

Here is what happens from power-on to main() on a Cortex-M:

Power On / Reset
┌─────────────────────────────────┐
│ 1. Hardware loads SP from 0x00 │ ← Vector table word 0
│ 2. Hardware loads PC from 0x04 │ ← Vector table word 1 (Reset_Handler)
└─────────────┬───────────────────┘
┌─────────────────────────────────┐
│ 3. Reset_Handler() executes │
│ a. Copy .data: Flash → RAM │ ← Initialized globals get their values
│ b. Zero .bss in RAM │ ← Uninitialized globals become 0
│ c. Call SystemInit() │ ← Clock tree, Flash wait states
│ d. Call main() │
└─────────────┬───────────────────┘
┌─────────────────────────────────┐
│ 4. main() runs │
│ (if main returns → hang) │
└─────────────────────────────────┘

The critical insight: until Reset_Handler has finished steps 3a and 3b, no C global variable is valid. The .data section has not been copied, the .bss section has not been zeroed, and the clock is running on the default HSI (no PLL). The sequence above calls SystemInit() after memory initialization, but some vendor startup files call it first, which is why SystemInit() is conventionally written to avoid global state and work purely with register-level operations.
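To make steps 1 and 2 concrete, here is a minimal host-side sketch that decodes the first two words of a raw firmware image the way the core does on reset. The helper names (`read_le32`, `read_reset_words`, `reset_handler_address`) and the sample byte values are illustrative assumptions, not part of any vendor API. The sketch also shows one detail worth knowing: on Cortex-M, bit 0 of the reset vector is the Thumb bit, so the handler's actual address is the vector value with bit 0 cleared.

```c
#include <stdint.h>

/* Hypothetical helper: decode one little-endian 32-bit word from a raw image. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The two words the core fetches in hardware on reset. */
typedef struct {
    uint32_t initial_sp;    /* vector[0]: value loaded into SP */
    uint32_t reset_vector;  /* vector[1]: value loaded into PC (bit 0 = Thumb) */
} reset_words_t;

/* Software model of the hardware fetch; image points at the start of Flash. */
reset_words_t read_reset_words(const uint8_t *image)
{
    reset_words_t w;
    w.initial_sp   = read_le32(image + 0);
    w.reset_vector = read_le32(image + 4);
    return w;
}

/* Address of the first instruction of Reset_Handler (Thumb bit cleared). */
uint32_t reset_handler_address(reset_words_t w)
{
    return w.reset_vector & ~UINT32_C(1);
}
```

Inspecting these two words in a hex dump of the binary is a quick first check when a board does not boot: the first should point into RAM, the second into Flash with bit 0 set.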

Memory Map

On a typical Cortex-M MCU (e.g., STM32F4 with 512 KB Flash, 128 KB RAM):

(Addresses increase down the page.)
Flash (0x08000000)                      RAM (0x20000000)
┌──────────────────┐                    ┌──────────────────┐
│ Vector Table     │ ← SP, Reset,       │ .data            │ ← Copied from Flash
│ (IRQ addresses)  │   NMI, faults      │ (initialized     │   by startup code
├──────────────────┤                    │  globals)        │
│ .text            │                    ├──────────────────┤
│ (program code)   │                    │ .bss             │ ← Zeroed by
│                  │                    │ (uninitialized   │   startup code
│                  │                    │  globals)        │
├──────────────────┤                    ├──────────────────┤
│ .rodata          │                    │ Heap ↓           │ ← malloc() grows toward
│ (const strings,  │                    │                  │   higher addresses
│  lookup tables)  │                    │ (free space)     │
├──────────────────┤                    │                  │
│ .data init       │ ← Initial          ├──────────────────┤
│ values           │   values for       │ Stack ↑          │ ← Grows toward lower
│ (LMA, not VMA)   │   .data copy       │ (from _estack)   │   addresses (toward heap)
└──────────────────┘                    └──────────────────┘

Why .data and .bss Are Different

This is a classic interview question:

Section | Contains | Stored in Flash? | Initialized by | Why?
.data | int x = 42; | Yes (initial values stored at LMA) | Startup code copies Flash LMA to RAM VMA | Variables need their compile-time values at runtime
.bss | int y; (uninitialized) | No (zero takes no Flash space) | Startup code memsets to 0 | C standard guarantees uninitialized globals are zero

Why this matters: If you have a 100 KB array declared as uint8_t buffer[102400]; (uninitialized), it costs zero Flash because it goes in .bss. But if you declare it as uint8_t buffer[102400] = {0};, some toolchains may place it in .data, consuming 100 KB of Flash for all-zero initial values. The fix: declare it without an initializer and rely on the .bss zero-fill.
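As a small illustration of the table above, the declarations below are annotated with the section a typical GCC-based toolchain would choose. The variable names are made up for this sketch, and exact placement can vary with compiler version and flags, so treat the comments as the usual outcome and not a guarantee.

```c
#include <stdint.h>

int counter = 42;                  /* .data: init value in Flash, live copy in RAM */
int error_count;                   /* .bss: RAM only, zeroed by startup code */
uint8_t rx_buffer[102400];         /* .bss: 100 KB of RAM, no Flash cost */
const uint16_t crc_seed = 0xFFFF;  /* .rodata: Flash only, no RAM cost */

/* Sums the buffer; returns 0 while the buffer is still in its zeroed state. */
uint32_t rx_buffer_sum(void)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < sizeof rx_buffer; i++) {
        sum += rx_buffer[i];
    }
    return sum;
}
```

Running `arm-none-eabi-size` on the linked ELF prints the text, data, and bss totals, which is a quick way to confirm where a large object actually ended up.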

Linker Script Anatomy

A linker script has two main blocks: MEMORY defines the physical memory regions, and SECTIONS maps code and data into those regions.

MEMORY
{
    FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 512K
    RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}

_estack = ORIGIN(RAM) + LENGTH(RAM);   /* Stack starts at top of RAM */

SECTIONS
{
    .isr_vector : { KEEP(*(.isr_vector)) } >FLASH
    .text       : { *(.text*) *(.rodata*) } >FLASH

    _sidata = LOADADDR(.data);         /* Flash address of .data init values */

    .data : {
        _sdata = .;
        *(.data*)
        _edata = .;
    } >RAM AT>FLASH                    /* VMA in RAM, LMA in Flash */

    .bss : {
        _sbss = .;
        *(.bss*) *(COMMON)
        _ebss = .;
    } >RAM
}

Key concepts:

  • VMA (Virtual Memory Address) = where the CPU accesses the data at runtime (RAM)
  • LMA (Load Memory Address) = where the data physically lives in the binary (Flash)
  • >RAM AT>FLASH = VMA is in RAM, LMA is in Flash. Startup code bridges the gap.
  • _sidata, _sdata, _edata = symbols defined in the linker script that startup code uses to know what to copy and where
  • KEEP() = prevents the linker from discarding the section during garbage collection (critical for the vector table)

The Startup Code

The startup code (typically Reset_Handler in startup_stm32.s or a C equivalent) does the absolute minimum to make C code work:

c
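#include <stdint.h>

/* Symbols defined in the linker script; only their addresses are meaningful */
extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
extern void SystemInit(void);
extern int main(void);

void Reset_Handler(void) {
    /* 1. Copy .data initial values from Flash (LMA) to RAM (VMA) */
    const uint32_t *src = &_sidata;
    uint32_t *dst = &_sdata;
    while (dst < &_edata) *dst++ = *src++;
    /* 2. Zero .bss */
    dst = &_sbss;
    while (dst < &_ebss) *dst++ = 0;
    /* 3. Configure clocks (still on HSI at this point) */
    SystemInit();
    /* 4. Jump to main */
    main();
    /* 5. If main returns, hang */
    while (1) { }
}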

This code runs before the C runtime is fully initialized. Anything that depends on global variables (including some library functions) cannot be called until steps 1 and 2 have completed: step 1 gives initialized globals their values, and step 2 zeros the rest.
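Steps 1 and 2 of Reset_Handler are easier to reason about (and to unit-test off-target) when the section boundaries are passed in as parameters instead of being read from linker symbols. The function name `init_ram_sections` and its parameter names are assumptions made for this sketch; the logic is the same word-by-word copy and zero-fill shown above.

```c
#include <stdint.h>

/* Copy .data initial values and zero .bss.
 * Each parameter plays the role of one linker symbol:
 *   data_load  -> &_sidata (initial values in Flash)
 *   data_start -> &_sdata, data_end -> &_edata (destination in RAM)
 *   bss_start  -> &_sbss,  bss_end  -> &_ebss
 * All ranges are half-open: the end pointer is one past the last word. */
void init_ram_sections(const uint32_t *data_load,
                       uint32_t *data_start, uint32_t *data_end,
                       uint32_t *bss_start, uint32_t *bss_end)
{
    for (uint32_t *dst = data_start; dst < data_end; ) {
        *dst++ = *data_load++;
    }
    for (uint32_t *dst = bss_start; dst < bss_end; ) {
        *dst++ = 0;
    }
}
```

On a host, two plain arrays can stand in for Flash and RAM, which makes it easy to check that an empty .data or .bss range (start equal to end) is handled without touching memory.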

Vector Table and VTOR

The vector table is an array of 32-bit addresses at the start of Flash. The Cortex-M hardware reads entries from it automatically:

Index | Content | Used When
0 | Initial SP value | Loaded into SP on reset
1 | Reset_Handler address | CPU jumps here on reset
2 | NMI_Handler | Non-maskable interrupt
3 | HardFault_Handler | Unrecoverable error
4-6 | MemManage, BusFault, UsageFault | Configurable fault handlers (not present on Cortex-M0/M0+)
7-10 | Reserved | Not used on ARMv7-M
11 | SVC_Handler | Supervisor call (RTOS uses this)
12-13 | DebugMon_Handler, reserved | Debug monitor exception
14 | PendSV_Handler | Deferred context switch (RTOS)
15 | SysTick_Handler | System timer tick
16+ | IRQ0_Handler, IRQ1_Handler, ... | Peripheral interrupts

VTOR (Vector Table Offset Register): By default, the vector table is at address 0x00000000 (aliased to Flash on most STM32). The VTOR register lets you relocate it — essential for:

  • Bootloaders: The bootloader has its own vector table; the application has another at a different Flash offset
  • RAM execution: Copy the vector table to RAM for faster interrupt response or to allow runtime modification of ISR addresses
  • RTOS: Some RTOS implementations modify interrupt vectors at runtime
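A relocated vector table must sit at a suitably aligned address, or VTOR silently drops the low address bits. The sketch below follows the rule described in Arm's Cortex-M3/M4 documentation: round the table size (16 system entries plus the device's external interrupts) up to a power of two, with a minimum of 32 words (128 bytes). The function names are hypothetical, and other cores may use different minimums, so check the reference manual for your part.

```c
#include <stdint.h>

/* Required alignment in bytes for a vector table with num_irqs external
 * interrupts, following the Cortex-M3/M4 rule (assumed here). */
uint32_t vtor_required_alignment(uint32_t num_irqs)
{
    uint32_t table_bytes = (16u + num_irqs) * 4u;  /* 16 system entries first */
    uint32_t align = 128u;                         /* 32-word minimum */
    while (align < table_bytes) {
        align <<= 1;                               /* next power of two */
    }
    return align;
}

/* Nonzero if table_addr is a legal location for the relocated table. */
int vtor_address_ok(uint32_t table_addr, uint32_t num_irqs)
{
    return (table_addr & (vtor_required_alignment(num_irqs) - 1u)) == 0u;
}
```

For example, a device with 82 interrupts has a 392-byte table, so an application image (and its vector table) needs to start on a 512-byte boundary.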

Stack and Heap Sizing

Stack is used for local variables, function call frames, and interrupt context saves. Too small causes hard-to-debug crashes; too large wastes RAM.

Rules of thumb for sizing:

  • Bare-metal main stack: Start with 2-4 KB, measure with stack painting
  • Each RTOS task: 256 B (minimal) to 2 KB (with printf/floating-point)
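  • Interrupt nesting: Add at least 32 bytes per potential nesting level for the 8-word hardware exception frame (104 bytes when floating-point context is stacked), plus each handler's own stack usage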
  • Recursive functions: Avoid in embedded, or carefully bound recursion depth
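The rules of thumb above can be turned into a rough worst-case estimate. This sketch assumes a single stack shared by thread code and all handlers, and the function and parameter names are invented for illustration. It uses the hardware exception frame sizes for Cortex-M (8 words without floating-point context, 26 words with it) and ignores the extra alignment word the core may insert per frame, so add margin on top of the result.

```c
#include <stddef.h>

enum {
    EXC_FRAME_BASIC = 32,   /* 8 words: R0-R3, R12, LR, PC, xPSR */
    EXC_FRAME_FPU   = 104   /* 26 words when FP context is also stacked */
};

/* Deepest thread-mode usage plus, for each nesting level, one hardware
 * exception frame and that handler's own stack usage (all in bytes). */
size_t worst_case_stack_bytes(size_t thread_bytes,
                              const size_t *isr_bytes,
                              size_t nesting_levels,
                              int fpu_in_use)
{
    size_t frame = fpu_in_use ? EXC_FRAME_FPU : EXC_FRAME_BASIC;
    size_t total = thread_bytes;
    for (size_t i = 0; i < nesting_levels; i++) {
        total += frame + isr_bytes[i];
    }
    return total;
}
```

With 1024 bytes of thread usage and two nested handlers using 64 and 128 bytes, the estimate is 1280 bytes without FPU context and 1424 bytes with it.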

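Stack painting is a debugging technique: fill the entire stack region with a known pattern (e.g., 0xDEADBEEF) at startup. Later, scan upward from the lowest stack address; the first word that no longer holds the pattern marks the high-water mark. The linker map file (-Wl,-Map=output.map) shows the exact .data and .bss sizes and any statically reserved stack region, but not how deep the stack actually gets at runtime.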
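A minimal sketch of stack painting follows, written against an arbitrary word range so it can be exercised on a host. The names `stack_paint` and `stack_unused_bytes` are assumptions for this example. On a real target, the paint step has to run before the region is in use (for instance early in startup code), or it must stop below the current stack pointer so it does not overwrite live frames.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu

/* Fill the half-open range [lowest, top) with the paint pattern. */
void stack_paint(uint32_t *lowest, uint32_t *top)
{
    for (uint32_t *p = lowest; p < top; p++) {
        *p = STACK_PAINT;
    }
}

/* Bytes never written since painting: scan up from the lowest address
 * until the first word that no longer holds the pattern. */
size_t stack_unused_bytes(const uint32_t *lowest, const uint32_t *top)
{
    const uint32_t *p = lowest;
    while (p < top && *p == STACK_PAINT) {
        p++;
    }
    return (size_t)(p - lowest) * sizeof(uint32_t);
}
```

Subtracting the result from the total region size gives the high-water mark. A result of zero means the stack reached (or passed) its lowest address at some point.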

Heap is used by malloc(), which is generally discouraged in embedded systems due to fragmentation and non-deterministic allocation time. If you need dynamic allocation, prefer a fixed-block allocator or memory pool over the standard C heap; if you must use malloc(), size the heap generously.

⚠️Common Trap: Stack-Heap Collision

If the stack grows into the heap (or vice versa), the system corrupts memory silently and no fault is generated. On Cortex-M3/M4/M7 parts that include the optional MPU, configure a guard region between stack and heap. On M0 (no MPU), use stack painting and periodic runtime checks.
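One half of the collision can be caught in software: the heap growing into the stack. In newlib-based toolchains, malloc() obtains memory through an `_sbrk` hook, and that hook can refuse to move the heap end past a chosen limit. The sketch below models the check with a hypothetical `heap_t` state struct and `heap_grow` function so it runs on a host; a real `_sbrk` would take its starting point and limit from linker symbols. It does nothing for the opposite direction (stack growing into heap), which still needs an MPU guard or painting.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap state; brk must start at or below limit. */
typedef struct {
    uintptr_t brk;    /* current end of heap (starts at end of .bss) */
    uintptr_t limit;  /* lowest address reserved for the stack */
} heap_t;

/* sbrk-style growth: returns the previous heap end on success,
 * or 0 if the request would cross into the stack reservation. */
uintptr_t heap_grow(heap_t *h, size_t incr)
{
    if (incr > h->limit - h->brk) {
        return 0;                 /* refuse: malloc() will return NULL */
    }
    uintptr_t previous = h->brk;
    h->brk += incr;
    return previous;
}
```

The comparison is written as `incr > limit - brk` instead of `brk + incr > limit` so that a very large request cannot wrap around and pass the check.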

Placing Code or Data at Fixed Addresses

Sometimes you need to place a variable or function at a specific address — for example, a firmware version at a fixed Flash offset so a bootloader can read it, or a DMA buffer at a specific RAM alignment:

c
/* Place a version struct at a fixed Flash address */
__attribute__((section(".fw_version")))
const struct { uint32_t major, minor, patch; } fw_version = {1, 2, 3};

Then in the linker script:

.fw_version 0x0800FF00 : { KEEP(*(.fw_version)) } >FLASH
💡Interview Insight

When asked "how do you place a variable at a fixed address?", mention both the __attribute__((section())) in C and the corresponding linker script entry. Many candidates know only one half.
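The reader side of the version example might look like the sketch below: a bootloader or diagnostic tool that knows only the agreed address. The type name `fw_version_t`, the macro names, and `read_fw_version` are assumptions for illustration; the address 0x0800FF00 and the three-field layout come from the example above. The function takes a pointer to the start of Flash so the same code can be tested on a host against a byte array.

```c
#include <stdint.h>
#include <string.h>

#define FLASH_BASE_ADDR  0x08000000u
#define FW_VERSION_ADDR  0x0800FF00u   /* must match the linker script entry */

typedef struct {
    uint32_t major, minor, patch;
} fw_version_t;

/* flash points at the byte that corresponds to FLASH_BASE_ADDR. */
fw_version_t read_fw_version(const uint8_t *flash)
{
    fw_version_t v;
    /* memcpy avoids alignment and aliasing concerns when reading raw bytes */
    memcpy(&v, flash + (FW_VERSION_ADDR - FLASH_BASE_ADDR), sizeof v);
    return v;
}
```

On the target, the call would be `read_fw_version((const uint8_t *)FLASH_BASE_ADDR)`. Both sides have to agree on the struct layout, so changing it later means versioning the header itself.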

Debugging Story: Hard Fault on Boot After Adding a Large Array

A team added a 32 KB lookup table to their firmware:

c
static float sine_table[8192] = { 0.0f, 0.00077f, ... }; /* 32 KB */

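The firmware compiled and linked without errors, but the device hard-faulted immediately on boot, before main() was even reached. The .map file showed that Flash usage was fine (the init values fit), but .data had grown by 32 KB and, together with .bss, now filled almost all of the 64 KB SRAM. The linker script did not reserve a minimum stack size, so the link still succeeded even though the statically allocated data reached into the region the stack needed. The startup code's 32 KB copy of float values from Flash to RAM then overwrote the stack region.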

The fix: declare the table as const:

c
static const float sine_table[8192] = { 0.0f, 0.00077f, ... };

Now it lives in .rodata (Flash only) — zero RAM cost. The CPU reads it directly from Flash.

The lesson: Always use const for large lookup tables and constant data. Without const, initialized data goes to .data and consumes both Flash (for initial values) and RAM (for the runtime copy). With const, it stays in Flash only.

What Interviewers Want to Hear

  • You can walk through the complete boot sequence: power-on, SP/PC load from vector table, startup code, SystemInit, main
  • You understand why .data is copied and .bss is zeroed — and the Flash cost implications
  • You can read a basic linker script and explain MEMORY, SECTIONS, VMA vs LMA
  • You know what VTOR does and why bootloaders need it
  • You can explain how to size the stack and detect overflow
  • You understand the difference between .data and .rodata (const matters)

Interview Focus

Classic Interview Questions

Q1: "Describe what happens from power-on until main() is called."

Model Answer Starter: "On Cortex-M, the hardware reads two words from the start of Flash: the initial stack pointer (loaded into SP) and the Reset_Handler address (loaded into PC). Reset_Handler is the first code to execute. It copies the .data section from Flash to RAM so that initialized globals have their values, then zeros the .bss section for uninitialized globals. Next it calls SystemInit() to configure the clock tree and Flash wait states. Finally it calls main(). Until Reset_Handler has completed the copy and zero steps, no C global variable is valid because the memory initialization has not happened yet."

Q2: "Why does .data need to be copied from Flash to RAM? Why doesn't .bss?"

Model Answer Starter: ".data contains variables with initial values like int x = 42. Those values must exist in Flash (they are part of the binary), but the variables must be writable at runtime, so they must live in RAM. Startup code copies the initial values from Flash (LMA) to RAM (VMA). .bss contains uninitialized variables — the C standard says they are zero. Since zero is a single value, we do not need to store anything in Flash; startup code just memsets the .bss region to zero. This saves Flash space: a 10 KB uninitialized buffer costs zero Flash in .bss but would cost 10 KB in .data."

Q3: "What does a linker script define? Walk me through the key sections."

Model Answer Starter: "A linker script has two main parts. MEMORY defines physical regions — typically FLASH with rx permissions and RAM with rwx. SECTIONS maps code and data into those regions: .text and .rodata go to Flash, .data has its VMA in RAM but its LMA in Flash (using the AT> directive), .bss goes to RAM only. The linker generates symbols like _sdata, _edata, _sidata that startup code uses to know what to copy. The script also defines _estack at the top of RAM for the initial stack pointer."

Q4: "How would you detect a stack overflow on a bare-metal system?"

Model Answer Starter: "Several approaches. Stack painting: fill the stack with a sentinel value (0xDEADBEEF) at startup and periodically check how much has been overwritten — the first untouched word is the high-water mark. MPU guard region: on Cortex-M3+ with MPU, configure a small no-access region between the stack bottom and heap top; any stack overflow triggers a MemManage fault. Hardware watchpoints: set a data watchpoint at the stack limit address. In production, I combine stack painting for development-time measurement with an MPU guard for runtime protection."

Q5: "How do you place a function or variable at a fixed memory address?"

Model Answer Starter: "Two parts: in C, use __attribute__((section(".my_section"))) to assign the symbol to a custom section. In the linker script, place that section at a fixed address: .my_section 0x0800FF00 : { KEEP(*(.my_section)) } >FLASH. Use KEEP to prevent the linker from discarding it during garbage collection. This is commonly used for firmware version headers at fixed offsets, bootloader shared data, or DMA buffers that need specific alignment."

Trap Alerts

  • Don't say: "The boot process is simple — it just runs main()" — this misses the entire memory initialization chain that makes C work
  • Don't forget: const on large lookup tables — without it, they consume both Flash AND RAM
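  • Don't ignore: Where SystemInit runs. It always runs before main; in the sequence shown here it runs after .data/.bss init, but some vendor startup files call it earlier, so check your startup file before relying on globals inside it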

Follow-up Questions

  • "What is VTOR and when would you relocate the vector table?"
  • "What does SystemInit() typically do?"
  • "How do you determine stack and heap sizes for your application?"
  • "What happens if the stack and heap collide?"
  • "How do you implement a dual-bank bootloader with separate vector tables?"

Practice

What are the first two values the Cortex-M hardware reads from the vector table on reset?

A 10 KB uninitialized buffer declared as 'static uint8_t buf[10240];' — how much Flash does it consume?

Why must .data have both a VMA and an LMA?

What is the most likely cause of a hard fault that occurs during startup, before main() is reached?

Real-World Tie-In

OTA-Capable IoT Device — A LoRa sensor node uses a dual-bank Flash layout: bank A holds the running firmware, bank B receives OTA updates. Each bank has its own vector table at a known offset. The bootloader at the start of Flash validates the CRC of each bank, selects the valid one, sets VTOR to point to that bank's vector table, and jumps to its Reset_Handler. If the new firmware crashes (watchdog reset), the bootloader falls back to the previous bank on next boot.
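Alongside the CRC check, a bootloader like this can cheaply reject an erased or misplaced image by sanity-checking the first two vector-table words before jumping. The sketch below is one possible form of that check, with hypothetical names (`region_t`, `app_vectors_plausible`) and caller-supplied address ranges; it is not a substitute for the CRC.

```c
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t size;
} region_t;

/* Nonzero if addr lies inside the half-open range [base, base + size). */
static int in_region(uint32_t addr, region_t r)
{
    return addr >= r.base && (addr - r.base) < r.size;
}

/* Plausibility check on a candidate image's vector[0] and vector[1]. */
int app_vectors_plausible(uint32_t initial_sp, uint32_t reset_vector,
                          region_t flash_bank, region_t ram)
{
    /* The stack is full-descending, so SP may equal one past the end of
     * RAM, but it must be word aligned and above the RAM base. */
    int sp_ok = initial_sp > ram.base &&
                (initial_sp - ram.base) <= ram.size &&
                (initial_sp & 3u) == 0u;
    /* The reset vector must have the Thumb bit set and point into the bank. */
    int pc_ok = (reset_vector & 1u) != 0u &&
                in_region(reset_vector & ~1u, flash_bank);
    return sp_ok && pc_ok;
}
```

Erased Flash reads back as 0xFFFFFFFF, which fails the stack-pointer test. On a CMSIS-based target, the steps after a successful check would typically be to write the bank's base address to `SCB->VTOR`, load the new stack pointer with `__set_MSP()`, and branch to the reset vector.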

Safety-Critical Motor Controller — An automotive ECU reserves a fixed Flash region (0x0800FF00) for a firmware metadata struct containing version, build date, CRC, and safety certification ID. The linker script places this at a fixed address so the bootloader and diagnostic tools can read it without knowing the firmware layout. Stack is sized at 4 KB with an MPU guard region, and stack painting during factory test verifies worst-case usage stays under 3.2 KB with 25% margin.