Quick Cap
When you press "Build" on an embedded project, four distinct programs run in sequence: the preprocessor expands #include and #define, the compiler translates C/C++ to assembly, the assembler turns assembly into machine-code object files, and the linker combines object files and libraries into the final ELF binary. Each stage has its own input and output format, and each has its own failure modes. Understanding the pipeline is the difference between a candidate who can debug a "multiple definition" error in two minutes and one who flails for hours.
Key Facts:
- Preprocessor (cpp or gcc -E): handles #include, #define, #if, #pragma; output is a single .i file
- Compiler (cc1, invoked by gcc): translates .i to target assembly .s
- Assembler (as, or gcc -S then assemble): turns .s into an object file .o with sections, symbols, and relocations
- Linker (ld): resolves symbols across .o files and libraries, applies relocations, places sections per the linker script, produces .elf
- Translation unit: one preprocessed .c/.cpp file → one .o file. Each is compiled in isolation.
- Section-based output: .text (code), .rodata (constants), .data (initialized RW), .bss (zero-initialized RW)
Deep Dive
At a Glance
| Stage | Tool | Input | Output | Common Flags |
|---|---|---|---|---|
| Preprocess | cpp / gcc -E | .c / .cpp + headers | .i / .ii (preprocessed source) | -E, -D, -U, -I |
| Compile | cc1 / cc1plus | .i / .ii | .s (assembly) | -S, -O0..-O3, -Os, -g |
| Assemble | as | .s | .o (relocatable object) | -c (typical end-to-end) |
| Link | ld (or gcc driver) | .o files + .a libs | .elf (executable) | -T script.ld, -l, -L, -Map= |
gcc is a driver — it invokes the underlying tools with appropriate flags. You normally never call cc1 or as directly; gcc -c file.c runs preprocessor + compiler + assembler in one shot, and gcc -o out *.o invokes the linker.
The Four Stages
1. Preprocessor
The preprocessor is a textual transformer. It does no semantic analysis — it just expands macros, splices in header files, evaluates conditional compilation directives, and strips comments. The output is a single text file that the compiler proper sees.
#include <stdio.h>     ← spliced verbatim from /usr/include/stdio.h
#define MAX 100        ← every later 'MAX' replaced with '100'
int x = MAX + 1;       ← becomes 'int x = 100 + 1;'
Run gcc -E foo.c -o foo.i and you'll see the result: a typical embedded .c file becomes a multi-thousand-line .i file once standard headers are expanded. Reading the .i file is the canonical way to debug a confusing macro expansion or #include ordering issue.
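Because expansion is purely textual, a macro argument is pasted in with no regard for operator precedence. The sketch below is one small illustration of that; the macro and function names are made up for this example. Running the file through gcc -E should show the expanded expressions quoted in the comments.

```c
/* Hypothetical macros, named only for this illustration. */
#define SQUARE_UNSAFE(x) x * x          /* argument pasted as-is */
#define SQUARE_SAFE(x)   ((x) * (x))    /* parentheses survive the paste */

int unsafe_result(void)
{
    /* Expands to: 1 + 2 * 1 + 2, which evaluates to 5 */
    return SQUARE_UNSAFE(1 + 2);
}

int safe_result(void)
{
    /* Expands to: ((1 + 2) * (1 + 2)), which evaluates to 9 */
    return SQUARE_SAFE(1 + 2);
}
```

The compiler proper never sees SQUARE_UNSAFE at all, only the pasted text, which is why the .i file is the right place to look when a macro misbehaves.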
2. Compiler
The compiler reads the preprocessed .i file, parses it, type-checks it, optimizes the resulting intermediate representation (IR), and emits target-specific assembly. This is where -O0, -O2, -Os flags do their work. The output is a .s file containing human-readable assembly.
$ gcc -S -Os foo.i -o foo.s
$ head -6 foo.s
.arch armv7e-m
.syntax unified
.thumb
.file "foo.c"
.text
.global compute_crc
For embedded debugging, generating the .s file is invaluable: you can verify the compiler emitted the loop you expected, check that each volatile access really produced its own load or store in program order, and confirm an inline function actually got inlined.
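A small file like the following is a reasonable candidate for that kind of inspection. It is only a sketch: both function names are hypothetical, and the exact instructions depend on the target and optimization level. Compiled with -S at -Os, the volatile version should show two separate loads, while the plain version may be reduced to a single load.

```c
#include <stdint.h>

/* Hypothetical helper: sums two back-to-back reads of a status register. */
uint32_t read_twice_volatile(volatile uint32_t *reg)
{
    uint32_t first = *reg;    /* the compiler must emit this load */
    uint32_t second = *reg;   /* ...and this one; it may not reuse 'first' */
    return first + second;
}

/* Same logic without volatile: an optimizer is free to load once and double it. */
uint32_t read_twice_plain(const uint32_t *mem)
{
    uint32_t first = *mem;
    uint32_t second = *mem;
    return first + second;
}
```

Both functions return the same value when nothing else changes the memory; the difference only shows up in the generated assembly.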
3. Assembler
The assembler converts text assembly into binary machine code packaged as a relocatable object file (.o). This file is not yet executable — it contains:
- Sections: .text, .data, .bss, .rodata, plus debug sections like .debug_info
- Symbol table: every function and global variable, marked as defined or undefined
- Relocations: placeholders for addresses the assembler doesn't yet know (other functions, externs)
You can inspect any .o with nm foo.o (symbols) or readelf -a foo.o (full structure).
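To connect source constructs to what nm reports, here is a minimal sketch of one translation unit. All identifiers are invented for the example, and the letters in the comments are what GNU nm typically prints for a GCC build; the exact result can vary with toolchain defaults and optimization level.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

uint32_t tick_count;               /* .bss:    B (or C, a common symbol, with -fcommon) */
uint32_t baud_rate = 115200;       /* .data:   D */
const char banner[] = "fw v1";     /* .rodata: R */
static uint32_t call_count;        /* .bss, internal linkage: lowercase b */

/* Defined here, so nm shows T. strlen is only referenced, so it
 * typically appears as U until the linker finds it in libc. */
size_t name_length(const char *name)
{
    call_count++;
    return strlen(name);
}
```

Lowercase letters in nm output mark symbols with internal linkage, which is a quick way to confirm that a helper marked static really is private to its object file.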
4. Linker
The linker is where the disparate .o files come together. It performs three jobs:
- Symbol resolution: every "undefined" symbol in one .o must match a "defined" symbol in another .o or a library
- Relocation: patch the placeholder addresses left by the assembler with the final addresses chosen during section placement
- Section placement: decide where each section lives in memory, guided by the linker script (.text to Flash, .data to RAM, etc.)
The output is the .elf (Executable and Linkable Format) file containing the final binary. For embedded targets, this is then converted to .bin or .hex with objcopy for flashing. Linker scripts and section placement are deep enough to deserve their own page — see Linker Scripts.
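One consequence of placing .data in RAM is that its initial values still have to be stored in Flash and copied over before main runs, and .bss has to be cleared. The sketch below shows that idea in isolation, under assumptions: the function names are hypothetical, and in real startup code the pointers would come from symbols defined in the linker script (whose names vary by vendor) rather than from parameters.

```c
#include <stdint.h>

/* Copy initial values for .data from their load address (Flash) to RAM. */
void copy_data_init(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = *load++;
    }
}

/* Clear .bss so zero-initialized globals really start at zero. */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}
```

This is also why initialized globals cost Flash as well as RAM, while zero-initialized ones cost only RAM.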
The Pipeline in One Picture
.c source
│
▼ preprocessor (cpp / gcc -E)
.i preprocessed (one big text file, all macros expanded)
│
▼ compiler (cc1)
.s assembly (target ISA, optimization applied)
│
▼ assembler (as)
.o object (binary + symbols + relocations)
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│ other .o files, .a libraries
▼ linker (ld) + linker script
.elf executable (final addresses, sections placed)
│
▼ objcopy
.bin / .hex (flat image for flashing)
Translation Units and Why They Matter
A translation unit is one preprocessed .c/.cpp file. The compiler processes each translation unit in complete isolation — it does not know about other source files. This is why:
- static is per-translation-unit. A static int counter; in foo.c is invisible to bar.c.
- extern is required for cross-TU sharing. The compiler needs to know "this symbol exists somewhere else, the linker will find it."
- Header files declare; source files define. Headers say "this exists" via extern declarations; sources say "this is its body" via the actual definition.
- Inline functions in headers must be static inline (or inline with proper rules) because each TU that includes the header gets its own copy of the body.
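The declare/define split in the list above can be sketched as follows. The file names and identifiers are hypothetical, and the two halves are shown in one block only for brevity; in a real project the first part would sit in a header and the second in a single .c file.

```c
/* ---- what a hypothetical counter.h would declare ---- */
extern int shared_total;     /* declaration: storage lives in some other TU */
int counter_next(void);      /* declaration: body lives in some other TU */

/* ---- what counter.c would define ---- */
int shared_total = 0;        /* definition with external linkage */
static int counter = 0;      /* internal linkage: invisible outside this TU */

int counter_next(void)
{
    counter++;
    shared_total += counter;
    return counter;
}
```

Another source file could read shared_total and call counter_next through the header, but any attempt to name counter there would refer to a different, unrelated object.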
Whole-program optimization (-flto for link-time optimization) breaks this isolation — the compiler defers code generation until link, when it can see the entire program. This enables cross-TU inlining and dead-code elimination but makes builds slower.
Inspection: Looking Inside Each Stage
| Goal | Command | What you get |
|---|---|---|
| See preprocessed source | gcc -E foo.c -o foo.i | Text file with all #includes and macros expanded |
| See generated assembly | gcc -S foo.c -o foo.s | Human-readable assembly for inspection |
| Stop at object file | gcc -c foo.c -o foo.o | Relocatable object, no linking |
| List symbols in an object | nm foo.o | Symbol table — T = code, D = data, B = bss, U = undefined |
| Dump full ELF structure | readelf -a foo.o | Sections, symbols, relocations, headers |
| Disassemble code | objdump -d foo.o | Disassembly of the code sections (use -S instead to interleave source if compiled with -g) |
These commands are covered in depth on the ELF, Map & Binary Inspection page.
Common Linker Errors
Many of the most confusing build failures live at the linker step, not the compiler step. Four patterns dominate:
| Error | Cause | Fix |
|---|---|---|
| undefined reference to 'foo' | Symbol used in one .o but never defined in any linked .o or library | Check spelling; verify the source file containing foo is actually compiled and passed to the linker; check extern "C" at the C++/C boundary |
| multiple definition of 'foo' | Same symbol defined in 2+ object files, usually because a non-static function or global is defined in a header, or a .c file is compiled and linked twice | Move the definition into a single .c file and leave only a declaration in the header; mark file-scope helpers static; remove the duplicate build entry |
| relocation truncated to fit | A jump or address can't reach its target with the available encoding | Linker script placed sections too far apart, or a function exceeded the Thumb branch range; rearrange sections or split the function |
| region 'FLASH' overflowed by N bytes | Total .text + .rodata + .data initial values exceed the FLASH region length in the MEMORY block | Optimize for size (-Os), discard unused sections (-ffunction-sections + -Wl,--gc-sections), or get a bigger Flash |
A common cause of "multiple definition" is a helper function in a header without static inline. Every .c file that includes the header gets its own definition, and the linker sees them all. Either make it static inline (per-TU copy, allowed) or move the definition to a single .c file and put only a declaration in the header.
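A header written along these lines avoids the duplicate symbol. It is a sketch with a hypothetical file name and helper, not a drop-in replacement for any particular project's code.

```c
/* ---- checksum.h (hypothetical header) ---- */
#ifndef CHECKSUM_H
#define CHECKSUM_H

#include <stddef.h>
#include <stdint.h>

/* static inline: every including TU gets a private copy, so no symbol clash. */
static inline uint8_t checksum8(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i]);   /* wraps modulo 256 */
    }
    return sum;
}

#endif /* CHECKSUM_H */
```

Note that the include guard only prevents double inclusion within one translation unit. It does nothing about two different .c files each producing a definition, which is why dropping static here would bring the "multiple definition" error back.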
Optimization Flags Summary
| Flag | Meaning | Embedded Use |
|---|---|---|
| -O0 | No optimization | Default for debug builds; best stepping behavior |
| -O1 | Basic optimization | Rare in practice |
| -O2 | Standard optimization; skips most transformations that trade size for speed | Common for performance-sensitive embedded |
| -O3 | Aggressive optimization, may grow code | Rarely justified on Flash-constrained targets |
| -Os | Optimize for size | Most common embedded default |
| -Og | Optimize for debug experience | Added in GCC 4.8; useful when -O0 is too slow |
| -flto | Link-Time Optimization | Cross-TU inlining + dead code removal; slower link |
| -ffunction-sections -fdata-sections + -Wl,--gc-sections | One section per function/global, GC unused | Often saves 10-30% Flash on embedded |
Debugging Story: The Phantom Function
A team's firmware kept ballooning over Flash even though they were "only adding small features." Looking at the linker map (more on that in the binary-inspection topic), they spotted a 14 KB block belonging to a function their codebase didn't even use — printf. It had been pulled in by a single printf("debug\n"); call deep in test scaffolding code that no longer ran. Even the unused branch of an if (DEBUG_BUILD) had been enough to pull printf into the link.
Two complementary fixes: (1) ensure unused code is truly unused with -ffunction-sections + -fdata-sections + -Wl,--gc-sections, which lets the linker garbage-collect anything not reachable from Reset_Handler; (2) replace printf with a tiny tiny_printf or compile-time-removed log macro for embedded debug.
The lesson: Symbol references are sticky. If anything anywhere references printf, the linker will pull in the whole runtime. Section-level GC is the cleanest solution for unused-code bloat.
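The second fix, a log macro that is removed at compile time, might look roughly like this. LOG_ENABLED, LOG, and scale_sample are hypothetical names chosen for the sketch; a real project would pick its own and would usually set the switch from the build system.

```c
/* Hypothetical switch; a build would normally set it with -DLOG_ENABLED=1. */
#ifndef LOG_ENABLED
#define LOG_ENABLED 0
#endif

#if LOG_ENABLED
#include <stdio.h>
#define LOG(...) printf(__VA_ARGS__)
#else
#define LOG(...) ((void)0)   /* arguments are discarded: no reference to printf */
#endif

int scale_sample(int raw)
{
    LOG("raw=%d\n", raw);    /* vanishes entirely when LOG_ENABLED is 0 */
    return raw * 2;
}
```

With the switch off, the preprocessor removes the call before the compiler ever sees it, so the object file carries no undefined reference to printf and the linker has nothing to pull in. One caveat of this design is that the arguments are not evaluated in the disabled build, so they should not have side effects.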
What Interviewers Want to Hear
- You can name all four stages and the file extension each produces
- You can explain what a translation unit is and why compilation is per-file
- You know the difference between compiler errors (syntax, types) and linker errors (symbols)
- You can debug "undefined reference" and "multiple definition" without panicking
- You know how to inspect intermediate outputs (-E, -S, nm, objdump)
- You understand that static and extern are about TU boundaries
Interview Focus
Classic Interview Questions
Q1: "Walk me through what happens when I press 'Build' on a typical embedded C project."
Model Answer Starter: "Four stages run, the first three once per source file. First the preprocessor expands #include directives and #define macros, producing a single .i file with everything textually substituted. Then the compiler proper translates that to target assembly; this is where optimization flags like -Os do their work. The assembler turns assembly into a relocatable object file with sections, a symbol table, and relocation entries, but no final addresses yet. After all source files have been compiled, the linker runs once: it resolves symbols across all .o files and libraries, applies relocations with the chosen final addresses, places sections per the linker script, and emits the ELF binary. objcopy then converts ELF to a flat binary or hex file for flashing."
Q2: "What's the difference between a compiler error and a linker error?"
Model Answer Starter: "Compiler errors are local to a single translation unit — syntax errors, type mismatches, undeclared variables. The compiler sees only one .c file at a time so it cannot tell you a function defined elsewhere is missing. Linker errors come later when the linker tries to glue all the object files together. The two big linker errors are 'undefined reference' (a symbol is used but never defined in any object or library you linked) and 'multiple definition' (the same symbol is defined in two object files). Linker errors are about cross-file consistency, not language correctness."
Q3: "What is a translation unit and why does it matter?"
Model Answer Starter: "A translation unit is one preprocessed source file: your .c file plus everything its #includes pull in. The compiler processes one TU at a time in isolation and produces one .o from it. This is why static makes a function or variable invisible outside its file; at file scope, static means internal linkage, so the name is local to the TU. It's also why headers should contain only declarations and static inline definitions: if a header has a non-static function definition and is included in two .c files, both .o files end up with the function, and the linker rejects the duplicate. With link-time optimization the compiler defers final code generation until link time and can see all TUs together, but logically the model is still per-TU."
Q4: "I'm getting 'undefined reference to printf'. Walk me through how you'd debug this."
Model Answer Starter: "Undefined reference means the linker couldn't find a definition for a symbol that was used. For printf specifically, it usually means a libc isn't being linked in; embedded toolchains often don't link libc by default. I'd check the link command line for -lc or, better, that I'm using a libc-aware spec like --specs=nano.specs for newlib-nano. If the symbol is one of mine, I'd check spelling and extern "C" for C/C++ boundary issues, then verify the source file containing the definition is actually being compiled and the resulting .o is on the linker command line. nm on each candidate .o confirms whether the symbol is defined (T) or referenced (U)."
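For the C/C++ boundary case mentioned in that answer, the usual header pattern is sketched below. The file and function names are hypothetical. When a C++ file includes the header, the extern "C" block tells the C++ compiler to reference the unmangled name that the C object file actually exports; when a C file includes it, the guard lines disappear.

```c
/* ---- hal_gpio.h (hypothetical header shared by C and C++ code) ---- */
#ifdef __cplusplus
extern "C" {             /* C++ callers: look for the unmangled C symbol */
#endif

int hal_gpio_mask(int pin);

#ifdef __cplusplus
}
#endif

/* ---- hal_gpio.c, compiled as C ---- */
int hal_gpio_mask(int pin)
{
    return 1 << pin;     /* exported as plain 'hal_gpio_mask' */
}
```

Without the guard, a C++ caller would ask the linker for a mangled name that no object file defines, and the build would fail with an undefined reference even though the function exists.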
Q5: "Why does code grow when I include a function I'm not calling?"
Model Answer Starter: "By default, the linker's unit of inclusion is the section — and traditionally each .o puts all functions in one .text section. If you reference any one function from that .o, the whole .text section comes along, including unused functions. The fix is -ffunction-sections -fdata-sections on compile and -Wl,--gc-sections on link. This places each function in its own section, and the linker can then garbage-collect any section not reachable from the entry point. On a typical embedded firmware this saves 10-30% Flash with zero code changes."
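As a concrete picture of that answer, consider a file like the one below (function names are hypothetical). With -ffunction-sections, GCC gives each function its own section named after it, which is what lets the linker keep one and discard the other.

```c
/* Build sketch (flags from the answer above):
 *   compile: -ffunction-sections -fdata-sections
 *   link:    -Wl,--gc-sections
 */

/* Referenced by the application: its section .text.scale_reading is kept. */
int scale_reading(int raw)
{
    return raw * 3 + 1;
}

/* Hypothetical leftover that nothing calls: it lives in
 * .text.legacy_self_test, so the linker can drop it on its own. */
int legacy_self_test(void)
{
    return scale_reading(0) == 1;
}
```

Adding -Wl,--print-gc-sections to the link line should list the sections the linker discarded, which is a quick way to confirm the leftover function is really gone.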
Trap Alerts
- Don't say: "The compiler builds the program" — there are four distinct stages and interview answers that lump them lose credit
- Don't forget: Headers don't get compiled on their own; they get textually pasted into every .c that includes them
- Don't ignore: The linker can pull in code transitively (one printf call drags in a 10 KB runtime); always size-check after enabling features
Follow-up Questions
- "What is -flto and what does it cost?"
- "How does __attribute__((weak)) change linker behavior?"
- "What's the difference between a .so, a .a, and a .o?"
- "Why does inline in a header sometimes not actually inline?"
- "How would you find which object file is bringing in a particular library function?"
- "What are common reasons a function pointer call doesn't get inlined even with LTO?"
Ready to test yourself? Head over to the Build Systems Interview Questions page for a full set of Q&A with collapsible answers — perfect for self-study and mock interview practice.
Practice
❓ What is the input and output of the preprocessor stage?
❓ A linker reports 'multiple definition of foo'. What is the most likely cause?
❓ Which gcc flag tells the linker to garbage-collect unused sections?
❓ What does a translation unit consist of?
❓ Why are linker errors common when crossing C and C++ source files?
Real-World Tie-In
Flash Overflow on a Sensor Node — A team adding BLE features hit a hard Flash limit. The map file showed printf and floating-point library code as the biggest single contributors. Switching from full newlib to newlib-nano (via --specs=nano.specs) dropped Flash usage 18 KB. Adding -ffunction-sections -fdata-sections -Wl,--gc-sections shaved another 6 KB. No source changes were required.
Mysterious Failures After Header Refactor — A junior engineer moved a "shared utility" function from a .c file into a header to "make it easier to use". The build started failing with "multiple definition of compute_crc" the first time two .c files both included the header. Fix: revert by moving the body back to a single .c file with a declaration in the header, OR (if cross-TU inlining was the goal) mark it static inline so each TU gets its own copy.
Cross-TU Optimization Win — A motor-control firmware enabled -flto and saw a 12% reduction in code size and a 4% improvement in inner-loop runtime. The win came from cross-TU inlining of small accessor functions in a HAL layer that previously lived in different .c files.
