Compilation Pipeline

Quick Cap

When you press "Build" on an embedded project, four distinct programs run in sequence: the preprocessor expands #include and #define, the compiler translates C/C++ to assembly, the assembler turns assembly into machine-code object files, and the linker combines object files and libraries into the final ELF binary. Each stage has its own input and output format, and each has its own failure modes. Understanding the pipeline is the difference between a candidate who can debug a "multiple definition" error in two minutes and one who flails for hours.

Key Facts:

Preprocessor (cpp or gcc -E): handles #include, #define, #if, #pragma — output is a single .i file
Compiler (cc1 invoked by gcc): translates .i to target assembly .s
Assembler (as, normally invoked via gcc -c): turns .s into object file .o with sections, symbols, and relocations
Linker (ld): resolves symbols across .o files and libraries, applies relocations, places sections per linker script, produces .elf
Translation unit: one preprocessed .c/.cpp file → one .o file. Each is compiled in isolation.
Section-based output: .text (code), .rodata (constants), .data (initialized RW), .bss (zero-initialized RW)

Deep Dive

At a Glance

Stage	Tool	Input	Output	Common Flags
Preprocess	`cpp` / `gcc -E`	`.c` / `.cpp` + headers	`.i` / `.ii` (preprocessed source)	`-E`, `-D`, `-U`, `-I`
Compile	`cc1` / `cc1plus`	`.i` / `.ii`	`.s` (assembly)	`-S`, `-O0..-O3`, `-Os`, `-g`
Assemble	`as`	`.s`	`.o` (relocatable object)	`-c` (driver flag: preprocess, compile, and assemble, then stop before linking)
Link	`ld` (or `gcc` driver)	`.o` files + `.a` libs	`.elf` (executable)	`-T script.ld`, `-l`, `-L`, `-Map=`

gcc is a driver — it invokes the underlying tools with appropriate flags. You normally never call cc1 or as directly; gcc -c file.c runs preprocessor + compiler + assembler in one shot, and gcc -o out *.o invokes the linker.

The Four Stages

1. Preprocessor

The preprocessor is a textual transformer. It does no semantic analysis — it just expands macros, splices in header files, evaluates conditional compilation directives, and strips comments. The output is a single text file that the compiler proper sees.

Diagram: Preprocessor Substitutions

#include <stdio.h>      ←  spliced verbatim from /usr/include/stdio.h
#define MAX 100         ←  every later 'MAX' replaced with '100'
int x = MAX + 1;        ←  becomes 'int x = 100 + 1;'

Each annotated line shows what the preprocessor does textually before the compiler sees the source.

Run gcc -E foo.c -o foo.i and you'll see the result: a typical embedded .c file becomes a multi-thousand-line .i file once standard headers are expanded. Reading the .i file is the canonical way to debug a confusing macro expansion or #include ordering issue.
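
To make the "purely textual" point concrete, here is a minimal sketch using two hypothetical macros, SQUARE_NAIVE and SQUARE_SAFE (illustrative names, not taken from any real header). Because the preprocessor pastes the argument text without any notion of operator precedence, the unparenthesized version computes something other than a square. Running gcc -E on a file like this should show the expanded expressions directly.

```c
/* Hypothetical macros, for illustration only. */
#define SQUARE_NAIVE(x) x * x
#define SQUARE_SAFE(x)  ((x) * (x))

/* After preprocessing the body is: return a + b * a + b;
 * Multiplication binds tighter than addition, so this is not (a + b)^2. */
int naive_square_of_sum(int a, int b)
{
    return SQUARE_NAIVE(a + b);
}

/* After preprocessing the body is: return ((a + b) * (a + b)); */
int safe_square_of_sum(int a, int b)
{
    return SQUARE_SAFE(a + b);
}
```

With a = 2 and b = 3 the naive version evaluates 2 + 3 * 2 + 3, while the parenthesized version evaluates (2 + 3) * (2 + 3). The compiler never sees the macro names, only the substituted text.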

2. Compiler

The compiler reads the preprocessed .i file, parses it, type-checks it, optimizes the resulting intermediate representation (IR), and emits target-specific assembly. This is where -O0, -O2, -Os flags do their work. The output is a .s file containing human-readable assembly.

$ gcc -S -Os foo.i -o foo.s
$ head -6 foo.s
        .arch armv7e-m
        .syntax unified
        .thumb
        .file   "foo.c"
        .text
        .global compute_crc

For embedded debugging, generating the .s file is invaluable: you can verify the compiler emitted the loop you expected, check that every volatile access was actually emitted (neither optimized away nor merged with another), and confirm an inline function actually got inlined.
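
As one illustration of what to look for in the .s output, the sketch below polls a status word through a volatile pointer. The names wait_until_ready and STATUS_READY_BIT are hypothetical, and the bit position is an assumption rather than any real peripheral's layout. In the generated assembly you would expect a load inside the loop on every iteration; without volatile, an optimizing compiler is free to hoist that load out of the loop.

```c
#include <stdint.h>

/* Hypothetical ready flag: bit 0 of an assumed status register. */
#define STATUS_READY_BIT (1u << 0)

/* Polls *status_reg until the ready bit is set or max_polls is reached.
 * Returns the number of polls that found the bit clear.
 * volatile forces a fresh read of *status_reg on every loop iteration. */
uint32_t wait_until_ready(volatile const uint32_t *status_reg, uint32_t max_polls)
{
    uint32_t polls = 0u;

    while (((*status_reg & STATUS_READY_BIT) == 0u) && (polls < max_polls)) {
        polls++;
    }
    return polls;
}
```

Compiling this with -S at -O0 and again at -Os, then comparing the loop bodies, is a quick way to see which parts of the code the optimizer is allowed to change and which it must leave alone.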

3. Assembler

The assembler converts text assembly into binary machine code packaged as a relocatable object file (.o). This file is not yet executable — it contains:

Sections: .text, .data, .bss, .rodata, plus debug sections like .debug_info
Symbol table: every function and global variable, marked as defined or undefined
Relocations: placeholders for addresses the assembler doesn't yet know (other functions, externs)

You can inspect any .o with nm foo.o (symbols) or readelf -a foo.o (full structure).
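
The sketch below is a hypothetical source file (all names are illustrative) annotated with the section GCC typically chooses for each object and the letter nm is likely to print for it. The exact result depends on the toolchain and flags, so treat the comments as expectations to verify with nm and readelf -r rather than guarantees.

```c
#include <stdint.h>

const uint32_t crc_poly = 0x04C11DB7u;  /* .rodata, nm: R                       */
uint32_t boot_count = 1u;               /* .data,   nm: D (initialized RW)      */
uint32_t error_count;                   /* .bss,    nm: B (C under -fcommon)    */
static uint32_t call_count;             /* .bss,    nm: b (lowercase = local)   */

/* .text, nm: T. The accesses to the variables above typically carry
 * relocations, because their final addresses are not known until link time. */
uint32_t record_error(void)
{
    call_count++;
    error_count++;
    return error_count + boot_count;
}

/* A second global function, also in .text. */
uint32_t record_error_calls(void)
{
    return call_count;
}
```

A call to a function defined in another file would additionally appear in the symbol table as U (undefined) until the linker resolves it.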

4. Linker

The linker is where the disparate .o files come together. It performs three jobs:

Symbol resolution — every "undefined" symbol in one .o must match a "defined" symbol in another .o or a library
Relocation — patch the placeholder addresses left by the assembler with the final addresses chosen during section placement
Section placement — decide where each section lives in memory, guided by the linker script (.text to Flash, .data to RAM, etc.)

The output is the .elf (Executable and Linkable Format) file containing the final binary. For embedded targets, this is then converted to .bin or .hex with objcopy for flashing. Linker scripts and section placement are deep enough to deserve their own page — see Linker Scripts.

The Pipeline in One Picture

Diagram: Compilation Pipeline

 .c source
     │
     ▼  preprocessor (cpp / gcc -E)
 .i preprocessed (one big text file, all macros expanded)
     │
     ▼  compiler (cc1)
 .s assembly (target ISA, optimization applied)
     │
     ▼  assembler (as)
 .o object (binary + symbols + relocations)
     │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
     │       other .o files, .a libraries
     ▼  linker (ld) + linker script
 .elf executable (final addresses, sections placed)
     │
     ▼  objcopy
 .bin / .hex (flat image for flashing)

Source → preprocessed → assembly → object → linked ELF → flash image.

Translation Units and Why They Matter

A translation unit is one preprocessed .c/.cpp file. The compiler processes each translation unit in complete isolation — it does not know about other source files. This is why:

static is per-translation-unit. A static int counter; in foo.c is invisible to bar.c.
extern is required for cross-TU sharing. The compiler needs to know "this symbol exists somewhere else, the linker will find it."
Header files declare; source files define. Headers say "this exists" via extern declarations; sources say "this is its body" via the actual definition.
Inline functions in headers should be static inline (or plain inline with exactly one extern inline declaration in a single .c file, per the C99 rules) because each TU that includes the header gets its own copy of the body.
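
These rules can be sketched in one listing. The block shows a hypothetical counter.h and counter.c back to back (file names and identifiers are illustrative); in a real project the first part would live in the header and be included by every user, and the second part would be compiled exactly once.

```c
/* ---- counter.h (hypothetical): declarations only ---- */
#ifndef COUNTER_H
#define COUNTER_H

extern int shared_counter;   /* declaration: defined in exactly one .c file */
int counter_next(void);      /* declaration: no body, so no duplicate symbol */

/* static inline: every including TU gets its own private copy. */
static inline int counter_is_even(int value)
{
    return (value % 2) == 0;
}

#endif /* COUNTER_H */

/* ---- counter.c (hypothetical): the single definition ---- */
int shared_counter = 0;      /* external linkage, visible to other TUs */
static int step = 1;         /* internal linkage, invisible outside this TU */

int counter_next(void)
{
    shared_counter += step;
    return shared_counter;
}
```

Another file that includes counter.h can call counter_next and read shared_counter, but any attempt to refer to step from there would fail, since that name never leaves this translation unit.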

Whole-program optimization (-flto for link-time optimization) breaks this isolation — the compiler defers code generation until link, when it can see the entire program. This enables cross-TU inlining and dead-code elimination but makes builds slower.

Inspection: Looking Inside Each Stage

Goal	Command	What you get
See preprocessed source	`gcc -E foo.c -o foo.i`	Text file with all `#includes` and macros expanded
See generated assembly	`gcc -S foo.c -o foo.s`	Human-readable assembly for inspection
Stop at object file	`gcc -c foo.c -o foo.o`	Relocatable object, no linking
List symbols in an object	`nm foo.o`	Symbol table — `T` = code, `D` = data, `B` = bss, `U` = undefined
Dump full ELF structure	`readelf -a foo.o`	Sections, symbols, relocations, headers
Disassemble code	`objdump -d foo.o`	Disassembly of the code sections (add `-S` to interleave source if compiled with `-g`)

These commands are covered in depth on the ELF, Map & Binary Inspection page.

Common Linker Errors

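Many of the most confusing build failures occur at the link step, not the compile step. Two patterns dominate (undefined reference and multiple definition), and two more show up regularly on embedded targets: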

Error	Cause	Fix
`undefined reference to 'foo'`	Symbol used in one `.o` but never defined in any linked `.o` or library	Check spelling; verify the source file containing `foo` is actually compiled and passed to the linker; check `extern "C"` for C++/C boundary
`multiple definition of 'foo'`	Same symbol defined in 2+ object files, typically because a non-static function or global is defined in a header, or because a `.c` file is being compiled and linked twice	Move the definition into a single `.c` file and leave only a declaration in the header; mark file-scope helpers `static`; remove the duplicate source entry from the build
`relocation truncated to fit`	A jump or address can't reach its target with the available encoding	Linker script placed sections too far apart, or a function exceeded the Thumb branch range; rearrange sections or split the function
`region 'FLASH' overflowed by N bytes`	Total `.text + .rodata + .data init` exceeds the `MEMORY` block size	Optimize for size (`-Os`), strip unused symbols (`-ffunction-sections + --gc-sections`), or get a bigger Flash
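
For the extern "C" check mentioned in the first row, a common pattern is to wrap the declarations in a C header so that C++ callers look for the unmangled name. The sketch below uses a hypothetical checksum8 function; the guard itself is standard, everything else is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- checksum.h (hypothetical) ---- */
#ifdef __cplusplus
extern "C" {   /* seen only by C++ compilers: use C linkage, no name mangling */
#endif

uint8_t checksum8(const uint8_t *data, size_t len);

#ifdef __cplusplus
}
#endif

/* ---- checksum.c (hypothetical): compiled as C ---- */
uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0u;

    for (size_t i = 0u; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);   /* wraps modulo 256 */
    }
    return sum;
}
```

Without the guard, a C++ caller would reference a mangled symbol name while the C object file defines plain checksum8, and the linker would report an undefined reference even though the function exists.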

⚠️The 'static' shield

A common cause of "multiple definition" is a helper function in a header without static inline. Every .c file that includes the header gets its own definition, and the linker sees them all. Either make it static inline (per-TU copy, allowed) or move the definition to a single .c file and put only a declaration in the header.

Optimization Flags Summary

Flag	Meaning	Embedded Use
`-O0`	No optimization	Default for debug builds — best stepping behavior
`-O1`	Basic optimization	Rare in practice
`-O2`	Standard optimization; enables most passes that do not trade size for speed (code is usually larger than with `-Os`)	Common for performance-sensitive embedded
`-O3`	Aggressive optimization, may grow code	Rarely justified on Flash-constrained targets
`-Os`	Optimize for size	Most common embedded default
`-Og`	Optimize for debug experience	Newer flag, useful when `-O0` is too slow
`-flto`	Link-Time Optimization	Cross-TU inlining + dead code removal; slower link
`-ffunction-sections -fdata-sections` + `-Wl,--gc-sections`	One section per function/global, GC unused	Often saves 10-30% Flash on embedded
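
As a rough sketch of what the last row buys you, consider a hypothetical source file with one function that is called and one that is not (all names are made up). The comments describe the section names GCC normally generates with these flags; checking the map file or readelf -S on the object is the way to confirm them on a given toolchain.

```c
#include <stdint.h>

/* Without -ffunction-sections both functions share one .text section in this
 * object, so keeping one keeps the other.
 * With -ffunction-sections this one lands in .text.scale_reading. */
int32_t scale_reading(int32_t raw)
{
    return (raw * 3) / 2;
}

/* Lands in .text.legacy_scale_reading. If nothing reachable from the entry
 * point refers to it, -Wl,--gc-sections lets the linker drop the section. */
int32_t legacy_scale_reading(int32_t raw)
{
    return raw * 2;
}

/* With -fdata-sections this table gets its own .rodata.gain_table section
 * and can be collected independently as well. */
const int32_t gain_table[4] = { 1, 2, 4, 8 };
```

The source is unchanged by the flags; only the granularity at which the linker can keep or discard pieces of the object changes.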

Debugging Story: The Phantom Function

A team's firmware kept overflowing Flash even though they were "only adding small features." Looking at the linker map (more on that in the binary-inspection topic), they spotted a 14 KB block belonging to a function their codebase didn't even use: printf. It had been pulled in by a single printf("debug\n"); call deep in test scaffolding code that no longer ran. The call sat in a branch of an if (DEBUG_BUILD) that never executed, but the reference still ended up in the object file, and that was enough to pull printf into the link.

Two complementary fixes: (1) make sure unused code is actually discarded with -ffunction-sections + -fdata-sections + -Wl,--gc-sections, which lets the linker garbage-collect any section not reachable from the entry point (typically Reset_Handler) or from sections the linker script marks KEEP; (2) replace printf with a tiny tiny_printf or a compile-time-removed log macro for embedded debug.

The lesson: Symbol references are sticky. If any code that survives into the link references printf, the linker pulls in printf and everything it depends on. Section-level GC is the cleanest solution for unused-code bloat.
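
A compile-time-removed log macro, as suggested in fix (2), could look roughly like the sketch below. ENABLE_DEBUG_LOG, LOG_DEBUG, and process_sample are hypothetical names. With the switch at 0 the macro expands to a no-op before the compiler runs, so no reference to printf ever reaches the object file.

```c
/* Hypothetical build switch; a debug build would pass -DENABLE_DEBUG_LOG=1. */
#ifndef ENABLE_DEBUG_LOG
#define ENABLE_DEBUG_LOG 0
#endif

#if ENABLE_DEBUG_LOG
#include <stdio.h>
#define LOG_DEBUG(...) printf(__VA_ARGS__)
#else
/* Expands to a no-op expression. The arguments are discarded by the
 * preprocessor, so they are never evaluated and printf is never referenced. */
#define LOG_DEBUG(...) ((void)0)
#endif

/* Illustrative worker function with a debug trace on entry. */
int process_sample(int raw)
{
    LOG_DEBUG("raw=%d\n", raw);
    return raw * 2;
}
```

Running nm on the release object should then show no U printf entry, which is a quick way to confirm that the logging really is gone rather than merely unreachable.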

What Interviewers Want to Hear

You can name all four stages and the file extension each produces
You can explain what a translation unit is and why compilation is per-file
You know the difference between compiler errors (syntax, types) and linker errors (symbols)
You can debug "undefined reference" and "multiple definition" without panicking
You know how to inspect intermediate outputs (-E, -S, nm, objdump)
You understand that static and extern are about TU boundaries

Interview Focus

Classic Interview Questions

Q1: "Walk me through what happens when I press 'Build' on a typical embedded C project."

Model Answer Starter: "Three programs run in sequence for each source file, then a fourth runs once for the whole project. First the preprocessor expands #include directives and #define macros, producing a single .i file with everything textually substituted. Then the compiler proper translates that to target assembly; this is where optimization flags like -Os do their work. The assembler turns assembly into a relocatable object file with sections, a symbol table, and relocation entries, but no final addresses yet. After all source files have been compiled, the linker resolves symbols across all .o files and libraries, applies relocations with the chosen final addresses, places sections per the linker script, and emits the ELF binary. objcopy then converts ELF to a flat binary or hex file for flashing."

Q2: "What's the difference between a compiler error and a linker error?"

Model Answer Starter: "Compiler errors are local to a single translation unit — syntax errors, type mismatches, undeclared variables. The compiler sees only one .c file at a time so it cannot tell you a function defined elsewhere is missing. Linker errors come later when the linker tries to glue all the object files together. The two big linker errors are 'undefined reference' (a symbol is used but never defined in any object or library you linked) and 'multiple definition' (the same symbol is defined in two object files). Linker errors are about cross-file consistency, not language correctness."

Q3: "What is a translation unit and why does it matter?"

Model Answer Starter: "A translation unit is one preprocessed source file: your .c file plus everything its #includes pull in. The compiler processes one TU at a time in isolation and produces one .o from it. This is why static makes a function or variable invisible outside its file: at file scope, static gives the symbol internal (TU-local) linkage. It's also why headers should contain only declarations and inline definitions: if a header has a non-static, non-inline function definition and is included in two .c files, both .o files end up with the function, and the linker rejects the duplicate. With link-time optimization the compiler defers final code generation until link time and can see all TUs together, but logically the model is still per-TU."

Q4: "I'm getting 'undefined reference to printf'. Walk me through how you'd debug this."

Model Answer Starter: "Undefined reference means the linker couldn't find a definition for a symbol that was used. For printf specifically, it usually means libc isn't being linked in; embedded builds often pass -nostdlib or -nodefaultlibs, which removes it from the link. I'd check the link command line for -lc or, better, that I'm using a libc-aware spec like --specs=nano.specs for newlib-nano. If the symbol is one of mine, I'd check spelling and extern "C" for C/C++ boundary issues, then verify the source file containing the definition is actually being compiled and the resulting .o is on the linker command line. nm on each candidate .o confirms whether the symbol is defined (T) or referenced (U)."

Q5: "Why does code grow when I include a function I'm not calling?"

Model Answer Starter: "By default, the linker's unit of inclusion is the section — and traditionally each .o puts all functions in one .text section. If you reference any one function from that .o, the whole .text section comes along, including unused functions. The fix is -ffunction-sections -fdata-sections on compile and -Wl,--gc-sections on link. This places each function in its own section, and the linker can then garbage-collect any section not reachable from the entry point. On a typical embedded firmware this saves 10-30% Flash with zero code changes."

Trap Alerts

Don't say: "The compiler builds the program" — there are four distinct stages and interview answers that lump them lose credit
Don't forget: Headers don't get compiled — they get textually pasted into every .c that includes them
Don't ignore: The linker can pull in code transitively (one printf call drags in a 10 KB runtime); always size-check after enabling features

Follow-up Questions

"What is -flto and what does it cost?"
"How does __attribute__((weak)) change linker behavior?"
"What's the difference between a .so, a .a, and a .o?"
"Why does inline in a header sometimes not actually inline?"
"How would you find which object file is bringing in a particular library function?"
"What are common reasons a function pointer call doesn't get inlined even with LTO?"

💡Practice Build Systems Interview Questions

Ready to test yourself? Head over to the Build Systems Interview Questions page for a full set of Q&A with collapsible answers — perfect for self-study and mock interview practice.

Practice

❓ What is the input and output of the preprocessor stage?

❓ A linker reports 'multiple definition of foo'. Which of the following is most likely the cause?

❓ Which gcc flag tells the linker to garbage-collect unused sections?

❓ What does a translation unit consist of?

❓ Why are linker errors common when crossing C and C++ source files?

Real-World Tie-In

Flash Overflow on a Sensor Node — A team adding BLE features hit a hard Flash limit. The map file showed printf and floating-point library code as the biggest single contributors. Switching from full newlib to newlib-nano (via --specs=nano.specs) reduced Flash usage by 18 KB. Adding -ffunction-sections -fdata-sections -Wl,--gc-sections shaved off another 6 KB. No source changes were required.

Mysterious Failures After Header Refactor — A junior engineer moved a "shared utility" function from a .c file into a header to "make it easier to use". The build started failing with "multiple definition of compute_crc" the first time two .c files both included the header. Fix: revert by moving the body back to a single .c file with a declaration in the header, OR (if cross-TU inlining was the goal) mark it static inline so each TU gets its own copy.

Cross-TU Optimization Win — A motor-control firmware enabled -flto and saw a 12% reduction in code size and a 4% improvement in inner-loop runtime. The win came from cross-TU inlining of small accessor functions in a HAL layer that previously lived in different .c files.