RTOS fundamentals

Quick Cap

An RTOS (Real-Time Operating System) provides deterministic task scheduling — the highest-priority ready task always runs, and the scheduler guarantees bounded response times. This is the foundation of every multi-tasking embedded system, from motor controllers to medical devices. "What is an RTOS and why would you use one?" is one of the most asked embedded interview questions.

Key Facts:

Hard real-time: Missing a deadline is a system failure (airbag deployment, motor commutation). No tolerance.
Soft real-time: Missing a deadline degrades quality but is not catastrophic (audio streaming, UI updates).
RTOS kernel is typically 5-20 KB Flash, 1-4 KB RAM (FreeRTOS). Small enough for Cortex-M0 with 32 KB Flash.
Task states: Ready, Running, Blocked, Suspended — every task is always in exactly one state.
Key advantage over bare-metal: Preemptive scheduling lets a high-priority task run as soon as it becomes ready (within a bounded context-switch latency), regardless of what lower-priority code is doing.
FreeRTOS dominates IoT/general embedded; Zephyr is growing fast in commercial products with its upstream driver model.

Deep Dive

At a Glance

Characteristic	Bare-Metal Super-Loop	RTOS	Embedded Linux
Scheduling	Manual (one big loop)	Priority-based preemptive	Process-based, fair scheduler (CFS; EEVDF since Linux 6.6)
Response time	Depends on loop duration	Deterministic (bounded)	Non-deterministic (unless PREEMPT_RT)
Memory	No overhead	5-20 KB Flash, 1-4 KB RAM	4+ MB RAM minimum
Concurrency	Interrupts only	Tasks + interrupts	Processes + threads + interrupts
Memory protection	None	None by default (no MMU; optional MPU on some parts)	Full MMU isolation
Typical CPU	Cortex-M0, 8-bit	Cortex-M0 to M7	Cortex-A, x86
Best for	Simple, single-function	Multi-tasking, real-time	Complex, networked, UI

Hard vs Soft Real-Time

This distinction is the first thing interviewers test:

Type	Deadline Violation	Examples	Consequence
Hard real-time	System failure	Airbag deployment, fuel injection timing, pacemaker pulse	Physical damage, injury, death
Firm real-time	Result is useless but no damage	Video frame decode, radar pulse processing	Dropped frame/sample, degraded output
Soft real-time	Degraded quality	Audio playback, UI animation, telemetry reporting	Glitch, stutter, delayed data

Interview insight: Most embedded systems are actually firm or soft real-time. True hard real-time is limited to safety-critical domains (automotive, medical, aerospace). However, even soft real-time benefits from an RTOS because the preemptive scheduler makes timing behavior predictable and easier to reason about.
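The three classes differ in what a late result is worth. A minimal sketch of that distinction follows; the names `rt_class_t`, `rt_action_t`, and `late_result_action` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification mirroring the table above. */
typedef enum { RT_HARD, RT_FIRM, RT_SOFT } rt_class_t;

typedef enum {
    ACTION_USE,       /* result arrived in time: use it normally   */
    ACTION_USE_LATE,  /* soft: a late result still has some value  */
    ACTION_DISCARD,   /* firm: a late result is useless, drop it   */
    ACTION_FAIL_SAFE  /* hard: a missed deadline is a system fault */
} rt_action_t;

/* Decide what to do with a result that finished at finish_us when it
 * was due by deadline_us (both measured from the same start point). */
rt_action_t late_result_action(rt_class_t cls, uint32_t finish_us,
                               uint32_t deadline_us)
{
    if (finish_us <= deadline_us) {
        return ACTION_USE;
    }
    switch (cls) {
    case RT_HARD: return ACTION_FAIL_SAFE;
    case RT_FIRM: return ACTION_DISCARD;
    default:      return ACTION_USE_LATE;
    }
}
```

In this sketch a firm-deadline video frame that finishes late is discarded, while a soft-deadline telemetry sample is still used.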

Why RTOS vs Bare-Metal?

Factor	Bare-Metal	RTOS
Multi-tasking	Manual state machines in main loop	Automatic preemptive task switching
Priority handling	Interrupts only; main loop has one priority	Multiple task priorities with guaranteed preemption
Timing	Depends on worst-case loop time	Deterministic: a high-priority task preempts within a bounded latency
Code structure	Monolithic loop, hard to maintain at scale	Modular tasks, clean separation of concerns
Power management	Manual idle detection	Built-in idle task; tickless idle available as an option
Overhead	Zero	5-20 KB Flash, 1-4 KB RAM, context switch cost
When to choose	Simple systems, ultra-low power, very tight RAM	Multiple concurrent activities with different priorities

💡When Bare-Metal is Better

If your system has a single function (read sensor, transmit, sleep), an RTOS adds complexity and RAM overhead for no benefit. A super-loop with interrupts is simpler and uses less power. RTOS shines when you have 3+ concurrent activities with different timing requirements.
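To make "timing depends on worst-case loop time" concrete: in a polled super-loop, an event that arrives just after its step has checked for it waits roughly one full loop iteration before it is noticed. The sketch below adds up per-step worst-case execution times. The function names and the timings in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Approximate worst-case wait before a polled event is noticed:
 * one full pass through every step of the loop. */
uint32_t superloop_worst_case_us(const uint32_t *step_wcet_us, size_t n_steps)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n_steps; i++) {
        total += step_wcet_us[i];
    }
    return total;
}

/* True if the slowest possible loop pass still fits inside the deadline. */
bool superloop_meets_deadline(const uint32_t *step_wcet_us, size_t n_steps,
                              uint32_t deadline_us)
{
    return superloop_worst_case_us(step_wcet_us, n_steps) <= deadline_us;
}
```

With steps of 200 µs, 1500 µs, and 300 µs the worst case is 2000 µs, so a 1000 µs deadline cannot be guaranteed by the loop alone. Adding a new step lengthens the worst case for every other activity, which is why the loop has to be re-analyzed after each feature.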

Task States

Every RTOS task is always in exactly one of these states:

Diagram: Task State Machine

      create
        │
        ▼
   ┌─────────┐   dispatch (scheduler)   ┌─────────┐
   │  READY  │─────────────────────────▶│ RUNNING │
   │         │◀─────────────────────────│         │
   └─────────┘       preempted          └────┬────┘
        ▲                                    │
        │                                    │ blocks on
        │  event arrives                     │ mutex / queue / delay
        │  (timeout, signal,                 │
        │   give, post)                      ▼
        │                              ┌─────────┐
        └──────────────────────────────│ BLOCKED │
                                       └─────────┘

            vTaskSuspend()   (from Ready, Running, or Blocked)
                   │
                   ▼
             ┌───────────┐
             │ SUSPENDED │
             └─────┬─────┘
                   │ vTaskResume()
                   ▼
                 READY

Hot path: Ready ↔ Running ↔ Blocked. Suspended is an out-of-band state.

State	Meaning	Example Trigger
Ready	Eligible to run but a higher- or equal-priority task is running	Task created, unblocked by event, resumed
Running	Currently executing on the CPU (only one at a time)	Scheduler picks highest-priority ready task
Blocked	Waiting for an event (timeout, mutex, queue, semaphore)	`vTaskDelay()`, `xSemaphoreTake()`, `xQueueReceive()`
Suspended	Removed from scheduling entirely until explicitly resumed	`vTaskSuspend()` — rarely used in practice
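The state table can be modelled as a small transition function. This is a sketch for reasoning about the diagram, not kernel code: the enum and function names are hypothetical, and suspend is accepted from any state because FreeRTOS's `vTaskSuspend()` can target a task in any state.

```c
#include <assert.h>

typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED, TASK_SUSPENDED } task_state_t;

typedef enum {
    EV_DISPATCH,  /* scheduler selects the task                      */
    EV_PREEMPT,   /* a higher-priority task became ready             */
    EV_BLOCK,     /* task waits on a delay, queue, semaphore, mutex  */
    EV_UNBLOCK,   /* awaited event arrived or the timeout expired    */
    EV_SUSPEND,   /* vTaskSuspend()                                  */
    EV_RESUME     /* vTaskResume()                                   */
} task_event_t;

/* Return the state after applying one event; invalid combinations
 * leave the state unchanged. */
task_state_t task_next_state(task_state_t s, task_event_t e)
{
    switch (e) {
    case EV_DISPATCH: return (s == TASK_READY)     ? TASK_RUNNING : s;
    case EV_PREEMPT:  return (s == TASK_RUNNING)   ? TASK_READY   : s;
    case EV_BLOCK:    return (s == TASK_RUNNING)   ? TASK_BLOCKED : s;
    case EV_UNBLOCK:  return (s == TASK_BLOCKED)   ? TASK_READY   : s;
    case EV_SUSPEND:  return TASK_SUSPENDED;
    case EV_RESUME:   return (s == TASK_SUSPENDED) ? TASK_READY   : s;
    default:          return s;
    }
}
```

Note that a Blocked task never goes straight to Running: it always passes through Ready and waits for the scheduler to dispatch it.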

Task Priorities

FreeRTOS: Higher number = higher priority. Priority 0 (tskIDLE_PRIORITY) is the idle task's priority. Application tasks normally use 1 up to configMAX_PRIORITIES - 1.

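Zephyr: Lower number = higher priority. Negative priorities are cooperative threads and non-negative priorities are preemptible threads, so priority 0 is the highest preemptible priority. The number of levels in each class is configurable.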

This inconsistency is a common source of confusion and a frequent interview question. Always specify which RTOS you are discussing when talking about priority numbers.
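One way to keep the two conventions straight is to make the convention an explicit parameter whenever priorities are compared. A minimal sketch, with hypothetical names (`prio_convention_t`, `prio_is_higher`):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CONV_HIGHER_NUMBER_WINS,  /* FreeRTOS-style numbering */
    CONV_LOWER_NUMBER_WINS    /* Zephyr-style numbering   */
} prio_convention_t;

/* True if priority value a outranks priority value b under conv. */
bool prio_is_higher(prio_convention_t conv, int a, int b)
{
    return (conv == CONV_HIGHER_NUMBER_WINS) ? (a > b) : (a < b);
}
```

Under the FreeRTOS-style convention 3 outranks 1. Under the Zephyr-style convention 1 outranks 3, and a cooperative priority of -1 outranks both.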

Priority assignment guidelines:

Safety-critical control loops: highest priority
Communication handlers (UART, CAN, network): medium-high
Data processing, logging: medium
UI updates, status LED: low
Idle task: lowest (built-in, runs when nothing else is ready)
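These guidelines only matter because of how the scheduler picks the next task: among the tasks that are ready, the highest priority wins. The sketch below models that selection with FreeRTOS-style numbering (higher number wins). The `task_t` type, the function name, and the task names in the usage note are hypothetical. A real kernel uses per-priority ready lists rather than a linear scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned    priority;  /* higher number = higher priority */
    bool        ready;     /* false while blocked or suspended */
} task_t;

/* Return the index of the highest-priority ready task, or -1 if none.
 * On a tie the first task in the array wins; real kernels typically
 * time-slice between equal-priority tasks instead. */
int pick_next_task(const task_t *tasks, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].ready) {
            continue;
        }
        if (best < 0 || tasks[i].priority > tasks[(size_t)best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

For example, with a blocked control task (priority 4), ready comms (3) and logging (2) tasks, and the idle task (0), the function selects comms. As soon as the control task becomes ready it is selected instead.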

RTOS Comparison

Feature	FreeRTOS	Zephyr	ThreadX (Azure RTOS)	VxWorks	QNX
License	MIT	Apache 2.0	MIT (now open)	Commercial	Commercial
Kernel size	5-10 KB	8-20 KB	5-10 KB	Large	Large
Typical use	IoT, general embedded	Commercial products, BLE, Thread	Azure IoT, medical	Aerospace, defense	Automotive (QNX Neutrino)
Supported boards	Most MCUs	400+ boards	ARM, RISC-V	PowerPC, ARM, x86	ARM, x86
Networking	FreeRTOS+TCP, lwIP	Native (BSD sockets)	NetX Duo	Native	Native (POSIX)
Certification	SafeRTOS (separate product)	Planned (IEC 61508)	IEC 62304, IEC 61508	DO-178C, IEC 61508	ISO 26262, IEC 62304
Learning curve	Low	Medium	Low	High	High

FreeRTOS is the safe default for most embedded projects — smallest footprint, simplest API, widest MCU support, MIT license. Zephyr is the choice when you need an upstream driver model (similar to Linux), native BLE/Thread/Matter support, or a path to safety certification.

RTOS Memory Model

Unlike Linux (which uses an MMU to give each process its own virtual address space), RTOS tasks share a single flat memory space. Every task can read and write any address. This means:

No memory protection between tasks — a buffer overflow in one task can corrupt another task's data
Each task gets its own stack — sized at creation time, cannot grow dynamically
Shared globals require synchronization (mutexes, queues) — same as ISR shared data, but with more tools available

Stack sizing is critical. Too small causes stack overflow (silent corruption or hard fault). Too large wastes RAM.

Heap Strategy (FreeRTOS)	Description	Best For
heap_1	Allocate-only, never free	Static systems, safety-critical
heap_2	Simple free, no coalescence	Fixed-size allocations
heap_3	Wraps standard `malloc`/`free`	When C library heap is available
heap_4	First-fit with coalescence	General purpose (most common)
heap_5	Like heap_4 but spans non-contiguous regions	Multiple RAM banks
Static allocation	`xTaskCreateStatic()` — no heap at all	Safety-critical, deterministic
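To illustrate why an allocate-only scheme such as heap_1 is attractive for static systems, here is a minimal bump allocator in the same spirit. It is not the FreeRTOS source. The pool size, alignment, and function names are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE  256u  /* hypothetical pool size in bytes */
#define POOL_ALIGN 8u    /* every block starts on an 8-byte boundary */

/* uint64_t storage keeps the pool itself 8-byte aligned. */
static uint64_t pool_storage[POOL_SIZE / sizeof(uint64_t)];
static size_t   pool_used = 0;

/* Allocate-only: hand out the next aligned slice of the pool.
 * There is no free(), so fragmentation cannot occur and every call
 * takes constant time. Returns NULL if the request does not fit. */
void *bump_alloc(size_t size)
{
    size_t rounded = (size + (POOL_ALIGN - 1u)) & ~(size_t)(POOL_ALIGN - 1u);
    if (size == 0u || rounded < size || rounded > POOL_SIZE - pool_used) {
        return NULL;
    }
    void *block = (uint8_t *)pool_storage + pool_used;
    pool_used += rounded;
    return block;
}

/* Bytes still available in the pool. */
size_t bump_free_bytes(void)
{
    return POOL_SIZE - pool_used;
}
```

Because nothing is ever returned to the pool, the remaining free space after start-up is a fixed, known number. That property is what makes this style easy to justify in a safety case.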

⚠️Common Trap: Stack Overflow

RTOS stack overflow is the #1 cause of mysterious crashes in embedded systems. FreeRTOS provides configCHECK_FOR_STACK_OVERFLOW (method 1: check on context switch, method 2: fill with pattern and verify). Always enable this during development. Size stacks generously at first (2-4 KB per task), then measure actual usage with uxTaskGetStackHighWaterMark() and trim.
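The "measure, then trim" step is simple arithmetic once you remember that FreeRTOS reports the high-water mark in stack words (`StackType_t` units), not bytes. A sketch of that calculation follows. The function names and the margin policy are hypothetical, and the inputs are passed in so the code can run on a host.

```c
#include <assert.h>
#include <stddef.h>

/* Peak stack usage in bytes, given the allocated depth and the
 * reported high-water mark (minimum free space), both in words. */
size_t stack_peak_used_bytes(size_t allocated_words, size_t high_water_words,
                             size_t word_size_bytes)
{
    assert(high_water_words <= allocated_words);
    return (allocated_words - high_water_words) * word_size_bytes;
}

/* Suggested new depth in words: measured peak plus a safety margin,
 * with the margin rounded up to a whole word. */
size_t stack_suggested_words(size_t allocated_words, size_t high_water_words,
                             unsigned margin_percent)
{
    assert(high_water_words <= allocated_words);
    size_t used = allocated_words - high_water_words;
    return used + (used * margin_percent + 99u) / 100u;
}
```

For a task created with 1024 words whose high-water mark is 700 words, the peak usage is 324 words (1296 bytes with 4-byte words), and a 25% margin suggests a depth of 405 words. The margin only helps if the measurement run exercised the worst-case code paths, including interrupt nesting where that applies.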

Stack Overflow Detection

Method	How It Works	Catches
Pattern fill (FreeRTOS method 2)	Fill stack with 0xA5A5A5A5; check on context switch	Gradual overflow
High-water mark	`uxTaskGetStackHighWaterMark()` returns the minimum free stack ever observed, in words (`StackType_t` units), not bytes	Measure worst-case usage
MPU guard region	Place a no-access MPU region at stack bottom	Immediate fault on overflow
Canary value	Place known value at stack bottom; check periodically	Gradual overflow
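The pattern-fill, high-water-mark, and canary rows all rely on the same trick: fill the stack with a known byte and later see how much of it is still intact. The sketch below runs on a plain byte array and assumes a descending stack, so index 0 is the end that overflows. The names and the fill value are illustrative, not the FreeRTOS internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u

/* Paint the whole stack region before the task first runs. */
void stack_fill(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* High-water mark: count untouched bytes from the overflow end. */
size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}

/* Canary check: the last guard_bytes before the overflow end must
 * still hold the fill pattern. */
bool stack_guard_intact(const uint8_t *stack, size_t size, size_t guard_bytes)
{
    if (guard_bytes > size) {
        guard_bytes = size;
    }
    for (size_t i = 0; i < guard_bytes; i++) {
        if (stack[i] != STACK_FILL_BYTE) {
            return false;
        }
    }
    return true;
}
```

Both checks are after-the-fact: they report that the stack grew into the region at some point, and a write that happens to store the fill value goes unnoticed. Only a hardware guard region faults at the moment of the overflow.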

Debugging Story: Task Starvation

A team built an IoT sensor hub with 5 tasks: sensor reading (priority 3), data processing (priority 3), WiFi transmission (priority 2), LED status (priority 1), and logging (priority 1). The LED and logging tasks never ran — the system appeared to work but produced no log files and the status LED was frozen.

The root cause: the three higher-priority tasks (sensor, processing, WiFi) never blocked long enough for the lower-priority tasks to run. The sensor task used vTaskDelay(10) (10 ticks, which is 10 ms at a 1 kHz tick rate) but the processing task used a busy-wait polling loop to check for new data instead of blocking on a queue. This polling loop consumed 100% CPU whenever data was available, starving everything below it.

The fix: replace the polling loop with xQueueReceive() with a timeout. When no data is available, the processing task blocks, allowing lower-priority tasks to run. The system went from 100% CPU to 35% average utilization.

The lesson: Every RTOS task must eventually block (on a delay, queue, semaphore, or mutex). A task that busy-waits at any priority level starves all lower-priority tasks. This is the most common RTOS design mistake.
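The starvation effect can be reproduced on a host with a toy fixed-priority simulation. This is a simplified model, not FreeRTOS: time advances in whole ticks, the highest-priority ready task runs for each tick, and a task blocks for `block_ticks` after every `work_ticks` of work. All names and numbers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned priority;     /* higher number = higher priority           */
    unsigned work_ticks;   /* ticks of work per activation (must be >0) */
    unsigned block_ticks;  /* ticks blocked afterwards; 0 = busy-wait   */
    unsigned work_left;    /* internal: work remaining this activation  */
    unsigned wake_at;      /* internal: tick at which it is ready again */
    unsigned ran;          /* result: ticks spent on the CPU            */
} sim_task_t;

/* Run a strict fixed-priority schedule for total_ticks ticks. */
void simulate(sim_task_t *t, size_t n, unsigned total_ticks)
{
    for (size_t i = 0; i < n; i++) {
        t[i].work_left = t[i].work_ticks;
        t[i].wake_at = 0;
        t[i].ran = 0;
    }
    for (unsigned now = 0; now < total_ticks; now++) {
        int best = -1;
        for (size_t i = 0; i < n; i++) {
            if (t[i].wake_at > now) {
                continue;  /* still blocked */
            }
            if (best < 0 || t[i].priority > t[(size_t)best].priority) {
                best = (int)i;
            }
        }
        if (best < 0) {
            continue;  /* nothing ready: the idle task would run */
        }
        sim_task_t *r = &t[(size_t)best];
        r->ran++;
        if (--r->work_left == 0) {
            r->work_left = r->work_ticks;
            r->wake_at = now + 1 + r->block_ticks;
        }
    }
}
```

With a busy-waiting priority-3 task (`block_ticks = 0`) and a priority-1 LED task, the LED task gets zero ticks out of 100. Changing the priority-3 task to 3 ticks of work followed by 7 ticks blocked drops its share to 30 ticks and lets the LED task run on every one of its 10 activations.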

What Interviewers Want to Hear

You can define hard vs soft real-time with concrete examples (not just "hard = important")
You know when to use an RTOS vs bare-metal (not "RTOS is always better")
You can draw the task state diagram and explain transitions
You understand the memory model (shared address space, per-task stacks, no MMU)
You can compare FreeRTOS and Zephyr with specific tradeoffs
You know about stack overflow detection and sizing strategies

Interview Focus

Classic Interview Questions

Q1: "What is an RTOS and how does it differ from a general-purpose OS?"

Model Answer Starter: "An RTOS provides deterministic task scheduling: the highest-priority ready task always runs within a bounded time. The key property is predictability, not speed. A general-purpose OS like Linux maximizes throughput and fairness; an RTOS guarantees worst-case response time. RTOS kernels are small (5-20 KB) and typically run all tasks in a shared address space with no MMU. Linux requires megabytes of RAM and provides process isolation via virtual memory. I choose an RTOS for MCU-class applications with real-time constraints, and Linux for complex systems with networking, UI, and filesystem needs."

Q2: "Explain hard vs soft real-time with examples."

Model Answer Starter: "Hard real-time means a missed deadline is a system failure — not just poor performance, but potentially dangerous. Examples: airbag deployment must happen within 10 ms of crash detection; fuel injection timing must be accurate to microseconds. Soft real-time means missed deadlines degrade quality but the system continues functioning. Examples: an audio player drops a sample (audible click but no damage), a telemetry system reports data 100 ms late (acceptable). Most embedded systems are actually soft or firm real-time. True hard real-time requires formal timing analysis and often safety certification."

Q3: "When would you choose an RTOS over a bare-metal super-loop?"

Model Answer Starter: "When I have three or more concurrent activities with different timing requirements. A super-loop works well for simple systems — read sensor, process, transmit, repeat. But when I need to simultaneously handle a control loop at 1 kHz, a communication protocol at variable rates, and a UI update at 30 Hz, the super-loop becomes fragile. Adding a new feature requires re-analyzing the entire loop timing. With an RTOS, each activity is an independent task with its own priority. The scheduler guarantees the control loop runs on time regardless of what the communication code is doing."

Q4: "Draw the RTOS task state diagram and explain each transition."

Model Answer Starter: "Four states: Ready, Running, Blocked, and Suspended. A newly created task starts in Ready. The scheduler picks the highest-priority Ready task and moves it to Running — only one task runs at a time. When the running task calls a blocking function like xQueueReceive or vTaskDelay, it moves to Blocked and the next highest-priority Ready task runs. When the blocking condition is satisfied (queue receives data, delay expires, semaphore given), the task moves back to Ready. Suspended is a special state where the task is removed from scheduling entirely until explicitly resumed with vTaskResume — it is rarely used in practice."

Q5: "Compare FreeRTOS and Zephyr — when would you choose each?"

Model Answer Starter: "FreeRTOS is the safe default — smallest footprint (5-10 KB), simplest API, widest MCU support, MIT license, and the largest community. I choose it for straightforward IoT and general embedded projects. Zephyr is more opinionated — it has an upstream driver model similar to Linux, native Bluetooth LE/Thread/Matter support, device tree for hardware description, and a path to IEC 61508 certification. I choose Zephyr when I need out-of-the-box connectivity stack support, want a consistent driver API across MCU vendors, or need eventual safety certification. The tradeoff is Zephyr's steeper learning curve and larger kernel footprint."

Trap Alerts

Don't say: "RTOS is always better than bare-metal" — for simple single-function devices, bare-metal is simpler, smaller, and lower power
Don't forget: Every RTOS task must eventually block; a busy-waiting task starves all lower-priority tasks
Don't ignore: Stack sizing — RTOS stack overflow is the #1 cause of mysterious crashes in embedded systems

Follow-up Questions

"How do you determine the right stack size for an RTOS task?"
"What is the idle task and what happens when no application task is ready?"
"How does an RTOS handle tasks with the same priority?"
"What is the difference between vTaskDelay and vTaskDelayUntil?"
"What is configTICK_RATE_HZ and how do you choose it?"

Practice

❓ A pacemaker must deliver an electrical pulse within 1 ms of detecting a cardiac event. What type of real-time system is this?

❓ An RTOS task calls xQueueReceive() with a timeout of 100 ms but no data arrives. What state is the task in during those 100 ms?

❓ In FreeRTOS, task A has priority 3 and task B has priority 1. Which task has higher priority?

❓ Your RTOS system has a data processing task that polls a flag in a while loop instead of blocking on a queue. What problem does this cause?

❓ Which FreeRTOS heap strategy should you use for a safety-critical system that must never call free()?

Real-World Tie-In

Motor Control with RTOS: a brushless DC motor controller runs FreeRTOS with three tasks, namely the FOC control loop (highest priority, 10 kHz, blocks on a timer semaphore), CAN communication (medium, blocks on a queue), and diagnostic logging (lowest, blocks on vTaskDelay). The control loop always meets its 100 µs deadline because the scheduler preempts any lower-priority task as soon as the timer semaphore is given. Total RTOS overhead: 8 KB Flash, 2 KB RAM on a Cortex-M4.
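The 10 kHz rate and the 100 µs deadline in the motor example are the same number seen from two sides, and the loop's execution time determines how much CPU is left for the other tasks. A small sketch of that arithmetic follows. The function names are hypothetical, and the 25 µs execution time in the usage note is an assumed figure, not one taken from the example.

```c
#include <assert.h>
#include <stdint.h>

/* Period of a periodic task in microseconds. */
uint32_t period_us(uint32_t rate_hz)
{
    assert(rate_hz > 0u);
    return 1000000u / rate_hz;
}

/* CPU load of one periodic task, in whole percent (rounded down):
 * load = exec_us * 1e-6 s * rate_hz * 100. */
uint32_t cpu_load_percent(uint32_t exec_us, uint32_t rate_hz)
{
    return (uint32_t)(((uint64_t)exec_us * rate_hz) / 10000u);
}
```

`period_us(10000)` gives 100 µs. If the FOC loop took 25 µs per iteration it would use 25% of the CPU, leaving 75% for CAN handling, logging, and idle time.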

IoT Environmental Monitor: a Zephyr-based air quality sensor uses BLE for data reporting and Thread mesh networking for multi-node communication. Zephyr was chosen over FreeRTOS because it provides native BLE and Thread stacks with a unified driver API. The system runs 6 threads with priorities ranging from sensor sampling (highest) to BLE advertising (lowest). Stack sizes were measured with Zephyr's thread analyzer and trimmed from the 2 KB default to 512-1024 bytes per thread, saving at least 6 KB of RAM.
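The RAM saving quoted above can be checked with one line of arithmetic. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* RAM recovered by shrinking every stack from old_bytes to new_bytes. */
size_t stack_savings_bytes(size_t count, size_t old_bytes, size_t new_bytes)
{
    assert(new_bytes <= old_bytes);
    return count * (old_bytes - new_bytes);
}
```

Six stacks trimmed from 2048 to 1024 bytes recover 6144 bytes (6 KB). If all six could go down to 512 bytes the saving would be 9216 bytes (9 KB), so the 6 KB figure corresponds to the conservative end of the 512-1024 byte range.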