Quick Recap
The RTOS scheduler decides which task runs based on priorities and readiness. Understanding scheduling algorithms, context switching costs, and timing analysis is essential for designing systems that meet real-time deadlines. Interviewers probe this topic heavily because it reveals whether you can reason about determinism, worst-case behavior, and power-aware design.
Key Facts:
- Preemptive scheduling: the highest-priority ready task always runs immediately -- this is the default in most RTOSes (FreeRTOS, Zephyr, ThreadX)
- Cooperative scheduling: tasks run until they explicitly yield -- simpler but fragile, because one stuck task hangs the entire system
- Context switch cost: saves/restores registers plus the stack pointer; typically 5-20 us on Cortex-M depending on whether FPU state is included
- Rate Monotonic Scheduling (RMS): assign priority by period -- shorter period gets higher priority. The classic schedulability theory for fixed-priority systems
- Time-slicing (round-robin): equal-priority tasks share CPU time in fixed intervals determined by configTICK_RATE_HZ
- Tickless idle: suppresses tick interrupts during idle to save power -- critical for battery-powered devices
Deep Dive
At a Glance
| Characteristic | Detail |
|---|---|
| Scheduling model | Priority-based preemptive (most RTOSes) |
| Priority levels | Typically 5-56 (FreeRTOS default: 5, Zephyr: configurable) |
| Context switch time | 5-20 us on Cortex-M4 (varies with FPU, MPU) |
| Tick rate | Commonly 1 kHz (configTICK_RATE_HZ = 1000) |
| Schedulability bound | RMS: U must be under n(2^(1/n) - 1), converges to ~69.3% |
| Tickless idle | Suppresses SysTick during idle, programs wake-up timer for next event |
| Common pitfall | Assuming highest priority always means lowest latency (mutex blocking can violate this) |
Preemptive vs Cooperative Scheduling
These are the two fundamental scheduling models. Most production RTOSes default to preemptive, but cooperative mode exists and occasionally appears in very simple systems or legacy codebases.
| Aspect | Preemptive | Cooperative |
|---|---|---|
| Task switching | OS forces a switch when a higher-priority task becomes ready | Task runs until it explicitly calls taskYIELD() or a blocking API |
| Determinism | High -- high-priority task runs within one tick or less | Low -- depends on how quickly the current task yields |
| Synchronization needs | Must protect shared data (mutexes, critical sections) | Simpler -- no mid-function preemption, so less need for locks |
| Risk | Priority inversion, complex debugging | One long-running or stuck task starves everything else |
| Typical use | FreeRTOS, Zephyr, ThreadX, QNX | Super-loop with cooperative scheduler, some legacy systems |
| Interrupt interaction | ISR can make a higher-priority task ready, causing immediate switch on ISR exit | ISR sets a flag, but task switch waits until current task yields |
In preemptive mode, the scheduler runs at every tick interrupt and whenever a kernel API call changes task readiness (e.g., xSemaphoreGive() unblocks a waiting task). If the newly ready task has higher priority than the currently running task, the switch happens immediately -- even if the current task is in the middle of a function.
In cooperative mode, no amount of priority matters until the running task voluntarily gives up the CPU. This makes cooperative scheduling inherently non-deterministic for response time, which is why it is rarely used in systems with hard real-time requirements.
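In FreeRTOS, the choice between the two models is a single build-time switch. A minimal FreeRTOSConfig.h fragment (configUSE_PREEMPTION is a real FreeRTOS option; the surrounding comments are ours):

```c
/* FreeRTOSConfig.h fragment -- selecting the scheduling model */
#define configUSE_PREEMPTION    1   /* 1 = preemptive (the usual default)   */
                                    /* 0 = cooperative: tasks keep the CPU  */
                                    /* until they call taskYIELD() or a     */
                                    /* blocking API such as vTaskDelay()    */
```

With the flag at 0, every task body must contain an explicit yield or blocking call on every path, or the rest of the system starves.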
Priority-Based Preemptive Scheduling
This is the workhorse scheduling algorithm in nearly every RTOS. The rule is simple: the highest-priority ready task always runs. When a task with priority higher than the currently running task becomes ready (due to a timer expiry, semaphore give, or ISR notification), the scheduler immediately preempts the running task and switches to the higher-priority one.
The following ASCII timeline illustrates preemption in action:
```
Priority
High  Task A: ______########____########_______
Med   Task B: __####________####________####___
Low   Task C: ##____########____########____###
              ─────────────────────────────────► Time

Legend: # = Running, _ = Ready/Blocked

Events:
  t1: Task A becomes ready → preempts Task C immediately
  t2: Task A blocks (waits on semaphore) → Task B runs
  t3: Task B blocks → Task C resumes where it left off
```
Key behaviors to remember:
- Immediate preemption: There is no "finish your current work" grace period. The switch happens at the next scheduler invocation (typically within the current tick or the kernel API call itself).
- Resumption: When a high-priority task blocks or completes, the previously preempted task resumes exactly where it was interrupted -- same program counter, same register values.
- Starvation risk: A continuously ready high-priority task will starve all lower-priority tasks. This is by design, not a bug -- but it means you must ensure high-priority tasks block or sleep periodically.
Time-Slicing (Round-Robin)
When two or more tasks share the same priority level, the scheduler uses time-slicing to give each an equal share of CPU time. Each task runs for one tick period (1 / configTICK_RATE_HZ seconds) before the scheduler switches to the next same-priority task in the ready list.
```
Same Priority Tasks (all at priority 3):

Task X: ###___###___###___###___
Task Y: ___###___###___###___###
        ├──┤
        1 tick (1 ms at 1 kHz tick rate)
        ──────────────────────► Time
```
When time-slicing is appropriate:
- Background tasks that all have equal importance (e.g., multiple logging channels, multiple sensor polling tasks with identical deadlines)
- Soft real-time tasks where fairness matters more than strict priority ordering
When to avoid time-slicing:
- Hard real-time control tasks -- these should have unique priorities so the most critical one always wins
- Tasks with vastly different execution times -- a 50 us task sharing a 1 ms slice with a 900 us task wastes CPU time on unnecessary context switches
FreeRTOS enables time-slicing by default when configUSE_PREEMPTION and configUSE_TIME_SLICING are both set to 1.
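The relevant configuration lines, as a FreeRTOSConfig.h fragment (all three macros are real FreeRTOS options):

```c
/* FreeRTOSConfig.h fragment -- round-robin among equal-priority tasks */
#define configUSE_PREEMPTION     1
#define configUSE_TIME_SLICING   1     /* rotate the ready list each tick   */
#define configTICK_RATE_HZ       1000  /* slice length = 1 tick = 1 ms here */
```

Setting configUSE_TIME_SLICING to 0 keeps preemption between priority levels but lets an equal-priority task run until it blocks or yields.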
Context Switching
A context switch is the mechanical act of saving one task's CPU state and restoring another's. On ARM Cortex-M, this happens in two stages:
Stage 1 -- Hardware (automatic on exception entry): The processor automatically pushes 8 registers onto the current task's stack when entering PendSV (the exception used for context switching):
R0, R1, R2, R3, R12, LR, PC, xPSR (32 bytes)
Stage 2 -- Software (done by the RTOS PendSV handler): The PendSV handler manually saves the remaining registers:
R4, R5, R6, R7, R8, R9, R10, R11 (32 bytes)
Then it updates the task control block's saved stack pointer and loads the new task's stack pointer, pops R4-R11, and returns from exception (which auto-restores R0-R3, R12, LR, PC, xPSR from the new task's stack).
FPU adds significant overhead:
| Configuration | Registers saved | Stack usage | Typical switch time |
|---|---|---|---|
| No FPU | R0-R12, SP, LR, PC, xPSR | ~64 bytes | 5-8 us |
| FPU (lazy stacking) | Above + S0-S15, FPSCR on demand | ~64-132 bytes | 8-12 us |
| FPU (always save) | Above + S0-S31, FPSCR | ~196 bytes | 15-20 us |
Lazy FPU stacking (the default on Cortex-M4/M7) defers saving FPU registers until the new task actually uses the FPU. This avoids the overhead for tasks that do integer-only work. The FPCCR register's LSPEN bit controls this behavior.
If most of your tasks do not use floating-point, ensure lazy stacking is enabled (it is by default on Cortex-M4F/M7). This keeps context switch times near the no-FPU baseline for integer-only tasks. Only the task that actually touches an FPU instruction pays the extra save/restore cost.
Rate Monotonic Scheduling (RMS)
RMS is the foundational schedulability theory for fixed-priority preemptive systems. The core rule: assign priority inversely proportional to period -- the task with the shortest period gets the highest priority.
RMS is provably optimal among all fixed-priority scheduling algorithms: if any fixed-priority assignment can meet all deadlines, then the RMS assignment can too (assuming deadlines equal periods and tasks are independent).
Schedulability test (Liu-Layland bound):
The total CPU utilization U = sum of (Ci / Ti) for all tasks, where Ci is worst-case execution time and Ti is period. The system is guaranteed schedulable if:
```
U <= n * (2^(1/n) - 1)        where n = number of tasks

n=1:  U <= 1.000 (100%)
n=2:  U <= 0.828 (82.8%)
n=3:  U <= 0.780 (78.0%)
n=5:  U <= 0.743 (74.3%)
n→∞:  U →  0.693 (69.3%) = ln(2)
```
Example: Three periodic tasks:
| Task | Period (Ti) | WCET (Ci) | Utilization (Ci/Ti) |
|---|---|---|---|
| A | 10 ms | 2 ms | 0.200 |
| B | 25 ms | 5 ms | 0.200 |
| C | 50 ms | 10 ms | 0.200 |
Total U = 0.600. The bound for 3 tasks is 0.780. Since 0.600 is less than 0.780, all deadlines are guaranteed met under RMS priority assignment (A gets highest priority, C gets lowest).
RMS theory assumes tasks are periodic, independent (no shared resources), have deadlines equal to their periods, and have zero context switch overhead. Real systems violate all of these assumptions. Use RMS as a starting-point analysis, then verify with response-time analysis or simulation for any system where margins are tight.
Timing Analysis
Timing analysis answers the critical question: will every task meet its deadline under worst-case conditions?
Key terms:
| Term | Meaning |
|---|---|
| WCET | Worst-Case Execution Time -- the longest a task can take to complete one run, considering all code paths and cache/memory effects |
| WCRT | Worst-Case Response Time -- WCET plus all interference from higher-priority tasks and ISRs. This is the actual metric that must fit within the deadline |
| Jitter | Variation in task start time or completion time across successive periods |
| Release jitter | Variation caused by tick alignment -- a task released at time T might not actually start until T + (one tick period) |
Sources of jitter:
- Tick granularity -- a 1 kHz tick means up to 1 ms of release jitter
- Higher-priority task preemption -- variable execution times in higher-priority tasks cause variable interference
- ISR execution time -- long or variable-length ISRs delay all tasks
- Mutex blocking -- even with priority inheritance, a task must wait for the mutex holder to finish its critical section
- Cache and pipeline effects -- data-dependent execution times
How to measure timing in practice:
The most reliable method is a GPIO toggle measured with an oscilloscope or logic analyzer:
```c
/* Toggle a GPIO pin around the critical code path */
void vControlTask(void *pvParam)
{
    TickType_t xLastWake = xTaskGetTickCount();  /* initialize before the loop */
    for (;;) {
        GPIO_SET(DEBUG_PIN);        /* Pin goes HIGH */
        run_control_algorithm();    /* The code under measurement */
        GPIO_CLR(DEBUG_PIN);        /* Pin goes LOW */
        vTaskDelayUntil(&xLastWake, pdMS_TO_TICKS(1));
    }
}

/* Measure HIGH pulse width on oscilloscope:
   - min pulse  = best-case execution time
   - max pulse  = WCET
   - variation  = jitter */
```
For system-level tracing, tools like SEGGER SystemView and Percepio Tracealyzer can record every task switch, ISR entry/exit, and kernel API call with microsecond timestamps, letting you reconstruct the full scheduling timeline and identify the worst-case scenario.
Tickless Idle
In a standard RTOS configuration, the SysTick timer fires at configTICK_RATE_HZ (typically 1000 Hz), waking the CPU every 1 ms even when no task is ready to run. For battery-powered devices, this constant waking is a significant power drain.
Tickless idle solves this by suppressing tick interrupts during idle periods:
```
Standard tick mode (1 kHz):
SysTick: ↑    ↑    ↑    ↑    ↑    ↑    ↑    ↑    ↑    ↑
CPU:     ██___██___██___██___██___██___██___██___██___
         Wake every 1 ms even if idle ─────────────────

Tickless idle mode:
SysTick: ↑ ↑ ↑                              ↑ ↑ ↑
CPU:     ██_██_██______ deep sleep _____██_██_██________
                 ↑                      ↑
         No tasks ready:                Next task wakes:
         program wake-up timer          compensate tick count
         for 47 ms, enter               and resume normal
         deep sleep                     ticking
```
How it works in FreeRTOS:
- The idle task checks the next scheduled task wake-up time
- If the idle period exceeds configEXPECTED_IDLE_TIME_BEFORE_SLEEP ticks (default: 2), it calls vPortSuppressTicksAndSleep()
- This function programs a low-power timer (e.g., RTC or LPTIM) for the expected idle duration, stops SysTick, and enters a low-power sleep mode
- On wake-up (from the timer or any other interrupt), it calculates the elapsed time and advances the tick count to compensate for the suppressed ticks
Trade-offs:
- Timer resolution vs accuracy -- a 32 kHz RTC gives ~30 us resolution, which may cause slight tick drift
- Wake-up latency -- exiting deep sleep modes can take 5-50 us depending on the MCU, adding to response time
- Peripheral state -- some peripherals lose state in deep sleep, requiring re-initialization
Debugging Story: The Missed 1 kHz Deadline
A team had a motor control system with a 1 kHz control loop running at the highest priority. Most of the time it met its 1 ms deadline, but roughly once every 200 ms, the control output was delayed by 2 ms, causing audible motor vibration.
The first instinct was priority inversion, but priority inheritance was already enabled on all mutexes. Using SEGGER SystemView, they captured a trace showing the exact sequence: the 1 kHz control task would occasionally become ready while a medium-priority data-logging task held a mutex that the control task also needed. Even though priority inheritance immediately boosted the logger's priority to match the control task, the logger was in the middle of a 2 ms critical section (formatting and writing a log record while holding the mutex). The control task had to wait for those 2 ms regardless of priority -- inheritance only prevents a third (lower-priority) task from preempting the mutex holder. It does not make the holder run faster.
The fix was to redesign the data-sharing pattern. Instead of the control task and logger sharing a mutex-protected data structure, the control task wrote its output into a lock-free single-producer/single-consumer ring buffer. The logger read from the ring buffer without any mutex. The critical section was eliminated entirely, and the control task never blocked again.
The lesson: priority inheritance does not bound the blocking time -- it only prevents unbounded priority inversion. If a lower-priority task's critical section is too long, the high-priority task will still be blocked for the full duration of that critical section. The real fix is to minimize critical section length or eliminate shared mutexes through lock-free designs.
What Interviewers Want to Hear
Interviewers want to hear that you understand the mechanics beneath the abstraction: how the scheduler actually decides which task to run, what physically happens during a context switch, why RMS works and where it breaks down, and how tickless idle saves power without sacrificing timing. They are especially interested in candidates who can reason about worst-case behavior (not just typical behavior), identify sources of jitter and blocking, and explain trade-offs between determinism, power, and complexity.
Interview Focus
Classic Interview Questions
Q1: "What is the difference between preemptive and cooperative scheduling?"
Model Answer Starter: "In preemptive scheduling, the RTOS forcibly switches tasks whenever a higher-priority task becomes ready -- the running task gets no say. This provides deterministic response times because the highest-priority ready task always runs within one tick or less. The trade-off is that shared data must be protected with mutexes or critical sections, since a task can be preempted at any point. In cooperative scheduling, tasks run until they explicitly yield or call a blocking API. This eliminates the need for most synchronization, but a single task that takes too long -- or gets stuck -- blocks every other task in the system. Nearly all production RTOSes default to preemptive because determinism matters more than simplicity in real-time systems."
Q2: "What happens during a context switch on Cortex-M?"
Model Answer Starter: "A context switch occurs in two stages. First, the hardware automatically saves eight registers -- R0-R3, R12, LR, PC, and xPSR -- onto the current task's stack when entering the PendSV exception. Second, the PendSV handler in the RTOS port layer manually saves R4-R11, stores the updated stack pointer into the current task's TCB, loads the new task's stack pointer from its TCB, pops R4-R11, and returns from exception. The exception return automatically restores R0-R3, R12, LR, PC, and xPSR from the new task's stack. If the FPU is enabled, lazy stacking defers saving S0-S15 and FPSCR until the new task actually uses a floating-point instruction, which keeps the switch fast for integer-only tasks."
Q3: "Explain rate monotonic scheduling and its utilization bound."
Model Answer Starter: "RMS assigns fixed priorities based on task period -- shorter period means higher priority. It is provably optimal among fixed-priority algorithms: if any fixed-priority assignment can meet all deadlines, RMS can too, assuming deadlines equal periods. The Liu-Layland utilization bound gives a sufficient (but not necessary) schedulability test: total CPU utilization must be under n times the quantity 2 to the power of 1/n minus 1. For three tasks, that is about 78%. As the number of tasks grows, the bound converges to ln(2), approximately 69.3%. Important caveat: this analysis assumes independent tasks with no shared resources and zero context switch overhead, so in practice you verify with response-time analysis."
Q4: "How do you measure and optimize worst-case response time?"
Model Answer Starter: "I start by measuring WCET using a GPIO toggle and oscilloscope -- set a pin high at task start, clear it at task end, and record the maximum pulse width over a long run. For system-level response time, I use a trace tool like SEGGER SystemView or Percepio Tracealyzer to capture every context switch and ISR, then identify the longest gap between a task becoming ready and completing execution. To optimize, I look at the biggest contributors: overly long critical sections that block higher-priority tasks, ISRs that run too long and delay PendSV, and unnecessary preemptions from same-priority time-slicing. The fix is often to shorten critical sections, defer ISR work to tasks, or split complex tasks into smaller, more predictable pieces."
Q5: "What is tickless idle and why does it matter for battery devices?"
Model Answer Starter: "In standard mode, the RTOS tick interrupt fires every 1 ms, waking the CPU each time even if no task is ready. For a battery device spending 95% of its time idle, that is 950 unnecessary wake-ups per second, each consuming energy for exception entry, scheduler check, and re-entry to sleep. Tickless idle eliminates this by stopping the SysTick timer when no task is ready. The RTOS calculates when the next task will wake, programs a low-power timer for that duration, and puts the MCU into deep sleep. On wake-up, it reads the elapsed time and compensates the tick count. The trade-off is slightly reduced tick accuracy due to timer resolution and wake-up latency, but for battery devices the power savings are enormous -- often 10x or more reduction in idle current."
Trap Alerts
- Do not say: "Preemptive scheduling always guarantees deadlines" -- a high-priority task blocked on a mutex held by a lower-priority task can still miss its deadline, even with priority inheritance
- Do not forget: Context switch cost is not just registers -- with FPU enabled, lazy stacking behavior and its impact on determinism must be considered
- Do not ignore: RMS assumes independent tasks with no shared resources -- real systems need response-time analysis that accounts for blocking
Follow-up Questions
- "How does priority inheritance differ from priority ceiling protocol, and when would you choose one over the other?"
- "What happens to tickless idle accuracy when using a 32 kHz RTC versus SysTick for the wake-up timer?"
- "How would you handle a mixed system with both hard real-time and soft real-time tasks?"
- "What is the effect of ISR execution time on task schedulability?"
Practice
❓ In preemptive scheduling, what triggers a context switch?
❓ On Cortex-M, which registers does the hardware automatically save during a context switch (exception entry)?
❓ What is the RMS utilization bound for 3 independent periodic tasks?
❓ What does tickless idle do to save power?
❓ Why does priority inheritance NOT fully solve the problem of a high-priority task being blocked by a mutex?
Real-World Tie-In
Wearable Health Monitor -- A wrist-worn heart-rate and SpO2 monitor ran three periodic tasks: a 200 Hz sensor sampling task, a 10 Hz signal processing task, and a 1 Hz BLE advertisement task. RMS priority assignment (sampling at highest priority) kept all deadlines met with total utilization under 40%. Tickless idle was enabled with an LPTIM wake-up source, dropping idle-mode current from 1.2 mA (with SysTick running) to 8 uA. Battery life improved from 3 days to over 2 weeks.
Industrial Motor Controller -- A three-phase motor drive used a 10 kHz current control loop as the highest-priority task, with a 1 kHz speed loop and a 100 Hz diagnostic task below it. During integration, the diagnostic task occasionally caused the current loop to miss its 100 us deadline. Tracing revealed the diagnostic task held a mutex while formatting a CAN message (taking up to 150 us), and the current loop needed the same mutex to read motor parameters. Replacing the shared mutex with a double-buffered read pattern eliminated the blocking entirely, reducing worst-case current loop jitter from 150 us to under 3 us.