What causes stack overflow in an RTOS task and how do you detect it?

Question

Accepted Answer

Each RTOS task has its own stack, allocated when the task is created. Stack overflow occurs when a task uses more stack space than it was allocated, typically because of deep function call chains, large local arrays, or excessive recursion. Because tasks share the same address space and nothing checks stack accesses by default, an overflow in one task silently corrupts adjacent memory. That memory could be another task's stack, a global variable, or the heap, so the resulting failures often look unrelated to the task that caused them.
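To make the "silent corruption" concrete, here is a minimal host-side sketch in C. It is a simulation, not RTOS code: the arena layout, the sizes, and the names `arena_init`, `task_b_push`, and `task_a_words_corrupted` are all hypothetical. It assumes two descending stacks placed back to back, with task B's stack directly above task A's.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 8u           /* hypothetical size of each task stack */
#define A_DATA 0x11111111u       /* stands in for task A's live stack contents */
#define B_DATA 0xDEADBEEFu       /* stands in for whatever task B pushes */

/* One arena holding both stacks: A at indices 0..7, B at indices 8..15. */
uint32_t arena[2u * STACK_WORDS];

/* Reset the arena: A's stack holds valid data, B's stack is empty. */
void arena_init(void)
{
    for (size_t i = 0u; i < STACK_WORDS; i++) {
        arena[i] = A_DATA;
        arena[STACK_WORDS + i] = 0u;
    }
}

/* Push `count` words for task B, starting from the top of B's stack.
 * There is deliberately no check against B's lower bound, which mirrors
 * the hardware: a push just decrements the stack pointer and writes.
 * Returns the final stack pointer as an arena index. */
size_t task_b_push(size_t count)
{
    size_t sp = 2u * STACK_WORDS;            /* one past the top of B's stack */
    while (count > 0u && sp > 0u) {          /* sp > 0 only keeps the demo in bounds */
        arena[--sp] = B_DATA;
        count--;
    }
    return sp;
}

/* Count how many words of task A's stack no longer hold A's data. */
size_t task_a_words_corrupted(void)
{
    size_t corrupted = 0u;
    for (size_t i = 0u; i < STACK_WORDS; i++) {
        if (arena[i] != A_DATA) {
            corrupted++;
        }
    }
    return corrupted;
}
```

Pushing 8 words stays inside B's stack. Pushing 11 words overwrites the top three words of A's stack, and nothing fails at the moment of the write; task A only misbehaves later, when it reads those words back.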

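Detection methods include:

(1) Stack watermarking: the RTOS fills each task's stack with a known pattern (e.g., 0xA5A5A5A5) at creation. Periodically checking how much of the pattern remains intact shows the task's peak usage. FreeRTOS provides uxTaskGetStackHighWaterMark() for this; it returns the minimum amount of free stack the task has ever had, so a value near zero means the task is close to overflowing.

(2) Runtime stack checking: the RTOS checks for overflow on each context switch, either by verifying that the stack pointer is within bounds or by verifying that a sentinel value at the stack boundary is intact. In FreeRTOS this is controlled by configCHECK_FOR_STACK_OVERFLOW: a value of 1 selects the stack-pointer bounds check and a value of 2 selects the check of the fill pattern at the end of the stack. When a check fails, the kernel calls the application-defined vApplicationStackOverflowHook(). Because the check runs only at context-switch time, it reports an overflow after the corruption has already happened.

(3) MPU-based protection: on Cortex-M with an MPU, you can place a guard region at the stack limit so that an overflow triggers a memory fault at the moment of the offending access.

(4) Static analysis: tools can analyze the call graph and compute worst-case stack depth at compile time. Recursion and calls through function pointers limit what such tools can bound automatically.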
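The watermark scan and the sentinel check from methods (1) and (2) can be sketched in portable C. This is not the FreeRTOS implementation, only an illustration of the idea. The function names and the `FILL_BYTE` constant are hypothetical, and the sketch assumes a descending stack, so usage starts at the highest address of the buffer and grows toward index 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xA5u   /* assumed fill pattern, one byte wide */

/* Fill the whole stack buffer with the pattern, as an RTOS would do
 * when it creates the task. */
void stack_fill(uint8_t *stack, size_t size)
{
    memset(stack, FILL_BYTE, size);
}

/* Watermark scan: count intact pattern bytes starting from the low end
 * of the buffer (the end a descending stack reaches last). The result is
 * the minimum free space the stack has had since it was filled. */
size_t stack_min_free_bytes(const uint8_t *stack, size_t size)
{
    size_t free_bytes = 0u;
    while (free_bytes < size && stack[free_bytes] == FILL_BYTE) {
        free_bytes++;
    }
    return free_bytes;
}

/* Sentinel check: return 1 if the lowest `guard_len` bytes still hold
 * the pattern, 0 if any of them has been overwritten. */
int stack_sentinel_intact(const uint8_t *stack, size_t size, size_t guard_len)
{
    if (guard_len > size) {
        guard_len = size;
    }
    for (size_t i = 0u; i < guard_len; i++) {
        if (stack[i] != FILL_BYTE) {
            return 0;
        }
    }
    return 1;
}
```

Both checks are heuristics. A task that happens to write the fill value, or that reserves a large local array and never touches part of it, can leave the pattern intact even though the stack pointer has moved past it.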

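Sizing task stacks correctly requires measuring actual usage under worst-case conditions: the deepest call path, plus the exception frame that the hardware pushes onto the task's stack when an interrupt arrives (as happens on Cortex-M when tasks run on the process stack). A common rule of thumb is to set the stack to 1.5–2x the measured peak usage during development, then tighten it for production. Note that the high-water mark reported by uxTaskGetStackHighWaterMark() is the minimum free space, so peak usage is the allocated size minus that value. Always leave margin for interrupt stacking.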
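The arithmetic behind the sizing rule above can be written as two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, all quantities are in stack words, the margin is given as a percentage (150 for 1.5x, 200 for 2x), and the interrupt allowance is a separate parameter. A basic Cortex-M exception frame is 8 words, and more when floating-point context is stacked.

```c
#include <stddef.h>

/* Peak usage in words, derived from the allocated size and the minimum
 * free space ever observed (the high-water mark). */
size_t peak_stack_words(size_t allocated_words, size_t min_free_words)
{
    if (min_free_words > allocated_words) {
        return 0u;   /* inconsistent input; treat as no measured usage */
    }
    return allocated_words - min_free_words;
}

/* Recommended allocation: peak usage scaled by the margin (rounded up),
 * plus an explicit allowance for interrupt stacking. */
size_t recommended_stack_words(size_t peak_words,
                               size_t margin_percent,
                               size_t isr_frame_words)
{
    size_t scaled = (peak_words * margin_percent + 99u) / 100u;
    return scaled + isr_frame_words;
}
```

For example, a task allocated 256 words with a minimum of 96 words free has a peak usage of 160 words. With a 1.5x margin and an 8-word interrupt allowance, the suggested size is 248 words. If the worst-case measurement already included interrupts firing at the deepest call path, the exception frame is part of the measured peak and the extra allowance can be set to 0.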