How does a watchdog timer work in an RTOS context?

Question

Accepted Answer

A watchdog timer (WDT) is a hardware timer that resets the processor if software does not periodically "kick" (refresh) it. Its purpose is to recover from software hangs: if the main loop or a critical task gets stuck (due to a deadlock, infinite loop, or corrupted program counter), the watchdog expires and forces a system reset.

In a bare-metal super-loop, watchdog usage is straightforward: kick it once per iteration of the main loop. In an RTOS it is more nuanced because there are multiple tasks. Kicking the watchdog from one task only proves that this one task is still being scheduled; it says nothing about whether the other tasks are still running.

A robust RTOS watchdog pattern is:

(1) Create a dedicated watchdog task at a priority that allows it to monitor all other tasks.
(2) Each monitored task periodically "checks in" (e.g., sets a flag or increments a counter).
(3) The watchdog task verifies that all monitored tasks have checked in within their expected period, then clears the check-in state for the next period.
(4) Only if all tasks have checked in does the watchdog task kick the hardware WDT.

This way, if any single task hangs, it stops checking in, the watchdog task stops kicking the WDT, and the system resets once the hardware timeout elapses.

An alternative approach uses a window watchdog: the WDT must be kicked within a specific time window (not too early, not too late). This catches both hangs and tasks that run too fast, which might indicate a logic error.

In safety-critical systems, an external watchdog IC (separate from the MCU) is often used, because a software bug that corrupts the MCU's WDT peripheral registers could disable an internal watchdog.
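The check-in pattern described above can be sketched in portable C11. This is a minimal illustration, not a reference implementation: the task IDs, `wdt_checkin`, `wdt_monitor_poll`, and `hw_wdt_kick` are hypothetical names, and `hw_wdt_kick` is a stub standing in for whatever register write or driver call refreshes the WDT on your MCU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monitored tasks; each one owns a bit in the check-in mask. */
enum { TASK_SENSOR = 0, TASK_COMMS = 1, TASK_CONTROL = 2, TASK_COUNT = 3 };

static_assert(TASK_COUNT < 32, "one bit per task must fit in a uint32_t");

#define ALL_TASKS_MASK ((((uint32_t)1) << TASK_COUNT) - 1u)

/* Bit n set means task n has checked in since the last poll. */
static _Atomic uint32_t checkin_flags;

/* Counts refreshes so the sketch can be exercised on a host machine. */
static unsigned hw_kick_count;

/* Stub (assumption): replace with the platform-specific WDT refresh. */
static void hw_wdt_kick(void)
{
    hw_kick_count++;
}

/* Called by each monitored task once per iteration of its own loop. */
void wdt_checkin(unsigned task_id)
{
    atomic_fetch_or(&checkin_flags, ((uint32_t)1) << task_id);
}

/*
 * Called periodically by the dedicated watchdog task.
 * Reads and clears the flags in one atomic step, so a check-in that
 * arrives during the poll is counted toward the next period.
 * Returns true if the hardware WDT was kicked.
 */
bool wdt_monitor_poll(void)
{
    uint32_t seen = atomic_exchange(&checkin_flags, 0u);

    if ((seen & ALL_TASKS_MASK) == ALL_TASKS_MASK) {
        hw_wdt_kick();
        return true;
    }
    /* At least one task missed its check-in: let the hardware WDT expire. */
    return false;
}
```

In an RTOS, the watchdog task would call `wdt_monitor_poll()` from a loop with a fixed delay between polls. For this sketch to behave sensibly, the poll period has to be longer than the slowest monitored task's check-in period, and the hardware WDT timeout has to be longer than the poll period. Tasks with very different periods would probably need per-task counters and deadlines instead of a single shared bitmask.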