How do you implement a watchdog feeding strategy in an RTOS?
In an RTOS, a single watchdog feed in one task does not protect the other tasks — if the network task hangs but the sensor task keeps feeding the watchdog, the system appears healthy while half of its functionality is dead. The solution is a watchdog manager (sometimes called a watchdog supervisor) pattern that aggregates health information from all critical tasks before feeding the hardware watchdog.
The architecture works as follows: each critical task has a "check-in" flag or counter in a shared array. Within its normal execution cycle, each task sets its flag (for example, task_checkin[TASK_SENSOR] = true) at a point that is reached only after the task has completed its essential work for that cycle. A dedicated watchdog manager task runs periodically (e.g., every 100-200 ms) at a low-to-medium priority. On each run, it inspects every monitored task's flag and feeds the hardware watchdog only if all flags are set. Whether or not it fed the watchdog, the manager then clears all flags for the next monitoring cycle.
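The check-in placement described above can be sketched from the monitored task's side. This is a minimal illustration, not a definitive implementation: TASK_SENSOR, sensor_read(), sensor_ok, last_sample, and sensor_task_cycle() are hypothetical names, and the driver is replaced by a stand-in so the snippet builds on its own. The flag array is repeated here only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_TASKS   4
#define TASK_SENSOR 0   /* hypothetical task ID */

/* One check-in flag per monitored task */
static volatile bool task_checkin[NUM_TASKS];

/* Mark a task as alive for the current monitoring cycle. */
static void watchdog_checkin(uint8_t task_id)
{
    assert(task_id < NUM_TASKS);   /* catch a bad ID in debug builds */
    task_checkin[task_id] = true;
}

/* Stand-in for a real driver: set sensor_ok to false to simulate a fault. */
static bool sensor_ok = true;

static bool sensor_read(int32_t *sample)
{
    if (!sensor_ok) {
        return false;
    }
    *sample = 42;   /* placeholder reading */
    return true;
}

static int32_t last_sample;

/* One iteration of the sensor task loop.
   Returns true if the essential work for this cycle completed. */
bool sensor_task_cycle(void)
{
    int32_t sample;

    if (!sensor_read(&sample)) {
        return false;              /* failure path: no check-in */
    }
    last_sample = sample;          /* essential work: publish the reading */
    watchdog_checkin(TASK_SENSOR); /* reached only after the work succeeded */
    return true;
}
```

In a real FreeRTOS task, sensor_task_cycle() would sit inside the task's while (1) loop, followed by the task's usual delay. Whether a failed read should suppress the check-in is a design choice: doing so means a persistently failing sensor eventually causes a reset.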
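
    // Watchdog manager task (FreeRTOS example)
    // log_error() and IWDG_feed() are project-specific helpers defined elsewhere.
    #include <stdbool.h>
    #include <stdint.h>
    #include "FreeRTOS.h"
    #include "task.h"

    #define NUM_TASKS 4

    // One check-in flag per monitored task, indexed by task ID
    static volatile bool task_checkin[NUM_TASKS];

    void watchdog_manager_task(void *param) {
        (void)param; // Unused
        while (1) {
            vTaskDelay(pdMS_TO_TICKS(100));
            bool all_alive = true;
            for (int i = 0; i < NUM_TASKS; i++) {
                if (!task_checkin[i]) {
                    all_alive = false;
                    log_error("Task %d missed check-in", i);
                }
                task_checkin[i] = false; // Reset for next cycle
            }
            if (all_alive) {
                IWDG_feed(); // Feed only when every task checked in
            }
        }
    }

    // Called by each monitored task after completing its critical work
    void watchdog_checkin(uint8_t task_id) {
        if (task_id < NUM_TASKS) { // Ignore out-of-range IDs
            task_checkin[task_id] = true;
        }
    }
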
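One detail of the flag-array approach is that the manager reads a flag and clears it in two separate steps, so a check-in that lands between them can be lost if the manager is preempted there. A possible variant, sketched below, packs the flags into one bitmask and uses C11 atomics so the read and the clear happen in a single step. This is an alternative to the code above, not the original author's method. The wdg_* names and task IDs are hypothetical, and the sketch assumes a toolchain and target that support <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical task IDs; WDG_NUM_TASKS is the count of monitored tasks */
enum {
    WDG_TASK_SENSOR,
    WDG_TASK_NETWORK,
    WDG_TASK_CONTROL,
    WDG_TASK_STORAGE,
    WDG_NUM_TASKS
};

#define WDG_ALL_TASKS_MASK ((1u << WDG_NUM_TASKS) - 1u)

/* One bit per task; a set bit means the task checked in this cycle */
static atomic_uint wdg_checkin_mask;

/* Called by a monitored task after its essential work is done. */
void wdg_checkin(unsigned task_id)
{
    assert(task_id < WDG_NUM_TASKS);
    atomic_fetch_or(&wdg_checkin_mask, 1u << task_id);
}

/* Called by the manager once per monitoring cycle.
   Reads and clears all bits in one atomic exchange, then returns a
   bitmask of the tasks that did NOT check in (0 means all alive). */
unsigned wdg_collect_missing(void)
{
    unsigned seen = atomic_exchange(&wdg_checkin_mask, 0u);
    return WDG_ALL_TASKS_MASK & ~seen;
}
```

The manager loop would then feed the hardware watchdog only when wdg_collect_missing() returns 0, and could log the returned mask otherwise.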
This pattern catches several failure modes: (1) any single task hanging — its flag is never set; (2) priority inversion starving a low-priority task — it cannot run and therefore cannot check in; (3) deadlocks between two or more tasks — at least one task's flag will be missing. The watchdog timeout must be set longer than the monitoring period plus the worst-case latency for all tasks to complete one cycle and check in. If the manager runs every 100 ms and the slowest task takes up to 80 ms per cycle, a 300-500 ms watchdog timeout provides adequate margin.
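The sizing rule above can be written down as a compile-time check, so that a later change to the manager period or task timing cannot silently erase the margin. The sketch below uses the figures from the example (100 ms period, 80 ms worst-case task cycle) and picks 300 ms, the lower end of the suggested range. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WDG_MANAGER_PERIOD_MS    100u  /* how often the manager runs */
#define WDG_WORST_TASK_CYCLE_MS   80u  /* slowest monitored task, per cycle */
#define WDG_TIMEOUT_MS           300u  /* hardware watchdog timeout */

/* Lower bound from the rule: monitoring period plus worst-case task latency */
#define WDG_MIN_TIMEOUT_MS (WDG_MANAGER_PERIOD_MS + WDG_WORST_TASK_CYCLE_MS)

/* Fail the build if the configured timeout does not exceed the lower bound */
static_assert(WDG_TIMEOUT_MS > WDG_MIN_TIMEOUT_MS,
              "watchdog timeout must exceed manager period plus worst-case task cycle");

/* Margin left over after the lower bound, in milliseconds */
static inline uint32_t wdg_timeout_margin_ms(void)
{
    return WDG_TIMEOUT_MS - WDG_MIN_TIMEOUT_MS;
}
```

With these numbers the lower bound is 180 ms and the margin is 120 ms. How much margin is enough depends on scheduling jitter and the accuracy of the watchdog's clock source, which this sketch does not model.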
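A subtle design decision: the watchdog manager should run at a low-to-medium priority, not the highest. If it runs at the highest priority, it always gets CPU time, so the fact that it ran proves nothing about whether lower-priority tasks can run; starvation is then caught only through the check-in flags of the tasks that are explicitly monitored. At a lower priority, the manager adds a second layer of protection: if a runaway higher-priority task monopolizes the CPU, the manager itself is starved, no feed occurs, and the hardware watchdog resets the system even when the starved work is not in the monitored set.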
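One way to keep this priority decision from drifting is to define all task priorities in one place and check the manager's position at compile time. The priority map below is purely illustrative: the task names and levels are assumptions, not taken from any particular project. It relies only on the FreeRTOS convention that a larger number means a higher priority, with 0 being the idle level.

```c
#include <assert.h>

/* Hypothetical priority map. In FreeRTOS a larger number means a higher
   priority, and 0 is the idle task's level. */
enum task_priority {
    PRIO_LOGGER      = 1,
    PRIO_WDG_MANAGER = 2,   /* low-to-medium, as recommended above */
    PRIO_SENSOR      = 3,
    PRIO_NETWORK     = 4,
    PRIO_CONTROL     = 5,
    PRIO_HIGHEST     = PRIO_CONTROL
};

/* Fail the build if someone promotes the manager to the top priority */
static_assert(PRIO_WDG_MANAGER < PRIO_HIGHEST,
              "watchdog manager must not be the highest-priority task");
```

PRIO_WDG_MANAGER would then be passed as the uxPriority argument of xTaskCreate() when the manager task is created.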
Source: Watchdog Q&A
