How do you implement a watchdog feeding strategy in an RTOS?

Question

Accepted Answer

In an RTOS, a single watchdog feed in one task does not protect the other tasks — if the network task hangs but the sensor task keeps feeding the watchdog, the system appears healthy while half of its functionality is dead. The solution is a watchdog manager (sometimes called a watchdog supervisor) pattern that aggregates health information from all critical tasks before feeding the hardware watchdog.

The architecture works as follows: each critical task has a "check-in" flag or counter in a shared array. Within its normal execution cycle, each task sets its flag (task_checkin[TASK_SENSOR] = true) at a point that is only reached after the task has completed its essential work for that cycle. A dedicated watchdog manager task runs periodically (e.g., every 100-200 ms) at a low-to-medium priority. On each run, it inspects every monitored task's flag and feeds the hardware watchdog only if all flags are set. Whether or not it feeds, the manager clears every flag so that each task must check in again during the next monitoring cycle.

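```c
// Watchdog manager task (FreeRTOS example)
#include <stdbool.h>
#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"

#define NUM_TASKS 4

// Assumed to be provided elsewhere in the project (signatures are
// illustrative): hardware watchdog refresh and error logging.
extern void IWDG_feed(void);
extern void log_error(const char *fmt, ...);

// One check-in flag per monitored task: set by the task,
// read and cleared by the manager.
static volatile bool task_checkin[NUM_TASKS];

void watchdog_manager_task(void *param) {
    (void)param;  // unused
    for (;;) {
        vTaskDelay(pdMS_TO_TICKS(100));  // monitoring period
        bool all_alive = true;
        for (int i = 0; i < NUM_TASKS; i++) {
            if (!task_checkin[i]) {
                all_alive = false;
                log_error("Task %d missed check-in", i);
            }
            task_checkin[i] = false;  // Reset for next cycle
        }
        // Feed only when every monitored task has reported in; otherwise
        // let the hardware watchdog time out and reset the system.
        if (all_alive) {
            IWDG_feed();
        }
    }
}

// Called by each monitored task after completing its critical work
void watchdog_checkin(uint8_t task_id) {
    if (task_id < NUM_TASKS) {  // ignore out-of-range IDs
        task_checkin[task_id] = true;
    }
}
```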
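One weakness of the flag array is the gap between reading a flag and clearing it: if the manager is preempted in between, a check-in that arrives in that gap is wiped out. A possible variant, sketched here under the assumption that the toolchain and core support C11 atomics, keeps one bit per task and reads and clears all bits in a single atomic exchange. The names wd_checkin, wd_window_ok and the TASK_* IDs are hypothetical and only serve the illustration; on a core without atomic instructions a short critical section would play the same role.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task IDs: one bit per monitored task. */
enum { TASK_SENSOR, TASK_NETWORK, TASK_STORAGE, TASK_UI, TASK_COUNT };

#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

/* Zero-initialised: no task has checked in yet. */
static atomic_uint checkin_bits;

/* Called by a monitored task after its essential work for the cycle. */
void wd_checkin(unsigned task_id) {
    if (task_id < TASK_COUNT) {
        atomic_fetch_or(&checkin_bits, 1u << task_id);
    }
}

/* Called by the manager once per monitoring window.
 * Reads and clears all bits in one atomic step, so a check-in that
 * arrives during evaluation counts toward the next window instead of
 * being lost. Returns true when the hardware watchdog may be fed.
 * If 'missing' is non-NULL it receives the bits of absent tasks. */
bool wd_window_ok(unsigned *missing) {
    unsigned seen = atomic_exchange(&checkin_bits, 0u);
    unsigned absent = ALL_TASKS_MASK & ~seen;
    if (missing != NULL) {
        *missing = absent;
    }
    return absent == 0u;
}
```

In the manager loop, the per-flag scan would then reduce to a single call to wd_window_ok(), with the returned mask used for logging which tasks were absent.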

This pattern catches several failure modes: (1) any single task hanging, because its flag is never set; (2) a low-priority task starved by priority inversion, because it cannot run and therefore cannot check in; (3) deadlocks between two or more tasks, because the blocked tasks never reach their check-in points. The watchdog timeout must be set longer than the monitoring period plus the worst-case latency for all tasks to complete one cycle and check in. If the manager runs every 100 ms and the slowest task takes up to 80 ms per cycle, the minimum is 100 + 80 = 180 ms, so a 300-500 ms watchdog timeout provides adequate margin.
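The timeout arithmetic can be written down as a few helper functions. This is a minimal sketch with hypothetical names (wd_min_timeout_ms, wd_margin_ms, wd_windows_tolerated). The last helper relies on an idealized assumption that is not stated in the answer: the manager evaluates at exact multiples of its period, with no scheduling jitter, and a feed must land strictly before the timeout expires.

```c
#include <stdint.h>

/* Lower bound from the rule above: monitoring period plus the worst-case
 * time for the slowest task to finish one cycle and check in. */
uint32_t wd_min_timeout_ms(uint32_t period_ms, uint32_t worst_cycle_ms) {
    return period_ms + worst_cycle_ms;
}

/* Margin in ms that a candidate timeout leaves above the lower bound.
 * A negative result means the timeout is too short. */
int32_t wd_margin_ms(uint32_t timeout_ms, uint32_t period_ms,
                     uint32_t worst_cycle_ms) {
    return (int32_t)timeout_ms -
           (int32_t)wd_min_timeout_ms(period_ms, worst_cycle_ms);
}

/* Number of consecutive failed monitoring windows the system survives
 * before the hardware watchdog resets it (idealized, see lead-in).
 * After k failed windows the next feed comes at (k + 1) * period, which
 * must be strictly less than the timeout. Returns 0 for a timeout that
 * is not longer than the period (an invalid configuration anyway). */
uint32_t wd_windows_tolerated(uint32_t timeout_ms, uint32_t period_ms) {
    if (period_ms == 0u || timeout_ms <= period_ms) {
        return 0u;
    }
    return (timeout_ms - 1u) / period_ms - 1u;
}
```

For the numbers in this answer (100 ms period, 80 ms worst case), a 300 ms timeout leaves 120 ms of margin and rides out one missed window, while 500 ms leaves 320 ms and rides out three. A task that hangs for good never checks in again, so the reset still arrives within one timeout of the last feed.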

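A subtle design decision: the watchdog manager should run at a low-to-medium priority, not the highest. The check-in flags already expose a starved monitored task at any manager priority, because the manager feeds only when every flag is set. What the manager's priority changes is coverage of problems that no flag reports. A highest-priority manager always gets CPU time, so its own scheduling tells you nothing about the health of the rest of the system. A manager at low-to-medium priority is itself starved when higher-priority code (for example, an unmonitored task stuck in a busy loop) hogs the CPU; it then stops feeding and the hardware watchdog resets the system. The cost is that legitimate load can also delay the manager, which is one more reason to leave margin in the watchdog timeout.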