Why do watchdog timers exist? What problem do they solve?

Watchdog timers exist because software can fail in ways that the software itself cannot detect or recover from. Typical causes include an infinite loop triggered by unexpected input, a deadlock between concurrent tasks, a stack overflow that corrupts a return address and sends the program counter into garbage memory, a hardware glitch that flips a bit in a control register, or a cosmic-ray-induced single-event upset that alters RAM. Any of these can leave an embedded system hung or behaving erratically, with no internal mechanism for recovery. Unlike desktop software, an embedded system often has no user to notice the hang and press Ctrl-C, and frequently no operating system to kill the runaway process.

A watchdog timer is an independent hardware countdown timer that resets the system if firmware does not periodically "feed" (refresh, kick) it before it reaches zero. The logic is deliberately simple: if the software is running correctly and progressing through its intended execution path, it will feed the watchdog on schedule. If the software is stuck, corrupted, or otherwise non-functional, it will miss the feed deadline, and the watchdog timeout triggers a hardware reset. The watchdog operates independently of the CPU and the software. On STM32 parts, for example, the independent watchdog (IWDG) is clocked by its own low-speed internal oscillator rather than the main system clock, and once started it cannot be stopped by software, so runaway code cannot disable it. Many other MCU families offer a comparable watchdog that cannot be turned off after activation.
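The feed-or-reset logic described above can be sketched as a small host-side simulation. This is an illustrative model only, not a driver for any real MCU: the type `sim_wdt_t` and the functions `sim_wdt_start`, `sim_wdt_feed`, `sim_wdt_tick`, and `sim_run` are hypothetical names invented for this example, and the simulation assumes one firmware loop iteration per watchdog clock tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of a down-counting watchdog. All names are hypothetical;
 * real hardware exposes this behavior through peripheral registers. */
typedef struct {
    uint32_t reload;      /* value loaded into the counter on every feed */
    uint32_t counter;     /* current countdown value */
    bool     reset_fired; /* latched once the counter reaches zero */
} sim_wdt_t;

/* Start the watchdog. There is deliberately no "stop" function, mirroring
 * watchdogs that cannot be disabled once running. */
static void sim_wdt_start(sim_wdt_t *w, uint32_t reload_ticks)
{
    assert(w != NULL && reload_ticks > 0);
    w->reload = reload_ticks;
    w->counter = reload_ticks;
    w->reset_fired = false;
}

/* Feed (kick) the watchdog: reload the counter unless a reset already fired. */
static void sim_wdt_feed(sim_wdt_t *w)
{
    if (!w->reset_fired) {
        w->counter = w->reload;
    }
}

/* Advance the watchdog clock by one tick. Returns true once the counter has
 * reached zero, which models the hardware reset being asserted. */
static bool sim_wdt_tick(sim_wdt_t *w)
{
    if (!w->reset_fired) {
        if (w->counter > 0) {
            w->counter--;
        }
        if (w->counter == 0) {
            w->reset_fired = true;
        }
    }
    return w->reset_fired;
}

/* Simulate a main loop that feeds once per tick until iteration `hang_at`,
 * after which it stops feeding (modeling a hang). Returns the tick index at
 * which the reset fired, or -1 if no reset occurred within `total_ticks`. */
static int sim_run(uint32_t reload_ticks, int total_ticks, int hang_at)
{
    sim_wdt_t w;
    sim_wdt_start(&w, reload_ticks);
    for (int t = 0; t < total_ticks; t++) {
        if (t < hang_at) {
            sim_wdt_feed(&w); /* healthy firmware feeds on schedule */
        }
        if (sim_wdt_tick(&w)) {
            return t;         /* timeout: hardware reset would occur here */
        }
    }
    return -1;
}
```

In this model, `sim_run(3, 100, 100)` never times out because the loop feeds on every tick, while `sim_run(3, 100, 10)` stops feeding at iteration 10 and the reset fires at tick 11, once the three-tick reload window has run out. On real hardware the same pattern reduces to a single register write in the main loop, with the countdown happening in the watchdog peripheral itself.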

In safety-critical systems (medical devices, automotive ECUs, industrial controllers, avionics), a watchdog is not optional. Standards like IEC 61508 (industrial functional safety), ISO 26262 (automotive), and IEC 62304 (medical device software) drive requirements for independent monitoring of program execution, with the required rigor depending on the integrity level or safety class. The watchdog is the simplest and most widely used mechanism to satisfy such requirements. Even in non-safety-critical products, a watchdog is considered baseline engineering practice, because across millions of device-hours in the field, some software hangs are inevitable.

Source: Watchdog Q&A