What should your system do after a watchdog reset to prevent reset loops?

A watchdog reset indicates that something went wrong. If the root cause persists (a corrupted configuration in flash, a peripheral stuck in an error state, a hardware fault, or an environmental condition such as sustained over-temperature), the system will hang again shortly after restarting and trigger another watchdog reset, producing an endless reset loop. This is worse than a single failure. The system never stabilizes. Rapid cycling can wear out flash memory if the boot sequence writes to the same sectors on every start. Mechanical actuators may cycle dangerously (motors jerking on each boot). Repeated inrush current stresses the power supply and can damage components.

Strategy 1: Count consecutive watchdog resets. Maintain a reset counter in battery-backed RAM, or in a persistent register that survives warm resets but not power-on resets. Increment it at boot whenever the reset cause is the watchdog, and clear it once the system has run normally for a stability period (e.g., 30 seconds). If the counter reaches a threshold (e.g., 3 resets within 5 minutes), enter a safe mode with minimal functionality: disable the failing subsystem, load the factory-default configuration, halt actuation outputs, and wait for external intervention (debugger connection, configuration command, or manual power cycle).
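The counting logic can be sketched in portable C, independent of any vendor library. This is a minimal illustration under stated assumptions, not a definitive implementation: `wdt_record_t`, `wdt_counter_on_boot`, `wdt_counter_on_tick`, and the magic value are hypothetical names, and the record is assumed to sit in memory that survives a warm reset. The sketch counts consecutive resets and does not track the 5-minute window. It also clears the counter on any non-watchdog reset, which is a design choice rather than a requirement.

```c
#include <stdbool.h>
#include <stdint.h>

#define WDT_COUNTER_MAGIC   0x57445443u  /* marks the record as initialized */
#define MAX_WDT_RESETS      3u           /* example threshold */
#define STABILITY_PERIOD_MS 30000u       /* 30 s of healthy running */

/* Hypothetical record. On a real target it would be placed in
 * battery-backed RAM, a backup register, or a no-init RAM section
 * that the startup code does not zero. */
typedef struct {
    uint32_t magic;
    uint32_t count;
} wdt_record_t;

static wdt_record_t wdt_record;

/* Call once at boot. Returns true if the system should enter safe mode. */
bool wdt_counter_on_boot(bool was_watchdog_reset)
{
    if (wdt_record.magic != WDT_COUNTER_MAGIC || !was_watchdog_reset) {
        /* Uninitialized memory or a non-watchdog reset: start fresh. */
        wdt_record.magic = WDT_COUNTER_MAGIC;
        wdt_record.count = 0u;
        return false;
    }
    if (wdt_record.count < UINT32_MAX) {
        wdt_record.count++;              /* saturate instead of wrapping */
    }
    return wdt_record.count >= MAX_WDT_RESETS;
}

/* Call periodically from the main loop with the time since boot. */
void wdt_counter_on_tick(uint32_t uptime_ms)
{
    if (uptime_ms >= STABILITY_PERIOD_MS) {
        wdt_record.count = 0u;           /* system proved stable: forgive */
    }
}
```

Checking the magic value guards against treating random power-up contents as a valid count. Clearing only after the stability period means a system that crashes five seconds into every boot still accumulates resets.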

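Strategy 2: Log diagnostics for post-mortem analysis. At startup, read the reset cause flags (RCC_CSR on many STM32 parts) to distinguish watchdog resets from power-on resets, pin resets, and brownout resets. These flags are sticky, so clear them after reading; otherwise a later boot will still see the old cause. For watchdog resets, retrieve any saved fault information (program counter at fault, stack pointer, and fault status registers captured by a preceding HardFault handler) and store it to a dedicated diagnostic partition in flash or external EEPROM. The watchdog reset itself does not record where the code was, so this information exists only if a handler saved it to reset-surviving memory beforehand. When it is available, it is invaluable for root-cause analysis: it shows the field engineer or developer where the code was executing when it failed.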

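```c
#include <stdbool.h>
// Also requires the STM32 HAL header for the target device family.
// wdt_reset_count lives in battery-backed RAM; saved_pc and saved_lr
// are written by the fault handler before the reset occurs.

void check_reset_reason(void) {
    // Latch the sticky flags, then clear them right away so the next
    // boot sees only its own reset cause, even if safe mode never returns.
    // Also check RCC_FLAG_WWDGRST if the window watchdog is used.
    bool wdt_reset = __HAL_RCC_GET_FLAG(RCC_FLAG_IWDGRST) != 0;
    bool por_reset = __HAL_RCC_GET_FLAG(RCC_FLAG_PORRST) != 0;
    __HAL_RCC_CLEAR_RESET_FLAGS();

    if (wdt_reset) {
        wdt_reset_count++;  // In battery-backed RAM
        save_reset_log(wdt_reset_count, saved_pc, saved_lr);
        if (wdt_reset_count >= MAX_WDT_RESETS) {
            enter_safe_mode();  // Minimal operation
        }
    } else if (por_reset) {
        wdt_reset_count = 0;  // Clean power-on, reset counter
    }
}
```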
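The saved fault information mentioned in Strategy 2 has to survive the reset and be distinguishable from random memory contents. One possible approach, sketched here with hypothetical names (`fault_record_t`, `fault_record_save`, `fault_record_take`, `FAULT_MAGIC`), is a small record guarded by a magic value and a simple XOR check word. The sketch assumes the record lives in a RAM region the startup code does not initialize. How the program counter and link register are extracted from the exception stack frame is target-specific and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAULT_MAGIC 0xFA17C0DEu  /* arbitrary marker for a valid record */

/* Hypothetical layout; field names are illustrative only. */
typedef struct {
    uint32_t magic;
    uint32_t pc;     /* program counter at the fault */
    uint32_t lr;     /* link register at the fault */
    uint32_t cfsr;   /* fault status register snapshot */
    uint32_t check;  /* XOR of the four fields above */
} fault_record_t;

/* Assumed to be placed in reset-surviving, uninitialized RAM. */
static fault_record_t fault_record;

static uint32_t fault_checksum(const fault_record_t *r)
{
    return r->magic ^ r->pc ^ r->lr ^ r->cfsr;
}

/* Call from the fault handler before the reset happens. */
void fault_record_save(uint32_t pc, uint32_t lr, uint32_t cfsr)
{
    fault_record.magic = FAULT_MAGIC;
    fault_record.pc = pc;
    fault_record.lr = lr;
    fault_record.cfsr = cfsr;
    fault_record.check = fault_checksum(&fault_record);
}

/* Call at boot. Copies out a valid record and invalidates it so the
 * same fault is not reported again on a later, unrelated reset. */
bool fault_record_take(fault_record_t *out)
{
    if (fault_record.magic != FAULT_MAGIC ||
        fault_record.check != fault_checksum(&fault_record)) {
        return false;
    }
    *out = fault_record;
    fault_record.magic = 0u;
    return true;
}
```

At boot, a record returned by `fault_record_take` would be passed to whatever routine writes the diagnostic partition. A production design would likely use a stronger check such as a CRC.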

Strategy 3: Progressive degradation. On the first watchdog reset, attempt a full normal restart. On the second, disable non-essential subsystems (logging, telemetry, secondary sensors). On the third, enter a minimal safe state that maintains only critical safety functions: a motor controller parks the motor and holds the brakes; a medical infusion pump closes the valve and sounds an alarm; a communication gateway forwards only priority-1 messages. Communicate the fault condition through whatever channel remains operational: an LED blink pattern, a CAN error frame, a serial log message, or a status register readable via debugger.
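The escalation in Strategy 3 amounts to a mapping from the consecutive-reset count to an operating mode, and from the mode to a set of enabled subsystems. A minimal sketch follows; the mode names, subsystem bits, and function names are all hypothetical and would be replaced by whatever the product actually contains.

```c
#include <stdint.h>

typedef enum {
    MODE_FULL,     /* everything enabled */
    MODE_REDUCED,  /* non-essential subsystems disabled */
    MODE_SAFE      /* critical safety functions only */
} run_mode_t;

/* Hypothetical subsystem bits for illustration. */
enum {
    SUBSYS_SAFETY            = 1u << 0,
    SUBSYS_CONTROL           = 1u << 1,
    SUBSYS_SECONDARY_SENSORS = 1u << 2,
    SUBSYS_TELEMETRY         = 1u << 3,
    SUBSYS_LOGGING           = 1u << 4
};

/* Map the number of consecutive watchdog resets to an operating mode:
 * 0 or 1 -> full restart, 2 -> reduced, 3 or more -> safe state. */
run_mode_t select_run_mode(uint32_t consecutive_wdt_resets)
{
    if (consecutive_wdt_resets <= 1u) {
        return MODE_FULL;
    }
    if (consecutive_wdt_resets == 2u) {
        return MODE_REDUCED;
    }
    return MODE_SAFE;
}

/* Return the bitmask of subsystems that may be started in a mode. */
uint32_t enabled_subsystems(run_mode_t mode)
{
    switch (mode) {
    case MODE_FULL:
        return SUBSYS_SAFETY | SUBSYS_CONTROL | SUBSYS_SECONDARY_SENSORS |
               SUBSYS_TELEMETRY | SUBSYS_LOGGING;
    case MODE_REDUCED:
        return SUBSYS_SAFETY | SUBSYS_CONTROL;
    case MODE_SAFE:
    default:
        return SUBSYS_SAFETY;  /* unknown modes fall back to the safest set */
    }
}
```

Startup code would then initialize a subsystem only if its bit is set, for example `if (enabled_subsystems(mode) & SUBSYS_TELEMETRY) { ... }`. Keeping the policy in one table-like function makes it easy to review which features survive each escalation step.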

Source: Watchdog Q&A