Explain CAN fault confinement. What are the three error states, and how does a node transition between them?
CAN fault confinement prevents a single malfunctioning node from disrupting the entire network. Each node maintains two error counters: the Transmit Error Counter (TEC) and the Receive Error Counter (REC). The counters increment on detected errors and decrement on successfully transmitted or received frames. The increments are larger than the decrements (a transmit error typically adds 8 to the TEC, while a successful transmission subtracts only 1), so a persistently faulty node's counters ratchet upward.
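The counter behavior can be sketched in a few lines of Python. This is a simplified model, not a controller implementation: the class and method names are hypothetical, and it deliberately ignores the protocol's special cases (for example, a receiver that sees a dominant bit right after sending its error flag adds 8 rather than 1 to the REC).

```python
from dataclasses import dataclass

TX_ERROR_STEP = 8   # TEC increment when a transmitter detects an error
RX_ERROR_STEP = 1   # REC increment for an ordinary receive error
SUCCESS_STEP = 1    # decrement after a successfully handled frame


@dataclass
class ErrorCounters:
    """Hypothetical, simplified model of a node's TEC and REC."""
    tec: int = 0
    rec: int = 0

    def on_transmit_error(self) -> None:
        self.tec += TX_ERROR_STEP

    def on_receive_error(self) -> None:
        self.rec += RX_ERROR_STEP

    def on_transmit_success(self) -> None:
        # Counters never go below zero.
        self.tec = max(0, self.tec - SUCCESS_STEP)

    def on_receive_success(self) -> None:
        self.rec = max(0, self.rec - SUCCESS_STEP)
```

With these steps, one failed transmission followed by one successful transmission leaves the TEC at 7 rather than 0, which is the ratchet effect described above.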
The three states are:
Error Active (TEC under 128 and REC under 128): This is the normal operating state, and every node starts in it after reset. When an Error Active node detects an error, it transmits an active error flag: six consecutive dominant bits. This deliberately violates the bit-stuffing rule, which guarantees that all other nodes also detect the error and discard the frame. Because dominant bits overwrite recessive ones, the flag destroys the frame in progress for every node at once, forcing a fast, coordinated error recovery.
Error Passive (TEC at or above 128, or REC at or above 128): The node has accumulated too many errors and is now considered potentially faulty. It can still transmit and receive, but when it detects an error, it sends a passive error flag — six consecutive recessive bits. Because recessive bits do not override other traffic, a passive error flag has minimal impact on the bus. Additionally, an Error Passive node must wait an extra 8-bit-time "suspend transmission" period after transmitting before it can initiate a new transmission. This gives healthy nodes priority and reduces the faulty node's bus utilization.
Bus Off (TEC at or above 256): The node is logically disconnected from the bus. It may not transmit anything, including data frames, error flags, and acknowledgments, so it can no longer disturb other nodes. It only monitors the bus in order to recover: the node must observe 128 occurrences of 11 consecutive recessive bits, after which it resets both counters to zero and returns to Error Active. Recovery therefore takes at least 128 x 11 = 1408 bit times, giving the rest of the network time to stabilize. Whether recovery starts automatically or only on a software request depends on the controller.
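The thresholds for the three states, and the minimum Bus Off recovery time, can be written down as a short Python sketch. The function names and the string labels are illustrative assumptions, not part of any CAN driver API.

```python
def fault_state(tec: int, rec: int) -> str:
    """Map counter values to a fault confinement state (hypothetical helper)."""
    if tec >= 256:
        return "bus_off"            # only the TEC can drive a node to Bus Off
    if tec >= 128 or rec >= 128:
        return "error_passive"      # either counter is enough
    return "error_active"


def min_recovery_seconds(bitrate_bps: int) -> float:
    """Lower bound on Bus Off recovery: 128 sequences of 11 recessive bits."""
    return 128 * 11 / bitrate_bps
```

For example, fault_state(0, 200) is "error_passive" because a high REC alone never leads to Bus Off, and at an assumed bitrate of 500 kbit/s the 1408 bit times correspond to roughly 2.8 ms.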
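Escalation is driven by rising error counters: Error Active to Error Passive once either counter reaches 128, then Error Passive to Bus Off once the TEC reaches 256. De-escalation follows the counters as well: an Error Passive node returns to Error Active when both TEC and REC fall back below 128 through successful traffic, while a Bus Off node returns to Error Active only through the recovery sequence (it never passes through Error Passive on the way back). This progressive escalation ensures that intermittent faults cause temporary throttling (Error Passive), while persistent faults cause full isolation (Bus Off), protecting the rest of the network.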
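To see how the counters separate persistent faults from occasional ones, here is a small Python simulation of a transmitter. It is a sketch under simplifying assumptions: only the TEC is modeled, each failed transmission adds 8 and each successful one subtracts 1, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def state_for_tec(tec: int) -> str:
    """State implied by the TEC alone (REC is ignored in this sketch)."""
    if tec >= 256:
        return "bus_off"
    if tec >= 128:
        return "error_passive"
    return "error_active"


def simulate_transmitter(outcomes: Iterable[bool]) -> Tuple[int, List[str]]:
    """Replay transmission outcomes (True = success) and track state changes.

    Returns the final TEC and the states visited, in order.
    """
    tec = 0
    visited = [state_for_tec(tec)]
    for ok in outcomes:
        tec = max(0, tec - 1) if ok else tec + 8
        state = state_for_tec(tec)
        if state != visited[-1]:
            visited.append(state)
        if state == "bus_off":
            break  # a Bus Off node stops transmitting
    return tec, visited


# A transmitter that fails every time escalates all the way to Bus Off.
persistent = simulate_transmitter([False] * 40)

# One failure in every ten frames is fully absorbed by the decrements.
sparse = simulate_transmitter(([False] + [True] * 9) * 100)
```

In this model the persistent case reaches Error Passive after 16 consecutive failures and Bus Off after 32, while the sparse case never leaves Error Active because nine successes more than cancel each failure.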
Source: CAN Q&A
