
CAN Protocol — Interview Questions & Answers

Common CAN bus interview questions covering arbitration, error detection, fault confinement, CAN-FD, bit timing, bus load calculation, termination, and differential signaling.


Basics and Architecture

Q: What is CAN and why is it the dominant bus in automotive and industrial systems?

CAN (Controller Area Network) is a multi-master, message-based serial protocol designed by Bosch in the 1980s for reliable real-time communication in electrically noisy environments. It uses a two-wire differential bus (CAN_H, CAN_L) with a maximum data rate of 1 Mbit/s for classic CAN and up to 8 Mbit/s (data phase) for CAN-FD.

CAN dominates automotive and industrial applications for several reasons that other protocols do not match in combination. Deterministic priority-based arbitration ensures the highest-priority message always wins bus access without any delay or corruption — critical for safety systems like ABS and airbag controllers that cannot tolerate unpredictable latency. Five independent error detection mechanisms (bit, stuff, CRC, form, ACK) catch virtually all transmission errors, and automatic retransmission ensures corrupted messages are resent without application-layer intervention. Fault confinement means a malfunctioning node progressively reduces its participation and eventually disconnects itself from the bus, preventing one bad ECU from taking down the entire vehicle network.

The bus architecture itself is remarkably robust: differential signaling rejects common-mode noise from ignition systems, motors, and solenoids; a single twisted pair replaces the point-to-point wiring harness that would otherwise connect dozens of ECUs; and any node can initiate communication without a central master. This combination of reliability, determinism, and simplicity of wiring is why CAN has been mandatory in all vehicles sold in the US since 2008 (OBD-II) and remains the backbone of automotive networking even as Ethernet enters the vehicle for high-bandwidth applications.

Q: Explain the CAN frame format. What is the purpose of each field?

A standard CAN 2.0A data frame consists of seven fields, each serving a specific purpose:

Start of Frame (SOF) — a single dominant (0) bit that marks the beginning of a frame. All nodes synchronize their bit timing on the falling edge of SOF.

Arbitration Field — the 11-bit message identifier (standard CAN) or 29-bit identifier (extended CAN 2.0B), followed by the RTR (Remote Transmission Request) bit. The identifier determines both the message's content and its priority during arbitration. Lower identifier values have higher priority because dominant (0) bits override recessive (1) bits.

Control Field — includes the IDE bit (standard vs. extended frame), a reserved bit, and the 4-bit DLC (Data Length Code) specifying the number of data bytes (0-8).

Data Field — 0 to 8 bytes of payload. The content and meaning are defined by the application protocol (CANopen, J1939, etc.), not by CAN itself.

CRC Field — a 15-bit CRC computed over all preceding fields (SOF through data), followed by a CRC delimiter (recessive bit). This catches transmission errors with extremely high probability — the Hamming distance of 6 means any 5 or fewer bit errors are guaranteed to be detected.

ACK Field — the transmitter sends a recessive bit; any receiver that has successfully received the frame pulls the ACK slot dominant. If the transmitter reads the ACK slot as recessive, no node acknowledged the frame, indicating either no receivers are on the bus or all receivers detected an error.

End of Frame (EOF) — seven recessive bits that mark the end of the frame, followed by a 3-bit inter-frame space (IFS) before the next frame can begin.

A common interview follow-up: "What is the maximum frame length?" For a standard frame with 8 data bytes, the fixed fields total 44 bits, so the unstuffed frame is 108 bits. With worst-case bit stuffing (a stuff bit after every 5 bits in the stuffable region) plus the 3-bit inter-frame space, the total is approximately 130 bits, giving a maximum frame rate of about 7700 frames/second at 1 Mbit/s.
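The frame-length arithmetic can be sketched in a few lines. This is a rough model: the field widths are those listed above, and the worst-case stuff-bit estimate assumes one stuff bit per 5 bits of the stuffable region (SOF through CRC).

```python
# Rough classic CAN 2.0A frame-length model (standard 11-bit ID).

def frame_bits(data_bytes: int) -> int:
    """Unstuffed frame length: SOF + ID + RTR + IDE + r0 + DLC +
    data + CRC + CRC delim + ACK + ACK delim + EOF."""
    return 1 + 11 + 1 + 1 + 1 + 4 + (8 * data_bytes) + 15 + 1 + 1 + 1 + 7

def worst_case_bits(data_bytes: int) -> int:
    """Add worst-case stuff bits (roughly one per 5 bits of the
    stuffable region, SOF through CRC) plus the 3-bit inter-frame space."""
    stuffable = 1 + 11 + 1 + 1 + 1 + 4 + (8 * data_bytes) + 15
    return frame_bits(data_bytes) + stuffable // 5 + 3

print(frame_bits(8))                     # 108 unstuffed bits
print(worst_case_bits(8))                # 130 bit times including IFS
print(1_000_000 // worst_case_bits(8))   # 7692 frames/second at 1 Mbit/s
```

Running it reproduces the ~130-bit worst case and the roughly 7700 frames/second ceiling quoted above.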

Arbitration

Q: Walk through CAN arbitration with a concrete bit-by-bit example.

CAN uses CSMA/CR (Carrier Sense Multiple Access with Collision Resolution) — nodes listen for an idle bus, then begin transmitting simultaneously. Arbitration happens non-destructively on the identifier field, exploiting the physical property that a dominant bit (0) always overrides a recessive bit (1) on the open-collector/open-drain bus.

Consider three nodes transmitting simultaneously:

  • Node A: ID = 0x64A = 0b110 0100 1010
  • Node B: ID = 0x649 = 0b110 0100 1001
  • Node C: ID = 0x658 = 0b110 0101 1000

Bit-by-bit (MSB first, bit 10 down to bit 0):

Bit   A sends   B sends   C sends   Bus   Result
10    1         1         1         1     All match — continue
 9    1         1         1         1     All match — continue
 8    0         0         0         0     All match — continue
 7    0         0         0         0     All match — continue
 6    1         1         1         1     All match — continue
 5    0         0         0         0     All match — continue
 4    0         0         1         0     C loses — C sent 1 (recessive) but read 0 (dominant). C stops transmitting.
 3    1         1         -         1     A and B match — continue
 2    0         0         -         0     A and B match — continue
 1    1         0         -         0     A loses — A sent 1 but read 0. A stops.
 0    -         1         -         1     B wins — sole remaining transmitter

("-" = node has already dropped out of arbitration)

Node B has the lowest identifier (0x649) and wins arbitration. Its frame is transmitted without any corruption or delay. Nodes A and C become receivers and retry their transmissions after the bus returns to idle. The key insight: lower ID = higher priority, and the winning node never even knows that arbitration occurred — from its perspective, every bit it sent appeared correctly on the bus.

This is why message identifiers in CAN are not arbitrary — they must be carefully assigned based on the priority requirements of each message. Safety-critical messages (brake commands, airbag triggers) are assigned the lowest IDs.
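The wired-AND behavior above can be captured in a minimal simulation. This is a sketch, not controller code: the bus value at each bit is the AND of all contenders' bits (any dominant 0 wins), and a node that sends recessive but reads dominant drops out.

```python
# Bit-by-bit arbitration sketch for 11-bit identifiers.
# Dominant (0) overrides recessive (1): the bus behaves as a wired-AND.

def arbitrate(ids):
    contenders = set(ids)
    for bit in range(10, -1, -1):                      # MSB (bit 10) first
        bus = min((i >> bit) & 1 for i in contenders)  # any 0 forces the bus to 0
        # A node that sent recessive (1) but reads dominant (0) stops transmitting.
        contenders = {i for i in contenders if (i >> bit) & 1 == bus}
    (winner,) = contenders
    return winner

print(hex(arbitrate([0x64A, 0x649, 0x658])))  # 0x649 (lowest ID wins)
```

The winner is always the lowest identifier, and from its perspective every bit it sent appeared on the bus unchanged, exactly as described above.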

Error Detection

Q: Explain the five error detection mechanisms in CAN. Why are there five?

CAN implements five independent error detection mechanisms at the protocol level, providing a residual error probability of less than 4.7 x 10^-11 (one undetected error in 1000 years of continuous operation at maximum bus load). Each mechanism catches a different class of fault:

1. Bit Error: After each bit, the transmitter reads the bus and compares it to what it sent. If it sent dominant but read recessive (or vice versa), a bit error is flagged. The exception is during the arbitration field and ACK slot, where reading a different value is expected behavior (arbitration loss and receiver acknowledgment respectively). Bit errors catch transceiver faults and gross bus disturbances.

2. Stuff Error: CAN uses bit stuffing — after 5 consecutive bits of the same polarity, the transmitter inserts a complementary stuff bit. If the receiver detects 6 consecutive same-polarity bits, it flags a stuff error. Bit stuffing ensures enough signal transitions for receiver clock synchronization, and a stuff error typically indicates noise corruption or clock drift.

3. CRC Error: The transmitter computes a 15-bit CRC over the frame content (SOF through data field) using the polynomial x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1 and appends it. The receiver independently computes the CRC and compares. A mismatch flags a CRC error. This catches all burst errors up to 15 bits long, any combination of up to 5 randomly distributed bit errors (Hamming distance 6), and virtually all longer error patterns.

4. Form Error: Certain fields in the CAN frame have fixed values — the CRC delimiter, ACK delimiter, and EOF must all be recessive. If any of these fixed-form bits is dominant, a form error is flagged. This catches framing desynchronization and gross protocol violations.

5. ACK Error: After transmitting the CRC, the transmitter drives the ACK slot recessive and checks whether any receiver pulls it dominant. If the ACK slot remains recessive, no node acknowledged the frame — either no receivers exist, or all receivers detected an error. The transmitter flags an ACK error and retransmits.

The reason for five mechanisms is defense in depth. Each one targets a different failure mode — transceiver faults (bit error), clock drift (stuff error), data corruption (CRC), framing loss (form error), and receiver absence (ACK error). Together, they make CAN one of the most reliable serial protocols ever designed for real-time systems.
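The CRC mechanism (item 3) can be sketched with the generator polynomial quoted above. Note this is illustrative only: a real controller runs the CRC over the destuffed frame bit stream from SOF through the data field, whereas this sketch runs it over an arbitrary byte string.

```python
# CRC-15/CAN sketch. Generator polynomial:
# x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1, i.e. 0x4599.

CAN_POLY = 0x4599

def crc15(data: bytes) -> int:
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):           # process bits MSB first
            bit = (byte >> i) & 1
            top = (crc >> 14) & 1
            crc = (crc << 1) & 0x7FFF        # keep the register at 15 bits
            if top ^ bit:
                crc ^= CAN_POLY
    return crc

print(hex(crc15(b"123456789")))  # 0x59e, the standard CRC-15/CAN check value
```

Flipping any single bit of the input changes the result, which is what lets the receiver detect corruption by recomputing and comparing.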

Fault Confinement

Q: Explain CAN fault confinement. What are the three error states, and how does a node transition between them?

CAN fault confinement prevents a single malfunctioning node from disrupting the entire network. Each node maintains two error counters: the Transmit Error Counter (TEC) and the Receive Error Counter (REC). These counters increment on detected errors and decrement on successful transactions, with the increment steps being larger than the decrement steps so that a persistently faulty node's counters ratchet upward.

The three states are:

Error Active (TEC under 128, REC under 128): This is the normal operating state. When an Error Active node detects an error, it transmits an active error flag — six consecutive dominant bits. This deliberately violates the bit-stuffing rule, which guarantees all other nodes also detect the error and discard the frame. The dominant error flag aggressively corrupts the bus to force a fast, coordinated error recovery. All healthy nodes start in this state.

Error Passive (TEC at or above 128, or REC at or above 128): The node has accumulated too many errors and is now considered potentially faulty. It can still transmit and receive, but when it detects an error, it sends a passive error flag — six consecutive recessive bits. Because recessive bits do not override other traffic, a passive error flag has minimal impact on the bus. Additionally, an Error Passive node must wait an extra 8-bit-time "suspend transmission" period after transmitting before it can initiate a new transmission. This gives healthy nodes priority and reduces the faulty node's bus utilization.

Bus Off (TEC at or above 256): The node is disconnected from the bus — it does not transmit or receive anything. To recover, the node must detect 128 occurrences of 11 consecutive recessive bits on the bus (equivalent to 128 idle bus periods), after which it resets both counters to zero and returns to Error Active. This recovery process takes at least 128 x 11 = 1408 bit times, giving the rest of the network time to stabilize.

The escalation path is always: Error Active to Error Passive to Bus Off, driven by rising error counters. De-escalation is also possible: an Error Passive node returns to Error Active once both TEC and REC drop back below 128 through successful transmissions and receptions, while a Bus Off node must complete the 128 x 11 recessive-bit recovery sequence described above. This progressive escalation ensures that intermittent faults cause temporary throttling (Error Passive), while persistent faults cause full isolation (Bus Off), protecting the rest of the network.
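The counter thresholds can be illustrated with a small sketch. This models only the TEC rules quoted above (+8 per transmit error, -1 per successful transmission); REC handling and the Bus Off recovery sequence are omitted for brevity.

```python
# Fault-confinement sketch: TEC-driven state classification.

def node_state(tec: int) -> str:
    if tec >= 256:
        return "Bus Off"
    if tec >= 128:
        return "Error Passive"
    return "Error Active"

tec = 0
for _ in range(16):            # 16 consecutive transmit errors, +8 each
    tec += 8
print(node_state(tec))         # Error Passive (TEC = 128)

for _ in range(16):            # 16 more failures
    tec += 8
print(node_state(tec))         # Bus Off (TEC = 256)

tec -= 129                     # e.g. many successful transmissions at -1 each
print(node_state(tec))         # Error Active again once TEC < 128
```

The ratchet effect falls out of the asymmetry: 16 consecutive failed transmissions are enough to reach Error Passive, but it takes at least 129 successes to climb back down from TEC 256.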

CAN-FD

Q: What are the key differences between classic CAN and CAN-FD? Is CAN-FD backward compatible?

CAN-FD (Flexible Data-rate) is an extension of classic CAN that addresses its two main limitations: the 8-byte payload limit and the 1 Mbit/s speed ceiling. CAN-FD was standardized in ISO 11898-1:2015.

  • Payload: 0-8 bytes (classic) vs. 0-8, 12, 16, 20, 24, 32, 48, or 64 bytes (FD)
  • Bit rate: up to 1 Mbit/s for the entire frame (classic) vs. arbitration at up to 1 Mbit/s and data phase at up to 8 Mbit/s (FD)
  • CRC: 15-bit (classic) vs. 17-bit for up to 16 data bytes or 21-bit for up to 64 data bytes (FD)
  • Stuff bit count: not tracked (classic) vs. included in the CRC calculation via a Gray-coded counter, improving error detection (FD)
  • BRS bit: FD only — Bit Rate Switch, signals the transition to the faster data-phase bit rate
  • ESI bit: FD only — Error State Indicator, tells receivers whether the transmitter is Error Active or Error Passive

The speed improvement works by using two different bit rates within a single frame. The arbitration phase (SOF through the BRS bit) uses the standard 1 Mbit/s rate to maintain compatibility with all nodes' oscillator tolerances and propagation delays. After the BRS bit, the data phase switches to a higher rate (2, 4, 5, or 8 Mbit/s) for the payload and CRC, then switches back to the nominal rate for the ACK and EOF fields.

CAN-FD is partially backward compatible. CAN-FD and classic CAN nodes can coexist on the same physical bus, but classic CAN controllers will detect CAN-FD frames as errors (because the FDF/EDL bit violates the classic frame format) and transmit error flags. This means you cannot mix classic and FD traffic on the same bus. The common migration strategy is: (1) upgrade all nodes to CAN-FD capable hardware, (2) initially run the bus in classic CAN mode, (3) switch to CAN-FD mode once all nodes are upgraded. The physical layer (transceivers, bus topology, termination) is identical.
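The non-linear FD payload sizes come from the DLC encoding: the DLC field is still 4 bits, but values 9-15 map to the longer lengths listed above. A minimal lookup sketch:

```python
# CAN-FD DLC-to-payload-length mapping (per ISO 11898-1).
# DLC values 0-8 map directly; 9-15 encode the extended FD sizes.

FD_DLC_TO_LEN = {**{d: d for d in range(9)},
                 9: 12, 10: 16, 11: 20, 12: 24, 13: 32, 14: 48, 15: 64}

def fd_payload_len(dlc: int) -> int:
    return FD_DLC_TO_LEN[dlc]

print(fd_payload_len(8))   # 8 bytes, same as classic CAN
print(fd_payload_len(9))   # 12 bytes, first FD-only size
print(fd_payload_len(15))  # 64 bytes, the maximum CAN-FD payload
```

This is why an FD payload of, say, 40 bytes is impossible: the application must round up to the next encodable size (48) and pad.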

Bit Timing and Synchronization

Q: How does CAN bit timing work? What is the sample point and why does it matter?

CAN is an asynchronous protocol — each node has its own local oscillator, and there is no shared clock wire. Nodes stay synchronized by resynchronizing on signal edges within the data stream (enabled by bit stuffing, which guarantees edges at least every 5 bit times).

Each bit period is divided into segments measured in time quanta (TQ) — the fundamental timing unit derived by dividing the CAN controller's clock by a prescaler. A single bit consists of four segments:

  1. Sync Segment (SYNC_SEG) — always 1 TQ. The expected edge transition occurs here.
  2. Propagation Segment (PROP_SEG) — compensates for physical propagation delays (signal travel time across the bus and through transceivers). Typically 1-8 TQ.
  3. Phase Segment 1 (PHASE_SEG1) — can be lengthened by the resynchronization mechanism. Typically 1-8 TQ.
  4. Phase Segment 2 (PHASE_SEG2) — can be shortened by resynchronization. Typically 1-8 TQ.

The sample point is the instant when the bus level is read and interpreted as the current bit value. It is located at the boundary between Phase Segment 1 and Phase Segment 2. Its position within the bit period (expressed as a percentage) is critical:

  • Too early (e.g., 50%): the sample occurs before the signal has fully settled after propagation delays, risking sampling a transitioning signal.
  • Too late (e.g., 95%): leaves almost no Phase Segment 2, reducing the tolerance for oscillator drift and making resynchronization less effective.
  • Optimal: typically 75-87.5% for automotive CAN. The CiA (CAN in Automation) recommends 87.5% for CAN networks up to 125 kbit/s and shorter buses, and 75% for 1 Mbit/s networks.

Practical calculation example: for 500 kbit/s CAN with a 16 MHz oscillator, prescaler = 2, giving TQ = 125 ns. A bit period of 2 us = 16 TQ. Configuration: SYNC_SEG = 1, PROP_SEG = 5, PHASE_SEG1 = 6, PHASE_SEG2 = 4. Sample point = (1+5+6)/16 = 75%.
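The example above can be checked in a few lines, using the same oscillator, prescaler, and segment values:

```python
# Bit-timing sketch reproducing the 500 kbit/s example above.

F_OSC = 16_000_000     # controller clock in Hz
PRESCALER = 2
SYNC_SEG, PROP_SEG, PHASE_SEG1, PHASE_SEG2 = 1, 5, 6, 4

tq_ns = 1e9 * PRESCALER / F_OSC                      # one time quantum in ns
tq_per_bit = SYNC_SEG + PROP_SEG + PHASE_SEG1 + PHASE_SEG2
bit_rate = F_OSC / (PRESCALER * tq_per_bit)          # bits per second
sample_point = (SYNC_SEG + PROP_SEG + PHASE_SEG1) / tq_per_bit

print(tq_ns)           # 125.0 ns
print(bit_rate)        # 500000.0 bit/s
print(sample_point)    # 0.75, i.e. 75%
```

The same arithmetic works in reverse when configuring a controller: pick the prescaler so the bit divides into a convenient number of TQ (here 16), then split the TQ among the segments to land the sample point where you want it.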

Getting the sample point wrong is a common cause of intermittent CAN errors, especially when mixing nodes from different vendors with different oscillator tolerances.

Differential Signaling and Termination

Q: Why does CAN use differential signaling, and how does the bus physically work?

CAN uses a two-wire differential bus — CAN_H and CAN_L — where information is encoded in the voltage difference between the two wires rather than the absolute voltage on either wire. This is the primary reason CAN is so noise-resistant.

In the recessive state (logic 1), both CAN_H and CAN_L are driven to approximately 2.5V, so the differential voltage (CAN_H - CAN_L) is approximately 0V. In the dominant state (logic 0), CAN_H is driven to approximately 3.5V and CAN_L to approximately 1.5V, creating a differential voltage of approximately 2V. The CAN transceiver's receiver compares the two lines and interprets a differential voltage above about 0.9V as dominant and below about 0.5V as recessive.

The noise rejection works because electromagnetic interference from nearby motors, ignition coils, or switching regulators affects both wires equally (common-mode noise). When the receiver subtracts CAN_L from CAN_H, the common-mode component cancels out, and only the intended signal remains. A single-ended protocol (like UART at TTL levels) would interpret this common-mode noise as valid data transitions.

This is why CAN wiring should be twisted pair — twisting the two wires ensures they have equal exposure to external electromagnetic fields, maximizing common-mode rejection. The twist rate should be at least 1 turn per inch (40 turns per meter) for good rejection at automotive frequencies.
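The common-mode cancellation is easy to demonstrate numerically. This sketch uses the idealized dominant-state levels and receiver threshold quoted above; real transceivers specify ranges, not exact voltages.

```python
# Common-mode rejection sketch: identical noise on both wires cancels
# when the receiver takes the difference CAN_H - CAN_L.

def differential(v_noise: float) -> float:
    can_h = 3.5 + v_noise      # idealized dominant-state CAN_H level
    can_l = 1.5 + v_noise      # idealized dominant-state CAN_L level
    return can_h - can_l

def is_dominant(v_diff: float) -> bool:
    return v_diff > 0.9        # receiver dominant threshold from the text

for noise in (0.0, 1.0, -2.5):
    v = differential(noise)
    print(v, is_dominant(v))   # differential stays 2.0 V for any noise
```

A single-ended receiver sampling CAN_H alone would see 3.5 V swing to 1.0 V under the -2.5 V noise case; the differential receiver sees 2.0 V in all three cases.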

Q: Why are 120-ohm termination resistors needed, and what happens without them?

Each end of the CAN bus must be terminated with a 120-ohm resistor connected between CAN_H and CAN_L. These resistors serve two purposes, both essential for reliable operation:

Signal reflection prevention: The CAN bus is a transmission line. When a signal reaches the end of an unterminated bus, it reflects back and interferes with the original signal (constructive or destructive interference depending on the round-trip delay). At 1 Mbit/s with a 40-meter bus, the reflected signal arrives approximately 400 ns later — within the same bit period — causing the transceiver to see ringing or an incorrect voltage level. Termination resistors absorb the signal energy at the bus ends, preventing reflections. The 120-ohm value matches the characteristic impedance of a standard CAN twisted-pair cable, providing maximum absorption.

DC bias and recessive state definition: The two 120-ohm resistors in parallel create a 60-ohm load between CAN_H and CAN_L. In the recessive state, the transceiver's internal bias drives both lines to 2.5V through this load, establishing a well-defined common-mode voltage. Without termination, the recessive voltage is poorly defined and susceptible to noise.

Symptoms of missing or incorrect termination:

  • Missing both terminators: the bus may appear to work at low speeds and short distances but fails intermittently under load or at higher speeds. CAN_H and CAN_L show ringing on an oscilloscope.
  • Missing one terminator: reflections from the unterminated end cause bit errors that increase with bus length and speed.
  • Wrong value (e.g., 60 ohm instead of 120 ohm): the total bus impedance is too low, overloading the transceivers' output drivers and causing voltage levels to fall outside the specification.
  • Extra termination in the middle: creates impedance discontinuities that cause partial reflections.

A practical diagnostic: measure the resistance between CAN_H and CAN_L with the bus powered down and all ECUs disconnected except the two end nodes. You should read 60 ohm (two 120-ohm resistors in parallel). This is the first check when debugging a new CAN bus that is not working.
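The diagnostic above can be turned into a small helper. The resistance bands here are illustrative assumptions chosen for this sketch, not values from any standard; real measurements vary with transceiver bias networks and meter accuracy.

```python
# Termination sanity-check sketch: interpret the CAN_H-to-CAN_L resistance
# measured with the bus unpowered. Thresholds are illustrative only.

def parallel(r1: float, r2: float) -> float:
    return r1 * r2 / (r1 + r2)

def diagnose_termination(ohms: float) -> str:
    if 50 <= ohms <= 70:
        return "OK: two 120-ohm terminators in parallel (~60 ohm)"
    if 100 <= ohms <= 140:
        return "One terminator missing (~120 ohm)"
    if ohms < 50:
        return "Too many terminators, or a short between CAN_H and CAN_L"
    return "No termination detected (open or high resistance)"

print(diagnose_termination(parallel(120, 120)))  # 60 ohm reads as OK
print(diagnose_termination(120.0))               # one terminator missing
print(diagnose_termination(parallel(60, 120)))   # 40 ohm: too much termination
```

The parallel() helper also shows where the 60-ohm target comes from: 120 * 120 / 240 = 60.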

Bus Load and Practical Design

Q: How do you calculate CAN bus load, and why should it be kept below 70-80%?

Bus load is the percentage of time the bus is occupied by frame transmissions. It is calculated by summing the bit-time of all frames transmitted in one second and dividing by the bus bit rate:

  Bus Load = (sum of all frame lengths in bits per second) / (bus bit rate) x 100%

For a standard CAN 2.0A frame with 8 data bytes, the frame fields total 108 bits (SOF + 11-bit ID + RTR + IDE + r0 + 4-bit DLC + 64 data bits + 15-bit CRC + CRC delim + ACK + ACK delim + 7-bit EOF). With worst-case bit stuffing (one stuff bit per 5 bits in the stuffable region), the maximum frame length grows to approximately 130 bits. Add the 3-bit inter-frame space, and each frame occupies roughly 133 bit times.

Example: On a 500 kbit/s bus, a single message sent at 100 Hz occupies 133 bits x 100 / 500000 = 2.66% of bus bandwidth. A typical automotive body network with 30 messages at various rates might total 40-50% bus load.
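The calculation scales directly to a whole message set. This sketch assumes every frame costs the 133-bit worst case from above; the message list is a made-up example, and a real analysis would use each frame's actual stuffed length.

```python
# Bus load sketch: sum per-message bit consumption against the bit rate.

WORST_CASE_FRAME_BITS = 133   # 8-byte standard frame incl. stuffing and IFS

def bus_load(rates_hz, bit_rate, frame_bits=WORST_CASE_FRAME_BITS):
    """rates_hz: one cycle rate per periodic message. Returns percent load."""
    bits_per_second = sum(rate * frame_bits for rate in rates_hz)
    return 100.0 * bits_per_second / bit_rate

# One 100 Hz message on a 500 kbit/s bus, as in the example above:
print(bus_load([100], 500_000))                    # 2.66 %

# A hypothetical body network: ten 100 Hz + twenty 20 Hz messages:
print(bus_load([100] * 10 + [20] * 20, 500_000))   # 37.24 %
```

Summing worst-case frame sizes gives a conservative load figure, which is what you want when budgeting against the 70-80% ceiling.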

Why keep it below 70-80%: CAN's arbitration mechanism works perfectly under any load, but high bus load has practical consequences. (1) Worst-case latency increases — a low-priority message may be delayed by multiple arbitration losses. At 90% load, a low-priority message might wait 10+ frame times before winning arbitration. For real-time systems, this latency must be bounded and budgeted. (2) Error recovery consumes bandwidth — a corrupted frame is retransmitted, temporarily adding to the load. At 90% load, one retransmission can push the bus into transient overload, cascading delays. (3) Oscillator tolerance — production CAN nodes have clock frequency variations (typically +/-0.5%). High bus load reduces the timing margin available to absorb these variations without errors.

Industry guidelines: automotive networks typically target 30-50% for safety-critical buses and 50-70% for body/comfort networks. CiA (CAN in Automation) recommends a maximum of 70% for industrial networks.

Q: What happens when a CAN node detects an error during reception or transmission?

When any CAN node detects an error (via any of the five detection mechanisms), it immediately begins transmitting an error frame to notify all other nodes. The error frame consists of an error flag followed by an error delimiter.

If the detecting node is Error Active, it transmits an active error flag: 6 consecutive dominant bits. This deliberately violates the bit-stuffing rule (which limits consecutive same-polarity bits to 5) and overwrites whatever is currently on the bus. Every other node on the bus detects this as a stuff error and also begins transmitting its own error flag. The result is that the entire corrupted frame is destroyed within 6-12 bit times, and all nodes discard it.

If the detecting node is Error Passive, it transmits a passive error flag: 6 consecutive recessive bits. Because recessive bits do not override the bus, a passive error flag does not disturb ongoing traffic from other nodes. It is effectively a silent notification — only the detecting node acts on it.

After the error flag, all nodes transmit an error delimiter (8 recessive bits) and then wait for the inter-frame space (3 bit times) before the bus returns to idle.

The original transmitter then automatically retransmits the message, starting arbitration from scratch. If the same message keeps failing, the transmitter's TEC increments by 8 for each failure, progressing it toward Error Passive and eventually Bus Off. Meanwhile, receivers that detected the error increment their REC by 1 (or 8 if they were the first to detect it, depending on the error type).

This automatic detection-flagging-retransmission cycle is completely transparent to the application layer — the CAN controller handles it in hardware. The application only sees successfully received, CRC-verified frames. The worst-case overhead for a single error is approximately 23 bit times (up to 12 bits of superposed error flags + 8-bit error delimiter + 3-bit inter-frame space) before retransmission can begin.

Bit Stuffing

Q: What is bit stuffing in CAN, and what problems would occur without it?

Bit stuffing is a data encoding technique where the transmitter inserts a complementary (opposite polarity) bit after every sequence of 5 consecutive bits of the same value. The receiver recognizes and removes these stuff bits to recover the original data. Stuffing applies to most of the frame — from SOF through the CRC sequence — but not to the fixed-format fields (CRC delimiter, ACK field, EOF), which have defined bit patterns.

Without bit stuffing, CAN would fail for two fundamental reasons:

Clock synchronization would be lost. CAN nodes have independent oscillators and synchronize by detecting edges (transitions between dominant and recessive) in the data stream. If the data contained a long run of identical bits (e.g., 20 consecutive zeros), there would be no edges for 20 bit times. During this time, the receiver's oscillator drifts relative to the transmitter's oscillator. Even a 0.5% frequency difference causes a 0.1-bit-time error after 20 bits — enough to shift the sample point into the wrong bit. Bit stuffing guarantees an edge at least every 6 bits (5 data bits + 1 stuff bit), limiting the maximum drift to about 0.03 bit times at 0.5% oscillator tolerance.

Error detection would be weaker. The stuff error mechanism provides an additional layer of error detection that catches bus faults and clock problems. If 6 consecutive same-polarity bits appear, it means either the data was corrupted or a stuff bit was lost — both indicating a fault. Removing bit stuffing would eliminate this mechanism entirely.

The cost of bit stuffing is increased frame length variability. In the best case (alternating data), no stuff bits are inserted and the frame length is minimal. In the worst case (all zeros or all ones in the data field), a stuff bit is inserted every 5 bits, increasing the frame length by up to 20%. This variability must be accounted for when calculating bus load and worst-case latency.
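The stuffing rule is simple enough to implement directly. This sketch works on a bare list of bits and ignores the fixed-format fields that are exempt from stuffing; note that stuff bits themselves count toward subsequent runs, which is why all-same-polarity data costs one stuff bit per 5 bits.

```python
# Bit-stuffing sketch: insert a complementary bit after every run of
# 5 identical bits; destuffing removes them to recover the original data.

def stuff(bits):
    out, run_val, run_len = [], None, 0
    for b in bits:
        out.append(b)
        run_len = run_len + 1 if b == run_val else 1
        run_val = b
        if run_len == 5:
            out.append(1 - b)                 # complementary stuff bit
            run_val, run_len = 1 - b, 1       # stuff bit starts a new run
    return out

def destuff(bits):
    out, run_val, run_len, i = [], None, 0, 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        run_len = run_len + 1 if b == run_val else 1
        run_val = b
        if run_len == 5:
            i += 1                            # the next bit is a stuff bit: drop it
            if i < len(bits):
                run_val, run_len = bits[i], 1
        i += 1
    return out

print(len(stuff([0] * 20)))     # 24: four stuff bits added to 20 zeros
print(len(stuff([0, 1] * 10)))  # 20: alternating data needs no stuff bits
print(destuff(stuff([0] * 20)) == [0] * 20)  # True: round trip is lossless
```

The two print cases bracket the overhead range from the text: roughly 20% extra bits for worst-case data, zero for alternating data.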