Basics and Frame Format
Q: What is UART and how does it differ from SPI and I2C?
UART (Universal Asynchronous Receiver-Transmitter) is an asynchronous, full-duplex serial protocol that uses two data wires — TX and RX — with no shared clock. Both endpoints must independently agree on the same baud rate before communication begins. This is fundamentally different from SPI and I2C, which are synchronous protocols where the master supplies a clock signal that slaves use for timing.
UART is strictly point-to-point: one transmitter talks to one receiver on each wire. There is no concept of addressing, bus arbitration, or multi-device sharing — if you need to talk to three peripherals, you need three separate UART ports (or a mux). SPI and I2C both support multiple devices on a shared bus. The trade-off is simplicity: UART requires only two wires (plus ground), has no clock skew issues, and is the easiest serial protocol to debug with an oscilloscope or logic analyzer because the frame format is self-contained.
UART is also the only one of the three that commonly works over long distances when paired with a dedicated physical layer: RS-485's differential signaling supports runs of hundreds of meters, and even single-ended RS-232 reaches about 15 meters. SPI and I2C are limited to on-board or short-cable distances because their single-ended signals degrade with length.
Q: Describe the UART frame structure and calculate its efficiency.
A UART frame consists of four parts transmitted in sequence: (1) a start bit (always logic low) that signals the beginning of a character, (2) 5 to 9 data bits (8 is standard), sent LSB first, (3) an optional parity bit for single-bit error detection, and (4) 1 or 2 stop bits (always logic high) that mark the end of the frame. Between frames the line idles high — this is what makes the falling edge of the start bit detectable.
For the most common configuration, 8N1 (8 data bits, no parity, 1 stop bit), each character requires 10 bit periods: 1 start + 8 data + 1 stop. The protocol efficiency is 8/10 = 80%. At 115200 baud, this gives a maximum throughput of 115200 / 10 = 11520 bytes per second, or approximately 11.25 KB/s. Adding parity (8E1 or 8O1) makes it 11 bit periods per character, dropping efficiency to 8/11 ≈ 73%. Two stop bits (8N2, as used by DMX512) likewise give 11 bit periods; combining parity with two stop bits (8E2) gives 12, or 8/12 ≈ 67%.
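The arithmetic above is easy to capture in a small helper. This is an illustrative sketch (the function names are not a standard API); it counts bit periods per character and derives the maximum character rate:

```c
#include <stdint.h>

/* Bits per character = 1 start + data bits + parity bits (0 or 1) + stop bits. */
static uint32_t frame_bits(uint32_t data_bits, uint32_t parity_bits,
                           uint32_t stop_bits) {
    return 1u + data_bits + parity_bits + stop_bits;
}

/* Maximum characters per second at a given baud rate. */
static uint32_t max_chars_per_sec(uint32_t baud, uint32_t data_bits,
                                  uint32_t parity_bits, uint32_t stop_bits) {
    return baud / frame_bits(data_bits, parity_bits, stop_bits);
}
```

For 8N1 at 115200 baud this yields 11520 characters/second, matching the figure above.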
A common interview trap: candidates confuse "baud rate" with "data rate." Baud rate is the number of symbol transitions per second (bits per second for binary signaling). Data rate is lower because of the framing overhead.
Q: What happens if the baud rates of two UART devices do not match?
If baud rates differ by more than about 3-5%, the receiving UART samples bits at the wrong points within each bit period, producing garbled data. The receiver uses the falling edge of the start bit to synchronize, then samples each subsequent bit at the center of its expected time window. As the frame progresses, timing error accumulates. By the time the receiver reaches the stop bit (bit 9 or 10), the cumulative drift may have shifted the sample point into the adjacent bit, causing framing errors and corrupted data.
The tolerance is tighter than many engineers expect. At 8N1, the receiver must correctly sample 10 consecutive bits. If the baud rate error is 5%, the sample point drifts by 0.5 bit periods over 10 bits — right at the boundary of the next bit. In practice, 2-3% is the safe limit, and you should aim for under 1%.
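The drift argument can be checked numerically. A hedged sketch (fixed-point to avoid floats on small MCUs; the name and units are illustrative): the cumulative sample-point drift after n bits is simply n times the relative baud error.

```c
/* Cumulative sample-point drift in hundredths of a bit period after
 * sampling `bit_index` bits, given a relative baud error expressed in
 * hundredths of a percent (e.g. 500 = 5.00%). The receiver mis-samples
 * once the drift reaches 50 (half a bit period). */
static int drift_centibits(int error_pct_x100, int bit_index) {
    return (error_pct_x100 * bit_index) / 100;
}
```

At 5% error, the drift after the 10th bit is exactly 50 centibits, i.e. half a bit period, confirming that 5% sits right at the failure boundary for a 10-bit 8N1 frame.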
This is particularly important when deriving the baud rate from the MCU system clock. The UART baud rate generator divides the peripheral clock by an integer (or fractional) divisor. If the peripheral clock does not divide evenly into the target baud rate, there is an inherent error. For example, generating 115200 baud from an 8 MHz clock gives a divisor of 69.44 — rounding to 69 produces a 0.64% error. But generating 9600 baud from an 11.0592 MHz crystal gives an exact divisor of 1152, producing zero error. This is why 11.0592 MHz crystals were historically popular in UART-heavy designs.
Error Detection and Handling
Q: What error conditions can UART hardware detect, and how should firmware handle them?
UART peripherals detect four error conditions, each flagged in the status register:
- Framing error — the expected stop bit is not high. This usually means baud rate mismatch, noise corruption, or a break condition. It is the most common symptom of a misconfigured baud rate.
- Parity error — the number of 1-bits in the data plus the parity bit does not match the configured parity (even or odd). This indicates at least one bit was corrupted, but parity can only detect an odd number of bit errors — two flipped bits cancel out and go undetected.
- Overrun error — a new byte arrived before firmware read the previous byte from the receive data register. The hardware has no room to store it, and data is lost. This is the most dangerous error in production because it happens silently under load.
- Noise error (on some MCUs like STM32) — the three internal samples taken per bit did not all agree. The majority value is used, but the hardware flags that the bit was noisy.
Firmware should check error flags after every received byte (or in the receive ISR). For framing and parity errors, discard the byte and optionally request retransmission at the application layer. For overrun errors, the root cause is that the firmware is not draining the receive register fast enough — the fix is to use DMA or interrupt-driven reception with a ring buffer rather than polling. Simply clearing the error flag without addressing the throughput problem guarantees it will recur.
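The parity property described above (an odd number of flipped bits is detected, an even number is not) can be demonstrated with a minimal even-parity calculator — a sketch, not a specific peripheral's implementation:

```c
#include <stdint.h>

/* Even-parity bit for an 8-bit value: returns 1 when the data contains an
 * odd number of 1-bits, so that data + parity together have an even count. */
static uint8_t even_parity_bit(uint8_t data) {
    uint8_t p = 0;
    while (data) {
        p ^= (uint8_t)(data & 1u);
        data >>= 1;
    }
    return p;
}
```

Note that 0x00 and 0x03 produce the same parity bit: a two-bit corruption turning one into the other would pass the parity check undetected, which is exactly the limitation mentioned above.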
Flow Control
Q: Explain hardware flow control (RTS/CTS) and when it is necessary.
Hardware flow control adds two additional signals — RTS (Request to Send) and CTS (Clear to Send) — to prevent data loss when the receiver cannot keep up with the transmitter. The receiver asserts its RTS line to signal "I am ready to accept data." The transmitter checks the CTS line (connected to the receiver's RTS) before sending each byte; if CTS is de-asserted, the transmitter pauses until the receiver is ready again.
This is essential in two scenarios: (1) high baud rates where the receiver's ISR or DMA cannot drain the hardware FIFO fast enough, especially during bursts — for example, a Bluetooth module streaming data at 921600 baud into an MCU that occasionally disables interrupts for flash writes; (2) software processing delays where the receiver must parse, validate, or store data before it can accept more, and the processing time varies unpredictably.
The alternative is software flow control (XON/XOFF), where the receiver sends a special byte (0x13 = XOFF) to pause the transmitter and another (0x11 = XON) to resume. This uses no extra pins but is unreliable for binary data because the control bytes might appear in the data stream. It also adds latency — by the time the transmitter processes the XOFF, it may have already sent additional bytes. Hardware flow control reacts within about one character time (the transmitter finishes the byte in flight, then pauses) and works with any data content, making it the preferred choice for reliable high-speed links.
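The transmitter side of XON/XOFF reduces to a tiny state machine. A minimal sketch (names are illustrative; a real driver would also flush or re-arm the TX path):

```c
#include <stdbool.h>
#include <stdint.h>

#define XON  0x11
#define XOFF 0x13

static bool tx_paused = false;  /* transmitter-side flow control state */

/* Feed every received byte through this gate. Note the core weakness:
 * a binary payload that legitimately contains 0x11 or 0x13 will be
 * misinterpreted as a flow-control command. */
static void on_rx_byte(uint8_t b) {
    if (b == XOFF)     tx_paused = true;
    else if (b == XON) tx_paused = false;
    /* all other bytes are ordinary data */
}

static bool may_transmit(void) { return !tx_paused; }
```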
Baud Rate Generation
Q: How does the UART baud rate generator work, and how do you calculate the divisor from the peripheral clock?
The baud rate generator divides the peripheral clock (PCLK) by a programmable divisor to produce the bit-rate clock. The basic formula is:
Divisor = PCLK / (Oversampling x BaudRate)
Most UART peripherals oversample each bit 16 times (some support 8x oversampling for higher baud rates at the cost of noise margin). For STM32 at 16x oversampling:
USARTDIV = PCLK / BaudRate
BRR register = USARTDIV (integer + fractional parts)
For example, with PCLK = 72 MHz and a target of 115200 baud: USARTDIV = 72000000 / 115200 = 625.0 — an exact integer, so the baud rate error is 0%. But with PCLK = 48 MHz: USARTDIV = 48000000 / 115200 = 416.67, which rounds to 417, producing an actual baud rate of 48000000 / 417 = 115107.9, an error of 0.08% — well within tolerance.
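The error check can be automated at build or init time. A hedged sketch in integer arithmetic (the function name is illustrative; it assumes the simple BRR-style divisor PCLK / baud with nearest-integer rounding):

```c
#include <stdint.h>

/* Relative baud rate error, in parts per million, after rounding the
 * divisor (PCLK / baud) to the nearest integer. 10000 ppm = 1%. */
static uint32_t baud_error_ppm(uint32_t pclk, uint32_t target) {
    uint32_t div = (pclk + target / 2) / target;      /* nearest integer */
    int64_t actual_x1e6 = (int64_t)pclk * 1000000 / div;
    int64_t diff = actual_x1e6 - (int64_t)target * 1000000;
    if (diff < 0) diff = -diff;
    return (uint32_t)(diff / target);
}
```

For the examples above: 72 MHz / 115200 gives 0 ppm (exact), and 48 MHz / 115200 gives 799 ppm ≈ 0.08%. A startup assertion that the error stays below, say, 20000 ppm (2%) catches bad clock/baud combinations before they cause silent communication failures.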
The critical point: always verify the actual baud rate error after choosing a system clock. Some clock/baud-rate combinations produce errors above 2%, which causes communication failures. This is a common embedded systems interview question because it tests whether you understand that UART timing is derived from the system clock and is not perfectly arbitrary.
DMA and Ring Buffers
Q: Why is DMA preferred over interrupt-driven UART reception, and how do you set it up?
Interrupt-driven UART reception fires an ISR for every single byte received. At 115200 baud, that is up to 11520 interrupts per second — each one requiring context save/restore, flag checking, and a byte copy. At higher baud rates (921600+), the interrupt overhead can consume a significant fraction of CPU time, especially on lower-end Cortex-M0 cores. Worse, if interrupts are briefly disabled (during a flash erase, for example), bytes are lost to overrun errors.
DMA (Direct Memory Access) solves both problems. The DMA controller transfers received bytes directly from the UART data register into a RAM buffer without CPU intervention. The CPU is only interrupted when the buffer is half-full, completely full, or when the UART detects an idle line — reducing interrupt frequency from thousands per second to a handful.
A typical STM32 setup uses DMA in circular mode with a buffer of 64-256 bytes. The DMA controller writes to the buffer continuously, wrapping around to the beginning when it reaches the end. Firmware tracks a read index (tail pointer) and compares it against the DMA's current write position (obtained from the DMA NDTR register) to determine how many new bytes are available. The idle-line interrupt is critical for protocols where messages are separated by pauses — it tells firmware "the transmitter stopped sending, process what you have" even if the buffer is not full.
// Pseudo-code: checking for new data in DMA circular buffer
uint16_t dma_head = BUFFER_SIZE - DMA1_Channel5->CNDTR;
while (uart_tail != dma_head) {
    process_byte(rx_buffer[uart_tail]);
    uart_tail = (uart_tail + 1) % BUFFER_SIZE;
}
Q: What is the ring buffer pattern for UART, and why is it essential?
A ring buffer (circular buffer) is a fixed-size array with two indices — a head (write pointer) and a tail (read pointer) — that wrap around when they reach the end. The ISR writes incoming bytes at the head, and the main loop (or a processing task) reads bytes from the tail. As long as the consumer keeps up with the producer, the buffer absorbs bursts without data loss.
#define BUF_SIZE 128  // Must be power of 2 for fast modulo
static volatile uint8_t buf[BUF_SIZE];
static volatile uint16_t head = 0;  // Written by ISR
static volatile uint16_t tail = 0;  // Read by main loop

void UART_IRQHandler(void) {
    buf[head] = UART->DR;
    head = (head + 1) & (BUF_SIZE - 1);  // Fast wrap
}

int uart_read(uint8_t *byte) {
    if (tail == head) return -1;  // Empty
    *byte = buf[tail];
    tail = (tail + 1) & (BUF_SIZE - 1);
    return 0;
}
The ring buffer is essential because it decouples the interrupt timing from the application processing timing. Without it, the ISR must either process each byte immediately (making it long and blocking) or use a single-byte holding register (which overruns if the main loop is delayed by even one byte period). The buffer size should be chosen based on the maximum burst length and the worst-case processing latency. Making the size a power of two allows the modulo operation to be replaced by a bitmask AND, which is a single-cycle operation on ARM.
A common trap: forgetting to declare head and tail as volatile. Without volatile, the compiler may cache the value of head in a register inside the main loop and never re-read it from memory, so the main loop never sees new data arriving from the ISR.
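The wraparound logic is easy to get wrong, so it pays to exercise it on the host. A self-contained harness sketch (buffer shrunk to 8 entries so the wrap is hit quickly; `rb_put` stands in for the ISR, `rb_get` for the main loop):

```c
#include <stdint.h>

#define RB_SIZE 8  /* power of two, so the wrap is a single bitmask AND */
static volatile uint8_t  rb_buf[RB_SIZE];
static volatile uint16_t rb_head = 0, rb_tail = 0;

static void rb_put(uint8_t b) {             /* producer (ISR role) */
    rb_buf[rb_head] = b;
    rb_head = (rb_head + 1) & (RB_SIZE - 1);
}

static int rb_get(uint8_t *b) {             /* consumer (main-loop role) */
    if (rb_tail == rb_head) return -1;      /* empty */
    *b = rb_buf[rb_tail];
    rb_tail = (rb_tail + 1) & (RB_SIZE - 1);
    return 0;
}
```

Pushing more than RB_SIZE bytes total forces the indices to wrap, verifying that FIFO order survives the wraparound.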
Idle Line Detection
Q: What is UART idle line detection and why is it important for packet-based protocols?
An idle line condition occurs when the UART RX line remains high (idle) for one full frame duration after the last received byte. Most modern UART peripherals (STM32, NXP, TI) can generate an interrupt when this condition is detected. This is distinct from simply "not receiving data" — the hardware specifically measures the silence after the last stop bit.
Idle line detection is critical for packet-based protocols where messages have variable length and no fixed delimiter. Consider a Modbus RTU frame: it is terminated by a silence of at least 3.5 character times. Without idle detection, firmware would need to implement a software timer that resets on every received byte and fires when no byte arrives within the timeout — this is fragile, wastes a hardware timer, and has poor resolution at high baud rates.
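The 3.5-character-time silence translates into a concrete timeout value. A sketch of the calculation (assuming the 11-bit Modbus character of start + 8 data + parity + stop, and the spec's fixed 1750 µs ceiling above 19200 baud):

```c
#include <stdint.h>

/* Modbus RTU inter-frame silence (t3.5) in microseconds.
 * One character on the wire is 11 bit periods; the Modbus-over-serial-line
 * spec fixes t3.5 at 1750 us for baud rates above 19200. */
static uint32_t modbus_t35_us(uint32_t baud) {
    if (baud > 19200) return 1750;
    return (35u * 11u * 1000000u) / (10u * baud);  /* 3.5 chars * 11 bits */
}
```

At 9600 baud this gives about 4010 µs, which is the resolution a software timer would have to hit reliably — the hardware idle-line interrupt sidesteps the problem entirely (note that the idle interrupt fires after roughly one frame of silence, earlier than t3.5, so firmware pairing the two should still respect the Modbus timing).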
With idle line detection, the workflow becomes clean: DMA fills a buffer with incoming bytes continuously, and the idle interrupt signals "the sender paused — the current message is complete." Firmware then processes the accumulated bytes as a complete frame. This pattern (DMA + idle line interrupt) is the standard approach for UART reception in production embedded firmware. It handles variable-length messages, minimizes CPU overhead, and is robust against timing jitter.
UART vs RS-485
Q: What is the difference between UART, RS-232, and RS-485? When would you choose RS-485?
UART refers to the digital logic block inside the MCU — it handles framing, baud rate generation, and parallel-to-serial conversion. RS-232 and RS-485 are electrical standards that define how UART's logical signals are converted to physical voltages on a cable.
RS-232 uses single-ended signaling with voltage swings of +/- 3V to +/- 15V (inverted logic: negative voltage = logic 1). It supports point-to-point communication over cables up to about 15 meters. A level shifter IC (MAX232, MAX3232) converts between UART TTL levels (0/3.3V) and RS-232 voltages.
RS-485 uses differential signaling on a twisted pair (lines A and B). Because the receiver looks at the voltage difference between A and B rather than the absolute voltage, common-mode noise is rejected. This enables communication over distances up to 1200 meters at lower baud rates, or 100+ meters at 1 Mbps. RS-485 also supports multi-drop configurations with up to 32 (or 256 with enhanced transceivers) devices on a single bus — unlike UART/RS-232, which are strictly point-to-point.
Choose RS-485 for industrial environments (factory floors, building automation, sensor networks) where: (1) cable runs exceed a few meters, (2) electrical noise from motors or power supplies is present, (3) multiple devices must share a single bus, or (4) galvanic isolation is needed (RS-485 transceivers with integrated isolation are common). The MCU's UART peripheral is used unchanged — only the physical transceiver IC differs. Firmware must manage the transceiver's DE (Driver Enable) pin to switch between transmit and receive on the half-duplex bus, typically toggling it in the TX-complete ISR.
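The DE-pin sequencing can be sketched as a thin wrapper around the UART driver. This is a host-testable illustration with stubbed hardware functions (`de_set` and `uart_tx_blocking` are hypothetical stand-ins, not a real HAL); production firmware typically releases DE in the transmission-complete (TC) interrupt rather than inline, so the last stop bit is fully shifted out before the driver lets go of the bus:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stubbed hardware layer -- real firmware would drive a GPIO and the
 * UART TX register here. */
static bool de_pin = false;     /* Driver Enable line state */
static int  de_asserts = 0;     /* count for verification */
static void de_set(bool on) { de_pin = on; if (on) de_asserts++; }
static void uart_tx_blocking(const uint8_t *d, uint32_t n) { (void)d; (void)n; }

/* Half-duplex RS-485 send: claim the bus, shift the bytes out, then
 * release the bus so the node can hear the reply. */
static void rs485_send(const uint8_t *data, uint32_t len) {
    de_set(true);
    uart_tx_blocking(data, len);
    de_set(false);
}
```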
UART vs USART
Q: What is the difference between UART and USART, and when would you use synchronous mode?
UART (Universal Asynchronous Receiver-Transmitter) supports only asynchronous communication — no clock signal is shared between devices, so both sides must independently agree on the baud rate. USART (Universal Synchronous/Asynchronous Receiver-Transmitter) supports both asynchronous mode (identical to UART) and a synchronous mode where the master outputs a clock signal on a dedicated pin.
In synchronous mode, the USART behaves somewhat like a simplified SPI: the master drives a clock, and data is sampled on the clock edge. This eliminates baud rate mismatch errors entirely, since the receiver locks to the transmitted clock. It also enables higher data rates because there is no oversampling overhead — each clock edge transfers one bit.
Synchronous mode is rarely used in practice because SPI is more widely supported and better standardized for the same use case. However, it is useful when: (1) you need a simple clocked serial link between two MCUs and have run out of SPI peripherals, (2) you want the framing features of UART (start/stop bits, parity) combined with clock synchronization, or (3) a specific peripheral (some smart card interfaces use synchronous UART) requires it. Most modern STM32 MCUs label their serial peripherals as USART, but the vast majority of applications use them in asynchronous (UART) mode.