SPI Protocol — Interview Questions & Answers

Basics and Architecture

Q: What is SPI and what are its main advantages over I2C and UART?

SPI (Serial Peripheral Interface) is a synchronous, full-duplex serial protocol with a master-slave architecture using four signals: SCK (clock), MOSI (Master Out Slave In), MISO (Master In Slave Out), and CS/SS (Chip Select). The master generates the clock and selects which slave to communicate with by asserting its chip select line.

SPI's primary advantages stem from its simplicity and speed. (1) High throughput: SPI clock rates commonly reach 10-50 MHz, with some devices supporting 100+ MHz. There is no addressing overhead, no ACK/NACK handshaking, and no start/stop bit framing, so every clock cycle transfers one bit in each direction. (2) Full duplex: MOSI and MISO operate simultaneously, so the master can send a command while receiving the response to the previous one. This pipelining can as much as double effective throughput for protocols that exploit it. (3) Simple hardware: at its core, SPI is just two shift registers connected in a ring. A slave can be implemented with a single shift register IC and a few gates, and a master is trivial to bit-bang in software on ordinary GPIO pins.
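
The "two shift registers connected in a ring" picture can be sketched in a few lines of Python. This is only a behavioral model, assuming 8-bit words sent MSB-first; the function name `spi_exchange` is made up for illustration.

```python
from typing import Tuple


def spi_exchange(master_byte: int, slave_byte: int) -> Tuple[int, int]:
    """Model one 8-bit, MSB-first SPI transfer as two shift registers in a ring.

    Returns (master_register, slave_register) after 8 clock cycles.
    """
    master = master_byte & 0xFF
    slave = slave_byte & 0xFF
    for _ in range(8):
        mosi = (master >> 7) & 1                 # master shifts its MSB out on MOSI
        miso = (slave >> 7) & 1                  # slave shifts its MSB out on MISO
        master = ((master << 1) & 0xFF) | miso   # master shifts MISO in at the LSB
        slave = ((slave << 1) & 0xFF) | mosi     # slave shifts MOSI in at the LSB
    return master, slave
```

After eight clocks the two registers have simply swapped contents, e.g. `spi_exchange(0xA5, 0x3C)` returns `(0x3C, 0xA5)`, which is why every SPI write is also a read.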

The cost of this simplicity is that SPI has no built-in error detection (no ACK, CRC, or parity), no standard for multi-master operation, and requires one dedicated CS pin per slave — which becomes a real constraint when connecting many peripherals.

Q: Explain CPOL and CPHA. What are the four SPI modes, and how do you choose the right one?

CPOL (Clock Polarity) and CPHA (Clock Phase) define when data is driven onto the bus and when it is sampled. Getting them wrong is the number-one cause of "SPI isn't working" bugs.

CPOL sets the clock's idle state: CPOL=0 means SCK idles low; CPOL=1 means SCK idles high. CPHA determines which clock edge is used for sampling data: CPHA=0 means data is sampled on the first (leading) edge and shifted out on the second (trailing) edge; CPHA=1 means data is sampled on the second (trailing) edge and shifted out on the first (leading) edge.

The four combinations yield four SPI modes:

Mode	CPOL	CPHA	SCK Idle	Data Sampled On	Data Shifted On
0	0	0	Low	Rising edge	Falling edge
1	0	1	Low	Falling edge	Rising edge
2	1	0	High	Falling edge	Rising edge
3	1	1	High	Rising edge	Falling edge

You choose the mode by consulting the slave device's datasheet — it specifies which mode(s) the device supports. Most sensors and flash chips use Mode 0 or Mode 3 (sample on rising edge). The master must be configured to match. If you have multiple slaves on the same bus that require different modes, you must reconfigure the master's SPI peripheral each time you switch between them (before asserting the next CS).
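
Rather than memorizing the table, the edges can be derived from the two bits. The sketch below assumes the usual numbering (mode = 2 x CPOL + CPHA); `spi_mode_edges` is an illustrative helper, not a vendor API.

```python
def spi_mode_edges(mode: int) -> dict:
    """Derive idle level, sample edge and shift edge for SPI modes 0-3."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("SPI mode must be 0, 1, 2 or 3")
    cpol = (mode >> 1) & 1
    cpha = mode & 1
    # The leading edge moves SCK away from its idle level.
    leading = "falling" if cpol else "rising"
    trailing = "rising" if cpol else "falling"
    return {
        "cpol": cpol,
        "cpha": cpha,
        "idle": "high" if cpol else "low",
        "sample": trailing if cpha else leading,  # CPHA=0 samples on the leading edge
        "shift": leading if cpha else trailing,   # data changes on the other edge
    }
```

For example, `spi_mode_edges(3)["sample"]` is `"rising"`, the same sample edge as Mode 0, which is why many devices accept either mode.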

A common interview trap: candidates memorize the table but cannot explain why mode matters. The answer is setup and hold time: data must be stable on the bus for a minimum time before and after the sample edge. That is why each mode drives data on one edge and samples it on the opposite edge, half a clock period later. If the master is configured for the wrong mode, it changes MOSI on the very edge the slave samples, so the slave sees data that is still transitioning and communication fails.

Chip Select and Multi-Slave Topologies

Q: How does chip select (CS) work, and what are the timing requirements around it?

Chip select is a dedicated active-low signal from the master to each slave. When CS is driven low, the slave is selected: it enables its MISO output driver and begins responding to clock transitions. When CS is high, the slave ignores SCK and MOSI, and its MISO output enters high-impedance (tri-state) to avoid bus contention.

Timing around CS is critical and often overlooked. Most SPI slave devices specify two timing parameters in their datasheets: (1) CS setup time — the minimum time CS must be asserted (low) before the first SCK edge, typically 5-50 ns. If you violate this, the slave is not ready and misses the first bit. (2) CS hold time — the minimum time CS must remain asserted after the last SCK edge, before being de-asserted. Violating this can cause the slave to discard the last byte.

Additionally, many devices require a minimum CS high time between transactions: the time CS must remain de-asserted before it can be asserted again. Flash memories, for example, often need 50-100 ns of CS high time to latch the previous command. On a fast MCU, firmware that de-asserts and immediately re-asserts CS may violate this timing, because back-to-back GPIO writes can complete within a few core clock cycles. Inserting a few NOPs or using a GPIO toggle with a brief delay between transactions solves this.

When using hardware-managed CS (where the SPI peripheral controls the CS pin automatically), verify that the peripheral's behavior matches the slave's requirements. Some MCU SPI peripherals de-assert CS between every byte by default, which breaks multi-byte transactions for slaves that expect CS to remain low for the entire command sequence. In such cases, use software-controlled CS (a regular GPIO pin) for more precise control.
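
How many NOPs is "a few"? A rough estimate follows from the CPU clock. This sketch assumes one NOP costs one core cycle, which is not guaranteed on pipelined cores, so treat the result as a lower bound to confirm on a scope. The function name is hypothetical.

```python
def cycles_for_delay(delay_ns: int, cpu_hz: int) -> int:
    """Smallest whole number of CPU cycles lasting at least delay_ns.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    if delay_ns < 0 or cpu_hz <= 0:
        raise ValueError("delay_ns must be >= 0 and cpu_hz must be > 0")
    return (delay_ns * cpu_hz + 999_999_999) // 1_000_000_000
```

For a 100 ns CS high time on a 72 MHz core this gives 8 cycles; on a 16 MHz core, 50 ns already fits within a single cycle.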

Q: What happens if two SPI slaves are selected simultaneously, and how do you prevent it?

If two CS lines are asserted at the same time, both slaves enable their MISO output drivers and drive the shared MISO line simultaneously. If one drives high while the other drives low, a short-circuit current flows through both output stages. This corrupts the data the master reads, and in the worst case, the sustained contention current can damage the output transistors of one or both slave devices.

Prevention is straightforward in firmware: always de-assert the current slave's CS before asserting the next slave's CS. In safety-critical systems, add a small delay between de-assertion and re-assertion to account for CS propagation delay through any buffers or level shifters.

A subtler problem occurs with hardware-managed CS on some MCUs. If the SPI peripheral is configured to manage CS automatically, switching the CS GPIO to a different slave while a transaction is in progress can briefly assert both. The safe approach is to complete the current transaction, wait for the SPI busy flag to clear, de-assert CS in software, then configure the new slave's CS pin. Another defensive technique is to use external logic (a decoder or mux) so that only one CS can be active at a time by design, regardless of firmware bugs.
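
The "only one CS active at a time" rule can also be enforced in a small firmware-side guard. The class below is a host-side model of that idea, with made-up names; on a real target the two marked lines would write the GPIO.

```python
class ChipSelectBank:
    """Tracks active-low CS lines and refuses to assert two at once."""

    def __init__(self, names):
        # 1 = de-asserted (high), 0 = asserted (low)
        self._levels = {name: 1 for name in names}

    def active(self):
        """Names of all slaves whose CS is currently asserted."""
        return [n for n, level in self._levels.items() if level == 0]

    def select(self, name):
        if name not in self._levels:
            raise KeyError(name)
        others = [n for n in self.active() if n != name]
        if others:
            raise RuntimeError(f"{others[0]} is still selected; deselect it first")
        self._levels[name] = 0   # real firmware: drive the CS GPIO low here

    def deselect(self, name):
        if name not in self._levels:
            raise KeyError(name)
        self._levels[name] = 1   # real firmware: drive the CS GPIO high here
```

Selecting `"adc"` while `"flash"` is still asserted raises `RuntimeError` instead of silently creating MISO contention, so the bug shows up in testing rather than as intermittent corruption.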

Q: How does SPI daisy-chaining work, and what are its trade-offs?

In daisy-chain topology, all slaves share a single CS line. The MISO output of the first slave connects to the MOSI input of the second slave, and so on, forming a serial shift-register chain. The master's MOSI connects to the first slave's input, and the last slave's MISO connects back to the master's MISO input.

To communicate, the master must clock out N x word_size bits, where N is the number of slaves in the chain. Data shifts through each slave sequentially: the first word clocked out reaches the last slave, and the last word stays in the first slave. All slaves latch their data simultaneously on the CS rising edge.
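
The word ordering is easy to get backwards, so here is a small simulation of the chain as cascaded shift registers. It assumes MSB-first words and identical slaves; `daisy_chain_write` is an illustrative name.

```python
from typing import List


def daisy_chain_write(words: List[int], n_slaves: int, word_bits: int = 8) -> List[int]:
    """Clock words (in transmit order) through a daisy chain.

    Returns the value each slave holds when CS rises; index 0 is the
    slave wired directly to the master's MOSI.
    """
    mask = (1 << word_bits) - 1
    regs = [0] * n_slaves
    for word in words:
        for bit in range(word_bits - 1, -1, -1):   # MSB first
            carry = (word >> bit) & 1              # bit on the master's MOSI
            for i in range(n_slaves):
                out = (regs[i] >> (word_bits - 1)) & 1   # MSB leaves toward the next slave
                regs[i] = ((regs[i] << 1) & mask) | carry
                carry = out
    return regs
```

Sending `[0xAA, 0xBB, 0xCC]` into three slaves leaves `[0xCC, 0xBB, 0xAA]`: the first word transmitted ends up in the slave farthest from the master, so the transmit buffer must be built in reverse chain order.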

The primary advantage is pin savings: only one CS line is needed regardless of the number of slaves. This is particularly valuable for cascaded LED drivers, shift-register cascades (such as 74HC595 chains), and DAC chains where many identical devices are connected. (Addressable LED strings are a different case: WS2812 uses a single-wire timing protocol rather than SPI, and APA102 is clocked like SPI but has no CS line.) The disadvantages are significant: (1) increased latency, since you cannot address a single slave without clocking through the entire chain; (2) increased transaction length, since every communication involves all slaves, even if only one needs updating; (3) single point of failure, since if any slave in the chain fails, all downstream devices become unreachable; (4) no readback from specific devices, since the data that comes back on MISO is a concatenation of all slaves' outputs, shifted by the chain position.

Daisy-chaining is appropriate when all devices are identical and updated together (LED arrays, shift register chains) but poor for mixed peripherals where you need to address individual devices independently.

SPI vs I2C

Q: When would you choose SPI over I2C, and when would I2C be the better choice?

Choose SPI when: (1) High data throughput is the priority — SPI clocks at 10-50 MHz vs I2C's typical 400 kHz (fast mode) or 1 MHz (fast mode plus). For applications like reading a high-speed ADC, streaming audio data, accessing SPI NOR flash, or driving a TFT display, I2C is simply too slow. (2) Full-duplex communication is needed — I2C is inherently half-duplex on a single data line. (3) Low latency matters — SPI has no addressing phase, no ACK overhead, and no arbitration delay. The first clock edge transfers data. (4) Only a small number of peripherals are present — two or three SPI slaves are manageable; a dozen would consume too many CS pins.

Choose I2C when: (1) Pin count is constrained — I2C uses exactly two wires (SDA, SCL) regardless of how many devices are connected, vs SPI's 3 + N wires (MOSI, MISO, SCK, plus one CS per slave). (2) Many low-speed peripherals share the bus — temperature sensors, EEPROMs, real-time clocks, port expanders, and fuel gauges are all I2C devices that communicate infrequently and at low data rates. (3) Multi-master support is required — I2C has built-in arbitration; SPI does not. (4) Hot-plugging or dynamic device discovery is needed — I2C devices can be enumerated by scanning addresses.

A practical rule of thumb: use SPI for anything that needs bandwidth (flash, displays, high-speed sensors) and I2C for everything that needs convenience (configuration, status monitoring, low-speed peripherals). Many embedded designs use both on the same board.
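
To put numbers on the bandwidth gap, the sketch below estimates raw transfer times. It is a simplified model: the I2C figure counts 9 clocks per byte (8 data bits plus ACK) and one address byte, and ignores start/stop conditions and clock stretching; the SPI figure ignores command bytes and CS timing.

```python
def i2c_transfer_time_s(n_bytes: int, scl_hz: float) -> float:
    """Approximate I2C time: one address byte plus payload, 9 clocks per byte."""
    return (n_bytes + 1) * 9 / scl_hz


def spi_transfer_time_s(n_bytes: int, sck_hz: float) -> float:
    """Approximate SPI time: 8 clocks per byte, no protocol overhead."""
    return n_bytes * 8 / sck_hz
```

Under these assumptions a 4096-byte read takes about 92 ms over 400 kHz I2C and about 1.6 ms over 20 MHz SPI, a gap of more than 50x.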

Data Integrity and Error Handling

Q: SPI has no built-in error detection. How do you ensure data integrity?

SPI's shift-register design provides no ACK/NACK, no CRC, and no parity — a corrupted byte is silently accepted. This means error detection must be implemented at a higher layer, and the strategy depends on the device and application.

Read-back verification: After writing a configuration register, read it back and compare. This catches both SPI transmission errors and device-side failures. Many SPI devices (accelerometers, flash chips) support this pattern. The overhead is one extra transaction per write.

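Application-layer CRC or checksum: For bulk data transfers (reading sensor FIFOs, flash pages), compute a CRC over the received data and compare it against a CRC provided by the device. Some devices include one in their responses: SD cards in SPI mode follow each data block with a 16-bit CRC, and a number of ADCs and sensors offer an optional CRC byte. Plain SPI NOR flash read commands (such as those of the Winbond W25Q series) generally return raw data with no CRC, so for such devices the application must store and check its own.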

Write-then-read pattern with known values: Before a critical transaction, send a "who am I" or "read device ID" command. If the response matches the expected value, the SPI link is functional. This is a common health-check pattern in initialization code.

Hardware considerations: At high SPI clock speeds (above 10-20 MHz), signal integrity becomes the dominant error source. Keep traces short (under 10 cm), add series termination resistors (22-33 ohm) on SCK and MOSI to reduce ringing, use ground planes, and avoid routing SPI traces near switching power supply inductors. Decoupling capacitors (100 nF) close to each slave's VDD pin are essential. If you see intermittent data corruption at high speeds, the first debugging step is to reduce the SPI clock frequency — if errors disappear, the problem is signal integrity, not firmware.
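
As an illustration of the application-layer CRC approach described above, here is a minimal bitwise CRC-16 (polynomial 0x1021). The frame layout (payload followed by a big-endian CRC) and the initial value 0xFFFF are assumptions for this example; a real device's datasheet defines its own polynomial, seed, and byte order.

```python
def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, MSB-first, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def frame_is_valid(frame: bytes) -> bool:
    """Check a frame laid out as payload + 2-byte big-endian CRC (assumed layout)."""
    if len(frame) < 3:
        return False
    payload = frame[:-2]
    received = int.from_bytes(frame[-2:], "big")
    return crc16_ccitt(payload) == received
```

With the standard test string `b"123456789"` this produces 0x29B1 (0x31C3 with `init=0`), which makes it easy to cross-check against a hardware CRC unit or another implementation before trusting it on real data.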

DMA with SPI

Q: How is DMA used with SPI, and what are the common pitfalls?

SPI is one of the most natural peripherals to pair with DMA because its transactions are predictable, fixed-length shift operations. The DMA controller feeds bytes from a memory buffer into the SPI transmit data register and simultaneously moves received bytes from the SPI receive register into another memory buffer — both without CPU involvement. This frees the CPU during large transfers (flash reads, display updates, ADC FIFO drains) and enables true concurrent operation.

A typical STM32 setup configures two DMA channels (or streams): one for TX (memory-to-peripheral) and one for RX (peripheral-to-memory). You set the buffer addresses, the transfer count, enable the DMA channels, and the SPI peripheral generates DMA requests automatically. When the transfer completes, the DMA triggers an interrupt so firmware can process the result, de-assert CS, and start the next transaction.

Common pitfalls include:

Forgetting the dummy TX buffer for read-only transfers. SPI is full-duplex — to clock data in, you must clock data out. For a read-only DMA transfer, the TX DMA channel must still be active, sending dummy bytes (typically 0xFF or 0x00). Some developers forget this and wonder why no clock is generated.

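Cache coherency on Cortex-M7. The M7's data cache means DMA may write received data to SRAM while the CPU reads a stale cached copy; in the other direction, the DMA may transmit stale SRAM contents while the fresh TX data still sits in the cache. Either place DMA buffers in a non-cacheable memory region, or clean the D-cache over the TX buffer before starting the transfer and invalidate it over the RX buffer before reading the result. Buffers should also be aligned to the 32-byte cache line so that cache maintenance does not touch neighboring data.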

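CS management. DMA handles the data transfer but does not manage chip select. Firmware must assert CS before starting the DMA and de-assert it only once the transfer has truly finished. If CS is de-asserted too early (e.g., immediately after starting the DMA, or in the TX DMA transfer-complete ISR while the last byte is still shifting out), the slave aborts the transaction. De-asserting from the RX DMA transfer-complete ISR, or after the SPI busy flag clears, avoids this.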
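
The dummy-byte pitfall is easiest to see in how the TX buffer is built. The sketch below assumes a flash-style read frame (one command byte, a 24-bit big-endian address, then data), with 0x03 used as the read opcode as on many SPI NOR parts; the helper names are hypothetical.

```python
def build_read_frame(command: int, address: int, n_read: int, dummy: int = 0xFF) -> bytes:
    """TX buffer for a full-duplex read: header plus one dummy byte per byte to read."""
    header = bytes([command]) + address.to_bytes(3, "big")
    return header + bytes([dummy]) * n_read


def extract_payload(rx: bytes, header_len: int = 4) -> bytes:
    """Drop the bytes clocked in while the header was going out."""
    return rx[header_len:]
```

Both DMA channels are then programmed with the same length, `len(frame)`: the TX channel sends the header and dummies, and the RX channel captures an equally long buffer whose first four bytes are discarded.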

Signal Integrity at High Speeds

Q: What signal integrity issues arise when running SPI at high clock speeds, and how do you address them?

At SPI clock frequencies above 10-20 MHz, the electrical behavior of PCB traces becomes significant. Several problems emerge:

Ringing and overshoot: Fast edge rates (rise times of a few nanoseconds or less on modern MCUs) make traces behave as transmission lines, so impedance mismatches at trace-to-pad transitions, vias, and connectors cause reflections. The resulting ringing can create false clock edges that cause the slave to shift extra bits, corrupting the entire frame. The fix is series termination: a 22-33 ohm resistor placed close to the master's output pins on SCK and MOSI. This resistor, added to the driver's output impedance, approximately matches the trace impedance and dampens reflections.

Crosstalk: SPI signals routed in parallel on adjacent PCB layers or tracks can capacitively couple. A fast edge on SCK can induce a glitch on MISO that gets sampled as a wrong bit. Mitigation: route SPI traces with adequate spacing (at least 2x the trace width), use ground planes between signal layers, and avoid running SPI traces parallel to other high-speed signals for long distances.

Propagation delay and skew: At high speeds, the propagation delay through traces, level shifters, or buffers becomes a significant fraction of the clock period. If MISO data arrives at the master too late relative to the sampling clock edge, setup time is violated. This is why SPI flash datasheets specify a maximum clock frequency that decreases as capacitive load increases. For very high-speed SPI, some masters support adjustable sample-point delay to compensate for round-trip propagation.

Ground bounce: Simultaneous switching of multiple SPI lines can cause transient shifts in the local ground reference, leading to false logic levels. Adequate decoupling (100 nF + 10 uF) close to both master and slave power pins mitigates this.

The practical debugging approach: if SPI works at 1 MHz but fails at 20 MHz, the problem is almost certainly signal integrity. Probe the signals with an oscilloscope (use a short ground lead, not the alligator clip) and look for ringing, overshoot, and insufficient voltage margins.
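
For the series termination mentioned above, a first-guess resistor value is the trace impedance minus the driver's output impedance. The numbers used here (a 50 ohm trace, a 25 ohm driver) are assumptions for illustration; the real driver impedance comes from the MCU's IBIS model or from measurement, and the final value is usually tuned on the bench.

```python
E12_VALUES = (10, 12, 15, 18, 22, 27, 33, 39, 47, 56, 68, 82)


def series_termination_ohms(trace_z0: float, driver_ohms: float) -> float:
    """First-guess source termination: R_series ~ Z0 - R_driver, never negative."""
    return max(0.0, trace_z0 - driver_ohms)


def nearest_e12(value: float) -> int:
    """Closest standard E12 resistor value in the 10-82 ohm decade."""
    return min(E12_VALUES, key=lambda r: abs(r - value))
```

With these assumed values the ideal resistor is 25 ohm and the nearest standard part is 27 ohm, which falls inside the commonly quoted 22-33 ohm range.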

SPI Disadvantages and Limitations

Q: What are the key disadvantages and limitations of SPI?

SPI's simplicity comes with real limitations that can make it the wrong choice for certain designs:

Pin count scales with slave count. Each additional slave requires a dedicated CS line from the master. With 8 slaves, you need 11 GPIO pins (SCK, MOSI, MISO, plus 8 CS lines). If GPIO is scarce, this becomes untenable. Workarounds include using a decoder IC (3-to-8 decoder turns 3 GPIOs into 8 CS lines) or daisy-chaining, but both add complexity.

No built-in error detection. Unlike I2C (ACK/NACK) and CAN (five error-detection mechanisms, including a CRC), SPI has zero error-detection capability at the protocol level. Every byte is accepted as valid. Application-layer checksums or read-back verification must be added manually.

Single master only. The SPI specification does not define multi-master arbitration. If two masters drive SCK simultaneously, the bus is corrupted. In systems that genuinely need multiple bus masters, I2C or CAN is a better choice.

No formal standard. Unlike I2C (NXP specification) and CAN (ISO 11898), SPI has no official standard document. Different vendors implement variations: some devices are MSB-first, others LSB-first; word sizes range from 8 to 32 bits; some devices use CPOL/CPHA Mode 0, others Mode 3. Each device's datasheet is the only authoritative reference, and you must verify compatibility for every new part.

Short distance only. SPI uses single-ended signaling referenced to a shared ground. Over cables longer than 10-20 cm, signal integrity degrades rapidly due to ground bounce, crosstalk, and capacitive loading. For board-to-board communication over cables, consider LVDS-SPI or switch to a differential protocol like CAN or RS-485.
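
The pin-count scaling in the first limitation can be made concrete. The sketch below counts master pins for direct CS lines versus a binary decoder. The `enable_pin` option reflects a practical assumption: a plain decoder always has one output active, so idling every CS high needs either an enable input driven by one more GPIO or a sacrificed output.

```python
def spi_pins_direct(n_slaves: int) -> int:
    """SCK + MOSI + MISO plus one CS per slave."""
    return 3 + n_slaves


def spi_pins_decoded(n_slaves: int, enable_pin: bool = False) -> int:
    """SCK + MOSI + MISO plus decoder select lines (and an optional enable GPIO)."""
    if n_slaves < 1:
        raise ValueError("need at least one slave")
    select_lines = max(1, (n_slaves - 1).bit_length())  # ceil(log2(n)), minimum 1
    return 3 + select_lines + (1 if enable_pin else 0)
```

For 8 slaves this gives 11 pins with direct CS lines, 6 with a 3-to-8 decoder, and 7 if the decoder's enable is also driven from a GPIO.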

Q: How do you determine the maximum SPI clock speed for a given system?

The maximum achievable SPI clock speed is the minimum of several independent constraints, and finding it requires checking each one:

Slave device limit. The datasheet specifies the maximum SCK frequency the slave supports — often as a function of supply voltage (e.g., 20 MHz at 3.3V, 10 MHz at 1.8V). This is the absolute ceiling.

Master peripheral prescaler. The MCU's SPI peripheral typically divides its bus clock (the APB clock on STM32, for example) by a power-of-two prescaler (2, 4, 8, 16, ...). If the APB clock is 72 MHz, the available SPI clock rates are 36, 18, 9, 4.5 MHz, etc. You must choose the smallest prescaler (the fastest clock) that keeps the SPI clock at or below the slave's maximum; for a 20 MHz slave that is divide-by-4, giving 18 MHz.

PCB trace length and capacitance. Longer traces and higher capacitance slow edge rates and increase propagation delay. A rule of thumb: if the round-trip delay (master to slave and back) exceeds 25% of the clock period, setup/hold time violations are likely. The trace itself contributes little: on FR4 the delay is roughly 0.06-0.07 ns/cm, so a 10 cm trace adds only about 1.3 ns round-trip. The slave's clock-to-output delay and any buffers in the path usually dominate. A total round-trip of 10 ns is fine at 10 MHz (100 ns period) but is half the period at 50 MHz (20 ns period), well past the guideline.

Level shifters in the signal path. Bidirectional level shifters (like TXB0104) add 2-10 ns of propagation delay per direction. At high SPI speeds, this delay can violate timing. Dedicated unidirectional level shifters (74LVC series) are faster but require separate components for each direction.

Setup and hold time margins. Even if the slave's maximum frequency is 20 MHz, the actual achievable speed depends on whether the master's output-to-valid delay plus trace propagation delay leaves enough margin for the slave's setup time. Conservative designs run at 50-75% of the theoretical maximum to provide margin.
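
Several of these constraints can be combined into one quick check. The sketch below walks the prescaler options from fastest to slowest and returns the first that respects both the slave's frequency limit and the 25% round-trip rule of thumb. The prescaler set (2 to 256) and the function name are assumptions; check the actual options in the MCU's reference manual.

```python
def fastest_spi_clock(apb_hz, slave_max_hz, round_trip_ns=0.0,
                      budget_fraction=0.25,
                      prescalers=(2, 4, 8, 16, 32, 64, 128, 256)):
    """Return (prescaler, sck_hz) for the fastest usable setting, or None.

    A setting is usable when SCK <= slave_max_hz and the round-trip delay
    fits within budget_fraction of the clock period.
    """
    for div in sorted(prescalers):          # smallest divider = fastest clock first
        sck_hz = apb_hz / div
        period_ns = 1e9 / sck_hz
        if sck_hz <= slave_max_hz and round_trip_ns <= budget_fraction * period_ns:
            return div, sck_hz
    return None
```

With a 72 MHz APB clock and a 20 MHz slave, the result is divide-by-4 (18 MHz). Adding a 20 ns round-trip delay pushes the answer down to divide-by-8 (9 MHz), because 20 ns exceeds a quarter of the 55.6 ns period at 18 MHz.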