Concept Q&A
8 questions

USB Protocol — Interview Questions & Answers

Practice interview questions on USB enumeration, descriptors, transfer types, device classes, and embedded USB implementation.

Study the fundamentals first

Read the USB topic page for in-depth concepts before practicing Q&A

USB Fundamentals

Q: Explain the USB enumeration process step by step. What happens from the moment you plug in a device?

USB enumeration is the process by which the host discovers, identifies, and configures a newly connected device. It is entirely host-driven — the device only responds to the host's requests. The process follows a strict sequence:

1. Connection detection: When a device is plugged in, it pulls one of the data lines high through a 1.5K pull-up resistor — D+ for full-speed and high-speed devices, D- for low-speed devices. The hub (or root hub in the host controller) detects this voltage change and reports a port status change to the host.
2. Reset: The host issues a USB reset by driving both D+ and D- low (SE0 state) for at least 10 ms. This puts the device in the Default state with address 0.
3. Get Device Descriptor (first 8 bytes): The host sends a GET_DESCRIPTOR request to address 0, endpoint 0, requesting only the first 8 bytes of the device descriptor. This reveals the maximum packet size for endpoint 0 (bMaxPacketSize0), which the host needs before it can send full-length control transfers.
4. Second reset: Many hosts (Windows in particular) issue another reset here to return the device to a known clean state before addressing it.
5. Set Address: The host assigns a unique address (1-127) using SET_ADDRESS. The device transitions to the Addressed state.
6. Get Full Device Descriptor: The host reads the complete 18-byte device descriptor at the new address, obtaining the VID, PID, device class, and number of configurations.
7. Get Configuration Descriptors: The host requests each configuration descriptor, which returns a hierarchy of configuration, interface, and endpoint descriptors.
8. Set Configuration: The host selects a configuration using SET_CONFIGURATION. The device transitions to the Configured state and is ready for data transfer.

The entire enumeration process takes anywhere from about 100 ms to several seconds depending on the OS. A common interview follow-up: "What happens if enumeration fails?" The host retries the reset and descriptor requests up to three times. If all attempts fail, the OS reports "Unknown USB device" or "Device descriptor request failed" — symptoms most commonly caused by incorrect descriptor data, a missing or broken pull-up resistor, or signal integrity problems on the D+/D- lines.
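The addressing and configuration steps above can be sketched as a minimal chapter-9 state machine. This is an illustrative sketch, not code from any particular USB stack; the type and function names are made up for the example.

```c
#include <stdint.h>

/* USB chapter-9 device states relevant to enumeration (illustrative subset) */
typedef enum { DEV_DEFAULT, DEV_ADDRESSED, DEV_CONFIGURED } usb_dev_state_t;

/* Standard request codes from the USB 2.0 specification */
#define REQ_SET_ADDRESS        5
#define REQ_SET_CONFIGURATION  9

typedef struct {
    usb_dev_state_t state;
    uint8_t address;        /* 0 until SET_ADDRESS completes */
    uint8_t configuration;  /* 0 = unconfigured */
} usb_device_t;

/* Apply a host request to the device state machine; returns 0 on success.
   (Real hardware applies the new address only after the status stage of
   the SET_ADDRESS transfer; that timing detail is simplified here.) */
int usb_handle_request(usb_device_t *dev, uint8_t bRequest, uint16_t wValue)
{
    switch (bRequest) {
    case REQ_SET_ADDRESS:
        if (wValue == 0 || wValue > 127) return -1;  /* valid addresses: 1-127 */
        dev->address = (uint8_t)wValue;
        dev->state = DEV_ADDRESSED;
        return 0;
    case REQ_SET_CONFIGURATION:
        if (dev->state == DEV_DEFAULT) return -1;    /* must be addressed first */
        dev->configuration = (uint8_t)wValue;
        dev->state = (wValue != 0) ? DEV_CONFIGURED : DEV_ADDRESSED;
        return 0;
    default:
        return -1;
    }
}
```

The ordering constraint is the point: SET_CONFIGURATION is rejected in the Default state, mirroring why the host must complete addressing before configuration.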

Q: What are USB descriptors and what information do they contain? Describe the descriptor hierarchy.

USB descriptors are structured data blocks that a device provides to the host during enumeration. They form a tree-like hierarchy that describes every aspect of the device's capabilities and requirements. The host uses this information to load the correct driver and allocate bus bandwidth.

Device Descriptor (18 bytes, one per device): Contains the USB specification version (bcdUSB), device class/subclass/protocol codes, VID (Vendor ID) and PID (Product ID), maximum packet size for endpoint 0, number of configurations, and manufacturer/product/serial number string indices. The VID/PID pair is what the OS uses to match the device to a driver. The device class field can be 0x00 (class defined at interface level — most common for composite devices), or a specific class code (0x02 for CDC, 0xEF for miscellaneous composite).

Configuration Descriptor (9 bytes, one or more per device): Describes a specific configuration the device can operate in. Most devices have exactly one configuration. Contains the total length of all subordinate descriptors (wTotalLength — critical for the host to know how many bytes to read), number of interfaces, power requirements (bMaxPower in 2 mA units), and self-powered/remote-wakeup attributes. When the host sends GET_DESCRIPTOR for a configuration, the device returns the configuration descriptor followed by all its interface and endpoint descriptors concatenated into a single response of wTotalLength bytes.

Interface Descriptor (9 bytes): Describes a functional group of endpoints — one "function" of the device. A composite device (e.g., a keyboard with an integrated trackpad, or a CDC + MSC combo device) has multiple interfaces, each potentially using a different device class. Contains the interface number, alternate setting number, class/subclass/protocol, and number of endpoints.

Endpoint Descriptor (7 bytes): Describes a single endpoint — its address (number + direction), transfer type (control, bulk, interrupt, isochronous), maximum packet size, and polling interval (for interrupt and isochronous endpoints). Endpoint 0 is always a bidirectional control endpoint and has no explicit descriptor — its properties are defined in the device descriptor.

The hierarchy is: Device contains Configurations, each Configuration contains Interfaces, each Interface contains Endpoints. String descriptors provide human-readable names referenced by index from other descriptors. Class-specific descriptors (e.g., HID report descriptors, CDC functional descriptors) are inserted between interface and endpoint descriptors.
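The 18-byte device descriptor layout can be expressed as a packed C struct. Field names follow the USB 2.0 specification; the VID/PID values and string indices in the example instance are placeholders, not real assignments.

```c
#include <stdint.h>

/* 18-byte USB device descriptor, packed to match the wire layout.
   Multi-byte fields (bcdUSB, idVendor, ...) are little-endian on the wire. */
typedef struct __attribute__((packed)) {
    uint8_t  bLength;            /* 18 */
    uint8_t  bDescriptorType;    /* 1 = DEVICE */
    uint16_t bcdUSB;             /* e.g. 0x0200 for USB 2.0 */
    uint8_t  bDeviceClass;       /* 0x00 = class defined per interface */
    uint8_t  bDeviceSubClass;
    uint8_t  bDeviceProtocol;
    uint8_t  bMaxPacketSize0;    /* EP0 max packet size: 8, 16, 32, or 64 */
    uint16_t idVendor;           /* VID assigned by the USB-IF */
    uint16_t idProduct;          /* PID assigned by the vendor */
    uint16_t bcdDevice;          /* device release number */
    uint8_t  iManufacturer;      /* string descriptor indices (0 = none) */
    uint8_t  iProduct;
    uint8_t  iSerialNumber;
    uint8_t  bNumConfigurations;
} usb_device_descriptor_t;

/* Example: a hypothetical full-speed CDC device (VID/PID are placeholders) */
static const usb_device_descriptor_t example_dev_desc = {
    .bLength = 18, .bDescriptorType = 1, .bcdUSB = 0x0200,
    .bDeviceClass = 0x02,        /* CDC declared at device level */
    .bMaxPacketSize0 = 64,
    .idVendor = 0x1234, .idProduct = 0x5678,
    .bcdDevice = 0x0100,
    .iManufacturer = 1, .iProduct = 2, .iSerialNumber = 3,
    .bNumConfigurations = 1,
};
```

The packed attribute matters: without it, compiler padding after bDescriptorType would break the wire layout and the host would reject the descriptor.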

Q: What is the difference between control, bulk, interrupt, and isochronous transfer types? When do you use each?

USB defines four transfer types, each optimized for a different data pattern. The choice is made at design time and declared in the endpoint descriptor — you cannot change the transfer type at runtime.

Control transfers are the only type that uses the setup-data-status three-phase transaction. They are used for device configuration (enumeration requests like GET_DESCRIPTOR, SET_ADDRESS) and class-specific commands (e.g., CDC SET_LINE_CODING). Every device must support control transfers on endpoint 0. Control transfers are reliable (hardware retries on error) and guaranteed a portion of bus bandwidth, but they are relatively slow and have high protocol overhead per transaction. Maximum packet size for endpoint 0: 8 bytes (low-speed), 8, 16, 32, or 64 bytes (full-speed), exactly 64 bytes (high-speed).

Bulk transfers move large amounts of data with guaranteed delivery but no guaranteed timing. The host controller schedules bulk transfers in whatever bandwidth remains after control, interrupt, and isochronous transfers are served. Bulk transfers use error detection (CRC16) and automatic retry — data is never lost, but latency is unpredictable. Use bulk for mass storage (USB flash drives, MSD class), printers, and file transfers. Available only at full-speed (64 bytes max per packet) and high-speed (512 bytes max per packet) — not available at low-speed.

Interrupt transfers guarantee a maximum latency — the host polls the device at a regular interval specified in the endpoint descriptor (bInterval: 1-255 ms at full-speed; at high-speed bInterval is 1-16 and the polling period is 2^(bInterval-1) microframes of 125 microseconds). Despite the name, USB interrupt transfers are polled, not truly interrupt-driven — the host initiates every transaction. They are reliable (CRC + retry) and latency-bounded, making them ideal for HID devices (keyboards, mice, game controllers) and infrequent status updates. Maximum payload: 8 bytes (low-speed), 64 bytes (full-speed), 1024 bytes (high-speed).

Isochronous transfers guarantee bandwidth and timing but not delivery — there is no retry on error. A corrupted packet is simply dropped. This tradeoff is correct for real-time streaming (audio, video) where a retransmitted packet arriving late is worse than a dropped packet. Isochronous endpoints reserve a fixed fraction of bus bandwidth each frame (1 ms at full-speed, 125 microseconds at high-speed). Maximum payload: 1023 bytes (full-speed), 1024 bytes (high-speed, up to 3 per microframe). Not available at low-speed. The lack of retry means isochronous applications must tolerate occasional errors — audio codecs interpolate, video decoders conceal artifacts.
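On the wire, the transfer type is declared in the low two bits of the endpoint descriptor's bmAttributes field, and the direction in bit 7 of bEndpointAddress. A small decoding sketch (the helper names are illustrative):

```c
#include <stdint.h>

/* Transfer type encoding in bmAttributes bits 1:0 (USB 2.0 spec) */
typedef enum {
    XFER_CONTROL     = 0,
    XFER_ISOCHRONOUS = 1,
    XFER_BULK        = 2,
    XFER_INTERRUPT   = 3
} usb_xfer_type_t;

/* Extract the transfer type from an endpoint descriptor's bmAttributes */
static inline usb_xfer_type_t ep_transfer_type(uint8_t bmAttributes)
{
    return (usb_xfer_type_t)(bmAttributes & 0x03);
}

/* Direction is bit 7 of bEndpointAddress: 1 = IN (device-to-host) */
static inline int ep_is_in(uint8_t bEndpointAddress)
{
    return (bEndpointAddress & 0x80) != 0;
}
```

So a descriptor with bEndpointAddress 0x81 and bmAttributes 0x02 declares endpoint 1, IN direction, bulk type — the classic "TX" endpoint of a CDC data interface.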

Q: What are USB device classes? Name the common ones used in embedded systems.

USB device classes are standardized specifications published by the USB-IF (Implementers Forum) that define the protocol and behavior for categories of devices. When a device declares a recognized class code in its interface or device descriptor, the host OS can load a built-in class driver — no vendor-specific driver installation is required. This is why you can plug a USB keyboard or flash drive into any computer and it works immediately.

The most commonly used classes in embedded systems are:

CDC (Communications Device Class, class code 0x02) — used to implement virtual serial ports (CDC-ACM subclass). This is the go-to class for embedded developers who need a simple debug/command interface between an MCU and a PC. The host sees a COM port (Windows) or /dev/ttyACM device (Linux), and firmware sends/receives data through bulk endpoints. CDC requires two interfaces (a communication interface with an interrupt endpoint for notifications, and a data interface with bulk IN and bulk OUT endpoints).

HID (Human Interface Device, class code 0x03) — originally designed for keyboards and mice, HID is widely used in embedded for any low-bandwidth bidirectional communication that needs to work without driver installation. HID uses interrupt transfers and report descriptors that define the data format. Custom HID devices can send up to 64 bytes per report at full-speed, making them suitable for sensor data, configuration tools, and firmware update utilities.

MSC (Mass Storage Class, class code 0x08) — makes the device appear as a removable disk drive. Used in embedded systems for data logging (expose an SD card as a USB drive), firmware update (user copies a binary file to the "drive"), and configuration file exchange. MSC uses bulk transfers and carries the SCSI transparent command set over BOT (Bulk-Only Transport).

Other classes encountered in embedded: Audio (0x01) for USB microphones and speakers, Video (0x0E) for USB cameras, DFU (Device Firmware Upgrade) for in-field firmware updates via USB, and Vendor Specific (0xFF) when no standard class fits and you provide your own host driver. The key interview point: using a standard class eliminates the need for custom host drivers, dramatically simplifying deployment and cross-platform support.

Embedded USB

Q: USB device mode vs USB host mode — which does a typical MCU implement, and why?

The vast majority of MCUs implement USB device mode only. In device mode, the MCU acts as a peripheral that is discovered, enumerated, and controlled by a host (typically a PC, laptop, or smartphone). This is the natural role for an embedded system — when you plug your custom sensor board into a laptop, the MCU is the device and the laptop is the host. Device mode requires significantly simpler hardware and software: the MCU's USB peripheral contains a Serial Interface Engine (SIE) that handles the low-level protocol, a set of endpoint buffers, and transceiver logic. The firmware responds to host requests rather than initiating them.

USB host mode is far more complex because the host is responsible for all bus management: detecting device connections, issuing resets, running the enumeration state machine, scheduling transactions across all connected devices, managing bandwidth allocation, handling hub topology, and providing bus power (VBUS at 5V, up to 500 mA per port). This requires a host controller (OHCI, EHCI, or xHCI), a device driver stack, and substantially more RAM for descriptor parsing and transfer management. Few MCUs include host-capable USB peripherals, and those that do (STM32F4/F7/H7, NXP LPC, some TI Sitara) are higher-end parts with more memory and processing power.

The practical consequence: if your embedded device needs to connect to a PC for data transfer, debug output, or firmware updates, you need device mode — and every MCU with USB supports it. If your embedded device needs to read a USB flash drive, connect to a USB barcode scanner, or act as a USB host for other peripherals, you need host mode — and you must select an MCU that specifically supports it, bring a USB host stack (like TinyUSB in host mode or the STM32 USB Host Library), and provide VBUS power circuitry. Host mode is uncommon in deeply embedded systems and more typical of embedded Linux platforms where a full USB stack is available.

Q: How does USB CDC-ACM work as a virtual serial port?

CDC-ACM (Abstract Control Model) is a subclass of the USB Communications Device Class that emulates a serial port over USB. It allows an embedded device to appear as a standard COM port on the host PC, enabling existing serial terminal software (PuTTY, minicom, screen) and serial libraries (pyserial) to communicate with the device without any custom driver — the operating system's built-in CDC-ACM driver handles everything.

The CDC-ACM device presents two USB interfaces to the host. Interface 0 (Communication Class Interface) carries management commands and notifications. It includes a single interrupt IN endpoint used to send serial state notifications (e.g., DCD, DSR, ring indicator changes) to the host — many embedded implementations leave this endpoint unused and never send notifications. This interface also includes CDC-specific functional descriptors: the Header Functional Descriptor, Call Management Functional Descriptor, ACM Functional Descriptor (declares which line coding and control signal features are supported), and the Union Functional Descriptor (links the communication and data interfaces together). Interface 1 (Data Class Interface) carries the actual serial data. It has two bulk endpoints: a bulk IN endpoint (device-to-host, the device's "TX") and a bulk OUT endpoint (host-to-device, the device's "RX").

From the firmware perspective, the MCU's USB stack handles enumeration and class requests automatically. The two class-specific control requests that matter are: SET_LINE_CODING (the host tells the device the desired baud rate, stop bits, parity, and data bits — most embedded implementations accept any values and ignore them since the communication is USB, not actual UART) and SET_CONTROL_LINE_STATE (the host signals DTR and RTS — firmware often uses the DTR assertion as a "port opened" signal to start sending data). For actual data transfer, firmware writes bytes into the bulk IN endpoint buffer, and the USB peripheral transmits them when the host polls. Incoming data from the host arrives in the bulk OUT endpoint buffer, and firmware reads it. The throughput is determined by USB, not by the "baud rate" set via SET_LINE_CODING — at full-speed USB, CDC-ACM can achieve approximately 1 MB/s, regardless of whether the host configures 9600 or 115200 baud.
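The two class requests can be handled in a handful of lines. A sketch assuming the packed 7-byte line-coding structure defined in the CDC PSTN subclass specification; the globals and function names here are illustrative, not from any particular stack.

```c
#include <stdint.h>
#include <string.h>

/* 7-byte line coding structure carried by SET_LINE_CODING /
   GET_LINE_CODING (CDC PSTN subclass). Packed to match the wire format. */
typedef struct __attribute__((packed)) {
    uint32_t dwDTERate;   /* baud rate, e.g. 115200 (little-endian on the wire) */
    uint8_t  bCharFormat; /* 0 = 1 stop bit, 1 = 1.5, 2 = 2 */
    uint8_t  bParityType; /* 0 = none, 1 = odd, 2 = even, 3 = mark, 4 = space */
    uint8_t  bDataBits;   /* 5, 6, 7, 8, or 16 */
} cdc_line_coding_t;

static cdc_line_coding_t g_line_coding = { 115200, 0, 0, 8 };
static int g_port_open = 0;

/* SET_LINE_CODING: store the host's settings. A pure-USB CDC device can
   simply remember them — nothing in the data path depends on the "baud". */
void cdc_set_line_coding(const uint8_t *payload7)
{
    memcpy(&g_line_coding, payload7, sizeof g_line_coding);
}

/* SET_CONTROL_LINE_STATE: bit 0 of wValue is DTR, bit 1 is RTS. Many
   firmwares treat DTR asserted as "terminal connected, start sending". */
void cdc_set_control_line_state(uint16_t wValue)
{
    g_port_open = (wValue & 0x0001) != 0;
}
```

If the MCU bridges USB to a real UART, cdc_set_line_coding is instead the place to reprogram the UART's baud-rate and framing registers from the received values.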

Q: What is USB OTG, and when would an embedded device need it?

USB OTG (On-The-Go) is a supplement to the USB specification that allows a single USB port to function as either a host or a device, with the role determined dynamically. Standard USB has rigid roles: a device is always a device and a host is always a host. OTG relaxes this constraint, enabling two OTG-capable devices to negotiate which one acts as the host for a given session.

The role negotiation uses the ID pin on the USB micro-AB connector (USB Type-C has no ID pin; it determines roles through the CC lines instead). When a cable with a micro-A plug is inserted (ID pin grounded), the port assumes the host role and supplies VBUS power. When a micro-B plug is inserted (ID pin floating), the port assumes the device role. OTG also defines the Host Negotiation Protocol (HNP), which allows the two connected devices to swap roles without physically disconnecting — for example, a digital camera that initially connects as a device to a PC can switch to host mode to talk to a printer directly. The Session Request Protocol (SRP) allows a device to request that the host turn on VBUS, enabling power-saving scenarios where VBUS is not continuously supplied.

An embedded device needs OTG in these scenarios: (1) A device that must connect to both PCs and peripherals — a handheld data terminal that acts as a USB device when docked with a PC (for data sync) but acts as a USB host when connected to a barcode scanner or printer in the field. (2) Direct device-to-device communication — two embedded devices sharing data without a PC intermediary, such as a camera printing directly to a PictBridge printer. (3) Debug and manufacturing versatility — a product that normally connects as a device to user PCs but switches to host mode on the factory floor to read test configuration from a USB flash drive.

The implementation cost is significant: the MCU needs an OTG-capable USB peripheral (not just device-mode), firmware must include both a device stack and a host stack (doubling the USB code footprint), and the hardware must include VBUS power supply circuitry with current limiting and overcurrent protection. For most embedded products, pure device mode is sufficient, and OTG is overkill. OTG is most common in Android smartphones, tablets, and feature-rich industrial handhelds.
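The initial role decision from the ID pin reduces to a single test. An illustrative sketch with the GPIO read abstracted away (the names are hypothetical; real OTG controllers typically latch the ID state in a status register and raise an interrupt on change):

```c
/* Default OTG role from the micro-AB ID pin: grounded (reads 0) means a
   micro-A plug is inserted -> A-device, default host; floating (pulled
   high, reads 1) means micro-B -> B-device, default peripheral.
   HNP can later swap these defaults at the protocol level. */
typedef enum { ROLE_HOST, ROLE_DEVICE } usb_role_t;

usb_role_t otg_default_role(int id_pin_level)
{
    return (id_pin_level == 0) ? ROLE_HOST : ROLE_DEVICE;
}
```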

Q: How do you debug USB enumeration failures on an embedded device?

USB enumeration failures — where the device is not recognized by the host, shows as "Unknown USB Device," or disconnects during enumeration — are among the most frustrating embedded debugging problems because the failure happens in a fast, multi-step protocol with no serial console output available (the USB link itself is what you are trying to bring up). The debugging approach combines host-side diagnostics, protocol analysis, and systematic firmware verification.

Host-side diagnostics (start here): On Linux, run dmesg -w while plugging in the device — the kernel logs every enumeration step and the exact point of failure (e.g., "device descriptor read/64, error -71" means the device did not respond to the first GET_DESCRIPTOR). On Windows, use USBView (from the Windows SDK) or USBTreeView to inspect the device's descriptors if enumeration partially succeeded, or check Device Manager for error codes (Code 43 means the device was rejected after enumeration). On macOS, use system_profiler SPUSBDataType or the USB Prober utility. These tools are free and available immediately — always check them before reaching for a protocol analyzer.

Protocol analysis: A hardware USB protocol analyzer (Total Phase Beagle, Ellisys, or LeCroy) captures every packet on the bus, showing exactly what the host sent and what the device responded (or failed to respond). This is the definitive debugging tool for USB — it removes all guesswork. For budget-constrained debugging, a logic analyzer (Saleae Logic, Sigrok-compatible) with a USB decoder can capture low-speed and full-speed traffic, though it cannot handle high-speed. Software-based capture (Wireshark with USBPcap on Windows, or usbmon on Linux) captures traffic from the host's perspective and is free.

Common root causes to check systematically:

1. Incorrect descriptors — wTotalLength in the configuration descriptor does not match the actual concatenated length of all subordinate descriptors; endpoint addresses collide; CDC functional descriptors are missing; string descriptor indices point to nonexistent strings. Verify your descriptor tables byte-by-byte against the USB specification.
2. Pull-up resistor — the 1.5K pull-up on D+ (full-speed) must be present and correctly valued. Some MCUs have an internal pull-up enabled by software; if your initialization code does not enable it, the host never detects the device.
3. Clock accuracy — USB requires a clock tolerance of 0.25% (2500 ppm) or better. An internal RC oscillator (typically 1-2% accuracy) is not sufficient for USB — you need a crystal or MEMS oscillator. Many enumeration failures on STM32 are traced to attempting USB with the HSI (internal RC) instead of HSE (external crystal).
4. VBUS detection — some USB peripherals require VBUS sensing to be configured before enabling the pull-up; if the VBUS pin is not connected or not configured, the device never starts enumeration.
5. Endpoint 0 response timing — the device must respond to setup packets within 500 ms. If your firmware has a long initialization sequence before the USB stack starts, the host times out.
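The wTotalLength mismatch listed first is easy to catch in firmware before ever plugging the board in: walk the configuration descriptor blob and confirm the individual bLength fields sum to wTotalLength. An illustrative helper, not taken from any particular stack:

```c
#include <stdint.h>
#include <stddef.h>

/* Sanity-check a configuration descriptor blob: the blob must start with
   a 9-byte configuration descriptor (bDescriptorType = 2), and the sum of
   all subordinate bLength fields must land exactly on wTotalLength.
   Returns 1 if consistent, 0 otherwise. */
int config_desc_length_ok(const uint8_t *blob, size_t blob_len)
{
    if (blob_len < 9 || blob[0] != 9 || blob[1] != 2)
        return 0;  /* must begin with a configuration descriptor */

    /* wTotalLength is little-endian at offset 2 */
    uint16_t wTotalLength = (uint16_t)(blob[2] | ((uint16_t)blob[3] << 8));
    if (wTotalLength > blob_len)
        return 0;

    size_t off = 0;
    while (off < wTotalLength) {
        uint8_t bLength = blob[off];
        if (bLength == 0)
            return 0;  /* zero-length descriptor: corrupt table */
        off += bLength;
    }
    return off == wTotalLength;  /* must land exactly, no overshoot */
}
```

Running a check like this in a host-side unit test of your descriptor tables turns a "Device descriptor request failed" field bug into a build-time failure.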