Networking & Protocols
intermediate
Weight: 4/10

TCP/IP Fundamentals

TCP/IP protocol stack for embedded systems: layers, TCP vs UDP, three-way handshake, flow control, congestion control, sockets API, and lightweight stacks (lwIP).

networking
tcp-ip
tcp
udp
sockets
lwip

Quick Cap

TCP/IP is the protocol stack that connects embedded devices to the internet and local networks. It defines how data is addressed (IP), how it is reliably delivered (TCP), and how it can be sent with minimal overhead when reliability is not needed (UDP). In embedded systems, TCP/IP runs either on a full Linux network stack or on a lightweight stack like lwIP or Zephyr's net subsystem — understanding both contexts is essential.

Interviewers test whether you understand the layer model, can articulate the TCP vs UDP trade-off with embedded-specific reasoning, and can walk through TCP's reliability mechanisms (handshake, retransmission, flow/congestion control) at the conceptual level.

Key Facts:

  • Four layers: Application, Transport (TCP/UDP), Network (IP), Link (Ethernet/WiFi) — not the 7-layer OSI model
  • TCP: Connection-oriented, reliable, ordered delivery with flow control and congestion control — but adds latency and memory overhead
  • UDP: Connectionless, no guarantees — but minimal overhead, ideal for real-time sensor data and discovery protocols
  • Three-way handshake: SYN, SYN-ACK, ACK — establishes sequence numbers and window sizes before data flows
  • MTU: Maximum Transmission Unit — 1500 bytes for Ethernet; packets larger than MTU are fragmented (or dropped if DF bit is set)
  • lwIP: The most common TCP/IP stack for bare-metal and RTOS embedded systems — single-threaded, low RAM footprint

Deep Dive

At a Glance

| Concept | Detail |
| --- | --- |
| Layer model | Application, Transport, Network, Link (TCP/IP 4-layer, not OSI 7-layer) |
| TCP | Reliable, ordered, connection-oriented, flow + congestion control |
| UDP | Unreliable, unordered, connectionless, minimal overhead |
| IP addressing | IPv4 (32-bit, 4 octets) or IPv6 (128-bit); subnetting with CIDR notation |
| Port numbers | 16-bit; well-known (0-1023), registered (1024-49151), ephemeral (49152-65535) |
| MTU | 1500 bytes (Ethernet); MSS = MTU - 40 (20 IP + 20 TCP headers) |
| Embedded stacks | lwIP, Zephyr net, uIP, picoTCP; Linux uses the full kernel stack |

The TCP/IP Layer Model

The TCP/IP model has four layers, not seven. The OSI model is a teaching reference; TCP/IP is what actually runs on every networked device.

| TCP/IP Layer | OSI Equivalent | Protocols | What It Does |
| --- | --- | --- | --- |
| Application | Layers 5-7 (Session, Presentation, Application) | HTTP, MQTT, CoAP, DNS, DHCP, mDNS, TLS | Message format, semantics, encryption |
| Transport | Layer 4 | TCP, UDP | End-to-end delivery, port multiplexing, reliability |
| Network | Layer 3 | IPv4, IPv6, ICMP, ARP | Addressing and routing across networks |
| Link | Layers 1-2 (Physical, Data Link) | Ethernet, WiFi, PPP, cellular | Frame delivery on a single physical link |

Why does this matter in interviews? Candidates who say "TCP/IP has 7 layers" are confusing it with OSI. The OSI model splits things into 7 layers (Physical, Data Link, Network, Transport, Session, Presentation, Application), but real implementations combine the top three into one Application layer and the bottom two into one Link layer. Knowing this distinction signals that you understand the practical stack, not just a textbook diagram.

What Each Layer Does

Link layer -- Handles delivery of frames between two devices connected by the same physical medium (Ethernet cable, WiFi access point, cellular base station). Each device has a MAC address (48-bit, burned into the NIC). The link layer adds a frame header (source/destination MAC, type field) and a CRC trailer for error detection. In embedded systems, this is the NIC driver -- you configure the MAC/PHY, enable/disable the interface, and handle link-up/link-down events.

Network layer (IP) -- Handles routing packets across multiple networks. Each device has an IP address (32-bit for IPv4, 128-bit for IPv6). The network layer adds an IP header containing source/destination IP, TTL (time to live), protocol field (6=TCP, 17=UDP), and checksum. Routers operate at this layer -- they read the destination IP, look it up in their routing table, and forward the packet to the next hop. ARP (Address Resolution Protocol) resolves an IP address to a MAC address on the local network. ICMP (Internet Control Message Protocol) provides error reporting (e.g., "destination unreachable") and diagnostics (ping).

Transport layer -- Provides end-to-end communication between applications on different hosts. TCP adds sequence numbers, acknowledgments, flow control (window), and congestion control for reliable delivery. UDP adds only source/destination port numbers and a checksum -- minimal overhead for applications that handle their own reliability or don't need it. Port numbers (16-bit) allow multiple applications to share one IP address.

Application layer -- Everything above transport. This is where your embedded application logic lives: MQTT for IoT telemetry, HTTP for REST APIs, CoAP for constrained devices, DNS for name resolution, DHCP for automatic IP configuration. TLS sits between the application and transport layers, encrypting data before TCP segments it.

Data Encapsulation

As data moves down the stack, each layer wraps it with its own header (and sometimes trailer). This is called encapsulation:

Application data:    [ MQTT PUBLISH payload ]
         |
Transport (TCP):     [ TCP header | MQTT PUBLISH payload ]
         |
Network (IP):        [ IP header | TCP header | MQTT payload ]
         |
Link (Ethernet):     [ Eth header | IP header | TCP header | MQTT payload | CRC ]

At the receiver, each layer strips its header and passes the payload up -- this is decapsulation. The key insight: each layer only reads its own header and treats everything above it as opaque payload. TCP does not know or care that the payload is MQTT; IP does not know or care that the payload is TCP. This separation is what makes the stack modular -- you can swap Ethernet for WiFi without changing TCP, or swap MQTT for HTTP without changing IP.
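
The per-layer overhead described above adds up to a fixed header tax on every frame. A minimal sketch of the arithmetic, assuming option-free 20-byte IP and TCP headers (the 14-byte Ethernet header and 4-byte CRC are standard):

```c
#include <assert.h>

/* On-the-wire Ethernet frame size for a TCP payload: each layer adds
 * its header going down the stack, and Ethernet adds a 14-byte header
 * plus a 4-byte CRC trailer. Assumes no IP/TCP options. */
unsigned frame_bytes(unsigned payload) {
    return payload + 20 /* TCP */ + 20 /* IP */ + 14 /* Eth */ + 4 /* CRC */;
}
```

A full 1460-byte MSS payload yields the classic 1518-byte maximum Ethernet frame.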

TCP vs UDP: When to Use Which

This is the single most asked networking question in embedded interviews.

| Criteria | TCP | UDP |
| --- | --- | --- |
| Reliability | Guaranteed delivery with retransmission | Best-effort — packets may be lost, duplicated, or reordered |
| Connection | Connection-oriented (handshake required) | Connectionless (send anytime) |
| Ordering | In-order delivery guaranteed | No ordering guarantee |
| Flow control | Sliding window prevents receiver overload | None — sender can flood the receiver |
| Overhead | 20-byte header + handshake + ACK traffic | 8-byte header, no handshake |
| Latency | Higher (handshake, retransmission waits) | Lower (no setup, no waiting for ACK) |
| RAM usage | Requires per-connection buffers (TX + RX) | Minimal state |
| Use in embedded | OTA updates, REST APIs, cloud telemetry | Sensor broadcasts, NTP, DNS, mDNS, streaming |

Embedded rule of thumb: Use TCP when data integrity matters and you can tolerate latency (firmware updates, configuration, cloud MQTT). Use UDP when freshness matters more than completeness (real-time sensor readings, audio/video, device discovery). Many IoT protocols are designed for one or the other: MQTT uses TCP, CoAP uses UDP, HTTP uses TCP.

⚠️Common Trap: TCP on Constrained Devices

TCP requires significant RAM for per-connection send/receive buffers, retransmission queues, and state tracking. On a Cortex-M with 64 KB RAM running lwIP, each TCP connection can consume 4-8 KB. If your device needs to handle 10 simultaneous connections, that is 40-80 KB — possibly more than your total RAM. Always calculate TCP memory requirements before choosing it on constrained devices.
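
The "always calculate" advice above is a one-liner in code. A sketch of the back-of-envelope check, where the ~6 KB per-connection figure is an assumed planning number in the 4-8 KB range quoted above:

```c
#include <assert.h>

/* Does the TCP memory bill fit the RAM budget? bytes_per_conn covers
 * TX/RX buffers, the PCB, and retransmission queue; all values here
 * are rough planning figures, not measured lwIP numbers. */
int tcp_budget_fits(unsigned conns, unsigned bytes_per_conn,
                    unsigned ram_budget_bytes) {
    return conns * bytes_per_conn <= ram_budget_bytes;
}
```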

TCP Three-Way Handshake

Every TCP connection begins with a three-way handshake that synchronizes sequence numbers and advertises window sizes:

Client                                      Server
   |                                           |
   | ---- SYN (seq=x) ----------------------> |   1. Client picks initial sequence number x
   |                                           |
   | <--- SYN-ACK (seq=y, ack=x+1) ---------- |   2. Server picks its own seq y, ACKs client's x
   |                                           |
   | ---- ACK (ack=y+1) --------------------> |   3. Client ACKs server's y
   |                                           |
   |          Connection established           |

After the handshake, both sides know each other's starting sequence numbers and can begin exchanging data. The sequence numbers are used to detect lost, duplicated, or out-of-order segments.

Connection teardown uses a four-way handshake: FIN, ACK, FIN, ACK. Either side can initiate. The TIME_WAIT state (2x MSL, typically 60-120 seconds) on the initiator ensures delayed packets from the old connection don't corrupt a new one — this can be a problem on embedded servers that restart frequently, causing "address already in use" errors. The fix is SO_REUSEADDR.
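
A minimal sketch of that fix, using the standard POSIX sockets API (the helper name is illustrative):

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable SO_REUSEADDR so a server restarting after a crash can rebind
 * its listening port while old connections still sit in TIME_WAIT.
 * Returns 0 on success, -1 on error (per setsockopt). */
int make_rebindable(int sock) {
    int on = 1;
    return setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}
```

Call it after socket() and before bind(); setting it after bind() has no effect on the bind that already failed.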

💡Interview Tip: Sequence Numbers

When asked "why does TCP use random initial sequence numbers?", the answer is security — predictable ISNs allow an attacker to inject forged packets into a connection. This was a real vulnerability (the Mitnick attack, 1994) that led to ISN randomization in all modern TCP stacks.

TCP Flow Control: Sliding Window

TCP uses a sliding window to prevent the sender from overwhelming the receiver. The receiver advertises a window size (in bytes) in every ACK — this tells the sender how much data it can send before it must wait for another ACK. If the receiver's buffer fills up, it advertises a window of 0, and the sender pauses until the receiver drains its buffer and advertises a nonzero window.

The window size is critical in embedded systems where RAM is limited. A device running lwIP might advertise a TCP window of only 2-4 KB (vs. 64 KB on a desktop). This limits throughput: with a 2 KB window and 100 ms round-trip time, the maximum throughput is 2 KB / 0.1 s = 20 KB/s — regardless of link bandwidth. This is why OTA firmware downloads over TCP on constrained devices are often slow.
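
The throughput ceiling is simple enough to capture in one function. A sketch of the calculation from the paragraph above (at most one full window in flight per round trip):

```c
#include <assert.h>

/* Window-limited TCP throughput: the sender can have at most one
 * receive window of unacknowledged data in flight per round trip,
 * so throughput <= window / RTT, regardless of link bandwidth. */
double window_limited_bps(double window_bytes, double rtt_seconds) {
    return window_bytes / rtt_seconds;
}
```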

TCP Congestion Control

Congestion control prevents TCP from overwhelming the network (as opposed to flow control, which protects the receiver). TCP starts slowly and probes for available bandwidth:

| Phase | Behavior | When |
| --- | --- | --- |
| Slow start | Congestion window doubles every RTT (exponential growth) | Connection start, after timeout |
| Congestion avoidance | Window increases by 1 MSS per RTT (linear growth) | After reaching slow-start threshold |
| Fast retransmit | Retransmit immediately after 3 duplicate ACKs (don't wait for timeout) | Packet loss detected |
| Fast recovery | Halve the window but don't restart from slow start | After fast retransmit |

In embedded systems, congestion control matters when devices share a network. A sensor gateway with 50 devices all uploading data over TCP will experience congestion — TCP's backoff mechanism prevents network collapse but also limits per-device throughput. Understanding this trade-off is essential for designing IoT systems that scale.
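
The first two phases can be condensed into a toy per-RTT update rule. This is a sketch for intuition only: real stacks grow the window per ACK and track it in bytes, not whole-MSS units.

```c
#include <assert.h>

/* Toy congestion-window growth, in MSS units, evaluated once per RTT:
 * exponential below ssthresh (slow start), then +1 MSS per RTT
 * (congestion avoidance). */
unsigned next_cwnd(unsigned cwnd, unsigned ssthresh) {
    return (cwnd < ssthresh) ? cwnd * 2 : cwnd + 1;
}
```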

IP Addressing and Subnetting

IPv4 addresses are 32 bits, written as four decimal octets (e.g., 192.168.1.100). Subnetting divides a network into smaller segments using a subnet mask.

CIDR notation: 192.168.1.0/24 means the first 24 bits are the network portion, leaving 8 bits (256 addresses, 254 usable) for hosts. Common embedded configurations:

| CIDR | Subnet Mask | Hosts | Typical Use |
| --- | --- | --- | --- |
| /24 | 255.255.255.0 | 254 | Small office, home network |
| /16 | 255.255.0.0 | 65,534 | Large campus or factory |
| /30 | 255.255.255.252 | 2 | Point-to-point link (e.g., MCU to gateway) |
| /32 | 255.255.255.255 | 1 | Loopback or host route |
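
The host counts follow directly from the prefix length. A sketch of the calculation, with the /31 and /32 edge cases handled explicitly:

```c
#include <assert.h>
#include <stdint.h>

/* Usable IPv4 host addresses in a subnet: 2^(32 - prefix) total minus
 * the network and broadcast addresses. /31 (RFC 3021 point-to-point)
 * and /32 (host route) are special cases with 2 and 1 addresses. */
uint32_t usable_hosts(unsigned prefix) {
    if (prefix == 0)
        return UINT32_MAX - 1;              /* avoid shifting by 32 */
    if (prefix >= 31)
        return (prefix == 31) ? 2u : 1u;    /* special cases */
    return (UINT32_C(1) << (32u - prefix)) - 2u;
}
```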

Static vs DHCP: Embedded devices often use static IPs for reliability (no dependency on a DHCP server). Devices with web UIs or cloud connectivity typically use DHCP for ease of deployment. Many embedded devices support both — trying DHCP first and falling back to a link-local address (169.254.x.x) if no DHCP server responds.

Sockets API (Brief)

The sockets API is the standard interface for TCP/UDP programming on both Linux and RTOS (via lwIP or Zephyr). The core calls:

```c
// TCP server (simplified; error handling omitted for clarity)
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int sock = socket(AF_INET, SOCK_STREAM, 0);         // Create TCP socket
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(INADDR_ANY);           // Listen on all interfaces
addr.sin_port = htons(8080);                        // Port in network byte order
bind(sock, (struct sockaddr *)&addr, sizeof(addr)); // Bind to port
listen(sock, 5);                                    // Listen, backlog of 5
int client = accept(sock, NULL, NULL);              // Accept one connection
char buf[256];
recv(client, buf, sizeof(buf), 0);                  // Receive data
const char response[] = "OK";
send(client, response, sizeof(response), 0);        // Send response
close(client);                                      // Close connection
```

For UDP, replace SOCK_STREAM with SOCK_DGRAM — no listen/accept needed because UDP is connectionless. Just sendto/recvfrom with the destination address on each call.
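
A minimal runnable sketch of the UDP pattern on a POSIX host (not lwIP), sending a datagram to a receiver bound on loopback; the function name and payload are illustrative:

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* UDP round trip over loopback: bind a receiver to an ephemeral port,
 * send it one datagram with sendto, read it back with recvfrom.
 * Returns bytes received, or -1 on failure. */
ssize_t udp_loopback_demo(char *out, size_t out_len) {
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0) return -1;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* kernel picks a free port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0) return -1;

    socklen_t alen = sizeof(addr);
    getsockname(rx, (struct sockaddr *)&addr, &alen);  /* learn the port */

    const char msg[] = "sensor-reading";
    sendto(tx, msg, sizeof(msg), 0, (struct sockaddr *)&addr, sizeof(addr));

    ssize_t n = recvfrom(rx, out, out_len, 0, NULL, NULL);
    close(tx);
    close(rx);
    return n;
}
```

Note there is no listen/accept and no connection state: the destination address travels with every sendto call.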

Embedded consideration: On lwIP (bare-metal/RTOS), the sockets API is available but runs on a single thread. For high-throughput or multi-connection scenarios, lwIP's "raw API" (callback-based, zero-copy) is more efficient but harder to use.

Lightweight TCP/IP Stacks for Embedded

| Stack | RAM Footprint | RTOS Support | Notes |
| --- | --- | --- | --- |
| lwIP | 15-40 KB | FreeRTOS, Zephyr, bare-metal | Most popular for Cortex-M; sockets + raw API |
| uIP | 5-10 KB | Any | Ultra-minimal, single-connection, for 8/16-bit MCUs |
| Zephyr net | 20-50 KB | Zephyr only | Native Zephyr subsystem, full-featured |
| picoTCP | 15-30 KB | Any | Modular, good for constrained devices |
| Linux kernel | N/A | Linux | Full-featured, runs on embedded Linux (Yocto, Buildroot) |

lwIP is the de facto standard for Cortex-M devices. It supports TCP, UDP, DHCP, DNS, ICMP, IPv4/IPv6, and TLS (via mbedTLS). Configuration is done via lwipopts.h — tuning buffer sizes, connection limits, and feature flags to match your RAM budget. A common interview question is "have you used lwIP?" — being able to discuss lwipopts.h tuning, the raw vs sockets API trade-off, and pbuf management shows real-world experience.
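
A hypothetical lwipopts.h excerpt for a ~64 KB RAM Cortex-M target. The option names are real lwIP configuration macros; the values are illustrative, not a tuned configuration:

```c
/* lwipopts.h -- illustrative tuning for a small-RAM target */
#define NO_SYS              0               /* run under an RTOS */
#define MEM_SIZE            (10 * 1024)     /* heap for pbufs and PCBs */
#define MEMP_NUM_TCP_PCB    4               /* max simultaneous TCP conns */
#define PBUF_POOL_SIZE      8               /* packet buffer pool depth */
#define TCP_MSS             536             /* conservative MSS saves RAM */
#define TCP_WND             (2 * TCP_MSS)   /* small receive window */
#define TCP_SND_BUF         (2 * TCP_MSS)   /* small send buffer */
#define LWIP_DHCP           1               /* automatic IP configuration */
#define LWIP_DNS            1               /* name resolution */
#define LWIP_IPV6           0               /* disabled to save flash/RAM */
```

Note the direct coupling to the flow-control discussion earlier: TCP_WND here caps per-connection throughput at TCP_WND / RTT.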

MTU, MSS, and Fragmentation

  • MTU (Maximum Transmission Unit) = largest packet the link can carry. Ethernet MTU = 1500 bytes.
  • MSS (Maximum Segment Size) = largest TCP payload per segment = MTU - 20 (IP header) - 20 (TCP header) = 1460 bytes for Ethernet.
  • Fragmentation: If a packet exceeds the path MTU, it can be fragmented by routers. Each fragment gets its own IP header with offset and "more fragments" flag. The receiver reassembles. Fragmentation is expensive (CPU, memory, reassembly timeout) and should be avoided — use Path MTU Discovery (PMTUD) to find the smallest MTU along the path and size packets accordingly. Set the DF (Don't Fragment) bit to get ICMP "fragmentation needed" messages back.

⚠️Common Trap: IPv6 and Fragmentation

IPv6 does NOT allow routers to fragment packets — only the sender can fragment. If a packet is too large, the router drops it and sends an ICMPv6 "Packet Too Big" message. This makes PMTUD mandatory for IPv6 and is a common source of connectivity issues when ICMPv6 is blocked by firewalls.
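
On Linux, setting the DF bit is done per socket via the IP_MTU_DISCOVER option rather than a raw header flag. A Linux-specific sketch (the helper name is illustrative; other stacks expose different knobs):

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Request path-MTU discovery on a socket: the kernel sets the DF bit
 * on outgoing packets, so oversized packets elicit ICMP "fragmentation
 * needed" messages instead of being silently fragmented en route. */
int enable_pmtud(int sock) {
    int val = IP_PMTUDISC_DO;   /* always set DF, never fragment */
    return setsockopt(sock, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val));
}
```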

Debugging Story: The 30-Second Timeout

An IoT gateway was losing connection to its MQTT broker every 30 seconds. The TCP connection would establish successfully, exchange a few messages, then silently die. The MQTT keep-alive was set to 60 seconds, so the broker wasn't timing out. Wireshark showed no RST or FIN — the connection just stopped.

The root cause: a NAT router between the gateway and the broker had a 30-second idle timeout for TCP connections. When no data was exchanged for 30 seconds, the NAT entry expired, and return packets from the broker were dropped because the NAT no longer knew where to forward them. The gateway's TCP stack didn't detect the loss because TCP has no built-in keep-alive at the transport level by default (TCP keep-alive is optional and disabled on most systems).

The fix was twofold: (1) reduce the MQTT keep-alive interval to 15 seconds so PINGREQ/PINGRESP packets flow regularly, keeping the NAT entry alive; (2) enable TCP keep-alive with setsockopt(SO_KEEPALIVE) as a fallback.
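
A sketch of fix (2), the keep-alive fallback. SO_KEEPALIVE is portable POSIX; the TCP_KEEP* timing knobs are Linux-specific, and the interval values here are illustrative, chosen to probe well inside a 30-second NAT timeout:

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable TCP keep-alive and (on Linux) tighten its timing so a dead
 * NAT entry is detected in tens of seconds rather than the 2-hour
 * default. Returns 0 on success. */
int enable_keepalive(int sock) {
    int on = 1, idle = 15, intvl = 5, cnt = 2;
    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Linux-specific knobs; skip on stacks that lack them */
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt));
    return 0;
}
```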

Lesson: When debugging TCP connections that die silently, always consider NAT timeouts, firewalls, and middleboxes. TCP's reliability guarantees only apply end-to-end — the network between the endpoints can and will discard state.

What interviewers want to hear: You understand the four TCP/IP layers and what each does (not the 7-layer OSI model). You can articulate TCP vs UDP with embedded-specific trade-offs (RAM, latency, throughput on constrained devices). You can walk through the three-way handshake and explain why each step exists. You know that TCP flow control protects the receiver (window) and congestion control protects the network (slow start). You understand MTU/MSS and why fragmentation should be avoided. You have practical experience with a lightweight stack (lwIP) and can discuss its configuration trade-offs.

Interview Focus

Classic TCP/IP Interview Questions

Q1: "What are the differences between TCP and UDP, and when would you use each in an embedded system?"

Model Answer Starter: "TCP provides reliable, ordered delivery with flow control and congestion control, but it requires per-connection state and RAM for buffers — on a Cortex-M running lwIP, each TCP connection consumes 4-8 KB of RAM. UDP has an 8-byte header, no connection state, and no retransmission — it is ideal for real-time sensor data where freshness matters more than completeness. I use TCP for OTA updates, cloud connectivity, and configuration APIs where data integrity is critical. I use UDP for periodic sensor broadcasts, device discovery (mDNS), time synchronization (NTP), and any scenario where a lost packet should be replaced by the next reading rather than retransmitted."

Q2: "Walk me through the TCP three-way handshake."

Model Answer Starter: "The client sends a SYN segment with its initial sequence number. The server responds with SYN-ACK, acknowledging the client's sequence and providing its own. The client sends ACK to acknowledge the server's sequence. After this, both sides have synchronized sequence numbers and know each other's window sizes, so data can flow in both directions. The handshake ensures that both sides are alive and agree on the starting state before committing resources to the connection."

Q3: "How does TCP handle a lost packet?"

Model Answer Starter: "TCP detects loss in two ways: (1) retransmission timeout — if no ACK arrives within the RTO (calculated from measured round-trip time), the segment is retransmitted with exponential backoff; (2) fast retransmit — if the sender receives three duplicate ACKs for the same sequence number, it assumes the next segment was lost and retransmits immediately without waiting for the timeout. Fast retransmit is much quicker because it reacts in one RTT instead of the full timeout. After loss, TCP also halves its congestion window to reduce load on the network."

Q4: "What is the difference between TCP flow control and congestion control?"

Model Answer Starter: "Flow control protects the receiver — the receiver advertises a window size in every ACK that tells the sender how much buffer space is available. If the window goes to zero, the sender stops. Congestion control protects the network — TCP starts with a small congestion window and increases it gradually (slow start, then linear increase). On packet loss, it halves the window. Flow control is local (receiver to sender), congestion control is inferred from network behavior (packet loss signals congestion). Both limit the sender's rate, and the effective sending rate is the minimum of the two windows."

Q5: "What lightweight TCP/IP stack would you use on a Cortex-M, and how would you configure it?"

Model Answer Starter: "I would use lwIP — it is the most widely supported lightweight stack for Cortex-M with FreeRTOS. Configuration is done in lwipopts.h where I set TCP window sizes, maximum connection count, pbuf pool size, and enable/disable features like DHCP, DNS, and IPv6 based on available RAM. For a device with 64 KB RAM, I typically allocate 8-16 KB for the stack, limit TCP connections to 2-4, and use the raw callback API instead of sockets for lower overhead. For TLS, I integrate mbedTLS and allocate an additional 20-40 KB for the TLS session and certificate handling."

Trap Alerts

  • Don't say: "TCP/IP has 7 layers" — that is the OSI model. TCP/IP has 4 layers.
  • Don't forget: TCP's RAM cost on constrained devices — each connection needs send/receive buffers. This is a critical embedded trade-off that desktop engineers never think about.
  • Don't ignore: NAT and middlebox behavior — TCP's end-to-end guarantees break when stateful network devices drop connection tracking.

Follow-up Questions

  • "How would you implement TLS on an embedded device with limited RAM?"
  • "What is TCP Nagle's algorithm and when would you disable it?"
  • "How does ARP work and what happens when the ARP cache is empty?"
  • "What is the difference between blocking and non-blocking sockets, and which would you use in an RTOS?"

Practice

How many layers does the TCP/IP model have?

Which protocol would you use for periodic sensor data where freshness matters more than reliability?

What is the TCP Maximum Segment Size (MSS) on a standard Ethernet link?

What triggers TCP fast retransmit?

Why is TCP expensive on a Cortex-M running lwIP?

Real-World Tie-In

IoT Sensor Gateway -- A factory gateway aggregated data from 200 Modbus sensors and forwarded it to a cloud platform via MQTT over TCP. With lwIP's default settings, the gateway could only maintain 4 simultaneous TCP connections in 48 KB of RAM. We reduced the TCP window to 2 KB, limited the retransmission queue depth, and used a single persistent MQTT connection with QoS 1 to achieve reliable cloud delivery within the RAM budget.

Automotive OTA Updates -- A vehicle telematics unit received firmware updates over TCP. The cellular link had 300 ms RTT and frequent packet loss. With a 4 KB TCP window, throughput was limited to 13 KB/s — a 2 MB update took over 2.5 minutes. Increasing the window to 16 KB (at the cost of 16 KB more RAM) improved throughput to 53 KB/s, cutting update time to under 40 seconds. This illustrates how TCP window size directly limits throughput on high-latency links.

Smart Home Hub -- A Zigbee-to-WiFi bridge used UDP for mDNS device discovery and TCP for REST API control. During testing, the hub crashed when 20 clients simultaneously opened TCP connections for status polling. Each connection consumed 6 KB of lwIP buffers, exceeding the 128 KB RAM budget. The fix was switching the status API to UDP-based CoAP with observe notifications — reducing per-client state from 6 KB to near zero and eliminating the connection scaling problem entirely.