Quick Recap
The sockets API (Berkeley sockets) is the standard interface for network programming on both Linux and RTOS-based embedded systems. It abstracts TCP and UDP into a file-descriptor-like model where you create a socket, configure it, and then read/write data through it. In embedded contexts, sockets run either on the full Linux kernel stack or on lightweight stacks like lwIP — and the differences between these two environments create real gotchas in production.
Interviewers test whether you understand the full socket lifecycle (especially the TCP server vs client asymmetry), can choose between blocking and non-blocking I/O for a given embedded architecture, and know how to multiplex connections with select/poll/epoll.
Key Facts:
- TCP server lifecycle: socket() → bind() → listen() → accept() → recv()/send() → close()
- TCP client lifecycle: socket() → connect() → send()/recv() → close()
- UDP: No connection — use sendto()/recvfrom() with a destination address on each call
- Blocking vs non-blocking: Blocking simplifies logic but stalls the thread; non-blocking requires a poll/event loop but enables single-threaded concurrency
- Multiplexing: select() is portable but O(n); epoll() is Linux-only but O(1) — for embedded Linux with many connections, epoll wins
- lwIP sockets: API-compatible with POSIX but single-threaded core, smaller buffer defaults, and some mechanisms (like epoll()) are unavailable
Deep Dive
At a Glance
| Concept | Detail |
|---|---|
| API origin | Berkeley sockets (BSD 4.2, 1983) — POSIX standardized |
| Socket types | SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW (raw IP) |
| Address families | AF_INET (IPv4), AF_INET6 (IPv6), AF_UNIX (local IPC) |
| Blocking model | Default is blocking; set O_NONBLOCK via fcntl() for non-blocking |
| Multiplexing | select() (portable), poll() (better API), epoll() (Linux, scalable) |
| Embedded stacks | lwIP sockets (RTOS/bare-metal), Zephyr BSD sockets, Linux kernel sockets |
| Key options | SO_REUSEADDR, TCP_NODELAY, SO_KEEPALIVE, SO_RCVTIMEO |
The TCP Socket Lifecycle
TCP sockets follow an asymmetric pattern: the server side has more steps because it must bind to a port, listen for connections, and accept them individually. The client side is simpler — just connect and communicate.
TCP server flow:
socket() → bind() → listen() → accept() → recv()/send() → close()↑ ↑Create fd Returns NEW fd(listening) (per-client connection)
The key insight is that accept() returns a new file descriptor for each client connection. The original listening socket stays open, accepting more connections. This is why a TCP server has at least two file descriptors: one for listening, one (or more) for active connections.
```c
/* Minimal TCP server — core logic only (error checks omitted) */
int srv = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

struct sockaddr_in addr = { .sin_family = AF_INET,
                            .sin_port = htons(8080),
                            .sin_addr.s_addr = INADDR_ANY };
bind(srv, (struct sockaddr *)&addr, sizeof(addr));
listen(srv, 5);                      /* backlog = 5 */

int cli = accept(srv, NULL, NULL);   /* blocks until a client connects */
char buf[256];
ssize_t n = recv(cli, buf, sizeof(buf), 0);
send(cli, buf, n, 0);                /* echo back */
close(cli);
close(srv);
```
TCP client flow:
```c
/* Minimal TCP client */
int sock = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in srv = { .sin_family = AF_INET,
                           .sin_port = htons(8080) };
inet_pton(AF_INET, "192.168.1.10", &srv.sin_addr);
connect(sock, (struct sockaddr *)&srv, sizeof(srv));

send(sock, "hello", 5, 0);
char buf[256];
ssize_t n = recv(sock, buf, sizeof(buf), 0);
close(sock);
```
The client does not need bind() — the OS assigns an ephemeral port automatically. The client does not need listen() or accept() — it initiates the connection, not the server.
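The ephemeral assignment is easy to observe. This sketch binds to port 0, which explicitly requests an ephemeral port — the same assignment connect() performs implicitly; the helper name is illustrative:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind to port 0 to request an ephemeral port, then read back
 * what the OS actually chose with getsockname(). */
unsigned short ephemeral_port_demo(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in a = { .sin_family = AF_INET,
                             .sin_port = 0,   /* 0 = "pick one for me" */
                             .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
    if (bind(s, (struct sockaddr *)&a, sizeof(a)) < 0)
        return 0;

    socklen_t len = sizeof(a);
    getsockname(s, (struct sockaddr *)&a, &len);
    close(s);
    return ntohs(a.sin_port);        /* nonzero ephemeral port */
}
```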
UDP Sockets: sendto() and recvfrom()
UDP sockets skip the connection setup entirely. There is no listen(), no accept(), no connection state. Each datagram is self-contained with its own destination address.
```c
/* UDP sender */
int sock = socket(AF_INET, SOCK_DGRAM, 0);
struct sockaddr_in dest = { .sin_family = AF_INET,
                            .sin_port = htons(5000) };
inet_pton(AF_INET, "192.168.1.10", &dest.sin_addr);
sendto(sock, "sensor:42", 9, 0,
       (struct sockaddr *)&dest, sizeof(dest));
close(sock);
```
```c
/* UDP receiver */
int sock = socket(AF_INET, SOCK_DGRAM, 0);
struct sockaddr_in addr = { .sin_family = AF_INET,
                            .sin_port = htons(5000),
                            .sin_addr.s_addr = INADDR_ANY };
bind(sock, (struct sockaddr *)&addr, sizeof(addr));

char buf[256];
struct sockaddr_in src;
socklen_t slen = sizeof(src);
recvfrom(sock, buf, sizeof(buf), 0,
         (struct sockaddr *)&src, &slen);   /* src tells you who sent it */
close(sock);
```
Why UDP matters in embedded: Sensor nodes often broadcast readings via UDP because there is no connection overhead, no per-client state, and a lost reading is replaced by the next one. Discovery protocols (mDNS, SSDP) and time sync (NTP) also use UDP because they need multicast support and low latency.
Blocking vs Non-Blocking Sockets
By default, sockets are blocking — calls like recv(), accept(), and connect() will not return until data arrives, a client connects, or the connection completes. This simplifies code but has a critical implication: a blocking call stalls the entire thread.
| Aspect | Blocking | Non-Blocking |
|---|---|---|
| Default | Yes — sockets are blocking out of the box | Must set O_NONBLOCK via fcntl() |
| Behavior on no data | recv() blocks until data arrives | recv() returns -1 with errno = EAGAIN |
| Behavior on connect() | Blocks until handshake completes or times out | Returns -1 with errno = EINPROGRESS |
| Code complexity | Simple, sequential | Requires event loop and partial-read handling |
| Thread usage | One thread per connection (or one thread blocked) | Single thread can manage many connections |
| Embedded fit | Good for simple RTOS tasks (one connection per task) | Essential for single-threaded event loops (lwIP raw) |
Setting non-blocking mode:
```c
int flags = fcntl(sock, F_GETFL, 0);
fcntl(sock, F_SETFL, flags | O_NONBLOCK);
```
After setting non-blocking, every I/O call must check for EAGAIN/EWOULDBLOCK:
```c
ssize_t n = recv(sock, buf, sizeof(buf), 0);
if (n < 0) {
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        /* No data right now — try again later */
    } else {
        /* Real error — handle it */
    }
} else if (n == 0) {
    /* Peer closed the connection */
}
```
When connect() returns EINPROGRESS, the connection is not yet established. You must use select() or poll() to wait for the socket to become writable, then call getsockopt(SO_ERROR) to check whether the connection actually succeeded. Many embedded developers skip this check and start sending data on a half-open socket — resulting in silent failures.
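A minimal sketch of that completion check, assuming a socket already in non-blocking mode whose connect() returned EINPROGRESS (the helper name finish_connect is ours, not a standard API):

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait up to timeout_ms for a non-blocking connect() to complete.
 * Returns 0 on success, -1 (with errno set) on failure or timeout. */
int finish_connect(int sock, int timeout_ms)
{
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(sock, &wfds);
    struct timeval tv = { .tv_sec  = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };

    /* The socket becomes writable when the handshake finishes
     * (successfully or not). */
    int ready = select(sock + 1, NULL, &wfds, NULL, &tv);
    if (ready <= 0) {
        if (ready == 0)
            errno = ETIMEDOUT;
        return -1;
    }

    /* Writable does not mean connected: check SO_ERROR. */
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    if (err != 0) {
        errno = err;     /* e.g. ECONNREFUSED, ETIMEDOUT */
        return -1;
    }
    return 0;            /* connection established */
}
```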
I/O Multiplexing: select(), poll(), epoll()
Multiplexing lets a single thread monitor multiple sockets and react only when data is ready. This is the foundation of every embedded network server and event-driven client.
select()
The oldest and most portable multiplexer. You build a set of file descriptors, call select(), and it tells you which ones are ready.
```c
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(listen_fd, &readfds);
FD_SET(client_fd, &readfds);

struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
int ready = select(max_fd + 1, &readfds, NULL, NULL, &tv);  /* max_fd = highest fd in the set */
if (ready > 0) {
    if (FD_ISSET(listen_fd, &readfds)) { /* new connection */ }
    if (FD_ISSET(client_fd, &readfds)) { /* data available */ }
}
```
Limitation: select() scans all file descriptors up to max_fd on every call — O(n) overhead. On Linux, FD_SETSIZE is typically 1024, limiting the maximum fd number. On lwIP, this limit is usually much lower (often 8-16).
poll()
A cleaner API that uses an array of pollfd structs instead of bitmask sets. No FD_SETSIZE limit.
```c
struct pollfd fds[2];
fds[0] = (struct pollfd){ .fd = listen_fd, .events = POLLIN };
fds[1] = (struct pollfd){ .fd = client_fd, .events = POLLIN };

int ready = poll(fds, 2, 1000);   /* 1-second timeout */
if (fds[0].revents & POLLIN) { /* new connection */ }
if (fds[1].revents & POLLIN) { /* data available */ }
```
Still O(n) per call — the kernel scans the entire array each time. Better API than select(), same scalability limitation.
epoll() (Linux only)
The scalable solution for Linux. You register interest once, and epoll_wait() returns only the file descriptors that are ready — O(1) per ready event.
```c
int ep = epoll_create1(0);
struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

struct epoll_event events[16];
int n = epoll_wait(ep, events, 16, 1000);
for (int i = 0; i < n; i++) {
    if (events[i].data.fd == listen_fd) { /* new connection */ }
    else { /* data on a client socket */ }
}
```
| Feature | select() | poll() | epoll() |
|---|---|---|---|
| Portability | POSIX (Linux, lwIP, macOS, Windows) | POSIX (Linux, macOS) | Linux only |
| Scalability | O(n) — scans all fds every call | O(n) — same as select | O(1) — returns only ready fds |
| FD limit | FD_SETSIZE (1024 on Linux) | No hard limit | No hard limit |
| Trigger modes | Level-triggered only | Level-triggered only | Level-triggered or edge-triggered |
| Embedded use | lwIP, Zephyr, any POSIX RTOS | Linux only in practice | Embedded Linux only |
Level-triggered (default): epoll_wait() keeps returning a fd as ready as long as there is data in the buffer. Edge-triggered (EPOLLET): it only notifies when the state changes (new data arrives), so you must drain the buffer completely in each callback. Edge-triggered is more efficient but harder to get right — missed reads mean lost data. For embedded Linux, start with level-triggered unless you have a specific performance requirement.
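The drain requirement can be sketched as a loop that reads until EAGAIN; the helper name, buffer handling, and peer_closed flag are illustrative:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read everything currently buffered on a non-blocking socket —
 * required with EPOLLET, since no further notification comes until
 * NEW data arrives. Returns total bytes consumed, or -1 on error;
 * sets *peer_closed if the remote end shut down. */
ssize_t drain_socket(int sock, char *buf, size_t buflen, int *peer_closed)
{
    ssize_t total = 0;
    *peer_closed = 0;
    for (;;) {
        ssize_t n = recv(sock, buf, buflen, 0);
        if (n > 0) {
            total += n;              /* process buf[0..n) here */
            continue;
        }
        if (n == 0) {
            *peer_closed = 1;        /* orderly shutdown by peer */
            return total;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;            /* buffer fully drained */
        return -1;                   /* real error */
    }
}
```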
Socket Options That Matter
Socket options configure behavior that affects performance, reliability, and debugging. These three are the most commonly discussed in embedded interviews.
| Option | Level | Purpose | When to Use |
|---|---|---|---|
| SO_REUSEADDR | SOL_SOCKET | Allow binding to a port in TIME_WAIT state | Always on TCP servers — prevents "address already in use" after restart |
| TCP_NODELAY | IPPROTO_TCP | Disable Nagle's algorithm (send small packets immediately) | Real-time control, interactive protocols — reduces latency at cost of more packets |
| SO_KEEPALIVE | SOL_SOCKET | Send periodic probes on idle connections to detect dead peers | Long-lived MQTT/cloud connections — detects NAT timeout and silent peer death |
| SO_RCVTIMEO | SOL_SOCKET | Set timeout on blocking recv() | Prevents indefinite blocking on embedded devices that must remain responsive |
| SO_SNDBUF/SO_RCVBUF | SOL_SOCKET | Set send/receive buffer sizes | Tune memory usage on constrained devices (lwIP defaults are small) |
```c
/* Disable Nagle — send immediately, don't wait to coalesce */
int flag = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));

/* Set receive timeout to 5 seconds */
struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
```
Error Handling: The Errors You Will Hit
Sockets produce specific errno values that tell you exactly what went wrong. Handling them correctly is the difference between robust embedded networking and silent failures.
| Error | Meaning | What to Do |
|---|---|---|
| EAGAIN / EWOULDBLOCK | Non-blocking socket has no data or buffer is full | Retry later — register for POLLIN/POLLOUT and wait |
| ECONNRESET | Peer sent RST — connection was forcibly closed | Close socket, reconnect if needed |
| EPIPE | Writing to a socket whose read end is closed | Peer is gone — close and reconnect; ignore SIGPIPE with MSG_NOSIGNAL |
| ECONNREFUSED | No process listening on target port | Server is down or wrong port — retry with backoff |
| ETIMEDOUT | Connection or operation timed out | Network issue or peer unresponsive — retry or failover |
| EINPROGRESS | Non-blocking connect() is in progress | Not an error — wait for writability, then check SO_ERROR |
On Linux, writing to a closed socket raises SIGPIPE, which terminates the process by default. In embedded daemons, this is catastrophic. Either set signal(SIGPIPE, SIG_IGN) at startup or use send(sock, data, len, MSG_NOSIGNAL) on every write. lwIP does not generate SIGPIPE, so code ported from lwIP to Linux often hits this unexpectedly.
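A small sketch of the MSG_NOSIGNAL pattern, using a local socketpair to stand in for a peer that has gone away (the helper name is illustrative):

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Shows that MSG_NOSIGNAL turns a fatal SIGPIPE into a harmless
 * EPIPE return. Returns the errno observed on the dead write. */
int demo_dead_peer_write(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    close(sv[1]);                         /* peer goes away */

    /* Without MSG_NOSIGNAL this write would raise SIGPIPE and, by
     * default, terminate the process. With it, we just get EPIPE. */
    ssize_t n = send(sv[0], "x", 1, MSG_NOSIGNAL);
    int saved = (n < 0) ? errno : 0;

    close(sv[0]);
    return saved;                         /* EPIPE; process still alive */
}
```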
lwIP Sockets vs Linux Sockets
lwIP provides a POSIX-like sockets API for bare-metal and RTOS environments, but there are important differences that bite developers who assume full Linux behavior.
| Aspect | Linux Sockets | lwIP Sockets |
|---|---|---|
| Threading | Fully reentrant, multi-threaded safe | Single-threaded core; socket calls dispatch to tcpip_thread |
| Multiplexing | select(), poll(), epoll() all available | select() only (no poll() or epoll()) |
| FD_SETSIZE | 1024 by default | Often 8-16 (configured in lwipopts.h) |
| Buffer sizes | 64 KB+ default send/receive buffers | 2-8 KB typical — tuned for RAM-constrained devices |
| SO_RCVTIMEO | Works as expected | Some lwIP ports ignore it or require LWIP_SO_RCVTIMEO |
| close() behavior | Sends FIN, enters TIME_WAIT | May linger or drop data if TCP send queue is not empty |
| Error codes | POSIX errno | lwIP defines its own error codes; mapping is imperfect |
| Raw API alternative | N/A | Callback-based, zero-copy — more efficient but harder to use |
All lwIP socket calls internally post messages to the tcpip_thread. If you call send() from an ISR or a non-RTOS context, you will corrupt internal state. On lwIP, sockets must only be called from RTOS tasks (or the main loop in bare-metal with cooperative scheduling). This is the single most common lwIP bug in production embedded systems.
Embedded Architecture: Event Loop vs Thread-per-Connection
How you structure your socket code depends fundamentally on whether your embedded system runs a full RTOS with multiple threads or a single-threaded event loop.
Thread-per-connection (embedded Linux, FreeRTOS with plenty of RAM):
- Spawn a new thread (or RTOS task) for each accept()ed connection
- Each thread runs a simple blocking read/write loop
- Easy to write, easy to reason about
- Cost: stack memory per thread (typically 2-8 KB), context-switch overhead, limited scalability
Single-threaded event loop (bare-metal, lwIP raw API, constrained RTOS):
- One loop calls select() or lwIP's raw callbacks
- All connections share one thread/task
- Non-blocking I/O required; state machines track per-connection progress
- Cost: code complexity, but minimal memory and no thread synchronization bugs
Hybrid (common in practice):
- One task for the event loop managing socket I/O
- Separate tasks for processing (sensor reads, actuator control)
- Communication via queues or shared buffers with mutexes
- Balances code simplicity with resource constraints
For most embedded Linux applications, a select()-based or epoll()-based event loop in a single thread is the sweet spot — it avoids thread overhead while handling multiple connections efficiently. Reserve thread-per-connection for systems with few connections and plenty of RAM.
Debugging Story: The Mysterious Stale Connection
An IoT gateway running lwIP on FreeRTOS maintained a persistent TCP connection to a cloud broker. Every few days, the connection would "die" — no data flowed, but the gateway's application layer believed it was still connected. The socket was never closed, recv() never returned an error, and the device kept calling send() without failure.
The root cause: the network path included a carrier-grade NAT that silently dropped the connection's state after 2 hours of idle time. Outbound packets from the gateway were blackholed — the NAT had no entry for them. But since lwIP's TCP stack saw no explicit RST or timeout (the NAT just silently dropped packets, and TCP retransmission eventually gave up silently in this lwIP configuration), the socket remained "open" from the application's perspective.
The fix had three parts: (1) enable SO_KEEPALIVE with a 60-second interval so TCP probes would detect the dead path; (2) add application-level heartbeats (MQTT PINGREQ every 30 seconds); (3) implement a watchdog that closes and reopens the socket if no data is received for 120 seconds.
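Part (1) can be sketched with Linux's keepalive knobs; TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT are Linux extensions (lwIP configures keepalive separately in lwipopts.h), and the timing values here are illustrative:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive aggressively enough to detect a dead NAT
 * path: first probe after 60 s idle, then every 10 s, declare the
 * peer dead after 3 unanswered probes (~90 s worst case). */
int enable_keepalive(int sock)
{
    int on = 1, idle = 60, intvl = 10, cnt = 3;

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
        return -1;
    return 0;
}
```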
Lesson: On embedded devices with long-lived connections, never rely solely on TCP to detect dead paths. NATs, firewalls, and middleboxes can silently kill connections without generating any TCP error. Always layer application-level heartbeats on top of SO_KEEPALIVE.
What interviewers want to hear: You understand the full socket lifecycle and can explain why accept() returns a new fd. You know the difference between blocking and non-blocking I/O and can choose the right model for a given embedded architecture. You can compare select(), poll(), and epoll() and know when each is appropriate. You have configured socket options like SO_REUSEADDR, TCP_NODELAY, and SO_KEEPALIVE and can explain why each matters. You handle errors properly — especially EAGAIN, ECONNRESET, and EPIPE. You understand the differences between lwIP and Linux sockets and the gotchas when porting between them.
Interview Focus
Classic Interview Questions
Q1: "Walk me through the lifecycle of a TCP server socket."
Model Answer Starter: "First, I call socket() with AF_INET and SOCK_STREAM to create the listening socket. Then bind() to associate it with a port, listen() to mark it as a passive socket with a connection backlog, and accept() which blocks until a client connects and returns a new file descriptor for that connection. The original listening socket stays open to accept more clients. On the new fd, I use recv() and send() for data exchange, then close() when done. Before bind(), I always set SO_REUSEADDR to avoid 'address already in use' errors during server restarts."
Q2: "What is the difference between blocking and non-blocking sockets, and which would you use in an embedded RTOS?"
Model Answer Starter: "Blocking sockets stall the calling thread until the operation completes — recv() waits for data, accept() waits for a connection. Non-blocking sockets return immediately with EAGAIN if no data is available. In an RTOS, I choose based on architecture: if I have one dedicated task per connection, blocking is simpler and the RTOS scheduler handles concurrency. If I need to handle multiple connections in a single task — common on RAM-constrained devices — I use non-blocking sockets with select() in an event loop. On lwIP specifically, the raw callback API is even more efficient for single-threaded designs."
Q3: "Compare select(), poll(), and epoll(). When would you use each?"
Model Answer Starter: "select() is the most portable — it works on Linux, lwIP, Zephyr, and even Windows. But it scans all file descriptors up to max_fd on every call (O(n)) and is limited by FD_SETSIZE. poll() fixes the API — it uses an array instead of bitmask, no fd limit — but is still O(n). epoll() is Linux-only but O(1): you register interest once, and epoll_wait() returns only ready descriptors. For embedded Linux with many connections, I use epoll(). For RTOS with lwIP, select() is the only option. For portable code that must run on both, I use select() or abstract behind a platform layer."
Q4: "What does SO_REUSEADDR do and why is it important for embedded servers?"
Model Answer Starter: "When a TCP server closes, the port enters TIME_WAIT state for 60-120 seconds (2x MSL). During this time, bind() to the same port fails with EADDRINUSE. SO_REUSEADDR allows binding to a port in TIME_WAIT, which is essential for embedded servers that restart frequently — OTA updates, watchdog resets, or crash recovery. Without it, the server cannot restart for up to two minutes. I set it on every TCP server socket before bind()."
Q5: "How do you handle errors when writing to a TCP socket in an embedded system?"
Model Answer Starter: "send() can fail in several ways. EAGAIN means the send buffer is full — I back off and retry, typically by waiting for POLLOUT in my event loop. EPIPE means the remote end closed the connection — on Linux this also raises SIGPIPE, which I suppress with MSG_NOSIGNAL or by ignoring SIGPIPE globally. ECONNRESET means the peer sent a RST. For any connection-fatal error, I close the socket, clean up per-connection state, and attempt reconnection with exponential backoff. On lwIP, I also check for lwIP-specific error codes since the POSIX mapping is not always exact."
Trap Alerts
- Don't say: "accept() returns the same socket descriptor" — it returns a new fd for the client connection.
- Don't forget: SO_REUSEADDR on servers, MSG_NOSIGNAL on Linux sends, and EAGAIN handling on non-blocking sockets — omitting any of these is a production bug.
- Don't ignore: The differences between lwIP and Linux sockets — code that works on one will not necessarily work on the other, especially around threading, select() limits, and error codes.
Follow-up Questions
- "How would you implement a reconnection strategy with exponential backoff for a TCP client?"
- "What is Nagle's algorithm and when would you disable it with TCP_NODELAY?"
- "How does edge-triggered epoll differ from level-triggered, and what bugs can edge-triggered cause?"
- "How would you design a socket abstraction layer that works on both Linux and lwIP?"
Practice
❓ What does accept() return on a TCP server?
❓ What does recv() return on a non-blocking socket when no data is available?
❓ Which I/O multiplexing mechanism is available on lwIP (RTOS)?
❓ Why should you set SO_REUSEADDR on a TCP server socket before calling bind()?
❓ What signal does Linux send when you write to a TCP socket whose peer has closed the connection?
Real-World Tie-In
IoT Gateway with lwIP -- A Cortex-M4 gateway running FreeRTOS and lwIP needed to handle 8 sensor nodes reporting over TCP simultaneously. With lwIP's FD_SETSIZE set to 8, the select()-based event loop could barely fit all connections. We switched the sensor protocol to UDP (sendto()/recvfrom()) — eliminating per-connection state entirely — and reserved the single TCP connection for cloud MQTT. Memory usage dropped from 32 KB to 6 KB and the gateway could support 50+ sensors.
Automotive Diagnostics Server -- A Linux-based vehicle diagnostics server used epoll() to handle simultaneous connections from multiple diagnostic tools. A subtle bug caused occasional data corruption: the server used edge-triggered epoll but did not drain the receive buffer completely on each event. When two packets arrived back-to-back, the second was silently lost because edge-triggered mode only notifies on new arrivals. Switching to level-triggered mode fixed the issue with no measurable performance impact — a reminder that edge-triggered epoll requires careful coding that is rarely worth the complexity in embedded.
Smart Meter with Reconnection -- A utility smart meter connected to a cloud server over cellular TCP. The cellular link dropped every 6-8 hours due to carrier policy. The original code did not handle ECONNRESET or detect silent connection death. We added SO_KEEPALIVE (30-second interval), application-level heartbeats, and an exponential backoff reconnection loop (1s, 2s, 4s, 8s, max 60s). After the fix, the meter achieved 99.97% uptime over 12 months with zero manual interventions.
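The backoff schedule from the smart-meter fix (1 s doubling to a 60 s ceiling) is a few lines of arithmetic. This helper is a sketch; in practice you would also add random jitter so that a fleet of meters does not reconnect in lockstep:

```c
/* Exponential backoff with a ceiling: 1 s, 2 s, 4 s, ... capped at
 * 60 s. attempt is 0-based; returns the delay in milliseconds. */
unsigned backoff_ms(unsigned attempt)
{
    const unsigned base_ms = 1000, max_ms = 60000;

    /* Stop shifting once past the cap (also avoids undefined
     * behavior from shifting beyond the width of unsigned). */
    if (attempt >= 6)                /* 1000 << 6 = 64000 > 60000 */
        return max_ms;

    unsigned delay = base_ms << attempt;
    return delay > max_ms ? max_ms : delay;
}
```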