Quick Cap
Embedded Linux systems often split functionality across multiple processes for isolation, security, and reliability — one process handles the sensor, another runs the network stack, a third manages the UI. These processes need to communicate, and Linux provides a rich set of inter-process communication (IPC) mechanisms, each optimized for different patterns. The interview question is always: "which IPC would you use for X, and why?"
Key Facts:
- Pipes: Simplest IPC. Unidirectional, parent-child only (anonymous) or named (FIFO). Good for streaming data.
- Shared memory: Fastest IPC (zero-copy). Requires explicit synchronization (mutexes/semaphores). Best for large data buffers.
- Unix domain sockets: Most flexible. Bidirectional, stream or datagram, supports file descriptor passing. Used by systemd, D-Bus.
- Message queues (POSIX): Structured messages with priority. Good for command/event passing between processes.
- Signals: Lightweight notifications (no payload). Limited to predefined signal numbers. Not for data transfer.
- D-Bus: High-level message bus built on Unix sockets. Standard for Linux system services (BlueZ, NetworkManager, systemd).
Deep Dive
At a Glance
| Mechanism | Direction | Data Type | Latency | Synchronization | Best For |
|---|---|---|---|---|---|
| Pipe | Unidirectional | Byte stream | Low | Built-in (blocking read/write) | Parent-child streaming |
| Named pipe (FIFO) | Unidirectional | Byte stream | Low | Built-in | Unrelated processes, streaming |
| Shared memory | Bidirectional | Any (raw bytes) | Lowest (zero-copy) | Manual (mutex/semaphore) | Large data buffers, sensor frames |
| Unix socket | Bidirectional | Stream or datagram | Low | Built-in | General-purpose, most flexible |
| Message queue | Bidirectional | Structured messages | Low | Built-in, with priority | Command/event passing |
| Signal | Unidirectional | Signal number only | Very low | Async (interrupts process) | Notifications, no payload |
| D-Bus | Bidirectional | Typed messages | Higher (~100 us) | Built-in | System service APIs |
Pipes and FIFOs
Anonymous pipes are the simplest IPC — created with pipe(), they provide a unidirectional byte channel between a parent and child process. The shell's | operator uses pipes: cat file | grep pattern creates a pipe between cat and grep.
Named pipes (FIFOs) appear as files in the filesystem (created with mkfifo). Any process that can access the file can open it for reading or writing. This allows unrelated processes to communicate without a parent-child relationship.
Limitations: Pipes are unidirectional (you need two for bidirectional), have a kernel buffer (typically 64 KB on Linux), and block when the buffer is full (writer) or empty (reader). For bidirectional communication between unrelated processes, Unix sockets are almost always a better choice.
Shared Memory
Shared memory is the fastest IPC because data is not copied between processes — both processes map the same physical memory pages into their address space. The kernel is not involved in the data transfer (only in setup and teardown).
Two APIs:
- POSIX (shm_open + mmap): Creates a named shared memory object in /dev/shm/. Preferred for new code.
- System V (shmget + shmat): Older API, still widely used. Uses integer keys for identification.
The catch: Shared memory has no built-in synchronization. If two processes write to the same memory region simultaneously, data corruption occurs. You must pair shared memory with:
- POSIX mutexes (pthread_mutex_t with PTHREAD_PROCESS_SHARED) for mutual exclusion
- POSIX semaphores (sem_open) for signaling between processes
- Lock-free data structures (ring buffers with separate read/write indices) for high-performance paths
| Shared Memory Pattern | Synchronization | Use Case |
|---|---|---|
| Single writer, single reader | Lock-free ring buffer | Sensor data streaming |
| Multiple writers, single reader | Mutex-protected queue | Event aggregation |
| Multiple readers, single writer | Read-write lock or RCU-like pattern | Configuration broadcast |
The most common IPC bug in embedded Linux: two processes using shared memory without any locking. It works in testing (low load, deterministic scheduling) but corrupts data under production load when processes run on different CPU cores simultaneously. Always pair shared memory with explicit synchronization.
Unix Domain Sockets
Unix domain sockets are the most versatile IPC mechanism. They use the socket API (socket, bind, connect, send, recv) but communicate locally through a filesystem path instead of a network address.
Why Unix sockets are preferred over TCP for local IPC:
| Feature | Unix Domain Socket | TCP Loopback |
|---|---|---|
| Latency | Lower (no TCP/IP stack) | Higher (full protocol processing) |
| Overhead | No checksums, no sequence numbers | Full TCP overhead |
| File descriptor passing | Yes (SCM_RIGHTS) | No |
| Credential passing | Yes (SO_PEERCRED) | No |
| Datagram mode | Yes (reliable, unlike UDP) | No |
Unix sockets support both stream mode (connection-oriented, like TCP) and datagram mode (connectionless like UDP, but reliable and order-preserving). systemd uses Unix sockets for socket activation, D-Bus is built on Unix sockets, and most Linux system daemons use them for local communication.
D-Bus
D-Bus is a high-level message bus that provides typed, structured inter-process communication with service discovery, method calls, signals (events), and property access. It is built on Unix domain sockets but adds a protocol layer.
Two buses:
- System bus: System-wide services (NetworkManager, BlueZ Bluetooth, UPower). One instance for the whole machine; access is controlled by D-Bus policy files.
- Session bus: Per-user services (desktop applications). Rarely used on headless embedded systems.
When to use D-Bus in embedded:
- When you need a standard interface that other Linux services already use (Bluetooth via BlueZ API)
- When you need service discovery ("which services are available on the bus?")
- When you need a strongly-typed API between processes written in different languages
When NOT to use D-Bus:
- High-frequency data streaming (D-Bus marshaling adds ~100 us per message)
- Resource-constrained devices (the D-Bus daemon uses 1-2 MB RAM)
- Simple point-to-point communication (Unix sockets are simpler and faster)
Signals
Signals are the most primitive IPC — they deliver an integer (signal number) to a process asynchronously. The process can catch the signal with a handler, ignore it, or let the default action occur (usually terminate).
Commonly used signals in embedded:
| Signal | Default | Embedded Use |
|---|---|---|
| SIGTERM | Terminate | Graceful shutdown (flush data, close connections) |
| SIGKILL | Kill (uncatchable) | Force-kill unresponsive process |
| SIGHUP | Terminate | Reload configuration (by convention) |
| SIGUSR1 | Terminate | Application-defined (toggle debug mode) |
| SIGCHLD | Ignore | Reap child processes (avoid zombies) |
Limitations: Signals cannot carry data (only the signal number), are not queued (multiple same signals may be merged), and signal handlers run asynchronously in the process context — only async-signal-safe functions can be called in handlers.
Choosing the Right IPC
| Scenario | Best IPC | Why |
|---|---|---|
| Streaming sensor data between processes | Shared memory + ring buffer | Zero-copy, lowest latency |
| Command/response between services | Unix domain socket (stream) | Bidirectional, reliable, simple |
| Event notifications across system | D-Bus signals | Service discovery, typed events |
| Parent launches child, pipes output | Anonymous pipe | Simplest, built-in to fork/exec |
| "Reload config" signal to daemon | SIGHUP | Convention, no data needed |
| Passing a file descriptor to another process | Unix socket + SCM_RIGHTS | Only mechanism that supports this |
Debugging Story: Shared Memory Corruption in a Camera System
An embedded Linux camera system had two processes: a capture process writing frames to shared memory and a compression process reading them. During development with a single-core CPU, it worked perfectly. When deployed on a dual-core SoC, the compression process occasionally produced garbled images.
The root cause: both processes accessed the shared memory buffer without synchronization. On a single core, the scheduler ensured only one ran at a time. On dual-core, both ran simultaneously — the capture process was writing a new frame while the compression process was reading the previous one, resulting in a "torn" frame (half old, half new).
The fix: implement a double-buffer scheme with a lock-free swap mechanism. The capture process writes to buffer A while the compression process reads buffer B. When capture completes, an atomic pointer swap makes buffer A the "read" buffer. No mutex needed, no latency added.
The lesson: IPC bugs that depend on timing are the hardest to find. They often hide on single-core systems and only appear on multi-core. Always design for concurrent access from the start, even if your current hardware is single-core.
What Interviewers Want to Hear
- You can compare IPC mechanisms by latency, complexity, and use case — not just list them
- You understand that shared memory requires explicit synchronization
- You know Unix sockets are preferred over TCP loopback for local IPC (lower overhead, FD passing)
- You can recommend the right IPC for a specific architecture
- You know when D-Bus is appropriate (service APIs) vs overkill (simple data streaming)
- You understand signal limitations (no payload, async-unsafe handler context)
Interview Focus
Classic Interview Questions
Q1: "Compare shared memory and Unix domain sockets for IPC. When would you use each?"
Model Answer Starter: "Shared memory is the fastest — zero-copy, both processes access the same physical pages. But it requires explicit synchronization (mutexes or lock-free structures) and has no built-in flow control. I use it for high-bandwidth data like camera frames or audio buffers where latency matters. Unix domain sockets are slightly slower (data is copied through the kernel) but provide built-in flow control, connection management, and sequencing. I use them for command/response communication between services. For most embedded IPC, I default to Unix sockets unless profiling shows the copy overhead is a bottleneck."
Q2: "How do you synchronize access to shared memory between two processes?"
Model Answer Starter: "Three approaches depending on the access pattern. For single-writer, single-reader streaming data: a lock-free ring buffer with separate read and write indices, both stored in the shared region. For multiple writers: a POSIX mutex initialized with PTHREAD_PROCESS_SHARED attribute, stored in the shared memory itself. For a producer-consumer pattern: a POSIX semaphore (sem_open) to signal data availability. The key is matching the synchronization to the access pattern — a ring buffer is fastest for streaming, but a mutex is needed for random-access shared state."
Q3: "What is D-Bus and when would you use it vs a Unix socket?"
Model Answer Starter: "D-Bus is a message bus protocol built on Unix sockets that adds service discovery, typed method calls, property access, and broadcast signals. I use it when interfacing with existing Linux system services (BlueZ for Bluetooth, NetworkManager for networking) because they already expose D-Bus APIs. For custom application-level IPC where I control both sides, I use raw Unix sockets — simpler, faster, no daemon dependency. D-Bus adds about 100 us of marshaling overhead per message and requires the dbus-daemon process, so it is not suitable for high-frequency data transfer."
Q4: "What are the limitations of using signals for IPC?"
Model Answer Starter: "Signals are notifications only — they carry no data beyond the signal number. Standard signals are not queued: if SIGUSR1 is sent twice while the handler for the first is running, the second may be lost. Signal handlers run asynchronously and can only call async-signal-safe functions — no malloc, no printf, no mutex operations. The safe pattern is to set a volatile flag in the handler and check it in the main loop. For anything beyond simple notifications, use a proper IPC mechanism."
Q5: "You need to stream 30 FPS camera frames (1 MB each) between two processes. Which IPC would you choose?"
Model Answer Starter: "Shared memory with a double-buffer or ring buffer scheme. At 30 MB/s, copying data through pipes or sockets would consume significant CPU and add latency. With shared memory, the capture process writes directly to a buffer, then atomically swaps the buffer pointer. The consumer reads the previous buffer. Zero copy, sub-millisecond latency. I would use POSIX shared memory (shm_open + mmap) with cache-line-aligned buffers, and a lock-free swap mechanism using atomic operations for the buffer index."
Trap Alerts
- Don't say: "Just use shared memory, it's the fastest" — without mentioning synchronization requirements
- Don't forget: Unix domain socket datagrams ARE reliable (unlike UDP) — a common misconception
- Don't ignore: D-Bus overhead — it is inappropriate for high-frequency data but perfect for service APIs
Follow-up Questions
- "How would you implement a watchdog that monitors multiple processes using IPC?"
- "What is socket activation in systemd and how does it relate to IPC?"
- "How do you pass a file descriptor from one process to another?"
- "What is the difference between POSIX and System V shared memory?"
Practice
❓ Which IPC mechanism provides zero-copy data transfer between processes?
❓ Why are Unix domain sockets preferred over TCP loopback (127.0.0.1) for local IPC?
❓ What happens if two processes write to shared memory simultaneously without synchronization?
❓ A signal handler calls printf() to log the signal. What is wrong with this?
Real-World Tie-In
Automotive Sensor Fusion — An ADAS system runs camera capture, radar processing, and fusion algorithm as separate processes for fault isolation (if camera crashes, radar continues). Camera frames (2 MB, 30 FPS) go through shared memory with double buffering. Radar data (1 KB commands) goes through Unix domain sockets. The fusion process reads from both. This architecture survives individual process crashes without losing the other sensor feeds.
Smart Home Hub — A home automation gateway uses D-Bus as its central message bus. The Zigbee process publishes device events on D-Bus, the automation engine subscribes to events and sends commands back, and the web UI process queries device state via D-Bus properties. D-Bus service discovery means new protocol handlers (Z-Wave, Matter) can be added as drop-in services without modifying existing code.