Comprehensive Survey: P2P Signaling and Relay Server Technologies
A self-hosted P2P signaling and relay server is core infrastructure for cross-network connectivity, remote access, and mesh VPN scenarios. This article systematically surveys the technology landscape along three dimensions: protocol standards (STUN/TURN/ICE/BEHAVE), mainstream products (Tailscale, Nebula, NetBird, ZeroTier, Headscale, OpenZiti, etc.), and frameworks and algorithms (libp2p, WebRTC, Kademlia DHT).
All technical descriptions are verified against primary sources: the original RFCs, academic papers, and official documentation. Key statistics are cited.
Core Concept Quick Reference
Before diving into technical details, here’s a quick reference table for the core concepts covered in this article. Each concept is explored in detail in subsequent sections.
| Concept | One-Line Explanation |
|---|---|
| NAT | Network Address Translation—maps private internal IPs to a public IP, letting multiple devices share one internet exit point |
| STUN | Protocol that lets a device discover its own public address (“Who am I to the outside world?”) |
| TURN | Protocol for relaying traffic through a server when direct connection fails (“Help me pass messages”) |
| ICE | Complete solution orchestrating STUN + TURN, automatically finding the optimal connection path (“The conductor”) |
| DERP | Tailscale’s encrypted relay protocol over HTTPS port 443, extremely difficult to block |
| WireGuard | Modern VPN protocol (~4,000 lines of code) using Curve25519 + ChaCha20-Poly1305 |
| Noise Protocol | Cryptographic protocol framework for building secure transport channels; Nebula is built on this |
| Signaling Channel | Control channel independent of business traffic, used to exchange addresses and coordinate hole-punching (“side channel”) |
| Hole Punching | Both sides send packets to each other simultaneously, opening mappings in their respective NATs to establish direct connection |
| Control / Data Plane | Control plane manages “who can connect to whom”; data plane manages “how actual data flows”—kept separate |
| DHT | Distributed Hash Table—decentralized P2P node discovery and data location mechanism with no central server |
| Mesh VPN | Mesh-topology VPN where nodes interconnect directly rather than through a central hub, eliminating single points of failure |
| Lighthouse | Nebula’s discovery node, similar to DNS—answers “where is node X?” |
| PKI / CA | Public Key Infrastructure / Certificate Authority—used for identity authentication and encryption |
| CGNAT | Carrier-Grade NAT—ISPs have many users share few public IPs, a common obstacle to hole-punching |
NAT Traversal: The First Hurdle for Direct Connectivity
When two devices behind different NATs want to establish a direct P2P connection, the biggest obstacle is the NAT itself. NAT (Network Address Translation) maps private internal IP addresses to public IP addresses; home routers and ISP gateways all do this. It lets multiple devices share one public IP, but it also means external hosts cannot initiate connections to internal devices. Resolving this tension is the goal of every traversal technique.
Understanding NAT behavior models is the theoretical foundation for all traversal techniques.
The Two-Dimensional NAT Behavior Model
RFC 4787 (BCP 127) describes NAT behavior using two independent dimensions, replacing the earlier coarse Cone/Symmetric classification:
- Mapping Behavior: Does the NAT reuse the same external mapping for the same internal (IP, port) when communicating with different destinations? In other words, when you access website A and website B from home, does your router assign the same external port?
  - Endpoint-Independent Mapping: Always reuses the same mapping (≈ the mapping behavior shared by all three cone types). The external port is the same regardless of destination; this is the best case, and hole punching is easiest
  - Address-Dependent Mapping: Different destination IP = different mapping. The external port differs when accessing different IPs
  - Address and Port-Dependent Mapping: Different destination IP or port = different mapping (≈ Symmetric). This is the worst case; hole punching is nearly impossible
- Filtering Behavior: When does the NAT allow inbound packets to a mapping? That is, what conditions must an external packet meet to be forwarded to the internal host?
  - Endpoint-Independent Filtering: Any external host can send; most permissive
  - Address-Dependent Filtering: Only if the internal host previously sent to that IP; only replies from hosts you have contacted get through
  - Address and Port-Dependent Filtering: Only if the internal host previously sent to that (IP, port); most strict
RFC 4787 REQ-1 states that a NAT MUST have Endpoint-Independent Mapping behavior; without it, nearly all hole-punching techniques fail and peers are forced to rely on relays.
Four Classic NAT Types
Although RFC 3489’s classification was obsoleted by RFC 5389 (for “causing significant confusion”), its four-type model remains the best introduction to understanding hole-punching failures:
| Type | Mapping Characteristic | External Reachability | Hole Punching |
|---|---|---|---|
| Full Cone | Same internal (IP,port) → same external (IP,port) | Any external host | ✅ Easiest |
| Restricted Cone | Same as above | Only IPs previously contacted | ✅ Needs coordination |
| Port Restricted Cone | Same as above | Only (IP,port) pairs previously contacted | ✅ Needs precise coordination |
| Symmetric | Different destinations → different mappings | Only the exact (IP,port) the internal host sent to | ❌ Nearly impossible |
Symmetric NAT is the biggest obstacle to P2P direct connections—it assigns different external ports for each destination, making the public address exchanged via rendezvous servers invalid when used for a different peer.
A Rendezvous Server is a public intermediary server where both peers register their address information; it relays each peer’s address to the other, preparing for hole-punching.
UDP Hole Punching
UDP hole punching is the simplest and most robust NAT traversal technique. Bryan Ford et al. systematically documented and measured it in their seminal USENIX ATC 2005 paper:
sequenceDiagram
participant A as Node A
participant S as Rendezvous Server
participant B as Node B
A->>S: Register (report own address)
B->>S: Register (report own address)
S-->>A: Return B's public mapping
S-->>B: Return A's public mapping
Note over A,B: Both send packets to each other simultaneously
A->>B: UDP packet (opens hole in local NAT)
B->>A: UDP packet (opens hole in local NAT)
Note over A,B: NAT mappings established, bidirectional comms successful
Core flow: Both peers register with a public rendezvous server → server tells each peer the other’s public reflexive address → both simultaneously send packets from local sockets to the other’s public address → outbound packets create mappings (“holes”) in their respective NATs → subsequent packets from the peer are treated as “responses” and passed through.
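The flow above can be sketched in Python. This is a minimal, hypothetical illustration: `RendezvousServer`, `punch`, and `open_peer_socket` are names invented here, the server is an in-memory dictionary rather than a network service, and both peers run on loopback, so no real NAT is involved. The code only shows the order of operations (register, exchange addresses, send simultaneously).

```python
import socket


class RendezvousServer:
    """Hypothetical in-memory rendezvous server mapping peer IDs to observed addresses."""

    def __init__(self) -> None:
        self.peers: dict[str, tuple[str, int]] = {}

    def register(self, peer_id: str, observed_addr: tuple[str, int]) -> None:
        # A real server records the source address it observes on the
        # registration packet, i.e. the address after NAT translation.
        self.peers[peer_id] = observed_addr

    def lookup(self, peer_id: str) -> tuple[str, int]:
        return self.peers[peer_id]


def open_peer_socket() -> socket.socket:
    """UDP socket bound to an ephemeral loopback port (stands in for a peer behind NAT)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    sock.settimeout(2.0)
    return sock


def punch(sock: socket.socket, peer_addr: tuple[str, int],
          payload: bytes = b"punch", attempts: int = 3) -> None:
    """Send several packets to the peer's public mapping.

    Behind a real NAT, the first outbound packet creates the local mapping
    ("hole"); repeats cover the case where early packets reach the peer's NAT
    before the peer has opened its own mapping and are dropped.
    """
    for _ in range(attempts):
        sock.sendto(payload, peer_addr)


if __name__ == "__main__":
    server = RendezvousServer()
    a, b = open_peer_socket(), open_peer_socket()
    # Both peers register; here the "observed" address is simply the bound address.
    server.register("A", a.getsockname())
    server.register("B", b.getsockname())
    # Each peer learns the other's address and sends to it at roughly the same time.
    punch(a, server.lookup("B"))
    punch(b, server.lookup("A"))
    data, src = a.recvfrom(1500)
    print(f"A received {data!r} from {src}")
```

Across real NATs the same sequence only works when the mappings are endpoint-independent, which is why the NAT behavior model above matters.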
Measured data (Ford et al., USENIX ATC 2005): approximately 82% of NATs support UDP hole punching, approximately 64% support TCP hole punching. The paper also first reliably demonstrated P2P TCP stream establishment via simultaneous-open.
TCP Hole Punching: Feasible but Fragile
TCP hole punching exploits the simultaneous-open mode defined in RFC 793—in a normal TCP connection, one side sends SYN (active open) and the other replies with SYN-ACK (passive open); with simultaneous-open, both sides send SYN almost simultaneously, SYNs cross in the network, each responds with SYN-ACK, and the connection establishes. This requires NAT to support endpoint-independent mapping and correctly handle simultaneous-open (RFC 5382 REQ-2).
TCP hole punching is more fragile than UDP hole punching because the TCP state machine is complex (every unusual ordering of SYN, SYN-ACK, and ACK packets must be handled correctly), and some NATs drop unsolicited inbound SYNs or mistranslate the outbound SYN-ACK. Common practice is therefore to use UDP as the base protocol (e.g., QUIC over UDP).
QUIC is a transport protocol originally developed at Google and later standardized by the IETF (RFC 9000); HTTP/3 is built on it. QUIC runs over UDP with built-in TLS 1.3 encryption and multiplexes independent streams, avoiding TCP’s head-of-line blocking. Using UDP as the carrier gives TCP-like reliable transport while bypassing the difficulties of TCP hole punching.
The Standardized Traversal Protocol Stack
The NAT traversal technology stack is layered bottom-up, with each layer solving different problems:
flowchart TD
A["BEHAVE Specs<br/>Define NAT behavior"] --> B["Hole Punching<br/>UDP/TCP Techniques"]
B --> C["STUN<br/>Discover reflexive addr"]
C --> D["TURN<br/>Relay fallback"]
C --> E["ICE<br/>Orchestrates STUN+TURN"]
E --> F["DERP<br/>Encrypted relay supplement"]
style A fill:#fff3e0,stroke:#FF9800
style B fill:#e3f2fd,stroke:#2196F3
style C fill:#e3f2fd,stroke:#2196F3
style D fill:#e3f2fd,stroke:#2196F3
style E fill:#f3e5f5,stroke:#9C27B0
style F fill:#e8f5e9,stroke:#4CAF50
The bottom layer is the BEHAVE working group’s NAT behavioral specifications: they don’t define traversal techniques but rather define “how a well-behaved NAT should behave.” Above that are the hole-punching techniques (the actual traversal methods), then STUN (an address-discovery tool) and TURN (a relay-fallback tool). ICE orchestrates these tools into a complete solution, and at the top sits Tailscale’s DERP, a supplemental relay design.
STUN: Discovering Your Public Address
STUN is specified in RFC 5389 (2008, J. Rosenberg et al.), which has since been obsoleted by RFC 8489. STUN (Session Traversal Utilities for NAT) was originally named “Simple Traversal of UDP through NAT” (RFC 3489); it was renamed when its role changed from a complete solution to a tool. It is no longer a standalone system for solving NAT traversal, but a building block that other protocols call.
The core method is Binding: the client sends a Binding Request to the STUN server; the packet’s source address is rewritten by the NAT; the server returns the observed public source address (the reflexive transport address—“what address do you look like from outside the NAT”) in the XOR-MAPPED-ADDRESS attribute.
Simply put, STUN is like asking a friend, “When I call you, what number shows up on your caller ID?” That number is your public mapped address.
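A minimal sketch of the Binding exchange at the byte level, following the RFC 5389 message layout: a 20-byte header carrying the fixed magic cookie 0x2112A442, and an XOR-MAPPED-ADDRESS attribute (type 0x0020) in the response. It handles only IPv4 and skips MESSAGE-INTEGRITY and FINGERPRINT checks, so treat it as illustrative rather than a complete STUN client; the function names are hypothetical.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020
FAMILY_IPV4 = 0x01


def build_binding_request(txid: bytes = b"") -> bytes:
    """Build a 20-byte Binding Request: type, length (no attributes), cookie, transaction ID."""
    txid = txid or os.urandom(12)          # 96-bit transaction ID
    if len(txid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txid


def parse_xor_mapped_address(msg: bytes) -> tuple[str, int]:
    """Extract the IPv4 reflexive (ip, port) from a Binding success response."""
    msg_type, length, cookie = struct.unpack_from("!HHI", msg, 0)
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", msg, offset)
        value = msg[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == FAMILY_IPV4:
            xport, xaddr = struct.unpack_from("!HI", value, 2)
            port = xport ^ (MAGIC_COOKIE >> 16)   # port is XORed with the cookie's top 16 bits
            addr = xaddr ^ MAGIC_COOKIE           # IPv4 address is XORed with the whole cookie
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + ((attr_len + 3) & ~3)       # attribute values are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")
```

To use it against a real server, send `build_binding_request()` from a UDP socket to the server's STUN port (3478 by default) and pass the reply to `parse_xor_mapped_address`. A production client would also match the transaction ID and retransmit on timeout.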
STUN is a tool, not a complete solution—it’s embedded into ICE, SIP Outbound, and other complete frameworks via “STUN usages.”
Important caveat: RFC 5389 states that classic STUN’s (RFC 3489) algorithm for classifying NAT types was found to be “faulty”, because many real NATs do not fit cleanly into the defined types.
TURN: Relay Fallback
RFC 5766 (2010), since obsoleted by RFC 8656. TURN (Traversal Using Relays around NAT) is an extension of STUN (most TURN messages use the STUN format). When both peers are behind “poorly behaved” NATs, hole punching fails and a TURN relay server must forward the packets, like a post office forwarding your mail.
The client creates an Allocation on the server—essentially “renting” a relay port—obtaining a relayed transport address. Peers send to this address, and the server forwards to the client.
Key mechanisms:
- Permissions: Controls which peers can communicate through this relay, preventing unauthorized access
- Channels: A more efficient data path. Once a peer is bound to a channel, each packet carries a compact 4-byte ChannelData header (16-bit channel number plus 16-bit length) instead of full address information, reducing overhead
- Allocation refresh: Relay allocations have a lifetime and must be refreshed periodically to prevent resource leaks
- Unique feature: A single relayed address can communicate with multiple peers simultaneously (designed for SIP forking—routing one call to multiple destinations)
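The channel mechanism is easiest to see in the framing. The sketch below builds and parses a ChannelData message: a 16-bit channel number and a 16-bit payload length followed by the application data. The channel range 0x4000 to 0x4FFF follows RFC 8656 (RFC 5766 allowed up to 0x7FFF). The function names are hypothetical, and channel binding itself (the ChannelBind request) is not shown.

```python
import struct

CHANNEL_MIN, CHANNEL_MAX = 0x4000, 0x4FFF   # valid channel numbers per RFC 8656


def encode_channel_data(channel: int, payload: bytes, pad: bool = False) -> bytes:
    """Frame payload as ChannelData: 2-byte channel number + 2-byte length + data."""
    if not CHANNEL_MIN <= channel <= CHANNEL_MAX:
        raise ValueError(f"channel number {channel:#06x} out of range")
    frame = struct.pack("!HH", channel, len(payload)) + payload
    if pad:  # over TCP/TLS the message must be padded to a multiple of 4 bytes
        frame += b"\x00" * (-len(frame) % 4)
    return frame


def is_channel_data(frame: bytes) -> bool:
    """STUN messages start with bits 00; ChannelData messages start with bits 01."""
    return len(frame) >= 4 and (frame[0] & 0xC0) == 0x40


def decode_channel_data(frame: bytes) -> tuple[int, bytes]:
    channel, length = struct.unpack_from("!HH", frame, 0)
    return channel, frame[4:4 + length]   # trailing padding, if any, is ignored
```

The saving matters because every relayed packet carries this header, whereas a Send indication repeats the full peer address as a STUN attribute on every packet.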
Design philosophy: TURN servers incur high bandwidth costs (all traffic passes through them) and should only be used as a last resort when ICE cannot find a direct path.
ICE: Orchestrating Everything
RFC 8445 (2018), obsoletes RFC 5245. ICE (Interactive Connectivity Establishment) is a complete NAT traversal solution based on the offer/answer methodology (one side sends a connection offer, the other responds with an answer), orchestrating STUN and TURN.
flowchart TD
A["Gather candidates<br/>Host · SRFLX · Relay"] --> B["Priority sort<br/>Exchange via signaling"]
B --> C["STUN connectivity checks<br/>Test candidate pairs"]
C --> D{"Valid path<br/>found?"}
D -->|"Yes"| E["Nominate & switch<br/>to direct ✓"]
D -->|"No"| F["TURN/DERP<br/>relay fallback"]
style A fill:#e3f2fd,stroke:#2196F3
style E fill:#e8f5e9,stroke:#4CAF50
style F fill:#fce4ec,stroke:#f44336
ICE core phases:
- Gather Candidates: Each side collects multiple candidate types:
  - Host candidates: All local network interface addresses (e.g., 192.168.1.100)
  - Server-Reflexive candidates (SRFLX): Public NAT mappings obtained via STUN
  - Relayed candidates: Relay addresses obtained via TURN
  - Peer-Reflexive candidates may also be discovered at runtime: addresses observed by the peer during connectivity checks
- Priority Sort & Exchange: Compute each candidate’s priority with the RFC 8445 formula (the recommended type preferences rank Host > Peer-Reflexive > SRFLX > Relay), then exchange candidate information over a signaling channel (e.g., in SDP, the Session Description Protocol, a plain-text format for describing connection parameters)
- Connectivity Checks: Test candidate pairs with STUN Binding requests (“Does this path work?”), using triggered checks, role-conflict resolution (ICE-Controlling / ICE-Controlled; the controlling agent leads nomination), and the USE-CANDIDATE nomination mechanism
- Nominate & Conclude: Select the valid pair, conclude ICE processing, and release surplus candidates
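The priority formula is defined in RFC 8445: candidate priority = 2^24 × type preference + 2^8 × local preference + (256 − component ID), and a candidate pair's priority is 2^32 × MIN(G, D) + 2 × MAX(G, D) + (G > D ? 1 : 0), where G and D are the priorities of the controlling and controlled agents' candidates. The sketch below uses the RFC's recommended type preferences (host 126, peer-reflexive 110, server-reflexive 100, relayed 0); the function names and the single-interface local preference of 65535 are assumptions for illustration.

```python
# Recommended type preferences from RFC 8445
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """RFC 8445 candidate priority; higher is preferred."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(g: int, d: int) -> int:
    """Pair priority: g = controlling agent's candidate, d = controlled agent's candidate."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


if __name__ == "__main__":
    prio = {t: candidate_priority(t) for t in TYPE_PREFERENCE}
    print(prio)
    # Order every local/remote type combination the way a checklist would.
    pairs = sorted(((local, remote) for local in prio for remote in prio),
                   key=lambda p: pair_priority(prio[p[0]], prio[p[1]]), reverse=True)
    print(pairs[:3])
```

Because MIN(G, D) dominates the pair priority, a pair is ranked by its weaker side: a host-to-relay pair sorts below any pair in which both sides are server-reflexive or better, which is what keeps relays as the last resort.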
ICE’s value: compatibility with all network topologies—it doesn’t assume any single technique works, but collects all possible paths and tests each, ultimately selecting the optimal one.
DERP: Tailscale’s Encrypted Relay Innovation
DERP (Designated Encrypted Relay for Packets) is Tailscale’s proprietary relay protocol, not an IETF RFC standard. The authoritative sources are Tailscale’s official documentation and source code.
Design highlights:
- Zero-knowledge forwarding: DERP never terminates or decrypts WireGuard encryption—it blindly forwards already-encrypted traffic. Tailscale private keys never leave the local device, so the DERP server cannot decrypt any traffic, even if compromised
- HTTPS (TCP 443): Port 443 is the standard HTTPS port, allowed by virtually all networks. Blocking DERP requires blocking 443, which simultaneously blocks all normal web access—nearly impossible to do without attracting attention
- Public key addressing: Uses curve25519 (an elliptic curve cryptographic algorithm providing 128-bit security) public keys as routing addresses—not relying on IP addresses; your public key is your “street address”
- Dual-stack: Full IPv4/IPv6 support, can bridge v4-only and v6-only networks
- Multi-region routing: The coordination server distributes a DERP Map (list of DERP servers), and clients select their home DERP (primary DERP node) by network latency
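The design highlights above can be made concrete with a hypothetical in-memory sketch of a DERP-style relay. It is not Tailscale's implementation or wire protocol: `BlindRelay` and its methods are invented names, and a `deque` per client stands in for that client's long-lived HTTPS connection. The relay routes frames purely by destination public key and passes the ciphertext through unchanged; it holds no session keys, so it has nothing to decrypt with.

```python
from collections import deque


class BlindRelay:
    """Hypothetical DERP-style relay: routes opaque frames by destination public key."""

    def __init__(self) -> None:
        # 32-byte Curve25519 public key -> queue standing in for that client's connection
        self.clients: dict[bytes, deque] = {}

    def connect(self, public_key: bytes) -> deque:
        inbox: deque = deque()
        self.clients[public_key] = inbox
        return inbox

    def disconnect(self, public_key: bytes) -> None:
        self.clients.pop(public_key, None)

    def send(self, src_key: bytes, dst_key: bytes, ciphertext: bytes) -> bool:
        """Forward an already-encrypted packet; return False if the peer is not connected here."""
        inbox = self.clients.get(dst_key)
        if inbox is None:
            return False
        # The payload is never inspected or modified; only the key is used for routing.
        inbox.append((src_key, ciphertext))
        return True
```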
DERP also serves as a “side channel” for exchanging ip:port information to coordinate hole-punching timing—most connections first use DERP to exchange information, then upgrade to direct.
flowchart TD
CO["Coordination Server<br/>Distributes DERP Map"] --> DA["DERP Region A<br/>TCP:443"]
CO --> DB["DERP Region B<br/>TCP:443"]
NA["Node A<br/>behind hard NAT"] -->|"HTTPS encrypted<br/>forwarding"| DA
NB["Node B<br/>behind hard NAT"] -->|"HTTPS encrypted<br/>forwarding"| DB
style CO fill:#f3e5f5,stroke:#9C27B0
style DA fill:#fff3e0,stroke:#FF9800
style DB fill:#fff3e0,stroke:#FF9800
style NA fill:#e3f2fd,stroke:#2196F3
style NB fill:#e3f2fd,stroke:#2196F3
The DERP Map is distributed by the coordination server. Clients select the nearest DERP node by latency. When direct UDP paths cannot be established (hard NAT, firewall blocking UDP), DERP forwards encrypted WireGuard packets over TCP 443. If a single node fails, clients switch to other nodes in the same region; if an entire region fails, they switch to the nearest region.
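At the region level, home-region choice and failover reduce to picking the lowest-latency region that is still reachable. A minimal sketch, assuming latency measurements are already available; the function name, region IDs, and numbers are hypothetical, this is not Tailscale's actual selection logic, and node-level failover within a region is not modeled.

```python
def choose_home_region(latency_ms: dict[int, float], unreachable: frozenset = frozenset()) -> int:
    """Return the reachable DERP region with the lowest measured latency."""
    reachable = {region: ms for region, ms in latency_ms.items() if region not in unreachable}
    if not reachable:
        raise RuntimeError("no reachable DERP region; no relay path available")
    return min(reachable, key=reachable.get)


if __name__ == "__main__":
    latencies = {1: 42.0, 2: 18.5, 3: 95.0}                 # hypothetical probe results
    print(choose_home_region(latencies))                     # → 2
    print(choose_home_region(latencies, frozenset({2})))     # region 2 down → 1
```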
Control Plane and Data Plane: The Industry’s Architectural Consensus
Modern mesh VPNs universally adopt control plane / data plane separation. This is the most important architectural principle.
- Control Plane: Manages “who can connect to whom”—device registration, identity authentication, key distribution, policy enforcement (ACL—Access Control List, defining which nodes can intercommunicate), NAT traversal coordination. Never touches business data
- Data Plane: Manages “how actual data flows”—end-to-end encrypted tunnels between nodes, where data encryption/decryption/routing all happen locally on the node
flowchart TD
CS["Coordination Server<br/>Discovery · Keys · ACL · NAT Coord."]
CS -.->|"Metadata only"| A["Node A"]
CS -.->|"Metadata only"| B["Node B"]
A ===|"P2P encrypted direct<br/>WireGuard/Noise"| B
style CS fill:#fff3e0,stroke:#FF9800
style A fill:#e3f2fd,stroke:#2196F3
style B fill:#e3f2fd,stroke:#2196F3
Dashed lines represent the control plane (metadata exchange: who is online, what their public keys are, what their addresses are); solid lines represent the data plane (end-to-end encrypted tunnels carrying the actual business traffic). The coordination server sees metadata but never business traffic, so even if it is breached, attackers cannot read communication content directly. A compromised control plane could still distribute malicious keys or ACLs, however, so its integrity still matters.
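The separation can be illustrated with a hypothetical control-plane function that computes what each node is told. `Node`, `build_netmap`, and the allow-list ACL format are invented for this sketch; the point is that the output contains only metadata (names, public keys, candidate endpoints) and nothing in the control plane ever handles packet payloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    public_key: str                 # e.g. a base64 WireGuard/Noise public key
    endpoints: tuple[str, ...]      # candidate "ip:port" addresses reported by the node


def build_netmap(nodes: list[Node], acl: dict[str, set[str]], viewer: str) -> list[Node]:
    """Return the peers `viewer` is allowed to reach, per a simple allow-list ACL.

    Only metadata leaves the coordination server; the nodes then build
    encrypted tunnels to each other directly (the data plane).
    """
    allowed = acl.get(viewer, set())
    return [n for n in nodes if n.name != viewer and n.name in allowed]
```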
Architecture Practices of Representative Products
| Product | Control Plane | Data Plane | Relay Fallback |
|---|---|---|---|
| Tailscale | Official coordination server (closed source) | WireGuard (userspace) | DERP (HTTPS/TCP 443) |
| Nebula | Lighthouse (self-hosted) | Noise protocol (built-in) | Lighthouse configurable as relay |
| ZeroTier | Planet Root + Controller | Salsa20/Poly1305 (VL1) | Root forwarding + network relays |
| NetBird | Management + signal + relay | WireGuard | Self-hosted relay |
| Headscale | Self-hosted control plane (Tailscale OSS alternative) | Reuses Tailscale client | Built-in DERP + Peer relays |
| OpenZiti | Controller + Router | Built-in overlay encryption | Edge Router forwarding |
WireGuard is a modern VPN protocol aiming for simplicity (~4,000 lines of code, far smaller than IPSec’s hundreds of thousands of lines), speed, and security. It uses Curve25519 (key exchange), BLAKE2 (hashing), and ChaCha20-Poly1305 (encryption) as modern cryptographic primitives. Tailscale/NetBird/Netmaker all use WireGuard as their data plane foundation.
Noise Protocol Framework is a cryptographic framework for building secure transport protocols (WireGuard is also based on it). Nebula has its own Noise-based transport implementation rather than using WireGuard directly.
Tailscale: DERP’s Dual-Role Design as the Benchmark
Tailscale’s DERP serves a dual role: (1) Connection negotiation intermediary—most connections only use DERP to exchange information before upgrading to direct; (2) Fallback traffic relay when direct connection fails. Connections technically always start via DERP, then concurrently attempt direct hole-punching. On success, they seamlessly switch. Under typical conditions, direct connection success rate exceeds 90%.
The main challenge is symmetric (“hard”) NAT, which assigns a different, often randomized, external port per destination and makes direct P2P nearly impossible; multi-layer NAT, enterprise firewalls that block UDP, and carrier-grade NAT (CGNAT) also trigger DERP fallback. Tailscale also sponsored an Endpoint-Independent Mapping (EIM) patch for the FreeBSD PF firewall, which turns pfSense/OPNsense devices from symmetric into cone NATs and improves hole-punching success rates.
Nebula: Decentralized PKI + Lighthouse
Nebula (open-sourced by Slack) is a mutually-authenticated P2P software-defined network based on the Noise protocol framework.
- Decentralized PKI/CA: Each network has its own Certificate Authority (CA), and certificates assert node IP, name, and group membership. PKI (Public Key Infrastructure) is a system for managing public keys using digital certificates; CA (Certificate Authority) is a trusted third party that issues certificates
- Lighthouse: A discovery node with a fixed, well-known IP, similar to a DNS server: it answers “where is host X?” queries by returning the last known external endpoint. By default the Lighthouse does not forward traffic; it only coordinates discovery. When hole punching fails, the Lighthouse can be configured as a relay
NetBird: Fully Open-Source Tailscale Alternative
NetBird is fully open source and ships a complete self-hostable coordination infrastructure, directly addressing the pain point of Tailscale’s closed-source control plane. The management service, signal server, and relay can all be self-hosted, which suits compliance-sensitive scenarios under GDPR/HIPAA/SOC 2 (EU data protection / US healthcare information / security audit standards). The clients (including iOS/Android) are also fully open source.
Self-Hosting Friendliness Ranking
flowchart TD
A["Fully self-host friendly<br/>NetBird · Nebula<br/>OpenZiti · Headscale"] --> B["Partially self-host<br/>ZeroTier<br/>Netmaker"]
B --> C["Closed control plane<br/>Tailscale<br/>(needs Headscale)"]
style A fill:#e8f5e9,stroke:#4CAF50
style B fill:#fff3e0,stroke:#FF9800
style C fill:#fce4ec,stroke:#f44336
NetBird is the most direct reference implementation due to its full-stack open-source nature with built-in signal server + relay. Headscale implements an open-source self-hosted alternative to Tailscale’s control plane, while the data plane still uses the official open-source client.
The New Paradigm of Decentralized Traversal
libp2p’s NAT traversal system draws from the ICE protocol but removes the dependency on centralized STUN/TURN servers, using distributed coordination instead. libp2p is a modular P2P networking stack developed by Protocol Labs, adopted by major P2P networks like IPFS and Ethereum, providing foundational tools for node discovery, connection establishment, stream multiplexing, and secure communication.
Three Core libp2p Modules
| Module | Function | ICE Equivalent |
|---|---|---|
| AutoNAT | Determines if a node is behind NAT. Requests other peers to dial back its address—success=public (internet-reachable), failure=private (behind NAT) | Similar to STUN |
| Identify | Exchanges info after connection, learning the external public IP:port as observed by the peer. Uses existing connections—no separate STUN infrastructure needed | Similar to decentralized STUN |
| Circuit Relay v2 | Provides lightweight relay for private peers. Requires a reservation first, with strict limits on connections, duration, and data volume, so most public nodes can serve as relays at minimal cost | Similar to TURN, but resource-limited (meant mainly for coordination, not bulk traffic) |
DCUtR: Upgrading from Relay to Direct
DCUtR (Direct Connection Upgrade through Relay) upgrades an established relay connection to a direct connection. Its core idea is to use the existing relay channel to coordinate hole-punching timing, then both sides simultaneously initiate direct dials:
sequenceDiagram
participant I as Initiator
participant R as Relay Node
participant L as Listener
I->>R: Establish relay connection
R->>L: Forward connection request
Note over I,L: Exchange Connect messages (with non-relay addresses)
Note over I,L: Initiator measures RTT, sends Sync
Note over I,L: After half RTT, both dial simultaneously
I->>L: Direct dial (hole punch)
L->>I: Direct dial (hole punch)
Note over I,L: Direct connection established, relay released
Key design: The protocol is extremely lightweight, normally needing only 2 network round trips and exchanging <500 bytes per direction. Both sides dial simultaneously, causing the 5-tuple (source IP, source port, destination IP, destination port, protocol) to match in their respective router state tables, so hole punching succeeds.
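The Sync timing rule can be checked with a little arithmetic. In the sketch below (a hypothetical function, with times in milliseconds on a shared clock purely for illustration), the initiator measures RTT from its Connect message to the listener's reply, sends Sync, and waits RTT/2 before dialing, while the listener dials as soon as Sync arrives. If the relay path is symmetric, both dials start at the same moment; any asymmetry shows up directly as the gap between them.

```python
def dcutr_dial_times(connect_sent: float, connect_reply: float,
                     sync_sent: float, sync_one_way: float) -> tuple[float, float]:
    """Return (initiator_dial, listener_dial) times under DCUtR's Sync rule.

    connect_sent / connect_reply: when the initiator sent Connect and received the reply.
    sync_one_way: actual one-way delay of the Sync message to the listener via the relay.
    """
    rtt = connect_reply - connect_sent        # measured by the initiator
    initiator_dial = sync_sent + rtt / 2      # initiator waits half the RTT after sending Sync
    listener_dial = sync_sent + sync_one_way  # listener dials immediately on receiving Sync
    return initiator_dial, listener_dial


if __name__ == "__main__":
    # Symmetric 80 ms relay RTT: both sides dial at t = 140 ms.
    print(dcutr_dial_times(0, 80, 100, 40))
    # Asymmetric path (Sync takes 55 ms): the dials are 15 ms apart.
    print(dcutr_dial_times(0, 80, 100, 55))
```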
Large-Scale Measurement Data
The IMC 2026 paper (Trautwein et al.) conducted large-scale measurements of DCUtR on the IPFS production network, based on 4.4 million traversal attempts across 85,000+ networks in 167 countries:
Given that relay reservation and public address discovery succeed, the hole-punching stage achieves a conditional success rate of 70% ± 7.1%.
TCP and QUIC have statistically indistinguishable success rates (both ~70%), challenging the conventional wisdom that “UDP traversal is necessarily superior”—DCUtR’s RTT synchronization mechanism makes them equivalent.
97.6% of successful connections are established on the first attempt.
WebRTC: P2P in the Browser
WebRTC (Web Real-Time Communication) is a browser P2P communication framework, with its API standardized by the W3C and its protocols by the IETF, supporting audio/video calls and data transmission. WebRTC uses the ICE framework to systematically find the best communication path.
A notable design choice: the signaling protocol itself is not specified by WebRTC—it only defines how to transport audio/video and data; “how to exchange connection information” is left to the application layer (commonly via WebSocket—a protocol for full-duplex communication over a single TCP connection). Signaling exchange uses SDP (Session Description Protocol), a plaintext protocol containing reachable IP:port candidates, audio/video track counts, codec lists, encryption parameters, etc.
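For a sense of what the signaling channel actually carries, the sketch below parses one ICE candidate line as it appears in SDP (`a=candidate:...`), following the grammar in RFC 8839: foundation, component ID, transport, priority, address, port, `typ` plus the candidate type, and the optional `raddr`/`rport` pair. The sample values are made up, the parser ignores extension attributes, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IceCandidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    cand_type: str                     # host | srflx | prflx | relay
    rel_addr: Optional[str] = None     # related address, for srflx/prflx/relay
    rel_port: Optional[int] = None


def parse_candidate(line: str) -> IceCandidate:
    """Parse an 'a=candidate:' SDP attribute (extension attributes are ignored)."""
    value = line.strip().removeprefix("a=").removeprefix("candidate:")
    fields = value.split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError(f"malformed candidate: {line!r}")
    cand = IceCandidate(fields[0], int(fields[1]), fields[2].lower(), int(fields[3]),
                        fields[4], int(fields[5]), fields[7])
    # Remaining fields are name/value pairs, e.g. "raddr 192.168.1.100 rport 54321".
    extras = dict(zip(fields[8::2], fields[9::2]))
    if "raddr" in extras:
        cand.rel_addr = extras["raddr"]
    if "rport" in extras:
        cand.rel_port = int(extras["rport"])
    return cand
```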
WebRTC’s DataChannel (RFC 8831) supports arbitrary P2P data transmission, built on SCTP over DTLS—SCTP (Stream Control Transmission Protocol) is a multiplexing transport protocol, and DTLS (Datagram TLS) is the UDP version of TLS providing encryption. This combination lets browsers directly transmit arbitrary data between each other.
Classic DHT Algorithms at a Glance
DHT (Distributed Hash Table) is the foundation for node discovery and data location in P2P networks—like a decentralized dictionary with no central server, where each node stores part of the data and cooperatively locates any key. Three classic algorithms laid the routing architecture for modern P2P systems.
Kademlia (2002)
A DHT based on the XOR metric, designed by Petar Maymounkov & David Mazières. Node IDs and keys share the same 160-bit identifier space, and the distance between two identifiers is their XOR (exclusive OR: same bits = 0, different bits = 1) interpreted as an integer. XOR distance satisfies the properties of a metric (d(x,x) = 0, d(x,y) > 0 for x ≠ y, symmetry, and the triangle inequality), which lets lookups converge efficiently on the target.
Core innovations:
- Nodes can learn routing information from received queries—unlike Chord, every Kademlia query simultaneously updates the routing table
- Non-rigid routing table—queries can go to any node within a range, allowing latency-based routing and parallel asynchronous queries
- k-bucket structure: Each node maintains a set of “buckets,” each covering a distance interval and storing the k nearest nodes in that interval. This structure naturally adapts to node churn (joining/leaving)
Widely adopted by eMule (Kad Network), Ethereum node discovery protocol, and BitTorrent Mainline DHT.
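The XOR metric and k-bucket placement are compact enough to show directly. In this sketch, small integers stand in for 160-bit IDs and the function names are hypothetical. A peer belongs in bucket i when its XOR distance lies in [2^i, 2^(i+1)), which is the index of the highest bit in which the two IDs differ; a lookup orders candidates by XOR distance to the target.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs."""
    return a ^ b


def bucket_index(self_id: int, other_id: int) -> int:
    """k-bucket for other_id: position of the highest differing bit."""
    d = xor_distance(self_id, other_id)
    if d == 0:
        raise ValueError("a node does not place itself in a bucket")
    return d.bit_length() - 1


def k_closest(target: int, known_ids: list[int], k: int) -> list[int]:
    """The k known IDs nearest to target, as used to pick the next nodes to query."""
    return sorted(known_ids, key=lambda node: xor_distance(node, target))[:k]


if __name__ == "__main__":
    me = 0b1011
    for peer in (0b1010, 0b1001, 0b0011):
        print(f"{peer:04b} -> bucket {bucket_index(me, peer)}")   # buckets 0, 1, 3
```

Peers that differ from us only in low bits land in low buckets, so each node keeps fine-grained knowledge of its own neighborhood and coarse knowledge of distant regions of the ID space.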
Chord (2001, SIGCOMM)
A DHT based on consistent hashing, designed by Ion Stoica et al. Consistent hashing is a special key distribution strategy that minimizes data redistribution when nodes join/leave (rather than global rehashing), a classic technique for distributed system load balancing.
Core is the finger table: Each node maintains O(log N) routing entries pointing to nodes at specific distances on the identifier ring. Communication overhead and state grow logarithmically with node count—1 million nodes need only ~20 entries—extremely efficient.
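A finger table can be computed directly from its definition: entry i of node n points to successor((n + 2^i) mod 2^m), the first node clockwise from that point on the identifier ring. The sketch below assumes global knowledge of all node IDs (a real Chord node discovers them through lookups), and the function names are hypothetical; the example uses a small three-node ring with IDs 0, 1, 3 and m = 3.

```python
import bisect
import math


def successor(ring: list[int], key: int) -> int:
    """First node ID >= key on the sorted ring, wrapping around to the smallest ID."""
    i = bisect.bisect_left(ring, key)
    return ring[i] if i < len(ring) else ring[0]


def finger_table(n: int, node_ids: list[int], m: int) -> list[int]:
    """finger[i] = successor((n + 2**i) mod 2**m) for i in 0..m-1."""
    ring = sorted(node_ids)
    return [successor(ring, (n + 2 ** i) % 2 ** m) for i in range(m)]


if __name__ == "__main__":
    print(finger_table(1, [0, 1, 3], m=3))    # starts 2, 3, 5 -> nodes 3, 3, 0
    print(math.ceil(math.log2(1_000_000)))    # log2(N) for a million nodes, rounded up: 20
```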
Pastry (2001, Middleware)
A hybrid routing scheme combining numerical routing with prefix matching, designed by Rowstron & Druschel. Node IDs are organized by string prefix: each routing hop increases the common prefix with the destination by one digit, like postal codes progressively narrowing location. Nodes maintain a routing table + leaf set (numerically nearest nodes) + neighborhood set (network-latency-nearest nodes), with O(log N) hops.
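The prefix-matching step can be sketched as follows. IDs are hex strings, and the routing table is modeled as a dict keyed by (row, next digit), where row r holds nodes that share the first r digits with the local node. This is a simplified, hypothetical illustration: real Pastry first checks the leaf set and has a fallback for missing table entries, both omitted here.

```python
from typing import Optional


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two IDs have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def next_hop(self_id: str, key: str, routing_table: dict[tuple[int, str], str]) -> Optional[str]:
    """Forward to a node whose shared prefix with key is one digit longer than ours."""
    if self_id == key:
        return self_id
    row = shared_prefix_len(self_id, key)
    return routing_table.get((row, key[row]))   # None: entry missing (real Pastry falls back)
```

Since each hop extends the shared prefix by at least one digit, a route with base-2^b digits needs at most about log_{2^b} N hops, matching the O(log N) bound above.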
Engineering Wisdom from Relay and Session Resilience
Mosh: The Elegant Paradigm of Stateless Roaming
Mosh (Mobile Shell) is a remote terminal tool developed by MIT’s Keith Winstein & Hari Balakrishnan, solving the problem of SSH frequently disconnecting on mobile networks. Mosh is built on the State Synchronization Protocol (SSP)—a new protocol running over UDP that securely synchronizes terminal state between client and server.
Its core innovation is stateless roaming:
Whenever the server receives an authenticated packet from the client with a sequence number higher than all previously received packets, that packet’s source IP address becomes the new target for the server’s outgoing packets.
This means when the client’s IP address changes (e.g., switching from WiFi to 4G, or cell tower handoff), no reconnection is needed—the server automatically tracks the new address. The client doesn’t even need to know its public IP has changed. This approach provides direct inspiration for “session migration / seamless continuation after IP drift” in P2P relay scenarios.
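The roaming rule quoted above fits in a few lines. In this hypothetical sketch the class and method names are invented, and authentication is reduced to a boolean that a real implementation would derive from verifying the packet's cryptographic authentication. The server's send target changes only for authenticated packets carrying a new highest sequence number, so a delayed packet from an old address cannot pull the session back.

```python
from typing import Optional


class RoamingTarget:
    """Tracks where the server should send, per Mosh-style stateless roaming."""

    def __init__(self) -> None:
        self.highest_seq = -1
        self.addr: Optional[tuple[str, int]] = None

    def on_packet(self, seq: int, src_addr: tuple[str, int], authenticated: bool) -> None:
        if not authenticated:
            return                      # forged or corrupted packets never move the target
        if seq > self.highest_seq:      # only a strictly newer packet can change the address
            self.highest_seq = seq
            self.addr = src_addr
```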
Mosh also has a Local Echo mechanism: the client predicts the effect of keystrokes and speculatively displays them, without waiting for server confirmation → typing remains responsive on poor networks.
Tor / I2P: Layered Relay Anonymity Reference
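The core mechanism of Tor (The Onion Router) is circuit-based anonymity: traffic is wrapped in layers of encryption and sent through a chain of relays, each of which removes one layer. Tor has four types of relay nodes: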
- Guard/Entry Relay: Knows the client IP but not the destination
- Middle Relay: Only knows the previous and next hop IPs
- Exit Relay: Knows the destination but not the client
- Bridge Relay: Used to bypass Tor network blocking
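The structural characteristic is “predecessor/successor knowledge only”: each relay knows just the previous and next hop IPs. As long as the guard and exit relays don’t collude (and no adversary observes both ends of the circuit), no single relay can link the client to its destination.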
I2P (Invisible Internet Project) uses Garlic Routing and unidirectional tunnels—separating inbound and outbound tunnels, each carrying traffic in only one direction. A complete message exchange requires 4 tunnels.
While anonymity isn’t the primary goal of P2P signaling, Tor/I2P’s “zero-knowledge relay” concept echoes DERP’s “don’t decrypt, just forward” design philosophy.
frp / ngrok: Reverse Proxy Engineering Practices
frp (Fast Reverse Proxy) is an open-source tool, written in Go, for exposing services behind NAT or firewalls to the public internet. frp’s separation of a control channel from multiplexed data tunnels is worth studying: frpc (the client, deployed on the internal network) proactively connects outbound to frps (the server, deployed on a public VPS) at bind_port to establish a persistent TCP connection (the control channel); frpc tells frps which services it can proxy; when external traffic reaches a public port on frps, frps notifies frpc via the control channel, and the data is forwarded through multiplexed tunnels.
This aligns with the P2P “signaling channel + data channel separation” architectural principle—control signaling takes one path, business data takes another.
ngrok provides similar functionality as a SaaS service with one-click startup, integrated authentication and monitoring. bore is a lightweight Rust/Tokio alternative. All three share the centralized relay model (different from the P2P direct connection goal), but their reverse-connection engineering practice for bypassing inbound restrictions can be directly borrowed.
Eight Engineering Takeaways for Self-Building Signaling & Relay
Based on this comprehensive survey, eight core conclusions for building your own P2P signaling and relay server:
Signaling Channel is a Prerequisite
All hole-punching techniques depend on an independent “side channel” to exchange candidate addresses and coordinate hole-punching timing. Tailscale uses its coordination server + DERP as this channel; WebRTC requires you to bring your own signaling channel. Project design should first clarify the signaling server’s responsibility boundaries—it only coordinates, never touches business traffic.
Layered Fallback is the Standard Paradigm
Industry consensus: STUN reflexive address discovery → UDP hole punching → TURN/DERP relay as progressive fallback. ICE standardizes this flow; DERP is Tailscale’s HTTPS/TCP relay supplement beyond TURN (harder to block). Don’t expect one technique to solve all scenarios—always have a fallback.
Symmetric NAT is the Primary Obstacle
Endpoint-Dependent Mapping (≈ Symmetric) NATs cannot be hole-punched conventionally—they require relay or port prediction. CGNAT (Carrier-Grade NAT) has many users sharing few public IPs, further exacerbating this problem. BEHAVE (RFC 4787 REQ-1) specifically requires NATs to have endpoint-independent mapping to eliminate this obstacle.
UDP as the Recommended Base Protocol
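TCP hole punching is feasible but fragile. It typically relies on TCP simultaneous open, and Ford et al. (2005) found it worked with only about 64% of the NATs they tested, compared with about 82% for UDP hole punching. UDP is therefore the recommended base protocol. QUIC, which runs over UDP, is the modern choice: it provides reliable transport while sidestepping the difficulties of TCP hole punching.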
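Below is a minimal Go sketch of the UDP hole-punching loop itself, run on loopback. Each peer keeps sending small probes to the address it received over signaling until a probe from the other side arrives. Loopback involves no NAT, so this only demonstrates the simultaneous-send mechanics. The probe payload, the timeouts and the function names are assumptions. A production system would authenticate probes and would usually hand the punched socket to a protocol such as QUIC.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// punch sends probes to the peer's address (learned over signaling) until a
// probe from that address arrives or the deadline passes. Behind real NATs the
// outbound probes are what open the mapping that lets the peer's probes in.
func punch(conn *net.UDPConn, peer *net.UDPAddr, deadline time.Time) error {
	buf := make([]byte, 64)
	for time.Now().Before(deadline) {
		if _, err := conn.WriteToUDP([]byte("punch"), peer); err != nil {
			return err
		}
		conn.SetReadDeadline(time.Now().Add(100 * time.Millisecond))
		n, from, err := conn.ReadFromUDP(buf)
		if err != nil {
			continue // most likely a read timeout: send another probe
		}
		if from.IP.Equal(peer.IP) && from.Port == peer.Port && string(buf[:n]) == "punch" {
			return nil
		}
	}
	return fmt.Errorf("no probe received from %s", peer)
}

func listenLoopback() (*net.UDPConn, error) {
	return net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
}

// punchPair runs both peers at the same time, as they would after exchanging
// addresses through the signaling server.
func punchPair() error {
	a, err := listenLoopback()
	if err != nil {
		return err
	}
	defer a.Close()
	b, err := listenLoopback()
	if err != nil {
		return err
	}
	defer b.Close()

	deadline := time.Now().Add(2 * time.Second)
	errB := make(chan error, 1)
	go func() { errB <- punch(b, a.LocalAddr().(*net.UDPAddr), deadline) }()
	errA := punch(a, b.LocalAddr().(*net.UDPAddr), deadline)
	if err := <-errB; err != nil {
		return err
	}
	return errA
}

func main() {
	fmt.Println("hole punch result:", punchPair()) // hole punch result: <nil>
}
```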
Relay Servers Should Be “Zero-Knowledge”
DERP’s design offers several lessons:

- A relay should never decrypt; it only forwards encrypted packets.
- It should listen on port 443 for reachability.
- It should support dual-stack and multi-region routing.
- It can double as a discovery and coordination channel.

Then even if the relay is breached or compelled to hand over data, it cannot reveal communication content.
Control/Data Plane Decoupling
Several systems keep their control plane out of the traffic path: Tailscale’s coordination server, Nebula’s Lighthouse, ZeroTier’s roots and network controller, and OpenZiti’s Controller. Each handles only coordination, discovery, keys and policy, never business traffic. Self-built systems should follow the same architectural principle. The coordination server can then go down without breaking direct connections that are already established. What stops until it returns is the distribution of new peers, key changes and policy updates.
Session Resilience Design
Mosh’s roaming mechanism is a clean model for reconnecting clients through a relay. The server treats the source address of the most recent authenticated datagram carrying the highest sequence number as the client’s current address. Applied to a self-built relay, this means that when a user’s IP changes (a mobile network switch or a Wi-Fi reconnect), the relay follows the new address without rebuilding the session.
Self-Hosting Technology Selection Priority
NetBird / Nebula / OpenZiti / Headscale > ZeroTier > Tailscale (closed-source coordination server). NetBird is the most direct reference implementation: its full stack is open source, including the built-in signal server and relay.
References
RFC Standards
| RFC | Title | Description |
|---|---|---|
| RFC 5389 | STUN | Session Traversal Utilities for NAT (obsoleted by RFC 8489) |
| RFC 5766 | TURN | Traversal Using Relays around NAT |
| RFC 8656 | TURN (updated) | Obsoletes RFC 5766 and RFC 6156; relays over both IPv4 and IPv6 |
| RFC 8445 | ICE | Interactive Connectivity Establishment |
| RFC 3489 | Classic STUN | Simple Traversal of UDP through NATs; NAT type classification (obsoleted by RFC 5389) |
| RFC 5128 | P2P across NAT | Surveys known NAT traversal techniques for P2P communication |
| RFC 4787 | NAT UDP Behavioral Requirements | BCP 127 |
| RFC 5382 | NAT TCP Behavioral Requirements | BCP 142 |
| RFC 8831 | WebRTC Data Channels | P2P data channels |
Academic Papers
| # | Paper | Authors | Year/Venue |
|---|---|---|---|
| 1 | Peer-to-Peer Communication Across NATs | Ford, Srisuresh, Kegel | USENIX ATC 2005 |
| 2 | Kademlia: A P2P Information System Based on the XOR Metric | Maymounkov, Mazières | IPTPS 2002 |
| 3 | Chord: A Scalable P2P Lookup Service | Stoica et al. | SIGCOMM 2001 |
| 4 | Pastry: Scalable Decentralized Object Location | Rowstron, Druschel | Middleware 2001 |
| 5 | Large-Scale Measurement of NAT Traversal (DCUtR) | Trautwein et al. | IMC 2026 |
| 6 | A Mechanised Cryptographic Proof of the WireGuard VPN Protocol | Lipp, Blanchet, Bhargavan (Inria) | IEEE EuroS&P 2019 |
| 7 | Mosh: An Interactive Remote Shell for Mobile Clients | Winstein, Balakrishnan | USENIX ATC 2012 |
Product Documentation & Source Code
| Product | GitHub | License | Docs |
|---|---|---|---|
| Tailscale | tailscale/tailscale | BSD-3-Clause | tailscale.com |
| Nebula | slackhq/nebula | MIT | defined.net |
| ZeroTier | zerotier/ZeroTierOne | MPL 2.0 | docs.zerotier.com |
| NetBird | netbirdio/netbird | BSD-3-Clause + AGPLv3 (server) | netbird.io |
| Headscale | juanfont/headscale | BSD-3-Clause | headscale.net |
| OpenZiti | openziti/ziti | Apache 2.0 | openziti.io |
| Tinc | gsliepen/tinc | GPL-2.0 | tinc-vpn.org |
| libp2p | libp2p/go-libp2p | MIT | docs.libp2p.io |
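License notes: ZeroTier’s core code is licensed under MPL 2.0 (Mozilla Public License). The BSL 1.1 often cited in earlier literature applied to older releases. Components in the repository’s nonfree directory, including the network controller, are under a separate non-open-source license, which matters when self-hosting. NetBird uses a dual-license model: client and core components are BSD-3-Clause, while the management, signal and relay server components are AGPLv3.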