Survey: P2P Signaling and Relay Server Technologies

July 3, 2026 Network P2P, NAT Traversal, STUN, TURN, ICE, WireGuard, Mesh VPN, DERP Network Development Practice 4837 words 23 min read

Building a self-hosted P2P signaling and relay server is the core infrastructure for cross-network connectivity, remote access, and mesh VPN scenarios. This article systematically surveys the complete technology landscape across three dimensions: protocol standards (STUN/TURN/ICE/BEHAVE), mainstream products (Tailscale, Nebula, NetBird, ZeroTier, Headscale, OpenZiti, etc.), and frameworks & algorithms (libp2p, WebRTC, Kademlia DHT).

All technical descriptions are verified against primary sources—RFC originals, academic papers, and official documentation. Key statistics include citations.

Core Concept Quick Reference

Before diving into technical details, here’s a quick reference table for the core concepts covered in this article. Each concept is explored in detail in subsequent sections.

Concept	One-Line Explanation
NAT	Network Address Translation—maps private internal IPs to a public IP, letting multiple devices share one internet exit point
STUN	Protocol that lets a device discover its own public address (“Who am I to the outside world?”)
TURN	Protocol for relaying traffic through a server when direct connection fails (“Help me pass messages”)
ICE	Complete solution orchestrating STUN + TURN, automatically finding the optimal connection path (“The conductor”)
DERP	Tailscale’s encrypted relay protocol over HTTPS port 443, extremely difficult to block
WireGuard	Modern VPN protocol (~4,000 lines of code) using Curve25519 + ChaCha20-Poly1305
Noise Protocol	Cryptographic protocol framework for building secure transport channels; Nebula is built on this
Signaling Channel	Control channel independent of business traffic, used to exchange addresses and coordinate hole-punching (“side channel”)
Hole Punching	Both sides send packets to each other simultaneously, opening mappings in their respective NATs to establish direct connection
Control / Data Plane	Control plane manages “who can connect to whom”; data plane manages “how actual data flows”—kept separate
DHT	Distributed Hash Table—decentralized P2P node discovery and data location mechanism with no central server
Mesh VPN	Mesh-topology VPN where nodes interconnect directly rather than through a central hub, eliminating single points of failure
Lighthouse	Nebula’s discovery node, similar to DNS—answers “where is node X?”
PKI / CA	Public Key Infrastructure / Certificate Authority—used for identity authentication and encryption
CGNAT	Carrier-Grade NAT—ISPs have many users share few public IPs, a common obstacle to hole-punching

NAT Traversal: The First Hurdle for Direct Connectivity

When two devices behind different NATs want to establish a P2P direct connection, the biggest obstacle is NAT. NAT (Network Address Translation) is a technology that maps private internal IP addresses to public IP addresses—home routers and ISP gateways all do this. It lets multiple devices share one public IP exit, but also makes it impossible for external hosts to proactively initiate connections to internal devices. This is the core contradiction that all traversal techniques aim to solve.

Understanding NAT behavior models is the theoretical foundation for all traversal techniques.

The Two-Dimensional NAT Behavior Model

RFC 4787 (BCP 127) describes NAT behavior using two independent dimensions, replacing the earlier coarse Cone/Symmetric classification:

Mapping Behavior—Does the NAT reuse the same external mapping for the same internal (IP, port) when communicating with different destinations? In other words, when you access website A and website B from home, does your router assign the same external port?
- Endpoint-Independent Mapping: Always reuses the same mapping (≈ Full Cone’s mapping dimension). The external port is the same regardless of destination—this is the best case, hole-punching is easiest
- Address-Dependent Mapping: Different destination IP = different mapping. External port differs when accessing different IPs
- Address and Port-Dependent Mapping: Different destination IP or port = different mapping (≈ Symmetric). This is the worst case—hole-punching is nearly impossible
Filtering Behavior—When does the NAT allow inbound packets to a mapping? That is, what conditions must an external packet meet to be forwarded to the internal host?
- Endpoint-Independent Filtering: Any external host can send—most permissive
- Address-Dependent Filtering: Only if the internal host previously sent to that IP—only replies to those you’ve contacted
- Address and Port-Dependent Filtering: Only if the internal host previously sent to that (IP, port)—most strict

RFC 4787 REQ-1 requires that a NAT MUST have Endpoint-Independent Mapping behavior. Without it, most hole-punching techniques fail and peers are forced onto relays.

Four Classic NAT Types

Although RFC 5389 obsoleted RFC 3489 and dropped its NAT classification (RFC 4787 notes that the cone/symmetric terminology had been a source of much confusion), the four-type model remains a useful introduction to why hole punching fails:

Type	Mapping Characteristic	External Reachability	Hole Punching
Full Cone	Same internal `(IP,port)` → same external `(IP,port)`	Any external host	✅ Easiest
Restricted Cone	Same as above	Only IPs previously contacted	✅ Needs coordination
Port Restricted Cone	Same as above	Only `(IP,port)` pairs previously contacted	✅ Needs precise coordination
Symmetric	Different destinations → different mappings	Only the specific destination the mapping was created for	❌ Nearly impossible

Symmetric NAT is the biggest obstacle to P2P direct connections—it assigns different external ports for each destination, making the public address exchanged via rendezvous servers invalid when used for a different peer.

A Rendezvous Server is a public intermediary server where both peers register their address information; it relays each peer’s address to the other, preparing for hole-punching.
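The effect of mapping behavior on a rendezvous-based exchange can be illustrated with a toy model. The sketch below is not a real NAT implementation: the `SimulatedNat` class, the port pool starting at 40000, and all addresses are hypothetical, and only the mapping dimension from RFC 4787 is modeled (filtering is ignored).

```python
from itertools import count


class SimulatedNat:
    """Toy NAT that models only mapping behavior (RFC 4787 terminology)."""

    MODES = ("endpoint-independent", "address-dependent",
             "address-and-port-dependent")

    def __init__(self, public_ip, mapping):
        if mapping not in self.MODES:
            raise ValueError(mapping)
        self.public_ip = public_ip
        self.mapping = mapping
        self._ports = count(40000)   # hypothetical external port pool
        self._table = {}

    def external_addr(self, internal, dest):
        """Return the (ip, port) that `dest` observes for a packet from `internal`."""
        if self.mapping == "endpoint-independent":
            key = internal                   # one mapping for all destinations
        elif self.mapping == "address-dependent":
            key = (internal, dest[0])        # one mapping per destination IP
        else:
            key = (internal, dest)           # one mapping per destination (IP, port)
        if key not in self._table:
            self._table[key] = next(self._ports)
        return (self.public_ip, self._table[key])


client = ("192.168.1.100", 5000)
rendezvous = ("203.0.113.10", 3478)
peer = ("198.51.100.7", 41000)

cone = SimulatedNat("192.0.2.1", "endpoint-independent")
symmetric = SimulatedNat("192.0.2.2", "address-and-port-dependent")

# Does the address the rendezvous server observed also hold toward the peer?
cone_ok = cone.external_addr(client, rendezvous) == cone.external_addr(client, peer)
sym_ok = symmetric.external_addr(client, rendezvous) == symmetric.external_addr(client, peer)
```

With the endpoint-independent NAT the address reported by the rendezvous server is reusable by the peer; with the address-and-port-dependent NAT the peer would be sending to a port that was never opened for it.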

UDP Hole Punching

UDP Hole Punching is the simplest and most robust NAT traversal technique. Bryan Ford et al. systematically documented and measured this technique in their seminal USENIX ‘05 paper:

mermaid
sequenceDiagram
    participant A as Node A
    participant S as Rendezvous Server
    participant B as Node B
    A->>S: Register (report own address)
    B->>S: Register (report own address)
    S-->>A: Return B's public mapping
    S-->>B: Return A's public mapping
    Note over A,B: Both send packets to each other simultaneously
    A->>B: UDP packet (opens hole in local NAT)
    B->>A: UDP packet (opens hole in local NAT)
    Note over A,B: NAT mappings established, bidirectional comms successful

Core flow: Both peers register with a public rendezvous server → server tells each peer the other’s public reflexive address → both simultaneously send packets from local sockets to the other’s public address → outbound packets create mappings (“holes”) in their respective NATs → subsequent packets from the peer are treated as “responses” and passed through.
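The sending side of this flow can be sketched with plain UDP sockets. This is a minimal sketch under assumptions: the peer's public address has already been learned from a rendezvous server (not shown), the `punch` function and the `PROBE` payload are hypothetical names, and there is no authentication of the peer.

```python
import socket

PROBE = b"punch"   # hypothetical probe payload


def punch(sock, peer, attempts=10, timeout=0.2):
    """Send probes to `peer` until one of its packets arrives; True on success."""
    sock.settimeout(timeout)
    for _ in range(attempts):
        # The outbound packet creates (or refreshes) the mapping in our own NAT.
        sock.sendto(PROBE, peer)
        try:
            data, addr = sock.recvfrom(64)
        except (socket.timeout, ConnectionError):
            continue   # the peer's hole may not be open yet, so retry
        if addr == peer and data == PROBE:
            # Send once more so the peer also sees traffic after its hole opened.
            sock.sendto(PROBE, peer)
            return True
    return False
```

Both peers would call `punch` at roughly the same time on the same socket they used to talk to the rendezvous server, so that the NAT mapping the server observed is the one being reused.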

Measured data (Ford et al., USENIX ATC 2005): approximately 82% of NATs support UDP hole punching, approximately 64% support TCP hole punching. The paper also first reliably demonstrated P2P TCP stream establishment via simultaneous-open.

TCP Hole Punching: Feasible but Fragile

TCP hole punching exploits the simultaneous-open mode defined in RFC 793—in a normal TCP connection, one side sends SYN (active open) and the other replies with SYN-ACK (passive open); with simultaneous-open, both sides send SYN almost simultaneously, SYNs cross in the network, each responds with SYN-ACK, and the connection establishes. This requires NAT to support endpoint-independent mapping and correctly handle simultaneous-open (RFC 5382 REQ-2).

It’s more fragile than UDP because: the TCP state machine is complex (needing to correctly handle various exception combinations in the SYN/SYN-ACK/ACK three-way handshake), and some NATs drop inbound SYN or incorrectly translate outbound SYN-ACK. The industry recommends UDP as the base protocol (e.g., QUIC over UDP).

QUIC is a transport protocol originally developed at Google, standardized by the IETF (RFC 9000), and used by HTTP/3. It runs over UDP with built-in encryption (TLS 1.3) and avoids TCP’s head-of-line blocking between streams. Using UDP as the carrier provides reliable transport similar to TCP while bypassing TCP hole-punching difficulties.

The Standardized Traversal Protocol Stack

The NAT traversal technology stack is layered bottom-up, with each layer solving different problems:

mermaid
flowchart TD
    A["BEHAVE Specs<br/>Define NAT behavior"] --> B["Hole Punching<br/>UDP/TCP Techniques"]
    B --> C["STUN<br/>Discover reflexive addr"]
    C --> D["TURN<br/>Relay fallback"]
    C --> E["ICE<br/>Orchestrates STUN+TURN"]
    E --> F["DERP<br/>Encrypted relay supplement"]
    style A fill:#fff3e0,stroke:#FF9800
    style B fill:#e3f2fd,stroke:#2196F3
    style C fill:#e3f2fd,stroke:#2196F3
    style D fill:#e3f2fd,stroke:#2196F3
    style E fill:#f3e5f5,stroke:#9C27B0
    style F fill:#e8f5e9,stroke:#4CAF50

The bottom layer is BEHAVE working group’s NAT behavioral specifications—it doesn’t define traversal techniques but rather defines “how a proper NAT should behave.” Above that are hole-punching techniques (actual traversal methods), then STUN (address discovery tool) and TURN (relay fallback tool). ICE orchestrates these tools into a complete solution, and at the top is Tailscale’s DERP supplemental relay design.

STUN: Discovering Your Public Address

RFC 5389 (2008, J. Rosenberg et al.; since obsoleted by RFC 8489) defines STUN (Session Traversal Utilities for NAT). The acronym originally stood for “Simple Traversal of UDP through NAT” in RFC 3489; it was renamed when its role changed from a complete solution to a tool. STUN is no longer a standalone system for solving NAT traversal, but a building block that other protocols call.

The core method is Binding: the client sends a Binding Request to the STUN server; the packet’s source address is rewritten by the NAT; the server returns the observed public source address (the reflexive transport address—“what address do you look like from outside the NAT”) in the XOR-MAPPED-ADDRESS attribute.

Simply put, STUN is like asking a friend “When I call you, what number shows up on your caller ID?” That number is your public mapped address.
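The Binding exchange described above can be sketched at the byte level. The message layout (20-byte header, magic cookie `0x2112A442`, XOR-MAPPED-ADDRESS attribute type `0x0020`) follows RFC 5389; the function names are hypothetical, only IPv4 is handled, and a real client would also verify that the response's transaction ID matches the request.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txid=None):
    """Header only: type, length 0 (no attributes), magic cookie, 96-bit transaction ID."""
    txid = txid or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txid


def parse_xor_mapped_address(msg):
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    if len(msg) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", msg[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", msg[pos:pos + 4])
        value = msg[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)          # port is XORed with the top 16 cookie bits
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        pos += 4 + attr_len + (-attr_len % 4)            # attributes are padded to 4 bytes
    return None
```

The XOR step exists because some NATs rewrite anything in the payload that looks like their own public address; obfuscating the value keeps it intact in transit.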

STUN is a tool, not a complete solution—it’s embedded into ICE, SIP Outbound, and other complete frameworks via “STUN usages.”

Important caveat: RFC 5389 states that the NAT type classification algorithm of classic STUN (RFC 3489) was found to be faulty, because many real NATs do not fit cleanly into its categories.

TURN: Relay Fallback

RFC 5766 (2010), obsoleted by RFC 8656 (2020). TURN (Traversal Using Relays around NAT) is a STUN extension (most messages are STUN format). When both peers are behind “poorly behaved” NATs, hole-punching fails and a TURN relay server must forward the packets, like a post office forwarding your mail.

The client creates an Allocation on the server—essentially “renting” a relay port—obtaining a relayed transport address. Peers send to this address, and the server forwards to the client.

Key mechanisms:

Permissions: Controls which peers can communicate through this relay, preventing unauthorized access
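Channels: A more efficient data path. After a channel is bound to a peer, data travels in ChannelData messages with a 4-byte header (a 2-byte channel number plus a 2-byte length) instead of full STUN-formatted Send/Data indications, reducing per-packet overhead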
Allocation refresh: Relay allocations have a lifetime and must be refreshed periodically to prevent resource leaks
Unique feature: A single relayed address can communicate with multiple peers simultaneously (designed for SIP forking—routing one call to multiple destinations)
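To make the channel mechanism concrete, here is a minimal sketch of ChannelData framing: a 2-byte channel number, a 2-byte length, then the application data. The helper names are hypothetical, and the accepted range `0x4000` to `0x4FFF` is the one RFC 8656 allows (RFC 5766 permitted values up to `0x7FFF`). Binding the channel to a peer with a ChannelBind request is not shown.

```python
import struct


def encode_channel_data(channel, payload, pad=False):
    """Frame `payload` as a TURN ChannelData message."""
    if not 0x4000 <= channel <= 0x4FFF:
        raise ValueError("channel number outside 0x4000-0x4FFF")
    frame = struct.pack("!HH", channel, len(payload)) + payload
    if pad:
        # Over stream transports the frame is padded to a 4-byte boundary;
        # the length field still carries the unpadded payload size.
        frame += b"\x00" * (-len(payload) % 4)
    return frame


def decode_channel_data(frame):
    """Return (channel, payload), ignoring any trailing padding."""
    channel, length = struct.unpack("!HH", frame[:4])
    return channel, frame[4:4 + length]
```

A 5-byte payload therefore costs 9 bytes on the wire over UDP, compared with a full STUN header plus address and data attributes for a Send indication.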

Design philosophy: TURN servers incur high bandwidth costs (all traffic passes through them) and should only be used as a last resort when ICE cannot find a direct path.

ICE: Orchestrating Everything

RFC 8445 (2018), obsoletes RFC 5245. ICE (Interactive Connectivity Establishment) is a complete NAT traversal solution based on the offer/answer methodology (one side sends a connection offer, the other responds with an answer), orchestrating STUN and TURN.

mermaid
flowchart TD
    A["Gather candidates<br/>Host · SRFLX · Relay"] --> B["Priority sort<br/>Exchange via signaling"]
    B --> C["STUN connectivity checks<br/>Test candidate pairs"]
    C --> D{"Valid path<br/>found?"}
    D -->|"Yes"| E["Nominate & switch<br/>to direct ✓"]
    D -->|"No"| F["TURN/DERP<br/>relay fallback"]
    style A fill:#e3f2fd,stroke:#2196F3
    style E fill:#e8f5e9,stroke:#4CAF50
    style F fill:#fce4ec,stroke:#f44336

ICE core phases:

Gather Candidates: Each side collects multiple candidate types—
- Host candidates: All local network interface addresses (e.g., 192.168.1.100)
- Server-Reflexive candidates (SRFLX): Public NAT mappings obtained via STUN
- Relayed candidates: Relay addresses obtained via TURN
- Peer-Reflexive candidates may also be discovered during runtime: addresses observed by the peer during connectivity checks
Priority Sort & Exchange: Calculate priorities by formula (Host > SRFLX > Relay), exchange candidate information via a signaling channel (e.g., SDP, the Session Description Protocol, a plaintext format for describing connection parameters)
Connectivity Checks: Test candidate pairs with STUN Binding requests (“Does this path work?”) using triggered checks, role conflict resolution (the ICE-CONTROLLING / ICE-CONTROLLED attributes determine which agent leads nomination), and USE-CANDIDATE nomination
Nominate & Conclude: Select the valid pair, conclude ICE processing, release surplus candidates
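The priority formula mentioned in the sort phase can be written out directly. The sketch below uses the formulas and the recommended type preferences from RFC 8445 (host 126, peer-reflexive 110, server-reflexive 100, relayed 0); the function names and the default local preference of 65535 (a single-interface host) are choices made for this example.

```python
# Recommended type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)"""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(controlling_prio, controlled_prio):
    """pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)"""
    g, d = controlling_prio, controlled_prio
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


host = candidate_priority("host")
srflx = candidate_priority("srflx")
relay = candidate_priority("relay")
```

Because the type preference occupies the top bits, any host candidate outranks any server-reflexive one, which in turn outranks any relayed one, regardless of the local preference.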

ICE’s value: compatibility with all network topologies—it doesn’t assume any single technique works, but collects all possible paths and tests each, ultimately selecting the optimal one.

DERP: Tailscale’s Encrypted Relay Innovation

DERP (Designated Encrypted Relay for Packets) is a relay protocol designed by Tailscale; it is not an IETF standard. The authoritative sources are Tailscale’s official documentation and source code.

Design highlights:

Zero-knowledge forwarding: DERP never terminates or decrypts WireGuard encryption—it blindly forwards already-encrypted traffic. Tailscale private keys never leave the local device, so the DERP server cannot decrypt any traffic, even if compromised
HTTPS (TCP 443): Port 443 is the standard HTTPS port, allowed by virtually all networks. Blocking DERP requires blocking 443, which simultaneously blocks all normal web access—nearly impossible to do without attracting attention
Public key addressing: Uses Curve25519 public keys (an elliptic-curve scheme with roughly 128-bit security) as routing addresses rather than IP addresses; your public key is your “street address”
Dual-stack: Full IPv4/IPv6 support, can bridge v4-only and v6-only networks
Multi-region routing: The coordination server distributes a DERP Map (list of DERP servers), and clients select their home DERP (primary DERP node) by network latency

DERP also serves as a “side channel” for exchanging ip:port information to coordinate hole-punching timing—most connections first use DERP to exchange information, then upgrade to direct.
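The combination of public key addressing and blind forwarding can be illustrated with an in-memory toy. This is not DERP's wire protocol or Tailscale's code: the `BlindRelay` class and its methods are hypothetical, there is no network I/O, and a real relay would make clients prove ownership of the key they register.

```python
from collections import defaultdict, deque


class BlindRelay:
    """Forwards opaque frames keyed by destination public key; never decrypts."""

    def __init__(self):
        self._mailboxes = defaultdict(deque)   # public key -> queued (src, frame)
        self._online = set()

    def register(self, public_key):
        self._online.add(public_key)

    def send(self, src_key, dst_key, ciphertext):
        """Queue `ciphertext` for `dst_key`; the bytes are never inspected."""
        if src_key not in self._online or dst_key not in self._online:
            return False                       # unknown peers are dropped
        self._mailboxes[dst_key].append((src_key, ciphertext))
        return True

    def receive(self, public_key):
        box = self._mailboxes[public_key]
        return box.popleft() if box else None
```

The relay's only routing state is the set of connected public keys, so the same path can carry both encrypted data packets and the small endpoint-exchange messages used to coordinate hole punching.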

mermaid
flowchart TD
    CO["Coordination Server<br/>Distributes DERP Map"] --> DA["DERP Region A<br/>TCP:443"]
    CO --> DB["DERP Region B<br/>TCP:443"]
    NA["Node A<br/>behind hard NAT"] -->|"HTTPS encrypted<br/>forwarding"| DA
    NB["Node B<br/>behind hard NAT"] -->|"HTTPS encrypted<br/>forwarding"| DB
    style CO fill:#f3e5f5,stroke:#9C27B0
    style DA fill:#fff3e0,stroke:#FF9800
    style DB fill:#fff3e0,stroke:#FF9800
    style NA fill:#e3f2fd,stroke:#2196F3
    style NB fill:#e3f2fd,stroke:#2196F3

The DERP Map is distributed by the coordination server. Clients select the nearest DERP node by latency. When direct UDP paths cannot be established (hard NAT, firewall blocking UDP), DERP forwards encrypted WireGuard packets over TCP 443. If a single node fails, clients switch to other nodes in the same region; if an entire region fails, they switch to the nearest region.
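The latency-based selection and failover described here reduce to a small amount of logic. The following is a minimal sketch, not Tailscale's actual algorithm: the `pick_home_region` function, the region names, and the latency figures are hypothetical, and real clients also apply damping so the home region does not flap between near-equal candidates.

```python
def pick_home_region(latencies_ms, healthy=None):
    """Pick the lowest-latency region.

    latencies_ms: {region name: measured round-trip time in ms}
    healthy: optional set of regions currently reachable; None means all are.
    """
    candidates = {
        region: ms for region, ms in latencies_ms.items()
        if healthy is None or region in healthy
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


probes = {"fra": 18.0, "nyc": 95.0, "sin": 210.0}   # hypothetical measurements
home = pick_home_region(probes)
fallback = pick_home_region(probes, healthy={"nyc", "sin"})
```

When the nearest region drops out of the healthy set, the same function yields the next-nearest one, which matches the region-level failover described above.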

Control Plane and Data Plane: The Industry’s Architectural Consensus

Modern mesh VPNs universally adopt control plane / data plane separation. This is the most important architectural principle.

Control Plane: Manages “who can connect to whom”—device registration, identity authentication, key distribution, policy enforcement (ACL—Access Control List, defining which nodes can intercommunicate), NAT traversal coordination. Never touches business data
Data Plane: Manages “how actual data flows”—end-to-end encrypted tunnels between nodes, where data encryption/decryption/routing all happen locally on the node

mermaid
flowchart TD
    CS["Coordination Server<br/>Discovery · Keys · ACL · NAT Coord."]
    CS -.->|"Metadata only"| A["Node A"]
    CS -.->|"Metadata only"| B["Node B"]
    A ===|"P2P encrypted direct<br/>WireGuard/Noise"| B
    style CS fill:#fff3e0,stroke:#FF9800
    style A fill:#e3f2fd,stroke:#2196F3
    style B fill:#e3f2fd,stroke:#2196F3

Dashed lines represent the control plane (metadata exchange: who’s online, what’s their public key, what’s their address); solid lines represent the data plane (end-to-end encrypted tunnels: actual business traffic). The coordination server can see metadata but not business traffic—even if the coordination server is breached, attackers cannot access communication content.

Architecture Practices of Representative Products

Product	Control Plane	Data Plane	Relay Fallback
Tailscale	Official coordination server (closed source)	WireGuard (userspace)	DERP (HTTPS/TCP 443)
Nebula	Lighthouse (self-hosted)	Noise protocol (built-in)	Lighthouse configurable as relay
ZeroTier	Planet Root + Controller	Salsa20/Poly1305 (VL1)	Root forwarding + network relays
NetBird	Management + signal + relay	WireGuard	Self-hosted relay
Headscale	Self-hosted control plane (Tailscale OSS alternative)	Reuses Tailscale client	Built-in DERP + Peer relays
OpenZiti	Controller + Router	Built-in overlay encryption	Edge Router forwarding

WireGuard is a modern VPN protocol aiming for simplicity (~4,000 lines of code, far smaller than IPsec’s hundreds of thousands of lines), speed, and security. It uses Curve25519 (key exchange), BLAKE2s (hashing), and ChaCha20-Poly1305 (authenticated encryption) as modern cryptographic primitives. Tailscale/NetBird/Netmaker all use WireGuard as their data plane foundation.
Noise Protocol Framework is a cryptographic framework for building secure transport protocols (WireGuard is also based on it). Nebula has its own Noise-based transport implementation rather than using WireGuard directly.

Tailscale: Benchmark of DERP’s Dual-Layer Design

Tailscale’s DERP serves a dual role: (1) Connection negotiation intermediary—most connections only use DERP to exchange information before upgrading to direct; (2) Fallback traffic relay when direct connection fails. Connections technically always start via DERP, then concurrently attempt direct hole-punching. On success, they seamlessly switch. Under typical conditions, direct connection success rate exceeds 90%.

The hard cases are symmetric NAT (hard NAT), which assigns a different source port mapping per destination and makes hole punching nearly impossible, plus multi-layer NAT, enterprise firewalls that block UDP, and carrier-grade NAT (CGNAT); all of these trigger DERP fallback. Tailscale also sponsored FreeBSD PF firewall’s Endpoint-Independent Mapping (EIM) patch, which lets pfSense/OPNsense devices behave as cone NATs instead of symmetric ones, improving hole-punching success rates.

Nebula: Decentralized PKI + Lighthouse

Nebula (open-sourced by Slack) is a mutually-authenticated P2P software-defined network based on the Noise protocol framework.

Decentralized PKI/CA: Each network has its own Certificate Authority (CA), and certificates assert node IP, name, and group membership. PKI (Public Key Infrastructure) is a system for managing public keys using digital certificates; CA (Certificate Authority) is a trusted third party that issues certificates
Lighthouse: A discovery node with a static, well-known IP, similar to a DNS server. It answers “where is host X?” queries by returning the last known external endpoint. The Lighthouse does not forward traffic by default and only coordinates discovery. When hole-punching fails, the Lighthouse can be configured as a relay

NetBird: Fully Open-Source Tailscale Alternative

NetBird is fully open source and ships a complete, self-hostable coordination infrastructure, directly addressing the pain point of Tailscale’s closed-source control plane. The management service, signal server, and relay are all self-hostable, which suits compliance-sensitive scenarios such as GDPR, HIPAA, and SOC 2 (EU data protection, US healthcare information, and security audit standards). The client (including iOS/Android) is also fully open-source.

Self-Hosting Friendliness Ranking

mermaid
flowchart TD
    A["Fully self-host friendly<br/>NetBird · Nebula<br/>OpenZiti · Headscale"] --> B["Partially self-host<br/>ZeroTier<br/>Netmaker"]
    B --> C["Closed control plane<br/>Tailscale<br/>(needs Headscale)"]
    style A fill:#e8f5e9,stroke:#4CAF50
    style B fill:#fff3e0,stroke:#FF9800
    style C fill:#fce4ec,stroke:#f44336

NetBird is the most direct reference implementation due to its full-stack open-source nature with built-in signal server + relay. Headscale implements an open-source self-hosted alternative to Tailscale’s control plane, while the data plane still uses the official open-source client.

The New Paradigm of Decentralized Traversal

libp2p’s NAT traversal system draws from the ICE protocol but removes the dependency on centralized STUN/TURN servers, using distributed coordination instead. libp2p is a modular P2P networking stack developed by Protocol Labs, adopted by major P2P networks like IPFS and Ethereum, providing foundational tools for node discovery, connection establishment, stream multiplexing, and secure communication.

Three Core libp2p Modules

Module	Function	ICE Equivalent
AutoNAT	Determines if a node is behind NAT. Requests other peers to dial back its address—success=public (internet-reachable), failure=private (behind NAT)	Similar to STUN
Identify	Exchanges info after connection, learning the external public `IP:port` as observed by the peer. Uses existing connections—no separate STUN infrastructure needed	Similar to decentralized STUN
Circuit Relay v2	Provides lightweight relay for private peers. Requires reservation first, with strict limits on connections/duration/data volume, allowing most public nodes to serve as relays at minimal cost	Similar to TURN, but resource-limited: suited to coordinating hole punching rather than carrying sustained traffic

DCUtR: Upgrading from Relay to Direct

DCUtR (Direct Connection Upgrade through Relay) upgrades an established relay connection to a direct connection. Its core idea is to use the existing relay channel to coordinate hole-punching timing, then both sides simultaneously initiate direct dials:

mermaid
sequenceDiagram
    participant I as Initiator
    participant R as Relay Node
    participant L as Listener
    I->>R: Establish relay connection
    R->>L: Forward connection request
    Note over I,L: Exchange Connect messages (with non-relay addresses)
    Note over I,L: Listener measures relay RTT from the Connect exchange, sends Sync
    Note over I,L: Initiator dials on receiving Sync, Listener dials after half the RTT
    I->>L: Direct dial (hole punch)
    L->>I: Direct dial (hole punch)
    Note over I,L: Direct connection established, relay released

Key design: The protocol is extremely lightweight—normally only 2 network round trips, exchanging <500 bytes per direction. Both sides dial simultaneously, causing the 5-tuple (source IP, source port, destination IP, destination port, protocol) to match in their respective router state tables → hole-punching succeeds.
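The timing argument behind the simultaneous dial can be checked with simple arithmetic. The sketch below assumes the scheme in the libp2p DCUtR specification as understood here: one peer (called the syncing peer) measures the relay round-trip time from the Connect exchange, sends Sync, and dials after half that RTT, while the other peer dials as soon as Sync arrives. The function name and the millisecond figures are hypothetical.

```python
def dcutr_dial_times(to_peer_ms, from_peer_ms):
    """Return (other_peer_dial, syncing_peer_dial) in ms.

    t=0 is when the syncing peer sends its Connect message over the relay.
    to_peer_ms:   one-way relay delay, syncing peer -> other peer
    from_peer_ms: one-way relay delay, other peer -> syncing peer
    """
    rtt = to_peer_ms + from_peer_ms            # measured Connect -> Connect
    sync_sent = rtt                            # Sync goes out once the reply arrives
    other_peer_dial = sync_sent + to_peer_ms   # dials immediately on receiving Sync
    syncing_peer_dial = sync_sent + rtt / 2    # dials after waiting half the RTT
    return other_peer_dial, syncing_peer_dial


symmetric_path = dcutr_dial_times(40, 40)
asymmetric_path = dcutr_dial_times(30, 50)
```

On a symmetric relay path both dials happen at the same instant. On an asymmetric path the skew equals the difference between the true one-way delay and half the RTT, which is why the estimate only needs to be good enough for both outbound packets to cross in flight.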

Large-Scale Measurement Data

The IMC 2026 paper (Trautwein et al.) conducted large-scale measurements of DCUtR on the IPFS production network, based on 4.4 million traversal attempts across 85,000+ networks in 167 countries:

Given that relay reservation and public address discovery succeed, the hole-punching stage achieves a conditional success rate of 70% ± 7.1%.
TCP and QUIC have statistically indistinguishable success rates (both ~70%), challenging the conventional wisdom that “UDP traversal is necessarily superior”—DCUtR’s RTT synchronization mechanism makes them equivalent.
97.6% of successful connections are established on the first attempt.

WebRTC: P2P in the Browser

WebRTC (Web Real-Time Communication) is a W3C-standardized browser P2P communication framework supporting audio/video calls and data transmission. WebRTC uses the ICE framework to systematically find optimal communication paths.

A notable design choice: the signaling protocol itself is not specified by WebRTC—it only defines how to transport audio/video and data; “how to exchange connection information” is left to the application layer (commonly via WebSocket—a protocol for full-duplex communication over a single TCP connection). Signaling exchange uses SDP (Session Description Protocol), a plaintext protocol containing reachable IP:port candidates, audio/video track counts, codec lists, encryption parameters, etc.
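As an illustration of what travels over the signaling channel, the sketch below parses one ICE candidate line in its SDP form (`a=candidate:`, with the field order foundation, component, transport, priority, address, port, `typ`, type, then optional name/value extensions). The `parse_candidate` function and the sample values are hypothetical, and the parser does not validate addresses or handle every extension.

```python
def parse_candidate(line):
    """Parse an SDP candidate attribute into a dict of its fields."""
    for prefix in ("a=candidate:", "candidate:"):
        if line.startswith(prefix):
            line = line[len(prefix):]
            break
    fields = line.split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("not a candidate line")
    cand = {
        "foundation": fields[0],
        "component": int(fields[1]),
        "transport": fields[2].lower(),
        "priority": int(fields[3]),
        "address": fields[4],
        "port": int(fields[5]),
        "type": fields[7],
    }
    # Optional extensions such as raddr/rport arrive as name/value pairs.
    extras = fields[8:]
    cand.update(dict(zip(extras[0::2], extras[1::2])))
    return cand


sample = ("a=candidate:1 1 udp 1694498815 203.0.113.5 54321 "
          "typ srflx raddr 192.168.1.100 rport 5000")
parsed = parse_candidate(sample)
```

Here `raddr`/`rport` carry the related (base) address of a server-reflexive candidate, so the host address behind the NAT is visible to whoever handles the signaling.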

WebRTC’s DataChannel (RFC 8831) supports arbitrary P2P data transmission, built on SCTP over DTLS—SCTP (Stream Control Transmission Protocol) is a multiplexing transport protocol, and DTLS (Datagram TLS) is the UDP version of TLS providing encryption. This combination lets browsers directly transmit arbitrary data between each other.

Classic DHT Algorithms at a Glance

DHT (Distributed Hash Table) is the foundation for node discovery and data location in P2P networks—like a decentralized dictionary with no central server, where each node stores part of the data and cooperatively locates any key. Three classic algorithms laid the routing architecture for modern P2P systems.

Kademlia (2002)

A DHT based on the XOR metric, designed by Petar Maymounkov & David Mazières. Node IDs and keys share the same 160-bit identifier space; distance is defined as their XOR (exclusive OR: same bits = 0, different bits = 1). XOR distance satisfies the properties of a metric (d(x, x) = 0, symmetry, triangle inequality), which lets routing converge efficiently.

Core innovations:

Nodes can learn routing information from received queries—unlike Chord, every Kademlia query simultaneously updates the routing table
Non-rigid routing table—queries can go to any node within a range, allowing latency-based routing and parallel asynchronous queries
k-bucket structure: Each node maintains a set of “buckets,” each covering a distance interval and holding up to k known nodes from that interval. Long-lived nodes are kept in preference to newly seen ones, so the structure adapts naturally to node churn (joining/leaving)

Widely adopted by eMule (Kad Network), Ethereum node discovery protocol, and BitTorrent Mainline DHT.
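The XOR metric and the bucket layout are compact enough to show directly. This is a minimal sketch using small integers in place of 160-bit IDs; the function names are hypothetical, and bucket maintenance (eviction, liveness pings) is omitted.

```python
def xor_distance(a, b):
    """Kademlia distance between two IDs, interpreted as an integer."""
    return a ^ b


def bucket_index(own_id, other_id):
    """Index i such that 2**i <= distance < 2**(i + 1); None for our own ID."""
    d = xor_distance(own_id, other_id)
    return d.bit_length() - 1 if d else None


def closest(nodes, target, k=20):
    """The k known nodes nearest to `target` under the XOR metric."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:k]


distance = xor_distance(0b1011, 0b0010)
index = bucket_index(0b1011, 0b0010)
nearest = closest([1, 4, 6, 12], target=5, k=2)
```

The bucket index is simply the position of the highest differing bit, so each bucket covers half of the remaining ID space and a lookup can roughly halve the distance with every hop.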

Chord (2001, SIGCOMM)

A DHT based on consistent hashing, designed by Ion Stoica et al. Consistent hashing is a special key distribution strategy that minimizes data redistribution when nodes join/leave (rather than global rehashing), a classic technique for distributed system load balancing.

Core is the finger table: Each node maintains O(log N) routing entries pointing to nodes at specific distances on the identifier ring. Communication overhead and state grow logarithmically with node count—1 million nodes need only ~20 entries—extremely efficient.
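A finger table can be computed in a few lines for a toy ring. The sketch below assumes a globally known, sorted list of node IDs, which a real Chord node does not have (it discovers fingers through lookups); the function names are hypothetical, and the three-node ring with `m = 3` is only an example.

```python
import bisect
import math


def successor(sorted_ids, ident, m):
    """First node at or clockwise after `ident` on a ring of size 2**m."""
    i = bisect.bisect_left(sorted_ids, ident % (1 << m))
    return sorted_ids[i % len(sorted_ids)]     # wrap around past the largest ID


def finger_table(node, sorted_ids, m):
    """Entry i points at successor(node + 2**i), for i = 0 .. m-1."""
    return [successor(sorted_ids, node + (1 << i), m) for i in range(m)]


ring = [0, 1, 3]                               # node IDs on a 3-bit ring
fingers_of_1 = finger_table(1, ring, m=3)      # targets 2, 3, 5
entries_for_a_million = math.ceil(math.log2(1_000_000))
```

Each finger doubles the distance covered by the previous one, which is where the logarithmic table size (about 20 entries for a million nodes) comes from.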

Pastry (2001, Middleware)

A hybrid routing scheme combining numerical routing with prefix matching, designed by Rowstron & Druschel. Node IDs are organized by string prefix: each routing hop increases the common prefix with the destination by one digit, like postal codes progressively narrowing location. Nodes maintain a routing table + leaf set (numerically nearest nodes) + neighborhood set (network-latency-nearest nodes), with O(log N) hops.
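The prefix-matching step can be sketched as follows. This is a simplification under stated assumptions: IDs are hex strings, the node picks the best-matching peer from a flat list rather than from a structured routing table, and the leaf-set fallback is not implemented. The function names and the sample IDs are hypothetical.

```python
def shared_prefix_len(a, b):
    """Number of leading digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current, known_nodes, key):
    """Pick a known node that shares a longer prefix with `key` than we do."""
    have = shared_prefix_len(current, key)
    better = [n for n in known_nodes if shared_prefix_len(n, key) > have]
    if not better:
        # Real Pastry would now consult the leaf set for a numerically closer node.
        return None
    return max(better, key=lambda n: shared_prefix_len(n, key))


hop = next_hop("65a1fc", ["d13da3", "d4213f", "d462ba"], key="d46a1c")
```

Every hop fixes at least one more leading digit of the key, so the number of hops is bounded by the ID length in digits and is O(log N) in expectation.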

Engineering Wisdom from Relay and Session Resilience

Mosh: The Elegant Paradigm of Stateless Roaming

Mosh (Mobile Shell) is a remote terminal tool developed by MIT’s Keith Winstein & Hari Balakrishnan, solving the problem of SSH frequently disconnecting on mobile networks. Mosh is built on the State Synchronization Protocol (SSP)—a new protocol running over UDP that securely synchronizes terminal state between client and server.

Its core innovation is stateless roaming:

Whenever the server receives an authenticated packet from the client with a sequence number higher than all previously received packets, that packet’s source IP address and UDP port become the new target for the server’s outgoing packets.

This means when the client’s IP address changes (e.g., switching from WiFi to 4G, or cell tower handoff), no reconnection is needed—the server automatically tracks the new address. The client doesn’t even need to know its public IP has changed. This approach provides direct inspiration for “session migration / seamless continuation after IP drift” in P2P relay scenarios.
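The roaming rule is small enough to sketch in full. This is an illustration, not Mosh's implementation: Mosh authenticates and encrypts datagrams with AES in OCB mode, whereas the sketch substitutes an HMAC-SHA256 tag from the standard library, and the `RoamingSession` class and its framing (8-byte sequence number, 32-byte tag, payload) are hypothetical.

```python
import hashlib
import hmac
import struct


class RoamingSession:
    def __init__(self, key):
        self.key = key
        self.highest_seq = -1
        self.target = None   # (ip, port) the server currently sends to

    def _tag(self, seq, payload):
        return hmac.new(self.key, struct.pack("!Q", seq) + payload,
                        hashlib.sha256).digest()

    def seal(self, seq, payload):
        """Client side: prepend the sequence number and authentication tag."""
        return struct.pack("!Q", seq) + self._tag(seq, payload) + payload

    def on_datagram(self, datagram, source):
        """Server side: return the payload if authentic; roam on a newer seq."""
        if len(datagram) < 40:
            return None
        (seq,) = struct.unpack("!Q", datagram[:8])
        tag, payload = datagram[8:40], datagram[40:]
        if not hmac.compare_digest(tag, self._tag(seq, payload)):
            return None              # forged or corrupted: never moves the target
        if seq > self.highest_seq:
            self.highest_seq = seq
            self.target = source     # reply to wherever the newest packet came from
        return payload
```

Requiring both authenticity and a strictly higher sequence number means that neither a forged packet nor a replayed old packet from another address can redirect the server's traffic.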

Mosh also has a Local Echo mechanism: the client predicts the effect of keystrokes and speculatively displays them, without waiting for server confirmation → typing remains responsive on poor networks.

Tor / I2P: Layered Relay Anonymity Reference

Tor (The Onion Router) builds anonymity on circuits. It has four types of relay nodes:

Guard/Entry Relay: Knows the client IP but not the destination
Middle Relay: Only knows the previous and next hop IPs
Exit Relay: Knows the destination but not the client
Bridge Relay: Used to bypass Tor network blocking

The structural characteristic is “only predecessor/successor knowledge”: each relay only knows the previous and next hop IPs. As long as the guard and exit do not collude (and no observer can correlate traffic at both ends), the client cannot be linked to the destination.

I2P (Invisible Internet Project) uses Garlic Routing and unidirectional tunnels—separating inbound and outbound tunnels, each carrying traffic in only one direction. A complete message exchange requires 4 tunnels.

While anonymity isn’t the primary goal of P2P signaling, Tor/I2P’s “zero-knowledge relay” concept echoes DERP’s “don’t decrypt, just forward” design philosophy.

frp / ngrok: Reverse Proxy Engineering Practices

frp (Fast Reverse Proxy) is an open-source reverse-proxy tool written in Go for exposing services that sit behind NAT. Its separation of a control channel from multiplexed data tunnels is worth studying. frpc (the client, deployed on the internal network) connects outbound to frps (the server, deployed on a public VPS) at bind_port and keeps a persistent TCP connection open as the control channel. Over it, frpc tells frps which services it can proxy. When external traffic reaches a public port on frps, frps notifies frpc via the control channel, and the data is forwarded through multiplexed tunnels.

This aligns with the P2P “signaling channel + data channel separation” architectural principle—control signaling takes one path, business data takes another.

ngrok provides similar functionality as a SaaS service with one-click startup, integrated authentication and monitoring. bore is a lightweight Rust/Tokio alternative. All three share the centralized relay model (different from the P2P direct connection goal), but their reverse-connection engineering practice for bypassing inbound restrictions can be directly borrowed.

Eight Engineering Takeaways for Self-Building Signaling & Relay

Based on the survey above, here are the core conclusions for building your own P2P signaling and relay server:

Signaling Channel is a Prerequisite

All hole-punching techniques depend on an independent “side channel” to exchange candidate addresses and coordinate hole-punching timing. Tailscale uses its coordination server + DERP as this channel; WebRTC requires you to bring your own signaling channel. Project design should first clarify the signaling server’s responsibility boundaries—it only coordinates, never touches business traffic.

Layered Fallback is the Standard Paradigm

Industry consensus: STUN reflexive address discovery → UDP hole punching → TURN/DERP relay as progressive fallback. ICE standardizes this flow; DERP is Tailscale’s HTTPS/TCP relay supplement beyond TURN (harder to block). Don’t expect one technique to solve all scenarios—always have a fallback.

Symmetric NAT is the Primary Obstacle

Endpoint-Dependent Mapping (≈ Symmetric) NATs cannot be hole-punched conventionally—they require relay or port prediction. CGNAT (Carrier-Grade NAT) has many users sharing few public IPs, further exacerbating this problem. BEHAVE (RFC 4787 REQ-1) specifically requires NATs to have endpoint-independent mapping to eliminate this obstacle.
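One way to detect this obstacle before attempting a punch is to compare the reflexive addresses reported by two different STUN servers for the same local socket. The sketch below assumes those observations have already been collected; the function name and the two-way classification are simplifications for illustration (RFC 4787 further splits the dependent case into address-dependent and address-and-port-dependent mapping).

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]


def classify_mapping(observed: Dict[Endpoint, Endpoint]) -> str:
    """Classify NAT mapping from reflexive addresses seen by different servers.

    `observed` maps each STUN server endpoint to the (ip, port) that server
    reported back, all measured from the SAME local socket.
    """
    if len(observed) < 2:
        raise ValueError("need at least two distinct servers to compare")
    mappings = set(observed.values())
    # One shared external mapping means the NAT reuses it for every destination.
    return "endpoint-independent" if len(mappings) == 1 else "endpoint-dependent"


friendly_nat = {
    ("192.0.2.10", 3478): ("203.0.113.5", 40001),
    ("198.51.100.20", 3478): ("203.0.113.5", 40001),
}
symmetric_nat = {
    ("192.0.2.10", 3478): ("203.0.113.5", 40001),
    ("198.51.100.20", 3478): ("203.0.113.5", 40002),
}
print(classify_mapping(friendly_nat))
print(classify_mapping(symmetric_nat))
```

When the result is endpoint-dependent, the address a STUN server reports is not the one a peer will see, so the connection logic can skip straight to relay or port prediction.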

UDP as the Recommended Base Protocol

TCP hole-punching is feasible but fragile: it depends on TCP simultaneous open, and in the measurements by Ford et al. (2005) only about 64% of the NATs tested supported it, versus about 82% for UDP. UDP is therefore the recommended base protocol. QUIC over UDP is the modern choice, providing reliable transport while avoiding TCP hole-punching difficulties.

Relay Servers Should Be “Zero-Knowledge”

DERP’s design lessons: relay servers should not decrypt—only forward encrypted packets; use port 443 for reachability; support dual-stack and multi-region routing; double as discovery/coordination channels. This way, even if the relay server is breached or compelled to hand over data, it cannot reveal communication content.
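The "forward, never decrypt" rule can be sketched as a relay that routes frames by destination public key and treats the payload as opaque bytes. This is an illustrative model only, not DERP's actual framing: the class, its methods, and the byte-string keys are assumptions, and the random bytes stand in for ciphertext that the peers would produce end to end.

```python
import os
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class OpaqueRelay:
    """Forwards sealed frames between peers identified by public key."""

    def __init__(self) -> None:
        self._inbox: Dict[bytes, Deque[Tuple[bytes, bytes]]] = defaultdict(deque)
        self.bytes_forwarded = 0  # metadata only: the relay still sees who and how much

    def send(self, src_key: bytes, dst_key: bytes, sealed: bytes) -> None:
        # The relay never parses or decrypts `sealed`; it only queues it.
        self._inbox[dst_key].append((src_key, sealed))
        self.bytes_forwarded += len(sealed)

    def receive(self, dst_key: bytes) -> Optional[Tuple[bytes, bytes]]:
        """Pop the oldest pending (sender_key, sealed_frame) for this peer, if any."""
        queue = self._inbox[dst_key]
        return queue.popleft() if queue else None


relay = OpaqueRelay()
alice, bob = b"alice-public-key", b"bob-public-key"
ciphertext = os.urandom(48)  # stand-in for a frame encrypted by the sender
relay.send(alice, bob, ciphertext)
sender, frame = relay.receive(bob)
print(sender == alice, frame == ciphertext, relay.bytes_forwarded)
```

Because the relay holds no session keys, a compromise exposes only routing metadata (which keys talked and how many bytes), which is why the quotation marks around "zero-knowledge" are appropriate.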

Control/Data Plane Decoupling

Tailscale’s coordination server, Nebula’s Lighthouse, ZeroTier’s root+controller, OpenZiti’s Controller—all only handle coordination/discovery/keys/policy, never business traffic. This is the architectural principle self-built systems should follow—the coordination server can go down; as long as nodes have established direct connections, communication continues unaffected.

Session Resilience Design

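Mosh’s roaming design (sequence-number-driven IP migration) provides an elegant paradigm for reconnection in relay scenarios and is worth implementing in self-built relays. The server treats the source address of the authentic datagram with the highest sequence number seen so far as the client’s current address. When a user’s IP drifts (mobile network switch, WiFi reconnect), the relay server therefore follows the new address automatically, without rebuilding the session.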
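The address-tracking rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: HMAC-SHA256 stands in for Mosh's authenticated encryption, the datagram layout (8-byte sequence number, payload, 32-byte tag) is invented for the example, and the `seal` and `RoamingSession` names are hypothetical.

```python
import hashlib
import hmac
import struct
from typing import Optional, Tuple

Addr = Tuple[str, int]
TAG_LEN = 32  # HMAC-SHA256 output size
SEQ_LEN = 8   # unsigned 64-bit sequence number


def seal(key: bytes, seq: int, payload: bytes) -> bytes:
    """Build a datagram: sequence number, payload, then an HMAC tag over both."""
    body = struct.pack("!Q", seq) + payload
    return body + hmac.new(key, body, hashlib.sha256).digest()


class RoamingSession:
    def __init__(self, key: bytes) -> None:
        self.key = key
        self.highest_seq = -1
        self.peer_addr: Optional[Addr] = None

    def on_datagram(self, datagram: bytes, src: Addr) -> Optional[bytes]:
        """Return the payload if authentic; move the peer address only on a newer seq."""
        if len(datagram) < SEQ_LEN + TAG_LEN:
            return None
        body, tag = datagram[:-TAG_LEN], datagram[-TAG_LEN:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # forged or corrupted: never affects the stored address
        (seq,) = struct.unpack("!Q", body[:SEQ_LEN])
        if seq > self.highest_seq:
            self.highest_seq = seq
            self.peer_addr = src  # roam: replies now go to the newest source
        return body[SEQ_LEN:]


key = b"k" * 32
session = RoamingSession(key)
session.on_datagram(seal(key, 1, b"hello"), ("198.51.100.7", 40000))  # on WiFi
session.on_datagram(seal(key, 2, b"world"), ("203.0.113.9", 51000))   # now on LTE
print(session.peer_addr)
```

In this sketch a delayed datagram with an older sequence number is still delivered but does not move the address back, and an unauthenticated datagram cannot redirect the session at all. Replay protection is out of scope here.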

Self-Hosting Technology Selection Priority

NetBird / Nebula / OpenZiti / Headscale > ZeroTier > Tailscale (closed control plane). NetBird is the most direct reference implementation due to its full-stack open-source nature with built-in signal server + relay.

References

RFC Standards

RFC	Title	Description
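RFC 5389	STUN	Session Traversal Utilities for NAT (obsoleted by RFC 8489)
RFC 5766	TURN	Traversal Using Relays around NAT (obsoleted by RFC 8656)
RFC 8656	TURN (updated)	Current TURN specification; adds IPv6 and dual-stack allocation support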
RFC 8445	ICE	Interactive Connectivity Establishment
RFC 3489	Classic STUN	NAT type classification (obsoleted)
RFC 5128	P2P across NAT	Documents all known traversal methods
RFC 4787	NAT UDP Behavioral Requirements	BCP 127
RFC 5382	NAT TCP Behavioral Requirements	BCP 142
RFC 8831	WebRTC Data Channels	P2P data channels

Academic Papers

#	Paper	Authors	Year/Venue
1	Peer-to-Peer Communication Across Network Address Translators	Ford, Srisuresh, Kegel	USENIX ATC 2005
2	Kademlia: A Peer-to-peer Information System Based on the XOR Metric	Maymounkov, Mazières	IPTPS 2002
3	Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications	Stoica et al.	SIGCOMM 2001
4	Pastry: Scalable, Decentralized Object Location and Routing for Large-Scale Peer-to-Peer Systems	Rowstron, Druschel	Middleware 2001
5	Large-Scale Measurement of NAT Traversal (DCUtR)	Trautwein et al.	IMC 2026
6	WireGuard Formal Proof	INRIA	2019
7	Mosh: An Interactive Remote Shell for Mobile Clients	Winstein, Balakrishnan	USENIX ATC 2012

Product Documentation & Source Code

Product	GitHub	License	Docs
Tailscale	tailscale/tailscale	BSD-3-Clause	tailscale.com
Nebula	slackhq/nebula	MIT	defined.net
ZeroTier	zerotier/ZeroTierOne	MPL 2.0	docs.zerotier.com
NetBird	netbirdio/netbird	BSD-3-Clause + AGPLv3 (server)	netbird.io
Headscale	juanfont/headscale	BSD-3-Clause	headscale.net
OpenZiti	openziti/ziti	Apache 2.0	openziti.io
Tinc	gsliepen/tinc	GPL-2.0	tinc-vpn.org
libp2p	libp2p/go-libp2p	MIT	docs.libp2p.io

License notes: ZeroTier’s core code uses MPL 2.0 (Mozilla Public License), not the BSL 1.1 often cited in earlier literature. NetBird uses a dual-license model: client and core components are BSD-3-Clause, while management/signal/relay server components are AGPLv3.

Part of series: Network Development Practice
