Kademlia DHT Protocol Deep Dive

January 20, 2021 Network P2P, DHT, Kademlia, Distributed Systems, XOR Network Development Practice 1155 words 6 min read

🔊

Kademlia is one of the most influential DHT (Distributed Hash Table) protocols, proposed by Petar Maymounkov and David Mazières in 2002. It is widely used in IPFS, BitTorrent, Ethereum, and many other systems. Kademlia’s revolutionary innovation lies in using XOR (exclusive or) as its distance metric, offering elegant mathematical properties and efficient routing algorithms.

XOR Distance Metric

Kademlia maps nodes and resources to the same 160-bit identifier space and defines the XOR distance function:

1
distance(a, b) = a XOR b

This seemingly simple function has ideal mathematical properties:

Identity: distance(a, a) = 0, a node is zero distance from itself
Non-negativity: distance(a, b) > 0 when a ≠ b, distinct nodes always have positive distance
Symmetry: distance(a, b) = distance(b, a), distance is direction-independent
Triangle Inequality: distance(a, b) + distance(b, c) ≥ distance(a, c), satisfying basic metric space requirements

Concrete Calculation Example

To intuitively understand XOR distance, let’s use 8-bit IDs for demonstration (real systems use 160 bits):

1
2
3
4
5
6
7
Node A ID:  00010110  (decimal 22)
Node B ID:  00011010  (decimal 26)
Node C ID:  00100010  (decimal 34)

distance(A, B) = 00010110 XOR 00011010 = 00001100 (decimal 12)
distance(A, C) = 00010110 XOR 00100010 = 00110100 (decimal 52)
distance(B, C) = 00011010 XOR 00100010 = 00111000 (decimal 56)

Notice: distance between A and B (12) is much smaller than A and C (52). In binary, A and B differ only in the lower 4 bits (sharing 4-bit prefix 0001), while A and C differ in 6 bits (sharing only 2-bit prefix 00).

More importantly, XOR distance has prefix-matching properties — the smaller the XOR value between two IDs, the longer their shared prefix. This allows routing tables to be organized by prefix, enabling efficient binary search. Each query eliminates half the address space — the source of O(log N) efficiency.

K-Bucket Routing Table

Kademlia’s routing table consists of a series of K-buckets, each covering a specific distance range:

mermaid
flowchart TD
    KB0["K-bucket 0<br/>Bit 0 differs<br/>Closest"]
    KB1["K-bucket N<br/>Bit N differs<br/>Middle range"]
    KB255["K-bucket 255<br/>Highest bit differs<br/>Farthest"]
    CAP[("Max K=20<br/>nodes per bucket")]
    KB0 --> CAP
    KB1 --> CAP
    KB255 --> CAP

    style KB0 fill:#4CAF50,color:#fff
    style KB1 fill:#2196F3,color:#fff
    style KB255 fill:#f44336,color:#fff
    style CAP fill:#9C27B0,color:#fff

For a 160-bit ID space, each node maintains 160 K-buckets. The i-th K-bucket stores nodes whose distance falls within [2^i, 2^(i+1)). Each K-bucket holds at most K nodes (K=8 in BitTorrent DHT, K=20 in IPFS).

Why different K values? K represents the tradeoff between redundancy and overhead. K=8 (BitTorrent) has lower overhead but less fault tolerance — if 5 of 8 nodes go offline, that K-bucket is nearly empty. K=20 (IPFS) provides higher redundancy but requires contacting more nodes per lookup. IPFS chose K=20 because it targets long-term storage scenarios with higher durability requirements.

K-bucket update policy reflects Kademlia’s emphasis on node liveness: when an existing node is contacted again, it is moved to the tail (most recently used position) of its K-bucket. If a bucket is full, a new node can only join after an existing node goes offline. This ensures the routing table retains the “most stable” nodes.

Core RPC Messages

Kademlia defines only four RPC messages, balancing simplicity and power:

Message	Function	Use Case
PING	Probe if a node is online	Routing table maintenance
FIND_NODE	Find K closest nodes to target	Node discovery, routing
FIND_VALUE	Find stored value, or return K close nodes	Data retrieval
STORE	Notify a node to store a key-value pair	Data publication

The difference between FIND_NODE and FIND_VALUE: if the queried node holds the target value, FIND_VALUE returns the data directly; otherwise both respond identically — returning the K closest nodes to the target.

Recursive Lookup Algorithm

Kademlia uses recursive parallelism for node lookup, balancing speed and accuracy:

mermaid
sequenceDiagram
    participant A as Node A (Initiator)
    participant B as Node B
    participant C as Node C
    participant D as Node D

    Note over A: Select α=3 closest from routing table
    A->>B: FIND_NODE(target)
    A->>C: FIND_NODE(target)
    A->>D: FIND_NODE(target)

    B-->>A: Return K closest nodes
    C-->>A: Return K closest nodes
    D-->>A: Return K closest nodes

    Note over A: Merge results, pick unqueried closest
    Note over A: Continue next iteration

    Note over A: Repeat until no closer nodes found
    Note over A: Return K closest nodes

Algorithm steps:

Select α closest nodes from local routing table (α is the concurrency parameter, typically 3)
Send parallel FIND_NODE queries to these nodes
Each node returns the K closest nodes it knows
Merge results, select α unqueried closest nodes
Repeat steps 2-4 until no closer nodes can be found or max iterations reached
Return the K closest nodes to the target

Time Complexity: O(log N) hops, with α concurrency per hop. In a 160-bit ID space, even with millions of nodes, only 5-10 hops are needed. For a 1-million-node network, each iteration halves the candidate range, log₂(1000000) ≈ 20, so at most about 20/α ≈ 7 rounds (with α=3) suffice.

Bootstrap Process

How does a new node join the Kademlia network? This is the bootstrap process:

mermaid
flowchart TD
    A["New node starts<br/>generates Peer ID"] --> B["Connect to bootstrap node<br/>(hardcoded address)"]
    B --> C["Send FIND_NODE<br/>(own ID) to bootstrap"]
    C --> D["Bootstrap returns<br/>K closest nodes it knows"]
    D --> E["New node iteratively looks up<br/>gradually fills all K-buckets"]
    E --> F["Bootstrap complete<br/>node is part of the network"]

    style A fill:#4CAF50,color:#fff
    style B fill:#2196F3,color:#fff
    style F fill:#9C27B0,color:#fff

The critical step is #3 — the new node looks up its own ID. This seems counterintuitive (why look for yourself?), but the purpose is to discover nodes closest to itself in the network. Each round of FIND_NODE returns nodes closer than the previous round. Through multiple iterations, the new node’s routing table is gradually filled.

Data Storage and Retrieval

Kademlia’s data storage strategy revolves around redundancy and automatic refresh:

Redundant Storage: When a node wants to store a key-value pair, it first finds the K closest nodes to the key via FIND_NODE(key), then sends STORE messages to all K nodes. Even if some nodes go offline, the data remains available.

Automatic Refresh: Storing nodes periodically (typically hourly) re-execute the lookup and republish data, ensuring newly joined nodes also hold copies. This solves the data loss problem caused by dynamic node join/leave.

Data Retrieval: When querying data, a node executes FIND_VALUE(key). If any node along the path holds the value, it returns directly; otherwise, it iteratively searches like FIND_NODE until finding a node with the data.

Expiration: To prevent garbage data from accumulating indefinitely, each stored record has a TTL (Time To Live). Records exceeding TTL are automatically purged. IPFS uses a default TTL of 24 hours.

This design ensures data availability even in highly dynamic environments where nodes frequently join and leave.

References

Maymounkov, P., & Mazières, D. (2002). Kademlia: A peer-to-peer information system based on the XOR metric. IPTPS, 53-65.
Lopsa, D. (2016). Kademlia and BitTorrent DHT. Technical Report.
IPFS. Kademlia DHT Implementation. https://github.com/libp2p/go-libp2p-kad-dht
Falkner, J., et al. (2004). Profiling a million-user DHT. Telecommunications Review, 14(11).

Part of series: Network Development Practice

← Previous Gossip Protocol Core Principles Next → SWIM Protocol and Cluster Membership Management