Hands-On: Building a Distributed File Sharing System
To combine theory with practice, we’ll build a working distributed file sharing system. It uses the technologies introduced in previous articles: Kademlia DHT for node discovery and metadata distribution, GossipSub for broadcasting, and a custom file transfer protocol.
System Architecture
flowchart TD
subgraph Application
CLI["CLI Interface"]
API["REST API"]
end
subgraph Business Logic
Index["File Index Manager"]
Scheduler["Piece Download Scheduler"]
Verify["Verify & Reassemble"]
end
subgraph P2P Network
Discovery["Kad-DHT<br/>Peer Discovery + Metadata"]
Broadcast["GossipSub<br/>Message Broadcast"]
Transfer["Custom Protocol<br/>File Transfer"]
end
CLI --> Index
API --> Index
Index --> Scheduler
Scheduler --> Verify
Index --> Discovery
Scheduler --> Transfer
Broadcast -->|"New Peer Notification"| Index
Core design principles:
- Layered abstraction: Business logic strictly separated from P2P network layer
- Modularity: Each component independently testable
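- Fault tolerance: The failure of individual nodes should not make a file unavailable, since metadata is stored on the K closest DHT nodes and a failed peer can be replaced by another peer that holds the same Piece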
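One way to picture the layering and testability principles is a small trait boundary between the business logic and the network. The sketch below is only an illustration: the names `PieceSource`, `InMemorySource`, and `fetch_all` are hypothetical and do not come from the article's code.

```rust
/// Hypothetical boundary between business logic and the P2P layer.
/// The business logic only sees this trait, never the network stack.
pub trait PieceSource {
    /// Returns the bytes of Piece `index`, or `None` if it cannot be fetched.
    fn fetch_piece(&self, index: usize) -> Option<Vec<u8>>;
}

/// Business-logic helper: fetches Pieces 0..count in order through any source.
/// Returns `None` as soon as one Piece is unavailable.
pub fn fetch_all<S: PieceSource>(source: &S, count: usize) -> Option<Vec<Vec<u8>>> {
    (0..count).map(|i| source.fetch_piece(i)).collect()
}

/// In-memory stand-in for the network layer, useful in unit tests.
/// A real implementation would request the Piece from remote peers.
pub struct InMemorySource {
    pub pieces: Vec<Vec<u8>>,
}

impl PieceSource for InMemorySource {
    fn fetch_piece(&self, index: usize) -> Option<Vec<u8>> {
        self.pieces.get(index).cloned()
    }
}
```

Because `fetch_all` depends only on the trait, it can be tested without starting any network node.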
Metadata Distribution
How does a downloader know which Pieces make up a file, and what their hashes are? The answer is FileMetadata distributed through the DHT:
flowchart LR
S["Seeder Node"] -->|"1. Chunk and hash file"| FM["FileMetadata<br/>{filename, piece_size,<br/> piece_hashes[]}"]
FM -->|"2. hash(fileID) as DHT key<br/>store on K closest nodes"| DHT["Kademlia DHT"]
D["Downloader Node"] -->|"3. Query DHT with fileID"| DHT
DHT -->|"4. Return FileMetadata"| D
D -->|"5. Start per-Piece download"| S
The Seeder stores file metadata (including each Piece’s SHA-256 hash) in the DHT keyed by fileID. The Downloader only needs to know fileID to retrieve complete metadata and start downloading. fileID is typically the hash of file content, so identical files always map to the same ID.
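The publish and lookup steps can be sketched with a plain data structure. The field names below follow the diagram (`filename`, `piece_size`, `piece_hashes`). `MockDht` is a hypothetical in-memory stand-in for the DHT and `piece_count` is an illustrative helper; a real Kademlia DHT would replicate the record on the K closest nodes rather than keep it in a local map.

```rust
use std::collections::HashMap;

/// Metadata published by the seeder; field names follow the diagram.
#[derive(Clone, Debug, PartialEq)]
pub struct FileMetadata {
    pub filename: String,
    pub piece_size: usize,
    /// One digest per Piece, in Piece order (SHA-256 in the article's design).
    pub piece_hashes: Vec<Vec<u8>>,
}

impl FileMetadata {
    /// Number of Pieces a downloader has to fetch (illustrative helper).
    pub fn piece_count(&self) -> usize {
        self.piece_hashes.len()
    }
}

/// Hypothetical in-memory stand-in for the DHT: maps a fileID to its record.
#[derive(Default)]
pub struct MockDht {
    records: HashMap<String, FileMetadata>,
}

impl MockDht {
    /// Seeder side: store the metadata under the fileID.
    pub fn put(&mut self, file_id: &str, meta: FileMetadata) {
        self.records.insert(file_id.to_string(), meta);
    }

    /// Downloader side: look the metadata up by fileID.
    pub fn get(&self, file_id: &str) -> Option<&FileMetadata> {
        self.records.get(file_id)
    }
}
```

With the record in hand, the downloader knows how many Pieces to request and which digest each one must match.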
Rust Core Module Implementation
Piece State Management
First, define the Piece status enum and thread-safe shared state:
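What follows is a minimal sketch of such a definition, written under a few assumptions. The `Downloading` and `Complete` states are named in the text; the `Missing` state and the helper functions (`new_status`, `try_claim`, `mark_complete`, `all_complete`, `download_all_simulated`) are hypothetical. To keep the example dependency-free it uses `std::sync::RwLock` with threads standing in for async tasks; an async program would more likely use its runtime's lock type.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Status of a single Piece. `Missing` is an assumed initial state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PieceStatus {
    Missing,
    Downloading,
    Complete,
}

/// Shared, thread-safe table with one status entry per Piece.
pub type SharedStatus = Arc<RwLock<Vec<PieceStatus>>>;

pub fn new_status(piece_count: usize) -> SharedStatus {
    Arc::new(RwLock::new(vec![PieceStatus::Missing; piece_count]))
}

/// Claims a Missing Piece under the write lock so that only one worker
/// downloads it. Returns false if the Piece was already claimed or complete.
pub fn try_claim(status: &SharedStatus, index: usize) -> bool {
    let mut guard = status.write().unwrap();
    if guard[index] == PieceStatus::Missing {
        guard[index] = PieceStatus::Downloading;
        true
    } else {
        false
    }
}

pub fn mark_complete(status: &SharedStatus, index: usize) {
    status.write().unwrap()[index] = PieceStatus::Complete;
}

/// Read-only check; many readers can hold the read lock at once.
pub fn all_complete(status: &SharedStatus) -> bool {
    status.read().unwrap().iter().all(|s| *s == PieceStatus::Complete)
}

/// Simulates one worker per Piece (threads stand in for async tasks).
pub fn download_all_simulated(piece_count: usize) -> SharedStatus {
    let status = new_status(piece_count);
    let handles: Vec<_> = (0..piece_count)
        .map(|i| {
            let status = Arc::clone(&status);
            thread::spawn(move || {
                if try_claim(&status, i) {
                    mark_complete(&status, i);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    status
}
```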
Why Arc<RwLock<>>? In async P2P programs, multiple Pieces may download simultaneously from different peers, each in an independent async task. These tasks concurrently update piece_status, marking Pieces as Downloading or Complete. RwLock lets any number of tasks read the state at the same time without blocking each other, while a write takes exclusive access, which prevents data races. Arc provides shared ownership of the state across tasks.
File Chunking and Verification
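Chunking and per-Piece verification could look roughly like the sketch below. All function names are hypothetical. The article's design uses SHA-256, which is not part of the Rust standard library, so the functions take the hash as a parameter; `toy_hash` is an FNV-1a stand-in used only to keep the example self-contained and must not be used where integrity matters.

```rust
/// Splits `data` into Pieces of `piece_size` bytes; the last Piece may be shorter.
pub fn split_into_pieces(data: &[u8], piece_size: usize) -> Vec<Vec<u8>> {
    assert!(piece_size > 0, "piece_size must be positive");
    data.chunks(piece_size).map(|chunk| chunk.to_vec()).collect()
}

/// Hashes every Piece with the supplied function (SHA-256 in a real build).
pub fn hash_pieces<H: Fn(&[u8]) -> Vec<u8>>(pieces: &[Vec<u8>], hash: H) -> Vec<Vec<u8>> {
    pieces.iter().map(|piece| hash(piece.as_slice())).collect()
}

/// Returns true if the received Piece matches the digest from the metadata.
pub fn verify_piece<H: Fn(&[u8]) -> Vec<u8>>(piece: &[u8], expected: &[u8], hash: H) -> bool {
    hash(piece).as_slice() == expected
}

/// Concatenates verified Pieces in index order to rebuild the file.
pub fn reassemble(pieces: &[Vec<u8>]) -> Vec<u8> {
    pieces.concat()
}

/// Stand-in digest (64-bit FNV-1a). NOT SHA-256 and not cryptographically
/// secure; it only makes the example runnable without external crates.
pub fn toy_hash(data: &[u8]) -> Vec<u8> {
    let mut h: u64 = 0xcbf29ce484222325;
    for &byte in data {
        h ^= byte as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h.to_be_bytes().to_vec()
}
```

Passing the hash as a parameter keeps the chunking logic independent of the digest implementation, so a SHA-256 function can be swapped in without touching the rest.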
Rarest First Download Scheduling
flowchart TD
A["Peer has Piece set"] --> B["Count replicas of<br/>each Piece globally"]
B --> C{"Select rarest<br/>missing Piece"}
C --> D["Request Piece from<br/>a peer that has it"]
D --> E{"Download OK?"}
E -->|"Yes"| F["Verify SHA-256"]
E -->|"No"| G["Mark peer unavailable<br/>Pick another"]
F -->|"Passed"| H["Mark Complete<br/>Notify peers"]
F -->|"Failed"| I["Re-request Piece"]
H --> J{"All Pieces<br/>Complete?"}
J -->|"No"| A
J -->|"Yes"| K["Reassemble file"]
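The selection step in the flowchart can be sketched as a pure function over what each peer is known to hold. The function name `select_rarest` and the shape of its inputs are assumptions made for illustration; ties are broken by the lowest Piece index so the result is deterministic.

```rust
use std::collections::{HashMap, HashSet};

/// Picks the missing Piece with the fewest replicas among the known peers.
///
/// `peer_pieces` maps a peer id to the set of Piece indices that peer holds,
/// `have` is the set of Pieces already completed locally.
/// Returns the chosen Piece index and the (sorted) peers that can serve it,
/// or `None` if no missing Piece is available from any peer.
pub fn select_rarest(
    peer_pieces: &HashMap<String, HashSet<usize>>,
    have: &HashSet<usize>,
    piece_count: usize,
) -> Option<(usize, Vec<String>)> {
    // Collect, for every missing Piece, the peers that hold a replica.
    let mut holders: HashMap<usize, Vec<String>> = HashMap::new();
    for (peer, pieces) in peer_pieces {
        for &index in pieces {
            if index < piece_count && !have.contains(&index) {
                holders.entry(index).or_default().push(peer.clone());
            }
        }
    }
    // Fewest replicas first; ties go to the lowest Piece index.
    holders
        .into_iter()
        .min_by_key(|(index, peers)| (peers.len(), *index))
        .map(|(index, mut peers)| {
            peers.sort();
            (index, peers)
        })
}
```

If the download from the first returned peer fails, the caller can try the next peer in the list, which corresponds to the "Pick another" branch of the flowchart.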
Seeder Side: Serving Piece Requests
The seeder node must listen for download requests and return data by Piece index:
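The network listener itself depends on the transfer protocol and is not reproduced here. The part that maps a Piece index to bytes can be sketched on its own, assuming the seeder keeps the file in any seekable source. The function name `read_piece` and its signature are hypothetical.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads Piece `index` from `source`, where every Piece is `piece_size`
/// bytes long except possibly the last one.
/// Returns `Ok(None)` if the index lies beyond the end of the file.
pub fn read_piece<R: Read + Seek>(
    source: &mut R,
    index: u64,
    piece_size: u64,
) -> io::Result<Option<Vec<u8>>> {
    if piece_size == 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "piece_size must be positive",
        ));
    }
    // Determine the file length, then check that the Piece starts inside it.
    let len = source.seek(SeekFrom::End(0))?;
    let offset = match index.checked_mul(piece_size) {
        Some(offset) if offset < len => offset,
        _ => return Ok(None),
    };
    source.seek(SeekFrom::Start(offset))?;
    // Read at most one Piece worth of bytes; the final Piece may be shorter.
    let mut buf = Vec::new();
    source.by_ref().take(piece_size).read_to_end(&mut buf)?;
    Ok(Some(buf))
}
```

A request handler would call this with the index from the incoming request and send the returned bytes back, or an error reply when the result is `None`.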
Running and Testing
Start a seeder node:
Start a downloader node:
Download Flow
sequenceDiagram
participant D as Downloader
participant T as DHT
participant S as Seeder
D->>T: Query fileID metadata
T-->>D: Return FileMetadata (piece_hashes[])
D->>S: Request Piece 0 (rarest first)
S-->>D: Send Piece 0 data
Note over D: Verify SHA-256 → Mark Complete
D->>S: Request Piece 1
S-->>D: Send Piece 1 data
Note over D: All Pieces Complete
Note over D: Reassemble in order