Building a Minimal BitTorrent Client in Python

I’ve been curious about how BitTorrent works under the hood. I download files from other peers, but how does that handshake happen? How do you verify you got the right pieces? How does it coordinate with trackers?

I spent some time building a minimal BitTorrent client in Python. It works, and I learned a few things along the way.

The result

py-torrent

What It Does

This is a minimal, download-only BitTorrent client. You give it a .torrent file and an output path, and it handles the rest: connecting to trackers, finding peers, negotiating connections, and downloading the file piece by piece with integrity verification.

The implementation follows the core BitTorrent protocol pretty closely. It handles:

  • Bencode parsing - Reading .torrent metadata files
  • Tracker communication - Getting peer lists from HTTP trackers
  • Peer handshakes - The initial protocol handshake with other peers
  • Piece management - Requesting, downloading, and verifying file pieces
  • Concurrent downloads - Using threads to connect to multiple peers simultaneously

The protocol is straightforward. The handshake is a single fixed-length message, after which the peer usually sends a bitfield showing which pieces it has. Then you send an “interested” message and, once the peer unchokes you, request blocks by piece index, byte offset, and length. The peer sends back blocks of data, and you verify each completed piece against its SHA-1 hash before assembling the final file.
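To make the handshake concrete, here's a minimal sketch of building and checking one. It is not the code from handshake.py; the function names are made up for illustration. The layout itself (a length byte, the string "BitTorrent protocol", 8 reserved bytes, a 20-byte info hash, and a 20-byte peer ID, 68 bytes in total) comes from the standard protocol.

```python
PSTR = b"BitTorrent protocol"  # protocol identifier, 19 bytes
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20  # 68 bytes


def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Serialize a handshake (hypothetical helper, for illustration)."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    reserved = bytes(8)  # all zero: no protocol extensions advertised
    return bytes([len(PSTR)]) + PSTR + reserved + info_hash + peer_id


def parse_handshake(data: bytes):
    """Return (info_hash, peer_id) from a peer's 68-byte handshake."""
    if len(data) != HANDSHAKE_LEN:
        raise ValueError("handshake must be exactly 68 bytes")
    if data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("peer is not speaking the BitTorrent protocol")
    return data[28:48], data[48:68]
```

After receiving the peer's handshake, you would compare the returned info hash with your own and drop the connection if they differ.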

Using It

The interface is so simple a caveman can run it:

python main.py your-torrent.torrent output-file.iso

That’s it. The client will:

  1. Parse the torrent file to extract metadata
  2. Contact the tracker to get a list of peers
  3. Connect to those peers and download pieces concurrently
  4. Verify each piece as it comes in
  5. Write the completed file to your output path
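Step 2 is worth a closer look. In the compact format, the tracker returns peers as one byte string with 6 bytes per peer: a 4-byte IPv4 address followed by a 2-byte big-endian port. Here's a rough sketch of decoding that; parse_compact_peers is a name I'm using for illustration, not necessarily what tracker.py calls it.

```python
import socket
import struct
from typing import List, Tuple


def parse_compact_peers(blob: bytes) -> List[Tuple[str, int]]:
    """Decode a compact peer list into (ip, port) pairs (illustrative)."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        ip = socket.inet_ntoa(blob[offset:offset + 4])  # 4 bytes -> dotted quad
        (port,) = struct.unpack(">H", blob[offset + 4:offset + 6])  # big-endian
        peers.append((ip, port))
    return peers
```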

By default, it prints colorful progress logs showing which pieces are being downloaded. If you want less chirping and chatter, pass the -q flag for quiet mode.

The codebase is small - maybe 600 lines total across the main modules. Each component has a clear responsibility:

  • bencode.py - Parsing the bencode format used in torrent files
  • tracker.py - HTTP requests to trackers
  • handshake.py - The initial peer handshake protocol
  • message.py - BitTorrent message types (request, piece, have, etc.)
  • client.py - TCP connection management with peers
  • p2p.py - Orchestrating the download across multiple peers
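To give a feel for how small these pieces can be, here's a sketch of a bencode decoder. It is not the actual bencode.py, just a minimal illustration of the format's four types: integers (i42e), byte strings (4:spam), lists (l...e), and dictionaries (d...e). The function names are my own for this example, and it skips the stricter validation a real parser would want.

```python
def bdecode(data: bytes):
    """Decode one complete bencoded value (illustrative sketch)."""
    value, pos = _decode(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, pos: int):
    """Decode the value starting at pos; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            result[key], pos = _decode(data, pos)
        return result, pos + 1
    if lead.isdigit():  # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        start = colon + 1
        length = int(data[pos:colon])
        return data[start:start + length], start + length
    raise ValueError("invalid bencode at position %d" % pos)
```

Keys and strings come back as bytes rather than str, since fields like the piece hashes are raw binary.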

Limitations

This is intentionally minimal, so there are some limitations:

  • Only supports single-file torrents (no multi-file downloads)
  • Download-only (doesn’t upload pieces to peers)
  • HTTP trackers with compact peer lists only (no UDP trackers)
  • No magnet link support

But for educational purposes, it’s perfect. You can read through the code and actually understand what’s happening at each step. The protocol isn’t hidden behind layers of abstraction or optimization.

What I Learned

The most interesting part for me was seeing how peers coordinate. When you connect to a peer, they immediately send you a bitfield showing which pieces they have. You can then request specific pieces, but the peer might choke you if they don’t want to upload to you right now. It’s a whole negotiation dance.
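Here's a rough sketch of what that dance looks like on the wire. After the handshake, every message is a 4-byte big-endian length, a 1-byte message ID, and a payload. The IDs below are the standard ones; the helper names are hypothetical and not taken from message.py.

```python
import struct

# Standard message IDs from the BitTorrent peer wire protocol.
CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED, HAVE, BITFIELD, REQUEST, PIECE = range(8)


def encode_message(msg_id: int, payload: bytes = b"") -> bytes:
    """Frame a message: length prefix covers the ID byte plus the payload."""
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def encode_request(index: int, begin: int, length: int) -> bytes:
    """Ask for `length` bytes of piece `index`, starting at offset `begin`."""
    return encode_message(REQUEST, struct.pack(">III", index, begin, length))


def has_piece(bitfield: bytes, index: int) -> bool:
    """Check a peer's bitfield; the high bit of byte 0 is piece 0."""
    byte_index, offset = divmod(index, 8)
    if byte_index >= len(bitfield):
        return False
    return bool(bitfield[byte_index] >> (7 - offset) & 1)
```

A client would typically send encode_message(INTERESTED), wait for an UNCHOKE, and only then start sending requests for pieces the bitfield says the peer has.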

Also, the integrity checking is clever. Each piece has a SHA-1 hash in the torrent file. You download the piece, hash it, and if it doesn’t match, you know something went wrong. This means you can safely download pieces from multiple peers in parallel - if one sends you bad data, you just discard it and try again.
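The check itself is only a few lines. In a torrent file, the piece hashes are stored as one long byte string of concatenated 20-byte SHA-1 digests. A minimal sketch, with function names invented for this example:

```python
import hashlib
from typing import List


def split_piece_hashes(pieces: bytes) -> List[bytes]:
    """Split the concatenated hash string into one 20-byte digest per piece."""
    if len(pieces) % 20 != 0:
        raise ValueError("piece hash string length must be a multiple of 20")
    return [pieces[i:i + 20] for i in range(0, len(pieces), 20)]


def verify_piece(data: bytes, expected_hash: bytes) -> bool:
    """Return True if the downloaded piece matches its expected SHA-1."""
    return hashlib.sha1(data).digest() == expected_hash
```

If verify_piece returns False, the piece goes back into the pool to be requested again, possibly from a different peer.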

Threading makes a huge difference here. The implementation spawns a worker thread for each peer, so pieces can download in parallel. Without that, you’d be waiting for one peer to send you everything sequentially, which would be painfully slow.
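One common way to structure this is a shared work queue of piece indices with one worker thread per peer. The sketch below shows that pattern under some assumptions: fetch_piece is a hypothetical callable standing in for the real network code, and the actual p2p.py may be organized differently.

```python
import queue
import threading


def download_all(peers, num_pieces, fetch_piece):
    """Fetch pieces with one worker thread per peer (illustrative sketch).

    fetch_piece(peer, index) is assumed to return the verified piece
    bytes or raise an exception on failure.
    """
    work = queue.Queue()
    for index in range(num_pieces):
        work.put(index)

    results = {}
    lock = threading.Lock()

    def worker(peer):
        while True:
            try:
                index = work.get_nowait()
            except queue.Empty:
                return  # nothing left to claim
            try:
                data = fetch_piece(peer, index)
            except Exception:
                work.put(index)  # hand the piece back for another peer
                return  # give up on this peer
            with lock:
                results[index] = data

    threads = [threading.Thread(target=worker, args=(peer,)) for peer in peers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # The caller should check for missing indices: a piece re-queued after
    # the other workers have exited is not retried in this sketch.
    return results
```

Because the workers spend most of their time waiting on sockets, threads work fine here despite the GIL.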

If you’re curious about how BitTorrent actually works, I’d recommend checking out the code. It’s all there, and it’s readable enough that you can trace through what’s happening from start to finish.