> I was not satisfied with my regular BitTorrent client, and was wondering how much work it would be to create a new one from scratch, and it got where it is, starting from the bencode protocol implementation, then the peer protocol, etc.
That's pretty cool :) Usually people just pull in libraries for all those things, so pretty nice you didn't, makes for more interesting code when you're building stuff just for fun!
Of the parts you've built, which one would you say was the trickiest part?
When I first built it a couple of years ago, it all went pretty smoothly until I hit concurrent communication with many peers, both for DHT and torrent downloading.
E.g. parsing bencode and the other binary protocols involved in the network (BitTorrent peer protocol, tracker requests, DHT protocol) was all easy compared to managing state.
Managing state (e.g. what parts we have downloaded so far, what are we downloading at the moment, what peers are we expecting to get pieces from, etc) has a lot of edge cases that need to be handled.
For a recent example, I was dealing with a bug that manifested like this: when resuming a torrent from the "paused" state, it was stuck at ~99.87% downloaded and never progressed. It turned out that when I put the torrent into the "paused" state, after disconnecting the peers, I wasn't marking the in-flight pieces for re-download. So when new peers connected, they had nothing to do.
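To make the pause bug above concrete, here is a minimal sketch of the kind of bookkeeping involved. It is not rqbit's actual code: `PieceTracker`, `PieceId`, `PeerId` and all method names are made up for illustration. The point is only that pausing has to move every in-flight piece back to the queue, otherwise those pieces belong to peers that no longer exist.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical identifier types, not rqbit's real ones.
type PieceId = u32;
type PeerId = u64;

#[derive(Default)]
struct PieceTracker {
    /// Pieces nobody is downloading yet.
    queued: BTreeSet<PieceId>,
    /// Pieces currently requested, and the peer they were requested from.
    in_flight: HashMap<PieceId, PeerId>,
    /// Pieces that are fully downloaded.
    have: BTreeSet<PieceId>,
}

impl PieceTracker {
    fn new(total_pieces: u32) -> Self {
        Self {
            queued: (0..total_pieces).collect(),
            ..Default::default()
        }
    }

    /// Hand the lowest-numbered queued piece to a peer.
    fn reserve_next(&mut self, peer: PeerId) -> Option<PieceId> {
        let piece = self.queued.pop_first()?;
        self.in_flight.insert(piece, peer);
        Some(piece)
    }

    /// Record a finished piece; ignored if it was not in flight.
    fn mark_downloaded(&mut self, piece: PieceId) {
        if self.in_flight.remove(&piece).is_some() {
            self.have.insert(piece);
        }
    }

    /// Call on pause, after peers are disconnected: everything that was
    /// in flight goes back to the queue so new peers have work to do.
    fn requeue_all_in_flight(&mut self) {
        for (piece, _peer) in self.in_flight.drain() {
            self.queued.insert(piece);
        }
    }
}
```

Without the `requeue_all_in_flight` call on pause, the last few pieces stay "owned" by disconnected peers, which would look exactly like a download stuck just short of 100%.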
DHT was also tricky, for multiple reasons:
- as it works via UDP, there's no "connection" - you need to match incoming messages to previously sent requests yourself. But then you might never get a response, so you must manage some kind of timeouts, to clean up memory of old outgoing requests that are no longer expected to be completed.
- DHT involves "recursive" requests. You query N peers at first, and each of them might return M other peers. So on the next round you need to make N * M requests. And this recursion can continue forever, growing exponentially. You need some heuristics to avoid flooding the network, or at least your own computer. For example, my macOS UI was freezing when rqbit was trying to send too many DHT requests at once.
- Handling the above (and other) examples makes the code worse. Managing code complexity, at least for myself to understand what on earth is going on, is much harder than e.g. implementing the binary protocols.
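As an illustration of the first bullet (matching UDP responses to requests and timing out the ones that never come back), here is a rough sketch. It assumes a numeric transaction id for simplicity; `PendingRequests` and its methods are hypothetical names, and this is not how rqbit necessarily does it.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

// Assumption: a small numeric id. It would be serialized into the
// transaction id field of the outgoing DHT message.
type TransactionId = u16;

struct Pending {
    addr: SocketAddr,
    sent_at: Instant,
}

struct PendingRequests {
    next_id: TransactionId,
    timeout: Duration,
    by_id: HashMap<TransactionId, Pending>,
}

impl PendingRequests {
    fn new(timeout: Duration) -> Self {
        Self { next_id: 0, timeout, by_id: HashMap::new() }
    }

    /// Remember an outgoing request and return the id to put in the packet.
    /// Note: after 65536 requests the id wraps and would overwrite a
    /// still-pending entry; a real client needs to handle that.
    fn register(&mut self, addr: SocketAddr, now: Instant) -> TransactionId {
        let id = self.next_id;
        self.next_id = self.next_id.wrapping_add(1);
        self.by_id.insert(id, Pending { addr, sent_at: now });
        id
    }

    /// Match an incoming response. Unknown ids and responses from an
    /// unexpected address are ignored. Returns true if a request was matched.
    fn complete(&mut self, id: TransactionId, from: SocketAddr) -> bool {
        let matches = self.by_id.get(&id).map_or(false, |p| p.addr == from);
        if matches {
            self.by_id.remove(&id);
        }
        matches
    }

    /// Drop requests that have waited at least `timeout`, so memory does
    /// not grow forever. Returns how many were dropped.
    fn expire(&mut self, now: Instant) -> usize {
        let before = self.by_id.len();
        let timeout = self.timeout;
        self.by_id.retain(|_, p| now.duration_since(p.sent_at) < timeout);
        before - self.by_id.len()
    }
}
```

Passing `now` in explicitly instead of calling `Instant::now()` inside keeps the timeout logic easy to unit-test.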
Otherwise, it's all about tricky details, behaviours that can only be observed under certain circumstances, and only if you're curious enough to look, e.g.
- Garbage collection. Ensuring that when the client (e.g. peer, browser etc) disconnects, everything is cleaned up. E.g. when the torrent is paused, this causes a massive "stop" for all spawned tasks. If you don't account for that, they might keep running forever.
- Network issues. Re-connecting to peers, to DHT etc, re-trying everything that can be retried, might be a head-scratcher sometimes.
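For the retrying part, one common approach (not necessarily the one rqbit uses) is capped exponential backoff. A minimal synchronous sketch, using only the standard library; `Backoff` and `retry` are hypothetical names, and the sleep function is passed in so the logic can be tested without actually waiting:

```rust
use std::time::Duration;

/// Yields a limited number of delays, doubling each time up to `max`.
struct Backoff {
    delay: Duration,
    max: Duration,
    attempts_left: u32,
}

impl Backoff {
    fn new(initial: Duration, max: Duration, attempts: u32) -> Self {
        Self { delay: initial, max, attempts_left: attempts }
    }
}

impl Iterator for Backoff {
    type Item = Duration;

    fn next(&mut self) -> Option<Duration> {
        if self.attempts_left == 0 {
            return None;
        }
        self.attempts_left -= 1;
        let current = self.delay;
        self.delay = self.delay.saturating_mul(2).min(self.max);
        Some(current)
    }
}

/// Run `op` once, then retry after each backoff delay until it succeeds
/// or the delays run out. Returns the last result either way.
fn retry<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    backoff: Backoff,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut last = op();
    for delay in backoff {
        if last.is_ok() {
            break;
        }
        sleep(delay);
        last = op();
    }
    last
}
```

In a blocking program you would pass `std::thread::sleep` as the last argument; an async client would need an async equivalent instead.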
All that said, it's quite fun to deal with these when that's your goal by itself - to enjoy the process of coding.
I found the trickiest parts to be DHT concurrency (finally solved this adequately after 5-6 years of experimenting), and efficient block requesting (I've rewritten this every few years, but my latest implementation seems to have been solid since 2020 or so). The major thing I've not solved, but have also been too lazy to tackle, is a peer cache. I just reannounce when all peers are exhausted and start over.
Concurrency and the shared-state nature of torrents are definitely what makes it all hard and tricky. I've rewritten this part of DHT completely recently, but judging by your experience, it looks like this won't be the last time.
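One simple heuristic for keeping DHT lookups from fanning out exponentially is to deduplicate the nodes you learn about and cap how many queries are outstanding at once. The sketch below is a deliberately simplified assumption, not rqbit's or anyone else's real implementation: `LookupFrontier` is a made-up name, and a real Kademlia lookup would also order candidates by XOR distance to the target and stop once the closest set stops improving.

```rust
use std::collections::{HashSet, VecDeque};
use std::net::SocketAddr;

struct LookupFrontier {
    max_in_flight: usize,
    in_flight: usize,
    /// Every node ever queued, so the same node is never queried twice.
    seen: HashSet<SocketAddr>,
    queue: VecDeque<SocketAddr>,
}

impl LookupFrontier {
    fn new(max_in_flight: usize) -> Self {
        Self {
            max_in_flight,
            in_flight: 0,
            seen: HashSet::new(),
            queue: VecDeque::new(),
        }
    }

    /// Add nodes returned by a response; already-seen nodes are dropped.
    fn add_candidates(&mut self, nodes: impl IntoIterator<Item = SocketAddr>) {
        for node in nodes {
            if self.seen.insert(node) {
                self.queue.push_back(node);
            }
        }
    }

    /// Next node to query, or None if the cap is reached or nothing is queued.
    fn next_to_query(&mut self) -> Option<SocketAddr> {
        if self.in_flight >= self.max_in_flight {
            return None;
        }
        let node = self.queue.pop_front()?;
        self.in_flight += 1;
        Some(node)
    }

    /// Call when a query finishes, whether by response or by timeout.
    fn finished_one(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }
}
```

The cap bounds the burst of outgoing packets, and the `seen` set is what stops the N * M growth from revisiting the same nodes round after round.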
For block requesting, rqbit has a pretty simple algorithm https://github.com/ikatson/rqbit/blob/main/crates/librqbit/s..., and it never showed up in benchmarks, thanks to Rust being fast by default I guess. I admit though, I never looked at how other clients do it, maybe the rqbit algorithm is too naive.
At the moment, it doesn't listen for BitTorrent downloads on either UDP or TCP, so external peers can't connect to it.
It only listens on UDP for DHT requests.
So if it connects by itself (only TCP is supported), then it can upload. In reality, this makes uploading rare.
For "being a good citizen" of the network, a couple days ago I implemented storing peer information, so that it can be returned back when DHT nodes query for it. It has limitations though, e.g. only storing 1000 peers at the moment, and not cleaning up old peers once the peer store is full.
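For what it's worth, the "not cleaning up old peers" limitation could be addressed by evicting the oldest entry once the store is full. The following is a hypothetical sketch of that idea, not what rqbit currently does; `PeerStore`, `InfoHash` and the method names are invented for illustration.

```rust
use std::collections::{HashMap, VecDeque};
use std::net::SocketAddr;

// A 20-byte torrent info hash.
type InfoHash = [u8; 20];

struct PeerStore {
    capacity: usize,
    /// Insertion order across all torrents, oldest first.
    order: VecDeque<(InfoHash, SocketAddr)>,
    peers: HashMap<InfoHash, Vec<SocketAddr>>,
}

impl PeerStore {
    fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), peers: HashMap::new() }
    }

    /// Store a peer announced for a torrent. Duplicates are ignored
    /// (and, in this sketch, do not refresh the peer's age).
    fn announce(&mut self, info_hash: InfoHash, peer: SocketAddr) {
        let list = self.peers.entry(info_hash).or_default();
        if list.contains(&peer) {
            return;
        }
        list.push(peer);
        self.order.push_back((info_hash, peer));
        if self.order.len() > self.capacity {
            self.evict_oldest();
        }
    }

    /// Remove the oldest stored peer, and its torrent entry if now empty.
    fn evict_oldest(&mut self) {
        if let Some((hash, peer)) = self.order.pop_front() {
            let now_empty = match self.peers.get_mut(&hash) {
                Some(list) => {
                    list.retain(|p| *p != peer);
                    list.is_empty()
                }
                None => false,
            };
            if now_empty {
                self.peers.remove(&hash);
            }
        }
    }

    /// Peers to return when a DHT node asks about this torrent.
    fn get_peers(&self, info_hash: &InfoHash) -> &[SocketAddr] {
        self.peers.get(info_hash).map(|v| v.as_slice()).unwrap_or(&[])
    }
}
```

With `PeerStore::new(1000)` this keeps the same 1000-peer cap, but new announcements push out the oldest ones instead of being stuck behind stale entries.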
Listening on UDP for torrent requests seems like a big change, maybe someday.
I started doing this in Zig (roughly following the codecrafters course on the topic to start out) but I ran out of steam a week or so ago. It's good to see someone building something they want for themselves.
2 years ago, when I decided to create rqbit, I had been using qBittorrent as my main client.
I think there was a bug in it, or smth else that caused it for me, that made it download torrents really slowly, and it couldn't saturate my gigabit network, whereas previously it was close.
It didn't bother me enough to investigate deeper, but instead sparked my curiosity about what it would take to make a qBittorrent myself.
Since then, either qBittorrent fixed the bug, or whatever else was causing it fixed it, so it's no longer an issue. So if you like qBittorrent, I'm with you - it's great!
My *recent* motivation to put some more work into rqbit was caused by smth else though:
1. I had some free time on my hands, and was craving coding some Rust
2. I gave the client to my dad, he put it on his Raspberry Pi, said this is the fastest torrent client he's ever seen, and asked if certain features were available. I implemented them all, cause it wasn't hard, and I made my dad happier at the same time :)
2. Wireshark dumps of some existing BitTorrent clients to write unit-tests for RPC serialization/deserialization. I used qBittorrent, but you can use any other existing client.
3. (kind of optional) DHT protocol: https://www.bittorrent.org/beps/bep_0005.html. This actually came later, you can download torrents using just #1. But if you try to do so, you'll discover that most peer information is stored in DHT, and not in trackers.
Everything else was heuristics, observing real network behaviour, and tweaking the code accordingly.
That said, I'm not exactly the go-to expert on "how to develop BT clients and servers", as rqbit isn't as fully featured as the more mature clients. But given that the above links got me that far, I'm sure they can give a very decent start.