Who remembers Skytorrents (https://news.ycombinator.com/item?id=13423629)? Posted as a "Show HN" here, it was a DHT-sourced index and stack written in C with no JavaScript, no cookies, no ads, no tracking. Skytorrents was unbelievably fast, friendly, and complete, and this translated into rapid adoption and traffic growth that caused the site to shut down due to server costs after just a year (https://torrentfreak.com/skytorrents-dumps-massive-torrent-d...).
It was a shame that the technology behind Skytorrents was never open-sourced; it was the best torrent crawler and site I've ever seen, and I would have liked to see how it worked so well.
I want to seamlessly distribute my music through bittorrent, but since I have a small fan base (and thus, a very small potential seeding pool), I found it difficult to connect all the moving parts.
I'll give Bitmagnet a try for an Indexer.
I did find that IPFS, with a pinning service (I won't shill the one I used in particular), was a bit easier and just worked for everyone who tried to use it.
But I'd like to get my bittorrent presence up to the "just works" level also.
You don't need an indexer for that. All you need is a static list with all your music, and people can click on links to download your music. Give them qbittorrent and it'll be running smoothly.
There is no point in running your own indexer if it's only for your own music, though. If your goal is to run a more "communal" indexing point, it also makes sense to build it cooperatively with other artists, listing only content you all care about and want to distribute (rather than filtering after the fact with a general indexer).
I like your idea of giving its name back to a technology that just works better, and I think it pairs really well with giving the internet and computers back a humane touch, putting people back at the center rather than thinking about tools first!
Maybe check out how it's done here: https://archive.org/details/the-fanimatrix-div-x-5.1-hq. They don't seem to provide magnet links, but basically you create a torrent from your music. Host it somewhere (Cloudflare R2 would be good for free egress) in the right structure. Add the webseed endpoint to your torrent file and create a magnet link. Put all this stuff on a website in the download section. Let your users download it however they want.
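A sketch of that last step: assembling a magnet URI that carries a webseed endpoint (the `ws=` parameter from BEP 19). The infohash and bucket URL below are placeholder values, not real ones.

```python
from urllib.parse import quote

def build_magnet(infohash: str, name: str, webseed: str, trackers=()) -> str:
    """Assemble a magnet URI pointing at a v1 infohash, with a BEP 19
    webseed so clients can fall back to plain HTTP when no peers exist.

    infohash: 40-char hex infohash of the .torrent you created.
    webseed:  HTTP(S) URL where the same files are hosted (e.g. an R2 bucket).
    """
    parts = [f"magnet:?xt=urn:btih:{infohash}", f"dn={quote(name)}"]
    parts.append(f"ws={quote(webseed, safe='')}")
    parts += [f"tr={quote(t, safe='')}" for t in trackers]
    return "&".join(parts)

# Hypothetical example values:
link = build_magnet(
    "0123456789abcdef0123456789abcdef01234567",
    "my-album",
    "https://example-bucket.r2.dev/my-album/",
)
```

With a webseed in the link, even a swarm with zero seeders stays downloadable, which matters for a small fan base.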
Well for your use case of distributing your own music, Bitmagnet wouldn't be necessary.
What a DHT crawler like Bitmagnet does is the following:
1. Take a few initial "bootstrap" torrents and ping them to see which IP addresses are seeding that file
2. Ping those IP addresses and ask what other files/torrents they're seeding
3. Ping those torrents to see which IP addresses are seeding that torrent
Rinse repeat.
So to distribute your music to fans, you'd just want to put magnet links to your music on your site.
That's not correct. Torrent swarms and the DHT are separate. Each torrent basically forms its own small network of TCP connections to exchange data specific to that one torrent, while the DHT is a network shared by all clients that speak the protocol, carried over short-lived UDP query-response exchanges.
You have to be participating in a torrent swarm in order to bootstrap yourself into the DHT at all. BitTorrent's DHT is not a network independent of torrent swarms: you need the address of a peer who is already part of the network in order to join it, and you have to get a list of those addresses from somewhere.
Which of these methods is not obtaining the address of a node in a swarm, or hitting a tracker for a list of nodes in a swarm?
> People have used the DHT for non-torrent-related purposes.
This is simply non-responsive. People have used a DHT overlaid on the collection of torrent swarms for non-torrent related purposes.
> And nothing dictates that that has to be obtained via the bittorrent peer protocol.
This is a silly distinction. You also don't need to join a swarm to get on the DHT if I join a swarm to get on the DHT, write down the addresses I get on a piece of paper, then email you those addresses, which you plug into your handwritten specialized client that only knows how to join the DHT.
You don't need to ping anyone to crawl the DHT. You can passively wait and you'll get DHT queries in the form of "I'm looking for people seeding XYZ. Do you have a list?". You can just save those somewhere and you'll accumulate a list of active and new torrents.
Writing a DHT crawler is super fun; I suggest everyone get a cheap VM and write and run one.
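A minimal sketch of that passive approach: decode each incoming KRPC datagram and record the infohash of any get_peers query. The hand-rolled bencode decoder below is for illustration; a real crawler would bind a UDP socket on its DHT port and feed every received datagram through `infohash_from_query`.

```python
def bdecode(data: bytes, i: int = 0):
    """Tiny bencode decoder; returns (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        j = data.index(b"e", i)
        return int(data[i + 1:j]), j + 1
    if c == b"l":  # list: l<items>e
        i, out = i + 1, []
        while data[i:i + 1] != b"e":
            v, i = bdecode(data, i)
            out.append(v)
        return out, i + 1
    if c == b"d":  # dict: d<key><value>...e
        i, out = i + 1, {}
        while data[i:i + 1] != b"e":
            k, i = bdecode(data, i)
            v, i = bdecode(data, i)
            out[k] = v
        return out, i + 1
    # byte string: <length>:<bytes>
    j = data.index(b":", i)
    n = int(data[i:j])
    return data[j + 1:j + 1 + n], j + 1 + n

def infohash_from_query(packet: bytes):
    """Return the hex infohash if this KRPC packet is a get_peers query."""
    msg, _ = bdecode(packet)
    if msg.get(b"y") == b"q" and msg.get(b"q") == b"get_peers":
        return msg[b"a"][b"info_hash"].hex()
    return None
```

Accumulate those infohashes and you have a growing list of torrents people are actively looking for, without sending a single query yourself.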
> I want to seamlessly distribute my music through bittorrent, but since I have a small fan base (and thus, a very small potential seeding pool), I found it difficult to connect all the moving parts.
Out of curiosity, why not just distribute via R2 or similar, or archive.org if you don't need AuthN/AuthZ? What does the complexity (to you and listeners) of BitTorrent buy you?
Well, I do distribute via some centralized platforms (including some odious ones like Spotify).
But I'd like to put forward a practice that demonstrates that the tools that have been smeared as anti-artist (chiefly but not only bittorrent) are actually compelling tools for independent distribution.
"Unlike well-moderated torrent sites, Bitmagnet adds almost any torrent it finds to its database. This includes mislabeled files, malware-ridden releases, and potentially illegal content. The software tries to limit abuse by filtering metadata for CSAM content, however."
This is by far the biggest hurdle to something like this. You'll eventually have to end up again with a centralized curator.
I have a DHT indexer implementation I mentioned in https://news.ycombinator.com/item?id=39425381. It definitely finds everything, and you need a metric to sort for quality. That metric can be seeder/leecher counts from a trusted tracker, or peer count on the DHT. https://www.coveapp.info/ uses this to grade its search results.
This has been used by record labels in the past. I can't remember which system tried to do reputation (was it Kazaa?), but some of the most upvoted songs were the current hits, replaced with a ~10s loop running for a few minutes.
Now, do you think you can outvote an RIAA-enforcement-group equivalent if they decide to spend money on this?
Bitcoin originally evolved from the "hashcash" proof-of-work system, which was intended to be a scalable anti-spam measure (force an attacker to generate a unique proof that is expensive to generate but cheap to verify). And you are basically describing proof-of-stake (force participants to stake, and if they misbehave you slash their stake). I wasn't kidding about "nothing is new under the sun" ;)
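A toy hashcash-style proof-of-work illustrating "expensive to generate, cheap to verify" (the difficulty and resource string below are arbitrary, not hashcash's actual stamp format):

```python
import hashlib
from itertools import count

def verify(resource: str, nonce: int, bits: int = 12) -> bool:
    """Cheap check: does sha256(resource:nonce) start with `bits` zero bits?"""
    h = hashlib.sha256(f"{resource}:{nonce}".encode()).digest()
    return int.from_bytes(h, "big") >> (256 - bits) == 0

def mint(resource: str, bits: int = 12) -> int:
    """Expensive search: try nonces until one passes verify().
    Expected cost grows as 2**bits; verification stays a single hash."""
    for nonce in count():
        if verify(resource, nonce, bits):
            return nonce

stamp = mint("post@example-forum", bits=12)
```

Raising `bits` by one doubles the sender's expected work while leaving the verifier's cost unchanged, which is exactly the asymmetry an anti-spam scheme needs.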
In general though others are right that the problem often is that attackers are more willing to play these games than individuals - you are not willing to put up 10 bucks to post on a web forum, but an attacker might think $10 to get a wonderful commercial offer delivered to 1000 people sounds really good! You won't spend 10c to upvote each individual song, but the RIAA will! Etc. It needs to be highly asymmetrical, and ideally have minimal/zero cost to "honest" users with an exponential penalty for attackers.
Obviously we are not living in a post-email-spam world unfortunately, and what we have is basically the "lightning network" with gatekeepers offloading and centralizing a lot of the problem so users don't have to deal with it. But you aren't the first person to make the observation that these are related problems!
> And you are basically describing proof-of-stake (force participants to stake, and if they misbehave you slash their stake). I wasn't kidding about "nothing is new under the sun"
> Its going to be an unpopular take, but crypto solves this. As in currency. Make people attest to things with locking a very small amount of money.
So claims Elon on Twitter, a platform that is very obviously to anyone who uses it completely overrun with bots who have found it very profitable to validate their spam accounts and get preferential listings for a mere eight dollars, while scaring off legitimate users who (fairly) do not trust a service with no functioning security department with their payment information.
Ad blockers seem to show that we can do curation without the centralized aspect. uBlock has a couple hundred lists, each with thousands or more filters. A user chooses to enable or disable whatever they want, with no central control.
Furthermore, the filter itself can be made to limit disclosure of the original data while still providing a binary decision about whether to filter a given piece of content. Hashing, Bloom filters, AI models and so on are common tools for filtering data like email and link reputation.
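For example, a Bloom filter gives a yes/no membership answer without disclosing the stored entries (the sizes and the blocked URL below are illustrative):

```python
import hashlib

class BloomFilter:
    """Space-efficient set membership: false positives are possible, false
    negatives are not, and the original strings are not recoverable from
    the bit array."""

    def __init__(self, m_bits: int = 8192, k: int = 4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

blocklist = BloomFilter()
blocklist.add("ads.example.net/banner.js")
```

A list publisher could ship only the bit array; subscribers get filtering decisions without ever seeing, or redistributing, the underlying entries.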
Is it possible to use bots, or an RIAA-enforcement-group equivalent, to get ad blockers to green-light specific ads? Personally I trust EasyList as long as the community does, and would discard it the moment it lost the community's trust. That makes it a kind of centralized curator, but also very much not.
Things like this need good signal to noise ratio to survive. Just throwing your hands in the air and saying anything goes doesn't work because users don't want to search through 10 billion mislabelled things to find the right one.
The really interesting part for me is what this technology might lead towards in the end, which is decentralized community-based curation. An index with whitelisted curation would be indistinguishable from a website, but it would need neither a domain name nor an IP address to function.
The problem with decentralised software is that you don't want to host other people's illegal content. I once tried out zeronet, which downloads the entire decentralised website and anyone can post things to it. Although I have not found CSAM directly on their Reddit equivalent, there are people posting advertisements to zeronet CSAM sites. The idea that I am downloading and automatically redistributing content like that is disturbing and zeronet is dead for a good reason. It's a pool that is asking to be peed in, even if the abusers themselves are a tiny minority.
Bitmagnet may download metadata about CSAM content, which is automatically deleted with fairly high accuracy. You would never be redistributing it. No outgoing peer protocol is currently implemented. This is planned but it will give users control of what they're sharing rather than indiscriminately sharing everything.
https://www.coveapp.info/ approaches this in a similar fashion. I don't believe there are any legal issues in collecting metadata in an automated fashion from a public network, so creating a search index for personal use from this is fine. However, if you do click and start seeding questionable content, then it becomes an issue. https://www.coveapp.info/#dht-indexer
Another alternative is https://github.com/the8472/mldht which, unlike magnetico, strives to be a good citizen (its author is active in the BitTorrent community, AFAIU).
I have worked with the8472 to get Bitmagnet's BEP5 & BEP51 implementations working and ensure it's a good citizen on the network - there is more to be done and more protocols to be implemented, but unlike Magnetico, BM is not simply scraping without responding to incoming requests.
We had a discussion here https://github.com/bitmagnet-io/bitmagnet/issues/11
With the related changes bitmagnet shouldn't have the blatant misbehavior of magnetico (anymore). Though I haven't looked at its in-the-wild behavior, so I can't vouch for how spec-compliant the implementation plays in practice.
Implementing sample_infohashes opens your torrent client to abuse. UDP responses are much larger than queries and this protocol (BEP51) allows attackers that can spoof source IP addresses to use a large number of clients as mules for amplified distributed denial of service attacks.
Individual DHT nodes should only see a trickle of packets. A few kilobytes per second. Even less per remote IP. So they can set fairly strict rate limits.
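Such a per-IP limit can be sketched as a token bucket; the rates below are illustrative, not what Bitmagnet actually uses:

```python
import time

class PerIPTokenBucket:
    """Allow a small steady trickle of DHT responses per remote IP
    (`rate` tokens/second, at most `burst` at once) and drop the rest,
    which caps the amplification a spoofed source address can extract."""

    def __init__(self, rate: float = 2.0, burst: float = 10.0):
        self.rate, self.burst = rate, burst
        self.state = {}  # ip -> (tokens, last_timestamp)

    def allow(self, ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self.state.get(ip, (self.burst, now))
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        self.state[ip] = (tokens - 1.0 if allowed else tokens, now)
        return allowed
```

A node would check `allow(remote_ip)` before answering any query and silently drop packets that exceed the budget.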
And ISPs are in a much better position to solve this problem anyway. They should ask for higher peering fees from peers that don't do source-filtering.
True. Although an ISP can say they do source filtering and then fail to implement it properly. For example my ISP at home implements source filtering on IPv4, but not on IPv6.
Bitmagnet has rate limiting on incoming UDP requests (both overall and per-IP) so I don't know that it would be vulnerable; if there's anything else that should be done to mitigate any risk I'd like to know.
Incoming and outgoing are both limited, I think the worst such an attack could do is prevent responding to legitimate incoming queries - this shouldn't slow down the DHT crawler in a noticeable way.
Mainly because I am not ready to share my DHT indexing implementation. It's the culmination of years of working with BitTorrent and DHTs and I'd like to get something back for it one day. However I do dogfood Cove and want people to have that experience too.
I would think the only thing you would get out of such work is a sternly worded cease and desist from a big entity so you might as well open-source this and let somebody else take the torch in case you are shut down.
I mean, I'd use such software (and huge DBs like that of Skytorrents) for research purposes, because to me distribution is hugely interesting as a hobby, but many courts won't see it that way, and we know a lot of them are influenced by copyright holders.
The web UI is super slow. It's slow with the default view of 10 results per page and unbearably slower at 100. These aren't big numbers and my computer isn't old; something was done poorly here, and setting the default number of results per page to 10 clearly is not the correct fix.
(For the record it only has a mere thousand results so far..)
Also, the content classification is so bad it might as well not exist. How does a torrent with "Playboy" in the title get classified as 'Unknown' instead of 'XXX'? Even torrents with "Porn" or "XXX" in the title get classified as Unknown. A simple Bayesian classifier should have this covered, it's not a task that needs heavy duty AI to solve.
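For illustration, a multinomial naive Bayes over title tokens is only a few dozen lines (the training titles and labels below are made up, not Bitmagnet's taxonomy):

```python
import math
import re
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Minimal multinomial naive Bayes over tokens in torrent titles,
    with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.class_counts = Counter()            # label -> document count

    @staticmethod
    def tokens(title: str):
        return re.findall(r"[a-z0-9]+", title.lower())

    def train(self, title: str, label: str):
        self.class_counts[label] += 1
        self.word_counts[label].update(self.tokens(title))

    def classify(self, title: str) -> str:
        total = sum(self.class_counts.values())
        vocab = {w for c in self.word_counts.values() for w in c}
        best, best_lp = None, -math.inf
        for label, n in self.class_counts.items():
            lp = math.log(n / total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(vocab)
            for w in self.tokens(title):
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

nb = TinyNaiveBayes()
nb.train("Playboy Magazine 2023 Collection XXX", "xxx")
nb.train("Some Porn Video XXX 1080p", "xxx")
nb.train("Ubuntu 22.04 Desktop amd64 iso", "software")
nb.train("Debian 12 netinst iso", "software")
```

Trained on a few thousand labelled titles, this kind of model should easily catch the "Playboy"/"XXX" cases described above.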
Try https://www.coveapp.info/, I mentioned it here: https://news.ycombinator.com/item?id=39425381. I wrote a custom search implementation for torrents that uses keyword matching and boosts search scores based on popularity of torrents on the DHT. It feels very much like TPB search but not so aggressively focused on seeder counts (i.e. better matching on file names can outrank a higher seeder count).
Can I take a wild guess that you're using Firefox? I have noticed Firefox performance is much worse than any other browser. I think this is due to issues in the Angular and Material components being used.
As stated in the big red notice on the website, the software is currently in alpha preview. Given time, Bitmagnet will try to mitigate the Firefox performance issues (Firefox's <5% market share means it's prioritised accordingly), but it's important that users of all browsers have an acceptable experience in the app.
It takes several seconds to load 100 results from a list of 1000, with no search term used. I don't know how that's even possible. I think half-baked is a mild way of putting it, particularly if the excuse is that it hasn't been in the oven very long.
This looks interesting but I’m a bit worried about the CSAM / illegal stuff part, could a user get in trouble because he has traces of that in his crawled index? Also, how large is the index after indexing for a few months?
An indexer doesn't download content. The only information you'll have is the name of a torrent, potentially its files, and who is interested in those files.
But that's the technical view, what happens in court might be totally different.
In order to get the information such as the name of the torrent and its files from the hash you do need to connect to someone in the swarm to download that metadata. You won't know what it is until after you've already connected.
Connecting to an unknown machine and asking what they have is like knocking on a stranger's door and asking what they're selling. Them mentioning something nefarious and you leaving in response is very obviously not a crime.
There is probably nefarious content you can identify just from the filenames, but not everything is like that. Moreover, you "only" know they distribute it; you don't do it yourself.
The real question is: metadata is data, so are there any limits on how much data well-behaved clients/servers will transfer through the DHT, such that you can be reasonably sure what lands on your machine isn't poisoned enough to get you into trouble with law enforcement?
At least in the case of https://coveapp.info, the metadata you fetch from users while scraping is disassembled into a form for efficient searching only. The only part remaining in an identifiable form is the infohash.
Isn't this dangerous? I thought that in order to get details of any torrent from the DHT you must connect to it. This would automatically set off DMCA complaints from MPAA/RIAA etc. right?
(also how do they ever prove you actually downloaded any usable part of a torrent?!)