Hyperdrive v10 – a peer-to-peer filesystem (hypercore-protocol.org)
387 points by pfraze on May 14, 2020 | 117 comments



This is really cool, but why reinvent the wheel? For instance, SQLite already has many years of optimization behind storing and accessing data on disk.

To make SQLite decentralized (like Hyperdrive) you can put it in a torrent. Index it using full-text search (https://sqlite.org/fts5.html), for instance. Then let the users seed it.

Users can use the sqltorrent virtual file system (https://github.com/bittorrent/sqltorrent) to query the db without downloading the entire torrent: essentially, it knows to download only the pieces of the torrent needed to satisfy the query. I believe this is similar to the techniques behind Hyperdrive, just, again, using standard, highly optimized tools and tech that already exist: https://www.sqlite.org/vfs.html
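As a rough illustration of the full-text-search side (the schema and rows below are made up, and this assumes a Python/SQLite build with FTS5 compiled in), a query like this only touches the index pages it needs, which is exactly what a VFS like sqltorrent exploits:

```python
import sqlite3

# Toy stand-in for a published snapshot (e.g. a Wikipedia dump);
# table name, columns, and rows are all illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE articles USING fts5(title, body)")
db.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("Hyperdrive", "a peer-to-peer filesystem"),
     ("SQLite", "a small, fast, self-contained SQL database engine")],
)

# A full-text query; run over sqltorrent's VFS, the same statement would
# read only the b-tree pages it needs, pulling just those torrent pieces.
rows = db.execute(
    "SELECT title FROM articles WHERE articles MATCH ?", ("filesystem",)
).fetchall()
print(rows)  # [('Hyperdrive',)]
```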

Every time a new version of the SQLite db is published (say, by Wikipedia), the peers can switch to the new torrent and reuse the pieces they already have, since SQLite organizes its pages in a way that minimizes file changes (and hence piece changes) when the data is updated.

I talk a bit about it here: https://medium.com/@lmatteis/torrentnet-bd4f6dab15e4

Again not against redoing things better, but why not use existing proven tech for certain parts of the tool?


That's very interesting, thanks for the links. I work on Scuttlebutt (similar to Dat/Hypercore) and have been reimplementing our stack with 'boring' tooling like SQLite and HTTP, and I've been really enjoying it so far.

I'm going to read your blog post now, thanks a lot for the new info.


Interesting! Is there any document that I can read about your reimplementation? Or any code?


Christian has also put some work into the underlying database and such lately, but the user facing part of that is Oasis [1] which aims to be an ssb interface that has a no-JS UI, with all the logic being handled by the (locally running) nodeJS server.

[1]: https://github.com/fraction/oasis


It seems like the blog post answers your question pretty thoroughly. The Hyperdrive index and the protocol are tuned for this use case, making it scale to being able to host a Wikipedia clone. BitTorrent FS + SQLite are not tuned for this use case.


Wikipedia’s text history absolutely fits on a tiny hard drive and is easy to get a replica of.


Compressed with 7-Zip, sure, but uncompressed, the entire thing takes up 10TB. The Hyperdrive post doesn't mention compression at all, so the comparison should be without it.

> As of June 2015, the dump of all pages with complete edit history in XML format at enwiki dump progress on 20150602 is about 100 GB compressed using 7-Zip, and 10 TB uncompressed.

From: https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia#Si...

It's a bit of a nitpick either way; you're right that Wikipedia may not be the best example, since 10TB is still relatively small.


The big asterisk is, this only works if your database never changes.


Not sure if the parent edited their post after you stated this but note they explained a technique to accommodate database updates / changes / edits.


What do you mean? The author (say, the Wikipedia owners) can change the db as they usually would (using UPDATE queries, say). Those write queries will result in the least amount of disk-page updates. In the torrent world this equals a minimum set of pieces modified and needing to be downloaded by users.


Last I checked, you can't update a torrent. So if Wikipedia changes even a single letter, you'd need to download all the data once more.


No, the pieces you downloaded can be reused for the new torrent download. The pieces will effectively have the same hashes and hence can be reused under the new digest: http://bittorrent.org/beps/bep_0038.html

This is also why sqlite is a good choice because it's highly optimized to do the least amount of changes to its "pieces" when an update occurs.

If you're implementing this behavior, trying to manage all kinds of different queries, building a querying engine on top of that, optimizing for efficiency and reliability, you're effectively rewriting a database. Sure you can do it, but why not take advantage of battle-tested off-the-shelf stuff for things like "databases" (sqlite) and/or "distributing data" (torrent)?
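A toy sketch of that piece-reuse argument (piece size and data below are made up; real torrents use pieces of 256 KiB or more): hash both versions of the file piece by piece and see how much an in-place page update leaves untouched.

```python
import hashlib

PIECE_SIZE = 4  # toy value; real torrents use e.g. 256 KiB pieces

def piece_hashes(data: bytes) -> list[str]:
    """Hash each fixed-size piece, like a torrent's piece list."""
    return [hashlib.sha1(data[i:i + PIECE_SIZE]).hexdigest()
            for i in range(0, len(data), PIECE_SIZE)]

old = b"AAAABBBBCCCCDDDD"   # the published db file
new = b"AAAABBBBXXXXDDDD"   # one "database page" updated in place

old_pieces, new_pieces = piece_hashes(old), piece_hashes(new)
reused = sum(1 for a, b in zip(old_pieces, new_pieces) if a == b)
print(f"{reused}/{len(new_pieces)} pieces reusable")  # 3/4 pieces reusable
```

An append-heavy or shuffled layout would shift every byte after the edit and invalidate every subsequent piece, which is why a page-oriented format like SQLite's is friendly to this scheme.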


Actually, there is a solution to this. Just combine https://www.bittorrent.org/beps/bep_0030.html (Merkle-tree-based hashing) with https://www.bittorrent.org/beps/bep_0039.html (feed-URL-based updates), and in some settings also https://www.bittorrent.org/beps/bep_0047.html (specifically the padding files, so that flat files inside a torrent can also be efficiently shared in arbitrary combinations of non-partial files).


All those BEPs are in "Draft" status. Okay, libtorrent implements two of them. But also, BEP 39 (Updating Torrents Via Feed URL) doesn't really fit very well into the fully distributed setting because of the centralized URL part.

So now to update the torrent file you need a mechanism for having a mutable document you can update in a distributed but signed way. Or you could make an append only feed of sequential torrent urls... oh wait.

My point is: Hyperdrive's scope is sufficiently different from your proposed solution that yes, you could probably rely on existing tools (and I have much love for bittorrent based solutions!) but it starts feeling like shoehorning the problem into a solution that doesn't quite fit.


The distributed-but-signed way is there in https://www.bittorrent.org/beps/bep_0046.html (Updating Torrents Via DHT Mutable Items).

That draft status is of little practical relevance, though, if nothing has changed for years and no one has voiced well-founded criticism of the technical details.

I do agree though that Hyperdrive is different from what the bittorrent ecosystem has to offer. I too like not reinventing the wheel where that's not necessary, as you recommend there. I'll leave you the list of BEPs for further reading, in case you're interested: https://www.bittorrent.org/beps/bep_0000.html


I've been keeping an eye on that list for a long time. There's some really cool stuff in there, and I think bittorrent has really been within reach of being "simply good enough for most applications" for quite some time now. And the massive user base is of course a good thing there, especially if you're talking more about archival projects.


Seems like someone should make a frontend for what you just said, especially that last part, which would get annoying to do manually.


Would a sqltorrent setup make sense for sharing scraped/pulled data amongst users? So each user can run the data extraction themselves, or check if anyone on the swarm has already ingested chunks to their liking. Everything is append-only and content-addressable at its base.

I've been looking around IPFS, dat, hyperdrive etc and it seems like dat is the most natural setting for this but sqltorrent is new to me.


"Sharing data amongst users" - torrents excel at this. Do you have a specific use case in mind?


That is pretty clever!


Wouldn't seeding be a problem? You would need to seed from something that supports webtorrent which uses WebRTC.

With dat-sdk, users just need to go to a webpage. You really just need WebRTC without torrents.

I got rid of multiwriter by just having a dat archive for each user, and the users sharing their dat addresses with each other. They write to their own. When that happens, events emit and users listening write to theirs.

If enough users stay on, listening to each other's address, I only need a web client.

Also, if I have offline support, like Workbox Background Sync, I don't even need internet and information transfers device to device with just an offline PWA. At least that's my goal.


Been following this for years, congratulations on your release. I looked at dat for building P2P services, but found that ZeroNet was actually more capable for "real" services - maybe now is the time to reevaluate that, especially given the improvements to large (both deep and wide) archives and the new hole punching.

Can you please mention your thoughts on:

- Discoverability of content in Hyperswarm (DHT Search/"Superpeers"/???)

- What happens to the old DEP proposals? (There is a critical feature that I need for my service that's still an open DEP proposal! https://github.com/datprotocol/DEPs/issues/61)

- What use cases do you have in mind for this service?

Thanks!


Thanks Mizza!

Not sure if I'm answering your first point, but Hyperswarm is baked into the Hyperdrive daemon, so daemon users should have their drives swarmed/available automatically. There are a few CLI commands to toggle this behavior too, in case you don't want to add your drive's key to the DHT.

The Hypercore Protocol org is creating a similar proposal repo called HYP [0] (we couldn't resist the name), scoped tightly to the core protocol. We're still solidifying the proposals plan, but yours would add lots of value, so we don't want to lose track of it.

Since your proposal is about peer identifiers, you might like that Hyperswarm uses the Noise protocol [0] to handshake each connection, and the Noise key can be used as a stable peer ID.

As for use-cases, check out Beaker (launched today too). Paul's made a whole bunch of example applications that take advantage of Hyperdrive features, like drive mounts.

I'm personally really interested in using Beaker to make personal document indexers + search (kinda like an amped-up Dropbox), but that's for another post!

[0] https://noiseprotocol.org/


In the flurry of the launch I got all my links wrong. The HYP repo can be found here: https://github.com/hypercore-protocol/hyp


Check out the dependency hyper-protocol, specifically the extension messaging: custom messaging has been around for a while, allowing you to define your own protocol extensions and peer behaviour. hyper-protocol v7 came with an option for doing authenticated peer handshakes as well.


Quick note: it would be nice to ship the hyperdrive daemon as something more self-contained than a npm package.

I tried to install it and got some c compile errors. Probably my nodejs version is too old, but you want to have the absolute minimum of friction for users to install it.

Beaker comes as an appimage for linux, which worked flawlessly at the first try. Maybe do that for the hyperdrive daemon as well?


Yes, we'll definitely be doing this. Thanks for the feedback.


A Docker image would also allow for us Windows folks to use this, as Docker can map folders between Windows and Linux. Does one exist? If not, I could give it a go :)


Total noob question, but:

Last time Dat was on HN I tried to follow the "simple chat application" tutorial[0], but got stuck at the stage where 2 instances were supposed to automatically discover each other because they only intermittently managed to actually discover each other.

Will this new version of Hyperdrive improve this? Or is it something completely different?

[0] https://docs.dat.foundation/docs/kappa


The new version has a totally revamped networking stack that should be a lot faster and more reliable. Part of it includes a new UDP holepunching algorithm that's had a super high success rate for us in practice.

There are some diagrams describing this on the Hypercore Protocol site [0], and the repo can be found here [1].

[0] https://hypercore-protocol.org/#hyperswarm

[1] https://github.com/hyperswarm/hyperswarm


The new holepunching algorithm also runs distributed on our DHT, meaning any peer can help you holepunch to other peers.

We also talk a bit more about this on the Hypercore website.


> Hyperdrive is a POSIX-like filesystem implementation

How POSIX-like is it?

From a filesystem I would expect more rigor than just stating this without any extra information, especially as implementing an actual, to-spec, fully distributed POSIX filesystem is known to be a difficult problem and is generally solved by not aiming for a full POSIX-compatible implementation in the first place (i.e. no locking, append-only files, etc.).


Hyperdrive is part of the peer-to-peer networking stack that Beaker Browser uses for websites. Today is release day and this tech update is the first announcement.


What is your relationship with Hyperdrive & the Beaker Browser?


He lives inside of a beaker browser. It's his home.


The Beaker team works on Hyperdrive (the Hypercore Protocol) but we maintain a separate org at https://github.com/hypercore-protocol


Color me interested. I've been wanting to use IPFS for similar use case but there's always something that annoys me or documentation that's lacking.

While reading the Hyperdrive/Hyperswarm docs I actually managed to find everything I wanted to know without much effort (most of all, it seems like setting this up on a private network should be doable).


Does it support private clustering and encryption? My use case is to have shared volumes between servers, laptops but without all the networking hassle (VPNs, opening firewalls).


My go-to example for 'private cloud' (which as soon as we hit the Trough of Disillusionment on this Hype Train, we will all be talking about) is having a set of people in your social or family group maintaining a file share for your semi-private photos and videos.

You don't have to worry about a flood or tornado taking out all the photos of first birthdays, or Grandma when she was younger than you are now, and you also don't broadcast family dynamics, whereabouts (people have been robbed when social media made it clear they were not at home) or pictures of minors.

You, your uncle who Moved to the City and your cousin by marriage who wants to be a game designer all set up a file server and share the photos (your cousin is gonna throttle traffic while he's playing CoD or Dota of course, which it turns out he does all the time but at least he's an extra backup copy).

I don't know how you keep that one relative from uploading funny things they found on the internet that keep trying to install spyware or back doors though, but I suppose you'd have that problem now.


Do you have any good solutions for this exact use case yet?


Checked Tahoe-LAFS, Perkeep, but they are too low level. Haven't tried Resilio Connect yet, although I would prefer something open-sourced.

I think we have all the tech needed already (WebRTC, DHT, NAT hole punching, decades of p2p, encryption, onion security, StorJ/Filecoin) etc but what is lacking is dead simple UX and wide support of operating systems - Windows, Mac, Android, iOS, Linux (Raspberry, Synology, cheap VPS backup)


Yeah there are no turnkey solutions that I'm aware of. Which is why every time there's a thread like this I come to see what the rumpus is.

I'm still waiting for a 'Drobo' like device without the proprietary physical layout. A light or dial goes into orange territory, you head to Best Buy or Amazon and buy the biggest drive that doesn't give you sticker shock, you push a button, out pops your worst drive and in goes the new one. Some lights flicker for a while and then go green.

I thought ZFS would have given us almost everything but the hardware ten years ago, but it turned out they oversold a few of the features back then, and then Larry happened.

Custom hardware is too expensive for small run consumer hardware, and Apple might have gotten into that space but never did. I wonder how many PCIe lanes you could shoehorn onto a Pi clone...


Synology hardware is pretty close to what you're describing. Light on front goes from green to orange and you get an email, you plug in another drive, click a button in the GUI, and wait for the volume to reshard/resilver/remirror.

A FreeNAS box does this as well; it won't have the pretty drive-light indicator if it's a home-built box, but then you're not limited to proprietary hardware.


I had been meaning to look into Synology more and I watched a few reviews after this exchange. Sound good except I'm not happy about having to link to their servers. But everything works that way these days :/


Why would you have to? Their devices work fine without any sort of account with them.


4 TB drives are now commonly well under $100 (I've seen them as low as $70) and many Micro-ATX and NUC boards have 4+ SATA connectors. Btrfs has CoW, subvolumes, snapshots, and data integrity features.

For the physical device at least, Debian and 2 Btrfs data drives in RAID 1 certainly isn't turnkey but seems quite accessible at this point.


Syncthing works pretty well here!


I use Resilio Sync to sync my 1Password. It mostly works but is clunky in that the handling of identities is clumsy (unable to cleanly remove) as well as having less than stellar UX for adding/removing folders.

Still looking for the perfect Dropbox-like experience but without the cloud sync piece. If only syncthing had decent mobile apps...


Perhaps I'm completely misunderstanding something, but don't centralized self-hosted cloud-plus-app services address this exact use case? Things like NextCloud, Seafile, and Sandstorm?

The only major drawback seems to be that you have to host the physical hardware yourself due to lack of solid end-to-end encryption for most platforms. Sandstorm might have it (I'm not clear on what's client- and what's server- side there), Seafile has end-to-end encryption that doesn't protect metadata (https://forum.seafile.com/t/how-strong-is-the-encryption/627...), and NextCloud appears to have a long-running beta of end-to-end encryption on a per-folder basis (https://nextcloud.com/endtoend/). Apparently Cryptomator exists (https://cryptomator.org/) although I've never tried it myself.

If you want to decentralize your shared files unfortunately last I checked SyncThing didn't yet support end-to-end encryption. (I don't think any of the other ones I mentioned can be used in a decentralized manner but things move quickly so I'm not sure.)

Alternatively, if you were thinking more of chat and messaging, there are self-hostable federated services such as PixelFed, PeerTube, Mastodon, and Matrix. Or was there some other use case you had in mind?


SyncThing does support end-to-end encryption, and has for a while: https://docs.syncthing.net/users/security.html#sync-connecti...


By end-to-end, I meant that the server can't see your data. For example, the idea of deploying one or more SyncThing instances to cloud providers that would never see your unencrypted data. There's an open GitHub issue about it since 2014 (https://github.com/syncthing/syncthing/issues/109).


I'm hoping PhotoStructure solves this use case when I add sharing in the next version. Note that PhotoStructure is focused on just photos and videos; it isn't a general file-sharing solution.

I'm looking into integrating with DAT or Hyperdrive-like solutions to help make storage backups less fiddly for users.

For now, I recommend my beta users use SyncThing or Resilio Sync to get their photos and videos off their phones and on to their home NAS or computer.


It uses Noise to encrypt all transport connections, and using the API you can whitelist which peers (by their public key) you want to replicate with. This API is not yet exposed through the daemon, but it's in the modules.

Hyperdrives also have a built-in capability system where you have to know the public key of a drive to download it from a peer, so if you only share the key with yourself, no one else can access it.

Finally using the modules you can build almost any kind of networking you’d imagine but that part requires more work on your side obviously.


So regarding the multi-writer problem, exactly how far away is this from implementing a central (bare) git repository? Seems to me that all you need is some kind of way to take care of authentication and possibly some way to make git's (mostly) append-only data-structure play nice with Hyperdrives'. I realise I'm probably missing some details, so my question is which details?


Congratulations to the team !

2 questions:

- What is the difference between dat and hyperdrive ?

- I see hyperdrive manages hyper:// urls -- what about dat:// urls ? Will they be still managed ?


Thanks!

- "Hyperdrive" was formerly the internal data-structure name of Dat archives. With this release, the team decided to rename the protocol from Dat to Hypercore Protocol. Consequently, "Dat archives" are now "Hyperdrives." The Dat community will post some updates about this soon.

- The hyperdrive-daemon does not support dat:// URLs. I don't know what the future of dat:// URLs will be but Beaker is phasing them out with a converter tool.


Another comment which is now gone asked what your relation to the Beaker project is, and how Hyperdrive is related to the Beaker browser?


Just replied there -

Beaker uses Hyperdrive, it's basically the source of its novel features. The Beaker team works on Hyperdrive (the Hypercore Protocol) but we maintain a separate org at https://github.com/hypercore-protocol


How soon until this can replace Dropbox for my little distributed working group consisting of 3 people wanting to share a single file system?


Our little team has been doing just this, both for dogfooding and because it works well for us.

Take a look at the "mounts" section of the blog post where we describe a group pattern you can set up. You can create a group directory called "team-drive", then mount each user's drive into the group.

If you're using FUSE, this will feel similar to Dropbox, but with one directory per-user.

We're starting with these kinds of simple mounts, and brainstorming ways to extend them soon.


Could this be used to create a kbfs (keybase file system) competitor?


Yea totally. The FUSE support is already very close to being exactly that. We've been using it internally during our beta as an in-office P2P Dropbox using the mounts feature.


In kbfs, when I create a shared drive between you and me, the space is automatically available for reading and writing by both of us. Can this be done directly on top of Hyperdrive, or do we need to modify how Hyperdrive (or the daemon, I suppose) works?


We can mount each other's drives and collab that way.

For a full “union” mount experience we still have some research to do but we are working on it.

The mount setup is really good though and fully p2p


Interesting, thanks. What's not clear to me from the announcement is: will Hyperdrive replace Dat? From reading the comments here, it seems that the answer is "yes".

Now, onto the content: you touch on de-duplication. I am quite concerned with the cost associated with updating a large file. Is something like rolling hashes being investigated, to chunk files independently of their size? I guess it is, given you seem to be working hard on de-duplication.

But then, that kind of trick best works on uncompressed data, which is inefficient for transmission. Is data compressed before transmission? Whole chunks, or whole files at a time? Ahead of time? Interactively based on what the peer needs?

The trade-offs are many, and complex to investigate. I'm wondering if this could be used as an OS image, like OSTree does?

And lastly, I did not get if multiple peers having the same private key identity could modify the structure simultaneously. What would happen?

Also, nodejs gave me a kneejerk reaction that may be unwarranted, but that's quite a huge dependency to pull in for something that wants to be a ubiquitous building block. Does it have a C API? Also, Twitter, Discord, GitHub (node to a lesser extent)... it seems somewhat ironic to build an ultimate decentralized filesystem while relying on these hyper-centralized offerings, and I am afraid it could turn some contributors off.


I kinda share your kneejerk reaction about nodejs for foundational tools, even though I must admit that dat/hyperdrive works surprisingly well for a js tool.

But dat/hyperdrive is a well-documented protocol, and there are several ports to other languages.

I am very interested in the rust port, but I am not sure in what state it is: https://datrs.yoshuawuyts.com/

I have seen some tweets about progress being made on this. Does anybody know more?


The rust port of Hypercore has been very active recently and they are making good progress. Part of the latest Hypercore release was to move some of the transport crypto to be easier to port to other languages such as rust.

The wire protocol works now: https://github.com/Frando/hypercore-protocol-rs and the community is active in #datrs on freenode


I really liked Dat (thanks, Knight Foundation). And I think it gets a ton of things right. However, I wish it had two things that we need for our purposes... can anyone chime in and say how AND IF they can be accomplished with the current Dat and hypercore?

1. Consensus. If I submit conflicting updates and sign both, how does the swarm resolve what is the latest state?

2. Migrating the swarm. If all the machines in a swarm get corrupted, can I migrate to a totally new swarm?


What are the main differences between this and Syncthing?


I'm not deeply familiar with Syncthing but here are some differences based on what I do know:

- Hyperdrive includes mass publishing as a use case, so it uses public-key URLs and a bandwidth-sharing mechanism among its active peers (like BitTorrent)

- Hyperdrive is built on a general-purpose protocol called Hypercore, which is a signed append-only log. These logs can be used for other data structures. Some examples: [1] [2]

[1] Kappa-core, a general-purpose db built on the logs https://github.com/kappa-db/kappa-core

[2] Cabal, a chat network https://cabal.chat/


what version of node.js does it require? i get this with v10.19.0 from Ubuntu 20.04:

    node_modules/hyperdrive-daemon/node_modules/hyperdrive-daemon-client/bin/commands/create.js:8
      static usage = 'create [path]'
               ^

    SyntaxError: Unexpected token =


Oops, give it a shot in v12. Think static class properties were added after 10.


v12 seems to work, v14 crashed. Would be nice to have that info in the README.


It should work on 14. Could you open an issue on the repository?


filed hyperdrive-daemon #47 problem with hyperdrive-daemon/node_modules/fuse-native/prebuilds/linux-x64/node.napi.node


Thanks, appreciate it


Great work. Planning to read about it. I am, and I am sure others are, continually intrigued by this space, and I hold out hope that the definitive solution appears soon (along with the solution to micropayments!).

On a slightly related note: is anyone interested in having a discussion about layering, on top of all these "drive" systems, an HDFS-like drive with n-block replication across different sources (Dropbox, Hyperdrive, Google Drive, etc.), such that trying to interpret the blocks stored at any given source would leave a person confused about what is being stored there?

:)


One more question... How effective would this be in a "network questionable" environment, if I were, let's say, building a mobile app and wanted to incorporate this type of solution for an offline-first type of experience?


Hyperdrive (and the stack it's built on more generally) works great offline. All the drives you own are still editable without the network, and changes will sync when you come online.

As for 'network questionable' and mobile, here are a few of the other projects building on Hypercore that have made those a priority [0] [1] [2].

The Hyperdrive daemon as it's currently built wouldn't fare too well in a bandwidth- and/or battery-constrained environment (it wasn't designed for that), but a mobile solution is definitely on our radar.

[0] https://cabal.chat/

[1] https://www.digital-democracy.org/mapeo/

[2] https://github.com/consento-org/mobile


Congrats on the release. Mostly familiar with IPFS, but I learned about Dat at dtn.is last year. The Hypercore protocol is pretty well designed and documented.

How would hyperdrive deal with the following scenario: you got a large dataset such as wikipedia. Lots of people have browsed it, but most of them only have a tiny fraction.

How do you know which peers to connect to to get a particular bit (offset?) you are interested in? The DHT only tells you which nodes participate in the hypercore, not what they have in detail, right?


The peers gossip using compressed bitfields about what data they have. These bitfields are super small, so we can pack in quite a bit of information.

At the moment we don't do anything special for discovery, but as we scale that's something we want to investigate. Since everything runs on append-only logs, we can group the data into sections quite easily, so there are some easy wins there, like announcing to the DHT that you have data in a specific region.


So do you opportunistically gather info about what other peers have via this gossip protocol, or just when you need something?

I had looked at https://datprotocol.github.io/how-dat-works/ , but I don't remember anything about a gossip protocol or a peer building a "view of the world". Is that new?


It's only between the peers in your subset of the swarm for now. They exchange a series of WANT and HAVE messages, where they subscribe to the sections of each other's logs they are interested in.

We are working on expanding this scheme so peers can help discover peers that have the section you are looking for.

Due to the compressed bitfields, these sections are quite large. In most cases, using a few kilobytes you can share WANT/HAVE for millions of blocks.
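Hypercore's actual encoding is more elaborate, but a plain run-length encoding (sketched below; names are mine) already shows why a have-bitfield over millions of sequential blocks compresses to almost nothing:

```python
def rle_encode(bits: list[int]) -> list[tuple[int, int]]:
    """Compress a have-bitfield into (bit, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

# A peer that has the first million blocks and nothing after:
bitfield = [1] * 1_000_000 + [0] * 1_000_000
print(rle_encode(bitfield))  # [(1, 1000000), (0, 1000000)]
```

Peers downloading a log front-to-back produce exactly these long runs, so two pairs describe two million blocks.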


Yes, I saw the compressed bitfields. The most bit twiddling I have ever seen in a pure JavaScript library...

Being able to identify a piece of content by an integer instead of a hash makes things more efficient compared to content-addressed storage a la IPFS.


How do you handle merge conflicts? Since you aren't calling it a blockchain I take it that Hypercore is targeted at the trusted (writers) and semi-trusted (readers) use case?


Hypercore is a single-writer append-only log. The website has a bit more info about how it works, but it's basically a Merkle log signed with a private/public key pair. We build collaborative data structures by combining multiple Hypercores.

Hyperdrive builds a p2p filesystem on top of Hypercore for a single writer. Using mounts you can mount other people's drives, so merge conflicts don't happen since there are no overlapping writes.

We are working on a union mount approach as well for overlapping drives (we talk a bit about this in the post).
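A minimal sketch of the single-writer idea (not Hypercore's actual structure: a real Hypercore uses a flat Merkle tree so peers can verify any block with a partial proof, and signs with an Ed25519 keypair; here an HMAC over a hash chain stands in for both):

```python
import hashlib
import hmac

class AppendOnlyLog:
    """Toy single-writer log: each append folds the new entry into a
    running root, and the writer 'signs' the latest root."""

    def __init__(self, secret: bytes):
        self.secret = secret          # stand-in for the writer's private key
        self.root = b"\x00" * 32
        self.entries: list[bytes] = []

    def append(self, data: bytes) -> bytes:
        self.entries.append(data)
        self.root = hashlib.sha256(
            self.root + hashlib.sha256(data).digest()
        ).digest()
        # return a "signature" over the new root
        return hmac.new(self.secret, self.root, hashlib.sha256).digest()

    def verify(self, signature: bytes) -> bool:
        expected = hmac.new(self.secret, self.root, hashlib.sha256).digest()
        return hmac.compare_digest(signature, expected)

log = AppendOnlyLog(secret=b"writer-private-key")
sig = log.append(b"block 0")
sig = log.append(b"block 1")
print(log.verify(sig))  # True
```

Because every root commits to the whole prefix, a reader holding a signed root can detect any rewrite of history; collaborative structures are then built by combining several such logs, one per writer.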


So it sounds a bit like you've replicated git.

Do you agree that for the collaborative data structures side of things (like the chat app) users of the hypercore-protocol will likely run into clock trust problems?

PS I'm a big fan of your work/repos.


Yea you have to trust the original writers atm. I have some ideas for reducing this trust in the future through some consensus schemes but nothing fully baked yet. Def something I wanna hit tho, so we can get better security in something like a massively distributed chat system.


Would Hyperswarm be useful for setting up mesh-like Wireguard networks automatically? We need some solution for this ASAP - IPv6 not coming, CGNATs everywhere.


That'd be zerotier or tailscale


Sharing an append-only log can be quite treacherous for users who are unaware of it, for example when accidentally including some confidential stuff. A not-so-security-aware person may think a quick delete fixes it, and depending on the situation this might even be true, but it's still in the log... Not familiar with Hyperdrive, so please somebody correct me if I am wrong and this case is handled.


This sounds good!

But I would not call it finished: as of today, apparently only one person can make changes to the filesystem. That does limit the use cases.

"In v10, we don't go all the way to a general multi-writer solution; solving multi-writer scalably, without incurring major performance penalties or confusing UX, remains a research question for us. "

.. so give them some support so they can solve this.


Would it be possible to build a service that resembles email using this technology? Like dropping a message file within a local folder and making it appear on someone else's local folder?


@pfrazee threw together a demo app called PaulMail [1] that you can view in Beaker [0]. It's very simple and is just a proof-of-concept (messages are unencrypted, for example), but it does hint at possibilities there.

[0] https://beakerbrowser.com

[1] hyper://1bc1faf01a22270fb5698a60e63ef7a596ad976457e6d9914a8fd56d87281917/


Your recipients also need to be in your Beaker address book for it to work. It's a very rough PoC.


I've been following Dat for a few years, and I'd been hearing about multiwriter for a few years too, but not anymore. Is it still something you are working on?


Yes, we are continuously exploring and researching this. The mount support is our first stepping stone towards it. See the union mounts section of the post.


I'm not able to view the link due to HTTPS errors.

The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.


That's odd, what browser?


Can it be that you moved from the early days of Dat, which was written in Go afaik, to primarily Node/JS? Why the move?


I think Dat has always been Node/JS.


They may be thinking of IPFS, which is written in Go.


Does it scale now? Previously I was not able to import large folders (several TB) or many files (100k+ files).


Yep! That is one of the things we’ve worked the hardest on. Completely new indexing structure, using an append-only hash trie which scales really well. We’ve tested it with many big datasets including importing all of Wikipedia as files in a single folder. Worked like a charm :)


this one? https://dumps.wikimedia.org/other/static_html_dumps/current/... how long does it take to import it?


I think it was that one yes. Can’t remember the exact time it took, as we ran it over a couple of days due to some unrelated computer issues.


Can you limit the synced size on clients (like an LRU cache) ?


Would there be an advantage to implementing this using WebRTC as part of the networking, so you could use it in browser?

I don't know much about this space, but always been interested in some sort of web socket/WebRTC p2p fs.


[flagged]


Fair enough -- updated it to POSIX-like. The main point there being that it should be as straightforward to use as Node's fs module.


I don't think that was the humor. Node is, from one point of view, a piece of a browser engine that's been extracted out. And you wrote a filesystem in it. It's like writing a webapp in x86_64 assembly - you can do it, obviously, but it can be viewed as an odd juxtaposition.


[flagged]


Wow. What a way to describe the tradeoff of a programming language. No, thanks


JavaScript is a programming language. NodeJS is not. It's the substrate that the NPM ecosystem runs on.

choeger's comment very well may have not been made with a distinction in mind (and I agree that it was a low effort, low signal comment), but the distinction exists.

choeger seems to be using Firefox. Firefox is implemented using an Emacs-like architecture. The Firefox codebase includes millions of lines of JS, and that's been the case since before the language was JITted. I ran Firefox 1.0, 1.5, and 2.0 on an 800MHz PIII with 128 MB (later 192 MB, wow-wee!) of RAM. The JS in the Firefox codebase wasn't written in the NodeJS style, and if it were, that would have made it a non-starter.

Kneejerk JS haters are tiring, but so is the conflation of JS and the NodeJS ecosystem (by NodeJS supporters and NodeJS haters alike).

(Side note: if you insist on giving attention to folks engaged in low-value discussions, try to elevate the conversation to a level that makes it actually worth having.)


These days, using JavaScript outside of the browser sort of implies Node.js, in the same way that using Python implies that you're using CPython, unless you specify otherwise. In theory your distinction makes sense, but in practice, very few people are using anything but Node.js for backend JavaScript systems (at least for new projects).


Okay. Let's consider your comment within the context of the discussion taking place. What's your point?


We're discussing an application written in JavaScript that runs outside a browser. So the point is that it is unsurprising to discuss node. It's like if we discussed writing a network app in Python and someone said that the GIL was a problem. Technically, there are python implementations that don't have that problem, but in practice we would almost certainly be talking about something that would have been using cpython, so it's relevant.


Two things:

First, this is very much being developed in a similar scope to the millions of lines of JS that are in Firefox. This is going into Beaker. See beakerbrowser.com. Just as a point of fact.

Secondly, you still haven't connected your observations to the discussion! That observation amounts to, hey, there's lots of JS written for the NodeJS ecosystem—most of it is, really. And once again, my response is, "Okay, so it exists. So what?" I'm truly struggling to understand the significance of the comments here. Like, what if anything is that supposed to change? What is anyone supposed to do with that information? Is it supposed to change someone's mind? Is it supposed to change mine? And if so, in reference to what specifically? Is it even new information?

The most meaningful thing that I can manage to parse is where you say "the point is that it is unsurprising to discuss node". But whoever said it was surprising? Do you think I'm surprised? choeger mentioned NodeJS by name. My comment is evidence of a nuanced understanding of NodeJS. Where's the surprise?


> It's like if we discussed writing a network app in Python and someone said that the GIL was a problem. Technically, there are python implementations that don't have that problem, but in practice we would almost certainly be talking about something that would have been using cpython, so it's relevant.

Not 'technically' at all. If it's normal python code, and the GIL is the problem, then you can change the interpreter and still use the code. But for something built on node, you can't just pull out node.



