This one doesn't have WiFi, just 5 Ethernet ports. It can be flashed with OpenWrt with a very reasonable amount of tweaking. It's actually quite powerful and has 256M of RAM and 128M of flash memory. I have one, it's very cool.
This is due to SQLite calling fsync a lot by default, to be on the safe side. You can use pragma journal_mode = "wal" and pragma synchronous = "normal" and it should be much faster, without risking corruption if your server powers down unexpectedly (you may lose the last few transactions, though; synchronous = "off" is faster still, but SQLite no longer guarantees an uncorrupted database after a power loss in that mode). You should still make regular backups, which can be done from within a running SQLite instance using the backup command.
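For reference, here is a minimal sketch of those settings using Python's built-in sqlite3 module. The file names and the kv table are made up for illustration. It uses synchronous = NORMAL, which the SQLite documentation describes as safe from corruption in WAL mode; OFF skips syncing entirely.

```python
import os
import sqlite3
import tempfile


def open_fast(path):
    """Open a database in WAL mode with relaxed (but corruption-safe) syncing."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect ("wal" on success).
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    conn.execute("PRAGMA synchronous = NORMAL")
    return conn, mode


def live_backup(conn, dest_path):
    """Copy a running database to dest_path using SQLite's online backup API."""
    dest = sqlite3.connect(dest_path)
    try:
        conn.backup(dest)
    finally:
        dest.close()


# Hypothetical paths and schema, for illustration only.
workdir = tempfile.mkdtemp()
conn, mode = open_fast(os.path.join(workdir, "app.db"))
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv VALUES ('hello', 'world')")
conn.commit()
live_backup(conn, os.path.join(workdir, "backup.db"))
conn.close()
```

The same backup can be taken from the sqlite3 command-line shell with its .backup command.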
Yes! My desktop PC is almost 10 years old at this point and it works perfectly. I've upgraded the CPU to an i7-4790K (the best of that generation) and added some RAM (32G total), which makes it very good for most of my work and even for occasional gaming. Thankfully I don't use Windows that much, but I know I can't upgrade to Windows 11, so according to Microsoft I would just have to throw it away in a year or two, which is completely ridiculous.
I still happily develop on an i7-950 with 24GB of RAM from 13 years ago. I just haven't been able to justify the minimal performance improvements from upgrading vs all the other things my family needs.
Nope, that machine has a 512GB SSD and runs just as fast as modern corporate dev spec laptops I've used with the same core count. That being said, my static analyzer would definitely benefit from a more modern CPU!
As the main author of the Merkle Search Tree paper, I'm really glad to see the design become more known and start to be used. I did not go on and build a practical system that made use of the structure, although I originally came up with the data structure while thinking about how to build decentralized social networks and data stores for collaborative applications. Thankfully, the design itself is generic and can be adapted to many use cases, and it's great to see that people have gone as far as to build independent implementations in Go and Rust. I'd be curious to see how it performs in practice, in systems that have a real-world use, and not in the simplistic synthetic benchmark scenario of the paper. Hopefully it lives up to its promise :)
At Martin Kleppmann's recommendation, we adopted MSTs for AT Protocol's data repository structures. This isn't my specialty so I wasn't deeply involved in evaluating or implementing them, but my understanding is that the self-balancing was a deciding factor and that everyone has been very happy with the outcomes. You should chat with Dan Holmgren at some point if you'd like to hear about his experience with it. Appreciate your work.
I thought it was really neat when I came across it. There was a detail that I didn't find in the paper (or might have missed).
What if the key you're adding has a lot more (or a lot fewer) leading zeros in its hash than the number of leading zeros in the current root layer? Do you just add nodes in between the root and the new node? What should go in those in-between nodes?
I think as long as your construction is deterministic, both options are possible: you can either add the intermediate nodes that have only one child, or you can skip adding them and allow links between non-consecutive levels. In that second scenario you would add intermediary-level nodes only when necessary, i.e. only when they actually need to have several children. The second approach is better for performance but might have a bit more implementation complexity.
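To make the two options concrete, here is a rough sketch. It assumes SHA-256 and counts leading zeros in base 16; both are illustrative choices, and the function names are made up rather than taken from the paper or any implementation.

```python
import hashlib


def layer_of(key: bytes) -> int:
    """Layer of a key: number of leading zero hex digits in its hash.

    SHA-256 and base 16 are assumptions made for this sketch.
    """
    digest = hashlib.sha256(key).hexdigest()
    return len(digest) - len(digest.lstrip("0"))


def layers_between(root_layer: int, key_layer: int) -> list:
    """Layers strictly between the current root layer and the new key's layer.

    Option 1 creates a single-child node at each of these layers.
    Option 2 creates none of them and links the two layers directly,
    adding a node later only once it needs several children.
    """
    low, high = sorted((root_layer, key_layer))
    return list(range(low + 1, high))
```

For example, with the root at layer 0 and a new key at layer 3, option 1 would create nodes at layers 1 and 2, while option 2 would link layer 3 straight down to layer 0. Either works as long as every replica makes the same choice, since the tree shape has to be a deterministic function of the key set.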
Very interesting project and impressive work, thanks for sharing!
Can you talk a bit about how data is replicated between nodes when Stalwart is run in clustered mode, and what kind of data integrity/resilience properties we have when one, two, several nodes go down?
Also, have you considered implementing server-side encryption of e-mail messages so that an "honest but curious" system administrator could not read users' messages? (e.g. using the user's password to derive an encryption key). More generally, what are your thoughts on the "privacy" aspect?
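To illustrate what deriving a key from the password could look like, here is a minimal sketch using PBKDF2 from Python's standard library. The function name, salt size, and iteration count are assumptions for the example; this is not a description of how Stalwart works or plans to work.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256.

    The iteration count is an illustrative assumption, not a recommendation.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random per user and stored next to the encrypted mailbox.
# The key itself is never stored: it is recomputed at login from the password.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
```

The derived key would then be fed to an authenticated cipher to encrypt stored messages. The obvious catch is that the server can only encrypt or decrypt while it holds the password, i.e. during a session.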
> Can you talk a bit about how data is replicated between nodes when Stalwart is run in clustered mode, and what kind of data integrity/resilience properties we have when one, two, several nodes go down?
Data is replicated using the Raft consensus protocol, and when multiple nodes go down the cluster will stay active unless there are not enough nodes left to guarantee consistency. More details can be found in the documentation [1], but I plan to add more details on how replication works once the server passes the Jepsen tests.
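As a rough guide to "not enough nodes": Raft in general needs a strict majority of the cluster to be reachable in order to elect a leader and commit writes. The sketch below shows only that generic rule; the function names are made up, and Stalwart-specific behavior may differ.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be down while a majority is still reachable."""
    return cluster_size - quorum(cluster_size)


def can_make_progress(cluster_size: int, nodes_down: int) -> bool:
    """True if the remaining nodes still form a majority."""
    return cluster_size - nodes_down >= quorum(cluster_size)
```

So a 3-node cluster survives one node going down, a 5-node cluster survives two, and a 4-node cluster still only survives one.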
> Also, have you considered implementing server-side encryption of e-mail messages so that an "honest but curious" system administrator could not read users' messages? (e.g. using the user's password to derive an encryption key). More generally, what are your thoughts on the "privacy" aspect?
Yes. In addition to server-side encryption, S/MIME and PGP are also on the roadmap.
For back-end systems and distributed systems, I'd love to work on a new OS based on a microkernel and capabilities. Ditch Linux+Docker+Kubernetes, replace that with a microkernel that has a minimal set of modules to multiplex storage and networking, and build an orchestrator directly on top of it that schedules processes as micro-VMs. In those VMs, we could run a stripped-down Linux kernel to handle legacy applications, but we would probably want to design a few standardized abstractions for accessing other components of the system (storage, databases, queues, RPCs, the scheduler itself) as network services, which we would use to implement more modern services. In many cases we could just leverage existing protocols that are built over HTTP (e.g. S3 for object storage). The scheduler would be aware of the dependencies of each service (i.e. the other services that it needs to call to function) and could have the task of directly connecting those things together. To contact one of its dependencies, a service or application would no longer use a hostname+port (resolving the hostname using DNS). It would just directly hold a file descriptor to a socket, pipe, or whatever kind of RPC interface the OS maps to the corresponding service, either locally or on another node of the distributed system.
If you are aware of anyone working on something like this, please let me know!
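The "file descriptor instead of hostname+port" idea can be approximated on a regular Unix system today. In this toy sketch the orchestrator creates a connected socket pair and hands one end to each service, so neither side ever resolves a name or opens a port. All service names here are hypothetical, and threads stand in for what would be separate micro-VMs.

```python
import socket
import threading


def storage_service(conn: socket.socket) -> None:
    """Hypothetical dependency: answers one request on a pre-connected socket."""
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"stored:" + request)


def app_service(storage: socket.socket) -> bytes:
    """Hypothetical application: talks to storage through the handle it was given."""
    with storage:
        storage.sendall(b"blob-1")
        return storage.recv(1024)


def orchestrate() -> bytes:
    """Wire the two services together the way the scheduler would."""
    # The orchestrator knows that app depends on storage, so it creates the
    # channel itself and passes each end to the matching service.
    app_end, storage_end = socket.socketpair()
    worker = threading.Thread(target=storage_service, args=(storage_end,))
    worker.start()
    reply = app_service(app_end)
    worker.join()
    return reply
```

The application code never sees an address, which is the capability-style property described above: it can only talk to what it has been handed.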
I mean, sure, but we're operating under the premise that we don't want to give out our data to a cloud provider and instead want to store data ourselves. Half of the post is dedicated to explaining that, so pretending that you can't infer that from context seems a bit unfair.
Then you didn't look closely enough. Have a look at the .drone.yml to see the entry points: we have both unit tests and integration tests that run at each build.
You probably want at least some redundancy, otherwise you're at risk of losing data as soon as a single one of your hard drives fails. However, if you're interested in maximizing capacity, you probably need erasure coding, which Garage does not provide. Gluster, Ceph and Minio might all be good candidates for your use case. Or if you have the possibility of putting all your drives in a single box, just do that and make a ZFS pool.
> You probably want at least some redundancy, otherwise you're at risk of losing data as soon as a single one of your hard drives fails.
If by "fails" you mean the network connection drops out, then yes, that would be a huge problem. I was hoping some project had a built-in solution to this. Currently, I'm using MergerFS to effectively create 1 disk out of 3 external USB drives and it handles accidental drive disconnects with no problems (I can't gush enough over how great MergerFS is).
But if by "fails" you mean actual hardware failure, then I don't really care. I keep 1 to 1 backups. A few days of downtime to restore the data isn't a big deal; this is just my home network.
> Or if you have the possibility of putting all your drives in a single box...
Unfortunately, I've maxed out the drive bays on my TS140 server. Buying new, larger drives to replace existing drives seems wasteful. Also, I've just been gifted another TS140, which is a good platform to start building another file server.
You've given me something to think about, thanks. I appreciate you taking the time to respond!
It's not a hard requirement. You might just have a harder time handling large files, as Garage nodes will in all cases have to transfer data internally in the cluster (remember that files have to be sent to three nodes when they are stored in Garage, meaning 3x more bandwidth needs to be used). 10 Mbps is already pretty good if it is stable and your ping isn't off the charts, and it might be totally workable depending on your use case.
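A back-of-the-envelope estimate of what that means in practice. This is a deliberately pessimistic simplification: it assumes all three copies cross the same link one after another and ignores protocol overhead, and the function name is made up.

```python
def transfer_seconds(size_bytes: int, link_mbps: float, copies: int = 3) -> float:
    """Time to push `copies` copies of a file through a single link.

    Worst-case assumption: every copy crosses the same link sequentially.
    """
    bits_to_send = size_bytes * 8 * copies
    return bits_to_send / (link_mbps * 1_000_000)
```

Under those assumptions, storing a 1 GB file over a 10 Mbps link takes transfer_seconds(1_000_000_000, 10) = 2400 seconds, i.e. about 40 minutes, versus roughly 13 minutes for a single copy.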