I really wonder why it's so common for people to default to JSON even when designing complex cryptographic protocols like this. I was skimming the spec for the script interpreter, which is something you'd expect to be focused on compactness, but it seems to be designed entirely around JSON semantics and concepts, which means any implementation necessarily carries along this complexity. This isn't to say that JSON is bad, but putting it at the absolute core of the data model seems problematic. This problem happened with Matrix when they started having to layer JSON around base64-encoded-encrypted-JSON, with the looseness of the format generally infecting the entire data model of the protocol.
The SPMessage format here, too, uses JSON.stringify to define its structure encoding, which means any implementation of the protocol necessarily has to carry along JSON semantics. JSON does not seem well-suited here, as it doesn't compose cleanly the way a data structure defined in terms of bytes would.
> This problem happened with Matrix when they started having to layer JSON around base64-encoded-encrypted-JSON and generally the looseness of the format infecting the entire data model of the protocol.
Indeed, this has already created considerable problems for the request and event signing and getting different Matrix implementations written in different languages to interoperate correctly. I now firmly believe that cryptographically-signed JSON is always a disaster waiting to happen, no matter how hard you try to canonicalise it.
But just because it's a webapp doesn't mean you have to use JSON for everything. It's the other way around: using JSON for everything means that the only place the data model feels natural is the web.
Which is strange, because JavaScript has had Uint8Array for a while now, and it makes manipulating arbitrary byte structures easy; it doesn't feel any less natural than other dynamic languages in that regard. Defining a "signed message with a header" structure in terms of bytes like I described above seems like a no-brainer. If you want the payload data of your protocol to use JSON then that's fine, but that shouldn't dictate the cryptographic layers underneath.
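For illustration, here's a minimal sketch of what such a byte-level framing could look like using plain Uint8Array/DataView. The field layout (version byte, length, 64-byte signature) is entirely hypothetical, not anything from the Shelter spec:

```typescript
// Hypothetical byte layout (not from the Shelter spec): a fixed header
// followed by the signature and an opaque payload, which may itself be JSON.
//
//   [0]      version        (u8)
//   [1..4]   payload length (u32, big-endian)
//   [5..68]  signature      (64 bytes, e.g. Ed25519)
//   [69..]   payload        (arbitrary bytes)

const HEADER_LEN = 1 + 4 + 64;

function encodeSignedMessage(version: number, signature: Uint8Array, payload: Uint8Array): Uint8Array {
  if (signature.length !== 64) throw new Error("expected 64-byte signature");
  const out = new Uint8Array(HEADER_LEN + payload.length);
  const view = new DataView(out.buffer);
  view.setUint8(0, version);
  view.setUint32(1, payload.length, false); // false = big-endian
  out.set(signature, 5);
  out.set(payload, HEADER_LEN);
  return out;
}

function decodeSignedMessage(msg: Uint8Array) {
  const view = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
  const version = view.getUint8(0);
  const length = view.getUint32(1, false);
  return {
    version,
    signature: msg.subarray(5, 69),
    payload: msg.subarray(HEADER_LEN, HEADER_LEN + length),
  };
}
```

There's exactly one valid byte sequence for any given message, so signing and verification never depend on serializer quirks.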
As others have said, there are unfortunately not that many good alternatives for serializing data in web browsers (that are both performant and don't require bulking up your app with the serialization implementation).
We've done our best to make sure that the way we're serializing objects can either be done in a consistent manner across implementations or, if done inconsistently, it doesn't matter.
I don't really buy the argument that additional serialization logic adds bloat to the application when you compare it to the many thousands of lines of code in modern JS frameworks for manipulating DOM elements.
Consistency really is the important part here: are you spending less time on the testing and bugfixing needed to ensure that JSON ambiguity is impossible (or harmless) than you would spend writing serialization for cryptographic types as unambiguous byte-oriented structures?
SPMessage serializations are never regenerated after they leave the client, so it's not an issue as far as I'm aware. JSON is supported universally in all languages and all platforms, and natively in the browser. Is there a specific serialization format you're advocating for?
I suspect it's more that JSON is (rather explicitly) not designed to be stable across all implementations, so there are subtle differences. Numbers don't have a specified precision, strings don't have a specified encoding, maps have no canonical order nor do they require unique keys[1], etc.
Nailing all that down so you can e.g. have a truly deterministic signing process often requires custom encoders/decoders and is rather fraught with problems even then, at which point why even bother with JSON. Use something designed to be consistent.
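To make the ambiguity concrete, here are a few one-liners you can run in any JavaScript engine; other parsers are free to behave differently on every one of them:

```typescript
// Duplicate keys: accepted silently; ECMAScript says last-one-wins,
// but another parser may take the first, or reject the document outright.
JSON.parse('{"a": 1, "a": 2}'); // { a: 2 }

// Number precision: integers past 2^53 silently lose digits here, while a
// parser backed by 64-bit integers would round-trip them exactly.
JSON.parse('{"n": 9007199254740993}').n; // 9007199254740992

// Key order: semantically meaningless in JSON, yet JSON.stringify preserves
// insertion order, so two equal maps can serialize to different bytes.
JSON.stringify({ a: 1, b: 2 }) === JSON.stringify({ b: 2, a: 1 }); // false
```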
We're aware of these issues with JSON, but they do not affect the integrity of Shelter Protocol, and by using JSON we get a lot of advantages in terms of speed and developer usability (support for tools like `jq`, schema-less serialization, widespread language support, easier debugging, etc.).
I was really excited when I tried out the scuttlebutt protocol a few years back. Distributed, peer-to-peer, offline-enabled social networking seemed neat.
But a healthy ecosystem of clients never emerged. When I looked into why, I discovered it relied on node's particular method of JSON serialization[0]. Are you confident that the shelter protocol has avoided this pitfall?
Yes, I am confident that we have avoided that specific pitfall because as I mentioned we do not ever re-serialize the JSON.
However, thinking on it more, there is one thing Groxx mentioned that has given me pause, and that is "Numbers don't have a specified precision". This could be an issue in the future if, for example, computers suddenly use 128-bit numbers and JSON implementations start to actually use such numbers. It's an edge case that might result in incompatibility with clients that use 64-bit numbers. So we'll consider the possibility of changing the serialization format after researching this more.
> So we'll consider the possibility of changing the serialization format after researching this more.
I asked Brendan Eich if JavaScript will ever change the way it represents floating point numbers (from 64-bit IEEE 754 values), and he said no ("don't break the Web"). And to represent larger bit values in JSON you're supposed to use a JSON string and schema to specify the type. I'm not sure exactly what he means by that, but I assume it means something like using JSON keys like "<BigInt>mykey" and string values instead of number values. Then manually parsing the string into whatever native 128-bit type or whatever representation you have.
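As a sketch of that "string plus schema" idea (the tagging convention below is made up for illustration, not anything Eich or the spec prescribes):

```typescript
// Encode BigInts as decimal strings on the way out; JSON.stringify would
// otherwise throw a TypeError on any BigInt value.
const encoded = JSON.stringify(
  { balance: 340282366920938463463374607431768211456n }, // 2^128
  (_key, value) => (typeof value === "bigint" ? value.toString() : value)
);
// '{"balance":"340282366920938463463374607431768211456"}'

// Out-of-band schema knowledge (not the JSON itself) says which fields
// are BigInts, so the reviver can restore them losslessly.
const bigintFields = new Set(["balance"]);
const decoded = JSON.parse(encoded, (key, value) =>
  bigintFields.has(key) && typeof value === "string" ? BigInt(value) : value
);
decoded.balance; // 340282366920938463463374607431768211456n
```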
So - JSON still seems to be the favorite here, but I totally get the anxiety around it because of how under-specified it is. If you avoid re-serialization nothing should break.
Avoiding re-serialization does avoid a ton of issues, yea. Though I would suggest building in validation for e.g. no duplicate keys immediately, because unexpected duplicates are a common source of bugs and security exploits (e.g. it's a fairly common issue with http headers when requests cross multiple systems).
After evaluating the various options, we seem to be settling on the idea that the protocol will specify that for duplicate keys, the handling must be the same as in browsers: meaning, the last key is the one that's used.
How about bencode? c-sexp? netstrings? The important features I would demand are: able to represent the types you need (e.g. integers instead of strings of digits or floats) and designed to have a canonical representation from the beginning. There are so many bugs around ASN.1/XML/JSON parsing and "normalisation" that it's simply inexcusable to repeat this mistake. If the designers didn't come up with a really convincing explanation for selecting the wrong serialisation format for a cryptographic protocol, how likely is it they read and understood the prior art in other fields like cryptography? Just because a canonical JSON (de-)serialiser is only an npm/pip/gem install away is no excuse to burden a specification with the bloat and attack surface. Even fixed-size, flat, packed structs are a better solution. K.I.S.S.!
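To show how little machinery a canonical format needs, here's a toy bencode encoder (assuming ASCII keys and strings; real bencode operates on raw bytes). Every value has exactly one valid encoding, so signing the output is trivially deterministic:

```typescript
// Minimal bencode encoder: integers are i<n>e, strings are length-prefixed,
// lists are l...e, and dictionary keys MUST be sorted -- which is exactly
// what makes the encoding canonical.
type Bencodable = number | string | Bencodable[] | { [key: string]: Bencodable };

function bencode(value: Bencodable): string {
  if (typeof value === "number") {
    if (!Number.isInteger(value)) throw new Error("bencode has no floats");
    return `i${value}e`;
  }
  if (typeof value === "string") {
    return `${value.length}:${value}`; // length-prefix assumes ASCII here
  }
  if (Array.isArray(value)) {
    return `l${value.map(bencode).join("")}e`;
  }
  const keys = Object.keys(value).sort(); // sorted keys => one valid encoding
  return `d${keys.map((k) => bencode(k) + bencode(value[k])).join("")}e`;
}

bencode({ op: "set", seq: 42, keys: ["a", "b"] });
// "d4:keysl1:a1:be2:op3:set3:seqi42ee"
```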
I wish there was some language-agnostic tool for defining byte-oriented data structures that could emit code to manipulate them in different languages. Protobuf and similar tools exist but they still have their own weirdnesses and aren't entirely zero-overhead. I've seen plenty of pseudo-schemas used in standards documents, in particular the MLS spec which I think was finalized recently.
And I'm seeing a lot of "under construction" and "coming soon" in other parts of the protocol which seem critical to functionality, to the extent that I can't see how this protocol is even meant to fit together.
You're right, the documentation is incomplete. We are actively working on the protocol, and any sections whose details we feel are still too much "up in the air" we have purposefully left blank so as not to mislead implementers of the protocol.
However, the core of the protocol is stabilizing and I can answer generally as far as missing sections go. Regarding federation, instead of publishing events to the server you're connected to, you publish events to a different server. Everything else works pretty much the same.
I had a similar question about the vast number of "coming soon" sections, and that led me to wonder why you would choose to submit this link at this stage of the specification's lifecycle.
Would it be accurate to say you're trying to get broader adoption of this idea, but currently the only implementation is in your group-income repo?
A lot of federated protocols (including partially implemented ones; I'm thinking of the @-protocol) are being released right now. There is a paradigm shift occurring in the way web apps are developed, and we want people to be aware of all of their choices when exploring this new way of building web applications.
Maybe you want to build a federated application yourself, but want certain features that some of these protocols don't have. Well, now you know a little more about what's possible in this design space. Perhaps you've already settled on a protocol (e.g. have started building on ActivityPub), but want end-to-end encryption. Well, now you're aware that you can in fact add it in to your existing ActivityPub-based app. For example, there's no reason why Mastodon couldn't use Shelter Protocol to encrypt DMs between users. Shelter Protocol is very lightweight, and can be an add-on to existing apps.
Regarding the ZKPP, it is implemented, but, like the rest of the code, it is not yet separated out from that repo. That is one of the next steps we'll be working on: a standalone developer SDK.
We are also welcoming early feedback and contributions from those who would like to participate.
There's a comment in the source code of your demo app's login endpoint -- which doesn't appear to make use of the user's password at all? Surely that can't be right.
> I ask that with an especial emphasis on the zero knowledge password system which is currently undocumented but, if I'm understanding correctly, also unimplemented?
That one’s easy, it’s just (likely augmented) PAKE. There are a couple of options in this space. The zero-knowledge stuff likely comes from the blind salt (OPRF, Oblivious Pseudo-Random Function). There are different variants to choose from, with slightly different security & performance trade-offs. This stuff indeed allows password authentication without ever revealing the password to the server, but the password database itself is still vulnerable to dictionary attacks once it’s stolen.
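For the curious, the blind-salt round has a simple shape. Here's a runnable toy DH-OPRF over a multiplicative group mod a tiny safe prime; the parameters are laughably insecure demo numbers, and a real implementation would use a standardized elliptic-curve group (e.g. ristretto255). This is not Shelter's code, just the general idea:

```typescript
const p = 1019n; // safe prime: p = 2q + 1
const q = 509n;  // prime order of the quadratic-residue subgroup
const g = 4n;    // generator of that subgroup (2^2 mod p)

function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
  let result = 1n;
  base %= mod;
  for (; exp > 0n; exp >>= 1n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
  }
  return result;
}

function modInverse(a: bigint, m: bigint): bigint {
  let [r0, r1, s0, s1] = [a % m, m, 1n, 0n]; // extended Euclid
  while (r1 !== 0n) {
    const quot = r0 / r1;
    [r0, r1] = [r1, r0 - quot * r1];
    [s0, s1] = [s1, s0 - quot * s1];
  }
  return ((s0 % m) + m) % m;
}

// Hash the password onto the subgroup (toy hash; never do this for real).
function hashToGroup(password: string): bigint {
  let h = 0n;
  for (const ch of password) h = (h * 31n + BigInt(ch.codePointAt(0)!)) % (q - 1n);
  return modPow(g, h + 1n, p);
}

// Client: blind the password element so the server learns nothing about it.
const r = 123n;                          // random scalar in [1, q-1] in reality
const P = hashToGroup("hunter2");
const blinded = modPow(P, r, p);         // H(pw)^r

// Server: apply its long-term secret k without ever seeing H(pw).
const k = 456n;
const evaluated = modPow(blinded, k, p); // H(pw)^(r*k)

// Client: strip the blinding factor, leaving the "blind salt" H(pw)^k.
const blindSalt = modPow(evaluated, modInverse(r, q), p);
console.assert(blindSalt === modPow(P, k, p)); // matches direct evaluation
```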
What worries me though is how they failed to mention PAKE. I’ve seen much of their talk, and to be honest it smelled like snake oil. Very shallow explanation of the protocol, and many missing steps between that and an actual use case. It’s supposed to solve many Big™ problems, but how it achieves this is very unclear to me.
Indeed, although undocumented, this is implemented and is a PAKE variant ([1] and [2]).
The way that it works, at a high level, is similar to how SRP works. Two random salts are generated (let's say, A and B), where A is used for authentication (and hence public) and B is used for deriving the other cryptographic keys.
When you authenticate, you retrieve A and then you prove to the server that you know what scrypt(A, password) is. At this point, the server provides you with B, and you can use this information to derive scrypt(B, password), which in turn you use to derive other cryptographic keys.
It being an oblivious password store, there are other steps taken to make the protocol stateless (from the perspective of the server) and to make parties commit to random values used so that runs of the protocol cannot be replayed.
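Schematically, the two-salt idea described above might look like the sketch below (using Node's built-in scrypt). Note this deliberately omits the real protocol's PAKE proof, commitments, and statelessness machinery -- in the actual flow the client proves knowledge of scrypt(A, password) rather than sending anything comparable in the clear:

```typescript
import { scryptSync, randomBytes, timingSafeEqual } from "node:crypto";

// Registration: two random salts. A's hash goes to the server as the
// verifier; B is only released after a successful authentication.
function register(password: string) {
  const saltA = randomBytes(16); // public, used for authentication
  const saltB = randomBytes(16); // released only to authenticated clients
  const verifier = scryptSync(password, saltA, 32);
  return { saltA, saltB, verifier };
}

// Authentication, schematically (the real protocol uses a PAKE proof here).
function authenticate(password: string, saltA: Buffer, verifier: Buffer): boolean {
  const proof = scryptSync(password, saltA, 32);
  return timingSafeEqual(proof, verifier);
}

// After authentication the server releases saltB, and the client derives
// its actual key material from scrypt(B, password) -- never stored serverside.
function deriveKeys(password: string, saltB: Buffer): Buffer {
  return scryptSync(password, saltB, 32);
}
```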
> but the password database itself is still vulnerable to dictionary attacks once it’s stolen
This is correct, and I'm not sure there are good ways to prevent this or equivalent scenarios from occurring. Therefore, you should see this as an additional layer on top of your already secure password and not as a substitute for a secure password.
The reason for having this mechanism rather than not having it is to protect your password from brute-forcing by the public at large, in a scenario where the server operator is semi-trusted. Without this implementation, you have three alternatives: (1) forego passwords entirely; (2) make the salted password 'public', along with the salt, which is all the information you need to brute-force it; or (3) build more "normal" authentication flows without PAKE, in which case you need to trust the server even more. If you insist on using passwords, this is a compromise solution between (2) and (3), i.e., between "anyone can break it" and "trust the server entirely".
> This is correct, and I'm not sure there are good ways to prevent this or equivalent scenarios from occurring.
I believe there isn’t, indeed. Sorry this part came across as a criticism.
I actually have deployed a PAKE at work for a corporate CRUD app once, and the entire security hinged on login/password. Clients authenticate to the server with the password, server authenticates to the clients with its database entry.
Sure, the password could be brute-forced if the database leaked, and sure, anyone could impersonate the server with a database entry, but this reduced security allowed simpler and more convenient administration: no need to bother with a PKI, just take good care of the password database and reset everyone’s passwords when we suspect a leak.
(Now if this was for the wider internet I would have added a PKI layer on top to prevent server impersonation.)
> The reason for having this mechanism […]
Yeah, PAKE is real nice. Ideally every password based login system would use augmented PAKE under the hood. Not only does it protect the passwords better, the protocol itself doesn’t need to happen in a secure channel (this can help reduce round trips), and the bulk of the computation (slow password hashing) happens on the client side. This reduces both network load and server load, what’s not to like?
Hey greg! Glad to see the recording of the D'Web presentation made it online [0], "Most of today's web apps have privacy settings, but none of those privacy settings are real. We didn't want to be one of those companies that gave our users privacy settings, and effectively be lying to our users" is a great problem statement.
For the blockchain skeptics, good news, it's not a blockchain: it's a distributed virtual machine without waste-heat-enforced-global-consensus. Of course it's much faster to see "smart contract" and hit the back button than investigate a new way of doing things so Shelter has their work cut out for them.
Am I right to read this as a replicated, optionally encrypted, append-or-kv-set "database"? With a mechanism for adding identities which can control each "primary key" of sorts?
I think I'm not following how the checksums are produced (are you going to serialize and replicate the code too? seems odd if you're going to include Vue in that serialized data...), nor how that achieves much that a plain append-only log wouldn't (maybe you can compress kv-sets and discard the history after some time?)... but the docs are somewhat incomplete so maybe that's just not covered sufficiently yet. Or I skimmed too quickly and missed something.
Shelter Protocol is an append-only log that is stored in a key-value database. I recommend watching the talk if you haven't already, it should answer your questions.
Regarding storing Vue (or any other frontend framework), the answer is gonna get a bit technical (sorry, we haven't documented this on the website yet). This is a quirk of how we happen to be using Shelter Protocol in Group Income (which uses Vue). Vue.js both is and isn't saved to the contract. Each contract manifest has two versions of the contract: one "complete" contract containing all of the code necessary for reproducing the state, and one "slim" version that has any large dependencies passed in to the contract at runtime. Group Income uses the slim version so that the contracts load quickly, but it's also possible for anyone to reproduce the state of a Group Income app independently, without using Group Income, by loading the "full/complete" version of the contract, because all necessary dependencies are bundled in.
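As a rough illustration, such a two-variant manifest might be shaped something like this (the field names here are invented for illustration; consult the spec for the actual format):

```typescript
// Hypothetical shape of a two-variant contract manifest -- NOT the real format.
interface ContractManifest {
  name: string;
  version: string;
  // Fully self-contained bundle (all dependencies inlined): lets anyone
  // reproduce the contract state independently.
  complete: { file: string; hash: string };
  // Smaller bundle that expects large dependencies (e.g. Vue) to be supplied
  // by the host app at runtime; faster to load.
  slim?: { file: string; hash: string; externals: string[] };
}
```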
>an append-only log that is stored in a key-value database.
Yeah, that's definitely not clicking in my brain at the moment. I guess I need to actually watch/listen to that video rather than skimming the slides.
For storing code: yeah, I guess a more "storage polite" version of this would be to serialize the "engine" of whatever you're doing, rather than its presentation. That can often be quite small, certainly in comparison. Would it be possible to simply store e.g. a git repo name + sha, and then clients download as needed? Or do storage-providers need to be able to execute any of this (beyond enforcing key permissions)?
> Would it be possible to simply store e.g. a git repo name + sha, and then clients download as needed? Or do storage-providers need to be able to execute any of this (beyond enforcing key permissions)?
The server stores all of the events and all of the contracts in a content-addressable way, in whatever key-value database it is using. It does not execute any of the contract code though, no, that's done locally by clients.
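Content addressing itself is a small idea: the lookup key is the hash of the value, so a client can verify anything it fetches from an untrusted store. A toy sketch (the hex-key convention and Map-backed "database" below are stand-ins, not the spec):

```typescript
async function contentAddress(data: Uint8Array): Promise<string> {
  const digest = new Uint8Array(await crypto.subtle.digest("SHA-256", data));
  return Array.from(digest, (b) => b.toString(16).padStart(2, "0")).join("");
}

const store = new Map<string, Uint8Array>(); // whatever KV database the server uses

async function put(data: Uint8Array): Promise<string> {
  const key = await contentAddress(data);
  store.set(key, data);
  return key;
}

async function get(key: string): Promise<Uint8Array> {
  const data = store.get(key);
  if (!data) throw new Error("not found");
  // The fetch is self-verifying: recompute the hash and compare to the key.
  if ((await contentAddress(data)) !== key) throw new Error("corrupt entry");
  return data;
}
```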
> This virtual machine defines operations (“op codes”) for managing keys, defining so-called "smart contracts" (computer programs), and performing both encrypted and unencrypted actions.
There's nothing inherently wrong with a blockchain. It's a cryptographically immutable linked list. Anti-buzzwords can be as shallow as buzzwords.
With the exception of the inefficiency of proof of work, nearly all the problems with cryptocurrency are human problems related to the toxicity of the ecosystem rather than intrinsic issues with the tech. Code doesn't scam people. People scam people.
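The "cryptographically immutable linked list" bit fits in a few lines; here's a minimal hash chain, with no consensus or currency anywhere in sight:

```typescript
import { createHash } from "node:crypto";

// Each entry commits to the hash of its predecessor, so rewriting any past
// entry changes every hash after it -- tampering is immediately detectable.
interface Entry {
  prevHash: string; // hash of the previous entry ("" for the first)
  payload: string;
}

const hashEntry = (e: Entry): string =>
  createHash("sha256").update(e.prevHash).update(e.payload).digest("hex");

function append(log: Entry[], payload: string): void {
  const prevHash = log.length ? hashEntry(log[log.length - 1]) : "";
  log.push({ prevHash, payload });
}

// Verifying integrity is a single pass: recompute each link.
function verify(log: Entry[]): boolean {
  return log.every((e, i) =>
    i === 0 ? e.prevHash === "" : e.prevHash === hashEntry(log[i - 1])
  );
}
```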
I'll never understand the undeserved hate that blockchain tech gets. I understand that some people automatically associate blockchain with crypto scams, but fundamentally the tech is inspiring.
I thought the same but I really do understand the "hate" now. In fact, as someone who sees a lot of potential in the tech, there's even more reasons to be upset. "Blockchain" and even "crypto" are terms that became synonymous with opportunistic speculation at best, and scams and frauds at worst.
And honestly, the vast majority of people on the "inside", i.e. those actually working with it, were opportunists as well. Most of them saw the tech narrowly as an unregulated financial instrument.
I'm inspired by alternate modes of data storage and compute, and blockchain gets points for popularizing public key pairs as a means of committing transactions, but I totally understand the hate: whenever a tool is invented, it makes certain things easier than they used to be, sometimes unintentionally. Merkle trees make verifying data integrity fast, and proof of work was invented to make sending spam emails slow. The two combined with a money-metaphor makes spinning up pyramid schemes zero marginal cost. Whether or not you're on board with Mr Nakamoto's banking critiques, the result of the technology has been a tidal wave of thin schemes defrauding hopeful and desperate people, while adding little value outside of that world.
Because all the people working on blockchain-esque systems never seem to grasp societal trust boundaries.
It's like there's no amount of Adam Curtis documentaries that will shake Silicon Valley folks from the myth that the computer will lead to a better, more equal society.
> you can't compute yourself out of a broken world.
Most certainly not, but surely you can build better tools with the aspiration of facilitating certain goals, can't you? It's not the tools in or by themselves that will improve (or worsen) the world, rather something at your disposal to pursue your goals.
> the myth that the computer will lead to a better, more equal society
Agreed that it won't. But, IMO, the strength of Shelter is that it covers a niche that many other systems (blockchain-y or otherwise) don't, which is data autonomy and confidentiality. Most popular web apps today are centralised silos that don't give you privacy from the operator, and those that aim for federation often also don't give you much privacy either.
Now, it can be that those factors are not important for the specific thing you're developing, and that's fine. But, if they are, having an existing framework to build on top of can give you a head start (even indirectly, by showing you what works or doesn't).
Disclaimer: I'm involved in the development of Shelter. All opinions are my own.
This is unnecessarily defeatist. You can make improvements to a broken world, including new frameworks for compute that remove dependency on billionaire data brokers.
As an Adam Curtis fan (for all his faults), I don't believe technology is neutral, nor that progress is teleological. I do believe that people could be better served by software that works in their interests instead of against them.
And funny you mention a broken world, as if we're doomed to be excluded from the paradise of eden, the very first walled garden. Those of us working on distributed applications are trying to make walled gardens obsolete, no forgiveness required :)
What do you mean by replacing trust with crypto? Like in 'code is law'? If so, yeah, you can't replace one thing with the other because they're fundamentally different things that may only overlap in certain areas.
But on a broader scale, I don't see what in cryptography makes power inherently more concentrated. Crypto is just a way for enforcing certain trust relations that have already been established or agreed upon. Just like you can use crypto to help centralise power (e.g., allowing you to only run signed applications that can only show signed content), you can use crypto to help decentralise power with tools for confidentially presenting content and allowing you to vet your applications haven't been tampered with.
In both cases the underlying technology has many common components, and what changes is the use you make of it.
bitcoin-esque systems, yea - largely agreed. they're trying to technologically force rules on a social system. there are Issues™ frequently adjacent to that.
blockchains have little to do with that though, they're more of an easily-validated data replication technique than anything that has social implications. and they've been around for much, MUCH longer than bitcoin: https://en.wikipedia.org/wiki/Merkle_tree
> I'll never understand the undeserved hate that blockchain tech gets. I understand that some people automatically associate blockchain with crypto scams, but fundamentally the tech is inspiring.
Is it? From where I sit it's an environmental catastrophe as we burn squillions of CPU/GPU cycles for the modern day tulip craze (bitcoin).
For any purported use case of blockchains, with possible exception of buying illegal things online, existing technologies are better.
Monero is the one cryptocurrency I will endorse for having a genuine focus on private transactions. For people living under authoritarian regimes, oppressive families, or just whoever wants privacy. My only misgiving is the energy usage.
I don't hate blockchain, but I also don't see much use for it that isn't better accomplished through other means. So I guess I'm just not inspired by the tech.
But I already have end-to-end encryption - that's literally any sensible TLS configuration, and what iOS requires for apps unless they explicitly opt in to allowing weak connections.
Responding to the nonsense introduction:
> By design, traditional web applications enable server administrators to monitor all user activities.
That's a choice. What is the compelling reason for a company that wants to monitor the use of their apps to use a system that ostensibly says you can't?
> Although these web apps offer “privacy settings” to users, they fail to provide any real privacy protection.
Yes, because the options are either the data is inherently insecure, or the data is fully encrypted. Governments and users both have difficulty with this concept: you cannot have data security and also backdoors, you can't have data security and also "I have lost every component of my account identity: devices, passwords, and passcodes, but want you to recover my data".
This is ignoring companies for whom "privacy settings" are an intentional lie (Facebook, Google, ...), and again, why would such a company adopt a platform that ostensibly forces lack of spying?
> Shelter Protocol introduces new ways to handle logins and data storage on the server while preserving the conventional username/password experience that users are familiar with.
The username/password system people are familiar with is widely understood, and clearly demonstrated, as being bad for security.
> Instead of storing data in a database in clear text on the server, data can now be end-to-end encrypted and synced across multiple devices, and even across servers operated by different individuals.
Already completely doable, and the companies that don't do so have chosen not to, for a variety of reasons - some good, some bad, but those reasons are not because encrypting content securely is hard.
> The Shelter Protocol (SP) defines operations for a high-level, lightweight, federated, end-to-end encrypted virtual machine.
Or you can use JS, which is already available, runs on every machine that exists at this point (is this good?), is already federated: any device can run any JS you send it.
> [remainder of front page]
Largely nonsense.
* Key concepts *
> Since every action in SP is signed using a user’s private key, which in turn is derived from their password
So it's bad crypto. Huzzah!
After this I got bored reading this nonsense.
There's no actual justification for why this magical VM is necessary or good, nor any explanation of how they're going to make it "federated" (because despite advertising federation, it does not appear to be), what they consider federation to be, or why that is good.
Their one example app does nothing that requires any of their advertised features - literally every part of this could be done with existing web tech, and largely be done better.