Existing research strongly suggests that deriving asymmetric keys from human-provided seeds (i.e. passphrases) and nothing else is inadvisable.
This is because it is possible to attack large numbers of passphrases at once, and it is possible to harvest large numbers of targets from public databases of public keys.
Additionally, entropy estimation tools generally are not good at accounting for the fact that human behavior results in non-obvious biases and patterns in passphrase selection.
To see an example of this model failing on a large scale, look at the number of Bitcoin "brain wallets" that have been hacked, despite having apparently strong keys.
They work well for me, but some people dislike the abbreviation aspect. I'm investigating how to make a grammar-based engine (rather than bigram-based) for smaller data files and better phrases.
The algorithm is [898, 537, 321, 205, 361] -> tidlotexidaifol -> "tidy lot existence daily following". The phrase is generated deterministically from the password, so it's no more secure than the previous steps. It's just a mnemonic to help remember the password.
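A minimal sketch of the pipeline described above. It assumes a hypothetical 1024-entry table of three-letter prefixes (only the five entries from the example are filled in) and a fixed prefix-to-word lookup. The real engine is bigram-based, so its word choice can depend on neighbouring prefixes. `PREFIXES`, `EXPANSIONS`, and the helper functions are illustrative names, not the tool's code.

```python
import math
import secrets

TABLE_SIZE = 1024  # 2**10 prefixes, so each index carries 10 bits

# Hypothetical fragment of the prefix table: only the example's entries.
PREFIXES = {898: "tid", 537: "lot", 321: "exi", 205: "dai", 361: "fol"}

# Hypothetical fixed expansion: each prefix always yields the same word,
# so the mnemonic adds no entropy beyond the indices themselves.
EXPANSIONS = {"tid": "tidy", "lot": "lot", "exi": "existence",
              "dai": "daily", "fol": "following"}


def random_indices(num_words: int) -> list[int]:
    """Draw indices uniformly with a CSPRNG (needs the full 1024-entry table)."""
    return [secrets.randbelow(TABLE_SIZE) for _ in range(num_words)]


def abbreviation(indices) -> str:
    """The actual password: the concatenated prefixes."""
    return "".join(PREFIXES[i] for i in indices)


def mnemonic(indices) -> str:
    """The memory aid: each prefix expanded to its word."""
    return " ".join(EXPANSIONS[PREFIXES[i]] for i in indices)


def entropy_bits(num_words: int, table_size: int = TABLE_SIZE) -> float:
    return num_words * math.log2(table_size)


example = [898, 537, 321, 205, 361]
print(abbreviation(example))  # tidlotexidaifol
print(mnemonic(example))      # tidy lot existence daily following
print(entropy_bits(5))        # 50.0
```

All 50 bits come from the random index draws, which is why the expanded phrase is no more secure than the abbreviation.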
If by trigram you mean "3-letter word prefix", then yes, they are all equally likely. If you mean "group of 3 words", then no, they're very biased. "raccowsli" is as likely as "afrdisuti", but it will always generate the mnemonic "race cow slipped", and never "racter cowslip slithered".
Just to add to why you would or would not want to use the passphrase vs. the abbreviation: I've personally found it much easier to actually type a password that consists of words, names or "nonsense words" generated by Markov chains (though not usually ones generated by letter frequencies).
I am often comfortable entering a given password on a device on which I am not comfortable unlocking my password database (or entering my password database password + keyfile), but I usually have a device which does have my password database on it (phone, laptop, etc.), so my application is "it doesn't have to be memorable, it just has to be something I'll enter without making a typo". For whatever reason, I find that entering "tidlotexidaifol" would be harder for me to do accurately than "tidy lot existence daily following" or "tidylotexistencedailyfollowing".
That said, I like this approach, because some sites (like Bank of America, shockingly) restrict total password length, making a "very long password with relatively low entropy per character" approach infeasible.
You could use the phrase if you liked, but each word is only 10 bits of entropy.
One nice thing about having a mnemonic for the password is that you don't have to remember the words precisely -- you can modify the mnemonic to suit your tastes as long as the prefix remains. I originally wrote it to generate passwords that I have to enter frequently (e.g. sudo), so being short was a goal.
I'm working on making actual passphrases with more entropy per word.
Yeah, I assume that you can just have a non-deterministic expansion which will add some entropy per word to increase the entropy of the passphrase over the abbreviated version, but my point was just that even if there's no increase in security by using the full passphrase, counter-intuitively, using the full phrase would likely be more convenient for me (and probably for many others).
> This is because it is possible to attack large numbers of passphrases at once
Do you mean some kind of rainbow table attack, where you go through a dictionary and derive keypairs for every likely password? Then you could match any public key to the precomputed private key.
I'm not familiar enough with ECC to know if this is feasible, but it seems like an obvious weakness of the system.
Yeah, the fact that breaking this cryptosystem is essentially equivalent to attacking unsalted passwords means that an attacker can save a lot of work.
Also, this simplifies setting up the attack in general. For example, attacking a bunch of keys based on random data + passwords is way harder than attacking keys based on just passwords, because for the former you must also have the random data (which presumably is not posted online in massive troves).
I think the hesitations about passphrases being subject to brute force, rainbow tables, etc. are warranted, but I have another concern:
If my passphrase gets compromised, I have to retire the keypair.
That's true of a key file with current asymmetric systems; but, presently if the passphrase of my GPG private key is compromised (e.g. by a hardware key logger), I only have to change the passphrase and ensure the old keyfiles are destroyed.
With MiniLock, if my passphrase is compromised the entire key material is compromised and I need to revoke the public key. But how do I revoke it? Do I tweet a message with the private key saying the public key is revoked? Will there be a centralized place to publish revocation messages? Efficient key revocation will be absolutely critical to this system and that's hard if the key distribution mechanism is tweets or some other ad hoc mechanism. This is one thing that PGP key servers really help with.
I actually approached Kobeissi with this point in the meeting in Noisy Square right after the talk, suggesting he integrate a TPM into his key management system (like how you can call out to one in Firefox for SSL with libpkcs5.so or some similarly named library). He responded that the specs were open enough that anyone could add that in. As to a centralized place, your guess is as good as mine. Also, can MacBook users even access their TPMs?
Yeah, I wouldn't trust the TPM - certainly not from a Windows machine, and not even an Apple one after the recent revelations/research, which shows Apple tries to make the device secure against "regular" hackers, but very easy to access by Apple itself or the US government.
My current one is from Atmel in 2008, before Atmel quit making them, so I figure at least in this case I'm safe. I would probably not use a newer one if I were worried about TLAs, though. As I am currently in the market for an MBP, where do I find this information about Apple TPMs?
But wait… if the entire key is derived from the passphrase, if two people both choose the passphrase "password", would they not then get the same public and private keys? Can I not brute force people's private keys by taking millions of common passwords¹ and generating public keys from them, and then seeing which ones match my friends?
Am I wrong? Doesn't this seem much easier to brute force than an RSA key? (Presuming the private key hasn't been compromised; if it has, it's likely protected by a password, and then these two are about equal.)
¹accepting that some will get rejected because of "uses the zxcvbn library in order to impose a strict limit on the amount of detected entropy present in entered passphrases. miniLock will not allow passphrases that fall below the threshold of 100 bits of entropy"
Yes, and it's clearly stated in the demo[1]. The key feature this approach is trying to accomplish is allowing the user to not have to store anything beyond a password. Keeping track of a keyfile is HARD for a lay user, so miniLock is trying to do the best it can securing files with only a password.
Except that since there's no salt or other measures involved, there isn't even trivial protection against rainbow tables letting me create one table and crack all the passwords. It's using ECC as a fun buzzword but tossing all the actual real-world benefits of key-pair crypto out the window by having a single user-provided string map to a single keypair.
Please enlighten us on how to generate rainbow tables for passphrases. Assuming at most 6 bits per character, a 100-bit phrase is at least 17 characters long.
Off the top of my head, I'd use a book of quotations, popular lines from movies, etc., and try to hit common permutations of each. So there's a bit of low-hanging fruit. But that could be detected when they generate their key.
Remember, salt doesn't really prevent anyone from using "password" or the first line of Billie Jean.
Assuming they were actually keeping the contents of such wordlists out and actually ensuring high-entropy passphrases, we would be in an OK place, though still far removed from the security provided by randomly seeded ECC keypairs.
And as an attacker, I'm using my rainbow tables specifically to target the low-hanging fruit. It gives me the best initial odds, and also the best return on any given hit: I'm way more likely to get more users per match for things in the common phrasebook, by nature of it being the common phrasebook.
> Assuming they were actually keeping the contents of such wordlists out and actually ensuring high-entropy passphrases, we would be in an OK place, though still far removed from the security provided by randomly seeded ECC keypairs.
I'm fairly skeptical of the ability of software to "ensure high-entropy passphrases". I don't think it's trivial to anticipate the entropy-lowering strategies that people will come up with in order to help them remember their passwords.
Anyway, what will be easy for a human to remember will be sequences exhibiting high n-gram correlation, which is efficiently modeled by Markov chains. I wonder whether studies have been done on password strength from this perspective, and how chains generated from leaked password/passphrase collections would deviate from ones generated from common language.
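To make the Markov-chain idea concrete, here is a toy first-order character model that scores a string by its surprisal (-log2 of its probability). The one-sentence training text is a stand-in; the comment's suggestion would be to train on leaked password/passphrase collections and compare against a model trained on ordinary prose. Add-one smoothing and the 27-symbol alphabet (a-z plus space) are assumptions of this sketch.

```python
import math
from collections import Counter

ALPHABET_SIZE = 27  # a-z plus space


def train_bigrams(text: str):
    """Count character bigrams and how often each character starts one."""
    pairs = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])
    return pairs, contexts


def surprisal_bits(s: str, model) -> float:
    """-log2 P(s) under the bigram chain with add-one smoothing
    (the probability of the first character is ignored)."""
    pairs, contexts = model
    bits = 0.0
    for a, b in zip(s, s[1:]):
        p = (pairs[(a, b)] + 1) / (contexts[a] + ALPHABET_SIZE)
        bits -= math.log2(p)
    return bits


model = train_bigrams("the quick brown fox jumps over the lazy dog")
print(surprisal_bits("the dog", model) < surprisal_bits("qzxvk w", model))  # True
```

A string built from familiar transitions gets fewer bits than one of the same length built from unusual ones, which is exactly the bias an attacker's cracker would exploit.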
I don't get your point with your link. It says it cannot detect certain repeated character runs, or the word "password" in Morse code.
Basically you're arguing against memorizing the key. That means users are forced to keep a file around. It ignores the many cases where you don't want to have to maintain a file.
> Please enlighten us on how to generate rainbow tables for passphrases.
Pick P, the set of passwords, to be something like "4-6 dictionary words appended". Pick R, the reduction function, such that it deterministically maps a hash to a pseudo-random value in P.
Why not just store the salt with the data in the same file? It's not as though the program needs to maintain binary compatibility with anything, since the encrypted file isn't supposed to be readable by any program other than the decryption program.
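A minimal sketch of that construction, assuming a toy 12-word list and plain SHA-256 as a stand-in for the real passphrase-to-key derivation. `H`, `R`, `build_chain`, and `lookup` are hypothetical names; a real table would use a large dictionary and millions of chains.

```python
import hashlib

# Hypothetical toy word list; P = "4-6 words from WORDS, concatenated".
WORDS = ["correct", "horse", "battery", "staple", "tidy", "lot",
         "daily", "following", "race", "cow", "slipped", "fox"]


def H(passphrase: str) -> bytes:
    """Stand-in for passphrase -> key derivation (plain SHA-256 here)."""
    return hashlib.sha256(passphrase.encode()).digest()


def R(digest: bytes, step: int) -> str:
    """Reduction: deterministically map a digest to a pseudo-random member
    of P. Mixing in the column index `step` gives each column its own
    reduction function, which is what makes this a rainbow table."""
    n = int.from_bytes(digest, "big") + step
    count = 4 + n % 3  # 4, 5 or 6 words
    n //= 3
    words = []
    for _ in range(count):
        n, idx = divmod(n, len(WORDS))
        words.append(WORDS[idx])
    return "".join(words)


def build_chain(start: str, length: int) -> tuple[str, str]:
    """Walk start -> H -> R -> H -> ... and keep only the two endpoints."""
    p = start
    for step in range(length):
        p = R(H(p), step)
    return start, p


def lookup(target: bytes, table: dict[str, str], length: int):
    """Find a passphrase hashing to `target`; `table` maps end -> start."""
    for col in range(length - 1, -1, -1):
        # Guess that the target sits in column `col` and walk to the end.
        p = R(target, col)
        for step in range(col + 1, length):
            p = R(H(p), step)
        if p in table:
            # Replay the chain from its start; if the target is not
            # there, this was a false alarm, so keep going.
            q = table[p]
            for step in range(length):
                if H(q) == target:
                    return q
                q = R(H(q), step)
    return None


CHAIN_LEN = 10
table = {}
for first in ["tidylotdailyfollowing", "racecowslippedfox"]:
    s, end = build_chain(first, CHAIN_LEN)
    table[end] = s
```

Only the chain endpoints are stored, so the table trades lookup time for memory; every key derived from any passphrase covered by a chain can then be looked up without redoing the derivations.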
What makes dealing with key pairs hard for the lay user?
If the issue is "they have to transport them to use the crypto on systems other than their own": We should not be teaching lay-users (or non-lay-users) to enter passphrases or use private keys on untrusted systems.
For using multiple trusted systems, having the client software support multiple keypairs (like OpenSSH) or providing an easy way to read the key off removable media (like an encrypted thumb drive) are great.
What about making the user choose two passphrases, which are concatenated together by the tool? Cracking a single passphrase may be easy, but cracking two concatenated passphrases is significantly harder.
E.g. password 1: "the quick brown fox jumps over the lazy dog"
Password 2: "jack be nimble jack be sick"
Final result, used to derive a keyfile: "the quick brown fox jumps over the lazy dogjack be nimble jack be sick"
Notice how "quick" was switched with "sick" in the last word. Now there's no pattern that can be easily cracked. If we force a user to explicitly do something like that, then this can still work.
That example was chosen intentionally to be weak. It shows that even with two relatively weak passphrases, the result is still somewhat strong. If we add extra requirements on top of that, such as forcing the user to use numbers and capitalizations, then the result should be sufficiently unique.
There's no difference between that and just requiring the user to use a longer password.
Anyone trying to attack the system will just program their cracker to be more likely to try concatenating words together awkwardly in the password somewhere.
It's easier for humans to remember two distinct passwords with individual complexity requirements than one gigantic passphrase.
By forcing the user to choose two passphrases which are then concatenated, the result is one gigantic passphrase that a cracker can't easily crack, yet is easy for humans to remember. It seems like this solves the problem of rainbow tables.
A keylogger could still break this system. But if an adversary has planted a keylogger, they could've simply stolen your keyfile.
No, because then the person generating their table just takes that into account. It's slightly harder, but not noticeably so compared to how secure it needs to be.
It seems like as long as the complexity of the passphrase is sufficient, then a rainbow table can't be effective. For example, a 128-bit random AES key is a kind of passphrase that's generally not susceptible to a rainbow table attack (though it's very hard for humans to remember). So the problem here is, how do you force the user to make their passphrase sufficiently complex?
Passphrases also don't protect against keyloggers, which is a downside of this approach.
You force the password to be sufficiently complex by doing what you said: creating it (pseudo)randomly. The aforementioned 128 bit random AES key is robust, assuming the PRNG is solid. A user-provided password utilizing the human mind as its PRNG will never come close.
This is a bad cryptosystem which will result in people being fucked. Take it offline. A serious known weakness of zxcvbn is that it will grossly overestimate the entropy of things like quotes, lines from songs, lines from movies, etc.
"the quick brown fox jumps over the lazy dog" has 111 bits of entropy according to zxcvbn.
Secondly, this only really works for English. Speakers of other languages may be used to being forced to use English passwords, but we can't expect them to like English passphrases. zxcvbn also has heuristics making assumptions about how people choose passwords (e.g. l33tifying) which may be less valid for longer phrases than for short passwords. It uses English word lists and keyboard layouts, and automatically gives you a Unicode bonus[2] if you use unexpected characters: if I write a sentence in Chinese, it is scored as high entropy.
Here miniLock actually penalises Chinese by adding the key.length > 32 requirement, since 32 Chinese characters are equivalent to a much longer English passphrase. I suggest lowering the key length requirement in this case. However, if my passphrase were in Arabic and my attacker knew it was likely to be in the Arabic alphabet, I'm not really entitled to the aforementioned Unicode bonus, as the entropy drops to something similar to English (I think).
That said, the design decisions of zxcvbn do make sense for Dropbox and zxcvbn is not the crucial part of the minilock program (the crypto is). Users will always find a way to game the system and find the lowest possible entropy passphrase.
Please don't use language like "People will get fucked" when critiquing a cryptosystem. HN is better than that.
Tarsnap has no restrictions on passphrase entropy whatsoever, yet people have no problem with Tarsnap. It's interesting that people are singling out Minilock for this feature. Is this the worst thing that can be said about Minilock?
EDIT: I accidentally said Tarsnap; I meant Scrypt.
Tarsnap does not allow anyone who has your public key to attempt to crack your passphrase. Minilock does, and in fact you can load all public keys into a Bloom filter and crack them simultaneously at nearly the same speed as a single key. The design of this system is simply irresponsible. Saying people will be fucked is entirely appropriate here.
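A sketch of the multi-target attack described above. `derive_public_key` is a hypothetical stand-in (SHA-256) for any unsalted passphrase-to-key derivation; the only property that matters is that it is deterministic. Each guess costs one derivation no matter how many harvested public keys it is checked against.

```python
import hashlib


def derive_public_key(passphrase: str) -> bytes:
    """Hypothetical stand-in for an unsalted passphrase -> public key step."""
    return hashlib.sha256(b"toy-kdf:" + passphrase.encode()).digest()


class BloomFilter:
    """Minimal Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def crack_all(harvested_keys, guesses):
    """One derivation per guess, tested against every harvested key at once."""
    bloom = BloomFilter()
    for pk in harvested_keys:
        bloom.add(pk)
    exact = set(harvested_keys)  # confirms Bloom hits (rules out false positives)
    hits = {}
    for guess in guesses:
        pk = derive_public_key(guess)
        if pk in bloom and pk in exact:
            hits[pk] = guess
    return hits


harvested = [derive_public_key(p) for p in ["x9$kq!vZ-long-random", "password", "Tr0ub4dor&3"]]
print(list(crack_all(harvested, ["123456", "password", "letmein"]).values()))  # ['password']
```

The Bloom filter matters when the harvested key set is too large for memory: the hot loop tests a compact bitmap, and only the rare positives need the slower exact lookup. In this toy both fit in memory.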
Hmm, I meant Scrypt. You can use Scrypt to encrypt files using a passphrase with no entropy restrictions. It doesn't use keys. People never raised this concern about Scrypt, and certainly didn't say people would get fucked for using it. What am I missing here? Why does Minilock warrant this outrage, but not Scrypt?
That tool generates a random salt, so passphrase cracking time is O(n) where n is the number of files being cracked vs O(1) for Minilock public keys. Additionally, encrypted files are generally still not "public", whereas Minilock public keys likely would be.
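A toy illustration of the O(n) versus O(1) difference. It assumes a single SHA-256 as the KDF so it runs instantly (real tools use a deliberately slow KDF such as scrypt); `CountingKDF`, `attack_unsalted`, and `attack_salted` are hypothetical names.

```python
import hashlib
import os


class CountingKDF:
    """Toy KDF (one SHA-256) that counts how many derivations the attacker pays for."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, passphrase: bytes, salt: bytes) -> bytes:
        self.calls += 1
        return hashlib.sha256(salt + passphrase).digest()


def attack_unsalted(targets: set, candidates, kdf) -> set:
    # No per-target salt: one derivation per guess is compared against
    # every target at once, so cost does not grow with the number of targets.
    return {c for c in candidates if kdf(c, b"") in targets}


def attack_salted(targets: list, candidates, kdf) -> set:
    # Each target carries its own random salt: every guess must be
    # re-derived once per target, so cost grows linearly with n.
    found = set()
    for salt, digest in targets:
        for c in candidates:
            if kdf(c, salt) == digest:
                found.add(c)
    return found


victims = [b"password", b"hunter2", b"letmein", b"qwerty"]
guesses = [b"123456", b"password", b"letmein"]

unsalted_kdf = CountingKDF()
attack_unsalted({hashlib.sha256(v).digest() for v in victims}, guesses, unsalted_kdf)

salted_targets = []
for v in victims:
    salt = os.urandom(16)
    salted_targets.append((salt, hashlib.sha256(salt + v).digest()))
salted_kdf = CountingKDF()
attack_salted(salted_targets, guesses, salted_kdf)

print(unsalted_kdf.calls, salted_kdf.calls)  # 3 12
```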
Assuming that all files are encrypted with the same passphrase, and you crack the passphrase, not the key generated from kdf(salt, passphrase), complexity is O(1) for scrypt as well.
Tarsnap generates a key file - your password is not used directly to derive the key. A password is used for your account, and for encrypting the key file.
Not really. The feedback period is for things like UX etc - this essentially makes the software (as usual for these homebrew efforts) much less secure than you think it is. When the primary reason that this software is at all secure is that it enforces a minimum entropy, if there's a bug in that then no one should use the software.
I don't see any evidence of the feedback being "for things like UX"
I also don't think the word homebrew is appropriate. AFAIK homebrew means (1) making beer at home, (2) the Mac package manager, or (3) a general term for endeavours that are connected to corporate products/projects but are themselves small and independent.
It took me a while to figure out what zxcvbn referred to. I even looked at the source code in the repository and found 'zxcvbn' in the weak password list.
Finally, whilst on my tablet, I noticed that it was as obvious as the "QWERTY" keyboard that showed up on screen. As a primarily Dvorak user, such a silly stroke of keys had not occurred to me, and it gave me a bit of a laugh.
I still have a lot to learn about information theory and I'd like expert input regarding entropy. Is it believed/agreed upon that entropy is an objective measure? It seems obvious to me that it is entirely relative, and meaningless without the associated computation method/prior information.
I am not a cryptographer, but I'll offer some advice, if I may: don't let people choose a passphrase, generate one for them. People are very bad at creating good passphrases, but decent at memorizing a good one.
I say that because I don't trust the zxcvbn library. It underestimates the entropy of "aaaaa" as 7 bits [log(26 * 5)], not the correct value of 23 bits [log(26) * 5], for example. In this instance, it's to your advantage, but it doesn't inspire confidence in its other calculations.
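A minimal sketch of the "generate it for them" approach, assuming a hypothetical 16-word list in place of a real one (a diceware-style list has 7776 words). The entropy is computed from the generator (list size and word count), so unlike an estimator such as zxcvbn it does not have to guess how the string was chosen. `generate_passphrase`, `passphrase_entropy`, and `words_needed` are illustrative names.

```python
import math
import secrets

# Hypothetical stand-in word list; use a large curated list in practice.
WORDLIST = ["acid", "bamboo", "cactus", "dagger", "ember", "falcon", "garnet", "harbor",
            "igloo", "jigsaw", "kernel", "lantern", "magnet", "nectar", "orbit", "pepper"]


def generate_passphrase(num_words: int, wordlist=WORDLIST) -> str:
    """Pick each word independently and uniformly with a CSPRNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(num_words))


def passphrase_entropy(num_words: int, list_size: int) -> float:
    """Exact entropy of the generator, not an estimate from the output."""
    return num_words * math.log2(list_size)


def words_needed(target_bits: float, list_size: int) -> int:
    return math.ceil(target_bits / math.log2(list_size))


print(generate_passphrase(6))
print(words_needed(100, 7776))  # 8
```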
zxcvbn has problems, but your example is backwards. "aaaaa" gets (as it should) low entropy because it is a single repeated character. It is more likely to be used than "rqntd". The major limitations are that it is worthless for estimating the entropy of passphrases (which is what it's being used for here) and of words not in its dictionary (non-English).
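The two numbers in this exchange come from two different models of how "aaaaa" was chosen. A quick check, reading the bracketed `log(26 * 5)` as "one letter, repeated up to 5 times" (that reading is an interpretation, not zxcvbn's documented formula):

```python
import math

ALPHABET = 26
LENGTH = 5

# Model 1: pick one letter and a run length (a single repeated character).
repeat_model_bits = math.log2(ALPHABET * LENGTH)

# Model 2: five letters drawn independently and uniformly at random.
uniform_model_bits = LENGTH * math.log2(ALPHABET)

print(f"{repeat_model_bits:.1f}")   # 7.0
print(f"{uniform_model_bits:.1f}")  # 23.5
```

Neither number belongs to the string itself: "aaaaa" only has 23.5 bits if it really came out of a uniform generator, which is the point about human-chosen passwords.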
MiniLock looks like a great option to introduce encryption to my non-technical friends. The alternative to minilock right now, for these users, is to do nothing.
Even if we do not like it, right now the state of the art in file sharing (for most of the non-technical world) is an unencrypted email attachment. MiniLock looks like it might be something I can install on my mother's (non-technical) computer so that I can send her a sensitive doc (a copy of my tax return, for example). This crypto system is sufficient for that use case, and the alternative is to do nothing at all. The alternatives are not GPG, or RSA, or whatever, because outside of the technical community people have no idea how to use these things.
Exactly! When it comes to crypto apps, I have noticed two kinds of criticism: "This software is not built for the threat models that interest me" and "This software fails to properly address the threat model it claims to". Too often, commenters will act as though their critique belongs in the second category when it really belongs in the first.
(It's great to question the design goals of a project! But that's very different from saying that a project fails to do what it says. In this case, Minilock has very clearly accepted a threat model where, if the passphrase is compromised, that's the game. If you don't like that, don't use it!)
AFAIK the main reason for Base58 is making it impossible to mistake an 'I' for an 'l', etc.
Being able to select the whole string with a double-click is a nice added bonus, though.
Relying on a passphrase alone is not good enough. Enter it once on a compromised system and it's game over. And since your ID is tied to your passphrase, you will also need to revoke your public key.
Many years ago, people realised you need to rely on something more than knowledge (of a password/passphrase) alone. Pick any two out of {something you know, something you own, something you are}, the latter being implemented by biometrics.
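For reference, a minimal Base58 encoder/decoder. It assumes the common Bitcoin alphabet; whether miniLock IDs use exactly this alphabet is an assumption. The alphabet drops `0`, `O`, `I`, and `l`, and the output contains none of Base64's `+`, `/`, or `=`, which is why a double-click selects the whole string.

```python
# Bitcoin-style Base58 alphabet: no 0, O, I or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    # Each leading zero byte is encoded as a leading "1".
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(digits))


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)  # ValueError on 0, O, I, l, etc.
    pad = len(text) - len(text.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body


print(b58encode(b"a"))  # 2g
```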
That's why you need your ATM card, and why it has a PIN code.
This is a step back into a world where security wasn't good enough. Security is more than crypto alone, it's mainly about keeping things secret. There are too many ways a passphrase could be compromised or discovered.
This approach focuses too much on the crypto aspects, and not enough on all the other things involved in building something that is secure.
> Enter it once on a compromised system and it's game over
The same is true of any heavily used system today. All systems (banks, credit cards, Facebook, etc.) use "something you know". "Something you own" has only recently picked up steam in the form of two-factor authentication, but then again, how many people do you know who actually use it?
The only thing I know of or have heard of that might hold up against the type of attack you describe is behavioral keys: i.e. a key which is something about you that you don't necessarily know yourself, e.g. your style of chess, or your phrasing of sentences.
Perhaps I'm missing something obvious, but I don't understand. What's so great about not having to store the key pair on disk? After all, this is a file encryption software. Its job is to store data on disk. In fact, it already adds a bunch of headers to every encrypted file. Why not just grab 128 random bits from /dev/urandom, make it the private key, encrypt it with the passphrase as all the other programs do, and stick the encrypted key in the header? It will only add a few dozen bytes to the header, which is peanuts.
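A minimal sketch of what this comment proposes: a random private key, wrapped under a passphrase-derived key with a per-file random salt. The 32-byte key size (that of a Curve25519 private key), the scrypt parameters, and the XOR-plus-HMAC wrapping are assumptions chosen to stay within Python's standard library; real code would use an authenticated cipher such as libsodium's secretbox. `wrap_private_key` and `unwrap_private_key` are hypothetical names.

```python
import hashlib
import hmac
import os

KEY_LEN = 32  # assumed private-key size (Curve25519 keys are 32 bytes)


def _derive(passphrase: str, salt: bytes) -> tuple[bytes, bytes]:
    # scrypt stretches the passphrase; the random salt forces an attacker
    # to redo this work separately for every wrapped key.
    k = hashlib.scrypt(passphrase.encode(), salt=salt, n=2**14, r=8, p=1, dklen=2 * KEY_LEN)
    return k[:KEY_LEN], k[KEY_LEN:]  # (encryption key, MAC key)


def wrap_private_key(private_key: bytes, passphrase: str) -> bytes:
    """Return salt || ciphertext || tag (toy encrypt-then-MAC)."""
    salt = os.urandom(16)
    enc_key, mac_key = _derive(passphrase, salt)
    ciphertext = bytes(a ^ b for a, b in zip(private_key, enc_key))
    tag = hmac.new(mac_key, salt + ciphertext, hashlib.sha256).digest()
    return salt + ciphertext + tag


def unwrap_private_key(blob: bytes, passphrase: str) -> bytes:
    salt, ciphertext, tag = blob[:16], blob[16:16 + KEY_LEN], blob[16 + KEY_LEN:]
    enc_key, mac_key = _derive(passphrase, salt)
    expected = hmac.new(mac_key, salt + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong passphrase or corrupted header")
    return bytes(a ^ b for a, b in zip(ciphertext, enc_key))


private_key = os.urandom(KEY_LEN)  # random, not derived from the passphrase
blob = wrap_private_key(private_key, "correct horse battery staple")
print(len(blob))  # 80 (16-byte salt + 32-byte key + 32-byte tag)
```

Because both the private key and the salt are random, two users with the same passphrase end up with unrelated keys and unrelated header blobs, and nothing precomputed applies to either.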
This comment is a perfect example of why encryption is mostly not used at all (which is far worse than any given vuln). There's a kind of perverse all-or-nothing attitude where the goal is to poke at any possible flaw in a system as proof that it is completely worthless. What's wrong with there being different tradeoffs between security and convenience? The insistence on all-or-nothing solutions has resulted in the powers that be knowing almost everything about almost everyone in our society, because almost no one wants to deal with the trouble of "completely secure" encryption.
If there are no media with your secret key, it can't be stolen.
Of course, if your computer is cracked into, or your adversaries are using rubber-hose cryptanalysis, all bets are off. But this scenario is usually less probable than having your physical wallet or keychain stolen.
Some time ago I created my own file encryption software using libsodium. It uses XSalsa20, HMAC-SHA256 and a PBKDF, and it hides password input on the terminal. It is really slow for large files (GBs)...
I would use a secret key to encrypt the file, then encrypt the decryption key with each of the keys of those recipients, and add each (user ID, encrypted decryption key) to the file.
Reading the linked page, that's exactly what they do (with a nonce to thwart various attacks: differential, known-plaintext, etc.).
The header itself is a stringified JSON object which contains information necessary for the recipients to decrypt the file. The JSON object has the following format:
  {
    senderID: Sender's miniLock ID,
    fileInfo: {
      (One copy of the below object for every recipient.)
      Unique nonce for decrypting this object (Base64): {
        fileKey: Key for file decryption (Base64),
        fileName: The file's original filename (String),
        fileNonce: Nonce for file decryption (Base64),
      }
      (Encrypted with shared secret derived from the sender's
      private key and recipient's public key.
      Stored as Base64 string.)
    }
  }
Note that in the above header, fileName is padded with the 0x00 byte until it reaches 256 bytes in length. This is done in order to prevent the discovery of the fileName length purely by analyzing an encrypted miniLock file’s header.
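A sketch of that padding rule, assuming the filename is UTF-8 encoded before padding (the text specifies 256 bytes, not characters). `pad_filename` and `unpad_filename` are hypothetical helper names.

```python
FILENAME_FIELD_LEN = 256


def pad_filename(name: str) -> bytes:
    """Right-pad the UTF-8 filename with 0x00 bytes to exactly 256 bytes."""
    raw = name.encode("utf-8")
    if len(raw) > FILENAME_FIELD_LEN:
        raise ValueError("filename longer than 256 bytes")
    return raw + b"\x00" * (FILENAME_FIELD_LEN - len(raw))


def unpad_filename(field: bytes) -> str:
    # Common file systems forbid NUL in names, so stripping trailing NULs
    # recovers the original name unambiguously.
    return field.rstrip(b"\x00").decode("utf-8")


print(len(pad_filename("tax_return_2013.pdf")))  # 256
```

Every encrypted filename field is then the same length, so only the (encrypted) contents differ.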
The file would be encrypted using a symmetric encryption scheme (like AES), and then the key for decrypting it would be encrypted to each recipient. Adding an extra recipient then just involves encrypting the 128 or 256 bit key, which should be negligible increase in size.
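A toy sketch of that per-recipient structure. Each recipient's `shared_secret` is taken as given (per the header description, it would be derived from the sender's private key and the recipient's public key), and the XOR-with-HMAC wrapping is an unauthenticated stand-in for a real authenticated box. `wrap_for_recipient` and `unwrap` are hypothetical names.

```python
import hashlib
import hmac
import os


def wrap_for_recipient(file_key: bytes, shared_secret: bytes) -> dict:
    """Toy wrapping of a 32-byte file key under one recipient's shared secret."""
    nonce = os.urandom(24)
    pad = hmac.new(shared_secret, nonce, hashlib.sha256).digest()  # 32 bytes
    return {"nonce": nonce, "wrapped": bytes(a ^ b for a, b in zip(file_key, pad))}


def unwrap(entry: dict, shared_secret: bytes) -> bytes:
    pad = hmac.new(shared_secret, entry["nonce"], hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(entry["wrapped"], pad))


file_key = os.urandom(32)  # one symmetric key; the file body is encrypted once
# Hypothetical shared secrets, one per recipient.
recipients = {name: os.urandom(32) for name in ["alice", "bob", "carol"]}
header = [wrap_for_recipient(file_key, s) for s in recipients.values()]
print(sum(len(e["nonce"]) + len(e["wrapped"]) for e in header))  # 168
```

Each extra recipient costs one nonce plus one wrapped 32-byte key (56 bytes here), independent of the file's size.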
It seems that, given how they generate the keys, it's basically the same: the same password would generate the same keys, so anyone who uses the same password would be able to decrypt data sent to anyone else using that password.
If I send you an encrypted file with minilock, you won't know my password, and I won't know yours, but you'll be the only one that can read it, and also you'll be sure I've sent it and not anyone else.