Summary: Jungledisk doesn't protect the integrity of encrypted data and doesn't securely derive keys, and is thus vulnerable to fast offline attacks. The one thing Jungledisk gets right is using the same block cipher mode as Tarsnap (and, incidentally, virtually every mainstream encrypted storage system).
The impact of using unauthenticated encryption to store data is that your backup provider could end up owning your machine. Attackers can carefully choose which data to corrupt. They can exploit the randomization of corrupted decryption to set up conditions for memory corruption exploits, and, in more sophisticated but totally realistic attacks, exploit guesses about known plaintext to produce attacker-controlled nonrandom plaintexts. A backup provider with client-authenticated crypto can't do that, because the keys that encrypt the data also ensure its integrity.
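To make that concrete, a minimal sketch of what authenticated encryption buys you -- Python, using the cryptography library's Fernet construction as a stand-in, not anything Jungledisk or Tarsnap actually uses:

    from cryptography.fernet import Fernet, InvalidToken

    key = Fernet.generate_key()              # the same key covers secrecy and integrity
    box = Fernet(key)
    token = box.encrypt(b"backup block contents")

    # The storage provider flips a byte of what it stores...
    tampered = bytearray(token)
    tampered[20] ^= 0x01

    # ...and decryption refuses to hand back attacker-influenced plaintext.
    try:
        box.decrypt(bytes(tampered))
    except InvalidToken:
        print("tampering detected; nothing decrypted")

Without the authentication step, the decrypt call would happily return corrupted plaintext for your restore process to choke on.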
The password storage issue is no different than any other password storage problem; again, direct your attention to http://codahale.com/how-to-safely-store-a-password/, mentally substituting "derivation of AES key" for "storage of password hash".
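If it helps, the sane version of that derivation step looks something like this -- a sketch with Python's standard library, with a salt and iteration count I picked out of the air, not anything Jungledisk does:

    import hashlib, os

    password = b"correct horse battery staple"
    salt = os.urandom(16)                     # stored in the clear next to the ciphertext

    # Each password guess now costs the attacker 200,000 hash invocations
    # instead of a single fast hash.
    aes_key = hashlib.pbkdf2_hmac("sha256", password, salt, 200_000, dklen=32)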
To my mind, the key derivation is the real problem here. A surprisingly large number of secure encrypted storage products don't ensure data integrity. Realistic attacks against that vulnerability are feasible but difficult: you'd have to be targeted.
If you're going to write an article about how a competitor's encryption is inferior to yours and cast it as a vulnerability report, I'd suggest not recommending your own encryption scheme as the replacement. The scrypt recommendation in this article sticks out like a sore thumb. Virtually nothing uses scrypt.
We can nerd out on CTR mode vs. CBC mode; I'm starting to come around to Colin's take on CTR because of ciphertext indistinguishability as I see more practical vulnerabilities that take advantage of it. I think the padding issue is a red herring. CBC padding is easier to get right than absolute rock solid reliable generation of CTR nonces and absolute rock solid management of CTR counters, which are things I see people get wrong regularly. Distinguishability is the real problem with CBC.
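To illustrate the bookkeeping I mean, a CTR sketch in Python with the cryptography library; the whole game is that the nonce must never repeat under the same key:

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key = os.urandom(32)

    def ctr_encrypt(plaintext: bytes) -> bytes:
        nonce = os.urandom(16)                # must never repeat under this key
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        # Reusing a nonce here leaks the XOR of two plaintexts. There's no padding
        # to fumble, but this bookkeeping is the part people routinely get wrong.
        return nonce + enc.update(plaintext) + enc.finalize()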
To my mind, the key derivation is the real problem here. A surprisingly large number of secure encrypted storage products don't ensure data integrity. Realistic attacks against that vulnerability are feasible but difficult: you'd have to be targeted.
I think the lack of integrity is more important than you're making it sound. There are a lot of situations where a lack of integrity can be exploited to create a lack of privacy too.
But the main reason I mentioned the lack of integrity first is that I needed to mention the lack of HMAC to explain why they had the ridiculous "salted key hash" construct.
If you're going to write an article about how a competitor's encryption is inferior to yours and cast it as a vulnerability report, I'd suggest not recommending your own encryption scheme as the replacement. The scrypt recommendation in this article sticks out like a sore thumb. Virtually nothing uses scrypt.
I think you're misstating what I wrote a bit. I said that scrypt is the state of the art in the field -- which it is -- and that given that Jungle Disk was around before I developed scrypt, they should have used PBKDF2 or bcrypt.
I'd rather geek out about CTR v CBC than harp on the scrypt recommendation. Consider the scrypt thing a friendly style note. You wrote an article about a competitor's insecurities. When you do that, don't recommend they adopt your own cryptosystem unless (like CRI had to do with DPA countermeasures) they have to. Here, it just made you look unnecessarily petty.
What privacy attacks were you thinking of? Call some of them out.
I think the author's point about privacy is valid, but also a little silly. If I understand correctly (the article is very confusingly worded in some places), he is saying that weak passwords are weak. Anyone who cares about privacy should already be choosing long, complex, strong passwords for this kind of application.
Also, I'm confused about one feature of JD. When I signed up years ago, they allowed me to hold my key privately and it never left the client. I had the option to upload that key to the server, if I wanted to, or not. I understand from the article that the client might misbehave and, for example, share my key in ways I don't want it to. Am I getting this right?
When I looked into secure cloud-based storage two years ago, I found that JD was the best mix of privacy and convenience, if for no other reason than it could be deployed on a mix of Windows, Mac and Linux boxes. It was clear even then that data integrity was the weak link/trade off.
I'm interested in hearing about the latest, best solutions for easy, cross-platform, secure backups to cloud services that offer better data integrity.
This article points out two flaws. Neither of them are silly.
First, there's no integrity protection on data stored on Jungledisk. Jungledisk can own up your machine. That's not a good property for a secure backup system to have.
Second, the key derivation scheme it uses makes every passphrase, no matter how carefully chosen, drastically weaker.
I'm glad you like Jungledisk and I don't think you need to read stories like this as an indictment of your choice or a demand to change services. But it doesn't help to downplay them.
Second, the key derivation scheme it uses makes every passphrase, no matter how carefully chosen, drastically weaker.
I'd just like to repeat this point because it's so important. The password verification method in JungleDisk is fundamentally broken and needs to be rearchitected immediately.
For non-cryptography people, this is similar to the vulnerability that allowed passwords to be retrieved from the Gawker database hack a couple months ago (just not quite as vulnerable).
OK, I freely admit that I'm not an expert in this area, so I'll rescind my "silly" comment.
But, "drastically weaker" than what? If the password is strong, JD doesn't make it weak. JD just doesn't make it as strong as it should/could? Is this correct?
But, "drastically weaker" than what? If the password is strong, JD doesn't make it weak. JD just doesn't make it as strong as it should/could? Is this correct?
Correct. The vast majority of people can't remember strong passwords, so it's necessary to "strengthen" them using a good key derivation function. Jungle Disk doesn't do this.
OK, well, I guess I don't see how that's fairly described as a "flaw" in JungleDisk.
I can understand why a responsible developer should assume their users are simple-minded mouth-breathers who can't be trusted to choose a proper password (and I'm sure there's plenty of evidence to support that assumption), but it just isn't right to characterize JungleDisk as compromised from a security perspective because it relies on the user to choose a strong password.
Saying that Jungle Disk is secure as long as users pick strong passwords is like saying that the Ford Pinto is safe as long as drivers don't get into rear-end collisions. In both cases you're asking for behaviour which we know perfectly well that users don't exhibit; and in both cases there is a simple fix for the problem.
The Ford Pinto is an unsafe car, and Jungle Disk is an insecure backup service.
I'm trying to understand this. Again, I'm no expert.
I can see why the data integrity issues may allow external factors to compromise the security of my buckets and/or local device. That's me in a Pinto, at the mercy of the bad driver behind me.
I don't see how password strength is open to any external factor; it would seem to be purely a matter of user error. That part doesn't seem to fit the Pinto analogy. That's where I'm struggling to follow your article.
The issue is how fast the password can be broken. MD5 is a very fast hash, so even a relatively slow computer can do a lot of attempts very quickly.
Bcrypt, on the other hand, can be tuned to go as slow as you want. You can force it to take 250 milliseconds, regardless of how good or bad the password is.
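Rough illustration of the gap (single-threaded Python on whatever machine you run it on; PBKDF2 from the standard library stands in for bcrypt here since it ships with Python, and the numbers are only meant to show the ratio):

    import hashlib, time

    password, salt = b"hunter2", b"somesalt"

    t = time.perf_counter()
    for _ in range(100_000):
        hashlib.md5(salt + password).digest()               # one fast hash per guess
    print("100,000 MD5 guesses:", time.perf_counter() - t, "seconds")

    t = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)  # one deliberately slow guess
    print("1 PBKDF2 guess:", time.perf_counter() - t, "seconds")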
And that is the fundamental flaw. Jungle Disk's key derivation makes it possible to crack your password in a reasonable time; bcrypt does not. Because of that decision, everybody's data is much less safe (I'm referring to everybody's data in a statistical sense: the average password sucks and is easily broken in this scheme, so the average file is at risk).
As a provider of security software (like my company is doing), Jungle Disk should be doing everything it can to help users keep their data secure. Jungle Disk isn't doing that.
OK. I think I understand now. I still don't think it's fair to call it defective design (and I'm not really certain that you ever did call JD's password privacy defective, BTW). More like a design that is unsafe in the hands of the typical driver, perhaps.
Why do I care? I just want to understand the risks for someone like me, who has taken care to choose very strong passwords.
My conclusions from all of this:
(1) The data integrity issue is serious because it presents an opportunity for introduction of malicious code, creates a risk of data loss, and may lead to security breaches.
(2) The local binary is opaque, and therefore presents a theoretical risk of compromising even the most "close to the vest" key management strategy.
(3) The password protection issue is a serious shortcoming that can, and should, be mitigated by choosing strong passwords.
One way requires the user to have a drastically stronger password to be safe, and the other significantly strengthens passwords, protecting a subclass of users that will always exist (those that are unable to remember strong passwords or don't know enough about the dangers of password cracking to know how to effectively choose passwords).
It is madness to defend the use of MD5 for password hashing these days. It is clearly not designed for that at all.
My understanding is that SpiderOak, Tarsnap, and Wuala all do this correctly (using one of PBKDF2, bcrypt, or scrypt.)
Colin - Perhaps the companies in the backup space that put effort into handling this carefully should work together and create a PSA style website with a matrix chart of how the various providers handle "encrypted" data. Make it a separate domain and do our best to be elaborately objective about it. Any interest?
What block cipher mode does SpiderOak use, and how does it verify the authenticity of its data? Tarsnap goes through a lot of extra trouble to MAC its data; few other providers do. You'd hate to see everyone treat key derivation as a shorthand for "doing all of encryption right".
I looked on the SpiderOak site, saw a lot of material on how keys are derived and not stored on SpiderOak servers (great!), but didn't see a lot of details about the mechanics of actually encrypting and checking data.
Thanks for asking. If you're interested, would be very happy to discuss SpiderOak's crypto strategy in depth with you the next time I'm in Chicago. Could share source code, etc. IMO, most interesting parts are the key scoping, which allow users to selectively publish ("share") portions of stored data by publishing the appropriately scoped keys.
SpiderOak uses AES256 in CFB mode with authentication via HMAC. The code is careful about unique nonce/counter usage, crypto code is confined to specific modules that rarely change, and reviewed by cryptographers outside SpiderOak. Client and server have minimal trust relationship.
Being paranoid about data integrity (not only because of crypto issues, but also because bitrot happens routinely at petabyte scale), we authenticate the data redundantly at a few different layers. Across all end user devices, we see about one bit error per 4.2 TB of upload transactions.
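Roughly, the encrypt-then-MAC shape of that looks like the following -- a sketch in Python, not our actual code:

    import hmac, hashlib, os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    enc_key, mac_key = os.urandom(32), os.urandom(32)    # separate keys for each job

    def seal(plaintext: bytes) -> bytes:
        iv = os.urandom(16)
        enc = Cipher(algorithms.AES(enc_key), modes.CFB(iv)).encryptor()
        ct = iv + enc.update(plaintext) + enc.finalize()
        return hmac.new(mac_key, ct, hashlib.sha256).digest() + ct

    def unseal(blob: bytes) -> bytes:
        tag, ct = blob[:32], blob[32:]
        # Check integrity before the ciphertext ever touches the block cipher.
        if not hmac.compare_digest(tag, hmac.new(mac_key, ct, hashlib.sha256).digest()):
            raise ValueError("authentication failed")
        dec = Cipher(algorithms.AES(enc_key), modes.CFB(ct[:16])).decryptor()
        return dec.update(ct[16:]) + dec.finalize()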
Not really. I've seen far too many "best practices" and "standards" bodies go nowhere to think that a committee can put together a useful website like this.
@cperciva: Thanks for this; now I'll convert my 8-char ASCII system password to a 10-char one. Do you have any data showing how large a password needs to be to make it ridiculously expensive for a TLA (agency) to commit a large amount of hardware to cracking? I.e., how much extra cracking time do the characters past 10 buy?
It's BSD licensed but probably not easy to integrate on your platform. BCrypt is an easier choice. When we see Java and .NET implementations of scrypt, we'll start recommending it, but I'll be honest and tell you that we rarely recommend scrypt today.
I wish I could find a link, but US military spec for secure passwords is 14 characters with capitals and special chars. And they have to be changed every 30 days.
True enough, if you're targeted it's not going to help very much. However, like outrunning a bear, you only need to be harder to catch than the guy behind you.
Cryptographers hate GPG. GPG is ugly as sin†. Unfortunately (and I mean that only with a little bit of snark), GPG mostly still works, in the sense of standing up to active, informed attackers with modern techniques.
† For instance, look how it handles message integrity.
This is a slippery slope argument that ends in you arguing that the best tested cryptosystem in common use (TLS) is also insecure. All cryptosystems have vulnerabilities; the question is how workable the system is after those flaws are fixed.
For the record, I respect the critiques practitioners have of GPG. Unfortunately, their alternatives tend to be ad-hoc. There should be a clean, simple, GPG-like standard, perhaps based on ECC and AE cipher constructions, to replace GPG. But until that happens, in the choice between ugly and workable vs. simple and fragile, ugly and workable is the right choice for most people.
As always I think you drastically underestimate how dangerous this stuff is because you've dedicated your career to it, while normal implementors --- even crypto enthusiasts (look at Tor and SSH) --- have little of the nuance required to get it right.
I like the fundamentals of TLS more than you do; I don't think it's a bad or needlessly complex protocol (except maybe session resumption). I see that reasonable people can differ on that point. But, very importantly, TLS is also a vehicle for collecting and implementing the best known methods in cryptography. I think you tend to overlook that.
As always, my opinions are as a software security practitioner and not as a cryptographer, since I am not one.
The appearance and track record† of the code in OpenSSL does the credibility of TLS no favors, and it is totally understandable why someone who had to deal with software security for a platform that ships and depends on OpenSSL would become allergic to it.
But, two responses to that:
* First, what Joel Spolsky says about rewrites. Sometimes code is ugly for a reason. Clean rewrites of OpenSSL will inevitably introduce bugs. Introducing bugs in SSL†† implementations is perilous.
* Second, there are mature alternatives to OpenSSL. For instance, most? browsers don't use it.
† In fairness, that's because OpenSSL dates back to a time when nobody was getting C software security even close to right.
†† I use TLS and SSL interchangeably, which is a foible I should work on correcting, but the difference doesn't matter much here.
Hey, since you mentioned TLS/SSL: I can't seem to find an answer to this question: does my browser or system need to contact the CA each time it encounters a new SSL cert, or is having the root certificate enough?
Your browser does not need to contact a CA to verify the signature in an SSL certificate, but may in some cases want to contact the CA to check for revocation.
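Concretely, a quick illustration with Python's standard library; the chain check happens locally against the bundled roots, and a revocation check would be a separate OCSP/CRL lookup that this code doesn't do:

    import socket, ssl

    ctx = ssl.create_default_context()        # loads the locally bundled root certificates

    # The certificate chain is verified on the client during the handshake;
    # no connection to the issuing CA is needed for the signature check.
    with socket.create_connection(("example.com", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
            print(tls.getpeercert()["subject"])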
GPG is big and complicated. The more code you have the more likely it is that you'll have security vulnerabilities. (This is especially true for code like GPG which reads attacker-provided inputs, since it allows the attacker to pick which of many code paths get invoked.)
They don't publish their key derivation scheme, but I'd be shocked (and pleased) to find that they were savvy enough to actually use PBKDF2 or even stretched SHA1. Believe it or not, plenty of commercial vendors literally take the ASCII of the password as the key.
I'd also worry, based on that spec, that the Arq developers believe the SHA1 hashes they store are fully equivalent to a deliberate MAC.
I should have noted that Arq's git-like scheme makes them inherently more careful about storage and data integrity (under non-adversarial conditions) than Jungledisk. My perusal of their site was casual. I really don't know much about them and am not offering a professional opinion.
I emailed you asking for professional help in reviewing the security aspects of Arq. I'm not an expert, and I'd like to get it right. If anybody else has the expertise to do this review, please email me at stefan@haystacksoftware.com.
In general, if you're an indie developer and you're doing custom crypto stuff, I'm happy to do a consult free of charge. You'll probably find other software security firms are similarly willing to do that kind of stuff, just like the good law firms will tend to do up-front consults for free.
Full-on software reviews, particularly by consultants competent enough to review crypto, are very expensive. You can probably get away without doing one, as long as you get good advice and have people to bounce ideas and problems off of.
Karmically, being someone to bounce ideas and problems off of has paid off for Matasano dramatically, so, anyone else reading this thread, consider this an open invitation.
I'm in the process of getting an app review done by a security expert. Then I can answer that question (hopefully) definitively. (I'm the author of Arq)
I'm happy to cross-check this stuff with you in private, if you'd like a free consult from another professional (reiterating something I said on Twitter a minute ago).
I took a look at https://www.tarsnap.com/gettingstarted.html and I don't understand why "If you have multiple machines, you almost certainly want to create a separate key file for each machine." Can you explain why, if I'd like to access the same data from different machines? Or is the main assumption that every machine has its own, non-shareable backup? Isn't the main advantage of an online service to have the same data accessible from multiple machines?
Tarsnap isn't Dropbox. It's a backup system. Its cost structure and security model are optimized for backup, which is why you can't e.g. read your Tarsnap files from a web interface at Tarsnap.
I continue to not understand how people imagine these services working (de-dupe, block-level updates, etc.) without access to the unsecured version of the data. As for the claims about what Amazon could do to your data... there are even less sinister options. S3 is not 100% safe storage. There's a chance of bit rot, and it may occur. If you don't check the file yourself, you won't know. Again, that seems a bit inevitable, no?
There's a sucker born every minute. Jungle Disk claims to be secure, and most people believe them -- most people have no way to assess whether they're doing things right or not.
De-dupe: Wuala encrypts the file with a key derived from the file itself. This key is then encrypted with the user's key and both (the file and the encrypted key) are uploaded to the cloud. Disadvantage: if the file is known to an attacker (e.g., a copyright holder), the attacker can possibly find out which users have access to this file. Advantage: allows for de-duplication, but is more secure than Dropbox.
Block-level updates: I don't see a problem with this. Partition the file into blocks on the client (before the encryption). The server doesn't need access to the data for this.
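A toy sketch of that convergent-encryption idea in Python with the cryptography library -- not Wuala's actual implementation, and the deterministic nonce is exactly the property that both enables de-dupe and leaks "who has this file" (user_key is assumed to be 16, 24, or 32 random bytes):

    import hashlib, os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def convergent_encrypt(file_bytes: bytes, user_key: bytes):
        file_key = hashlib.sha256(file_bytes).digest()               # key derived from content
        nonce = hashlib.sha256(b"nonce" + file_bytes).digest()[:12]  # deterministic on purpose
        blob = nonce + AESGCM(file_key).encrypt(nonce, file_bytes, None)
        # Identical files produce identical blobs, so the server can de-dupe them,
        # and anyone who already holds the file can recognize it in storage.
        key_nonce = os.urandom(12)
        wrapped_key = key_nonce + AESGCM(user_key).encrypt(key_nonce, file_key, None)
        return blob, wrapped_key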
As Steve Weis pointed out in an earlier thread about schemes like this, deriving keys from the contents of files breaks semantic security. Lay engineers reason about this problem the way you just did: "the RIAA can tell I have Lady Gaga MP3s". But practitioners are worried about much more subtle and devastating flaws, particularly in cases where attackers may exercise some control over the blocks being encrypted.
Any scheme that derives passwords from file contents gives me the willies.
Most file-level encryption that I'm aware of destroys the benefits of blocked data. For example, changing a few bytes in the file will cause MANY bytes to change in the encrypted file on the disk... at least with ecryptfs and truecrypt. If there is an encryption scheme that works well with striping, I'd really super appreciate you pointing me in that direction - it would greatly help with a problem I'm currently trying to solve.