Researchers reverse-engineer the Dropbox client: What it means (techrepublic.com)
147 points by heyitsnick on Aug 27, 2013 | 45 comments



How could there have been any doubts that the heavily obfuscated Python could be reverse engineered? Some others and I did it quite a while ago. It wasn't a lot of work to find the opcode mapping using frequency analysis and a bit of reasoning (i.e., mapping against known libraries). Anyone remember dropship? https://en.wikipedia.org/wiki/Dropship_(software) I wonder if they're going to send a takedown request this time too.

Oh I see dropship is mentioned in the paper, great :)

In any case, it's interesting that they found some previously unknown security holes this way. This again proves that security through obscurity, at least for client software, doesn't work. When will people learn? You can't hide anything on the client from the user, at least not for long.
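
For anyone curious what the frequency-analysis step mentioned above might look like, here is a minimal sketch. It assumes you already have two opcode streams for the same known library code, one from a stock interpreter and one from the obfuscated one. The function names and the toy data are made up for illustration; this is not the code used by dropship or by the paper.

```python
from collections import Counter


def rank_by_frequency(opcodes):
    """Return the distinct opcodes, most frequent first (ties broken by value)."""
    counts = Counter(opcodes)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [op for op, _ in ranked]


def guess_opcode_mapping(reference_stream, obfuscated_stream):
    """Pair opcodes by frequency rank: obfuscated opcode -> reference opcode."""
    ref_ranked = rank_by_frequency(reference_stream)
    obf_ranked = rank_by_frequency(obfuscated_stream)
    return dict(zip(obf_ranked, ref_ranked))


# Toy data: opcode names as a stock interpreter would report them.
reference = (["LOAD_FAST"] * 5 + ["LOAD_CONST"] * 4 + ["CALL_FUNCTION"] * 3
             + ["RETURN_VALUE"] * 2 + ["POP_TOP"] * 1)

# Hypothetical secret renumbering applied by the modified interpreter.
secret = {"LOAD_FAST": 17, "LOAD_CONST": 3, "CALL_FUNCTION": 42,
          "RETURN_VALUE": 8, "POP_TOP": 99}
obfuscated = [secret[op] for op in reference]

mapping = guess_opcode_mapping(reference, obfuscated)
print(mapping)
```

On real bytecode many opcodes have similar counts, so frequency alone only narrows the candidates; the "bit of reasoning" part (checking guesses against code whose source you know) does the rest.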


It does raise the bar slightly though, so is still worth doing. Instead of simply running the pyc files through a decompiler as would be the case without obfuscation, one has to reverse engineer their modified Python binary to figure out the altered format of the code blocks. This is not a very common set of skills.
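
To show how little stands in the way without obfuscation, here is a small sketch using only the standard `marshal` and `dis` modules. The tiny `SOURCE` module is a stand-in for real application code, and the sketch skips the .pyc header, whose layout depends on the Python version.

```python
import dis
import marshal

# A tiny module standing in for real application code (illustrative only).
SOURCE = "def add(a, b):\n    return a + b\n"

code = compile(SOURCE, "<example>", "exec")

# The body of a .pyc file is a marshalled code object (after a short header).
blob = marshal.dumps(code)

# "Attacker" side: load the bytes back and list the instructions.
restored = marshal.loads(blob)
opnames = [ins.opname for ins in dis.get_instructions(restored)]
print(opnames)
```

With a stock interpreter nothing more is needed. Changing the opcode numbering and the serialized format breaks exactly this step, which is why it raises the bar at all.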


Obfuscated code is surely harder to understand and work with than original code with descriptive variable names, comments, formatting, etc. Wouldn't this make it more difficult to find vulnerabilities?


It makes it just as hard for the whitehats as for the blackhats, so it makes no difference. It may give some people a false sense of security that they would not have had if they were able to look at the code.

Presumably Dropbox is, through its enormous distribution, a very fat target, and I find it hard to believe that this published effort would be the first instance of such an undertaking. Your average blackhat isn't going to publish his hack but will market it for all it is worth.

Then you get pages like these:

http://1337day.com/exploit/description/19604

(click 'ok')

I don't think the Dropbox team obfuscates their code as a security measure; they more likely do it to increase the depth of their moat by a little bit and to make it a bit harder to write third-party clients against their unpublished APIs.


Then the question becomes: is it more beneficial to make it difficult to find vulnerabilities, or to make it easier and fix them when found?


http://neopythonic.blogspot.co.il/2011/06/depth-and-breadth-...

"The contrast with my visitor the next day couldn't be greater. Through a former colleague I got an introduction to Drew Houston, co-founder and CEO of the vastly successful start-up company Dropbox.

Python plays an important role in Dropbox's success: the Dropbox client, which runs on Windows, Mac and Linux (!), is written in Python. This is key to the portability: everything except the UI is cross-platform. (The UI uses a Python-ObjC bridge on Mac, and wxPython on the other platforms.) Performance has never been a problem -- understanding that a small number of critical pieces were written in C, including a custom memory allocator used for a certain type of objects whose pattern of allocation involves allocating 100,000s of them and then releasing all but a few. Before you jump in to open up the Dropbox distro and learn all about how it works, beware that the source code is not included and the bytecode is obfuscated. Drew's no fool. And he laughs at the poor competitors who are using Java."

Sometime after that, Drew poached Guido from Google. I remember this post. :)


You can use a custom memory allocator in Python? I wonder if this is somehow pluggable, or if they had to modify the interpreter.


All you need to do is set the appropriate type's tp_alloc/tp_dealloc function pointers [1] (type-specific malloc/free functions). Dropbox was having fragmentation issues from the large amount of garbage generated while scanning the filesystem, and making memory allocation use type-specific memory pools fixed it.

    [1] http://docs.python.org/2/c-api/typeobj.html#PyTypeObject.tp_alloc
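
The C-level hooks are not reachable from pure Python, so the following is only a Python-level analogue of a type-specific pool: released instances go onto a per-type free list and are handed out again instead of being allocated fresh. `PooledNode`, `release`, and the cap of 1024 are hypothetical names and values chosen for illustration; this is not Dropbox's allocator.

```python
class PooledNode:
    """Toy object whose instances are recycled through a per-type free list."""

    __slots__ = ("path", "size")

    _free = []        # per-type pool of released instances
    _max_free = 1024  # arbitrary cap so the pool cannot grow without bound

    def __new__(cls, path, size):
        # Reuse a released instance if one is available.
        if cls._free:
            return cls._free.pop()
        return super().__new__(cls)

    def __init__(self, path, size):
        # Runs for fresh and recycled instances alike, resetting the fields.
        self.path = path
        self.size = size

    def release(self):
        """Hand the instance back to the pool; the caller must stop using it."""
        self.path = None
        self.size = None
        pool = type(self)._free
        if len(pool) < type(self)._max_free:
            pool.append(self)


first = PooledNode("/tmp/a", 10)
first.release()
second = PooledNode("/tmp/b", 20)
print(second is first)
```

The idea carries over to the allocate-100,000-and-free-most pattern described above: objects of one type come from, and return to, their own pool rather than fragmenting the general heap.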


The Dropbox guys gave a talk about this at PyCon one year, but I'm having trouble finding it now. I remember thinking it involved less work than I had expected.



According to the article, they do use a modified interpreter.


When you have to start patching the framework to handle memory allocation, it's probably time to move on. Just use Mono and you get your cross-platform feature and obfuscation.


I am slightly confused as to why reverse-engineering a client allows you to sidestep two-factor auth. That should be entirely a server-side thing.


From skimming the original paper, it seems you can bypass authentication if you know certain keys that that particular Dropbox client stores locally. Of course, if you were able to access those values on the local hard drive, you likely already have access to the victim's hard drive or computer. In that case you already have the victim's local copy of the Dropbox folder, so there is no need for reverse engineering.

This "weakness" is no different from the weakness of two-factor authentication in any scenario where login is persistent. I have two-factor authentication for Gmail with "remember me" set so I do not have to log in every day. If someone steals my laptop and gets my cookies, they can log in as me regardless of two-factor authentication, until the cookie expires.
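
A toy sketch of the persistent-login point above: the second factor is checked once at login, and everything afterwards is authorized by a stored token alone. `ToyAuthServer`, its methods, the hard-coded user, and the static one-time code are all made up for illustration; this is not Dropbox's or Gmail's actual protocol.

```python
import hmac
import secrets


class ToyAuthServer:
    """Hypothetical server: two factors at login, bearer token afterwards."""

    def __init__(self):
        # A static code stands in for a real one-time second factor.
        self._users = {"alice": {"password": "correct horse", "otp": "123456"}}
        self._sessions = {}

    def login(self, user, password, otp):
        record = self._users.get(user)
        if record is None:
            return None
        password_ok = hmac.compare_digest(password.encode(), record["password"].encode())
        otp_ok = hmac.compare_digest(otp.encode(), record["otp"].encode())
        if not (password_ok and otp_ok):
            return None
        token = secrets.token_hex(32)  # 256 random bits, stored by the client
        self._sessions[token] = user
        return token

    def whoami(self, token):
        # No password and no second factor here: the token is the whole proof.
        return self._sessions.get(token)


server = ToyAuthServer()
token = server.login("alice", "correct horse", "123456")

# Someone who copies the token from the victim's disk skips both factors.
stolen = token
print(server.whoami(stolen))
```

Guessing such a token remotely is hopeless, but anyone who can read it off the disk is treated as the logged-in user until the server invalidates it.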


If somebody steals my laptop and it's still open, they still have to provide my user password, since the screen gets locked after some time of inactivity. And reading from the hard drive directly won't help, because I have an encrypted hard drive (with dm-crypt).

I did this precisely because the laptop is a single point of failure. Steal somebody's laptop and bam, you've got access to everything important to that person.

My Android phone is also encrypted (with a much weaker password) and I can also remotely delete everything on it through Google Apps.


I hope you did things like disable FireWire, and superglue the RAM in the slots.

And I've done something similar with my iPhone. It reverse SSHs in to my server, and provides me a shell login. Something bad happens? I can rm -rf * to my iPhone.

And I can do bad things :)


Do you want to go through a two-factor challenge every time a single file is synchronised?


It's two-factor authentication, not two-factor authorization.

I'd also think that authentication was (should be) a server-side thing: and that at that point you'd get some form of session/token/ticket.


It already is a server-side thing.

This post shows how to steal that session/token/ticket after the authentication step.

What are you proposing?


In that case this is no worse than stealing a cookie for a site which required 2FA, which presumably requires local access.


I always find reverse engineering things made by people amusing. We could just, you know, ask someone.

It's like when a new iPhone comes out and they throw the custom silicon under electron microscopes. It's entertaining, and I'm sure fun for the people doing it, but fighting information wars against ourselves just seems silly.

There are large problems humans don't have answers to, but we're busy making things then figuring out how the things we made work. Madness ensues.


> There are large problems humans don't have answers to, but we're busy making things then figuring out how the things we made work

Many technologies have been developed or accelerated through the need to reverse engineer something. I would argue the techniques developed to break the Enigma Code during WW2 had profound effects on computing generally.

Often reverse engineering a technology can also allow you to make improvements the other party has yet to realise, catalysing new ideas and research.

Not that all this means you are necessarily wrong, although perhaps it is a little too idealistic to hope for a world where information isn't a valuable currency?


Think of it taken to extreme measures.

Imagine a company where Team Database releases a binary-only library to the rest of the company. They won't tell you how it works and you can't talk to them, but it seems to work well enough. Then one day, Team Website wants to do something else with the database (a new type of query, new type of storage model, something non-trivial). In this backwards company, Team Website spends months reverse engineering the library and protocol to hack their own functionality into it. That's mad, right?

A broad view presents two kinds of knowledge: things humans know and things humans don't know. We're circling around rediscovering what other people have done while they sit there quite able to give us what we want to know.

Now, adversarial conditions prevent such blanket sharing: capitalism, sovereign nations, war, etc.

Think of Intel. In some ways, they control the pinnacle of CPU design that humanity can surface at this point in time. We don't have anybody to ask "well, what comes next?" in the 10 year CPU roadmap—we have to discover the future along the way.

We should spend more time asking "well, what comes next?" and less time rediscovering what people already know how to do (modulo it making you better at actually discovering new things, or just for fun, or for cyberwar, etc).


I thought the enigma had been stolen from the U-571 ... ahah


There's lots to the Enigma story. Yes, some have been recovered from the enemy, but that wasn't the beginning nor the end of decrypting them.


I just read the actual whitepaper (https://github.com/kholia/dedrop/blob/master/paper/accepted/...) and one of the interesting takeaways is that this particular reverse engineering resulted in the discovery of actual vulnerabilities that were responsibly reported to Dropbox and patched.

Simply asking Dropbox how this stuff worked would've (probably) never uncovered these security issues.

Edit:

Just wanted to add one more benefit of this attempt at reverse engineering, from the whitepaper's introduction:

> Our work reveals the internal API used by Dropbox client and makes it straightforward to write a portable open-source Dropbox client


Do you ever find it amazing we still run closed-source software?

Is it not bad enough that the Microsoft and Adobe hegemony forces the entire world to have an attack surface wider than Jupiter to exploit at the whims of Eastern European teenagers?


Open source alternatives exist for most major Microsoft and Adobe products. It is just a question of how much user experience you are willing to sacrifice for safety.

And open source products are not inherently safe: vulnerabilities are found in all software products, not only in the closed-source world.


This isn't true.

Adobe's suite isn't just 'user experience'. It's functionality.

Show me an open source alternative to Premiere, or After Effects, or even easier: InDesign, Photoshop, Illustrator, Edge.

I bet for any open source alternative you find, I can show you a huge set of features that everyone uses, that it doesn't have.


In the real world, when you talk to people (serious Business People doing Business Things), they'll spout off gems like "can you send it to me in Adobe?" or "hey, is Adobe on this machine?"

I'm not too worried about exploits in After Effects or Lightroom.

Adobe = "pdf reader" to almost every computer user in the world. Adobe even took PDF out of their product name to just call it "Adobe Reader." (More appropriate name: Adobe Helps Hackers Slurp All Your Data Away ... Reader)

With Windows + Office + IE + Adobe Reader, you'll be safer just sending the bad guys your corporate secrets directly. It'll save you the shock of when you discover for the past six months all your data has been round robin copied to BIRC.


Fighting any wars against ourselves seems silly. But the problem is that companies aren't that willing to share information, or it is only available for a large price and/or with restrictive NDAs. Also, finding out how things work is simply fun.

Say I needed to write a custom GPU driver for some device, either to improve performance for some specific application or to work outside the dependency or API constraints of the binary blob (like porting to another OS). Usually vendors provide no register-level documentation about graphics hardware, so the only way to do this is by reverse engineering.

Another reason for reverse engineering can be to find backdoors and security vulnerabilities (like these guys did) or even for legal reasons to find whether some copyrighted (or GPLed) code was used.

No madness needed at all. Or maybe just a bit.


The "expert" analysis was a bit lame. He brings in an expert pen tester who provides a legal opinion?!


Read the paper. They haven't actually found a way to really bypass two-factor authentication and all other security measures. With their findings, you can hijack an account if:

- you feel like cracking a 256-bit random value remotely (can't locally bruteforce it), or

- you have filesystem access.

I'd say both are irrelevant. You can't crack 256-bit values locally, let alone if you have to check the value remotely, and with filesystem access I imagine you can do a whole lot more than just uploading files to someone's Dropbox.

Bypassing two-factor authentication with either of the options is possible though, and I can see the issue, but this is by design. I don't think you want to have to enter your credentials (username, password, second factor) every single time you store a file or check for updates.
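
To put the 256-bit figure above in perspective, here is a back-of-the-envelope calculation. The guess rate is an assumption, and a wildly generous one for guesses that each have to be checked against a remote server.

```python
KEYSPACE = 2 ** 256
GUESSES_PER_SECOND = 10 ** 12  # assumed rate, far beyond any remote check
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def expected_years(keyspace, rate):
    """Average time to hit a uniformly random value: half the keyspace."""
    return (keyspace // 2) / rate / SECONDS_PER_YEAR


years = expected_years(KEYSPACE, GUESSES_PER_SECOND)
print(f"{years:.2e} years")
```

Even at that rate the expected time is on the order of 10^57 years, so filesystem access is the only realistic route of the two.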


If you have filesystem (write) access, you don't have to hack Dropbox to upload files, just put the files in the appropriate folder. And if you can execute code, you can just remote control the UI (move the cursor, type) and do anything the user can.

But I'm glad to hear that they found no "actual" weakness, that would enable a hacker with only my account name, or who is on my WiFi, to access my Dropbox.


So why does this need to be obfuscated? Is it not possible to do this securely and transparently?


I'm not a security expert, but from a management standpoint, obfuscation certainly /slows/ (not inhibits) the spread of homebrew Dropbox clients. Homebrew clients have the potential to create lots of customer support issues...

To a _conservative, lean organization_, it's better to constrain customer use cases to known good clients than to handle fallout such as "I lost all my data!" "What rev of client were you running?" "zAxX0r'2 m0D51c|< ph3y3Ldr.0p 0.0.69r."

That said, I could hope for Dropbox to evolve to a more open (ssh-based?) model, though I'm not a security architect :)


Link to the presentation of the reverse-engineers: https://www.usenix.org/sites/default/files/conference/protec...


Does Dropbox not use Amazon S3 as their storage engine anyway? This should have an open API?


Dropbox does have an API (https://www.dropbox.com/developers), but this is about reverse engineering the client, which seems to use things not documented there, in particular some authentication stuff. I haven't read in depth about why that allowed them to bypass 2-factor auth though.


From the whitepaper (https://github.com/kholia/dedrop/blob/master/paper/accepted/...):

> We found that two-factor authentication (as used by Dropbox) only protects against unauthorized access to the Dropbox’s website. The Dropbox internal client API does not support or use two-factor authentication!


Don't they use S3 internally? I assumed the desktop client does not access S3 directly, but that their server is middleware.


http://pastebin.com/gzF4XkBL

Fun find from the source code: There's a module named "gandolf.py" which appears to have something to do with version control.


Fun fact: the GNU/Linux Dropbox client is licensed under the GPL. I don't know if the article was referring to it though.


No it's not. The dropbox installer and Nautilus hooks are released under the GPL. But it actually downloads a proprietary binary which does most everything.


My mistake, sorry.



