Ask HN: How are all of these data dumps of user info happening?

koolba · on Oct 21, 2016

The short answer is that most software is crap.

The longer answer is that most software is crap but you don't notice as a consumer. That combined with the lack of incentive to fix things unless they're visibly broken means that crappy software will exist till it's either publicized or someone on the engineering team cares enough (and has enough clout) to fix it.

throwawayReply · on Oct 21, 2016

Credential re-use.

Someone uses the same password on GitHub as they did for Photoshop and suddenly a breach in one place leads to source access.

And private source code is typically full of credentials or makes it very easy to find poorly secured admin functions, if it doesn't contain copies of data itself.

Poor data hygiene, weak protection against credential re-use, missing 2FA, and putting in backdoors all lead to leaks.

Or the more mundane insecure direct object references, which catch out even large companies sometimes.
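
For anyone who hasn't run into the direct object reference problem, here is a rough Python sketch of what it tends to look like. The `Invoice` record, the `INVOICES` store, and both function names are made up for illustration and are not taken from any real breach or framework.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: int
    owner_id: int
    total_cents: int


# Hypothetical in-memory "table", keyed by invoice id.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, total_cents=2500),
    2: Invoice(invoice_id=2, owner_id=200, total_cents=9900),
}


def get_invoice_insecure(invoice_id: int) -> Invoice:
    # Vulnerable: any caller can walk the ids and read every invoice.
    return INVOICES[invoice_id]


def get_invoice(current_user_id: int, invoice_id: int) -> Invoice:
    invoice = INVOICES.get(invoice_id)
    # Same error for "missing" and "not yours", so ids cannot be enumerated.
    if invoice is None or invoice.owner_id != current_user_id:
        raise PermissionError("invoice not found")
    return invoice
```

The only difference between the two lookups is the ownership check, which is exactly the part that tends to get forgotten.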

infodroid · on Oct 21, 2016

One of the best ways to protect your users is actually to store less of their data in the first place. Do you really need to store all that personally identifiable information to operate your service? Can some of it be anonymized or de-identified instead? For example, do you really need to store the home address and mobile phone number of your users? My personal pet peeve is when sites ask for my date of birth as a recovery question.

r00fus · on Oct 23, 2016

I always give out a fake DOB. Unless I know they will check it. Most frustrating "authentication" dimension.

carom · on Oct 21, 2016

SQL injection is probably the primary attack. You might also have unsecured endpoints that give too much information (e.g. /query_user_info?email=me@yahoo.com). If you have endpoints that can be leveraged to achieve remote code execution, you then have someone on the system who can pivot to a database. An employee's machine could be compromised. An employee could leak the data.

There are probably one hundred paths to database dumps. This is why principles like defense in depth, least privilege, and whitelisting are all important to apply system-wide. If your team doesn't know how to do it, hire a company to do security audits or study like crazy.
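
To make the SQL injection point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, its columns, and the two function names are assumptions made up for the example; the injection payload is the classic always-true condition.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("me@yahoo.com", "mine"), ("you@example.com", "yours")],
)


def find_user_insecure(email: str) -> list:
    # Vulnerable: user input is pasted into the SQL text itself.
    query = f"SELECT email, secret FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()


def find_user(email: str) -> list:
    # The value is bound as a parameter, so it is never parsed as SQL.
    return conn.execute(
        "SELECT email, secret FROM users WHERE email = ?", (email,)
    ).fetchall()


payload = "' OR '1'='1"
leaked = find_user_insecure(payload)  # the WHERE clause becomes always true
safe = find_user(payload)             # treated as a literal (odd) email address
```

With the string-built query the payload returns every row in the table; with the parameterized one it matches nothing.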

imh · on Oct 21, 2016

If I have a simple side project, and can't afford to hire someone else to secure it, but still want to follow best practices, where can I learn them? Books? Courses? Anything anyone here recommends?

mplewis · on Oct 21, 2016

OWASP is a great place to learn about web security. Check out their 2013 PDF, linked on this page [0]: it gives a fantastic overview, including best practices, technical details, and ways to fix issues.

[0]: https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Proje...

roninb · on Oct 21, 2016

I'm not an expert by any means but will a 2013 PDF still be relevant in 2016? Are the best practices from 3 years ago still sufficient? I feel like half of the "best ciphers to use in 2013" are deprecated due to insecurity, but that might be ignorance on my part.

Edit: I see now that this informs readers about several classes of vulnerabilities and doesn't speak to specific frameworks or libraries being used to avoid or block them. This is probably good info, even 3 years from now.

mplewis · on Oct 21, 2016

Yep! It's still very up to date. Tons of vulnerabilities and attacks still apply in 2016 – CSRF, XSS, SQL injection, etc. The bright side is that modern web frameworks are smart enough to handle 90% of them automatically if you follow the rules.

geoffreyhale · on Oct 21, 2016

Secure? Don't do it yourself. Use a library or a framework with security packages built-in.

imh · on Oct 21, 2016

I don't mean to implement security libraries myself. The problem is I don't even know what kinds of things to find libraries for.

carom · on Oct 21, 2016

You'll want to look for front-end libraries that have XSS mitigations built in. For cookies, at the most basic level, use an unguessable (secure random) value. More advanced is signing cookies so they cannot be tampered with, which some frameworks do for you. Use CSRF tokens. Have a good content security policy. For backend / db, use prepared statements or object model frameworks.

What it boils down to: never trust user input, and never include user input in anything that resembles code before sanitization, validation, whitelisting, etc.
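
As a rough illustration of the cookie advice, here is a sketch using only the Python standard library. The function names and the `value.signature` cookie layout are assumptions for the example; real frameworks have their own formats, and in a real app the key would come from configuration rather than being generated at startup.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Assumption: a real app loads this key from config so it survives restarts.
SECRET_KEY = secrets.token_bytes(32)


def new_session_id() -> str:
    # Unguessable value from the OS's cryptographic random source.
    return secrets.token_urlsafe(32)


def _mac(value: str) -> str:
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(value: str) -> str:
    # Hypothetical cookie layout: "<value>.<hex HMAC-SHA256 of value>".
    return f"{value}.{_mac(value)}"


def verify(cookie: str) -> Optional[str]:
    value, sep, mac = cookie.rpartition(".")
    if not sep:
        return None  # no signature part at all
    # Constant-time comparison, to avoid leaking how many characters matched.
    if hmac.compare_digest(mac.encode("utf-8"), _mac(value).encode("utf-8")):
        return value
    return None
```

Something like `verify(sign("user=42"))` gives back `"user=42"`, while a cookie whose value was edited after signing comes back as `None`.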

cheiVia0 · on Oct 21, 2016

Why is SQLi still a problem in 2016? Aren't we all using ORMs and SQL interfaces that only support prepared statements?

ramimac · on Oct 21, 2016

We really aren't; even discounting people who home-roll it, many companies just don't use existing security features of languages well. An example I've seen in action is Rails' html_safe, which I've seen multiple developers use to make user-supplied content "safe," when really it is meant to demarcate data that you have ensured is safe to include.

smacktoward · on Oct 21, 2016

Not to mention the staggering number of tiny applications that were cobbled together in the Dark Ages and never updated.

cheiVia0 · on Oct 21, 2016

Any thoughts on how Rails' html_safe could be fixed to avoid that misuse of it?

ramimac · on Oct 21, 2016

I'm in no way an expert, so I don't know how qualified I am to comment. I'd say the issue isn't html_safe's functionality, but rather a common misunderstanding and misuse. In my experience this is steeped in the name "html_safe" itself, which leads developers to make incorrect assumptions. Really the problem is bad or uninformed developers, and you can only do so much to mitigate that.

mikeash · on Oct 21, 2016

I'd like to see the type system used to improve this stuff, by using more specialized types for different kinds of data. I don't think it's ultimately sensible that items as diverse as filesystem paths, SQL statements, HTML code, and raw user input are all represented as "strings." Imagine if instead you had a Path type, an SQLStatement type, an HTML type, and a UserInput type, and combining two items of mismatched types without the appropriate conversion/encoding was a hard error.
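
A toy version of this idea, sketched in Python. The `UserInput` and `Html` classes and the `escape` helper are hypothetical names for the example, not an existing library, and since Python is dynamically typed the mismatch is caught at runtime rather than by a compiler.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class UserInput:
    """Raw, untrusted text exactly as it arrived from the user."""
    raw: str


@dataclass(frozen=True)
class Html:
    """Markup that is already safe to send to a browser."""
    markup: str

    def __add__(self, other: object) -> "Html":
        if isinstance(other, Html):
            return Html(self.markup + other.markup)
        # Mixing in a plain str or UserInput is a hard error,
        # not a silent concatenation.
        raise TypeError(
            f"cannot add {type(other).__name__} to Html without escaping"
        )


def escape(value: UserInput) -> Html:
    # The only sanctioned conversion from untrusted text to markup.
    return Html(html.escape(value.raw))


comment = UserInput("<script>alert(1)</script>")
page = Html("<p>") + escape(comment) + Html("</p>")
```

Here `Html("<p>") + comment` raises a `TypeError`, so the only way to get user text into the page is through `escape`. Nothing stops a developer from writing `Html(comment.raw)` by hand, of course, which is the same escape hatch as calling html_safe on unchecked data.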

mattnewton · on Oct 21, 2016

Here there isn't a type system, but in his example the bad developers are going out of their way to defeat the defaults by marking the HTML as "safe", so I don't know what hope there is for an automated solution.

dewyatt · on Oct 21, 2016

I think it's often a combination of technical and non-technical vectors.

Here's a video of Phineas Fisher hacking a police union a while back:

https://tune.pk/video/6528544/hack

How he took down Hacking Team:

http://pastebin.com/raw/0SNSvyjJ

General guide:

http://0x27.me/HackBack/0x00.txt

robertelder · on Oct 21, 2016

I think in most cases it isn't publicly disclosed what the true root cause was. Most cases probably fall into 2 categories:

1) Companies are too embarrassed to admit they made a mistake, and furthermore there is no legal or security benefit to publicly declaring "We have an open SQL injection on xyz url."

2) Companies don't even know how or when they got hacked. Senior devs may have reached a point of thinking "There are so many moving parts here and I have so many bugs to fix that it's not even worth the time to try to make them all secure against a targeted hacker." When they do get hacked, it may be the first time they realize that they haven't been logging everything that might allow them to actually trace the origin. If the attack happened far in the past, the information required to investigate may have been lost long ago.

HFTGuru · on Oct 21, 2016

It's spear phishing on social media. You disguise a link as bit.ly and watch the fishes roll in.

ramimac · on Oct 21, 2016

I think this is misleading. I'm assuming you're referring to the election-related leaks, which as far as I've seen are significantly different from private sector database leaks.

supersan · on Oct 21, 2016

I can't speak for big companies, but for smaller companies it is generally software like WordPress for a blog, or some forum software that you installed: attackers find an exploit and then hit every site that runs it. There is very little you can do, even if you are updating frequently.

If you really need to install third-party software, I feel it's best to put it on its own instance with a separate database rather than share any resources with your main site.

SixSigma · on Oct 21, 2016

If you store it, they will come.

ramimac · on Oct 21, 2016

An additional avenue I've seen frequently exploited is password reuse. It is cyclic, however: with all of these database thefts, attackers are gaining access to large numbers of credentials, which can be leveraged to gain privileged access to other companies and systems.

benguild · on Oct 21, 2016

It depends, but with really complex web applications that are very modular, the different people working on those modules often don't fully understand how all of the other modules interact with the ones they are modifying. Sometimes a simple change to one can open a gaping security hole in another.

arekkas · on Oct 21, 2016

It's often human error, weak passwords, or weak security rules for employees with privileged access.