Ask HN: How are all of these data dumps of user info happening?
49 points by securityunaware on Oct 21, 2016 | 28 comments
It seems like every week there's some new story about data theft in the news (Yahoo, LinkedIn, Target, now Weebly). How are these attacks being done? Is it primarily SQL injection? Social engineering to get the database credentials? How can we protect our own databases from such attacks?



The short answer is that most software is crap.

The longer answer is that most software is crap but you don't notice as a consumer. That combined with the lack of incentive to fix things unless they're visibly broken means that crappy software will exist till it's either publicized or someone on the engineering team cares enough (and has enough clout) to fix it.


Credential re-use.

Someone uses the same password on github as they did for photoshop and suddenly a breach in one place leads to source access.

And private source code is typically full of credentials or makes it very easy to find poorly secured admin functions, if it doesn't contain copies of data itself.

Poor data hygiene, weak security around credential re-use and 2FA, and putting in backdoors all lead to leaks.

Or the more mundane insecure direct object references, which catch out even large companies sometimes.
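To make the direct object reference problem concrete, here is a minimal sketch of the vulnerable pattern and the ownership check that closes it. The `INVOICES` store and both function names are hypothetical, made up purely for illustration.

```python
# Hypothetical in-memory store: invoice id -> (owner user id, contents).
INVOICES = {
    1: ("alice", "Invoice for Alice"),
    2: ("bob", "Invoice for Bob"),
}


def get_invoice_insecure(invoice_id: int) -> str:
    # Vulnerable: anyone who can guess or increment the id gets the record.
    return INVOICES[invoice_id][1]


def get_invoice(current_user: str, invoice_id: int) -> str:
    # Look the record up, then check that the caller actually owns it.
    record = INVOICES.get(invoice_id)
    if record is None or record[0] != current_user:
        # Same error for "missing" and "not yours", so ids can't be probed.
        raise PermissionError("invoice not found")
    return record[1]
```

With the insecure version, a logged-in "alice" can simply request id 2 and read Bob's invoice. The checked version refuses.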


One of the best ways to protect your users is actually to store less of their data in the first place. Do you really need to store all that personally identifiable information to operate your service? Can some of it be anonymized or de-identified instead? For example, do you really need to store the home address and mobile phone number of your users? My personal pet peeve is when sites ask for my date of birth as a recovery question.
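One possible way to de-identify a field you only ever need to match on (not display) is to store a keyed hash instead of the raw value. This is a rough sketch, not a complete scheme: the `PEPPER` constant and the `pseudonymize` helper are assumptions for illustration, and the approach only helps if the key is kept somewhere other than the database.

```python
import hashlib
import hmac

# Assumption: a secret key kept outside the database (e.g. in app config).
PEPPER = b"replace-with-a-secret-from-config"


def pseudonymize(value: str) -> str:
    # Normalize, then apply a keyed hash; the raw value is never stored.
    normalized = value.strip().lower()
    return hmac.new(PEPPER, normalized.encode(), hashlib.sha256).hexdigest()
```

You can still check "have I seen this email before?" by comparing hashes, but a stolen table no longer contains the addresses themselves.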


I always give out a fake DOB, unless I know they will check it. It's the most frustrating "authentication" dimension.


SQL injection is probably the primary attack. You might also have unsecured endpoints that give too much information (e.g. /query_user_info?email=me@yahoo.com). If you have endpoints that can be leveraged to achieve remote code execution, you then have someone on the system who can pivot to a database. An employee's machine could be compromised. An employee could leak them.

There are probably one hundred paths to database dumps. This is why principles like defense in depth, least privilege, and whitelisting are all important to apply system-wide. If your team doesn't know how to do it, hire a company to do security audits or study like crazy.
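For anyone who hasn't seen SQL injection up close, here is a small sketch using Python's built-in sqlite3 module. The table, the sample rows, and the function names are made up for illustration. The first lookup pastes user input into the SQL text; the second passes it as a bound parameter.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Throwaway in-memory database with two hypothetical users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("me@example.com", "s1"), ("you@example.com", "s2")],
    )
    return conn


def find_user_vulnerable(conn: sqlite3.Connection, email: str) -> list:
    # Vulnerable: the input becomes part of the SQL statement itself.
    query = "SELECT email, secret FROM users WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()


def find_user(conn: sqlite3.Connection, email: str) -> list:
    # Safer: the value is bound to a placeholder and never parsed as SQL.
    return conn.execute(
        "SELECT email, secret FROM users WHERE email = ?", (email,)
    ).fetchall()
```

Feeding `x' OR '1'='1` to the vulnerable version returns every row in the table, while the parameterized version treats it as an (unmatched) email address and returns nothing.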


If I have a simple side project, and can't afford to hire someone else to secure it, but still want to follow best practices, where can I learn them? Books? Courses? Anything anyone here recommends?


OWASP is a great place to learn about web security. Check out their 2013 PDF, linked on this page [0]: it gives a fantastic overview, including best practices, technical details, and ways to fix issues.

[0]: https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Proje...


I'm not an expert by any means but will a 2013 PDF still be relevant in 2016? Are the best practices from 3 years ago still sufficient? I feel like half of the "best ciphers to use in 2013" are deprecated due to insecurity, but that might be ignorance on my part.

Edit: I see now that this informs readers about several classes of vulnerabilities and doesn't speak to specific frameworks or libraries being used to avoid or block them. This is probably good info, even 3 years from now.


Yep! It's still very up to date. Tons of vulnerabilities and attacks still apply in 2016 – CSRF, XSS, SQL injection, etc. The bright side is that modern web frameworks are smart enough to handle 90% of them automatically if you follow the rules.


Secure? Don't do it yourself. Use a library or a framework with security packages built-in.


I don't mean to implement security libraries myself. The problem is I don't even know what kinds of things to find libraries for.


You'll want to look for front-end libraries that have XSS mitigations built in. For cookies, at the most basic level, use an unguessable (secure random) value (more advanced is signing cookies, which some frameworks do so they cannot be tampered with). Use CSRF tokens. Have a good content security policy. For the backend / db, use prepared statements or object model frameworks.

What it boils down to: never trust user input, and never include user input in anything that resembles code before sanitization, validation, whitelisting, etc.
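As a rough illustration of the cookie advice, here is a sketch using only the Python standard library. The names `SECRET_KEY`, `new_session_id`, `sign`, and `verify` are hypothetical; a real framework's signed-cookie support will differ in format and should be preferred over rolling your own.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Assumption: in a real app this key is loaded from configuration and kept
# stable across restarts; it is generated here only to keep the sketch short.
SECRET_KEY = secrets.token_bytes(32)


def new_session_id() -> str:
    # Unguessable value from the OS's cryptographic random source.
    return secrets.token_urlsafe(32)


def sign(value: str) -> str:
    # Append an HMAC-SHA256 tag so tampering can be detected.
    tag = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{tag}"


def verify(cookie: str) -> Optional[str]:
    # Return the original value, or None if the tag does not match.
    value, _, tag = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(tag.encode(), expected.encode()):
        return value
    return None
```

The same kind of random token generation is a reasonable starting point for a per-session CSRF token.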


Why is SQLi still a problem in 2016? Aren't we all using ORMs and SQL interfaces that only support prepared statements?


We really aren't; even discounting people who home-roll it, many companies just don't use existing security features of languages well. An example I've seen in action is Rails' html_safe, which I've seen multiple developers use to make user-supplied content "safe," when really it is meant to demarcate data that you have ensured is safe to include.


Not to mention the staggering number of tiny applications that were cobbled together in the Dark Ages and never updated.


Any thoughts on how Rails' html_safe could be fixed to avoid that misuse of it?


I'm in no way an expert, so I don't know how qualified I am to comment. I'd say the issue isn't html_safe's functionality, but rather a common misunderstanding and misuse. In my experience this is steeped in the name "html_safe" itself, which leads developers to make incorrect assumptions. Really the problem is bad or uninformed developers, and you can only do so much to mitigate that.


I'd like to see the type system used to improve this stuff, by using more specialized types for different kinds of data. I don't think it's ultimately sensible that items as diverse as filesystem paths, SQL statements, HTML code, and raw user input are all represented as "strings." Imagine if instead you had a Path type, an SQLStatement type, an HTML type, and a UserInput type, and combining two items of mismatched types without the appropriate conversion/encoding was a hard error.
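A toy sketch of that idea in Python, covering just two of those types. The `UserInput` and `HTML` classes here are made up for illustration, and since Python checks this at runtime, the "hard error" is a TypeError when the code runs rather than a compile-time failure.

```python
import html


class UserInput:
    """Raw, untrusted text. Deliberately not a str subclass."""

    def __init__(self, raw: str) -> None:
        self.raw = raw


class HTML:
    """Markup that is already safe to send to a browser."""

    def __init__(self, markup: str) -> None:
        self.markup = markup

    @classmethod
    def escape(cls, value: UserInput) -> "HTML":
        # The only sanctioned conversion from UserInput to HTML.
        return cls(html.escape(value.raw))

    def __add__(self, other: object) -> "HTML":
        if not isinstance(other, HTML):
            # Mixing types without an explicit conversion is a hard error.
            raise TypeError("can only concatenate HTML with HTML")
        return HTML(self.markup + other.markup)
```

Usage would look like `HTML("<p>Hello, ") + HTML.escape(name) + HTML("</p>")`, while `HTML("<p>") + name` with a raw `UserInput` raises. Nothing stops a developer from writing `HTML(name.raw)` by hand, of course, which is the same escape hatch as html_safe.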


Here there isn't a type system, but the bad developers in his example are going out of their way to defeat the defaults by casting the HTML as "safe", so I don't know what hope there is for an automated solution.


I think it's often a combination of technical and non-technical vectors.

Here's a video of Phineas Fisher hacking a police union a while back:

https://tune.pk/video/6528544/hack

How he took down Hacking Team:

http://pastebin.com/raw/0SNSvyjJ

General guide:

http://0x27.me/HackBack/0x00.txt


I think in most cases it isn't publicly disclosed what the true root cause was. Most cases probably fall into 2 categories:

1) Companies are too embarrassed to admit they made a mistake, and furthermore there is no legal or security benefit to publicly declaring "We have an open SQL injection on xyz url."

2) Companies don't even know how or when they got hacked. Senior devs may have reached a point of thinking "There are so many moving parts here and I have so many bugs to fix that it's not even worth the time to try to make them all secure against a targeted hacker." When they do get hacked, it may be the first time they realize that they haven't been logging everything that might allow them to actually trace the origin. If the attack happened far in the past, the information required to investigate may have been lost long ago.


It's spear phishing on social media. You disguise a link with bit.ly and watch the fish roll in.


I think this is misleading. I'm assuming you're referring to the election-related leaks, which as far as I've seen are significantly different from private sector database leaks.


I can't speak for big companies, but for smaller companies it is generally software like WordPress for a blog or some forum software that you installed; attackers find an exploit and then hit every site that runs it. There is very little you can do, even if you are updating frequently.

If you really need to install third-party software, I feel it's best to put it on its own instance with a separate database rather than share any resources with your main site.


If you store it, they will come.


An additional avenue I've seen frequently exploited that hasn't been mentioned yet is password reuse. It is cyclic, however: with all of these database thefts, attackers are gaining access to large numbers of credentials, which can be leveraged to gain privileged access to other companies and systems.


It depends, but with really complex web applications that are very modular, oftentimes the different people working on those modules don't fully understand how all of the other modules interact with the ones they are modifying. Sometimes a simple change to one can open a gaping security hole in another.


It's often human error, weak passwords, or weak security rules for employees with privileged access.



