Data is a Toxic Asset (schneier.com)
180 points by interweb on March 12, 2016 | 65 comments



The Ashley Madison data breach was such a disaster for the company because it saved its customers' real names and credit card numbers. It didn't have to do it this way. It could have processed the credit card information, given the user access, and then deleted all identifying information. To be sure, it would have been a different company. It would have had less revenue, because it couldn't charge users a monthly recurring fee.

This seems to me the wrong way to solve the problem. The crazy thing about credit cards, social security numbers, and bank account numbers is that these numbers are supposed to be kept secret and private, and yet you need to constantly give them out to people. Everyone you write a check to gets your bank account number, every place you buy from gets a credit card number. This is insane.

The right way to solve this is that Visa and MasterCard need to develop a standard to make it super easy to generate a unique payment number every time you make an online purchase. Then that should be built in as a browser extension or component. So I browse to a site, click to pay with my Visa card, and Visa automatically generates a unique code for that site and fills it in on the form.

Also, it is insane that someone can steal my identity by simply knowing my social security number. The right way to solve this would be to have an identity provider that has a short 10 second video of myself on file. Then, when I want to sign up for a credit card or bank account, I take a 10 second video of myself using my cell phone, granting approval to open the account. A staffer at the credit card company then compares the video with the video on file with the identity provider, and verifies that it matches. The identity provider also sends a message to an email address or mobile number on file, so that I am alerted that someone is opening an account in my name. Using these two simple safeguards, identity theft would be much, much harder. A video recording of a person is very hard to fake, much harder to fake than a signature.

A final key innovation would be if email providers would make it super-easy to generate aliases per site. I do this myself manually with fastmail, but if there was a simple browser extension that would automatically create an alias and fill in a form, that would be great, because I could have a unique address that all funnels into one place, for everything I sign up to.


>The right way to solve this would be to have an identity provider that has a short 10 second video of myself on file. Then, when I want to sign up for a credit card or bank account, I take a 10 second video of myself using my cell phone, granting approval to open the account

What should those of us without smartphones do? Not to mention that this seems just as trivial to break, if not easier. I can find the target on Facebook and use a faceswap program to generate a video that looks good enough that the $9.50/hr worker spending all day comparing faces, who doesn't quite care enough, accepts the video.


Public key infrastructure > 10 second video


Is this a US specific thing? Why would you need to keep your SSN and bank account number private?

Ok, I know US citizens are not automatically given ID cards, so if everybody takes the SSN you give them at face value, I get that.

I don't understand the bank account especially. Like I have some automatically deducted monthly payments, but I remember I needed to specifically authorize the receiving account to be able to ask for the money with my bank.

With cards, the standards are starting to get there, i.e., I can arrange with my bank that every time I use the card for an internet payment, I need to confirm my identity with a code they send me by SMS. As far as I know, I could ask for a different second factor of authentication; I know my dad has a standard RSA token.

Unfortunately I had a problem using this with some foreign site (I think it was Amazon?), so I had to disable it. I live in the Czech Republic.


> Is this a US specific thing? Why would you need to keep your SSN and bank account number private?

For SSN, if you have good credit, an SSN and a name are basically all that's needed to open a new account connected to your general credit record. If the account was opened in your name without your consent, it's a lot of work to get it disassociated from you.

For bank account numbers, most payments are processed through the 'automated clearing house', which is fancy check clearing. In the old days, maybe your bank would look at the check presented and return it without payment if they could tell it wasn't legitimate / your signature wasn't right. With an electronic withdrawal, there's not really any information provided to them to check anything.


The poster you replied to knows the answer you've given. The question was rather: "Why is the system set up in such a way that this is the case?"


The automatic withdrawal system has always seemed ridiculous to me because it inverts the dependency chain for my finances but still provides no one any guarantee they get their money.

It's a constant pain that there isn't a common standard scripting language for finances so I can automate this stuff sensibly.


For example, someone could take your SSN and then go apply for loans at the bank under your name.


If as a merchant you want something that works now, you can use Stripe - send identifying credit card info directly to Stripe without holding on to it yourself, and then ask it for a persistent customer ID that you can repeatedly charge.

You could still retrieve some identifying information through their API, but if you keep your account credentials somewhere separate from your database it's less likely for an attacker to get both.


I think you have identified better solutions than the current ones. What is more interesting is that many of the underlying processes already exist. For example, any Gmail address can be "customized" by adding "+<string>" to the end of the name part; Google ignores the +string and delivers the mail anyway. You can then filter on email sent to the +string value. Not everyone however accepts an email with + in the name part. PayPal has the ability to send you out to PayPal to authorize a payment, so the vendor never sees your banking details. Their API works well and could be adopted by any bank if they chose. At Blekko we separated queries from IP addresses and from userid (if they were logged in). You could do it, still get ranking training data from it, and be completely unable to turn over "the last week of searches from this IP" to federal agencies. That was driven by the CTO.

So at the end of the day there are a lot of things that make toxic data easily avoidable, and it takes people at the company willing to invest in making the data "non-toxic" and to some extent non-useful to people outside the company.


> Not everyone however accepts an email with + in the name part.

It's beyond this. MOST email address forms won't accept a '+'. I had to change the extension character to '_' on my server because it's the only non-alphabetic character that everybody seems to accept.


FastMail has an option to use subdomain addressing[0]. Instead of user+string@example.com, it can be string@user.example.com.

[0] https://www.fastmail.com/help/receive/addressing.html
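
For anyone wanting to script the two addressing schemes mentioned in this subthread, here is a minimal sketch in Python. The function names and the `separator` parameter are made up for illustration, and whether a given provider actually delivers such addresses depends on its configuration, so treat this as string handling only, not as any provider's API.

```python
def split_address(address):
    """Split 'local@domain' into (local, domain); raise if it is malformed."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local, domain


def plus_alias(address, tag, separator="+"):
    """Per-site alias in the user+tag@domain style.

    The separator is configurable because, as noted in the thread,
    many signup forms reject '+' (one commenter switched to '_').
    """
    local, domain = split_address(address)
    return f"{local}{separator}{tag}@{domain}"


def subdomain_alias(address, tag):
    """Per-site alias in the tag@user.domain style (subdomain addressing)."""
    local, domain = split_address(address)
    return f"{tag}@{local}.{domain}"


print(plus_alias("user@example.com", "shop"))       # user+shop@example.com
print(subdomain_alias("user@example.com", "shop"))  # shop@user.example.com
```

Either form still reveals the base address to anyone who knows the scheme, so this helps with filtering and with spotting who leaked an address rather than with hiding who you are.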


That is an excellent solution as well. Qmail was pre-configured to work with '-' in the user string. But the use of subdomains would work as long as you could meet things like the Google DKIM checks.


PayPal is not a solution. They send my PayPal email address to every merchant, which is no different from the POV of fraud than sending any other identifying info. That's one thing I liked about Google Checkout: they had the option to send a one-time email address.


A PayPal email address, as opposed to a credit card number + expiry, is insufficient to make a transaction alone.


Around here, the local payment system is inverted: the site generates a code for your order, and the user instructs their own bank to send money to that code.

That said, many banks - including in the US - can already generate single-use virtual CC numbers.


That's not inverted. That's the proper order.


Oh, I agree; it's inverted relative to the CC mechanism.


> The right way to solve this is that Visa and MasterCard need to develop a standard to make it super easy to generate a unique payment number every time you make an online purchase. Then that should be built in as a browser extension or component. So I browse to a site, click to pay with my Visa card, and Visa automatically generates a unique code for that site and fills it in on the form.

Blur[0] from Abine has this in their premium version. I have used it, and overall it worked well, but I had some password syncing issues and stopped using it.

> A final key innovation would be if email providers would make it super-easy to generate aliases per site. I do this myself manually with fastmail, but if there was a simple browser extension that would automatically create an alias and fill in a form, that would be great, because I could have a unique address that all funnels into one place, for everything I sign up to.

Something like this can be done on FastMail using a catchall alias[1], but it requires a custom domain, and the domain could be used to link all the accounts to you.

I'm experimenting with it, but what happens when I forget a password and the email I used to sign up for it? A password manager is an option for that, but they have their own problems.

Edit: FastMail also has subdomain addressing[2]. I believe it works with all of the FastMail provided domains.

[0] https://www.abine.com/index.html

[1] https://www.fastmail.com/help/receive/alias-catchall.html

[2] https://www.fastmail.com/help/receive/addressing.html


There's an interesting concept in here that governments should consider registering a personal identity TLD under their country codes, and just make it policy that everyone gets a unique one under their legal name + a word or phrase to avoid collisions.


It sounds to me like you need to read patio11's article about names :) http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...


Virtual credit card numbers have been around since the late 90s. Adoption has been essentially 0 because the UX sucks and credit card fraud is not the consumer's problem.


It is the consumer's problem for me. Because I travel a lot, I get charges denied because the card company's computer decides the charge looks suspicious. I have to spend 20-40 minutes dealing with it, on top of which I lose hotel reservations (last room available->charge denied->20-40 minutes to resolve->room gone) or I'm standing in a line to buy something and the charge is denied. Now I have to go somewhere quiet to spend the 20-40 minutes to resolve it. Then get back in line. This happens about once every 2 months. It's super annoying.

It's infuriating there's any chance for fraud at all when it seems like a solvable problem in 2015.


Bank of America and Citi have virtual credit card numbers on at least some of their cards. I don't use it because, why? It's a pain point to log in and generate a new one when I could just open up my password manager and copy/paste the CC number. Especially when paying with PayPal or on a site I've bought from before that already has my CC saved, so I can just select it. It doesn't really affect me if someone steals my CC; I just call up the bank and they reverse unauthorized charges and send me a new one.


I always found it weird that in order for my company to put money in my account electronically, they apparently need enough information to take money out. Why are these not two different levels of security?


For per-service email and occasional single use credit cards, I've been pretty happy with blur from abine: https://www.abine.com/index.html


There are already secure solutions today. When i make an online payment with my visa card with a new vendor based in my country, i get directed to the card issuer's website where i need to enter a one time code generated by putting the card in a keypad (which my bank issued), entering a starter code and my pincode and then reading the generated number off the keypad's display. Without the physical card and pincode no purchase from a new vendor is possible. In the physical store i need to input card and pin to use the secure chip on the card to sign the transaction.

Of course, then i go to the US and any random hobo on the street can charge me with just the card number and a scribble. The problem isn't that credit card companies don't know how to get rid of card fraud, it's that their customers like the convenience too much and won't let them do it.

The same thing for identity theft. In my country opening a bank account or getting a loan requires an id card, which is government issued and contains a digital certificate protected by a personal pin. Unless someone steals that card and knows your pin, they can't steal your identity.

These are easily solved problems. The reason they're not solved in the US is because the people won't allow them to be, or at least banks and government perceive it as such.


Video sequences will not be secure enough. 2FA with hardware tokens is the only practical solution that's secure.


> The right way to solve this is that Visa and MasterCard need to develop a standard to make it super easy to generate a unique payment number every time you make an online purchase. Then that should be built in as a browser extension or component.

There was a mechanism from the 1990s to do online payments without giving the merchant a reusable secret identifier.

https://en.wikipedia.org/wiki/Secure_Electronic_Transaction

It's too bad that something like that didn't become widespread sooner, because it could have drastically cut down on credit card fraud.


With regard to SSN, the real issue is that it is used as both an identifier and as an authenticator. It's fine as an identifier, but authenticating with a number that anyone can steal (and yet, perversely, is difficult to replace with a new number) is terrible. I agree with your suggestion of some kind of biometric type authenticator, whether it be a video, retina scan, fingerprint, or some combination thereof.


Some payment gateways offer "gateway recurring billing" where the credit card data is stored at the gateway not on your most likely less secure servers. Ashley Madison could have done that and avoided some of the damage from the breach.


> need to develop a standard to make it super easy to generate a unique payment number every time you make an online purchase

So something like, a public key with a private key that only you own? ;)


Google wallet was supposed to add the ability to generate unique credit numbers. Idk if they did.


Most mobile payment solutions today operate on top of tokens, with MasterCard, Visa, and Amex serving as Token Service Providers (TSPs). Tokens look similar to current credit card numbers (they are not distinguishable in some cases). Each token is a unique number used for a payment, which the TSP then links to the card on file / credit card number. Merchants only ever have access to the token.

Android Pay, which succeeded Google Wallet, uses such tokens.
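
To make the mapping idea concrete, here is a toy sketch in Python. It is not how the card networks' token services are actually implemented; `TokenVault`, `issue_token`, and `resolve` are hypothetical names, and the only point is that the merchant-facing value is a random stand-in that only the vault can map back to the real card number.

```python
import secrets


class TokenVault:
    """Toy stand-in for a token service provider (illustration only)."""

    def __init__(self):
        # token -> real card number; in this sketch it lives only in memory
        self._token_to_pan = {}

    def issue_token(self, pan):
        """Return a fresh random digit string with the same length as the card number."""
        while True:
            token = "".join(secrets.choice("0123456789") for _ in range(len(pan)))
            # Retry on the (very unlikely) collision with the real number
            # or with a token that was already issued.
            if token != pan and token not in self._token_to_pan:
                self._token_to_pan[token] = pan
                return token

    def resolve(self, token):
        """Map a token back to the card number; only the vault can do this."""
        return self._token_to_pan[token]


vault = TokenVault()
card = "4111111111111111"  # a well-known test number, not a real card
token = vault.issue_token(card)
# The merchant would store and present only `token`; a breach at the
# merchant exposes nothing that works outside the vault's mapping.
assert vault.resolve(token) == card
```

A token generated this way carries no information about the card, so its value to a thief depends entirely on what the vault is willing to do with it.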


Android Pay succeeded Google Wallet?

I thought Google Wallet was available in many platforms, e.g. even in my browser which may or may not run Android.


Schneier is missing one major reason why companies keep data: regulation.

So many regulatory bodies and laws require companies to keep all kinds of data for all kinds of reasons for a wide variety of periods that simply having a policy to "store all the things" is way, way simpler to implement than carefully studying and adhering to each individual rule.

Nothing really new here, even before cheap storage and ubiquitous computers, companies kept boxes and boxes of all the paperwork ever, just in case some audit may require them to dig it up. Only physical limitations sometimes caused them to throw away stuff labeled "a decade ago", and today there simply is more data and zero incentive to destroy it.


Good point, here's an example: EU VAT, which obliges companies selling digital goods in Europe to store customer and transaction details for 10 years.

https://www.gov.uk/guidance/register-and-use-the-vat-mini-on...


How does this work with digital stores (Steam/App Store/Play Store)? Do you even get that data from them as a developer?


I think those stores take care of VAT and all the requirements around that, so the developer doesn't need to worry about it. That's what the 30% cut is for.


The laws are totally insane when it comes to data: one law says you can't store it, and if you do you have to enable the user to remove it, change it and view it; the other says you should keep it secret and should keep it for many years.


But the concept of taking data offline or to another network applies to this.

For example, while banks are required to keep tons of data for legal reasons, the ones I've worked with have procedures where tellers are required to shred everything and send it for incineration. Then the digital copies, once they can only be required if there's legal compulsion going on (i.e., after x number of years), are transferred by batch jobs, which encrypt everything with a key generated by a CA that is offline most of the time, to a tape library that is only online for batch writes and can only be brought online manually by physically going into the data center. Then, after a little more time, but still within legally required reporting periods, the tapes are moved into a warehouse which very much resembles a bank vault.

And as soon as there's no longer a reason the data must be kept, the tapes are destroyed.

Honestly the security around those tapes is higher than bricks of cash, and they're destroyed even more readily.


Securing data is a daunting state to be in; it's not a task, but a lifestyle.

Out of 100 average web developers only a handful take security into account during design and fewer still think and work through what's necessary to keep anything safe at all.

It's no wonder that popping servers is so trivial and even high value targets with dedicated security teams and constant proactive threat response get pw0nd daily.


It's not only the software developer side, but also the business side that doesn't take it seriously. Businesses currently see only the upside in aggregating ever-larger pools of crosslinked data. Schneier's article here is pointing out the downside: collecting big centralized pools of data is an accident waiting to happen, a giant pile of toxic waste stored in rusty drums, which businesses are happily piling more waste onto as fast as they can.


More than that, there's no regulatory push (outside of maybe healthcare?) to require stringent handling of people's data. If anything the opposite is true, where the government wants people to build a specific backdoor for them.

This is certainly one of the reasons I've become more aware recently of the amount of data certain companies have on me. Data breaches have the potential to be catastrophic and very few people are looking out for my best interests.


It has nothing to do with web developers and everything to do with the insane state of web development and software development in general. You want a full-stack developer who also deals with security? As they say, good luck with that. Knowledge in web development is always incomplete because there is simply not enough time to learn everything. So web developers and software developers are necessarily hackers. All of them. It is impossible for a software developer to be an expert because the domain of knowledge is beyond an individual's comprehension. So good luck adding security to the repertoire of already overworked people who are doing the jobs of 3 different people and usually earning a pittance for all the work they are doing.

You want security, then hire a security expert to oversee the development and ensure security. Pay him or her 60,000-70,000 to ensure that. Otherwise forget about it, not going to happen. Your developer is already too busy as he or she is.


> It has nothing to do with web developers and everything to do with the insane state of web development and software development in general.

¿Por qué no los dos? (Why not both?) I agree that the stack is crazy-huge these days, and shifting too much to solidly learn everything. We should still expect devs to try and integrate security through the entire thing, though.


Sorry, but your expectation will not be met. You might be able to get better security, for instance by avoiding SQL injections, but it's a question of whether you think security is an all-or-nothing game: either it's totally secure, or a few well-known attacks are usually mitigated. A few attacks mitigated, sure. Secure? I don't think so. The attack surface is too large and the timelines too short to ensure anything other than "it works".

Just like we do QA, you need security reviews that follow QA: a security team needs to review the code for just that, security. Some of it can be automated; other stuff needs to be carefully studied. You can expect to add 20-30% to the final cost to account for something like this.


Security is easy. Do not trust user input. In any framework or library just identify user inputs and treat them carefully. Hard part is to remember that.
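
A minimal illustration of "treat user input carefully" for the SQL case, sketched in Python with the standard `sqlite3` module. The table, the `find_user` helper, and the sample rows are invented for the example, and it only covers this one class of attack; it is not a claim that the rule is sufficient.

```python
import sqlite3


def find_user(conn, name):
    """Look up a user by name using a bound parameter.

    The '?' placeholder makes the driver pass `name` as data, so quotes
    in the input cannot change the structure of the SQL statement.
    """
    cur = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Hypothetical in-memory table just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))             # [('alice',)]
print(find_user(conn, "alice' OR '1'='1"))  # []
```

Had the query been built by string concatenation, the second input would have turned the WHERE clause into a condition that matches every row.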


No, no it's really not.

https://www.owasp.org/index.php/Top_10_2013-Top_10 and that's only Web security.


Almost all of those items are exactly what I wrote: do not trust user input.


There has to be a middle ground between saying "security is simple" and "the sky is falling".

I think stating things like "don't trust user input" risks things like https://kivikakk.ee/cryptography/2016/02/20/breaking-homegro... happening.

Security is hard, programming is hard, we should all get better at both.


Session fixation - a commonly-used session-based attack - can be prevented simply by giving a user a new session ID whenever they obtain a new level of permission on a site (for instance, after they successfully login).
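
A rough sketch of that rule in Python, using an in-memory store instead of any particular web framework. `SessionStore` and its methods are hypothetical names chosen for the example; a real application would use its framework's session-regeneration facility.

```python
import secrets


class SessionStore:
    """Toy in-memory session store (illustration only)."""

    def __init__(self):
        self._sessions = {}

    def create(self):
        """Start an anonymous session and return its ID."""
        sid = secrets.token_urlsafe(32)
        self._sessions[sid] = {"user": None}
        return sid

    def get(self, sid):
        """Return the session data, or None if the ID is unknown."""
        return self._sessions.get(sid)

    def login(self, old_sid, user):
        """Raise the session's privilege level under a brand-new ID.

        The pre-login ID is removed, so an attacker who planted or
        learned it before login ends up holding a dead identifier.
        """
        data = self._sessions.pop(old_sid, {"user": None})
        data["user"] = user
        new_sid = secrets.token_urlsafe(32)
        self._sessions[new_sid] = data
        return new_sid


store = SessionStore()
anonymous_sid = store.create()
authenticated_sid = store.login(anonymous_sid, "alice")
assert store.get(anonymous_sid) is None
assert store.get(authenticated_sid)["user"] == "alice"
```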


Requirements from clients are also a form of untrustworthy input :)


Part of this is our fault in implementing security libraries though. Let's face it: SSL has a terrible user experience when it doesn't work. While "just fail" is good practice, it's way too difficult to use in development across multiple languages compared to just using unencrypted traffic.


Shouldn't this be fairly standardized and easy by now?


You can't standardize vigilance.


Maciej Ceglowski has been saying the same thing: http://idlewords.com/talks/haunted_by_data.htm


This is a great article, and I hope it gets read widely. I love the phrase "toxic data spill". We won't have reached maturity in the IT world until it becomes completely accepted and assumed that your systems will be broken into and whatever data is accessible there will be stolen. Only when we start designing with that in mind as a first principle will we actually have a chance of making people safe. For now, virtually every system I come across is designed around the principle that nobody bad will ever get in, and all we have to focus on is layers of encryption and network security to stop them - it is honestly just ludicrously naive. Even with perfect security, one day someone you let in will turn bad and expose data.


I'd like to see 2 or 3 or 4 step authentication in place whenever someone tries to USE my data.

Your data is all over the place, in many hands. While it should be protected, it should also be much harder to use it to pretend to be you.

You should be able to set up 0 or 1 or 2 step authentication for trivial purchases, 3 step for larger purchases or accessing credit, or even 4 step authentication for things like buying a car or house.

Some steps could be approval-required or denial-required. It's enough to be able to deny the purchase of a latte, but you might want to always have to approve spending thousands of dollars.

And we need to be able to set up new kinds of authentication steps, like fingerprints or the approval of one or more trusted relatives for an older person or child. Or even use a notary public. And you might have to use more of these if you are away from home.

And none of this should be mandatory, but there should be sensible defaults that individuals can change. AFTER being well authenticated, of course :-)

If we raise the difficulty level of stealing MOST people's identity, this will largely solve this problem, especially for those most wanting to solve it.


Or in the simplest terms possible, the best 'private' service or site is the one that doesn't store any personal information for its users. If you're running a site like Ashley Madison, then don't store real names and information. Same if you're running an anonymous message service, an anonymous email service, etc.

That's not some shocking new thing. Forums and other such sites have been letting people sign up with no more than an email address and password for years. And the payment stuff on these sites and services could easily go through PayPal or some other third party provider (who's likely got a much more secure system setup than you).

But no, a lot of sites and companies and services seem to be all 'let's store everything about everyone, and then wonder why it causes a meltdown when the site gets hacked and said data leaked all over the internet'.


He's right that data volumes have non-linear risk profiles.

He's wrong that there is evidence more data isn't better. While there are indications of this for advertising, it is definitely not the case for financial data.

And the other subtlety is that lots of low quality data is indeed useless, but small amounts of high value data can do a lot. That high value data is what people are looking to steal. Having a little bit of user financial transaction data, for example, is incredibly powerful. Much more so than, say, cross-website shared cookies or Amazon referral patterns.

And there is a whole class of data that has value proportional to the total sum of it you possess. A good example of that is surveillance data. Ubiquitous video coverage of an environment is much more useful (to both machines and humans) than partial coverage.


The trick is to just collect and store data at exabyte scales like Google or the NSA.

That way when there's a breach, it's impractical for the attackers to exfiltrate the complete dataset because the target probably represents a non-trivial percentage of the world's storage capacity.

The attackers can filter the data, but surely someone's going to notice ten thousand machines whirring away at odd MapReduce queries.


Let's say all of the most sensitive data of a person can fit in 20 KB: SSN, CC, bank, your dog's high school's mascot's sweetheart, etc. The entire US would just about fit on a 6 TB drive.

The rest of the data is not that valuable in comparison.
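
The back-of-the-envelope arithmetic can be checked in a few lines of Python. The population figure of roughly 320 million is an assumption added here (the comment does not state one), and decimal units are used (1 KB = 1000 bytes, 1 TB = 10^12 bytes).

```python
BYTES_PER_PERSON = 20 * 1000   # 20 KB of sensitive data per person, per the comment
US_POPULATION = 320_000_000    # assumption: rough mid-2010s US population

total_bytes = BYTES_PER_PERSON * US_POPULATION
total_tb = total_bytes / 1e12

print(f"{total_tb:.1f} TB")    # 6.4 TB
```

So the estimate lands slightly over a single 6 TB drive, which is close enough for the point being made: the most sensitive slice of the data is small and easy to carry away.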


>your dog's high school's mascot's sweetheart

I'm not sure what that is exactly, but it sounds really sensitive. :)

In all seriousness, while the release of banking and identity information is certainly bad, I'd argue the contents of private communications or browsing/search history are potentially far more damaging for a lot of people. In order to include that in your hypothetical, it'd require either a lot of filtering or the 6TB number would balloon quite a bit.


>I'm not sure what that is exactly, but it sounds really sensitive.

I was poking fun at "security" questions.

These bits of data are gatekeepers: if I have your AOL account password, I get all of those for free.

(So I think we're in agreement, my SSN isn't controversial to my friends, employer, family, news, etc, but most people have probably had conversations or searches that could look really bad)



