
An excellent post by Schneier.

> The problem isn't just that such a system is wrong, it's that the mathematics of testing makes this sort of thing pretty ineffective in practice. It's called the "base rate fallacy." Suppose you have a test that's 90% accurate in identifying both sociopaths and non-sociopaths. If you assume that 4% of people are sociopaths, then the chance of someone who tests positive actually being a sociopath is 26%. (For every thousand people tested, 90% of the 40 sociopaths will test positive, but so will 10% of the 960 non-sociopaths.) You have to postulate a test with an amazing 99% accuracy -- only a 1% false positive rate -- even to have an 80% chance of someone testing positive actually being a sociopath.
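(For reference, a quick Python sketch of the arithmetic in that quote, using nothing beyond the 90% accuracy and 4% base rate figures Schneier gives:)

    # 90% accurate test, 4% assumed base rate of sociopaths
    accuracy = 0.90
    base_rate = 0.04

    true_positive = accuracy * base_rate               # 0.036 of the population
    false_positive = (1 - accuracy) * (1 - base_rate)  # 0.096 of the population

    # chance that someone who tests positive really is a sociopath
    print(true_positive / (true_positive + false_positive))  # ~0.27, about 1 in 4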

Interestingly, here he uses percentages to describe base rates and risk. Gerd Gigerenzer has a nice book, Reckoning with Risk, where he explains with many examples the problems of this approach. Gerd asks people to use plain whole-number counts instead, which are much easier for most people to understand.

Thus, Schneier's example becomes:

> Out of 1,000 people, about 40 will be sociopaths. You have a test that will tell you if someone is, or is not, a sociopath. The test will be correct 9 times out of 10. Bob has taken the test, and has been identified as a possible sociopath. The chance that Bob actually is a sociopath is about 1 in 4. This is because the test will correctly tell you that 36 of the 40 sociopaths are sociopaths, but it will also incorrectly tell you that 96 non-sociopaths are sociopaths.

My writing is lousy, and other people will be able to clean this up, but even with my poor writing style it's easier for most people to follow and understand than the percentages.
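Here's a minimal sketch of the same arithmetic done with whole counts instead of percentages, in the spirit of the rewrite above (the 1,000-person framing is the one from the quote):

    # same numbers, but as counts out of 1,000 people
    people = 1000
    sociopaths = people * 4 // 100            # 40
    non_sociopaths = people - sociopaths      # 960

    flagged_right = sociopaths * 9 // 10      # 36 of the 40 sociopaths test positive
    flagged_wrong = non_sociopaths * 1 // 10  # 96 of the 960 non-sociopaths also test positive

    print(flagged_right, "of the", flagged_right + flagged_wrong, "people flagged are sociopaths")
    # -> 36 of the 132 people flagged are sociopaths, i.e. about 1 in 4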

This is alarmingly important when you're making a health decision - "Should I remove my breasts to reduce my risk of breast cancer?" for example.

(http://www.amazon.com/Reckoning-Risk-Learning-Live-Uncertain...)

EDIT: I use "sociopath" because it's in the source article. I agree with NNQ that it's very troubling to bandy around diagnostic labels like this, and deem people to be dangerous, just because of a tentative probabilistic diagnosis.




This is a pretty bad example to illustrate the fallacy, because a 25% hit rate is actually extremely good. I don't know how useful a test for sociopathy would be, but if we had a test this good at identifying terrorists, it would be incredibly useful. If signals intelligence could produce a list of people and guarantee that a quarter of the people on that list are terrorists, it would absolutely revolutionize law enforcement.


This issue has nothing to do with the efficacy of the tests. In a perfect world having a list where 25% of the names are potential trouble-makers (for various definitions of trouble-maker) would be an enormous benefit and allow resources to be targeted more efficiently. The real problem is what happens in the imperfect world, where law enforcement, government, and self-appointed officers of authority get lazy or downright malevolent. Do you really want to live in a world where your family members might be carted off, never to return, because their name came up on a list generated by a computer? Try asking the folks in North Korea, for example, whether this type of test is a good idea or not.


The best outcome would be for these kinds of traits to be passively detected and for the community to provide help and support as an emergent property of that community. In a way this already happens on sites like Reddit, where suicide prevention emerged organically based on the dynamics of the community. This is in stark contrast to the "real world".

The internet is well known as a negative influence on certain people, but couldn't it also be having a positive effect, one that is harder to measure and more of an unintended side effect?


Yes, but what happens when your community finds out that you no longer believe in God, or that you think the Earth actually revolves around the Sun? It's not when things go right but when things go wrong that matters. The points of failure for this type of system are innumerable.

The real goal is building social/political systems that are robust and have checks and balances so that they cannot be perverted by special interests and are accessible to those who need them (child abuse support lines are a good example). Anything where a group intervenes on behalf of an individual is prone to disaster.


The system you describe is exactly what we have in the world now. The financial system has numerous checks and balances and is notoriously prone to non-virtuous behavior. Aren't Hacker News and Stack Exchange examples of creating virtuous behavior using an algorithm and a strong community? Such a community is self-regulating because if you suddenly have a deep opposition to its ethos you can just leave (slashdot -> digg -> reddit).


You can 'just leave' pseudonymous communities because their aggregated judgment doesn't follow you to the next one. But if one's real identity is flagged as "(likely) sociopath" on a Real Name service, how does one 'just leave' that determination behind?

Are search engines and archives going to all willingly 'forget' that data when you 'just leave' Facebook? Are they going to not aggregate and correlate it to any new service you join?

This is one of the huge points of criticism of Real-Name-required services: a person can never escape an unjust judgment of such communities, due to the long memory of the internet.


"We're here to help you. And watch you succeed, friend!"


> In a perfect world having a list where 25% of the names are potential trouble-makers [...]

... would be pointless, as a perfect world would have no concept of "trouble".


"in a perfect world" is an English phrase. It means, in this context, "if this worked perfectly".

It is not a statement about the world in general.


You say idiom, I say chronic lack of imagination.


> and guarantee that a quarter of the people on that list are terrorists,

Honest question, what would happen to the other three quarters?


They'd end up on a no-fly list and be deprived of the basic liberty to travel, with no right to appeal the secret courts and determinations. Or they'd get invasive checks even though they are a kid, or in a collapsible wheelchair and obviously not a terrorist.

All because people can't understand the example in question, which appears in the first few chapters of most introduction-to-statistics books. And while all that money is being spent on useless checks, the 9/11 terrorists, who the agencies were warned about, and the Boston bombers, who the agencies were ALSO warned about, are not followed up on, because human and other resources are being spent on mass surveillance.


They'd be subject to background checks and surveillance that they ideally would not even notice - yes, I'm aware that we don't live in an ideal world, but the point is that traditional "leg work" policing is pretty good at determining whether a suspect is actually engaged in nefarious activities - but it's expensive and requires a reasonably narrow list of suspects to begin with.


Most likely nothing. It depends on what sort of test it is; if it's something non-intuitive like (say) a habit of writing sentences that always have a prime number of words in them, you'll get your false positives, but most of those people won't pass any other tests, whereas the actual terrorists will.

What Schneier is missing is that while you can't ID people that well from a single test, you can apply a bunch of them. In his example, one test improves the probability of a correct sociopath ID from 4% to about 27%. Apply another, different test of similar efficacy to that result set and you'll be left with roughly 32 true positives and 9 or 10 false positives, raising the probability of a successful ID from about 1 in 4 to around 3 in 4. Sure, there's no single test that will give you reliable answers, but so what? It's OK to use a multi-pronged solution.
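A rough sketch of what chaining tests looks like with the numbers from this thread (this assumes the tests really are independent, which is the big "if"):

    # counts per 1,000 people; each round applies another (assumed independent)
    # test that catches 90% of sociopaths and wrongly flags 10% of everyone else
    def apply_test(true_pos, false_pos, hit_rate=0.9, false_alarm_rate=0.1):
        return true_pos * hit_rate, false_pos * false_alarm_rate

    tp, fp = 40.0, 960.0  # before any test: 40 sociopaths, 960 non-sociopaths
    for n in (1, 2):
        tp, fp = apply_test(tp, fp)
        print(f"after test {n}: {tp:.1f} true vs {fp:.1f} false positives"
              f" = {100 * tp / (tp + fp):.0f}% precision")
    # after test 1: 36.0 true vs 96.0 false positives = 27% precision
    # after test 2: 32.4 true vs 9.6 false positives = 77% precision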


Applying multiple tests only works if the tests are independent. When you're searching for the proverbial needle in the haystack, you probably don't have enough needles to let you reliably calibrate several independent tests in the first place.


Waiting until someone actually commits a crime provides a list that is 100% accurate.


I can just see the Daily Mail headline now:

"Facebook records reveal convicted killer wrote 13-word post 5 years ago - red flag was raised - why was nothing done?"


No, it doesn't. People commit crimes and get away with them all the time, in fact. I'm not proposing that we put people in jail for having criminal potential.


Nothing that is being proposed will stop people getting away with crimes.

Waiting until someone actually commits a crime will stop people being persecuted for a coincidental similarity of their behavior to that of a terrorist, sociopath or mime artist.


I think the fact that in this hypothetical 4% of all people are actually terrorists, and the potential terrorist list would be about 13% of the total population, would have a greater impact on law enforcement than the accuracy of the test.

Even the 0.4% of the population who get tested and are incorrectly "proved" innocent of being a terrorist amounts to more than a million undetectable terrorists in the US alone.


Do I understand you correctly?

You're saying that you would be happy to join 74 other non-terrorists (i.e. law-abiding citizens) plus 25 actual terrorists and be taken off to Guantanamo Bay indefinitely?

You're really sure about that being a Good Thing for law enforcement?


He got 25% starting from a base rate of 4%. The base rate for terrorists is a little lower than that. I agree that the essay probably ought to emphasize this point so skimmers won't take away the wrong idea.


I don’t pretend to know what the value of N ought to be[1], but 1/3 is ridiculously low.

[1] http://www2.law.ucla.edu/volokh/guilty.htm


I found his version clearer because "%" distinguishes proportions from quantities.


Yes, but imagine you're taking the test.

Suppose you have a test that's 90% accurate in identifying both people with X and people without X, assume that 4% of people have X, and suppose you're told that you test positive for X.

Do you really find it easy to arrive at your actual chance (26%) of having X? Let's not forget that most people on HN are at the smarter end of the bell curve. It'd be interesting to see the results of a large scale study about answers to questions like this.


I think you're misreading my comment. When I said "%" distinguishes proportions from quantities, I was implying there'd be both proportions and quantities (as his version has). Otherwise, they needn't be distinguished.

When I said I found his version clearer, I meant between the two versions originally given. The one you've just added is of course less clear because unlike the other two, it doesn't point out the issue.

BTW: maybe it is clearer for calculation, rather than understanding, because I get 27.(27)%, not 26%... https://www.google.com/search?q=%28.9*.04%29/%28.9*.04%2B.1*...


Not sure how to calculate this with percents:

true positives: .90 * .04 = 0.036
false positives: .10 * .96 = 0.096

total positives: 0.132

positives that are true positives: 0.036 / 0.132 = 0.2727...

I had to think about the calculation as I was doing it; it wasn't automatic even though it was just multiplication. But I think the difficulty has more to do with the fact that you have to use some relative of Bayesian probability than with the fact that you had to deal with percentages.
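For what it's worth, redoing it with exact fractions (a quick sketch, same 90% / 4% figures) shows where the 27.(27)% mentioned above comes from:

    from fractions import Fraction

    accuracy = Fraction(9, 10)
    base_rate = Fraction(4, 100)

    true_pos = accuracy * base_rate               # 9/250
    false_pos = (1 - accuracy) * (1 - base_rate)  # 12/125

    posterior = true_pos / (true_pos + false_pos)
    print(posterior, float(posterior))  # 3/11 0.2727... -- the exact value behind 27.(27)%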


I vote for just using real numbers instead of percentages, like "0.4 chance" instead of "40% chance". The reason is that a lot of people get the math of percentages wrong simply because it involves a lot of back and forth mental conversion between percentages and fractional representation. I always found it easier to just use the latter.


To be honest, with these numbers the percentages actually do a better job of giving me the impression that this test is worthless. 25% sounds much worse than 1 in 4.


Once you get to 25% that's true.

But today try to ask a few people around you, and see what they say.

> Suppose you have a test that's 90% accurate in identifying people who have a disease, and 90% accurate in identifying people who do not have the disease. Assume that 4% of people have this disease. Hypothetical_Bob is tested, and the test says that he has the disease. What are the chances that Bob actually does have the disease?

Lots of people - smart people too! - struggle with this. Even if you give them pencil and paper and let them doodle around they will often give you an incorrect number. And most of them will be surprised if you tell them it's as low as 26%.


Ah yes, I see what you're saying. Real numbers are easier to reason with, and to get correct results from, than percentages.

I think my point is slightly orthogonal since I misunderstood you; if you tell someone that something is "10%" they will think "that is pretty bad" whereas "1 in 10" is more likely to get a "hey, that's not too shabby" response. Percentages sound "worse" than numbers, even when they are the same (at least to me). Perhaps because they are harder to reason with?



