> The problem isn't just that such a system is wrong, it's that the mathematics of testing makes this sort of thing pretty ineffective in practice. It's called the "base rate fallacy." Suppose you have a test that's 90% accurate in identifying both sociopaths and non-sociopaths. If you assume that 4% of people are sociopaths, then the chance of someone who tests positive actually being a sociopath is 26%. (For every thousand people tested, 90% of the 40 sociopaths will test positive, but so will 10% of the 960 non-sociopaths.) You have to postulate a test with an amazing 99% accuracy -- only a 1% false positive rate -- even to have an 80% chance of someone testing positive actually being a sociopath.
Interestingly here he uses percentages to describe base rates and risk. Gerd Gigerenzer has a nice book, Reckoning with Risk, where he explains with many examples the problems of this approach. Gerd asks people to use real numbers instead, which are much easier to understand for most people.
Thus, Schneier's example becomes:
> Out of 1,000 people about 40 will be sociopaths. You have a test that will tell you if someone is, or is not, a sociopath. The test will be correct 9 times out of 10. Bob has taken the test, and has been identified as a possible sociopath. The chance that Bob is actually a sociopath is about 1 in 4. This is because the test will tell you that 36 of the 40 sociopaths are sociopaths, but it will also incorrectly tell you that 96 non-sociopaths are sociopaths.
My writing is lousy, and other people will be able to clean this up, but even with my poor writing style it's easier for most people to follow and understand than the percentages.
This is alarmingly important when you're making a health decision - "Should I remove my breasts to reduce my risk of breast cancer?" for example.
EDIT: I use "sociopath" because it's in the source article. I agree with NNQ that it's very troubling to bandy around diagnostic labels like this, and deem people to be dangerous, just because of a tentative probabilistic diagnosis.
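The head counts in the reworded example above can be checked with a minimal Python sketch. The function names `positive_counts` and `chance_given_positive` are made up for illustration, and the sketch assumes, as the quoted example does, that a single accuracy figure applies to both sociopaths and non-sociopaths.

```python
def positive_counts(population, base_rate, accuracy):
    """Return (true_positives, false_positives) as expected head counts.

    Assumes one accuracy figure applies to both groups, as in the
    quoted example; real tests usually differ between the two groups.
    """
    affected = population * base_rate
    unaffected = population - affected
    true_positives = affected * accuracy            # correctly flagged
    false_positives = unaffected * (1 - accuracy)   # wrongly flagged
    return true_positives, false_positives


def chance_given_positive(population, base_rate, accuracy):
    """Share of all positive results that are true positives."""
    tp, fp = positive_counts(population, base_rate, accuracy)
    return tp / (tp + fp)


tp, fp = positive_counts(1000, 0.04, 0.90)
print(f"{tp:.0f} of the 40 sociopaths test positive")
print(f"{fp:.0f} of the 960 non-sociopaths test positive")
print(f"chance a positive is real: {chance_given_positive(1000, 0.04, 0.90):.0%}")
print(f"with a 99% accurate test:  {chance_given_positive(1000, 0.04, 0.99):.0%}")
```

With these inputs the share of real positives is 36 out of 132, a bit over one in four, and the 99% case lands close to the 80% figure in the quote.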
This is a pretty bad example to illustrate the fallacy because a 25% confidence is actually extremely good. I don't know what good a test for sociopathy is, but if we had a test this good at identifying terrorists, it would be incredibly useful. If signals intelligence could produce a list of people and guarantee that a quarter of the people on that list are terrorists, it would absolutely revolutionize law enforcement.
This issue has nothing to do with the efficacy of the tests. In a perfect world having a list where 25% of the names are potential trouble-makers (for various definitions of trouble-maker) would be an enormous benefit and allow resources to be targeted more efficiently. The real problem is what happens in the imperfect world where law enforcement, government, and self-appointed officers of authority get lazy or downright malevolent. Do you really want to live in a world where your family members might be carted off, never to return, because their name came up on a list generated by a computer? Try asking the folks in North Korea, for example, whether this type of test is a good idea or not.
The best outcome would be for these kinds of traits to be passively detected and for the community to provide help and support as an emergent property of that community. In a way this already happens on sites like Reddit where suicide prevention emerged organically based on the dynamics of the community. This is in stark contrast to the "real world".
The internet is well known as a negative influence on certain people, but couldn't it be having a positive effect that is harder to measure and more of an unintended side effect?
Yes, but what happens when your community finds out that you no longer believe in God or that you think the Earth actually revolves around the Sun? It's not when things go right but when things go wrong that matters. The points of failure for this type of system are innumerable.
The real goal is building social/political systems that are robust and have checks and balances so that they cannot be perverted by special interests and are accessible to those who need them (child abuse support lines are a good example). Anything where a group intervenes on behalf of an individual is prone to disaster.
The system you describe is exactly what we have in the world now. The financial system has numerous checks and balances and is notoriously prone to non-virtuous behavior. Aren't hacker news or stackexchange examples of creating virtuous behavior using an algorithm, and a strong community? It is self regulating because if you suddenly have a deep opposition to the ethos of a community you can just leave (slashdot -> digg -> reddit).
You can 'just leave' pseudonymous communities because their aggregated judgment doesn't follow you to the next one. But if one's real identity is flagged as "(likely) sociopath" on a Real Name service, how does one 'just leave' that determination behind?
Are search engines and archives going to all willingly 'forget' that data when you 'just leave' Facebook? Are they going to not aggregate and correlate it to any new service you join?
This is one of the huge points of criticism of Real-Name-required services: a person can never escape an unjust judgment of such communities, due to the long memory of the internet.
They'd end up on a no fly list and be deprived of basic liberties to travel with no right to appeal secret courts and determinations. Or they'd get invasive checks even though they are a kid, or in a collapsible wheelchair and obviously not a terrorist.
All because people can't understand the example in question, which appears in the first few chapters of most introduction to statistics books. And while all that money is being spent on useless checks the 9/11 terrorists, who the agencies were warned about, and the Boston bombers, who the agencies were ALSO warned about, are not followed up on because human and other resources are being spent on mass surveillance.
They'd be subject to background checks and surveillance that they ideally would not even notice - yes, I'm aware that we don't live in an ideal world, but the point is that traditional "leg work" policing is pretty good at determining whether a suspect is actually engaged in nefarious activities - but it's expensive and requires a reasonably narrow list of suspects to begin with.
Most likely nothing. It depends on what sort of test it is; if it's something non-intuitive like (say) a habit of writing sentences that always have a prime number of words in them, you'll get your false positives but most of those people won't pass any other tests, whereas the actual terrorists will.
What Schneier is missing is that while you can't ID people that well from a single test, you can apply a bunch of them. In his example, one test improves the probability of correctly identifying a sociopath from 4% to roughly 27%. Apply another, different test of similar efficacy to that result set and you'll have a population of about 32 true positives and 9 or 10 false positives, increasing the probability of a successful ID from roughly 27% to ~77%. Sure, there's no single test that will give you reliable answers, but so what? It's OK to use a multi-pronged solution.
Applying multiple tests only works if the tests are independent. When you're searching for the proverbial needle in the haystack, you probably don't have enough needles to let you reliably calibrate several independent tests in the first place.
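To make the independence caveat concrete, here is a small Python sketch of chaining two positive results. The helper `update` is hypothetical, and the whole calculation rests on the assumption that the second test's errors are independent of the first's; if the tests are correlated, the second positive adds less than shown here.

```python
def update(prior, accuracy):
    """Posterior probability after one positive result (Bayes' rule).

    Assumes the test is right with probability `accuracy` for both
    groups, and that this result is independent of any earlier test
    given the person's true status.
    """
    true_pos = prior * accuracy
    false_pos = (1 - prior) * (1 - accuracy)
    return true_pos / (true_pos + false_pos)


base_rate = 0.04
after_one = update(base_rate, 0.90)   # one positive result
after_two = update(after_one, 0.90)   # a second, independent positive
print(f"after one positive:  {after_one:.0%}")
print(f"after two positives: {after_two:.0%}")
```

With a 4% base rate and two independent 90% tests, that works out to about 32 true positives and about 10 false positives per 1,000 people tested. The cost is that only about 32 of the 40 real cases survive both tests.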
No, it doesn't. People commit crimes and get away with them all the time, in fact. I'm not proposing that we put people in jail for having criminal potential.
Nothing that is being proposed will stop people getting away with crimes.
Waiting until someone actually commits a crime will stop people being persecuted for a coincidental similarity of their behavior to that of a terrorist, sociopath or mime artist.
I think the fact that in this hypothetical 4% of all people are actually terrorists, and the potential terrorist list would be 10% of the total population, would have a greater impact on law enforcement than the accuracy of the test.
Even 0.4% of the population that get tested and are incorrectly "proved" innocent of being a terrorist amounts to more than a million undetectable terrorists in the US alone.
You're saying that you would be happy to join 74 other non-terrorists (i.e. law-abiding citizens) plus 25 actual terrorists and be taken off to Guantanamo Bay indefinitely?
You're really sure about that being a Good Thing for law enforcement?
He got 25% starting from a base rate of 4%. The base rate for terrorists is a little lower than that. I agree that the essay probably ought to emphasize this point so skimmers won't take away the wrong idea.
Suppose you have a test that's 90% accurate in identifying both people with X and people without X. Assume that 4% of people have X, and you're told that you have tested positive for X.
Do you really find it easy to arrive at your actual chance (26%) of having X? Let's not forget that most people on HN are at the smarter end of the bell curve. It'd be interesting to see the results of a large scale study about answers to questions like this.
I think you're misreading my comment. When I said "%" distinguishes proportions from quantities, I was implying there'd be both proportions and quantities (as his version has). Otherwise, they needn't be distinguished.
When I said I found his version clearer, I meant between the two versions originally given. The one you've just added is of course less clear because unlike the other two, it doesn't point out the issue.
Positives that are true positives: 0.036 / 0.132 = 0.2727...
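For anyone retracing that division, here is one way to write it out with Bayes' theorem. The symbols are my own shorthand rather than anything from the thread: $S$ means "is a sociopath" and $+$ means "tests positive", with the 4% base rate and 90% accuracy from the example.

```latex
P(S \mid +)
  = \frac{P(+ \mid S)\,P(S)}{P(+ \mid S)\,P(S) + P(+ \mid \neg S)\,P(\neg S)}
  = \frac{0.9 \times 0.04}{0.9 \times 0.04 + 0.1 \times 0.96}
  = \frac{0.036}{0.132}
  \approx 0.2727
```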
I had to think about the calculation as I was doing it; it wasn't automatic, even though it was just multiplication. But I think the difficulty has more to do with the fact that you have to use some relative of Bayesian probability, not really the fact that you had to deal with percentages.
I vote for just using real numbers instead of percentages, like "0.4 chance" instead of "40% chance". The reason is that a lot of people get the math of percentages wrong simply because it involves a lot of back and forth mental conversion between percentages and fractional representation. I always found it easier to just use the latter.
To be honest, with these numbers the percentages actually do a better job of giving me the impression that this test is worthless. 25% sounds much worse than 1 in 4.
But today try to ask a few people around you, and see what they say.
> Suppose you have a test that's 90% accurate in identifying people who have a disease, and 90% accurate in identifying people who do not have the disease. Assume that 4% of people have this disease. Hypothetical_Bob is tested, and the test says that he has the disease. What are the chances that Bob actually does have the disease?
Lots of people - smart people too! - struggle with this. Even if you give them pencil and paper and let them doodle around they will often give you an incorrect number. And most of them will be surprised if you tell them it's as low as 26%.
Ah yes, I see what you're saying. Real numbers are easier to reason with and get correct results than percentages.
I think my point is slightly orthogonal since I misunderstood you; if you tell someone that something is "10%" they will think "that is pretty bad" whereas "1 in 10" is more likely to get a "hey, that's not too shabby" response. Percentages sound "worse" than numbers, even when they are the same (at least to me). Perhaps because they are harder to reason with?
The "fallacy" described here is a non-issue. Medical tests have the same characteristics but are still incredibly useful. HIV testing is roughly 99% accurate, but given the large number of tests performed, a very large percentage of people will end up with the wrong result. And yet this test is vital.
Even wildly inaccurate tests can be useful. Imagine that driving drunk will result in an accident 10% of the time. 90% of the time, however, a driver will make it home safely. This is a wildly inaccurate predictor, but it is still critical to know someone's blood alcohol level before giving them their keys.
One must simply be aware of the uncertainty involved in any test, and treat test results as probabilistic signal, not as proof.
I tend to agree with you, except it is not a non issue.
Look at the numbers of people getting very serious medical treatment because they, and their clinicians, have not understood the numbers.
If doctors (well-educated, intelligent people) cannot get this right, I'm scared that labelling someone as "POTENTIAL TERRORIST" on the basis of a 1 in X possibility is going to have disastrous consequences.
Yes, this is an active area of debate in evidence-based medicine (EBM). If you aggregate outcomes, administering some tests actually appears to worsen prognosis, in the sense that if you take two groups of people with identical distributions of (unknown) conditions, and test one group while not testing the other, the tested group has worse overall outcomes. For example, in some cases people have surgery for a condition that, absent the test, would have remained asymptomatic and benignly ignored. With certain kinds of conditions, negative outcomes from retrospectively unnecessary treatment are frequent enough to outweigh the cases where discovery and treatment improves outcomes, if we're talking about aggregate outcomes.
Of course, discovery does not require treatment, so you could test, find a positive, and not do anything. But EBM people tend to view idealized responses with suspicion, and some argue that taken in real-world conditions, not administering certain tests, or administering them in more restricted situations, or at least not recommending them as the default, would improve aggregate outcomes (and the data seems to support that). They would then restrict the tests to cases where testing statistically improves outcomes.
I once tested positive for HIV with the basic antibody test you mention - a pretty stunning result in view of my lifestyle. A followup Western blot test with higher accuracy confirmed I was not positive. The basic test is fast and cheap but if there are no more accurate tests to detect false positives, it's a dangerous thing to rely on.
HIV testing is largely a fraud, though, as its conclusions are based on unscientific claims that HIV and AIDS are the same thing or even scientifically proven to be correlated.
The field is laughable, really. It's sad that people making important decisions in the medical and pharmaceutical fields cannot correctly interpret a confusion matrix, or if they can they are corrupt and decide to ignore it anyway.
You really don't want to apply the same standards to law enforcement, as they are appalling.
EDIT: to the downvoters, I'd suggest doing some reading. In particular, check on the existence of any verifiable testing that proves any semblance of correlation between HIV and AIDS. The standard is appalling for any scientist who's willing to check on the numbers. In general, the standard in medical science is rather low compared to say physics, but in this case it's astonishingly low considering the accolades Montagnier got for his research. I understand it's easy to dismiss this as conspiranoia but it really is not. There are dubious claims passed as truths with very specific interests behind. Not making an assertion either way on the HIV<->AIDS relationship, just pointing out that the standard of the research is shocking and the statistics skills of the people involved are incredibly poor, if not intentionally corrupt.
This is even more pathetic than creationism. This is on par with insisting that the earth is flat. Denial is a hell of a drug.
Using condoms never killed anybody. Stay safe. And if you personally refuse therapy, OK. But HIV does cause AIDS and has a one hundred percent lethality. In case you are affected, use the remaining time.
You are making a series of stupid assumptions here. Like I'm denying that AIDS kills, or that STDs kill. It's clearly established. Which is why I've always been extremely cautious and I'm healthy as it gets.
The fact that AIDS is virus-induced, that remains unproven. However, it's pandered as such and a number of companies are cashing big on antivirals and antiretrovirals, in many cases worsening the patient's health. To claim something is proven without proof, that is pathetic.
And this is not the only scam induced by big pharma. There are many other perverse effects stemming directly from the fact that pharmaceuticals cash big on chronic illnesses, giving them an incentive to research not in cures, but in long lasting treatments.
Feel free to downvote away though. It's important and downvotes might actually get more people to read it.
Being "positive on a sociopathic personality test" does not equal being "a potentially dangerous pathological sociopath"!
Even if you get a system good enough to overcome the base rate problem, you'll only end up labeling a bunch of mostly harmless people. Think about a hypothetical uber-villain that would want to recruit children or teenagers, brainwash them and turn them into assassins or other kinds of agents. He may find out that people with some sociopathic traits are better candidates for this, so he will target them. Now think the uber-villain is, uhm... (working for) your government :)
...not to mention the mislabeling of people with atypical social interaction patterns, like ones with mild/pseudo aspies which combined with the base rate fallacy brings serious mislabeling.
I'm sure this kind of electronic-psycho-profiling is already in use, and I even think it may have interesting side-benefits, like cool work being done in AI research for use in this (no better way to start "humanizing" an AI than to have it model human personalities and predict their actions), but there are tons of things that can go south with it for lots of innocent people that just happen to be "different" (like most people who end up making breakthrough discoveries or world changing inventions, you know...).
Agreed. And perhaps the notion of "dangerous sociopath" can be revised altogether. I'm no expert but in my understanding sociopathy is more about non-empathic social behavior. It can be felt as creepy and depicted as violent in fiction, but that doesn't make it a dangerous behavior in itself, and it can actually be socially rewarded.
It can be argued that a lack of empathy can lead to more violent behavior, but violent acts coming from deep personal relationships are a dime a dozen as well, who knows.
The problem here is that people are using a term that has a long and colorful history in a very imprecise manner. Sociopath hasn't been in the literature for years, but people still seem to use it interchangeably to speak of antisocial behavior and/or psychopathic traits which are not at all interchangeable.
The DSM doesn't even list psychopathy as a diagnosis any more, only antisocial personality disorder which requires a history of, and we might as well just quote the DSM "... a pervasive pattern of disregard for, and violation of, the rights of others..." So yes, a diagnosis of ASPD does generally indicate someone who could be considered dangerous to others.
Now if we're talking about someone with psychopathic traits that scores high on a Hare, then no it does not necessarily indicate dangerous behavior. However, a Hare is still important when dealing with criminals as you do not want to give a psychopath treatment as it simply makes them better criminals.
> you do not want to give a psychopath treatment as it simply makes them better criminals.
What do you mean by "treatment" in this context? I thought that caught psychopathic criminals can end up in facilities for the criminally insane in most developed countries and are treated there, or at least an attempt is made.
There is a large base of research on this problem for the US military (and CCCP etc). i.e. How to discern effective killers within troops, how to shape them and how to 'uplift' the usual grunts to effectively kill.
I suggest you research "natural killers" from 1946 onwards (famous paper starts the ball rolling there - Combat Neuroses / Fatigue is a tip). Their percentage increase in producing effective killers is impressive. I won't link to specific papers, since it's outside the remit of H.N., however a lot of hard science & tech has been brought to bear on the issue.
The flip-side of this, the mirror to a sociopath (let's call it "the empathetically linked / driven") makes just as an effective killer for the record. If not better.
Tl;dr
You're about 70 years too late to pioneer this field. However, in looking @ the current trend to map Autism onto AI networks as a model, I'd suggest you'd probably have more joy looking at the empathetically linked individuals to see how their strong network connections / protective instincts are harnessed if I were to build a neural Map (weak) AI.
Bruce has a great point about how people overreact due to the base rate fallacy, but I'm afraid that it will forever be too subtle for legislators and judges.
My immediate thought is that if the base rate fallacy were part of their education, society would be better off, but legislators still have to play to their constituents and it's hard to have hope in that sphere.
I'm expecting social networks to be used 'voluntarily' for insurance soon. Likely the no social network quote will be higher, because it's higher risk because it's based on less information.
So you get a quote through their app, which uses your likes, your social network, and maybe some NLP on your posts to decide if you qualify for a lower quote.
Insurance people in the UK tell me this would be very useful, but probably isn't possible from a legal perspective, but there will be other countries where it is.
An effect of this would be to penalize people who don't use social networks.
I doubt that 'normal people' would be able to commit terrorist attacks because normal people experience empathy which would prevent them from hurting people voluntarily (even soldiers who act under strict orders in a war that is perceived as just are often traumatized by hurting/killing an enemy combatant... and this would be much more severe if victims were known to include innocent people or even children).
The problem is that there are always enough people who are not normal (there is a great book about it by Erich Fromm: The Anatomy of Human Destructiveness) and it's possible to 'make' people into something not normal in the psychological sense - i.e. suppress their empathy for a certain group of people by some war trauma or conditioning/brainwashing.
BTW, I think that it's a myth that even bad men love their mommas. It depends how you define love. From what I remember based on one psychopath that I know really well he claims that he loves his mother and maybe he even thinks he feels something like that but he acts in such a way that his mother often gets hurt by his actions and he acts with complete disregard of that. Not love in my book.
Here's the real problem: suppose you have a bunch of data on people and are convinced that somewhere in the data exists the spectrum from sociopath to non-sociopath.
Who is going to label the data with ground truth? Clinicians who "know it when they see it"? What is the ground truth that the classifier is going to train on?
If you're going to do an unsupervised classifier (eg clustering) who is going to label the clusters? What is going to keep the data from turning into uncorrelated mush?
Two words: Bayes' theorem. I think I learned this in the fourth or fifth week of my first probability course. Thought it was a bit more commonly known.
I think the word "sociopath", like "nihilist" and to a lesser extent "traitor", never had much meaning outside of "someone whose characteristics or actions I do not like".
So I question the basis on which Mr. Adams -- whose works some might consider a sociopathic attack on American business practices, and by extension, capitalism -- thinks content analysis is a good idea.
Accuracy is known to be an unreliable metric for measuring the quality of a test (classifier). One paradox is that a test with higher accuracy might have less predictive power than one with lower accuracy.
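A toy illustration of that paradox, reusing the thread's 4% base rate: a "test" that never flags anyone scores higher on accuracy than the 90% test, yet finds nobody. This is only a sketch; the population, the label lists and the helper `evaluate` are all made up for the example.

```python
def evaluate(predictions, truth):
    """Return (accuracy, true positives found) for 0/1 label lists."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    caught = sum(p == 1 and t == 1 for p, t in zip(predictions, truth))
    return correct / len(truth), caught


# Hypothetical population: 1,000 people, 40 of them with the condition.
truth = [1] * 40 + [0] * 960

# Test A never flags anyone.
always_negative = [0] * 1000

# Test B is right 90% of the time in each group
# (36 of 40 flagged correctly, 96 of 960 flagged wrongly).
ninety_percent = [1] * 36 + [0] * 4 + [1] * 96 + [0] * 864

acc_a, caught_a = evaluate(always_negative, truth)
acc_b, caught_b = evaluate(ninety_percent, truth)
print(f"always negative: accuracy {acc_a:.0%}, finds {caught_a} of 40")
print(f"90% test:        accuracy {acc_b:.0%}, finds {caught_b} of 40")
```

Test A is 96% accurate simply because 96% of the population is negative, which is why figures such as the true positive rate and the share of positives that are real say more than accuracy alone.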
In other news psychiatry is bullshit and I would say psychology is not far behind. The whole point of society is that we don't assume people are a certain way until they harm someone. If we no longer do that we're no different from any of the other totalitarian regimes and religions that prescribe their own flawed dogma about what constitutes a flawed person deserving to be punished. Any such civilization should be overthrown because it runs counter to all ideals of the enlightenment.
Perhaps you’re thinking of people who are antisocial? I wouldn’t be surprised if some of the Facebook members who are most active and have the most ‘friends’ are sociopaths.
I’m not talking about crime. Most sociopaths never commit any crimes and it’s not illegal to be a sociopath. That would be silly, like jailing someone for having red hair. Also, it’s quite possible that there are more sociopaths than there are people with red hair – estimates range from 1% to 4% of the population.
> "antisocial" and "sociopathy" are roughly the same thing.
When you look them up in the DSM (the big book of psychological disorders), it will seem that way. However, you can be antisocial or possess sociopathic traits without having a disorder. In general dictionaries, ’antisocial’ is synonymous to ’unsociable’. That’s how it’s most often used and that’s how I meant it.
> Perhaps you're thinking of people who are non-social?
I had never heard of that word. After Duckducking it, it seems to me that no one uses it. If I were to use another word, I’d probably go for ’asocial’.
Think of some of the most successful people you know: lots of friends, charming and charismatic. Seem to do no work but get to the top anyway? Always in press releases and newspapers while the rest of the team gets no recognition? Good chance they're a sociopath. Sociopaths also tend to be good at getting funding for start ups ;)
(http://www.amazon.com/Reckoning-Risk-Learning-Live-Uncertain...)