Humans operating lie-detecting machines in personal interviews have yet to rise above the standard of 'mostly used to bullshit and intimidate suspects', but these clowns think they can use facial recognition to detect lying?
What a huge scam. I don't know what conman dreamt this up, but to have convinced multiple governments to go along with it, he has to be good.
Maybe it's time to resurrect the Canadian Fruit Machine project and get some idiot in the EU government to finance it. Except instead of detecting 'gayness' it detects 'Terrorist thoughts'.
https://en.wikipedia.org/wiki/Fruit_machine_(homosexuality_t...
I am so confused. There seems to be a blatantly obvious method to find out how well this machine works, no anecdotes or trust required: you get a decent number of test subjects to dry-run it in a controlled environment.
Has this not been done? If not, why? Or am I wrong in believing it's that simple?
But let's be honest here: we SHOULD use this approach for ALL of these kinds of assessment, yet we don't. Even in the US, we use things like fingerprint matching, handwriting analysis, drug testing laboratories, DNA analysis, dog sniffs for drugs and so forth in legal cases yet we don't perform independent blinded tests of the accuracy of these tools.
It's driven by two things: the public perception that all bad guys fly coach (à la 9/11), and governments seeking to expand the number of biometric means to enforce compliance with all sorts of things unrelated to travel. The irony is that many people who oppose a physical border wall on humanitarian grounds don't seem to see the perils of virtual walls... such as when a visited country decides to share back the fingerprints it collected at the border with my home country, despite legal protections against collecting my fingerprints when I'm inside or entering the home country, or any number of other applications that have nothing to do with border protection or everyday travel... unlike a physical barrier, which can do nothing but its intended purpose.
Sorry, I don't mean to make this about borders, but I'm trying to make a point: sometimes when we use tech we end up creating entire new classes of problems worse than the ones we were trying to avoid... maybe the Amish have it right in this regard?
And out of all of those, the only one that actually works is DNA analysis and even that fails from time to time due to how easily samples can be contaminated.
My understanding is that DNA is basically unique, but the patterns used for conventional DNA analysis are not. While they may be statistically close enough to unique to trust over the whole population, there are issues once you start scoping things to particular racial groups or areas with smaller gene pools (not dangerously small, just smaller).
Because actually sequencing and comparing someone's entire DNA would be prohibitively expensive if done for every case, labs just look for a set of markers and assume that a large enough collection is close enough, without accounting for the distribution patterns of those markers AMONG THE SAMPLE POPULATION. (E.g., you don't have to look hard in small towns or in populations of the same heritage to notice a lot of physical characteristics that are quite common in that sub-population while being pretty distinct in the human population at large; the same is true for these markers.) Since crimes often involve suspects from the same location and/or racial profile, a rule that is pretty reliable for the whole earth is not so reliable for this town/community/group of suspects.
So the science is a problem even before you get to human error. The odds of a false match can worsen by multiple orders of magnitude.
Note: I'm not qualified to speak on the topic, but I've followed it with interest at a layman level for a while. I'd be glad to hear from someone who IS qualified to speak on it and shed more light on why I'm right/wrong.
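To make the "orders of magnitude" point above concrete, here's a toy back-of-the-envelope sketch (made-up marker frequencies, not real forensic methodology) of how the same marker profile becomes far less distinctive when the suspect pool shares ancestry:

```python
# Toy illustration only: invented marker frequencies, not forensic statistics.

general_population = [0.05] * 10   # 10 markers, each shared by 5% of people worldwide
small_town         = [0.20] * 10   # same markers, but locally much more common

def random_match_probability(freqs):
    """Product rule: chance an unrelated person matches on every marker,
    assuming the markers are independent."""
    p = 1.0
    for f in freqs:
        p *= f
    return p

p_world = random_match_probability(general_population)
p_town  = random_match_probability(small_town)

print(f"false-match odds, whole population: 1 in {1 / p_world:,.0f}")
print(f"false-match odds, same community:   1 in {1 / p_town:,.0f}")
# The gap between those two numbers is the "multiple orders of magnitude"
# mentioned above: a profile that is near-unique globally can be merely
# uncommon within one town or family group.
```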
> My understanding is that DNA is basically unique
It is, in a lab environment with lots of source material.
If you are taking small samples of DNA from terribly dirty places and then amplifying the hell out of it, you get massive amounts of contamination.
Rape kits are "mostly" reliable under most circumstances, but have some problematic edge cases. Swabs from crime scenes, on the other hand, are generally garbage.
> My understanding is that DNA is basically unique, but the patterns used for conventional DNA analysis are not.
Even if they were, unless it's something like blood from an attacker who got hurt, fluids from a rape, etc., it's trivial:
1) for the culprit to take another's DNA and place it on a crime scene
2) for an innocent to leave DNA at what would later be a crime scene
3) for police (or anyone with access) to add someone's DNA at a crime scene [1]
And the worst thing is, the DNA match then is considered "irrefutable" evidence...
[1] If you think (3) is far-fetched, you haven't been following 100 years of police tampering to frame people, even in the USA, and much more in places like Latin America, etc. Just two recent examples:
The point is that the particular techniques used in court cases are often not verified by any scientific testing. The last sentence of the PNAS paper abstract is: "Examiners frequently differed on whether fingerprints were suitable for reaching a conclusion."
"Examiners frequently differed on whether fingerprints were suitable for reaching a conclusion."
is one of those weasel phrases which, while an accurate statement, means absolutely nothing, because, first, "differing" in an opinion means almost nothing, and second, the word "conclusion" is not defined.
Many people assert the Earth is flat and have detailed explanations on why that is the case, yet they "differ" in opinion too.
If fingerprints stood alone as a single point of evidence for guilty/innocent, I might agree. But when it's used primarily as "we have person A who matches closely enough, let's go talk to him", that's useful. Or alternatively, if a fingerprint is one (of many) pieces of evidence that goes into making the case, that's useful. Only the worst cop shows say "the computer matched this guy! It must be him!"
* I worked with the FBI on fingerprint analysis and saw the data, processes, etc., and how a fingerprint is only a data point (aka clue) within the larger case... and sometimes it's not enough, depending on how many points you match.
To the extent they work at all, lie detectors only do so if the person telling the lie has a real reason to lie. Skin in the game, high stakes, and real harm if their deception is uncovered.
Having a controlled environment to test this is really hard. You can't just have someone say, "My mother is 35" when their mother is really 53. That's a silly lie, and can be told with the same level of calmness as the truth.
So how do you design an experiment with "real" liars (telling non-trivial lies), while knowing which participants are lying or not?
You let participants know that they'll receive money for every lie they "get away with", making them feel loss aversion.
This has been used in psychological studies, but its effectiveness is proportional to the cost of doing the study, so it is expensive to get large datasets.
If the machine could catch people lying for $10 (or $50, or ...) in a double blind study, that would set a quantifiable lower bound for its effectiveness.
False positives are more concerning here than false negatives. It should be obvious to everyone that way way more people are "not calm" for reasons other than lying!
Yeah. My "to the extent..." preamble represents my feeling on polygraphs and lie detectors in general.
The entire situation of being strapped into a machine and asked invasive questions is going to be nervous-making. And depending on which questions are asked, and how they're asked, the examiner can easily contaminate the exam and extract a false positive.
That's an interesting point of view. If some hypothetical "bad guys" are trying to set off a series of tactical nuclear weapons, false negatives are actually pretty bad.
(Yes, I actually am a small-l libertarian and have a strong aversion to overbearing and stupidly-designed government programs, which this one certainly seems to be.)
Maybe you could induce something similar by providing the liar with strong incentives / stress. Something like this:
> We're going to hook you up to a state-of-the-art polygraph. We know you've heard that polygraphs are snake oil, but we think we've made a breakthrough on this one, which is why we're running this test. Answer truthfully to all questions except the five on this shortlist. For any of the listed questions, you may choose to lie. If you tell one undetected lie, you get $5. For each additional undetected lie, your payoff is multiplied by 1.9. If any of your lies is detected, you get no money and you get shocked. The intensity of the shock has a similar exponential relationship with the number of lies you told.
If shocking volunteers doesn't get past the ethics committee, then maybe you start off the volunteers with a baseline payout of $50 or so and allow the payoff to both grow and decrease exponentially. The goal being that on that fifth lie, they're really on the edge, knowing that they could either make a lot of money or get {shocked,nothing}.
how about this -> we are going to tell you how to defeat the polygraph, which is easy because it is actually mostly BS, and we will pay you $100K if you successfully deceive the polygraph when the examiner asks you if you have been told how to defeat the test.
Then tell the polygrapher that everyone will lie on that question so they should flag all responses as lies. Then everyone's results can be evaluated by another polygrapher to determine who was lying 'well', with no payout needed.
The purpose of these machines isn’t to actually work as intended. These machines are here for plausible deniability: “we pulled you aside because the machine said so, not because you’re Muslim”.
Or the computer said he was lying about not being a terrorist, so we went to arrest him but he was not cooperative and violently resisted, which of course is what a terrorist would do, and thus we had no choice but to shoot and kill him. In self defense of course and to protect the public. This is OK because the computer determined he was a terrorist and computers don't make mistakes.
Apparently it has been tested and shown to have a statistically significant result. As of November 2018 it was not used, nor planned to be used, for enforcing any border control, but it was tested at three borders in real conditions.
Even a good system would have false positives. In itself, that does not invalidate the whole system, even if I think the whole premise of this project is very shaky.
Being statistically significant is nowhere near a high enough bar for this sort of quackery. Statistical significance is gameable, doesn't carry information about type I and II errors and frankly just means there is at least 1 common case that the machine gets right.
People shouldn't be given or denied opportunity because a magic machine likes them. This machine will be like a human - it will develop a bizarre set of biases and one of them will happen to be correlated with reality. It could be producing about as much evidence as someone guessing that Arabs are Muslim - grossly stupid, statistically significant results.
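To illustrate how little "statistically significant" guarantees here, a minimal sketch with invented numbers (nothing from the actual iBorderCtrl trials): on a large enough evaluation set, a detector that is barely better than a coin flip still produces a tiny p-value.

```python
# Sketch: statistical significance vs. practical usefulness. Numbers invented.
from scipy.stats import binomtest

n_trials  = 100_000              # hypothetical large evaluation set
accuracy  = 0.51                 # barely better than a coin flip
successes = int(n_trials * accuracy)

result = binomtest(successes, n_trials, p=0.5, alternative="greater")
print(f"accuracy: {successes / n_trials:.0%}, p-value: {result.pvalue:.1e}")
# p comes out around 1e-10: highly "significant", yet the detector adds almost
# nothing over guessing. Significance reflects sample size as much as merit.
```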
Honestly, at €4.5M that's a marginal project that is unlikely to be used anywhere seriously. And the EU's answers make it clear it will never take decisions alone.
You know, there has been a somewhat weird push by Brexit promoters for "frictionless" borders relying on magic tech. I suspect the funding for this could come from there.
Expect it to not go anywhere once the Brexit mess is over.
I'm no libertarian, but I will concede that this is a perfect example of libertarian griping about government spending: it's not getting done because no one has any incentive for it to get done.
Does the manufacturer have any incentive to honestly test the machine? Hell no, then no one would buy it.
Does the government have any incentive to honestly test the machine? No, because airport/customs security is security theater and their goal is convincing voters that "something is being done". (Voters might have an incentive to honestly test the machine, but it's more-or-less impossible for this preference to affect policy, unless huge numbers of people become single-issue voters about security device testing. Which isn't likely.)
Do the operators have any incentive to honestly test the machine? No, because if a terrorist slips through the machine and then blows up a plane, none of the blame comes back to the individual officer who ran the test.
Without any incentive for the machine to work, it's no surprise that it doesn't.
the point isn't detecting lies, the point is pressure to confess during interrogations. what matters is whether the person being interrogated thinks it works -- or even if they don't, to make lying more stressful.
You viddy, it was a very, very horrorshow machine, for it made all those vecks ittying across the border go bezoomny trying to figure out what litso to make to keep the machine like happy.
It was, oh my brothers, on purpose, as to make these vecks to not want to try do that again. The bratchnies at the top must have loved it. [1]
The fruit machine was reincarnated for pedosexuals: a device attached to their genitals measures if they get sexual arousal from pictures of children. Those that do are not deemed ready for rehabilitation.
Where most people yell scam or digital phrenology, I have a somewhat contrarian view: these systems do work. It is possible to tell, better than random guessing, if someone is gay or has a violent disposition, from just a single picture. Prisons for violent crimes see way more inmates who are bald, bearded, acned, or square-jawed (signs of high testosterone). Replicated studies have shown that the profile pictures of gay men are significantly different from those of straight men, from subtle effects, such as more attention to grooming, to more physically noticeable ones, like the shape of the jaw being more rounded.
I have no reason to disbelieve that an automated system could check for tell-tale signs that someone is hiding something: needing a lot of time to answer basic questions, using their lead hand to cover their chin, looking not in the direction commonly associated with recall but in the one associated with imagination, a trembling voice, anxious eye twitches, etcetera.
This is what flawed human border guards are already doing. Israel has the most advanced airport security and trains European and American border guards to detect suspicious behavior. The TSA has over 3,000 behavior detection agents. These are people with their own political and religious beliefs, prejudices, and variance, and they can't be audited rigorously. We just never hear the accuracy figures, so we can't say if lie detectors can beat this (or can help as a tool for humans). But I bet they can.
I was disappointed that the actual video chat between the journalist and the digital border guard was not included in the investigative article. They argue that the system should be interpretable, but give no full transparency themselves. I'd trust that she did not tell any lies, but I don't trust that they did not try to game/fool the system, so as to have an actual article to write about. Anyway, using just one test subject is majorly flawed, and comes close to not understanding that science can't provide 100% accurate predictions, just probabilities. I feel it is a reasoning flaw to discard any automated system by homing in on a single mistake.
Why would you wait until someone is no longer a pedophile before deeming them ready for rehabilitation?
Sure, one could grant that gay people on average are slightly more feminine-presenting, and criminals on average more testosterone-laden (as are athletes and law enforcement officers).
How do you define "work"?
How is this information usable, even slightly, in a security context? How does it overcome the completely predictable shitshow that it will create in practice?
You say "telltale", but that's not supported by the evidence.
> Israel has the most advanced airport security
Because they have highly trained officers interrogating people and searching packages, not running AI dowsing rods.
Rehabilitation in society: most people do not want convicted pedosexuals who show no signs of betterment to be around children, just like most people do not want murderers released when they say to the prison doctor that they still have an urge to kill.
Work, as in serve as a double-check for a human border agent. If someone failed to correctly (as deemed by a reasonably accurate system) answer all 16 questions, I do not want to fly with that person, before a border guard has had a second look. This is how fraud detection often works: An automated system gives a high score, and possible explanations for this score, and then a human analyst can make a more informed decision.
Measured model performance (for both humans and artificial neural networks) is the evidence that these signs carry signal.
> Because they have highly trained officers interrogating people and searching packages, not running AI dowsing rods.
These highly trained officers also sit behind video cameras to observe passengers. Do you think detecting suspicious behavior from video is AGI-complete? BTW, Israel invests a lot in this kind of technology at its borders:
- large-scale face detection, plus plenty of intelligent hardware devices aiding security
- statistics used to decide, for example, to skip a pat-down of a 5-year-old Israeli boy
- cars tracked from the moment they enter the parking lot, including time spent there, cross-referenced with whether the car has been near the border or power plants
- possibly (not sure) social media analysis, like the US is doing now
- the Israeli army unit Intelligence Corps 8200 actively supporting airport security
- border patrol focusing all their attention on passengers, not their luggage (why search luggage after passengers have been cleared by a behavior check?)
- TraceGuard to swab clothes for substances
- a Suspect Detection System called VR-1000, which automatically checks for signs of lies, such as profuse body sweat and eye movements
- BellSecure tying together sources of information on the web and in databases to get a better no-fly list
- their own border agents tracked with automated systems to spot opportunities for learning and misbehavior
- WeCU, which automatically checks facial clues
- automated weapon scan systems
- Vigilant's surveillance systems, deployed in Israel and the US, acting as a digital border guard and motion/gait recognizer
What may sound like an AI dowsing rod to you, could actually help combat airline terrorism.
> WeCU Technologies (as in "we see you") is a technology company based in Israel that is developing a "mind reading" technology for the purpose of detecting terrorists at airports. The company's products evaluate reactions to specific images for indications that someone is a potential threat.
> The technology involves projecting an image that only a terrorist would be likely to recognize onto a screen. The idea is that people always react when they see a familiar image in an unexpected location. For example, if a person unexpectedly saw an image of their own mother on the screen, their face and body would react. For the terrorist detection, the people passing by the screen would be monitored partly by humans, but mostly by hidden cameras or sensors that are capable of detecting slight increases in body temperature and heart rate. Other detection devices, which are more sensitive and currently under development, could be added later.
No, the sources are CIA and FBI agents trained in interrogation and spotting lies (and wanting to sell their books, like researchers want their research read). One of the agents used these signs to know that Timothy McVeigh was lying. They also give a counter to your objection: observe the person when they are not lying, in a natural environment, note any tics, and discount those when interrogating.
Place your lead hand thumb on your cheek and two fingers on your chin and imagine you are talking to someone standing one meter from you. Do you feel sincere?
There is plenty of research showing that lie detection is not all bunkum, and that techniques such as cognitive overloading help catch lies and lower defenses (lies need focus and don't come naturally to most people).
>> Place your lead hand thumb on your cheek and two fingers on your chin and imagine you are talking to someone standing one meter from you. Do you feel sincere?
I really can't think of anything I could do that could make me feel insincere when I was being sincere. This sounds a bit like the discredited claims about power-posing, or smiling to feel better etc.
I'm sorry but I really think you're letting yourself be taken in by some extraordinarily shoddy science and by the pseudo-scientific claims of people who are either engaging in magickal thinking and really believe they can "tell when you're lying" or just charlatans trying to take advantage of the naivete of others.
> I'm sorry but I really think you're letting yourself be taken in by some extraordinarily shoddy science and by the pseudo-scientific claims of people who are either engaging in magickal thinking and really believe they can "tell when you're lying" or just charlatans trying to take advantage of the naivete of others.
You're citing the Stanford gaydar paper, a pseudo-scientific attempt to cash in on the hype about neural nets. It was widely condemned for its ethical and technical deficiencies at the time.
Edit: to clarify, I'm also interested in why you think all you say in your comment is true. The sources you cite either do not support your claims, or are disreputable like the deep gaydar paper [edit: or they are irrelevant like the sources about the training of border agents].
For example, I quote from the Wikipedia article on the plethysmograph:
>> 1998 large-scale meta-analytic review of the scientific reports demonstrated that phallometric response to stimuli depicting children, though only 32% accurate, had the highest accuracy among methods of identifying which sexual offenders will go on to commit new sexual crimes.
32% accuracy means those tests are incapable of detecting whatever they're looking for. Even if other tests are worse. My dowsing rod is better than my crystal ball at finding water, but that doesn't make it accurate.
No. This is what the Wikipedia page says for measuring sexual response in pedosexuals:
> In one study, 21% of the subjects were excluded for various reasons, including "the subject's erotic age-preference was uncertain and his phallometrically diagnosed sex-preference was the same as his verbal claim" and attempts to influence the outcome of the test.[28] This study found the sensitivity for identifying pedohebephilia in sexual offenders against children admitting to this interest to be 100%. In addition, the sensitivity for this phallometric test in partially admitting sexual offenders against children was found to be 77% and for denying sexual offenders against children to be 58%. The specificity of this volumetric phallometric test for pedohebephilia was estimated to be 95%.
> Further studies by Freund have estimated the sensitivity of a volumetric test for pedohebephilia to be 35% for sexual offenders against children with a single female victim, 70% for those with two or more female victims, 77% for those offenders with one male victim, and 84% for those with two or more male victims.[30] In this study, the specificity of the test was estimated to be 81% in community males and 97% in sexual offenders against adults. In a similar study, the sensitivity of a volumetric test for pedophilia to be 62% for sexual offenders against children with a single female victim, 90% for those with two or more female victims, 76% for those offenders with one male victim, and 95% for those with two or more male victims.[31]
> In a separate study, sensitivity of the method to distinguish between pedohebephilic men from non-pedohebephilic men was estimated between 29% and 61% depending on subgroup.[27] Specifically, sensitivity was estimated to be 61% for sexual offenders against children with 3 or more victims and 34% in incest offenders. The specificity of the test using a sample of sexual offenders against adults was 96% and the area under the curve for the test was estimated to be .86. Further research by this group found the specificity of this test to be 83% in a sample of non-offenders.[32] More recent research has found volumetric phallometry to have a sensitivity of 72% for pedophilia, 70% for hebephilia, and 75% for pedohebephilia and a specificity of 95%, 91%, and 91% for these paraphilias, respectively.
These systems work! And, while scary, or invasive, or not 100% accurate, this is no argument to reason that they don't.
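For readers weighing sensitivity/specificity figures like the ones quoted above, here's a small Bayes'-rule sketch. The sensitivity and specificity values are taken from the quoted excerpt; the prevalence values are assumptions for illustration, not from the cited studies:

```python
# How predictive a positive result is depends on the base rate being screened.
def positive_predictive_value(sensitivity, specificity, prevalence):
    true_pos  = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

sens, spec = 0.72, 0.95   # figures from the excerpt quoted above
for prevalence in (0.5, 0.05, 0.01):
    ppv = positive_predictive_value(sens, spec, prevalence)
    print(f"prevalence {prevalence:.0%}: a positive result is correct {ppv:.0%} of the time")
# Better than random at every prevalence, but how usable the positives are
# depends heavily on who is being screened, which is the crux of this thread.
```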
There has been no peer-reviewed paper calling the gaydar paper into question. There was a master's student who tried to replicate the study with his own crawled dataset and got better-than-human guessing, but slightly below the paper's accuracy. News outlets ran with that to say that the study was flawed. Another attempt was by a Googler who claimed that the neural net solely looked at eye shadow or glasses, but he also got better than random and human guessing on his own sanitized dataset, and one could argue that eye shadow and glasses are fair game when classifying from a face picture, as they are included in the picture, and these pictures were also shown to the human evaluators (a level playing field).
The Next Web article is by a journalist with a history degree, not an ML scientist. But based solely on the merit of his arguments, he also agrees with the results of the paper:
> there’s nothing wrong with the paper and all the science (that can actually be reviewed) obviously checks out.
He seems to take more issue with the ethical considerations and with binary sexuality, and builds his point around the claim that humans have no functioning gaydar at all, so it is insignificant that a neural net could beat a coin flip. His point is weak, as he gives no evidence for humans lacking a gaydar, and the paper (which was not wrong as claimed) includes human assessments which are higher than random guessing.
I think my contrarian view is true from mere pragmatism: Israel has the best airport security in the world, and uses these Suspect Detection Systems extensively, seemingly constantly improving and making enough profit for new players to enter the market. AKA the people that actually do this for a living keep innovating on it, and I find that rather unlikely if all of this is tea leaf reading.
I think, in general, that the HN crowd overreacts when it comes to controversial tech, and that a simplistic "this does not work, and is a sham, and a fraud to take research money" is an uninformed, weak claim. It takes a lot of chutzpah to denounce many months' work by legit scientists as obviously flawed from behind your keyboard, when one probably has not even read the full paper. The authors, by picking such a controversial topic, are partly to blame for this pushback and popular media reporting, but that does not make it right.
I will not defend the use of plethysmograph and eye-tracking studies to measure a sexual response. I just claim that it is better than random guessing, that it allows for better treatment when measurements are out of line with self-reports, and that it is still in use and very similar to the Fruit Machine. The Fruit Machine is already back.
> My dowsing rod is better than my crystal ball at finding water,
I do not get what you are referring to here (I know you as an ML-knowledgeable person from your other comments, so I am afraid to assume things, but if your crystal ball is random, and your dowsing rod is better than random, you are successfully doing predictive modeling, no? Not a sham? [1]). These systems do not need extremely high accuracy if they do not auto-deny a person, and it is moving the goalposts a bit to demand accuracy when better-than-random guessing has been demonstrated (which is questioned by the majority of the commenters here).
> or they are irrelevant like the sources about the training of border agents
A user kindly requested sources for all of my claims. I claimed this and sourced it. My point was that we already have human Suspect Detection Systems in place, so either those must go (you have a fundamental problem with SDSs) or they can't be automated (because you don't trust AI research, or you believe these systems need the common-sense problem solved first). I could then offer counter-arguments to both.
For the question about eye direction, look at the sourcing for telltale signs of lies I posted in reply to another commenter. It depends on whether you are left- or right-handed.
[1] > A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. - The Strength of Weak Learnability
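As a toy illustration of the weak-learnability result quoted in [1], here's a minimal scikit-learn sketch (synthetic data, nothing to do with faces or lie detection): boosting combines many barely-better-than-random decision stumps into a markedly stronger classifier.

```python
# Weak learners boosted into a strong one, on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)        # a single "weak" learner
print("single stump  :", stump.fit(X_tr, y_tr).score(X_te, y_te))

# AdaBoost's default base learner is a depth-1 stump like the one above.
boosted = AdaBoostClassifier(n_estimators=200, random_state=0)
print("boosted stumps:", boosted.fit(X_tr, y_tr).score(X_te, y_te))
# The boosted ensemble typically scores well above the single stump, which is
# the "weak learnability implies strong learnability" point being quoted.
```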
Regarding the gaydar paper, yes, I have read the full paper (if memory serves, I read two versions, a pre-print and the published paper). At the time, I wanted to publish a rebuttal, perhaps a letter in a journal or something, but in the end I didn't think I'd be adding much to the debate, and the paper had been widely discredited already anyway.
My objection to the methodology in the paper was that the authors had assembled a dataset where the distribution of gay men and women was 50% of the population, i.e. there were as many gay women as straight and as many gay men as straight in the data. This was for one of their datasets, the one where everyone had a picture. There were two more where the distribution was less even, but still nothing like what it's usually estimated to be. This despite the fact that the paper itself cited a result that gay men and women are around 7% of the population.
The reason for this discrepancy was clearly to improve the results by reducing the number of false negatives, which are expected when there are many more negative than positive examples in binary classification.
That's from the point of view of machine learning. There were other flaws that others pointed out, e.g. the choice of metric (I don't remember what it was now; I can look it up if you like), the premising of the paper on prenatal hormone theory, which is another piece of bunkum without any evidence to back it, etc.
And of course there were the ethical considerations.
Sorry, but I don't have the courage to reply to the rest of your comment. You write way too much.
Rebalancing an imbalanced dataset is common in industry and academia. You use it when you focus on accuracy, to make claims like "we were 54% accurate at classifying the sexuality of females" easily interpretable, without needing a distribution-balanced benchmark (you simply know the baseline is a coin flip).
If there is signal in the rebalanced dataset, there should be signal in the imbalanced dataset. If they'd switched to logloss or AUC and an imbalanced dataset, do you think their results would now be as good as random? Because that is what you are implying, and you are basically implying the research is fraudulent. This is a very strong claim to make in the absence of legit discrediting studies that failed to replicate any predictability, and it requires more than guessing that the authors' rebalancing was "clearly" to improve the accuracy (with a 7% negative class, you could get 93% accuracy by always predicting the positive class, so if they wanted to inflate the accuracy, they shouldn't have rebalanced).
The ethical considerations are moot/personal opinion, as they passed the ethics board of Stanford. Those are people who evaluate ethics of academic research for a living, or are you saying they were also shoddy and wrong to give this a pass?
Magical thinking is not wanting something to be true because it would be an uncomfortable truth, and so deeming that something which is objectively true must be false, so you can continue to think happy thoughts in line with your world view.
You keep talking about the paper being widely discredited, but can't provide a single academic source for this. Instead, you question my sources (Business Insider?) while posting articles from The Next Web written by a journalist with a history degree who does not want the concept of binary sexuality to be true, or even allow it when constructing a dataset of gay and straight people by self-classification.
It takes more energy and letters to attack a point than to make a point. You made quite a lot of weak points.
>> Rebalancing an imbalanced dataset is common in industry and academia. You use it when you focus on accuracy, to make claims like "we were 54% accurate at classifying the sexuality of females" easily interpretable, without needing a distribution-balanced benchmark (you simply know the baseline is a coin flip).
You quoted The Strength of Weak Learnability, and I figured you must have at least a passing acquaintance with computational learning theory. In computational learning theory (such as it is), it's a foundational assumption that the distribution from which training examples are drawn is the same as the true distribution of the data; otherwise there can be no guarantee that a learned approximation is a good approximation of the true distribution.
The following is a good article on machine learning with unbalanced classes:
>> This is a very strong claim to make, in the absence of legit discrediting studies that failed to replicate any predictability, and requires more than guessing the authors rebalancing act was "clearly" to improve the accuracy (with 7% negative class, you could get 93% accuracy by always predicting positive class, so if they wanted to inflate the accuracy, they shouldn't have rebalanced).
The gay class was the positive class and the straight class the negative, in this case. If you did what you say and identified everyone as straight, you'd get a very high number of false negatives: you'd identify every gay man and woman as straight. You'd get very high accuracy but abysmal precision and recall on the gay class. The authors validated their models using an AUC curve plotting precision against recall, and such a plot would immediately show the weakness of an always-say-straight classifier.
>> You keep talking about the paper being widely discredited, but can't provide a single academic source for this.
An "academic source", like a publication in a peer-reviewed journal is not always necessary. For example, you won't find any peer-reviewed work debunking Yuri Geller. In this case my instinct is that no reputable scientist would want to get anywhere near that controversy (and that was one reason I also stayed away).
Some of the criticisms are technical, some are from the point of view of ethics. It would be a grave mistake to discount the ethical concerns, but if you prefer technical explanations there is quite a bit of meat there.
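Since the disagreement above keeps circling back to what a 50/50 rebalanced accuracy does and does not tell you, here's a minimal sketch (synthetic labels, assumed 7% base rate) of why plain accuracy on the natural distribution is so easy to game, and why precision/recall-style metrics are the ones worth arguing about:

```python
# Always predicting the majority class on a ~7%/93% split.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.07).astype(int)   # ~7% minority (positive) class
y_pred = np.zeros_like(y_true)                     # classifier that never says "positive"

print("accuracy :", accuracy_score(y_true, y_pred))                    # ~0.93
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
print("recall   :", recall_score(y_true, y_pred))                      # 0.0
# High accuracy, useless minority-class detection: this is why accuracy on a
# rebalanced 50/50 set and precision/recall (or AUC) on the natural
# distribution answer different questions.
```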
Thanks! That article has a lot of critique and I also like that the author collected the responses from one of the authors.
But, to me, most of the critiques seem uninformed (not made by ML practitioners) and focus on the ethics (where I agree with the authors: we need solid research into weaponized algorithms, showing what is currently possible for ML practitioners who may use such technology adversarially, and we should treat profile pictures with the same care as we do information about sexuality, religion, or political preference). By my estimation, most of the critiques are by people who find this research threatening to them, their friends, and their sexual identity. That may very well be the case, but it also leads people to conclude that the scientific study was flawed and that an automated gaydar can't possibly work. Two replications by scientists who took issue with the paper, and who lack any incentive to fudge the data or metric to dress up their paper, also demonstrated a better-than-random automated gaydar. These systems work! (And that poses a problem we can now tackle, where before we did not even know this was possible, and the majority in this thread still thinks it is all bunkum.)
Many statistical assumptions are regularly broken, for pragmatic reasons (it just works better), or because the world is not static (and so the IID assumption is broken). There is an entire subfield of learning on imbalanced datasets, which includes resampling, subsampling, oversampling, and algorithms like SMOTE. It is common to use these techniques to get better performance, including on unseen out-of-distribution data. Fraud, CTR, and medical-diagnosis models are regularly rebalanced for purposes other than trying to break assumptions or cheat one's way to a seemingly higher accuracy. Plus, the signal does not disappear when training only on the original, imbalanced data. These systems do not work by the grace of a rebalancing trick alone, but they may work better (as is usually the case with neural nets, which do not even give convergence guarantees: something only a statistician would worry about).
You can switch the negative with the positive class and my point remains: if the authors wanted to fraudulently hack the accuracy score, this is way easier with imbalanced data. The AUC metric is robust to class imbalance anyway: the ranking won't change for unseen data out of distribution, you can just adjust the threshold to match it.
I'd say an academic source is necessary in this case, because you implicitly accuse these scientists of doing shoddy, hyped-up work, with fudging tricks to appear more accurate. I need more than popular media sources or previous HN discussions to accept that this paper was "widely discredited".
Yes, of course many theoretical assumptions are broken, but that is because the people who break them either ignore them completely or deliberately violate them in order to produce better-looking results. That is more common in industry, where it's easier to pull the wool over the eyes of senior colleagues, but it's not unheard of in academia, quite the contrary. Anyway, just because people do shoddy work and then report impressive results doesn't mean that we should accept poor methodology as if it were good.
In particular, about the gaydar paper: the authors cook up their data to get good results and then use those results to claim that they have found evidence for an actual natural phenomenon (hormones influencing haircuts, etc.). That's just... pseudoscience.
You seem to be under the assumption that rebalancing is always bad or ignorant; that techniques such as SMOTE are only used to produce better-looking results and pull the wool over someone's eyes. This is simply not true. Rebalancing is not shoddy, but accepted practice. It is certainly fair to question it, but not to draw the conclusion of fraud or shoddy science (without making yourself look pretty silly).
Again, I do not think rebalancing data justifies the conclusion that the authors were cooking up their data to report better results. Take a step back and assume good faith: could there be any other reasons to resample data, other than wanting to commit fraud?
The Google Scholar link includes 10+ cited, peer-reviewed papers on the Uri Geller drama.
I don't know enough about hormone theory to say anything against or for their conclusion, just focusing on showing that working automated gaydars that perform better than average/random guessing exist and have been scientifically demonstrated. I can agree with you on that the connection is spurious, without dropping my point that this controversial technology actually works (rebalanced or no).
> Travelers who are deemed dangerous can be denied entry, though in most cases they would never know if the avatar test had contributed to such a decision.
Jesus fuck. Denied entry because the clowns who thought this face[0] is a good idea inadvertently screwed up somewhere else. Slaves to the algorithm. Disgusting. Takes diffusion of responsibility to a whole new level.
> iBorderCtrl
Why? Seriously. Clowns confirmed.
> A study produced by the researchers in Manchester tested iBorderCtrl on 32 people [...] 75 percent accuracy [...] unbalanced in terms of ethnicity and gender.
I’d bet a decent lunch that these dummies tested it on themselves. It’s like QA-ing my own stuff. That one path I implemented works fine. What do you mean there are other ways?
I’ll add a dessert to said lunch if the accuracy of a representative sample is better than a dice roll.
> this face [0]
> An EU research program has pumped some 4.5 million euros into the project
Kewl. I can think of many ways this could be spent more productively. Like setting it aflame and frying up some bacon.
Isn't that an equivocation/conflation of differently bad things? Shouldn't border control optimize risk management based primarily on wise human intelligence on-the-ground supported by reliable or at least reliability-qualified data that doesn't hassle people any more than is absolutely necessary? For example, in the US (not the EU), the number of secret databases and watch lists that feed into current border control decision-making is absolutely frightening (per whistleblower revelations).
Working as designed: a smokescreen for arbitrary interrogation and denial of groups that border agents don't like. Everybody else that fails the test gets the benefit of the doubt. 75% accuracy is exactly useless, and I find it hard to believe it beats random chance anyway. More racism packed into black box quackery.
This is a Horizon 2020 project: an EU program that distributes money to groups of companies and research institutions to do some research. The funding is usually a few million euros over a few years. The requirements on the deliverables are very lax. Basically you can get away with doing nothing useful; after all, it's research and you can't be penalized for not finding anything. As long as you show you did what you said you would try, you're OK. The commission requires the projects to have a validation part; what is described in the article is probably exactly that.
Having worked on such projects, I can say the money is sometimes completely wasted. There is a whole cottage industry dedicated to bidding for and executing these projects; it survives purely on the research funding, and its members are experts in meeting the commission's requirements rather than experts in the subject matter. Sometimes I have the feeling part of this money is a way for the EU to keep some engineers busy in the poorer parts of the continent; sometimes I feel it's just soft corruption (reviewers are often also reviewees). But for sure some projects are scientifically horrible, and this one looks to be one of those.
That being said, some projects are also nice; for instance, Firefox is going to merge a web page translator which was produced by another H2020 project. The program as a whole is most probably a positive contribution to society. It's like VC money for public research: many projects are a disaster, but some are very good and make the endeavor worth it.
Ignoring the fact that this doesn't work, can you imagine if you could accurately detect someone is lying with devices that everyone carries with them?
Just imagine the power a totalitarian government could wield with such a technology.
Jethro's phone beeped incessantly to remind him that he had not completed his patriot assessment in the last 24 hours. He groaned internally, and even though he was running late for work he decided that he had better take the 5 minutes to complete the test. He couldn't afford to lose any more social credit points this month.
Opening the app, he could see his face in the top corner of the screen to help him know the camera was centered on his face, not that he really needed this after doing the assessment almost daily since his 10th birthday.
The first question flashed onto the screen: "Have you committed any crimes in the last 24 hours?"
I imagine there have been many occasions when someone making a hateful speech believed every word they were saying (regardless of whether they were right or not).
Politicians are also masters of speech where they are technically correct, they sound like they are saying one thing, but remain evasive/non-committal.
A manager in the border security apparatus with no real-world experience sold their managers on technology solutions to human intelligence problems, and the higher-ups drooled at all the salary and hourly employee money they could divert to their pet projects and pyramidal fiefdoms.
As a Brit, I can confirm that this is most certainly not a well respected institution; the UK has some of the best universities in the world, but this is not one of them.
Well it depends, all the UK universities have strengths and weaknesses. In some cases there are straight up practical reasons. Years ago my university's chip fab caught fire and burned to the ground, because chip fabs do that sometimes, and obviously that significantly reduced the ability to expose students to practical skills in a clean room and so on. But most UK universities don't have a chip fab at all, which means they don't have academic staff who need one, which means undergraduates are also not getting classes with people who have that knowledge at the front.
MMU has a bunch of expertise in subjects like textile making and some practical engineering disciplines, whereas I wouldn't necessarily regard some of the UK's more famous universities as good in those subjects.
Textile making isn't an academic discipline, and historically would not have been taught at a university at all. If you want high quality academic research and rigour, you're far more likely to find it somewhere other than MMU.
This is an unfortunate truth, but a truth nonetheless.
It's definitely true that "historically" textile making wouldn't have been taught at a university. Sure looks like a real academic discipline to me though, at least as much as say, law (maritime law being another of Soton's specialities) or indeed electronics. It so happens our culture prizes knowledge of electronics very highly and the development of new textiles much less so, but it could be otherwise.
It sounds like a story we have heard so many times: a sense of false security enforced by fear of terror, and a system that does not work correctly. Looks like now it's the turn of European citizens to lose all their privacy and have the rest of their data collected. Very unfortunate.
> A spokesperson for iBorderCtrl declined to answer questions for this story.
Oh, they don't have anything to hide, do they?
Before scrolling down I knew the UK would be involved somehow. They must do anything except admit that their continuous meddling (to put it mildly) in the Middle East incites retaliation.
That's not entirely fair; it's not just the UK. An "EU research program has pumped some 4.5 million euros into the project, which is being managed by a consortium of 13 partners, including Greece's Center for Security Studies, Germany's Leibniz University Hannover, and technology and security companies like Hungary's BioSec, Spain's Everis, and Poland's JAS."
At least they could use something like what has been used in India [1] (though many jurisdictions forbid its usage); at least there is some science behind it. However, because it's science-based and not just a bunch of "woo", it has the inherent danger of being deemed "incontrovertible". Kinda like DNA, but obviously flawed nonetheless.
My golly - the havoc that beast would wreak if applied to all officials! It would almost be worth unleashing on the public, if perfected from the top down. Though it never seems to work that way. I'm already lucidly envisioning future politicians murmuring very nervously through federally issued mandatory truth helmets.
Realistically, I truly wonder if the consummate versions of such tech won't become the keystone of the future world's pyrrhic utopia. In some paved global labyrinth of potted plants, designer integrity and perfect safety, where lies are only fiction of the past and dissidence is a vestigial dead synapse in the collective mind.
If you’re going to put people through this kind of system, you should have some solid science to back it up and as far as I can see, this is just some bargain basement ML agent.
Unreliable junk like lie detectors are used on purpose. Random reinforcement simply ups the fear the peasants have of the system. Which is the intended result; who cares if a few innocent people get chopped up in the process.
Meanwhile, we aren't far removed from the year when image recognition confused black people and gorillas. Who would ever roll this out thinking that we've made such progress since then that we can tell such nuances as lies or truth? And they still didn't balance the training set with regard to ethnicity and gender?? What a shit show. You can bet your ass that it has higher false positives for some group or other.
- Inaccurate (surely, since real lie detectors are)
- Does more harm than help
- Humiliating
- Will not stop any real threat
- Looks like something of a dystopian film
We shouldn't automate human interactions. We shouldn't automate social stuff. You automate mechanical things which are clear, not the delicate inaccurate job of a human to tell if another human is suspicious or not.
> IBorderCtrl’s lie detection system was developed in England by researchers at Manchester Metropolitan University, who say that the technology can pick up on “micro gestures”
Aren't "micro-expressions" pseudoscience? Furthermore, AI based micro-expression "detector" sounds even more like snake oil.
> IBorderCtrl’s lie detection system was developed in England by researchers at Manchester Metropolitan University, who say that the technology can pick up on “micro gestures” a person makes while answering questions on their computer, analyzing their facial expressions, gaze, and posture.
As a European, I don't even know how that project obtained funds from the EU. Even if it "worked", it's the opposite of the future I would like to build.
I suspect this project will never be implemented at full scale, and is only for research on understanding human emotions.
This project should be reviewed by more journalists, because very few Europeans know their taxes are used for this kind of anti-humanitarian project, and they should be informed about it.
Released a paper that suggested "75 percent accuracy" on a test of 32 people, with an admission of bias in the make-up of the test-set participants.
Sounds like the confidence interval would be very poor. This is exactly like all those other studies that found an effect in a small sample but didn't use basic statistics to work out what kind of sample size they would need to actually prove an effect.
Nor (as far as I can tell) was there any acknowledgement of the fact that asking normal people to lie is not the same as trying to observe a hardened criminal or fundamentalist. I worked on something with an ex-chief of the Flying Squad, who said that there is a huge difference.
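To put a rough number on the confidence-interval concern above, a quick sketch using a Wilson interval (the exact counts are an assumption: 24 correct out of 32, i.e. 75%):

```python
# Uncertainty around "75% accuracy" measured on ~32 subjects.
from statsmodels.stats.proportion import proportion_confint

correct, n = 24, 32                      # assumed: 75% of 32 participants
low, high = proportion_confint(correct, n, alpha=0.05, method="wilson")
print(f"95% CI for the true accuracy: {low:.0%} to {high:.0%}")
# Roughly 58% to 87%: a sample this small cannot distinguish a genuinely
# useful detector from one that is only modestly better than chance.
```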
The lie detector was invented by the creator of the Wonder Woman comic after he noticed his wife's blood pressure rose when she was angry. It gained widespread adoption after he made commercials for Gillette to "prove" that their razors were superior to the competition. It's wacky comic-book technology that doesn't work. It only appears to work because it tricks people into confessing to crimes.
Why stop at the lie detector? Why not have a block of kryptonite at every airport to protect them from attacks by Zod? Why not give every cop a batarang? It's insane that we still allow our governments to spend our tax dollars on this.
One of the big challenges with software to detect lies is getting good training data. I would be suspicious of anything that wasn't trained using actual data from the particular domain that it was going to be used in. So maybe the best that can be said for this effort is that for the first few years they should ideally report scores that are actually just random (but not tell the actual border agents this, obviously) until they get enough data to train a reasonable classifier. I wouldn't be surprised if this is what is actually being done.
There's no way I'm installing software like this on my computer. They already snoop through your phone when you pass through border control, I'm not assisting them in their data collection efforts by granting them access via an install.
Regardless of how stupid, wrong, and dangerous this is, it or something just as useless will crop up because government can easily justify wasting money when "your safety" is at stake.
While this system is hard to defend, it is still in test and it's possible it will never go into production. Less BS than a polygraph maybe?
> Our reporter — the first journalist to test the system before crossing the Serbian-Hungarian border earlier this year — provided honest responses to all questions but was deemed to be a liar by the machine, with four false answers out of 16 and a score of 48.
The reporter might have answered truthfully, but whether she tried (or even unintentionally acted) to influence how the answers were perceived is not known.
Regardless of how good it is, do you really want to be told where you can or cannot travel by a machine? Call me an egotist but I think of myself as better than cargo.