MIT Researcher Exposing Bias in Facial Recognition Tech Triggers Amazon’s Wrath (insurancejournal.com)
172 points by jonbaer on April 14, 2019 | 156 comments



Models that are used for anything important should be explainable. That is, you should be able to get a definitive answer as to why a particular result was achieved in any particular case. If a model does not have this property, it should not be used for anything critical.

Also, people should have the right to know when machine-learned models are used to make decisions about their lives. They should be able to ask why a particular decision was made and get that information.

This is real AI ethics.


> Models that are used for anything important should be explainable. That is, you should be able to get a definitive answer as to why a particular result was achieved in any particular case. If a model does not have this property, it should not be used for anything critical.

What's the current standard for anything "important"? Most important decisions are biased and not explained. Judges have biases and are allowed some discretion when sentencing. I'm sure police officers have biases as well. The same is true of just about any person making a judgement. There are laws (for good reason) that prevent certain types of biases, but it's naive to believe that the status quo on important decisions is great.

A model is better in a number of ways. First, it is based on actual evidence. Although that can be manipulated, it's a lot easier to observe and control than the life experiences of an individual. And you can guarantee that certain factors won't play a direct role, at least by excluding them as inputs. It's much harder to tell a person to ignore some factors.
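
To make the "excluding them as inputs" point concrete, here's a minimal sketch (column names and data are invented) of dropping a protected attribute before fitting a model. The caveat in the last comment matters: correlated features can still act as proxies for the dropped column.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Hypothetical data: 'race' is the attribute we want excluded from the model.
    df = pd.DataFrame({
        "income":            [40000, 55000, 30000, 80000],
        "prior_convictions": [0, 1, 2, 0],
        "race":              ["a", "b", "a", "b"],
        "approved":          [1, 1, 0, 1],
    })

    X = df.drop(columns=["race", "approved"])  # protected attribute never enters the model
    y = df["approved"]

    model = LogisticRegression().fit(X, y)
    # Caveat: correlated features (e.g. zip code) can still act as proxies for 'race'.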

I think algorithmic, objective decision making on important decisions is very much preferable to what we have now.


I disagree. Having humans as deciders in key positions of society is a feature.

* having judges who live within a changing society is what allows laws to be changed in accordance with the state of public debate at a given time. For some it might be too slow or too fast, but it kind of works, if you think over the centuries.

* it is harder to manipulate all the judges in the land than to change one algorithm or its goals. The distribution and fragmentation of power is a feature and not a bug. It makes some things weird and others slow, but ultimately it makes it hard for a dictator to take over.

Any newly proposed system must be able to prove it has similar resistance against a fundamental change from democracy into dictatorship, while still being able to slowly shift where it counts.

As a system, the things we have are not bad, and in western societies they have managed quite well. I think any modern-day proponent of alternative systems of governance should recognize their hubris in light of the things we already have working. The stuff we've got is not just some service that could be run better; it is a way of preventing us from tearing ourselves and each other apart.

As desirable as algorithmic decision making sounds, it needs to be objective and deterministic. And it needs to be able to factor in circumstance in order to stay humane. But if it does so all the time it will be abused. In the end it would have to produce decisions that are accepted by humans after all.

I think for topics of high importance it would be best to have informed decisions by humans who educate themselves about the matter and bring forth arguments. For smaller decisions algorithms might work, as long as they are transparent.


> * having judges who live within a changing society is what allows laws to be changed in accordance with the state of public debate at a given time

I agree that the system needs a feedback loop and that the law should evolve over time. But IMO there are better ways to achieve that than introducing a biased decision maker. You can rely more heavily on legislation, or bring some evolving value system into your model.

> * it is harder to manipulate all the judges in the land than to change one algorithm or its goals. The distribution and fragmentation of power is a feature and not a bug. It makes some things weird and others slow, but ultimately it makes it hard for a dictator to take over.

Fragmentation of power is generally a good thing. But having many judges isn't actually fragmented power. It would be fragmented if they all heard your case and sentenced you with some aggregation method, or if they each had a small jurisdiction. But in many cases, one judge (e.g. your sentencing judge) has all the power (ignoring appeals).

Let's assume I wanted to influence a sentencing judge. I could learn his biases. I could hire a lawyer with a good relationship with the judge. I could give him some financial incentive (e.g. kick backs from a private prison or campaign donations). I could manipulate jury selection using racial biases. I could look at the statistical properties of sentencing during different parts of the day and try to manipulate that. Less common in the US, but I could intimidate or threaten the judge.

How would that play out if it was an algorithm? I could hack into the administrating body and somehow retrain the algorithm? I could try to subvert some data scientist working on the algorithms and have them introduce slight biases into the algorithm that would eventually favor me?

I'd take my chances with the human judge. And that may be a good thing, but I don't think humans are harder to manipulate than algorithms.

> And it needs to be able to factor in circumstance in order to stay humane.

Again, I think it's a lot easier to tell an algorithm to consider these new factors with the explicit goal of increasing or decreasing X.


The word "objective" is problematic here. Normally people mean by this term "explainable by some quantitative criteria". In the case of AI this is technically true, but seems to lack substance. the workings of a neural networks for facial recognition could be decomposed to some god-awful switch statement involving all the pixels of the input picture (after all, this is exactly the structure of the resultant machine instructions), but how would one go about justifying any of the individual branches of such a structure? In other words, could you comment the code? I'm not 100% disagreeing with you, I just think it is problematic to call a statistical process "objective".


Modern facial recognition uses landmarks, much like humans do. First it identifies a face, which is essentially a flat 2-d plane. It's able to identify locations of the nose, ears, mouth, chin, etc. It then looks for relationships between these features (e.g. how wide is the mouth, how far apart are the eyes, etc).

It's a much simpler problem than general object detection. Imagine how many different types of dogs you can encounter, at various angles, sizes, colors, and perspectives. For that, it's much more of a black box, as your model would have to support detecting a dog from behind, from the side, from the front, from above, sitting down, walking, jumping, standing, dogs with long tails, short tails, long hair, short hair, etc. Now think about a face. Most faces are very much alike.

https://github.com/ageitgey/face_recognition
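
For the curious, the library linked above exposes exactly this landmark-then-encode pipeline. A minimal sketch (the image file names are made up):

    import face_recognition

    # Load an image and find face bounding boxes (the "flat 2-D plane" step).
    image = face_recognition.load_image_file("person.jpg")  # hypothetical file
    boxes = face_recognition.face_locations(image)

    # Landmark points (eyes, nose tip, chin, lips, ...) for each detected face.
    for face in face_recognition.face_landmarks(image):
        print(sorted(face.keys()))

    # Reduce each face to a 128-dimensional encoding and compare two faces.
    enc_a = face_recognition.face_encodings(image)[0]
    other = face_recognition.load_image_file("other_person.jpg")  # hypothetical
    enc_b = face_recognition.face_encodings(other)[0]
    print(face_recognition.compare_faces([enc_a], enc_b))  # e.g. [True]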


AI people have spent a lot of time trying to get systems to reason about the real world. These studies are lumped into the categories of common-sense knowledge and naive physics. Neither has gotten very far, because it has proven hard to encode all the factors that go into real-world decisions. Current AI models are a partial take on a part of the real world, which is why you get poor recognition rates if you feed photos taken from odd angles or with unexpected objects in them into deep-learned systems.

So the algorithmic objective decisions of current systems are actually partial and selective. We must be very careful not to attribute powers to them that they do not have. They can provide useful tools, but they are not the locus of decision making; that rests in the place that they were created, and may be distorted by either accident or design.


You know what AI systems had the ability to explain their decisions? Expert systems.

Expert systems were the big success of GOFAI, but they fell out of favour in the last AI winter, at the end of the '90s or so, clearing the way for probabilistic inference and statistical machine learning.

Since then it seems, we took one step forward (with accuracy in classification) and one step back (with the loss of the ability to explain decisions).

Who knows, maybe a new AI winter will wipe out the statistical machine learning dinosaurs of today and leave a clear field of play for the AI Mammals of tomorrow.


I work at this company:

https://en.m.wikipedia.org/wiki/Cyc

It's a survivor of the AI winter, and what we have is basically a generalized expert system. Every conclusion incorporates cross-domain knowledge and can fully explain itself. It's been a long, slow trudge of research and development over the past couple decades, but we're starting to poke our heads out and ride the current AI hype wave. We like to say that Watson is our marketing department.


> We like to say that Watson is our marketing department.

Did you forget a /s there? For me, the marketing around Watson, with the unrealistic and not scientifically backed assumptions about the state of AI they put out to the general public, epitomizes what's wrong with how the world sees our field.

More on topic: the underlying problem with the original post here seems to be one of selection bias in the training data. Somewhat of a remnant of human decision making that, likely unintentionally, ended up in creating biased decisions in the end. While the system you linked seems to potentially create nicely decomposable decisions, is there anything that inherently prevents "bias" in what it learns? The approach seems to face many of the same problems modern ML systems in the Linked Data / Semantic Web space would given unrestricted learning from the web.


>> Did you forget a /s there? For me, the marketing around Watson, with the unrealistic and not scientifically backed assumptions about the state of AI they put out to the general public, epitomizes what's wrong with how the world sees our field.

IBM's marketing of Watson as a medical application is one thing. The system itself is quite another. And the system itself remains the most advanced NLP system created so far.

Note that I say "system". Most NLP research consists of testing the performance of various algorithms against very specific benchmarks, but very little work goes towards creating a unified system that can integrate multiple NLP abilities. Watson on the other hand is exactly such a unified system. It integrates statistical machine learning, symbolic reasoning (frames, fer chrissake, frames! In this day and age!), pattern-matching (with Prolog) and so on and so forth. Far as I can tell there isn't anything like it anywhere - but of course, I don't know much about what Google, Facebook et al are doing internally.


Sure, I specifically meant the marketing though. I kind of get why they do it but I'm not sold on the idea that it's helpful to present AI as an infallible silver bullet.

In terms of systems I'm not sure why IBM would stand out; Microsoft, Amazon, and Google all offer relatively coherent pipelines for the engineering side of NLP, and all of those are backed by accomplished research teams. I'm sure you can find everything from total horror stories to beautiful testimonials for all of these platforms.


Unfortunately it's difficult to tell what Google etc. are doing with their AI systems, but I get the feeling that Google and Facebook in particular have a culture of focusing on end-to-end statistical machine learning to the exclusion of everything else. I base this intuition on the way Google Translate is trained end-to-end, without any attempt to integrate grammatical or other background knowledge (which causes it to make mistakes that a rule-based system would easily avoid, e.g. getting the gender of nouns wrong when translating between two languages that have gendered nouns via English, which doesn't).

I think these two companies in particular would be very unwilling to design and build a system like Watson, integrating symbolic techniques alongside statistical ones. They probably recruit so much for statistical machine learning skills that they don't have the know-how to do it anyway.

The funny thing is that, like many large corporations, they probably have ad-hoc expert systems except they don't call them that. Pretty much any sufficiently large and complex system that encodes "business rules" is essentially an expert system. But, because "expert systems failed" companies and engineers will not use the knowledge that came out of expert system research to make their systems better.

That AI winter really got us good.


The joke may not have translated the way I wrote it; IBM makes promises about Watson's intelligence that it can't live up to, and we've had clients in the past who've come to us because we can do (some of) what Watson advertised but couldn't deliver.

The main way our system would combat bias is by shedding light on it. No human-built system could be completely impartial, but ours will say "I decided X because of A, B, C, and D", and if people decide that C is biased then that piece of knowledge can be adjusted accordingly.
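
To illustrate just the explanation side (this is only a toy sketch of the general idea, not how Cyc actually works internally), a forward-chaining rule engine can record which facts led to each conclusion:

    # Toy forward-chaining rule engine that keeps a "because of A, B, C" trace.
    rules = [
        ({"is_mammal", "has_four_legs", "barks"}, "is_dog"),
        ({"is_dog"}, "cannot_interbreed_with_cat"),
    ]

    facts = {"is_mammal", "has_four_legs", "barks"}
    explanations = {}

    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                explanations[conclusion] = sorted(premises)
                changed = True

    for conclusion, because in explanations.items():
        print(f"I decided {conclusion} because of {', '.join(because)}")

If people decide that one of the recorded premises is biased, it can be located and adjusted, which is the point being made above.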


That makes more sense than what I read into it, apologies for that (and glad to hear that since I've had similar discussions). Thanks for clarifying.


On an intuitive level, this actually seems like a more promising approach than "from scratch" learning systems if we want our AIs to "understand us".


I know about Cyc, of course. Godspeed [and speed us away] :)


This seems like a fundamentally flawed and ultimately doomed approach to creating general agents. It's like you're trying to build a house, except only one molecule at a time.


The intent is to seed it by hand with enough knowledge that it will be able to do true NLU and learn new knowledge itself from the web. It's already shown some promise against these sorts of problems, which ML really struggles with:

https://en.m.wikipedia.org/wiki/Winograd_Schema_Challenge


That doesn’t really make any sense though. Maybe I’m not fully understanding what you’re saying here? Can you clarify how exactly you would turn a connected graph of data points into something that can learn arbitrary new knowledge and assimilate it into its understanding?

Because it seems like, currently, this process is an entirely human-driven, manual task, wherein you have people making the decisions on how to connect concepts and add new relations/knowledge.


What you suggest, using previously acquired knowledge to learn new concepts, is par for the course in my field of study, Inductive Logic Programming. ILP is a set of algorithms and techniques that learn logic programs from examples and background knowledge, where the examples and BK are themselves logic programs.

Initially the BK comes from some existing source- it can be a hand-crafted database of a few predicates deemed relevant to the learning task or a large, automatically-acquired database mined from some text source, data from the CYC Project of course, etc. In any case, because of the unified representation, learned hypotheses (the "models") can be used immediately as background knowledge to learn new concepts.

Edit: I don't know if the Cyc project uses ILP. But what the comment above says is doable.


In our current NLU experiments we use an off-the-shelf open source parser for the initial parse (Parsey McParseface, from Google). We then take the grammatically-tagged words and our system maps them to concepts it knows about. It theorizes about multiple possible interpretations of the sentence (pronouns in particular), judging their likelihood using real-world knowledge. A Winograd Schema can be solved because Cyc knows things about the concepts; it knows that a dog has four legs, and what it typically weighs, and that it's a mammal (and shares further traits of mammals), and knows that it couldn't interbreed with a cat because they're different species and the act of procreation requires two organisms of the same species. So far this information has been hand-encoded, but this has been going on for decades, and we have a lot now. Going past that:

1) We're able to map outside data from a DB into Cyc's knowledge format, rather than hand-encoding it. This knowledge is inherently not as rich as the rest, but it can obviously be useful anyway.

2) At some point we hope to reach a critical mass of knowledge that will allow Cyc to simply "import" a Wikipedia page by parsing and understanding it. It will interpret a given sentence into its own understanding, then assert it as true and do reasoning based on it down the line.


An expert system will never be able to process or understand new things unless given explicit solutions.

So you’re basically saying that it is possible to enumerate and solve for everything by manually iterating through every single edge case that exists. There’s a reason why Cyc has spent over 30 years and has only been able to get this far: you’re fundamentally limited by human constraints. The only realistic way of achieving a general-purpose learning system is by teaching a system how to learn and then letting it figure things out on its own. Plainly, some method involving reinforcement learning.

By the way, it’s not like RL is some newfangled thing. Much of it started during the 80’s, as it concurrently developed with the other purported method of developing intelligence, which was through expert systems.

If you're interested, I highly recommend checking out a lecture that Demis Hassabis gives talking exactly about this issue: https://www.youtube.com/watch?v=3N9phq_yZP0


>> An expert system will never be able to process or understand new things unless given explicit solutions.

(Not the Cyc person).

Your comment is arguing for an end-to-end machine learning (specifically, reinforcement learning) approach. However, modern statistical machine learning systems have demonstrated very clearly that, while they are very good at learning specific and narrow tasks, they are pretty rubbish at multi-task learning and, of course, at reasoning. For breadth of capabilities they are no match for expert systems, which can generally tie their shoelaces and chew gum at the same time. Couple this with the practical limitations of learning all of intelligence end-to-end from examples and it's obvious that statistical machine learning on its own is not going to get much farther than rule-based systems on their own.

Btw, reinforcement learning is rather older than the '80s. Donald Michie (my thesis advisor's thesis advisor) created MENACE, a reinforcement learning algorithm to play noughts-and-crosses, in 1961 [1]. Machine learning in general is older still: Arthur Samuel baptised the field in 1959 [2], and artificial neurons were first described in 1943 by McCulloch and Pitts [3]. As Geoff Hinton has said, the current explosion of machine learning applications is due to large datasets and excesses of computing power, not because they're a new idea that people suddenly realised has potential.

_______________

[1] https://rodneybrooks.com/forai-machine-learning-explained/

[2] https://en.wikipedia.org/wiki/Arthur_Samuel

[3] https://en.wikipedia.org/wiki/Artificial_neuron


The future of generalized intelligence will not be based in brittle datasets, but purely based off repeated self-play, allowing for the bootstrapping of an infinite amount of possible data.

Rule-based systems fail catastrophically the moment they encounter something that has not been codified into their rule set. I point to chess as a prototypical example, with two AI engines, StockFish and AlphaZero. StockFish, an expert system manually created and meticulously designed over the course of decades, is handily defeated by AlphaZero, a reinforcement learning based system that trains purely through self-play.

If you look at any of the sample games between the two AIs, you can see a distinct difference in style. In colloquial terms, StockFish plays far more “machine-like”, whereas AlphaZero plays with a “human grace and beauty”, according to many of the grandmasters who commented on its play. This is because StockFish has certain inherent biases, caused by the brittleness of its codified rule set, which lead it to make sub-optimal moves in the long run, whereas AlphaZero is free from the constraints of any erroneously defined rules, allowing it to do things like sacrifice its pieces as a strategy. Meanwhile, because StockFish codes in the value of losing a piece as negative points, it has to overcome this bias every time it might choose such a move, pushing its search toward moves where it doesn't have to sacrifice pieces, because those look more optimal under its rule set.


Statistical machine learning models are just as brittle as hand-crafted rule bases when it comes to data they have not seen during training. They are incapable of generalising outside their training set.

>> The future of generalized intelligence will not be based in brittle datasets, but purely based off repeated self-play, allowing for the bootstrapping of an infinite amount of possible data.

How will general intelligence arise through self-play, a technique used to train game-playing agents? There's never been a system that jumped from the game board to the real world.


OpenAI's GPT-2 language model (built using a deep neural network) recently achieved 70.7% accuracy on the Winograd Schema Challenge, which is 8% better than the previous record [1].

I've never used Cyc but I have used OpenCyc and I'm familiar with some of the applications of Cyc. It's interesting when it works.

[1] https://openai.com/blog/better-language-models/


Well, there are some statistical learning models that are explainable. Decision trees can be induced and can "explain themselves", just like in expert systems. Additive models are self-explanatory.

Not sure I'm sold on LIME and other similar approaches, though. Seems like a lot of deep learning people are all too happy to substitute "interpretability" for actual explanations.
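
For the decision-tree case, the learned rules can be printed directly; a minimal sketch with scikit-learn on the iris dataset:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

    # The whole model is a readable set of if/else rules - an explanation in itself.
    print(export_text(tree, feature_names=list(iris.feature_names)))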


Decision trees are not a statistical model. The learning method is statistical, if you will, but the model itself is a disjunction of conjunctions- a symbolic model (and more specifically, a propositional logic theory).

Decision Trees are in fact an example of the early years of machine learning, when the trend was towards algorithms and techniques that learned symbolic theories. I believe the effort was driven by the realisation that expert systems had a certain problem with knowledge acquisition [1], which drove people to try to learn production rules from data.

I digress- I mean to say that decision trees are explainable because their models are not statistical.

To be honest, I don't know much about additive models.

_______________

[1] https://en.wikipedia.org/wiki/Knowledge_acquisition


Yup. That's what I learned first, Mycin being the first success story I read about.

https://en.wikipedia.org/wiki/Mycin

I tried some quick DuckDuckGo-ing to see if anything new turned up; it was drowned out by unrelated stuff. I did find what looks like a great, quick overview of expert systems for folks unfamiliar with them. It might become my new default link to share about how they were historically perceived. What do you think of this one?

https://www.tutorialspoint.com/artificial_intelligence/artif...


The tutorial is a bit old.

I think that advances in probabilistic reasoning & modelling, such as practical Bayes networks, should be included, and the mechanics of resolution have improved massively with the introduction of answer-set systems - this gets over the problem of commitment that kiboshed gen 5.

https://en.wikipedia.org/wiki/Answer_set_programming

https://en.wikipedia.org/wiki/Probabilistic_programming_lang...


Can you explain how you, as a human being, understand handwriting? Can you explain it in detail without handwaving some stuff away as "now I detect the pattern"? Because I'm pretty sure you can't, but you use it for basically everything in life. So, should a person's eyes not be trusted when it comes to "something important"?


>So, should a person's eyes not be trusted when it comes to "something important"?

A documentary called "Murder on a Sunday Morning": https://www.youtube.com/watch?v=LFLbptkb1eM

A woman was murdered in front of her husband. The man saw the attacker up close and was the only eyewitness.

He accused a completely innocent man, and it took half a year of jail and court-related turmoils for this to get cleared up.


>Can you explain how you, as a human being, understand handwriting?

We explain how to understand handwriting to every single child that goes to school. It's not the answer you want, but it's the answer that actually matters here.

Trying to equate AI and human cognition in this way is completely disingenuous.

Human reasoning is not 100% reliable, but we know very well in which ways it's unreliable and how to deal with it. We have shared biology and millennia of experience trying to empathize and communicate with others.


We do not explain how to read handwriting, we teach by example - and that is exactly how ML works.

And your last point is wrong. ML models are studied and understood much better than human reasoning.


>We do not explain how to read handwriting, we teach by example - and that is exactly how ML works.

Teaching children to read is an interactive process that has pretty much nothing in common with data steamrolling in modern machine learning.

>And your last point is wrong. ML models are studied and understood much better than human reasoning.

Is that why new ANN architectures are almost universally constructed by trial and error?


But if you were to present handwriting as evidence in court, if the veracity of the evidence were in question, you would need to present an expert whose job it is to explain the methodology behind the analysis.


Given that eyewitness testimony is consistently unreliable, biased, and internally inconsistent, no, they shouldn’t. Human judgement is the best we can do so far, but it is hardly a worthwhile goal, especially with systems that take us from affecting individuals on a person-to-person level to influencing the interactions of millions of people with their government.

Algorithms can be explainable, and I agree that anything that affects your standing in the eyes of government should have that as a minimum requirement.


The best work on this was done by Brenden Lake and Josh Tenenbaum. https://www.sas.upenn.edu/~astocker/lab/teaching-files/PSYC7...

There's a recent review that they published.

https://arxiv.org/pdf/1902.03477.pdf


> Can you explain how you, as a human being, understand handwriting?

I do not want to speak for the parent but I think you might be on a different level of abstraction when talking about "explaining" things. Consider this for example: someone writes a letter to your boss that you should be fired. That someone does not need to explain the process of writing but probably should be required to explain why you should be fired.


This is such an interesting topic and I really don't know where to stand on it. On one hand, this is an argument that systems such as laws, lending practices, etc. should be based on repeatable and explainable rules. Rules will end up with some bias, but the bias can be dissected and argued about until we reach some consensus that the bias is acceptable/desirable. In the case of lending, maybe we consider credit history a fair input metric but zip code an unfair one, for example.

On the other hand, systems such as laws and lending, which are inherently about social interaction, do tend to have some unspecified human element. Judges interpret the law and apply it to specific cases, underwriters have latitude to make exceptions under certain vague circumstances, teachers may regrade a paper upon realizing they misspoke in a lecture. This is a feature, not a bug--if our social systems have no room for empathy then there is a big problem.

So now that AI is "unexplainable", how is this worse than the unexplainability of human systems? You can ask a human why they took some action, but their explanation tends to be incomplete, wrong, a lie, or maybe they don't recall. I once was given the opportunity to lease a top-floor apartment in my building because I had a daily chat with the receptionist. All fair housing laws were followed, I just happened to be the first to know because I had such frequent interaction. If you were to ask the receptionist why I got the apartment she probably wouldn't say "zwkrt tells me bad jokes and we share pictures of our pets", but that probably is part of the reason.

I think fundamentally we understand that humans share a way of life and a set of ineffable values, and we believe that computer-generated results do not share these values. Unfortunately I do not have a conclusion, just a formulation of a problem.


The present systems have more attributes than “they somehow work, but slow”. Think about separation of power and decentralization. Slowness is a feature here. If you want to radically change a society where decisions are carried out by any number of individual actors, you need to take them with you by either using military force, money or convincing them otherwise. In any case this is a project that could also fail. With an algorithm, you could change it today and it would be in action without many even noticing it.

I could get behind a simple and transparent tax system, where you just see in real time what money you give while being sure that big companies have to do the same, without gaming the system.

But I am not sure the system that decides on these rules should be another system.


The dispute here is about gender classification. If you can give a consistent explanation of how your own visual system works for this task, then perhaps one day we’ll be able to program a computer to do the same.


I can't, but human visual perception can be subject to introspection to a high degree, and can be overridden top-down. That is, I can name at least some of the reasons why the person I'm looking at looks male/female to me, and in cases that are not obvious, I can step through what I see with high-level, explicit reasoning, and the result actually informs my perception.

Also, my perception benefits from the fact that human beings have a shared brain architecture, and the way I see stuff is the same as everybody else's, so whatever half-assed explanation I can give is intimately understood by other people. In contrast, ML models are completely alien to us.


The problem is that most machine learning model choices are humanly indecipherable.

You can understand how the model works and the math behind it but you will be hard pressed to understand the exact path behind a particular choice.

Decision trees are one exception to this. Most humans can understand those.


Do you put the same requirement on natural neural nets?

Is the decision reached by human members of a jury, for example, explainable in the way you mean?


>> Chris Adzima, senior information systems analyst for the Washington County Sheriff’s Office in Oregon, said the agency uses Amazon’s Rekognition to identify the most likely matches among its collection of roughly 350,000 mug shots. But because a human makes the final decision, “the bias of that computer system is not transferred over into any results or any action taken,” Adzima said.

The gentleman is saying that the system selects a small subset of the 350,000 mugshots, but because a human selects one mugshot from this small subset, there is no bias.

That just makes no sense.

[edited to remove stronger language]


It does make some sense. Yes, technically, the bias is still there, but it's lower. Getting some accuracy on image detection isn't that difficult, but making it more and more accurate is. Going from 50% accuracy to 51% is much easier than 98% to 99%.

Of course, with the error rates reported on certain groups a human making the final decision is not enough.


The problem is a human would at best reduce but not eliminate the bias.

Let’s assume the photo was not a match for the database. Now a human doing the final step is essentially picking a random face from a biased sample.

Worse, people are really bad about assuming whatever option they are considering is far more likely than the base rate. Read up on a random obscure disease and suddenly you start thinking it’s a real risk. Sadly, police do the same thing with criminal suspects, resulting in innocent people in prison.


"Her tests on software created by brand-name tech firms such as Amazon uncovered much higher error rates in classifying the gender of darker-skinned women than for lighter-skinned men."

Besides the article's obvious anti-man, anti-white bias, I'm not surprised that facial recognition software has a hard time analyzing darker skinned people.

With photography, I've always had a hard time photographing someone who had dark skin. Lots of light needs to be used, and even then it needs to be filtered correctly, etc. This goes for any subject that is dark.

Unfortunately, this is going to be a tough problem. Cameras used by law enforcement and government agencies (which the article seems to focus on) are normally pretty shitty; software can only do what it does with whatever input it gets. So if the lighting and image quality are shitty, then your results will be equally shitty.

The article doesn't go into what kind of equipment the MIT researcher was using, but I will assume that it is a high-quality camera. If so, and if the software is still failing as the article alludes to, then yes, these companies need to make their software better.

Even so, it's a crapshoot from the get-go, due to the hardware being used.


I did not notice an anti-man or anti-white bias in the article. What stuck out to you in the article that seemed biased?


> With photography, I've always had a hard time photographing someone who had dark skin. Lots of light needs to be used, and even then it needs to be filtered correctly, etc. This goes for any subject that is dark.

The issue does come down to a mix of (likely) inadvertent racism and physics.

It is very difficult to photograph people with darker skin. This is intentional, per Mother Nature. Very briefly, our atmosphere passes a certain bandwidth of colors to the ground. The peak frequency is in the yellow-green (this is why plants are usually this color). Unfortunately, one of the frequencies that gets through is a UV frequency that happens to be absorbed by eukaryotic DNA. Thus, cancer risk is raised when that UV photon energetically breaks up the DNA. So Mother Nature co-opted the melanin molecule, which happens to absorb those UV photons, to help protect skin cells from this damage. The thing is, though, that the melanin molecule absorbs not just that UV color but a lot of other ones too. Physics Girl has a good video on how sunscreen works and what a freckle looks like under different wavelengths [0] that is informative here.

Now for the (inadvertent) racism. When people first started making photographs, just about any photosensitive material was 'good enough'. These, of course, were all black and white. But the range of frequencies/colors these chemicals were sensitive to was much wider than what we can see with our eyes: it went from the low IR to the high UV, with all kinds of notches and mixes in there. Tin-type photographs are a great example. During the US Civil War, a lot of pictures were taken of slavery and 'colored' soldiers, and it's not hard to distinguish the faces and features of these people [1][2]. One thing to note about these tin-types: look at the clouds, or rather the total lack thereof. That's an easy way to notice that the frequencies you are seeing in the photograph are not the frequencies your eyes see.

When color photography started to become more widespread, the frequencies used for each color (CYM) and the absorptive bandwidth of each part of the color films were not accidents. They were chosen such that attractive young 'white' women in very heavy make-up and under very bright 1930's studio lights would be best seen. These color choices, due to the inherent racism of the times and other factors, were perpetuated into modern CCDs and CMOS chips we use today. There are important differences in the physics of digital and film color photography, but largely those original choices have been conserved.

One important difference is the IR spectrum pick-up. In most modern phones, the front facing camera does not have an IR filter on it. Next time you are near a security camera or at a toilet with those IR sensors, take a look at your phone using the front-facing camera. It should be straightforward to see the little blinking IR LEDs of these devices.

So, though darker skinned people are inherently difficult to photograph due to melanin and Mother Nature, it is not difficult to find the right frequencies that will work. It's a case of the 'lock in' effect that prevents this.

[0] https://www.youtube.com/watch?v=GRD-xvlhGMc

[1] https://www.pinterest.com/pin/385480049327844109/

[2] https://atlantablackstar.com/2015/01/05/7-things-about-the-f...

[3] https://www.pinterest.com/pin/126452702010324184/


The issues go beyond face recognition and algorithms: photography has a long history of bias [1]. Besides the issue of insufficient testing, I had to ask myself: what if these algorithms are relying on a biased process that was thought to be impartial?

[1] https://jezebel.com/the-truth-about-photography-and-brown-sk...


Can anyone explain exactly what is missing in photography due to "bias"? I'm just an amateur photographer who developed film in high school and plays around with RAW and Lightroom. I can't think of anything in the process that isn't simply a matter of light sensitivity and dynamic range. Film and image sensors don't understand high-level semantic features like skin.

The hardest thing to take a photo of is a black-haired cat, and I don't think that's because Adobe is biased against black cats.


The issue is the physical inevitability of performing a political act.

1. There have been instances of photos of black people being postprocessed to make their skin darker. For example, OJ Simpson's mug shot on Time's cover was "artistically interpreted" [1] and attack ads against Obama darkened his skin [2]. Also of Kerry Washington's skin being made lighter [3].

2. Some people interpreted darkening photos as a political move, embodying a belief that dark skin is scary. And lightening the photos as embodying a belief that light skin is more beautiful.

3. Photographers can't keep their hands off the postprocessing tools. And even if they could, they've still got to choose an aperture and exposure when they take the photo.

4. Some people would say, because lightening and darkening black people's skin is political, and photography can't avoid lightening and darkening, photography is inherently political. (and that saying you don't photograph black people is more political, not less)

Of course, some would see that argument as a bit of a stretch. I can understand where it's coming from I just don't agree - or at least, I'm not sure what it means in terms of my personal actions.

[1] http://hoaxes.org/photo_database/image/darkened_mug_shot/

[2] https://www.motherjones.com/kevin-drum/2012/05/obama-street-...

[3] https://www.cosmopolitan.com/entertainment/celebs/news/a3610...


The main point seems to be that photography was made by whites for whites, and thus discriminates against non-whites.

What I don't understand is how the article seems to imply that there's something wrong with that. Consider that

- Photography was invented, and for the longest time mostly used, in a majority-white area.

- It makes a lot of sense that, in predominantly white cultural spaces, lighter skin is preferred, as it historically indicated that a person didn't need to work in the open, which implied some degree of wealth.

- Most importantly, it seems to not understand how light works. Dark surfaces reflect less light, so there's less contrast between strongly lit areas and shadows, meaning it's both harder to make a photo look clear and for an AI to analyze it.


In the days of colour film photography, one of the stages applied was skin tone matching. When trying to set your printer up you printed a fairly standard picture, and checked the skin tone looked right. For a long time nobody thought to do the test with people who weren't white, so often the skin tones of darker people were off - hence the pictures you got developed were more likely to be accurate if your subjects were white.

https://www.npr.org/2014/11/13/363517842/for-decades-kodak-s...


Following the link I posted might be a good start.


Do you really think Kodak didn't want to make a film that could capture dark subjects until furniture companies asked for it?

The reason I commented is that the article makes a lot of complaints that drum up outrage. As far as I know, the complaints are all much better explained by either 1) less light is harder to photograph, or 2) photographing bright things and dark things at the same time is harder than photographing just bright things or just dark things.


The article doesn't mention papers so I am linking them here:

Original study [PDF]: http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a...

New study [PDF]: https://www.thetalkingmachines.com/sites/default/files/2019-...


I read the original paper. Wow, they could have tested for so many factors impacting accuracy but they only looked at mean accuracy for each cluster.


Haven't read the study myself yet, but that sentence alone makes me not want to waste my time on it.


> Her tests on software created by brand-name tech firms such as Amazon uncovered much higher error rates in classifying the gender of darker-skinned women than for lighter-skinned men.

What about lighter-skinned women? This seems to be phrased on purpose to incite bias against white men.


Here's the paper [0]. The answer to your question is in Tables 4 and 5 (page 9). To summarize, darker skinned women are misclassified far more often than any other group, in all three of the tested facial recognition systems. Error rates for lighter skinned women are between 5x and 10x lower, depending on the system. Lighter skinned men have the lowest classification error rates by a wide margin in the IBM and Microsoft systems, and are tied with darker skinned men for lowest error rate in the Face++ system.

[0]: http://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a...
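
The disaggregated evaluation the paper performs is straightforward to reproduce on any labelled test set; here's a sketch with invented column names and numbers:

    import pandas as pd

    # Hypothetical evaluation log: one row per test image.
    results = pd.DataFrame({
        "skin_type": ["darker", "darker", "lighter", "lighter", "darker", "lighter"],
        "gender":    ["female", "female", "male",    "male",    "male",   "female"],
        "correct":   [0, 1, 1, 1, 1, 1],
    })

    # Error rate per (skin type, gender) subgroup rather than one aggregate number.
    error_rates = (
        results.groupby(["skin_type", "gender"])["correct"]
               .apply(lambda s: 1 - s.mean())
    )
    print(error_rates)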


Going by the article, it just seems to be because the researcher is a darker-skinned woman and the software is mainly trained on lighter-skinned men.

"Darker-skinned women were the most misclassified group, with error rates of up to 34.7%. By contrast, the maximum error rate for lighter-skinned males was less than 1%."


Sure, but I don't think that's her intention. She's just refitting an old, already polarizing topic as a vehicle to troll big tech. Polarizing topics like these are like performance-enhancers for messaging if you can weave them into your narrative.


Beyond the very real intersectional problems highlighted . . .

The fact that the Amazon gender classifier misidentifies 7% of white females as male, but 0% of white males as female, is very odd. That seems like a JV-level bias tuning mistake. That's not systemic violence so much as it is just sloppy.


Most of the time the headline for these articles seems like it should be "Crap product also biased".

I have some suspicion they haven't focused on bringing down the error rates for minority groups because they know even their best case isn't good enough.


Check out the academic article.[0] It shows that Microsoft and IBM are way way better.

[0] http://www.aies-conference.com/wp-content/uploads/2019/01/AI...


Just a quick thought: women tend to wear makeup, which is rather uncommon among men. That may be one part of the reason why that happens.


True, if the system was only trained on women wearing make-up, but tested on women not wearing makeup, that would definitely create this kind of bias.

That's exactly the kind of thing that a good AI engineer should be looking out for.


Where is the wrath? Seems like Amazon just hand waved away her criticism, they didn't really take any actions against her.


I heard about different research like this about six years ago. So the algorithm wasn't tested on dark-skinned people. Is this really something new? Product developers in a wide variety of markets still neglect to take into account dark skinned peoples when developing their products. Why should it come as a shock that facial recognition product developers suffer from the same bias?


> So the algorithm wasn't tested on dark-skinned people. Is this really something new?

No. Racism in the legal system, and our society overall, is not "new." However it's still a real problem in our society. As such, race- and gender-bias that's "accidentally" hard-coded into software sold to police is problematic. Every new product containing this flaw is news.

> Why should it come as a shock that facial recognition product developers suffer from the same bias?

At the very least, it should come as a shock that one of the biggest software development firms in the world is attacking critics in defense of buggy behavior. Especially since, as you note, this phenomenon is common knowledge and there's a simple and well-known workaround.


Well, here's the question though: should she be naming and shaming the development company or instead the organization that uses the technology? I think the responsibility here lies solely on law enforcement as long as Amazon didn't lie. In fact, I would expect law enforcement to test this software thoroughly themselves, because that's their job.


> ...should she be naming and shaming the development company...

Yes, if you're offering a product for sale, it's fair for people to review that product in public. And that's not "naming and shaming," it's a critical review of an irresponsibly-developed product being sold to law-enforcement agencies.

> In fact, I would expect law enforcement to test this software thoroughly themselves, because that's their job.

Yes, that is also their responsibility. But where is the recourse? Law enforcement agencies have abysmal records of self-investigation, and the judicial system is unreliable at best, in holding law enforcement agencies accountable.

The public has a right to know what technology is being used to police them. If you want to call investigative journalism "naming and shaming," then yes, absolutely, she made the ethical choice in speaking out.


I see how this sort of software bias may be an indicator for racism, but you seem to almost equate the two concepts.


From the article:

"Those disparities can sometimes be a matter of life or death: One recent study of the computer vision systems that enable self-driving cars to “see” the road shows they have a harder time detecting pedestrians with darker skin tones."

Just think about the consequences of deploying such systems.


I wonder if that's also true for human drivers. I certainly have a harder time detecting pedestrians wearing darker clothing, at least in some circumstances.


That's why road workers, cyclists, and joggers wear brightly colored reflective vests.


There is also an important ethical difference between dark skin tones and dark clothing.

I see the point you are making, but to the extent that an AI can do better than a human due to the physical aspect, for example, of dark skin reflecting less light at night, I think we should at least try.


So maybe we should not be using opaque ML algorithms for life-and-death situations to begin with? Instead of, you know, hoping that whomever is screwed by them has an active advocacy group in places like MIT.


Using NL for such things hasn’t been working too great, so it seems worthwhile to try for something better.


such as?


Are you asking for examples where NL hasn’t done well or for potential ways to do better?


the latter. It’s relatively easy to say “stop doing X”; it’s harder to say “...and have you considered Y instead?”


I’d hope it was obvious from context that ML was the potential way to do better here.


I read you as saying that it was the thing doing poorly and that some superior object detection method was required. What that approach could be is totally unobvious to me.


Maybe you missed my too-subtle use of “NL” to describe natural learning i.e. squishy meat computers.


The story doesn’t cite the specific paper. But I think I submitted the paper last month:

“Predictive Inequality in Object Detection”

https://news.ycombinator.com/item?id=19379827

https://arxiv.org/abs/1902.11097

I’m not denying the conclusion of the paper, but it does have a lot of limitations. Also, Tesla notwithstanding, isn’t object detection primarily done via lidar?


The computer vision systems could still beat real human vision systems. This could be by absolute numbers crashed into, or by having less bias than humans, or by both.

It's not even a bias problem. It's a lighting, contrast, and camo problem. You could as well claim a bias between asphalt and concrete pavement, based on what skin tones are more visible with the pavement as a background.


Suppose the system does beat real human vision by absolute numbers, but it turns out that it is objectively worse for someone who is not white, and the higher average is because the majority of people in that society are white. Is that acceptable?
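
The arithmetic behind that worry is easy to make concrete (all numbers here are invented): a model can beat humans on the population-weighted average while being worse for the minority group.

    # Invented numbers: aggregate accuracy improves while one group is worse off.
    human_err = {"group_a": 0.020, "group_b": 0.020}  # assume humans are uniform
    model_err = {"group_a": 0.010, "group_b": 0.035}  # better on A, worse on B

    population = {"group_a": 0.9, "group_b": 0.1}     # A is the large majority

    avg_human = sum(population[g] * human_err[g] for g in population)  # 0.020
    avg_model = sum(population[g] * model_err[g] for g in population)  # 0.0125

    print(avg_model < avg_human)                        # True: fewer errors overall
    print(model_err["group_b"] > human_err["group_b"])  # True: but group B is worse off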


> The computer vision systems could still beat real human vision systems.

For pattern recognition, no, the computer vision system can't best a human.


But the entire self-driving system can. It is never tired, or distracted, has far fewer blind spots, and possibly LIDAR and IR to help it.


It can’t today, and until it proves it can, making claims about future tech in relation to present policy is a bad idea.


I'm not saying it does today.

I'm saying it could do so, while still having bias. The existence of bias doesn't make the computer worse than humans.


I saw a talk by one of the authors of the pedestrian detection study that I think they are talking about, and their results did not seem to support the scary racist self-driving car hypothesis that the study clearly implies (only marginal/statistically insignificant effects, with very few controls for confounding factors; I believe they controlled for bounding box size or something, but very little else).

With that said, fairness in ML/AI is a real problem, and some people are doing some really good/important work in this area. I am not familiar with Buolamwini's work, but I'm much more inclined to believe disparities in facial recognition than pedestrian detection where very little skin is visible and almost everything you're seeing is clothing.


"Just think about the consequences of deploying such systems."

Those systems are already widely deployed - they're called 'humans'. All vision systems are going to be more likely to confuse like colours.

This problem is quite a bit different from the 'mugshot' problem, or a problem wherein there was a bias or lack of training data for certain samples.

The author I think conflated a lot of the issues and just boiled it down to 'evil AI' which I don't think is the right thing to do.

Issues of ethnic orientation are quite a bit different from systems that have trouble literally due to the colour of something.


No-one is saying it is a shock - we're saying it is wrong and should be fixed.

As for Amazon's self-serving response, it should be given the same degree of respect as any other statement by an entity that is not prepared to discuss it in an interview.


Any product is going to be a poorer fit for customers fitting a less common profile than those fitting a more common one. You will never get rid of that. Even if performance between races were equalized by expending a disproportionate amount of resources on adapting the product to minority racial groups, performance gaps between unusual and common facial types, or between minority and majority ethnic groups within each race, would still exist.

There will always be some minority group that the product fits less well with, because you can divide people into an infinite number of groups based on an infinite number of traits.

This is a consequence of R&D resources being finite and the complexity of the world being unbounded. It's not a sign of a moral failing or misplaced priorities.


I agree with you. However, without trying to take your points down a slippery slope, your points don't apply to safety concerns. We can't have robo-cars hitting black people because the designers neglected to test the pattern recognition libraries on dark-skinned people. That, you can probably admit, is different from a bar of soap making peoples' skin a bit dry, or lotion not taking into account melanin, or the game designer neglecting the Linux community.


The accident rate should be decreased in the most efficient way possible, to save as many lives as possible. That might not mean targeting R&D resources at reducing the accident rate for one minority group that experiences a higher-than-average malfunction rate.

Here's a thought experiment to illustrate the principle I'm trying to get across. Suppose members of a very small ethnic group suffer disproportionately from some deficiency in software that does a poor job of recognizing features common to them, and this leads to an extra 50 people dying each year. Suppose also that the resources it would take to fix this deficiency could instead be used to reduce the overall accident rate by 5%, leading to 5,000 fewer people dying each year, but only 5 fewer members of that ethnic group. Should we target the resources at reducing the accident rate for the ethnic group, just because the group they find themselves in happens to be ethnic?


You did not really have to demonstrate the scope for tendentious arguments, we got that already, but seeing as you raised the issue, human rights is not a strictly utilitarian concept. If it costs a bit more to do the right thing, so be it.


My argument is not tendentious. You're not addressing it. And software not working as well with a particular group, as a result of the group having less common features, is not a violation of human rights. It's an unfortunate reality of the finiteness of our resources and the limitations of our technology. Expanding the performance of the software by focusing on how it performs with some arbitrarily selected subset of the group, rather than the whole of it, is misguided.

>>If it costs a bit more to do the right thing, so be it.

In this case, the cost is more lives being lost, and the right thing is only the right thing according to an arbitrary and flawed value system that you are submitting as the ideal.


> ...arbitrary and flawed value system

All ethical guidelines are, in some sense, arbitrary, but the one that says one should try to treat people equally is a pretty good one, and better than any that needs tendentious arguments to make. One of the reasons that it is a good one is that there is, as you have pointed out, an almost unlimited range of variation from which to fabricate tendentious arguments.

Slippery-slope arguments are not, in general, very good ones, but in this case, the number of times that various factions have actually gone down the slope from "these people are not the same as us" to "it's better if we get rid of them" (just in the 20th. century, even) is good reason to stay away from it.

And your argument is, indeed, tendentious -- too much so for its own good, in fact. Since, in your view, saving the most lives for a given expenditure is the only thing that matters, all expenditure on facial recognition should immediately be redirected into life-saving activities that save more lives for the buck, of which there are plenty.

But this is moot, because your arguments are just armchair pedantry, devoid of real-world relevance.


>>All ethical guidelines are, in some sense, arbitrary, but the one that says one should try to treat people equally is a pretty good one

If the software has a less than 100% success rate, then it does not treat people equally.

What you're actually arguing is that inequality between ethnic/racial/gender groups is more important to eliminate than inequality between other groups.

And the fact that you keep calling my argument tendentious is thoroughly intolerant and unconstructive.

I see my position as a completely reasonable response to the topic at hand, and just because you have a different view on the issue, you attack my motives.

>>because your arguments are just armchair pedantry, devoid of real-world relevance.

You realize that trying to ridicule and demean people like this is a terrible way to address views you disagree with, right?

Please don't act like this.


Well, I have given you the benefit of the doubt as to your motives here, and I continue to do so, because I will also assume that you think you are making a good argument.

> What you're actually arguing is that inequality between ethnic/racial/gender groups is more important to eliminate than inequality between other groups.

What I am actually arguing is for dealing with bias wherever we find it, and against your position that this is invalidated by possible existence of other biases, no matter how small, and even though they continue to be hypothetical.

As I wrote in an earlier post, "if you have any actual evidence that the other groups you name are being measurably affected, make your data known, so we can make corrections in those cases as well." You have consistently failed to show any real-world evidence for your position. Ethics is primarily a matter of what people do in response to real-world situations, not some sort of hypothetical trolley problem from an intro. to philosophy course.

Your argument is tendentious in the way it arbitrarily takes certain fixed positions, such as the above insistence on not doing anything unless you can guarantee 100% success, and your insistence that maximizing the number of lives saved is the only justification for expenditure, only so far as it justifies (in your view) doing nothing in the case of ethnicity, race and gender, but no further.

It is also rather telling that you seem to think that a noticeable bias with regard to gender, of all things, would be some sort of corner case (not that ethnicity and race are small issues, globally, either.)

As for your excuses, in the other thread, for not showing evidence, well, evidence exists in the cases in question, so I will continue to regard your argument as a hypothetical one.


>>What I am actually arguing is for dealing with bias wherever we find it, and against your position that this is invalidated by possible existence of other biases, no matter how small, and even though they continue to be hypothetical.

The most efficient way to reduce all of what you call "biases", which are simply imperfections, is to reduce the total error rate. Prioritizing the error rate of one subset of the whole will be less efficient at reducing the total error rate, and that's what you do when you identify a disadvantaged group and change the development focus from reducing the total error rate to reducing the error rate for their subset.

>>You have consistently failed to show any real-world evidence for your position.

I've already addressed this argument. Repeating myself:

I can't show real world numbers when the phenomenon in question can't have controlled experiments run on it, and I don't need to show real world numbers to make a case for the logical soundness of a principle - in this case, the principle that prioritizing improvement of a metric other than overall performance will generally lead to smaller overall performance improvements than not doing so.

>>Your argument is tendentious in the way it arbitrarily takes certain fixed positions, such as the above insistence on not doing anything unless you can guarantee 100% success,

What are you referring to?

>>and your insistence that maximizing the number of lives saved is the only justification for expenditure, only so far as it justifies (in your view) doing nothing in the case of ethnicity, race and gender, but no further.

Again I don't know what you're referring to. What do you mean when you say "only so far as it justifies [not focusing on reducing gender/racial disparities]"? You're alleging that my motivation for promoting the objective of maximizing the number of lives saved is to prevent action to reduce gender/racial disparities?


>> Your argument is tendentious in the way it arbitrarily takes certain fixed positions, such as the above insistence on not doing anything unless you can guarantee 100% success,

> What are you referring to?

It is rather surprising that you claim not to follow here, but here's just one example:

>> All ethical guidelines are, in some sense, arbitrary, but the one that says one should try to treat people equally is a pretty good one

> If the software has a less than 100% success rate, then it does not treat people equally.

As your quote is presented as a response to mine, the argument you are making here is that if the software has a less than 100% success rate, then trying to treat people equally is pointless.

> Again I don't know what you're referring to. What do you mean when you say "only so far as it justifies [not focusing on reducing gender/racial disparities]"?

Firstly, rewriting a quote is fraught with problems, especially when the actual words are just as easy to quote, and even more so as you had just quoted them accurately. In this case, they are "only so far as it justifies (in your view) doing nothing in the case of ethnicity, race and gender, but no further."

To give an example, you write "the objective should be to maximize the number of lives that are saved with the resources available", but your position only goes so far as to do nothing about this particular case, and has not been extended to its logical conclusion, which is to redirect all expenditure on facial recognition to more cost-effective lifesaving measures.

>> You have consistently failed to show any real-world evidence for your position.

> I've already addressed this argument. Repeating myself:...

Repeating yourself does not somehow nullify my response to its first appearance, which was to point out that there is good evidence for bias in the case of ethnicity, race and gender, but we are seeing no evidence whatsoever for the sort of confounding problems that you make up in your so-called "thought experiment". Your claim that you don't need to show evidence, because you have an argument in principle, leaves your position wide open to the criticisms of being unrealistic and pedantic.

> Prioritizing the error rate of one subset of the whole will be less efficient at reducing the total error rate.

Given the frequency that the law of diminishing returns is a factor, that is not nearly the given you think it is.

Let's consider some examples of how your point of view would play out. For example, there was a recent fatal crash that revealed a corner case in Tesla's vision system, and other crashes that have revealed problems with Boeing's MCAS system. By your argument, it would necessarily be counterproductive to do anything that attempted to mitigate either of these issues specifically. I say 'necessarily' because if it were a contingent matter, then it would be inconsistent with your claim that you don't have to show evidence for your principle being realistic.


>>As your quote is presented as a response to mine, the argument you are making here is that if the software has a less than 100% success rate, then trying to treat people equally is pointless.

You misunderstand. I am saying that a less than 100% success rate is an indication of people being treated unequally, by definition. Anyone for whom the software is not working has an unequal experience.

Reducing the error rate reduces inequality. Focusing on the error rate for the whole of the population is a more efficient way of reducing said inequality than focusing on reducing the error rate for a subset of the population.

>>To give an example, you write "the objective should be to maximize the number of lives that are saved with the resources available", but your position only goes so far as to do nothing about this particular case, and has not been extended to its logical conclusion, which is to redirect all expenditure on facial recognition to more cost-effective lifesaving measures.

The rest is implied. It goes without saying. Seeing it as conveying otherwise is an ungenerous, bad faith reading.

>>Repeating yourself does not somehow nullify my response to its first appearance,

I'm repeating a rebuttal to your point, which you have not responded to.

>>which was to point out that there is good evidence for bias in the case of ethnicity, race and gender, but we are seeing no evidence whatsoever for the sort of confounding problems that you make up in your so-called "thought experiment".

I've already addressed the logical shortcoming of your argument, repeatedly. You're simply ignoring the point and repeating what's been rebutted.

>>Let's consider some examples of how your point of view would play out. For example, there was recent fatal crash that revealed a corner case in Tesla's vision system, and other crashes that have revealed problems with Boeing's MCAS system.

Tesla/Boeing do not have a measurable rate of catastrophic error that can be reduced. These corner cases are the entirety of the measurable fatal errors found in the system. That's unlike facial recognition software, which has a measurable "catastrophic" (as catastrophic as errors in facial recognition software can be) error rate that can be reduced.


It is amusing to see you quote large parts of my previous post (while, as we shall see, skipping past some relevant context), and then fail to respond to the points in them. The claim that "I have already answered that" is, of course, often the last resort of the person who does not have an answer and does not want his claim examined further. It is not often used by people who actually did already answer (at least, not without quoting or referencing the specific relevant passage) because it looks so transparently evasive.

In addition, your response to the first quoted passage does not address the issue raised in its original context (which you left out of the quote.) I had no difficulty understanding your point that "a less than 100% success rate is an indication of people being treated unequally", but as it was presented as a response to "one should try to treat people equally" [my emphasis here], it is formally a non-sequitur, and it also clearly seems to be saying that anything less than 100% would mean that there is no justification for that policy.

I freely admit that I don't understand (but not in bad faith, which was itself a somewhat bad-faith allegation) your response to the second quoted passage: " The rest is implied. It goes without saying" -- the rest of what goes without saying? I am afraid it does not for me.

Similarly, I am confused by the statement "Tesla/Boeing do not have a measurable rate of catastrophic error that can be reduced. These corner cases are the entirety of the measurable fatal errors found in the system." As these are, you say, measurable fatal errors, then we would seem to have the data to calculate a rate of catastrophic error that has actually occurred, and if they are capable of mitigating the problems without making others worse, it would seem that the rate would go down. In fact, I would be extremely surprised if the world's aviation regulators do not want to see some plausible figures in that regard before allowing 737 MAXs to fly commercially. I don't want to be accused of bad faith again, so I will await your response before continuing this line of analysis further.

I am also confused by why having a measurable rate in the case of facial recognition makes it different with respect to your position, as, up to now, you have been claiming that your argument does not need real-world numbers. As, however, you are now apparently saying that these measurements are available, you will no doubt be able to show that your argument is neither hypothetical nor pedantic, by presenting real-world data.

Curiously there is one issue from my previous post that you did not mention at all in your reply: the confounding effect of the law of diminishing returns.


It certainly seems that there are infinite ways of making tendentious arguments against fixing something that should be fixed.


Resources are finite. The objective should be to maximize the number of lives that are saved with the resources available, not ensure that the number of people who die is equal across racial groups.

I have no problem with more resources being spent to "fix a problem", if that doesn't mean it comes at the expense of resources being spent to fix more serious problems. There is nothing inherently more unfair about members of a particular race suffering from a poorer experience with a particular software program than members of a particular psychological profile or height or distance between-the-eyes group facing a poorer-than-average experience with the software.


And if you have any actual evidence that the other groups you name are being measurably affected, make your data known, so we can make corrections in those cases as well.


That can be trivially done for the reason I mentioned before: you can divide people into an infinite number of groups based on an infinite number of traits. Showing the software is discriminatory thus becomes a pointless exercise of arbitrary human categorization scheme creation.

The objective should be to reduce how many people in total die as a result of the software's flaws. Showing what particular groups are "measurably affected", meaning have traits that the software does less well with, does not provide any valuable information. It ignores the whole to focus on an arbitrarily elevated part deemed more important than other parts. Because if the software fails in 7% of cases with anti-race-disparity development priorities, and 5% of cases when development prioritizes population-wide performance, you are sacrificing more people of other groups to get better results with a favored group.

As for unfairness: we can slice and dice the statistics to show less or more disparity between the average and a disadvantaged group, by selectively manufacturing group categorizations that produce more or less disparities (a group can be anything: people with dark skin, people with wide-set eyes, people with small chins). It's impossible for the software to perform equally well with all people, unless it is perfect. Getting to perfection is more efficiently done by focusing on improving the statistics in relation to the whole of the population, rather than any subset of it.

So to summarize: choosing race or skin-color as the categorization determinant is not objectively any more moral than choosing any other trait, and trying to find groups that are exceptionally disadvantaged is an impossible feat because an infinite number of groups can be created using an infinite number of trait combinations.


Let's see some real-world numbers that show this is a real issue in which your course of action is the only reasonable one.


I can't show real world numbers when the phenomenon in question can't have controlled experiments run on it, and I don't need to show real world numbers to make a case for the logical soundness of a principle - in this case, the principle that prioritizing improvement of a metric other than overall performance will generally lead to smaller overall performance improvements than not doing so.


This is replied to in the other fork of this thread: https://news.ycombinator.com/item?id=19706695


Why should it be OK that everyone makes mistake A just because some people did it?


[flagged]


Please don't delete posts just because they are voted down. It only hurts discussion.


[flagged]


> you will get better at recognizing dark skinned people, at the expense of getting worse at recognizing light skinned ones, which generally constitute the majority in the real world

Sure, buddy.


[flagged]


[flagged]


I made no judgment at all. I explained why the algorithms behave the way they do, and that there's currently no real way to make them behave better on darker skinned people without hurting accuracy overall, assuming the data distribution in the wild is not perfectly balanced, which it is not. That's literally all I said. Whatever you make up in your perfervid imagination is just that, your imagination, comrade.


Is this something simple like class imbalance in the training sets? Would be pathetic if that were the case. So easily fixed.
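
If it is, the usual first-line fixes are reweighting or resampling. A rough sketch of the resampling approach, assuming PyTorch and made-up per-image group labels (none of this reflects any vendor's actual pipeline):

    from collections import Counter
    from torch.utils.data import WeightedRandomSampler

    # hypothetical group label for each training image
    labels = ["A", "A", "A", "A", "A", "A", "B", "B", "C"]
    counts = Counter(labels)                      # A: 6, B: 2, C: 1
    weights = [1.0 / counts[g] for g in labels]   # rarer groups get larger sampling weights
    sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
    # hand the sampler to a DataLoader so each epoch sees the groups roughly equally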


Not the first article about this kind of stuff. Police departments can do whatever they want to obtain leads, as long as it is legal (gray area). I will be worried when they start using technology instead of judges.


That seems like a shortsighted position because it ignores the cost of selective enforcement. Everyone breaks laws on a daily basis - jaywalking, speeding, etc. - and you could have a severely disproportionate cost by enforcing that more for one group than another, even with every trial being completely fair.

Here in DC there was an example a while back, prior to legalizing marijuana: white people apparently used at a higher rate, but most of the prosecutions were of black people, both due to heavier police presence and because demographics meant that white users tended to have more privacy (limited visibility from the street, more distance between houses/sidewalks to make smell harder to notice, etc.), which made it harder to get evidence clearly showing that a specific person had been the one using. The process could be fair without changing the fact that the results disproportionately impacted one group.


That's not "selective enforcement". Race is not being selected for. People committing crimes in public is being selected for. Or people committing crimes in high-crime areas.

Some races happen to correlate with some selected traits more than others, but race is not the trait selected for.

It's completely predictable that not all traits of interest for law enforcement will be distributed equally across all racial groups. To treat this fact as a sign of systemic racism is to guarantee that you will consider every society on Earth systemically rac/sex/[group] ist.


Selective enforcement isn’t specific to racism — if you have a law which primarily impacts teenagers while other people break it with far lower penalty rates, that’s selective enforcement even if everyone is the same race.

This does commonly fall upon racial lines in countries like the United States with a long history of racial discrimination but it’s not exclusive and it’s important for anyone building systems to consider pitfalls like this because we know the users are likely to assume that a computer is unbiased.


Disproportionate impact is not the same thing as "selective enforcement". Selective enforcement means consciously choosing to enforce a law more commonly when a particular racial group breaks it. It does not mean members of a particular racial group disproportionately having the law enforced against them because they disproportionately exhibit a particular non-racial trait that is correlated with higher enforcement.


I've been saying this for a long time: selective enforcement of laws is not okay. We need to design laws that are not enforced selectively, but consistently. If it means that jaywalking shouldn't be against the law then so be it.


Eternal debate.


It seems like every other month, some facial recognition system is being attacked because of this.


Does anyone have a link to the paper?


I love this, whether the conclusions are valid or not, using science to troll big tech where they live. ouch, right in the feels.


What does this article have to do with racism rather than the failings of the recognition system? It is supposed to be a facial recognition system, yet it doesn't work on specific people a large percentage of the time. It's been brought to the attention of the manufacturers, and they should work on fixing it.


And when the manufacturer says "nuh uh" like they did in this case?


[flagged]


> The presumption is that this is racism, but is there any support that it’s not caused by physiological differences?

The conclusion is that the system is biased towards a particular race. The reasons WHY it was biased makes no difference to the fact that the system IS biased.


It might do?

There are 5x more white people than black people in the US (roughly). So... it's kind of understandable that any training data set would contain more white people than black people if you didn't filter it beforehand?

I guess theoretically you would have to limit the size of the training data for all groups to the size of the set of training data you have on the smallest group, so that all groups could be represented equally in the training data, but then perhaps you get a far less accurate model overall and it results in greater negative consequences?
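
Concretely, that balancing would just be undersampling every group down to the size of the smallest one. A toy sketch, with completely made-up data:

    import random
    from collections import defaultdict

    # hypothetical (image, group) pairs
    examples = [("img0.jpg", "A"), ("img1.jpg", "A"), ("img2.jpg", "A"),
                ("img3.jpg", "B"), ("img4.jpg", "B"), ("img5.jpg", "C")]

    by_group = defaultdict(list)
    for img, group in examples:
        by_group[group].append((img, group))

    smallest = min(len(v) for v in by_group.values())
    # keep only as many randomly chosen examples from each group as the smallest group has
    balanced = [ex for members in by_group.values()
                   for ex in random.sample(members, smallest)]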

Just spit balling here. But it's a complex problem.

The whole 'racist AI' thing screams Conway's Law to me.


If anything, shouldn't groups be represented proportionately? Else the lack of viable Inuit training photos is going to put a real cramp on everything.


[flagged]


Please stop.

> Please don't use Hacker News primarily for political or ideological battle. This destroys intellectual curiosity, and we ban accounts that do it.

> Eschew flamebait. Don't introduce flamewar topics unless you have something genuinely new to say. Avoid unrelated controversies and generic tangents.

https://news.ycombinator.com/newsguidelines.html


I can't help but wonder which is more likely - that the system has a much harder time classifying the gender of darker skinned women because of bone structure and light reflection or because the training set was not provided enough pictures of darker skinned women in contrast to other combinations of gender and skin tone.

I think it's probably more likely the second, just because when there is a significant classification problem for a Bayesian or other type of machine learning algorithm, it often turns out that the corpus had poorer examples of the problematic classification relative to the other classifications - that is to say, it's a known problem with a predictable result.

But probably should do some studies about skin reflection, that's a good idea.


The researcher said she identified racial bias. The factors you describe are merely reasons why such bias would exist. What’s your basis for saying this researcher is race baiting, hyper partisan, or showing a lack of understanding of anything?


[flagged]


The article contains no occurrences of the term “racism” or “racist.”

Why is there necessarily no problem with a bias if it comes from a physiological difference? If a certain group of people gets falsely convicted of crimes at a disproportionate rate because some computer program misidentifies them more frequently than it does other groups, that is a huge problem that needs to be fixed, even if the misidentification is due to some inherent difference between the group.


[flagged]


Ah, the good ol’ “the real racism is pointing out racism” thing.

I assume you also oppose things like wheelchair ramps and Braille signage, since any problems handicapped people have with stairs or printed signs is due to physiological differences?


Isn't the problem, then, that the software doesn't work well enough with dark skin?


I’ve seen no evidence of it.

What I've seen is differential comparisons, e.g. comparing the rate of white detection to black detection, or the difference in certainty scores for each - but I'd really appreciate it if people could show me the actual certainty numbers on black faces, so I can see if it's failing to recognize, misrecognizing, or just less sure than on white faces.


The evidence is literally in the original article: "Darker-skinned women were the most misclassified group, with error rates of up to 34.7%. By contrast, the maximum error rate for lighter-skinned males was less than 1%"

Regardless of whether the higher error rate is a combination of race, gender, or both, it's still a huge issue. Granted, that study was from a year ago, and other companies have since improved their facial recognition systems. But an overall precision/accuracy/f1 score doesn't mean much when accuracy varies that much by group. Sure, you can market it as "accurate on white males", but you can't market it as "accurate"
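
In other words, evaluation has to be disaggregated. A minimal sketch of what that looks like, using invented records rather than anyone's real benchmark data:

    from collections import defaultdict

    # hypothetical (predicted, actual, group) records
    results = [("M", "M", "lighter_male"), ("M", "F", "darker_female"),
               ("F", "F", "darker_female"), ("F", "F", "lighter_female")]

    totals, errors = defaultdict(int), defaultdict(int)
    for predicted, actual, group in results:
        totals[group] += 1
        errors[group] += int(predicted != actual)

    overall_error = sum(errors.values()) / sum(totals.values())
    per_group_error = {g: errors[g] / totals[g] for g in totals}
    # a single overall number can look fine while per_group_error shows a 30-point gap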


I've heard people say that sexual dimorphism varies by race. You can find papers when searching for it, anyone more familiar with the space able to confirm the validity one way or the other?


Stature ratios in https://en.wikipedia.org/wiki/List_of_average_human_height_w... say yes. Of course these statistics could be biased by racial prejudice of the people who conducted the measurements, so it would be premature to say that we are not all the same.


I sure feel privileged that Amazon's software can track and spy on me effectively.


It is amazing to what lengths those 'activists' go in order to make sure Amazon's product does work better. Do they know that this stuff will be largely used for surveillance, military uses and other nasty stuff (like exposing adult performers on social media)?

If people plan on spending time on activism, there are plenty of real issues with Amazon.


Well, when Boston Dynamics' dog robot walks into your house with a gun on its back, it's about making sure it knows how to differentiate you from the bad guy it's looking for.

We don't want autonomous killer bots to not be able to tell black people apart just because their creators can't.


America would have to become a drastically worse place for any law enforcement agency to deploy an autonomous lethal force that had less than 100% identification accuracy.


It already deploys a manned lethal force with far less than 100% identification accuracy, so what's the fundamental difference? The arguments in courts would be the same - "an honest mistake", "incident due to unfortunate circumstances" etc, wrapped up with a claim that if such broad latitude to make "mistakes" is not given, then society will descend into lawless anarchy. But it would be applied to people who deploy the robots, instead of the people pulling the trigger.

Alternatively, and more likely, measures would be taken to make it not autonomous on paper, e.g. requiring a human operator to approve any action the robot is intending to take. In practice, this would likely be one of those "moral crumple zones" with little practical meaning.


I think several decades from now I'd take the Boston Dynamics robot walking in rather than a police officer, because the robot won't have to fear for its life, so it's less likely to use lethal force.


They'll probably never hit 100% accuracy. But, for example, if 5,000 IDs are required every day, then an accuracy of 99.999999% would probably be good enough for a hundred-year period.
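
Back-of-the-envelope, with those numbers:

    ids_per_day = 5_000
    days = 365 * 100                     # a hundred years
    error_rate = 1 - 0.99999999          # i.e. about 1e-8
    expected_errors = ids_per_day * days * error_rate
    # roughly 1.8 expected misidentifications over the entire century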



