> As Google and other tech giants adopted the technology, no one quite realized it was learning the biases of the researchers who built it.
Woah, this came out of nowhere and it’s completely wrong. The problem isn’t that deep learning is picking up biases of the researchers, it’s that it picks up biases from the training data.
Algorithmic bias is not ONLY a data problem, it's also a model problem [0]. The biases of model developers get encoded through choices like learning rates, network hyperparameters, and objective functions.
Model bias is not a huge issue - maybe something around class imbalance or regularization. The huge issue is deployment: what is the model used for? How is it affecting people in reality? What metric is it optimizing?
Next to all of that, the degree of L1 regularization or the class weights are minor things. Most models will perform similarly given the same data. It's mostly the data that makes the difference.
There's an interplay between the two insofar as a model built to handle a specific dataset will involve design decisions informed by the data. E.g. you might pick a certain level of L1 regularization because it maximizes performance on the data you have, which can lead to bias against data you don't have.
But if you take "model" to mean the pure mathematical description without parameters or hyperparameters that need to be determined by experimentation, then I agree that optimizing the model on a dataset will not lead to bias against specific groups of humans unless the data used contains such a bias.
Hyperparameters play a significant role in bias when you're dealing with imbalanced classes, or long tail samples.
But this ties back to the original data problem, right? If you don't have enough training samples for (known or unknown) unknowns, your model is likely to be biased against them.
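To make the hyperparameter point concrete, here is a minimal sketch (my own toy example, assuming scikit-learn and a synthetic imbalanced dataset - nothing from the thread): with the data held fixed, flipping a single class_weight setting changes recall on the rare class.

    # Toy example: same data, two hyperparameter settings, different minority-class recall.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import recall_score

    # Synthetic, heavily imbalanced dataset (95/5 class split by assumption).
    X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    for class_weight in (None, "balanced"):
        clf = LogisticRegression(class_weight=class_weight, max_iter=1000)
        clf.fit(X_tr, y_tr)
        rec = recall_score(y_te, clf.predict(X_te))  # recall on the rare class (label 1)
        print(class_weight, round(rec, 3))

Neither setting is "unbiased"; the point is just that this knob is a developer choice, and which setting looks right depends on data you may not have.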
While this is true, "learning the biases of the researchers who built it" is a very misleading way of putting it, because it is still very unclear if and how certain design decisions impact the bias of the resulting model.
Given that reducing bias while not giving up other desirable properties is a young and open research direction, researchers in general should not be faulted for using the current (imperfect) state of the art or for working on something that is not (yet) focused on bias.
Yes, the optimization goal (the objective function) is a major factor in the function of algorithmic systems. I'm not sure bias is the best word to use here, however.
It is a known challenge to align the designed purpose of an algorithm with actual optimization metrics. For instance, recommendation systems may have the purpose of improving user experience, but if time-on-site metrics are used as the optimization function, there can be unexpected results.
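As a toy illustration of that metric mismatch (invented numbers, purely to show the mechanics): rank the same catalogue by time-on-site and by a satisfaction proxy, and the top recommendation changes.

    # Toy example: same items, two objectives, different "best" recommendation.
    # All numbers are made up purely for illustration.
    items = [
        {"name": "clickbait_feed", "expected_minutes": 42, "satisfaction": 2.1},
        {"name": "helpful_answer", "expected_minutes": 3,  "satisfaction": 4.8},
        {"name": "long_tutorial",  "expected_minutes": 25, "satisfaction": 4.2},
    ]
    by_time = max(items, key=lambda i: i["expected_minutes"])
    by_satisfaction = max(items, key=lambda i: i["satisfaction"])
    print("optimizing time-on-site surfaces:", by_time["name"])
    print("optimizing satisfaction surfaces:", by_satisfaction["name"])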
Yeah, but it is indirectly the biases of the researchers. A researcher is more likely to notice and correct for training data problems that conflict with their biases.
This requires him to be aware of each pattern/“bias” in the data, which he isn’t, which is the reason we use the algos in the first place.
Come to think of it, isn't that an interesting avenue for GAN-esque methods to detect which patterns fall into these categories of bias? Or is that a recursive problem? If not, put me in the paper :-)
> This requires him to be aware of each pattern/“bias” in the data, which he isn’t, which is the reason we use the algos in the first place.
That's not strictly true. In a lot of cases you start out oblivious to biases in the data, and then when you evaluate the model you notice problems.
But your point about obliviousness to bias is exactly what I'm speaking towards. One might be oblivious to bias that aligns with one's own biases, but notice bias that conflicts with them.
People doing ML “research” are not the ones applying it to specific data sets day to day. “I pointed a neural net at our sales data” is not research in the normal sense.
Yeah, I look at the semantic problem with calling it "ML research" and just throw up my arms. These discussions aren't generally driven by people who care about semantics.
That's not the only mechanism. Even if you want to stick to technical issues, it can be biased because you didn't train it long enough or the model is too small. And of course the entire question depends on what the model is being used for.
Everything you say is true, but in context the important point is that the sort of biases ML models pick up are overwhelmingly related to training insufficiencies and are often incredibly difficult to spot unless you already know they exist. For a practical example, see the recent Twitter image cropping oddities (https://twitter.com/bascule/status/1307440596668182528).
The idea (as quoted) that models are routinely picking up biases directly from researchers is complete nonsense.
Right, that's some kind of human interest journalist fudging and it's not true. But bias/surprising wrong answers in ML is obviously a real problem and fixing the data is not always the right answer. You might not be able to tell what's wrong with the data, or where you could get any more of it, and you might be reusing a model for a new problem and not have the capability to retrain it.
I would argue that DNNs would not work as well if they weren't picking up biases. Sometimes we need to learn the biases in order to better detect them.
Even humans need to know about swear words in order to consciously avoid using them, or need to learn about reproduction in order to avoid teenage pregnancies. Not knowing does not make us or the AI better.
For example, what GPT-3 needs is a "conscience", a separate model monitoring and rejecting harmful outputs. If I am not mistaken the demo is already displaying warnings when it goes off into weird places.
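A rough sketch of what such a "conscience" wrapper might look like (every component here is a trivial stand-in I made up, not GPT-3's actual API or any real safety system):

    # Toy sketch: a second model screens a generator's output before it reaches the user.
    BLOCKLIST = {"badword"}  # stand-in for a learned harm classifier

    def generate_text(prompt: str) -> str:
        # Stand-in for a large language model.
        return f"Echoing your prompt: {prompt}"

    def harm_score(text: str) -> float:
        # Stand-in "conscience": fraction of tokens that look harmful.
        tokens = text.lower().split()
        return sum(t in BLOCKLIST for t in tokens) / max(len(tokens), 1)

    def guarded_generate(prompt: str, threshold: float = 0.1, max_tries: int = 3) -> str:
        for _ in range(max_tries):
            candidate = generate_text(prompt)
            if harm_score(candidate) < threshold:
                return candidate
        return "[output withheld by safety filter]"

    print(guarded_generate("hello there"))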
"The idea of a neural network dated back to the 1950s, but the early pioneers had never gotten it working as well as they’d hoped. By the new millennium, most researchers had given up on the idea, convinced it was a technological dead end and bewildered by the 50- year- old conceit that these mathematical systems somehow mimicked the human brain."
This is not only false, but in context an intentional misrepresentation. Most of the issues with the model were solved by the introduction of hidden layers and backpropagation learning, which has been, at least in my opinion, required knowledge in CS since at least the early 90's and probably earlier (it is not clear when the idea was formulated in usable form, but the most cited publications are from the late 80's, e.g. Rumelhart, D., Hinton, G. & Williams, R. Learning representations by back-propagating errors. Nature 323, 533–536 (1986). https://doi.org/10.1038/323533a0).
On the other hand, obviously the more complex modern approaches to the "throw a bunch of poorly understood linear algebra at the problem" problem have value, and there is a definite generational shift in the current "AI-anti-winter" (for lack of a better word), but still...
I graduated in 2010 with a degree supposedly in "CS with AI specialization" and was taught that neural networks weren't useful for much of anything at the time.
The best techniques we learned were MCMC and random forests, our computer vision was OpenCV and didn't work so well, and there wasn't any suggestion that buying a lot of GPUs and not bothering to understand the problem space would produce better results than our laptops.
Backpropagation was probably the most important step, but there was also the issue of workable training methods for many-layer networks. Fukushima did some very important - but rarely mentioned - work on this starting in 1979.
He trained layer by layer, proceeding to the next only when the current one had stabilized. The main practical issue was the enormous training and execution times with the computers of the day.
It's mostly right. "Early pioneers" is wrong, since the idea of neural networks predates backpropagation, but by the turn of the millennium they really were regarded as old-fashioned, and attention had moved to things like support vector machines and AdaBoost.
"Inevitably, the next bid wouldn’t arrive until a minute or two before the top of the hour, extending the auction just as it was on the verge of ending. The price climbed so high, Hinton shortened the bidding window from an hour to 30 minutes. The bids quickly climbed to $40 million, $41 million, $42 million, $43 million. “It feels like we’re in a movie,” he said. One evening, close to midnight, as the price hit $44 million, he suspended the bidding again. He needed some sleep." (So Google paid north of $40m to hire Hinton and his lab).
That all makes $2m a year for Ilya Sutskever, or $600k a year for some run-of-the-mill dude with a degree in AI, seem like a bargain. The founders of Nuro got $40m each to leave Google and start their own company. Hopefully we get some more transparency about pay to bring up the whole field rather than just "AI." After all, at the time of writing software tends to have a higher accuracy rate than machine learning ...
As soon as I read this part I figured they’d end up selling to Google, because only PR people would include a defensive aside like this :)
“In the days before the auction, [Microsoft] complained that Google, its biggest rival and likeliest competitor in the auction, could eavesdrop on private messages and somehow game the bids. Hinton had raised the same possibility with his students, though he was less expressing a serious concern than making an arch comment on the vast and growing power of Google.”
Hinton has an incredible press agent; from articles like these you would think only his and a few other minds are actually working on AI. Nothing against Hinton, I just wish others in the AI field got as much press - there are a lot of incredible things happening in AI these past few years.
Hinton has such a good story that I think it was clear from the outset that he was going to be the "face" of AI.
He was on one of the first backprop papers, worked pretty consistently on neural nets through the AI winter, and was pretty instrumental in the renaissance.
Was he the only person working on this stuff? No, but I think we as a society like to tell ourselves a story about science and scientists. The narrative of the "lone scientist" whiling away their days in solitude until reaching a breakthrough probably hasn't actually existed since the 19th century. Pretty much all significant discoveries nowadays are the product of intense collaboration and iterative progress. Science is a team sport, just one where 99% of the team happens to be invisible.
Also notice how skillfully the names of his students are not mentioned - no PR for the unworthy. Of course they are easily discoverable from the published papers.
The lone hero/savior/prodigy archetype is also just an easy narrative to sell, especially to anyone aspiring towards success who would feel inspired or vicariously validated by such a story.
There are countless articles that romanticize entrepreneurs who were child chess champions, or rubik's cube solving geniuses, who dropped out of college, or were academics who were under-acknowledged, only to build it all by themselves and hit the jackpot. At least that's how the story often goes. It almost seems like a sort of modern mythology, one that taps into the American dream that so many yearn for.
$44 million + the many more millions he makes annually hires a good agent. Also, I'm sure it's not unlikely that Google is using their PR contacts to pick a writer for the profile who will yield some favorable coverage.
> It included only two other people, both young graduate students in his lab at the university. It made no products. It had no plans to make a product. And its website offered nothing but a name, DNN-research, which was even less inviting than the sparse page.
Were the students or Hinton under any obligation to stay at DNN-research?
Has the proposed impact of neural nets come to light yet? I don't say this in a snarky way. In my life I don't encounter them much or at all, and thinking about Google specifically, their search results don't seem to be getting better.
They’re everywhere! I can take a photo on my phone and it’s instantly given convincing artificial depth of field. I can search my photos for the name of anyone I know and instantly all the photos containing those people show up. I regularly use voice assistants, translation, and can dictate full length messages while I’m driving in my car. Neural nets can even drive cars now!
This stuff is already making massive impacts, and yet it’s still in the early stages. The meeting in the article was less than 10 years ago. Just imagine in 20 or 30 years from now.
Kind of. Google Images mostly seems to use words on the page with the image. If you search for someone's name, you'll find plenty of photos of other people.
Image search by content, improved language translation, Siri/Alexa/Google Assistant, and automatic content moderation, to name a few.
With that said, I generally feel like my life has not been impacted greatly. Siri can barely tell me anything other than the first paragraph of a wikipedia page or today's weather. I don't search my photos often. I don't translate languages often, and I'm not in a back office in charge of moderation.
Recommender systems, text to speech (and the other way round), probably some problems in logistics like getting you deliveries and keeping the power running.
An ascending auction is strategically equivalent to a sealed-bid 'Vickrey' second-price auction, which is logistically simpler since it only requires one round of bidding.
An ascending price auction is strategically different from a second price auction when the bidders each have a noisy estimate of some underlying true value. With a second price auction, the only information you can incorporate into your bid is your own signal. With an ascending price auction, you can look at the prices at which the other bidders drop out (except the second highest bidder) and incorporate that information into the price at which you're willing to drop out. This also means that an ascending price auction raises more revenue than a second price auction, by the linkage principle.
Evidenced by: "That left Baidu, Google, and Microsoft. As the bids continued to climb, first to $15 million and then to $20 million, Microsoft dropped out too, but then returned."
Microsoft could not have “returned” in a Vickrey auction
Wish I had met Hinton while at U. of Toronto. I was slated to do a PhD with a guy from the Medical and Zoology schools who was studying human vision and applying NNs with Hinton's help. Early '90s, when 'soft computing' was all the rage.
How would somebody reading the emails game the system? The only effective moves are to pass if the price is too high, or bid the minimum increment if you think it is still cheap enough. Passing when the price is lower than your threshold is obviously losing. Raising more than the minimum has no value.
You could drive up the price by continuing to bid even when past your threshold, but only after an opponent also bids. But first, that does not get you any benefit, it just inconveniences a rival. And second, this will lead to a tie, which presumably has 50/50 odds of you ending up the winner.
I had a similar question about the backpack: they thought they could open it and find out what Baidu’s bidding strategy was. But that knowledge would have had no value to them. The outcome would not change.
This depends on the exact nature of the object being auctioned. The details matter a lot. Some good economists have worked on this problem both in theory and empirically for a long time.
Under independent private valuation (where the value of the object is completely idiosyncratic to you), your thought is (AFAIK) correct. Here think of art: ignoring resale value, how much someone else would pay for an object depends on how much he likes it, but doesn't have anything to do with how you should bid - your bid is a function of how much you like it. (To a first approximation - again the details of the auction format really matter.)
In a common value situation (where the value of the underlying object to any player is the same, but players' _signals_ (an input to their beliefs) about that value contain an idiosyncratic component), you might learn something about the quality of your own signal by knowing something about the bids of other players in the auction. For example: are you about to overpay b/c you believe Hinton's time is worth a lot more than Google and Baidu believe it to be worth?
The phrase "the winner's curse" was invented for this problem in the context of bidding for offshore oil leases. The value to all players (the number of barrels of oil in the field) is the same, but they have different beliefs about what that number is. The winner's curse is that the highest bidder was generally the most optimistic about the value of the field. It's a "curse" b/c the optimism was frequently not justified.
What was actually being auctioned here is a little unclear - something like "the time and expertise of Hinton + two grad students", but it's plausible that Google, Microsoft and Baidu could have made equally good use of it. It reads more like a common value setting, that is. So knowing the bids can matter.
If you want to know more about auction theory (this comment may already be more than you want to know!), I can recommend Krishna's "Auction Theory" or Milgrom's "Putting Auction Theory to Work."
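To put a number on the winner's-curse point, here is a quick toy simulation (my own sketch, not from the article): if every bidder naively bids their noisy signal of a common value, the winning signal overshoots the truth on average.

    # Toy winner's-curse simulation: one common value, noisy private signals.
    import random

    random.seed(0)
    TRUE_VALUE = 100.0                      # the common value everyone is bidding on
    N_BIDDERS, N_AUCTIONS, NOISE = 5, 10000, 20.0

    winning_signals = []
    for _ in range(N_AUCTIONS):
        signals = [random.gauss(TRUE_VALUE, NOISE) for _ in range(N_BIDDERS)]
        winning_signals.append(max(signals))  # naive bidders bid their own signal

    avg_win = sum(winning_signals) / len(winning_signals)
    print(f"true value: {TRUE_VALUE:.1f}, average winning signal: {avg_win:.1f}")
    # The winner's signal sits systematically above the true value, which is why
    # rational bidders in common-value auctions shade their bids downward.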
Raising above the minimum increment does have value.
If Baidu is currently winning at say $21M, Google might bid $22M if they know Microsoft will not bid in that round, but might bid $22.1M if they know Microsoft will bid $22M in that round. (Or $23.6M if they know Microsoft will bid $23.5M.)
Or suppose Microsoft is willing to go to exactly $22.5M, Baidu to exactly $23.0M, and Google to $40M. If Google can get that information, Google can bid $22.6M and beat both Baidu and Microsoft (if Baidu doesn’t bid this round, because they’re anticipating a $22M bid this round that they’ll cover at $23M next round).
Not sure I buy it. The limits here are more 'go talk to your boss' than 'stop bidding when it gets to $XXMM' for everyone involved except for Deepmind (which was later also bought by teh googs).
Furthermore, execs are creatures of emotion and whimsy, rather than strictly rational operators. As the price soars higher, they see that everyone at the table values the thing highly, and are likely willing to increase their max bid as their sense of FOMO increases... Subincrements aren't going to matter much, except to ratchet up the feelings at the table even higher, faster.
While it seems reasonable to assume there is no advantage in the perfect information scenario, it feels wrong. For large auctions perhaps a pattern emerges in which a rival strategy could be inferred.
Of course in the end it didn't matter! Hinton subconsciously knew he wanted to join Goog. And may have been inadvertently signalling such all along. And the American rivals themselves were probably colluding as well to keep China in check.
I suspect this "history" of Deep Learning was quite dry to begin with. As it mostly featured academics. Some cloak and dagger style suspense was peppered in to make it a more exciting read ;)
Maybe Google really doesn't like Microsoft or vice versa and they would try to out bid each other to prevent the other from winning, but they would both let another 3rd party win.
It is a bit surprising that they didn't use a neutral 3rd party as arbiter, and that they didn't go with a Dutch auction. They would have been done in a few hours with a very close result.
Another surprising thing is that there was no goodwill clause, i.e. that they could terminate the auction and refuse to work for the winner (which they effectively did).
I see this journalist (Cade Metz) everywhere these days it seems: this article, a NYT article about the Google ethical AI debate, and the article about Slate Star Codex...
Well, we're going to need a new slur for journalists, something which is reminiscent of their rampant hypocrisy, their agendas, and general desire to throw stones without repercussion. "My side's violence is speech; your side's speech is violence" is the new refrain and how I would love to hear coined a word which can adequately encompass that kind of loathsomeness.
Traditional east coast media is losing their grip on narrative formation and they're lashing out at intellectual figures outside of their social group, especially from the west coast indie-intellectual scene (e.g. Scott Alexander).
In this context it would likely be someone who provides intellectual leadership through writing, but is not employed by a large media company like The New York Times.
Scott Alexander is from Michigan. I don't know if blogging is really a West Coast thing, I thought of it as more DC but maybe that's just because of politics.
As someone who once competed with him for stories: Cade is a very strong writer with a robust pedigree who really gets the current state of play in ML, and has the chops to write about it for a general audience. At his best, his articles are able to help non-experts understand the stakes and wonder that make up this corner of the tech world.
The race for AI supremacy started when people started doing CRISPR in their basement. It's not "ideological" for the US to go first, it's the only option.