> As Google and other tech giants adopted the technology, no one quite realized it was learning the biases of the researchers who built it.
Woah, this came out of nowhere and it’s completely wrong. The problem isn’t that deep learning is picking up biases of the researchers, it’s that it picks up biases from the training data.
Algorithmic bias is not ONLY a data problem, it's also a model problem [0]. The biases of model developers get encoded through choices like learning rates, network hyperparameters, and objective functions.
Model bias is not a huge issue - maybe something around class imbalance or regularization. The huge issue is deployment: what is the model used for? How is it affecting people in reality? What metric is it optimizing?
Next to all of that, the degree of L1 regularization or the class weights are minor things. Most models will perform similarly given the same data. It's mostly the data that makes the difference.
There's an interplay between the two insofar as a model built to handle a specific dataset will involve design decisions informed by the data. E.g. you might pick a certain level of L1 regularization because it maximizes performance on the data you have, which can lead to bias against data you don't have.
But if you take "model" to mean the pure mathematical description without parameters or hyperparameters that need to be determined by experimentation, then I agree that optimizing the model on a dataset will not lead to bias against specific groups of humans unless the data used contains such a bias.
Hyperparameters play a significant role in bias when you're dealing with imbalanced classes, or long tail samples.
But this ties back to the original data problem, right? If you don't have enough training samples for (known or unknown) unknowns, your model is likely to be biased against them.
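To make the hyperparameter point concrete, here is a minimal sketch (my own toy example, assuming scikit-learn and a synthetic imbalanced dataset - nothing from the thread): with the data held fixed, flipping a single class_weight setting changes recall on the rare class.

    # Toy example: same data, two hyperparameter settings, different minority-class recall.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import recall_score

    # Synthetic, heavily imbalanced dataset (95/5 class split by assumption).
    X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    for class_weight in (None, "balanced"):
        clf = LogisticRegression(class_weight=class_weight, max_iter=1000)
        clf.fit(X_tr, y_tr)
        rec = recall_score(y_te, clf.predict(X_te))  # recall on the rare class (label 1)
        print(class_weight, round(rec, 3))

Neither setting is "unbiased"; the point is just that this knob is a developer choice, and which setting looks right depends on data you may not have.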
While this is true, "learning the biases of the researchers who built it" is a very misleading way of putting it, because it is still very unclear if and how certain design decisions impact the bias of the resulting model.
Given that reducing bias while not giving up other desirable properties is a young and open research direction, researchers in general should not be faulted for using the current (imperfect) state of the art or for working on something that is not (yet) focused on bias.
Yes, the optimization goal (the objective function) is a major factor in the function of algorithmic systems. I'm not sure bias is the best word to use here, however.
It is a known challenge to align the designed purpose of an algorithm with actual optimization metrics. For instance, recommendation systems may have the purpose of improving user experience, but if time-on-site metrics are used as the optimization function, there can be unexpected results.
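As a toy illustration of that metric mismatch (invented numbers, purely to show the mechanics): rank the same catalogue by time-on-site and by a satisfaction proxy, and the top recommendation changes.

    # Toy example: same items, two objectives, different "best" recommendation.
    # All numbers are made up purely for illustration.
    items = [
        {"name": "clickbait_feed", "expected_minutes": 42, "satisfaction": 2.1},
        {"name": "helpful_answer", "expected_minutes": 3,  "satisfaction": 4.8},
        {"name": "long_tutorial",  "expected_minutes": 25, "satisfaction": 4.2},
    ]
    by_time = max(items, key=lambda i: i["expected_minutes"])
    by_satisfaction = max(items, key=lambda i: i["satisfaction"])
    print("optimizing time-on-site surfaces:", by_time["name"])
    print("optimizing satisfaction surfaces:", by_satisfaction["name"])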
Yeah, but it is indirectly the biases of the researchers. A researcher is more likely to notice and correct for training data problems that conflict with their biases.
This requires him to be aware of each pattern/“bias” in the data, which he isn’t, which is the reason we use the algos in the first place.
Come to think of it, isn't that an interesting avenue for GAN-esque methods to detect which patterns fall into these categories of bias? Or is that a recursive problem? If not, put me in the paper :-)
> This requires him to be aware of each pattern/“bias” in the data, which he isn’t, which is the reason we use the algos in the first place.
That's not strictly true. In a lot of cases you start out oblivious to biases in the data, and then when you evaluate the model you notice problems.
But your point about obliviousness to bias is exactly what I'm speaking towards. One might be oblivious to bias that aligns with one's own biases, but notice bias that conflicts with them.
People doing ML “research” are not the ones applying it to specific data sets day to day. “I pointed a neural net at our sales data” is not research in the normal sense.
Yeah, I look at the semantic problem with calling it "ML research" and just throw up my arms. These discussions aren't generally driven by people who care about semantics.
That's not the only mechanism. Even if you want to stick to technical issues, it can be biased because you didn't train it long enough or the model is too small. And of course the entire question depends on what the model is being used for.
Everything you say is true, but in context the important point is that the sort of biases ML models pick up are overwhelmingly related to training insufficiencies and are often incredibly difficult to spot unless you already know they exist. For a practical example, see the recent Twitter image cropping oddities (https://twitter.com/bascule/status/1307440596668182528).
The idea (as quoted) that models are routinely picking up biases directly from researchers is complete nonsense.
Right, that's some kind of human interest journalist fudging and it's not true. But bias/surprising wrong answers in ML is obviously a real problem and fixing the data is not always the right answer. You might not be able to tell what's wrong with the data, or where you could get any more of it, and you might be reusing a model for a new problem and not have the capability to retrain it.
I would argue that DNNs would not work as well if they weren't picking up biases. Sometimes we need to learn the biases in order to better detect them.
Even humans need to know about swear words in order to consciously avoid using them, or need to learn about reproduction in order to avoid teenage pregnancies. Not knowing does not make us or the AI better.
For example, what GPT-3 needs is a "conscience", a separate model monitoring and rejecting harmful outputs. If I am not mistaken the demo is already displaying warnings when it goes off into weird places.
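A rough sketch of what such a "conscience" wrapper might look like (every component here is a trivial stand-in I made up, not GPT-3's actual API or any real safety system):

    # Toy sketch: a second model screens a generator's output before it reaches the user.
    BLOCKLIST = {"badword"}  # stand-in for a learned harm classifier

    def generate_text(prompt: str) -> str:
        # Stand-in for a large language model.
        return f"Echoing your prompt: {prompt}"

    def harm_score(text: str) -> float:
        # Stand-in "conscience": fraction of tokens that look harmful.
        tokens = text.lower().split()
        return sum(t in BLOCKLIST for t in tokens) / max(len(tokens), 1)

    def guarded_generate(prompt: str, threshold: float = 0.1, max_tries: int = 3) -> str:
        for _ in range(max_tries):
            candidate = generate_text(prompt)
            if harm_score(candidate) < threshold:
                return candidate
        return "[output withheld by safety filter]"

    print(guarded_generate("hello there"))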
"The idea of a neural network dated back to the 1950s, but the early pioneers had never gotten it working as well as they’d hoped. By the new millennium, most researchers had given up on the idea, convinced it was a technological dead end and bewildered by the 50- year- old conceit that these mathematical systems somehow mimicked the human brain."
This is not only false, but in context an intentional misrepresentation. Most of the issues with the model were solved by the introduction of hidden layers and backpropagation learning, which has been, at least in my opinion, required knowledge in CS since at least the early 90's and probably earlier (it is not clear when the idea was formulated in usable form, but the most cited publications are from the late 80's, e.g. Rumelhart, D., Hinton, G. & Williams, R. Learning representations by back-propagating errors. Nature 323, 533–536 (1986). https://doi.org/10.1038/323533a0).
On the other hand, obviously the more complex modern approaches to the "throw a bunch of poorly understood linear algebra at the problem" problem have value, and there is a definite generational shift in the current "AI-anti-winter" (for lack of a better word), but still...
I graduated in 2010 with a degree supposedly in "CS with AI specialization" and was taught that neural networks weren't useful for much of anything at the time.
The best techniques we learned were MCMC and random forests, our computer vision was OpenCV and didn't work so well, and there wasn't any suggestion that buying a lot of GPUs and not bothering to understand the problem space would produce better results than our laptops.
Backpropagation was probably the most important step, but there was also the issue of workable training methods for many-layer networks. Fukushima did some very important - but rarely mentioned - work on this starting in 1979.
He trained layer by layer, proceeding to the next only when the current one had stabilized. The main practical issue was the enormous training and execution times with the computers of the day.
It's mostly right. "Early pioneers" is wrong, since the idea of neural networks predates backpropagation, but by the turn of the millennium they really were regarded as old-fashioned, and attention had moved to things like support vector machines and AdaBoost.
"Inevitably, the next bid wouldn’t arrive until a minute or two before the top of the hour, extending the auction just as it was on the verge of ending. The price climbed so high, Hinton shortened the bidding window from an hour to 30 minutes. The bids quickly climbed to $40 million, $41 million, $42 million, $43 million. “It feels like we’re in a movie,” he said. One evening, close to midnight, as the price hit $44 million, he suspended the bidding again. He needed some sleep." (So Google paid north of $40m to hire Hinton and his lab).
That all makes $2m a year for Ilya Sutskever, or $600k a year for some run-of-the-mill dude with a degree in AI, seem like a bargain. The founders of Nuro got $40m each to leave Google and start their own company. Hopefully we get some more transparency about pay to bring up the whole field rather than just "AI." After all, at the time of writing software tends to have a higher accuracy rate than machine learning ...
As soon as I read this part I figured they’d end up selling to Google, because only PR people would include a defensive aside like this :)
“In the days before the auction, [Microsoft] complained that Google, its biggest rival and likeliest competitor in the auction, could eavesdrop on private messages and somehow game the bids. Hinton had raised the same possibility with his students, though he was less expressing a serious concern than making an arch comment on the vast and growing power of Google.”
Hinton has an incredible press agent; from articles like these you would think only his and a few other minds are actually working on AI. Nothing against Hinton, I just wish others in the AI field got as much press - there are a lot of incredible things happening in AI these past few years.
Hinton has such a good story that I think it was clear from the outset that he was going to be the "face" of AI.
He was on one of the first backprop papers, worked pretty consistently on neural nets through the AI winter, and was pretty instrumental in the renaissance.
Was he the only person working on this stuff? No, but I think we as a society like to tell ourselves a story about science and scientists. The narrative of the "lone scientist" whiling away their days in solitude until reaching a breakthrough probably hasn't actually existed since the 19th century. Pretty much all significant discoveries nowadays are the product of intense collaboration and iterative progress. Science is a team sport, just one where 99% of the team happens to be invisible.
Also notice how skillfully the names of his students are not mentioned - no PR for the unworthy. Of course they are easily discoverable from the published papers.
The lone hero/savior/prodigy archetype is also just an easy narrative to sell, especially to anyone aspiring towards success who would feel inspired or vicariously validated by such a story.
There are countless articles that romanticize entrepreneurs who were child chess champions, or rubik's cube solving geniuses, who dropped out of college, or were academics who were under-acknowledged, only to build it all by themselves and hit the jackpot. At least that's how the story often goes. It almost seems like a sort of modern mythology, one that taps into the American dream that so many yearn for.
$44 million + the many more millions he makes annually hires a good agent. Also, I'm sure it's not unlikely that Google is using their PR contacts to pick a writer for the profile who will yield some favorable coverage.
> It included only two other people, both young graduate students in his lab at the university. It made no products. It had no plans to make a product. And its website offered nothing but a name, DNN-research, which was even less inviting than the sparse page.
Were the students or Hinton under any obligation to stay at DNN-research?
Has the proposed impact of neural nets come to light yet? I don't say this in a snarky way. In my life I don't encounter them much or at all, and thinking about Google specifically, their search results don't seem to be getting better.
They’re everywhere! I can take a photo on my phone and it’s instantly given convincing artificial depth of field. I can search my photos for the name of anyone I know and instantly all the photos containing those people show up. I regularly use voice assistants, translation, and can dictate full length messages while I’m driving in my car. Neural nets can even drive cars now!
This stuff is already making massive impacts, and yet it’s still in the early stages. The meeting in the article was less than 10 years ago. Just imagine in 20 or 30 years from now.
Kind of. Google Images mostly seems to use words on the page with the image. If you search for someone's name, you'll find plenty of photos of other people.
Image search by content, improved language translation, Siri/Alexa/Google Assistant, and automatic content moderation, to name a few.
With that said, I generally feel like my life has not been impacted greatly. Siri can barely tell me anything other than the first paragraph of a wikipedia page or today's weather. I don't search my photos often. I don't translate languages often, and I'm not in a back office in charge of moderation.
Recommender systems, text to speech (and the other way round), probably some problems in logistics like getting you deliveries and keeping the power running.
An ascending auction is strategically equivalent to a sealed-bid 'Vickrey' second-price auction, which is logistically simpler since it only requires one round of bidding.
An ascending price auction is strategically different from a second price auction when the bidders each have a noisy estimate of some underlying true value. With a second price auction, the only information you can incorporate into your bid is your own signal. With an ascending price auction, you can look at the prices at which the other bidders drop out (except the second highest bidder) and incorporate that information into the price at which you're willing to drop out. This also means that an ascending price auction raises more revenue than a second price auction, by the linkage principle.
Evidenced by: "That left Baidu, Google, and Microsoft. As the bids continued to climb, first to $15 million and then to $20 million, Microsoft dropped out too, but then returned."
Microsoft could not have “returned” in a Vickrey auction
Wish I had met Hinton while at U. of Toronto. I was slated to do a PhD with a guy from the Medical and Zoology schools who was studying human vision and applying NNs with Hinton's help. Early '90s, when 'soft computing' was all the rage.
How would somebody reading the emails game the system? The only effective moves are to pass if the price is too high, or bid the minimum increment if you think it is still cheap enough. Passing when the price is lower than your threshold is obviously losing. Raising more than the minimum has no value.
You could drive up the price by continuing to bid even when past your threshold, but only after an opponent also bids. But first, that does not get you any benefit, it just inconveniences a rival. And second, this will lead to a tie, which presumably has 50/50 odds of you ending up the winner.
I had a similar question about the backpack: they thought they could open it and find out what Baidu’s bidding strategy was. But that knowledge would have had no value to them. The outcome would not change.
This depends on the exact nature of the object being auctioned. The details matter a lot. Some good economists have worked on this problem both in theory and empirically for a long time.
Under independent private valuation (where the value of the object is completely idiosyncratic to you), your thought is (AFAIK) correct. Here think of art: ignoring resale value, how much someone else would pay for an object depends on how much he likes it, but doesn't have anything to do with how you should bid - your bid is a function of how much you like it. (To a first approximation - again the details of the auction format really matter.)
In a common value situation (where the value of the underlying object to any player is the same, but players' _signals_ (an input to their beliefs) about that value contain an idiosyncratic component), you might learn something about the quality of your own signal by knowing something about the bids of other players in the auction. For example: are you about to overpay b/c you believe Hinton's time is worth a lot more than Google and Baidu believe it to be worth?
The phrase "the winner's curse" was invented for this problem in the context of bidding for offshore oil leases. The value to all players (the number of barrels of oil in the field) is the same, but they have different beliefs about what that number is. The winner's curse is that the highest bidder was generally the most optimistic about the value of the field. It's a "curse" b/c the optimism was frequently not justified.
What was actually being auctioned here is a little unclear - something like "the time and expertise of Hinton + two grad students", but it's plausible that Google, Microsoft and Baidu could have made equally good use of it. It reads more like a common value setting, that is. So knowing the bids can matter.
If you want to know more about auction theory (this comment may already be more than you want to know!), I can recommend Krishna's "Auction Theory" or Milgrom's "Putting Auction Theory to Work."
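To put a number on the winner's-curse point, here is a quick toy simulation (my own sketch, not from the article): if every bidder naively bids their noisy signal of a common value, the winning signal overshoots the truth on average.

    # Toy winner's-curse simulation: one common value, noisy private signals.
    import random

    random.seed(0)
    TRUE_VALUE = 100.0                      # the common value everyone is bidding on
    N_BIDDERS, N_AUCTIONS, NOISE = 5, 10000, 20.0

    winning_signals = []
    for _ in range(N_AUCTIONS):
        signals = [random.gauss(TRUE_VALUE, NOISE) for _ in range(N_BIDDERS)]
        winning_signals.append(max(signals))  # naive bidders bid their own signal

    avg_win = sum(winning_signals) / len(winning_signals)
    print(f"true value: {TRUE_VALUE:.1f}, average winning signal: {avg_win:.1f}")
    # The winner's signal sits systematically above the true value, which is why
    # rational bidders in common-value auctions shade their bids downward.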
Raising above the minimum increment does have value.
If Baidu is currently winning at say $21M, Google might bid $22M if they know Microsoft will not bid in that round, but might bid $22.1M if they know Microsoft will bid $22M in that round. (Or $23.6M if they know Microsoft will bid $23.5M.)
Or suppose Microsoft is willing to go to exactly $22.5M, Baidu to exactly $23.0M, and Google to $40M. If Google can get that information, Google can bid $22.6M and beat both Baidu and Microsoft (if Baidu doesn’t bid this round, because they’re anticipating a $22M bid this round that they’ll cover at $23M next round).
Not sure I buy it. The limits here are more 'go talk to your boss' than 'stop bidding when it gets to $XXMM' for everyone involved except for Deepmind (which was later also bought by teh googs).
Furthermore, execs are creatures of emotion and whimsy, rather than strictly rational operators. As the price soars higher, they see that everyone at the table values the thing highly, and are likely willing to increase their max bid as their sense of FOMO increases... Subincrements aren't going to matter much, except to ratchet up the feelings at the table even higher, faster.
While it seems reasonable to assume there is no advantage in the perfect information scenario, it feels wrong. For large auctions perhaps a pattern emerges in which a rival strategy could be inferred.
Of course in the end it didn't matter! Hinton subconsciously knew he wanted to join Goog. And may have been inadvertently signalling such all along. And the American rivals themselves were probably colluding as well to keep China in check.
I suspect this "history" of Deep Learning was quite dry to begin with. As it mostly featured academics. Some cloak and dagger style suspense was peppered in to make it a more exciting read ;)
Maybe Google really doesn't like Microsoft or vice versa and they would try to out bid each other to prevent the other from winning, but they would both let another 3rd party win.
It is a bit surprising that they didn't use a neutral 3rd party as arbiter, and that they didn't go with a Dutch auction. They would have been done in a few hours with a very close result.
Another surprising thing is that there was no goodwill clause, i.e. that they could terminate the auction and refuse to work for the winner (which they effectively did).
I see this journalist (Cade Metz) everywhere these days it seems: this article, a NYT article about the Google ethical AI debate, and the article about Slate Star Codex...
Well, we're going to need a new slur for journalists, something which is reminiscent of their rampant hypocrisy, their agendas, and general desire to throw stones without repercussion. "My side's violence is speech; your side's speech is violence" is the new refrain and how I would love to hear coined a word which can adequately encompass that kind of loathsomeness.
Traditional east coast media is losing their grip on narrative formation and they're lashing out at intellectual figures outside of their social group, especially from the west coast indie-intellectual scene (e.g. Scott Alexander).
In this context it would likely be someone who provides intellectual leadership through writing, but is not employed by a large media company like The New York Times.
Scott Alexander is from Michigan. I don't know if blogging is really a West Coast thing, I thought of it as more DC but maybe that's just because of politics.
As someone who once competed with him for stories: Cade is a very strong writer with a robust pedigree who really gets the current state of play in ML, and has the chops to write about it for a general audience. At his best, his articles are able to help non-experts understand the stakes and wonder that make up this corner of the tech world.
The race for AI supremacy started when people started doing CRISPR in their basement. It's not "ideological" for the US to go first, it's the only option.