A Proposal For the Dartmouth Summer Research Project on A.I. (1955)

AndrewOMartin · on July 12, 2016

My opinion is that you can't talk about the Dartmouth Project without eventually mentioning the Lighthill Report [https://en.wikipedia.org/wiki/Lighthill_report], an investigation from a British mathematician as to whether the UK Government should put all its egg in the AI basket.

It's a highly recommended read [http://www.aiai.ed.ac.uk/events/lighthill1973/lighthill.pdf], or you could watch his presentation of the report (to Minsky amongst others) [http://www.aiai.ed.ac.uk/events/lighthill1973/] it's also on YouTube.

He was astute in identifying that AI has succeeded in A) Automation of well defined tasks, C) Investigation of problem solving processes, but failed to product much in the way of B) the combination of A and C, an independently intelligent artifact.

It's been a while since I read it, but I remember the video being entertaining, especially the exchanges between the Lighthill and the Minsky, and the analysis being relevent even to today's state of AI.

deepnet · on July 12, 2016

Very informative, but Minsky was not at the debate, it was Michie, Gregory, & McCarthy.

The Lighthill report destroyed the UK's lead in AI at Edinburgh.

Edinburgh's AI lab, founded by Donald Michie, a wartime colleague of Turings & Richard Gregory a vision & theory of mind expert.

Edinburgh had a Robot Arm that could assemble various wooden toys from randomly scattered blocks using vision & planning.

Edinburgh had produced POPLOG, a widely used European LISP (with less brackets :) )

Michie was a proponent of Machine Intelligence, his "trial and error" BOXES algorithm could learn to balance a pole - everywhere else used hand engineered Symbolic GOFAI.

Michie BOXES enabled learning robots that anticipated neural nets & SGD using reinforcement learning.

Edinburgh's unique vision was world leading at the time. Sadly European industry followed Lighthill's lead - the 1st AI winter.

Lighthill was a pure mathematician and not well qualified to vet AI, his criticisms proved wrong in hindsight - Edinburgh had automation, vision & learning on a PDP-11.

Pariah in Europe, Donald Michie went on to help develop Japanese robotic assembly lines and use BOXES for factory & satellite control. Here he laments : [18:24] http://www.bbc.co.uk/iplayer/episode/p0306rt1/micro-live-ser...

robotresearcher · on July 13, 2016

POPLOG was more than a LISP! It was an IDE for POP-11, Common Lisp, Prolog, and Standard ML. All other languages were incrementally compiled to POP-11. Lisp was the least used of these in my experience at Sussex.

\sidebar:

POP-11 has Lisp-y list support with a Pascal-like syntax. It was pretty nice. It had assignment the right way around (if you have the stack as a mental model):

  5 -> x

Grad students and undergrad keeners were advised to learn enough LISP to read papers from the MIT AI Lab :)

https://en.wikipedia.org/wiki/Poplog

It was pretty well done, but not exactly fast on a 1990 Sun minicomputer shared by a hundred students.

Almost all of our undergrad assignments were set and submitted via the POPLOG system, including lecture notes and tutorials: 'teach texts' with hypertext links. You could highlight code snippets with the cursor and run them in the REPL. All pre-web on a VT-100. Great stuff.

deepnet · on July 13, 2016

Ah yes, many thanks, I was thinking of Popplestone & Burstall's functional language POP-II from Edinburgh. POPLOG was developed at Sussex.

Did you work with Margret Bowden's robotics group ?

robotresearcher · on July 13, 2016

I took Boden's mandatory first-year class Intro to CogSci in 1990. She was head of department, so I saw her around but I didn't know her well. I enjoyed the class very much.

She was a very respected figure in philosophy of cogsci and AI, but I don't recall her doing any practical robotics, which was barely present at Sussex at that time.

mturmon · on July 12, 2016

I agree that his breakdown into A/B/C is instructive. From the outside, one sometimes fails to appreciate that the "B" role (basically, building a robot that will encounter novel problems) is really critical.

This "shower thought" within the report also caught my eye:

Incidentally, it has sometimes been argued that part of the stimulus to laborious male activity in creative fields of work, including pure science, is the urge to compensate for lack of the female capability of giving birth to children. If this were true, then Building Robots might indeed be seen as the ideal compensation. There is one piece of evidence supporting that highly uncertain hypothesis: most robots are designed from the outset to operate in a world as like as possible to the conventional child's world as seen by a man: they play games, they do puzzles, they build towers of bricks, they recognise pictures in drawing books (bear on rug with ball) -- although the rich emotional character of the child's world is totally absent.

It's somewhere in the Venn diagram between deeply perceptive and reductively essentialist, and I can't decide where.

nabla9 · on July 12, 2016

Alan Perlis epigram #63: When we write programs that "learn", it turns out we do and they don't.

2bitencryption · on July 12, 2016

What's funny is I feel in some ways the exact opposite is true.

When Google creates a program to learn Go, it learns go so well that it knows it (arguably) knows it far better than any human (even if it isn't flawless).

But what did we learn about go? Well, we learned a bit about the opening I guess, since Lee Sedol has become fond of the "AlphaGo opening," but other than that... not much, right?

That's the funny thing about neural networks. They can converge to a set of weights that, when activated, perform better than any human. But we can't look at Weight 483 and Weight 958 and say "Ah, that's where it decided the corner is very valuable!" or something.

It learns, we don't. We can only learn from what it can then show us it has learned.

argonaut · on July 13, 2016

This is not true. We can look at networks and manually look at their weights to determine what features it learned, even high level features. You can do it right now (by examining a pretrained network).

People don't bother to because:

1) it's a very boring problem (we already have a high-level view of what networks learn through various visualizations, and what you'd learn would be specific to one network learned for one dataset)

and 2) it's very tedious and not repeatable (have to do it all over for each new dataset and each new model).

2bitencryption · on July 13, 2016

We might be thinking of different scopes of machine learning.

You could look at AlphaGo's weights for the entire neural network for ten thousand years. But you would become no better at Go. The only way AlphaGo can help us improve at Go is by showing us what it has learned in the games it plays.

argonaut · on July 13, 2016

You're dancing around, and haven't really defined, what it means to "learn Go."

Sure, humans wouldn't become better at Go. But that's a limitation of the human brain (we're not good at mathematical memorization and computation).

For all we know, what the network has learned about Go (a highly complex and interconnected set of statistical dependencies) is what there is to learn about Go. You're implicitly making the assumption that what the network learns about Go is guaranteed to be translatable to something humans can learn.

On the contrary, what the network learns is merely reducible, with loss of accuracy to what humans can understand. And that is an active area of research (feature visualizations and explanations), but that is tangential to your point.

kkylin · on July 12, 2016

I quite agree with everything you said, but I'm not sure what we mean today by "learn" (in the context of ML) is what Perlis would have meant.

Houshalter · on July 13, 2016

The exact opposite seems to be true. In many domains, machine learning showed even very simple algorithms easily outperform "experts". It's not uncommon to find that experts perform barely better than chance. See: http://lesswrong.com/lw/3gv/statistical_prediction_rules_out...

Animats · on July 12, 2016

Another McCarthy summer project was to build a robot to assemble a Heathkit color TV kit. That went nowhere, but the TV kit was purchased. After a few years, someone assembled it by hand, and for years afterward it was in the student lounge in Margret Jacks Hall at Stanford.

They were so optimistic in the early days. And they had so little compute power.

kleer001 · on July 12, 2016

And are we the opposite these days? So much computing power and pessimism.

daveguy · on July 12, 2016

I wouldn't say there is much pessimism in AI today. There should be a lot more pessimism to prevent a funding vacuum. I'd like for the early 90s to be the last AI winter, but I doubt it will be.

sapphireblue · on July 12, 2016

Actually the success of deep learning has made us very optimistic again. Just read this blog post: https://deepmind.com/blog there is a screenshot of environment that deepmind uses to train its agents. It's like science fiction that an algorithm can learn to behave in such rich 3d environment, but it works.

Also see the graphs for atari environments: many games are played by rl agents "at human-level or above".

projectramo · on July 12, 2016

I love this classic document. My favorite part is how easy they all thought the problems were going to be. It was the early days, there were breakthroughs everywhere, and in 14 years they were going to land a man on the moon.

It also feels a little like reading Andy Warhol's diary and realizing all the famous people knew each other. Never realized they were so close.

I hope they got their funding.

mturmon · on July 12, 2016

Another document from this bubbly time is this NYT article from July 1958, on a demo of Rosenblatt's perceptron ("NEW NAVY DEVICE LEARNS BY DOING") -- http://query.nytimes.com/gst/abstract.html?res=9D01E4D8173DE...

"Perceptrons might be fired to the planets as mechanical space explorers."

sapphireblue · on July 12, 2016

An interesting but not widely known fact is that the Dartmouth conference was also attended by Ray Solomonoff. Contrary to his colleagues he focused on questions of machine learning before learning became perceived as a worthwhile research direction by the AI community.

This has led Solomonoff to investigation of a question of universal sequence prediction. A couple of years later Solomonoff wrote a paper about such a system for prediction that used algorithmic probability (he is cited later as the original developer of algorithmic information theory which was later independently discovered by Kolmogorov who later acknowledged that the Solomonoff was the first). This method, Solomonoff's induction, is proven to be the most optimal (though incomputable) machine learning method possible.

He has never abandoned this project and for the rest of his life he focused on making more sophisticated system designs that are computable while still being proven to be optimal.

His latest system is called "Alpha", and it is designed as a machine for solving a sequence of function inversion and time limited optimization problems (a majority of science/engineering problems can be formulated this way) in a way that exploits experience gathered while solving these problems. This system, again, is proven to be optimal in a certain sense. He also tried to implement this system with various practical optimizations, but it didn't converge fast enough on his training sequences and on the hardware of that time.

Still, with modern hardware it is a possibility that it could work. And the whole design is described in the papers, so people can (and actually do, though privately and perhaps without much success) implement this system.

Here are the relevant papers: http://world.std.com/~rjs/publications/IncLrn89.pdf http://world.std.com/~rjs/nips02.pdf

simonh · on July 12, 2016

If all these efforts were successful, where would we be now? The singularity might have come by 1966, the year I was born. Actually, that move's (almost) already been made Colossus: The Forbin Project (1970).

I've long had a bias against minsky because I thought he said some very silly things about AI back in the day, but I think I was probably wrong, or at least that he deserved more attention than I gave him. I watched some interviews with him in the Youtube channel 'Closer To Truth' and he's by far my favourite interviewee. Very incisive.

bbctol · on July 12, 2016

It's just such a shame that Minksy came out so harshly against perceptrons. He really was an inspiring figure, and at the time really did seem to have good reasons for preferring symbolic logic over neural networks, but the more success NNs have, the more it looks like one of the pillars of 20th century AI got this one colossally wrong. If further improvements in AI follow the same lines, he may wind up an Edison-like figure, hugely influential but with major drawbacks.

EDIT: Like I said, he did have good reasons! That's why Perceptrons was so influential; it's just the weird unfortunate luck of history that he ended up diverting effort from what's now become a much more promising field.

varjag · on July 12, 2016

Mind you the "successes of NNs" didn't start to show up until 2010 or so, despite active research in multi-layer ANNs going from early 1980s. Just the past decade ANN classifier performance wasn't particularly remarkable compared to other methods. And given the 20th century technology, the deep learning architectures of today were computationally unfeasible.

Bartweiss · on July 12, 2016

It does seem rather harsh to hold Minsky to account for a conclusion which is only true with access to the massive computing resources of the 21st century. Not only was the future power of computation unknown in 1955, the quality of neural networks remains nonobvious even with that prediction - you have to actually run the things and see if they work.

None of that makes Minsky right, but it's hard to see how much could even have been achieved on neural nets back in '55. Our architecture design today descends from experimental results that were not going to be available for many decades.

varjag · on July 12, 2016

True, things often seem simple with advantage of hindsight. However Minsky's original criticism was to linear separability of original perceptron, the only known ANN at the time, and as such as technically sound now as it was then. Even when people got some spare cycles on their computers and started to throw extra layers to increase dimensionality the results weren't too encouraging for a long time.

EDIT: well I basically made the same point as you.

projectramo · on July 12, 2016

That is true.

There are two ways that Minsky could have (mathematically) looked past the linear separability issue:

1. If you add another layer to the perceptron, it can solve the XOR problem.

2. If you add a non-monotonic threshold function, it can solve the XOR problem.

So these are two rather simple solutions to the issue he brought up.

argonaut · on July 13, 2016

To be clear, those solutions were well known at the time. It was known at the time that multiple layers could compute any boolean fucntion (https://en.wikipedia.org/wiki/Perceptrons_(book)).

The role of Minsky in killing perceptrons is seriously overblown.

projectramo · on July 13, 2016

This is news to me. I had been steeped in a different lore. I have read the original article (or perhaps it was an excerpt?). I don't recall this reference.

I see from the wikipedia article you linked to, that they did know about the multiple layers. I thought it was suspicious that they had somehow missed it since it is so simple (at least to us now), and these guys are so very smart.

I wonder if they also knew (or realized, rather), that a single layer neuron with a non-monotonic function could have also "solved" XOR.

varjag · on July 12, 2016

Again, hindsight. Backpropagation was first applied to ANNs on the verge of 1980s. So we are slowly moving from Misnky wrongly criticizing perceptrons to Minsky not inventing sound backpropagation/multilayer ANN algorithm, which is perhaps taking it a bit too far.

projectramo · on July 12, 2016

Just to be clear, Minsky didn't need backprop, just the multiple layers. (In order for a perceptron to act like an XOR gate, or to "solve the XOR problem" in some parlance.)

Minsky assumed a trained percepton with the weights already set to act like an AND or an OR gate. He wasn't dealing with the learning problem.

varjag · on July 12, 2016

Fair point, although without knowing that this approach is practically viable to begin with it was getting into speculations territory. Not that Minsky didn't like to speculate..

Anyway, IMO people overestimate Minsky influence to single-handedly shut down an avenue of research. The reason _Perceptrons_ conclusions caught up is because they were sound and reasonable to his peers at the time.

I also seem to remember his video interview from a few years back where he elaborates on perceptrons and how much of his original conclusions are applicable to the state of art ANNs. Can't quite find it though.

argonaut · on July 13, 2016

To reiterate, people already widely knew multiple layers solved XOR at the time.

projectramo · on July 13, 2016

I didn't know that till I saw your other response. Responded there.

robrenaud · on July 12, 2016

neural nets were sorting tons of US mail since the 90s.

varjag · on July 12, 2016

And symbolic methods beat human players in chess and checkers for decades now, and handled U.S. military logistics since the Desert Storm. Markov chains were writing spamfilter-busting prose for years, and graph clustering powered Google search.

It's great to be enthusiastic about breakthroughs, but the history of AI is littered with partial success stories.

projectramo · on July 12, 2016

True, though he made that decision in a particular historical context. Consider:

1. The computational resources at the time. I tried to run 14 node multi-layer NNs in the 90s and I'd have to go to lunch and come back before a single run was done. (backprop looped several times to converge).

2. The whole symbolic vs statistical debate that was going on at MIT. You had Chomsky, Fodor and Minsky lining up on almost philosophical grounds. (In Fodor's case, explicitly so).

zackmorris · on July 12, 2016

Ya I think so too. I envision strong AI as happening in the next 10-20 years, even as just a random walk of the problem space on clusters of thousands of machines with thousands of cores. But it could take decades after that to get to the point where we understand AI enough to create one with a minimum of cores and algorithms. So he was probably right to discount the importance of creating AI if we don't understand it, but wrong to discount the significance of such an achievement.

mooneater · on July 12, 2016

would love to hear what exactly came of this.

according to [1]: "The project was approved and brought together a group of researchers which included pioneers such as Newell, Simon, McCarthy, Solomonoff, Shannon, Minsky, and Selfridge, all of whom made seminal contributions to the field of Artificial Intelligence in later years. "

[1] http://www.asiapacific-mathnews.com/04/0403/0015_0020.pdf

jpm_sd · on July 12, 2016

And then there was the Summer Vision Project! [1]

"After fifteen minutes of searching with Google, the majority of web pages give a citation that the person who said this was Marvin Minsky and the student was Gerald Sussman. According to the majority of these quotes, in 1966 Minsky asked Sussman to "connect a camera to a computer and do something with it".

They may indeed have had that conversation but in actual fact, the original Computer Vision project referred to above was set up by Seymour Papert at MIT and given to Sussman who was to co-ordinate a group of 10 students including himself. [2]

The original document outlined a plan to do some kind of basic foreground/background segmentation, followed by a subgoal of analysing scenes with simple non-overlapping objects, with distinct uniform colour and texture and homogeneous backgrounds. A further subgoal was to extend the system to more complex objects.

So it would seem that Computer Vision was never a summer project for a single student, nor did it aim to make a complete working vision system. Maybe it was too ambitious for its time, but it's unlikely that the researchers involved thought that it would be completely solved at the end. Finally, Computer Vision as we know it today is vastly different to what it was thought to be in 1966. Today we have many topics derived from CV such as inpainting, novel view generation, gesture recognition, deep learning, etc."

[1] http://www.lyndonhill.com/opinion-cvlegends.html [2] https://dspace.mit.edu/handle/1721.1/6125

eggoa · on July 12, 2016

They succeeded, the singularity happened in secret, and we've been controlled by skynet ever since. How else do you think the world survived the cold war?

PeCaN · on July 12, 2016

I particularly like this quote:

“[T]he major obstacle is not lack of machine capacity, but our inability to write programs taking full advantage of what we have.”

Which is more relevant now than ever. It's interesting that at the time they still didn't consider themselves to be taking full advantage of what they had—which was positively primitive—and I wonder how they planned to squeeze more power out.

bjornsing · on July 12, 2016

And it's pretty affordable too: $13,500 total budget (in 1955 dollars though of course...). :P

soared · on July 12, 2016

Only... $120,000!

http://www.in2013dollars.com/1955-dollars-in-2016?amount=135...

adamnemecek · on July 12, 2016

Good thing they did.

dang · on July 12, 2016

The submitted title ("Minsky, McCarthy and Shannon: solve all major AI problems over the summer") broke the HN guidelines by editorializing. Submitters: Please use the original title unless it is misleading or linkbait.

https://news.ycombinator.com/newsguidelines.html

projectramo · on July 12, 2016

Sorry, I didn't realize that was a guideline.

Q: What if the original title is too long to fit in the space allotted?

dang · on July 13, 2016

A: Then you have no choice but to change the original title enough to fit 80 chars. But please preserve as much of it as you can. In nearly every case you can get there by finding a less important word or two to drop.