Artificial Intelligence Software Is Booming, But Why Now? (nytimes.com)
108 points by retupmoc01 on Sept 19, 2016 | hide | past | favorite | 136 comments



    Date             Approximate cost per GFLOPS, inflation-adjusted to 2013 US dollars
    ------------------------------------------------------------------------------------
    1961             $8.3 trillion
    1984             $42,780,000
    1997             $42,000
    April 2000       $1,300
    May 2000         $836
    August 2003      $100
    August 2007      $52
    March 2011       $1.80
    August 2012      $0.73
    June 2013        $0.22
    November 2013    $0.16
    December 2013    $0.12
    January 2015     $0.06


And then there's the metric of how pitifully little intelligence we've managed to get from all those GFLOPS. I'd say that all the GFLOPS together don't add up to the intelligence of a single Portia africana.

http://news.nationalgeographic.com/2016/01/160121-jumping-sp...


I don't buy that. Spiders can't beat a human at Go. Sure, they weren't evolved to do that. But how many generations of selective breeding do you think it would take to evolve a spider that could beat the best humans at go?

Whereas if you made a spider-hunting video game, I bet researchers could train AIs that could beat it within a few months. Video game playing has actually become a big area of research recently, and we will soon see deep nets that can beat StarCraft and other real-time games that take planning and skill.


I think you're missing the point there. Portia is able to, using just a handful of neurons (~600K), accomplish sophisticated vision tasks, complex planning, online learning and scheduling. Portia and insects in general are much more flexible to changes than our neural nets. Changing objectives, or a slight modification to the rules, will often require retraining and a new design, but insects can adapt to various lighting, environmental, prey, predator and navigational concerns. From the wiki on Portia:

> Laboratory studies show that Portia learns very quickly how to overcome web-building spiders that neither it nor its ancestors would have met in the wild.

We should strive for its kind of energy, and hence computational, efficiency in our models. Think, for example, what that would mean for prosthetics. I find the overzealous defense of the current state of the art just as frustrating as out-of-hand dismissals of AI that constantly move the goalposts.


How many connections between those 600K neurons? That is one large difference between biological and artificial neural networks (though there are many others).

Current DNN architectures won't be the final solutions, but they do the job much better than anything else we've tried before. That said, there is ongoing work on more biologically inspired approaches that might produce better results for the things you mentioned in the future.


This, this right here is the EXACT situation that causes all the bioinformatics guys to be EXTREMELY NERVOUS about using biological analogies like neurons for machine learning topics. They are NOT actually analogous except in the shallowest ways.

First off, there's substantial evidence to suggest that in a biological system, an activation profile of the neurological system is more analogous to what a neural network in machine learning considers a 'neuron' - that means that when you consider the spider, you're not looking at 600K nodes, you're looking at 600,000 FACTORIAL. As in 600K!, or approx. 2x10^3,200,000.
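
(Sanity-checking that figure is a one-liner, for anyone who cares; lgamma(n+1) gives ln(n!):)

    from math import lgamma, log
    print(lgamma(600001) / log(10))   # log10(600,000!) is roughly 3.2 million, i.e. ~10^3,200,000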

Second, even THAT number may be substantially too small, because the STRENGTH, as represented by the number of connections a given node in a biological neural net has to another node, can ALSO variably affect the performance/recall/whatever of the organism. I don't recall what the highest known number of connections between two nodes is in humans, let alone spiders, but it is not only 2 or 3, it can be many more.

Portia and other insects are operating on an entirely different level, with different hardware, and a different operating system, than your neural network ENTIRELY - trying to compare the nodes-to-performance ratio of the spider vs. the DNN is not even wrong, it's like the Twilight Zone of every computer-competent neuroscientist on this planet. Utterly and completely meaningless, except maybe in an inspirational sense, to flagellate your statisticians into feeling so inferior that they magically squeeze greater performance out of their networks, not because they cleverly utilized the fundamental differences between biological and silicon computer systems, but rather because they feel so bad about the comparison.

I apologize for my snooty tone - in college I argued with a professor about this, with me asserting that clearly, no one would ever REALLY equate biological neurons with machine learning neurons one-to-one, and yet here you are; clearly he was right and I have lost our bet. I shall have to go back and eat a hat. ;)


I don't have any disagreement with you--except with your Fermi estimate--I've made some of those points before too. In retrospect (and I'd realized it before you posted), I shouldn't have emphasized neuron counts but watts. Ultimately, similar computations are occurring with hugely different energy profiles. That's what we should figure out how to do using software and the right hardware.


>Portia and insects in general

Spiders are not insects!


You're right! And I've known this, but that distinction didn't come to mind as I was writing the post. It's not really relevant though, as my point was about what adaptable and surprisingly intelligent behaviours even these tiny, tiny creatures, with barely more than a milliwatt (I'm guessing) to their name, can produce. And you know, "and" can be used in a similar way to either set union or intersection in natural language :)


>> how many generations of selective breeding do you think it would take to evolve a spider that could beat the best humans at go?

That's a different issue. Spiders, as all insects, don't learn.

It seems that insects have some kind of firmware-style programming that can't change, because they don't have the cognitive ability to change it. This firmware can be unbelievably complex, like in the case of Portia spiders stalking prey, but it can't develop any further, at least not on the individual level.

Even this however is nothing short of amazing. The truth is that we haven't developed a single program that can do what a Portia spider can do, and in fact learning algorithms (or strategies) like reinforcement learning are our best bet to avoid having to explicitly write down the rules for this sort of behaviour. And we want to avoid having to do that because it's probably beyond our abilities.

The learning algorithms we have right now, however, aren't a very good way to develop intelligence that can display such complex behaviours. Those algorithms are one-trick ponies: they learn to do one thing very well, but if you want to do more than one thing you need to train more instances of the algorithm - and then you need to find a way to combine the trained instances in a cognitive architecture. And on that we haven't made much progress yet.


>> Spiders, as all insects, don't learn.

That is blatantly false. A recent paper has shown that bees can even be trained to distinguish a handful of human faces (a task which they are of course not optimized for).

for starters: http://insects.about.com/od/behaviorcommunication/f/caninsec...


It might be more accurate to say that bees have the innate ability to distinguish and remember faces. An insect cannot learn a novel concept outside what it is innately able to do.


> That's a different issue. Spiders, as all insects, don't learn.

But Portia spiders do seem to exhibit learning behavior. While they have preprogrammed strategies for common prey, they do attempt to catch unfamiliar prey, and seem to learn from their mistakes and come up with new strategies.


That's planning, not learning.

An instance of learning would be training a spider to play Go, say.


What is the difference between planning and learning, and would you consider a human that is unable to learn something like general relativity because they are mentally retarded to be unable to learn?


That's exactly my point. Spiders can't learn. Maybe they are amazing for what they do, but the comparison is apples to oranges. They do fundamentally different things.

Of course spiders are complex. Everything in biology is complex. It's been slowly refined over millions and millions of years. Countless trillions of tweaks to its DNA tested and rejected until it hit upon something that worked. No human could design something as complex as a spider.

Our artificial neural networks are just as complex though. AlphaGo probably knows millions of facts and heuristics about Go. You would never be able to understand everything it knows. Even though we didn't create the complexity ourselves, we "evolve" it just like the spider, sometimes even with algorithms based on evolution. But we have much more efficient algorithms than that, like backprop.


>> Our artificial neural networks are just as complex though.

I don't know about that. I think it depends on how you count the complexity of a model. There are three cases: a) insects are more complex than machine learning models, b) both are equally complex, c) insects are simpler.

If (b) or (c) is the case then we're shafted, because we're failing to reproduce insect behaviour despite building systems at least as complex as theirs. If (c) is the case, we're particularly screwed because it means we're missing a simple, more efficient way to do things.

If (a) is the case then we may have a hope to get to where insects are in terms of complexity and possibly reproduce their behaviour. Still, there's the issue of combining models to perform well at a broad range of tasks, which is far from solved.


The large majority of deployed machine learning algorithms actually can't learn. The bias is so large that most (all?) specialized hardware assumes that only inference will be done. In contrast, there is evidence for online adaptability in insects^

^ http://www.annualreviews.org/eprint/5PHddkgYYKaduPp4CcN5/ful...


>> The large majority of deployed machine learning algorithms actually can't learn.

Learning algorithms construct a model of the data - that's what we call "learning", or "training". The trained model is then used to make decisions, so that's what you would deploy in a production system.

In that sense the deployed systems don't learn any further, yes, but that's because you don't deploy the algorithm per se.

An exception is what are known as "online learning" algorithms, which can continuously update their model as new data becomes available (hence, "online").
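
A minimal sketch of that distinction, using scikit-learn purely as tooling of my own choosing (not something from the article): SGDClassifier supports partial_fit, i.e. online updates after the initial fit, which is exactly the step most deployed models skip.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    clf = SGDClassifier()

    # "Offline" phase: train on whatever data you have, then deploy.
    X0, y0 = rng.randn(100, 5), rng.randint(0, 2, 100)
    clf.partial_fit(X0, y0, classes=[0, 1])

    # "Online" phase: keep updating the same model as new data arrives.
    X1, y1 = rng.randn(10, 5), rng.randint(0, 2, 10)
    clf.partial_fit(X1, y1)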


I took care to point out deployed models. I did not say the models did no learning. In fact, one can draw an analogy from evolution and instinct to offline training/hyperparameter search/architecture tweaks and deployed model.

And, technically, the manner in which most nets are trained, via (minibatched) stochastic gradient descent, is online. But the architectures are so ill-adapted to learning continuously, and backprop is so data-inefficient, that retraining on new data is often just better done from scratch (or, at best, with frozen weights).


They learn just fine during training.


In terms of number of nodes, complexity of the arrangement of nodes, and amount of computation performed by each node, even a small biological brain is incalculably more complicated than a top ANN.


>Spiders, as all insects, don't learn

>Spiders

>insects

For shame!


Fine, fine:

>> Spiders, as all arthropods, don't learn.

Better now?


Build us a machine which can navigate the environment as swiftly as a spider; THEN general spider AI is reached. (Not talking about the hardware - talking about the smooth integration of image recognition, motor control, general learning abilities and whatnot.)


How intelligent is a spider, truly?

I mean, creating a machine that beats us at a game we invented ourselves is one thing, but how does that compare to a species that has survived much longer on this earth than us?

I mean most people are more worried about iPhone 7 than climate change.

I am not arguing that we are not smarter than spiders, but I'm also not going to argue we truly know that we are.

It would be wise to keep our egos under control.


You could wipe out that species with a focused effort on your part in a couple of years. Does that mean you are more important than spiders? Bringing unrealistic philosophical considerations into tangible concerns is a pet peeve of mine.


Well, yes, if you could make a completely artificial, controlled environment that presents a problem well-suited for the current flavor of AI, the AI would probably be able to beat a random animal at solving that problem.


Spiders have much better hardware than the state of the art in low-cost miniaturised robotics.

However, I'm not convinced that if you trained a deep net and gave it access to a spider's sensor suite and effectors, it would do worse than an actual spider would.

A lot worse in terms of efficiency per watt, sure...


How would you train that deep net? What data would you give it? What would you train it to do exactly?


Survive. Reproduce. Just like the video games nets are already trained to play, there's a pretty clear scoring function. The inputs would be spider sensorium, the outputs would be spider effectors.

I'm kind of waiting for someone to do this already with C. elegans by excising its 300 or so neurons and replacing them with a GPU and laser (but I imagine the practical impediments are way higher than I handwavingly imagine).
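
To make the framing concrete, here's a deliberately toy sketch - the "SpiderWorld" environment, the sensor/effector counts and the hill-climbing loop are all invented for illustration; nothing like this exists off the shelf:

    import numpy as np

    class SpiderWorld:
        """Toy stand-in environment: random 'sensorium' vectors in,
        effector commands out, a rare reward for 'catching prey'."""
        def __init__(self, n_sensors=32, n_effectors=8):
            self.n_sensors, self.n_effectors = n_sensors, n_effectors
        def reset(self):
            return np.random.randn(self.n_sensors)
        def step(self, action):
            obs = np.random.randn(self.n_sensors)
            reward = float(np.random.rand() < 0.01)   # fake "caught prey" event
            return obs, reward

    # Crude hill climbing on a linear policy, standing in for
    # "survive and reproduce" as the scoring function.
    env = SpiderWorld()
    best_w, best_score = np.zeros((32, 8)), -np.inf
    for generation in range(50):
        w = best_w + 0.1 * np.random.randn(32, 8)    # mutate the policy
        obs, score = env.reset(), 0.0
        for t in range(500):
            action = np.tanh(obs @ w)                # effector commands in [-1, 1]
            obs, reward = env.step(action)
            score += reward
        if score >= best_score:
            best_w, best_score = w, score            # keep the fitter policy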


>> The inputs would be spider sensorium, the outputs would be spider effectors.

That sounds great on paper. In practice, how do you even begin to collect "spider sensorium"? fMRI?

As to "spider effectors" I'm pretty sure we don't have the tech to do that yet, at least not in spider-scale (excluding Giant Spiders of Doom).

On paper, you can approximate any function with a multi-layer perceptron with enough hidden units and layers. In practice, "enough" may be astronomically many, and you'd need infinite amounts of data that are infinitely difficult to collect.
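
For what it's worth, the "on paper" part is easy to demo on a toy function with scikit-learn (my choice of tooling, not anything from the thread); it's the spider-scale version that's hard:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    X = np.linspace(-np.pi, np.pi, 500).reshape(-1, 1)
    y = np.sin(X).ravel()

    mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
    mlp.fit(X, y)
    print(np.abs(mlp.predict(X) - y).max())   # small-ish on the training interval,
                                              # no guarantees anywhere else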


I think the most reasonable near-term approach "if you had to do it" would be reverse engineering and virtualities, but please bear in mind it was a hypothetical to begin with.

I expect the spider sensorium would have on the order of a few thousand analog inputs. To be clear, I can't cite a reference for this.

Yes, it seems intractably difficult on paper, but spider hardware can do it.

It's hard to rate what we'll expect from self-driving cars on the flatworm-to-spider spectrum (probably closer to the flatworm end, but the cameras add an interesting dimension), but clearly reasonable yet ambitious people believe these problems are tractable with sufficient effort.


I agree that mastering board games requires intelligence. I also agree that good chess players are usually intelligent, even quite smart. But it's not like people's abilities in chess are directly proportional to their intelligence, is it?

I mean, you would probably have mediocre chess players that could beat Plato or Darwin or Einstein.

Why is it that computers' abilities to win at chess or Go are seen as the proxy for how far AI has come? As if the pinnacle of civilisation is Go, or as if Magnus Carlsen is the greatest mind of all time. Go and chess are sports - rugby for nerds. A computer beating Kasparov is impressive in its own right, but it's a far cry from the computer having his intelligence.


Actually, the bulk of what we call human intelligence is basically nothing more than a chain of facts we built up, each fact derived from the truths we learned (or perceived) before.

Intelligence, therefore, is the process of getting from A to B, with steps in between that require consciously investigating the knowledge already available to us.

This is already happening with machines today. Someday a person will come up with a generic learning algorithm, which has a meta component to either search for answers in the space it already knows, or use external means to learn more about it.

Please also note that without the means to move around - to use your hands/legs/eyes et al. to test things and build up experience - we would likely have the same primitive intelligence. The only difference now is that instead of giving the machines these sensory organs, we are working with an algorithm with inputs and outputs and then having it evolve.


There is no such thing as a universal intelligence test. I'm not claiming that winning at Go implies intelligence. I'm simply demonstrating that spiders aren't generally intelligent. They can't learn to do things other than what they were programmed to do. Whereas ANNs can do a wide variety of tasks, including tasks no animals are remotely capable of.


Depending on how you define 'soon' - it might take a little longer. A bunch of really good people at Facebook AI just worked on StarCraft using reinforcement learning, and even getting the micro down is really hard.


>> Spiders can't beat a human at Go.

Finding the best next move from a given board position at Go is a single cognitive task, whereas surviving in the natural world requires an intelligence that can excel at a possibly infinite number of such tasks.

Portia spiders are particularly amazing in that they can exhibit such complex behaviours as ambushing or stalking prey. Like all insects they do this sort of thing with a tiny number of neurons. We have no idea how to do so well with so few resources.


There is no mathematical distinction between different "tasks". Playing Go requires thousands or even millions of different heuristics and pieces of knowledge. Each of those is a "task" in and of itself. The same technology which beat Go could also, in principle, be used to play a real time video game that requires hunting and stalking things.

Insects are indeed tiny. But they are highly specialized. They can't learn to do things differently than they are programmed. GPUs are extremely general purpose. Not only can neural nets running on them learn many tasks, but they can execute many algorithms which aren't neural networks at all.


> The same technology which beat Go could also, in principle, be used to play a real time video game that requires hunting and stalking things.

If you generalize things enough to mean the application of linear algebra, the chain rule and map, then yes, that's true. However, the fact is that neural networks still take a huge amount of skill and experimentation to design new architectures. The more powerful deep learning becomes, the less skill will be required to do complicated things.

The AlphaGo bot was very data inefficient; at ~1 hr 18 min in http://techtalks.tv/talks/deep-reinforcement-learning/62360/, David Silver points out how they broke correlations by brute forcing millions of games while only keeping one move per game. We want something with better generalization ability than that. As amazing as AlphaGo was, it's more a very early waystation than our final destination. Recognizing the weaknesses of our models instead of glossing over them is how we get to AI!


>If you generalize things enough to mean the application of linear algebra, the chain rule and map, then yes, that's true. However, the fact is that neural networks still take a huge amount of skill and experimentation to design new architectures. The more powerful deep learning becomes, the less skill will be required to do complicated things.

Sort of. Hyperparameter tuning is indeed difficult, but in principle it can be automated and there has been great work on this in the past few years. But I never claimed that it was just plug and play. The point is that it's the same underlying technology in speech recognition or self-driving cars or face recognition. Even if the number of layers and convolution sizes and learning rates are totally different, the principles are the same.

>The AlphaGo bot was very data inefficient... David Silver points out how they broke correlations by brute forcing millions of games while only keeping one move per game.

This problem was solved in the past with temporal difference learning. Maybe there are technical reasons they didn't do that, I don't know. At the very least they could pick a random move from each game per batch, instead of discarding the rest of the moves. I don't really see how that helps solve anything.
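
(For reference, "temporal difference learning" here means the textbook TD(0) value update, Sutton & Barto style - nothing AlphaGo-specific:)

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
        """V: dict mapping states to value estimates; nudges V[s]
        toward the bootstrapped target r + gamma * V(s_next)."""
        V[s] = V.get(s, 0.0) + alpha * (r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0))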

>Recognizing the weaknesses of our models instead of glossing over them is how we get to AI!

There is a huge difference between recognizing weaknesses, and the AI Effect. Where AI solves problem X, and people disregard it "because it can't do Y".


> Sort of. Hyperparameter tuning is indeed difficult, but in principle it can be automated and there has been great work on this in the past few years.

There's far more to modification than just hyperparameter tuning; this is why people can keep turning out papers on the latest breakthrough. The gates of LSTMs are a non-trivial addition to RNNs. Attaching a differentiable stack, or figuring out how best to do soft attention, requires far more thought than just bashing at hyperparameters. CTC, dilated convolutions, residuals, and extending auto-encoders to capture the data distribution were not trivial ideas. VGGNet and other conv nets are very manually designed, differing significantly from vanilla MLPs, and not something that could be automatically arrived at (at least not yet). Things like FractalNet or recursive tree-based architectures differ greatly from vanilla networks.

> Even if the number of layers and convolution sizes and learning rates are totally different, the principles are the same.

Again, the underlying captured algorithms are very different. If you could look at the underlying source code these neural net programs represent, they'd vary at least as much from each other as the code for say a merge sort, insertion sort, binary search or an AVL tree would.

> This problem was solved in the past with temporal difference learning. Maybe there are technical reasons they didn't do that, I don't know.

The problem pointed out in the lecture, part of why it took so long to find a good Go policy, was that immediate moves differed so little from each other that it was too difficult to get any kind of signal without the brute-force approach to decorrelation they took. There are probably better ways, but that was the shortest delta available to progress.


>> There is no mathematical distinction between different "tasks".

Sure there is. For instance, the minimax algorithm is a mathematical model of optimal behaviour in a specific kind of game (two-player, zero-sum, complete-information games). Performing minimax to dominate such a game is a very clearly delimited task, and that's exactly why a machine can do it in the first place.

More broadly, any "task" can be described mathematically by means of a program for a given machine. Saying there's no mathematical distinction between different tasks is like saying you can't write a program that does a specific thing.
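
For concreteness, here's what that clearly delimited task looks like as code - a bare-bones minimax sketch, where the `game` object (is_terminal / score / moves / play) is an assumed interface that any concrete game would have to supply:

    def minimax(game, state, maximizing=True):
        """Exact value of `state` in a two-player, zero-sum,
        complete-information game, from the maximizer's point of view."""
        if game.is_terminal(state):
            return game.score(state)                  # e.g. +1 / 0 / -1
        children = (minimax(game, game.play(state, m), not maximizing)
                    for m in game.moves(state))
        return max(children) if maximizing else min(children)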

>> Insects are indeed tiny. But they are highly specialized.

Agreed- see my comment above.

However, that on its own doesn't say much. Sure, insects have highly specialised firmware-style programming. How did they come to have these programs, and how can we reproduce them? We can't hope to create such programs "by hand", so we're currently trying to create algorithms that learn them. We're still far from achieving it though.

That neural networks and other learning algorithms can learn all sorts of different things is true; however, you have to train a new instance of an algorithm for every different thing you want it to learn (what I call a "task" - so, a new program). For each instance you need new data (and a lot of it), another huge chunk of processing power and a long time to train. Then you have to find a way to combine all the trained models into one ... thing that can operate as a coherent whole.

We haven't even started on the last bit.

Just having the technology to create pigments is far from being the same as creating a Mona Lisa.


Just as important is the global data available to train the AI. According to this, we're at 10+ zettabytes versus <0.1 in 2005.

https://media.licdn.com/mpr/mpr/p/3/005/078/00c/0745630.jpg


This should not be overlooked.

From 1960 to 2000, AI was about "brute force" goal and optimization problems. Having mountains of data for anything you want turns it into a "dumb but accurate" pattern-matching problem.


Don't forget storage! I'm still stunned I can get an 8 TB drive for $250.


I believe the NN/deep learning renaissance started around 2011 -- IIRC AlexNet was released in 2012, and Google's famous paper on detecting cats on YouTube appeared around that time too.

It seems like the drop in costs from 2007 to 2011 is larger than in the previous period. What happened in 2007? Or was it just a slowdown during the start of the 2000s?


The drop in 2011 was due to GPUs taking over (see my source: https://en.wikipedia.org/wiki/FLOPS#Cost_of_computing). I believe the gaming industry has (indirectly) heavily subsidized AI's progress.

Nevertheless, prices still fell by a factor of 16 between just 2000 and 2007, so the 2000s weren't stagnant.


More like 2006, although there were convnets and other neural nets working well long before then, of course. It just took several years for the techniques from the papers in the scientific revival to make their way into products. The vision community was quite resistant to it, even more so than the speech community.


That's counting GPU processing power, which is not so easy to use. No way have CPU FLOPS become 30x cheaper since 2011 (their improvement has really slowed down since then).


There have been a lot of smartphones sold since 2011.


That's kinda scary for AI, actually. We're plateauing..



Where's the plateau here?


Nice chart. I call this exponential drop "cost gravity". 50% every 18 months, more or less.


It's about 13 months per halving. It seems to be consistent whether I start at 2000 or 1961.
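
Quick check against two spans of the table above:

    from math import log2
    # April 2000 ($1,300) -> January 2015 ($0.06)
    print((14 * 12 + 9) / log2(1300 / 0.06))    # ~12.3 months per halving
    # 1961 ($8.3 trillion) -> January 2015 ($0.06)
    print((54 * 12) / log2(8.3e12 / 0.06))      # ~13.8 months per halving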


TL;DR from the article, which supplements eli_gottlieb's comment:

"Much of today’s A.I. boom goes back to 2006, when Amazon started selling cheap computing over the internet. Those measures built the public clouds of Amazon, Google, IBM and Microsoft, among others. That same year, Google and Yahoo released statistical methods for dealing with the unruly data of human behavior. In 2007, Apple released the first iPhone, a device that began a boom in unruly-data collection everywhere."

The combination of smartphone + cloud created a virtuous (for AI, that is) cycle of unending data collection, which fed improvements in theoretical research models that needed larger-scale data to be validated.


I remember doing AI work in 2005-2007 (genetic programming and similar things), and one of the hardest things back then was that good-quality training data sets were really hard to come by. I can second the idea that data makes a huge difference, and there's tons of it today.


AI is moving fast. Faster than it has in a long time. In research circles, "supervised learning" - data classification problems - is considered solved. Consider the magnitude of that statement: if we have sufficient labeled data, we can predict those labels accurately on data we haven't been exposed to: fraud, faces, diseases, you name it.

AI is moving fast for three reasons, which Andrew Ng summarizes neatly: 1) We have more data than ever before (some of which has been organized in fabulous datasets like ImageNet), much of which is being generated online or by sensors. 2) We have more powerful hardware than ever before -- this is the combination of distributed computing with GPUs. You could say that AI advances at the speed of hardware, or at least is limited by hardware advances. Brute-force AI like deep learning is the main beneficiary. 3) We have seen a steady stream of algorithmic advances for years. Specialists despair of keeping up with the literature, so feverish is research in AI.

Others have pointed out additional factors: open-source projects that lower the barrier to entry for the algorithms; cloud computing services that open up access to the hardware. The chokepoint right now, and what's slowing wider adoption, is skills. Many companies don't have the teams to implement AI well.

Some of those companies are also the noisiest about selling AI. I have deep doubts about some vendors' ability to deliver on the hype they're encouraging. And I believe that will lead, not to an AI winter, but to a poisoning of the well that the AI sector will have to address for years to come.

But overselling and hype are inevitable symptoms of real advances in tech, and in some ways, the tech is advancing faster than the hype can keep up with, precisely because the people hyping it, whether salesmen or journalists, don't understand the true extent of what's going on.

AI is just math and code. In a sense, you could say we've entered the age of Big Math, which is the next stage after Big Data. The math is necessary to process the data and determine its meaning. The math takes the form of massive matrix operations that can be processed on the parallel calculators known as GPUs. To call it statistics, as the reporter of the piece does, is only partially true. The math involved in AI comes from probability, calculus, linear algebra and signal processing. It's more than fancy linear regression. And it's definitely more than just making predictions about customer behavior, however attractive that is for industry.

Asking why now about AI software is like asking why now about cars after the Model A came out. Because it's there, it's faster than horses, and it makes you look cool. Like cars or any other powerful technology, AI is part of a race, and that race is taking place between nations and companies. You can decide not to adopt AI, the same way newspapers decided to ignore the Internet, or the way the French decided to fight German panzers with mounted cavalry in WWI. There really is no choice about whether or not to adopt AI-driven software. It's a question of when, not if. And for many companies, the when is now, because later will be too late.

To give one example of how fast it's moving: Deep learning has been widely thought to be uninterpretable, or without explanatory power, but that is changing with cool projects like LIME: https://homes.cs.washington.edu/~marcotcr/blog/lime/ Which is to say, for some problem sets, we'll be able to combine the impressive accuracy of DNNs with the reasons why they reached a given decision about the data.

On the hardware front, NVIDIA and Intel are racing to build faster and faster chips, even as startups like Wave Computing or Cerebras come out with their own, possibly faster chips, and Google works on TPUs for inference.


"the way the French decided to fight German panzers with mounted cavalry in WWI"

Quick note: I think this is historically inaccurate. In World War One, the Germans had essentially no tanks, unlike the British and French. In World War II, there's a popular rumor that Polish (not French) cavalry forces charged German tanks at Krojanty, but this is itself a myth (the cavalry attack was against infantry, tanks arrived afterwards).


>> In World War II, there's a popular rumor that Polish (not French) cavalry forces charged German tanks at Krojanty, but this is itself a myth (the cavalry attack was against infantry, tanks arrived afterwards).

Not just any cavalry- Polish winged hussars:

https://en.wikipedia.org/wiki/Polish_hussars

Obviously, they could totally take a tank single-handedly, but they have now ascended into heaven to party with the Valkyries.

(It is known.)


I did not know that about the cavalry charge occurring before the tanks arrived (charging tanks with cavalry would be a little suicidal, so it is surprising).


Sorry, that should have been WWII; you're right. Anyone who wishes to know more about France's lack of preparation for both wars should read the French historian Marc Bloch, who fought in both and died in the Resistance. L'Étrange Défaite (Strange Defeat) is his great work.


> It's more than fancy linear regression.

I am not an expert, but it seems to me that "big data algorithms" are actually simpler and require less ingenuity than classical statistics. Just look at the difference between frequentist and Bayesian approaches in statistics. In frequentist methods, you need to choose the model wisely, you're constrained in what you can model, and there are many different models with different assumptions. With Bayesian methods, you just represent any distribution somehow and do the calculations on it. Fewer assumptions for the models, more general models, but at the expense of more computation.


Because it's been just about 10 years since they figured out how to:

1) Use convolutional layers, ReLUs, and a few other tricks to ameliorate the vanishing gradient problem,

2) Perform continuous, high-dimensional stochastic gradient descent on graphics cards, and

3) Apply these things via stochastic grad-student descent to sufficiently massive datasets that even the most brute-force models and training methods (backpropagation of errors on a loss functional) can actually work.
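
As a minimal sketch of that recipe (a conv layer plus ReLU, trained by SGD with backprop) - written with PyTorch, which is just my choice of illustration tooling, and with placeholder sizes:

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer
        nn.ReLU(),                                    # ReLU non-linearity
        nn.Flatten(),
        nn.Linear(16 * 32 * 32, 10),
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.01)   # stochastic gradient descent
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(8, 3, 32, 32)                     # stand-in minibatch
    y = torch.randint(0, 10, (8,))
    loss = loss_fn(model(x), y)
    loss.backward()                                   # backpropagation of errors
    opt.step()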

In the meantime, the hardware for doing it has become commoditized and the software has consolidated and become standardized. So now it's A Thing in industry.

There are lots of "smarter" algorithms that almost definitely come closer to human cognition, for instance probabilistic program induction. But those aren't fast, and don't always neatly separate training from prediction: you're just not gonna be able to train those models ahead-of-time on a 10,000 image corpus inside a single week with today's hardware and software.

We need to find ways to make machine learning fast even when it's not just a bunch of matrix multiplies. Otherwise, every time we make our algorithms more interesting, we cripple ourselves computationally.


I'm concerned about something similar, where a lot of AI techniques seem to be rushing toward a local maximum:

* AI researchers do things that get good results faster out of Nvidia cards

* Nvidia makes their cards faster at the things AI researchers are doing

It's getting good results. We sure are going up this gradient quickly. But I don't think it's going to get us to a global maximum.


Human consciousness itself is probably a local maximum. Not necessarily a bad thing to see how far a certain process can get you as long as you keep getting results.

At least until there's some other insight along the way.


If not countered, we have a genetic algorithm that may eventually move our far descendants out of the local maximum. Plus we can change the environment, both locally and by moving out into space.


I agree w/ you about moving up a gradient quickly w/ the "GPU manufacturing <-> deep learning research" feedback loop. I think it could last a while though. One really important area of research is figuring out how to take better advantage of greater capacity. Also, how to do more with fewer training samples (0 shot, 1 shot, etc learning). Then there's reducing the precision of the units you're using to increase capacity through software. Applying these algos to video, audio, media generation, and others will eat up all the resources you can throw at it; the algos today could take advantage of larger capacity when applied to time-series. There's so much going on that I don't see it slowing down for at least 3-5 years.

Also, I'd like to point out that we've seen some big breakthroughs in the past 10 years. But for the past 10 years, the whole field of deep learning has been looked at with skepticism and has been very niche. Over the past couple of years, a lot of money and resources have been put in place to pursue this area of research. More money doesn't necessarily mean more results, but there are many many more people working on these problems than ever before. A lot of them are legitimately brilliant researchers in the prime of their careers. I think there's still more to see.

I am concerned about an Nvidia monopoly around deep learning hardware. They give away tons of free cards to deep learning research groups, but at some point they'll want people to start buying. I assume they expect that will be the enterprise set, but if suddenly they manage to move all their capable deep learning cards to Teslas only (which have a huge markup), it will put the hobbyist deep learning developer at a big disadvantage. The only check and balance on that is the fact that Nvidia makes cards for gamers too, and gaming card competition is still somewhat robust, so any technology that gives enterprise a big boost will probably make its way to their flagship gaming cards quickly. Nvidia's only real competition, AMD, is so far behind that they might as well not be in the business. As someone who usually roots for the underdog, it pains me to see AMD fumble so badly in this whole area.

Quantum computing could be a new hill, but I think that's a ways out and I don't know enough about the topic to speak with any real confidence.


As someone who does (non-NN) ML on AMD, I might ask why you think that AMD is so far behind. In my experience, their GPU hardware is excellent, maybe even better than Nvidia, especially if you factor the price in. Where they ARE lagging heavily is:

1. GPU as a service: While all major providers (AWS, Azure) offer Teslas on their servers, there is no AMD on the cloud (that I know of)

2. Key libraries: Nvidia comes with matrix libraries and cuDNN out of the box, while for AMD, there are only open source offerings that are a bit difficult to manage.

But, if you write your own software or rely on open source, AMD is quite performant and affordable. The problem is that it is really obscure. So, yes, they are really bad at marketing, and if you are looking at them as a user instead of as a developer, they are invisible.


I think I agree with everything you just said.

AMD hardware is more or less the same as Nvidia; Nvidia doesn't really have any special sauces or patents that make their cards better than AMD. And you're right: in a lot of benchmarks, AMD is either faster or (at least) better-bang-for-buck than Nvidia.

Yet AMD doesn't have community support in software or cloud adoption or key libraries. The deep learning community has gone for what's easiest to work with and most readily available. Even though you might conceivably get some performance gain from AMD, it's outweighed by the amount of software that's been written in CUDA already, and how quickly that allows you to move.

Nvidia has taken an early lead and jumped on it, while AMD has only fallen further behind. I think one company just fundamentally understands the trend more than the other. But, for what is essentially hardware that answers the question, "how fast can you do matrix multiplication?", library support becomes a key differentiator. And that's where AMD is behind.


The key libraries part has kept me wondering. How much would it cost for AMD to assign a handful of heavy-duty engineers to this task (writing AMD-optimised kernels for convolution etc.)?

Their management has been fast asleep for at least 2 years


That is part of the problem: they did assign people to the task, and produced open-source libraries for matrices, FFT, maybe even something for DNNs. But those are not polished much, and you have to hunt them down and install them yourself. And they do a really bad job of marketing them.

On the other hand, finding and installing those libraries is nothing compared to actually developing GPU computing software, so, as I said, if you want to program GPUs, even the scattered state of the AMD platform is not that much worse than Nvidia. Because with Nvidia, you install CUDA and you get everything set up. And then - what do you do? You still have to learn the not-so-easy black art of optimizing kernels for the actual hardware.


I haven't seen cuDNN equivalents (in terms of perf) for common machine learning frameworks from AMD. I don't think they exist; if they did, people would shift to using AMD.

For Nvidia, I have seen some kernels faster than the ones supplied by Nvidia itself - https://github.com/NervanaSystems/neon - though cuDNN introduced Winograd kernels too in its last update.


I do not know about the quality of this since I do not use NNs, but there is https://github.com/hughperkins/DeepCL and I think I have seen others.


AMD doesn't care about Deep Learning, or ML at all. To quote:

And in the case of the dollars spent on R&D, they seem to be very happy doing stuff in the car industry, and long may that continue—good luck to them. We're spending our dollars the areas we're focused on[1]

To be clear, stuff in the car industry is "just" stuff like self driving cars[2][3].

[1] http://arstechnica.co.uk/gadgets/2016/04/amd-focusing-on-vr-...

[2] https://blogs.nvidia.com/blog/2016/05/06/self-driving-cars-3...

[3] http://www.nvidia.com/object/drive-px.html


> We sure are going up this gradient quickly. But I don't think it's going to get us to a global maximum.

Yes, however, this process has the nice side effect of spurring a huge interest in fundamental AI research, which will lead to alternative algorithms that ultimately perform better.


Wait, how? It seems to me that it gives people an easy way to focus on one area of AI, saying "Guess we didn't need fundamental AI research after all".


Convolutional layers have been around longer than that and don't have much to do with the vanishing gradient problem, which was never that big a problem to begin with. ReLUs are helpful, but they do not lead to THAT big of a speedup. Convolutional layers have definitely had a big impact for accuracy in image problems and sound problems. I think #2 (GPUs) is very important; they speed up both training and evaluation by a factor of 100 compared to a modern CPU.

None of these developments really explain why AI has become so popular though. Neural nets don't actually perform very well for most problems (image and speech problems being the major exceptions) compared to tree-based methods. I think it has more to do with the development of easy-to-use machine learning libraries as well as distributed processing libraries and hardware.


Consciousness of Things is an inevitable conclusion for humanity. The singularity is upon us with the advent of man merging into machine. We are driving that growth to go to the next level.

Also, containerization.


Most of what we call 'AI' today is just fancy regression methods. No need to be so grandiose.


I'm pretty sure that comment was completely facetious. Note that The Singularity got lumped together with Docker containers.


Don't forget CoT vis-à-vis IoT.


I lol'ed. Thank you for this.


>Convolutional layers have been around longer than that and don't have much to do with the vanishing gradient problem, which was never that big a problem to begin with. ReLUs are helpful, but they do not lead to THAT big of a speedup. Convolutional layers have definitely had a big impact for accuracy in image problems and sound problems.

They're essentially an informed prior on image and sound problems, which vastly reduces the size of the parameter space and thus makes both vanishing gradient and overfitting less of a problem.
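
Rough back-of-the-envelope numbers for why that prior shrinks the parameter space so much (the layer sizes here are illustrative, not from the thread):

    inputs = 224 * 224 * 3
    dense_params = inputs * 4096      # one fully connected layer: ~616M weights
    conv_params = 3 * 3 * 3 * 64      # 64 filters of size 3x3x3: 1,728 weights
    print(dense_params, conv_params)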


> they speed up both training and evaluation by a factor of 100

The factor is more like 5-10x for truly optimized CPU versus GPU in fp32 (e.g., nnpack on CPU versus cuDNN).


I'm basing that number on https://github.com/jcjohnson/cnn-benchmarks. I'm sure the exact number depends on the cpu/gpu/algorithm.


im2col + sgemm on CPU as in Caffe for instance is really slow; you are heavily penalized for extra memory traffic and the sgemm tile sizes are probably not well tuned for the problem size at hand.

At the roofline of performance, the difference in both mem b/w and arithmetic throughput between CPU and GPU is only 5-10x (for fp32, Pascal fp16 is a different story of course), and proper implementations on the CPU will get you there.

https://github.com/Maratyszcza/NNPACK


nnpack is excellent - but also very new. Noticed it just got NEON optimisations for Arm- do you know if it's possible to use it together with OpenCL on mobile for combined CPU/GPU inference?


I'm definitely going to use that stochastic grad student descent line.


It's a standard-issue joke at this point ;-).


> There are lots of "smarter" algorithms that almost definitely come closer to human cognition

Source?



4) Frameworks and libraries have improved a lot and are available as open source. Having a lower barrier of entry helps.


I've been thinking a lot about this lately, especially as I've been learning more about the difference between "nice" optimization problems with differentiable cost functions, and "messy" optimizations with less ideal cost functions.

Is there any reading material you can recommend on the recent history of ML/AI for someone like me, with a dilettante's understanding of these problems and an interest in optimization?


https://arxiv.org/abs/1404.7828

(Disclaimer: I don't know that much about ML/Deep Learning, the above was recommended by deeplearning.net and is what I'm slogging through ATM. I do have some background in "classical" statistics & optimization theory, and the above article is not that math-heavy)


Thank you!


Quantum computing could technically create a complete paradigm shift, as it increases computing power drastically. Well, in theory at least.


I'm still not comfortable calling the thing that is booming "artificial intelligence". It is mostly pattern recognition and classification. Intelligence is something else.


ISTM you may have found the answer to TFA's question. In the early days of AI, they were looking for actual intelligence. Eventually everyone realized that it won't be found soon. Today's success is mostly the result of refocusing on attainable heuristics, rather than solving the many problems related to actual intelligence.


Love it. So we today have "A.I." and hopefully someday we may finally get A.A.I.

So what happened to Machine Learning and Computational Intelligence? Not sexy enough?


The original goal of AI is now called AGI, for "artificial general intelligence".


Yes, "signs of intelligence" include creativity, pedagogy, ..., and of course, getting bitten by the bug of existential angst.

Accept no substitute.


It seems pretty likely that we will start to see "General AI" discovering problematic things like the unsolvable nature of ethics questions, the ungroundedness of truth claims, the immense silliness of religions and ideologies, etc.

Which might be a good thing. Terrifying visions of AI are all about certainty and the authoritarianism it creates.

If existential depression is a mental friction from the lack of certainty about what to do, then some measure of it is probably necessary...


>Which might be a good thing. Terrifying visions of AI are all about certainty and the authoritarianism it creates.

Indeed, may the gods protect us all from some things actually being true and other things actually being false. That would be terrible!


(I sense/assume a missing /s in your post, Eli.)

The objection here would be that entertaining that we can assert T/F of all propositions, given results of halting problem [computation], incompleteness [formalism], and uncertainty [physics], is unreasonable.


>The objection here would be that entertaining that we can assert T/F of all propositions, given results of halting problem [computation], incompleteness [formalism], and uncertainty [physics], is unreasonable.

This displays a radical misunderstanding of the phenomena mentioned. The Halting Problem and logical incompleteness are the same thing underneath, and while they do hold in all formal systems, this never actually matters for non-meta-level mathematics. Basically any theorem about a structure we actually care about will be sub-Turing-complete, and with modern type theories, we can tear down inductive types and rebuild them with stronger axioms when we need to. As a result, some self-referencing theorems are true-but-unprovable, but we never actually need those theorems.

Uncertainty in physics is either just probabilistic (in the case of typical experiments), or, in the special case of Heisenberg uncertainty... no actually, that's just probabilistic imprecision too. That's what Heisenberg's inequalities actually say: "the product of the standard deviations of these measurements must always be at least this much."
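
(For the position-momentum pair, the bound in question is simply:)

    \sigma_x \, \sigma_p \ge \frac{\hbar}{2}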

Nowhere are we encountering the kind of radical, existential "we can't know anything and had better give up" uncertainty without which /u/mbrock seems to think we will all fall into political authoritarianism. He's been reading too many liberal philosophers of World War II, or maybe just watched that BBC thing "Dangerous Knowledge" and took it seriously.


It seems irrelevant that, as of now, "any problem we care about" is sub-Turing-complete. We're still taking baby steps as a species and the relevant timelines in context of this thread are futuristic. Agreed?

The point is that to maximally insist that "we can't know anything", and/or, "we can know everything" are both equally unreasonable positions.

> Nowhere are we encountering the kind of radical, existential "we can't know anything and had better give up" uncertainty without which /u/mbrock seems to think we will all fall into political authoritarianism. He's been reading too many liberal philosophers of World War II, or maybe just watched that BBC thing "Dangerous Knowledge" and took it seriously.

S/he can address your concerns -- that was not my intent, but hopefully you agree that unreasonable insistence on maximalist positions inherently carries the danger of "political" authoritarianism.

Your practical position reminds me of a purported exchange between Wittgenstein and Turing, per Hewitt [ref]. I am sympathetic to it. But a reminder again that my initial comments here were in the context of intelligent machines. In fact, as I am writing this little note, I am entertaining the thought that a veritable intelligent machine may in fact review the 3 (methodological) constraints noted, and shrug them off as practically unimportant. Or it may become alarmed. :)

[edit/p.s. ref:http://lambda-the-ultimate.org/node/4302]


>It seems irrelevant that, as of now, "any problem we care about" is sub-Turing-complete.

It's not irrelevant, it's a question of how you view the Church-Turing-Deutsch thesis: can hypercomputation occur in the physical world? If it can, then why and how are we somehow blocked from utilizing its physical manifestation to reason about the relevant questions of real-world events? If it can't, then why aren't systems with finite, if large or even growing, Kolmogorov complexity thus sufficient for all reasoning about the real world?

>S/he can address your concerns -- that was not my intent, but hopefully you agree that unreasonable insistence on maximalist positions inherently carries the danger of "political" authoritarianism.

My objection has been precisely that to insist on radical ignorance has led to an obnoxiously enforced liberalism.

>Your practical position reminds of a purported exchange between Wittgenstein and Turing, per Hewitt [ref].

I think Turing is just plain wrong here. Real mathematics did not originate by receiving ZFC with first-order logic as a revelation at Sinai and extrapolating theorems and structures from there! It began by formalizing ways to reason about the real world around us. When those original informal methods became insufficient in the foundational crisis of the 19th century, then mathematicians started inventing foundations to unify everything without paradox.

Notably, someone in the comments thread then mentions Curry's Paradox, which I looked up and found surprisingly underwhelming. Curry's Paradox is a perfect example of what Wittgenstein called a meaningless language game! Classical implicature X -> Y isn't always equivalent to the existence of a causal path "X causes Y", but natural-language implicature mostly talks about causal paths, so conflating the two in symbolic logic derives a "paradox".

>In fact as I am writing this little note, I am entertaining the thought that a veritable intelligent machine may in fact review the 3 (methodological) constraints noted, and shrug it off as practically unimportant. Or it may become alarmed. :)

I don't think that a machine limited to reasoning within a single mathematical structure or foundation can really qualify as "intelligent" in the human sense. Logical foundationalism is the wrong kind of thing to constitute a capacity to think.


I never meant to imply "we know nothing, we had better give up."

What I meant was that the dystopian imaginations of AI (Terminator, paper clip maximizers, etc) involve the machines acquiring a strong moral certainty that doesn't ever pause for doubt.

If I've been reading too much of somebody it's probably Richard Rorty.


>What I meant was that the dystopian imaginations of AI (Terminator, paper clip maximizers, etc) involve the machines acquiring a strong moral certainty that doesn't ever pause for doubt.

Ah. Well, considering that acquiring such a thing wouldn't really work out mathematically, fair enough.


The encounter with doubt need not be a terminal state. If it is indeed "intelligent", it may proceed to enlightenment.


I will not accept any substitute. If there is no existential angst, I'll not accept referring to it as intelligence.



There's a similar saying I like (don't know where it's from, if anywhere)

"Artificial intelligence is a group of problems that we don't know how to solve. As soon as we know how to solve one, it gets a name and is no longer AI."


People say the same thing about philosophy. That once something is well-understood, it becomes a science rather than philosophy.

I suppose this could be the same thing.


I always thought it was more once it got big enough.


> it gets a name and is no longer AI

And that name is typically "weak AI". I wonder if we'll ever have AI that we marvel at even after we figure out how it works.


To be fair, most of human "intelligence" seems to be pattern recognition and extrapolation from those patterns. My guess is that if you made a robot that is really, really good at recognizing and extrapolating from arbitrary patterns, and then gave it some intrinsic goals, you would pretty much be at "true" AI.


Intelligence is not mostly pattern recognition and classification? I can't imagine what you'd think it could be that wouldn't ultimately be called both of these things.


Output - the optimization of behavior. Though we're doing pretty well in that regard too...


Intelligence is the ability to process new information, and the speed at which you can do it, at least as it's measured by metrics like IQ.

Sometimes AI/ML is directly modeled after human/animal thinking (decision trees, neural networks, reinforcement learning) and sometimes it's not (Bayes' theorem, regression, general stats). What constitutes intelligence still boils down to being able to correctly and quickly process new incoming information based on prior information. I believe it definitely qualifies.


The term AI has been used too much and has lost its meaning. When you mean "real AI", you have to refer to it as AGI.


We don't marvel over the ATM reading a handwritten paper check correctly. That's a considerable achievement.


From the first time it started doing that, I've been amazed that that works so well in a production system. Every. Time.


The first time I saw it, I thought they had people in some call center doing the reading. But they don't, at least not often.

The US Postal Service used to have 55 centers where humans tried to read envelopes that the machines couldn't. They're now down to one. "We get the worst of the worst. It used to be that we’d get letters that were somewhat legible but the machines weren’t good enough to read them. Now we get letters and packages with the most awful handwriting you can imagine."

[1] http://www.nytimes.com/2013/05/04/us/where-mail-with-illegib...


Very interesting.

>> equipment that can read nearly 98 percent of all hand-addressed

That number must be pretty outdated.


1. Availability and accessibility of large amounts of training data. Without this, training and validation are expensive if not impossible. Now, if you don't have the data, you can acquire it yourself. Leading to ...

2. Computational speed & storage upgrades. This applies largely to physical, time-critical things like automated driving. The self-driving car could have had all the data it needed in 1980 to do its thing, but required fast computers and lots of data storage to do it safely in real time in a feasible commercial product.

3. Advancement of algorithms. Fervor and excitement around AI/ML has been on a slow but perhaps exponential burn. This has led to the refinement of algorithms that largely sat dormant from the late 80s (and earlier) until fairly recently. This also means lots of open source libraries for people who wish to implement without caring about the underlying mechanisms behind the algorithms.

These things are leading more people to dabble recreationally and commercially.


Ready access to high-quality use cases and training data, along with shared knowledge of the methods that work well on them, helps: https://www.kaggle.com/competitions?sortBy=numberOfTeams&gro...

(disclaimer: I work at Kaggle)


I thought we were already over that cliff of it becoming forgotten. Yes, we didn't call it AI 2 years ago. But has there been such a big technological change since then at the Googles, Facebooks and Twitters of this world? I think the problem is that it is actually so transparent that you really can't see it when it is applied to you.


"Democratizing AI" is the word being thrown around a lot by CEOs, PR, et al. But only time will tell whether it's true democracy.


As far as I know, the concept of "democratizing AI" comes specifically from the OpenAI initiative, which intends to make new developments in AI technology broadly available and not tied to any single company or entity. This is in order to ensure that no single entity has control over how this technology can be used.

Given that Google, Apple, Facebook and others have vastly more data than any independent project, and therefore have stronger AI (currently limited to e.g. speech recognition, image recognition and other low-impact applications), the state of democratization of AI by this measure is poor today.


If a CEO says the word 'democratisation' in relation to almost anything you can be guaranteed that the end result will have nothing to do with enhancing democracy.


Somehow machine learning got rebranded and now it's important... and major products are putting prediction into the UI more.


Such vacuity


So, the influx is due to a plethora of things, which are not all mutually exclusive:

1) The Internet has indeed produced large data sets, which allowed statistical AI approaches to flourish as opposed to logical AI approaches.

2) Somewhat better algorithms. However, I'd say that algorithmic development hasn't made as much progress as the comments suggest. We've only had a few notable innovations like NMT in the past ~20 years.

3) Computational power to run the algorithms, so we can perform more experiments on large data sets, run more computationally intensive algos, and induce better hypotheses.

4) Libraries like scikit-learn and Keras have "democratized AI". Grad students used to implement algorithms themselves in the 2000s; now middle-schoolers are doing ML with the tooling already available (see the sketch below).

Those are basically it. I think (2) could even be taken off, because again: learning algorithms have barely changed IMO.
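
To illustrate point 4, the whole train-and-evaluate loop is a few lines these days (the standard scikit-learn toy digits dataset here is just my example, not anything from the article):

    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_digits(return_X_y=True)
    print(cross_val_score(RandomForestClassifier(), X, y).mean())   # ~0.9+ accuracy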




