Ersatz - Deep neural networks in the cloud

bravura · on Jan 17, 2013

Recommendation: Reach out to my colleague James Bergstra, and build out automatic hyperparameter selection. This will make your offering work off-the-shelf, which is what is necessary for it to see wider adoption.

Why? The real pain in the ass in training a deep network is the hyperparameter selection.

What is your learning rate? What is your noise level? What is your regularization parameter?

Choosing these values is a far bigger pain than almost everything else combined.

Doing a grid search is intractable. Random hyperparameter search is better. You can also use a more sophisticated strategy, like the ones Bergstra et al. have proposed.
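To make the random search idea concrete, here is a minimal sketch. It is not Bergstra's code or anything Ersatz ships. The sampling ranges are assumptions picked for illustration, and `validation_loss` is a hypothetical stand-in for actually training a network and scoring it on held-out data.

```python
import math
import random


def sample_hyperparameters(rng):
    """Draw one configuration. All ranges here are illustrative assumptions."""
    return {
        # Learning rate and regularization are sampled log-uniformly,
        # since they typically matter on a multiplicative scale.
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "noise_level": rng.uniform(0.0, 0.5),
        "l2_penalty": 10 ** rng.uniform(-6, -2),
    }


def validation_loss(params):
    """Hypothetical stand-in for 'train a network, return validation loss'.

    A toy bowl-shaped function so the sketch runs without a real model.
    """
    return (
        (math.log10(params["learning_rate"]) + 3) ** 2
        + (params["noise_level"] - 0.2) ** 2
        + (math.log10(params["l2_penalty"]) + 4) ** 2
    )


def random_search(n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = sample_hyperparameters(rng)
        loss = validation_loss(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


if __name__ == "__main__":
    params, loss = random_search(n_trials=50)
    print(params, loss)
```

The cost is set by the number of trials you choose rather than by the number of hyperparameters. A grid with 10 values per axis would already need 1,000 training runs for these three settings.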

ninjin · on Jan 17, 2013

I agree that hyperparameter selection is a huge pain. Personally, though, I am more familiar with the work of Snoek et al. [1] from NIPS in December last year. Snoek even distributes a neat Python package that performs Bayesian optimisation combined with MCMC [2], so that even people like me, who are not yet familiar with Gaussian Processes, can deploy it easily.

[1]: http://arxiv.org/pdf/1206.2944v2

[2]: http://www.cs.toronto.edu/~jasper/software.html

pilooch · on Jan 17, 2013

Link to the paper: http://jmlr.csail.mit.edu/papers/v13/bergstra12a.html

dave_sullivan · on Jan 19, 2013

Yeah, it's a really good point.

I haven't played with automatic parameter selection much (but have been seeing more papers on it recently) so I hadn't really considered it all that closely.

While I'd like to give people a fair amount of control over model parameters if they want, it probably is very important that I make things as turnkey as I can. Shouldn't be too tough to hack something together and make it an option during training.

While I'm trying to start things off relatively simply, the overall goal really is towards allowing people to create models that act as parts of much larger systems, maybe larger neural nets themselves. A sort of genetic algorithm that spawns new neural networks with random parameters and random connections to previous networks could be kind of neat, and making the base elements of those types of architectures (a single fully connected deep net, for example) easily accessible is a first step towards that goal.

tlarkworthy · on Jan 18, 2013

Presumably, if you have a GPU-backed cloud DBN, hyperparameter selection is faster than one parameter per day. Also, how do you choose the parameters of the hyperparameter tuner? I am never convinced these things work, given the no-free-lunch theorem.

tlarkworthy · on Jan 17, 2013

OMG I have been waiting for something like this. Deep Belief Networks have been smashing machine learning records in just about every domain. The only problem was that they were annoyingly slow to converge, and hard to program/debug.

See Hinton's Google code slides for more info on how powerful these things are: http://www.youtube.com/watch?v=AyzOUbkUf3M (that's 2007, things are even spicier now)

deadairspace · on Jan 17, 2013

That was a great talk, and an impressive demo of feature generation.

RyanZAG · on Jan 17, 2013

It's a little bit confusing what this actually is. Is this

1) Cloud GPU computation where you upload some special model code that is run on the neural network, i.e. your own code?

2) Upload data and run some pre-specified models on it, such as in the example you have a '-d model=spanish_speech_recognizer' - in which case the offering is all about how many and how good your pre-defined models are.

The two different use cases are for completely different target audiences.

dave_sullivan · on Jan 17, 2013

Sure, I see the confusion.

So basically, you bring the data, pick the neural network architecture you want to use, and set its parameters. The model trains on the data you've given it using a GPU cluster (which still takes a while).

'spanish_speech_recognizer' is the name of the model you just trained, where 'MRNN' is the actual architecture (a multiplicative recurrent neural network as described in http://www.cs.toronto.edu/~ilya/pubs/2011/LANG-RNN.pdf) used in the example.

So the models themselves aren't pre-defined, but the architectures you can use are. You can play with a lot of different parameters (if you want), but you don't have to worry about optimizing the code for GPU or making sure your implementation of the algo itself is correct. At least that's the idea.

snippyhollow · on Jan 17, 2013

Just curious, are you using an existing library like Theano, Torch or some other GPU-enabled lib?

dave_sullivan · on Jan 17, 2013

Theano mostly--it really is good at what it does

hosh · on Jan 17, 2013

Thanks for explaining it this way.

fiatmoney · on Jan 17, 2013

Looks like you spec some parts of the node layout & maybe types, and it fits that spec against your data. So no support for fully custom layouts / fitting methods, but you can parameterize with, for instance, more or fewer layers or a different number of hidden units.
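As a rough illustration of what parameterizing by layer count and hidden units means in practice, the sketch below turns a list of hidden layer sizes into weight matrix shapes and a parameter count for a plain fully connected net. The function names and the spec format are hypothetical. Nothing here reflects how the service actually describes architectures.

```python
def layer_shapes(n_inputs, hidden_units, n_outputs):
    """Return (fan_in, fan_out) for each weight matrix of a fully connected net.

    hidden_units is a list such as [500, 500]; its length is the number of
    hidden layers. This spec format is an assumption made for illustration.
    """
    sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(shapes):
    """Count weights plus one bias per output unit in every layer."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


if __name__ == "__main__":
    shapes = layer_shapes(784, [500, 500], 10)
    print(shapes)
    print(parameter_count(shapes))
```

Adding a layer or widening one changes only the `hidden_units` list. The fitting procedure stays the same, which is the kind of parameterization described above.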

Parameterizing & fitting these networks (both in having a good underlying representation, and deciding which actual parameterization to use) gets tricky & requires some domain knowledge when you start doing things like conditional or continuous RBMs.

tmadar · on Jan 17, 2013

Very well explained. Hope you don't mind that I pimped your comment on my personal blog http://www.blogoftravis.com/

gojomo · on Jan 17, 2013

Does this, and the Google Prediction API, and similar emerging offerings, herald the beginning of an "AIaaS" ("Artificial Intelligence as a Service") market?

ghc · on Jan 17, 2013

I'd like to see a post-mortem for this submission once the traffic dies down. Most startups these days feature flashy designs, and I've always wondered how much it really mattered. This design is straightforward and not flashy at all, so I'd be interested in seeing the conversion rate of an "academic-looking" design.

benmccann · on Jan 17, 2013

This seems like a very small market you're going after. It requires people to have good knowledge of deep neural networks (they have to choose the model, architecture, hidden units, multiplicative units, etc.) I think it would be more interesting and open things up to a wider audience if some of these parameters could be chosen for you.

cynusx · on Jan 17, 2013

Machine learning in general is a domain for specialists; it won't work most of the time if you don't know what you are doing. As for a small market, I disagree. There may be few people who understand it, but these are the ones who are put in charge of trillions of rows of data to analyze too.

hosh · on Jan 17, 2013

This is true. What is also true is that the neural nets themselves have a much broader market.

thechut · on Jan 17, 2013

The company I work for could be very interested in testing this service. I requested a beta invite and filled out the survey. Any idea when you might start letting people test it out / accept beta invites?

dave_sullivan · on Jan 17, 2013

Yes, most likely Monday. Are you with IOTworks? If so, I've got your survey and will be sure you're on the list.

Although I should add that the response so far has been way beyond what we thought it would be (which is fantastic!), so it may take some time to get to everyone. The beta literally just opened this morning, and I'm not ready to open up the product before working with beta users to polish it up.

mimog · on Jan 18, 2013

Isn't the CPU/GPU intensive part of working with neural networks the training of them? Once the network is trained to within an acceptable error-rate why would you need the cloud?

tluyben2 · on Jan 17, 2013

How about presenting some demos to see how it works and what it can do? I worked a bit with Theano and would like to see (video/tour) how this relates.

ajankovic · on Jan 17, 2013

Layout is broken for me http://imgur.com/Les2I

Latest Firefox on Ubuntu 12.04.

chetan51 · on Jan 17, 2013

Demo demo demo!

cloudshoring · on Jan 17, 2013

Can we process images and/or video as inputs to this service?

dave_sullivan · on Jan 17, 2013

sabalaba · on Jan 17, 2013

Are you hosting your own cluster or using EC2 GPU instances?

dave_sullivan · on Jan 17, 2013

Hosting our own for now--although weighing pros/cons of using AWS.

On one hand, AWS GPUs do seem a bit slower than bare metal, and it's theoretically (actually?) more expensive. Plus we get more control if we host our own.

But then again, running a data center is a problem that has been solved and I'm not sure we can create much value there. In practice, it will probably end up a bit of both, depending on demand.

sabalaba · on Jan 17, 2013

Hah do you guys just have servers sitting around or are you at least co-locating?

indrax · on Jan 18, 2013

Can we export the trained model back out and use it, or only query it with the API?