Learning to Learn in TensorFlow (github.com/deepmind)
266 points by espeed on Jan 2, 2017 | 18 comments



"Learning to learn by gradient descent by gradient descent"

https://arxiv.org/abs/1606.04474


Other equally exciting papers related to learning to learn in DL:

"Neural Architecture Search with Reinforcement Learning"

https://arxiv.org/abs/1611.01578

"RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning"

https://arxiv.org/abs/1611.02779

"Designing Neural Network Architectures using Reinforcement Learning"

https://arxiv.org/abs/1611.02167


Another one: HyperNetworks https://arxiv.org/abs/1609.09106

and a blog post to go with it: http://blog.otoro.net/2016/09/28/hyper-networks/


This is why I love hackernews. Reading this tonight, thanks Eric :)


Another one:

'Learning to reinforcement learn'

https://arxiv.org/abs/1611.05763


What's a good explanation of TensorFlow for someone living under a rock? I dismissed it as some machine learning library, but I've read that it is in fact a general computing framework. If I can use it for things like numerical integration or some NumPy-type tasks, that would be interesting.


TensorFlow does general computation using data flow graphs: you assemble your graph from operations and variables (tensors, as they're called these days), and TensorFlow handles distributing that computation over whatever hardware you make available.
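To make the "assemble a graph, then run it" idea concrete, here is a minimal sketch in plain Python. It is a toy, not TensorFlow's API: `Node`, `constant`, `add`, `mul`, and `run` are hypothetical names made up for illustration, and there is no hardware placement at all. The only point is that building the graph and evaluating it are separate steps.

```python
class Node:
    """One vertex of the graph: an operation and the nodes feeding into it."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op                 # "const", "add", or "mul" (hypothetical op names)
        self.inputs = tuple(inputs)  # upstream nodes this node depends on
        self.value = value           # only used by "const" nodes


def constant(value):
    return Node("const", value=value)


def add(a, b):
    return Node("add", (a, b))


def mul(a, b):
    return Node("mul", (a, b))


def run(node, cache=None):
    """Evaluate `node`, computing each upstream node at most once."""
    if cache is None:
        cache = {}
    if node in cache:
        return cache[node]
    args = [run(n, cache) for n in node.inputs]
    if node.op == "const":
        result = node.value
    elif node.op == "add":
        result = args[0] + args[1]
    elif node.op == "mul":
        result = args[0] * args[1]
    else:
        raise ValueError("unknown op: " + node.op)
    cache[node] = result
    return result


# Building the graph computes nothing yet ...
x = constant(3.0)
y = constant(4.0)
z = add(mul(x, x), mul(y, y))

# ... the arithmetic only happens when the graph is run.
result = run(z)
```

Because the whole graph exists before anything is computed, a real framework is free to decide where each node runs, which is the part this sketch leaves out.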

A quick Google search turns up these [1] impressive results for TensorFlow, at least for linear algebra operations.

Despite the advantages, I think you'll find many more readily available functions in NumPy for what you want, while TensorFlow remains quite 'low level', exposing building-block operations rather than higher-level methods (the exception being machine learning/neural network stuff). That said, I don't imagine it would be too difficult to implement a fast quadrature method for integration, or whatever else your heart might desire. This [2] is a simple example solving a PDE.
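For a sense of how little is needed, here is a rough sketch of one quadrature method, composite Simpson's rule, in plain Python so it runs without either library. The function name `simpson` and its parameters are my own choices for illustration. The loop is just a weighted sum of function values, which is the sort of multiply-then-reduce step you would hand to array operations in NumPy or TensorFlow.

```python
import math


def simpson(f, a, b, n=100):
    """Approximate the integral of f over [a, b] with composite Simpson's rule.

    n is the number of subintervals and must be a positive even integer.
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and 2 (even i).
        weight = 4 if i % 2 == 1 else 2
        total += weight * f(a + i * h)
    return total * h / 3


# Integral of sin(x) from 0 to pi; the exact value is 2.
area = simpson(math.sin, 0.0, math.pi)
```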

-----

[1] https://simplyml.com/linear-algebra-shootout-numpy-vs-theano...

[2] https://www.tensorflow.org/tutorials/pdes/


A video related to the library:

Nando de Freitas - Learning to Learn, to Program, to Explore and to Seek Knowledge (NIPS 2016)

https://www.youtube.com/watch?v=tPWGGwmgwG0


The same topic presented at KDD, with a timestamp that cuts to the results:

https://www.youtube.com/watch?v=x1kf4Zojtb0&t=1h8m46s


Can someone explain to me the benefits of using TensorFlow over Theano?


I've heard that TensorFlow is built to take advantage of multiple GPUs automatically, whereas Theano (by default at least) can only make use of a single GPU.


"Automatically" isn't the best word, as TensorFlow won't make use of multiple GPUs unless you explicitly tell it to (at this time). That said, there are a number of benefits to using TensorFlow (including the ability to use multiple GPUs, if not automatically :) )

- Several common gradient optimization algorithms (Momentum, AdaGrad, AdaDelta, Adam, etc) are implemented already, which makes it a bit faster to get your training logic in place

- Going along with the above, there is more in the TensorFlow API focused specifically on training models, as opposed to being purely a math engine. Some might consider the extra functionality "bloat", but I think it serves a good purpose

- The aforementioned multi-GPU functionality is nice, once you get used to it. It's good for either training multiple versions of a model in parallel or doing data-parallel updates of parameters

- There are tools for compiling your trained models as static C++ binaries on mobile devices

- The TensorFlow ecosystem is quite nice: TensorBoard for visualizing training, the topology of your model, and various statistics (most recently visualizing projections of embeddings). TensorFlow Serving for deploying trained models. TF Slim for a more Keras-like layer by layer approach to model building. Several pre-trained models to jump start your own work.

- No compile times. There is a "no optimizations" option in Theano to remove the compilation, but many people's experience with Theano is having to wait to iterate on their code.

- I think the community is pretty swell too :) The Google team does a good job of responding to and working with folks who open issues or PRs

Generally, I'd say TensorFlow is really good when you want to minimize the amount of time between researching, training, and deploying your model.
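For anyone who hasn't met the optimizers in the first bullet, here is a minimal sketch of the momentum idea in plain Python. It shows one common formulation of the update, written by hand for illustration; `minimize_with_momentum` and `grad` are hypothetical names, and this is not a copy of any library's implementation.

```python
def minimize_with_momentum(grad_fn, x0, learning_rate=0.1, momentum=0.9, steps=300):
    """Gradient descent with a momentum accumulator (one common formulation)."""
    x = x0
    velocity = 0.0
    for _ in range(steps):
        # Keep a decaying running sum of past gradients ...
        velocity = momentum * velocity + grad_fn(x)
        # ... and step along that sum instead of the raw gradient.
        x = x - learning_rate * velocity
    return x


def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2, chosen for illustration."""
    return 2.0 * (x - 3.0)


x_min = minimize_with_momentum(grad, x0=0.0)
```

Having this kind of update rule prebuilt means the training loop only has to supply a loss, which is the time saving the bullet is describing.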

Edit: line formatting


> - There are tools for compiling your trained models as static C++ binaries on mobile devices

I'm looking for such a tool, but I haven't found anything apart from C++ libraries that also focus on training. Can you give me some pointers? Thanks.


In the contrib folder in the TensorFlow project, you'll find the makefile subdirectory:

https://github.com/tensorflow/tensorflow/tree/master/tensorf...

The readme has a general overview of how you'll approach using it. Note that you'll want to optimize for inference (remove unnecessary operations from the graph) [0] and freeze your graph (convert Variables into constant tensors) [1] to drop in your own model for the pretrained Inception model that's used as an example.

[0]: https://github.com/tensorflow/tensorflow/blob/master/tensorf...

[1]: https://github.com/tensorflow/tensorflow/blob/master/tensorf...


Actually, Theano supports multiple GPUs as well: http://deeplearning.net/software/theano/tutorial/using_multi...


From the related article:

> The move from hand-designed features to learned features in machine learning has been wildly successful.

Are the features here the "feature vectors" or the network architecture? Or something else? In other words, does this project help with normalizing data, or with tweaking hyperparameters?


Here the features are the feature vectors themselves, yes. It's been found that taking somewhat of a hands-off approach and allowing networks to engineer their own mid-level representations from raw data can be very beneficial.

This is the idea behind the learning to learn paper. Instead of taking our gradient and plugging it into a hand-engineered (i.e. on paper) update rule, we feed it to a neural network, which is trained to find the optimal update rule, in some sense (neural networks are just function approximators after all).
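A toy version of that idea, as a sketch: below, the "optimizer" is shrunk from a neural network down to a single learnable number `phi`, so the update rule is just `-phi * gradient`. `phi` is then itself trained by gradient descent on the total loss the optimizee sees over a short unrolled run. Everything here (the quadratic optimizee, the function names, a scalar standing in for the network) is an assumption chosen to keep the example small; it is not the paper's or the repo's code.

```python
def meta_loss_and_grad(phi, theta0=1.0, curvature=2.0, steps=5):
    """Unroll the optimizee for a few steps under the update rule -phi * grad.

    Optimizee (assumed for illustration): f(theta) = 0.5 * curvature * theta^2.
    Returns the summed optimizee loss over the run and its exact derivative
    with respect to phi, accumulated alongside the forward pass.
    """
    theta = theta0
    dtheta_dphi = 0.0   # sensitivity of the current theta to phi
    loss = 0.0
    dloss_dphi = 0.0
    for _ in range(steps):
        grad = curvature * theta  # gradient of the optimizee at theta
        # Differentiate theta_next = theta - phi * grad with respect to phi.
        dtheta_dphi = dtheta_dphi - grad - phi * curvature * dtheta_dphi
        theta = theta - phi * grad
        loss += 0.5 * curvature * theta ** 2
        dloss_dphi += curvature * theta * dtheta_dphi
    return loss, dloss_dphi


def train_optimizer(phi=0.01, meta_learning_rate=0.01, iterations=200):
    """Learn the update rule's parameter by gradient descent on the meta-loss."""
    for _ in range(iterations):
        _, dloss_dphi = meta_loss_and_grad(phi)
        phi -= meta_learning_rate * dloss_dphi
    return phi


learned_phi = train_optimizer()
```

For this quadratic the best fixed step is 1 / curvature = 0.5, and that is where `phi` settles. As I understand it, the paper swaps the scalar for an LSTM that sees the gradient for each parameter, but the training signal has the same shape: the optimizee's loss summed along an unrolled trajectory.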


The point of the original paper was to learn the hyperparameters of a DNN using a DNN, as opposed to using, say, a Bayesian optimization framework.



