PyTorch – Tensors and Dynamic neural networks in Python

Smerity · on Jan 18, 2017

Only a few months ago people saying that the deep learning library ecosystem was starting to stabilize. I never saw that as the case. The latest frontier for deep learning libraries is ensuring efficient support for dynamic computation graphs.

Dynamic computation graphs arise whenever the amount of work that needs to be done is variable. This may be when we're processing text, one example being a few words while another being paragraphs of text, or when we are performing operations against a tree structure of variable size. This problem is particularly prominent in particular subfields, such as natural language processing, where I spend most of my time.

PyTorch tackles this very well, as do Chainer[1] and DyNet[2]. Indeed, PyTorch construction was directly informed from Chainer[3], though re-architected and designed to be even faster still. I have seen all of these receive renewed interest in recent months, particularly amongst many researchers performing cutting edge research in the domain. When you're working with new architectures, you want the most flexibility possible, and these frameworks allow for that.

As a counterpoint, TensorFlow does not handle these dynamic graph cases well at all. There are some primitive dynamic constructs but they're not flexible and usually quite limiting. In the near future there are plans to allow TensorFlow to become more dynamic, but adding it in after the fact is going to be a challenge, especially to do efficiently.

Disclosure: My team at Salesforce Research use Chainer extensively and my colleague James Bradbury was a contributor to PyTorch whilst it was in stealth mode. We're planning to transition from Chainer to PyTorch for future work.

[1]: http://chainer.org/

[2]: https://github.com/clab/dynet

[3]: https://twitter.com/jekbradbury/status/821786330459836416

PieSquared · on Jan 18, 2017

Could you elaborate on what you find lacking in TensorFlow? I regularly use TensorFlow for exactly these sorts of dynamic graphs, and it seems to work fairly well; I haven't used Chainer or DyNet extensively, so I'm curious to see what I'm missing!

Smerity · on Jan 18, 2017

When you say "exactly these sorts of dynamic graphs", what do you mean? TensorFlow has support for dynamic length RNN unrolling but that really doesn't extend well to any dynamic graph structure such as recursive tree structure creation. Since the computation graph has a different shape and size for every input they are difficult to batch and any pre-defined static graph is likely excessive, wasting computation, or inexpressive.

The primary issue is that the computation graph is not imperative - you define it explicitly. Chainer describes this as the difference between "Define-and-Run" frameworks and "Define-by-Run" frameworks[1].

TensorFlow is "Define-and-Run". For loops and conditionals end up needing to be defined and injected into the graph structure before it's run. This means there are "tf.while_loop" operations for example - you can't use a "while" loop as it exists in Python or C++. This makes debugging difficult as the process of defining the computation graph is separate to the usage of it and also restricts the flexibility of the model.

In comparison, both Chainer, PyTorch, and DyNet are "Define-by-Run", meaning the graph structure is defined on-the-fly via the actual forward computation. This is a far more natural style of programming. If you perform a for loop in Python, you're actually performing a for loop in the graph structure as well.

This has been a large enough issue that, very recently, a team at Google created "TensorFlow Fold"[2], still unreleased and unpublished, that handles dynamic computation graphs. In it they tackle specifically dynamic batching within the tree structured LSTM architecture.

If you compare the best example of recursive neural networks in TensorFlow[3] (quite complex and finicky in the details) to the example that comes with Chainer[4], which is perfectly Pythonic and standard code, it's pretty clear why one might prefer "Define-by-Run" ;)

[1]: http://docs.chainer.org/en/stable/tutorial/basic.html

[2]: https://openreview.net/pdf?id=ryrGawqex

[3]: https://github.com/bogatyy/cs224d/tree/master/assignment3

[4]: https://github.com/pfnet/chainer/blob/master/examples/sentim...

PieSquared · on Jan 19, 2017

Ah, fair enough, I see your point. An imperative approach (versus TensorFlow's semi-declarative approach) can be easier to specialize to dynamic compute graphs.

I personally think the approach used in TensorFlow is preferable – having a static graph enables a lot of convenient operations, such as storing a fixed graph data structure, shipping models that are independent of code, performing graph transformations. But you're right that it entails a bit more complexity, and that implementing something like recursive neural networks, while totally possible in a neat way, ends up taking a bit more effort. I think that the trade-off is worth it in the long run, and that the design of TensorFlow is very much influenced by the long-run view (at the expense of immediate simplicity...).

The ops underlying TensorFlow's `tf.while_loop` are actually quite flexible, so I imagine you can create a lot of different looping constructs with them, including ones that easily handle recursive neural networks.

Thanks for pointing out a problem that I haven't really thought about before!

kalamaya · on Jan 30, 2017

I'm intrigued by pyTorch but I am really having a hard time groking what you mean by the whole "but that really doesn't extend well to any dynamic graph structure such as recursive tree structure creation. Since the computation graph has a different shape and size for every input they are difficult to batch and any pre-defined static graph is likely excessive, wasting computation, or inexpressive."

Would you mind providing a concrete example to relate to if you dont mind? Again, intrigued by PT so want to learn more about it vs TF...

liuliu · on Jan 19, 2017

You can build both the symbolic computation graph and do the computation at the time when defining the network architecture, thus, gaining the ability to be "dynamic" and also supporting advanced features with the symbolic representation that you built on the side.

In fact, with DyNet or PyTorth, you still need to bookkeeping the graph you traversed (tape) because no one is doing forward AD. If that's the case, why not have a good library to do symbolic computation graph and build dynamic feature on top of it. (I am not saying Tensorflow is a good symbolic computation graph library to build upon just arguing that start with a define-compile-run library doesn't necessarily hinder your ability to support dynamic graphs).

smhx · on Jan 19, 2017

the biggest hindrance to do this are language constructs that cannot be or are inconveniently expressed in the symbolic graph, such as python's if vs tf.if and for vs theano.scan, or conditioning on some python-code (not tensor operations). So to build an eagerly evaluating symbolic graph framework that is allowed to do arbitrary things would mean that you would (to an extent) reimplement the language you are working with.

liuliu · on Jan 19, 2017

Let's assume Tensorflow has basic symbolic computation graph expressiveness. What you would do is to build a symbolic representation while executing your graph inline, your symbolic representation doesn't need to have any control structure, it is simpler than that. You execute while loop in Python as usual, and your symbolic representation won't have TF.While at all, it will simply be the execution you performed so far (matrix mul 5 times).

Once you have a reasonable symbolic computation graph library, you don't need to explicitly build a "tape" because the symbolic representation will record the order of execution and reverse AD even graph optimization (applying CSE etc) come naturally as well.

superfx · on Jan 18, 2017

How is adding dynamic graphs to TensorFlow "after the fact" while adding it to Torch isn't? (Torch is much older than TF).

Smerity · on Jan 18, 2017

Torch was never written as a static graph computation framework. Torch was/is more a tensor manipulation library where you are executing the individual operations step by step and the graph can be tracked and constructed incrementally from those operations. For this reason, much of PyTorch is about building a layer on top of the underlying components (which are focused on efficiently manipulating tensors and/or implementing low level NN components) rather than re-architecting Torch itself.

This won't be the same for TensorFlow as it was written with the concept of a static computation graph at its core. I'm certainly not saying it's impossible to re-architect - and many smart people in the community and at Google are devoting thinking and code to it - but simply that the process will be far more painful as it was not written with this as an intended purpose.

To note - there are many advantages to static computation graphs. Of particular interest to Google is that they distribute their computations very effectively over large amounts of hardware. Being able to do this with a dynamic computation graph would be far more problematic.

superfx · on Jan 18, 2017

Thanks for the clarification.

Does the upcoming XLA interact with this as well? I.e. compilation would be too costly for dynamic graphs, and so it would only make sense for static graphs?

Smerity · on Jan 18, 2017

I am not highly clued in to XLA as it's new, quite experimental, and most honestly I've just not looked at it in detail. Given XLA provides compilation, JIT or ahead of time, it doesn't really (yet) factor in to the dynamic graph discussion.

What would theoretically be interesting is a JIT for dynamic computation graphs. Frequent subgraphs could be optimized and cached and re-used when appropriate, similar to a JIT for Javascript. No doubt they're already pondering such things.

https://www.tensorflow.org/versions/master/experimental/xla/

datascientist · on Jan 24, 2017

Chainer's Define-by-run apporach is also described here https://www.oreilly.com/learning/complex-neural-networks-mad...

jkk · on Jan 21, 2017

Any particular reason you prefer PyTorch over DyNet?

chewxy · on Jan 18, 2017

If you guys wanna use Go, Gorgonia also features dynamic graphs the way Chainer does (also Theano-style compile-execute machines)

attractivechaos · on Jan 20, 2017

One question: how do you save a dynamic network if it changes from time to time (e.g. from sample to sample)?

apaszke · on Jan 22, 2017

You save the parameters and the code of the model definition

smhx · on Jan 18, 2017

It's a community-driven project, a Python take of Torch http://torch.ch/. Several folks involved in development and use so far (a non-exhaustive list):

* Facebook * Twitter * NVIDIA * SalesForce * ParisTech * CMU * Digital Reasoning * INRIA * ENS

The maintainers work at Facebook AI Research

tsomctl · on Jan 18, 2017

Not only that, but it appears to use the same core c libray (TH) as Lua torch.

smhx · on Jan 18, 2017

we actually share the same git-subtree between Lua and Python variants. TH, THNN, THC, THCUNN are shared.

divbit · on Jan 18, 2017

I have been running in the back of my mind the idea of attempting a julialang interface to torch for a few weeks now, using the ccall interface: http://docs.julialang.org/en/release-0.5/manual/calling-c-an.... Do you have any thoughts / recommendations w'r't' that? (This would be more of a fun / weekend(s) project for me than anything else) My goal would be to have the tensors override the .* and * operators as used here: https://gist.github.com/divbit/ec57ad2f1989bf13aecdf9e1e1056...

spyspy · on Jan 18, 2017

This project aside, I'm in love with that setup UI on the homepage telling you exactly how to get started given your current setup.

artursapek · on Jan 18, 2017

Agreed. Reminds me of this scary page I found the other day when googling "certbot setup":

https://certbot.eff.org/all-instructions/

jmportilla · on Jan 19, 2017

yeah, that's a great way to quickly show the set up and OS requirements

demonshalo · on Jan 18, 2017

yes indeed!

programnature · on Jan 18, 2017

Actually not clear if there is an official affiliation with Facebook, other than some of the primary devs.

throwawayish · on Jan 18, 2017

Notably absent is the otherwise Facebook-typical PATENTS license thing. Which I see as a good sign.

Also, it doesn't look like this has happened just now? PRs in the repo go back a couple months and the repo has 100+ contributors.

smhx · on Jan 18, 2017

it's the same license file as https://github.com/torch/torch7 and http://torch.ch

The C libraries are shared among the Lua and Python variants

tdees40 · on Jan 18, 2017

At this point I've used PyTorch, Tensorflow and Theano. Which one do people prefer? I haven't done a ton of benchmarking, but I'm not seeing huge differences in speed (mostly executing on the GPU).

sandGorgon · on Jan 18, 2017

Keras is going to be the interface to Tensorflow - https://news.ycombinator.com/item?id=13413487

tdees40 · on Jan 18, 2017

Yes, but Keras works just fine using Theano as a backend as well...

taterbase · on Jan 18, 2017

Is there any reason this might not work in windows? I see no installation docs for it.

smhx · on Jan 18, 2017

the C libraries are compatible with Windows, they are used in Torch windows ports. We just dont have any Windows devs on the project to help and maintain it :( .

randomx89 · on Jan 19, 2017

Are you guys looking for Windows devs to contribute or help maintaining it? I'd be interested in helping out if I can. I currently use Chainer, but I'd like to try pytorch

apaszke · on Jan 22, 2017

Yes! There's an issue on that, where we'll be coordinating the work: https://github.com/pytorch/pytorch/issues/494

EternalData · on Jan 18, 2017

Been using PyTorch for a few things. Love how it integrates with Numpy.

theoracle101 · on Jan 18, 2017

Most important question. Is this still 1 indexed (Lua was 1 indexed, which means porting code you need to be aware of this)?

apaszke · on Jan 18, 2017

No! Python 0 based indexing everywhere.

smhx · on Jan 18, 2017

it's 0-indexed just like everything else in python

rtcoms · on Jan 18, 2017

I've never fiddled with machine learning thing so don't know anything about it.

I am wondering if CUDA is mandatory for torch installation ? I use a Macbook air which doesn't have graphics card, so not sure if torch can be installed and used on my machine.

itg · on Jan 18, 2017

It's not mandatary, but for some problems such as using image data, it provides as substantial speedup when training a classifier.

zitterbewegung · on Jan 18, 2017

You could probably train MNIST on your Macbook Air but anything much more complicated for that you would want to use A GPU.

RangerScience · on Jan 18, 2017

I believe that's a "no". I was able to set up a docker'ized Deep Style on my Macbook Pro, although it takes for bloody ever to do a single image. CUDA is, AFAIK, a substantial speed boost, but not a requirement.

baq · on Jan 18, 2017

Very nice to see Python 3.5 there.

jbsimpson · on Jan 19, 2017

This is really interesting, I've been wanting to learn more about Torch for a while but have been reluctant to commit to learning Lua.

veli_joza · on Jan 19, 2017

Lua is a pleasure to learn and use. The language core is so simple and elegant, you can learn it in a day. Standard library is also very light, which is both strength and weakness.

I use it more and more for hobby projects. Combine it with LuaJIT (which torch uses) and you have the fastest interpreted language around. Give it a try.

etiene · on Jan 20, 2017

I want to reiterate this. I started learning it for guilt because it was created in the university I studied. Then I realised it was really a pleasure to use it. I still use it in many hobby projects nowadays whenever I can.

ankitml · on Jan 18, 2017

I am confused with the license file. What does it mean? Some rights reserved and copyright... Doesnt look like a real open source project.

yincrash · on Jan 18, 2017

It is a standard 3-clause BSD license. The "All rights reserved" portion definitely adds ambiguity (and only exists in the BSD license out of all major OSS licenses). There is StackExchange answer that goes into the history of it[1].

[1] http://opensource.stackexchange.com/questions/2121/mit-licen...

ankitml · on Jan 18, 2017

Got it. It makes sense now.

wyldfire · on Jan 18, 2017

Copyrights are appropriate, they make it explicit who produced the work and make it easier to enforce the license. This license follows the general form of the BSD-style license [1].

[1] https://opensource.org/licenses/BSD-3-Clause

s_ngularity · on Jan 18, 2017

Looks like standard BSD-3 to me

gallerdude · on Jan 18, 2017

What's the highest level neural network lib I can use? I'm a total programming idiot but I find neural nets fascinating.

visarga · on Jan 18, 2017

Keras requires just a few lines of code, it's designed for easy use and practicality.

apaszke · on Jan 19, 2017

torch.nn offers a very similar interface to Keras (e.g. see Alexnet definition at https://github.com/pytorch/vision/blob/master/torchvision/mo...).

nickdavidhaynes · on Jan 18, 2017

http://playground.tensorflow.org/

Pretty much no way to use neural networks (except for playing, like above) without writing code.

eudoxus · on Jan 18, 2017

This[1] was posted earlier today to HN. Seems pretty simple to play with NNs without coding.

[1] - http://kur.deepgram.com/

aaron-lebo · on Jan 18, 2017

Is this related to lua's Torch at all?

http://torch.ch/

zo7 · on Jan 18, 2017

They don't seem to explicitly say it, but it might be using the same core code given the structure of the framework and their mentioning that it's a mature codebase several years old. The license file also goes back to NYU before being taken over by Facebook, similar to Torch.

apaszke · on Jan 18, 2017

The core libraries are the same as in Lua torch, but the interface is redesigned and new.

pavanky · on Jan 18, 2017

They are sharing the code base using git-subtrees. So the C and CUDA parts of the codebase will be kept in sync. The modules written in lua or python will diverge.

superdaniel · on Jan 18, 2017

I was wondering the same thing. There's even another repo that seems mildly popular called pytorch on Github: https://github.com/hughperkins/pytorch

pavanky · on Jan 18, 2017

They share the same underlying C and CUDA libraries. The Python and Lua modules are different. You can see both projects have pretty much the same contributors because they are sharing the code base using git-subtree.

0mp · on Jan 18, 2017

It is worth adding that there is a wip branch focused on making PyTorch tensors distributable across machines in a master-workers model: https://github.com/apaszke/pytorch-dist/

shmatt · on Jan 18, 2017

i've been running their dcgan.torch code in the past few days and results have been pretty amazing for plug and play

vegabook · on Jan 18, 2017

Guess there's no escaping Python. I had hoped Lua(jit) might emerge as a scientific programming alternative but with Torch now throwing its hat into the Python ring I sense a monoculture in the making. Bit of a shame really because Lua is a nice language and was an interesting alternative.

jjawssd · on Jan 18, 2017

Lua is extremely flexible to the point where there is basically no standard library. This causes problems with code reuse and moving between codebases because everyone does things drastically differently. Compare this to Numpy in the Python world, a single fundamental package for scientific computing in Python.

Lua is less used than Python in the scientific community, and a lot of the most innovative machine learning researchers already work with C++ and Python. Using yet another language with only marginal benefit increases cognitive load and drains from the researcher's mental innovation budget, forcing the researcher to learn the ins and outs of Lua rather than working on innovative machine learning solutions.

Lua is a nice language. Python 3 is a nice language and there are many new exciting features and development styles (hello async programming?) in the making which will prevent a monoculture from forming in the near term.

vegabook · on Jan 18, 2017

Thanks for the interesting and informative comment. Do I sense just a tiny bit of regret though? Yet another Python interface. YAPI. You heard it here first. And no, Py3 is not that nice. Too much cruft by far. And lua is miles faster than Python when you're outside the tensor domain, ie while you're sourcing and wrangling your data. Arguably luajit obviates the need for C , something you can't say about Python. Disclosure: I am a massive, but increasingly disenchanted, user of Python. I had actually started looking at Torch7, foregoing tensorflow, precisely because of Lua. But the walls are closing in....

jjawssd · on Jan 18, 2017

A very large portion of performance problems can be mitigated with the use of cython and the new asyncio stuff.

asyncio success story: https://magic.io/blog/asyncpg-1m-rows-from-postgres-to-pytho...

cython: http://scikit-learn.org/stable/developers/performance.html

vegabook · on Jan 18, 2017

Luajit is at least 10x faster than python and easily obviates the need to mess around with cython. That's an easy win for Lua. Let's be honest: Torch has decided that if you cannot beat them, join them. It is about network effects. Not about Python better than Lua intrinsically.

baq · on Jan 18, 2017

but why do you care if luajit is faster than python if everything that matters is computed on the GPU anyways?

jjawssd · on Jan 18, 2017

Can't argue with that

inlineint · on Jan 19, 2017

An alternative to Cython is Numba [1], which speeds up some cycles in pure Python by just adding a single decorator.

[1] http://numba.pydata.org/

cr0sh · on Jan 18, 2017

> And lua is miles faster than Python when you're outside the tensor domain, ie while you're sourcing and wrangling your data.

Then use Lua for that, if you are more comfortable there and want/need the speed bump. There's nothing that says an entire project or whatnot has to be developed in a singular language.

Use each tool to its strengths, as your needs, requirements, and abilities dictate.

vegabook · on Jan 18, 2017

There always has to be someone rolling out the horses-for-courses pitch. No. I wanted Lua to gain traction with other people. That's the point. I would have liked the Lua sci-ecosystem to be healthy as an alternative.

infinite8s · on Jan 18, 2017

Is there an equivalent to numpy in the Lua space?

apaszke · on Jan 19, 2017

torch?

webmaven · on Jan 20, 2017

> And lua is miles faster than Python when you're outside the tensor domain, ie while you're sourcing and wrangling your data.

Is that true even if the Python used is PyPy rather than CPython?

etiene · on Jan 20, 2017

I like Lua more than I like Python and all of this makes me sad. I wished more people were putting their hearts into getting the Lua's ecosystem going instead of into things like this.

argonaut · on Jan 18, 2017

You're making the language out to be more important than it really is.

The Python that you write when using these frameworks just the glue code / scripts. All you're doing is calling the framework's functions. Most of it gets thrown away (as researchers). The stuff that doesn't is self-contained and usually short. You're not writing 100k+ line codebases.

Lua may be faster for certain tasks (data processing), but the time it takes for does tasks is usually a rounding error in deep learning. Not to mention you can still code in C/C++ with pytorch.

If there is a monoculture in machine learning, it would be the deep learning monoculture.

statsmatscats · on Jan 18, 2017

Here is an emerging julia alternative http://www.breloff.com/transformations/

eva1984 · on Jan 19, 2017

No, there is already no escaping from CUDA, so it is already a monoculture nevertheless.

crudbug · on Jan 18, 2017

I was also on this bandwagon. Its about pythonic syntax not semantics that drives this.

If only Mike Pall created a transpiler infrastructure layer on top of LuaJIT.

wodenokoto · on Jan 18, 2017

There's also R and Julia and there are still plenty of people building neural networks in C.

attractivechaos · on Jan 18, 2017

It is easy to build a multi-layer perceptron purely in C. You can roll your own or use a library like FANN. However, so far as I know, very few (darknet is the only example I know of) are using C to build a bit more complex networks like CNN/RNN, let alone those topologically complex networks in research domain.

plg · on Jan 18, 2017

Every time I decide I'm going to get into Python frameworks again, and I start looking at code, and I see people making everything object-oriented, I bail

Just a personal (anti-)preference I guess

apaszke · on Jan 19, 2017

But it is possible to write your model in purely functional style. Check out the PR to examples repo with functional ResNets https://github.com/pytorch/examples/pull/22.

closed · on Jan 18, 2017

Same. I know there can be nice, composable OO approaches, but every time I bump into a super crazy stacktrace, or need one of those police-detective-style boards with yarn to connect everything, I start to wonder.