I'm an undergrad student, and I'm nervous about choosing between TensorFlow+Keras and PyTorch.
It looks like many more companies are hiring for TensorFlow, and there's a wealth of information out there on learning ML with it. In addition, it just got the 2.0 update.
But, PyTorch is preferred nearly every single time when I see the discussion come up on HN and Google searches. I'm having a hard time deciding what to dedicate my time to.
Abstract from the tools. They come and go. You will need to adopt a new one every other year.
Instead, make sure to understand the math and the concepts, and then it's easy to translate that to an implementation.
One way of doing this (though not sufficient) is to learn both tools.
Right now the pull is away from TF (increasingly convoluted API and lots of deprecations) and towards pytorch (more support from the research community and increasing performance in production).
I posted this link but now the title has somehow changed. I do not know what the policy is on HN. But the title saying "[video]" might give the wrong impression that this points to a one-hour-long video. The link points to a tutorial which embeds an entirely optional two-minute video that introduces the main content contained in five web pages.
One thing I've noticed is that it's quite hard to have vibrant discussions about DL because it is all either so simple or it is dauntingly complicated/unpredictable. Mostly my DL conversations end up being about frameworks. Anyone else experience this?
Also the number of DL submissions on HN seems surprisingly low given the applicability of the technology.
It's pretty easy when you're talking to people who understand the fundamentals of deep learning, but that understanding isn't very common even on HN. I think that's because the real-world, valuable use cases of DL are not very accessible:
(a) DL is pretty complicated in a way that's unfamiliar to most software engineers. You are consistently working with Tensors that have a couple more dimensions than people are used to holding in their heads (e.g. images mean you are typically working with 4D Tensors).
(b) You learn from academic papers, not blogs. It's a new workflow for many software people and intimidating to some (although the papers are usually closer to blog posts than rigorous academic papers).
(c) It's very difficult to learn deep learning on your own without it getting pretty expensive. Advanced uses pretty much require GPUs/TPUs and that's either a big upfront purchase or a serious per-experiment cost.
(d) Deep Learning is not a single field. It is CV, NLP, RL, speech recognition and probably others I'm forgetting about. They overlap, but it further reduces the number of people you can have informed discussions with because being knowledgeable about computer vision does not mean you are able to have a vibrant discussion about NLP.
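To make point (a) a bit more concrete, here is a rough sketch of what a "4D Tensor" for images looks like, using plain nested lists instead of a real framework. The (batch, channels, height, width) ordering and the sizes are assumptions for illustration; some libraries put channels last instead.

```python
# A tiny batch of images as nested lists, laid out as
# (batch, channels, height, width). The sizes are illustrative assumptions.
BATCH, CHANNELS, HEIGHT, WIDTH = 2, 3, 4, 5


def zeros(*dims):
    """Return a nested list of zeros with the given dimensions."""
    if len(dims) == 1:
        return [0.0] * dims[0]
    return [zeros(*dims[1:]) for _ in range(dims[0])]


def shape(nested):
    """Recover the shape of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


images = zeros(BATCH, CHANNELS, HEIGHT, WIDTH)

# Four indices are needed to address one pixel value:
# image 1, channel 2, row 0, column 4.
images[1][2][0][4] = 1.0
```

Every operation in a vision model has to agree on which of those four axes means what, which is a big part of why shape bugs are so common.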
Could you list some of the places you look to learn? Where do you find relevant academic papers? It's hard to find information about what this learning workflow is.
These are “hands on” in the sense that you can replicate the results just by pasting in the same code. It’s kind of like a tutorial notebook in essay form.
In my experience the best place to have informal ai discussions is Twitter. The community is shockingly helpful. Follow @jonathanfly, @roadrunning01, @pbaylies and whoever pops up in the stuff they post. Roadrunning in particular posts tweets of the form “here’s some research; here’s the code” often with an interactive notebook.
Keep in mind that tutorials will always make it look easy compared to debugging actual production code. If you look through tensorflow tutorials, they also look very easy, especially with TF2.
That said, I've experimented with pytorch and I agree that it is really nice to work with.
Disclaimer: I work at Google and do use tensorflow, though I don't work on the tensorflow team.
PyTorch is 10x easier to debug than even TF2, and it's been that way all along. TF2 is no easier to debug than the previous releases if you're not using eager mode (which most people aren't), and even in eager mode it sometimes errors out in ways that do not offer any suggestion as to _which op_ caused the error. This is nuts. Modern architectures have hundreds, sometimes thousands of ops. It basically boils down to flying blind and guessing, and it can easily take days of trial and error to figure out each issue. Plus, every time you start a TF program it just sort of sits there for a minute or so before it starts doing anything. This severely hampers productivity when debugging.
To all the folks who are just starting out: just go with PyTorch. It's downright intuitive compared to anything Google has been able to put out so far.
Disclosure: ex-Googler. Used TF while there (and DistBelief before it). Gave it up as soon as PyTorch came out. Couldn't be happier.
It's very good that PyTorch emerged as a serious contender to TF. While TF still provides more production-grade tools (TFX, TensorRT, TF Serving), PyTorch continues to evolve, and I hope we soon have a more complete ecosystem.
I really like JAX as well: https://github.com/google/jax.
It's younger than PyTorch and TF, but feels cleaner and more expressive. It has a very nice autodiff implementation (based on https://github.com/HIPS/autograd) and performance is comparable to TF in my experience.
It feels like JAX doesn't have any of the high-level APIs that PT/TF/MXNet have, which are vital for fast prototyping of model architectures. Is that correct?
It seems that the JAX developers are focusing their time on making the core framework better and are leaving the task of building high-level APIs to the community for now.
I suspect we'll see a few high-level APIs emerge over the next few months that explore different approaches before the community settles on a particular one.
I hope not. That's part of what makes TF so miserable - the core library didn't provide the tooling people actually needed so the community built a ton of different tools and it just made TF confusing to use.
Is there a drop-in replacement for TensorBoard? It’s probably the biggest thing keeping me using TensorFlow. Ideally the API of the PyTorch equivalent would be about the same too.
Anyone have thoughts on TF2.0 vs pytorch? Over on Twitter people seem to be pretty hyped about TF2.0, but when I tried learning it it just felt... not very fun. I need to give it a fair shot though.
Anyone know somewhere that has a good overview of the various ML and DL model types and what they are good for? I've been looking for a survey paper or book or just a glossary of ML.
When you hear autoregressive model, think “predicting a sequence”. These are good for text to speech since you can say “given some text, generate a spectrogram.” GPT-2 is probably the most impressive example of autoregressive techniques (I think).
GANs, and especially StyleGAN, are good for generating high quality images up to 1024x1024. These take about 5 weeks to train and $1k of GCE credits. The dataset size is around 70k photos for FFHQ. Mode collapse is a concern, which is when the generator ends up producing only a narrow subset of the data distribution instead of covering all of its modes. StyleGAN has some built in techniques to combat this. IMLE recently showed that mode collapse can be solved without GANs at all.
Hmm.. what else... I’ll update this as I think of stuff. Any questions?
EDIT: Regarding IMLE vs GAN, here are some resources:
For comparing images, I believe they use the standard VGG perceptual loss metric that StyleGAN uses. (See section 3.5 of https://arxiv.org/pdf/1811.12373.pdf)
It seems to me that the main disadvantage of IMLE is that you might not get any latent directions that you get with StyleGAN. E.g. I'm not sure you could "make a photograph smile" the way you can with StyleGAN. But in the paper, they show that you can at least interpolate between two latents in much the same way, and the interpolations look pretty solid.
IMLE (implicit maximum likelihood estimation), as far as I can tell, is a trivial method of parameterizing a random variable distribution and tuning it to make true data examples (e.g., images) more likely. The technique relies on finding nearest-neighbor example images, which in turn needs a metric of image distance. The original IMLE uses least-squares pixel distance, for example, which is not a very flexible or effective metric in practice (e.g., it is completely confused by rotation).
The whole advantage of a GAN is that it does NOT need an explicit distance metric for comparing images. Instead, the discriminator effectively learns the metric in order to improve its ability to distinguish real images from generated/fake ones.
So to argue that IMLE can solve mode collapse is a false equivalency.
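To illustrate the point about pixel distance being confused by rotation, here is a toy sketch. The 5x5 "images" and the helper names are made up for the example; this is not IMLE itself, only the least-squares pixel metric applied to a bar, its rotated copy, and a blank image.

```python
def squared_distance(a, b):
    """Sum of squared per-pixel differences between two equal-sized images."""
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def rotate90(img):
    """Rotate a square image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


SIZE = 5
# A vertical bar down the middle column.
vertical_bar = [[1 if c == SIZE // 2 else 0 for c in range(SIZE)]
                for _ in range(SIZE)]
blank = [[0] * SIZE for _ in range(SIZE)]

# The same bar, rotated: now a horizontal bar across the middle row.
rotated_bar = rotate90(vertical_bar)

to_rotated = squared_distance(vertical_bar, rotated_bar)
to_blank = squared_distance(vertical_bar, blank)
```

Under this metric the bar is "closer" to an empty image than to a rotated copy of itself, which is the kind of failure a learned discriminator is supposed to avoid.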
I found a strange bifurcation recently while collecting papers on a sub-topic of this question: China-based authors quoting other China-based authors extensively, in English with math, of course. Meanwhile, papers from the US and Western EU seem like "it"; in other words, all the papers they reference seem like the ones you would reference, etc., self-consistent.
One of the incredibly unfortunate things about science out of China: it may or may not be trustworthy, as in the data may be just straight false. I'm not surprised that you saw that split; I'd be leery of quoting/referencing a potentially false paper myself.
Autoregressive models use their own output at past time steps as part of the input to predict the next value. If your sequence generator does not do that then it’s not “autoregressive”.
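A minimal sketch of that definition, with a toy stand-in for the model. The names `generate` and `toy_step` are hypothetical, and the "model" here is just a rule that adds the last two outputs; a real autoregressive model would be a learned network in the same loop.

```python
def generate(step, seed, length):
    """Autoregressively extend `seed` until it holds `length` values.

    `step` maps the sequence produced so far to the next value, so each
    new output is fed back in as input for the following step.
    """
    sequence = list(seed)
    while len(sequence) < length:
        sequence.append(step(sequence))
    return sequence


def toy_step(history):
    # Hypothetical "model": the next value is the sum of the last two outputs.
    return history[-1] + history[-2]


sequence = generate(toy_step, [1, 1], 8)
```

If `step` only looked at some external input and never at `sequence`, the loop would still generate a sequence, but it would not be autoregressive.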
Not as far as I know. It does have max-margin loss [1], which is pretty much all you need to implement a neural ranking model, apart from data iterators and training loops.
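For anyone unfamiliar with it, a pairwise max-margin (hinge) ranking loss can be sketched in a few lines. This is a generic textbook form written in plain Python, not the API of any particular library; the function names and the default margin of 1.0 are assumptions.

```python
def max_margin_loss(pos_score, neg_score, margin=1.0):
    """Pairwise hinge loss: zero once the relevant item outscores the
    irrelevant one by at least `margin`, linear in the shortfall otherwise."""
    return max(0.0, margin - pos_score + neg_score)


def mean_ranking_loss(pairs, margin=1.0):
    """Average the pairwise loss over (pos_score, neg_score) pairs."""
    return sum(max_margin_loss(p, n, margin) for p, n in pairs) / len(pairs)


# Hypothetical model scores for (relevant, irrelevant) document pairs.
pairs = [(2.0, 0.5), (0.5, 1.0)]
loss = mean_ranking_loss(pairs)
```

The first pair is already ranked correctly with room to spare and contributes nothing; the second is ranked the wrong way round and contributes all of the loss.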
Does no one build their own ML algos anymore? I don't understand the need for PyTorch and TensorFlow. I honestly thought TensorFlow was nothing but a teaching thing for undergrads.
This type of reasoning can be extended to any high-level tool: "Does no one write their own OS? I don't understand the need for Linux or Windows. I honestly thought Windows or Linux was nothing but a tool for undergrads to use Excel or host a WordPress site." And this is not a caricature of your argument. There is a lot of stuff under the hood that TensorFlow or PyTorch implement for a programmer. So much so that people have written wrappers around TF or PyTorch to abstract the workings of the library even further. Implementing deep learning architectures is less of a science and more of a "let me try this or that", and iterating on ideas quickly is of the utmost importance. Also, I can implement a neural network in C (CUDA), although not the autodiff part (but I could if given time to research), but if I started implementing my own library, it would take an order of magnitude (or even more) more time to do the stuff I do daily. We don't need to reinvent the wheel here, guys.
That's what I'm getting at, the stuff under the hood is what's important, devil in the details and all that. I'm also a quant so every ml algo needs to be tailored so idk
Do you also write your own automatic differentiation tools? Using libraries like TF and PyTorch makes sense if you use neural networks because they provide automatic differentiation (who wants to write out their gradients by hand?) and standard neural network components.
Edit: If your algorithm is not using neural networks, then libraries like TF may or may not be a good fit, it depends on the algorithm.
Writing custom low-level code can still make sense in those cases.
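To give a feel for what "automatic differentiation" means here, below is a toy forward-mode version using dual numbers. It is only a sketch: the `Dual` class and `derivative` helper are made up for this example, it handles just `+` and `*`, and TF and PyTorch use reverse-mode autodiff over tensors rather than this approach.

```python
class Dual:
    """A value paired with its derivative with respect to one input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        """Treat plain numbers as constants (derivative 0)."""
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input's derivative with 1."""
    return f(Dual(x, 1.0)).deriv


def f(x):
    return 3 * x * x + 2 * x + 1


slope = derivative(f, 2.0)  # analytically, f'(x) = 6x + 2
```

Nobody wrote out the gradient of `f` by hand; it falls out of running the ordinary code on `Dual` values. Scaling that idea to thousands of tensor ops on a GPU is the part the frameworks save you from.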
Although the endpoint is likely to be a better understanding of the choices made by a mature implementation, and of the work involved in fixing up edge cases.
Not all of us need to build our own ML algos, in the same way that not all of us need to build our own sorting libraries or data structures. Some people specialize in this to develop and do research, while other software engineers just want something they can use without much hassle and with just a superficial understanding.
They're frameworks which implement high-performance tools commonly used in ML problems, like tensor operations, automatic differentiation, various gradient descent optimisers, and neural network building blocks.
To be fair, the main thing you learn doing the cryptopals challenge is to not write your own crypto.
I had a lesson in writing crypto once, when I made what I thought was a good enough secret mixing procedure to encode some data I wanted to email outside of a company that didn’t allow web access. (Long time ago, circa 2000). It all looked undecipherable and I sent most of the data before I discovered that strings of binary zero were leaking my secret key. Oops, pretty stupid.
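A sketch of how that kind of leak happens, assuming the mixing step was something like a repeating-key XOR (the comment above does not say what the actual procedure was, so this is only an illustration; the key and message are made up):

```python
from itertools import cycle


def xor_mix(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"s3cret"                 # hypothetical secret key
plaintext = b"hi" + bytes(12)   # two real bytes, then a run of zero bytes

ciphertext = xor_mix(plaintext, key)

# Anything XOR 0 is itself, so wherever the plaintext is zero the
# ciphertext is just the key bytes in the clear.
leaked = ciphertext[6:12]
```

The output looks like noise at a glance, but any long run of zeros in the input prints the key straight into the ciphertext.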
I am sure you could write stuff like Differentiable Processors or the like from scratch with NumPy, but if you respect yourself and your time, you won’t. Complicated architectures are orders of magnitude harder than writing feed-forward networks from scratch. For example, see the Merlin paper.
https://www.youtube.com/playlist?list=PLZbbT5o_s2xrfNyHZsM6u...
They explain things incredibly well, videos are easy to understand, engaging, and to the point. Highly recommend it to everyone!
I've also heard that Udacity has some good courses, but I can't vouch for those yet.