While PyTorch is obviously the future in the short term, it will be interesting to see how this space evolves.
Before Tensorflow, people (myself included) were largely coding all of this stuff pretty manually, or with a zoo of incredibly clunky homemade libs.
Tensorflow and PyTorch made the whole situation far more accessible and sane. You can get a basic neural network working in a few lines of code. Magical.
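For anyone who hasn't seen it, here's a minimal sketch of what those "few lines" look like in PyTorch (random toy data, nothing real):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Tiny two-layer network fit to random toy data with plain SGD.
    model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    X, y = torch.randn(64, 4), torch.randn(64, 1)

    for _ in range(200):
        opt.zero_grad()
        loss = F.mse_loss(model(X), y)
        loss.backward()
        opt.step()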
But it's still early days. George Hotz, author of tinygrad[0], a PyTorch "competitor", made a really insightful comment -- we will look back on PyTorch & friends like we look back on FORTRAN and COBOL. Yes, they were far better than what came before. But they are really clunky compared to what we have today.
George Hotz is a great hacker and problem solver, but I greatly doubt he has the patience to actually build a product.
This is a fantastic skill set to solo-hack a complicated piece of software and produce a proof-of-concept, but not for building a tool that people actually use. See his track record of never building any successful product after his PS3 & iOS jailbreaks, his latest resignation memo from Comma.ai, and his resignation from Twitter after just 4 weeks.
So I think it's kind of hilarious that you're citing him and his newest tinygrad project in one sentence with the assertion that PyTorch & friends will soon be considered ancient and clunky tools. While I don't necessarily dispute the latter, I'm certain that tinygrad won't be the one replacing them.
George Hotz is exactly the kind of cowboy programmer that drives me up a wall. I encounter them in my company all the time. Due to some early success (or even without it), they become self-absorbed know-it-alls who can produce gobs of code in the mold of a so-called ‘10x programmer’ and then move on to something else, leaving an unsupportable mess in their wake. It’s usually someone who likes to be seen programming rather than actually taking the time to build something for the long term. This is an attitude we should be actively trying to discourage.
It's not mean, it's just a statement of a fairly obvious pattern in his career, which should influence how much weight we give to one of his predictions.
Why do we have so many of these overhyped personalities in the programming community? Guessing it's because half of the community has less than 5 years of experience and younger people are more online, but these personality cults are very off-putting.
I think because so many of us are invisible, when someone like George Hotz receives a ton of media exposure, you're sort of impressed that they did it or something?
Streaming. A handful of software engineers are now entertainers. This does a few things. It vastly increases their mindshare - they become one of the ten or so programmers known by the most people. It also encourages exaggeration. Streamers live off eyeballs. You want big loud over the top personalities with hot takes and extreme reactions.
comma.ai is quite successful. it does what it advertises and is profitable. it is a viable product, used and loved by many. it is amazing how much they achieved given the little funding they raised compared to the money raised in this space. it is a great achievement to get it all this way
Hopefully something that doesn't require the user to be a math wizard.
I'm not into data science by any means, and I first learned about tensors with Tensorflow. While I would like to experiment with neural nets, the current frameworks require me to learn the foundations of how they work. I'm sure I'll get arguments that this is a good thing, but I just want to do some cool stuff with computers, not take math courses to learn how the cake is made.
High-level programming languages are abstractions to machine code that, for the most part, don't require the user to know how they work internally. I think we're missing similar abstractions for ML, or it's possible they already exist and I haven't looked hard enough.
I'd rather hope that basic linear algebra and the notion of vectorized function application becomes universal math knowledge like addition and multiplication, as basic programming slowly becomes. We need languages and compilers with first-class optimized differential programming support, sure, but increasing baseline education would benefit wider population.
I agree with your last point, but most people have no practical need to learn high-level math. We struggle globally with even basic education rates, and programming definitely isn't for everyone.
What is needed are higher-level abstractions over ML principles that make ML more accessible for people who are interested in programming, but lack the math background to be productive. This doesn't necessarily mean that the lower-level details should be entirely hidden, but that the tools are friendly enough for beginners to easily pick up, and flexible enough for advanced users to tweak to their needs.
Note that “high-level math” by your definition covers things that were basically understood by 1900 (a good part even by 1850) and polished by 1940. It’s an incredible failure of the education system that any of this still counts as “advanced” in any way in 2023[1].
So abstract away, by all means; ensuring the user doesn’t need to know the Biot-Savart law to use a doorbell is the engineer’s job after all, and as semiconductor electronics show, users can include other engineers. (Hell, even the humble resistor is a fairly advanced piece of technology dependent on a good chunk of 20th-century physics.) But sooner or later we’re going to have to pay back this “ed debt”. (“Soc debt”?)
As you mentioned below, most people don't understand basic calculus from the 17th century, and I would argue that they don't need to.
Going by dates of introduction to determine if a math topic should be part of general knowledge doesn't make sense. Math was highly advanced by the 20th century, and a lot of it was used to advance the field of physics, which led us to the technological advancements we all enjoy today. None of it needs to be fundamentally understood by the layperson.
You sound well versed in math, but most people just don't need this knowledge in everyday life. Precisely because engineers build tools to make everyday life easier for everyone, regardless of education opportunities. Our society couldn't function any other way, and asking for everyone to have advanced math or engineering knowledge is unrealistic.
> Going by dates of introduction to determine if a math topic should be part of general knowledge doesn't make sense.
On one hand, yes, and that’s part of the point I usually make when I recite how far into the standard university course Newton’s or Gauss’s knowledge can still be found. On the other hand, no, and that’s the other part of the point I want to make when lamenting high-school maths, because the state of the art there is obsolete to a level that’s patently insane. Thank you for making me think about this.
I’ll put it like that. How often do you change your mind? (Or, if you want: how often do you make breaking changes to your APIs?) Obviously it makes no sense to keep a running tally and decide whether to do it or not depending on which side of the target value you’re currently on. Making a performance indicator out of this is a sure-fire way to end up in a completely twisted universe.
On the other hand, if you do it all the time something’s equally obviously gone awry—maybe you just started out being very wrong, and that’s not a problem with the current you, but if after some time there’s still no signs of settling you need to do some soul-searching. Similarly but even more dangerously, if you never do it, it’s of course possible you were simply right to begin with, and that’s happened once or twice in history, but you must seriously consider that you might either be living among fools or avoiding the intrinsic discomfort of the process and thus a fool yourself.
I’d say the recency of a general-education syllabus is an indicator of a similar nature: meaningless in each particular instance, and emphatically not to be used as a target metric, but at the same time as the average goes into the margins it’s increasingly likely it’s failing at its job. And by that measure maths is universally in deep shit. (The natural sciences are as well, just with more variance; I’m not qualified to judge the rest.)
I'd say general education is failing at its job for many reasons other than not teaching advanced mathematics.
On the top of my list:
- Not teaching students how to learn, and instilling them with the joys of perpetual learning, but forcing them to memorize concepts and regurgitate conclusions made by others, so they can pass an easily gradeable multiple choice test.
- Teaching a _bunch_ of useless concepts for everyday life, including some basic math concepts, let alone the math topics you've mentioned, but not teaching about the basics of finance, law and politics which would be much more relevant for leading a generally successful life.
- High corruption, low teacher wages, bullying, etc.
General education is not as much concerned with actually preparing young adults for adulthood, as it is with getting them through a well-established system designed to squeeze any actual desire to learn from students, while squeezing their parents financially. It's a broken system that won't be fixed by adding advanced math to the syllabus.
Exactly. There is no way a single person can understand even the 1700’s state of the art for every single scientific discipline, let alone the 1800’s levels.
> Note that “high-level math” by your definition covers things that were basically understood by 1900 (a good part even by 1850) and polished by 1940. It’s an incredible failure of the education system that any of this still counts as “advanced” in any way in 2023[1].
Depends on the country, and furthermore where I’ve seen them included they’re isolated either to a specialized track or to the last months of school, which still counts as “advanced” as far as I’m concerned and, I expect, as far as the students think about it.
(I also think learning linear algebra from the matrices side rather than from the linear transformations side, as was indeed common for mathematicians in the 19th century and for physicists and engineers well into the 20th, is both unnecessarily painful and almost impossible to pull off well. So if a syllabus lists its basics-of-linear-algebra part as “matrices” it’s probably missing the point. Knowing that you do rows times columns is of very little value, the part that makes it worth it is the why.)
Hell no, that goes beyond sarcasm and well into egregious goalpost shifting.
For reference, things known to humanity in 1940 (though not necessarily understood, that was an important difference in my comment) include the Schwarzschild (stationary black hole) and FLRW (homogeneous expanding or contracting universe) solutions to the equations of general relativity (1917 and 1924 respectively), the Darwin term for the effect of special relativity on the spectrum of hydrogen (1928), the severely non-classical exchange interaction responsible for the existence of ferromagnetism (same year), the London equations describing the (non-)penetration of the magnetic field into type-I superconductors (1935), and a way to produce superfluid helium (1937). And that only includes things I was supposed to ( :( ) know at some point. Also Gödel’s incompleteness theorems (1931) and Turing’s proof of undecidability of the halting problem (1936). I’m certainly not proposing to demand high schoolers know all of those.
The only reason I had to go as far as 1940 or so is because linear algebra came relatively late in the history of mathematics, even though it should logically serve as the foundation of multivariable calculus—and was enthusiastically adopted as such once it finally entered the collective mathematical unconscious in the late 19th century. Working out the pedagogy took a couple more decades.
If we were only talking about the “advanced” mathematics of calculus, I’d say Newton’s knowledge as of 1690 would be way overkill. (That would include the convergence speed of power series, a decent theory of ordinary differential equations, the beginnings of the calculus of variations, and even the “Newton polygons”, a theory of formal-series solutions to polynomial equations that properly belongs to algebraic geometry.) At that point J S Bach was five, the city of Philadelphia was a town founded eight years earlier, and Peter Romanov (later called the Great) was still twelve years away from founding St Petersburg and thirty years away from proclaiming Russia an empire.
To be honest, I think even if school taught no mathematics beyond basic numeracy, I wouldn’t complain as much. I would still complain, mind you, but only to the extent I do about being able to grow up without knowing what a fugue or the twelve-bar blues is or who Giotto or Niccolò de' Niccoli were. Instead we torture people with years of alleged mathematics bearing no resemblance to the real thing, and thus give them license to think they are not “maths people” (whatever that means) and that whatever maths they did not hear about in school is “advanced” and obscure. At least the average high-school graduate knows he doesn’t have a clue about music history or theory.
Sure, I just couldn’t discern the point behind the absurd (and if someone could I’d appreciate being told).
My tirade had a point as well, though: post 20th century our collective intuitions have become wildly miscalibrated regarding which things in humanity’s understanding of the world are recent or obscure. It seems that the barrier of “advanced” mostly hasn’t shifted at all since high school became compulsory in most parts of the world, and as decades go by I can see this going from “a goddamn shame” to “existential threat to human culture”. I’m not even sure that hasn’t happened yet—already a high school teacher can rarely, if ever, thoroughly explain a recent development of their choice in their subject.
One could just as easily argue that modern humans evolved about 200,000 years ago, and something that has existed for roughly 0.05% of human existence is obviously advanced.
We struggled along for 200,000 years without it, so I'd hardly call it an incredible failure that more people don't learn it[1].
[1] The usual disclaimers apply - im not anti-math, and I think more people should learn math, science, etc...
Why though? Aren't everyday users more interested in finished models themselves? Why would they want to make one, especially if they, in this example, don't really know how they work anyway? What is the value here?
That's funny, I looked into the whole neural net stuff the other day and thought:
"Wow you basically don't need any math".
I guess it depends a lot on perspective, to me the whole field looks like alchemy more than science. The results cannot be denied, but we understand so little.
This is true. In the current scheme of things, you don't need to know any math to get things going. At some level, if you really need to write a custom model from scratch, you probably do - but even then, not really.
Now you have the likes of HuggingFace which is another abstraction on top of Pytorch. Heck you don't even need to code a transformer from scratch. They have done it for you.
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
I was like wow, this level of abstraction is totally breaking barriers to entry and we will see many many flavours of huggingface in the years to come.
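The pipeline helper takes the abstraction one step further. A rough sketch (the example sentence is mine, and the output format varies a bit by library version):

    from transformers import pipeline

    # One-liner masked-language-model inference on top of the same BERT checkpoint.
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    print(unmasker("Machine learning is becoming more [MASK] every year."))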
I think this is more to do with the way software ecosystems develop in general. You don't need to know much about the actual implementation of databases or low-level programming to build a data-heavy application, and you don't need to know the complexities of graphics programming to render complex visuals to a screen.
If you want to do research in any branch of ML, you are almost certainly going to need some strong mathematical foundations. At the same time, if you're more interested in applying existing architectures/models, the ecosystem has a lot of really amazing tools that will take you a very long way without requiring you to study statistical learning theory.
They have a course that goes exactly like that (Practical Machine Learning or similar). You start high level, with libraries and just a few lines of code; it gives you the high-level tools to use ML in your projects, and it sometimes goes a bit lower level when needed.
Thanks! I've had this bookmarked for years now, but haven't gotten around to checking it out. It's on my list to look into next time I get the ML itch. :)
The right analogy is cryptography. You don't need to understand the math behind cryptography to calculate a hash, or encrypt something. You just need to understand at a very high level what is asymmetric encryption, IV, etc.
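In that spirit, a trivial example — computing a hash without knowing anything about what happens inside:

    import hashlib

    # The internals (Merkle–Damgård construction, round functions) stay hidden.
    print(hashlib.sha256(b"hello world").hexdigest())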
That’s actually a bad analogy in my opinion. People who don’t learn cryptography make many mistakes calling cryptography libraries and those mistakes can’t be uncovered by testing.
People can make ML mistakes if they don't learn ML. Some of the mistakes can be uncovered by testing, but only if you know enough about ML to know how to test it. Some mistakes just leave you with a bad model that could have been better.
I think what we all want is some kind of black box where data goes in and insight comes out. Ideally we would have something that takes a bunch of data and finds the shortest program that generates it. Unfortunately such a black box cannot exist; it is uncomputable. So then we need to ask for something less than that, and we don't know exactly what that is.
If the data was generated by a generator of constant complexity, its Kolmogorov complexity cannot exceed the complexity of the generator. As the amount of data increases, the Kolmogorov complexity will increase until it converges after a finite amount of data. At this point we effectively have the generator, so we can generate infinite data.
> I'm sure I'll get arguments that this is a good thing, but I just want to do some cool stuff with computers, not take math courses to learn how the cake is made.
It’s not so much how the cake is made but rather what you are actually doing.
You can already make something cool today by using end products and stringing their api together which might be what you are looking for.
If you want to make something new however and wish to use ML tools, well, at some point, you will have to learn what these tools actually are.
The ML is made of, and expressed in terms of, math. There’s no way round that if you want to train models; you need first-year undergrad statistics and linear algebra. In other words, just as you do with any language, you need to understand the grammar and vocabulary.
You don’t have to know how the operations involved do what they’re doing, in the sense that you don’t need to know how matrix multiplication is implemented any more than you need to understand Timsort to call sort() in Python, but you do need to know matrix arithmetic. That isn’t being a math wizard, it’s the price of entry.
I don't think that you need to know any of the underlying maths to make a neural network in TF/Keras. You need to know (not understand!) the building blocks, and what are they good for, but learning these feels like learning the difference between lists and dicts. You can do programming without it, but it will be ugly.
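For what it's worth, a hedged sketch of that "building blocks" view in Keras (toy data, purely illustrative):

    import numpy as np
    from tensorflow import keras

    # Stack layers, pick a loss and optimizer by name, fit -- no math in sight.
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

    X, y = np.random.rand(64, 4), np.random.rand(64, 1)
    model.fit(X, y, epochs=5, verbose=0)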
I understand what you mean, and I think we are unfortunately not there yet. Right now you still mostly need to understand the details to make something work. In many areas, it's not even just a leaky abstraction [1] where it works 90% of the time, but more like a series of magic incantations and tricks you need in a specific order to make it work. But as far as I can tell its getting better and more robust in general.
Came here to say this. Tensors are the wrong level of abstraction, because they're just matrices, which are just arrays.
The best tool I've used for vector/matrix processing is MATLAB (or GNU Octave, which is free but without as many libraries). To say I'm disappointed that these never gained the same traction as Python would be an understatement.
Unfortunately, these tools all suffer from a lack of insight from programmers. We abstract whenever possible, but for whatever reason, academics have a certain expectation that the user learn the entire system before starting. I believe that's why functional programming never quite caught on. That egocentric view infects everything from Mathematica/WolframAlpha to UNIX.
I prefer other models. For example, I don't feel that anyone has explored genetic algorithms under the Actor model yet. Which severely limits how fast the user can explore a problem space. Part of that is due to hardware. A GPU may present itself as having X number of teraflops or whatever, but only have something like 64 cores and no API to access them directly like a CPU. Which limits it to a paltry number of environments and/or automatons to evolve.
A better architecture would be something like 1024+ symmetric cores with a unified copy-on-write, content-addressable memory accessible through ordinary C desktop programming and orchestrated with something like Docker. Then we could explore auto-parallelization with languages like Julia and Clojure, possibly even Rust. I've written about that extensively throughout the years, but there's still been no change to the status quo, so I've all but given up on that. I do see a small hope with RISC-V on FPGA though.
The somewhat ironic thing is that AI is evolving so quickly that it's passed the Turing test now for the masses. We only have maybe 5-20 years before it solves all problems better than us. So it's an odd feeling for me to have waited 20 years with no progress on the hardware that I needed to do what I wanted to do, only to make the thing I wanted to do irrelevant anyway. That's the challenge we all face now as play is automated by AI while we still have to go to work and pay rent within a 20th century Matrix reality. I hope we find a way to reverse that trend before we're too old to care anyway.
Ideally, this should all be integrated at the compiler level. Programming languages and their associated standard libraries should expose differentiation as a first-class abstraction, along with everything else.
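(JAX's function transforms are arguably the closest widely used thing to that today — a minimal sketch, with a toy function of my own choosing:)

    import jax

    # grad() turns an ordinary Python function into its derivative function.
    def f(x):
        return x ** 3 + 2.0 * x

    df = jax.grad(f)
    print(df(2.0))  # 3 * 2**2 + 2 = 14.0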
The problem is that the whole space is really immature, and huge strides are being made... but by many disparate groups, all working in different spaces and languages.
Just in the last few months I read about several Python ML frameworks, a Julia differential compute capability, and that Haskell has a library where even the type system can express differentiation of general algebraic data types!
The whole ecosystem needs to... congeal a little bit.
This whole space reminds me of Object Oriented programming before C++ and Java.
> Haskell has a library where even the type system can express differentiation of general algebraic data types!
It's not really the same sort of differentiation. Derivatives of algebraic data types (as implementable in Haskell) only generalize formal derivatives of polynomials. You need a lot more machinery before you can talk about smoothness.
You don't have to know how an internal combustion engine works to use one. If you need to troubleshoot or repair it, you have to know the basics. The same is true of neural networks.
Using Tensorflow doesn't require you to know all the details, you build a model of the network, and it actually executes that model. The low level stuff is handled.
You still have to know how to tell when it is working, or not. Properly rationing your training/testing/validation portions of data is surprisingly important for not ending up with a model that seems to work great until you use it in the real world.
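For example, a minimal sketch of that rationing with scikit-learn (the 60/20/20 proportions and the toy data are arbitrary):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X, y = np.random.rand(1000, 8), np.random.randint(0, 2, 1000)  # toy data
    # Carve off a hold-out test set first, then a validation set from what remains.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)
    # Roughly 60/20/20: the test set gets touched only once, at the very end.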
My wild guess? We will get AI-driven UI frameworks as a tool. You drag and drop stuff in there, and it already suggests the basic stuff, preselects sane filters, normalizes your data, etc.
Will a handcrafted, math-based approach serve me better?
Very likely. But a VW is good enough for 99 percent of use cases :)
François Chollet's Deep Learning with Python is arguably the best book on the topic and teaches people using Keras / TF2. The Keras documentation is excellent as well, and in the long run it will matter. Personally I prefer the way the Keras / TF2 APIs work to PyTorch as well.
Also, I am pretty sure a JAX backend is coming to Keras, and there is TF Lite...
I cannot make a prediction as to which stack will dominate in five years.
The relevant metric is the literature - adoption is driven by new models, and most new models in academia are being implemented in PyTorch or JAX. (In particular, TensorFlow 2 is nowhere to be seen.) The new grads joining the workforce are going to continue to use the tools they are comfortable with and new projects are built around new models.
Yes, I have been reading a lot of papers in the last couple of years and the corresponding code, if available, is using PyTorch much more often than TensorFlow.
If you're familiar with the plumbing/porcelain API paradigm, JAX depends on TensorFlow plumbing (XLA) with a more ergonomic porcelain API.
You might not see TensorFlow's plumbing much anymore if you're a new grad running experiments in a notebook, but the "porcelain API" is just the tip of the iceberg of modern machine learning.
>George Hotz, author of tinygrad[0], a PyTorch "competitor", made a really insightful comment -- we will look back on PyTorch & friends like we look back on FORTRAN and COBOL.
Excuse me but, what? You could say this about pretty much every "hot" framework. Technology moves fast.
Our department also didn't even have GPUs to use for the first part of my time there, whereas now it would seem ridiculous to have an ML research wing without access to GPUs.
I think the TensorFlow release roughly coincided with the widespread adoption of proper deep learning libraries, but it didn't cause it. In fact the first TensorFlow release seemed to come just after (by 6 months to a year) the inflection point of deep learning library usage.
At the time, there seemed to be a lively ecosystem but no clear winner. Theano, Caffe and Torch were the main hitters but there were smaller players too. Torch was particularly good, maybe functionally ahead (but I don't really remember for sure) but hampered because, at the time, it was Lua only. In fact Keras's first release said "Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python / Theano so as not to have to deal with the dearth of ecosystem in Lua."
When TensorFlow was released, it looked like that would be the death of all the other libraries. The fact that any others survived at all is really down to how hard to use TensorFlow was (/is?). It didn't live up to the expectation of being the killer of other libraries.
My work might make me biased but I think there’s a lot of room to make more general purpose code differentiable.
I work on a ton of stuff that would very much benefit from being differentiable, but also very much can’t fit at all into the “stack a ton of linalg-like layers”.
It’s a huge engineering effort to even try to think about how I might start taking out derivatives. It’s possible but there’s so much overhead to do anything with it.
Julia has metaprogramming as a fundamental principle of the language. This makes for a very concise and powerful system that, together with a well-written AD framework like Zygote, makes every expression differentiable, meaning that effectively the entire language is differentiable.
And that isn't even the coolest thing about working in Julia: just wait til you see what people can squeeze out of macros.
so why hasn't it taken over the ML world already? or has it? or are there too many ML "researchers" who haven't bothered to improve their own tooling and are trapped in Anaconda?
The Julia community is small and has no large commercial backers. Projects such as TF/PyTorch require community support and a lot of investment which Julia just doesn't have. In fact, Julia isn't even trying at the moment to "compete" with TF/PyTorch [1, 2].
I've worked at 2 companies that would have liked to use Julia but it wasn't (and still isn't) product ready for anything involving high reliability or robustness.
Pytorch can be used in a very general purpose way. It's essentially numpy + automatic differentiation + GPU support. All the 'linalg-like layers' are entirely optional. If you write y = A*x+b in pytorch, that works, and is differentiable.
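Concretely, a tiny sketch of that, with no nn.Module anywhere (shapes picked arbitrarily):

    import torch

    A = torch.randn(3, 3, requires_grad=True)
    b = torch.randn(3, requires_grad=True)
    x = torch.randn(3)

    y = A @ x + b        # plain tensor math, no "layers"
    y.sum().backward()   # autograd fills in A.grad and b.grad
    print(A.grad.shape, b.grad.shape)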
Non-smooth functions (e.g. abs(x)) can be handled with bundle methods, but how would one make inherently discontinuous (non-convex) functions differentiable? (e.g. if x then 1 else 5)
Discrete problems are inherently non-differentiable. There are approaches like complementarity methods and switching functions (tanh) usually end up with numerical issues.
This already happens to an extent in existing ML pipelines. The ReLU activation function is discontinuous in its derivative, and it is one of the most widely used functions in neural networks. Its derivative looks like this:
if(i<0) return 0;
else return 1;
Now ReLU is continuous itself (as well as being monotonic) so it still cooperates relatively well with gradient descent algorithms. I think this is where the problem lies - not with differentiability itself, but with gradient descent not working due to the highly non-convex search space that such general programming constructs will produce.
ReLU is not discontinuous; it is nonsmooth but continuous, hence derivatives exist except at the hinge points.
Inherently discontinuous functions OTOH are disconnected and nonconvex. Gradient descent works, but you have to add a step to first partition the discrete space like branch and bound. This involves solving the continuous relaxation to find a bound. This does not require differentiability (it is not differentiable), but the price to pay is that it is combinatorial (NP hard)
The OP was talking about general differentiability but inherently discontinuous functions form a large and important class of functions (from software programming) that are not differentiable.
> Now ReLU is continuous itself (as well as being monotonic) so it still cooperates relatively well with gradient descent algorithms.
A function with a discontinuous derivative cannot cooperate with gradient descent algorithms. That's why you have the famous problem of "dead neurons".
Imagine an alternative ReLU which had a narrow curved section, to smooth out the discontinuity. Now it has a continuous derivative, but that gradient is still zero for values < 0. This flat region is the cause of dead neurons, because backprop multiplies the propagated error by the gradient, to update the weight - and if the gradient is 0, the result of the multiplication is 0 , and the neuron's weights do not get adjusted.
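You can see the zero-gradient side directly with a two-element toy example:

    import torch
    import torch.nn.functional as F

    x = torch.tensor([-2.0, 3.0], requires_grad=True)
    F.relu(x).sum().backward()
    print(x.grad)  # tensor([0., 1.]) -- no gradient flows through the negative input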
And then people go to empirical data and apply the Great Smoothing: By dropping ML/DL methods on the data (also including results from discontinuous behavior), continuity is often implicitly assumed.
one step in the right direction is labeled dimensions, where you can write input = width * height * color * batch and never have to worry about tensor permutations again.
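PyTorch's (still experimental) named-tensor prototype is one stab at this — a hedged sketch, with dimension names of my own choosing:

    import torch

    # Label dimensions by name instead of tracking axis positions by hand.
    imgs = torch.randn(8, 3, 32, 32, names=('batch', 'color', 'height', 'width'))
    nhwc = imgs.align_to('batch', 'height', 'width', 'color')  # reorder by name
    print(nhwc.names, nhwc.shape)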
PyTorch and Tensorflow were good enough to implement all the major advances in recent years: the AlexNet successors, GPT, Stable Diffusion. It's not early days, and the market will not evolve unless there is a new application that cannot be served by PyT and TF.
Whatever it is, I sincerely hope that it is not driven by Python, and instead we have a more accessible language.
Say what you will about Python, but it is a total pain in the arse to work with. The 2.x legacy still casts a long shadow (in terms of online guides, unmaintained libs etc), anaconda, pip, easy_install, strings/unicode, type hints (or not), unreadable & unmaintainable list comprehensions etc etc etc. What a dumpster fire, and a truly hateful language.
Here is hoping that some other language takes over. I have my personal take, but I won't let that ruin the rant :)
In part people like Python because it allows you to be busy with boilerplate, the deficiencies of the language and packaging as well as bugs and unexpected tracebacks. It makes you feel productive.
This is the same angle that pg used in one of his essays: Back then people liked Java because the huge amount of boilerplate made them look productive. In Lisp you stare into the REPL and notice that you don't have any ideas about the actual problem you are trying to solve.
I would also prefer another language. A well written C++ library (the one in pytorch seems a bit rough and a second class citizen) would already be a step up.
My "favourite" part is Anaconda taking multiple days to say downgrade Spider v3.3.0 to v3.2.0 (or whatever) in an environment. This operation should ideally complete in under a second, Anaconda is probably one of the most inefficient software on the planet that has widespread adoption.
> For the sake of completeness, I’ve also included Keras in the figure below. It was released at around the same time as TensorFlow. But, as one can see, it’s tanked in recent years. The short explanation for this is that Keras is a bit simplistic and too slow for the demands that most deep learning practitioners have
Keras is nowadays the front-end to TF, so this paragraph doesn't make sense. In light of this misunderstanding the author's views don't carry a lot of weight.
"These days, one can use TensorFlow with JavaScript, Java, and C++. The community is also starting to develop support for other languages like Julia, Rust, Scala, and Haskell, among others.
PyTorch, on the other hand, is very centered around Python — that’s why it feels so pythonic after all. There is a C++ API, but there isn’t half the support for other languages that TensorFlow offers."
This is also a big reason why CUDA won over OpenCL.
NVidia recognised early on (around CUDA 3) that having a polyglot infrastructure was a much better option, so eventually plenty of toolchains started to target PTX, while OpenCL was stuck with the "compile to C" alternative.
It didn't help that AMD and Intel hardly did anything to improve the situation when SPIR finally came out.
NVIDIA is so far ahead of the competition that people really do not care about the poor API that ROCm offers. Intel is not really a thing to even remotely care about for GPUs; they are still obviously important on the CPU end. All the other players matter even less.
I badly wanted something decent as an alternative to CUDA for quite a while, but even after leaving AI myself, it is clear they are nowhere near the standards that NVIDIA has been offering for nearly a decade now.
We don't even have to get into all the issues consumers have gotten with AMD's GPUs.
I'm not deeply familiar with this space, but isn't training and inference on CPUs gaining some traction[1][2][3]? It's still surely orders of magnitude slower, but as CPUs get more powerful and ML frameworks get more efficient, making this cheaper and more accessible would be a major breakthrough.
CPU training was and continues to be one-tenth of the speed, dollar for dollar. It's not even close.
The inescapable trend is that bigger models are better models. So if you're doing this professionally, you're going to need GPUs not only for training, but increasingly for inference in order to get decent latency.
On the other hand, it's likely competition will catch up to Nvidia.
They were early and good. But the space is too lucrative, and the technology is not special enough that they will be able to keep their dominant position for long.
Nor does anything Khronos related for compute APIs.
While Apple initially promoted OpenCL, Android never supported it as an official API, offering RenderScript instead.
So on iOS Apple eventually moved to Metal, usable from Objective-C and Swift, while on Android Google has deprecated RenderScript, usable from Java and Kotlin, pushing everyone to now learn Vulkan compute and do the integration in the NDK by themselves while learning C, C++ and GLSL in the process (no big deal from their POV).
Basically hardly anyone cares with such great usability story. /s
> Nor does anything Khronos related for compute APIs
As you said below, Vulkan compute does work on mobile devices, and while not from Khronos (other than SYCL) there are a bunch of higher-level libraries using it.
I don't think Apple is the only mobile vendor. I think that between AMD, Intel, Qualcomm, Mediatek, Samsung and Google, they will realize that they are all being screwed over by Nvidia. Right now ML is still mostly research, but when real products start surfacing (or, perhaps, for real products to surface to begin with) they will have to settle on something, whether it's ROCm, OneAPI or something else.
You mean aside from their long history of developing open things, from WebKit (based on KDE's open KHTML and made into a full-blown browser engine) to Swift?
I would say OpenCL was "betrayed" by most Khronos consortium participants on some level or another, and those who didn't betray it still tried to find ways to vendor-lock rather than really sharing and advancing the ecosystem to where it needed to be.
Still, remember that if you're not in the PC space, CUDA might not even be an option for you.
Are you saying polyglot is better, and leads to dominance?
Are you saying TensorFlow is more polyglot than PyTorch? Or if not, isn't that what your quote says anyway? TF supports many more languages. Or what do you mean by polyglot?
But then, this would imply that TensorFlow would dominate over PyTorch? But the article says the opposite?
I'm not sure polyglot really matters for Deep Learning frameworks; for inference you can already export your model to a standard format like .onnx and have it run on just about any runtime, mobile and web included.
Sure, you can't train your models in javascript with pytorch, but I'm not sure you want to use any language other than python for that, since python already has the whole ecosystem.
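A rough sketch of that export path (a stock torchvision ResNet used purely as a stand-in model):

    import torch
    import torchvision

    model = torchvision.models.resnet18().eval()
    dummy = torch.randn(1, 3, 224, 224)            # example input used for tracing
    torch.onnx.export(model, dummy, "resnet18.onnx")
    # The .onnx file can then be served by ONNX Runtime, in the browser, on mobile, etc.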
I'd argue polyglot is only better when the alternatives offer benefits. Like with polyglot cloud providers you can mix prices, with polyglot program runtimes you get multi-OS support, and with a polyglot ORM you can migrate to a new DB with shiny features.
What do you gain by polyglot ML models out of Python?
JAX should be mentioned [1]. It's also from Google and is getting popular these days. Not PyTorch-popular, but the progress and investment seem promising.
When I first read about JAX I thought it would kill Pytorch, but I'm not sure I can get on with an immutable language for tensor operations in deep learning.
If I have an array `x` and want to set index 0 to 10, I cannot do:
x[0] = 10
I instead have to do:
y = x.at[0].set(10)
I'm sure I could get used to it, but it really puts me off.
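For reference, the runnable version of that (jax.numpy arrays are immutable, so the in-place form raises):

    import jax.numpy as jnp

    x = jnp.zeros(3)
    # x[0] = 10            # TypeError: JAX arrays are immutable
    y = x.at[0].set(10)    # returns a new array instead
    print(x, y)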
Agreed that that's a bit ugly but at least in the ML context you rarely if ever need to do this (personally I only do this on the input to models, where we use pure numpy).
I feel the same. There are probably more ergonomic and generalizable ways to do whatever it is you need to do. Treat it as functional programming and let the XLA compiler handle things.
JAX would be amazing if it had a better debugging story. As it is, if something goes wrong in a transformed function, you lose pretty much all ability to use normal methods of fixing it. It practically just says ‘something went wrong.’
I think JAX is behind PyTorch in production usability though. It's a tool for cutting-edge architecture research, but it lacks the infrastructure to actually deploy the models in a production environment.
Is StackOverflow the right metric? I started with TensorFlow, switched to Keras and then PyTorch.
I had 100x the amount of questions with TensorFlow compared to PyTorch. With PyTorch I can jump into every part of the training process and easily debug. Often no need for StackOverflow.
I am glad not to have to deal with TF’s black box VM approach, which seems optimized for selling Google products more than solving machine learning.
> For the sake of completeness, I’ve also included Keras in the figure below. It was released at around the same time as TensorFlow. But, as one can see, it’s tanked in recent years.
Seriously!? Author has no clear idea of what she's talking about. Keras IS THE front end of TF. They are not different, they are the same thing!
> PyTorch is better for students and research
> TensorFlow has the better deployment infrastructure
Ammm... OKAY!?
I'm not a TF fan or anything but reading some half-baked absurd claims and saying one is better over another is just nuts. Both frameworks are built to satisfy the needs of different demographics of projects, people and organizations. It's NICE to learn both, use both where required, and I wish that both frameworks continuously evolve and make this space more accessible to build & deploy more complex solutions. Also f$%$ JAX, no one cares about you so just STFU.
On StackOverflow, so many questions will get closed, as being too vague, unclear, duplicate, off topic, or whatever. I assume many people moved to the official forum instead.
As ML becomes "commoditized" easy integration with other tools will likely enable completely new applications. In this sense pytorch has an edge as far the python ecosystem goes, where it is more "native".
But ultimately it will depend on what kind of such ML enabled "killer apps" become popular. Maybe they live in the browser or in some IoT and thus give the edge to js or c++ implementations rather than python.
But it's easy to develop any such killer apps in any of the ML frameworks. And there will not be a single such killer app; there will be many.
It's more a question what most of the community uses, esp the research community uses, as that is what gets into production first at some point. If you have your nice model already in PyTorch, why recode it in TF later?
So, the ML framework adoption always has a bias towards what's more friendly for the researchers, and developers.
I don't get the more native part - you can call a TensorFlow or a PyTorch model and get results from either in Python - what kind of integration are you imagining where the internal implementation of the model makes a difference?
I think people mostly refer to the debugging part while developing, not any other capabilities or API issues. So for example building a larger (monolithic) application entirely in python, where ML is only a part might be easier using pytorch.
> That being said, the data speaks a fairly simple truth. TensorFlow is, as of now, the most widespread deep learning framework. It gets almost twice as many questions on StackOverflow every month as PyTorch does.
I don’t like how easy-to-measure things are conflated with more meaningful things.
SO traffic shouldn’t mean something is more widespread without some more information.
Maybe PyTorch has great documentation and doesn’t need as many questions. Or maybe PyTorch users are on slack or irc for their questions.
SO is a good sign that a tool is being used generally, but I don’t think it should be used to compare. I think counts of SO questions are nominal and not even ordinal for purposes of gauging a technology’s use.
I’m not sure how to actually evaluate products. Maybe downloads? Or publications? Or referencing projects, although this is hard as there are many non-OSS projects that can’t be counted.
Anyway, this seems to irk me since people will equate “n people tweeted” with meaning something other than “n people tweeted.” Like somehow something being measurable makes it mean something.
One of the pieces of evidence is the huggingface model zoo, and the proportion of torch-only vs tf-only models, but if you start poking around you'll notice just an overwhelming number of variations of the same few models that somebody has fine-tuned or maybe just used different random starts.
If you search for 'distilbert-base-uncased-finetuned-emotion' (picked only because I noticed it a lot down in the tail) you'll find five hundred and twenty seven results. Are they all the same? Different? Who knows!
I'm going to guess that there's some kind of 'learn ml' class that as part of the curriculum teaches students to upload to huggingface and that's skewing the numbers beyond use.
TF is being abandoned for JAX in many cases; most of the new things I see coming up in ML are done in JAX, and PyTorch stays strong. It doesn't help that the TF docs are a catastrophe.
All you need to know about JAX from JAX main github page : "This is a research project, not an official Google product. Expect bugs and sharp edges. Please help by trying it out, reporting bugs, and letting us know what you think!"
JAX replaced TF by many Google research people internally, like Google Brain, and also DeepMind. However, TF is still very actively used by Google as well, esp for production. Google is big.
In the long run I expect this to move away from Python entirely. Yes, large pieces of the key loops are compiled, but it still doesn't feel like a good fit for something this performance-critical.
Doesn't really matter, as Python is only the glue, and the hot paths happen in native code or straight on the metal/gpu.
But I wish there were equally good libraries for typed languages. Exploratory coding in Python is so hard, because you don't know what you can do with a library or your tensors etc.
It's more than a large piece of key loops that are compiled. Almost everything is compiled. Python hardly does anything beyond orchestration in modern scientific pipelines. Even loops written in Python can be compiled away.
It's almost like you are coding in a separate language altogether sometimes. E.g. you write something in Cython that gets compiled to C/C++. Same holds for things like numba/LLVM or Jax/XLA. PyTorch is going bigger on compilation in PyTorch 2.0.
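A small sketch of the numba flavour of that (the toy function is mine; the decorated loop runs as LLVM-compiled machine code, not in the interpreter):

    import numpy as np
    from numba import njit

    @njit
    def total(xs):
        s = 0.0
        for v in xs:       # a plain Python loop, compiled away by the JIT
            s += v
        return s

    print(total(np.arange(1_000_000, dtype=np.float64)))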
Python can be seen as a common interface to write code that can be JIT compiled to multiple backends. It's quite convenient to be able to interact with all these backends, and bring the results back to a universally common interface (NumPy).
From this perspective, scientific Python is itself somewhat evolving into something like Julia, which is JIT compiled to LLVM. Except Python's momentum has generated a huge number of backends for users to choose from.
I'm all for a new language dethroning Python in scientific computing. But I don't think it will be for reasons of speed. I think it'll likely be things like typing (e.g. from Dex lang) or library modularity (e.g. Julia).
But what then?
Python is the only language everyone can agree on.
I'd love to code in Julia, and I'd happily use something like Rust. But realistically? Everyone does Python, and getting Python code out of critical paths is less effort than trying to get everyone to write Rust.
I mean, Apple is trying to force people to code ML in Swift and I don't think I have ever even seen anyone in this space who even knows what Swift is. I mean, they provided all this nice ML framework to run stuff on M1+, and all everyone was asking: "Yeah but how do we do this from PyTorch?"
I hope we end up with rich, cross-platform dataframe capabilities via Arrow, but even then you need a whole ecosystem of tools to work with and visualise data. Perhaps Julia takes over but we’re some way off truly polyglot data science.
It's a big leap from "this isn't the most loved tool" to "dying a slow death". Plenty of companies use TensorFlow, some of them at huge scale, and they're getting stuff done with it just fine.
"TensorFlow has the better deployment infrastructure"
Tensorflow Serving is nice in that it's so tightly integrated with Tensorflow. As usual, that goes both ways. It's so tightly coupled to Tensorflow that if the MLOps side of the solution is using Tensorflow Serving, you're going to get "trapped" in the Tensorflow ecosystem (essentially).
For pytorch models (and just about anything else) I've been really enjoying Nvidia Triton Server[0]. Of course it further entrenches Nvidia and CUDA in the space (although you can execute models CPU only) but for a deployment today and the foreseeable future you're almost certainly going to be using a CUDA stack anyway.
Triton Server is very impressive and I'm always surprised to see how relatively niche it is.
There are lies, damned lies and statistics. Looks like the entire article is premised on a single StackOverflow trends chart (really, everything else is opinion). For an alternate set of data points, here is François Chollet.
Sure, he may not be unbiased, but why do you say his data is? He is in fact showing other StackOverflow data points, not just one cherry-picked one like the article posted (plus a lot of opinion about what is going on).
Folks mention the nativeness of PyTorch, but perhaps perceived simplicity matters even more.
Python is a simple language, and for many AI researchers pytorch was an easy way to not get bogged down in implementation details for rapid experimentation.
The graph seems to confirm what they are saying, in terms of questions asked on Stack Overflow. The number of PyTorch questions is growing, and the number of TensorFlow questions has plateaued and may even be shrinking, but it's a bit early to tell.
I wouldn't necessarily say that a growing/shrinking number of questions on Stackoverflow means a project is more/less popular than another; it can be down to a number of factors (more experienced developers, better documentation, better support channels via some other platform, and so on), so I don't think the conclusion is 100% on the spot. But at least the graph accurately describes what it's supposed to describe.
It's about time to move out of 'Python prison' for ML frameworks. It was a good start for academics but for scalable enterprise deployment, python is a limiting language. I would love to see more ML Eng focussed libraries for Java, Swift, C# etc
What will we have in 20 years?
[0] https://github.com/geohot/tinygrad, https://tinygrad.org