While PyTorch is obviously the future in the short term, it will be interesting to see how this space evolves.
Before Tensorflow, people (myself included) were largely coding all of this stuff pretty manually, or with a zoo of incredibly clunky homemade libs.
Tensorflow and PyTorch made the whole situation far more accessible and sane. You can get a basic neural network working in a few lines of code. Magical.
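For anyone who hasn't seen it, here's a minimal sketch of what those "few lines" look like in PyTorch (random toy data, nothing real):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Tiny two-layer network fit to random toy data with plain SGD.
    model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    X, y = torch.randn(64, 4), torch.randn(64, 1)

    for _ in range(200):
        opt.zero_grad()
        loss = F.mse_loss(model(X), y)
        loss.backward()
        opt.step()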
But it's still early days. George Hotz, author of tinygrad[0], a PyTorch "competitor", made a really insightful comment -- we will look back on PyTorch & friends like we look back on FORTRAN and COBOL. Yes, they were far better than what came before. But they are really clunky compared to what we have today.
George Hotz is a great hacker and problem solver, but I greatly doubt he has the patience to actually build a product.
This is a fantastic skill set to solo-hack a complicated piece of software and produce a proof-of-concept, but not for building a tool that people actually use. See his track record of never building any successful product after his PS3 & iOS jailbreaks, his latest resignation memo from Comma.ai, and his resignation from Twitter after just 4 weeks.
So I think it's kind of hilarious that you're citing him and his newest tinygrad project in one sentence with the assertion that PyTorch & friends will soon be considered ancient and clunky tools. While I don't necessarily dispute the latter, I'm certain that tinygrad won't be the one replacing them.
George Hotz is exactly the kind of cowboy programmer that drives me up a wall. I encounter them in my company all the time. Due to some early success (or even without it), they become self-absorbed know-it-alls who can produce gobs of code in the mold of a so-called ‘10x programmer’ and then move on to something else, leaving an unsupportable mess in their wake. It’s usually someone who likes to be seen programming rather than actually taking the time to build something for the long term. This is an attitude we should be actively trying to discourage.
It's not mean, it's just a statement of a fairly obvious pattern in his career, which should influence how much weight we give to one of his predictions.
Why do we have so many of these overhyped personalities in the programming community? Guessing it's because half of the community has less than 5 years of experience and younger people are more online, but these personality cults are very off-putting.
I think because so many of us are invisible, when someone like George Hotz receives a ton of media exposure, you're sort of impressed that they did it or something?
Streaming. A handful of software engineers are now entertainers. This does a few things. It vastly increases their mindshare - they become one of the ten or so programmers known by the most people. It also encourages exaggeration. Streamers live off eyeballs. You want big loud over the top personalities with hot takes and extreme reactions.
comma.ai is quite successful. it does what it advertises and is profitable. it is a viable product, used and loved by many. it is amazing how much they achieved given the little funding they raised compared to the money raised in this space. it is a great achievement to get it all this way
Hopefully something that doesn't require the user to be a math wizard.
I'm not into data science by any means, and I first learned about tensors with Tensorflow. While I would like to experiment with neural nets, the current frameworks require me to learn the foundations of how they work. I'm sure I'll get arguments that this is a good thing, but I just want to do some cool stuff with computers, not take math courses to learn how the cake is made.
High-level programming languages are abstractions to machine code that, for the most part, don't require the user to know how they work internally. I think we're missing similar abstractions for ML, or it's possible they already exist and I haven't looked hard enough.
I'd rather hope that basic linear algebra and the notion of vectorized function application becomes universal math knowledge like addition and multiplication, as basic programming slowly becomes. We need languages and compilers with first-class optimized differential programming support, sure, but increasing baseline education would benefit wider population.
I agree with your last point, but most people have no practical need to learn high-level math. We struggle globally with even basic education rates, and programming definitely isn't for everyone.
What is needed are higher-level abstractions over ML principles that make ML more accessible for people who are interested in programming, but lack the math background to be productive. This doesn't necessarily mean that the lower-level details should be entirely hidden, but that the tools are friendly enough for beginners to easily pick up, and flexible enough for advanced users to tweak to their needs.
Note that “high-level math” by your definition covers things that were basically understood by 1900 (a good part even by 1850) and polished by 1940. It’s an incredible failure of the education system that any of this still counts as “advanced” in any way in 2023[1].
So abstract away, by all means; ensuring the user doesn’t need to know the Biot-Savart law to use a doorbell is the engineer’s job after all, and as semiconductor electronics show, users can include other engineers. (Hell, even the humble resistor is a fairly advanced piece of technology dependent on a good chunk of 20th-century physics.) But sooner or later we’re going to have to pay back this “ed debt”. (“Soc debt”?)
As you mentioned below, most people don't understand basic calculus from the 17th century, and I would argue that they don't need to.
Going by dates of introduction to determine if a math topic should be part of general knowledge doesn't make sense. Math was highly advanced by the 20th century, and a lot of it was used to advance the field of physics, which led us to the technological advancements we all enjoy today. None of it needs to be fundamentally understood by the layperson.
You sound well versed in math, but most people just don't need this knowledge in everyday life. Precisely because engineers build tools to make everyday life easier for everyone, regardless of education opportunities. Our society couldn't function any other way, and asking for everyone to have advanced math or engineering knowledge is unrealistic.
> Going by dates of introduction to determine if a math topic should be part of general knowledge doesn't make sense.
On one hand, yes, and that’s part of the point I usually make when I recite how far into the standard university course Newton’s or Gauss’s knowledge can still be found. On the other hand, no, and that’s the other part of the point I want to make when lamenting high-school maths, because the state of the art there is obsolete to a level that’s patently insane. Thank you for making me think about this.
I’ll put it like that. How often do you change your mind? (Or, if you want: how often do you make breaking changes to your APIs?) Obviously it makes no sense to keep a running tally and decide whether to do it or not depending on which side of the target value you’re currently on. Making a performance indicator out of this is a sure-fire way to end up in a completely twisted universe.
On the other hand, if you do it all the time something’s equally obviously gone awry—maybe you just started out being very wrong, and that’s not a problem with the current you, but if after some time there’s still no signs of settling you need to do some soul-searching. Similarly but even more dangerously, if you never do it, it’s of course possible you were simply right to begin with, and that’s happened once or twice in history, but you must seriously consider that you might either be living among fools or avoiding the intrinsic discomfort of the process and thus a fool yourself.
I’d say the recency of a general-education syllabus is an indicator of a similar nature: meaningless in each particular instance, and emphatically not to be used as a target metric, but at the same time as the average goes into the margins it’s increasingly likely it’s failing at its job. And by that measure maths is universally in deep shit. (The natural sciences are as well, just with more variance; I’m not qualified to judge the rest.)
I'd say general education is failing at its job for many reasons other than not teaching advanced mathematics.
On the top of my list:
- Not teaching students how to learn, and instilling them with the joys of perpetual learning, but forcing them to memorize concepts and regurgitate conclusions made by others, so they can pass an easily gradeable multiple choice test.
- Teaching a _bunch_ of useless concepts for everyday life, including some basic math concepts, let alone the math topics you've mentioned, but not teaching about the basics of finance, law and politics which would be much more relevant for leading a generally successful life.
- High corruption, low teacher wages, bullying, etc.
General education is not as much concerned with actually preparing young adults for adulthood, as it is with getting them through a well-established system designed to squeeze any actual desire to learn from students, while squeezing their parents financially. It's a broken system that won't be fixed by adding advanced math to the syllabus.
Exactly. There is no way a single person can understand even the 1700’s state of the art for every single scientific discipline, let alone the 1800’s levels.
> Note that “high-level math” by your definition covers things that were basically understood by 1900 (a good part even by 1850) and polished by 1940. It’s an incredible failure of the education system that any of this still counts as “advanced” in any way in 2023[1].
Depends on the country, and furthermore where I’ve seen them included they’re isolated either to a specialized track or to the last months of school, which still counts as “advanced” as far as I’m concerned and, I expect, as far as the students think about it.
(I also think learning linear algebra from the matrices side rather than from the linear transformations side, as was indeed common for mathematicians in the 19th century and for physicists and engineers well into the 20th, is both unnecessarily painful and almost impossible to pull off well. So if a syllabus lists its basics-of-linear-algebra part as “matrices” it’s probably missing the point. Knowing that you do rows times columns is of very little value, the part that makes it worth it is the why.)
Hell no, that goes beyond sarcasm and well into egregious goalpost shifting.
For reference, things known to humanity in 1940 (though not necessarily understood, that was an important difference in my comment) include the Schwarzschild (stationary black hole) and FLRW (homogeneous expanding or contracting universe) solutions to the equations of general relativity (1917 and 1924 respectively), the Darwin term for the effect of special relativity on the spectrum of hydrogen (1928), the severely non-classical exchange interaction responsible for the existence of ferromagnetism (same year), the London equations describing the (non-)penetration of the magnetic field into type-I superconductors (1935), and a way to produce superfluid helium (1937). And that only includes things I was supposed to ( :( ) know at some point. Also Gödel’s incompleteness theorems (1931) and Turing’s proof of undecidability of the halting problem (1936). I’m certainly not proposing to demand high schoolers know all of those.
The only reason I had to go as far as 1940 or so is because linear algebra came relatively late in the history of mathematics, even though it should logically serve as the foundation of multivariable calculus—and was enthusiastically adopted as such once it finally entered the collective mathematical unconscious in the late 19th century. Working out the pedagogy took a couple more decades.
If we were only talking about the “advanced” mathematics of calculus, I’d say Newton’s knowledge as of 1690 would be way overkill. (That would include the convergence speed of power series, a decent theory of ordinary differential equations, the beginnings of the calculus of variations, and even the “Newton polygons”, a theory of formal-series solutions to polynomial equations that properly belongs to algebraic geometry.) At that point J S Bach was five, the city of Philadelphia was a town founded eight years earlier, and Peter Romanov (later called the Great) was still twelve years away from founding St Petersburg and thirty years away from proclaiming Russia an empire.
To be honest, I think even if school taught no mathematics beyond basic numeracy, I wouldn’t complain as much. I would still complain, mind you, but only to the extent I do about being able to grow up without knowing what a fugue or the twelve-bar blues is or who Giotto or Niccolò de' Niccoli were. Instead we torture people with years of alleged mathematics bearing no resemblance to the real thing, and thus give them license to think they are not “maths people” (whatever that means) and that whatever maths they did not hear about in school is “advanced” and obscure. At least the average high-school graduate knows he doesn’t have a clue about music history or theory.
Sure, I just couldn’t discern the point behind the absurd (and if someone could I’d appreciate being told).
My tirade had a point as well, though: post 20th century our collective intuitions have become wildly miscalibrated regarding which things in humanity’s understanding of the world are recent or obscure. It seems that the barrier of “advanced” mostly hasn’t shifted at all since high school became compulsory in most parts of the world, and as decades go by I can see this going from “a goddamn shame” to “existential threat to human culture”. I’m not even sure that hasn’t happened yet—already a high school teacher can rarely, if ever, thoroughly explain a recent development of their choice in their subject.
One could just as easily argue that modern humans evolved about 200,000 years ago, and something that has existed for roughly 0.05% of human existence is obviously advanced.
We struggled along for 200,000 years without it, so I'd hardly call it an incredible failure that more people don't learn it[1].
[1] The usual disclaimers apply - im not anti-math, and I think more people should learn math, science, etc...
Why though? Aren't everyday users more interested in finished models themselves? Why would they want to make one, especially if they, in this example, don't really know how they work anyway? What is the value here?
That's funny, I looked into the whole neural net stuff the other day and thought:
"Wow you basically don't need any math".
I guess it depends a lot on perspective, to me the whole field looks like alchemy more than science. The results cannot be denied, but we understand so little.
This is true. In the current scheme of things, you don't need to know any math to get things going. At some level, if you really need to write a custom model from scratch, you probably do - but even then, not really.
Now you have the likes of HuggingFace which is another abstraction on top of Pytorch. Heck you don't even need to code a transformer from scratch. They have done it for you.
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
I was like wow, this level of abstraction is totally breaking barriers to entry and we will see many many flavours of huggingface in the years to come.
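The pipeline helper takes the abstraction one step further. A rough sketch (the example sentence is mine, and the output format varies a bit by library version):

    from transformers import pipeline

    # One-liner masked-language-model inference on top of the same BERT checkpoint.
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    print(unmasker("Machine learning is becoming more [MASK] every year."))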
I think this is more to do with the way software ecosystems develop in general. You don't need to know much about the actual implementation of databases or low-level programming to build a data-heavy application, and you don't need to know the complexities of graphics programming to render complex visuals to a screen.
If you want to do research in any branch of ML, you are almost certainly going to need some strong mathematical foundations. At the same time, if you're more interested in applying existing architectures/models, the ecosystem has a lot of really amazing tools that will take you a very long way without requiring you to study statistical learning theory.
They have a course that goes exactly like that (Practical Machine Learning or similar). You start high level, with libraries and just a few lines of code; it gives you the high-level tools to use ML in your projects, and it sometimes goes a bit lower level when needed.
Thanks! I've had this bookmarked for years now, but haven't gotten around to checking it out. It's on my list to look into next time I get the ML itch. :)
The right analogy is cryptography. You don't need to understand the math behind cryptography to calculate a hash, or encrypt something. You just need to understand at a very high level what is asymmetric encryption, IV, etc.
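In that spirit, a trivial example — computing a hash without knowing anything about what happens inside:

    import hashlib

    # The internals (Merkle–Damgård construction, round functions) stay hidden.
    print(hashlib.sha256(b"hello world").hexdigest())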
That’s actually a bad analogy in my opinion. People who don’t learn cryptography make many mistakes calling cryptography libraries and those mistakes can’t be uncovered by testing.
People can make ML mistakes if they don't learn ML. Some of the mistakes can be uncovered by testing, but only if you know enough about ML to know how to test it. Some mistakes just leave you with a bad model that could have been better.
I think what we all want is some kind of black box where data goes in and insight comes out. Ideally we would have something that takes a bunch of data and finds the shortest program that generates it. Unfortunately such a black box cannot exist; it is uncomputable. So then we need to ask for something less than that, and we don't know exactly what that is.
If the data was generated by a generator of constant complexity, its Kolmogorov complexity cannot exceed the complexity of the generator. As the amount of data increases, the Kolmogorov complexity will increase until it converges after a finite amount of data. At this point we effectively have the generator, so we can generate infinite data.
> I'm sure I'll get arguments that this is a good thing, but I just want to do some cool stuff with computers, not take math courses to learn how the cake is made.
It’s not so much how the cake is made but rather what you are actually doing.
You can already make something cool today by using end products and stringing their api together which might be what you are looking for.
If you want to make something new however and wish to use ML tools, well, at some point, you will have to learn what these tools actually are.
The ML is made of, and expressed in terms of, math. There’s no way round that if you want to train models; you need first-year undergrad statistics and linear algebra. In other words, just as you do with any language, you need to understand the grammar and vocabulary.
You don’t have to know how the operations involved do what they’re doing, in the sense that you don’t need to know how matrix multiplication is implemented any more than you need to understand Timsort to call sort() in Python, but you do need to know matrix arithmetic. That isn’t being a math wizard, it’s the price of entry.
I don't think that you need to know any of the underlying maths to make a neural network in TF/Keras. You need to know (not understand!) the building blocks, and what are they good for, but learning these feels like learning the difference between lists and dicts. You can do programming without it, but it will be ugly.
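For what it's worth, a hedged sketch of that "building blocks" view in Keras (toy data, purely illustrative):

    import numpy as np
    from tensorflow import keras

    # Stack layers, pick a loss and optimizer by name, fit -- no math in sight.
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="sgd", loss="mse")

    X, y = np.random.rand(64, 4), np.random.rand(64, 1)
    model.fit(X, y, epochs=5, verbose=0)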
I understand what you mean, and I think we are unfortunately not there yet. Right now you still mostly need to understand the details to make something work. In many areas, it's not even just a leaky abstraction [1] where it works 90% of the time, but more like a series of magic incantations and tricks you need in a specific order to make it work. But as far as I can tell its getting better and more robust in general.
Came here to say this. Tensors are the wrong level of abstraction, because they're just matrices, which are just arrays.
The best tool I've used for vector/matrix processing is MATLAB (or GNU Octave, which is free but without as many libraries). To say I'm disappointed that these never gained the same traction as Python would be an understatement.
Unfortunately, these tools all suffer from a lack of insight from programmers. We abstract whenever possible, but for whatever reason, academics have a certain expectation that the user learn the entire system before starting. I believe that's why functional programming never quite caught on. That egocentric view infects everything from Mathematica/WolframAlpha to UNIX.
I prefer other models. For example, I don't feel that anyone has explored genetic algorithms under the Actor model yet. Which severely limits how fast the user can explore a problem space. Part of that is due to hardware. A GPU may present itself as having X number of teraflops or whatever, but only have something like 64 cores and no API to access them directly like a CPU. Which limits it to a paltry number of environments and/or automatons to evolve.
A better architecture would be something like 1024+ symmetric cores with a unified copy-on-write, content-addressable memory accessible through ordinary C desktop programming and orchestrated with something like Docker. Then we could explore auto-parallelization with languages like Julia and Clojure, possibly even Rust. I've written about that extensively throughout the years, but there's still been no change to the status quo, so I've all but given up on that. I do see a small hope with RISC-V on FPGA though.
The somewhat ironic thing is that AI is evolving so quickly that it's passed the Turing test now for the masses. We only have maybe 5-20 years before it solves all problems better than us. So it's an odd feeling for me to have waited 20 years with no progress on the hardware that I needed to do what I wanted to do, only to make the thing I wanted to do irrelevant anyway. That's the challenge we all face now as play is automated by AI while we still have to go to work and pay rent within a 20th century Matrix reality. I hope we find a way to reverse that trend before we're too old to care anyway.
Ideally, this should all be integrated at the compiler level. Programming languages and their associated standard libraries should expose differentiation as a first-class abstraction, along with everything else.
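(JAX's function transforms are arguably the closest widely used thing to that today — a minimal sketch, with a toy function of my own choosing:)

    import jax

    # grad() turns an ordinary Python function into its derivative function.
    def f(x):
        return x ** 3 + 2.0 * x

    df = jax.grad(f)
    print(df(2.0))  # 3 * 2**2 + 2 = 14.0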
The problem is that the whole space is really immature, and huge strides are being made... but by many disparate groups, all working in different spaces and languages.
Just in the last few months I read about several Python ML frameworks, a Julia differential compute capability, and that Haskell has a library where even the type system can express differentiation of general algebraic data types!
The whole ecosystem needs to... congeal a little bit.
This whole space reminds me of Object Oriented programming before C++ and Java.
> Haskell has a library where even the type system can express differentiation of general algebraic data types!
It's not really the same sort of differentiation. Derivatives of algebraic data types (as implementable in Haskell) only generalize formal derivatives of polynomials. You need a lot more machinery before you can talk about smoothness.
You don't have to know how an internal combustion engine works to use one. If you need to troubleshoot or repair it, you have to know the basics. The same is true of neural networks.
Using Tensorflow doesn't require you to know all the details, you build a model of the network, and it actually executes that model. The low level stuff is handled.
You still have to know how to tell when it is working, or not. Properly rationing your training/testing/validation portions of data is surprisingly important for not ending up with a model that seems to work great until you use it in the real world.
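For example, a minimal sketch of that rationing with scikit-learn (the 60/20/20 proportions and the toy data are arbitrary):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X, y = np.random.rand(1000, 8), np.random.randint(0, 2, 1000)  # toy data
    # Carve off a hold-out test set first, then a validation set from what remains.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)
    # Roughly 60/20/20: the test set gets touched only once, at the very end.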
My wild guess? We will get AI-driven UI frameworks as a tool. You drag and drop stuff in there, and it already suggests the basic stuff, preselects sane filters, normalizes your data, etc.
Will a handcrafted, math-based approach serve me better?
Very likely. But a VW is good enough for 99 percent of use cases :)
François Chollet's Deep Learning with Python is arguably the best book on the topic and teaches people using Keras / TF2. The Keras documentation is excellent as well, and in the long run it will matter. Personally I prefer the way the Keras / TF2 APIs work to PyTorch as well.
Also, I am pretty sure a JAX backend is coming to Keras, and there is TF Lite...
I cannot make a prediction as to which stack will dominate in five years.
The relevant metric is the literature - adoption is driven by new models, and most new models in academia are being implemented in PyTorch or JAX. (In particular, TensorFlow 2 is nowhere to be seen.) The new grads joining the workforce are going to continue to use the tools they are comfortable with and new projects are built around new models.
Yes, I have been reading a lot of papers in the last couple of years and the corresponding code, if available, is using PyTorch much more often than TensorFlow.
If you're familiar with the plumbing/porcelain API paradigm, JAX depends on TensorFlow plumbing (XLA) with a more ergonomic porcelain API.
You might not see TensorFlow's plumbing much anymore if you're a new grad running experiments in a notebook, but the "porcelain API" is just the tip of the iceberg of modern machine learning.
>George Hotz, author of tinygrad[0], a PyTorch "competitor", made a really insightful comment -- we will look back on PyTorch & friends like we look back on FORTRAN and COBOL.
Excuse me but, what? You could say this about pretty much every "hot" framework. Technology moves fast.
Our department also didn't even have GPUs to use for the first part of my time there, whereas now it would seem ridiculous to have an ML research wing without access to GPUs.
I think the TensorFlow release roughly coincided with the widespread adoption of proper deep learning libraries, but it didn't cause it. In fact the first TensorFlow release seemed to come just after (by 6 months to a year) the inflection point of deep learning library usage.
At the time, there seemed to be a lively ecosystem but no clear winner. Theano, Caffe and Torch were the main hitters but there were smaller players too. Torch was particularly good, maybe functionally ahead (but I don't really remember for sure) but hampered because, at the time, it was Lua only. In fact Keras's first release said "Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python / Theano so as not to have to deal with the dearth of ecosystem in Lua."
When TensorFlow was released, it looked like that would be the death of all the other libraries. The fact that any others survived at all is really down to how hard to use TensorFlow was (/is?). It didn't live up to the expectation of being the killer of other libraries.
My work might make me biased but I think there’s a lot of room to make more general purpose code differentiable.
I work on a ton of stuff that would very much benefit from being differentiable, but also very much can’t fit at all into the “stack a ton of linalg-like layers”.
It’s a huge engineering effort to even try to think about how I might start taking out derivatives. It’s possible but there’s so much overhead to do anything with it.
Julia has metaprogramming as a fundamental principle of the language. This makes for a very concise and powerful system that, together with a well-written AD framework like Zygote, makes every expression differentiable, meaning that effectively the entire language is differentiable.
And that isn't even the coolest thing about working in Julia: just wait til you see what people can squeeze out of macros.
so why hasn't it taken over the ML world already? or has it? or are there too many ML "researchers" who haven't bothered to improve their own tooling and are trapped in Anaconda?
The Julia community is small and has no large commercial backers. Projects such as TF/PyTorch require community support and a lot of investment which Julia just doesn't have. In fact, Julia isn't even trying at the moment to "compete" with TF/PyTorch [1, 2].
I've worked at 2 companies that would have liked to use Julia but it wasn't (and still isn't) product ready for anything involving high reliability or robustness.
Pytorch can be used in a very general purpose way. It's essentially numpy + automatic differentiation + GPU support. All the 'linalg-like layers' are entirely optional. If you write y = A*x+b in pytorch, that works, and is differentiable.
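Concretely, a tiny sketch of that, with no nn.Module anywhere (shapes picked arbitrarily):

    import torch

    A = torch.randn(3, 3, requires_grad=True)
    b = torch.randn(3, requires_grad=True)
    x = torch.randn(3)

    y = A @ x + b        # plain tensor math, no "layers"
    y.sum().backward()   # autograd fills in A.grad and b.grad
    print(A.grad.shape, b.grad.shape)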
Non-smooth functions (e.g. abs(x)) can be handled with bundle methods, but how would one make inherently discontinuous (non-convex) functions differentiable? (e.g. if x then 1 else 5)
Discrete problems are inherently non-differentiable. There are approaches like complementarity methods and switching functions (tanh) usually end up with numerical issues.
This already happens to an extent in existing ML pipelines. The ReLU activation function is discontinuous in its derivative, and it is one of the most widely used functions in neural networks. Its derivative looks like this:
if(i<0) return 0;
else return 1;
Now ReLU is continuous itself (as well as being monotonic) so it still cooperates relatively well with gradient descent algorithms. I think this is where the problem lies - not with differentiability itself, but with gradient descent not working due to the highly non-convex search space that such general programming constructs will produce.
ReLU is not discontinuous; it is nonsmooth but continuous, hence derivatives exist except at the hinge points.
Inherently discontinuous functions OTOH are disconnected and nonconvex. Gradient descent works, but you have to add a step to first partition the discrete space like branch and bound. This involves solving the continuous relaxation to find a bound. This does not require differentiability (it is not differentiable), but the price to pay is that it is combinatorial (NP hard)
The OP was talking about general differentiability but inherently discontinuous functions form a large and important class of functions (from software programming) that are not differentiable.
> Now ReLU is continuous itself (as well as being monotonic) so it still cooperates relatively well with gradient descent algorithms.
A function with a discontinuous derivative cannot cooperate with gradient descent algorithms. That's why you have the famous problem of "dead neurons".
Imagine an alternative ReLU which had a narrow curved section, to smooth out the discontinuity. Now it has a continuous derivative, but that gradient is still zero for values < 0. This flat region is the cause of dead neurons, because backprop multiplies the propagated error by the gradient, to update the weight - and if the gradient is 0, the result of the multiplication is 0 , and the neuron's weights do not get adjusted.
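You can see the zero-gradient side directly with a two-element toy example:

    import torch
    import torch.nn.functional as F

    x = torch.tensor([-2.0, 3.0], requires_grad=True)
    F.relu(x).sum().backward()
    print(x.grad)  # tensor([0., 1.]) -- no gradient flows through the negative input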
And then people go to empirical data and apply the Great Smoothing: By dropping ML/DL methods on the data (also including results from discontinuous behavior), continuity is often implicitly assumed.
one step in the right direction is labeled dimensions, where you can write input = width * height * color * batch and never have to worry about tensor permutations again.
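PyTorch's (still experimental) named-tensor prototype is one stab at this — a hedged sketch, with dimension names of my own choosing:

    import torch

    # Label dimensions by name instead of tracking axis positions by hand.
    imgs = torch.randn(8, 3, 32, 32, names=('batch', 'color', 'height', 'width'))
    nhwc = imgs.align_to('batch', 'height', 'width', 'color')  # reorder by name
    print(nhwc.names, nhwc.shape)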
PyTorch and Tensorflow were good enough to implement all the major advances in recent years: the AlexNet successors, GPT, Stable Diffusion. It's not early days, and the market will not evolve unless there is a new application that cannot be served by PyT and TF.
Whatever it is, I sincerely hope that it is not driven by Python, and instead we have a more accessible language.
Say what you will about Python, but it is a total pain in the arse to work with. The 2.x legacy still casts a long shadow (in terms of online guides, unmaintained libs etc), anaconda, pip, easy_install, strings/unicode, type hints (or not), unreadable & unmaintainable list comprehensions etc etc etc. What a dumpster fire, and a truly hateful language.
Here is hoping that some other language takes over. I have my personal take, but I won't let that ruin the rant :)
In part people like Python because it allows you to be busy with boilerplate, the deficiencies of the language and packaging as well as bugs and unexpected tracebacks. It makes you feel productive.
This is the same angle that pg used in one of his essays: Back then people liked Java because the huge amount of boilerplate made them look productive. In Lisp you stare into the REPL and notice that you don't have any ideas about the actual problem you are trying to solve.
I would also prefer another language. A well written C++ library (the one in pytorch seems a bit rough and a second class citizen) would already be a step up.
My "favourite" part is Anaconda taking multiple days to say downgrade Spider v3.3.0 to v3.2.0 (or whatever) in an environment. This operation should ideally complete in under a second, Anaconda is probably one of the most inefficient software on the planet that has widespread adoption.
> For the sake of completeness, I’ve also included Keras in the figure below. It was released at around the same time as TensorFlow. But, as one can see, it’s tanked in recent years. The short explanation for this is that Keras is a bit simplistic and too slow for the demands that most deep learning practitioners have
Keras is nowadays the front-end to TF, so this paragraph doesn't make sense. In light of this misunderstanding the author's views don't carry a lot of weight.
"These days, one can use TensorFlow with JavaScript, Java, and C++. The community is also starting to develop support for other languages like Julia, Rust, Scala, and Haskell, among others.
PyTorch, on the other hand, is very centered around Python — that’s why it feels so pythonic after all. There is a C++ API, but there isn’t half the support for other languages that TensorFlow offers."
This is also a big reason why CUDA won over OpenCL.
NVidia recognised early on (around CUDA 3) that having a polyglot infrastructure was a much better option, so eventually plenty of toolchains started to target PTX, while OpenCL was stuck with the "compile to C" alternative.
It didn't help that AMD and Intel hardly did anything to improve the situation when SPIR finally came out.
NVIDIA is so far ahead of the competition that people really do not care about the poor API that ROCm offers. Intel is not really a thing to even remotely care about for GPUs; they are still obviously important on the CPU end. All the other players matter even less.
I badly wanted something decent as an alternative to CUDA for quite a while, but even after leaving AI myself, it is clear they are nowhere near the standards that NVIDIA has been offering for nearly a decade now.
We don't even have to get into all the issues consumers have gotten with AMD's GPUs.
I'm not deeply familiar with this space, but isn't training and inference on CPUs gaining some traction[1][2][3]? It's still surely orders of magnitude slower, but as CPUs get more powerful and ML frameworks get more efficient, making this cheaper and more accessible would be a major breakthrough.
CPU training was and continues to be one-tenth of the speed, dollar for dollar. It's not even close.
The inescapable trend is that bigger models are better models. So if you're doing this professionally, you're going to need GPUs not only for training, but increasingly for inference in order to get decent latency.
On the other hand, it's likely competition will catch up to Nvidia.
They were early and good. But the space is too lucrative, and the technology is not special enough that they will be able to keep their dominant position for long.
Nor does anything Khronos related for compute APIs.
While Apple initially promoted OpenCL, Android never supported it as an official API, offering RenderScript instead.
So on iOS Apple eventually moved to Metal, usable from Objective-C and Swift, while on Android Google has deprecated RenderScript, usable from Java and Kotlin, pushing everyone to now learn Vulkan compute and do the integration in the NDK by themselves while learning C, C++ and GLSL in the process (no big deal from their POV).
Basically hardly anyone cares with such great usability story. /s
> Nor does anything Khronos related for compute APIs
As you said below, Vulkan compute does work on mobile devices, and while not from Khronos (other than SYCL) there are a bunch of higher-level libraries using it.
I don't think Apple is the only mobile vendor. I think that between AMD, Intel, Qualcomm, Mediatek, Samsung and Google, they will realize that they are all being screwed over by Nvidia. Right now ML is still mostly research, but when real products start surfacing (or, perhaps, for real products to surface to begin with) they will have to settle on something, whether it's ROCm, OneAPI or something else.
You mean aside from their long history of developing open things, from WebKit (based on KDE's open KHTML and made into a full-blown browser engine) to Swift?
I would say OpenCL was "betrayed" by most Khronos consortium participants on some level or another, and those who didn't betray it still tried to find ways to vendor-lock rather than really sharing and advancing the ecosystem to where it needed to be.
Still, remember that if you're not in the PC space, CUDA might not even be an option for you.
Are you saying polyglot is better, and leads to dominance?
Are you saying TensorFlow is more polyglot than PyTorch? Or if not, isn't that what your quote says anyway? TF supports many more languages. Or what do you mean by polyglot?
But then, this would imply that TensorFlow would dominate over PyTorch? But the article says the opposite?
I'm not sure polyglot really matters for Deep Learning frameworks; for inference you can already export your model to a standard format like .onnx and have it run on just about any runtime, mobile and web included.
Sure, you can't train your models in javascript with pytorch, but I'm not sure you want to use any language other than python for that, since python already has the whole ecosystem.
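A rough sketch of that export path (a stock torchvision ResNet used purely as a stand-in model):

    import torch
    import torchvision

    model = torchvision.models.resnet18().eval()
    dummy = torch.randn(1, 3, 224, 224)            # example input used for tracing
    torch.onnx.export(model, dummy, "resnet18.onnx")
    # The .onnx file can then be served by ONNX Runtime, in the browser, on mobile, etc.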
I'd argue polyglot is only better when the alternatives offer benefits. Like with polyglot cloud providers you can mix prices, with polyglot program runtimes you get multi-OS support, and with a polyglot ORM you can migrate to a new DB with shiny features.
What do you gain by polyglot ML models out of Python?
JAX should be mentioned [1]. It's also from Google and is getting popular these days. Not PyTorch-popular, but the progress and investment seem promising.
When I first read about JAX I thought it would kill Pytorch, but I'm not sure I can get on with an immutable language for tensor operations in deep learning.
If I have an array `x` and want to set index 0 to 10, I cannot do:
x[0] = 10
I instead have to do:
y = x.at[0].set(10)
I'm sure I could get used to it, but it really puts me off.
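For reference, the runnable version of that (jax.numpy arrays are immutable, so the in-place form raises):

    import jax.numpy as jnp

    x = jnp.zeros(3)
    # x[0] = 10            # TypeError: JAX arrays are immutable
    y = x.at[0].set(10)    # returns a new array instead
    print(x, y)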
Agreed that that's a bit ugly but at least in the ML context you rarely if ever need to do this (personally I only do this on the input to models, where we use pure numpy).
I feel the same. There are probably more ergonomic and generalizable ways to do whatever it is you need to do. Treat it as functional programming and let the XLA compiler handle things.
JAX would be amazing if it had a better debugging story. As it is, if something goes wrong in a transformed function, you lose pretty much all ability to use normal methods of fixing it. It practically just says ‘something went wrong.’
I think JAX is behind PyTorch in production usability though. It's a tool for cutting-edge architecture research, but it lacks the infrastructure to actually deploy the models in a production environment.
Is StackOverflow the right metric? I started with TensorFlow, switched to Keras and then PyTorch.
I had 100x the amount of questions with TensorFlow compared to PyTorch. With PyTorch I can jump into every part of the training process and easily debug. Often no need for StackOverflow.
I am glad not to have to deal with TF’s black box VM approach, which seems optimized for selling Google products more than solving machine learning.
> For the sake of completeness, I’ve also included Keras in the figure below. It was released at around the same time as TensorFlow. But, as one can see, it’s tanked in recent years.
Seriously!? Author has no clear idea of what she's talking about. Keras IS THE front end of TF. They are not different, they are the same thing!
> PyTorch is better for students and research
> TensorFlow has the better deployment infrastructure
Ammm... OKAY!?
I'm not a TF fan or anything but reading some half-baked absurd claims and saying one is better over another is just nuts. Both frameworks are built to satisfy the needs of different demographics of projects, people and organizations. It's NICE to learn both, use both where required, and I wish that both frameworks continuously evolve and make this space more accessible to build & deploy more complex solutions. Also f$%$ JAX, no one cares about you so just STFU.
On StackOverflow, so many questions will get closed, as being too vague, unclear, duplicate, off topic, or whatever. I assume many people moved to the official forum instead.
As ML becomes "commoditized" easy integration with other tools will likely enable completely new applications. In this sense pytorch has an edge as far the python ecosystem goes, where it is more "native".
But ultimately it will depend on what kind of such ML enabled "killer apps" become popular. Maybe they live in the browser or in some IoT and thus give the edge to js or c++ implementations rather than python.
But it's easy to develop any such killer apps in any of the ML frameworks. And there will not be a single such killer app; there will be many.
It's more a question what most of the community uses, esp the research community uses, as that is what gets into production first at some point. If you have your nice model already in PyTorch, why recode it in TF later?
So, the ML framework adoption always has a bias towards what's more friendly for the researchers, and developers.
I don't get the more native part - you can call a TensorFlow or a PyTorch model and get results from either in Python - what kind of integration are you imagining where the internal implementation of the model makes a difference?
I think people mostly refer to the debugging part while developing, not any other capabilities or API issues. So for example building a larger (monolithic) application entirely in python, where ML is only a part might be easier using pytorch.
> That being said, the data speaks a fairly simple truth. TensorFlow is, as of now, the most widespread deep learning framework. It gets almost twice as many questions on StackOverflow every month as PyTorch does.
I don’t like how easy-to-measure things are conflated with more meaningful things.
SO traffic shouldn’t mean something is more widespread without some more information.
Maybe PyTorch has great documentation and doesn’t need as many questions. Or maybe PyTorch users are on slack or irc for their questions.
SO is a good sign that a tool is being used generally, but I don’t think it should be used to compare. I think counts of SO questions are nominal and not even ordinal for purposes of gauging a technology’s use.
I’m not sure how to actually evaluate products. Maybe downloads? Or publications? Or referencing projects, although this is hard as there are many non-OSS projects that can’t be counted.
Anyway, this seems to irk me since people will equate “n people tweeted” with meaning something other than “n people tweeted.” Like somehow something being measurable makes it mean something.
One of the pieces of evidence is the huggingface model zoo, and the proportion of torch-only vs tf-only models, but if you start poking around you'll notice just an overwhelming number of variations of the same few models that somebody has fine-tuned or maybe just used different random starts.
If you search for 'distilbert-base-uncased-finetuned-emotion' (picked only because I noticed it a lot down in the tail) you'll find five hundred and twenty seven results. Are they all the same? Different? Who knows!
I'm going to guess that there's some kind of 'learn ml' class that as part of the curriculum teaches students to upload to huggingface and that's skewing the numbers beyond use.
TF is being abandoned for JAX in many cases; most of the new things I see coming up in ML are done in JAX, and PyTorch stays strong. It doesn't help that the TF docs are a catastrophe.
All you need to know about JAX from JAX main github page : "This is a research project, not an official Google product. Expect bugs and sharp edges. Please help by trying it out, reporting bugs, and letting us know what you think!"
JAX replaced TF by many Google research people internally, like Google Brain, and also DeepMind. However, TF is still very actively used by Google as well, esp for production. Google is big.
In the long run I expect this to move away from Python entirely. Yes, large pieces of the key loops are compiled, but it still doesn't feel like a good fit for something this performance-critical.
Doesn't really matter, as Python is only the glue, and the hot paths happen in native code or straight on the metal/gpu.
But I wish there were equally good libraries for typed languages. Exploratory coding in Python is so hard, because you don't know what you can do with a library or your tensors etc.
It's more than a large piece of key loops that are compiled. Almost everything is compiled. Python hardly does anything beyond orchestration in modern scientific pipelines. Even loops written in Python can be compiled away.
It's almost like you are coding in a separate language altogether sometimes. E.g. you write something in Cython that gets compiled to C/C++. Same holds for things like numba/LLVM or Jax/XLA. PyTorch is going bigger on compilation in PyTorch 2.0.
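A small sketch of the numba flavour of that (the toy function is mine; the decorated loop runs as LLVM-compiled machine code, not in the interpreter):

    import numpy as np
    from numba import njit

    @njit
    def total(xs):
        s = 0.0
        for v in xs:       # a plain Python loop, compiled away by the JIT
            s += v
        return s

    print(total(np.arange(1_000_000, dtype=np.float64)))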
Python can be seen as a common interface to write code that can be JIT compiled to multiple backends. It's quite convenient to be able to interact with all these backends, and bring the results back to a universally common interface (NumPy).
From this perspective, scientific Python is itself somewhat evolving into something like Julia, which is JIT compiled to LLVM. Except Python's momentum has generated a huge number of backends for users to choose from.
I'm all for a new language dethroning Python in scientific computing. But I don't think it will be for reasons of speed. I think it'll likely be things like typing (e.g. from Dex lang) or library modularity (e.g. Julia).
But what then?
Python is the only language everyone can agree on.
I'd love to code in Julia, and I'd happily use something like Rust. But realistically? Everyone does Python, and getting Python code out of critical paths is less effort than trying to get everyone to write Rust.
I mean, Apple is trying to force people to code ML in Swift and I don't think I have ever even seen anyone in this space who even knows what Swift is. I mean, they provided all this nice ML framework to run stuff on M1+, and all everyone was asking: "Yeah but how do we do this from PyTorch?"
I hope we end up with rich, cross-platform dataframe capabilities via Arrow, but even then you need a whole ecosystem of tools to work with and visualise data. Perhaps Julia takes over but we’re some way off truly polyglot data science.
It's a big leap from "this isn't the most loved tool" to "dying a slow death". Plenty of companies use TensorFlow, some of them at huge scale, and they're getting stuff done with it just fine.
"TensorFlow has the better deployment infrastructure"
Tensorflow Serving is nice in that it's so tightly integrated with Tensorflow. As usual, that goes both ways. It's so tightly coupled to Tensorflow that if the MLOps side of the solution is using Tensorflow Serving, you're going to get "trapped" in the Tensorflow ecosystem (essentially).
For pytorch models (and just about anything else) I've been really enjoying Nvidia Triton Server[0]. Of course it further entrenches Nvidia and CUDA in the space (although you can execute models CPU only) but for a deployment today and the foreseeable future you're almost certainly going to be using a CUDA stack anyway.
Triton Server is very impressive and I'm always surprised to see how relatively niche it is.
There are lies, damned lies and statistics. Looks like the entire article is premised on a single StackOverflow trends chart (really, everything else is opinion). For an alternate set of data points, here is François Chollet.
Sure, he may not be unbiased, but why do you say his data is? He is in fact showing other StackOverflow data points, not just one cherry-picked one like the article posted (plus a lot of opinion about what is going on).
Folks mention the nativeness of PyTorch, but perhaps perceived simplicity matters even more.
Python is a simple language, and for many AI researchers pytorch was an easy way to not get bogged down in implementation details for rapid experimentation.
The graph seems to confirm what they are saying, in terms of questions asked on Stack Overflow. The number of PyTorch questions is growing, and the number of TensorFlow questions has plateaued and may even be shrinking, but it's a bit early to tell.
I wouldn't necessarily say that a growing/shrinking number of questions on Stackoverflow means a project is more/less popular than another; it can be down to a number of factors (more experienced developers, better documentation, better support channels via some other platform, and so on), so I don't think the conclusion is 100% on the spot. But at least the graph accurately describes what it's supposed to describe.
It's about time to move out of 'Python prison' for ML frameworks. It was a good start for academics but for scalable enterprise deployment, python is a limiting language. I would love to see more ML Eng focussed libraries for Java, Swift, C# etc
What will we have in 20 years?
[0] https://github.com/geohot/tinygrad, https://tinygrad.org