I just want to push back on “Academic Machine Learning is a (low quality, no novelty) paper mill” and on the devaluing of researchers’ efforts.
To be clear, ML research has “paper mill” problems, but we should be careful not to imply that there are only “rare successes”.
There are many, many amazing results published at ICLR, NeurIPS, and ICML every year that are important developments - not only research successes but also commercial and open-source success stories. For example, LoRA and DPO are two recent incredible developments, and these are not “rare” - this year’s ICLR had many promising results that will in turn be built on to produce the next “transformer”-level development. Without this work there are no transformers.
Even transformers themselves were a contribution whose impact only became valuable through the work of many researchers improving the architecture and finding applications (for example, LLMs were not a given use case of transformers until additional researchers put in the work to develop them).
I would not say it’s impossible… my lab is working on this (https://arxiv.org/abs/2405.14577), and though it’s far from mature, in theory some kind of resistance to downstream training isn’t impossible. Under classical statistical learning theory you would predict it’s impossible given unlimited training data and an unlimited budget for searching over models, but we don’t have those same guarantees with deep neural networks.
I don’t understand why very large neural networks can’t model causality in principle.
I also don’t understand the argument that even if NNs can model causality in principle, they are unlikely to do so in practice (things I’ve heard: spurious correlations are easier to learn, the learning space is too large to expect causality to be learned from data, etc.).
I also don’t understand why people aren’t convinced that LLMs can demonstrate causal understanding in settings where they have been used for control, like decision transformers… what else is expected here?
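For what it’s worth, the decision-transformer setup is easy to sketch: you condition the sequence model on a desired return-to-go and ask it to produce the action that should bring that return about, which is closer to an intervention than to passive prediction. Here is a minimal toy version (PyTorch assumed; layer sizes and names are mine, not from any particular paper’s code):

```python
import torch
import torch.nn as nn

# Toy decision-transformer sketch: condition on a desired return-to-go
# (RTG) and predict the action expected to achieve it. All hyperparameters
# and names here are illustrative assumptions.
class MinimalDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=64, n_head=2, n_layer=2, max_len=20):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)        # return-to-go token
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.pos_emb = nn.Embedding(3 * max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layer)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T, _ = states.shape
        # Interleave tokens as (rtg_1, s_1, a_1, rtg_2, s_2, a_2, ...).
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
            dim=2,
        ).reshape(B, 3 * T, -1)
        tokens = tokens + self.pos_emb(torch.arange(3 * T))
        # Causal mask: each token attends only to earlier tokens.
        mask = torch.triu(torch.full((3 * T, 3 * T), float("-inf")), diagonal=1)
        h = self.backbone(tokens, mask=mask)
        # Read actions off the state positions (every third token, offset 1),
        # i.e. predict a_t from (rtg_t, s_t) and the past.
        return self.predict_action(h[:, 1::3])

model = MinimalDecisionTransformer(state_dim=4, act_dim=2)
out = model(torch.zeros(1, 5, 1), torch.zeros(1, 5, 4), torch.zeros(1, 5, 2))
print(out.shape)  # torch.Size([1, 5, 2])
```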
He argued that, because machine learning is just based on correlational statistics, it would never be able to produce reasoning about causation.
Which is, at least in retrospect (GPT turned out to be able to do causal reasoning), a fallacy: It's like assuming humans can't think about gold because they do not themselves consist of gold. Or: That humans can't manually evaluate a computer program, because they are not themselves computers.
>That humans can't manually evaluate a computer program, because they are not themselves computers.
Well, yes, that’s why they were designed in the first place: to carry out, at scale, the repetitive, dull tasks that aggregate to an amount exceeding human ability and patience.
No, our submarines don’t have what it takes to swim, but really there is no drama here: they are still amazingly useful pieces of engineering.
I think one of the major difficulties is dealing with unobserved confounders. The world is complex, and it is unlikely that all relevant variables are observed and available.
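As a toy illustration of the point (a sketch with made-up numbers, NumPy only): an unobserved confounder Z can make X and Y strongly correlated even though intervening on X does nothing to Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Unobserved confounder Z drives both X and Y; X has no causal effect on Y.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

# Observationally, X and Y look strongly related (corr ~ 0.8).
print("corr(X, Y) =", np.corrcoef(x, y)[0, 1])

# Intervening on X (setting it by fiat, breaking its dependence on Z)
# leaves Y untouched -- the association was entirely through Z.
x_do = rng.normal(size=n)               # do(X = noise)
y_after = z + 0.5 * rng.normal(size=n)  # Y's mechanism is unchanged
print("corr(do-X, Y) =", np.corrcoef(x_do, y_after)[0, 1])  # ~ 0
```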
My research involves applying Popper's epistemology to natural language processing.
So I am quite involved in this.
As far as I can tell, almost all of what Popper tried to do with quantified measures of information is exactly what you are talking about.
In particular, Conjectures and Refutations covers this really extensively, so I'd recommend reading or re-reading that, though The Logic of Scientific Discovery covers an early form. David Miller's Critical Rationalism covers it well too, along with some of its problems.
I.e.:
His notion (shared with positivists like Carnap and others) is that science is a set of logical statements. A collection of statements is a theory; a theory entails a set of predictions, which is called the information content of the theory (sometimes I(c) or C(I) in his notation).
If I(c) > I(c'), where c' is a competing theory, then the theory is said to have more explanatory power, i.e. it makes more predictions.
This is part of his definition of what makes a good explanation and what David Deutsch calls "hard to vary".
The other main part of the definition is about whether these statements reflect Truth in any way… that is covered by his notion of verisimilitude, or truthlikeness, which is quantified as the degree to which the information content of a theory I(c) can be corroborated.
Both of these are essentially "the predictive strength of someone's Truth".
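In symbols, here is a rough sketch of those two notions (a set-based gloss of the above, not Popper's exact notation; take T to be the set of all true statements):

```latex
% Information content: everything the theory c entails.
I(c) = \{\, p : c \vdash p \,\}

% More explanatory power = strictly more content than a rival c'.
I(c) \supset I(c')

% Split content into truth content and falsity content:
I_T(c) = I(c) \cap T, \qquad I_F(c) = I(c) \setminus T

% Popper's comparative verisimilitude: c is closer to the truth than c' iff
I_T(c) \supseteq I_T(c') \quad \text{and} \quad I_F(c) \subseteq I_F(c')
% with at least one inclusion strict. (Miller and Tichý later showed this
% definition collapses for false theories -- part of the "problems" that
% Miller's book discusses.)
```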
The problem you and many others have probably encountered is that the information content of an explanation is *intractable*: it's an open set of statements that can never be fully fleshed out. So we can never have a perfect quantification of whether my theories or your theories are better… there may indeed be statements entailed by flat-earth theory that have yet to be discovered and could be more corroborated, providing better information content than a non-flat-earth theory! Popper revels in this fact and fully embraces it.
Beyond Popper, though, we need to understand more of the dynamics of "predictive strength" - I am finding causality a great source of literature here, for which I would recommend Judea Pearl and The Book of Why, among other things.
For philosophy of science in particular, there are tons of great articles on the Stanford Encyclopedia of Philosophy about explanation that go into this in depth - in fact, the positivists like Carnap wrote amazing things about this, which I would recommend.
We don't really have a friendly way for you to produce a heatmap readily from that, so feel free to DM me or email us if you want something like that.
For "who is peer reviewing what projects and project categories", that is super interesting… with open peer review these days, that is probably doable.
The reason I ask is that when people reference a study, I look at who is funding it and who those people are affiliated with. Looking at what other studies they have peer reviewed can be useful to see if the study results have external pressures that may influence outcome. HN folks often provide references but references alone are not sufficient in my opinion. Being able to see the reverse heat map of what projects someone has reviewed can paint a better picture of their area of expertise and who may be influencing them.
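A rough sketch of how such a reviewer-by-category view could be assembled from open review data, assuming a simple (reviewer, category) export - the CSV columns and file name here are hypothetical, not a real scite export format:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export of open peer-review records: one row per review,
# with the reviewer's name and the reviewed project's category.
df = pd.read_csv("reviews.csv")  # assumed columns: reviewer, category

# Reviewer-by-category counts: the "reverse heatmap" of who reviews what.
counts = df.pivot_table(index="reviewer", columns="category",
                        aggfunc="size", fill_value=0)

fig, ax = plt.subplots(figsize=(8, 6))
im = ax.imshow(counts.values, cmap="viridis")
ax.set_xticks(range(len(counts.columns)), counts.columns,
              rotation=45, ha="right")
ax.set_yticks(range(len(counts.index)), counts.index)
fig.colorbar(im, label="reviews")
plt.tight_layout()
plt.show()
```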
We, at scite, actually designed scite to help folks see beyond the "one big claim per paper" model. Specifically, we designed scite to show how specific results were either supported or contrasted by other papers, rather than showing that the publication itself was disputed or reproduced as a whole. If there is any feedback you could give on how we could present things so that folks don't get that impression, we'd love it - feel free to DM me or let us know at hi@scite.ai as you test it out.
In an ideal world we would present the claims, results, and argumentation as structured information to the user but we are not quite at that stage yet!
I am one of the folks working on Valence; I'd love feedback on what we are doing.
Valence attempts to solve performance management by giving folks a declarative way of setting performance objectives and having a controller go and meet those objectives. Though the Kubernetes autoscalers allow something like this, we wanted to build something higher level, so that folks wouldn't have to worry about the details of analyzing their applications, instrumenting them, and coming up with the optimal autoscaler thresholds and configs.
Under the hood, Valence is a simple feedforward control system that learns the dynamics of an application: it forecasts its load and applies the optimal controls in order to continually meet those objectives. We have found it pretty effective at reducing cost and performance violations, so now we are launching it to a wider audience to get feedback and start working on extending Valence's effectiveness to other operational environments and other types of performance objectives.
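In pseudocode terms, the loop is roughly the following - a toy sketch of the feedforward pattern described above, with made-up names and a deliberately naive forecaster, not Valence's actual code:

```python
import math
import time
from collections import deque

# Toy feedforward autoscaling loop: forecast load, then size the
# deployment so the forecast stays within the performance objective.
# All names and numbers here are illustrative assumptions.

REQS_PER_REPLICA = 200      # learned capacity: requests/s one replica
                            # can serve while meeting the latency objective
history = deque(maxlen=10)  # recent load observations (requests/s)

def forecast_load(history):
    """Naive linear extrapolation; a real system would learn the dynamics."""
    if len(history) < 2:
        return history[-1] if history else 0.0
    trend = history[-1] - history[-2]
    return max(0.0, history[-1] + trend)

def control_loop(observe_load, set_replicas, interval_s=30):
    while True:
        history.append(observe_load())
        predicted = forecast_load(history)
        # Feedforward control: act on the forecast before the load arrives,
        # instead of reacting after a threshold is already breached.
        replicas = max(1, math.ceil(predicted / REQS_PER_REPLICA))
        set_replicas(replicas)
        time.sleep(interval_s)
```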