
And yet, if your article gets rejected when you write it yourself, and gets published when you have AI rewrite it for you to use all the right buzzwords based on previously published articles -- that's a good (financial) reason to use AI for writing.



And yet, when AI rewrites your paper for you, and you blindly send it to be reviewed, likely also by an AI, it may well have fabricated or misrepresented the actual research and congratulations you've published some bullshit.

And no, humans are not just as bad as AI at this. I don't know why that meme persists - if humans were as prone to hallucination and confabulation as AI is at scale, let alone more so, we would never have gotten past the Stone Age as a species. It would be useful to keep humans in the loop, but the inevitable exponential explosion of AI-generated research will simply be too much for peer review to handle (see the same thing happening with fiction writing, with several publications having to shut down because they were drowning in AI garbage).

But it's not as if it matters. As with everything else that incorporates AI, quality isn't relevant, only quantity and profitability.


> I don't know why that meme persists

That would be the large number of salient examples.

I expect humans to form a (possibly normal) distribution in this regard rather than being all of equal quality, and people at the high end would permit development even if the mean were worse than the best current AI (a rough numerical sketch below).

If, however, we were all clumped together, then the very same meme you're complaining about here would itself be an example of humans doing exactly what you're saying AI does.
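
A back-of-the-envelope sketch of that distributional point, with entirely made-up numbers ("quality" is an invented scalar and AI_LEVEL a hypothetical threshold, not a measurement of anything real):

    import random

    random.seed(0)
    AI_LEVEL = 1.0                    # hypothetical quality of the best current AI
    HUMAN_MEAN, HUMAN_SD = 0.0, 1.0   # humans assumed worse than that on average

    humans = [random.gauss(HUMAN_MEAN, HUMAN_SD) for _ in range(100_000)]
    above = sum(q > AI_LEVEL for q in humans)

    # Even with a worse mean, a sizeable tail clears the bar (~16% here),
    # which is the "high end permits development" argument.
    print(f"{above / len(humans):.1%} of sampled humans exceed AI_LEVEL")

The point is only that progress depends on the tail of the distribution, not on its mean.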


Except, since we're talking about scientific research, and not all humans perform scientific research, the only relevant group to consider would be humans who do scientific research.

I don't think it's common for researchers to randomly fabricate sources, citations and results. It happens, but it isn't mere coincidence when it doesn't happen. The "replicability crisis" would appear to be a counterexample, but it isn't. Actual attempts at research aren't the same category of error as randomly generated research.

When AI gets something right on the other hand it's entirely coincidental, because AI doesn't have any concept of "truth" or "falsehood," or the real world context in which it operates. AI just relates text tokens to one another, with some unavoidable randomness, trying to match something that looks like the desired result. To equate that process with the result of actual human beings doing science and publishing research seems like you're just reaching to justify the validity of replacing researchers with AI.


> Except, since we're talking about scientific research, and not all humans perform scientific research, the only relevant group to consider would be humans who do scientific research.

Sure, but I expect that to change the distribution, not much else.

> I don't think it's common for researchers to randomly fabricate sources, citations and results. It happens, but it isn't mere coincidence when it doesn't happen.

Hopefully not, but even in science, misunderstanding the cited work is a thing, as is disbelieving it. For the former, "perceptrons can't do XOR" became "AI is impossible", while the Black-Scholes equation was misapplied in ways that ultimately contributed to the global financial crisis. For the latter, the salient examples are older (e.g. the importance of washing hands between autopsies and midwifery), or things where I can't find the historical scientific consensus because of the modern cultural noise (e.g. thinking men and women have different numbers of ribs because of the book of Genesis and not checking).

In this regard, I think the replication crisis is valid, even though it (presumably) involved real research and real results, because widespread citations didn't fully account for the limits of that research.

To put it another way, while I agree with this:

> To equate that process with the result of actual human beings doing science and publishing research seems like you're just reaching to justify the validity of replacing researchers with AI.

With scientists, I'm referring more to the game of telephone that necessarily occurs between any given researcher and anyone using their work.

For the general population it's only that game, hence all the people who think scientists in the 70s were worried about an ice age, or the newspaper stories that collectively divide the world into things that cause or cure cancer.

If you are referring specifically to people who want an LLM to actually perform novel research, rather than merely write up lab notes in LaTeX, then you would be correct that this is a terrible idea. Even for pure mathematics you need a different type of AI, better suited to that task. Such systems do exist and have been used successfully, but IIRC they are not anything like as general with regard to maths and experiments as LLMs are with language.


That's true. If you don't use AI, others will. If that makes them profit, why would anyone not use it?

The whole system favors publication quantity over quality, results over process.

The solution is to ask our AI overlord.


Or at some point change the rewarding system to favor quality. Voila! But why is everybody assuming that AI-enhanced work will not be checked before submission? I'm no scientist, but my workflow starts with my draft and ends with me correcting/adapting whatever changes the AI made, whenever AI gets involved.


> Or at some point change the rewarding system to favor quality.

I am a scientist, and I don't think anyone knows how to do this. There is also very little skin in the game for it. Bureaucrats want dumb metrics they can point to as evidence of success, regardless of whether the metric is meaningful. There is strong pressure to publish quickly, which directly conflicts with pressure for quality. Many prominent Nobel laureates, Fields Medal winners, Turing Award winners, and so on have discussed how they themselves could not have thrived in the current environment and its insanity.

But who is going to change this? Honestly, the ones hurt the most are the grad students outside the top 10-20 universities. Everyone else has found "success" in the system (as in, learned how to play that game) and is highly incentivized to maintain that system and their status.

Research is a completely different game today than it was even 20 years ago, and we have not adjusted our system accordingly; worse, many want to pretend that it hasn't changed. An interesting simple example: research teams have exploded in size (especially at large universities), which makes for a lot of Nobel drama (at most 3 people can share the prize). There are many other simple and more nuanced points as well, but few want nuance. Either way, I'm absolutely certain that our current metrics do not strongly align with producing good science.


I understand then that there's a known and increasing tension in this domain. Instead of seeing it crash and burn, do you see any glimmer of hope? I mean we still need science and we still need research, so we need a solution to be able to go on with it, even if it's outside the current framework...


> do you see any glimmer of hope?

Oh, of course. Most academics don't do it for the money; it's not like they're paid much. (Though that can incentivize cheating in another way, since they do still want money.) But there will always be people who don't give a fuck and just want to do good research regardless of the metrics being used. I am insistent that to be a good scientist you must be somewhere on the "anti-authority" side, because your job relies on challenging concepts, especially well-known and widely agreed-upon ones. Generally, look for people who are passionate and will rant, but who, importantly, rant with nuance. Those are the people passionate about their research and not the metrics.

The problem is that we're throwing a lot of money down the drain, wasting a lot of time, and generating distrust of the system. Any "crash and burn" is never going to lead to extinction in any sense; it's just a question of whether, as a society, we want to do good science or merely noisy science (all science is noisy). But you can never do good science if you remove nuance. This is why I hate that ML (and CS in general) uses conferences as the main platform for publishing. It is ridiculous to think you can have a good system when it is highly competitive, zero-sum, and highly time-consuming, when there's no real discussion between authors and reviewers (you may get a one-page rebuttal, but your reviewers' comments are often disjoint and vague), and when you're being judged by the people you're competing with. It's just a silly notion to believe this is useful.

The solution is actually not hard. I refer to the larger phenomenon as Goodhart's Hell. The solution is to stop using metrics as targets. Metrics are guides. If you don't have a deep understanding of what your metric actually measures, how well it aligns with the thing you intend to measure (never 100%), and what the biases in your data are, you're fucking doomed to this bureaucratic hell. Noise is inherent to complex systems, and aggregation is the bane of evaluating complex feature spaces. Just remember that it's models all the way down and that all models are wrong (though some models are better than others).
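
As a toy illustration of the metrics-as-targets problem (all numbers invented, and nothing here models real review processes): if you select people by a noisy proxy rather than by the quality you actually care about, the selected group's true quality falls as the proxy gets noisier, because selection pressure goes into the gap between the metric and the thing it was supposed to measure.

    import random

    random.seed(0)

    def mean_true_quality_of_selected(proxy_noise_sd, n=100_000, keep=1_000):
        # True quality, and a proxy metric that only partially tracks it.
        quality = [random.gauss(0, 1) for _ in range(n)]
        proxy = [q + random.gauss(0, proxy_noise_sd) for q in quality]
        # "Metric as target": keep the top candidates ranked by the proxy.
        top = sorted(range(n), key=lambda i: proxy[i], reverse=True)[:keep]
        return sum(quality[i] for i in top) / keep

    for sd in (0.1, 1.0, 3.0):
        print(f"proxy noise sd={sd}: mean true quality of selected = "
              f"{mean_true_quality_of_selected(sd):.2f}")

The noisier (or more gameable) the metric, the less optimizing it buys you of the thing you cared about, which is the sense in which a metric should stay a guide rather than a target.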


>change the rewarding system to favor quality

I think that will happen when we swap peer review for peer replication.


Exactly. We have to have mechanisms to support replication, which is the only means of validating work - or rather, of adding evidence to the work's claims. We're too caught up in this naive notion of novelty, which today more strongly correlates with how well read one is in this massive ocean of papers.


> And no, humans are not just as bad as AI at this. I don't know why that meme persists - if humans were just as bad, much less worse, at hallucination and confabulation as AI at scale we would never have gotten past the stone age as a species.

Knowing this would require a counterfactual machine of some sort, thus the fact must be hallucinated into existence, no?


It becomes some robotic "AI-generated text" pushed into some "AI text summariser", which in practice just introduces noise into the communication.

As a scientist, you should be aiming at communicating ideas in a pragmatic and verifiable way, not embellishing stuff via AI.

A real facepalm moment for scientists with wrong incentives.


Fascinating how we (society) have let KPIs get into research/education and healthcare.


This is why I've started to argue against "peer review": no one can decide a paper's correctness/legitimacy by sitting down and reading the work. The method just doesn't scale, and it certainly doesn't in zero-sum settings like conferences (see the absolute shitshow that is ML peer review). It's become an entire waste of time and money.

The alternative I suggest is just submitting works to OpenReview so there can be open discussion of them. It removes the hostility from the setting and naturally biases reviewers/commenters towards people who are experts in that sub-niche, since they tend to be the only ones reading those papers, and especially reading them closely.

Metrics are fucking hard, and we don't improve systems by removing nuance from them. Yet as complexity has increased in our world, that seems to be the direction we've gone. It's bad for science, and the current methods have far too many false negatives (improper rejects) and false positives (improper accepts) to count as a meaningful signal. We've tried so many ways to fix it while maintaining the institution that it's time to just try letting go of it. I honestly cannot find a great argument for keeping these journals around (yes, there are arguments, but not many *strong* arguments, and specifically not ones that lead to better science).


Research inherently requires attention and attention is inherently scarce, so unfortunately people do and will compete for attention however they can, metrics or not.


> let KPIs get into research/education and healthcare

Research, sure, that’s a creative profession. But education and healthcare have obviously measurable outcomes, some of which are predictably desired ex ante.


The implicit here is "business/for profit" KPIs.



