Researchers cannot always differentiate AI-generated and original abstracts (nature.com)
145 points by headalgorithm on Jan 12, 2023 | 113 comments



> But the human reviewers didn't do much better: they correctly identified only 68% of the generated abstracts and 86% of the genuine abstracts. They incorrectly identified 32% of the generated abstracts as being real and 14% of the genuine abstracts as being generated.

Isn't the headline actually "Abstracts written by ChatGPT fool scientists 1/3 of the time"? Having never written one myself, wouldn't the abstract be the place where ChatGPT shines, being able to write unsubstantiated information confidently? I imagine getting into the meat of the paper would quickly reveal issues.


>they correctly identified only 68% of the generated abstracts and 86% of the genuine abstracts.

I think you and I are basically in alignment... what this tells me is that 14% of real abstracts are so bad that other human beings call their BS. Meanwhile, this AI stuff is kinda working 32% of the time in generating legitimately interesting ideas.

So at that point - yeah, that sounds about right. The 32% is still so low that it shows AI is not anywhere near maturity, whereas 14% of human-generated is crap.

And, yeah - a short blurb like an abstract seems to be exactly the kind of text that ChatGPT is conditioned to do well generating. As others below note - once a human starts reading the rest, the alarm bells trigger.


> ...this AI stuff is kinda working 32% of the time in generating legitimately interesting ideas.

The bar is not "this is legitimately interesting," it's "I've seen real abstracts worse than this."


It really would depend on the quality of the work/paper. I bet mis-identification by humans of hand written or generated abstracts goes down in higher tier conferences vs. lower tier ones, since the writing standards in lower tier conferences can be pretty bad (well, in CS at least).


I suspect you are right Sean - and I know you are a guy who has read a lot of papers and been to plenty of conferences.


(Buuuut let us also enjoy the fact that ML can generate a decent-looking abstract to a scientific paper 32% of the time. This represents massive progress; I would venture that most humans cannot write such an abstract convincingly.)


I would argue the opposite. Give anyone who can write reasonably the abstracts of a few dozen papers in a given field, and I am confident they would be able to produce a convincing bullshit abstract. Most abstracts in a given field sound extremely similar, and are mostly keyword dropping with an ounce of self-promotion. I actually think that the only reason ChatGPT did not produce a higher rate of convincing abstracts is that the researchers assessing them knew they had a high probability of looking at an AI result and were thus extra careful. Most of the abstracts correctly labelled as AI would probably be considered legit (which does not mean good) if sent to a conference without warning.


>Meanwhile, this AI stuff is kinda working 32% of the time in generating legitimately interesting ideas.

I don't know that interesting has anything to do with whether people found them plausible.


All true I think. The interesting thing is how many researchers read just the abstract of papers. I'd say it's quite common to read just the abstract and some figure captions, and only dig into the details on a few papers.


That's the main purpose of the abstract. There are several levels of reading a paper: checking the title, reading the abstract, reading intro/conclusions and skimming over the table, or reading the whole thing. You start at the first level and continue or stop depending on the interest of the paper for your work (and available time).

This works very well to more or less stay on top of things with today's frantic publication pace in many disciplines.


What should result is a more fortified peer review system.


> I imagine getting into the meat of the paper would quickly reveal issues

This is a tautology: the thing that can be validated can be validated.


I think it has an important difference for ChatGPT - it likes to generate numbers that make absolutely no sense. A human that lies will try to generate sufficiently correct data to convince. ChatGPT often won't even make an attempt to produce values that fit.


ChatGPT right now also likes to make up fake citations when you ask it how it knows something, which could be checked quickly.


...request an example [I've tried, to no avail].


I guess that's good for us meat beings. Better for an AI to incompetently lie than competently lie.

I wonder if having AI models available would make it significantly easier to identify material created by that model. Seems it would be easier, but it would still be a big problem. Nets can be trained to identify ai vs not ai on a given model. And that's without needing access to the model weights, just training examples. But when there are N potential models...

Edit: Second paragraph added later.
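
For what it's worth, a minimal sketch of what such a per-model detector could look like, assuming you already have labelled example texts from that one model (the placeholder abstracts and the TF-IDF + logistic regression choice below are my own illustration, not anything from the study):

    # Hypothetical sketch: train a binary classifier to flag text from one known model,
    # given example abstracts from that model and genuine human-written ones.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    human_abstracts = [  # placeholders standing in for genuine abstracts
        "We measure the effect of drug X on outcome Y in a randomized trial of 200 patients.",
        "This study characterizes protein folding kinetics under thermal stress.",
        "We report a field survey of soil microbiomes across three climate zones.",
        "A retrospective cohort analysis of treatment outcomes is presented.",
    ]
    generated_abstracts = [  # placeholders standing in for sampled model output
        "In this study, we explore novel implications of X for Y, paving the way for future work.",
        "Our findings elucidate putative mechanisms that may one day inform clinical practice.",
        "This paper presents a comprehensive framework with promising preliminary results.",
        "We discuss the important groundwork laid for further investigation of the phenomenon.",
    ]

    texts = human_abstracts + generated_abstracts
    labels = [0] * len(human_abstracts) + [1] * len(generated_abstracts)
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0)

    detector = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),   # word and bigram frequencies as features
        LogisticRegression(max_iter=1000),
    )
    detector.fit(X_train, y_train)
    print("held-out accuracy:", detector.score(X_test, y_test))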


I'm reminded of the somewhat recent news of a line of Alzheimer's research being based on a fabricated paper that was only caught many years later [0].

Previously, we've relied on a number of heuristics to determine if something is real or not, such as if an image has any signs of a poor photoshop job, or if a written work has proper grammar. These heuristics work somewhat, but a motivated adversary can still get through.

As the quality of fakes gets better, we'll need to develop better tools to deal with them. For science, this could, hopefully, result in better work on replicating previous studies.

I'm quite likely being overly optimistic, but there's a chance for positive outcomes here.

[0]: https://www.science.org/content/article/potential-fabricatio...


The requirement for detecting something fake is quite simple, and we've known it for a long time: publish all data, code and everything else needed to make the experiments reproducible.

Even if everything is fake, the code has value for further research.

It would be nice to have that as a minimum standard at this point; I would prefer far fewer publications that can be trusted more over the current situation.


That's not easy: a reproduction is a scientific project in its own right. Some research is straightforward to reproduce, but a lot of it isn't.

That's not to say scientists shouldn't publish their data; they should.


I believe OP is suggesting that this effort be borne by the original authors making the new claims, not by society to reproduce them.

In the limit, any claims one makes are rejected unless you also provide a way to reproduce the claim.

That's a lot of work, but perhaps the world doesn't need more monkeys and typewriters creating faux Shakespeare.


What work? The original authors should reproduce their own work?


Not really (I mean it helps, but it doesn't make it easy). It can still be very difficult to reproduce legitimate results, even with help from the original researcher. And faked results can last a long time even without reproduction because it's very easy to assume you're doing something wrong.


> Even if everything is fake, the code has value for further research.

I'd call that human-computer science partnership. If it checks out, it's not fake. Nonhuman scientists are still scientists.


Are pipettes or vials scientists? Computers are tools like hammers, and have equal agency. There are no nonhuman scientists.


If a developer codes up an AI to scour the web, write an article, and submit it to a scientific journal without letting the developer see the article, is the developer doing science?

If someone trains a translation model between languages they don't know, is that someone a translator?

I guess the users of said model would be "translators" as they would be doing the translation (without necessarily knowing the languages either).


unsure if anyone is "doing science". Doing science is applying the scientific method.

Making conjectures, deriving predictions from the hypotheses as logical consequences, and then carrying out experiments or empirical observations based on those predictions.

Not sure AI is up to that, and it's debatable if it'll ever be able to make and test conjectures. There is a difference between symbol manipulation (like outputting text) and actual conjecture.


> and it's debatable if it'll ever be able to make and test conjectures

https://en.wikipedia.org/wiki/Robot_Scientist

Adam is capable of:

- hypothesizing to explain observations


Thank you. All this airy fairy talk of Ai is fun, but at the end of the day it's an inert tool (or toy) without human interaction.


For now.


And for foreseeable future.


Paper authors: my algorithm is this fast! Here's some cool benchmarks charts! No you can't have any code to run the benchmarks yourself!


It's probably a little easier to fool people with AI generated scientific literature than a regular piece of literature. Most scientists are not good writers to begin with. English might not even be their first, or even second or third language. Even then, there are a lot of crutch words and phrases that scientists rely upon. "Novel finding" "elucidate" "putative" "could one day pave way for" "laying important groundwork" and all sorts of similar words and phrases are highly overused, especially in the abstract, intro, and discussion sections where you wax lyrical about hypothetical translational utility from your basic research finding. A lot of scientific writers could really use a thesaurus, and learn more ways to structure a sentence.


Your critique assumes that the goal of scientific writing is to be intelligible to lay people.

In truth, the entire weird and crufty vocabulary is simply a common set of placeholders that makes it easier to grasp research, because the in-group learns to understand them as such.


As a computer scientist (you can check my publication record through my profile) and an (aspiring) novelist, I disagree. A lot of papers are just poorly written, full stop.

It is also true that science literature contains a lot of jargon that encodes important information. But that doesn't excuse the fact that a lot of scientific writing could be improved substantially, even if the only audience were experts in the same field.


Yeah, a lot of scientific writing is just downright useless, and I don't just mean that in the "haha, it's hard to read, but it's ok"-sense. For example, in many fields (parts of theoretical physics, many parts of econ) publications are so hard to read that "reading" a paper looks less like "learning from the author by following what they did on paper" and more like "rederiving the same thing that the author claims to do, except by yourself with only some minor guidance from the paper." This is, frankly, absolutely insane, but it's the current state of things.


It's a fine line to walk when publishing. For example, is it ok to use the term "Hilbert space" in an article? Perhaps in physics, but not if publishing in biology - or at least in biology, a few sentences to describe the term may be more appropriate. But the use of the term is actually quite useful, as in this manufactured example the article may apply only to Hilbert spaces but not all vector spaces. So since the distinction may be important to the finding, the terminology is necessary.


Oh, no, I'm not dismissing terminology! (That's a whole separate topic, which is distinct from writing, though it can sometimes be a sign of bad writing.) I'm talking specifically about the actual writing. Iirc, there's a phrase that sums it up quite nicely: "most science is written like it hates you and wants you to stop reading immediately." Poor writing is so much the norm that people are shocked when you write a readable article. (You'd be surprised at the number of peer review comments I've received to the effect of, "the writing is surprisingly clear and direct," and I'm not like, a novelist, or essayist, or anything. English isn't even my first language! It's just not the norm to think about, and much less act on, these things.)


>"most science is written like it hates you and wants you to stop reading immediately."

As an educated lay person who dips into scientific papers occasionally I completely agree with this. And now I have a nice phrase I can remember next time I read a scientific paper and think "why does science hate me?".


I did a lot of paper doctoring when I worked in a foreign research lab, and I can agree a bit. But conference standards would push for good writing if the conference was good enough, and bad writing would be grounds for rejection (or at least heavy shepherding). So the easy conferences would get a lot of bad writing and the really competitive conferences would get much better writing (or the papers wouldn't make it past the crap filter in the reviewing round).

But first you need something good to write about, and a lot of papers fall short at that point. It doesn't matter how well it is written at that point (getting the "this is well written, but not interesting" rejection is painful but usually obvious).


I'm not saying this contributes to being more unintelligible. These are just filler words anyhow, not jargon. I agree that if anything, it makes it faster to read a paper since your brain just glosses over the same structures you've read 1000 times already and directs you to the meat. However, as someone who reads a lot of papers for my job, I just wish writers were more interesting——you will never see an em dash like I've used here, for example. Maybe scientists could benefit from reading more Hemingway in their downtime.


i recommend active voice over passive. i also tell students to strive for clarity.

"don't write like you have a pole up your ass."


My university had a style guide for scientific writing. I had to follow it when writing my engineering thesis.

It contained formatting rules: the text must be double-spaced in size 11 font, the table of contents must be laid out in this specific way, tables and figures have to be included and referenced in this particular fashion, and so on.

It also imposed rigid restrictions on language usage and writing style. It basically emphasized writing short sentences: no adverbs, avoid passive voice, write everything in first person, and so on.

I asked my housemate who studied creative writing to proofread my thesis. He commented on how sterile and cold the whole thing felt compared to the more florid prose he was used to.

So the way scientific papers read is very much by design.


Writing text that feels plausibly real is ChatGPT’s specialty.

Fake scientific papers that are written with the language, vocabulary, and styling of an academic paper have been a problem for a long time. The supplement and alternative medicine industries have been producing fake studies at high volumes for years now. ChatGPT will only make it accessible to a wider audience.


Isn't that the reason we have trusted scientific peer-review journals? I mean, why trust a paper that hasn't been vetted by a trusted source? The same is true in news media - I don't put any stock in news content that isn't published by a well-trusted source (and I do pay for subscriptions, e.g. AAAS, Financial Times, etc., for that very reason). I guess I don't understand the concern - the world has always been filled with junk information and we have tried-and-true systems in place already to deal with it.


Science in general, as many other things, is predicated on the idea that most people are not lying. AI-generated or not, if a large portion of any body of work ends up being composed of lies, there is a problem. So in the hypothetical future that people are using AI to lie en masse, yes, it is a problem. However, even without AI, if a large number of people are submitting lies as research, there would be a problem.

My point is that this is similar to the problem of art fraud or fake news or any other thing that involves faking media: the immoral act is the problem; the use of AI just makes it easier to do and harder to catch. Yes, it is a problem, but the heart of the problem lies in the immorality of the act, not in the technology per se. Perhaps the issue is not that "many people are immoral", but that the technology enables a larger proportion of immoral people to fool us.


Elite universities insider. People lie in elite science just about as much as in politics. A lot of data from schools like MIT and Stanford is intentionally baked to lead to politically-popular but incorrect results.

By "politically-popular," I don't mean primarily in a red-blue sense. If you're publishing a study for therapists to read, it should ideally use a clever methodology to re-affirm their viewpoints.


As a researcher, I would expect any researcher to be able to generate fake abstracts. However, I suspect that generating a whole paper that had any interest would be nigh on impossible for AI to do. An interesting paper would have to have novel claims that were plausible and supported by a web of interacting data.


That's not how we should use it - give it a few ideas, a bunch of notes and results, and ask the model to write the text. It will write better text than most, but we are responsible for prompting it with valid data.


Valid data alone won't guarantee a valid analysis or a useful interpretation.


Of course the author checks it and assumes full responsibility, it's just a writing aid, not a researcher yet.


> An interesting paper would have to have novel claims that were plausible and supported by a web of interacting data.

And if AI can manage that, well: https://xkcd.com/810/


Nice trick for ChatGPT, but this will not destroy science.

Nobody takes a serious decision reading only the abstract. Look at the tables, look at the graphs, look at the strange details. Look at the list of authors, institutions, ...

Has it been reproduced? Have the last few works of the same team been reproduced? And if it's possible, reproduce it locally. People claim that nobody reproduces other teams' work, but that's misleading. People reproduce other teams' work unofficially, or with some tweaks. An exact reproduction is difficult to publish, but if it has a few random tweaks ^W^W improvements, it's easier to get published.

The only time I think people read only the abstract is to accept talks for a conference. I've seen a few bad conference talks, and the problem is that sometimes the abstracts get posted online in bulk without further checks. So the conclusion is: don't trust online abstracts, always read the full paper.

EDIT: Look at the journal where it's published. [How could I have forgotten that!]


> Nobody takes a serious decision reading only the abstract.

I am not sure if this is sarcasm or not.

Literally the whole world besides the researchers reads mainly the abstract and makes even life-changing decisions based on it. (Just look at any Twitter discussion linking to a paper.)


It's not sarcasm, but it may be researcher bias https://xkcd.com/2501/

I don't consider Twitter discussions "serious decisions". When a thread here reaches 100 comments, the quality of the discussion usually drops. I don't want to imagine how bad it is on Twitter.

The problem is people overdosing on snake oil they read about on Twitter. I think there were a few cases with ivermectin, hydroxychloroquine and chlorine dioxide [1]. People should not self-medicate, or should at least understand proportions and that the dose makes the poison. They should see a medical doctor [2], who at least understands proportions, that the dose makes the poison, and that the advice from the FDA should be followed - advice made by people that have read a few of the research papers and extended reports, instead of only one press release about a preprint written by a moron that somehow got a position in a university or hospital.

I guess politicians don't read the full paper (except Merkel?), but they hire experts to read the articles and give advice. If you are a politician and the "expert" is reading only the abstract without checking the full paper and the journal and other related stuff, you should fire the moron.

[1] The chemistry of chlorine dioxide is so simple that it's obvious it can't possibly work, so it makes me angry. The other stuff doesn't work, and there were no good reasons to expect it to work, but at least it's not obvious from elementary chemistry that it can't work.

[2] I have horror stories about medical doctors too. Always ask for a second opinion for important stuff.


I'm quite confident that there are cliques within "science" where papers are admitted without so much as a glance at their body. Some people simply cannot be bothered to get past the paywalls, others accept on grounds outside the content of the paper, like local reputation or tenure. Others are asked to review without the needed expertise, qualification, or time to properly understand the content. Even the most honorable reviewers make mistakes and overlook critical details. Then there is the set of papers which are (rightfully so) largely about style, consistency, and honestly, fashion.

How can we yield results from an industry being led by automated derivatives of the past?

Is an AI-generated result any less valid than one created by a human with equally poor methods?

Will this issue bring new focus on the larger problems of the bloated academic research community?

Finally, how does this impact the primary functions of our academic institutions... teaching.


> How can we yield results from an industry being led by automated derivatives of the past?

Even a human researcher needs experiments to validate ideas. AI can generate plausible ideas, so why not run the experiments and let it learn from the outcomes? The source of learning comes from experimentation; that's how models escape the derivative trap. AlphaGo invented move 37, proving AI can be creative and smart.


Interesting questions. I don't think most of them have a definitive answer, so this is my opinion.

> I'm quite confident that there are cliques within "science" which are admitted without as much as a glance at the body of the papers.

It's possible. I don't know every single area, but I guess it's not common in most serious branches of science.

> Some people simply cannot be bothered to get past the paywalls, others accept on grounds outside the content of the paper, like local reputation or tenure.

My friends say they can ask Alexandra for a copy. A few years ago it was also common to ask a friend at another site who has a copy.

> Others are asked to review without the needed expertise, qualification, or time to properly understand the content. Even the most honorable reviewers make mistakes and overlook critical details. Then there are the set of papers which are (rightfully so) largely about style, consistency, and honestly, fashion.

That's why you RTFP instead of trusting the journal or the reviewer or just the abstract. I've seen "dubious" papers published in good journals.

> How can we yield results from an industry being led by automated derivatives of the past?

That's already running now on natural intelligence. Every time there is an interesting paper, other groups rush to publish variants or combinations with other results, to reach the annual quota or get enough points for the graduate student. Once a general AI can read all the papers and combine them, there will be very few low-hanging fruits left to pick.

> Is an AI-generated result any less valid than one created by a human with equally poor methods?

For now, AI is too stupid to get the details right. That's why it's important to read the full paper instead of only the abstract. Once AI is intelligent enough, the result may be as bad as a cheating human. Let's hope the AI has a good PRNG to create credible noise, because that's a part humans do badly.

> Will this issue bring new focus on the larger problems of the bloated academic research community?

Nah.

> Finally, how does this impact the primary functions of our academic institutions... teaching.

I don't understand the question. You fear that professors in some areas will just send fake papers written by ChatGPT 4.0 and the journal and the community will not notice? There are a lot of predatory journals, open-peer-review journals and other bad journals that are publishing a lot of crap, usually by professors at bad universities, or just as a fancy achievement for the CV. A good AI will increase the amount of crap, but it will just be ignored.


> I don't understand the question.

No, actually I'm curious if this could open up the schedule for professors who want to spend more time teaching to design and develop better curriculums for their students. But I'm probably being overly optimistic about the number of profs who actually want to teach.


There are some universities that don't require research, so if you want to only teach you can go to one of them. It's a solved problem. Anyway, the best universities require research; opting out of research means less money, less prestige, and not getting the top students.

Also, accumulating published papers is important to get a new position in the future, so only teaching is a risk for your future career.

For teaching some of the topics to advanced students, it's important to have people who do research and are up to date with cutting-edge results. Also to babysit the graduate students so they can publish their results. For teaching students in the first years, research is probably overrated.


Good points.

It's just been a while since I was inspired by anyone's research, I guess.


There are many interesting results that fly under the radar. If you don't want to count big things like LIGO, my favorite to talk about is magnetoresistance, IIRC the giant one: https://en.wikipedia.org/wiki/Giant_magnetoresistance [It's not my specialization, so I may have a few details wrong.]

You can explain it to a technical friend.

Start by explaining the two possible spins of electrons, and how they cause the existence of two currents inside a conductor. This is not important in a normal conductor like copper, but it is important inside a magnetic conductor like iron. So you get a different resistance for each of the two currents: the one whose spin is in the same direction as the magnetic field, and the other with the opposite spin.

You can make a sandwich of iron-copper-iron. If there is no external magnetic field, the two iron parts have opposite magnetization and the total resistance is higher. If there is an external magnetic field, the two iron parts have the same magnetization and the total resistance is lower. Anyway, the difference is not very big.

[Ideally, your friend should be bored by now, uninterested in the abstract currents no one cares about.]

It gets more interesting if you have many layers of iron and copper, because the difference is higher and they call it "giant". It is used in the read heads of hard disks, like in your friend's laptop. [Your friend will never see it coming!]

It's interesting because it mixes weird abstract quantum properties with engineering to make it more efficient, and you get a device that everyone has. For some weird reason, no one talks about it. And the sad part is that SSDs are killing the punch line of the story :( .


The real issue for me is that the bot might generate incorrect text, imposing a yet-higher burden on readers who already find it difficult to keep up with the literature. It is hard enough, working sentence by sentence through a paper (or even an abstract) wondering whether the authors made a mistake in the work, had difficulty explaining that work clearly, or wasted my time by "pumping up" their work to get it published.

The day is already too short, with an expansion of journals. But, there's a sort of silver lining: many folks restrict their reading to authors that they know, and whose work (and writing) they trust. Institutions come into play also, for I assume any professor caught using a bot to write text will be denied tenure or, if they have tenure, denied further research funding. Rules regarding plagiarism require only the addition of a phrase or two to cover bot-generated text, and plagiarism is the big sin in academia.

Speaking of sins, another natural consequence of bot-generated text is that students will be assessed more on examinations, and less on assignments. And those exams will be either hand-written or done in controlled environments, with invigilators watching like hawks, as they do at conventional examinations. We may return to the "old days", when grades reflected an assessment of how well students can perform, working alone, without resources and under pressure. Many will view this as a step backward, but those departments that have started to see bot-generated assignments have very little choice, because the university that gives an A+ to every student will lose its reputation and funding very quickly.


I find it almost deliciously ironic that we research and development engineers in the field of computer science have expertly uncovered and deployed exactly the tools needed to flood our own systems and overwhelm our ability to continue doing the processes we depended on to create this situation in the first place.

It's like we've reached a fixed point, a global minimum for academic ability as a system. You could almost argue it's inevitable. Any system that looks to find abstractions in everything and generalize at all costs will ultimately learn to automate itself into obscurity.

Perhaps all that's left now is to critique everything and cry ourselves to sleep at night? I jest!

But it does seem immensely tiresome and deters "real science".


> if scientists can’t determine whether research is true, there could be “dire consequences”

Yeah well we can't tell that now either. Maybe we can finally start publishing raw data alongside these "trust us we found something" papers that people evaluate based on the reputation of the journal and the authors.

As someone else pointed out, that system has already derailed decades of Alzheimer's research. It's stupid and broken and it should have changed a long time ago.

https://www.science.org/content/article/potential-fabricatio...


Well, given that all paper abstracts have to follow the same structure with the same keywords and be conservative to get a chance to get published, it makes sense that ChatGPT shines there.

IMHO, it says more about the manic habits of journal editors than anything else.


That's a feature, not a bug. It means that when you have 100 papers to check for applicability to something that you are researching you can do so in a minimum of time.


Not really IME; you have to go through layers of bullshit aimed at making the paper seem more important than it is, not hurting the feelings of Prof. Curmudgeon who could be a reviewer, fitting the grant that funded the paper, hiding the weak points of the study, adhering to the current Scientific Serious Professional Way Of Writing™, not disturbing the flawed but socially accepted consensus in this particular field... and so on, all of which ends up burying what should actually have been in the abstract.


From the original paper (linked in the article):

> ... When given a mixture of original and general abstracts, blinded human reviewers correctly identified 68% of generated abstracts as being generated by ChatGPT, but incorrectly identified 14% of original abstracts as being generated. Reviewers indicated that it was surprisingly difficult to differentiate between the two, but that the generated abstracts were vaguer and had a formulaic feel to the writing.

That last part is interesting because "vague" and "formulaic" would be words I'd use to describe ChatGPT's writing style now. This is a big leap forward from the outright gibberish of just a couple of years ago. But using the "smart BSer" heuristic will probably get a lot harder in no time.

Also, it's worth noting that just four human reviewers were used in the study (and are listed as authors). That's a very small sample size to draw conclusions from. The article doesn't mention level of expertise of these reviewers, but I suspect that could also play a role.

Some are focusing on the paper-mill angle. But I think the more interesting angle is ideation.

If researchers can't reliably tell the difference between machine and human generated abstracts, what kinds of novel experiments or even research programs could ChatGPT suggest that might never have been considered?


Some probably but I feel you could also ask your 5yo kid and have the same efficacy. The work is not in generating ideas, it's in the verification.


We're a trust-based species.

Even if you're skeptical or cynical, you'll still fall for well-written nonsense if it remotely feels authoritative, sane, reasonable. Especially when not a deep expert on the topic.

The above effect is even stronger if you have prior trust in the author.

I guess the reason is that extreme distrust is an uphill battle. It costs a huge amount of time and cognitive load to discover objective truth, often for little reward.


I know it was just titles, but on "arxiv vs snarxiv" I was having a good day if I did better than random chance. And that was just a Markov text generator, no fancier AI needed.
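
(For the curious, a toy word-level Markov generator is only a few lines of Python; the seed titles below are placeholders I made up, not snarXiv's actual corpus or code:)

    # Toy sketch: a word-level Markov chain over a handful of made-up paper titles.
    import random
    from collections import defaultdict

    titles = [  # placeholder corpus, invented for illustration
        "Holographic Duality in Twisted Supergravity Backgrounds",
        "Anomalies and Instantons in Noncommutative Gauge Theory",
        "Entanglement Entropy of Conformal Defects in de Sitter Space",
        "Twisted Instantons and Entanglement in Gauge Backgrounds",
    ]

    chain = defaultdict(list)
    for title in titles:
        words = ["<START>"] + title.split() + ["<END>"]
        for a, b in zip(words, words[1:]):
            chain[a].append(b)  # record which word follows which

    def fake_title():
        word, out = "<START>", []
        while True:
            word = random.choice(chain[word])
            if word == "<END>":
                return " ".join(out)
            out.append(word)

    print(fake_title())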


For a scicomm publication I wrote the abstract of my explainer article leveraging ChatGPT:

https://www.theseedsofscience.org/2022-general-antiviral-pro...

(after I had written the rest of the article, and long after writing the academic paper underlying it).

Although, the published abstract reads nothing like the abstracts that ChatGPT generated for me because of the subtle but important factual inaccuracies it generated. But I found it helpful to get around my curse-of-knowledge in producing a flowing structure.

My edited, manually fact-checked result flowed less fluidly but was accurate to the article body’s content. Still overall glad I did it that way. I would have otherwise fretted over format/structure for a lot longer.


If a software system can generate abstracts, good. Nobody got into research for love of abstract-writing.

It is a tool. Ultimately researchers are responsible for their use of a tool, so they should check the abstract and make sure it is good, but there’s no reason it should be seen as a bad thing.


Surely the abstract is the easiest thing to generate? It is a short summary, and often devoid of anything verifiable.

Remember that it wasn't long ago that Facebook had to pull its Galactica AI in a few days after people got it to output all kinds of nonsense: https://mobile.twitter.com/mrgreene1977/status/1593274906707...


There is only so much peer review can actually accomplish. Mostly a reviewer can tell if the work was performed with a certain amount of rigor and the results are supported by the techniques used to test the claimed results. It doesn't guarantee there were no mistakes made. Having others reproduce the results is the only true way to verify an experiment. Unfortunately you don't get tenure for reproducing other people's work.


Abstracts can just be keyword soups. Then the AI just has to make sure that the keywords make some vague sense when put next to each other. Or if not they can mix in existing keywords with brand new ones.

Abstracts don’t have to justify or prove what they state.


Why are automatically generated abstracts bad? That seems a useful tool. It would be a problem if the abstracts are factually wrong or misleading.

They'd probably be better than what comes out of university PR departments.


Considering how much intentionally fake garbage got published, this doesn't surprise me at all... and this is not just random scientists, but scientists who should (at least theoretically) know enough to be able to notice it's gibberish.

https://en.wikipedia.org/wiki/Sokal_affair

https://en.wikipedia.org/wiki/List_of_scholarly_publishing_s...


I don’t understand. Doesn’t the author list give that away ;-) ?

(https://pubmed.ncbi.nlm.nih.gov/36549229/)


Isn't that how abstracts should be? Setting aside incidental characteristics - the different formulas used to get there, human or author involvement, creativity - an abstract in its pure form is the scientific form of some work, like an equation catching the essence without flaws or distractions. And that's what computers are for: to do that processing, so maybe humans don't have to?

But I'm lost as to what those scientists are trying to find (?)


believable - if they use provable, then it could be useful (like a science) - abstract selected, timelines, find missing, extrapolate..


TBH, a paper's abstract is supposed to summarize the purpose and findings in the paper, so auto-generation of what is otherwise "repeating what the rest of the paper says" should be considered a win; it's automating boring work.

If ChatGPT can't do that (i.e. if it's attaching abstracts disjoint from the paper body), it's not the right tool for the job. A tool for that job would be valuable.


At least one nice side-effect of this could be that only reproducible research with code provided will matter in the future (this should already be the case but for some reason isn't yet). What's the point of trusting a paper without code if ChatGPT can produce 10 such papers with fake results in less than a second


ChatGPT can produce code too. Therefore I think this may call for something more extreme — at risk of demonstrating my own naïveté about modern science, perhaps only allowing publication after replication, rather than after peer-review?


Ideally yes: for a paper to be accepted it should be reproduced. If ChatGPT is ever able to produce code that runs and produces SOTA results, then I guess we won't need researchers anymore.

There is however a problem when the contents of the papers cost thousands/millions of dollars to reproduce (think GPT-3, DALL-E, and most of the papers coming from Google, OpenAI, Meta, Microsoft). More than replication, it would require fully open science where all the experiments and results of a paper are publicly available, but I doubt tech companies will agree to that.

Ultimately it could also end up with researchers only trusting papers coming from known labs/people/companies


Reproduction of experiments generally comes after publication, not before acceptance. Reviewers of a paper would review the analysis of the data, and whether the conclusions are reasonable given the data, but no one would expect a reviewer to replicate a chemical experiment, or the biopsy of some mice, or re-do a sociological survey or repeat observation of some astronomy phenomenon, or any other experimental setup.

Reviewers work from an assumption that the data is valid, and reproduction (or failed reproduction) of a paper happens as part of the scientific discourse after the paper is accepted and published.


I'm thinking of the LHC or the JWST: billions of dollars for an essentially unique instrument, though each produces far more than one paper.

Code from ChatGPT could very well end up processing data from each of them — I wouldn't be surprised if it already has, albeit in the form of a researcher playing around with the AI to see if it was any use.


Not all science results in 'code'.


Indeed, and other sciences seem even harder to reproduce/verify (e.g. how can mathematicians efficiently verify results if ChatGPT can produce thousands of wrong proofs?)


Mathematicians have it easier than most, there are already ways to automate testing in their domain.

Kinda needed to be, given the rise of computer-generated proofs starting with the 4-colour theorem in 1976.


> there are already ways to automate testing in their domain.

Do you mean proof assistants like Lean? From my limited knowledge of fundamental math research, I thought most math publications these days only provide a paper with statements and proofs, but not in a standardized format.
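
(For illustration only, here is roughly what a machine-checkable statement looks like in Lean 4; the example and the use of the standard-library lemma Nat.add_comm are my own, not from any paper:)

    -- A trivial machine-checked theorem: addition on naturals is commutative.
    -- The kernel verifies the proof term, so a plausible-looking but wrong proof
    -- simply fails to type-check.
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b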


I can't give many specifics; my knowledge comes from YouTube mathematicians like 3blue1brown and Matt Parker talking about things like this.


Only a tiny fraction of existing maths can be done with proof assistants currently, and as a result very very few papers use them. In most current research automated testing would be impossible or orders of magnitude more work; in many areas mathematicians are working with things centuries ahead of where proof assistants are up to, and working at a much higher level of abstraction. Also, many maths papers have important content that is not proofs (and many applied maths papers contain no proofs at all).


100% there is a group right now making an AI generated paper and trying to publish it for the next iteration of the Sokal affair.

https://en.wikipedia.org/wiki/Sokal_affair


It's weird to me that scientists make so much hay of the Sokal affair given how unscientific it is.

It's a single data point. Did anyone ever claim the editorial process of Social Text caught 100% of bunk? If not, how do we determine what percent it catches based on one slipped-through paper?

I'd expect scientists to demand both more reproducibility and more data to draw conclusions from one anecdote.


How easy would it be for researchers to differentiate deliberately fabricated abstracts written by humans from abstracts of peer-reviewed scientific papers from respected publications? I think the answer to that question might give more context to this result.


Probably impossible. As a reviewer, the abstract won't tell me if the paper is bullshit or faked. An abstract can tell me that there are substantial language issues, or that the authors are totally unskilled in the field, or that the topic is not interesting to me, or that their claims lack ambition, but beyond that crude filter, all the data for separating poor papers from awesome ones, and true claims from unfounded ones, can only be in the paper itself; an abstract won't contain it.


What I see as wrong here is an AI witch-hunt. AI is a tool. It would be the same as banning the use of cars because horses exist. Obviously the disruption is happening, which is always a good thing as it should lead to progress.


On the other hand, all kinds of technology have been regulated to minimize adverse effects. The trouble with software is that it is evolving faster than regulators can keep track of, and it is very hard to police even if regulated.


Wow, if this situation finally creates an incentive for replicating results of papers rather than publishing some marginal new result, ChatGPT is going to be a huge win for the practice of science (vs the politics of science).


Would it be possible to use GPT model infrastructure to compute the likelihood that a given text was created by that GPT model?
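
In rough terms, yes: with an open model you can score how "surprising" a text is to that model (its perplexity) and treat unusually low scores as weak evidence of machine generation. A minimal sketch, assuming GPT-2 via Hugging Face transformers as a stand-in for a model whose weights you actually have (this is a heuristic, not a reliable test):

    # Sketch under assumptions: score a text's perplexity with an open model.
    # Machine-generated text tends to look less "surprising" to a similar model,
    # so unusually low perplexity is weak evidence of generation.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def perplexity(text: str) -> float:
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean token-level cross-entropy
        return float(torch.exp(loss))

    sample = "We elucidate a novel mechanism that may one day pave the way for clinical translation."
    print(round(perplexity(sample), 1))  # lower = more 'expected' by the scoring model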


I don't see the problem. A lot of tech writing will probably be done by AI soon. It's about the content of the paper.


Yes. And if I can use ChatGPT to write an abstract for me from my paper, let's go!


> And if I can use ChatGPT to write an abstract for me from my paper, let's go!

Is ChatGPT located in a central repository or cloud? Is it centralized? If so, probably a bad idea.

A private company having access to your abstract before you publish it could easily lead to problems like plagiarism (even worse, automated plagiarism) or give an unfair advantage to one of two teams racing to publish the same result. Science has a lot of cases like this.


I bet we can make an AI that can differentiate them better ...


That would just lead to an AI that makes better abstracts, à la GAN.


of course, what I mean is that it's now an AI vs AI battle


Getting through peer review is the ultimate Turing test.


Not really. No one writes a paper with the aim of convincing reviewers that they are human, and the reviewers aren't trying to determine whether the authors are human.


One of the citations is AI generated content itself lol


I hope the abstract for this paper is AI-generated.


We could use this to test the peer-review system.


I think part of the problem comes to the sheer amount of jargon in even the simplest research paper. During my time in graduate school (CS) I would often do work that used papers in mathematics (differential geometry) for some of the stuff I was researching. Even having been fairly well versed in the jargon of both fields I was often left dumbfounded reading a paper.

This would seem to me a situation that is easily exploited by an AI that generates plausible text. If you pack enough jargon into your paper you will probably make it past several layers of review until someone actually sits down and checks the math/consistency, which will be, of course, off in a way that is easily detected.

It's a problem academia has in general. Especially STEM fields have gotten so specialized that you practically need a second PhD in paper reading to even begin to understand the cutting edge. Maybe forcing text to be written so that early undergrads can understand it (without simplifying it to the point of losing meaning) would prevent this, as an AI would likely be unable to pull off such a feat without real context and understanding of the problem. Almost like an adversarial Feynman method.



