LLMs are more likely to create novel content (i.e. "hallucinate") than to copy training data verbatim, so, in a word, "No". In fact, that unpredictability and hallucination behavior is what gets called creativity in artistic domains. The whole Stochastic Parrots paper and concept originated with Luddites who don't want this technology to advance to the point where it is generally useful and applicable. And while the original Luddites (the people who smashed the first robots: Jacquard looms, programmable machines that did a job only humans had previously been able to perform) had a justifiable concern about the negative impacts (the lost employment took two generations to recover), it is safe to say that the Industrial Revolution has had more positive effects in total (e.g. life expectancy, per capita wealth).
Isn't Open Source just a giant plagiarizing machine with more congenial licensing conventions?
AI is as good a time as any to reassess how we feel about intellectual property, but I don't think we can get any more restrictive without suffocating innovation. Plagiarism claims are how patent trolls shut down otherwise innovative products, and they're the basis for jacking up non-generic drug prices and for forcing developers to pay licensing fees just to use an OS. If anything, our current conception of plagiarism is too unclear and fragile, bound to be destroyed by whatever the hell AI is, if it even matters in the first place.
There's a difference between open-source and proprietary works. For example, I use "PouchDB" in some of my apps; its code is open source, but what I make with it is not open source.
It's one thing to make something similar, and there are often more ways than one to do that, but how would one know if AI created something unique or just copied some human's copyrighted work and presented it in a response?
TBH, I have not dug into how it works, but I did ask it to show me how to make something with PouchDB.js, and the answer looked like pretty much a copy and paste from their website. That is not really an issue in itself, but it did not attribute that code to PouchDB.com. And, to be fair, I did not ask where it got it. But it seems to me that if it got the code off the PouchDB.com web site, it should tell us and provide a link to the original source.
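For what it's worth, what it gave back was basically the canonical getting-started example. Roughly something like the sketch below (reconstructed from memory, so the database and document names are illustrative, not necessarily what either the docs or the model actually printed):

```typescript
// A minimal PouchDB "hello world", the kind of snippet the docs walk through.
// Assumes the real 'pouchdb' npm package; 'kittens'/'mittens' are just
// placeholder names for this sketch.
import PouchDB from 'pouchdb';

async function demo(): Promise<void> {
  const db = new PouchDB('kittens');                       // open/create a local database
  await db.put({ _id: 'mittens', occupation: 'kitten' });  // store a document by _id
  const doc = await db.get('mittens');                     // read it back
  console.log(doc);
}

demo().catch(console.error);
```

If a model reproduces something that close to the upstream docs, a link back to the source seems like the least it could do.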
> how would one know if AI created something unique or just copied some human's copyrighted work and presented it in a response?
You can't. Similarly, when a human writes original code, we can't be certain that they're not repeating stuff they've seen before either. We don't think of things in terms of licenses; no human remembers the fast inverse square root function or cocktail sort for the license it had. At no point can I be certain that I'm not unconsciously plagiarizing proprietary code from a previous job. Sometimes it's essential (I have to plagiarize "set -euo pipefail").
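Case in point: I could bang out something very close to the famous fast inverse square root from memory, ported here to TypeScript (details approximate; the magic constant is the part everyone actually remembers), and I couldn't tell you off-hand what license the original shipped under:

```typescript
// Approximate TypeScript port of the well-known fast inverse square root trick.
// Reinterprets the float's bits as an integer via a shared buffer, applies the
// famous magic-constant guess, then refines with one Newton-Raphson step.
function fastInvSqrt(x: number): number {
  const buf = new ArrayBuffer(4);
  const f32 = new Float32Array(buf);
  const u32 = new Uint32Array(buf);

  f32[0] = x;
  u32[0] = 0x5f3759df - (u32[0] >>> 1); // initial guess from the bit pattern
  let y = f32[0];
  y = y * (1.5 - 0.5 * x * y * y);      // one refinement step
  return y;
}

console.log(fastInvSqrt(4)); // ~0.499, i.e. roughly 1/sqrt(4)
```

Nobody who retypes that is thinking about the license; they're thinking about the magic constant.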
There is indeed some nuance to AI creating supposedly novel works, but I don't think it's as great as people think. AI makes the process of building a derivative work easier, which is probably frustrating as a license-holder. Many forms of derivative work have legal precedent, though: Wine, DXVK, OpenJDK, and the Dolphin emulator are each open-source implementations of a proprietary API or platform. You can fly pretty close to the sun without getting burned in many of these cases, as long as you stay within fair use.
Are you a giant plagiarizing machine? After all, you learned to reason and write words and concepts from somewhere. Should you pay a royalty to every teacher you've ever had, or every author of every book you've ever read?
Is it even possible to have a novel thought that isn't somehow dependent on an earlier thought of someone else?
This whole "LLMs IZ PLAIGIARISISISMES!" moral panic is going to lead to written copyright expanding to be just as stupid and exploitable as music copyright, where a company with enough lawyers can claim ownership over THREE NOTES in sequence.
No, it's capable of reasoning, but it gained its knowledge of how the world works through analyzing things that were written or drawn by (mostly) humans.
> To oversimplify a bit, it's a next-word-in-the-sentence prediction engine.
Turns out that when you ask LLMs to predict the next word in the sentence, and then train billions of parameters on billions of sentences, they realize the best way to improve at predicting the next word is to understand why the world works the way it does.
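To be concrete about what "predict the next word" means mechanically, here's a toy sketch of the decoding loop. The `LanguageModel` interface is made up for illustration; real models work on tokens and usually sample from a probability distribution rather than always taking the top score, but the shape of the loop is the same:

```typescript
// Toy sketch of greedy next-token decoding. The LanguageModel interface is
// hypothetical; it stands in for "score every vocabulary entry given the
// context so far".
interface LanguageModel {
  nextTokenLogits(context: number[]): number[]; // one score per vocabulary entry
}

function greedyComplete(model: LanguageModel, prompt: number[], maxNewTokens: number): number[] {
  const tokens = [...prompt];
  for (let step = 0; step < maxNewTokens; step++) {
    const logits = model.nextTokenLogits(tokens);
    // pick the highest-scoring next token and append it to the context
    let best = 0;
    for (let t = 1; t < logits.length; t++) {
      if (logits[t] > logits[best]) best = t;
    }
    tokens.push(best);
  }
  return tokens;
}
```

The interesting debate is entirely about what has to be going on inside `nextTokenLogits` for the predictions to be any good.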
You're making my point for me. My calculator doesn't have a conception of numbers or math. It can only run the arithmetic algorithms it was programmed to do. It is a regurgitation machine.
GPTs are similar. They can tell you that the sky is blue and they can tell you why the sky is blue, but they can't translate that to a hypothesis about how the wavelengths of light also influence the properties of lasers. If they've ingested information about these different topics, they can make the connection, but otherwise they can't.
Reasoning shows up in behavior: you ask it to explain things and it does; it gives reasons, motivations, and nuance, explains connections, and clarifies points. That's reasoning.
There's a pretty significant gap between being able to give reasons and being able to actually reason (as a verb). The latter requires complex cognitive faculties not present in a mathematical model of the English language.
LLMs are tools, and the important question to ask of a tool is "is this useful?", not "does it work like my brain does?". I don't see why you care about the internals and not about the actual outputs.
This comment chain wasn't about outputs, it was specifically about whether or not LLMs "reason"—precisely the internals. I feel like you're just moving the goalposts now to deflect from the original question.
> The latter requires complex cognitive faculties not present in a mathematical model of the English language.
The training is on images and text, and the output is text. But nothing constrains the model in between those two ends to represent only language features, as opposed to features of the world that are discoverable from the language and the images.