LLMs are more likely to create novel content (i.e. "hallucinate") than to copy training data verbatim, so, in a word, "No". In fact, that unpredictability and hallucination behavior is what gets called creativity in artistic domains. The whole Stochastic Parrots paper and concept originated with Luddites who don't want this technology to advance to the point where it is generally useful and applicable. And while the original Luddites (the people who smashed the first robots: Jacquard looms, programmable machines that did a job only humans had previously been able to perform) had a justifiable concern about the negative impacts (the lost employment took two generations to recover), it is safe to say that the Industrial Revolution has had more positive effects in total (e.g. life expectancy, per capita wealth).
Isn't Open Source just a giant plagiarizing machine with more congenial licensing conventions?
AI is as good a time as any to reassess how we feel about intellectual property, but I don't think we can get any more restrictive without suffocating innovation. Plagiarism claims are how patent trolls shut down otherwise innovative products, and they're the basis for jacking up non-generic drug prices and for forcing developers to pay licensing fees just to use an OS. If anything, our current conception of plagiarism is too unclear and fragile, bound to be destroyed by whatever the hell AI is, if it even matters in the first place.
There's a difference between open-source and proprietary works. For example, I use "PouchDB" in some of my apps; its code is open source, but what I make with it is not open source.
It's one thing to make something similar, and there are often more ways than one to do that, but how would one know if AI created something unique or just copied some human's copyrighted work and presented it in a response?
TBH, I have not dug into how it works, but I did ask it to show me how to make something with PouchDB.js, and the answer looked like pretty much a copy and paste from their website. That is not really an issue in itself, but it did not attribute that code to PouchDB.com. And, to be fair, I did not ask where it got it. But it seems to me that if it got the code off the PouchDB.com web site, it should tell us and provide a link to the original source.
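For what it's worth, what it gave back was basically the canonical getting-started example. Roughly something like the sketch below (reconstructed from memory, so the database and document names are illustrative, not necessarily what either the docs or the model actually printed):

```typescript
// A minimal PouchDB "hello world", the kind of snippet the docs walk through.
// Assumes the real 'pouchdb' npm package; 'kittens'/'mittens' are just
// placeholder names for this sketch.
import PouchDB from 'pouchdb';

async function demo(): Promise<void> {
  const db = new PouchDB('kittens');                       // open/create a local database
  await db.put({ _id: 'mittens', occupation: 'kitten' });  // store a document by _id
  const doc = await db.get('mittens');                     // read it back
  console.log(doc);
}

demo().catch(console.error);
```

If a model reproduces something that close to the upstream docs, a link back to the source seems like the least it could do.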
> how would one know if AI created something unique or just copied some human's copyrighted work and presented it in a response?
You can't. Similarly, when a human writes original code, we can't be certain that they're not repeating stuff they've seen before either. We don't think of things in terms of licenses; no human remembers the fast inverse square root function or cocktail sort for the license it had. At no point can I be certain that I'm not unconsciously plagiarizing proprietary code from a previous job. Sometimes it's essential (I have to plagiarize "set -euo pipefail").
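Case in point: I could bang out something very close to the famous fast inverse square root from memory, ported here to TypeScript (details approximate; the magic constant is the part everyone actually remembers), and I couldn't tell you off-hand what license the original shipped under:

```typescript
// Approximate TypeScript port of the well-known fast inverse square root trick.
// Reinterprets the float's bits as an integer via a shared buffer, applies the
// famous magic-constant guess, then refines with one Newton-Raphson step.
function fastInvSqrt(x: number): number {
  const buf = new ArrayBuffer(4);
  const f32 = new Float32Array(buf);
  const u32 = new Uint32Array(buf);

  f32[0] = x;
  u32[0] = 0x5f3759df - (u32[0] >>> 1); // initial guess from the bit pattern
  let y = f32[0];
  y = y * (1.5 - 0.5 * x * y * y);      // one refinement step
  return y;
}

console.log(fastInvSqrt(4)); // ~0.499, i.e. roughly 1/sqrt(4)
```

Nobody who retypes that is thinking about the license; they're thinking about the magic constant.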
There is indeed some nuance to AI creating supposedly novel works, but I don't think it's as great as people think. AI makes the process of building a derivative work easier, which is probably frustrating as a license-holder. Many forms of derivative work have legal precedent, though: Wine, DXVK, OpenJDK, and the Dolphin emulator are each open-source implementations of a proprietary API or platform. You can fly pretty close to the sun without getting burned in many of these cases, as long as you stay within fair use.
Are you a giant plagiarizing machine? After all, you learned to reason and write words and concepts from somewhere. Should you pay a royalty to every teacher you've ever had, or every author of every book you've ever read?
Is it even possible to have a novel thought that isn't somehow dependent on an earlier thought of someone else?
This whole "LLMs IZ PLAIGIARISISISMES!" moral panic is going to lead to written copyright expanding to be just as stupid and exploitable as music copyright, where a company with enough lawyers can claim ownership over THREE NOTES in sequence.
No, it's capable of reasoning, but it gained its knowledge of how the world works through analyzing things that were written or drawn by (mostly) humans.
> To oversimplify a bit, it's a next-word-in-the-sentence prediction engine.
Turns out that when you ask LLMs to predict the next word in the sentence, and then train billions of parameters on billions of sentences, they realize the best way to improve at predicting the next word is to understand why the world works the way it does.
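To be concrete about what "predict the next word" means mechanically, here's a toy sketch of the decoding loop. The `LanguageModel` interface is made up for illustration; real models work on tokens and usually sample from a probability distribution rather than always taking the top score, but the shape of the loop is the same:

```typescript
// Toy sketch of greedy next-token decoding. The LanguageModel interface is
// hypothetical; it stands in for "score every vocabulary entry given the
// context so far".
interface LanguageModel {
  nextTokenLogits(context: number[]): number[]; // one score per vocabulary entry
}

function greedyComplete(model: LanguageModel, prompt: number[], maxNewTokens: number): number[] {
  const tokens = [...prompt];
  for (let step = 0; step < maxNewTokens; step++) {
    const logits = model.nextTokenLogits(tokens);
    // pick the highest-scoring next token and append it to the context
    let best = 0;
    for (let t = 1; t < logits.length; t++) {
      if (logits[t] > logits[best]) best = t;
    }
    tokens.push(best);
  }
  return tokens;
}
```

The interesting debate is entirely about what has to be going on inside `nextTokenLogits` for the predictions to be any good.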
You're making my point for me. My calculator doesn't have a conception of numbers or math. It can only run the arithmetic algorithms it was programmed to do. It is a regurgitation machine.
GPTs are similar. They can tell you that the sky is blue and they can tell you why the sky is blue, but they can't translate that to a hypothesis about how the wavelengths of light also influence the properties of lasers. If they've ingested information about these different topics, they can make the connection, but otherwise they can't.
Reasoning shows up in behavior: you ask it to explain things and it does; it gives reasons, motivations, and nuance, explains connections, and clarifies points. That's reasoning.
There's a pretty significant gap between being able to give reasons and being able to actually reason (as a verb). The latter requires complex cognitive faculties not present in a mathematical model of the English language.
LLMs are tools, and the important question to ask of a tool is "is this useful?", not "does it work like my brain does?". I don't see why you care about the internals and not about the actual outputs.
This comment chain wasn't about outputs, it was specifically about whether or not LLMs "reason"—precisely the internals. I feel like you're just moving the goalposts now to deflect from the original question.
> The latter requires complex cognitive faculties not present in a mathematical model of the English language.
The training is on images and text, and the output is text. But nothing constrains the model in between those two ends to represent only language features, as opposed to features of the world that are discoverable from the language and the images.