Copyright Office suggests AI copyright debate was settled in 1965

lolinder · 2025-01-31T04:06:32 1738296392

The headline is either sloppy or intentionally misleading: the Copyright Office is saying that the law surrounding whether AI generated works can be copyrighted was settled in 1965 (the answer being "yes if AI assisted a human creative process, no if not, and we have to decide on a case by case basis if there was enough human input to qualify"). This has been their stance all along, but now they've provided a bit more guidance on what counts as human input, which is helpful.

What this article doesn't talk about at all is the far more controversial AI copyright debate, the one most people will think of given the headline: whether training a model is fair use. That's the one everyone is actually concerned about, and they're definitely not claiming it was settled already.

Salgat · 2025-01-31T05:13:14 1738300394

The human-input requirement makes sense; otherwise, couldn't you brute-force generate billions of low-resolution images that cover a vast range of situations and then use them to attack anything similar enough to meet the substantial similarity condition? You could even plug a news feed into the generator.

dotancohen · 2025-01-31T07:17:58 1738307878

Somebody did this with music - they brute forced all chord progressions or something like that. In theory all new music is infringing.
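For a sense of scale, here is a minimal sketch of that kind of enumeration. The 8-pitch alphabet and 12-note length are assumptions picked for illustration, not a claim about exactly what that project did, and the names `PITCHES`, `all_melodies`, and `melody_count` are made up for this example.

```python
from itertools import islice, product

# Assumed for illustration: an 8-pitch alphabet and 12-note melodies.
PITCHES = ["C", "D", "E", "F", "G", "A", "B", "C'"]
MELODY_LENGTH = 12


def all_melodies(pitches, length):
    """Lazily yield every possible pitch sequence of the given length."""
    return product(pitches, repeat=length)


def melody_count(pitches, length):
    """Size of the search space: len(pitches) ** length."""
    return len(pitches) ** length


# Peek at the first few sequences without materializing the whole space.
first_three = [" ".join(m) for m in islice(all_melodies(PITCHES, MELODY_LENGTH), 3)]

if __name__ == "__main__":
    print(melody_count(PITCHES, MELODY_LENGTH))
    for melody in first_three:
        print(melody)
```

Under these assumptions the space is 8^12, roughly 68.7 billion sequences, which is large but entirely enumerable by a machine. That is the point: the enumeration is cheap, and none of it involves a human creative choice.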

somenameforme · 2025-01-31T07:42:17 1738309337

Things like this often make me wish we had more 'common sense' laws and left the discretion of interpreting that notion to judges, juries, and the various systems of appeals and other courts we have, entirely with the expectation that laws would 'evolve' over time. This might sound radical, but it's actually just going back to how things used to be. Here is the First Amendment in its entirety:

---

"Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances."

---

The rest of the Bill of Rights looks similar. Nowadays it'd be thousands of pages long, trying to elaborate endlessly upon every single scenario. But the more important thing is that this idea of trying to codify every scenario still doesn't work, because you end up with a zillion loopholes in just about every single law, with some clever clown going 'ah hah! you didn't cover this!' So all you really do is end up with laws that are not only excessively fragile and subject to exploitation, but also completely indecipherable by just about anybody, certainly including the people voting on their passage.

xp84 · 2025-02-01T18:04:42 1738433082

See, while I agree that loopholes will abound — basically your whole last paragraph I completely acknowledge — I also want to point out the deeply insincere and frequently malicious people who appoint and confirm judges now. From both of our cravenly corrupt parties. If anything, since common sense is not even uncommon, it’s nearly extinct, we need even more fine-grained explicit laws, if of course we had a useful Congress to pass laws with any degree of seriousness, which we also don’t.

Basically (speaking from a US perspective) we’re doomed either way.

somenameforme · 2025-02-02T08:17:50 1738484270

Many states assign judges through nonpartisan elections. I mean I do, more or less, agree with you, but at the same time there are quite a lot of checks and balances: judges, appellate courts, constitutional review, and the final judge, juries. The entire government was intentionally designed to be mostly nonfunctional to prevent a tyranny of the majority (or minority). It's hard to really do things, and have them stick, unless there's pretty wide consensus.

I'd also add that simpler laws add transparency. If, e.g., a judge does something not in the apparent spirit of an older law, it's apparent to anybody with a mind. But when instead an offender (or judge) relies on a nuanced, if esoteric, interpretation of a loophole in section 274, subsection 74, paragraph 13a, nobody has any clue what's happening.

And if we're screwed either way, it'd be nice if people could at least visibly and explicitly see that they're being screwed.

AstralStorm · 2025-01-31T07:28:17 1738308497

That's the unfortunate problem with copyright being a perpetual uphill battle, and it's why copyright terms should be short.

Similarly with patents.

Even when used there should be a timeout, possibly per clause to avoid overly broad stuff.

But then a few IP trolls and lawyers would have to find another job.

Terr_ · 2025-01-31T07:04:23 1738307063

> whether training a model is fair use

I want to highlight that training the model is only one part of the copyright questions going on, the other is how they are making and keeping direct long-term copies in giant training datasets.

Imagine what would happen if a regular person bought and then immediately resold thousands of books, CDs, and movies, taking just enough time to make a copy of each one and building out their own library/movie-theater for friends and coworkers. You think the powers-that-be would let you or me get away with that?

There is no (non-evil) reason to hold multijillion-dollar corporations with professional legal advice on tap to a lower standard than regular people.

xp84 · 2025-02-01T18:18:06 1738433886

You’re making the exact same kind of maximalist argument as people who argued that ripping a CD and letting your brother listen to the MP3s is exactly morally and criminally like shoplifting the disc. Or that recording a movie off TV was equal to stealing it. (That particular one was of course famously judged by the courts as “tough shit” to the copyright owners who sued over it.)

Yes, training does impart some fraction of an article into the weights. NYT famously “demonstrated” this by like typing whole paragraphs of their articles into GPT and having the model produce some of the following sentences. However, this substitutes for the article in zero ways, since if you need the article to summon the article…who cares?

We should admit that nothing about our copyright law intentionally weighed in about LLMs. It’s simply nuts to apply a law to a situation when its drafters had no idea of the positive or negative implications of such an application. It would be like applying a law that calculated prison time based on number of horses stolen to someone who stole a Honda Civic with 100hp and saying that clearly they should get 100 years because it is equal to 100 horses.

Now, I get that we have a useless legislative branch which, even if it actually passed intentionally applicable laws, would pass stupid ones, but I think making simplistic analogies like that does not help anyone other than the Luddites. Look, the cat is out of the bag, and even if, say, the US government effectively killed all Gen AI by forcing any training to be done with material you own the copyright of, countries like China (and criminals, who can simply use the tech in secret) will happily just pull ahead of us and economically demolish us - like any country would have done if a competing country had banned electricity 100 years ago.

We need something better than the ridiculously unfit 200-year-old paradigm for this.

bonzini · 2025-01-31T06:54:08 1738306448

> yes if AI assisted a human creative process, no if not

Fair enough but does that help settle the other question, which is whether weights are considered derivative works of any material used in the training?

sublinear · 2025-01-31T08:58:08 1738313888

ALL OF WHAT AI HAS BEEN TRAINED ON IS HUMAN INPUT

cxr · 2025-01-31T04:23:48 1738297428

There's not really much of a debate, just a bunch of clamoring and wishful thinking by rightsholders who don't understand copyright law insisting that precedent should be subordinate to mimetic outrage over LLMs.

throwaway17_17 · 2025-01-31T04:47:14 1738298834

In what way are ‘rightsholders’ expressing wishful thinking? I assume you are saying that there is no violation of those rights controlling various properties that have been used to train ‘AI’. You then mention precedent in a way that implies there are legal decisions that make it clear ‘AI’ training using copyrighted material does not violate the rights of those who own that material. Could you list or link to such a precedent?

To the best of my knowledge, there is no direct precedent from any federal circuit addressing this issue and certainly no USSC opinions dealing with the issue. Additionally, any analogies drawn from precedent focused on other areas of intellectual property law are easily distinguishable. This is truly fresh legal ground, and the next 10 years of jurisprudence will go a long way towards building the precedent that your comment implies already exists.

Just to be explicit, the above, while a legal opinion, IS NOT legal advice.

cxr · 2025-01-31T05:55:12 1738302912

No amount of solidarity from support groups comprising clusters of likeminded folks on internet message boards who're opposed to settled law is a substitute for an act of Congress, which is what it will take to give the position of folks opposed to contemporary GenAI any legs.

Neither your comments to HN nor anyone else's strenuous assertions that there's anything to debate are going to change anything.

If you want to treat LLMs as a special case—which is what you want, since there is an entire history of jurisprudence that you have to contend with here—then you need to get Congress to write legislation that says so.

Animats · 2025-01-31T07:08:44 1738307324

> act of Congress

More than that, a constitutional amendment. See Feist vs. Rural Telephone.

The US does not have database copyright or "sweat of the brow" copyright. There has to be human originality.

AstralStorm · 2025-01-31T07:30:31 1738308631

So if you collect things in an undisclosed database without archival rights you probably are violating bajillion copyright claims, right?

The AI itself can be construed as a special kind of a database, given that it can be queried to reproduce at least part of its training dataset with precision...

xp84 · 2025-02-01T18:20:58 1738434058

Doesn’t it usually have to be queried with a bunch of the training material anyway? It’s a pretty foolish sounding argument that one is being harmed if it boils down to “when I type in the first half of my article into GPT it can sometimes complete a few more paragraphs of it”

(That’s what the NYT tried to argue/show. I’m not sure if that case is ongoing or settled.)

cxr · 2025-01-31T14:37:25 1738334245

> So if you collect things in an undisclosed database without archival rights you probably are violating bajillion copyright claims, right?

No.

jpalawaga · 2025-01-31T05:05:37 1738299937

Copyright law stipulates the conditions in which content can be reproduced, not conditions in which it can be consumed.

Arguably the material has been learned and not copied. Maybe in some cases learned with an uncanny ability or photographic memory, but learned. (People with photographic memories also cannot reproduce content in an unlimited fashion).

bbarnett · 2025-01-31T05:41:03 1738302063

Learned!

There's nothing special about an LLM, there's no learning, and they regurgitate verbatim text too.

May as well say curl + images in a db are learned as well, so thus I can use Mickey Mouse as I please in my PHP web page.

drdeca · 2025-01-31T06:22:12 1738304532

While learned is probably not the best word to use as far as describing the legality goes, I also don’t think “copied” is the right word.

Let’s say that the model “is influenced by” the copyrighted material. That seems hard to argue against.

So, now that we aren’t using the word “learned”, why would we say that the way the models are influenced by the copyrighted works that appear in the “training set” (not to imply that “training” in the usual sense is happening) counts as a copyright violation?

Or, perhaps the claim is that the outputs of the model are violating copyright?

If the output is substantially similar to some particular copyrighted work that is in the “training set”, and could work as a substitute for that work, and if the output resembling the work is in part due to the influence that the work had on the model, I think in this case it would be a clear case of violation of copyright.

However, if it doesn’t have substantial similarity to any particular copyrighted work that influenced it, only similarity to the style common to many of the works that influenced it (even if all by the same author), my impression is that this would not constitute copyright infringement because styles are not protected by copyright, only individual works.

(Now, is this unjust, in the case of it copying the style of some particular author/artist? Idk, maybe? But my impression is that copyright doesn’t protect styles, and that it probably shouldn’t protect styles in general… so I guess maybe if we had a law making a special case forbidding the (deliberate?) copying of a person’s particular style via some kind of machine learning model? Idk.)

tsimionescu · 2025-01-31T06:43:32 1738305812

The argument is that the LLM itself is essentially like a complex lossy archive of its entire training set. It's like an mp3 of all of the songs on Spotify, in some sense (of course, using all text on the internet instead of all songs on Spotify). This is the sense in which it is considered to be a copy of all of this.

galaxyLogic · 2025-01-31T06:43:07 1738305787

Very insightful. Consider there are many "recreational imitators" who mimic how specific (famous) people speak. They are not violating copyright, they are just imitating a way of speaking.

echoangle · 2025-01-31T07:43:35 1738309415

I don’t think this is a good argument because the way of speaking isn’t a copyright issue. I don’t think you have copyright on your specific way of speaking, only on specific recordings of you yourself speaking.

galaxyLogic · 2025-02-01T02:45:13 1738377913

I think I'm saying the same thing. Style is not something that can be copyrighted but I hadn't much thought about it before. I'm not a lawyer. I guess trademarks are something, and design-patents perhaps that let you have IP over style. :-)

visarga · 2025-01-31T06:58:19 1738306699

> There's nothing special about an LLM, there's no learning

The model is 100-500x smaller than its training set. That is something hinting at learning, as direct storage is impossible.

toast0 · 2025-01-31T07:14:59 1738307699

Video compression ratios vs. raw video are amazing too, but there's no learning and there's no doubt that the compressed form is subject to copyright.

Aloisius · 2025-01-31T05:14:41 1738300481

> "Where a human inputs their own copyrightable work and that work is perceptible in the output, they will be the author of at least that portion of the output," the guidelines said.

This policy is sensible. Most AI generated works should be uncopyrightable, except where a substantial human contribution is in the output.

Simply describing a picture and letting AI generate it shouldn't be enough for the same reason that dictating what you want to a painter isn't enough to earn you copyright over the resulting painting.

I would be wary about integrating too much AI output into works one wants to enforce copyright over without some level of documentation. The nightmare scenario is having your copyright stripped away because of evidence one used AI extensively.

NitpickLawyer · 2025-01-31T07:46:36 1738309596

> Simply describing a picture and letting AI generate it shouldn't be enough

Interesting take, and I've heard this many times. I'm curious to explore this further and see why you think that is, and where do you draw the line?

Is it the "low effort"? Is it the "automated" stuff? Is the process of setting it up, prompting it and choosing a result not enough "creative input"? If so, why?

Let's take a "real world" example as an analogue. Say I set up a camera on a tripod. I set it to take a picture every second, and leave it there. I come back an hour later and go through the pictures. I select one of the sunset that I like, and post it. Would I not have copyright on that picture? I wasn't there when it was taken. But I did set up the camera and select the end result. How is that different?

Taking it back to genAI, say I build/train/finetune my own model. Would it now have enough "effort" from me that I can use those generations? Is this an effort thing or is it more? Or is it just that someone else did the work?

What about random "art"? As in art based on random numbers. Say I write a script in Python that uses random math formulas to "draw" on a canvas. I let it run for a couple of hours, come back, look at the results and select one. Do I not get copyright on the resulting "art" because it was randomly generated by a script? Does it matter if the script was written by me? Would it be different if you downloaded my script and generated the art yourself? Would you not have copyright?

I guess what I'm trying to say is "where do we draw the line?". It's not clear to me why people say "simply prompting and selecting isn't creative enough". This distinction wasn't there before. Plenty of "art" out there based on random processes + curation. Why the sudden change?
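To make the random-art hypothetical concrete, here's a minimal sketch of what such a script could look like. Everything in it is made up for illustration (the `random_canvas` name, the sine/cosine formula, the ASCII shading); the only point is that the human's contribution is writing the script and then picking one of the outputs.

```python
import math
import random


def random_canvas(seed: int, width: int = 32, height: int = 16) -> list[str]:
    """Draw an ASCII 'canvas' from a formula with randomly chosen coefficients."""
    rng = random.Random(seed)
    # Random coefficients for a simple trigonometric pattern.
    a, b, c = (rng.uniform(0.1, 1.0) for _ in range(3))
    shades = " .:-=+*#%@"
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            # value is an average of three terms in [-1, 1], so it stays in [-1, 1].
            value = (math.sin(a * x) + math.cos(b * y) + math.sin(c * (x + y))) / 3
            # Map [-1, 1] onto an index into the shade ramp.
            index = int((value + 1) / 2 * (len(shades) - 1))
            row += shades[index]
        rows.append(row)
    return rows


# The "generation" step: produce several candidates with no human in the loop.
candidates = {seed: random_canvas(seed) for seed in range(5)}

if __name__ == "__main__":
    # The "curation" step would be a person looking at these and picking one.
    for seed, canvas in candidates.items():
        print(f"--- candidate {seed} ---")
        print("\n".join(canvas))
```

The open question in the comment above is which of those two steps, if either, counts as enough human authorship.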

njarboe · 2025-01-31T05:50:21 1738302621

If the painter is doing a "work for hire" you should get the copyright.

Aloisius · 2025-01-31T05:55:03 1738302903

They can if they buy the copyright from the painter.

They just can't get it from the government because they are not, in any sense, the author of the creative work.

galaxyLogic · 2025-01-31T06:46:05 1738305965

Right, that you cannot copyright such output is now clear(er).

But what about the other direction, can distributing such AI generated content VIOLATE somebody else's copyright?

If output of AI cannot be copyrighted, can it violate copyright?

ilaksh · 2025-01-31T06:14:09 1738304049

It says they were not able to reproduce an image with the same prompt. So they just didn't know about seeds?
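A toy sketch of what I mean by seeds, using only Python's standard `random` module. `toy_generate` is a hypothetical stand-in, not any real image model's API, and real systems can have other sources of nondeterminism (hardware, library versions, service-side changes) that this ignores.

```python
import random


def toy_generate(prompt: str, seed: int, size: int = 16) -> list[int]:
    """Stand-in for an image generator: returns `size` pseudo-random 'pixel' values.

    The output depends only on the prompt and the seed, so the same pair
    always yields the same 'image'.
    """
    rng = random.Random(f"{prompt}|{seed}")
    return [rng.randrange(256) for _ in range(size)]


if __name__ == "__main__":
    same_a = toy_generate("a sunset over the sea", seed=42)
    same_b = toy_generate("a sunset over the sea", seed=42)
    other = toy_generate("a sunset over the sea", seed=43)
    print(same_a == same_b)  # same prompt, same seed
    print(same_a == other)   # same prompt, different seed
```

If the seed is chosen at random on each run and never recorded, the same prompt will give a different result every time, which would look like "not reproducible" to someone who only controls the prompt.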

BeefySwain · 2025-01-31T06:21:00 1738304460

Why is a binary (compiled machine code) protected by copyright, but the raw output of an AI model is not?

andsoitis · 2025-01-31T06:55:21 1738306521

Courts have ruled that compilation does not remove originality—the binary is still a transformation of an original, copyrighted work (the source code).

realusername · 2025-01-31T07:02:56 1738306976

Because binaries are a transformation of the source code, which is written by a human.

Other kinds of binaries that are fully generated by a machine, like private keys, aren't copyrightable.

Animats · 2025-01-31T07:04:47 1738307087

US copyright applications are not examined, in the sense that patents are. Issued patents are presumptively valid. Registered copyrights are not. Whether a copyright application is valid has to be determined by a court.

sublinear · 2025-01-31T08:56:30 1738313790

I'm pretty confident the copyright office was massively overthinking it in 1965 and knocked it out of the park far beyond the watered down and ignorant arguments we hear today. It's sad really.

philippta · 2025-01-31T07:26:09 1738308369

I think the two main questions everyone needs clarified are:

1. Can I get sued by a 3rd party when using AI generated work in my project?

2. Can I sue a 3rd party when they use my AI generated work in their project?

jarsin · 2025-01-31T03:50:30 1738295430

When uploading books to Kindle Direct Publishing, you have to state that you own the copyright and publishing rights.

So any book or story on Amazon that was generated substantially via prompting should now have to be removed based on this guidance from the copyright office.

furyofantares · 2025-01-31T05:11:45 1738300305

You can publish public domain content on kindle.

https://kdp.amazon.com/en_US/help/topic/G200743940

Aloisius · 2025-01-31T05:31:43 1738301503

Yeah, though Amazon could just make their own copy available without compensating the uploader.

cyberax · 2025-01-31T04:17:21 1738297041

That's incorrect. Purely factual books (like phone directories or map atlases) are perfectly fine for publishing.

feoren · 2025-01-31T05:01:47 1738299707

Purely factual books are copyrightable. It is the collection and curation of those facts that is protected. You cannot just copy someone else's 100 Amazing Facts about The Rainforest verbatim; if you publish 100 Cool Truths about The Jungle and it has those same 100 facts, you'll get sued and they'll easily win.

jcranmer · 2025-01-31T05:25:15 1738301115

The EU and the UK generally have something akin to "sweat of the brow", where collections of facts that took time to collate are copyrightable.

But in the US, Feist v Rural explicitly disavowed the sweat of the brow doctrine, and said that facts have no copyright value--a work requires a quantum of original creative spark to be copyrightable (it was discussed in the context of phone books--a phone book does still have some residual "thin copyright", but the listing of phone numbers is not copyrightable, and it is actually difficult to infringe on the thin copyright of a phone book). In the US, your example would easily be found to be not infringing, if the only similarity were reproducing the same 100 facts.

schoen · 2025-01-31T05:40:03 1738302003

However, if it states the facts in exactly the same way, that could be considered infringing because of the creativity presumably involved in deciding how to state each fact.

"Elephants are enormous mammals, usually grey in color, with significant intelligence and social habits. They are native to portions of Africa and Asia, with different elephant species found in each region. They are famous for their strong and dexterous trunks, which can also be used to communicate something like a trumpet. Humans, especially in South Asia, have long admired elephants and used them for transportation and various kinds of work; African elephants are famous for having been ridden to war by the Carthaginians, to the dismay of their Roman opponents. Today elephants are significantly threatened by various human activities, both those intentionally directed at the elephants (like killing them for meat or for the ivory derived from their tusks) and those not deliberately meant to affect them (like deforestation). While our word for elephants comes from the Greek, it was probably borrowed by the ancient Greeks from another language family."

I just wrote this paragraph about elephants based on my own knowledge. There is no copyrightability of the substantive information here (e.g. if you learned something new about elephants, you can tell other people) but there is probably some copyrightability of the paragraph based on things like the creativity of my word choices. That distinction can sometimes create confusion in discussions about "facts", and I'm not positive that legal standards that are meant to clarify it have always given a clear and workable rule.

bryanrasmussen · 2025-01-31T06:14:36 1738304076

That is not a list of facts, however. This is a list of facts:

Largest Land Animals By Size:

    African Bush Elephant

    Asian Elephant

    African Forest Elephant

...

futybt · 2025-01-31T04:46:23 1738298783

[flagged]

dboreham · 2025-01-31T05:13:58 1738300438

Haiku?

drewcoo · 2025-01-31T05:29:46 1738301386

Burma Shave