
The network crumbling effect


Logged in to FB for the first time in years to post something on the marketplace last night, briefly checked the feed to see what it’s like, and was greeted by a picture of a slightly different AI-generated, scantily clad volleyball player every 5 or 6 posts…all with ~150k likes. The quality of content on that platform is so low it beggars belief.


Been running my startup for 7 years, not a unicorn but fairly successful. Can honestly say I’ve not made it through a single one of these books. I’ve certainly tried, and got a few nuggets, but nothing has compared to learning this stuff first hand, on the job. Sure, having passable knowledge of this list is useful, but you’re going to hire in people/consultants/external providers to do almost every module on this list for you at some point, and you’ll learn from them firsthand. You don’t need to read this list to get a business off the ground, and by the time you do, you’ll need to surround yourself with these specialists in some form pretty quickly.


Yeh I just can’t see it working well. Surely it would be so uncomfortable to work up even a minor sweat with this thing strapped to your face.


I think I'd s/"unlimited abundance"/"nothing to lose" - particularly in startups. In the small businesses I've worked in, it's when times are tough that the biggest and most critical bets have been made, often with the most conviction. That sort of situation, I think, also drives some of the most motivated and incredibly innovative behaviours in people.


Value has been taken. Motivation to create has been taken too, if the second you upload something it gets sucked into a model from which any old person can create infinite iterations.


The sooner anyone making profit from models trained on creators’ proprietary content starts paying for the content they’re using, the better for creators, society, and even the AI companies. It’s pretty tiring hearing people argue about whether copyright law applies to AI companies or not. It applies. Just get on and sort out a proper licensing model.


> "The bottom line is this," the firm, known as a16z, wrote. "Imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development."

> The firm said payment for all of the copyrighted material already used in LLMs would cost the companies that built them "tens or hundreds of billions of dollars a year in royalty payments."

https://www.businessinsider.com/marc-andreessen-horowitz-ai-...


Doesn't that kind of demonstrate the value being actively stolen from the creators, more than anything? Copyright law killed Napster, too. That doesn't mean applying copyright law was wrong.


And now the guy who started Napster is on the board of Spotify who just decided they weren’t going to pay small time artists for their music anymore. Go figure.


Spotify and the rights holders come together and agree on the right price. Unfortunately, since long before Spotify ever existed, it's usually the record labels who own the rights to the music, and the actual creators still get shafted.


Except in this case, tens of thousands of rights holders are just getting nothing as of the start of 2024, and I can tell you, Spotify certainly did not “come together” with any of us.


Didn’t they basically say they’re no longer going to bother paying for songs earning under $3 USD per year?

It seems like the only people that will be impacted are the abusive auto-generated spammer accounts with thousands of garbage tracks uploaded garnering 1200 streams a year by people accidentally playing them via Google Home misinterpretations etc.


1000 streams per year for each song. That’s not just that auto-generated junk. That’s a majority of ALL music on Spotify.

So yes, an individual song might be $3 per year, but that just shows how poor their royalties are to begin with. And it tries to obscure the fact that artists don’t just release one song ever.

There are thousands of artists who were maybe even somewhat successful at some point in their careers but have a lot of songs in their back catalog that don’t get that many streams annually. Suddenly they’ve gone from not making enough per stream from Spotify to just getting paid nothing at all.


Of course it was wrong. Abolish all copyright.


Starting a business involves huge amounts of risk to begin with. Just because you stand to lose more doesn't mean you get to magically ignore that.

Watching the superstars of venture capital whine that copyright is unfair is quite something, though.


“Payment for all workers who develop fields or man factories would cost the companies that operate them hundreds of thousands of dollars a year in salary payments”

- slavers, probably.

Of course slavery != AI, but the argument that we should protect companies from their expenses to enable their bad business model is very entitled and presumptuous.

Thousands of companies have failed because their business models didn’t work, and thousands more will.

AI will be fine. It probably won’t be as stupidly lucrative as the current model, but we’ll find a way.


There are many areas of research, technological advancement, and construction which would proceed much more quickly than their current pace if we didn't force them to do things in the way that society has decided is correct and just.


The constitution is clear that copyright is not "just;" it is a concession for the sole purpose of promoting "the Progress of Science and useful Arts," whose "unnatural restriction of natural rights must be carefully balanced."

The fact that "research, technological advancement, and construction would proceed much more quickly" without copyright is exactly why abolishing copyright is the just and correct thing to do.


“The constitution” is certainly not in favor of abolishing copyright, as it explicitly provides for its legality and usefulness.

I am not sure where the “unnatural restriction” quote comes from, could you enlighten me?

As far as what the constitution says about copyright, it seems only to say that it is “To promote the Progress of Science and useful Arts”


What do you expect them to say, "we're cool with making less money and making it harder to make money?"


Sounds like they need to find someone good at developing business models who can help them figure one out.


Oh no, the horror of the cost of doing business when you can't get away with a get-rich-quick scheme fast enough to cash out and disappear.


> Imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development.

which, as an involuntary donor, is exactly what I want


Good.


Well. Fuck those guys.


> It applies.

If only saying it would make it so.

Unfortunately, it's not easy to make this legal argument given how copyright law only protects fixed, tangible expressions, not ideas, concepts, principles, etc. and has a gaping hole called 'fair use.'


The New York Times has examples where GPT will reproduce, word for word, exact paragraphs of their (copyrighted) text if you ask it to. That's a pretty fixed, tangible expression, I think.


MidJourney, too.

https://twitter.com/Rahll/status/1739003201221718466

It's frankly impressive how well this image is embedded in the weights of their model, down to tufts of hair. And it's far from the only one.


For sure, that could be an instance of infringement depending on how it is used. But that's a minuscule percentage of the output and still might be fair use (read the decision in Authors Guild, Inc. v. Google, Inc.). But even if that instance is determined to be infringement, it doesn't mean the process of training models on copyrighted work is also infringement.


I can see 3 ways that you can guarantee that the output of a model never violates copyright:

1. Models are trained with 100% uncopyrighted or properly licensed input data

2. Every output of the ML model is evaluated to make sure it's not too close to the training data (a rough sketch of such a check is below)

3. Copyright law is changed to have a specific carve-out for AI

#1 is the approach taken by Adobe, although it generally is harder or more expensive to do.

#2 destroys most AI business models

#3 has been done in some countries, but it seems likely that if done in the US it would still have some limits.

For example, I could train a model on a single image, song, or piece of written text/code. Then I run inference, and get out an exact copy of that image, song, or text. If there are no limits around AI and copyright, then we've got a loophole around all of copyright law. I don't think that the US would be up for devaluing intellectual property like that.
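
For what it's worth, here's a minimal sketch of what an output check along the lines of #2 could look like: build a word-level n-gram index over the training set and reject any generation whose n-grams overlap it too heavily. Everything here (the function names, the n-gram size, the threshold) is hypothetical and chosen purely for illustration:

    # Hypothetical output gate: reject generations that share too many
    # word n-grams with the training corpus.
    from typing import List, Set, Tuple

    NGram = Tuple[str, ...]

    def ngrams(text: str, n: int = 8) -> Set[NGram]:
        words = text.split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def build_index(corpus: List[str], n: int = 8) -> Set[NGram]:
        index: Set[NGram] = set()
        for doc in corpus:
            index |= ngrams(doc, n)
        return index

    def passes_gate(output: str, index: Set[NGram],
                    threshold: float = 0.2, n: int = 8) -> bool:
        grams = ngrams(output, n)
        if not grams:
            return True
        # Fraction of the output's n-grams appearing verbatim in training data.
        return len(grams & index) / len(grams) <= threshold

Even this toy version hints at why #2 is expensive: the index has to cover the entire training set, and exact matching misses trivial paraphrases, so a real system would need fuzzier (and costlier) matching.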


The much more likely outcome:

4. A ruling comes down that enshrines what all the big companies have been doing (with the blessings of their armies of expensive, talented, and conservative legal teams) as legitimate fair use


The much more likely scenario is that there is a precedent-setting court case. This is how it happened with practically every other instance of copyright bumping into technology.


> The New York Times has examples where GPT will reproduce, word for word, exact paragraphs of their (copyrighted) text if you ask it to

You are forgetting the massive asterisk that you need to provide multiple paragraphs of the original article in order to get verbatim output from ChatGPT. In what world are people doing that to avoid paying the NYT?


This is absolutely not true. Here is an article where Ars Technica tried it:

https://arstechnica.com/tech-policy/2023/12/ny-times-sues-op...

And this is a screenshot of their session with Copilot:

https://cdn.arstechnica.net/wp-content/uploads/2023/12/Scree...


This does not match what the GP claimed, that when prompted with the start of an article, GPT-3 (mostly) faithfully completes it. The original article also claimed that it seemed to have been patched shortly after publication.


It does match exactly what I claimed, and it also states that even though the behavior was patched in GPT, the Ars people were able to easily reproduce it in Copilot.

It's funny that the behavior was patched if OpenAI believes it isn't copyright infringement.


That is evidence that GPT can violate copyright, not that all of its outputs do.

It supports an argument that GPT shouldn't produce outputs that are extremely similar, not that the content cannot be used as an input.


I suspect a single verbatim output of sufficient length is enough to poison the entire weight set as a derivative work

as well as all the output it ever generated


I don't necessarily disagree, but by what logic or argument do you make that case?

Does that mean that models that cannot produce copies of X length ARE fair use?


> Does that mean that models that can not produce copies of X length ARE fair use?

not necessarily

"sufficient but not necessary" I believe is the term


Sure, but then you need an additional line of rationale and logic to cover those other cases.


I suspect that single case will catch all of them


That's pretty silly. You can just put a gatekeeper to prevent it from spitting out anything too similar, or prevent a user from forcing it to. It is not an intractable or pervasive problem.

It is a fringe case that rarely occurs, and only with a lot of user prompting.
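
To make the second half of that concrete ("prevent a user from forcing it to"), here is a hypothetical input-side counterpart to the output gate sketched upthread: refuse prompts that quote long verbatim runs of indexed text, which is how the NYT-style reproductions were elicited. The names and limits are again made up:

    # Hypothetical input-side check: flag prompts that paste in long
    # verbatim runs of protected text. Reuses the ngrams()/build_index()
    # sketch from upthread; a "hit" is an n-word run that appears
    # verbatim in the indexed corpus.
    from typing import Set, Tuple

    def prompt_quotes_corpus(prompt: str, index: Set[Tuple[str, ...]],
                             n: int = 8, max_hits: int = 3) -> bool:
        words = prompt.split()
        hits = sum(1 for i in range(len(words) - n + 1)
                   if tuple(words[i:i + n]) in index)
        return hits > max_hits

Whether filters like these are legally sufficient is, of course, exactly what the rest of the thread is arguing about.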


the weights themselves are still a derivative work even if they post-filter

legal discovery could almost certainly compel the LLM host to provide access to the output of the weights themselves without the "gatekeeper" present


I don't buy the argument that the models are not sufficiently transformative. Nobody would look at a weight table and confuse it for the original work, and it is different in basically every way.

If there is a case to be made, I think it has to be around the original use of the works, the transcription process. Not the weights, or the output.


No one would look at a VeraCrypt archive containing a copyrighted work and confuse it with the original either. It looks very different from the original file, but both the encrypted file and the learning model's weights allow one to reproduce the copyrighted work.


No, I'm falling back to the argument that the output is infringing, not the archive.

If the archive can't produce the original work, it's not infringing.

If you printed the binary of Harry Potter and sold it as a painting, that would be fair use. It doesn't matter if the data is encoded in it, if it is not used for extraction.

Think of Andy Warhol's Campbell's Soup Cans. Nobody is going to confuse the art for a can of soup and try to eat it. It's not being sold as a label for other soups. However, the original Campbell's Soup data is absolutely encoded there.


[flagged]


Maybe the "rat fucks" just wanted to make cool images?


You can use this cool technology to do that without stolen labor. It's called "drawing".


More likely they just wanted to make money


By giving the model away for free?


And profit off of it?


> the better for [...] AI companies

Yes, exactly. Kill off the free alternative, since nobody else can afford licenses with big rights holders. Google will love this. Creators will get pennies. Everyone has to go through proprietary APIs, and open-source models will be outlawed. Not something I would want to see!


> anyone making profit from models

It's actually just "anyone making models". If you train a model with other people's art (without their permission) and then distribute the model or output for free, you're still stealing their work, even if you make zero profit.


I think it's worth noting that Adobe Firefly and iStock already have generators that use only licensed content.


Highly doubtful.

Yes, I know Adobe said so. No, I don't trust them.

Facts:

1. Adobe Firefly is trained with Adobe Stock assets. [1]

2. Anyone can submit to Adobe Stock.

3. Adobe Stock already has AI-generated assets that are not correctly tagged as such. [2]

4. It's hard to remove an image from a trained model.

Unless Adobe carefully scrutinizes every image in the training set, the logical conclusion is that Adobe Firefly already contains at least second-hand unauthorized images (e.g. those generated by Stable Diffusion). It's just "not Adobe's fault".

[1] https://www.adobe.com/products/firefly.html : "The current Firefly generative AI model is trained on a dataset of licensed content, such as Adobe Stock, and public domain content where copyright has expired."

[2] Famous example: https://twitter.com/destiny_thememe/status/17448423657672255...


It applies - but how it applies is simply not settled. Clearly, there is no statistical model that can launder IP. If someone uses Midjourney to produce someone else's IP, that user (not Midjourney) would be in violation of copyright if they use it in a way that does not conform to the license, such as by creating their own website of Cat and Girl comics and running AdSense on it.

However, if other artists are inspired by this style of comic, and it influences their work - that is simply fair use. If that artist is some rando using a tool like Midjourney - that is inspired by the art but doesn't reproduce it - it is not at all clear to me that this is not also fair use.


The point is that Midjourney itself, the model weights, is possibly a derivative work of all of the works that went into its training. The fact that it can produce almost identical copies of some of them, especially if not carefully prevented from doing so through secondary mechanisms, is obvious proof that it directly encodes copies of some of these works.

That already clearly means that they couldn't publish the model directly even if they wanted to, since they don't have the right to distribute copies of those works, even if they are represented in a weird lossy encoding. Whether it's legal for them to give access to the model through an API that prevents returning copyrighted content is a much more complex legal topic.


> possibly a derivative work of all of the works that went into its training.

or… it is possibly a transformative work of all the works that went into its training, which would lead to a strong fair use argument. Given how permissive the courts have been with transformative work, this seems like an easier argument to make.


But again, we’ve seen these models spit out verbatim text and images that are copyrighted (many examples throughout this thread). That doesn’t strike me as “transformative work.”


It is possible that those works are not transformative, but 99.9% of the output is.

Human artists can also create copies when instructed to.

That doesn't mean that the rest of their work isn't transformative, or that the process of learning isn't fair use.

Similarly, the law doesn't bar artists from learning, but provides recourse when and if artists create and sell copies.


>Clearly, there is no statistical model that can launder IP.

Of course. The model isn't making a decision as to what may be used as training data. The humans training it do.

>If someone uses Midjourney to produce someone else's IP, that user (not Midjourney) would be in violation of copyright

That's like saying that if a user unpacks the dune_full_movie.zip I'm sharing online, it's they who have produced the copyrighted work, and I, the human who put the movie Dune into that zip file, am doing no wrong. Clearly, there is no compression algorithm that can launder IP, right?

>However, if other artists are inspired by this style of comic, and it influences their work - that is simply fair use

The AI isn't inspired by anything. It's not a sentient being, it's not making decisions, and its behavior isn't regulated by laws because it does not have a behavior of its own. Humans decide what goes into an AI model, and what goes out. And humans who train AI models on art don't get "inspired". They transform it into a derivative work — the AI model.

One that has been shown to be awfully close to dune_full_movie.zip if you use the right unpacking tools. But even that isn't necessary. Using work of others in your own work without permission and credit usually goes by less inspiring words: plagiarism, theft, ripping off.

Regardless of whether you reproduce the work 1:1, and whether you can be punished by law for it.

>tool like Midjourney - that is inspired by the art but doesn't reproduce it

Never in the history of humanity has the word "inspired" meant something that a tool can do. If it's "inspired" (which is something only sentient beings can be), then we should be crying out about human rights abuses in the way AI models are trained and treated.

If it's just a tool, it's not "inspired".

You can't have your cake and eat it too. Either pay your computer minimum wage for working for you, or stop saying that it can get "inspired" by art (whether it's training an AI model or creating a zip file).


Yeah, "influenced" would have been a better word choice.


Why do you as a hacker shamelessly support copyright?


s/profit/any revenue whatsoever/


It's pretty tiring hearing people think anything like what you're saying is going to happen. The horse is out of the barn already, deal with it.


> The horse is out of the barn already, deal with it.

Like leaded gas, the government can make regulations to deal with anything should they choose to.

Why do you think it's okay for massive companies to freely profit off the work of others?


Leaded gas isn't open source software. Try again.


Most of the big AI models aren't open source and even if they were? You can absolutely regulate them.


[flagged]


> Go ahead and legislate it, in a free economy world your country is fucked, and stupid, if they do.

This is a tired argument that has been repeated every time almost any regulation has been proposed, let alone implemented.

We could be more economically competitive with the world if we had no holidays and worked 13 hour days too.


Yeh, this. The frequency of babies named Jeffrey[1] follows this pattern. Maybe all this crime was committed by Jeffreys?

[1] https://namerology.com/2022/03/08/the-ultimate-baby-name-gra...


The funniest thing is that when you're stopped in the flow for some weight discrepancy in the UK, the attendant will quickly type in their code and null the error without investigating it at all, move on to the next customer who's blocked with the same thing, and repeat. Seems to totally void the point of having these anti-theft systems.


Agree, the experience could be _significantly_ improved with this assumption of user behaviour. Would be interesting to do a study to see how many people actually take advantage of the "non-weighing" systems.

