
> To build LAION, founders scraped visual data from companies such as Pinterest, Shopify and Amazon Web Services — which did not comment on whether LAION’s use of their content

Pinterest, "their content"... not sure I agree with that but if we're going to use the logic that things saved to Pinterest by it's users becomes Pinterest's content then isn't LAION doing the same thing and the content becomes LAION's content when it's saved to their database of images...




There is a T&C agreement between Pinterest and its users. There is no T&C agreement between LAION and Pinterest's users.


There's also no agreement between Pinterest and the actual copyright owners of most of their content; so much art is reposted there without even credit or a link to the source.

Plus, LAION is just an index, whereas Pinterest actually hosts the content.


Worse, Pinterest even ranks highly in Google, often higher than the original.

If you went and did the same now, creating a website that rehosts known content, Google would simply delist you.

But magically, not Pinterest; they get a boost instead. I've always wondered what kind of under-the-table deal makes this possible.


I've never understood the policy either, especially because it actively makes Google Images worse.


If LAION republishes people's copyrighted content, that sounds like a pretty blatant copyright violation (edit: it's not; see below). Sounds like all the artists unhappy that their art is being used to train these AI systems should be talking to LAION to have their content removed from the dataset.

Edit: Apparently LAION doesn't republish the content, only the metadata, so it's not a copyright violation. Still, it would be nice if they got permission or offered a way for artists to be excluded from the data set.


>Apparently LAION doesn't republish the content, only the metadata

"LAION datasets are simply indexes to the internet, i.e. lists of URLs to the original images together with the ALT texts found linked to those images."

I'm surprised link rot doesn't make this a big problem.


There are many Twitter datasets that are already 50% gone. It's very bad for reproducibility.
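
For a sense of scale, here is a minimal sketch of how one might measure link rot in a URL-plus-ALT-text index like this; the CSV layout and the "url" column name are assumptions for illustration, not LAION's actual schema:

    import csv
    import requests  # third-party HTTP client: pip install requests

    def measure_link_rot(index_path, timeout=5, limit=1000):
        """Return the fraction of sampled image URLs that no longer resolve."""
        alive = dead = 0
        with open(index_path, newline="", encoding="utf-8") as f:
            for i, row in enumerate(csv.DictReader(f)):
                if i >= limit:
                    break
                try:
                    # HEAD request: we only care whether the image is still reachable.
                    resp = requests.head(row["url"], timeout=timeout,
                                         allow_redirects=True)
                    if resp.ok:
                        alive += 1
                    else:
                        dead += 1
                except requests.RequestException:
                    dead += 1
        total = alive + dead
        return dead / total if total else 0.0

    # Example: print the share of dead links among the first 1000 rows.
    # print(f"{measure_link_rot('index.csv'):.1%} of sampled URLs are gone")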


> Still, it would be nice if they got permission or offered a way for artists to be excluded from the data set.

It obeys robots.txt, and the user-agent is documented.
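
For what it's worth, here is a small sketch of what honoring robots.txt looks like mechanically, using Python's standard-library parser; the user-agent string below is a placeholder for illustration, not the crawler's actually documented one:

    from urllib.robotparser import RobotFileParser

    # Placeholder user-agent; substitute the crawler's documented string.
    USER_AGENT = "ExampleImageBot"

    def is_fetch_allowed(image_url, robots_url):
        """Return True if the site's robots.txt permits USER_AGENT to fetch image_url."""
        parser = RobotFileParser()
        parser.set_url(robots_url)
        parser.read()  # download and parse the site's robots.txt
        return parser.can_fetch(USER_AGENT, image_url)

    # A site that wants out would add a "User-agent: ExampleImageBot" /
    # "Disallow: /" block to its robots.txt, and this check would return False.
    # print(is_fetch_allowed("https://example.com/art/123.jpg",
    #                        "https://example.com/robots.txt"))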


robots.txt generally only addresses permission to crawl, not permission to reproduce the content.


Why does Pinterest get away with republishing people's content? Shouldn't artists be suing Pinterest for its blatant copyright violations?


You have to use a DMCA takedown notice if a user posts your stuff without permission.

https://policy.pinterest.com/en/copyright


I assume they have all users click on a license that has a clause allowing them to use all images uploaded however they want, buried deep within the text somewhere.


You can read here (in German): https://www.alltageinesfotoproduzenten.de/2023/02/20/laion-v...

that the non-profit LAION is going so far as to intimidate the creators of the images they used.

They don't publish the image, yes, and that is also their reasoning. However, in my opinion, the intention behind all of this seems obvious. They circumvent copyright claims by being a non-profit and not publishing, but with the clear intent that the image will be used to train some system further down the road. LAION appears to be a key player in how these text-to-image models dodge the copyright bullet.


Intimidating? They mention that if you sue them on copyright grounds when they do nothing copyright-related, they are able to claim damages. Which seems pretty fair, as the claims by the photographers are clearly in bad faith.

Links are still mostly legal in Germany. If the link points to something that is itself illegal, that's a different situation, and it's different from "hey, I own the copyright to my images, don't link to them!"


We all know the purpose for which the images linked in the dataset will be used. "We are a non-profit organization and provide only a link" is akin to taking people for fools. Why can't these systems be trained exclusively on images for which people have given consent or for which money has been paid?


> is akin to taking people for fools.

I very much disagree.


Because it's expensive. It's the usual "buy low, sell high" story, unless forbidden.


Yeah, they are used in a transformative capacity.


> They don't publish the image, yes, and that is also their reasoning.

No, their reasoning is that they only keep a link to the image so there’s nothing to remove.

A more interesting angle would be copyrightable ALT tags in the form of poems or other creative content. But the cat is already out of the bag, as it's probably easy enough to strip poetry or other copyrightable material out of alt tags with the technology we've got at this point.


> No, their reasoning is that they only keep a link to the image so there’s nothing to remove.

I'm well aware of it, and that's also what I wanted to say. However, as I mentioned, the dataset with the links (to copyrighted material) is provided for machine learning purposes. Saying "We only provide links" is a lazy excuse from a guy with a smirking face.


> Still, it would be nice if they got permission or offered a way for artists to be excluded from the data set.

If artists don't want their work to impact the world, they're free to keep it to themselves.

This whole discussion that we should allow individual artists to opt out of AI art through contracts or some other legal vehicle is a non-starter, because it'll be impossible to administer and enforce at scale, and there's too much incentive and ability for big tech to just ignore them and steamroll artists. They aren't a unified bloc, and even if they were, how would they ever compete against big tech?

So what to do? Looking at productivity gains over the decades, it's not clear why we are still working as hard as we are. It's long overdue that productivity gains should come back to the people. Maybe "artist" shouldn't be a job title associated with profit/income seeking. If you want to be an artist, maybe society can support that.

Maybe instead of using all those productivity gains to do more, more, more, we can just work less for the same. Because it seems to me the more we work, the richer they get. What if instead they didn't get so rich, and we gave that money to artists in the form of grants, like we do for scientists? You do some art, apply for some grants, and you get some money to do more art. It'll all be public domain, anyone can use it, and big business gets to make a profit on it just like with scientific advancements (I have issues with that, but at least there's precedent).


I think there's plenty of room for a finer-grained permission framework and clear demand for one. I give it about a year, maybe 1.5, for all the interested parties and advocates to align behind a standard and make it happen via big-company enforcement and the occasional copyright lawsuit (or, more commonly, enforcement via TOS).

Once that happens, of course, the "black market" data aggregators won't care, but they're small and their product can't be used by legitimate channels, so it can't compete in the mainstream market of ideas. What capital will do to screw over artists, though, is... pay them. Once a framework for AI seed rights-granting is in place, a hundred or a thousand legit artists can produce enough AI seed-feed to legally supplant the work of hundreds of thousands.

There will still be room for the artist-as-celebrity with a unique style that makes their art worth owning as much for the fact that it came from them as for the content of the canvas or the sculpture. But a huge, guaranteed-work, bread-and-butter space for the visual artist, advertising and entertainment media asset creation, will dry up as companies backfill their art needs with functionally-free-to-them, mass-generated, close-enough assets (advertising in particular is going to be full of this... Remember when "head first" photos of people were a thing for a while? Look forward to trends like that, over and over again, forever).


> all the interested parties and advocates to align behind a standard

This is the thing that will never happen, because all artists are interested parties, and as I said, they are not a unified bloc. So whatever solution big companies come up with for themselves, we all know ahead of time that it will 1) overwhelmingly benefit big corporations and 2) be insufficient to address artist concerns. When they coalesce around whatever standards they end up with, artists will still largely be making the same complaints.


Too true. I should probably have said "All parties with enough political and capital clout to make trouble for other parties." The disorganized masses are disorganized and usually don't end up with a seat at the table if they don't organize.


It's copyright infringement all the way down.



