Hacker News

But the CC license requires attribution (and, in some variants, prohibits commercial use). It's not clear to me that IBM followed these requirements--presumably a trained neural network counts as a derived work, so far as the CC license is concerned.



I wouldn't be too sure.

On the one hand, US Copyright is very explicitly attached to a piece of work, not the facts or ideas contained in it. Per 17 USC 102:

> In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.

You are therefore free to create works of your own that analyze facts contained in others' copyrighted works, comment on their ideas, and so on. This is always true if you don't include any of their copyrighted work, and often true, via fair use, if you only include the small pieces needed for your commentary. Accordingly, it seems pretty clear that you could analyze a huge collection of copyrighted portraits and do whatever you want with the results (distribution of hair colors, eye positions relative to nose, how these vary within/across individuals, and so on).

The counterargument would appear to be that derivative works include the "abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted." Neural networks do seem to store their training data, at least in some form, and there's a fuzzy line between extracting some facts from data (which is fine) and omitting data to create an abridged version (which isn't).

I wouldn't want to bet much either way, but I do think it would be a little odd for copyright to limit how much you can "learn" from something, either manually or via machine.


>presumably a trained neural network counts as a derived work

I just spent about 15 minutes trying to confirm that and... I have no idea. I suppose it's not surprising that a software engineer would not be able to suss that out in 15 minutes. Every definition I find tends to focus on art, visual and auditory creations.

Disregarding a legal interpretation (you know, the thing that actually matters), I can see it either way. Certainly the model is based on data derived from the characteristics of these images. On the other hand, if I saw e.g. a shade of blue in one of these images that I liked, would I need to provide attribution if I measured it and used it in my own work? I have no idea; I suppose I'm just thinking out loud here. I do understand that taking something to a logical extreme (the color example) is not the be-all and end-all of legal arguments.


Isn't the purpose of a ML model to describe actual facts, derived mechanically and not creatively, and thus might not be subject to US copyright in the first place?


It may not constitute a new copyright (like a database copyright on a new collection of stuff copyrighted by others), but that doesn't invalidate existing copyrights and the license requirements under which the copyrighted material was used.

But I'm not a lawyer, so no idea how the interactions are in that situation. The closest analogy I could think of is sampling in music: how much must the original works be atomized before they don't count anymore?


But you could argue that the weights of the model wouldn't be what they are without the copyrighted work. Since their model does use the work, a part of it was "derived" from that piece.


This isn't about the various senses the English word "derived" can have; it's about the specific meaning of the legal term "derivative work." That would start with including major elements of another work and those elements being subject to copyright (e.g. in the US, book titles are not). There's no point in talking about how one could argue that something wouldn't be what it is without something else; that's not the legal basis at all. In no legal sense are the Indiana Jones movies derivative works of the old 1930s serials just because they were sources of inspiration.


Being subject/able to copyright, complying with the license, and not infringing on the original work are separate issues


This is nuanced enough it would need to be tested in court. I suspect we won't really be sure of the answer for at least another 20 years. Computers have existed for 80 years and we're still debating if APIs are copyrightable.


Almost certainly a trained neural network would be classified as a "collection" under copyright. Interestingly, I suspect that this collection is not a derived work of the individual photos because the photos are not actually included in the trained neural net. Instead the network collects facts about the photos that allow it to recognise the faces.

Not a lawyer and I bet you could get a case to go to court arguing otherwise, but this is my guess on what the result would be.


I guess you can use the trained neural network to generate (hallucinate) images, and those images will depend on the original training images, and may contain features of those images.

I suppose that even if you consider the neural network as a black box, you can generate images that bear some resemblance to the training data in some indirect way. For example by walking along a gradient of the output with respect to changes in the input.
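To make the "walking along a gradient" idea concrete, here is a minimal sketch. It assumes a toy stand-in for a network (a single logistic unit with made-up weights, not anything from the dataset or IBM's models) and just nudges the input uphill on the output. All names and numbers here are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model_output(weights, bias, x):
    """Toy 'network': a single logistic unit over a flat input vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def input_gradient(weights, bias, x):
    """Gradient of the output w.r.t. the input: s * (1 - s) * w_i."""
    s = model_output(weights, bias, x)
    return [s * (1.0 - s) * w for w in weights]

def ascend_input(weights, bias, x0, step=0.5, iterations=200):
    """Hold the weights fixed and move the input along the gradient."""
    x = list(x0)
    for _ in range(iterations):
        grad = input_gradient(weights, bias, x)
        x = [xi + step * g for xi, g in zip(x, grad)]
    return x

# Hypothetical weights standing in for whatever training produced.
weights = [2.0, -1.0, 0.5, 0.0]
bias = -1.0
start = [0.0, 0.0, 0.0, 0.0]

result = ascend_input(weights, bias, start)
```

In this toy case the resulting input ends up pointing in the same direction as the weights, so whatever the training data imprinted on the weights shows up in the generated input. Whether that kind of indirect resemblance matters legally is a separate question.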


Even with the color example, there are lots of restrictions on the use of certain colors for certain uses. For example, a paint store may not be able to mix a certain shade of paint that is (somehow) legally “owned” by a specific brand.


I thought that had to do with either trademark (e.g. logo for a competing company) or the method by which a color is manufactured (assuming it is not simple/obvious.) I don't believe you can literally lock down a color with no exceptions.


This is exactly right.

First, you can't copyright a color, period. You can trademark a color, but that trademark only applies to a very specific use of that color. For instance, you can paint your house or non-delivery-service business in "UPS Brown" without fear, but you couldn't use it in conjunction with a delivery service.

The purpose of trademark is to eliminate customer confusion, where they may think they're doing business with one entity when in fact they are doing business with another. Non-confusing uses of trademarks are legal.


> a paint store may not be able to mix a certain shade of paint that is (somehow) legally “owned” by a specific brand.

A paint store can legally mix paints with those colors, unless you've told them that you're going to be using the color for a trademark-infringing purpose.

Stores may voluntarily decline to mix trademark colors to avoid even the possibility of a lawsuit, but it isn't legally required.


TFA mentions contacting the Flickr users who uploaded the images, so for all we know, they were attributed appropriately. (Not all CC licenses are CC-BY anyway.)

The problem of treating a neural network as a derived work is exactly why IBM said they wouldn't use the dataset in their products. Instead, they'll likely train various different networks, note which ones performed best, write a paper about it and throw the trained networks away. So long as they do that, they're not infringing on anyone's copyright.


The CC licenses where created before using this kind of content for machine learning was a widely known possibility, probably should be rewritten to create a new type of license where you can disallow this kind of usage.


Are you sure on that timing?

The CC non-profit was set up in 2001 according to Wikipedia, yet people were doing things like face detection in the 80s, which would have required using images of faces.


2001 was before mobile phones with cameras. It was years before big social networks. Years before the selfie became common. There simply weren’t that many images of faces on the web, comparatively speaking. As Creative Commons arose, the idea of harvesting an enormous amount of ordinary people’s faces was not something people were thinking about.


Yeah, I'm pretty sure of that timing, hence my usage of the phrase 'widely known'.

Sure, there were face detection projects before 2001, with much lower quality results, and maybe not even familiar to the creators of CC.

Controlling a fetus' DNA is just getting ramped up, but of course there has been a good deal of academic knowledge of the possibilities beforehand. Do you think there may be some laws or contractual formats worked out in the last few years that might apply in the area, yet have not adequately taken the implications into consideration?

Time's arrow being what it is I expect there are.


huh, I just noticed I wrote where instead of were.



