
Even if using LLaMA turns out to be legal, I very much doubt it is ethical. The model got leaked while it was only intended for research purposes. Meta engineered and paid for the training of this model. It's theirs.



Did Meta ask permission from every user whose data they trained their model on? Did all those users consent (and by consent I mean a real meeting of minds, not something buried on page 89 of a EULA) to Meta building an AI with their data?

Turnabout is fair play. I don't feel the least bit sorry for Meta.


They don't ask permission when they're stealing users' data, so why should users ask permission before stealing theirs?

https://www.usatoday.com/story/tech/2022/09/22/facebook-meta...


But it doesn't copy any text one-to-one. The largest model was trained on 1.4 trillion tokens, if I recall correctly, but it has only 65 billion parameters. (I believe they use 16 bits per token and per parameter.) It seems to be more like a human who has read large parts of the internet but doesn't remember anything word for word. Learning from reading was never considered a copyright violation.
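To put rough numbers on that, here's a back-of-the-envelope sketch using the figures above (the 16-bits-per-token-and-parameter assumption is the commenter's recollection, not an official spec):

    # Can 65B parameters store 1.4T tokens verbatim?
    # Assumptions from the comment above: 2 bytes per parameter, 2 bytes per token.
    params = 65e9
    tokens = 1.4e12

    model_gb = params * 2 / 1e9   # ~130 GB of weights
    data_tb = tokens * 2 / 1e12   # ~2.8 TB of tokenized training data

    print(f"weights: {model_gb:.0f} GB, training data: {data_tb:.1f} TB")
    print(f"data-to-model ratio: {data_tb * 1000 / model_gb:.0f}x")  # ~22x

So even under generous assumptions, the training data is roughly 22 times larger than the model itself, which makes wholesale verbatim storage implausible.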


> It seems to be more like a human who has read large parts of the internet but doesn't remember anything word for word. Learning from reading was never considered a copyright violation.

This is one of the most common talking points I see, especially when defending AI "learning" from the style of artists and then replicating that style. On the surface we can say it's similar to a human learning an art style and imitating it. But that implies the program functions like a human mind, and as far as I know the jury is still out on that; I doubt we even know exactly how a human mind "learns" (I'm not a neuroscientist).

Let's say, for the sake of experiment, I ask you to cut out every word of Pride and Prejudice and keep them all sorted. Then, when asked to write a story in the style of Jane Austen, you pull from that pile of snipped-out words and arrange them in a pattern that most resembles her writing. Did you transform it? Sure, maybe; if a human did that, I bet they could even copyright the result. But a machine that takes those words and phrases and applies an algorithm to generate output, even with stochastic elements, retains direct backwards traceability (albeit through a 65B-parameter convolution), which means the essence of the copyrighted material has been directly translated. A toy version of the cut-and-arrange process is sketched below.
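To make the thought experiment concrete, here is a minimal sketch of that cut-and-arrange idea as a word-level Markov chain. This illustrates the traceability argument only, not how LLaMA actually works; a transformer learns continuous weights rather than storing raw word snippets:

    import random
    from collections import defaultdict

    # The "pile of snipped-out words": the opening line of Pride and Prejudice.
    text = ("It is a truth universally acknowledged that a single man in "
            "possession of a good fortune must be in want of a wife")
    words = text.split()

    # Record which word follows which -- every entry is a snippet of the source.
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)

    def generate(start, n=12):
        out = [start]
        for _ in range(n):
            choices = chain.get(out[-1])
            if not choices:
                break
            # Stochastic, but every choice traces directly back to the original.
            out.append(random.choice(choices))
        return " ".join(out)

    print(generate("a"))

The output is "new" text, yet every word is mechanically traceable to the source, which is exactly the property being argued about.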

From what I can see, we can't prove the human mind is strictly deterministic, but an AI very well might be in many senses. So transferring non-deterministic material (the original) through a deterministic transform has to root back to the non-deterministic source (the human mind, and therefore the original copyright holder).


LLaMa was trained on data of Meta users, though.


I was sleepy, I meant to say that it WASN'T trained on data of Meta users.


I feel like almost everything about these models gets really ethically grey (at worst) very quickly.


It's an index of the web and our own comments, hardly something they can claim ownership of, let alone resell.

But OTOH, by preventing commercial use, they have sparked the creation of an open source ecosystem where people build on top of it because it's fun, not because they want to dig a moat and fill it with sweet VC $$$.

It's great to see that ecosystem being built around it, and soon someone will train a fully open source model to replace LLaMA.


What did they train it on?


On partly copyrighted text. Same as you and me.


Meta as a company has shown pretty blatantly that they don't really care about ethics, nor the law for that matter.



