Look at the comments, even in this thread, defending Sam. I thought this could be dealt with by the courts. But it clearly needs stronger medicine; there is simply too much hubris. Congress may need to act; let’s start with states’ AGs. (Narrow clarifications on copyright and likeness rights, to start. Prosecution from there. The low-hanging fruit is anyone in Illinois who was enrolled in WorldCoin.)
We, as a community, have been critical of crypto. AI is different—Altman is not. He is a proven liar, and this episode demonstrates the culture he has created at a multibillion-dollar firm that seems destined, in the long run, for—at best—novel conservatorship.
We have a wider problem. It isn’t “alignment.” It’s 19th-century fast-talking fraudsters clothing themselves in the turtlenecks of dead founders.
We aren't a community. Please stop this constant attempt to create tribes and then (presumably) become the tribe's representative. We are individuals with different opinions. The individuals on HN are more amenable to reason than those in the average discussion; persuade them toward a way of thinking.
Of course we are. That doesn’t coerce anyone into groupthink, and it doesn’t mean everyone is a part of it. I’m simply asking anyone who held one opinion to find harmonics with another.
HN definitely is a community, in some sense of the word, but I also don't like the phrasing and emphasis on "we" having done something or "we" having a problem and whatnot.
HN promotes debate and individual opinions, so statements like the parent's just carry bad connotations.
> don't like the phrasing and emphasis on "we" having done something
Fair enough. I used two “we”s.
In the first—where “we, as a community, have been critical of crypto”—I juxtapose this community’s broad scepticism of crypto with its willingness to give the benefit of the doubt to AI figures who resemble crypto’s worst.
In the second—where “we have a wider problem”—I intended to reference Silicon Valley at large. I did not intend to imply that Sam Altman is HN’s problem to solve. (Though I welcome the help, if you have sway in D.C. or your state capital. I am not sure I have the bandwidth to take on a political project of this magnitude.)
> I did not intend to imply that Sam Altman is HN’s problem to solve.
Considering that Sam Altman was formerly the president of YCombinator, and Hacker News is financed by this company, it would be true hacker spirit (as in Hacker News) to subversively use Hacker News to "solve" the Sam Altman problem. :-D
We are definitely a community. Sadly, "community" has little meaning in the modern day compared to 30+ years ago. I live in a neighborhood, but I don't feel a sense of "community" with most of my neighbors. I have a job, but I don't have a sense of "community" with most of my coworkers.
I feel as if this is a community. I certainly participate in it as if it were one, and I feel I have certain personal responsibilities as a member of that community.
That does not mean that everyone is on the same page, though. Probably the only thing everyone agrees on is that this is a valuable personal resource.
If this were groupthink, I'd go somewhere else. BTDT. Got the T-shirt.
Sam Altman is a shameless weasel. The guy has no sense of decency. He has been caught red-handed a few times already; what does it take to unseat him? And to whoever defends Altman out of love for ChatGPT: it wasn’t Altman who was behind it, it was a large team that is now dissipating away. Altman just wants to enrich himself at your expense.
> it wasn’t Altman who was behind it, it was a large team that is now dissipating away
Credit where it is due: he corralled and inspired a difficult team. He oversaw the release of ChatGPT. His team wanted him back, because they believed he would make them rich. On that promise, he delivered.
Nobody has committed high crimes in this saga. But the lying has become endemic, from Altman to the culture. The situation screams for oversight, perhaps—most proximately—via consent decree.
> Nobody has committed high crimes in this saga. But the lying has become endemic
This is why honesty is valued. If you look and sound squeaky clean, it's likely that you are honest and trustworthy. If you blatantly steal a famous actress' voice then lie about it, what else will you steal?
The part you’re missing (IMO) is that he’s in that position because folks want him to ‘steal’ for their benefit, and are overlooking that he is likely going to steal to their detriment too.
It’s the logic behind the ‘you can’t con an honest man’ thing.
Good question! I don't really know, but you can tell it isn't money; he admitted that himself anyway, and reportedly he has no equity in openai[0].
My optimistic hypothesis is that he really wants to control AGI because he believes he can make more efficient use of it than other people. He might even be right, and that's what scares me, because I don't trust his measures of efficiency.
I'd rather not let my pessimistic fantasies run wild here.
1) more money is not necessarily a goal for these people - it's what they want the money for, and why they believe they can spend it better than everyone else (regardless of whether they truly can)
2) in a post-AGI world money may be an obsolete concept
A proxy, yes. But not everyone leverages it that way, so it really depends. Some do just want to hoard as much as possible, others want to lobby, others want fame, others want legacy.
Money helps with all those, but will not passively do that stuff.
It is Sam and it's not Sam; it's that attitude that "no" is just "no right now", an attitude that wears people down against their will.
I've been working long enough that serious ageism would kick in if I said how long, but in this situation my decades of experience point at Sam as toxic. All it takes is one idiot like Sam to destroy the work of their entire organization. That "no for now" attitude destroys far, far more than the issue they are pushing; it destroys basic trust.
I don't want to defend Altman. He may or may not be a good actor. But as an engineer, I love the idea of building something magical, yet lately that's not straightforward tinkering - unless you force your way - because people raise all sorts of concerns that they wouldn't have 30 years ago. Google (search) was built on similar data harvesting and we all loved it in the early days, because it was immensely useful. So is ChatGPT, but people are far more vocal nowadays about how what it's doing is wrong from various angles. And all their concerns are valid. But if openai had started out by seeking permission to train on any and every piece of content out there (like this comment, for example) they wouldn't have been able to create something as good (and bad) as ChatGPT. In the early search days, this was settled (for a while) via robots.txt, which for all intents and purposes openai should be adhering to anyway.
But it's more nuanced for LLMs, because LLMs create derivative content, and we're going to have to decide how we think about and regulate what is essentially a new domain, and how it fits into existing legislation. Until that happens, there will be friction, and given we live in these particular times, people will be outraged.
That said, using SJ's voice given she explicitly refused is unacceptable. It gets interesting if there really is a voice actor that sounds just like her, but now that openai ceased using that voice, the chances of seeing that play out in court are slimmer.
Google search linked to your content on your site. It didn't steal your content, it helped people find it.
ChatGPT does not help people find your content on your site. It takes your content and plays it back to people who might have been interested in your site, keeping them on its site. This is the opposite of search, the opposite of helping.
And robots.txt is a way of allowing/disallowing search indexing, not stealing all the content from the site. I agree that something like robots.txt would be useful, but consenting to search indexing is a long, long way from consenting to AI plagiarism.
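For what it's worth, that kind of split is already expressible in robots.txt, assuming crawlers honor their published tokens (GPTBot is OpenAI's documented crawler name; Google-Extended is Google's AI-training token). A minimal sketch:

    # Opt out of AI-training crawlers while keeping ordinary search indexing.
    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    # Everyone else (e.g. regular search crawlers) may index the site.
    User-agent: *
    Disallow:

Whether that counts as consent or is merely a polite request is the whole debate, of course: nothing forces a crawler to read the file at all.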
>Point is we couldn't have a way of consenting to ai training until after we had llms.
Sure we could have. Even if we're talking web 1.0 age, Congress passed a law for email as early as the early '00s (CAN-SPAM), which is why every newsletter has to have a working unsubscribe link. So it's not impossible.
Regardless, consent is a concept older than the internet. Have an option of "can we use your data for X?" and a person says yes/no. It's that simple. We can talk about how much we cared 30 years ago, but to be frank, that mistrust is all on these tech companies. They abused the "ask for forgiveness" mentality and proceeded to make the internet nearly impossible to browse without an ad blocker. Of course people won't trust the scorpion one more time.
As a contrast, look at Steam. It pretty much does the same stuff on the inside, but it reinvests all that data back into the platform to benefit users. So people mind much less having a walled garden for their game library (and will even go to war for it). Short-sighted, but I understand where it comes from.
> But if openai had started out by seeking permission to train on any and every piece of content out there...
But why would anyone seek permission to use public data? Unless you've got Terms and Conditions on reading your website or you gatekeep it to registered users, it's public information, isn't it? Isn't public information what makes the web great? I just don't understand why people are upset about public data being used by AI (or literally anything else. Like open source, you can't choose who can use the information you're providing).
In the case being discussed here, it's obviously different, they used the voice of a particular person without their consent for profit. That's a totally separate discussion.
>why would anyone seek permission to use public data?
First of all, it's not all public data. Software licenses should already establish that just because something is on the internet doesn't mean it's fair game.
Even if you want to bring up an archive of the pre-lawsuit TOS, I'd be surprised if it hadn't been mostly the same TOS for decades. OpenAI didn't care.
>Isn't public information what makes the web great?
No. Twitter is "public information" (not really, but I'll go with your informal definition here). If that's what "public information" becomes, then maybe we should curate for quality instead of quantity.
Spam is also public information, and I don't need to explain how that only makes the internet worse. And honestly, that's what AI will become if left unchecked.
> Like open source, you can't choose who can use the information you're providing
That's literally what software licenses are for. You can't stop people from ignoring your license, but breaking it leaves them wide open to lawsuits.
The right to copy public information to read it does not grant the right to copy public information to feed into a for-profit system to make an LLM that cannot function without the collective material that you took.
That's the debatable bit, isn't it? I will keep repeating that I really don't see a difference between this and someone reading a bunch of books/articles/blog posts/tech notes/etc etc and becoming a proficient writer themselves, even though they paid exactly zero money to any of these or even asked for permission. So what's the difference? The fact that AI can do it faster?
If people used the correct term for it, "lossy compression", then it would be clearer that yes, there is definitely a line past which systems like these are violating copyright, and the only questions are:
1. where is the line at which lossy compression starts violating copyright?
2. where are systems like chatgpt relative to that line?
I don't think it's unreasonable to answer (1) by saying that even an extremely lossy compression can violate copyright. I mean, if I take your high-res 100MB photo and downsample it to something much smaller, losing even 99% of it, distributing that could still violate your copyright.
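To make the photo analogy concrete, here is a minimal sketch using Pillow (the file names are hypothetical); throwing away ~99% of the pixels still leaves a recognisable, and plausibly infringing, copy:

    from PIL import Image

    img = Image.open("original_100mb_photo.jpg")  # hypothetical source file
    w, h = img.size
    # Keep 1/10th of the width and height, i.e. only ~1% of the pixels.
    thumb = img.resize((w // 10, h // 10))
    thumb.save("lossy_copy.jpg", quality=60)  # lossier still via JPEG quality
    # Distributing lossy_copy.jpg could still violate the photographer's
    # copyright, despite the ~99% information loss.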
Again, how is that different than me reading a book then giving you the abridged version of it, perhaps by explaining it orally? Isn't that the same? I also performed a "lossy compression" in my brain to do this.
> is that different than me reading a book then giving you the abridged version of it, perhaps by explaining it orally?
That seems like a bad example; I think you are probably free even to read the book out loud to me in its entirety.
Are you able to record yourself doing that and sell it as an audiobook?
What if you do that, but change one word on each page to a synonym of that word?
10% of words to synonyms?
10% of paragraphs rephrased?
Each chapter just summarized?
The first point, which seems easier to agree on, isn't really about the specific line; it's a recognition that there is a point such a system crosses where we can all agree it is copying. The interesting question is then where the boundaries of the grey area are (i.e. the points on that line where we agree it is and isn't copying, with a grey zone between them where we disagree or can't decide).
In one case, you are doing it, and society is fine with that because a human being has inherent limitations. In the other case, a machine is doing it, with a different set of limitations that gives it vastly different abilities. That is the fundamental difference.
This also played out in the Street View debate - someone standing in public areas taking pictures of the surroundings? No problem! An automated machine driven around every single street by a megacorp? Big problem.
There's an unstated assumption that some authors of blog posts have: if I make my post sufficiently complex, other humans will be compelled to link to my post and not rip it off by just paraphrasing it or duplicating it when somebody has a question my post can answer.
Now, with AIs, this assumption no longer holds, and people are miffed that their work won't lead to engagement with their material, and the followers, stars, acknowledgement, validation, etc. that come with that?
Either that or a fundamental misunderstanding of natural vs. legal rights.
- a human will be an organic visitor who can be advertised to; a bot is useless
- a human can one day be hired for their skills; an AI will always be under the control of some other corporate entity
- volume and speed are a factor. It's the buffet metaphor: "all you can eat" only works as long as it's a reasonable amount for a human to eat in a meal. Meanwhile, a bot will in fact "eat it all", and everyone loses
- lastly, commercial use matters for humans and bots alike. Even as a human, I cannot simply rehost an article on my own site, especially if I pretend I wrote it. I might get away with it if it's just some simple blog, but if I'm pointing to Patreons and running ads, I'll be in just as much trouble as a bot
> I really don't see a difference between this and someone reading a bunch of books/articles/blog posts/tech notes/etc etc and becoming a proficient writer themselves
Tangential, but I should note that you in fact cannot just apply/implement everything you read. That's the entire reason for the copyright system. Always read the license, or try to find a patent, before doing anything commercially.
To me it's more like photocopying the contents of a thousand public libraries and then charging people for access to your private library. AI is different because you're creating a permanent, hard copy of the copyrighted works in your model, vs. someone reading a bunch of material and struggling to recall it.
> Google (search) was built on similar data harvesting and we all loved it in the early days, because it was immensely useful. So is ChatGPT, but people are far more vocal nowadays about how what it's doing is wrong from various angles.
Part of that is that we've seen what Google has become as a result of that data harvesting. If even basic search engines are able to evolve into something as cancerous to the modern web as Google, then what sorts of monstrosities will these LLM-hosting corporations like OpenAI become? People of such a mindset are more vocal now because they believe it was a mistake to have not been as vocal then.
The other part is that Google is (typically) upfront about where its results originate. Most LLMs don't provide links to their source material, and most LLMs are prone to hallucinations and other wild yet confident inaccuracies.
So if you can't trust ChatGPT to respect users, and you can't trust ChatGPT to provide accurate results, then what can you trust ChatGPT to do?
> It gets interesting if there really is a voice actor that sounds just like her, but now that openai ceased using that voice, the chances of seeing that play out in court are slimmer.
It's common to pull things temporarily while lawyers pick through them with fine-toothed combs. While it doesn't sound like SJ's lawyers have shown an intent to sue yet, that seems like a highly probable outcome; if I were in either legal team's shoes, I'd be pulling lines from SJ's movies and interviews and having the Sky model recite them to verify whether or not they're too similar - and OpenAI would be smart to restrict that ability to their own lawyers, even if they're innocent after all.
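If one were actually scoring similarity rather than eyeballing it, the usual approach is to compare speaker embeddings. A sketch under loose assumptions: embed_voice() below is a hypothetical placeholder, not a real API, and any actual speaker-embedding model would have to be swapped in.

    import numpy as np

    def embed_voice(wav_path: str) -> np.ndarray:
        """Placeholder: map a recording to a fixed-size speaker embedding.
        NOT a real library call; swap in an actual speaker-embedding model."""
        raise NotImplementedError("plug in a real speaker-embedding model")

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Near 1.0 means near-identical direction in embedding space,
        # i.e. very similar-sounding speakers.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # sj = embed_voice("sj_interview_line.wav")   # hypothetical recordings
    # sky = embed_voice("sky_same_line.wav")
    # print(cosine_similarity(sj, sky))

None of that settles the legal question, of course; courts weigh intent and likeness, not cosine scores.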
As an engineer, I find the current state of LLMs just uninteresting. They basically made a magical box that may or may not do what you want if you manage to convince it to, and there's a fair chance it'll spout bullshit. This is like the opposite of engineering.
In my opinion, they're extremely interesting... for about a week. After that, you realise the limitations, and good old-fashioned algorithms and software with some semblance of reliability start to look quite attractive.
> Look at the comments, even in this thread, defending Sam. I thought this could be dealt with by the courts. But it clearly needs stronger medicine; there is simply too much hubris.
The courts are a lagging indicator.
This is the entire "move fast break things" play, it's legislative arbitrage. If you can successfully ignore the laws, build good will in the public, and brand yourself in an appealing way, then the legal system will warp around whatever you've done and help you pull the ladder up behind you.
On the other hand, if you break a bunch of stuff that people actually like and care about, politicians might feel like they need to do something about that. If you burn all your good will such that both political parties agree that you're a problem, that is when the play can go sideways.
> Look at the comments, even in this thread, defending Sam.
The comments overwhelmingly aren't defending Sam specifically so much as pointing out that the voices don't really sound similar enough to definitively conclude theft from a specific person. Having heard a couple of samples, I'm inclined to agree with them.
> We, as a community, have been critical of crypto.
"We" as a community have come to no such consensus. Speak for yourself.