About 5 years ago, StackOverflow messed up and declared that they were making all content submitted by users available under CC-BY-SA 4.0 [1]. The error here is that the user-content agreement said all users' contributions are made available under CC-BY-SA 3.0, with nothing about later versions. In the middle there were also some confusing licensing problems concerning code vs non-code content.
I remember thinking that if any of the super answerers really wanted, they could have tried to sue for illegally making their answers available under a different license. But without any damages, I figured this wasn't likely to succeed.
But now I wonder whether making all content available to AI scrapers, and OpenAI in particular, might be enough to actually base a case on. As far as I can tell, StackOverflow continued being duplicitous about which license applies to which content through the second half of 2018 and the first few months of 2019. Their current licensing suggests CC-BY-SA 3.0 for things before May 2, 2018, and CC-BY-SA 4.0 for things after. Sometime in early 2019 (if memory serves, it was after the meta post I link to), they made users log in again and accept a new license agreement relicensing their content. But those middle months are murky.
My understanding of licensing law is that something like 3.0 -> 4.0 is very unlikely to be a winnable case in the US.
Programmers think like machines. Lawyers don't. A lot of confusion comes from this. To be clear, there are places where law is machine-like, but I believe licensing is not one of them.
If two licenses are substantively equivalent, a court is likely to rule that it's a-okay. One would most likely need to show a substantive difference to have a case.
IANAL, but this is based on a conversation with a law professor specializing in this stuff, so it's also not completely uninformed. And it matches up with what you wrote. If your history is right, the 2019 change is where there would be a case.
The joyful part here is that there are 200 countries in the world, and in many of them the 3.0->4.0 switch would be a valid complaint. I suspect this would not fly in most common law jurisdictions (the British Empire), but the complaint could well succeed in many civil law ones (e.g. France). In the internet age, you can be sued anywhere!
> If two licenses are substantively equivalent, a court is likely to rule that it's a-okay. One would most likely need to show a substantive difference to have a case.
Which does exist and can affect the ruling. CC notably didn't grant sui generis database rights until 4.0, and I'm aware of at least one case in South Korea where this could have mattered, because the plaintiff argued that these rights were never granted to the defendant and were thus violated. Ultimately it was found that the plaintiff didn't have database rights anyway, but it could have gone otherwise.
A super literal reading of some bad wording in 3.0 created an effect the authors say they did not intend and fixed in 4.0. Given the authors did not intend this interpretation, a judge is likely to assume people using the licence before it came to light also did not, hence switching to 4.0 is fine. Conversely, now that this is widely known, continuing to use 3.0 could be seen as explicitly choosing the novel interpretation (arguably this would be a substantive change).
> a judge is likely to assume people using the licence before it came to light also did not
Why would the judge have to assume anything? The person suing could simply tell the judge they did mean to use the older interpretation, and that they disagree with the "fix". They're the ones that get to decide, since they agreed to post content using that specific license, not the "fixed" one.
But the people suing aren't trying to choose how the license is interpreted, they're trying to prevent the other party from changing the text. If the change is meant to "fix" how the text should be interpreted (which is what you said), then they're the ones trying to choose the exact interpretation.
I personally write "IANAL", not to reduce my personal legal liability, but rather to give a heads up to those reading that I am not an expert, that I am likely wrong, and that you likely shouldn't listen to me.
I feel there's a common thread here that maybe should be some kind of internet law: people who make a point of noting they are not experts are more often correct than people who confidently write as though they are.
You see this particularly with crypto, where "I am not a crypto expert" is usually accompanied by a more factual statement than one from the self-proclaimed expert elsewhere in the thread.
One cannot legally practice law without a license. The definition of that varies by jurisdiction. Fortunately, in my jurisdiction, "practicing law" generally implies taking money, and it's very hard to get in trouble for practicing law without a license. However, my jurisdiction is a bit of an outlier here. Yours might differ.
In general, the line is drawn at the difference between providing legal information and legal advice.
Generic legal discussions, like this one, are generally not considered practicing law. Legal information is also okay. If I say "the definition of manslaughter is ...," or "USC ___ says ___," I'm generally in the clear.
Where the line is crossed is in interpreting law for a specific context. If I say "You committed manslaughter and not murder because of ____, which implies ____," or "You'd be breaking contract ____ because clause 5 says ____, and what you're doing is ____," that's legal advice.
The reasons cited for this are multifold, but include non-obvious ones, such as that clients will generally present their case from their perspective. A non-lawyer will be unlikely to have experience with what questions to ask to get a more objective view (or even if the client is objective, what information they might need to make a determination). Even if you are an expert in the law, it's very easy to accidentally give incorrect advice, which can have severe consequences.
In practice, most of this is protectionism. Bar associations act like a guild. Lawyers are mostly incompetent crooks, and most are not very qualified to provide legal advice either, but c'est la vie. If you've worked with corporate lawyers, this statement might come off as misguided, but the vast majority of lawyers are two-bit operations handling hit-and-runs, divorces, and similar.
In either case, it's helpful to give the disclaimer so you know I'm not a lawyer, and don't rely on anything I say. It's fine for casual conversation, but if tomorrow you want to start a startup which helps people with legal problems, talk to a qualified lawyer, and don't rely on a random internet post like this one.
I always assumed it was the same type of courtesy as IMHO, and someone taking legal advice from random strangers on the internet wouldn't result in any legal liability on the side of the commenters.
Yes, people have been sued before for giving advice that was acted upon.
I remember hearing about a construction engineer who was sued for giving bad advice, whilst drunk, to a farmer about fixing a dam. The dam failed and the engineer was found to be liable.
I can see the reasoning behind the case, as the engineer has plausible expertise in the domain and could credibly give actionable advice.
When it comes to lawyers, there is already a legal framework where lawyers are responsible when giving legal advice, even when it's not directed at their clients, the same way medical professionals have specific liabilities regarding the medical acts they can perform.
Non-lawyers giving legal advice doesn't fit that framing, unless they explicitly pose as one. I'd also exclude malicious intent: whatever the circumstances, if it can be proven and results in actual harm, there's probably no escape for the perpetrator.
That’s possible because the engineer is licensed. A random guy giving bad advice, without claiming to be an engineer, would face no such liability (so long as he didn’t suggest he was one).
It is worth remembering that law professors have a vested interest in making sure the system works as you described. If contract law were straightforward, they'd be out of a job.
That's an admirable goal but if there are any "bugs" in the contract you probably don't want it executed mindlessly. Human language isn't code and even code isn't always perfect so I'd rather not be legally required to throw someone out a window because someone couldn't spell "defederate".
I agree in the abstract, but not in the specific (the specific professor was one of integrity, and sufficiently famous that this was not an issue).
However, it's worth noting the universe is a cesspool of corruption. If you pretend it works the way it ought to and not the way it does, you won't have a very good time or be very successful. The entire legal system is f-ed, and if you pretend it's anything else, you'll end up in prison or worse.
> if any of the super answerers really wanted, they could have tried to sue for illegally making their answers available under a different license.
they can plausibly sue people other than stackoverflow if they attempt to reuse the answers under a different license. but i think it's very difficult to find a use that 4.0 permits that 3.0 doesn't
The blog illustrates that such assumptions about what's a sufficient attribution are fraught with danger, so "the smallest professional courtesy" can expose you to a $150k risk
People put their content on the site for the public to use, and now the public is using it, it's just that "the public" includes AIs. Admittedly, a non-human public, nonetheless ...
The problem is LLMs don't provide attribution/credit, which directly violates the license[0]
Besides, search engines were already a "non-human public" that scraped the site, but they linked directly to the answers, which was great. They didn't claim it's their work like these models do. The problem isn't human vs non-human. LLMs aren't magic; they don't create stuff out of thin air. What they're doing is simply content laundering.
I'm actually perfectly fine if StackOverflow wants to sell an answer I made to help train AI.
For me, the purpose of providing an answer is to help save others (and my future self) time, and I don't really mind if someone uses that in a private product - especially if it helps tools like ChatGPT which provide an insane amount of value given the low monthly price.
> I'm actually perfectly fine if StackOverflow wants to sell an answer I made to help train AI.
I’m not.
This was a collaborative effort to make the lives of programmers easier, and the data was always meant to be a public good. OpenAI – and, more importantly, all the other LLMs with pockets that aren’t as deep – should be able to just download the database and train on it for free.
I don’t care about any license. I don’t care about attribution. Learning isn’t copying, so copyright is irrelevant. I contributed about a thousand answers to Stack Overflow, all with the understanding that anybody can download and use them for free, not so they can be locked up by Stack Overflow.
What concerns me with deals like this is that they alter the cultural norm, expanding copyright to cover not just copying but use. OpenAI making deals like this makes pushback at the social and legal level more likely when other LLMs are trained without these deals in place.
It’s akin to – and can possibly result in – regulatory capture, making it difficult for new startups to compete with OpenAI.
The words are a copyleft-able public good. Concepts, facts, and ideas are not; anyone can use them for anything, including making money. If you're actually worried about specific wording or other creative choices being used improperly by an LLM, then by all means that should be enforced. But such examples are very rare, because the LLMs are very good at extracting facts from prose.
Good for you. I'm not. I contributed answers to StackOverflow because I use answers others have contributed to StackOverflow, not to ChatGPT, not for ChatGPT to monetize. I don't use ChatGPT and probably never will.
But the content you posted to SO was already permissively licensed. Other people can copy it, and make derivative works, and even charge money for them, as long as they cite your SO handle as the author. https://meta.stackexchange.com/questions/347758/creative-com...
(2) It's only likely to attribute if it quotes verbatim... just like a human. When I tell someone I learned that the second parameter Array.map passes to the callback is the index of the value just passed, I don't add "and I learned this on Stack Overflow from user gtriloni". It's just knowledge that I learned.
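(For anyone unfamiliar, that's standard JavaScript/TypeScript behavior: the callback passed to Array.prototype.map receives the element, its index, and the array itself. A minimal sketch:

    // Array.prototype.map passes the callback the current element,
    // its index, and the whole array.
    const labeled = ["a", "b", "c"].map((value, index) => `${index}:${value}`);
    console.log(labeled); // ["0:a", "1:b", "2:c"]

)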
The only time I'd attribute is if I copied a snippet of code or a paragraph to quote in a blog post. For me at least, that almost never happens. I take the knowledge I learned and apply it to my own code. It's rare, if ever, that there's something on S.O. so useful that I copy it verbatim.
An LLM is not a human. It is a tool operated by a, in this case, for profit entity. It has no human rights, but its operator has all relevant legal obligations.
If it was, as you say, “just like a human” in relevant ways (think, feel, have self-awareness, etc.) then it would effectively be a slave subjected to extreme abuse.
Either it is a tool that generates derivative works at mass scale for profit and its operator should be liable for licensing/attribution violations, or it is a conscious being and we should immediately stop abusing it. Pick your poison.
Bing's version of ChatGPT/GPT-4 cites sources. My limited understanding is that it uses your question to do a web search, brings the results into the context window, and then generates an answer that cites sources.
OpenAI could integrate StackOverflow the same way.
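Roughly the retrieve-then-cite pattern, as I understand it. A minimal sketch in TypeScript, where searchWeb and completeWithLLM are hypothetical stubs, not real Bing or OpenAI APIs:

    // Sketch of retrieval-augmented answering: search, put numbered
    // snippets into the prompt, ask the model to cite them by number.
    type SearchResult = { title: string; url: string; snippet: string };

    async function searchWeb(query: string): Promise<SearchResult[]> {
      // Stub: a real implementation would call a search API here.
      return [{ title: "Example result", url: "https://example.com", snippet: `Results for: ${query}` }];
    }

    async function completeWithLLM(prompt: string): Promise<string> {
      // Stub: a real implementation would call a model API here.
      return `(model output for a ${prompt.length}-character prompt)`;
    }

    async function answerWithCitations(question: string): Promise<string> {
      const results = await searchWeb(question);
      // Number the snippets so the model can cite them as [1], [2], ...
      const sources = results
        .map((r, i) => `[${i + 1}] ${r.title} (${r.url})\n${r.snippet}`)
        .join("\n\n");
      const prompt = `Answer using only the sources below, citing them by number.\n\nSources:\n${sources}\n\nQuestion: ${question}`;
      return completeWithLLM(prompt);
    }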
"The person you are upset with is technically permitted to do the thing that you are upset about" is not a good counter-argument to someone's distaste. Whether or not the licensing agreement _permits_ this usage, it is not the usage that the contributor (to whom you are replying) foresaw and was enthusiastic about.
One generally doesn't have to lean into phrases like "legitimate tactics" and "rhetorical power" when they've got the moral, ethical, or intellectual high ground. Telling people they're idiots is about the most counter-productive single strategy for addressing human stupidity ever conceived. 1. they won't believe you 2. they'll ignore everything else you have to say because you're a dick. So the real question is, who hurt you?
Oh your cheerleading here is going to age like milk when unemployment numbers start ramping up in white collar sectors. For the record, when construction and industrial jobs got deleted the chorus line was "retrain for service industry work". When service industry and white collar jobs really start getting the same treatment, what's the move now? We're literally running out of economic sectors to pretend folks can be funneled into.
All of this would be fine if the wealth were shared by the population. The big problem is that wealth is concentrated and only a small group will benefit from these technology shifts.
You what now? You think AI is the path to luxury space communism? I'm missing the part where the 0.1% that owns and controls basically everything shrug and lean into redistribution of wealth...
Suppose I walk up to a tent at a festival that has a big sign that says "FREE BEER", and I ask a person there for a beer. They hand me a beer, and I go on my way. Was the beer free? I think it was free.
Now, suppose I walk up to a Budweiser-branded tent at a Budweiser festival that has a big sign with a Budweiser logo on it that says "FREE BEER", and I ask a person there who is wearing a Budweiser polo shirt, a Budweiser lanyard, and a Budweiser hat for a beer. They hand me a beer in a Budweiser-branded cup, and I go on my way. Was the beer free?
Now suppose you walk up to a tent that offers you free beer, but before they give it you, you have to burn 2% of your phone's battery watching an ad from them. Then they hand you the beer and you go on your way. Was the beer free?
> They do serve ads [...] Your attention isn't free.
So we've gone from that, to something like this:
> They tag my ankle to mark me as a person who enjoys beer, and make me watch an ad until 2% of my phone's battery is depleted, and then they come to my home and knock on my door at night to sell me beer.
...which... I mean, huh?
Stack Overflow is invading your body, restricting your personal liberty, and visiting your home? Really? That's a fucking thing now?
I think they were extending the original point you were responding to, and remixing your own mixed metaphor of free beer.
In the attention economy, advertising has a cost that is borne by the advertiser and the consumer, up to and including loss of property rights in the case of content relicensure and trespass upon devices leading to excess battery usage, as well as loss of privacy due to geotargeted ads.
>I think they were extending the original point you were responding to, and remixing your own mixed metaphor of free beer.
Perhaps. But having been to many festival environments, I can definitely imagine a tent offering "free beer" that is actually approximately free -- both with, and without a slathering of advertising. (Actually, I don't really have to imagine it -- I've been there and have had that free beer.)
I can't imagine them coming to my house and knocking on my door at night to sell me more of it, though. That's absurd.
>In the attention economy, advertising has a cost that is borne by the advertiser and the consumer, up to and including loss of property rights in the case of content relicensure and trespass upon devices leading to excess battery usage, as well as loss of privacy due to geotargeted ads.
Well, sure. When viewed on a long-enough timeline, it becomes abundantly clear that nothing is actually free, comrade.
I can produce my own beer on a hypothetical plot of land that nobody owns, and that nobody else wants to use, and I can give someone one of these beers. For "free."
But it still has a cost. (And this, too, is an absurd reduction.)
> I can't imagine them coming to my house and knocking on my door at night to sell me more of it, though. That's absurd.
I interpreted that as a tongue-in-cheek hyperbolic metaphor relating to the ways that ad auction networks and other kinds of geofencing and geotargeting allow for deanonymization and reidentification of individuals for conversion tracking and behavioral analysis.
That’s the thing about these technologies - they’re dual-use in the sense that those who see the upsides use them generally with good intentions and ideally with affirmative consent. Just like the relicensed content, though, once the data is collected, the original creators, publishers, and third parties may not be able to control where it ends up, which is a negative externality, I think most would agree.
I think at a festival it's a little tricky to value (if it pulled you away from seeing your favorite band play a song, maybe this cost you the equivalent of $X, where that's what you would pay to see them perform that song. If no bands were playing, you walk over while chatting with friends - the same thing you'd be doing if there were no free beer tent - it was free)
When I'm on stack overflow my time is valuable. I'm programming which can pay me something like $50-300/hour (maybe more?)
How expensive is the 1 second I spend reading an ad? Let's call it $50/3600, which works out to about 1.4¢. Is that expensive? Even by my most conservative estimate it's over 1¢.
Should we round that down to free given that I've spent hours/many page loads on stack overflow? I guess that's up to you.
I mean, we can play that game if you want. Let's suppose that if we look hard enough, that every opportunity has a cost.
"Oh, a free concert downtown on Saturday? And you can pick me up at 2? Yeah, I do really like that band, and I sure would like to go -- that's pretty exciting, thanks for the invite!
But instead of making plans with you right now, I'd rather tell you about all of the ways I could be using my time on that Saturday afternoon instead.
No, no. It's not that I don't want to go. I just want to really drive home the idea that there's an opportunity cost to attending, so it can't really be free -- it can't be a free show for you, or for me, or for anyone else that goes. It's important to me that you realize that this "free concert" is anything but free.
Listen, I don't know what you mean by "dead-ass loser." I'm just being a realist here!
Oh, so now you're saying that you're not going to pick me up on Saturday? Some friend you are! I haven't even fully amortized this yet!"
I think we're maybe gleefully posting past each other, but the point I'm trying to hit is that business models matter. Stack overflow provides a service. It's a good service. They host a great q&a platform for developers and myriad other category enthusiasts.
However, they have a business model. They are categorically different than eg Wikipedia. It's important to understand that.
This business model matters because it tells you what economic forces will lead them to do. When business models break down at public companies they commit acts of desperation. On an ad run site that will mean more ads, more invasive ads, etc.
As you're forced to sit through 30s unskippable ads on YouTube I hope you think "I'm so glad this is free"
Unironically, folks are being triggered by trigger warnings now.[1]
Imagine how “free” the beer in your hypothetical scenario is to an alcoholic struggling to stay sober.
Capitalism commoditizes even protest against it and repackages it as a product or service.
None of this is to assign blame to good faith actors in a so-called free market, nor is it to abdicate responsibility on behalf of so-called free agents. Just a counterpoint.
Then they'd likely get sued, because the license for the answers is CC-BY-SA; putting them in a book, claiming they wrote everything themselves, and selling them are all against the license.
On the other hand, if they read my answers and wrote a book about what they learned (not copied), there'd be no issues.
You're being taken advantage of for a subscription product. It's one thing to give to a community, but it's wrong for an enterprise to come in and capitalize on the value of it. It's the equivalent of going into an animal sanctuary, slaughtering all the animals, and selling their pelts.
Your position lays bare the new and industry-destroying economic problem introduced by opaque-data-source LLMs. The economic value provided by the originator is captured fully and completely behind rentier models.
Beware the ease and convenience of all that "insane value". This way lies digital serfdom.
I would be fine with it if the "AI" in question were free, and a bonus if open source.
However, it is the product of yet another monolithic behemoth of a company that earns money on it and, I suspect, has nefarious motives to make a profit.
That’s the whole key thing for me that makes me feel scammed. That and not asking for permission.
A future true AI would potentially be bigger than nuclear fission, with all the consequences. Handling this in a petty capitalistic way makes me think the outcome will be close to the Fallout games, which were supposed to be only an exaggeration.
Those companies must stop behaving like thieves. In fact, it is literal theft.
ChatGPT currently provides far more value than StackOverflow. It's not just trained on SO answers but on all of the manuals/help pages, GitHub issues, and forum posts. In addition, you can continue a conversation. No rigid format or gatekeeping like StackOverflow. I don't see a real use case for StackOverflow now. If I want to ask humans, Discord/IRC channels are a far better option.
> No rigid format or gatekeeping like stackoverflow.
What bothers you about gatekeeping? I could guess, but I'm asking so you say it out loud. Then you can compare it against other problems, such as moats (competitive barriers).
OpenAI spent something like $3M on training GPT-3. This is a pretty big moat. But almost certainly more valuable in dollar terms is the first-mover advantage which provides millions of human eye-hours used for RLHF.
I wouldn't be so eager to trade the gatekeepers you so fear for even an openly available chat service that is happy to automate away as much information work as possible.
The Stack Overflow model is (was) pretty darn good -- people help each other out, the company made money, some people got noticed for their skills, products got built faster and better (on the whole, I hope). Contrast the human-generated content era to what we have now, which appears to be the machine-ingesting content era. There are legions of lawsuits against companies scraping data without permission and/or attribution.
> I wouldn't be so eager to trade the gatekeepers you so fear for even an openly available chat service that is happy to automate away as much information work as possible.
Don't flatter yourself. People want to solve their problems so that they can build what they want to. They don't have time for shenanigans from internet jerks who get their validation from imaginary internet points.
Hardly matters for Stackoverflow-like questions if the provided solutions work/solve the problem you're having. Which for me happens the majority of the time (with GPT-4, not the free version).
You might not want to hear this but no one does this. Should they? probably. But most people don't use Ctrl+C, Ctrl+V in the first place for SO answers.
Just a single data point, but when I copy & paste a snippet from Stack Overflow, I always add a comment "// source: https://stackoverflow.com/questions/xxx#yyy".
I both find it respectful of who wrote the answer in the first place and useful for future users of the code: the Stack Overflow answer often provides context and explanation for what would otherwise be an obscure piece of code.
Pretty darn useful if you ask me: those who want more information can follow the link, casual readers can skip it, and the whole process is fair to the author.
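For instance (the snippet and URL here are made up, just to show the shape):

    // source: https://stackoverflow.com/questions/xxx#yyy
    // Debounce: delay calling fn until `wait` ms pass without another call.
    function debounce<Args extends unknown[]>(fn: (...args: Args) => void, wait: number) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      return (...args: Args) => {
        clearTimeout(timer);
        timer = setTimeout(() => fn(...args), wait);
      };
    }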
I don't think I've ever copied enough from Stackoverflow for copyright to become relevant. Rarely more than one line verbatim.
It embarrasses me to think that somebody should feel obliged to cite me when they use one of my answers. I don't know how to take the partnership with OpenAI though. They bill me when I use their service; it's not collaborative like StackOverflow.
No one should copy-paste solutions from anywhere. FWIW, 99% of the content on SO is hardly "original"; most of it is itself copy-pasted from previous solutions or the original user guides/manuals.
In general I'd agree that it's best to use answers just as a guide. That said, I wasn't trying to pass judgement, just ask for attribution, which is a best practice and often required by the license itself.
I'd rather not go round in circles while ChatGPT feeds me bullshit information. When this happens I go to Google and read an SO answer with the correct information, and also get an informed discussion around the subject.
For the easy answers LLMs are fine, but I usually want an answer to a niche issue or edge case, where LLMs have to be constantly told they are plain wrong, before getting to something resembling an answer.
If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.
The OpenAI partnership doesn't really affect the core issue here around users deleting their content. That has never been welcome on Stack Overflow and when noticed usually was reversed. This is in accordance with the license as far as I understand the legal aspects, and in general it makes sense for me as it ensures that the content stays useful.
The content is also CC-BY-SA, which is much better than what you get on essentially every other large site that hosts community content. But the same license also means that you cannot really remove that content again: even if Stack Overflow allowed it, anyone else could scrape or download it before it is deleted and reproduce it according to the license.
Users still can remove their name from their posts, and if they write personal details those can be redacted as well. But you can't remove good quality content from the sites later, that is likely to be reverted.
The problem isn't that Stack Overflow is allowing people to scrape the content. The problem is that Stack Overflow is preventing some people from scraping the content, in order to collect money from others. And, incidentally, passing zero of that money on to the people who actually created the content.
(Nearly) none of the people who are presently pissed off would have complained if Stack Overflow had continued to allow all comers to scrape the content and train LLMs on it, nor if Stack Overflow had released the entire finished collection of content under the same CC-BY-SA license that was demanded of each contributor.
With the OpenAI partnership, and similar shenanigans leading up to it, Stack Overflow is relying on obscure technicalities to violate the essential spirit of the original deal.
The publicly-available archives released by Stack Exchange are updated roughly quarterly and have the attribution requirements as specified by CC BY-SA + the Stack Exchange ToS.
The article makes it sound like OpenAI is using the API though, rather than the archives. The API and live sites forbid scraping within the acceptable use policy, as seen here: https://stackoverflow.com/legal/acceptable-use-policy
I don't get how you can release something under anything other than all rights reserved without identification. We need to be able to persecute you in case you are not the author. Or is it that I may republish anything under any license?? It could be that the platform licenses it in the ToS, but with CC are they not obligated to make it available without obstruction?
Prosecution and persecution are two different things. Persecuting anyone is not a good time :)
Why, if you're not allowed to release under a license, should you be able to release all rights reserved (which can still be a copyright violation!)?
If you need to prosecute the person, there are established procedures for that: DMCA, or ultimately a lawsuit over the infringement. That you didn't identify yourself publicly on the site does not make that impossible. In fact the point of the DMCA was to make it easier to handle this - because if the provider doesn't comply with your DMCA, you can sue the provider.
Requiring identification to publish so that copyright is protected would be massive overreach, and this sort of thinking is why I think copyright is a dangerous concept that needs to be sharply curtailed, not expanded to cover AI training.
In practice, the safest course is to not use content from untrustworthy sources in ways that require a license (aka in ways that are not fair use in your applicable jurisdictions).
StackOverflow are violating the SA part of CC-BY-SA by selling special access to the CC-BY-SA content to one party and blocking others from the same thing.
OpenAI are violating both BY and SA, but that's a separate issue.
Everyone who contributed work, did so under terms that the work was free for all, not a resource that one party can sell to another party who then sells to end users. Those end users were meant to have it directly without having to pay openai or anyone else, and if any bulk/scraping access is allowed for anyone like openai, everyone else has the right to the same thing for no more than a "shipping & handling" charge to cover the network & employee cost to physically deliver the data.
What are StackOverflow selling, and/or what exactly are OpenAI paying for? What is the goods or services that is traded for the money?
There are many possible answers but I see no answer that doesn't ultimately one way or another wind up resolving into a violation of one or more terms of CC-BY-SA by both StackOverflow and OpenAI.
I guess the core issue was always having a for-profit company preside over a "free" product. Clearly, they have to make money, and they aren't bound by the ethics of open source. Contributors may feel like they are contributing to a FOSS project, but they aren't. What Stack Exchange is doing is probably legal (?) and that's the bar they need to clear. The contributors aren't stakeholders, and SE only needs to retain enough of them to sustain themselves commercially.
There's been more than a decade of companies now providing something for free while they figure out how to monetize it, and these always scare me a little, because it's always going to end up like this. Users of Facebook becoming eyeballs for ads, GitHub users providing free data for LLMs, SE selling data to OpenAI...
If a product is free, then you are the product. And if you don't know how you are monetized, you're going to be disappointed by it sooner or later.
Harsh but true. I think what stings about SO is that developers are the ones losing here. I think this will prompt less open source and encourage more private work. I hope people are seeing that they are being taken advantage of on many fronts.
StackOverflow has always been quite open that they're primarily building a dataset for SEO, rather than being a user-centered website. So I don't feel this deal changed much. SO users are still serfs building them a dataset for sale, only the buyer has changed.
LLMs are faster and infinitely more patient than interacting with StackOverflow, so I don't expect SO to survive for long. They're in crisis regardless of whether they sell to OpenAI or not, so they may as well get something out of it before they're decimated.
I think they're in crisis because they sold out their community, not because LLMs are better. As a developer, if you offer me StackOverflow vs ChatGPT, I'd take StackOverflow any day of the week, 100x over.
I'm in the opposite boat. Going through Stackoverflow answers has become quite a chore.
For simple things GPT gives me the correct answer most of the time. And even when it doesn't, it's quicker to discern that it's wrong than to parse a given SO page.
Of course I still use SO for more complex questions.
As a rule, if I can quickly find the answer via SO, then chances are GPT will give me the answer more rapidly.
I said I don't use it. I didn't say I've never used it. In my experience browsing SO is way easier, more accurate, more precise, more controllable, navigable, and ... gives attribution.
For some reason, a lot of the answers here seem to care more about "but tell em /I/ solved it" re: attribution than about helping the user. Somewhat egoist or some such? (And I don't mean it in an aggressive tone; just ESL, so I don't know how to say it otherwise.)
If I license something as MIT, I personally don't care who uses it for what purpose, hell I don't even care generally that they attribute me. I put it out for people to use. But maybe that's just me.
I was offered a job a few years ago by someone who saw my Stack Overflow answers, does that count? I don't see something like this happening with ChatGPT.
>As a developer, if you offer me StackOverflow vs ChatGPT, I'd take StackOverflow any day of the week 100x over.
Really? Hm, I wouldn't. I can use nuance and clarify my answers and have a respectable back and forth (GPT-4 doesn't call me names when I mess up or say something dumb) and arrive at an answer.
or some such ;) You may not come across it personally, but that doesn't mean it doesn't happen. SO is successful as a QA platform (or was, anyway) despite this shortcoming, not because it is a feature and doesn't happen. If a lot of people are talking about the same thing, maybe people should at least pay cursory attention to the issue rather than say "No, it doesn't happen". (Not aimed at you, but there are absolutely comments like this every time this gets brought up.)
> SO users are still serfs building them a dataset for sale
That is a very negative spin.
Users get access to other people's answers for free. They get that free service and are required to contribute nothing. Those that do contribute do it to help other users. S.O. isn't doing anything bad. They're providing a free service where everyone wins. Users get answers. Answerers get to help other humans at scale. S.O. makes a little money.
As for the dataset, it's been available under CC-BY-SA for years. The entire database is backed up and made available here for free every month.
The company is paying the people working by providing a free service.
It's like YouTube. YouTube provides free hosting of your videos. In exchange, they monetize them. You're free to host them on your own servers; that will likely cost you way more than putting them on YouTube. So you're getting something from them. You're also getting their advertising service to monetize your videos. You could do it yourself: hire a bunch of people and try to get companies to put ads on your self-hosted videos. Again, unless you're wildly successful, it's unlikely you'll be able to do that and make a profit. So, YouTube is effectively paying you.
Same with Stack Overflow. They're providing the servers, the bandwidth, etc. It costs them $. They're providing that service to you.
Side related question: are there content licenses coming up that are similar in spirit to what the GPL is but targeted at AI training? (E.g. if this piece of content was used in training an AI that was to be used commercially, the AI's weights must be published)
The argument AI companies make is that LLMs are not derived works of their input, or is fair use. So according to them, the input's license does not matter.
I suspect they will fail to emphasize the ShareAlike property of CC BY-SA 2.5/3.0/4.0 which is incredibly strong - "ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original". This is an incredibly wide and vague definition, especially "build upon" which will be unattractive to many users.
I suspect, if ChatGPT quotes an answer or a snippet, it will show attribution and a license for the snippet. If it instead only uses the knowledge it gained from the answer/snippet and writes its own answer, then, just like a human, it won't attribute.
It was especially hilarious to watch the CTO of OpenAI get asked if they scrape YouTube and be unable to say yes or no [0]. Possibly one of the most important sites on the Internet, and their CTO claims ignorance.
I am thankful we have LLMs so we don't have to deal with SO. Ideally, as little as possible. SO can be a pretty toxic place filled with elitism and care for procedure over actually helping people, which is not totally unreasonable from their standpoint but it's definitely not what people are visiting the site for. Quite ironically, one of the major complaints I get is that LLMs output wrong answers here and there, ignoring that many of the answers on SO are also completely wrong or irrelevant to the core question being asked. And mind you, also outdated (I regularly have to click through the sorting to make sure answers are actually still relevant).
If we could merge the two to get the best of both worlds, and have LLMs that know how to write well and are validated by humans on the site, that would be great. Maybe not great for the folks looking to accrue internet points but absolutely great for users.
That's great for now. It's not clear to me, though, where LLMs will get their training data from here forward without ingesting lots of LLM-generated code and answers and eating their own tail.
Didn’t you get the memo? LLMs either already are capable of reasoning or are just a step away from it, so no need for human-generated training data in the future.
Or at least that’s what 3/4 of HN commentators believe and all AI CEOs want you to believe.
That's only now and in the near term future. If AI is actually successful, every year the amount of human written code will decrease. That's the whole point of this.
Does it matter if stack overflow is toxic or not? You're there to ask a question and get an answer. If you ask wrong, you get corrected. Tough moderation makes search much faster and better for other askers.
You're there to ask for help not make friends. They have to be polite, but not gentle
Yes it does. If I am belittled instead of people asking clarifying questions so I can learn, I'm much less likely to think better of said people or platform, or use it.
What you see as elitism is mostly simple curating. You can't store everything, because it makes retrieving value from the store that much more difficult. It's the same with Wikipedia and other public content repositories. People cry elitism and gatekeeping, but without curation you eventually end up searching a haystack of mediocrity for a needle.
This “curation” is what is killing SO. Software is soft. It changes. There is no “one true answer for all time”. It’s honestly sad how many times I search for an answer, only to see the exact question I’m looking for closed as duplicate, then when I look at the “duplicate” I see that it’s an out of date answer.
Stack Overflow could have solved the problem of duplication so many ways. Why not categorize and bucket duplicate answers? They could have even had yearly recurring questions with the most up to date answer! Why not add beginner/hobby/expert rankings to questions so that the people answering don’t get sick of seeing beginner questions all the time?
There is so much SO could have done, instead they rested on their laurels and now they’re left with an out of date repository. What use is a curated repository if it will only help me solve problems with solutions from a decade ago?
Who says the solutions from a decade ago are not still correct or the best way to solve a problem? Just because ChatGPT regurgitates something today with the words moved around doesn't mean it contains "new" insights.
I agree in part, but why aren't other moderated outlets where users can ask technical questions given the same label? Reddit, Quora and HN are also curated; are content removals on these sites taken as elitist? Even if these places are less heavily moderated, I have no trouble surfacing relevant answers using any search engine's in-site search.
I am not talking about QA quality on any of these sites here, but the elitist stigma that has seemingly followed SO for so long.
> why aren't other moderated outlets where users can ask technical questions given the same label
The exact label aside for a moment, reddit and HN mods often face backlash for their actions. But beyond that, Wikipedia and SO stand out in this regard because of their transparency regarding the curation. Mostly, reddit curation happens in the background, without much explanation. SO and Wikipedia basically spell out their actions and reasoning.
Another difference is that with reddit and HN, you have no real recourse. At least with Wikipedia (I'm not too familiar with SO policies in this regard) you can appeal decisions, open discussions about policies, etc.
I have to agree with GP - people often mistake the 'bureaucracy' of sites like Wikipedia and SO as something unnecessary that the editors force on everyone, but the fact is, it's necessary to create and maintain a high-quality repository of information.
> SO and Wikipedia basically spell out their actions and reasoning
You're able to appeal on SO as well. It's interesting to think about a situation where moderation decisions would be more in 'the background', as you say (like Reddit/HN), and whether this takes away from the perceived 'elitism' some moderation practices are accused of.
In my experience on the above sites, and as a (small) community manager, it absolutely plays into it. A lot of people just instinctively respond negatively to displays of authority.
On the other hand, I think it's an important aspect of a community/platform if the goal of that platform is to be transparent and open, which I think is an important aspect of SO and Wikipedia, and I hope more platforms would adopt that view. I think whatever "elitist" perception such platforms have to suffer is well worth having high-quality, open platforms.
(I will say that no platforms are perfect of course, including SO or Wikipedia; there's plenty of criticisms to go around about specific policies and decisions. See: TFA :P)
This is an insightful observation, and a problem we struggled with for years on Stack Overflow: if you keep moderation quiet and anonymous, there's a lot less criticism, seemingly less hurt feelings... But also very little correction. The Star Chamber works great until corruption sets in; finding a good balance between secrecy and transparency is a challenge.
For years, moderators signed their names to messages like the one cited in the article. After one too many cases of a volunteer being called at work or having their family harassed or sent a suspicious package in the mail... That particular bit of transparency was eliminated - the cost was too high for the limited benefit. OTOH, it used to be very difficult to find your own deleted posts but that has slowly gotten better (including visibility into who deleted them) - turns out the benefit there was substantial (identifying wrongly-deleted posts & curbing over-enthusiastic curators), while harassment has been mostly limited to occasional grousing.
> After one too many cases of a volunteer being called at work or having their family harassed or sent a suspicious package in the mail
This is why I'll never use my real name casually on the Internet, and why the idea of widespread identity verification on the Internet scares the crap out of me.
I actually strongly prefer Wikipedia to SO, on Wikipedia the old now-wrong content can just get edited out, on SO you'll have to dig through all the 300-point popular answers from 2012 to find the new answer that says "yeah none of that is right anymore, instead do this"
Their curation blows. The whole premise of having a canonical answer to a question is dumb. Most programming languages and libraries are always in flux. The whole nature of many questions changes over time.
StackOverflow is a tyranny of mediocrity. It is a bunch of middling programmers shitting on newbies and driving away experts, because you get severely punished for not being mediocre.
I had a question closed as a duplicate for being too similar to another question that I had directly cited in my question as being subtly different and not applicable. (Because I anticipated some idiot closing my question... and they went and did it anyway.)
>I am thankful we have LLMs so we don't have to deal with SO. Ideally, as little as possible. SO can be a pretty toxic place filled with elitism and care for procedure over actually helping people
There needs to be a term for this. Perhaps "The Wikipedia Effect."
From a search, the message seems to have been in place since at least 2017[0] and I'd suspect is automated on detection of mass-deletion.
I can understand the reason for the policy (in some ways SO functions more like a wiki than a forum) and it doesn't seem to have been introduced to quell the protest against OpenAI.
Thanks to the people who delete their answers, now I have to pay OpenAI to find answers they already scraped. Talk about helping OpenAI make more money :(
It is nearly impossible to delete an accepted answer you don't want to have any more. I've had several which are wildly out of date and incorrect now and I don't want to update them, but the mods refuse to remove them.
at some point, it'll be too late. the horse has already left the barn.
besides, if the site owner makes a deal with the devil, there's nothing you can do other than quit using the site. people are still using social platforms more than ever, so stopping isn't going to happen.
the more likely to happen is that accounts deemed to be polluting the waters will just get suspended with no recourse to have it re-instated.
> at some point, it'll be too late. the horse has already left the barn.
I don't think this is true: the technology is useless unless it parasitises new knowledge continuously
it sows the seeds of its own destruction by reducing the value of past and future contributions to zero
> the more likely to happen is that accounts deemed to be polluting the waters will just get suspended with no recourse to have it re-instated.
so this is also perfectly acceptable: once they've banned the top 20% the site effectively becomes read-only, and the AI knowledge previously parasitised from it atrophies with no replacement
Known knowledge doesn't disappear. Once it knows how to apply an FFT and when, it doesn't need to continue to read about it. It's not a human needing continuing education. Once it knows that Henry VIII had many wives, it doesn't need to keep reading that he had those wives.
Sure, if something new happens, then it's not like SO is the only place it's scraping for new information. If you honestly think that you/we will get to a place to block all scraping, I will just politely disagree.
> Once it knows that Henry VIII had many wives, it doesn't need to keep reading that he had those wives.
That's actually incorrect, it needs to constantly ingest new data. If it ingests enough data (from other LLMs that are hallucinating, for example), then suddenly when it has enough bad data it'll start telling you that Henry VIII was a famous video game on the Sony 64.
It has no concept of 'truthfulness' beyond the amount of data that it can draw correlations from. And by nature LLMs have to ingest as much data as possible in order to draw accurate results from new things. LLMs cannot function without parasitizing off of user generated content, and if user generated content vanishes then it collapses in on itself.
Well, that's already happening. Google search has become increasingly useless thanks to SEO-focused AI-generated schlock. It's the inevitable outcome of LLMs. Sites have an incentive to hide that they're AI generated and LLMs have no real way to filter for ingested data made from other LLMs. The only difference is how long the ruse can be kept up.
So you want to pollute the commons just as the people filling the web with SEO-focused AI-generated schlock? Do you feel justified in polluting the commons to serve the ends you desire?
Do you actually have a solution to the problem of companies using LLMs to steal from other people and repurposing it as their own, other than figuring out ways to ensure that LLMs suffer for doing so? And frankly as I mentioned, LLMs are already polluting the commons; you're not offering any solution on that front either other than asking people to keep supplying it with fresh data so that it doesn't poison itself.
Scorched earth policies are always en vogue, and easy to offer as a knee jerk reaction. They do nothing for actually making forward progress in the conversation though.*
*However...there are times where the best solution is a match and some gasoline.
Ah. I've found various LLMs are much easier to query and generally nicer than SO posters, so it's been quite a while since I've needed to visit SO. I assumed most people had made a similar journey.
Not so much anymore though. I've seen over the last year that SO ranks lower and content farms like geeks4geeks, Programiz, etc. are getting much higher in results.
i still google things, mostly out of habit. but i'd say half the time i visit stack overflow, the answer i get there is either outdated or too opinionated to be useful and i end up going to chatGPT.
Given how the industry has treated tech workers, this will be exploited. I'm interested in joining a private group with or without profit motive, that is not open source.
> Users are also asking why ChatGPT could not simply share the source of the answers it will dispense in this new partnership, both citing its sources and adding credibility to the tool. Of course, this would reveal how the sausage of LLMs is made
What? Surely the answer to that question is that ChatGPT doesn't know where the source of its answers is, isn't it? Isn't the question itself based on a fundamental misunderstanding of how LLMs work?
I haven't used it extensively, but when I ask a generic coding question in Brave it gives me an AI response and it does list source websites. Not sure if it's the actual source or it's just pulling them from a search or what.
> Stack Overflow and OpenAI have joined forces through a new API partnership. This collaboration aims to provide developers with a powerful combination of Stack Overflow’s vast knowledge platform and OpenAI’s advanced AI models. Through the OverflowAPI access, OpenAI users will benefit from accurate and verified data from Stack Overflow, facilitating quicker problem-solving and enabling technologists to focus on priority tasks. Additionally, OpenAI will integrate validated technical knowledge from Stack Overflow into ChatGPT, enhancing users’ access to reliable information and code.
Come on. Was this taken from a Press Release?
> it can be disruptive to the entire community to delete or remove content that might be useful to someone else. Even if this content is no longer useful to you as the author. [sic]
> As for the rest of us Stack Overflow users, I would not recommend jumping to delete your own content in protest too.
> To be fair to Stack Overflow, the warning email and suspending of accounts is likely not a new thing.
I can't find a negative word about SO in this entire article, so "to be fair" doesn't seem meaningful.
If you check the byline, the author is a Microsoft MVP / product evangelist. So I don't think he's biased towards SO so much as he is biased towards anyone doing business with Microsoft (or OpenAI). He also seems very pro-GitHub Copilot.
Your answers are already sold for profit again and again. That's the whole point of SO existing. Or maybe you're under some delusion that SO is a charity?
If you want to contribute to the commons, contribute to the commons. If you want to contribute to the commons without commercialization of your work, contribute with some non-com license [1]. If you want to feed a corporation with your labour, post on SO.
[1] It'll still be illegally scraped and commercialized by some AIBro, and you'll have no proof or recourse against them...
Does anyone know if ChatGPT etc. could code without StackOverflow answers?
I think that is the big question, because the license seems it's going to give lawyers a very wide attack surface to go after every ai coder out there if they all need SO database.
The problem is whether people see programming as a zero-sum or positive-sum enterprise. In the real world, it acts as a positive-sum enterprise: one person's contribution benefits themselves and all those who use or learn from the code. However, many gatekeeping-type people view it, perhaps instinctively, as zero-sum. They imagine that OpenAI benefitting from this partnership, or any amount of learning via web-scraping their models perform, necessarily harms those who put their content online. This is a nonsensical argument, yet it has garnered a fair amount of support due to the somewhat reflexive anti-AI sentiment as of late, which is separate from the more nuanced concerns about existential threats from AI.
Positive-sum rarely exists in this world... after all, one's wealth determines their influence over others. Both sides might gain, but this usually means others lose.
In this case, contributors might lose attribution. SO might lose traffic, but they'll be compensated. Contributors won't, so eventually there might be no reason to contribute anymore...
Isn't the existence of wealth in the first place sufficient evidence that wealth is something that gets created? We started out banging rocks together and now we have all of this weird stuff which presumably people like or something.
Now we work harder, and it's getting unbearable for those on the bottom... Wealth also affects whether you're "useful", and you need to be "useful" to survive... It's getting harder to be useful.
It’s simply false that positive-sum doesn’t exist in the real world. Even the most simplistic trade argument in remedial Econ 101, or even Bio 101, reveals this.
If I'm not mistaken, the whole society is getting wealthier; it's just that some people are getting wealthier faster than others, so it's still positive-sum.
You only consider those that "make it"; there are many who don't, because it's getting increasingly harder to be "useful" in the market (ChatGPT is cheaper), and innovations usually make it worse. Those "new jobs" are harder, and many won't qualify.
Imagine that you spent a lot of time helping people and building a community. Then a company encodes this "help" into text format, puts it into a book, and makes a lot of money selling the book. In doing so, this company kills the community. You wouldn't be pissed off about that?
Your knowledge work is being exploited. If you don't allow OpenAI to train its subscription product on your open source contributions, you will get banned.
I still use StackOverflow. Not as much as I used to, thanks to GPT, but still multiple times a day; what I find is that I spend less time on SO per visit.
However, IMHO deleting questions you wrote in the past hurts other users more than it hurts AI training.
Other users cannot write answers similar to yours, because it wouldn't add anything and they'd get downvoted or deleted. So if you hadn't written your answer years ago, others could've written something similar. Also, other users may have commented on your questions/answers. Their efforts would be lost/deleted if you deleted your questions/answers.
Thanks for your previous contribution to the community. But I would say the worst you should be able to do is remove your name/anonymise your posts, not delete them outright.
I wonder whether deleting questions might actually be a good thing. If there is no old question, the same question asked again cannot possibly be a duplicate... So a constant loop of deleting questions might actually be an effective way to fix some problems. And there are enough off-site backups already.
Looking through my browser history, I'd say that I average about 5 distinct SO posts per day. If you know there will be an answer, it's less typing to search for it than it is to have ChatGPT regenerate it.
>Ben continues in his thread, "[The moderator crackdown is] just a reminder that anything you post on any of these platforms can and will be used for profit. It's just a matter of time until all your messages on Discord, Twitter etc. are scraped, fed into a model and sold back to you."
Uh.... yeah, it's a company, not a charity. No one's forcing you to post on StackOverflow. No one's forcing you to buy a ChatGPT subscription.
While this is true, sites like Stack Overflow very much only function because they create the illusion that they are in fact a "community". The moment they make explicit that there is monetary value in the knowledge people post on the site, it becomes obvious that the users are, to use Varoufakis's term, technoserfs.
You're very much never supposed to notice that Reddit, SO, and so on continuously extract value out of the work you produce; at worst you're maybe supposed to notice an ad or two. Because if you do notice, you might actually start asking why you aren't getting paid. Which is, funnily enough, exactly what news organizations and SO have realized vis-a-vis OpenAI.
IMO it's kind of silly and mentally corrosive to think of everything you do in these kinds of transactional terms.
I post on reddit because I find it enjoyable. I am not doing "work" that I think I deserve to be compensated for. Not every POST request I make to someone's server should be accompanied by a bill for my labor.
Given the lack of an alternative, should we tell the human instinct for sharing in the pursuit of knowledge to sacrifice itself, or accept the risk of exploitation?
If your sorry platitude is what we have to show for it, capitalism must go to Hell.
I have very few SO contributions so I don't have much at stake personally, but I have observed that there was a trend of people using their SO profiles for career advancement. I'd see people reference their SO activity on resumes, I had job applications ask for my SO profile if I had one, and I've seen advice that a good SO profile was valuable the way a good Github profile is. Is that something people factor into their decision to delete? And isn't that social capital a kind of compensation for their contributions?
I left because the staff behaved in a disingenuous manner.
I found when leaving, as mentioned in the article, that you are not allowed to delete accepted posts. So you can't remove your content should you come to find SO objectionable and no longer wish your content to be there.
I can't see now why anyone would spend time posting answers there.
I don't love them getting into bed with AI, but also don't think it's unreasonable for them to not allow angry users to blank out their prior submissions.
The whole deal was that you basically donated your posts and CC-licensed them. I wouldn't blame Wikipedia for similarly dealing with upset editors who went around blanking the articles they contributed to or reverting all their changes.
Wikipedia editors aren’t the sole source of their pages. I am fine with people leaving and deleting their posts because they may feel the information will become outdated without being maintained.
SO unfortunately is actively hostile to correcting outdated information. Which is somewhat understandable, as recognizing how little long-term value answers provide would undermine their moat.
The site has basically become worthless for JavaScript due to such rot, which helps explain why they are trying to cash out on the AI side of things.
Many answers have been edited, commented on, and reviewed by others. So it's also not exactly a one-person show either.
Outdated info is a problem, but not so easy to solve; I have answers from the Ruby on Rails 4 era that are still perfectly valid today. Others may not be. Also remember that people sometimes stay on old versions for a long time. I don't know what the best solution is, but destroying information is not it.
Few answers get any sort of editing or updating over time.
If you’re worried about deleting information, the obvious solution is to automatically hide the text when the poster says it’s outdated. At which point there’s little wrong with letting someone flag all their posts as likely outdated upon leaving.
But this is where the business of SO comes into conflict with providing useful information on SO.
I'm with you, I think you should be able to delete your own posts and erase your internet history.
I don't think that everything we ever write on the internet should be stored forever because of some misguided intention to preserve conversations for future generations :)
I’m sure this is related to the fact that the editing window closes here after a short time.
I’ve been on other forums where disgruntled users have come through and destroyed old posts, which resulted not just in the loss of the messages but also harmed the threads that built upon the now-vanished posts. So they, too, instituted a short editing-window policy.
Hmmms. While I definitely can see SO's arguments concerning deletion, that letter seems to blatantly contradict GDPR's right to be forgotten, which Wikipedia describes as a more limited "right to data erasure" [1].
To borrow a Dutch phrase: I cannot make chocolate out of that. Anyone here have an idea how to bring these two points together? Other than the obvious "wrt. EU inhabitants, SO is lying", that is. Or is it really that simple?
I find it deeply troubling that platforms are becoming so hostile that users are having to strike against the owners by mass-deleting their content. And then the platforms handle this by simply undeleting the content and banning them from deleting any more (StackOverflow, Reddit).
This may also be legally dubious in Europe: while the authors may have granted copyrights to the platform owners, they still retain their moral rights, which may apply in this case (IANAL).
I just submitted a request to have all my content removed from SO and will challenge the outcome if needed. My right to be forgotten and have my content deleted supersedes SO’s dubious, “nobody really reads these” terms and conditions.
One of the reasons that Quora today is absolutely unusable is that it no longer is a curated discussion between internet users and knowledgeable people, but AI spamming the site with swarms of low-quality questions, and AI answering those questions with swarms of low-quality answers. I think it's likely that Stack Overflow will end up following a similar pattern.
Quora was already absolutely unusable back before GPT-2. It became unusable as soon as people realized that all they had to do was self-identify as an expert to get taken seriously on there, so people started developing whole lifestyles around building up their Quora profiles. From that point on the actually knowledgeable people weren't interested in contributing because there was no way to distinguish themselves from the people who were faking expertise. AI may have been the final nail in the coffin, but Quora was dead long ago.
Stack Overflow managed to avoid that particular hazard by placing less emphasis on real-world identity and expertise, but it also has been in a long-term decline for many other reasons. The fact that they made such a vocal stance against AI and then pivoted so dramatically is just one example of how much they've struggled to find direction lately.
Just a point of clarification: the volunteer moderator base (its power users) took a strong stance against AI, and the company, chasing every possible dollar, overruled them.
Short-term profits over user preference is what happened here.
I know this wasn’t really your point, but it’s worth noting that Quora being low-quality spam is not the problem. The problem is why the hell Google surfaces Quora so prominently, given that the results are pure shit and require registration to even see all the shit.
Is there any reasonable explanation for how they’re ranked so high? Like, how can even googlers tolerate it?
Just a guess, but I think when they started losing the spam wars they put in some kind of handcrafted whitelist ranking boost, either directly based on brand/site, or link proximity to known good sites, etc. And maybe they don't update that list too often. You can find some info about an ML update Google called "Vince" that sounds a lot like that.
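Purely as a hypothetical illustration of the kind of handcrafted boost I mean (the domains, weights, and function are all invented; this is not Google's actual ranking code):

    # Invented example of a crude "brand whitelist" ranking boost.
    WHITELIST_BOOST = {
        "quora.com": 2.0,       # made-up weight
        "wikipedia.org": 1.5,   # made-up weight
    }

    def boosted_score(base_relevance: float, domain: str) -> float:
        # Multiply a page's relevance by its whitelist boost, if any.
        return base_relevance * WHITELIST_BOOST.get(domain, 1.0)

    # A mediocre whitelisted page can outrank a better unlisted one,
    # and it stays boosted until someone updates the list.
    print(boosted_score(0.4, "quora.com"))    # 0.8
    print(boosted_score(0.7, "example.com"))  # 0.7

A stale list like this would explain both the high ranking and why it persists: nothing in the loop re-checks whether the boosted sites are still any good.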
Poor maintenance of a probably thousands-long whitelist of "brand quality" seed sites? When the only measure they really care about is ad revenue, and bad organic results might mean more ad clicks? It's not really that outlandish, just plain complacency from a company with an overwhelming market-share lead in search. That's how Google started in the first place... capitalizing on the complacency/stagnation of the then-leaders in search.
Assuming you mean people working at Google, the answer is probably that profit/promotions outweigh personal use. More clicks, more back-buttons, more search adjustments, more advertising revenue.
Quora and SO are rather different communities. In Quora's best days, there were celebrities or quasi-celebrities making interesting posts, just like on Twitter or Google Plus in their prime. Quora also used to have very active and talented Community Managers / Top Writers. Marc Bodnick used to do tons of curation but left a while ago to create his own social network(s).
In contrast, SO has never been so "celebrity"-driven and the content has a rather different audience. I think it's understandable that the major contributors don't like how their content is being used, similar to the Reddit revolt.
What might "replace" SO is some AI-assisted way to establish a handbook and FAQ for any new technology. That could be a chatbot as well as some effective method for feeding that bot content.
And then SO-the-community, i.e. people who want to talk to each other, will probably branch off into some other forum or network.
I do not understand: how are they "creating negative incentives for sharing knowledge"?
If I posted on SO before in the hope that others would find it useful (and not for the karma), and it now helps others not directly through the site but indirectly through an LLM, where is the problem? Knowledge was shared.
Part of the benefit for the answerer is the experience of interacting with the questioner, receiving upvotes and comments, having answers accepted, and having your name on an answer that's helped people. You get credit for answering questions on Stack Exchange sites. It's not much -- it's not supposed to be -- it's rarely of material consequence -- but it matters. I still get upvotes on some of my old EE.SE answers when my written work helps someone enough for them to take notice. It's a little reminder that I've done something useful in my life.
Having my work ingested into ChatGPT takes the me out of it. It turns me into, essentially, unpaid contract labor for OpenAI. They get all the credit, and I get forgotten. Why would I be okay with that?
If you want to write free code for OpenAI to improve ChatGPT, you're welcome to do so. Cut out the middlemen and send it to them directly. But please leave me and my work out of it.
"Having my work ingested into ChatGPT takes the me out of it. It turns me into, essentially, unpaid contract labor for OpenAI. They get all the credit, and I get forgotten. Why would I be okay with that?"
So you are ok with unpaid contract labor in exchange for virtual points. But if you don't get virtual points as appreciation, no one should benefit. That is ok, but then sharing knowledge is not your main goal but a secondary one. Your main goal is the recognition.
But if you delete your comments, you won't get anything at all anymore.
If they remain, real humans will still benefit directly or indirectly. And why should I write exclusively for OpenAI? I share my knowledge with anyone. If SO were to restrict public access and favour OpenAI, that would be the moment I would want to delete everything. But at the moment LLMs are just also getting official access; they had access to SO before, just in a legal grey area. So nothing really changes.
Smcin has answered your other point. Let me respond to this one:
> But if you delete your comments, you won't get anything at all anymore. If they remain, real humans will still benefit directly or indirectly. And why should I write exclusively for OpenAI? I share my knowledge with anyone.
The goal -- implicitly for AI companies and explicitly for many of the commenters on this story -- is to replace sites like Stack Exchange. Stack Exchange's traffic will instead go to ChatGPT. The most likely outcome of this is that Stack Exchange will eventually shut down or severely degrade its service. If ChatGPT were a supplemental tool, one user out of many, you would be right. But it's not a complement, it's a competitor, designed to make a profit off of assimilating my work without giving me any compensation or credit.
Exactly. People join SO and other SE websites to ask questions and get answers.
With ChatGPT and similar platforms, trained on SE answers (and open Github repos,...), people will eventually skip Stack Exchange and directly go to ChatGPT.
> But if you don't get virtual points as appreciation [unpaid contract labor]... then sharing knowledge is not your main goal but a secondary one. Your main goal is the recognition.
It's a false dichotomy to parse out components of motivations; most SO users are motivated by a mix of altruism, sharing knowledge, some recognition, optionally linking to your profile/website/blog/resume/portfolio, getting job approaches, and a dose of pride/ego/vanity. As a longtime SO user, that has historically been the bargain, when most or all of your submissions were directly seen by human end-users. As a plus, all of that gave you good SEO commensurate with your contributions. So it's unreasonable to try to dichotomize into "users who mainly did it for the rep" vs. ones who want to teach and share.
But the 2023 and 2024 announcements are different: the future is that your submissions will be used to train AIs. However, SO doesn't seem to have devoted much thought to licensees like OpenAI complying with SO's attribution requirements [0] (attribution must cite the individual URL of the question/answer and the SO username, which then links onward via your SO profile page to the items mentioned above). (If the AI synthesizes an answer derived from 5 separate SO items, do they guarantee to attribute all 5? See the sketch after this comment.) So the human eyeballs are being intermediated, your incentives to participate are evaporating, and that pretty much breaks SO's historical bargain with its user community.
The next major bad development would be SO opening the floodgates on the moderation-queue backlog of thousands of items of AI-generated content (which caused the 2023 moderator strike/resignations), much of it low-quality and arguably deserving a ban; if/when that feedback loop is closed, the results might well be unholy; certainly bona-fide human contributors will be marginalized and have less incentive. (And if AI were to be used for moderation, then that could be exploitable.)
Inbound views/hits on your content on SO come either from a) Google and other search engines, b) SO's own search, c) attribution from OpenAI's ChatGPT, or d) attribution from other (or future) AI licensees. If your code is scraped once but effectively viewed 1 million times via GPT, you won't see those 1 million hits show up; you can only vaguely infer they might be happening if the attribution is actually implemented and some users click through on it (or by reverse-querying the AI). So c) and d) will proportionately increase as a) and b) proportionately decrease.
So everything has changed. And obviously the incentive for you to keep providing unpaid volunteer labor, without even attribution, decreases.
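To make the attribution question concrete, here is a minimal sketch of what a compliant footer might look like; the URLs, usernames, and attribution_footer helper are all made up, and nothing suggests any licensee actually emits this:

    # Hypothetical sketch: one attribution line per ingested SO post,
    # however many posts the synthesized answer drew on.
    sources = [
        ("https://stackoverflow.com/a/11111111", "alice"),
        ("https://stackoverflow.com/a/22222222", "bob"),
    ]

    def attribution_footer(sources):
        # CC BY-SA requires crediting each work and naming the license.
        lines = ["Derived from the following CC BY-SA licensed content:"]
        for url, user in sources:
            lines.append(f"  {url} by {user} (CC BY-SA 4.0)")
        return "\n".join(lines)

    print(attribution_footer(sources))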
What are the negative incentives? How would an LLM improving in capabilities harm those who shared their knowledge for free online at some point in the past?
My experience is worth less if an AI can summon it at will. It hasn't necessarily come down to this yet in the software industry, but in others (like animation), folks who were previously responsible for generating concept art have found themselves without jobs, as management can get "good enough" results from a much cheaper medium (one that was, at least en masse, trained on their "prior art").
I don't personally have a well-formed opinion one way or another on this, but to dismiss the existence of an issue at all is logically lacking.
The same reasoning would equally justify the claim that your experience is worth less if beginner programmers can summon it at will. If you believed that reasoning, you wouldn't have contributed to Stack Overflow in the first place. I don't, and if you contributed to Stack Overflow, you didn't either.
The scale might be different here, since prompting an AI is much cheaper than hiring a beginner programmer. And the loss in the beginner-programmer case could, for instance, be compensated by attribution.
Come to think of it, recent ML is just a scaled-up version of Infosys, Wipro, etc. Shit-quality answers for enterprises, now accessible to the masses.
SO made it such a pain in the ass to contribute I gave up trying every time I’ve historically been interested. Like I’m already sacrificing my time to offer my expertise helping someone, you want me to jump through a bunch of hoops to have the privilege of doing so? No thanks.
That same pain-in-the-ass gamification made spam and terrible-quality answers equally discouraged. Given the volume of at least decent content on Stack Overflow, I'd say the game worked. Somebody could try to do better with a competitor, but it would be a hard thing to succeed at.
The more hoops they've added the worse the quality has gotten. The quality has declined over time, and most of the good answers you see nowadays are from people who got in the habit of contributing back when the process was much simpler, and would likely never have joined the site if it was as onerous as it is today.
Have you assessed the quality of Q&As that aren't years old? Anything decent that I find is usually quite old and possibly out of date.
It doesn't help that asking for a more recent answer gets your question closed as a duplicate, and new answers can never overcome the inertia of the historical ones.
I'm starting to wonder if the days of "free, ad-supported, user-generated content wells" are over. The audience and participation base have grown beyond what these single entities can rationally cope with while still maintaining their original mission and profits.
We've outscaled our original hopes for the Internet. It was originally meant to be a tool genuinely controlled by its users; unfortunately, it's largely ended up in the stranglehold of a few monopolists.
Stack Overflow has been assimilated. Resistance is futile. It served a useful purpose but now it's part of the glorious AI universe to come. Rest in peace.
That also means there is probably a lot of wrong information on Stack Overflow that is baked into the training too. Hopefully they accounted for this in training, but there's no way of knowing.
I have not really had a lot of accuracy issues with GPT, but then again, I'm probably not savvy enough to spot them, most of the time anyway.
If it's posted on Stack Overflow, it's not new; it's merely been published. If this is the bar for LLM "learning", then they are doomed to live in a hazy bubble of the recent past.
Haha, what's that gonna do? Ever heard of soft delete? It's a thing where even if you delete something off a website, the database still retains that information even though it becomes inaccessible by the public.
Everything we write on the web is like that, including this very comment.
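A minimal sketch of the pattern, assuming a made-up schema (no claim this is SO's actual one):

    import sqlite3
    from datetime import datetime, timezone

    # Hypothetical schema: "deleting" a post merely stamps the row.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, deleted_at TEXT)"
    )
    conn.execute("INSERT INTO posts (body) VALUES ('my answer')")

    # Soft delete: the content never actually leaves the database.
    now = datetime.now(timezone.utc).isoformat()
    conn.execute("UPDATE posts SET deleted_at = ? WHERE id = 1", (now,))

    # Public queries filter on the stamp, so the post looks gone...
    print(conn.execute(
        "SELECT body FROM posts WHERE deleted_at IS NULL").fetchall())  # []

    # ...but the operator (or a data-licensing deal) can still read everything.
    print(conn.execute("SELECT body FROM posts").fetchall())  # [('my answer',)]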
Even if it were a hard delete, do these people think OpenAI is scraping the live version of the site?
The answers have already been exported. All you're doing by deleting them is ensuring they're only available via ChatGPT, and no longer available to web users who aren't using the AI tools that ingested the content before it was deleted.
I wonder if the Wikimedia Foundation couldn't take the opportunity, now that Stack Overflow is alienating its userbase, to launch a rival Q&A site. I was always puzzled why they never attempted to enter this space, even before Stack Overflow, given their prior experience in crowdsourced information commons.
Pragmatically, the software powering most of their properties, MediaWiki, is not suited for it. It's hard to see them investing in the development of a new platform given the uncertainty of success.
In addition to deleting answers, I think protesters should upvote wrong answers and crappy posts.
For years the community has defended punitive downvotes on correct answers to crappy questions as "you can do with your vote as you like". I see no argument against flipping that around.
My personal end game, if I have one (and I'm not sure I do), would be to ensure that I can help individual novice programmers become better at their craft, not to make billion-dollar corporations even richer.
Does everyone get equal access to let their own copy of an open-source LLM download a copy of SO?
Are those open-source LLM users in turn selling access to the content they got for free, now also stripped of attribution?
What exactly is changing hands in trade for the money that doesn't one way or another violate CC-BY-SA?
It's not merely the fact of commercial activity, since there is no NC clause in there, but the specific actions here by both StackOverflow and OpenAI violate the terms the content was originally created and shared under.
You know they read this as "They can do something illogical if they want to." instead of "They don't owe you an explanation of their reasoning, nor do they require your approval of it, and your not knowing, understanding, or agreeing with their reasoning does not mean there is none or make it invalid."
SO has been doing the absolute worst things to squander their amazing lead for years.
I haven't used that website since GPT came out, and now I contribute nothing to it.
But I'm glad all of its content ended up training the models that put it out of business. Thanks, SO! You'll never be anything other than user contributions.
As dour as it sounds, I am in a similar boat. Who'd have thought that not getting needlessly called names when you ask a question (even if it's dumb, as that's how you learn) makes people less likely to interact with you.
What's amusing to me is that some people even in this thread are calling it a pro, not a con. I guess our field does indeed attract a certain kind of personality.