1. It is pointless to compare total company value with an annual payment.
2. It isn't an exclusive license. There are dozens of companies training language models, and if they are all forced to pay the same amount then that's some serious cash.
I thought that's really cheap too. Tacking on "wiki" and "reddit" probably accounts for ~50+% of my google searches now. From that perspective $60M annually feels really low-- reddit search is probably a quarter of the value that google search offers me as a consumer.
Reddit data is already part of Google's search index for free, and there is zero chance they are going to block that. This added payment is specifically for training AI models.
Mozilla has a billion in the bank and half a billion annual revenue with a 19 year track record on that contract. They spend most of that revenue each year, but any that's left over finds its way into the bank growing that sum. I'd wager Mozilla is worth considerably more than a single year of revenue if that's what you were suggesting. At a minimum they're worth their cash on hand, and some multiple of that revenue, perhaps 10x or even 20x so that would put Mozilla's company value at least around $6B, and I'd expect the Mozilla Foundation could easily get that kind of money for MoCo+Firefox if it decided to sell. Mozilla doesn't have growth, but it's got regular and significant income and has had that for longer than most of today's SV companies earning similar amounts.
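For what it's worth, the back-of-envelope math above can be written out. This is only a rough sketch using the round figures from the comment ($1B cash, $0.5B annual revenue, a 10x to 20x multiple); the function name is made up for illustration, and "cash plus a revenue multiple" is the commenter's heuristic, not a real valuation method.

```python
# Round figures taken from the comment above; everything else is illustrative.
CASH_ON_HAND = 1_000_000_000      # "a billion in the bank"
ANNUAL_REVENUE = 500_000_000      # "half a billion annual revenue"


def rough_value(cash: int, revenue: int, multiple: int) -> int:
    """Cash on hand plus a multiple of annual revenue (hypothetical helper)."""
    return cash + revenue * multiple


low_estimate = rough_value(CASH_ON_HAND, ANNUAL_REVENUE, 10)
high_estimate = rough_value(CASH_ON_HAND, ANNUAL_REVENUE, 20)

print(f"10x multiple: ${low_estimate / 1e9:.0f}B")
print(f"20x multiple: ${high_estimate / 1e9:.0f}B")
```

The 10x case is where the "at least around $6B" figure comes from; the 20x case would land at $11B.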
I'm not suggesting they are worth a single year of revenue.
But 80+% of their revenue, and probably the highest margin bit of revenue, comes from that Google payment. So whatever their value is, it hinges on that contract payment. Thus, the multiple is interesting... you might "compare total company value with an annual payment" to see what that mathematical relationship is. Especially given their rapidly declining market share.
Otherwise, Reddit is productizing its data, which was the obvious next step after closing public API access. This establishes a market price that at least one buyer was willing to accept, and shapes the terms for later deals with more customers.
Presumably, they think they can sell this kind of access many times over as one of their major revenue channels while also benefiting from the incoming traffic it should generate.
Google are already indexing Reddit, so this might reflect the incremental value of direct API access plus keeping everyone happy. I would be surprised if Reddit’s threats to block Google’s crawler were all that credible.
Reddit has a chicken and egg problem - nobody knows what the value of its data even is. They've never sold it before. You can offer pet rocks for 1 billion dollars, but nobody will value them if there have been 0 sales. (ask vs bid vs last sale price).
I'm guessing reddit is rushing to sell their data so they can show that their data is worth anything at all.
I figure in this case, Google was the one that wanted to be public about it (to appear as relevant when AI is the hot thing). I could totally see other orgs happy to keep their data-source private.
I think in the race for AI dominance that human generated data is extremely valuable. The community is valuable only to the extent that it contains lots of human generated responses on specific topics.
I’m not in AI research so I know I could be completely off base.
Well, I've known several people over the years who have created a multitude of accounts for the purpose of selling them when they reach around 100K karma to various activist and political groups. They didn't care what the buyers were doing with the accounts, but it seems there's been a black market for user data and user profiles with high karma counts for a long time now.
The only things I can think of Google using that data for are for nefarious purposes.
If the value of Reddit was its data, then it should have encouraged more people to use it, instead of e.g. cutting off almost every Android and iPhone app.
My usage went completely downhill, and so did my posts/comments.
The flipside to this argument is that paying Reddit $60M to access data which was posted to the public internet for public viewing and consumption, by people who don't work for Reddit, is obscene and antithetical to how the Internet is supposed to work. Why on earth should Reddit get paid for this, especially if they're not giving the users who made the value a cut?
Hate on HN all you want, I’ve been without my ADHD meds (warning: the company “Done” is not technically a scam) and spending way too much time on each for the past few days, and I can say this for sure: at least people on HN pay heed to the concept of premises and careful, non-combative argument. Most responses on Reddit are “no, that’s dumb” or “yes, that reminds me of my metaphysical takes”…
Not only that, the reddit hive mind is plain wrong in most cases. Plus, on a number of occasions the "le reddit investigation", "we did it reddit" excrement caused real-world issues for the people they were targeting, and those people were innocent.
Reddit is ok and quite cool for targeted discussion on targeted sub-reddits. But all the general subreddits visited by general population and everything that pops once in a while on the front page is a target for hive mind.
For HN comparison, there is a lot of "wrong" here too, but here you can find a cited academic study from one good American university that reveals most of the botfarms and fake news disseminators come from the Western sphere. If you try to claim on Reddit or anywhere on the internet that the fake news champion is not Russia+China+whoever is evil, your entry will get buried.
Also, ask yourself who's the median redditor. For my country's national subreddit the median redditor is a high school kid from the capital.
The average person doesn't have a strong opinion about some bit of data irrelevant to their life.
If there is a thing, and the topic has reached internet discussions, the participants will have an opinion regardless of their actual practical know-how about the thing. They'll form their opinion in other ways. Platforms like Reddit favour a master opinion due to the score and moderation system, so one out of N wrong theories will surface as the master opinion.
In real life, if you ask a bunch of random people about a thing they don't know, you get wildly different answers, and largely no one will back up anyone else there. Certainly not in enough force to push a confidently wrong answer up as the "people's opinion".
I don't understand why they chose Reddit, was 4Chan not willing? Seriously though, why not use the comment and discussion sections of Wikipedia and other sites that are not drowning in social insanity.
Source for Reddit hiring more than average tech company? I feel like every tech company scaled up in the bull market and then trimmed back down when hype subsided. For example, Google grew by 72,000 between 2019 and 2022 and has since shed about 20,000.
Reddit is IPOing next month. IMHO they cut public API access as a way to simplify and pump bookkeeping numbers to get a better listing price.
To be fair, “they were only as insane as everyone else” doesn’t necessarily mean that they didn’t vastly over-hire. I mean Jesus I got hired in Silicon Valley in 2022, it was crazy out there
Who knows if the "deleted" data can still be sold off. Sure you can't view it on the website, but nobody knows what Reddit will hand to Google. Then again certain laws passed recently might make such a thing illegal...
The second option I linked also replaces all comments and posts with random scrambled gibberish, not only "deleting" it. I would agree that deleted != actually deleted, but scrambling all content might actually work to prevent further siphoning
What incentive is there for posters on Reddit to continue to produce new content that is sold off for a profit? Sure they agreed to it in the TOS, but I suspect people might leave the platform.
Also how many of the posts are already generated by AI? Seems like a raw deal to pay for data that could be a large percentage of bot activity.
Why visit Reddit at all for information when you can just ask Gemini.
Does Reddit really block all search engines, or do "all" search engines abide by No Spiders? If it's freely available it's ripe for scraping, no matter what Reddit may say.
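On the robots.txt point: the file only states what a site asks crawlers to do, and it is up to each crawler to check it and comply. Here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, bot names, and URL below are hypothetical placeholders, not Reddit's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one named crawler is allowed, everyone else is asked to stay out.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(EXAMPLE_ROBOTS_TXT.splitlines())

url = "https://example.com/r/some_subreddit/"  # placeholder URL

# A well-behaved crawler asks before fetching; nothing here enforces the answer.
googlebot_allowed = rp.can_fetch("Googlebot", url)
other_bot_allowed = rp.can_fetch("SomeOtherBot", url)

print("Googlebot allowed:", googlebot_allowed)
print("SomeOtherBot allowed:", other_bot_allowed)
```

A scraper that never calls anything like `can_fetch` simply never sees the "no", which is the point of the question above: actual blocking has to happen elsewhere (rate limits, login walls, legal terms).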
Reddit and Facebook Groups (the latter being much better) are the only 2 places I can search for reviews or recommendations to pretty much anything these days. Google & co. are impossible to use for "best" anything
When I first learned that AI companies are vacuuming up all internet content without any regard for permission, attribution or compensation for the content creator, I found that deeply immoral.
I figured they should pay for it. But now that they do (in this instance), I'm realizing this might be even worse. They can just buy the entire market in the same way they buy Google search users by paying Apple billions a year.
And still the actual content creator, a Reddit user in this case, is not compensated.
It's truly wild how lax regulation is. This is probably the most important technology ever created and we just let 2 companies have it all: the data and the compute.
Section 230 lets them pretend they don't own the content, shielding them from liability, yet they can simultaneously claim ownership enough to sell it. That's effed up. Either it's not theirs, it's mine or ours, in which case I get to sell it or at least get a cut, or it's theirs in which case they're liable for any illicit content and should face those suits. Which is it, Silicon Valley? Is my post yours or is it mine? You shouldn't get to have it both ways.
This was one of the reasons I no longer contribute to reddit. I don't post, comment, upvote or downvote. I'll still consume it from time to time, something I try to stop doing. All I can do is vote with my money and my time and I try to spend it elsewhere.
Hm, now I have to wonder if it'd be possible to create a tool that poisons your Reddit posts so they look fine to a human but completely trash LLM output.
I'm not saying we should do it, but it's a fascinating thought.
What about Markov chain babble? It would be interesting to see Bard learn to mimic people going far even as decided to use even go want to do look more like.
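For anyone curious what Markov chain babble looks like in practice, here is a minimal word-level sketch. The sample text and function names are made up for illustration; a real poisoning tool would need far more than this, and whether it would actually hurt LLM training is an open question.

```python
import random


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = {}
    for current, following in zip(words, words[1:]):
        chain.setdefault(current, []).append(following)
    return chain


def babble(chain, length=20, seed=None):
    """Random-walk the chain to produce plausible-looking nonsense."""
    rng = random.Random(seed)
    starts = sorted(chain)
    word = rng.choice(starts)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        # Dead end (word never had a successor): restart from a random word.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


# Made-up sample corpus; in practice you would feed in your own old comments.
SAMPLE = (
    "I think the deal is cheap and I think the data is valuable "
    "but the data is public and the deal is not exclusive so who knows"
)

chain = build_chain(SAMPLE)
generated = babble(chain, length=15, seed=1)
print(generated)
```

Each word is picked only from words that followed the previous one somewhere in the source, so the output is locally grammatical and globally meaningless.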
The lack of revocability, marginal temporal value, and downstream governance I think makes the prospect of more such data deals happening slim -- or at least, slim without regret.
Someone should create a bot that responds to highly upvoted comments and reminds the redditor that Reddit is making money off the comment and the information is being used by AI and they aren't seeing a dime of it.
It was going to get scraped and used by AI anyway? Given that, I'm not sure I care that much if reddit makes a bit of cash off it. Honestly, this is the future: if it's publicly available on the internet it's going to be scraped and used by AI. The inevitable consequence of which is probably that more conversation moves into private places like Discord.
But that's the future we're in, it's not gonna get rolled back.
That's arguably more true in advertising driven reddit than AI reddit - no one comment is going to be worth the most in a data scraper but eyeballs win in advertising.
I think the entire thing is scummy too, but it's interesting to me how people think that advertisers were somehow less bad than this.
It's interesting how you're creating an argument out of thin air.
The article was about mining comments for AI, not about advertisement. Yet you brought in the topic of advertisement, and then created a fake argument about how people think advertisers are better than AI scraping. It's really weird that you did that.
They are using a service that allows them to make their opinion public and publicised and to know other people's opinions. Or does that have no value now?
I hope they get all versions of edited and moderated comments. An AI might be able to determine when the moderation was valid and when it was playing games.
I would love to see a leak of that dataset, sorted by username.
I wonder how well they identify posts written via AI (and what proportion of posts that is). Also, there is a ton of misleading astroturfing for some types of businesses.
> oh cool so the contributors of the data are all getting paid?
If people on reddit wanted to get paid for their opinions, they wouldn't be posting them for free on a publicly accessible forum that does not even require an account to view...
This same sentiment came up when StackExchange started to use/sell StackOverflow posts for AI training too - yet everyone arguing that position failed to recognize StackExchange has been making money off their free contributions since day one. Like, how else would StackExchange even exist? They can sell job postings now because of all the free contributions and community built up over the past decade - same with Reddit...
> If people on reddit wanted to get paid for their opinions, they wouldn't be posting them for free on a publicly accessible forum that does not even require an account to view...
People post on the internet 'for free' with the expectation that they're discussing ideas which have their own value, not providing training data. Yeah yeah ToS blah blah blah, but the goodwill that sites have built – even ones like HN – is on the premise that we're participating in the marketplace of ideas, not enriching shareholders.
This will ultimately be bad for the open internet.
I was enriching them by agreeing to view ads – not granting unlimited and perpetual use of my intellectual labor. One is a fair exchange of value, the other is highly exploitative.
Reddit sells ads first-party. They have their own Advertising Account Team, Advertising Platform and more.
The only reason Reddit is interesting to advertisers is the community. You can select specific subs to advertise to, interest groups, demographics, etc. You can get "laser" focused on who you advertise to, and offer compelling adverts to people highly likely to engage with your products/brands and convert into sales.
The only reason this works is because people like you have built those communities on Reddit by contributing your thoughts, opinions, experiences, expertise, time and more for free.
I'm fine with that. That's a fair exchange of value.
I'm not fine with them using that intellectual labor for any purpose whatsoever, indefinitely, forever. The only way to win at this point will be not to play; the downstream consequences will be the death of the open internet.
Part of what makes hackernews worth participating in is that the owners are not profit-maximizing and trying to extract the maximum possible value from the site.
It's a trade: Y Combinator, by cultivating a group of talented technical contributors, gets to advertise hiring posts for free and, in some ways, altruistically contributes to the community.
In exchange, we all get a place for thoughtful discussion with reasonably fair moderation.
It's a fair trade for those involved and closer to a non-profit model than Reddit which is clearly trying to extract as much value as possible from the content creators on their platform in exchange for giving them community/platform.
> Part of what makes hackernews worth participating in is that the owners are not profit-maximizing and trying to extract the maximum possible value from the site.
How do you know? Just because there are no ads doesn't mean they don't get any benefit from keeping HN thriving.
Also for me I like HN because of the content, not because I care about YC's motive. You are obviously different.
All platforms are moving to monetize content contributors. It’s worked amazingly for YouTube, TikTok, and Twitch. X is trying it slowly.
Reddit could potentially do this but I don’t think they have a solid business model to support it yet. They are scrambling for money streams, not content. Although high end supported content drives ad views so who knows.
Attempting to rewrite history and racism regardless of the race involved makes me uncomfortable yes. Do you disagree that Google had an agenda when they programmed their AI to do the things it does?
This is an interesting comment because we readily anthropomorphise every entity and fail to see Google as an umbrella for a group of people that may make decisions by democratic or autocratic means depending on context.
In this instance "The Google" has expressed "an agenda".
I disagree that individuals might have been trying to wrong society but unfortunately it's the measurable results and direction of the company as a whole that exhibits the agenda and in this instance it's quite a blow to the ability for society to record a factual account of history so it can learn from its mistakes.
This seems to be the case for the latest tech in general: suddenly upending the taxi industry, forcing small businesses to pay the gig-culture delivery tax, AI mass generating content and creating an imbalance in creative output.
I hope they're filtering it heavily and specifically. While Reddit does still have some valuable discussion in the more niche subs, I've noticed the main subs moving further left, hatefully and almost violently so.
I can't wait to search for who the Republican nominee is, and have Google tell me to kill myself.
Not even allowing discussions of crime and city-wide threats is completely asinine, and the mods' reasoning is that everybody who visits r/toronto is racist and can't be trusted to discuss these issues in good faith. Yet... it has NEVER been tried, and this "experiment" has been going on 2 years now. What I think is really going on is that the Toronto mods are lazy and using that to implicitly push an agenda based on what they allow to be posted.
So, I'm only replying to give some color into that thought, not to start a flame war or debate, because it is a legitimate concern I have with using the site as data.
I'm not necessarily complaining that it's left leaning; it always has been, in US parlance at least. My worry is that the tone has changed from disagreement to some form of disgust and hate, and yes, even at times calls for murder. Maybe that's just the landscape now, broadly?
Just look at some of the 'mainstream' comments linked below. This is probably a 'both sides' thing, but I don't see the other side represented here, and I would have equal concerns if it were, in a similar manner.
That said, I don't have a Reddit account, so just opened the site to see what it showed me. I'm not sure of its algo, as a lot of it is video games, but here are a few that showed up in the top 15 or so -
No way it’s exclusive. Reddit is making a push to find new $$ streams for their IPO and the API is a big part of that with AI being hot. They probably have lots of deals for their new API already and this one is just making headlines because it’s Google and it’s yearly/unlimited/relatively large.
> Google Search is currently expanding the test of a "forums" filter that lets you browse through results from sites with human discussion, like Reddit
Yeah, about that whole "human discussion" thing...
Fair point that a lot of Reddit is already astroturfed, but this is definitely the final nail in the coffin. Now SEO is coming for the forums, too. Wonderful.
I think the idea is that if Google makes it easier to find human-authored content on Reddit, astroturfers and other spammers will have even more reason to post there.
I think Reddit is accurately described as a site "with human discussion". It's another matter whether such discussion is organic and actually shows up in search results.
It was called discussions search, and showed up in the suggestion bar with images and whatnot. I was beyond mad when they removed it. Everything's a circle.
Hah. In fairness, PR firms, digital marketers, "reputation management" experts, and SEO gurus are "human" too.
These days at least 90% of product-related discussions on Reddit are astroturfed, though, and that's for sure.
The main problem, among many others, is that it's far too easy to game the upvote/downvote system with bots, and most pros have bot networks of their own, or know how to hire them. The second problem is that Reddit "karma" is site-wide, which means that most "high karma" accounts that are sold on secondary markets are very useful for digital marketing purposes generally. The third problem... well... I could go on all day. Reddit's system is horrendous if you're looking for human discussion and unbiased advice.
Copying this right from the Reddit TOS for all those who consider posting anything of any value there, present or future:
> You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:
> When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
This is dystopian. A company like Google will be able to surveil more and the only outcome is more invasion of privacy and more targeted ads. Why doesn't the FTC stand up to this?
Arguably Reddit's value is its data, and GOOG is renting it for 1.2%/year?
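The 1.2% figure implies a company valuation that can be backed out from the $60M annual payment. A quick sketch, taking both numbers at face value from the thread (the variable names are just for illustration, and the valuation is only what the percentage implies, not a reported number):

```python
annual_payment = 60_000_000   # $60M per year, as discussed in the thread
rate = 0.012                  # the 1.2%/year figure from the comment above

# If payment = rate * valuation, then valuation = payment / rate.
implied_valuation = annual_payment / rate

print(f"Implied valuation: ${implied_valuation / 1e9:.1f}B")
```

Per the earlier points in the thread, the license is non-exclusive, so the same data can be "rented" to several buyers at once, and a single payment understates the total yield.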