Forum upvotes and downvotes have no correlation with quality, though.
As implemented on Twitter and Reddit and HN, they are simply engagement features. In other words, the point of voting is not to surface quality content, it is to make site visitors feel like they’re doing something, and therefore be more likely to return. Karma scores, too, are not about rewarding quality but simply about creating artificial incentives for return visits.
Quality in forums comes mostly from careful human moderation. HN is carefully moderated by dang and others. Likewise, the highest-quality subreddits are those that have strict rules and active moderators that enforce them.
Agreed. I think the whole market analogy breaks down right out of the gate:
> Each of these communities has an upvote mechanism, which can be interpreted as an indication that the user believes the website is undervalued. They are willing to bid up the price by spending their upvote on it.
It's wholly misleading to talk about "spending" an upvote. Upvotes, unlike money, are not a limited resource. Most of the other problems (bots, spamming, optimizing for low-effort clickbait etc) flow from that.
I think you're throwing out an entire idea based on a small semantic quibble.
Ok, so upvotes on HN and Reddit are unlimited, and therefore not valuable.
Why not limit accounts to X upvotes per day or week? Or only allow upvoting as many times as you've been upvoted yourself? Suddenly upvotes are now scarce.
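Even a crude rule would do. A sketch of those two ideas (the daily cap and the "earned votes" numbers are hypothetical, not how HN or Reddit actually work):

    from dataclasses import dataclass

    DAILY_CAP = 10  # hypothetical allowance of X upvotes per day

    @dataclass
    class Account:
        upvotes_cast_today: int = 0
        upvotes_cast_total: int = 0
        upvotes_received: int = 0

    def can_upvote(account):
        if account.upvotes_cast_today >= DAILY_CAP:
            return False  # rule 1: fixed daily allowance exhausted
        if account.upvotes_cast_total >= account.upvotes_received:
            return False  # rule 2: you can only spend as many upvotes as you've received
        return True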
These aren't impossible problems to solve. HN has implemented tons of "dark" features you don't even realize are here to prevent spamming, and is largely successful at it.
For example, if you post to "New" and send a link to your friends telling them to upvote it, those upvotes will not be counted. Similarly, there is a wait limit on how soon after visiting HN your upvotes will be counted, without you knowing it. There are a ton of unwritten tricks like this behind the scenes, and as you can see from the quality level here, they work!
I think ideally the search engine of the future has a combination of user feedback (with HN-style dark patterns behind the scenes to protect integrity), plus some level of human curation.
I'm amazed at how people can simultaneously think Google results are garbage, and yet, also think any attempt to try to solve the problem is futile and worthless. Cynicism is one hell of a drug.
OK, but you're moving the goalposts here. You're not just talking about a new search rank algorithm any more, you're talking about building a new social-web ecosystem to enforce the kind of voting rules needed to make that algorithm work, and somehow getting mass adoption of that new ecosystem.
> I'm amazed at how people can simultaneously think Google results are garbage, and yet, also think any attempt to try to solve the problem is futile and worthless. Cynicism is one hell of a drug.
You're attacking a much broader proposition here than anything I wrote.
I wouldn't say there are goalposts here, or winners and losers; we're all throwing out ideas. An issue pops up, humans suggest a fix. A new issue pops up, a new fix is needed. This is how all stuff evolves.
I guess I'm more broadly reacting to the sentiment in most comments I'm seeing (which is itself a trope in the dynamics of how upvote-based communities work).
Based off all the recent "Google results now suck" posts that went viral, it seems there's broad sentiment that Google is now a cesspool of SEO'd-to-death affiliate marketing.
But it also seems there's broad sentiment to shut down and discourage new possibilities before they even get attempted, regardless of potential merit.
I don't think ranking based on upvotes, downvotes, or other forms of engagement is necessarily terrible, but it's certainly nothing to do with "market value", and misleading to suggest that it is.
If you want to create a way of ranking websites and call it "SocialRank", or "EngagementRank", that's fine, although see below.
However, users would have to be aware of the severe limitations. You've already highlighted some of the problems with social content: it's often spammy, low effort, clickbait, poorly written, misleading, or outright false.
The quality of the search engine and ranking algorithm is going to be judged by the quality of the results it returns, which I'd imagine will be pretty trashy in this case.
> Upvotes, unlike money, are not a limited resource.
On some (non-Reddit-derived) sites, they are, although usually you still get a daily allowance and not a persistent budget. There’s also the SO approach where upvotes are infinite but downvotes are limited, that could also work.
If voting didn't exist on HN, don't you think the mods would have had to spend more time curating submissions? The total number of submissions is presumably much larger than the number of reports they receive or the number of suspicious incidents they need to investigate.
Is there a correlation between upvotes and truth or utility?
Sure, upvotes might be fine for recognizing good content or popular opinions. For [best resorts in mediterranean], upvotes are as good a signal as anything. But do you think they would be a good signal for queries like [why is there war in ukraine] or [how did covid start]?
Some of the worst subreddits are also the ones with active moderators - these moderators get under-the-table money to promote specific content while suppressing the rest.
Any algorithm is going to get gamed sooner or later. The biggest problem I have with Google ranking is this:
1. Everyone now knows that Google favors long content pieces which cover the topic in depth (what, why, when, etc.)
2. So an army of content marketing firms are writing 2000 word posts for simple topics that can be covered in 200 words.
3. As the user gets lost in the 2000 word article, trying to find what they really need, Google treats this as a positive "lots of time spent on page" signal and rewards this behavior further.
The result is people trying to now write 3000 word articles to "one up" the other already long posts dominating the first few results.
What Google needs to do is start taking explicit user feedback. If I click on something and it's a major waste of time, I want to be able to tell Google to never show me this result or the whole domain. Why can't I do this? They are happy to hoover up all of my data in the name of "personalization", but there's zero way I can personalize the results myself without resorting to insecure 3rd party extensions.
That is the main value proposition for the new search engine https://kagi.com/
Each result has a menu to block or boost the domain, to remove SEO spam, and pin relevant domains (the online documentation for that framework that always ranks below stackoverflow) to the top. So far I am loving it.
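To be clear, I have no idea how Kagi implements this internally; a toy sketch of per-user block/boost/pin re-ranking could look like this (the domains and URLs are just examples):

    from urllib.parse import urlparse

    def rerank(results, pinned=frozenset(), boosted=frozenset(), blocked=frozenset()):
        def domain(url):
            return urlparse(url).netloc
        kept = [r for r in results if domain(r) not in blocked]
        def bucket(url):
            d = domain(url)
            if d in pinned:
                return 0  # pinned domains always float to the top
            if d in boosted:
                return 1  # boosted domains come next
            return 2      # everything else keeps its original order (stable sort)
        return sorted(kept, key=bucket)

    results = [
        "https://stackoverflow.com/questions/123",
        "https://pinterest.com/pin/456",
        "https://docs.framework.example/audio/fade",
    ]
    print(rerank(results, pinned={"docs.framework.example"}, blocked={"pinterest.com"}))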
Back in the early 2000s there was a browser toolbar from Google that allowed you to do exactly that. Maybe such a tool today would be too easy to abuse?
Indeed, why does Google not use their full stack? Other competitors have the problem that SEO agencies can flood the feedback with manipulated data. However, with Android, Google knows the identity of enough people that they can identify the genuine feedback.
There is already a search engine for feedback:
Whaleslide.com [1] seems to let users curate content with collections. However, I haven't seen a link to a collection on their results pages.
This would be fine on a per user basis, but if your "downvote" informs baseline results, it can also be gamed, with people burying each others' sites by sending malicious negative signals to google about them.
I nearly wept when I looked up some video editing techniques for Shotcut, and saw some from a channel with titles like "Fade in/out Audio in Shotcut in 1min" and the length of the vid was genuinely 1:08 or something.
So many tutorials on Youtube are front-loaded with a long ad advertising the creator's paid courses.
I'm not a fan of the tone either, and I don't think the author did themselves any favors by comparing it to academic papers early on, because this is very much not one.
I still think you're missing the forest for the trees here. Your counterpoints are not wrong, but disagreeing with three sentences in the post and deciding that invalidates the whole thing is even worse than what you are saying the author did.
I'm kind of with WA; I think they're just being terse. The basic assumptions that underpin this whole concept are fundamentally broken.
upvotes != market value
market value != value
votes are very gameable
It's also inviting dismissiveness to be dismissive of the algorithm that created one of the largest and most successful companies in the world.
I didn't find it to be dismissive of PageRank as an innovation at the time, but I think it's fair to say it has now been thoroughly gamed. Saying we shouldn't do this because it will eventually also be gamed doesn't seem like a strong argument to me.
I do think people generally upvote things they find valuable. I'm curious what the disconnect is there.
"MarketRank is built on the analogy of the market. An object with a higher market value is probably better."
This is already how SEO works: if you get a link from another website to your content, it's like a "vote" for your content. A vote from a website with lots of votes is worth more.
The result:
This system is intensely gamed. People and businesses buy high-ranked domains and create private blog networks to set up backlinks to content that companies want to rank. Billions of dollars are at stake.
It'll take a lot of effort to come up with something that can't be gamed. I appreciate the effort, but I don't think this is it.
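For reference, the "a vote from a website with lots of votes is worth more" idea is essentially PageRank; a minimal power-iteration sketch over a made-up link graph:

    # Minimal PageRank-style power iteration; the domains and links are invented.
    links = {
        "blog-a.example": ["big-site.example"],
        "blog-b.example": ["big-site.example", "blog-a.example"],
        "big-site.example": ["blog-a.example"],
    }

    damping = 0.85
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(50):  # iterate until ranks roughly stabilize
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)  # each link passes on an equal share
            for target in outgoing:
                new_rank[target] += damping * share
        rank = new_rank

    # A link from a highly ranked page contributes more than one from an obscure page.
    print(sorted(rank.items(), key=lambda kv: -kv[1]))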
I do actually "spend" my votes on HN; I very rarely downvote, and I only upvote if I think an article or comment is too far down the list. HN is the only online forum in which I participate at the moment.
My vote isn't really "spent", because I don't have a vote budget; but I do have an attention budget, and I'm certainly not going to vote on an article I haven't read.
I was shocked to see the MarketRank algorithm giving 0 to the Zero to One book review on Goodreads (which shows up as the second result on Google).
> We do not think it is much of a leap to say that the Farnam Street and Slate Star Codex articles are higher quality than most of the articles on Google’s first page of results.
The author was like, I haven't heard of these blogs from other people, so they aren't quality. But hey, I know of fs and ssc, so they must rank higher because they are quality.
The biggest mistake you can make in tech is assuming everyone else would like what you like or think of that as quality. I mean even techies can't agree on quality or approaches, let alone the rest of the world.
The other paradox is that they compare the results on Google's first page to their MarketRank score. If they could give us a top 10, we would be able to judge better (subjectively, really) which set of 10 results we want - both when we know a lot about the search term, and when we know nothing about it.
ironically, a post like this may get more upvotes on HN simply because the prominent examples of sites shown all happen to be sacred to HN: slatestar, paulgraham and waitbutwhy.
It's almost akin to a Reddit post referencing Marvel, Minecraft and Keanu. Playing to the crowd!
As far as I can tell, dang does a pretty good job fighting off sock-puppets and vote manipulation as things are.
Imagine if there was a multi-billion dollar industry focused on making fake accounts in order to increase vote counts for HN posts. I don't think dang would be very successful in fighting them off. So the claim that this is "immune to SEO" is debatable.
Small-time voting fraud in the form of asking acquaintances to vote probably goes on all the time and frequently succeeds at securing a front page spot already; after all it only takes five votes or so. We occasionally see hard evidence of it when someone outright admits to this practice or solicits votes in the open on Twitter. People just don’t mind it that much because right now the practice doesn’t noticeably degrade the experience. Imagine if big money is suddenly involved at a much greater scale.
>Imagine if there was a multi-billion dollar industry focused on making fake accounts in order to increase vote counts for HN posts. I don't think dang would be very successful in fighting them off. So the claim that this is "immune to SEO" is debatable.
That's an easy one. Just ask for camera permission and record a video of the guy hitting the reply button. Have thousands of hamsters analyze each one.
I don't know if this new algorithm has fundamental advantages over what Google's currently doing, other than that it's not yet being targeted by SEO attacks. You can probably create a new algorithm which works significantly better than Google for the current web configuration, but it will only work until everyone decides to "optimize" their site against the new algorithm.
Someday, general computational intelligence could probably solve this problem by understanding the content itself and using it for ranking in a meaningful way. But if you're just trying to utilize some arbitrary, accidental structure inside the data set for ranking, there will always be a way to exploit it. That's the whole point of SEO anyway.
Exactly. Let's say that MarketRank grows in market share to be the dominant search engine. What's to stop someone from then gaming Reddit / HackerNews posts and comments with bots and fake accounts because the operator understands that these websites now have "community verified rank importance"?
They're already gamed or easily gamed. You can buy Reddit accounts with high amounts of karma which are sold to people who are trying to do "astroturf" promotions.
SEO will respond by creating 100% robot-operated communities that upvote their blogs, essentially shifting the burden onto Reddit/HN/Twitter staff, which will surely fall just like Google's did.
It looks so obvious to me, yet the article doesn't even mention the word "bot" as a factor on the modern Internet. To solve such a hard problem one has to try a lot harder than this.
I agree with you, but I don't think that's a reason not to do this. It's an arms race, but the only alternative is not to play, and live with shit content. We'll figure out the next move after this one gets gamed.
A lot of the Google SEO “secrets” are not so secret. If you have an inquisitive mind, start a new blog with the aim of getting your articles ranked on page 1. In 3 months you will know exactly how it works because there is real-time feedback. Google says it doesn’t use site authority, but it clearly does.
People will pay $5,000 (a number I saw once) to get a link from Forbes. Not because they give a shit about Forbes traffic but because Google still thinks that Forbes is some “above them all” content site.
But if you want to see the real dirty stuff people do, buy a subscription to a tool like Ahrefs and see how people manipulate links for SEO purposes. It might just give you gray hairs.
I cannot, because I don't have a subscription (it's quite expensive, and for my needs I only need it a few times a year).
I can tell you this though:
- Brands buy paid guest posts on "established" sites. (Using a stuffed keyword for their money site, service or whatever.)
- They wait until Google indexes their sponsored post.
- They then go ahead and buy links that link to the sponsored post instead of their service / brand site.
- The idea is that they can avoid a penalty in the long term. And Google largely looks past it because the sites doing such tactics (buying hundreds of private network links) are still on page 1 of Google results.
Past a certain point (5,000+ referring domains?) Google isn't going to penalize your site because it can't differentiate between legitimate and spammy links, or doesn't even care, for that matter [0]. Though details about this part of the algorithm are very scarce.
If you do use the tool (Ahrefs) you have full freedom to analyze sites you suspect are doing shitty tactics. And you will find inconsistencies like the one I mentioned above. Needless to say, because Ahrefs is a crawler, it can be blocked and I imagine quite a few blackhat networks do this to avoid further detection.
Lastly, this is a big problem for smaller niches / keywords because all you need is 20-30 links pointing towards an article and Google will assume you have a PhD on that topic.
Have a quick look at the /newest page here and you’ll see it dominated by people who are already gaming it for conventional SEO, just in case Google starts factoring this in some day.
If you make it so Fake Internet Points are worth real money, all you’ll do is fill all these places with SEO people and ruin them.
Rand Fishkin mentions in the opening chapter of his book that HN has an algorithm to detect coordinated upvoting from the same geo IP, so you 'need' to have a network of people with different geo IPs to game the system.
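HN's actual checks aren't public, but a crude version of that kind of same-network discounting might look like this (the /24 grouping and threshold are invented):

    from collections import Counter

    def count_valid_votes(votes, max_per_subnet=3):
        # votes: list of (user_id, ip_address) pairs for one post
        seen = Counter()
        valid = 0
        for _, ip in votes:
            subnet = ip.rsplit(".", 1)[0]  # group by /24, e.g. "203.0.113"
            seen[subnet] += 1
            if seen[subnet] <= max_per_subnet:
                valid += 1  # count the vote only while the cluster stays small
        return valid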
There is a strange apparent paradox: Google flounders in its attempts to address SEO spam, while alternative approaches, including the one in the post and indeed my own, demonstrate great success.
The truth is any alternative search ranking to Google's will probably work well. The problem is the search mono-culture with one very dominant search engine, which permits extreme specialization. If you have real competition with multiple mutually incompatible ranking algorithms, black hat SEO patterns will drastically lose effectiveness. The unreasonable effectiveness of SEO spam is largely water assuming the shape of the cup.
That’s the thing: my guess is that the reason your model works well is because (respectfully) very few people stand to gain enough from gaming it. It’s an instance of Goodhart’s law: when a measure becomes a target, it ceases to be a good measure. (In other words, search result quality is probably a convex function of popularity.)
I wonder about the approach @Jack was reputedly considering for Twitter, where users would select algorithms[0] out of a marketplace of options (perhaps with an interface similar to fonts.google.com, where they could sample the results of each algorithm for a given query). I don’t know if that would run within Google or would have to rely on something like Common Crawl. Google has a huge advantage not only because of its algorithms but also - as I’m sure you know - because of its index, and other miscellanea like click-and-query data.
[0] In that case for filtering/ordering, in this case for searching, though one could argue it’s basically the same thing except the latter has not only a user but also a search query as a parameter.
My point is that this is a basic dynamic that exists in biology as well (or any ecosystem). If you have extreme monoculture, specialists can and will flourish. Specialization as a strategy is much less effective in a diverse ecosystem. If you target one algorithm successfully, the others will work better as a result; and vice versa. It's fairly easy to construct mutually incompatible scoring systems. Even minor differences in scoring will limit the success of specialization.
It doesn't matter if it's a search engine or half a continent growing nothing but wheat, the dynamic is the same.
Sorry for the late reply - and thank you for responding! That makes perfect sense.
I share your thought about this. I hope my comment wasn’t too unclear. I was just considering that a more diverse ecosystem of algorithms doesn’t necessarily imply a more diverse ecosystem of companies. Hence my thought experiment about a marketplace of algorithms within one search provider.
But of course I’m not wedded to that, and I have no reason to be. Both situations seem equally improbable to me: (a) Google’s losing dominance, or (b) Google’s providing a marketplace of different algorithms.
I’d be interested to hear any more thoughts you might have! Thanks again for your reply.
This is incredibly naive. Of course this is not immune to SEO but is highly susceptible to SEO. When it is in the financial interests of millions of people with billions of dollars at stake, people will simply tweak the signals going into inputs of your algorithm and "break" your "anti-seo".
The author mentions the second-order effect of SEO upon the web: an entire industry gaming their client's sites to the top of search results has debased the quality of the web overall. How is that same industry not going to pivot to debasing social sites like HN in order to do the same thing? The entire market analogy is busted because HN and reddit aren't markets and up/down votes aren't currency. You don't "spend" your upvote on something. You have an unlimited supply.
This seems incredibly biased and naive. If you just want to search a tiny fraction of issues relevant to startups and coding, just add your site: filter or use HN search and move on. There's no anti-SEO going on here; it's ranking by echo chamber on a metric you hope hasn't been manipulated (that much). Given the breadth and depth of a fully functioning search engine, this wouldn't cover much at all either.
I think you'd want to index by subreddit as well in the case of Reddit. Getting 500 upvotes on /r/pics is probably not the same as 500 upvotes on some relatively obscure subreddit.
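One simple way would be to normalize a post's score against the typical score in its own subreddit; a sketch with invented numbers:

    from statistics import median

    scores_by_subreddit = {
        "pics": [12000, 8000, 5000, 3000],
        "nichesubreddit": [40, 25, 15, 10],
    }

    def normalized_score(subreddit, upvotes):
        typical = median(scores_by_subreddit[subreddit])
        return upvotes / typical

    print(normalized_score("pics", 500))            # ~0.08: far below typical for /r/pics
    print(normalized_score("nichesubreddit", 500))  # 25.0: exceptional for a small subreddit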
If the public knows exactly what the algorithm is, they will game it. If they don't know but can approximate its behavior by observation, they will game that.
I also think that lots of upvotes on influential sites will soon turn into improved page rank so don't think that it will make a difference. Google isn't using the original, naive page rank algorithm, they have repeatedly refined it to fight the SEO folks.
> And so, the oversimplified description of MarketRank is “just add up all the upvotes”. We have some more work to do to make the values accurate, but this is the general idea.
"1,000 FB Post LIKES $15 (On sale for $12)"[1]
"1,000 Facebook Likes $14.5o (On sale for $8.82)" [2]
Facebook likes are much cheaper than reddit or hn upvotes though.
I believe MarketRank is not completely useless, but it shouldn’t be used as the only parameter for the ranking system. It's much more a single gear in a complex machine.
Is it that Google actually benefits from having an algorithm that is susceptible to being gamed and therefore they are disincentivized to move away from that?
The more likely but less interesting explanation is simply that once something becomes highly commercially valuable to game, it’s an ongoing cat and mouse situation. Google may or may not see it as a high priority to fix but I doubt it’s a deliberate conspiracy.
Google has several conflicts of interest. The first is that if they provided users with exactly what they are looking for then the users would accept that and not click on any Google search ads.
The second is a lot of the spammy sites are running Google AdSense so it is in Google's interest to send people to those sites even if they aren't what the user is searching for.
This is a very bad idea that is very well presented. Thanks for a fun read.
Why bad? People have largely said why, but I’ll note that a new blog with a single legendary-quality post on a niche topic would have no chance to rank, just like on Google. Plus this could be gamed if you engage in social media promotion, which many people choose not to do.
Sometimes wrong things get upvoted and reach the top on Reddit. How does this algorithm stop that? Case: in football (soccer for US folks), there was a report by Marca about two players fighting in the dressing room after a terrible loss. It got to 6000 upvotes on Reddit. Turns out later everyone denied it. Now, would that still show up, unless corrected at the source? Because you can't invalidate keywords. What happens when things are in flux?
For a market analogy - while the market hyped up Apple and Microsoft, it also hyped up Theranos. The market had the '99 boom, then the 2000 crash, and the 2008 crash. That is, the market can very easily be misled. Replacing a system with something slightly better is probably not a good idea. There are ways Google can be improved or replaced, but things like these are not the way.
> Our naive currency conversion will work exactly like our naive inflation adjustment. We will compare the cost of a similar basket of goods across the different platforms. In this case, our basket of goods will be similar to our inflation calculation, i.e the average of the top 50 highest ranking inflation adjusted websites on a platform.
> We can now calculate the “GDP” of a domain by adding the total value of all webpages produced by a domain.
This is very easy to game by spamming posts with one upvote on any platform, because the formula for calculating score increases strictly positively with the number of links.
You could have low-voted posts count against the total score, but then anyone could easily de-rank a competitor's content by creating low-voted posts about it.
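To make that concrete, here's the naive "domain GDP" sum with invented numbers; a pile of one-upvote spam posts outweighs one genuinely popular page:

    def domain_gdp(post_scores):
        return sum(post_scores)  # the naive "add up all the upvotes" total

    honest_domain = [250]       # one genuinely popular article
    spam_domain = [1] * 500     # 500 self-upvoted spam posts

    print(domain_gdp(honest_domain))  # 250
    print(domain_gdp(spam_domain))    # 500 -- the spam domain "wins"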
What I took away from this article is that the author(s) don't seem to have a feedback cycle outside their bubble.
Hopefully the discussion here might help a bit. I read a lot of valid criticism here that should (in the best case) already have been incorporated into the thinking process while researching the idea of a 'Market Rank'.
> An object with a higher market value is probably better
That was the moment the article lost me personally.
I knew this was comedy when I saw that their top-ranked site was paulgraham.com and they counted Hacker News points as worth almost double Reddit's. This is a poorly thought out idea that I just can't get on board with. It also doesn't account for new content that hasn't been ranked on a social aggregator site. I thought it would be about training an AI to recognise good content.
Instead of using upvotes from sites such as Reddit and HN, why not instead allow the users of the search engine itself upvote and downvote results? This might allow quality results outside the scope of larger online communities to still be highly ranked, and also tailor the results to the audience of the search engine itself.
Because there is no obvious way to translate your own upvotes into a general search engine. If I want to find info on headphones, there is no way to use my upvotes to surface a blog post from a non-sponsored, non-SEO site. At most it could work out I'm interested in headphones and show more of that. But Google already does this.
We have the smartest people working on this problem on both sides. Pretty much every easy / obvious solution has been considered already.
It would know more than that you are interested in headphones. A particular URL that you could upvote or downvote has a ton more information than that - the website itself, the part of the website it is on, the content of the page, other websites it links to. Legitimate sites tend to link with other legitimate sites.
This is all stuff Google already does. Rather than requiring specific upvotes they can generally tell how satisfied with the result you were if you drop off rather than coming back to click more links. Problem is all of these signals are gamed to the extreme.
If that is true then why do they always suggest Pinterest and Quora at the top when I've been ignoring them for years? If there was a downvote button they would be gone from my results.
And the huge number of false negatives is not unimportant at all, as the author seems to think. Using an idea like this will transform the Internet into bubbles, and if you are unlucky you will never find information that is outside your bubble.
It's a good article, but I don't think it's a sensible suggestion. I don't think the problem can be solved by looking at how content scores. There should be a dynamic structure that checks whether a site is of good quality, and this quality control has to be fed by very different things. To be good at SEO right now, it's enough to add meta tags, write articles with a certain number of characters, and do a few more things like these. It has nothing to do with quality. Good page speed has nothing to do with quality. Using SSR has nothing to do with quality content. I don't know exactly for what purpose OpenTelemetry is used, but maybe we need something like OpenTelemetry for quality monitoring.
Are Twitter upvotes really a useful guide to article quality?
I note that the searches the MarketRank people have reported on are all concerned with startups and software. I wonder how it performs on e.g. political analysis, recipes, DIY, living on a budget and so on.
I find it particularly hard to find product reviews that aren't rather shallow re-hashes of marketing material; a product review that points up the deficiencies of a product is a rare animal indeed.
> And so, the oversimplified description of MarketRank is “just add up all the upvotes”.
Wait… the sum of fake internet points is what we’re using to measure pages? I agree that search results could be better but this doesn’t seem to be the way.
A reddit upvote is not a single currency. By this logic a submitted link would get an equal number of upvotes (on average) when submitted to any subreddit. This is not the case. Smaller subreddits have less potential for upvotes.
Everyone's pretty skeptical here, but I think something like this is the way forward. Not based on Reddit or HN, of course, but based on the same principle, namely, user curation. It's argued that Google was so successful at first with PageRank because at the time it was common for users to have personal web pages that had long lists of curated links to other websites. That signal has gone down the toilet, but why can't we just skip the middleman and get people to rank sites they think are interesting by, you know, upvoting them or something like that. We need collaborative filtering for content aggregation.
exactly. this post is a kind of: let's make a new google! i know how to do it better (while ignoring 1000 other factors outside of my bubble).
Even if the Dunning-Kruger effect has been debunked, this case shows that people work this way.
why can't we have upvote and downvote on search results, and use that as a measure? i guess a lot of people would downvote spammy sites, not sure about upvoting legitimate ones
If you've heard of 'karma farming', you know that Reddit upvotes can be gamed. There are numerous accounts that accumulate upvotes (similar to HN commenter score) by simply reposting popular posts from 4-6 weeks ago, aka the right amount of time for most daily users to forget that they had previously seen it. Usually this would occur on high-volume, high-density subreddits like r/pics, r/funny etc where there are lots of new posts each day.
Then, those high karma accounts can be used to astroturf on more valuable real estate, the sections of Reddit where people ask for product recommendations. Posts made by high karma accounts are given greater visibility in the Reddit algorithm, which correlates to a higher number of upvotes as more people see it in their feed.
It gobbles up a huge fraction of search results from Google, even things that aren't really image related. Makes finding the original sources for reposted images extremely painful, infuriatingly difficult to avoid without adding -site:pinterest.* to the query, ...
Host a huge number of images that show up in search results and then make them impossible to view without logging in, don't allow right click so you can't save/view image in the normal manner. Basically all the annoying things they can to monetize pictures that people have uploaded.