An actual "best X for 202X" page is extremely useful; the problem is that there are no search signals to distinguish it from a trivially created spam mockery.
I think that's a failure of imagination. In the early days of search engines and web crawling, companies paid subject matter experts, such as librarians, to manually grade and categorize web pages.
Spam pages tend to be filled with ads and derivative content, are owned by known spam company domains, and do not provide unique information relative to other sources.
If we as humans can recognize a spam page, why can't the machines? At the very least, Google could penalize repeat spam domain owners and their domains to help reduce the problem.
Can humans recognize spam pages though? Back in the day, such pages would be created on Blogspot or other spammy domains, so there was at least a visual signal of some sort. Now the spam is on Facebook groups, WhatsApp group chats, TikTok, Twitter, Medium, LinkedIn... the list goes on.
Online misinformation would not be as destructive as it is if people could tell the difference between, say, a credible news site and a Macedonian troll-farm-run 'news' site written in broken English.
> Now the spam is on Facebook groups, WhatsApp group chats, TikTok, Twitter, Medium, LinkedIn... the list goes on
Of that list, Medium is the only one I would ever expect (or want) a search engine to even consider returning results from for a search like "how to buy a bicycle for under $500". If I were king, you'd need to add allow:social to any Google search to get any results at all from anything remotely like a social media or messaging application. I'd include LinkedIn and Pinterest and similar sites in that exclusion. If you want that stuff, you need to ask for it.
> If we as humans can recognize a spam page, why can't the machines?
I think that many of these automated aggregators have come a long way. On the better sites, it's not immediately obvious that the page is spam until I've read some way into it and noticed patterns in the language and a lack of "meat".
No general AI required. Simply count the number of ads in the text.
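As a back-of-the-envelope sketch of that heuristic (the ad-network domain list below is illustrative, not exhaustive, and the density threshold would need tuning):

```python
# Rough sketch: estimate ad density for a page by counting scripts and
# iframes that point at known ad networks. Domain list is illustrative.
import requests
from bs4 import BeautifulSoup

AD_DOMAINS = ("doubleclick.net", "googlesyndication.com",
              "adnxs.com", "taboola.com", "outbrain.com")

def ad_density(url: str) -> float:
    """Ad elements per 1,000 words of visible text."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # Scripts/iframes whose src points at a known ad network.
    ads = sum(
        1
        for tag in soup.find_all(["script", "iframe"], src=True)
        if any(d in tag["src"] for d in AD_DOMAINS)
    )

    words = len(soup.get_text(" ", strip=True).split())
    return ads / max(words, 1) * 1000
```

A crawler already has the fetched HTML in hand, so a check like this costs almost nothing per page.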
For a top-ten site I’d also expect a certain degree of interesting vocabulary that loops in industry jargon. I’d also expect spam sites to use similar syntax/sentence structure for all products.
Lastly, if I can follow the links and get to the actual products, I can estimate how similar the product description is to the product. I’d expect most customers looking at top tens to want something more than what they’d get from searching a retailer’s site.
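Both of those checks are cheap to approximate. A toy version using only the standard library (the 0.3 and 0.6 thresholds are invented for illustration):

```python
# Toy spam signals for a top-ten page: low vocabulary richness across
# the whole article, and near-identical per-product blurbs.
from difflib import SequenceMatcher
from itertools import combinations

def lexical_diversity(text: str) -> float:
    """Unique words / total words; templated text scores low."""
    words = text.lower().split()
    return len(set(words)) / max(len(words), 1)

def blurb_similarity(blurbs: list[str]) -> float:
    """Mean pairwise similarity; copy-pasted templates score high."""
    pairs = list(combinations(blurbs, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

def looks_templated(blurbs: list[str]) -> bool:
    return (lexical_diversity(" ".join(blurbs)) < 0.3
            or blurb_similarity(blurbs) > 0.6)
```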
Heck, many legit top lists include dedicated YouTube videos visually inspecting and reviewing the products - cross-referencing blogs to YouTube channels and validating that the YouTube video is of some production quality (likes/views/YouTube spam detection) could strip out a ton of spam.
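The cross-referencing part could be as simple as a lookup against the public YouTube Data API v3; the API key and the traction thresholds below are placeholders:

```python
# Sketch: does the review's embedded YouTube video have any traction?
# Uses the YouTube Data API v3 videos endpoint (part=statistics).
import requests

YT_API_KEY = "..."  # placeholder; use your own API key

def video_has_traction(video_id: str,
                       min_views: int = 1_000,
                       min_likes: int = 50) -> bool:
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/videos",
        params={"part": "statistics", "id": video_id, "key": YT_API_KEY},
        timeout=10,
    )
    items = resp.json().get("items", [])
    if not items:  # video deleted or never existed: suspicious
        return False
    stats = items[0]["statistics"]
    return (int(stats.get("viewCount", 0)) >= min_views
            and int(stats.get("likeCount", 0)) >= min_likes)
```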
That does absolutely nothing to address the affiliate problem, nor does it catch advertorials or "native ads", as they are sometimes called. Ignoring all content served by ad brokers is no more sufficient than browser ad-blocking extensions. It helps, but you don't get an ad- and distraction-free result.
Plus, as soon as you make that a metric that is penalised by Google, how long do you think it'll take until ads are distributed from some rotating nondescript domains?
(See Goodhart's law: "When a measure becomes a target, it ceases to be a good measure.")
Most affiliate spam sites that I see cover the page in intrusive advertisements to maximize revenue. Penalizing intrusive ads would certainly have some effect. To be clear, this means penalizing rankings so that the entire page gets downranked, not simply blocking ads in your browser.
It still wouldn't do much, because as soon as Google starts trying to identify ad-heavy sites by some criteria, shady sites are going to start serving ads in some other way. It's already hard enough to distinguish ads on sites that have gone with native ads or advertorials.
There is Consumer Reports in the US: http://cr.org They are a nonprofit that has been independent for decades, with strict policies against accepting gifts, sponsorships, ads, etc., and they take other steps to remain free from influence. For example, when they review new cars they send an undercover individual to buy the car at normal price from a random dealer, so the company and the dealer don't know the buyer is CR (and can't sweeten the deal, throw in upgrades, or take steps to improve its reliability). They also review a lot of household and kitchen appliances, mattresses, etc.
The New York Times now has their "Wirecutter" reviews of tech and household stuff, but they do rely on affiliate links for income so take that as you will.
There is not a single independent consumer organization that can test every product out there, so if you're looking at road bikes you may end up with a different set of trusted reviewers than if you're looking at computer hard drives or roof shingles.
CR also tends to limit itself to middle-of-the-range items. If you're actually interested in the best, their surveys don't always cover it, since they prefer to review things that most people can or would purchase. This has changed a bit in the last few years, but it's still the case that if you want solid, unbiased reviews of, for example, high-end ranges or washing machines, CR is not always the most helpful.
My dad was a big CR subscriber. After a while I realized that they focus on what might be termed "best value", although that's still not quite right. Their picks always emphasized mundane things like TCO, reliability, and durability. For cars they were very explicit about this, each year issuing data on vehicles in categories like reliable, held their value, low maintenance, etc.
The perfect example of a CR-friendly car, in my mind, is the Toyota Camry. Completely unremarkable.
For everything else I remember them reviewing – watches, TVs, stereos, calculators, cameras – they always ended up recommending some middle-of-the-road "it works but it's not fancy and doesn't have many bells and whistles" product. I have a vague memory of them putting a Minolta camera at the top of their list of best 35mm SLRs in the 70s.
In Germany, there is the Stiftung Warentest, which does in-depth tests of many household appliances and goods. According to Wikipedia, they cooperate with Which? in the UK and Consumers Union in the US.
(For example, when they recently tested shaving blades, they had 23 testers shave half their face with one shaver and half with a different one, randomised, then had each half-shave assessed both by the tester himself and by an external expert, blind (i.e. the expert did not know which shaver had been used, or how the tester had judged the shave). Assessed were: quality of the shave; comfort; burning, reddening and irritation of the skin; cuts; how many shaves until the blades were blunt; ease of use and of switching blades; cleaning of the blades; presence of polycyclic aromatic hydrocarbons and other chemicals in the handle; and more.)
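For what it's worth, the randomisation and blinding step they describe is easy to make concrete; the sketch below is hypothetical, not their actual protocol:

```python
# Sketch of a half-face randomised, blinded shaving test: each tester
# gets a random side assignment, and assessors only ever see an opaque
# code per half-shave. The code-to-shaver key stays sealed until scoring.
import random
import uuid

def assign(testers: list[str], shavers=("A", "B")):
    plan, key = [], {}
    for tester in testers:
        sides = ["left", "right"]
        random.shuffle(sides)  # which shaver lands on which side
        for side, shaver in zip(sides, shavers):
            code = uuid.uuid4().hex[:8]  # opaque label for blind scoring
            plan.append({"tester": tester, "side": side, "code": code})
            key[code] = shaver           # unblinded only after assessment
    return plan, key
```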
Part of the point of my other comment in this thread [0] is that we shouldn't need to drown out link farms. Google chooses not to hide link farms. Does anyone seriously think that a company (Alphabet) that is capable of sucking in known protein structures and then spitting out predicted structures for thousands of other proteins is not also capable of identifying link-farm pages? Yes, it's a non-trivial problem, but chess, Go, and protein folding are non-trivial problems too. Google appears to have chosen not to treat this as a problem worthy of its capabilities.