An actual "best X for 202X" page is extremely useful; the problem is that there are no search signals to distinguish it from a trivially created spam mockery.
I think that's a failure of imagination. In the early days of search engines and web crawling, companies paid subject matter experts, such as librarians, to manually grade and categorize web pages.
Spam pages tend to be filled with ads and derivative content, are owned by known spam company domains, and do not provide unique information relative to other sources.
If we as humans can recognize a spam page, why can't the machines? At the very least, Google could penalize repeat spam domain owners and their domains to help reduce the problem.
Can humans recognize spam pages though? Back in the day, such pages would be created on Blogspot or other spammy domains, so there was at least a visual signal of some sort. Now the spam is on Facebook groups, WhatsApp group chats, TikTok, Twitter, Medium, LinkedIn... the list goes on.
Online misinformation would not be as destructive as it is if people could tell the difference between, say, a credible news site and a Macedonian troll-farm-run 'news' site written in broken English.
> Now the spam is on Facebook groups, WhatsApp group chats, TikTok, Twitter, Medium, LinkedIn... the list goes on
Of that list, Medium is the only one I would ever expect (or want) a search engine to even consider returning results from for a search like "how to buy a bicycle for under $500". If I were king, you'd need to add allow:social to any Google search to get any results at all from anything remotely like a social media or messaging application. I'd include LinkedIn and Pinterest and similar sites in that exclusion. If you want that stuff, you need to ask for it.
> If we as humans can recognize a spam page, why can't the machines?
I think that many of these automated aggregators have come a long way. On the better sites, it's not immediately obvious that the page is spam until I've read some way into it and noticed patterns in the language and a lack of "meat".
No general AI required. Simply count the number of ads in the text.
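As a back-of-the-envelope sketch of that heuristic (the ad-network domain list below is illustrative, not exhaustive, and the density threshold would need tuning):

```python
# Rough sketch: estimate ad density for a page by counting scripts and
# iframes that point at known ad networks. Domain list is illustrative.
import requests
from bs4 import BeautifulSoup

AD_DOMAINS = ("doubleclick.net", "googlesyndication.com",
              "adnxs.com", "taboola.com", "outbrain.com")

def ad_density(url: str) -> float:
    """Ad elements per 1,000 words of visible text."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # Scripts/iframes whose src points at a known ad network.
    ads = sum(
        1
        for tag in soup.find_all(["script", "iframe"], src=True)
        if any(d in tag["src"] for d in AD_DOMAINS)
    )

    words = len(soup.get_text(" ", strip=True).split())
    return ads / max(words, 1) * 1000
```

A crawler already has the fetched HTML in hand, so a check like this costs almost nothing per page.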
For a top-ten site I’d also expect a certain degree of interesting vocabulary that loops in industry jargon. I’d also expect spam sites to use similar syntax/sentence structure for all products.
Lastly, if I can follow the links and get to the actual products, I can estimate how similar the product description is to the product. I’d expect most customers looking at top tens to want something more than what they’d get from searching a retailer’s site.
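Both of those checks are cheap to approximate. A toy version using only the standard library (the 0.3 and 0.6 thresholds are invented for illustration):

```python
# Toy spam signals for a top-ten page: low vocabulary richness across
# the whole article, and near-identical per-product blurbs.
from difflib import SequenceMatcher
from itertools import combinations

def lexical_diversity(text: str) -> float:
    """Unique words / total words; templated text scores low."""
    words = text.lower().split()
    return len(set(words)) / max(len(words), 1)

def blurb_similarity(blurbs: list[str]) -> float:
    """Mean pairwise similarity; copy-pasted templates score high."""
    pairs = list(combinations(blurbs, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

def looks_templated(blurbs: list[str]) -> bool:
    return (lexical_diversity(" ".join(blurbs)) < 0.3
            or blurb_similarity(blurbs) > 0.6)
```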
Heck, many legit top lists include dedicated YouTube videos visually inspecting and reviewing the products - cross-referencing blogs to YouTube channels and validating that the YouTube video is of some production quality (likes/views/YouTube spam detection) could strip out a ton of spam.
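The cross-referencing part could be as simple as a lookup against the public YouTube Data API v3; the API key and the traction thresholds below are placeholders:

```python
# Sketch: does the review's embedded YouTube video have any traction?
# Uses the YouTube Data API v3 videos endpoint (part=statistics).
import requests

YT_API_KEY = "..."  # placeholder; use your own API key

def video_has_traction(video_id: str,
                       min_views: int = 1_000,
                       min_likes: int = 50) -> bool:
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/videos",
        params={"part": "statistics", "id": video_id, "key": YT_API_KEY},
        timeout=10,
    )
    items = resp.json().get("items", [])
    if not items:  # video deleted or never existed: suspicious
        return False
    stats = items[0]["statistics"]
    return (int(stats.get("viewCount", 0)) >= min_views
            and int(stats.get("likeCount", 0)) >= min_likes)
```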
That does absolutely nothing to address the affiliate problem, nor does it catch advertorials or "native ads", as they are sometimes called. Ignoring all content served by ad brokers is no more sufficient than browser ad-blocking extensions. It helps, but you don't get an ad- and distraction-free result.
Plus, as soon as you make that a metric that is penalised by Google, how long do you think it'll take until ads are distributed from some rotating nondescript domains?
(See Goodhart's law: "When a measure becomes a target, it ceases to be a good measure.")
Most affiliate spam sites that I see cover the page in intrusive advertisements to maximize revenue. Penalizing intrusive ads would certainly have some effect. To be clear, this means penalizing rankings so that the entire page gets downranked, not simply blocking ads in your browser.
It still wouldn't do much, because as soon as Google starts trying to identify ad-heavy sites by some criteria, shady sites are going to start serving ads in some other way. It's already hard enough to distinguish ads on sites that have gone with native ads or advertorials.
There is Consumer Reports in the US: http://cr.org They are a nonprofit that has been independent for decades, with strict policies against accepting gifts, sponsorships, ads, etc., and they take other steps to remain free from influence. For example, when they review new cars they send an undercover individual to buy the car at normal price from a random dealer, so the company and the dealer don't know the buyer is CR (and can't sweeten the deal, throw in upgrades, or take steps to improve its reliability). They also review a lot of household and kitchen appliances, mattresses, etc.
The New York Times now has their "Wirecutter" reviews of tech and household stuff, but they do rely on affiliate links for income so take that as you will.
There is not a single independent consumer organization that can test every product out there, so if you're looking at road bikes you may end up with a different set of trusted reviewers than if you're looking at computer hard drives or roof shingles.
CR also tends to limit itself to middle-of-the-range items. If you're actually interested in the best, their surveys don't always cover it, since they prefer to review things that most people can or would purchase. This has changed a bit in the last few years, but it's still the case that if you want solid, unbiased reviews of, for example, high-end ranges or washing machines, CR is not always the most helpful.
My dad was a big CR subscriber. After a while I realized that they focus on what might be termed "best value", although that's still not quite right. Their picks always emphasized mundane things like TCO, reliability, and durability. For cars they were very explicit about this, each year issuing data on vehicles in categories like reliable, held their value, low maintenance, etc.
The perfect example of a CR-friendly car, in my mind, is the Toyota Camry. Completely unremarkable.
For everything else I remember them reviewing – watches, TVs, stereos, calculators, cameras – they always ended up recommending some middle-of-the-road "it works but it's not fancy and doesn't have many bells and whistles" product. I have a vague memory of them putting a Minolta camera at the top of their list of best 35mm SLRs in the 70s.
In Germany, there is the Stiftung Warentest, which does in-depth tests of many household appliances and goods. According to Wikipedia, they cooperate with Which? in the UK and Consumers Union in the US.
(For example, when they recently tested shaving blades, they had 23 testers shave half their face with one shaver and half with a different one, randomised, then had each half-shave assessed both by the tester himself and by an external expert, blind (i.e. the expert did not know which shaver had been used, or how the tester had judged the shave). Assessed were: quality of the shave; comfort; burning, reddening and irritation of the skin; cuts; how many shaves until the blades were blunt; ease of use and of switching blades; cleaning of the blades; presence of polycyclic aromatic hydrocarbons and other chemicals in the handle; and more.)
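For what it's worth, the randomisation and blinding step they describe is easy to make concrete; the sketch below is hypothetical, not their actual protocol:

```python
# Sketch of a half-face randomised, blinded shaving test: each tester
# gets a random side assignment, and assessors only ever see an opaque
# code per half-shave. The code-to-shaver key stays sealed until scoring.
import random
import uuid

def assign(testers: list[str], shavers=("A", "B")):
    plan, key = [], {}
    for tester in testers:
        sides = ["left", "right"]
        random.shuffle(sides)  # which shaver lands on which side
        for side, shaver in zip(sides, shavers):
            code = uuid.uuid4().hex[:8]  # opaque label for blind scoring
            plan.append({"tester": tester, "side": side, "code": code})
            key[code] = shaver           # unblinded only after assessment
    return plan, key
```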
Part of the point of my other comment in this thread [0] is that we shouldn't need to drown out link farms. Google chooses not to hide link farms. Does anyone seriously think that a company (Alphabet) that is capable of sucking in known protein structures and then spitting out predicted structures for thousands of other proteins is not also capable of identifying link-farm pages? Yes, it's a non-trivial problem, but chess, Go, and protein folding are non-trivial problems too. Google appears to have chosen not to treat this as a problem worthy of its capabilities.