Hacker News

How might you see solving this problem? How could we distribute the task or goal of curation amongst individuals? How do we incentivize and enable discovery of maven/curators?



I've been through this thought-process many times.

1. Google isn't working well any more.

2. Therefore bring humans back into the system of flagging good and bad pages.

3. But the internet is too big - so we have to distribute the workload.

4. Oh, a distributed trust-based system at scale... it's going to be game-able by people with a financial incentive.

5. Forget it.

---

Edit: it's probably worth adding that whoever can solve the underlying problem of trust on the internet -- as in, you're definitely a human, and supported by this system I will award you a level of trust -- could be the next Google. :)


> 1. Google isn't working well any more.

This is so true. It’s a pain to search for anything non-deterministic nowadays. I usually find myself putting double quotes around every single word I’m interested in, and Google still brings back unrelated results.


I'm curious. Why do you use Google if you don't like their results?


PageRank isn't working well anymore. I use DDG, but its quality is also flagging.


Yeah, DDG quality seems abysmal for many of my searches now. I'll then switch to Brave, which sometimes finds what I'm looking for. I rarely ever check Google or Bing as a fallback, but when I do it feels like a screenful of ads and it isn't any more helpful (except Google Image search).

Part of the problem seems like a recency bias in search results. I notice sites frequently update pages with new timestamps, but nothing of substance appears to have changed (e.g. a review of something that was released 4 years ago, but the page was supposedly updated last week). So if I do pretty much the same search that succeeded two months ago, but repeat it today, I might not find the useful result I remember coming across.

I'm sure there are a bunch of other issues related to search and SEO that are affecting search quality. It seems insane that the major search providers don't combat this trend by arming users with more tools to tailor their search, but rather steadily degrade the user experience with no recourse.

Honestly, I think if Google was wise, they'd have a skunkworks team rethinking search from scratch (not tied to AI/LLMs) that starts their own index and tries to come up with an alternative to the current Google Search. Maybe they have that already, though I doubt it. I'm sure if they do have such a team, it's intricately tied to existing infrastructure and team hierarchies which effectively nullifies any chance it has at success.


I would be happy with an even simpler solution. Give me a way to blacklist domains or sites. There are 50 or so sites that I never want to see results from. It’s not just one or two that I could exclude manually in my searches.

Second, give me a way to express the semantic meaning of something. If I’m searching for Rust, let me specify "the programming language", for example. I find myself adding various one-word tags to limit the search results.
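Until search engines expose this natively, a client-side workaround is to filter results against a personal blocklist. A minimal sketch in Python (the blocklist contents and the result shape are invented for illustration):

```python
from urllib.parse import urlparse

# Hypothetical personal blocklist of domains to never show
BLOCKLIST = {"pinterest.com", "quora.com", "w3schools.com"}

def filter_results(results, blocklist=BLOCKLIST):
    """Drop any search result whose domain (or a subdomain of it) is blocklisted."""
    kept = []
    for r in results:
        host = urlparse(r["url"]).hostname or ""
        # Match the domain itself and any subdomain of it
        if not any(host == d or host.endswith("." + d) for d in blocklist):
            kept.append(r)
    return kept

results = [
    {"url": "https://www.pinterest.com/pin/123", "title": "Pin"},
    {"url": "https://doc.rust-lang.org/book/", "title": "The Rust Book"},
]
print([r["title"] for r in filter_results(results)])  # → ['The Rust Book']
```

Browser extensions that do exactly this exist, but having it as a first-class search setting would keep it working across devices.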


Personally I have one use case left for Google: Product Search.

If I'm looking to buy something, I will frequently end up using Google. The engine that matches my product search to a relevant ad is excellent. Basically, anytime you need to search for something that could lead to the purchase of a physical product, Google will be extremely useful. For services and software you can't use Google; you will get hustled by fake reviews or "top 10 XYZ for 2024" sites.


I can't believe how terrible product search is on Instacart - I place orders on there pretty frequently for my mom, and Petco is the worst.

I will search for "wellness chicken cat food", and Wellness has chicken cat food in a few different textures, so it seems like those should at least be on the page of search results, if not the top results. Not always so! At the very least I will have to scroll a ways down the page to get anything even from Wellness.

And sometimes the top results aren't even cat food, they will be random other pet supplies.

Or she wants a few different flavors of the food, and I find one and then the other flavors I have to search a few different ways to pull them up and they don't show up on any "similar" displays.

It's painful. I hope Google doesn't go the same way - I think with Instacart it's because they want to promote whatever it is they put at the top, but even that doesn't explain how terrible some of the search results are.


I just think wellness aside you should get other food than cat food for your mom.


Meijer (grocery store) is the same. Until a short while ago, you even had to match the capitalization of the product you were searching for. And we're not even talking about different byte representations of the same Unicode characters (ê encoded as a single codepoint vs. a plain e plus a combining accent).
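That Unicode point is a real pitfall: the same visible character can be stored as one precomposed codepoint or as a base letter plus a combining mark, and a naive string comparison treats the two as different. Normalizing both sides before matching, sketched here with Python's stdlib, handles both this and the capitalization issue:

```python
import unicodedata

def norm(s: str) -> str:
    # NFC composes "e" + a combining accent into the single precomposed codepoint;
    # casefold() handles the capitalization mismatch mentioned above
    return unicodedata.normalize("NFC", s).casefold()

precomposed = "Crêpe mix"        # ê as one codepoint (U+00EA)
combining = "Cre\u0302pe mix"    # e + U+0302 (combining circumflex)

print(precomposed == combining)              # → False (different codepoints)
print(norm(precomposed) == norm(combining))  # → True
```

Any store search that skips this normalization step will silently fail to match products typed with a different keyboard layout or autocomplete.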


I use it for deterministic things. If I’m looking for something specific or a well known thing, it simply gets me the results I’m looking for.

For anything that’s indeterministic, it just brings me garbage. The same with YouTube as well. I’m searching for a specific thing about a particular library and it’s showing me stupid definition blogs or useless garbage. I apologize for my crudeness but no other words can describe it.


Have you tried verbatim mode? I use it by default.
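For reference: on the results page it's under Tools → All results → Verbatim; as far as I know the results page encodes it as the `tbs=li:1` URL parameter (treat that parameter as an assumption, since Google changes these without notice). A tiny sketch of building such a URL:

```python
from urllib.parse import urlencode

def verbatim_search_url(query: str) -> str:
    # tbs=li:1 is the parameter the results page appears to add when you pick
    # Tools -> All results -> Verbatim; undocumented, so it may change
    return "https://www.google.com/search?" + urlencode({"q": query, "tbs": "li:1"})

print(verbatim_search_url('rust "borrow checker"'))
```

Setting that as a custom search-engine URL in the browser makes verbatim the default.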


Could you explain how to use verbatim mode please? For anyone else reading in the future as well.


Google used to work closer to verbatim mode but would get common synonyms to give you comprehensive results. Now it stretches synonyms and alternate spellings to the point of uselessness.


They also changed "did you mean Y" into "showing results for Y". So annoying.


Try using kagi. The results are so much better.


I think the solution to this is both unique and trivial: you cannot trust something that is freely expendable, or that did not require some amount of stake from the other party. That stake can be anything: time, work, or money.

If you want to trust a review, it needs to have required a non-expendable resource from the reviewer. The amount of resource should be an optimum: what an average user would be willing to expend without missing it (so the barrier to reviewing is low), while being prohibitively expensive if an actor wants to cheat the system and generate millions of reviews.
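That's essentially hashcash-style proof of work applied to reviews: each submission must burn a tunable amount of CPU time, negligible for one honest review but ruinous at spam volume. A toy sketch (the difficulty value and framing are invented for illustration):

```python
import hashlib
from itertools import count

def make_stamp(review: str, difficulty: int = 12) -> int:
    """Find a nonce whose hash with the review has `difficulty` leading zero bits.
    Expected cost: ~2**difficulty hash attempts."""
    target = 1 << (256 - difficulty)
    for nonce in count():
        digest = hashlib.sha256(f"{nonce}:{review}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def check_stamp(review: str, nonce: int, difficulty: int = 12) -> bool:
    """Verification is a single hash, so the site pays almost nothing."""
    digest = hashlib.sha256(f"{nonce}:{review}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty))

nonce = make_stamp("Great product, five stars")
print(check_stamp("Great product, five stars", nonce))  # → True
```

The known weakness: CPU time is cheap for botnets and GPU farms, so difficulty tuned for an average phone may still be affordable at spam scale.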


I like your thinking, but there's a middle ground before full automation: when humans are incentivized, one way or another, to provide biased reviews. This might be via straightforward employment of people in lower-cost places (e.g. via Mechanical Turk) or other incentives. For example, note how a proportion of Amazon reviews are gamed and unreliable.

At the moment, the only tasks (that I can think of) that come close to the 'time-consuming-enough to not scale, but not quite annoying enough to put off committed individuals' are the various forms of CAPTCHA - which is unsurprising, given that we're discussing a form of Completely Automated Public Turing test to tell Computers and Humans Apart. (And of course, there are CAPTCHA-solving farms.)

But would people invest time in a review system that required them to complete a form of CAPTCHA regularly?


> That stake can be anything, time, work or money.

I think you'll find that money should be removed from your list. There are some untrustworthy people that have tons of money. Sadly, I think trust must be EARNED, and that requires giving effort (work) and time. You cannot buy trust with simple money.


> You cannot buy trust with simple money.

Yet rich people are granted more upfront trust, maybe because we assume they have less incentive to rob.


This clearly isn't true: dress a rich person like a homeless person and make them smell like one, and they simply won't be trusted in most parts of civilization.

No, what you're talking about is POWER and AGENCY. Rich people have the power to override trust through the fact that they can operate with near impunity; so you have very little agency to not trust them. If you choose to not show trust, you may invoke their wrath.


Then web of trust. Means SSO (as a way to link the review to the trust).

In order to prevent hacking trust the SSO again must ensure:

- unique human, or

- resource spend


> unique human

Here's the thing. A sovereign nation can "generate" as many "unique" humans as it wants (via printing "fake" but official identities). No one would be on to them until there were more users than probable people in the country.


Doesn't stop nation state troll armies though


How about Wikipedia's approach?

Of course Wikipedia is way smaller than the internet, but still one way to go could be by having themed "human curated niches"


> 3. But the internet is too big - so we have to distribute the workload.

> 4. Oh, a distributed trust-based system at scale... it's going to be game-able by people with a financial incentive.

These are solved by being transparent and surfacing the agent (maybe even the sub-agents) for ranking, and allowing us to choose.

This way, if someone/something is gaming the system, I can just say "this recommender is garbage", and consequently it and all its sub-choices are heavily downranked for me.

This'll make filter bubbles even worse, but that ship has sailed. And I'm sort of a progressive-libertarian-centrist (in the classical sense, not in the American sense). If I get put in a bubble with people who have similar balanced tastes: yes please!
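Assuming each result carried the chain of agents that endorsed it (all names here are hypothetical), the "downrank a recommender and all its sub-choices" idea could look something like:

```python
def score(result, agent_weights, default=1.0):
    """Multiply a base relevance score by the trust weight of every agent
    in the endorsement chain; one distrusted agent drags the whole chain down."""
    w = 1.0
    for agent in result["endorsed_by"]:
        w *= agent_weights.get(agent, default)
    return result["relevance"] * w

weights = {"spammy-recommender": 0.01}  # user marked this one as garbage

results = [
    {"title": "SEO junk", "relevance": 0.9,
     "endorsed_by": ["spammy-recommender", "sub-agent-7"]},
    {"title": "Useful page", "relevance": 0.6, "endorsed_by": ["friend-of-mine"]},
]
ranked = sorted(results, key=lambda r: score(r, weights), reverse=True)
print([r["title"] for r in ranked])  # → ['Useful page', 'SEO junk']
```

The transparency requirement is the hard part: the surfaced endorsement chain has to be honest, or gaming just moves one level up.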


Freenet FMS; more specifically: Web of Trust.

IMO it's how all moderation should go: you subscribe to some default moderators' lists initially and then mutate those subscriptions and their trust levels. Mod actions are simply adding visibility options to content and not actually removing anything.


I think a distributed del.icio.us could work, a p2p version not based on crypto. Something like https://veilid.com would be perfect actually.


The really obvious thing to avoid is DMOZ which got captured by spammers immediately.


Reddit could've been it. This is my default search engine at the moment:

https://www.google.com/search?q=%s+site%3Areddit.com&tbs=qdr...


Obviously the solution is to create a centralized system to electronically ID every human on the planet, and track what they post, talk, think, which medications and food they consume, who they are friends with, who they fuck, how much of this decade's evil chemicals they exhale, where they spend their money, and their real-time location.

Or, you know, just make your own open-source search engine, with blackjack and hookers.


> How might you see solving this problem?

Break up google, disentangle AdSense from Search. Then the search division doesn't have incentives anymore to prioritize websites based on AdSense presence.


I don't think that's necessarily required. Google has always done well, even without pushing ads as hard as they are now. The problem is that Google / Alphabet has become too reliant on the money from ads to finance other non- or less-profitable products.

We've seen Google sell a profitable business, Google Domains, because it didn't make enough money for Google to bother with. For some reason Google believes that it is entitled to make some insane profit, rather than being content with having a reasonably profitable ad supported search business.

Google Search sucks because of the current financial model they, and most of society works under.


How would the search division make money? Serious question.


Search can still serve ads on their own pages (which Google calls "AdWords"), and (iirc) makes the most revenue for Google.

AdSense is serving ads on third-party websites, which is very different.


Does it need to? How large does it need to be? Can it be treated as 'essential infrastructure'?


I don’t know, webrings?


we definitely need a return of webrings AND curated link pages on decent internet pages.


Webrings were a lot of people doing work for free or with little return. Nobody has time to do this anymore, and the only way to support such a project would be paid membership (hard to gain traction) or ads (back to square one).

Sorry that I sound pessimistic but I just am.


>> Nobody has time to do this anymore

By every measure don't people have more time than ever before? I think it's more a dramatic shift in how people use & view the internet. The (unfortunate in a lot of ways) answer is probably create something new and exclusive, but universal enough for the target audience that's going to do all the work to align incentives, try to pull along the good and leave behind the bad. There has been and will continue to be a lot of false starts & failed attempts. If successful it will eventually be co-opted and ruined too. So it goes.


> Nobody has time to do this anymore

Can you say more about that?


Thank you for the question. It really needs an explanation.

Times have changed. While there are significantly more open-source projects, online communities, and all kinds of available help than ever before, the proportion of active contributors relative to the number of people online has decreased. Consequently, the time required for each volunteer to process input for a project like a "webring" has increased significantly.

I also feel that it's become harder to motivate people, but not because they've become lazier. Rather, the tasks have become more demanding, so "nobody seems to have the time for them anymore".


I think people in general have more time now than they did 25 years ago, not less?


Yes.


The scale of the internet is too large for individual consumption. Search engines and word of mouth are the methods I see for distributing access. Search engines need individual judgement to evaluate results, and word of mouth provides context clues and trust.

Business prefers search engines to scale their monetization efforts, but the quality of the results is unknown.


IDK; the sites I regularly visit nowadays are all either one-off personal projects I learned about through word of mouth or user-centric portals like HN where I get that word of mouth. I guess it's kind of what digg used to be.


Pivot to paid search engines where you are the customer not the product.

Thanks @Kagi - I attribute three solutions last week directly to you.


Plurality of search engines?


The fix is a decentralized search network with nodes linked to people or legitimate businesses. You manually distribute trust to friends and businesses you like, and you can manually revoke that trust, with some network level trust effects occurring based on spammy/malicious behavior.
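One common sketch of this: propagate manually-granted trust through the graph with decay per hop, so friends-of-friends get partial trust and revoking an edge automatically weakens everything reachable only through it. A toy version (graph contents and decay factor are invented):

```python
from collections import deque

def trust_scores(graph, me, decay=0.5):
    """Breadth-first trust propagation: direct grants get full weight,
    friends-of-friends get `decay`, and so on. Revoking trust is just
    deleting an edge and recomputing."""
    scores = {me: 1.0}
    queue = deque([me])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            propagated = scores[node] * decay
            if propagated > scores.get(neighbor, 0.0):
                scores[neighbor] = propagated
                queue.append(neighbor)
    return scores

graph = {"me": ["alice", "local-bakery"], "alice": ["bob"], "bob": ["spammer"]}
print(trust_scores(graph, "me"))
# → {'me': 1.0, 'alice': 0.5, 'local-bakery': 0.5, 'bob': 0.25, 'spammer': 0.125}
```

The open problem is the one raised upthread: nothing in the math stops a well-funded actor from buying their way into enough people's direct-trust lists.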


You know why you don't see systems like this at any scale?

Because they don't work.


Decentralized communication absolutely works, and we have plenty of decentralized tools now. Decentralized search is a legitimately hard problem, both because it needs strong protocols (which are getting there but still WIP) and user adoption (which isn't really happening).


Ask an email administrator how well decentralization works at scale.

It should be concerning that email has re-centralized to deal with all the problems that come from federation and decentralization.


That’s not the reason email has re-centralized. It’s because big-tech companies provide it for free in order to bind users to their services, and at least in the case of Google, also to collect data and drive ads.

As someone who administrates their own email server and works at an SMB who administrates their own email servers, it works quite well and doesn’t require centralization.


"When mice march with elephants it's in the mices best interest to be cautious"

I've done a lot of email administration over the years, servers with 100,000s of accounts, and you walk a very fine line in being able to communicate with the rest of the world. Your IPs must exist in certain blocks (or at least not be in banned blocks). You must block outgoing spam at a much higher rate and quality than Google/Hotmail do. Get yourself blacklisted for any reason and expect to disappear from the internet; meanwhile, no one is going to block the largest services.


Mail servers aren't real people or real businesses.

A 50-year-old decentralized protocol has problems, so decentralized protocols can't possibly work. Just like a 50-year-old computer can't play Doom, so Doom doesn't exist.


Search systems aren't real people or real businesses either. Never mind details like how you define and enforce "real" in a decentralized way across N legal systems with wildly varying ideas of what it means to be a "legitimate" business. There is no system or ledger you can query to find out if a given person is real or a business is legitimate. This is exceptionally unlikely to change on a useful timescale.

The basic problem with email is that it assumed good-faith participation from the parties involved. It was assumed that only legitimate actors would have the resources to participate and they would always be well-behaved. This, it turns out, was flawed on several counts. For one, it assumed legitimacy could be assured. For another, it assumed legitimate users would be well-behaved and would never abuse services for gain. For a third, it assumed account takeovers or other impersonation attacks wouldn't happen.

Every de-centralized system that aspires to not have email's problems needs to take them seriously.


Decentralized communications may work, but decentralized internet search is probably always going to be impractical.

The actual logic involved in search is itself pretty simple and almost trivial to shard; the problem is dealing with a mutable dataset on the order of a hundred terabytes.

Search gets enormous benefits from data locality, such that distributed approaches are all significantly more expensive.


I think they would work fine if the surveillance and advertising industry were overnight christened criminal enterprises and users had to start paying for what was once '''free''' (cost of entry: pieces of your soul, bit by bit).

Decentralized has a small convenience cost and the crime of surveilling everyone to advertise to them (among other uses) is too cheap for decentralized to come out on top. Note I am Atheist when I say this: if law doesn't hammer the scourge, those who get religious about this stuff are the ones who'll enjoy any modicum of sovereignty.


Except the whole fediverse... ahem...


Yep, create a small closed off community that shames outsiders and you can have what you want. Everything in it will be local and very much a bubble. It won't have the failure modes of a Google or Facebook though. So I guess pick your poison.


You’re writing this comment on a site with an upvote/downvote based algorithm.

The answer is simple: allow some level of user feedback from proven real users (for example, only people with Gmail accounts that are over 5 years old and used at least 3 times per week, to eliminate fakers, but keep this a secret) and apply it mildly as a ranking signal.

As long as it doesn’t become the only factor in ranking, you still retain strong incentives to do all the old SEO stuff, yet with a layer of human sanity on top.


Google Maps reviews are working like that and are often gamed.

If you pay close attention, you can spot fake reviews because they usually come from “Local Guides” (so supposedly the most trusted users).

Reddit is somewhat better at ranking and filtering spam, due to local mods, like there were in the times of web directories and webrings.

One of the former bosses of Google search explained that the key metrics they follow to consider the success of “Search” are the number of page views and the total revenue.

So if a user doesn’t find what he needs but keeps coming back it’s a win for them.


> for example, only people with gmail accounts that are over 5 years old and who use them at least 3 times per week to eliminate fakers—-but keep this a secret

tbh this is just a bandage; it's just going to get botted once people discover the pattern (a lot of premium bot farms do offer mature or hacked Gmail accounts anyway), and it's going to be worse for legitimate discovery


Then you quietly change the rule, and get years of success until it's widely gamed again.

The point is, there's experimentation that could be done and there's absolutely solutions that could be found.

Early Google did tons of experimentation with the search algorithm to maintain the integrity of the results. There was definitely an active game of cat and mouse back then that Google actually cared about staying on top of.

But as a decades entrenched monopoly, Google lost all incentives to tinker with anything anymore. The "operational" folks took over and any change to the search algorithm is now a multi-year endeavor involving thousands of stakeholders.


> Then you quietly change the rule, and get years of success until it's widely gamed again.

You won't though, unfortunately. Too many people know the rules to keep it a secret, especially given that corruption exists.


We are working towards this if you'd like to get in touch. I feel like "web3" as an interface/discovery platform is a very good topic.


What are you working on?


This was already solved in the past. Don’t just build a web site, build a community. This way the community will advertise the site and attract more users.


How do you build a community if google prevents users from finding it?


You go to the myriads of car communities and systematically promote your content for Nurburgring. It’s a niche site, and by the OP’s description we can safely assume that it’s quite usable since he’s been already approached by a sponsor. And since this is a site about a specific car circuit you just go to the actual place, or make an arrangement with the organizers to promote your site. Give a leaflet, or build an accompanying mobile app.

I'm not saying that circumventing Google is the easiest thing in the world, or the cheapest, but it's not mission impossible either. I didn't find Hacker News from Google, nor the dozens of other tech communities I'm following.


The existence of meatspace never stopped the early web from flourishing, so why should the existence of the modern web stop anybody from making a second web? The only reason that Google was useful is because it tapped into the trust network that already existed before it.

I feel like the social media churn has destroyed people's brains, because they're more interested in stopping people from doing things they don't like than doing something awesome themselves.


Before, people knew the web was vast and required digging through. Now people think Google is the web, so if it doesn't come up by the third search it might as well not exist.


You're not wrong, but this is also a great acid test. We need to work to help people understand google is most definitely not the internet, and where we fail or don't get through leave them behind. I don't know what comes next, but there's not room for everyone, and I include many here (and likely myself) in those who won't make the transition. We love to imagine people physically leaving Earth for Mars and beyond, but what follows the internet is going to happen far sooner.


That's exactly it, the people are different now. They don't have the same time or energy as before.


Start local. With actual people. Advertise with posters on utility poles (oh well they do not exist anymore in most places so replace with a suitable alternative and go to your local church/mosque/dance club/whatever relevant). I'm not saying this will work, but it is how it used to work.


Same way people built a community before google?


it's not really a community if it relies on an influx of random strangers


facebook/twitter ads still works.

I know what you're gonna say, but... yeah... you knew.


Honestly I think Google needs to be broken up. It's not a novel idea but the more I think about it the more I like it.

So, Google becomes two orgs: Google indexing and Google search. Google indexing must offer its services to all search providers equally without preference to Google search. Now we can have competition in results ranking and monetisation, while 'google indexing' must compete on providing the most valuable signals for separating out spam.

It doesn't solve the problem directly (as others have noted, inbound links are no longer as strong a signal as they used to be) but maybe it gives us the building blocks to do so.

Perhaps also competition in the indexing space would mean that one seo strategy no longer works, disincentivising 'seo' over what we actually want, which is quality content.


I’m afraid the problem is not indexing, but monetization. Alternative google search will not be profitable (especially if you have to pay a share to google indexing) because no one will buy ads there - even for bing it is a challenge


The hope though is that by splitting indexing that puts search providers on an equal footing in terms of results quality (at least initially). Advertisers go to Google because users go to Google. But users go to Google because despite recent quality regressions, Google still gives consistently better results.

If search providers could at least match Google quality 'by default', that might help break the stranglehold wherein people like the GP are at the mercy of the whims of a single org.


People go to Google because it is the default search engine in most browsers, and they don't seem to change it.


> Google still gives consistently better results

How sure are you about that? I find them to be subpar when compared to Bing, especially for technical search topics (mostly, PHP, Go, and C related searches).


Not a bad idea, but there are lots of details that need to be filled in and, you know, the devil is in the details.

Google's index is so large that it's physically very hard to transfer out while it's being updated. Bandwidth cost is non-negligible outside Google's data centres. In terms of data structure, I can imagine it is arranged in a way that makes Google Search easy.



