A search engine that favors text-heavy sites and punishes modern web design

snakeboy · on Sept 16, 2021

Wow, that's awesome. Great work!

For a simple test, I searched "fall of the roman empire". In your search engine, I got wikipedia, followed by academic talks, chapters of books, and long-form blogs. All extremely useful resources.

When I search on google, I get wikipedia, followed by a listicle "8 Reasons Why Rome Fell", then the imdb page for a movie by the same name, and then two Amazon book links, which are totally useless.

foofoo4u · on Sept 16, 2021

Good comparison. Reminds me of an analogy I like to make of today's web, which is it feels like browsing through a magazine store — full of top 10s, shallow wow-factoids, and baity material. I genuinely believe terrible results like this are making society dumber.

rchaud · on Sept 16, 2021

The context matters. I'd happily read "Top 10" lists on a website if the site itself was dedicated to that one thing. "Top 10 Prog Rock albums", while a lazy, SEO-bait title, would at least be credible if it were on a music-oriented website.

But no, these stories all come from cookie-cutter "new media" blog sites, written by an anonymous content writer who's repackaged Wikipedia/Discogs info into Buzzfeed-style copy writing designed to get people to "share to Twitter/FB". No passion, no expertise. Just eyeballs at any cost.

foofoo4u · on Sept 16, 2021

This got me thinking that maybe one of the other big reasons for this is that the algorithms prioritize newer pages over older pages. This produces the problem where instead of covering a topic and refining it over time, the incentive is to repackage it over and over again.

It reminds me of an annoyance I have with the Kindle store. If I wanted to find a book on, let's say, Psychology, there is no option to find all-time respected books of the past century. Amazon's algorithms constantly push to recommend the latest hot book of the year. But I don't want that. A year is not enough time to have society determine if the material withstands time. I want something that has stood the test of time and is recommended by reputable institutions.

jamra · on Sept 17, 2021

This is just a guess, but I believe that they use machine learning and rank it by the clicks. I took some coursera courses and Andrew Ng sort of suggested that as their strategy.

The problem is that clickbait and low effort articles could be good enough to get the click, but low effort enough to drag society into the gutter. As time passes, the system is gamified more and more where the least effort for the most clicks is optimized.

HPsquared · on Sept 18, 2021

It sounds like the problem is, the search engine has no way to measure satisfaction after the click.

sdoering · on Sept 18, 2021

But they have, or could have. At least Google (and to a smaller extent Microsoft) has exactly that signal if you are using Chrome/Bing. If you stay on the site and scroll (taking time, reading, not skimming), all of this could be a signal for evaluating whether the search result met your needs.

digaozao · on Sept 18, 2021

I've heard Google would guess from the bounce rate. Put another way: if the user clicks on linked website A, then after a few moments keeps trying other links or a related search, it would mean A was not valuable.
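To make the bounce idea concrete, here is a minimal sketch of a post-click satisfaction heuristic. Nothing here is confirmed to be what Google or Bing does; the `Click` record, the `looks_satisfied` function, and the 30-second threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Click:
    url: str
    dwell_seconds: float       # time spent on the page before leaving it
    returned_to_results: bool  # True if the user came back and tried another result


def looks_satisfied(click: Click, min_dwell: float = 30.0) -> bool:
    """Treat a quick return to the results page as a bounce (assumed threshold)."""
    if not click.returned_to_results:
        return True  # the user never came back, so assume the page was enough
    return click.dwell_seconds >= min_dwell


def satisfaction_rate(clicks: list[Click]) -> float:
    """Fraction of clicks that do not look like bounces."""
    if not clicks:
        return 0.0
    return sum(looks_satisfied(c) for c in clicks) / len(clicks)


session = [
    Click("site-a.example", 5.0, True),     # bounced after five seconds
    Click("site-b.example", 120.0, True),   # read for two minutes, then came back
    Click("site-c.example", 3.0, False),    # never returned to the results
]
print(satisfaction_rate(session))
```

The third click shows the weakness of the signal: a user who gives up entirely looks the same as one who found the answer immediately.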

dtonon · on Sept 18, 2021

They tried to infer this from the "bounce rate".

amenod · on Sept 16, 2021

> is that the algorithms prioritize newer pages over older pages.

They do? That would explain a lot - but ironically, I can't find a good source on this. Do you have one at hand?

dvogel · on Sept 16, 2021

It is pretty obvious if you search for any old topic that is also covered incessantly by the news. "royal family" is a good example. There's no way those news stories published an hour ago are listed first due to a high PageRank score (which necessarily depends on time to accumulate inbound links).

RattleyCooper · on Sept 16, 2021

It depends on the content. The flip side is looking up a programming-related question and getting results from 2012.

I think they take different things into account based on the thing being searched.

II2II · on Sept 17, 2021

Even your example would depend upon the context. There are many cases where a programming question in 2021 is identical to one from 2012, along with the answer. In those instances, would you rather a shallow answer from 2021 or an in-depth answer from 2012? This is not meant to imply that older answers offer greater depth, yet a heavy bias towards recent material can produce that outcome in some circumstances.

valvar · on Sept 17, 2021

If you're using tools/languages that change rapidly (like Kotlin, in my case), syntax from a few years ago will often be outdated.

II2II · on Sept 17, 2021

Yes, yet there are programming questions that go beyond "how do I do X in language Y" or "how do I do X with library Y". The language and library specific questions are the ones where I would be less inclined to want additional depth anyhow, well, provided they aren't dependent upon some language or library specific implementation detail.

post_below · on Sept 18, 2021

There are of course a variety of factors, including the popularity of the site the page is published on. The signals related to the site are often as important as the content on the page itself. Even different parts of the same site can lend varying weight to something published in that section.

Engagement, as measured in clicks and time spent on page, plays a big part.

But you're right, to a degree, as frequently updated pages can rank higher in many areas. A newly published page has been recently updated.

A lot depends on the (algorithmically perceived) topic too. Where news is concerned, you're completely right, algos are always going to favor newer content unless your search terms specify otherwise.

PageRank, in its original form, is long dead. Inbound link related signals are much more complex and contextual now, and other types of signals get more weight.

rchaud · on Sept 16, 2021

Your Google search results show the date on articles, do they not? If people are more likely to click on "Celebrity Net Worth (2021)" than "Celebrity Net Worth (2012)", then the algo will update to favour those results, because people are clicking on them.

The only definitive source on this would be the gatekeeper itself. But Google never says anything explicitly, because they don't want people gaming search rankings. Even though it happens anyway.

echelon · on Sept 16, 2021

The new evergreen is refreshed sludge for bottom dollar. College kids stealing Reddit comments or moving around paragraphs from old articles. Or linking to linked blogs that link elsewhere.

It's all stamped with Google Ads, of course, and then Google ranks these pages high enough to rake in eyeballs and ad dollars.

Also there's the fact that each year, the average webpage picks up two more video elements / ad players, one or two more ad overlays, a cookie banner, and half a dozen banner/interstitials. It's 3-5% content spread thinly over an ad engine.

The Google web is about squeezing ads down your throat.

andrepd · on Sept 17, 2021

Really makes you wonder: you play whack a mole and tackle the symptoms with initiatives like this search engine. But the root of that problem and many many others is the same: advertising. Why don't we try to tackle that?

HPsquared · on Sept 18, 2021

Perhaps a subscription-based search engine would avoid these incentives.

QuibbleQuota · on Sept 18, 2021

Let’s go a few levels deeper and question our consumption culture

echelon · on Sept 17, 2021

Exactly.

The only reason people make content they aren't passionate about is advertising.

hodgesrm · on Sept 17, 2021

> This got me thinking that maybe one of the other big reasons for this is that the algorithms prioritize newer pages over older pages.

Actually that's not always the case. We publish a lot of blog content and it's really hard to publish new content that replaces old articles. We still see articles from 2017 coming up as more popular than newer, better treatments of the same subject. If somebody knows the SEO magic to get around this I'm all ears.

WalterBright · on Sept 17, 2021

Amazon search clearly does not prioritize exact title matches.

1vuio0pswjnm7 · on Sept 17, 2021

It's the "healthy web" Mozilla^1 and Google keep telling their blog audiences about. :)

1 Accept quid pro quo to send all queries to Google by default

If what these companies were telling their readers was true, i.e., that advertising is "essential" for the web to survive, then how are the sites returned by this search engine for text-heavy websites (that are not discoverable through Google, the default search engine for Chrome, Firefox, etc.) able to remain online? Advertising is essential for the "tech" company middleman business to survive.

Dah00n · on Sept 17, 2021

I'm not sure I agree with your example. It seems to me it is the exact same as a "Top ten drinks to drink on a rainy day" list. There's simply too many good albums and opinions differ, so a top ten would -just like the drinks- end up being a list of the most popular ones with maybe one the author picks to stir some controversy or discussion. In my opinion the world would be a smarter place if Google ranked all such sites low. Then we might at least get fluff like "Top ten prog rock albums if you love X, hate Y and listen to Z when no one is around" instead.

dageshi · on Sept 17, 2021

Google won't rank them low because they actually do serve an important purpose. They're there for people who don't really know what they want specifically, they're looking for an overview. A top 10 gives a digestible overview on some topic, which helps the searcher narrow down what they really want.

A "Top 10 albums of all time" post is actually better off going through 10 genres of popular music from the past 50 years and picking the top album (plus mentioning some other top albums in the genre) for each one.

That gives the user the overview they're probably looking for, whether those are the top 10 albums of all time or not. It's a case of what the user searched for vs what they actually really want.

bythckr · on Sept 17, 2021

"The best minds of my generation are thinking about how to make people click ads"

redwall_hp · on Sept 18, 2021

So did Tim Berners-Lee. He was vehemently opposed to people shoehorning images into the WWW, because he didn't want it to turn into the equivalent of magazines. I believe he also shared the opinion that they make society dumber.

Appropriately enough, I couldn't find a good quote to verify that since Google is only giving me newspapers and magazines talking about Sir Tim in the context of current events. I do believe it's in his book "Weaving the Web" though.

wwweston · on Sept 16, 2021

It's also possible that it's the other way around: a certain "common denominator" + algorithms that chase broad engagement = mediocre results.

The real trick would be some kind of engine that can aim just above where the user's at.

anonuser123456 · on Sept 17, 2021

> I genuinely believe terrible results like this are making society dumber.

You have the causality reversed. Google results reflect the fact that society is dumb.

jimbokun · on Sept 17, 2021

Google results reflect the fact that educating and informing people has low profit margins.

crucialfelix · on Sept 19, 2021

Or the distribution of people now online better reflects stupidity in the general population.

bluGill · on Sept 16, 2021

what I really want is a true AI to search through all that and figure out the useful truth. I don't know how to do this (and of course whoever writes the AI needs to be unbiased...)

idiotsecant · on Sept 17, 2021

>whoever writes the AI needs to be unbiased...)

I'm not sure the idea of a sentient being not having a bias is meaningful. Reality, once you get past the trivial bits, is subjective.

jimbokun · on Sept 17, 2021

Isn't there a fundamental ML postulate that learning without bias is impossible?

Maybe not the same kind of bias we think of in terms of politics and such, but I wonder if there's a connection.

bluGill · on Sept 17, 2021

I didn't say the AI should be unbiased, just whoever writes it.

I want an AI that is biased to the truth when there is an objective one, and my tastes otherwise. (that is when asked to find a good book it should give me fantasy even though romance is the most popular genre and so will have better reviews)

hnick · on Sept 17, 2021

I think that is the goal, it's just what we currently have is an AI that's like a naive child who is easily tricked and distracted by clickbait.

Dah00n · on Sept 17, 2021

>an AI that's <snip> easily tricked and distracted by clickbait.

So, AIs are actually on par with most adults now? (Sorry)

SPBS · on Sept 16, 2021

Cool, it appears that the trend towards JS may be causing self-selection -- if a page has a high amount of JS, it is highly unlikely to contain anything of value.

fullsend · on Sept 16, 2021

True. Unfortunately many large corporate websites through which you pay bills, order tickets, etc. are becoming infested with JS widgets and bulky, slow interfaces. These are hard to avoid.

artificial · on Sept 17, 2021

Conversely, no software to install. Browser as a platform. Don’t have to boot to Windows to pay your bills with ActiveX, for example.

foxfluff · on Sept 17, 2021

The mostly JS-less web was fine, fast, and reliable 20 years ago and I never had ActiveX.

I hear stories about Flash and ActiveX but I literally never needed these to shop or pay bills online. Payments also didn't require scripts from a dozen domains and four redirects.

artificial · on Sept 18, 2021

Yup, and taking payments online was awful but privacy was more of a thing. In South Korea ActiveX was required until recently. https://www.theregister.com/2020/12/10/south_korea_activex_c...

TeMPOraL · on Sept 17, 2021

The platform isn't the problem. The problem is with the amount of code that does something other than letting you "pay bills, order tickets, etc.".

Dah00n · on Sept 17, 2021

Huh. A weighted algorithm, somewhere between Google and the one linked, where you could subtract from sites by amount of JavaScript might be interesting.
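A rough sketch of what such a subtraction could look like, using only Python's standard `html.parser`. The names `script_fraction` and `penalized_score` and the `weight` parameter are made up for illustration, and the measure is deliberately crude: it counts only inline script text, so externally loaded scripts (`src=...`) are invisible to it, and `<style>` content is counted as ordinary text.

```python
from html.parser import HTMLParser


class _ScriptCounter(HTMLParser):
    """Counts characters inside <script> elements versus all other text."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_chars = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.script_chars += len(data)
        else:
            self.text_chars += len(data.strip())


def script_fraction(html: str) -> float:
    """Share of the page's character count that is inline JavaScript (0.0 to 1.0)."""
    counter = _ScriptCounter()
    counter.feed(html)
    counter.close()
    total = counter.script_chars + counter.text_chars
    return counter.script_chars / total if total else 0.0


def penalized_score(base_score: float, html: str, weight: float = 0.5) -> float:
    """Scale a relevance score down in proportion to the script share (assumed formula)."""
    return base_score * (1 - weight * script_fraction(html))


page = "<p>hi</p><script>var x = 1234567890;</script>"
print(script_fraction(page), penalized_score(1.0, page))
```

With `weight=0` this leaves the base ranking untouched, and with `weight=1` a page that is all script scores zero, which is the dial between the two extremes.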

dv_dt · on Sept 16, 2021

If one could create a metric of ad to content ratio from the js used, I would guess that would be a nice differentiator too.

hinkley · on Sept 17, 2021

Browsers should be cherry picking the most compelling things that people accomplish with complex code and supporting them as a native feature. Maybe the Browser Wars aren’t keeping up anymore.

lugged · on Sept 16, 2021

Was that ever in doubt?

klntsky · on Sept 16, 2021

However, when searching for "haskell type inference algorithm" I get completely useless results.

burkaman · on Sept 16, 2021

That query is too long apparently. But if you shorten to "haskell type inference", I think it delivers on its promise:

> If you are looking for fact, this is almost certainly the wrong tool. If you are looking for serendipity, you're on the right track. When was the last time you just stumbled onto something interesting, by the way?

marginalia_nu · on Sept 16, 2021

The search engine doesn't do any type of re-ordering or synonym stuff, it only tries to construct different N-grams from the search query.

So compare, for example, "SDL tutorial" with "SDL tutorials": on Google you'd get the same results for both; on this search engine, for better or worse, you don't.

This is a design decision, for now anyway, mostly because I'm incredibly annoyed when algorithms are second-guessing me. On the other hand, it does mean you sometimes have to try different searches to get relevant results.
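For readers unfamiliar with the term, a minimal sketch of building N-grams from a query might look like the following. This is not the engine's actual code; the function name `query_ngrams`, the lowercasing, and the `max_n` limit are assumptions made for the example.

```python
def query_ngrams(query: str, max_n: int = 3) -> list[str]:
    """Return every contiguous run of 1..max_n words from the query, longest first."""
    words = query.lower().split()
    grams = []
    for n in range(min(max_n, len(words)), 0, -1):
        for start in range(len(words) - n + 1):
            grams.append(" ".join(words[start:start + n]))
    return grams


print(query_ngrams("SDL tutorial"))   # ['sdl tutorial', 'sdl', 'tutorial']
print(query_ngrams("SDL tutorials"))  # ['sdl tutorials', 'sdl', 'tutorials']
```

Because nothing is stemmed, "tutorial" and "tutorials" stay distinct terms, which is why the two queries can return different results.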

leephillips · on Sept 16, 2021

I like this design decision. It pays you back for choosing your search terms carefully.

mananaysiempre · on Sept 17, 2021

I’m not against a stemmer, actually, just against the aggressive concordances (?) that Google now employs, like when it shows me X in Banach spaces (the classical, textbook case) when I’m specifically searching for X in Fréchet spaces (the generalization I want to find but am not sure exists); of course Banach spaces and Fréchet spaces are almost exclusively encountered in the same context, but it doesn’t mean that one is a popular typo for the other! (The relative rarity of both of these in the corpus probably doesn’t help. The farcical case is BRST, or Becchi-Rouet-Stora-Tyutin, in physics, as it is literally a single key away from “best” and thus almost impossible to search for.)

On the other hand, Google’s unawareness of (extensive and ubiquitous) Russian noun morphology is essentially what allowed Yandex to exist: both 2011 Yandex and 2021 Google are much more helpful for Russian than 2011 Google. I suspect (but have not checked) that the engine under discussion is utterly unusable for it. English (along with other Germanic and Romance languages to a lesser extent) is quite unusual in being meaningfully searchable without any understanding of morphology, globally speaking.

medstrom · on Sept 17, 2021

I thought you could fix that by enclosing "BRST" in quotes, but apparently not. DuckDuckGo (which uses Google) returns a couple of results that do contain "BRST" in a medical context, but most results don't contain this string at all. What's going on?

mananaysiempre · on Sept 17, 2021

I’m not certain what DDG actually uses (wasn’t it Bing?), but in my experience from the last couple of months it ignores quotes substantially more eagerly than Google does. For this particular term, a little bit of domain knowledge helps: even without quotes, brst becchi, brst formalism, brst quantization or perhaps bv brst will get you reasonable results. (I could swear Google corrected brst quantization to best quantization a year ago, but apparently not anymore.) Searching for stuff in the context of BRST is still somewhat unpleasant, though.

I... don’t think anything particularly surprising is happening here, except for quotes being apparently ignored? I’ve had it explained to me that a rare word is essentially indistinguishable from a popular misspelling by NLP techniques as they currently exist, except by feeding the machine a massive dictionary (and perhaps not even then). BRST is a thing that you essentially can’t even define satisfactorily without at the very least four years of university-level physics (going by the conventional broad approach—the most direct possible road can of course be shorter if not necessarily more illuminating). “Best” is a very popular word both generally and in searches, and the R key is next to E on a Latin keyboard. If you are a perfect probabilistic reasoner with only these facts for context (and especially if you ignore case), I can very well believe that your best possible course of action is to assume a typo.

How to permit overriding that decision (and indeed how to recognize you’ve actually made one worth worrying about without massive human input—e.g. Russian adjectives can have more than 20 distinct forms, can be made up on the spot by following productive word-formation processes, and you don’t want to learn all of the world’s languages!) is simply a very difficult problem for what is probably a marginal benefit in the grand scheme of things.

I just dislike hitting these margins so much.

leephillips · on Sept 17, 2021

It would not be a difficult problem if they allowed the " " operator to work as they claim it does, or revive the + operator.

mananaysiempre · on Sept 17, 2021

In English, maybe; in Russian, I frequently find myself reaching for the nonexistent “morphology but not synonyms” operator (as the same noun phrase can take a different form depending on whether it is the subject or the object of a verb, or even on which verb it is the object of); even German should have the same problem AFAIU, if a bit milder. I don’t dare think about how speakers of agglutinative languages (Finnish, Turkish, Malayalam) suffer.

(DDG docs do say it supports +... and even +"...", but I can’t seem to get them to do what I want.)

leephillips · on Sept 17, 2021

Ah, OK. I don’t know anything about Russian. This is a hard problem. I think the solution is something like what you suggest: more operators allowing different transformations. Even in English, I would like a "you may pluralize but nothing else" operator.

mananaysiempre · on Sept 18, 2021

Well it’s not that alien, it (along with the other Eastern Slavic languages, Ukrainian and Belarusian) is mostly a run-of-the-mill European language (unlike Finnish, Estonian or Hungarian) except it didn’t lose the Indo-European noun case system like most but instead developed even more cases. That is, where English or French would differentiate the roles of different arguments of a verb by prepositions or implicitly by position, Russian (like German and Latin) has a special axis of noun forms called “case” which it uses for that (and also prepositions, which now require a certain case as well—a noun form can’t not have a case like it can’t not have a number).

There are six of them (nominative [subject], genitive [belonging, part, absence, “of”], dative [indirect object, recipient, “to”], accusative [direct object], instrumental [device, means, “by”], prepositional [what the hell even is this]), so you have (cases) × (numbers) = 6 × 2 = 12 noun forms, and adjectives agree in number and gender with their noun, but (unlike Romance languages) plurals don’t have gender, so you have (cases) × (numbers and genders) = 6 × (3 + 1) = 24 adjective forms.

None of this would be particularly problematic, except these forms work like French or Spanish verbs: they are synthetic (case, number and gender are all a single fused ending, not orthogonal ones) and highly convoluted with a lot of irregularities. And nouns and adjectives are usually more important for a web search than verbs.

yesenadam · on Sept 19, 2021

> BRST, or Becchi-Rouet-Stora-Tyutin is literally a single key away from “best” and thus almost impossible to search for.

Hmm I seem to be getting only relevant results, no "best", not sure what you mean. Are you not doing verbatim search?

https://www.google.com/search?q=brst&tbs=li:1

dahauns · on Sept 17, 2021

English is more the outlier in regard to Germanic languages, try German or Finnish, with their wonderful compounds :)

https://e.humanities.uva.nl/publications/2004/kamp_lang04.pd...

mananaysiempre · on Sept 18, 2021

Well yeah, English is kind of weird, but Finnish isn’t a Germanic language at all? It’s not even Indo-European, so even Hindi is ostensibly closer to English than Finnish. I understand Standard German (along with Icelandic) is itself a bit atypical in that it hasn’t lost its cases when most other Germanic languages did.

Re compounds, I expected they would be more or less easy to deal with by relatively dumb splitting, similar to greedy solutions to the “no spaces” problem of Chinese and Japanese, and your link seems to bear that out. But yeah, cheers to more language-specific stuff in your indexing. /s
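As an illustration of the "relatively dumb splitting" idea, here is a greedy longest-match sketch. The function `split_compound` and the tiny vocabulary are invented for the example; it ignores linking elements such as the German Fugen-s, and a purely greedy pass can fail on words that a backtracking splitter would handle.

```python
def split_compound(word: str, vocabulary: set[str]) -> list[str]:
    """Greedy longest-match split; returns [word] unchanged if no full split exists."""
    word = word.lower()
    parts = []
    pos = 0
    while pos < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), pos, -1):
            if word[pos:end] in vocabulary:
                parts.append(word[pos:end])
                pos = end
                break
        else:
            return [word]  # no known part starts here, so give up on splitting
    return parts


vocab = {"haus", "tür", "schlüssel"}
print(split_compound("Haustürschlüssel", vocab))  # ['haus', 'tür', 'schlüssel']
```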

dahauns · on Sept 20, 2021

Gaaah, brain fart - you're right, of course, dunno why I included it.

ford_o · on Sept 16, 2021

Maybe list the synonyms under the query, so it's easier to try different formulations.

akavel · on Sept 17, 2021

Oh this sounds like it could be a really cool idea! This way it could also be subtly teaching users that the engine doesn't do automatic synonyms translation so it's worth experimenting; also kinda like giving the synonyms feature while still keeping user in full control.

OneLeggedCat · on Sept 16, 2021

Don't change it. It's good this way.

Razengan · on Sept 16, 2021

It could simply become an option.

klntsky · on Sept 16, 2021

Since it does not use synonyms, it looks like it is unable to answer "how's that thing called"-queries.

LanceH · on Sept 16, 2021

It would be nice if we could pipe search engines.

BenoitP · on Sept 16, 2021

Definitely; We could create a meta search engine that queries them all, in desktop application format.

Let's name it after a famous old scientist, and maybe add the year to prove it's modern: Galileo 2021.

PaulHoule · on Sept 16, 2021

Meta search engines leave a bad taste in everyone's mouth because they've always failed. Here is why

https://en.wikipedia.org/wiki/Arrow%27s_impossibility_theore...

You can't combine a few different ranked lists and expect to get results better than any of the original ranked lists.

robrenaud · on Sept 16, 2021

> You can't combine a few different ranked lists and expect to get results better than any of the original ranked lists.

I am skeptical of this application of the theorem. Here is my proposal:

Take the top 10 Google and Bing results. If the top result from Bing is in the top 10 from Google, display Google results. If the top result from Bing is not in the top 10 from Google, place it at the 10th position. You'd have an algorithm that ties with Google, say 98% of the time, beats it say, 1.2% of the time, and loses .8% of the time.
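Written out as code, the proposal is only a few lines. The function name `merge_results` and the use of plain URL strings are assumptions for this sketch; the 98% / 1.2% / 0.8% figures above are the commenter's guesses, and nothing here measures them.

```python
def merge_results(google: list[str], bing: list[str], k: int = 10) -> list[str]:
    """Return Google's top k, giving the last slot to Bing's top hit if Google missed it."""
    top_google = google[:k]
    if not bing or bing[0] in top_google:
        return top_google
    # Bing's best result is absent from Google's top k: it takes the k-th position.
    return top_google[:k - 1] + [bing[0]]


google = [f"g{i}" for i in range(1, 11)]
print(merge_results(google, ["b1", "g2"]))  # g1..g9 followed by b1
print(merge_results(google, ["g3", "b1"]))  # identical to Google's own top 10
```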

vikingerik · on Sept 16, 2021

Right. Arrow's theorem just says it's impossible to do it in all cases. It's still quite possible to get an improvement in a large proportion of cases, as you're proposing.

PaulHoule · on Sept 16, 2021

I've had jobs tuning up the relevance of search engines with methods like

https://ccc.inaoep.mx/~villasen/bib/AN%20OVERVIEW%20OF%20EVA...

and the first conclusion is "something that you think will improve relevance probably won't"; the TREC conference went for about five years before making the first real discovery

https://en.wikipedia.org/wiki/Okapi_BM25

It's true that Arrow's Theorem doesn't strictly apply, but thinking about it makes it clear that the aggregation problem is ill-defined and tricky. (e.g. note also that a ranking function for full text search might have a range of 0-1 but is not a meaningful number, like a probability estimate that a document is relevant, but it just means that a result with a higher score is likely to be more relevant than one with a lower score.)

Another way to think about it is that for any given feature architecture (say "bag of words") there is an (unknown) ideal ranking function.

You might think that a real ranking function is the ideal ranking function plus an error and that averaging several ranking functions would keep the contribution of the ideal ranking function and the errors would average out, but actually the errors are correlated.

In the case of BM25 for instance, it turns out you have to carefully tune between the biases of "long documents get more hits because they have more words in them" and "short documents rank higher because the document vectors are spiky like the query vectors". Until BM25 there wasn't a function that could be tuned up properly, and just averaging several bad functions doesn't solve the real problem.
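For reference, a compact sketch of the textbook Okapi BM25 score, where the parameter `b` is the knob for the length bias described above (`b=0` ignores document length, `b=1` normalizes fully). The whitespace tokenizer, the function name `bm25_scores`, and the defaults `k1=1.2`, `b=0.75` are assumptions for this example, and the IDF shown is one common smoothed variant rather than the only one in use.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (naive whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for d in tokenized:
        doc_freq.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        length_norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = ["rome fell", "rome fell " + "filler " * 50, "unrelated text"]
print(bm25_scores("rome fell", docs))         # the short document outranks the padded one
print(bm25_scores("rome fell", docs, b=0.0))  # with b=0 the two score the same
```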

random314 · on Sept 16, 2021

Arrow's theorem simply doesn't apply here. We don't need our personalized search results to satisfy the majority.

PaulHoule · on Sept 16, 2021

But in both cases you face the problem of aggregating preferences of many into one. In one case you are combining personal preferences in the other case aggregating ‘preferences’ expressed by search engines.

random314 · on Sept 17, 2021

But search engines aren't voting to maximize the chances that their preferred candidate shows up on top. The mixed ranker has no requirement to satisfy Arrow's integrity constraints. It has to satisfy the end user, which is quite possible in theory.

Conditions the mixed ranker doesn't have to satisfy: "ranking while also meeting a specified set of criteria: unrestricted domain, non-dictatorship, Pareto efficiency, and independence of irrelevant alternatives"

PaulHoule · on Sept 17, 2021

Sure, but the problem that conventional IR ranking functions are not meaningful other than by ordering leads you to the dismal world of political economy where you can't aggregate people's utility functions. (Thus you can't say anything about inequality, only about Pareto efficiency)

Hypothetically you could treat these functions as meaningful but when you try you find that they aren't very meaningful.

For instance IBM Watson aggregated multiple search sources by converting all the relevance scores to "the probability that this result is relevant".

A conventional search engine will do horribly in that respect, you can fit a logit curve to make a probability estimator and you might get p=0.7 at the most and very rarely get that, in fact, you rarely get p>0.5.

If you are combining search results from search engines that use similar approaches you know those p's are not independent so you can't take a large number of p=0.7's and turn that into a higher p.
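A toy calculation shows why independence matters so much here. This is only an illustration of the arithmetic, not a description of how Watson combined scores; both function names are invented, and the two cases are the extremes (fully independent versus perfectly correlated sources).

```python
import math


def combine_if_independent(probs: list[float]) -> float:
    """P(at least one source is right), valid only if the sources err independently."""
    return 1 - math.prod(1 - p for p in probs)


def combine_if_fully_correlated(probs: list[float]) -> float:
    """If the sources always succeed and fail together, extra votes add nothing."""
    return max(probs)


votes = [0.7, 0.7, 0.7]
print(round(combine_if_independent(votes), 3))  # 0.973
print(combine_if_fully_correlated(votes))       # 0.7
```

Engines built on similar matching strategies sit much closer to the second case, so stacking their p=0.7 results does not buy the 0.973 that the independent formula promises.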

If you are using search engines that use radically different matching strategies (say they return only p=0.99 results with low recall) the Watson approach works, but you need a big team to develop a long tail of matching strategies.

If you had a good p-estimator for search you could do all sorts of things that normal search engines do poorly, such as "get an email when a p>0.5 document is added to the collection."

For now alerting features are either absent or useless and most people have no idea why.

gnramires · on Sept 16, 2021

That's an invalid application of this theorem. (It doesn't necessarily hold)

Suppose there's an unambiguous ranked preference by all people among a set (webpages, ranking). Suppose one search engine ranks correctly the top 5 results and incorrectly the next 5 results, while another ranks incorrectly the top 5 and correctly the next 5.

What can happen is that there may be no universally preferred search engine (likely). In practice, as another commenter noted, you can also have most users prefer a certain combination of results (that's not difficult to imagine, for example by combining top independent results from different engines).

overkalix · on Sept 16, 2021

... is this Galileo 2021 a reference that I am not understanding?

BenoitP · on Sept 16, 2021

Yup, but so far no one got it.

There was such an app in the early 2000's, before Google went mainstream, and Altavista-like engines were not good: Copernic 2000.

I guess I'm officially old now.

genewitch · on Sept 16, 2021

For years I wanted to try Copernic Summarizer. It seemed like it actually worked. Then software that did summaries disappeared, maybe? And about 5 years ago bots on Reddit were doing summaries of news stories (and then links in comments).

This is a pattern I see over and over again: some research group or academics show that something can be done (summaries that make sense and are true summaries, evolutionary algorithm FPGA programming, real time gaze prediction, etc) and there's a few published code repos and a bit of news, then 'poof' - nowhere to be seen for 15 years or more.

tomerv · on Sept 16, 2021

FWIW, I got the reference. Maybe I'm old too?

squeaky-clean · on Sept 17, 2021

I was always a dogpile user :p

LightG · on Sept 17, 2021

Hotbot!

collinmanderson · on Sept 17, 2021

Brave browser currently has "Google fallback" which sometimes mixes in Google search results with Brave's own search engine.

https://search.brave.com/help/google-fallback

gmueckl · on Sept 16, 2021

Not an app, but probably comes quite close in all other respects: https://metager.org

Torwald · on Sept 16, 2021

I need that with a simpler interface, so I call it after a famous detective: Sherlock.

artificial · on Sept 16, 2021

a magic pop sound is faintly audible as a new side project is appended to several lists Excellent, thank you!

mkr-hn · on Sept 16, 2021

Likely trademark collision with this: https://www.galileo.usg.edu/

hdjjhhvvhga · on Sept 16, 2021

As long as few people use it, it will be great. Rest assured that the moment it becomes popular, the people who want to game it will appear.

jandrese · on Sept 16, 2021

This sort of optimization is why simple recipes are typically found at the end of a rambling pointless blog post now.

Still, the best way to break SEO is to have actual competition in the search space. As long as SEO remains focused on Google there is an opportunity for these companies to thrive by evading SEO braindamage.

WorldMaker · on Sept 16, 2021

That sort of recipe blog hasn't happened just for SEO. It's also a bit of a "two audiences" problem: if you are coming to that food blogger from a search you certainly would prefer the recipe first and then maybe any commentary on it below if the recipe looks good. If you are a regular reader of that food blogger you are probably invested in the stories up top and that parasocial connection and the recipes themselves are sometimes incidental to why you are a regular reader.

You see some of that "two readers" divide sometimes even in classic cookbooks, where "celebrity" chefs of the day might spend much of a cookbook on a long rambling memoir. Admittedly such books were generally well indexed and had table of contents to jump right to the recipes or particular recipes, but the concept of "long personal ramble of what these recipes mean to me" is an old one in cookbooks too.

inanutshellus · on Sept 16, 2021

I see your point, but argue you've misidentified the two audiences.

One audience matches your description and is the invested reader. They want that blogger's storytelling. They might make the recipe, but they're a dedicated reader.

The other audience is not the recipe-searcher, but instead Google. Food bloggers know that recipe-searchers are there to drop in, get an ingredient list, and move on. They won't even remember the blog's name. So the site isn't optimized for them. It's optimized for Google.

"Slow the parasitic recipe-searcher down. They're leeches, here for a freebie. Well they'll pay me in Google Rank time blocks."

hdjjhhvvhga · on Sept 18, 2021

> Food bloggers know that recipe-searchers are there to drop in, get an ingredient list, and move on.

This is not entirely true, though. If a randomly found recipe turns out particularly good, I'll bookmark the site and try out other dishes. It's a very practical method to find particularly good* recipe collections.

*) In this case "good" means what you need - not just subjectively "tasty", but e.g. low cost, quick to prepare, low calorie or in line with a particular diet and so on.

inanutshellus · on Sept 20, 2021

I'm aware of zero human-behavior-truisms that are "entirely true".

giantrobot · on Sept 16, 2021

> If you are a regular reader of that food blogger

I think this assumes facts not in evidence. It certainly seems like an overwhelming number of "blogs" are not actual blogs but SEO content farms. There's no regular readers of such things because there's no actual authors, just someone that took a job on Fiverr to spew out some SEO garbage. Old content gets reposted almost verbatim because new results rank better according to Google.

The only reason these "blogs" exist is to show ads and hopefully get someone's e-mail (and implied consent) for a marke....newsletter.

WorldMaker · on Sept 16, 2021

I know at least a few that I commonly see in top search results that I have friends that read them like personalized soap operas where most of the drama revolves around food and family and serving food to family.

It's at least half the business models of Food Network shows: aspirational kitchens and the people that live in them and also sometimes here's their recipes. (The other half being competitions, obviously.) I've got friends that could deliver entire doctoral theses on the Bon Appetit Test Kitchen (and its many YouTube shows and blogs) and the huge soap operatic drama of 2020's events where the entire brand milkshake ducked itself; falling into people's hearts as "feel good" entertainment early in 2020/the pandemic and then exploding very dramatically with revelations and betrayals that Fall.

Which isn't to say that there aren't garbage SEO farms out there in the food blogging space as well, but a lot of the big ones people commonly complain about seeing in google's results do have regular fans/audiences. (ETA: And many of the smaller blogs want to have regular fans/audiences. It's an active influencer/"content creator" space with relatively low barrier to entry that people love. Everyone's family loves food, it's a part of the human condition.)

run-types · on Sept 16, 2021

I've basically never been taken to a recipe without a rambling preamble from Google. While food blogs may serve two audiences, a long introduction seems to be a requirement to appear in the top Google search results.

WorldMaker · on Sept 16, 2021

Personally, I think that has a lot more to do with the fact that Google killed the Recipe Databases. There did used to be a few startups that tried to be Recipe Aggregators with advertising based business models, that would show recipes and then link to source blogs and/or cookbooks, and in the brief period where they existed Google scraped them entirely and showed entire recipes on search results and ate their ad revenue out from under them.

dspillett · on Sept 17, 2021

Such databases would get battered by demands to remove content these days, if not already back then. No one wants a database listing their stuff for ad revenue like that because many wouldn't follow the links to see their adverts or be subject to their tracking.

A couple of browser add-ons specifically geared around trimming recipe pages down have been taken down due to similar complaints.

tomrod · on Sept 16, 2021

That is a really bad thing by Google. Their core business is not recipes.

kwertyoowiyop · on Sept 16, 2021

Their core business is making money from other people’s content, no matter what it is.

WorldMaker · on Sept 16, 2021

Their core business is advertising and they have always been in a direct conflict-of-interest by competing with content sites for ad revenue buys.

xtracto · on Sept 16, 2021

That's why I use Saffron [1], it magically converts those sites into a page in my recipe book. I found it when the developer commented here in HN. Also, a lot of cooking websites have started to add a link with "jump to recipe" functionality, allowing you to skip all the crap.

[1] https://www.mysaffronapp.com/

Funes- · on Sept 16, 2021

There's also https://based.cooking.

ggggtez · on Sept 16, 2021

I've noticed this pattern start to pop up elsewhere. I've started to train my skimming skills, skipping a paragraph or two at a time to get past the fluff.

Like an article about some current event will undoubtedly begin with "when I was traveling ten years ago...".

zerd · on Sept 16, 2021

It's also because that's a way of trying to copyright protect recipes, which are normally not copyright protected.

> “Mere listings of ingredients as in recipes, formulas, compounds, or prescriptions are not subject to copyright protection. However, when a recipe or formula is accompanied by substantial literary expression in the form of an explanation or directions, or when there is a combination of recipes, as in a cookbook, there may be a basis for copyright protection.”

JohnFen · on Sept 17, 2021

But that copyright protection only extends to the literary expression. The recipe itself is still not covered by copyright, even if accompanied by an essay.

SerLava · on Sept 16, 2021

That's not really for SEO, which favors readily accessible information.

That's ads. When mobile users have to scroll past 10 ads, they'll click on some of them and make the blog money.

Aeolun · on Sept 16, 2021

Searching for ‘chocolate’ on this search engine turned up a surprisingly large amount of chocolate based recipes.

YeGoblynQueenne · on Sept 16, 2021

>> This sort of optimization is why simple recipes are typically found at the end of a rambling pointless blog post now.

I continue to be curious about this kind of complaint. If all you want is a recipe list, without any of the fluff, why would you click on a link to a blog, rather than on a link to a recipe aggregator?

Foodie blogs exist specifically for the people who want a foodie discussion and not just an ingredients' list.

Is it because blogs tend to have better recipes overall? In that case, isn't there a bit of entitlement involved in asking that the author self-sacrificingly provides only the information that you want, without taking care of their own needs and wants, also?

joegahona · on Sept 16, 2021

I think the complaint is that those blogs rank higher than nuts-and-bolts recipes now. It wasn't that way a few years ago. Yes, scrolling down the results to Food Network or Martha Stewart or whatever is possible, as is going directly to those sites and using their site search, but it's noticeable and annoying.

YeGoblynQueenne · on Sept 17, 2021

Not my experience. For a very quick test, I searched DDG for "omelette recipe", "carbonara recipe" and "peking duck recipe" (just to spice it up a bit) and all my top results are aggregators. Even "avgolemeono recipe" (which I'd think is very specialised) is aggregators on top.

To be honest, I don't follow recipes when I cook unless it's a dish I've never had before. At that point what I want is to understand the point of the dish. A list of ingredients and preparation instructions don't tell me what it's supposed to taste and smell like. The foodie blogs at least try to create a certain... feeling of place, I suppose, some kind of impression that guides you when you cook. I wouldn't say it always works but I appreciate the effort.

My real complaint with recipe writers is that they know how to cook one or two dishes well and they crib the rest off each other so even with all the information they provide, you still can't reliably cook a good meal from a recipe unless you've had the dish before. But that's my personal opinion.

jandrese · on Sept 16, 2021

Because when you search for a recipe you get the link to the blog, not the aggregator.

Loughla · on Sept 16, 2021

It's the same thing that people always complain about. This thing is not in a format that I like, so it must be not what anyone likes.

If you want JUST recipes, pay money instead of just randomly googling around. America's Test Kitchen has a billion vetted and really good recipes. That solves that problem.

marginalia_nu · on Sept 17, 2021

Specialization is mostly a problem in monocultures.

If you almost only plant wheat, you are going to end up with one hell of a pest problem.

If you almost only have Windows XP, you are going to have one hell of a virus problem.

If you almost only have SearchRank-style search engines (or just the one), you are going to have one hell of a content spam problem.

Even though they have some pretty dodgy incentives, I don't think google suffers quality problems because they are evil, I think ultimately they suffer because they're so dominant. Whatever they do, the spammers adapt almost instantly.

A diverse ecosystem on the other hand limits the viability of specialization by its very nature. If one actor is attacked, it shrinks and that reduces the opportunity for attacking it.

Nextgrid · on Sept 16, 2021

I don't think the existing media-heavy websites are gaming Google to rank higher. It's that Google itself prefers media heavy content; they don't have to "game" anything.

I also think a search engine like this would be quite hard to game. An ML-based classifier trained on thousands of text-heavy and media-heavy screenshots should be quite robust and I think would be very hard to evade, so the "game" will become more about how to identify the crawler so you can serve it a high-ranking page while serving crap to the real users, and that seems fairly easy to defeat if the search engine does a second pass using residential proxies and standard browser user agents to detect this behavior (it could also threaten huge penalties like the entire domain being banned for a month to even deter attempts at this).
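
The "second pass" idea can be sketched roughly as follows: compare the copy of a page served to the crawler with the copy served to an ordinary browser, and flag large differences. This is only a sketch under assumptions. The function name and the 0.5 threshold are made up for the example, and the two HTML strings are assumed to have been fetched already (no networking or proxy handling is shown).

```python
from difflib import SequenceMatcher

def looks_cloaked(html_for_crawler, html_for_browser, threshold=0.5):
    """Flag a page when the two fetched copies are less similar than `threshold`."""
    # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
    similarity = SequenceMatcher(None, html_for_crawler, html_for_browser).ratio()
    return similarity < threshold

# Hypothetical copies of the same URL, fetched with different user agents.
crawler_copy = "<h1>Steak Recipe</h1><p>Salt the steak, sear it, rest it.</p>"
browser_copy = "<h1>Steak Recipe</h1><p>Salt the steak, sear it, rest it.</p>"
print(looks_cloaked(crawler_copy, browser_copy))
```

A real check would need to tolerate legitimate differences such as rotating ads or timestamps, which is why the threshold here is a placeholder rather than a recommendation.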

fragmede · on Sept 16, 2021

With the advances in machine text generation that looks accurate but isn't quite (aka GPT-3), it seems like it would be easily gamed (given access to GPT-3). Even without GPT-3, if the content being prioritized is mere text, I'm sure that for a pile of money, I could generate something that looks like Wikipedia, in the sense that it's a giant pile of mostly text, but it would make zero sense to a human reader. (Building an SEO farm to boost ranking of not-wikipedia is left as an exercise for the reader.)

the_other · on Sept 16, 2021

If there were a wider variety of popular search engines, with different ranking criteria, would sites begin to move away from gaming the system? Surely it would be too hard to game more than one search engine at a time?

Nasrudith · on Sept 16, 2021

It would be a matter of numbers anyway about which they optimize for. A/B testing is already in place and doesn't care about where it comes from, just which one does better.

phendrenad2 · on Sept 17, 2021

There should be some perfect balance where this search engine is N% as popular as Google, where Google soaks up all of the gamifiers, but this search engine is still popular enough to derive revenue and do ML and other search-engine-useful stuff.

new_guy · on Sept 16, 2021

> the people who want to game it will appear.

So just add human review to the mix, if a site is obviously trying to game the system (listicles, seo spam etc) just drop and ban them from the search index.

hdjjhhvvhga · on Sept 16, 2021

Congratulations, you've just invented negative SEO.

rasz · on Sept 17, 2021

>followed by a listicle "8 Reasons Why Rome Fell"

but aren't you curious about the 7th reason? it will surprise you!

coldtea · on Sept 17, 2021

You won't believe how Claudius looks today!

kahrl · on Sept 17, 2021

Doctors HATE him!!!

titzer · on Sept 17, 2021

Search engines whose revenue is based on advertising will ultimately be tuned to steer you to the ad foodchain. All the incentives are aligned towards, and all the metrics ultimately in service of, profit for advertisers. Not in the 99% of people who can be convinced to consume something by ads? Welp, screw you.

phendrenad2 · on Sept 17, 2021

Search engines should be something you pay for. Surely search engine powerusers can afford to pay for such a service. If Google makes $1 per user per month or something, that's not too high a bar to get over.

titzer · on Sept 17, 2021

Search engines should be like libraries. At least some tiny sliver of the billions we spend on education and research should go to, you know, actually organizing the world's information and making it universally available.

illys · on Sept 20, 2021

I see another issue here: companies like Google prioritize information to 1) keep their users and 2) maximize their profit.

If you move data organization to another type of organization (non-profit, state, universities - private or public), then the question of data prioritization becomes highly political. What should be exposed? What should not? What to put first? ...

It is already, but to a smaller extent since money-making companies have little interest in data meaning, and high interest in the commercial value of their users.

wizzwizz4 · on Sept 17, 2021

In which case, consider paying for something like Infinity: https://infinitysearch.co/

varjag · on Sept 18, 2021

The theoretical cap for this, if you include every human being on planet Earth, is 7 billion/month. This translates into $84 billion annual revenue.

Google's revenue last year was 146 billion, and it operates not anywhere near the theoretical maximum. Most of that revenue is advertisement.
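
The arithmetic behind that cap can be checked in a few lines. The inputs are taken from the comments above ($1 per user per month, roughly 7 billion people, and the quoted 146 billion revenue figure); none of them are verified here, and the variable names are just for illustration.

```python
# Assumptions from the thread: $1 per user per month, and the whole
# world population (rounded to 7 billion) as the upper bound on users.
price_per_user_per_month = 1           # USD
world_population = 7_000_000_000       # rough figure used in the comment

monthly_cap = price_per_user_per_month * world_population
annual_cap = monthly_cap * 12

# Revenue figure as quoted in the comment; not independently checked.
google_revenue_quoted = 146_000_000_000

print(f"monthly cap: ${monthly_cap / 1e9:.0f} billion")
print(f"annual cap:  ${annual_cap / 1e9:.0f} billion")
print(f"share of quoted revenue: {annual_cap / google_revenue_quoted:.0%}")
```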

Nition · on Sept 16, 2021

The Wikipedia link at the top is always given. It would maybe be good to make it a little clearer that it's not one of the true results.

Ajef · on Sept 17, 2021

I think this is just because of the terms you have searched. In my test-searches Wikipedia has not come up once in first position (I think the highest was 3rd in the list).

Here's what I've tried with a few variations: golang generics proposal, machine learning transformer, covid hospitalization germany

[edit] formatting

Nition · on Sept 17, 2021

I think maybe it's a special insert at the top, but only if a Wikipedia page is found that matches your search term? I'm not sure now though.

eterevsky · on Sept 16, 2021

Imagine if you were looking for the movie.

MisterTea · on Sept 16, 2021

Then you'd use a different search engine. Why does everything have to be a Swiss Army knife?

zozbot234 · on Sept 16, 2021

Or you could just search for 'rome movie'. Though for more complex disambiguation you would need to resort to, e.g. schema.org descriptions (which are supported by most search engines, and the foundation for most "smart" search result snippets).

eterevsky · on Sept 17, 2021

That's a fair point. This engine would be useful if you need grep over the internet (but without regexes), i.e. when you want to find exact phrases. But that's a relatively narrow use case.

lucideer · on Sept 16, 2021

I tend to prefer Wikipedia for movies. The exception is actor headshots if I'm trying to identify someone, which Wikipedia lacks for licensing reasons, but otherwise Wikipedia tends to be better than IMDB for most needs. Wikipedia has an IMDB link on every article anyway.

Another need I guess might be reviews, for which RT or MC are better than IMDB: not sure if either of those two will fare better than IMDB in this search engine but again Wiki has links out (in addition to good reception summaries)

mountainboy · on Sept 16, 2021

For me, imdb was much better when they had user comments/discussion.

I never even posted on it myself, but browsing the discussions one could learn all sorts of trivia, inside info, speculation, etc about each movie.

Since they (inexplicably) killed that feature, I rarely even visit anymore. You're right, for many purposes wikipedia is better, especially for TV series episode lists with summaries.

ncphil · on Sept 16, 2021

IMDB management thought it was their brilliant editorial work that drew people to their site. Morons. It was the comments all along. Of course they also believed they could create gravity-free zones by sheer force of executive will (and maybe still do).

_blu · on Sept 17, 2021

Especially for old and lesser known movies, the discussion board for the movie was a brilliant addition that could give the movie an extra dimension. Context is very important in order to understand, and ultimately enjoy something.

I think they removed it in part because new movies, like star wars and superhero movies, had a lot of negative activity.

sellyme · on Sept 17, 2021

I find IMDb to be more convenient than RT/MC/Wikipedia for finding release dates of movies - nearly every other website lists only the American release date, maybe one or two others if the movie was disproportionately popular in certain regions.

neltnerb · on Sept 16, 2021

Imagine including the search term "movie".

yreg · on Sept 16, 2021

That doesn't do anything useful.

shuntress · on Sept 16, 2021

?q=imdb.com:fall of the roman empire

jazzyjackson · on Sept 16, 2021

!imdb

purplefruit · on Sept 17, 2021

Wow I used "personality test" and actually got useful articles about personality theory. I'll actually use this!

HPsquared · on Sept 18, 2021

I think it's a case where systems diversity can be an advantage. Much like how most malware was historically written for Windows and could be avoided by using Linux, the low-quality search engine bait is created for Google and can be avoided by using a different style of search engine.

psadri · on Sept 17, 2021

Interesting choice of search topic. Are you trying to make an additional point?

1vuio0pswjnm7 · on Sept 19, 2021

No one mentioned the "bonus" audio in the page source: https://www.youtube.com/watch?v=7fCifJR6LAY

marginalia_nu · on Sept 19, 2021

Siira · on Sept 17, 2021

I tried some queries for Harry Potter fanfictions, and the results were pretty much completely unrelated. There weren’t that many results, either.

marginalia_nu · on Sept 18, 2021

I'm curious what you searched for.

https://search.marginalia.nu/search?query=harry+potter+fanfi...

This seems to return a pretty decent number of sites relating to that (as well as some sites not relating to that).

The search engine isn't always great at knowing what a page is about, unfortunately.

This seemed to return mostly relevant results

https://search.marginalia.nu/search?query=%22harry+potter%22...

Siira · on Sept 19, 2021

Yes, shorter queries return more relevant results. I think this was the first query that came to my mind:

https://search.marginalia.nu/search?query=Best+%22harry+pott...

marginalia_nu · on Sept 20, 2021

Yeah, that's just not a type of query my search engine is particularly good at. It's pretty dumb, and just tries to match as much of the webpage against the query as it can.

This used to be how all search engines worked, but I guess people have been taught by google that they should ask questions now, instead of search for terms.

I wonder how I can guide people to make more suitable queries. Maybe I should just make it look less like google.

hn_throwaway_99 · on Sept 16, 2021

I had the exact opposite experience. I searched the site for "java", got a Wikipedia link first (for the island, not the programming language), and the 2nd result was to a random JEP page, and all the rest of the results were random tidbits about Java (e.g. "XZ compression algorithm in Java). Didn't get any high level results pointing to an overview of the language, getting started guides, etc.

withinboredom · on Sept 16, 2021

You need to use some old school search techniques and search for “Java overview”

_wldu · on Sept 16, 2021

I'm not sure that's a bad thing.

rovr138 · on Sept 16, 2021

well, they're results to java related items...

What kind of links were you expecting to find?

acchow · on Sept 17, 2021

If this search engine ever takes off, the listicle writers will just start optimizing for it too, right?

dotancohen · on Sept 17, 2021

Mission accomplished, then.

acchow · on Sept 17, 2021

If the goal was to remove modern web design, ok sure mission accomplished.

If your goal was to create a search engine that ignored listicles and other fluff and instead got you meatier results like "academic talks" and such, then no.

dotancohen · on Sept 18, 2021

When a measure becomes a target, it ceases to be a good measure.

https://en.wikipedia.org/wiki/Goodhart%27s_law

adventured · on Sept 16, 2021

I did a search for "George Washington"

First result after Wikipedia:

"Radiophone Transmitter on the U.S.S. George Washington (1920)

In 1906, Reginald Fessenden contracted with General Electric to build the first alternator transmitter. G.E. continued to perfect alternator transmitter design, and at the time of this report, the Navy was operating one of G.E.'s 200 kilowatt alternators http://earlyradiohistory.us/1919wsh.htm "

Another result in the first few:

" - VANDERBILT, GEORGE WASHINGTON

PH: (800) ###-#233 FX: (#03) 641-5###. https://www.ScottWinslow.com/manufacturer/VANDERBILT_GEORGE_... "

And just below that terrible result:

"I Looked and I Listened -- George Washington Hill extract (1954)

Although the events described in this account are undated, they appear to have occurred in late 1928. I Looked and I Listened, Ben Gross, 1954, pages 104-105: Programs such as these called for the expenditure of larger sums than NBC had anticipated. It be http://earlyradiohistory.us/1954ayl2.htm "

Dramatically worse than Google.

---

Ok, how about a search for "Rome" then? Surely it'll pull some great text results for the city or the ancient empire.

First result after Wikipedia:

"Home | Rome Daily Sentinel

Reliable Community News for Oneida, Madison and Lewis County http://romesentinel.com/"

The fourth result for searching "Rome":

"Glenn's Pens - Stores of Note

Glenn's Pens, web site about pens, inks, stores, companies - the pleasure of owning and using a pen of choice. Direcdtory of pen stores in Europe. http://www.marcuslink.com/pens/storesofnote/roma.html"

Again, dramatically worse than Google.

---

Ok, how about if I search for "British"?

First result after Wikipedia:

"BRITISH MINING DATABASE

British_Mining_Database http://www.users.globalnet.co.uk/~lizcolin/bmd.htm "

And after that:

"British Virgin Islands

Many of these photos were taken on board the Spirit of Massachusetts. The sailing trip was organized by Toto Tours. Images Copyright © Lowell Greenberg Home Up Spring Quail Gardens Forest Home Lake Hodges Cape Falcon Cape Lookout, Oregon Wahkeena http://www.earthrenewal.org/british_virgin_islands2.htm"

Again, far off the mark and dramatically worse than Google.

I like the idea of Google having lots of search competition, but this isn't there yet (and I wouldn't expect it to be). I don't think overhyping its results does it any favors.

kwhitefoot · on Sept 16, 2021

What were you expecting to see for British? There must be millions of pages containing that term. Anyway the first screenful from Google is unadulterated crap, advertising mixed with the usual trivia questions.

If you are going to claim something is wide of the mark then you really ought to tell us at least roughly where the mark is.

burkaman · on Sept 16, 2021

This is not a Google competitor, it's a different type of search engine with different goals.

> If you are looking for fact, this is almost certainly the wrong tool. If you are looking for serendipity, you're on the right track. When was the last time you just stumbled onto something interesting, by the way?

duckmysick · on Sept 16, 2021

I checked the results of the same query and they seem fine. Lots of speeches and articles about George Washington the US president. There's even his beer recipe.

As for the results you linked, it's part of the zeitgeist to list other entities sharing the same name. Sure, they could use some subtle changes in ranking, but overall the returned links satisfy my curiosity.

JasonFruit · on Sept 16, 2021

[flagged]

fouric · on Sept 16, 2021

The project explicitly bills itself as a "search engine", not an "interesting and unexpected material surfacer". Moreover, projecting emotions like "angry" onto a comment in order to discredit the content of the comment (hey! is that an ad-hominem?) is just about exactly the opposite of the discussions that the HN mods are trying to curate, and the discussions that I like to see here.

allknowingfrog · on Sept 16, 2021

If you click through to the About page, I think you'll see that "interesting and unexpected material surfacer" is a fairly apt description of the project.

handelaar · on Sept 18, 2021

I think in fairness that when "interesting and unexpected material surfacer" is merely a euphemism for "we didn't bother indexing the things you might actually be looking for", a degree of scepticism isn't unwarranted.

(Source: I looked up several Irish politicians because I run an all-text website containing every single word that they say in parliament. I got nothing of use, or even of interest, for anything.)

withinboredom · on Sept 16, 2021

In the early days of google, I found what I was looking for on page 5+. On the way, I’d discover many interesting things I didn’t even know I was looking for, often completely unrelated to what I was searching for.

kews · on Sept 16, 2021

I miss those old days of even being permitted to go many pages in.

kwertyoowiyop · on Sept 16, 2021

And now Google hides that more than one page even exists, as they populate their first page with buttons to ask similar questions and go to the first page of THOSE results.

adventured · on Sept 16, 2021

> Hobby project leads angry person to interesting and unexpected material; angry person remains angry.

Not angry in the least. I'm thrilled someone is working on a search competitor to Google.

I understand you're attempting to dismiss my pointing out the bad results by calling me angry though. You're focusing your content on me personally, instead of what I pointed out.

The parent was far overhyping the results in a way that was very misleading (look, it's better than Google!). I tried various searches, they were not great results. The parent was very clearly implying something a lot better than that by what they said. The product isn't close to being at that level at this point, overhyping it to such an absurd degree isn't reasonable or fair to the person that is working on it.

I would specifically suggest people not compare it to Google. Let it be its own thing, at least for a good while. Google (Alphabet) is a trillion dollar company. Don't press the expectations so far and stage it to compete with Google at this point. I wouldn't even reference Google in relation to this search engine, let it be its own thing and find its own mindshare.

bityard · on Sept 16, 2021

> I'm thrilled someone is working on a search competitor to Google.

Except the author goes to quite some lengths to explain that his search engine is not a competitor to Google, and is in fact exactly the opposite of Google in many ways: https://memex.marginalia.nu/projects/edge/about.gmi

resynth1943 · on Sept 17, 2021

Yeah, Google tends to send a lot of junk back.

marginalia_nu · on Sept 16, 2021

Yeah so this is my project. It's very much a work in progress, but occasionally I think it works remarkably well for something I cobbled together alone out of consumer hardware and home-made code :-)

crocodiletears · on Sept 16, 2021

It's very rare that I see a project on HN I can see myself using. This is one. Like others have said, the results can be a little rough. But they're rough in a way I think is much more manageable than the idiosyncrasies of more 'clever' search engines.

marginalia_nu · on Sept 16, 2021

I think you need to approach it more like grep than google. It's a forgotten art, dealing with this type of dumb search engine.

Like if you search for "How do I make a steak", you aren't going to get very good results. But a better query is "Steak Recipe", as that is at least a conceivable H1-tag.

crocodiletears · on Sept 16, 2021

This is exactly how I prefer to use my search engines.

quaintdev · on Sept 17, 2021

I searched like this all my life and always got expected results.

But just a week ago I found out that these "how", "what" questions give better and faster results on Google.

LeftHandPath · on Sept 17, 2021

That switch happened some years ago. I've been unlearning and relearning how to use google for what feels like at least three or four years now.

The main pain-point, though, is that a lot of long-tail searches you could've used to find different results in years past, now seem to funnel you to the same set of results based on your apparent intent. At least, it has felt that way -- I'm not entirely sure how the modern google algorithm works.

eitland · on Sept 20, 2021

I realized this a few years ago when I observed my wife find things faster on Google than me.

I appreciate that it is easier for newcomers but I still hate it personally after years and especially that they cannot even avoid meddling with my queries even when I try to accept the new system and use the verbatim option.

corobo · on Sept 19, 2021

Try your old Google Fu skills on DuckDuckGo (or Bing I guess). I've found it to have good results anyway

tomxor · on Sept 19, 2021

> I think you need to approach it more like grep than google. It's a forgotten art

A search engine that accepted regex as the search parameter would be amazing.

I actually used this method as a field filter for a bunch of simple internal tools to search for info. Originally people were asking for individual search capabilities, but I didn't want it to become a giant project with me as the implementor of everyone's unique search capability feature request - so I just gave them regex, encoded inputs into the URL query string so they can save searches - gave em a bunch of examples to get going and now people are slowly learning regex and coming up with their own "new features" :P

But this made sense because it's a relatively small amount of data, so small that it's searched in the front end which is why it's more of a filter... I don't think pure regex would scale when used as a query on a massive DB, it would need some kind of hierarchy still to only bother parsing a subset of relevant text... unless there is some clever functional regex caching algorithm that can be used.
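A minimal sketch of the approach described above, assuming Python rather than whatever the real front end uses: a regex is applied as a filter over one field of a small in-memory data set, and the search is stored in a URL query string so it can be saved and shared. The record layout, the `field` and `q` parameter names, and the `tools.example` URL are all hypothetical.

```python
import re
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical records; the real tools' data is not described in the thread.
RECORDS = [
    {"host": "web-01", "owner": "alice"},
    {"host": "db-01", "owner": "bob"},
    {"host": "web-02", "owner": "carol"},
]


def save_search(base_url, field, pattern):
    """Encode a field name and a regex into a shareable URL."""
    return base_url + "?" + urlencode({"field": field, "q": pattern})


def load_search(url):
    """Recover (field, compiled regex) from a saved URL."""
    params = parse_qs(urlsplit(url).query)
    return params["field"][0], re.compile(params["q"][0])


def filter_records(records, field, regex):
    """Keep the records whose field contains a match for the regex."""
    return [r for r in records if regex.search(r[field])]


url = save_search("https://tools.example/hosts", "host", r"^web-\d+$")
field, regex = load_search(url)
matches = filter_records(RECORDS, field, regex)
```

This scans every record on every search, which is only reasonable for small data sets like the one described; it does nothing to address the scaling concern raised above.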

AQXt · on Sept 17, 2021

So, you are re-implementing Altavista, Lycos and other old search engines.

They used the naive approach: you searched for "steak", and they would bring the pages which included the word "steak".

The problem is that people could fool these engines by adding a long sequence like "steak, steak, steak, steak, steak, steak" to their site -- to pretend that they were the most authoritative page about steaks.

Google's big innovation was to count the referrers -- how many pages used the word "steak" to link to that particular page.

The rest is history.
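A toy sketch of the difference described above, under loud assumptions: the pages, links, and function names are made up, and the second function only counts links whose anchor text contains the query word. It is not PageRank and not how any real engine works; ignoring self-links is also an assumption added here.

```python
from collections import Counter

# Hypothetical toy corpus: page text, plus (source, anchor text, target) links.
PAGES = {
    "spam.example": "steak steak steak steak steak steak",
    "recipes.example": "how to cook a steak in a cast iron pan",
    "blog.example": "my favourite dinner",
}
LINKS = [
    ("blog.example", "great steak recipe", "recipes.example"),
    ("forum.example", "steak guide", "recipes.example"),
    ("spam.example", "steak", "spam.example"),  # a page linking to itself
]


def rank_by_term_count(term):
    """Naive ranking: pages that repeat the word most come first."""
    counts = {url: text.lower().split().count(term) for url, text in PAGES.items()}
    hits = [url for url, count in counts.items() if count > 0]
    return sorted(hits, key=lambda url: -counts[url])


def rank_by_anchor_text(term):
    """Count links from other pages whose anchor text contains the word."""
    votes = Counter()
    for source, anchor, target in LINKS:
        if source != target and term in anchor.lower().split():
            votes[target] += 1
    return [url for url, _ in votes.most_common()]


naive = rank_by_term_count("steak")
by_links = rank_by_anchor_text("steak")
```

In this toy data the keyword-stuffed page wins the naive ranking, while the link-based ranking only surfaces the page that other sites point to.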

frogpelt · on Sept 17, 2021

Effective Google search is also history.

I understand they are trying to maximize ad revenue and search does work very well for people who are looking for products or services.

But it no longer works well for finding information that is even slightly obscure.

wolverine876 · on Sept 17, 2021

> The problem is that people could fool these engines by adding a long sequence like "steak, steak, steak, steak, steak, steak" to their site -- to pretend that they were the most authoritative page about steaks.

I don't see a lot of people investing in SEO to boost their Marginalia results.

eitland · on Sept 20, 2021

> Google's big innovation was to count the referrers -- how many pages used the word "steak" to link to that particular page.

Then people fooled Google into showing the White House as top result when searching for "a miserable failure".

At the moment marginalia's approach of sorting pages into quality buckets based on lack of JS seems to be working extremely well, but of course it will be gamed if it gets popular.

However, I'd rather have SEO-crafting concern itself with minimizing JS than with spamming links into every comment field on every blog across the globe ;-)
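A rough sketch of what "quality buckets based on lack of JS" could look like, assuming the signal is simply the number of script tags on a page. The thresholds and names are invented for illustration; nothing here is taken from Marginalia's actual implementation.

```python
from html.parser import HTMLParser


class ScriptCounter(HTMLParser):
    """Counts <script> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1


def count_scripts(html):
    parser = ScriptCounter()
    parser.feed(html)
    parser.close()
    return parser.scripts


def quality_bucket(html, thresholds=(0, 3)):
    """Return 0 for the best bucket; higher numbers mean more scripts.

    The thresholds are hypothetical: 0 scripts -> bucket 0,
    1 to 3 scripts -> bucket 1, anything above -> bucket 2.
    """
    n = count_scripts(html)
    for bucket, limit in enumerate(thresholds):
        if n <= limit:
            return bucket
    return len(thresholds)


plain = "<html><body><h1>Steak Recipe</h1><p>Text</p></body></html>"
heavy = "<html><head>" + "<script src='a.js'></script>" * 5 + "</head><body></body></html>"
ordered = sorted([heavy, plain], key=quality_bucket)
```

A signal this simple is also easy to game, which is the point made above: the cheapest way to climb the ranking would be to ship less JavaScript.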

asah · on Sept 16, 2021

Love it, kudos! This is great for developers and others who Just Need Answers and not shopping or entertainment.

If you're looking for feedback, both from a UI design and utility standpoint, you might consider "inlining" results from selected sites, e.g. Wikipedia, Stack Exchange, etc. Having worked on search for a long time, I can say inlining (onebox etc.) is a big reason users choose Google and challengers fail to get traction. If you're Serious(tm), dig into the publisher structured formats and format those, create a test suite, etc.

A word of caution: if this takes off, as a business it's vulnerable to Google shifting its algorithms slightly to identify the segment of users+queries who prefer these results and give the same results to those queries.

Hope this helps!

marginalia_nu · on Sept 16, 2021

If Google starts showing interesting text-heavy links instead of vapid listicles and storefronts, I have accomplished everything I ever could dream of.

MaysonL · on Sept 17, 2021

Google Info - for when you're looking for information, not shopping advice or lists!

wolverine876 · on Sept 17, 2021

Maybe you're joking, but this is a good idea for a search engine. Better: Credible info.

addandsubtract · on Sept 17, 2021

Google info? Can you give me a sample query of what you mean?

kkiinnpptt · on Sept 18, 2021

It was a joke. The joke being that Google should launch a new product called Google Info, that would actually give you information when you search.