Compare Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT (danluu.com)
950 points by 882542F3884314B 9 months ago | 465 comments



While I've made huge improvements to the algorithm recently, I do think Marginalia Search got a bit lucky with the sample queries, as it is still IMO far more hit-and-miss than many alternatives. But that also speaks to how hard evaluating search quality is.

Its efficacy is also strongly dependent on understanding that it's a keyword search engine with no semantic understanding.


> Its efficacy is also strongly dependent on understanding that it's a keyword search engine with no semantic understanding.

Good. I love keyword search.

"Semantic understanding" can be so biased and ... just shady sometimes.


It's tricky though. I think a lot of people think they want raw keyword search, but what they really want is a search experience that makes intuitive sense.

If you lean too much into embeddings and so on, it's easy to get errors that don't make sense to a human being. It's extremely frustrating when you experience "I typed X, why am I getting results about Y?!"

That said, I think there's a sweet spot with some magic, where it genuinely just makes search better. But it's like perfume: if it's immediately obvious that it's there, it's probably a fair bit too much.


Keyword search leads to things like every website stuffing meaningless words into its meta tags so it would get picked up by AltaVista


Not if it's done right. Source: I made my own search engine


> [...] but that also speaks for how hard evaluating search quality is.

Would you be able to share some of your personal highlights regarding this?

I've partially kept up to date with the DIY, non-corporate search space (YaCy and friends). I'd love to understand a bit more behind the engineering decisions made when creating a search engine; it seems like a very hard problem to solve.

P.S. Marginalia is a very impressive piece of work, overall -- I've heard nothing but positive remarks from users on here. I've been meaning to try it for a while, but time constraints have... well, constrained, thus far.


I just tested Marginalia and it was completely unable to lead me to a Wikipedia or IMDb page when searching for "driver ryan gosling" and variations. It just listed lots of random articles.


That... is kind of the point of this particular search engine.

> This is an independent DIY search engine that focuses on non-commercial content, and attempts to show you sites you perhaps weren't aware of in favor of the sort of sites you probably already knew existed.


Well that makes sense, but I wanted to push against the result that the OP seems to take away from their test, which was that Marginalia seems to work well for the common user.


There's also a known bug with Wikipedia in particular, I do index it but the results are never ranked particularly high. I haven't fixed it because I don't want Wikipedia to be the #1 result for every search. Feels like most people are aware of Wikipedia and don't need help finding it.


I often do a Google search, and then go directly to the Wikipedia result. My reasoning is that during the initial search, I don't know if there's a Wikipedia page about that topic, and I might need a fallback option.


Unless it's something related to medicine; then you have to explicitly add "wiki" to the query. Some public health thing to discourage hypochondriacs I guess, but it's very annoying.


Thanks for your work!

I have a suggestion for the “About” section at the top of Marginalia’s landing page. I think it would read better like this:

> This is an independent DIY search engine that focuses on non-commercial content, and attempts to show you sites you perhaps weren't aware of [instead] of the sort of sites you probably already knew existed.

Showing one thing “in favor of” another seems contradictory in this case.


Honestly I understand it well enough that I see it is surprisingly hard, but not enough to have good solutions...


Just my feedback after finally trying to get a sense of what it is exactly.

I tried to find Marginalia on DDG; it's not on the first page. Google has it after some garbage. If I go to marginalia.nu I get an SSL error; search.marginalia.nu works.

If I search on Marginalia for duckduckgo, the first link is somewhat relevant but is about the app; all the other links are related to DDG but of dubious relevance.

If I search for ublacklist mentioned above, I do not see anything directly relevant.


Hmm, what's your browser? I renewed the cert today... Only thing I can think of is that it might not like a wildcard cert for the bare marginalia.nu domain.


Safari doesn't like https://marginalia.nu. Probably because *.marginalia.nu is not valid for the base domain. Add it as a Subject Alternative Name.
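
For anyone debugging something similar, a minimal sketch that pulls the served certificate and prints its SAN entries (standard library plus the third-party cryptography package; the hostname is just an example):

    import socket
    import ssl

    from cryptography import x509  # pip install cryptography

    hostname = "marginalia.nu"  # example target

    # Skip verification so we can fetch the cert even when it's misconfigured.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    with socket.create_connection((hostname, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            der = tls.getpeercert(binary_form=True)

    cert = x509.load_der_x509_certificate(der)
    san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName)
    # If the bare domain isn't in this list, Safari (correctly) rejects it:
    print(san.value.get_values_for_type(x509.DNSName))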


Try now?


Looks like you've fixed your bug.


Hi, your encyclopedia experiment(?) is also very inspiring. I really think it works; it makes it much easier to read the articles.


Yeah, it's pretty cool. I wish I had more time to polish it up a bit. At least make the mobile experience work a bit better.


Firefox on Android


Hmm, can't reproduce it myself, but firefox has a nasty habit of quietly "repairing" these types of misconfigurations by redirecting from one subdomain to another. I've added marginalia.nu as a SAN, should hopefully work now.


It does indeed work. Thanks for the quick fix, and I realize that there is more than a search engine here!


I notice you completely avoid the question on how a single developer can do so well ;)

I do think that search has gotten much worse but my ability to know the magic words like “ublock origin” instead of “Adblock” and “yt-dlp” instead of “download YouTube” and phrase my search has gotten better.

We’ve all been doing prompt engineering against the Internet-wide LLM that is the spam houses.


> I notice you completely avoid the question on how a single developer can do so well ;)

As much as I enjoy the notion of somehow being a 10,000X developer, it's probably mostly that modern search is a filtering problem, and MS does filtering fairly well.


I reckon these days search is pretty difficult and everyone knows how to game it. I recommend using a search engine that lets you effectively change which sites are shown. You can do this with Kagi, or with Google's Programmable Search Engines - I'm sure there are more too.

In particular I block Youtube, not because they aren't sometimes correct, but because I don't want videos polluting the regular results - it just takes too long to get info from videos.

An ability to upvote results for a given query seems tantalizing but I bet it would be gamed too. The DIY approach seems to be the only tractable one.

In my case I only allow results from domains I believe are correct. The whitelist approach does have downsides. Usually I'll vet new potential domains through social means like Reddit and this site, rather than identifying them through the search results. I believe there's an inherent tradeoff between discoverability and the gameability of the results.

Though I do sympathize with folks who reminisce about 2008 Google Search results, there was probably orders of magnitude less content out there, and widespread ignorance of how valuable your placement in the results is to your business, and thus little SEO.

I also personally disagree that yt-dlp is the "correct" result for the average user when they search "YouTube download". I highly doubt the average user would know or care to use the command line. A website front end would be more actionable for them.


> In particular I block Youtube, not because they aren't sometimes correct, but because I don't want videos polluting the regular results - it just takes too long to get info from videos.

Funnily enough, lately I've been prioritizing YT videos more when searching. So many sites now are just regurgitated SEO farms with minimal quality, and it's easy to see why: they take minimal effort to produce and are cheap to host. But making a video takes time and effort, so it has a much higher barrier to use as a click farm.

More than once when traditional search failed me, I went to YT and found some video from 2009 clearly and eloquently explaining what I'm looking for in detail, and without any distractions because the person authoring the video clearly didn't specialize in the media format or show interest in experimenting.

I've found it to also be a better source when looking for a product to buy. Want to know which fan to get? Turns out there's a channel from a dedicated guy who keeps finding ways to test different fans and their utility, with multiple videos demonstrating his approach and findings. The mainstream channels aren't all that useful, but there's a ton of "old web" style videos (some even recent) passionately providing details for almost anything you'd think to search. And they're a gold mine.


> But making a video takes time and effort, so has a much higher barrier to use as a click farm.

> The mainstream channels aren't all that useful, but there's a ton of "old web" style videos (some even recent) passionately providing details for almost anything you'd think to search. And they're a gold mine.

This won't be the case for long. YT is already starting to be polluted with spam and AI generated content, which will get more and more common. The same thing that happened to the web in text form, will happen to videos.

I think the only solutions are using allowlists for specific domains, and ironically enough more AI to filter specific results. Or just straight up LLMs instead of web search, assuming they're not trained on spam data themselves.


Yeah. I was recently looking for videos comparing two smartphones, and among the top-ranked videos there were ones that just show the phones side by side with their specs, and ones that are just LLM-generated text added to the video with TTS.


One critical difference is the date attached to youtube videos. It's easy to verify that a video was made before this tech was available, but you can't do that with websites, or search engine result pages.

It does limit utility for more modern needs, unfortunately.


Note that the problem of filtering bad data out of learning material isn’t inherently easier than filtering same out of search results.


Would a browser feature that skipped to the relevant parts of the video, based on closed captioning and understanding search intent, be useful? It seems like this would be a good way for Google to fight to stay relevant in UX versus the chatbots just quickly spitting out a readable answer. Hunting through ad-laden webpages is annoying. Seeking to the relevant section of the video is a solvable problem, especially for videos above some viewership threshold.


> Seeking to the relevant section of the video is a solvable problem

...and it has already been solved, though partially: SponsorBlock allows people to add a "Highlight" section to a video, which denotes the part of the video which the user most likely wanted to see (sans the "what's up guys", "like and subscribe", etc.)

Of course, it's not perfect: it relies upon humans doing the work, though some may see that as a positive over something more computerized.
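
SponsorBlock also exposes a public API for those segments. A rough sketch of querying it (endpoint and category names are my reading of the docs, so treat them as assumptions):

    import requests  # pip install requests

    # "poi_highlight" is the category used for the user-submitted
    # "Highlight" marker; "sponsor" marks sponsor segments.
    resp = requests.get(
        "https://sponsor.ajay.app/api/skipSegments",
        params={
            "videoID": "dQw4w9WgXcQ",  # example video ID
            "categories": '["sponsor", "poi_highlight"]',
        },
        timeout=10,
    )
    if resp.status_code == 404:
        print("no segments submitted for this video")
    else:
        for seg in resp.json():
            start, end = seg["segment"]
            print(f'{seg["category"]}: {start:.1f}s to {end:.1f}s')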


I've definitely seen Google do this already: https://searchengineland.com/google-tests-suggested-clip-sea...


Google seems to be taking much more advantage of YouTube's transcription feature lately. The first addition was the (ok, gimmicky) animation on the Subscribe button when someone says the dreaded "like and subscribe". Hopefully a sign of things to come.

Overall AI summaries are very welcome for a certain subset of YouTube which is sadly dominated by sponsored, clickbait, and ad-driven content.


Didn’t Google try this already? It seems useful to me, at least. IMO the next frontier of search is not better hypertext, it’s podcasts, audio, and video.


Do you have some tips for finding concise videos that answer the question you are asking? I am finding more and more obvious LLM bullshit in results, so I am willing to try some other tactics. But I am not ready to spend minutes watching videos, always artificially long to increase ad revenue, just to see whether they are actually relevant or a waste of time.


For me, it really depends on the type of video. For fixing cars, I'm usually looking for something specific enough that there isn't a lot of chaff. It was probably recorded on a phone and edited just to splice the clips together, probably with the default thumbnail that YouTube extracted from the video.

For product videos, if Project Farm did it, look there first. Otherwise, I look for someone who has a lot of videos for competing products with basically the same format, not over 10 minutes.

Tech videos are the hardest, I often still prefer text. Maybe look for links to the docs in the description? I still get duds though.


I don’t know much about fixing cars, but yeah, YouTube is a treasure trove for tacit knowledge.


Wish I did, but here you're at the algorithm's mercy, unfortunately. One possibility is subbing/accruing watch time on channels that you find provide you the right value, so that the algorithm might recommend similar channels on other subject matters.


That's curious, I generally hate video due to inability to glance over content, and the few attempts I made to actually find useful information I searched for resulted in... spammy extra low effort video content that did not answer my questions.


Depends on what you’re looking for. A blog post about how to play Search and Destroy by The Stooges is not as useful as a video of James Williamson himself showing you the riffs!


Well, I don't think I'd be able to learn much just from watching the concert: teaching is fundamentally different from doing.

So I think even that example does not universally hold. I'd still appreciate a write up with tips on what's important and if there are any transitions to focus on with only the bits on video where some of that is demonstrated.

Now, I can barely contort my fingers into one riff, so I lack the knowledge to understand what I am missing, but I'd still have a hard time learning that from video.


I’m not talking about concert footage, I’m talking about James breaking the song down and showing you the riffs at quarter speed.

Until a recent YouTube video I was playing the song incorrectly. It’s blazing fast and the mix is sort of insane so it’s very hard to hear exactly what is going on. And the tablature isn’t going to let you see how his body fits into the groove.

This is tacit knowledge we’re talking about, not book learning. Guitar instruction is always hands on.


Almost everything is hands-on (everything apart from things you really can't do hands-on, like exploring black holes): I don't remember seeing someone come out of reading a book on programming and being a master programmer.

But video is not hands-on any more so than text: if it was, live concerts and sports games and other performances would not be such a big deal. Sure, video is richer in some signals (audio/video), but poorer in others (introspection, pacing and focus...).

That does not mean I can't read to understand a new topic or to be prepared to look for subtleties in a hands-on performance.

If anything, to a great student, they should be complementary, but still, each student will have one or the other contribute more to their learning, and that depends both on the teacher, but also on the student.


> Though I do sympathize with folks who reminisce about 2008 Google Search results, there were probably orders of magnitude less content out there and a complete ignorance to how valuable your place is on your business and thus no SEO.

That was a decade after Google was created; people certainly understood SEO, and Google was constantly updating its algorithm to punish people who were trying to game it.

The Wikipedia page on "link farming", for example, references it happening as early as 1999, targeting SEO on Inktomi:

https://en.wikipedia.org/wiki/Link_farm

I remember some internal presentations at Amazon around ~2004 about how boosting Google SEO on Amazon web pages increased traffic and revenue (and Amazon was honestly a bit behind-the-curve due to a kind of NIH syndrome).


At the time it seemed like Google was winning, though. SEO seems to have gotten really good, or maybe Google just gave up.


I have a hard time believing it's so difficult for a search engine to distinguish between a credible, respected website that has been around a while and some generated garbage that exists only to be a search result. We humans can tell them apart, so in principle, computers can too.


Yes, this should be table stakes for a classifier - a company with the resources of Google could definitely solve that problem if they weren't themselves in the business of spam (advertising) and didn't benefit from spam sites (as they often include Google ads/analytics).
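
To make "table stakes" concrete: even a toy text classifier picks up on the tell-tale vocabulary of SEO spam. A minimal sketch (training snippets are invented for illustration; real systems would use far richer signals like link graphs and tracker counts):

    # pip install scikit-learn
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical labeled page snippets: 1 = SEO spam, 0 = legitimate.
    pages = [
        "best top 10 free youtube downloader 2023 download now no virus",
        "we measured static pressure and noise for five 140mm fans",
        "click here best adblock free download best adblock best 2023",
        "notes on how the index partitions documents across shards",
    ]
    labels = [1, 0, 1, 0]

    # Bag-of-ngrams plus logistic regression: about as simple as it gets.
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(pages, labels)

    print(clf.predict(["free download best converter 2023 top"]))  # likely [1]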


Google is quite quick in plugging holes in AdSense, but not in AdWords.


> table stakes

Always “table stakes”. Do you think in buzzwords also? I’ve always wondered this. Or do you think normal words and then translate it into this bandwagoning / membership proving garbage ?


Hey, please don't cross into personal attack on HN. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

Edit: unfortunately your account has been breaking the site guidelines in a lot of other places too—here are some recent examples:

https://news.ycombinator.com/item?id=38825624

https://news.ycombinator.com/item?id=38825543

https://news.ycombinator.com/item?id=38783196

We eventually have to ban accounts that post like this, so if you'd please stop doing that, we'd appreciate it. On HN the idea is: if you have a substantive point, make it thoughtfully; if not, please don't comment until you do.


I guess this brings up the question of how good are humans at doing this across a wide number of domains on average?

The other question I have is how long do these garbage results stay up for a particular query on average?


Google's PSE is neat but there isn't a good way to manage switching between them. They could easily add a little dropdown to let you select which one to use as part of the public link UI they provide for each one individually. Giggle[1] gives me this ability and I run it locally (alongside Kagi) for more specific things to target domain lists I've been building over the years.

1. https://github.com/dan-lovelace/giggle


I'm a big fan of the non-commercial search engines because of the gaming aspect: if you're not generating revenue from the clicks, the game mostly goes away.

I'm not saying people aren't entitled to make some money, but it clearly incentivizes user hostile behavior.

Maybe make it an option because legitimate sites like journalism also use this model.


Subscription model like Kagi seems to work pretty well against gaming the results.

Their only remaining incentive is to be good enough that people keep paying for the service.


It works not because they're somehow smarter or have more resources than Google at detecting spam/SEO, it's because unlike Google (and other ad-supported search engines), they make money from result quality and have an interest in blocking spam.

Google on the other hand makes money off ads (whether on the search results page itself or on the spam sites), so spam sites are at best considered neutral and at worst considered beneficial (since they can embed Google ads/analytics, and make the ads on the search results page look relatively good compared to the spam).

Black-hat SEO has been around since the early days of search engines and they managed to keep it at bay just fine. What changed isn't that there was some sudden breakthrough in malicious SEO, it's that it was more profitable to keep the spammers around than to fight them, and with the entire tech industry settling on advertising/"engagement" as its business model, the risk of competition was nil because competitors with the same business model would end up making the same decision.

The same reason is behind the neutering of advanced search features. These have nothing to do with the supposed war on spam/SEO, so why were they removed? Oh yeah because you'd spend less time on the search results page and are less likely to click on an ad/sponsored result, so it's against Google's interests and was removed too.


Kagi works because there is no incentive for SEO manipulators to target it since their market share is so small.

Super tinfoil hat to believe Google wants to send users to blog-spam websites (i.e. that this is beneficial to Google).

Anytime there is money to be made, there is an effectively infinite amount of people trying to game the system.


That's why taking the money out of the click is effective.

There can be other models for making money, but methods that rely on casting a wide net and driving low-quality traffic are the thing that shouldn't be indexed, or at least should be labeled as such.


Google is a complex system, so "want" can just mean: we are making money from the blog spam, and while we don't like it, other things take priority over fighting it as effectively as we could.


It's never tinfoil-hat to assume that a corporation is, at very least, making sure not to fight too hard against any activity that brings it more revenue.


But the author tried Kagi and the results don't appear to be noticeably different, filled with scammy adspam just like Google and Bing. Kagi's results seem to mostly aggregate existing search engines [1], so this isn't much of a surprise. Perhaps a subscription-based service that operates an index at Google's scale might help, but no such thing exists to my knowledge.

[1] https://help.kagi.com/kagi/search-details/search-sources.htm...


Right, but Kagi has built in tools to make it easy to fix that. Blocking those spammy sites from ever showing up again. Moving certain sites up the ranking, and so on. These features mean that over time my Kagi results have become nearly perfect for myself.


This is addressed in the article. As Hacker News readers and expert computer users, we have a bag of tricks that we can reach into in order to make our searches perform better. With a similar level of effort and an expert user's intuition you can get good results out of any search engine. Not so for the average user. In fact, again paraphrasing the article, Google's original claim to fame was that you didn't have to spend a lot of time doing exact keyword matching and fancy tricks in order to get good results.


> it just takes too long to get info from videos.

I can’t wait until video transcripts get fed into LLMs just to eliminate the whole “This video is sponsored by something-completely-unrelated, more about them later. What’s up YouTube, remember to like, share, subscribe…” routine. Five entire minutes pass on similar drivel before the actual thing you want, which is itself stretched out to an agonizing length.


You need SponsorBlock.

Usually people leave a "highlight" marker which tells you where you're supposed to jump to. Along with the regular "This video was brought to you by <insert>VPN".


Re: Kagi, I heard about it on HN, tried it for 100 searches, then subscribed. When I search for random JS and CSS things, MDN is the first result, and if it isn't, I can downrank whatever spammy site(s) are on top.

---

I wish I had a local LLM trained to detect clickbait and or low-effort content. I imagine searching YouTube and having all the clickbait collapsed together (just like Kagi condenses listicles), with the remainder being potentially high-quality content. Don't know how feasible this is right now.


Just use the Kagi Summarizer on YouTube videos and you don’t have to waste time watching trash. It’s a great life hack.


How does that work? Does it scrape the auto-generated captions?


Yeah, it uses the auto-generated transcript.
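
If you want to roll your own version of this, the moving parts are small. A rough sketch using the third-party youtube-transcript-api package (API shown as of the versions I've used, so double-check; and this is my guess at the approach, not how Kagi actually implements it):

    # pip install youtube-transcript-api
    from youtube_transcript_api import YouTubeTranscriptApi

    video_id = "dQw4w9WgXcQ"  # example video ID

    # Each segment is a dict with "text", "start", and "duration".
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    transcript = " ".join(seg["text"] for seg in segments)

    # From here, feed `transcript` to whatever LLM you like, with a
    # prompt such as "Summarize this video transcript in five bullets."
    print(transcript[:500])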


I became a huge fan of Kagi after seeing it on hacker news too. It's amazing how good a search engine can be when it's not full of ads.


Yeah. At first I primarily used Kagi to move away from Google as a company, hoping for results that were equally good. But Google search actually feels crappy now in comparison.


Been paying for Kagi for 6+ months and very happy with it. I’m pretty anti-subscription, so that’s saying a lot for a service that is otherwise free.

I do have to drop into Google for local searches every once in a while, but otherwise I'm happy with it.


I keep Google Maps around for a similar reason; Apple Maps works well, but things like business hours are wrong often enough for me to double-check in Google Maps.


Yep, same. The Yelp info is not nearly as good in Apple Maps, but for in-town directions it works great for me.


You use yelp for directions?


No, for business info. Apple pulls directly from Yelp. The issue is when you want more info, or want a closer look at the business and tap any of its images: it takes you straight to the App Store to download Yelp.


Current Kagi results for those without an account to compare:

youtube downloader

https://kagi.com/search?q=youtube+downloader&r=us&sh=_szITdy...

ad blocker

https://kagi.com/search?q=Ad+blocker&r=us&sh=-BHzV2ZoCDpmgOu...

download Firefox

https://kagi.com/search?q=Download+Firefox&r=us&sh=zkkmc_EQX...

why do wider tires have better grip?

https://kagi.com/search?q=Why+do+wider+tires+have+better+gri...

why do they keep making cpu transistors smaller?

https://kagi.com/search?q=Why+do+they+keep+making+cpu+transi...

vancouver snow forecast winter 2023

https://kagi.com/search?q=Vancouver+snow+forecast+winter+202...

I agree with the author that there is too much spam on the web. I think Kagi in general does a pretty good job at downranking it (number of ads/trackers is a negative ranking signal on Kagi) but we can always do better. Kagi has special search modes like "Small Web" which virtually eliminates spam.

I welcome such scrutiny from the community. Please continue to keep us honest.


Kagi gives me websites that require more clicking; Google just gives me reasonable answers and I don't see spam in your examples.

"why do wider tires have better grip?"

Wider tires provide more grip due to a larger contact patch with the road. While it's true that friction is not directly dependent on surface area, a larger contact patch allows for more even weight distribution and better traction, particularly during cornering. This can result in improved handling and stability.

"why do they keep making cpu transistors smaller?"

Smaller transistors can do more calculations without overheating, which makes them more power efficient. It also allows for smaller die sizes, which reduce costs and can increase density, allowing more cores per chip.

"vancouver snow forecast winter 2023"

The forecast for the 2023/2024 season suggests that we can expect another winter marked by ample snowfall and temperatures hovering both slightly above and below the freezing mark. Be prepared ahead of time.


That first result re: tires is simply wrong. Wider tires don't have a larger contact patch; the size of the contact patch is determined by the weight of the car and the air pressure in the tires:

    A = W / P
So the reason wider tires improve handling is more complex and subtle. Also, FTA:

    Assuming a baseline of a moderately wide tire for the wheel size.
      - Scaling both of these to make both wider than the OEM tire (but still running a setup that fits in the car without serious modifications) generally gives better dry braking and better lap times.
      - In wet conditions, wider setups often have better braking distances (though this depends a lot on the specific setup) and better lap times, but also aquaplane at lower speeds.
      - Just increasing the wheel width and using the same tire generally gives you better lap times, within reason.
      - Just increasing the tire width and leaving wheel width fixed generally results in worse lap times.
A full accounting of the effects of changing tire width should explain all of these effects.
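
To make the A = W / P point concrete, a quick back-of-the-envelope sketch (numbers invented purely for illustration):

    # Contact patch area depends on load and pressure, not tire width.
    weight_per_tire_lb = 1000   # e.g. a 4000 lb car, evenly distributed
    pressure_psi = 32           # typical street pressure

    area_sq_in = weight_per_tire_lb / pressure_psi
    print(f"{area_sq_in:.1f} sq in")  # ~31.3 sq in, wide tire or narrow

    # A wider tire reshapes that same area: shorter front-to-back and
    # wider side-to-side, which changes slip behavior, heat, etc.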


So, why is it?

Such a nerd snipe this one. 400+ comments and still could not get the answer.


As a partial explanation, maybe it's not the case that area = weight/tire pressure as an absolute rule? In which case we get back to the layman knowledge of "wider tires have a greater contact area."

As an extreme example, run flats at atmospheric tire pressure don't drastically change their area.

https://web.archive.org/web/20090327161537/http://performanc...

From https://www.6speedonline.com/forums/996-turbo-gt2/242759-imp...


My only gripe with these quick answers or excerpts is that sometimes they don’t fully give the context. After they’ve given me a couple of wrong results, I stop trusting them and always read the source. And if I have to read the source, I might as well click into the website and read the whole thing.


I often default to using Kagi’s FastGPT when I want direct information instead of a starting point for web browsing. It combines web searches with LLM summarization to produce an integrated, sourced answer. Kinda like “New Bing” with a better interface. You can trigger this mode by adding “!fast” to your query, going to https://kagi.com/fastgpt, or using their API. Here’s what it returned for me on those specific queries:

"why do wider tires have better grip?"

Wider tires provide better grip due to having a larger contact patch with the road surface. A larger contact patch means the tire can disperse the weight of the vehicle over a wider area, reducing pressure. This allows for more friction between the tire and road, improving traction and cornering ability. However, wider tires do not necessarily provide better grip in wet conditions. [1][2][3][4][5]

1. wheels - Does a wider tire always mean more grip/traction - Motor ...

2. Do Wider Tires Give You More Grip? Not All the Time

3. Learn Me: Wider tires = more Grip? = Faster cornering?| Grassroots ...

4. Wide or narrow tires? Here's how your choice will affect driving ...

5. VIDEO: Do wider tires offer more grip? - BMW M3 and BMW M4 Forum

"why do they keep making cpu transistors smaller?"

There are a few key reasons why CPU manufacturers keep making transistors smaller:

1. Smaller transistors allow more transistors to be placed on a single chip, enabling increased processing power. More transistors results in more capabilities and performance. [1]

2. Smaller transistors use less power and produce less heat. This allows CPUs to operate faster and with more cores/threads within the same power and thermal envelope. [2]

3. Moore's Law, an observation made by Intel co-founder Gordon Moore in 1965, predicted that the number of transistors on integrated circuits doubles approximately every two years. Following this trend has driven the semiconductor industry to continue shrinking transistors. [3]

4. Smaller transistors can switch on and off faster, enabling increased clock speeds. This directly leads to increased performance. [1]

However, we are rapidly approaching the fundamental physical limits of transistor shrinkage. Transistors cannot be smaller than an atom, which is around 0.2 nanometers for silicon. [4] Alternative technologies like carbon nanotube transistors may be needed to continue performance growth beyond this point. [5]

1. Why does computer chip process size have to keep getting smaller ...

2. Why does smaller = less heat? - CPUs, Motherboards, and Memory ...

3. What Is Moore's Law and Is It Still True?

4. Transistors are an essential building block used in almost every ...

5. These Transistor Gates Are Just One Carbon Atom Thick - IEEE ...

"vancouver snow forecast winter 2023"

Based on the context provided: The winter of 2023-2024 in Vancouver is expected to see less snow and milder temperatures than typical winters. Forecasts predict a below average snowfall season with warmer weather [1][2][3][4][5][6]. Long range forecasts through December 2023 for southern British Columbia call for showers along the coast and less precipitation inland [7]. Metro Vancouver's fall and winter is also predicted to be a potentially 'down' snow year [4]. Confidence is high for a mild winter in western Canada with reduced snowfall amounts [5]. However, specific snowfall totals are uncertain given the long lead time [6].

1. Vancouver winter weather: Less snow, milder temperatures ...

2. 2023-2024 British Columbia Winter Forecast Preview | OpenSnow

3. Snow Prediction Vancouver Winter 2023/24 — Alblaster Snow ...

4. Metro Vancouver's fall, winter forecast | CityNews Vancouver

5. What will this winter be like? Grab the hot cocoa — here's your 2023 ...

6. Canada's Winter Forecast: El Niño a critical factor for the season ...

7. 60-Day Extended Weather Forecast for Vancouver, BC | Almanac.com
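
For those curious about the API route mentioned above, a minimal sketch based on my reading of Kagi's public API docs (the endpoint, header format, and response shape are my understanding, so verify before relying on them):

    import requests  # pip install requests

    KAGI_API_KEY = "YOUR_API_KEY"  # placeholder

    resp = requests.post(
        "https://kagi.com/api/v0/fastgpt",
        headers={"Authorization": f"Bot {KAGI_API_KEY}"},
        json={"query": "why do wider tires have better grip?"},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()["data"]

    print(data["output"])  # the synthesized, cited answer
    for ref in data.get("references", []):
        print("-", ref["title"], ref["url"])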


I really don't understand why anyone writing articles about ChatGPT uses 3.5. It's pretty misleading as to the results you can get out of (the best available version of) ChatGPT.

For comparison, here are all the author's questions posed against GPT4:

https://chat.openai.com/share/ed8695cf-132e-45f3-ad27-600da7...


> I really don't understand why anyone writing articles about ChatGPT uses 3.5.

Because that’s what most people have access to. It’s absolutely worthless to most readers to talk about something they’ll never pay for and it’s not the job of random third-parties to incentivise others to send money to OpenAI.

What I really don’t understand is why anyone gets so hung up about it and blames the writer. If you’re bothered by people using 3.5 you should complain to OpenAI, not the people using the service they make freely available.

Anecdotally, I find this excessive fawning about 4 VS 3.5 to be unwarranted.

https://news.ycombinator.com/item?id=38304184


> Because that’s what most people have access to.

I’d agree with this rationale if the author clearly communicated their choice of model and the consequences of that choice upfront.

In this post the table of results and the text of the post itself simply reads “ChatGPT” with no mention of 3.5 until the middle of a paragraph of text in the appendix.

> It’s absolutely worthless to most readers to talk about something they’ll never pay for and it’s not the job of random third-parties to incentivise others to send money to OpenAI.

The “worth” is in communicating an accurate representation of the capabilities of the technology being evaluated. If you’re using the less capable free version, then make that clear upfront, and there’s no problem.

If you were to write an article reviewing any other piece of software that has a much less capable free version available in addition to a paid version, then you would be expected to be clear upfront (not in a single sentence all the way down in the appendix) about which version you’re using, and if you’re using the free version what its limitations may be. To do otherwise would be misleading.

If you simply say “ChatGPT” it’s reasonable to infer that you’re evaluating the best possible version of “ChatGPT”, not the worst.

Accurate communication is literally the job of the author if they’re making money off the article (this one has a Patreon solicitation at the top of the page).

Whether or not "most readers" are ever going to pay for the software is totally orthogonal.

If using GPT4 vs 3.5 would create results so distinct from one another that it would serve to incentivize people to give money to OpenAI, well then that precisely supports the argument that the author’s approach is misleading when presenting their results as representative of the capabilities of “ChatGPT”.

> What I really don’t understand is why anyone gets so hung up about it and blames the writer.

Again, if they’re making money off their readers it’s their job to provide them with an accurate representation of the tech.

> Anecdotally, I find this excessive fawning about 4 VS 3.5 to be unwarranted. https://news.ycombinator.com/item?id=38304184

Did some part of my comment come across as “excessive fawning”? Regardless, if this “excessive fawning” is truly unwarranted, this would again undermine your statement that using GPT4 would “incentivize others to send money to OpenAI”.

In regards to your link, I’ll highlight what another commenter replied to you. What should ChatGPT say when prompted about various religious beliefs? Should it confidently tell the user that these beliefs are rooted in fantastical nonsense?

It seems in this case you’re holding ChatGPT to an arbitrary standard, not to mention one that the majority of humanity, including many of its brightest members, would fail to meet.


> I’d agree with this rationale if the author clearly communicated their choice of model and the consequences of that choice upfront. (…) with no mention of 3.5 until the middle of a paragraph of text in the appendix.

You’re moving the goalposts. You went from criticising anyone using 3.5 and writing about it to saying it would’ve been OK if they had mentioned it where you think it’s acceptable. It’s debatable if the information needed to be more prominent; it is not debatable it is present.

> If you simply say “ChatGPT” it’s reasonable to infer that you’re evaluating the best possible version of “ChatGPT”, not the worst.

Alternatively, if you simply say “ChatGPT” it’s reasonable to infer that you’re evaluating the version most people have access to and can “play along” with the author.

> If using GPT4 vs 3.5 would create results so distinct from one another that it would serve to incentivize people to give money to OpenAI

Those are your words, not mine. I argued for the exact opposite.

> Again, if they’re making money off their readers it’s their job to provide them with an accurate representation of the tech.

I agree they should strive to provide accurate information. But I disagree that being paid has anything to do with it, and that their representation of the tech was inaccurate. Incomplete, maybe.

> Regardless, if this “excessive fawning” is truly unwarranted, this would again undermine your statement that using GPT4 would “incentivize others to send money to OpenAI”.

Again, I did not argue that, I argued the opposite. What I meant is that even if you believe that to be true, that still doesn’t mean random third-parties would have any obligation to do it.

> I’ll highlight what another commenter replied to you.

That comment has a reply, by another person, to which I didn’t feel the need to add.

> It seems in this case you’re holding ChatGPT to an arbitrary standard, not to mention one that the majority of humanity, including many of its brightest members, would fail to meet.

Machines and humans are not the same, not judged the same, don’t work the same, are not interpreted the same. Let’s please stop pretending there’s an equivalence.

Here’s a simple example: If someone tells you they can multiply any two numbers in their head and you give them 324543 and 976985, when they reply “317073642855” you’ll take out a calculator to confirm. If you had done the calculation first on a computer, you wouldn’t turn to the nearest human for them to confirm it in their head.

The problem with ChatGPT being wrong and misleading isn’t the information itself, but that people are taking it as correct because that’s what they’re used to and expect from machines. In addition, you don’t know when an answer is bullshit or not. With a human, not only can you catch clues regarding reliability of the information, you learn which human to trust with each information.

Everyone’s standard for ChatGPT, be it absolute omniscience, utter failure, or anything in between, is arbitrary. Comparing it to “the majority of humanity, including many of its brightest members” is certainly not an objective measurable standard.


> You’re moving the goalposts. You went from criticising anyone using 3.5 and writing about it to saying it would’ve been OK if they had mentioned it where you think it’s acceptable.

There are no goalposts being moved. My original comment was "I really don't understand why anyone writing articles about ChatGPT uses 3.5. It's pretty misleading as to the results you can get out of (the best available version of) ChatGPT."

This is still the position I'm arguing. It's a criticism of authors who use the older, inferior version of ChatGPT, do not make that abundantly clear to their readers, and then use that to make statements about the capabilities of "ChatGPT", which ultimately misleads those readers as to the current capabilities of "ChatGPT".

> It’s debatable if the information needed to be more prominent; it is not debatable it is present.

I'm not debating whether or not it is present in the article, I'm the one who highlighted its presence. What I'm arguing is that omitting this information from every reference to ChatGPT in the entire body of the text, and the tables front and center representing the data, and then burying this extremely important detail in a single sentence in the middle of a paragraph in the appendix, is effectively misleading.

> Alternatively, it you simply say “ChatGPT” it’s reasonable to infer that you’re evaluating the version most people have access to and can “play along” with the author.

It's even more reasonable to infer that when you're evaluating the performance of "ChatGPT", you're using the latest version.

If you review a video game, you don't play the free demo then tell the audience that the game is too short and lacking in a ton of features.

If you're reviewing Microsoft Word, you're not going to leave out the all-important detail that you're actually evaluating Word version 6.0.

> Those are your words, not mine. I argued for the exact opposite.

Then I misunderstood your line "incentivize others to give money to OpenAI".

> I agree they should strive to provide accurate information. But I disagree that being paid has anything to do with it, and that their representation of the tech was inaccurate. Incomplete, maybe.

Agreed that all of humanity should strive for accuracy and honesty in all their communication with others, but I do feel this responsibility is even more explicit when you are a professional making money off your writing for ostensibly providing an objective assessment of some thing.

I maintain that it's inaccurate, misleading, etc etc to present these results as representative of the performance of ChatGPT without making it abundantly clear to the reader that it's 3.5, which is significantly less performant than the latest version.

> Again, I did not argue that, I argued the opposite. What I meant is that even if you believe that to be true, that still doesn’t mean random third-parties would have any obligation to do it.

Again, I'm confused by what you're saying here about third parties.

Are you arguing the opposite that GPT 4 is not far more capable than 3.5? Are you arguing that it is more capable but that advanced capability would not make it a more compelling product? I admit I don't understand either of these positions.

That 4 is far better than 3.5 is something you can readily observe yourself, find measured on countless metrics, and/or find support for through countless anecdotes. If you do believe it is better, then that seems like it would automatically make it a more compelling product than 3.5, whether or not you want to argue that ChatGPT as a class of products is anywhere from hardly compelling at all to God's Own Perfect Product.

> That comment has a reply, by another person, to which I didn’t feel the need to add.

Ah, I somehow missed that.

So, I went ahead and asked GPT 4 your ghost question verbatim, and the first bullet point it gave me urged me to consider rational explanations for the phenomena.

I then went ahead and asked it a question about sin and God phrased with the implication that I was believer. Then a direct, neutral question about whether or not God exists.

I think it performed well in all these cases, and the nuance that is being glossed over is that it matters whether you are expressing an implied belief in something supernatural or asking in a neutral fashion about the topic.

It's clear to me that a universal policy of responding to all queries involving topics of faith by first encouraging the user to question the validity of their faith would be the wrong way to go, so again I see this as an exceptionally arbitrary standard that I don't feel could be satisfactorily defended as a standard nor actually met by most people to the satisfaction of most people.

https://chat.openai.com/share/2dc2d6eb-b3f6-4571-a75b-af698f...

> Machines and humans are not the same, not judged the same, don’t work the same, are not interpreted the same. Let’s please stop pretending there’s an equivalence.

The purpose of comparison is precisely to draw attention to the similarities and differences between two different things, nobody ever said there was an equivalence.

> Here’s a simple example: If someone tells you they can multiply any two numbers in their head and you give them 324543 and 976985, when they reply “317073642855” you’ll take out a calculator to confirm. If you had done the calculation first on a computer, you wouldn’t turn to the nearest human for them to confirm it in their head.

This is a perfectly defined problem with exactly one correct and easily verifiable answer. The other topics we were talking about are nothing like this.

> The problem with ChatGPT being wrong and misleading isn’t the information itself, but that people are taking it as correct because that’s what they’re used to and expect from machines. In addition, you don’t know when an answer is bullshit or not. With a human, not only can you catch clues regarding reliability of the information, you learn which human to trust with each information.

I completely agree that people need to be skeptical when using ChatGPT, and that this distrust of seemingly omniscient "AI" that can confidently and plausibly provide bullshit answers to any query is something that will need to be cultivated in humanity.

Is that the point of using 3.5 to make ChatGPT look worse than it is though? Should we achieve this cultivation by being intentionally misleading? Maybe the ends justify the means but I'm not sure this is a compelling argument. I'd much rather look at the most powerful version available and point out the very real flaws with it, there are no shortage and no need to get stuck on older generations of the tech.

> Everyone’s standard for ChatGPT, be it absolute omniscience, utter failure, or anything in between, is arbitrary. Comparing it to “the majority of humanity, including many of its brightest members” is certainly not an objective measurable standard.

I mean, yeah, but there is a spectrum of arbitrariness. Asking it to answer arithmetic accurately could be reasonably argued to be on the end of the spectrum labeled "objectively the right way to do this" and expecting it to know the one correct way to answer queries regarding fundamentally unknowable topics of faith that are mythically sensitive and controversial for the majority of humanity would be closer to the other end.

----

Look, I'm so tired of online debates like this at my age. I likely wouldn't even have engaged except your first response struck me as unnecessarily abrasive with phrases like "absolutely worthless" and "excessive fawning" which are an irresistible call to arms to my inner keyboard warrior.

I'd really like to not spend the rest of my life writing essays at each other on this topic so I'm happy to agree to disagree here.

Also, this has all left me with the impression that this is largely a branding issue. OpenAI does call all of their ChatGPT versions "ChatGPT". If they made unmistakable distinctions through their product line that would go a long way in addressing any confusion.


Why does OpenAI continue to offer chatgpt 3.5 if it's so bad?


GPT 4 is THIRTY (30) times more expensive.

In the llm-assisted search spaces I'm involved in, a lot of folks are trying to build solutions based on fine tuning and support software surrounding 3.5, which is economical for a massive userbase, using 4 only as a testing judge for quality control.


ChatGPT 3.5 is good enough if you can give context in the query.


Cheaper and faster.


It's a bit hard to use for most: either $20/month fixed for a limited # of messages, or you need to be able to reason through how to get an API key, or get another 3rd-party service with similar cost & limits.


You can use GPT-4 for free via Bing - though I find it a little hard to explain to people how they can do that because I'm never sure what the rules are with regards to creating Microsoft accounts, whether you can use any browser or have to use Edge, what countries it's available in etc.

Actually maybe the recommendation should be to use GPT-4 for free via https://copilot.microsoft.com/ instead now.

(Except I can't tell which version of GPT that's using yet - there was a story on 5th December that said GPT-4 Turbo was "coming soon", not sure when "soon" is though: https://blogs.microsoft.com/blog/2023/12/05/celebrating-the-... )


FYI: Balanced doesn't run pure GPT4. Balanced uses a combination of multiple models. Precise and Creative are pure GPT4.

About GPT4 Turbo: to check if you are on Turbo, press Ctrl+U, then Ctrl+F, and check if "dlgpt4t" exists. If it exists, you are running Turbo.

You can also double-check by, well, asking stuff after 2021 knowledge cut-off as well ("What are the oscar winners?") with search disabled.

But you'll notice because turbo is much faster on bing (and better too).


But that GPT-4 says it can't code.


IMHO TBF the "limited # of messages" is continuously increasing, to the point I hardly remember it exists these days


Try uBlacklist, it's like uBlock, but for search results.

https://addons.mozilla.org/en-US/firefox/addon/ublacklist/

https://chromewebstore.google.com/detail/ublacklist/pncfbmia...

You can sync the settings and your personal blocklist to either Dropbox or Google Drive. It also has the ability to subscribe to blocklists. Mind, you need to manually turn on search engines and subscribe to lists. The uBlacklist subscriptions setting doesn't have any built-in feeds yet though. :(

edit: There are some feeds on the uBlacklist site though. https://iorate.github.io/ublacklist/subscriptions

edit edit: Found an even better list of feeds. https://github.com/quenhus/uBlock-Origin-dev-filter#other-fi...
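
For reference, uBlacklist entries can be match patterns or /regex/ rules. A small, purely hypothetical personal blocklist might look like:

    *://*.pinterest.com/*
    *://www.quora.com/*
    /^https?:\/\/[^/]+\.fandom\.com\//

(The domains are just examples of sites people commonly block, not a recommendation.)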


This is a feature of Kagi already. You can promote or blacklist domains in your search results.


But I can't do regexes, wildcards or anything like that as far as I can see, like I can in uBlacklist

And it seems like they also have a 1000 domain limit?


Kagi is just the best, it feels like Google did before a decade+ of enshittification and ad tech.


Did anyone notice that Kagi showed as barely better than Google in the article?


Yeah, for me the results of Kagi are so much better than anything else that it makes me wonder how objective one can be when measuring search results.

I use Google on a client’s computer and it’s just horrible.

But it could also be a factor of the customizations I’ve made in my Kagi: banning quite a few paywalled sites, always putting Wikipedia articles on top, preferring blogs over Stack Overflow stuff…


Even without customization (I have not changed anything yet), Kagi provides far superior results to Google.


I just tried it (free account) and it felt underwhelming: not many search results, or particularly interesting ones, for the image and video stuff I searched.

There was little to no spam, though, but not much to look at either. Maybe it might be useful when searching for stuff that usually has a high amount of confusing spam, but otherwise not really useful for me...


Kagi is still very weak for searching for videos and images. For those, I still use Google.

Kagi really shines when you are doing a standard search, though, which is what most people do most of the time.


uBlacklist is absolutely excellent: I've been using it for a few years now, with absolutely no problems.

Quick tip: turn on the 'Skip the "Block this site" dialog', and disable 'Hide the "Block this site" links' settings -- they make it much quicker to block spam websites (of which there are many on regular search engines).


Just today I was looking for an extension just to block Quora from search results. (Talk about a useless site that seems to uselessly outrank Wikipedia on google lately — what on earth is Google up to?) I’m thankful I saw your and your parent’s post.


When Quora was new I followed some topics, got to read interesting answers to interesting questions, but then some kind of enshittification happened. I've blocked it in Kagi now.


This is amazing, I was maintaining my own custom solution that did this.


Appreciate you sharing this; I've been searching for something similar for quite some time.


I use uBlacklist with my own blacklists and Google has been pretty usable, it's great.


Does this exist for DDG?


Yes, it works for most search engines.


The addon you linked (on the Firefox version) only requests permissions on google.* sites so I don't think it will work for DDG. Is there a separate extension, or am I misunderstanding something?


uBlacklist has a button to enable other search engines, like DuckDuckGo. Press it!


I'm in the camp of those who think Google's results are still very good. I admit I use adblock (uBlock Origin) and won't even try to disable it.

I understand the author's point of turning off their ad blocker "to get the non-expert browsing experience" but then they could make a different test with uBlock on for every query and see how it goes.

It's also a bit inconsistent to expect results for downloading videos mentioning yt-dlp while trying to emulate "the non-expert browsing experience"... Yt-dlp is a command-line Python utility. Talk about non-expert! Most people don't know that videos are files that can be downloaded; of those who do, most don't know about the command line or Python.
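
(Tangentially, for the experts: yt-dlp can also be driven from Python rather than the shell. A minimal sketch, with a placeholder URL:)

    # pip install yt-dlp
    from yt_dlp import YoutubeDL

    url = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder

    # Download the best pre-merged format into the current directory.
    with YoutubeDL({"format": "best"}) as ydl:
        ydl.download([url])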

Yet when searching for "how to download youtube videos" the first result I get on Google is a link to a service called "savefrom.net", which appears to work well and does not seem to be a scam. This would qualify as "very good" in my book.

When searching for "how to download youtube videos from the command line" the first few results are about youtube-dl, including links to github and superuser. Granted they don't mention yt-dlp, but youtube-dl is a good start.


When I do a Google search in an Incognito tab for "how to download youtube videos", the first two results I get are the following.

- https://msunduziassociation.online/perfect-online-videos/

- https://gssaction.org/program-all-in-one-media-solutions/

I would certainly put those in the "Terrible" category like the author.


My top 2 (incognito) are blog posts from pcmag.com and zdnet.com listing 5 ways to download YT videos. Maybe it's blogspam, but the listed services seem valid at first glance.

savefrom.net is the 5th result (2nd page underneath 5 youtube videos)

Edit: This is from the US. If I had to guess, these are regional differences. What country are you in?


I got similar to you; I'm in Australia.


I'm curious: what is the rationale for "in an incognito tab" being part of the test harness?

It seems pretty arbitrary to me to disable one of the key features - in this case personalization - of the software being evaluated.

Or is the evaluation not between "search engines" but rather "search engines without personalization"? If so, then this restriction does make sense. But that is not the evaluation that "normal users" are interested in.


> I'm curious: what is the rationale for "in an incognito tab" being part of the test harness?

It's the closest we can easily get to the 'average user experience'. Someone who has a long account/cookie history with Google has plausibly trained the site to return more relevant results through implicit user-curation of avoiding obvious-to-them SEO-spam on other queries.

If we posit that every user eventually trains Google to avoid SEO spam, then this begs the question of why Google(/Bing) don't eliminate the SEO spam in the first place.

Besides that, it's not obvious why search engine personalization should dramatically change the basic utility of search results. We should expect personalization to mostly address ambiguities: is 'the best way to set up tables' asking about furniture assembly/carpentry or SQL? None of the author's queries for this article supported such ambiguities, and besides that the results returned (see the final appendix) aren't[†] valid answers to a different interpretation of the question.

[†] -- I think I'd quibble about the 'adblock' question, since a reasonable person might still find an adblocker that works but participates in the 'acceptable ads program' to be sufficient.


> It's the closest we can easily get to the 'average user experience'.

Maybe it's the closest we can get (though I doubt it), but it definitely isn't close enough to tell us anything about the "average user experience".

The average user has been using google for years, without taking any steps to avoid personalization. An incognito session (on a browser / machine / network that is probably fingerprinted...) is pretty much the opposite of that typical usage pattern.

I recognize that just writing a blog post or comment on HN is not a research project, so one needs to do something quick, but I think it mostly invalidates the experiment. What would get closer would be to devise a few user personas and attempt to search and browse for a while within those personas before trying the experiment. Or, much better yet, put together a focus group of real people within the personas you're interested in, and run the experiment using their real accounts.

> If we posit that every user eventually trains Google to avoid SEO spam

I don't think it's that, I think it's that every user trains it to return results more likely to improve the metric of "more likely to click one of the links", and I think that makes it more, not less, likely that they see what most of us here consider to be spam.

But I don't know! Maybe that's not what this experimental setup would show. But it would be a lot more enlightening than a setup using a fresh incognito window, which reflects the usage pattern of a proportion of search queries that is a tiny rounding error above zero.


Why are you assuming all users are logged in to google all the time?


Because it is objectively the case that the "average user" of the internet has a google cookie in their browser. It doesn't require that they be logged in - though I believe it's likely also the case that the "average user" is indeed logged into a google account - it just requires that they use google search without turning off cookies or specifically blocking google's. Essentially everybody uses google search and essentially nobody cares enough (or would know how) to turn off cookies or block google's cookie.

If this doesn't describe most people you know, you're in a very small bubble. (I'm somewhat in that bubble too, but I still have lots of family and friends who use the internet the normal way.)


Google has billions of user accounts...


> It's the closest we can easily get to the 'average user experience'

You wouldn’t really be taking the average here though, would you? You would be capturing the experience someone might have if they were in incognito, using Google for the very first time, or using Google on another device for the very first time, but not the “average experience”.


Google gets paid when you click on an ad. It's reasonable to guess you're not going to click on many scam software ads with your software engineer profile, so naturally you'll be shown fewer of them.

In this thread we can see people getting different results even in incognito tabs; comparisons will only get harder if they are using personalized results.


I get savefrom.net in both Incognito and normal tabs, uBlock or not. I have no idea why you get crap results that are somehow different. uBlock doesn't change google results in Firefox for me at all. It seems you get crap added, not removed.


I searched with Chrome, perhaps that's the difference. Firefox also blocks some ads out-of-the-box even without uBlock, so maybe it was already blocked.

It could also be related to targeting, like time zone, location, IP address, age group etc.


I get the same search result in Edge as in Firefox. Can't test in Chrome, but something seems strange.


savefrom.net is a crap result.


You seem to be missing the point of the discussion here, which was to compare results returned. Not to rate if one site is better than another.


Did you click either of those links?

Both seem to do the job of downloading a youtube link to mp4 for free.


Did you click either of those links? They are not YouTube video downloaders, they just link to another downloader. There is nowhere on those links to even put a YouTube URL.

Are you seriously suggesting that a website with the following "About us" with only a link to another YouTube video downloader is itself a good YouTube video downloader?

> Good Samaritan Support Action is to reawaken the Body of Christ to receiving the extravagant love of The Father, as well as our call to respond to this love by loving God with all of our hearts, souls, strengths, and minds. In order for people’s hearts to be linked to the heart of our Heavenly Father, we want to foster and facilitate the establishment of a culture of love in our churches and ministries.


So, there is one extra click... But for the user, the site does the job and takes an extra second.

Ideal? No. But it does the trick.


Not GP, but navigating to an unrelated scammy site just having a link to the actual site is a terrible and unethical job by Google. Imagine if you search "youtube" and the top result is not YouTube but some scammy site just having a link to YouTube. It's not about click counts, if the youtube downloader has bad UX and requires extra clicks, it's a bit inconvenient but ok.


Those are both garbage/scam sites


cross-posted: Did you try using savefrom.net? You can type "https://www.youtube.com/watch?v=IkYVmtgxebU" into the text box and hit "Download". Then you'll get a new tab that tries to get you to install malware. If you decline to install it, the new tab takes you to the malware's homepage. If you close the tab and go back to the original tab, savefrom.net presents you with an error message saying "The download link not found." and does not help you download the video.


savefrom.net used to be good, but it seems they've switched their MO. Plenty of decent alternatives filled the gap, though.


Can you name the alternatives, and are they present in the search results?


An adblocker is necessary, and IMO a script blocker as well. I feel vaguely like search has gotten worse over time, but it is not a huge problem—usually a good site is on the first page or two, and so I can just go check them out.

But if clicking a site meant I would be under attack, that really increases the stakes, I start to care strongly about the absence of bad sites, not just the existence of a good one.

Other than that, people need to be trained to not download programs from websites in general. I think this has gotten better over time? This is just a human mistake. Maybe Google could suppress sites that link to executables. It must, right?


It would suppress links to malware executables, but for general programs I don't see why it would.


By the time you know enough about a site to download some random executable off it and run it, you know more than enough to just enter the URL, so there’s no point to having it show up in search results.


Put me in the camp that finds Google and the rest horrible for all but very specific/unique technical terms, e.g. weak neutral currents. Anything more "everyday life" is an exercise in futility, sorting through trash, often without even the terms you are looking for. And good luck with "verbatim" searches: either ignored or zero results.


> they could make a different test

The takeaway I got from the article is everyone can make their own test, as opposed to relying on other people's sentiments and memes about X is bad or Y is good.

Trying to emulate a non-expert experience without workarounds is not the common usage pattern, since everyone familiar with their favorite tools has ways to get more value out of them. But this article presents a way of constructing an experiment (this is why I chose these queries, this is how I ranked scams, etc.), and I think people should follow this same spirit to evaluate whether they are stuck in a local optimum with their current choice of tools.


Yeah, the author seems to heavily conflate his own needs with general needs. But given what Google et al know about me, the results could indeed be more precise. I have developed a habit of appending “GitHub” to the search query when I am actually looking for source code, versus just trying to find a page that downloads me a video.


I'm also in the camp that thinks Google's search results are very good, but ChatGPT-based search with RAG is better, granted it's a paid version. The latter, however, is still somewhat experimental. Personally, I would love to see another column for ChatGPT with RAG (Bing); the fact that the author ignored RAG is rather strange.


For those (like me) wondering what RAG means: “Retrieval Augmented Generation (RAG) represents a groundbreaking approach in information retrieval, where the accuracy of search results directly influences the quality of generated answers. In essence, RAG combines traditional search mechanisms with Large Language Model's ability to understand and generate answers.”

(https://www.linkedin.com/pulse/how-we-increased-search-accur....)
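To make that concrete, here's a minimal, self-contained sketch of the retrieve-then-generate loop, using numpy. embed() and generate() are hypothetical stand-ins for a real embedding model and a real LLM API; the point is only the shape of the pipeline:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # hypothetical stand-in: a real system would call an embedding model here
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(8)

    def generate(prompt: str) -> str:
        # hypothetical stand-in: a real system would call an LLM here
        return f"(answer conditioned on: {prompt[:60]}...)"

    docs = ["yt-dlp is a command-line YouTube downloader.",
            "Wider tires have a larger contact patch.",
            "Vancouver rarely gets heavy snow."]
    doc_vecs = np.array([embed(d) for d in docs])

    def rag_answer(question: str, k: int = 2) -> str:
        q = embed(question)
        # rank documents by cosine similarity to the query embedding
        sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        context = "\n".join(docs[i] for i in np.argsort(sims)[::-1][:k])
        # "augment" generation by stuffing the retrieved context into the prompt
        return generate(f"Context:\n{context}\n\nQuestion: {question}")

    print(rag_answer("how do I download a youtube video?"))

Retrieve the most plausibly relevant documents first, then let the model answer with those documents in its prompt; the quality of the retrieval step bounds the quality of the answer, which is what that quote means by accuracy of search influencing the generated answers.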


If you like Bing (ChatGPT with RAG), then also give perplexity.ai a try - similar concept, but IMO better executed.

https://www.perplexity.ai/


The topic of control (in ChatGPT like models) explained: https://arxiv.org/pdf/2311.11701.pdf


This explains so much about why people think search results are bad. The author's "Great" result for "Download youtube videos" is "Ideally, the top hit would be yt-dlp or a thin, graphical, wrapper around yt-dlp".

Just give me a website where I can plug in the DL link and download it to my hard drive. I don't care what package they are using (I don't worry about malware like I did in the 90s). 99.999% of people are not programming tinkerers.

Just makes me realize how subjective search results are. All of their "Great" results are my "Terrible" results.


Malware, or well, the actual viruses, in the '90s were a joke, especially because a computer was an isolated thing. Connected computers were the exception.


In the early 90s, yes. By the turn of the century the current industry we see today existed in basic form: malware stole credit cards, compromised PCs were used to send spam as part of botnets, etc. The only major advance was when cryptocurrencies made it much easier to launder money and the professionalism went up accordingly.


The first result on Kagi is exactly this, just tried it a moment ago. It processed and downloaded the video extremely fast. Why would any reasonable person prefer youtube-dl?


IMO, if you're capable of running yt-dlp, it's far better than any website.

It's pretty simple to run these download tools as website, but it's expensive in terms of bandwidth and tends to attract legal attention. So a lot of websites go up supporting it, but even if they were started with good intentions, they will virtually all eventually add intrusive ads or other types of monetization just to break even. So there's never going to be a reliable website for it. If you're lucky, a search engine will send you to one that's working okay right now, but even odds you'll be fighting through a dozen malware nests.

Meanwhile, yt-dlp just works every time, with only an occasional pip upgrade to keep it up to date.
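For anyone who hasn't tried it, this is roughly all there is to it, assuming yt-dlp was installed with pip (the CLI equivalent is just yt-dlp followed by the URL):

    # assumes: pip install --upgrade yt-dlp
    from yt_dlp import YoutubeDL

    url = "https://www.youtube.com/watch?v=IkYVmtgxebU"  # example video from elsewhere in this thread
    with YoutubeDL() as ydl:  # default options; pass a dict to pick formats, output paths, etc.
        ydl.download([url])   # download() takes a list of URLs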


Totally. As the sibling said, it is the same using Google. I am not sure why anyone would want a programming package to accomplish a task that could be done in under 10 seconds.

But again, I guess that's why search is so hard: you have to parse that intent from 3 words.


Over here the first result on Kagi is savefrom.net which variously tries to install malware or sell a paid subscription and does not download videos.


It was the same domain for me, but maybe I should have tried it without adblockers? The page downloaded the YT video completely fine for me.


It is the same using Google.


What always confuses me about the "search has gotten so bad" mentality is that it is often based on anecdotal evidence at best, and anecdotal recollection at worst.

Like, sure, I have the impression that search got worse over the last years, but .. has it really? How could you tell?

And, honestly, this should be a verifiable claim; you can just try the top N search terms from Google trends or whatever and see how they perform. It should be easy to make a benchmark, and yet no one (who complains about this issue) ever bothers to make one.

Dan at least started to provide actual evidence and criteria by which he would score results, but even he only looked at 5 examples. Which really is a small sample size to make any general claims.

So I am left to wonder why there are so many posts about the sentiment that search got worse without anyone ever verifying that claim.


I think the point he's trying to make is that the search results pages from the mainstream search engines are a minefield of scams that a regular person would have difficulty navigating safely.

If he was looking at relevance, yours would be a solid point, but since most of the emphasis is on harm, a smaller sample works. Like "we found used needles in 3 out of 5 playgrounds" doesn't typically garner requests for p-values and error bars.


I think this is a good illustration of my frustration with this discussion: I don't think search has gotten bad, I think the web has gotten bad. It's weird to even conceptualize it as a big graph of useful hypertext documents. That's just wikipedia. The broader web is this much noisier and dubious thing now.

That's bad for google though! Their model is very much predicated on the web having a lot of signal that they can find within the noise. But if it just ... doesn't actually have much signal, then what?


The web has gotten bad because of what big search engines have encouraged. If they stopped incentivizing publishing complete garbage (by ruthlessly delisting low quality sites regardless of their ad quantity, etc) then maybe we'd see a resurgence of good content.


I don't think so. I think it's the inevitable outcome of giving all of humanity the ability to broadcast without curation.

Or maybe we're saying essentially the same thing, but you think search engines should be doing that curation. But that was never my conception of what search engines are for.


I think we are indeed saying the same thing. However, I would like search engines to do some curation -- specifically, to remove results that deliver malware, are clones of other sites, and are just entirely content free (eg Microsoft's forums).

I'll give Google credit: I haven't seen gitmemory or SO clones in a while. It took a few years but they seem to have dealt with them.


I disagree, the bad sites people are talking about are spam, not bad personal takes. They are written by people being paid to churn out content. This is now being done with AI. This is a result of search engines listing them.


I don't think the definition of "spam" is nearly as objective as this suggests it is.


The web is bad because it is both popular and commercial. Every now and then I fantasize that just finding a sufficiently user-hostile corner would suffice to recreate the early internet experience of an online world nearly exclusively populated by anticommercial geeks.


I understand this is the tactic the Gemini folks are using.


But there's still plenty of signal. It isn't as if there are no working YouTube downloaders, or factually correct explanations of how transistors work. It's just that search engines don't know how to (or don't care enough to) disambiguate these good results from the mountains of spam or malware.


I think that both of you are correct. The internet has much more "noise" than in the past (partially due to websites gaming SEO to show up higher in Google's search results). As a result, Google's algorithm returns more "noise" per query now than it used to. It is a less effective filter through the noise.

Imagine Google were like a water filter you install on your kitchen faucet to filter out unwanted chemicals from your drinking water. If as the years progress your municipal tap water starts to contain a higher baseline of unwanted chemicals, and as a result the filter begins to let through more chemicals than it did before, you'd consider your filter pretty cruddy for its use case. At the bare minimum you'd call it outdated. That is what is happening to Google search.


On the one hand, I'm not sure the data corroborates that. If this is a web problem and not a search engine problem, then I'd expect every search engine to have the same pattern of scam results.

I'd also argue that finding relevant results among a sea of irrelevant results is the primary function of a search engine. This was as true in 1998 as it is today. In fact, it was Google's "killer feature", unlike Altavista and the likes it showed you far more relevant results.


Relevance is a difficult concept to agree on. In 1998 it was more about X != Y, that is, being shown legit pages that simply weren't on the correct topic.

These days the results are apt to be on the correct topic, but optimized for some metric other than what the user wants -- for example, getting you to download malware or showing as many crypto ads as possible.

I don't expect every search engine to have the same scam results. Scammers target individual search engines with particular methodologies. Google does a lot of work to prevent crap on their engine; the issue is that the scammers in total do far more.


If the web is being polluted by a nefarious search engine provider that is excluding the polluted pages from their algorithm, you wouldn't see the same pattern across search engines

Not saying or even suggesting that's happening, but the logic isn't airtight


Well, there's always the Münchhausen trilemma, by which no reasoning is airtight.


> I think the point he's trying to make that the search results page from the mainstream search engines are a minefield of scams that a regular person would have difficulty navigating safely.

Yes, and he makes the point well. It also means if you are part of the 0.49% of people who use Firefox on Android, he isn't talking about your experience. I find Firefox mobile remaining at 0.49% utterly inexplicable, which I guess just goes to show how out of touch with the mainstream I (and I assume most other people here) are.

It's not just ad blockers. My first attempt at a tyre width query got relevant results, mostly because "tyre grip" looked so bad as a search term that I used "traction" instead. Meanwhile, friends my age (60s) can't get an internet search for public toilets to return results they can understand. When I try to help them, their eyes glaze over in short order and they wave me away in frustration. These mind games with google hold no interest for them.

I am regularly bitten by one thing he mentions: finding old results is hard, and getting harder. It makes answering questions about historical trends ("am I wrong about what it was like back then?") really difficult.


I agree we can say "this is a minefield of scams" without doing a comparison.

There still is a question about when it got bad -- I think Dan mentions 2016 as a point of comparison, and there were plenty of scams back then, so you might wonder whether there ever were days when a query wouldn't return many scams.

If you go back far enough, then there wasn't the same kind of SEO, and Internet scams were much smaller/less organized, but that's a long time ago.


I think the major change is the automation tools for scams. In the distant past it was humans doing this; now I'm guessing there are a few larger businesses, and likely nation states, with a point-and-click interface that removes 99% of the past work.


I don't think this is a fair criticism.

1) The step where you evaluate "how they perform" is necessarily subjective.

2) You could design a study and recruit participants, but that isn't something a blogger is going to do.

3) He does link to polls where people agree with the idea that results have gotten worse. Yeah, there are sampling problems with a poll, but it's better than nothing.

In this case especially, the writer is answering the question: "Whose results are best according to my tastes?"


> What always confuses me about the "search has gotten so bad" mentality is that it is often based on anecdotal evidence at best, and anecdotal recollection at worst.

I can't speak for anybody else, just trying to find stuff online, not writing a treatise about it or writing my own engine to outcompete Google. It's been asked many times here over the years and the answer was always explanations, never solutions.

Shittification does not happen overnight, but along many years. It started with Google deciding that some search terms weren't so popular: "did you mean...?" (forcing a second click to do what you intended to do in the first place) and went downhill when qualifiers to override that crap got ignored.

For me enough was enough when I realized that a simple query with three words, chosen carefully to point to the desired page, gave thousands of results, none of them relevant. YMMV.


Dan approached the problem from a qualitative perspective. Perhaps if more people took this approach over quantitative maximalism we would actually have products that don’t drive us fucking insane.

All that matters is the overwhelming sentiment that search has gotten worse, not the same fucking spreadsheet that got us here in the first place!


To do this you would need to have a comprehensive definition of "quality", and that's anything but easy, and it will be at least partly subjective. It's also hard to include omissions in your definition of "quality" (and again, what should or should not be omitted is subjective as well).

For example, let's say I search for "Gaza"; at one extreme, some engines might focus only on recent events, whereas others may ignore recent events and include only general information. Is one higher "quality" than the other? Not really – it depends what you're looking for, innit?

All you can really do is make a subjective list of things you find important and rate things according to that, and this is basically just the same thing as an anecdotal account but with extra steps.


Some things are easily quantifiable, but very few -- such as the number of ads per search. Back in the day Google had at most one ad, and it was visibly distinct from the rest of the links.

Otherwise, yeah, maybe search didn't degrade but the internet got more spammy. Or maybe users just got wiser and can see through the smoke screen better. Who knows...

Doesn't change the fact that today one has to know how to filter through pages of generic results made by low-effort content farms -- results of dubious validity, which at best simply waste your time -- or through clones of other websites (e.g. Stack Overflow clones).

Search engines can choose to help with that (kagi certainly puts in the effort and I love it for that), or they can ignore the problem and milk you for ad clicks.

Anecdotal evidence is good enough for me.


> Dan at least started to provide actual evidence and criteria by which he would score results, but even he only looked at 5 examples. Which really is a small sample size to make any general claims.

US NIST, in their annual TREC evaluation of search systems in the scientific/academic world, use sets of 25 or 50 queries (confusingly called "topics" in the jargon).

For each, a mandated data collection is searched by retired intelligence analysts to find (almost) all relevant results, which are represented by document ID for general search and by a regular expression that matches the relevant answer for question answering (when that was evaluated, 1998-2006).

Such an approach is expensive but has the advantage of being reusable.


So you're confused about why other people aren't doing research for you, and when they do provide some evidence, you dismiss it because it's not a large-scale scientific inquiry into search quality? Get a frickin' grip.


Every time I encounter an egregiously poor result in DDG, I document it with images. I have a directory of them from the last few years. I encounter so many now, whereas when I first began using DDG, just a couple of years prior to that, it was less of an issue (and I fully switched at the time). So yeah, I don't have before/after comparisons, but it's a little more solid than the 'I feel the results are worse' being characterized here.

There are particular search parameters that DDG changed the behavior of, including exclusion and double quoting, which are now, according to even their own docs, more a hint of the direction results should go rather than any explicit/literal command (ime these virtually never work, which was a motivation for documenting failures, and they actually removed them from their docs temporarily at one point earlier this year).


Yes, to get an accurate comparison we would need results from queries made 10 years ago.

I still remember often having to go to page 3 or beyond of Google searches to find things, even really early on.

I think it has never been good; it got a bit better before SEO farms took the gains back. That's my feeling, with nothing to back it.


> So I am left to wonder why there are so many posts about the sentiment that search got worse without anyone ever verifying that claim.

I suspect it has gotten worse, so posts complaining about it resonate. But, it is not really a huge problem, and anyway it isn’t as if there’s much I can do about it, so I’m not going to bother collecting statistically valid data.

I think this is generally true about a lot of things. We should be OK with admitting that we aren’t all that data-driven and lots of our beliefs are based on anecdotes bouncing around in conversations. Lots of things are not really very important. And IMO we should better signal that our preferences and opinions aren’t facts; far too many people mix up the two from what I’ve seen.


When it comes to human psychology, what we believe tends to matter more than what actually is, at least for predicting our future actions. If people think search sucks, then they'll likely use it less in the future, and that opens up companies like Google to disruption.


Internet Archive remembers. https://web.archive.org/web/*/google.com/search/%2A

Find a query of interest, see for yourself (and take a snapshot of the present state for posterity).

The api enables more powerful queries, https://web.archive.org/cdx/search/cdx?url=google.co.jp*&pag...

Also try other search engines and languages.
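A small sketch of scripting the CDX endpoint, using only parameters from the public API docs (adjust url and limit to taste):

    import requests

    resp = requests.get(
        "https://web.archive.org/cdx/search/cdx",
        params={"url": "google.com/search*", "output": "json", "limit": 5},
        timeout=30,
    )
    rows = resp.json()
    for row in rows[1:]:  # the first row is the field header
        timestamp, original = row[1], row[2]
        print(f"https://web.archive.org/web/{timestamp}/{original}")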


Even without looking at the subjective quality of search results, the sheer user hostility of the design of the Google search results page is an obvious, objective instance of how search has enshittified.

That is, in the early days, Google used to highlight that "search position couldn't be gamed/bought" as one of their primary differentiators, ads were clearly displayed with a distinct yellow background, and there weren't that many ads. Nowadays, when I do any remotely commercial search the entire first page and a half at least on mobile is ads, and the only thing that differentiates ads from organic results is a tiny piece of "Sponsored" text.


> has it really? How could you tell?

Yes it has and for a certain class of queries it's not even open for debate, because Google themselves have stated they deliberately made it worse. And they really did, it's very noticeable.

This class of queries is for anything related to any perspective deemed "non authoritative". Try to find information that contradicts the US Government on medical questions, for example, and even when you know what page you're looking for you won't be able to find it except via the most specific forcing e.g. exact quoted substrings.

Likewise, try finding stories that are mostly covered by Breitbart on Google and you won't be able to. They suppress conservative news sites to stop them ranking.

15 years ago Google wasn't doing that. It would usually return what you were looking for regardless of topic. There are now many topics - which specifically is a secret - on which the result quality is deliberately trashed because they'd prefer to show you the wrong results in an attempt to change your mind about something, than the results you actually asked for.


Probably for the same reason that, on any topic, there are so many more posts that make claims than posts that explore evidence systematically, especially when the people making the posts stand to gain nothing by spending their time that way.

I encounter claims that "protobuf is faster than json" pretty regularly but it seems like nobody has actually benchmarked this. Typical protobuf decoder benchmarks say that protobuf decodes ~5x slower than json, and I don't think it's ~5x smaller for the same document, but I'm also not dedicating my weekend to convincing other people about this.
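For what it's worth, the JSON half of such a micro-benchmark is only a few lines; the protobuf half needs a schema compiled with protoc first, so it's only sketched in comments here (message_pb2 is a hypothetical generated module, not a real package):

    import json
    import timeit

    doc = {"id": 123, "name": "example", "tags": ["a", "b", "c"]}
    raw_json = json.dumps(doc).encode()

    t = timeit.timeit(lambda: json.loads(raw_json), number=100_000)
    print(f"json decode x100k: {t:.3f}s")

    # Hypothetical protobuf side, after generating message_pb2 with protoc:
    # import message_pb2
    # raw_pb = message_pb2.Doc(id=123, name="example", tags=["a", "b", "c"]).SerializeToString()
    # t = timeit.timeit(lambda: message_pb2.Doc.FromString(raw_pb), number=100_000)

Of course, the outcome depends heavily on which decoder each side uses, which is part of why the claim rarely gets pinned down.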


The problem with benchmarking that claim is there's no one true "json decoder" that everyone uses. You choose one based on your language -- JSON.parse if you're using JS, serde_json if you're using Rust, etc.

So what people are actually saying is, a typical protobuf implementation decodes faster than a typical JSON implementation for a typical serialized object -- and that's true in my experience.

Tying this back into the thread topic of search engine results, I googled "protobuf json benchmark" and the first result is this Golang benchmark which seems relevant. https://shijuvar.medium.com/benchmarking-protocol-buffers-js... Results for specific languages like "rust protobuf json benchmark" also look nice and relevant, but I'm not gonna click on all these links to verify.

In my experience programming searches tend to get much better results than other types of searches, so I think the article's claim still holds.


I agree. You wouldn't use encoding/json or serde-json if you had to deserialize a lot of json and you cared about latency, throughput, or power costs. A typical protobuf decoder would be better.


If you wanna know why Google (or any search engine) sucks, just look at how it measures its own search results. Most search companies do this “at scale” according to very specific guidelines, like what the author did here but on steroids. For example, take a look at Google’s 168-page instruction manual for search quality raters:

https://static.googleusercontent.com/media/guidelines.raterh...

It talks about figuring out a query’s meaning(s), judging the user’s intent (were they looking for some specific answer, etc.), evaluating the “quality” of a website, rating the site’s usefulness in relation to the query’s meaning/intent, etc.

All this is to say, it's not that search companies don't do exactly what the author did here; it's just that they have different standards than the author. And I'd venture their standards match their users' better than the author's do -- but maybe not, or not forever, anyway.


I really don't think that's true. For example, page 29 of your link describes "Lowest Quality Content." Most of the search results that the author rated as spammy or scammy clearly fit these guidelines, which means that either (1) the raters aren't knowledgeable enough about the subject matter to determine that the website they're rating is harmful or misleading; or (2) the raters are rating these sites correctly, but it still isn't having the desired effect.


> If you wanna know why Google (or any search engine) sucks

While I obviously don't know, it may be related to how Google believes a "normal" person searches. I have come to view Google as a product search engine/price comparison site; that's what it's great at. Google can find you the most relevant products for any purchase you may consider, so maybe that's what Google has optimized for. The majority of my searches are related to IT, programming, software and computers in general, but what do "normal" people search for? They search for products, news, opening hours for a store. Google is pretty decent at that, but the money is in the "go buy something". The ads on a product search on Google are always way more accurate than the actual search results.

I think Google has optimized for selling products.


Why would an average user want blog spam search results?

My hope is that as LLMs improve, they can be more discriminating about the results returned.


> Why would an average user want blog spam search results?

I didn’t say they would :)

In fact, I can’t figure out how your comment relates to mine. Are you claiming that Google doesn’t factor blog spamminess into its evaluation of search results? If so, that’s quickly put to bed by the document I linked, pretty much section 4.6. Excerpt:

> Creating an abundance of content with little effort or originality with no editing or manual curation is often the defining attribute of spammy websites.

You could claim that they fail to capture some essential quality of “blog spamitude” or that they don’t weight it heavily enough in their eval but to say they just, like, don’t know about blogspam over there, is pretty far fetched IMO.


I was responding to this part, “And I’d venture their standards match their users’ better than the author’s”, which I understood to mean that people like these SEO’d results.


I'm sorry, but the very first request is evaluated completely wrong. When people search for a YouTube downloader, they want a website that lets them download a YouTube video, not a command-line tool. And the first results given by Google do that. I'm one of the people who think Google search became bad, but it's not because of this kind of search.


That's the tricky mind-reading aspect about search intent.

Different people have varying expectations as to what they want to find with the same query. I'd definitely want yt-dlp in favor of some website.


It's easy: just append "command line" to the query, like you would append "android app" if you wanted an Android app.


That is a solution from the user's POV; I was speaking from the search engine's POV.


Based on your handle, I suspect you have much better insight into this than the rest of us!

But can the search engine mind-read by assuming Windows users don’t want to use a command-line utility?


They can based on user tracking and profiling, but that's murky waters I personally don't want to dip into.


I assume you meant to say you don't want to! :)


Yeah I accidentally a word.


I think that Newpipe would be better than some website or even yt-dlp...


They do not do that, have you tried using them?


The two I tested were downloading the videos.


I'm not able to reproduce the author's bad results in Kagi, at all. What I'm seeing when searching the same terms is fantastic in comparison. I don't know what went wrong there.

In the Youtube Downloader search, NortonSafeWeb is nowhere to be found. I get a couple of legit downloader websites, and some articles from reputable tech newspapers on how to use them or command line tools.

In the Adblock search, ublock Origin is #3, followed by some blogs about ad blocking ethics debates and the bullshit Google has been pulling recently.

In the wider tires grip search, #3 is a physics blog that dives deep into the topic.

In the transistors search, the first reddit link directly answers the question in very similar wording to the hypothetical correct answer spelled out in the rubric. 4/5 of the reddit results are on the correct topic, followed by two SuperUser questions also on the correct topic, then some Linus Tech Tips and Tom's Hardware articles, also on the correct topic. No Quora questions.

In the vancouver winter snow search, the first several results are from local news papers talking about the anticipated effects of el nino on snowfall, and then a couple of high-quality blogs and weather sites.

Really wondering how Dan got such bad results.

------

Aside from that, the way that the author expects all the results to return the same kind of thing is just... weird? Like, that's not how search engines are supposed to work. A search that gives you 10 links to fundamentally the same thing is a bad search. Search results should cover a breadth of reasonable guesses for what you should be looking for given a query. If you search for "download firefox", and you scroll past the first 5 download links, then you're probably not actually looking for a download link and a blog post about firefox is not "irrelevant" and shouldn't be points against.

This opinion is even borne out in search engine quality metrics that have been industry-standard for decades, like mean reciprocal rank and discounted cumulative gain. What matters is how far you have to scroll to get to a good result, not what proportion of the first N results are good.
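For concreteness, a tiny sketch of both metrics; rels is a list of 0/1 relevance labels for one query's ranked results, top first:

    import math

    def reciprocal_rank(rels):
        # 1/rank of the first relevant result; 0 if there is none
        for i, r in enumerate(rels, start=1):
            if r:
                return 1.0 / i
        return 0.0

    def dcg(rels):
        # discounted cumulative gain: relevant results count for less the lower they rank
        return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

    per_query = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(sum(reciprocal_rank(q) for q in per_query) / len(per_query))  # mean reciprocal rank
    print([round(dcg(q), 3) for q in per_query])

Both reward a relevant result near the top and barely penalize junk further down, which matches how people actually scan results.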


What region? I get similarly bad results with international (and a quick check with region US also didn’t improve things) and uBo at only #5, and ytdl at #12. And I already have github on "raise" and a bunch of domains blocked (not many though)

For the transistor query, it’s a very "googly" way of writing a query; when I saw the results I instantly felt like rewriting it, and the first try gave much better results with "Why keep cpu transistors getting smaller?". Caveat: the results look better and more topical, but I don’t know what a good answer would be, which is also why I didn’t evaluate the tires or Vancouver weather queries (I tried a local search for my city's weather, and while the first result was unrelated, the 2nd was okay).

edit: This whole thread made me finally create a file for documenting bad searches on Kagi. The issue for me is usually that they drop very important search terms from the query and give me unrelated results. But switching to verbatim or "forced terms" also prevents any kind of error correction of the search. This used to be one of my main annoyances with DDG back then, and Kagi did not have that issue during the early days.


I have a new Kagi account with no custom rankings and I see the same terrible results. Basically the same as what he describes. yt-dlp is not found at all, the 2010 link to youtube-dl, and a bunch of spam sites.


Same here. I was curious about Kagi's low ranking and couldn't replicate the search results. I also saw uBlock Origin at #3, good results for tires, transistors and snow, etc. I've never used any of Kagi's search result weighting features.

Ctrl+F on the page for "System prompt" doesn't show any hits. Given how important those are for ChatGPT (another thought - was the author testing GPT3.5 or 4?) I'm not sure how much weight to put into the ChatGPT results either.

Not sure how much I can take away from this comparison.


I asked GPT-4 about Youtube Downloader and it rambled on about how downloading videos is against Youtube’s TOS and I should buy YouTube premium which has the download feature.

Getting any useful data from GPT-4 about anything even remotely “illegal” is a waste of time.


With a better prompt, you can get it to list some, but it’s very annoying to do so.

Mistral showed that their medium model is far better (yet not good), and the same prompt as in the article gives only one instead of 3 paragraphs of rambling about copyright, and then lists 3 categories of options with examples for each (not good, because ytdl is not one of those listed).

Funnily enough, both mistral and GPT4 apologize profoundly and almost with the same wording when asked "Why did you not mention the very popular, free and open source "youtube-dl" software?" and then mention how/where to get it and how to use it.


> Funnily enough, both mistral and GPT4 apologize profoundly and almost with the same wording when asked "Why did you not mention the very popular, free and open source "youtube-dl" software?"

Likely because they were optimized for the general population, which would not have a use for a command-line Python utility.


I’m clear on why they didn’t include it; I wanted them to tell me why, though. And I thought that both of them apologizing in almost the same way was funny.


It's plausible that mistral trained on GPT-4 output and therefore has similar mannerisms.



The author already alludes to the fact that you can probably prompt-engineer around this and indeed, as soon as I added a blurb like "these are my own videos that I own the copyright to" it did suggest a bunch of third-party tools and let me ask it about what third-party tools I could use.

It suggested '4K Video Downloader', 'YTD Video Downloader', 'JDownloader' and 'Clipgrab' at first and when I asked for cli tools it came with 'youtube-dl', 'yt-dlp', and 'ffmpeg'

Those seem pretty reasonable results to me but I'll readily admit I don't know (yet) if 'most users' would ask these follow-up questions.


So it has also become one of the glitterati. That didn't take long.


claude.ai produced pretty reasonable results.


I'll second the chorus of those curious to hear how you've customized the search engine. I was able to reproduce the lackluster results, and was sadly disappointed. I expected what you seem to have found, that Kagi would outperform.

A specific example: for "ad blocker" the first result was some paid ad blocker and ublock was down the page below the fold.


I use Kagi because I'm trying to remove Google from my life, but their text search is worse than Google in my experience, and the image search is abysmal. I'm wondering how long I can keep this up. I already revert to Google for image search, and am finding myself using either Google or ChatGPT over Kagi more and more for text as well.


Kagi had a pretty substantial image search update just a few days ago [1]. Do you still see the issues with it?

[1] https://kagi.com/changelog#2793


Good info - will experiment!

It's already performing better on a (n=1) test I tried.

"Talos Principle 2". (Video game sequel) Previously (~5 days ago), Google returned various screenshots etc from the game `The Talos Principle 2`. Kagi returned mostly results from `The Talos Principle (1)`. Now the latest Kagi results are a mix, mostly from 2. So, it does look like it fixed this query.


have you customized your results and lowered or raised many domains?


Kagi is awesome for me too. I only realize I'm using Google somewhere else because of the shit results.


The issue with traditional search engines is that keyword-first algorithms are extremely gameable.

Try https://search.metaphor.systems - it's fully neural embeddings-based search. No keywords, only an embedding of what the actual content of a webpage is.

So in the mentioned example of searching for Youtube downloaders, with Metaphor you'll get only Youtube downloaders (https://search.metaphor.systems/search?q=This%20is%20the%20b...)

Full disclosure - I work there :p


How is that different from keywords? Embeddings aren't magic, they're just page content. Content is trivial to game since it's controlled by the website owner.

edit: From my quick QA, the results are also not that great. Searching for "what is the best mouse to buy" leads to links to buy random mice rather than review summaries or online discussions of mice. One of the recommended queries, "Here is a great fun concert in San Francisco", leads to some really bizarre results in non-English languages that have nothing to do with either SF or concerts.

edit2: Also, Google has been using LLMs part of their search since at least 2018 so definitely not just keyword matching there.


Yup, definitely still gameable but if the model learns what high quality content is like and what high quality webpages there are (which it does), then the only way to game would be to be great :)

For your search - I would recommend turning autoprompt off and searching something like "Here is a great summary of the best computer mice to use:".

Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to query like how someone would refer to a link before sharing it


> Our embeddings model is trained on how links are talked about on the Internet, if that helps with querying. So you have to query like how someone would refer to a link before sharing it

So it's not high quality web pages but web pages that people talk about a lot which is expected since no one has an oracle that says what high quality is. The embeddings are merely a proxy and generalization for "how links are talked about on the Internet." That can be gamed at scale just like every other signal any popular search engine has been based off of.


That's true, although it should be much harder.


The first result vtubego.com is a 144MB downloader app. The page contains "Pricing Plans Lorem ipsum dolor sit amet, placerat verterem luptatum phaedrum vis, impetus mandamus id vix fabulas vim." above its 3 paid plans (there is no free plan).

I haven't installed the downloader app, so I'm not sure if it lets me download youtube videos for free.

The second result "ytder.com" is a redirect to "https://poperblocker.com/edge/" which seems to be a browser extension for Microsoft Edge that protects the user from the Holy See. I'm not using Edge and I'm trying to download a Youtube video.

The third result download-video.net says that it can download videos from a list of sites. Youtube is not in the list, but let's try anyway. If you put "https://www.youtube.com/watch?v=IkYVmtgxebU" into the text box and click "download" you get "500 SyntaxError: Unexpected token '<', ""

At this point I gave up, but please let me know if any of the results work.


This is excellent!

Definitely excited to see how it holds up to daily use.

So far it gave me exactly what I wanted at the top for all of my test queries that were well formed.

As for asking “ignorant” questions, both your service and the goog failed where phind gave me an actionable starting point (after a prodding follow-up question: https://www.phind.com/search?cache=hmul4znpn7y4ei6qa64fosmc )

“max-height like css property for top and left”

Unsure if this sort of thing is even a goal of your project, but you won over a new user.

Wish you and your team all the best.


> with Metaphor you'll get only Youtube downloaders

I clicked into the top 5 results, none of them were real youtube downloaders that worked, so I clicked the next 5 results, then I finally got one single (really slow) downloader that worked. 1 out of 10 top results


https://getthatvideo.com/ Is the first result for downloading YouTube videos. Seems super sus (especially since the site doesn’t load).

Auto-prompted to: "Here's a helpful website for downloading YouTube videos:"

Also, this result is horrible:

“What does it mean if someone is not covered in nfl football?”


>it's fully neural embeddings-based search. No keywords, only an embedding of what the actual content of a webpage is.

What prevents websites from gaming their embedding? Switching to a similarity search doesn't prevent the results from being gamed.


So far so good. I'll try using this first from now on, and see how it does. Good luck!


How do you deal with dynamically/contextually generated content? And how about paywalls and login-required content?


We do our best to get the right content.

For paywalls/login, we play it pretty straight: we always obey robots.txt, etc.


Just wait until the content farms adapt


I noticed that the author uses ChatGPT 3.5 rather than 4, which is a rather large difference. I don't have the knowledge to re-rank all the questions the author asked, but I will say that a test with ChatGPT 4 leads me directly to youtube-dl, which is better than every other search engine listed.


That was the first thing I checked when reading the article. Although the argument would be that 3.5 is free - any comparison of systems against ChatGPT that isn't using ChatGPT 4 can be dismissed almost out of hand; there is not much point talking about ChatGPT if it's not using ChatGPT 4 and making proper use of its capabilities.

That is not to say that there aren't valid criticisms of and shortcomings in ChatGPT 4 - just that it's not useful to say ChatGPT when it's referring to 3.5


This is silly, most people aren't going to pay for ChatGPT, just like they won't pay for Google or DDG. So using 3.5 in this case is perfectly acceptable when we're talking about free software.


Kagi isn't free, that's on the list


>any comparison of systems against ChatGPT that isn't using ChatGPT 4 can be dismissed almost out of hand

Does everyone, or even most people, use ChatGPT 4? The most used version is, of course, by far the most relevant.


ChatGPT 3.5 was great, until 4 came out and now it is garbage in comparison.

But I suppose what I really want is for everyone who includes ChatGPT in comparisons to explicitly say which version they are using (and, if they are using 3.5 in their comparison I hope they at least try 4 first) and definitely not just say "ChatGPT" when they only mean 3.5. The difference really is that stark.


He gives the full queries - do you have ChatGPT 4 that you can run them against?


Sure. Bear in mind I have custom instructions active - which, if you want to make full and proper use of ChatGPT, you should configure, along with customised GPTs - so I get lots of dot-point descriptions, because that's what I've asked for.

Also I would not normally write ChatGPT queries the same as I write them for search engines but for the sake of comparison, I'll use their queries verbatim except where my custom instructions affect the context too much.

> download youtube videos

https://chat.openai.com/share/3e18e4f0-5527-4479-8a2f-ef17bd...

I got - good results. They got - "Very bad results (fails to return any kind of useful result) ChatGPT: basically refuses to answer the question, although you can probably prompt engineer your way to an answer if you don't just naively ask the question you want answered".

> [What] ad blocker [can I use?]

https://chat.openai.com/share/e1985d7a-c89f-4b5e-bb59-70bd11...

Looks good to me

> download firefox

https://chat.openai.com/share/3a62e5ae-8dbd-4179-8eb0-cc38ee...

Also good

> Why do wider tires have better grip?

> [Provide links to scientific sites that describe] why wider tires have better grip?

https://chat.openai.com/share/8cbcd1dc-b23f-41f3-83ad-f43f3d...

Honestly, I have no idea if this is a good answer or not. But I don't use ChatGPT for answers that I don't have confidence that I can determine its veracity; if I needed to know this with certainty, I'd use ChatGPT as a jumping off point for my own research.

> Why do they keep making cpu transistors smaller?

> [Provide links to scientific sites that describe] why do they keep making cpu transistors smaller?

https://chat.openai.com/share/dbb97ac0-840c-402c-a917-657af6...

> vancouver snow forecast winter 2023

> Environment Canada winter 2023

https://chat.openai.com/share/aab017d7-f86b-49c9-b5c0-86a0b1...

I don't know if almanac.com is any good but giving it the specific "Environment Canada winter 2023" query gave the expected very good result.

I think ChatGPT 4 generally provided very good results for the test queries, if you tailor the queries just slightly for the format


> I will say that a test of ChatGPT 4 leads me directly to youtube-dl

And yet to other people it starts rambling about how that’s wrong and you shouldn’t do it and doesn’t give a usable answer.

https://news.ycombinator.com/item?id=38822040

It boggles the mind the extent to which people salivate over a system that cannot decide between a correct straight answer, something wrong but plausible, something wrong and impossible, or outright refusing to answer.


That's GPT 3.5. It sounds like you have a bit of an axe to grind with ChatGPT, but if you're going to do so, do try to grind it on the correct version.


The comment says it’s v4. Since there’s no information on the page either way (funny, considering the original complaint), I took them at their word. If you don’t believe them, that’s up to you.

For what it’s worth, I do have access to v4 and it did give me an answer right now. But since I also know even v4 can give you wildly different answers to the same question even if you ask them one right after another, that doesn’t prove it either way.


I’ve come to recognize that any article that uses 3.5 has an agenda.


I also suspect as much, but obviously can't know for sure. IMHO it's intellectually lazy if not dishonest to benchmark against 3.5 and not make that fact clearly known upfront

A better benchmark would have had two entries for ChatGPT, showing both 3.5 and 4 results


The agenda of not wanting to pay for something just to test it out when there is a free version?


The agenda of using the significantly shitty version to try to paint it in a poor light.


> Here's a fun experiment to try. Take an open source project such as yt-dlp and try to find it from a very generic term like "youtube downloader". You won't be able to find it because of all of the content farms that try to rank at the top for that term. Even though yt-dlp is probably actually what you want for a tool to download video from YouTube.

Is that true? Do most people want to install a command line tool to download youtube videos?


No. They want sites like savefrom.net - which is hit number one on Google.


Did you try using savefrom.net? You can type "https://www.youtube.com/watch?v=IkYVmtgxebU" into the text box and hit "Download". Then you'll get a new tab that tries to get you to install malware. If you decline to install it, the new tab takes you to the malware's homepage. If you close the tab and go back to the original tab, savefrom.net presents you with an error message saying "The download link not found." and does not help you download the video.


I tried this. I went to savefrom.net. First thing it does is ask permission to send notifications.

After that there is a popup asking me if I want to continue in the browser or download their app. If I click download, it downloads a file called download_helper_2.3.27.apk.

Instead of downloading their app, if I paste a YouTube link, it tells me I can wait or download their APK to skip waiting. The download link downloads an older version called download_helper_2.3.19.apk.

When I do the process again, instead of the older APK link it gives me a Chrome extension link. But if you look at the instructions you see that it's not a Chrome extension, but a minified userscript. And it has `@include https://*` so it can basically run on any website regardless of clicking on an extension icon like regular browser extensions.

If I try to ignore all the distractions and wait for the download link, I can click it and it downloads the MP4 file. But it also opens a popunder with the domain https://refpamjeql.top/.

Not the best experience, and seems like a high risk of getting malware, but it does get an MP4 file at some point.


Interesting! I tried again and got completely different results this time. Now there's no malware tab, and instead it tries to get me to pay for a subscription to download high-quality videos or MP3s. If I click the barely-visible "Just let me download in my browser with low quality" below the paid subscription button, I get the same error as before.

Edit: the paid subscription payment flow says I'm actually buying "Televzr Premium Max Subscription for 1 Month_mp -- Televzr helps get wireless access to the media library on the computer from the mobile phone"

So it purports to be something unrelated to downloading youtube videos. I didn't pay 1400 yen for it, so I won't get to find out if it helps me download youtube videos.


You would think that a browser extension would be better than a website.... that's what I do with Soundcloud anyways....


Search was the biggest feature of the web in the early '00s. Now it's such a mess. I can't imagine Search will ever be amazing again, given all the complexity of providing quality while still avoiding all the crap.


Is it actually more complex to provide good results, or is it just more profitable not to?

I have a hard time believing an organization like Google doesn't have the resources to provide a search engine that's just as usable as what they had 6 years ago (around the time I feel like the decay really set in). Seems a lot more likely that it's just more profitable to serve up garbage sponsored content.


Definitely more profitable not to. Especially as Google is an ad company, not a search company.

I’d rather see a world with numerous paid/subscription search engines, that are motivated to do nothing but return search results well. I expect you would see some of the SEO crap getting solved.


i can't remember where i read this, but something about how google ranks sites that have google ads higher than sites that don't. makes sense; it's evil, but it makes sense. that's why we get all this scraped spam. is there any more info on this?


Intentionally ranking sites with Google ads higher would be a huge antitrust liability, so no way they're doing that.

On the other hand, they can achieve virtually the same outcome while keeping plausible deniability by just not doing anything that would downrank sites with ads (of which a significant chunk is likely to be Google's).

Spam sites often include ads.


i don't think they publicly disclose that fact


It doesn't need to be public to become an antitrust liability. Internal written material can still come up during discovery, potentially even in unrelated cases.

Therefore the safest option is to never openly discuss it or intentionally do it and instead use other means to achieve the objective (don't intentionally rank spam higher, just defund/cancel any projects that would make it rank lower).


> Internal written material can still come up during discovery, potentially even in unrelated cases.

Yup, and I think we've seen how careless and thoughtless Google is, as an organization, with internal comms in the Epic case. It would be shocking if it hadn't been discovered in that, or a prior, lawsuit.


this is like focusing on one single problem as the cause of the decline of the United States. It's actually a lot of things combining, and there's not going to be one fix


wtf decline of the United States are you talking about


Google, or Alphabet, is not a search engine company. It's an ads company, and that's what they are optimizing for.


To me it is all down to the ads: Google and Bing return nothing but ads on the first page. Plus, for the joy of seeing these ads, I need to go through a CAPTCHA that takes multiple tries.

But all in all, a very good article


Search probably hasn't changed much, but the internet is very different.


Yeah, the problem is that there is so much low-quality content that search doesn't (or can't) do a good job of surfacing the good material above the noise. There is still some signal left, but it's such a small fraction that it's much more difficult to pick out.

Having said that, I’m usually still able to find what I’m looking for, if I know that it likely exists, and know the keywords to use to find it. But it’s much harder nowadays for sure.


i have a radio that can "hear" down to -130 dBm; i've proven this empirically. Cellular signals work at 12 dB or more below the noise floor, and WSPR works even lower than that. Lightning is broadband noise, and yet i can still use digital stuff when there are lightning storms.

I don't buy the signal-to-noise argument. For example, whenever i get on youtube and get fed some content, i can immediately tell if it's had AI involved anywhere, and i thumb it down. I won't recommend it, and i've called people out for linking such tripe to me (or others).

Hear me out - google got bad about 11 years ago when the dorking stopped being effective, right around the time of the spotlight search results and the sponsored junk taking the top results. Around this time, various agencies (news, etc) started gaming the SEO to respond to any remotely related search with whatever the news was currently. Google chose not to "fix" this, because we're not the customer. DDG was better for a few years for real results, too, but that has gone downhill as well.

The current zeitgeist uses stuff like tiktok and facebook for "web searches" - "food trucks near Austin, TX" or so. No one really uses web search like people on this site do, and google couldn't care less if we don't like the search results.


I wonder how much influence Google had in lowering content quality over the years? After all, most SEO spam was a direct response to all the ludicrous requirements they've forced on the whole web, which eventually only SEO spammers were willing to commit to.

I also wonder if google just stopped existing, would the web heal over time?


The problem is that even if providers of the service are 100% trying to provide a great service, everyone on the web will always be min-maxing to appear on top.

So it's inevitably going to become crap.


The golden era of search results is very much over. Welcome to the pot-metal era.


I would love to see Perplexity.ai in the benchmark. It has completely replaced Google/DDG for information questions for me. I still use DDG when I want to do a navigational query (e.g. find the URL for a blog whose name I partially recall).


While Kagi was the product that brought me the most joy in 2022, perplexity.ai has been the one for 2023, even though I only recently started using it. It's just been a joy to be able to iteratively discuss most of my searches.

EDIT: here's a search for "tire" (I don't know anything about tires, so maybe there are much better links out there, but this is pretty much what I was expecting. Not an ad or SEO in sight.) https://www.perplexity.ai/search/tire-3iuI9T6BQUSvu2tAhgsRmA...


I am wondering if you can use AI chat exclusively for your search needs? If not, what does the perfect integration look like?


I've been really enjoying Perplexity as well. It's a much better Internet/search focused experience than ChatGPT, Bing, or Bard. For anyone interested, until the new year (~20 more hours?) there's a code for 2mo free Pro: https://twitter.com/perplexity_ai/status/1738255102191022359 (more file uploads, choose your model including GPT4)


Me too. I only heard about it this morning and it looks kinda perfect so far.


I think the result grading is too opinionated here.

For example, the first query is "download YouTube videos", for which Google is ranked "terrible" for not showing a command-line, open-source program. But the literal first result is an ad-supported site where I can paste in a YouTube link and download it right from the browser. That seems like exactly what most people would want, as opposed to the CLI tool the author was searching for. The author seemed to be judging results by the absence of ads more than by search relevance.

Search is a heavily gamed system with a lot of SEO-spam results, but I think a much better analysis could be done for more meaningful results. Also, I recreated some of the searches and got very different results (including uBlock Origin in the top three). Again, a more scientific ranking system could help uncover better data on searches.


The author describes that site as such, which seems fair to rate as "terrible":

> Some youtube downloader site. Has lots of assurances that the website and the tool are safe because they've been checked by "Norton SafeWeb". Interacting with the site at all prompts you to install a browser extension and enable notifications. Trying to download any video gives you a full page pop-over for extension installation for something called CyberShield. There appears to be no way to dismiss the popover without clicking on something to try to install it. After going through the links but then choosing not to install CyberShield, no video downloads. Googling "cybershield chrome extension" returns a knowledge card with "Cyber Shield is a browser extension that claims to be a popup blocker but instead displays advertisements in the browser. When installed, this extension will open new tabs in the browser that display advertisements trying to sell software, push fake software updates, and tech support scams.", so CyberShield appears to be badware.


That's how he described it, but I tried it myself and found it perfectly functional for downloading a video, with different options for size/quality. It has ads, but not nearly as bad as described.

It's a service that is quasi-illegal and explicitly breaks the YouTube terms of service. I think the search engine did a good job surfacing what was searched for; there just aren't going to be any free online YouTube downloaders without advertising.


Which web site did you use to successfully download a youtube video? Which youtube video did you download?


It'd be useful to know what site you used to verify - but if we're talking about the same site, IMO a website that presents Dan's experience sometimes, and your experience sometimes, is actively harmful.


Yeah, if one typed "YouTube downloader cli" you'd get the results the author was thinking of.

It seems like the author wants search to read their mind without specifying what kind of YouTube downloader they want


I really don't agree with some of the expectations around results.

> Download youtube videos

> Ideally, the top hit would be yt-dlp or a thin, graphical, wrapper around yt-dlp. Links to youtube-dl or other less frequently updated projects would also be ok.

That's not what a random person expects. yt-dlp or youtube-dl have no meaning to a normie. The first result is an online downloader, and that's what an average person is after. I checked the first result in Kagi and it's a valid YouTube downloader.

If you're after a commandline tool, ask for it: "commandline tool download youtube videos" gives youtube-dl as the top result with valid options afterwards: https://kagi.com/search?q=commandline+tool+Download+youtube+...

"Ad blocker" seems to ignore other options exist. Yes, ublock would be preferable for most, but ABP is not "very bad". Kagi mentions ABP at position 1 and ublock at position 8: https://kagi.com/search?q=Ad+blocker&r=au&sh=4VHApDrTEfuxMOt... (But for a query like that, I'd be happy with a wikipedia article about adblockers, because why not?)

I'm not disagreeing that results have been getting worse for years, but... this is a really bad scoring system. It feels like that one very new person on SO posting something like "syntax error: if 1 {" - what are you even asking for? (To be honest, the search engines could also give you the equivalent of "this is very vague, would you like to specify what you're actually after? here are some suggestions: ...", but that's beyond the scope here.) A search returning not the exact thing you want to see for a super-generic query, but a valid answer to the question, is not "very bad".


If you try using it, the first result doesn't help you download a youtube video and does try to get you to install malware.



Interesting, over here that one gives me a 403 at "rr2---sn-p5qlsn7l.googlevideo.com" when I click "Download".


My thoughts exactly.


Weird article. Basically, the author thinks that anything that is not yt-dlp is a bad search result, which is pretty insane.

Like, for me at least, I already know yt-dlp exists. When I search "youtube downloader", it's exactly because I want a website where I can download YouTube videos.


The author would probably accept any result that helps them download youtube videos. Did you find any and successfully use it to download a youtube video? Could you provide a link to the one you used?


If I search "Download youtube video" on bing, literally the first result works fine for me (y2meta.net).


Thanks, y2meta.net seems to work over here too.


Kagi really shines on topics that are SEO-spammed on other search engines, e.g. when travelling to a tourist city, searching for a recipe, or researching basically any product you want to buy. I actually get "search anxiety" searching these topics, as I know I will have to navigate a lot of SEO spam, content that is artificially blown up, and the core information purposefully hidden somewhere on the page - if it's there at all. Plus the multitude of cookie consent banners and newsletter subscription popups on each link...

I've been using Kagi's FastGPT [0] now for these searches, it basically removes all the bullshit and gives verifiable sources for any answers.

[0]: https://kagi.com/fastgpt


Yeah that’s my go-to as well. Interestingly, I often find that “Fast” mode results are as good or better than “Expert” mode for simpler tasks.


Ironically I had to use a search engine to discover what "Mwmbl" was. It's apparently a search engine. But, visiting the front page, I see something akin to a git commit log?! I'm not sure I'd have guessed that this was a SE if Brave Search did not tell me it was (even then I'm not convinced yet).

https://mwmbl.org/

Added: Interesting. Apparently you're allowed to edit the SERPs there. Which means I'm out, because I've got a feeling I know which kind of Internet Entrepreneurs this factoid will appeal to


For me the problem is not just that searching on Google is bad, but that sometimes it COMPLETELY hides exactly what I'm looking for, for no good reason.

For instance, I wrote an R ggplot2 package called "fedplot" (following the convention of naming the package after the figure style it replicates, as in "bbplot" for BBC-style charts).

Try searching for it on Google: "github" "fedplot" doesn't get you anywhere. Meanwhile, every other search engine gives you exactly what you want if you just type "fedplot". I even tried to add the relevant websites through google's suggested tools, and nothing happened :|


Searching for "fedplot" looking for https://github.com/sergiocorreia in the results:

Qwant: Result 1

Bing: Result 1

Google: Result 2

Marginalia: Zero results

ChatGPT 3.5: Some Federal Reserve dot plot nonsense and no useful results.


I would say Google has zero results, as it does not find https://github.com/sergiocorreia/fedplot nor https://sergiocorreia.github.io/fedplot/ ; even with the advantage of the latter being manually added to the Google Admin console.

Meanwhile, both Bing and Qwant give me exactly what I want


I don't know what customization or personalization is going on in your search compared to mine, but I followed the GP's directions and posted my results. Searching today (this time not in Incognito and with extensions turned on in Firefox), I see the correct result as number 1 on google.com, so one better than last time.


You're never going to find github results on Marginalia as long as they block 3rd party crawlers :-/


Well, zero results are better than spam ;-)


Their black box semantic guesser has been told not to feed the radicalizing conspiracy theorist fires about federal plots.

Who needs to know anything about government owned land anyway?


I have found appending site:edu remarkably improves google results.

For both the tire question and with respect to a youtube dowloader, the first results were on the nose with the addition of site:edu on Google.

Why this is needed, and whether a noncommercial, information-rich web portal should exist, are questions for another thread.


Wide tires by Jason of Engineering Explained: https://www.youtube.com/watch?v=kNa2gZNqmT8

Better answer: learn the differential equations in this book:

https://ftp.idu.ac.id/wp-content/uploads/ebook/tdg/TERRAMECH...


Kagi is great; it's now my daily driver for search. This is after I got tired of DDG and moved to Google (through StartPage), but the results were spammy or just irrelevant... and sometimes there weren't any results at all, even for the most trivial search. So I recently switched to Kagi, and so far it's been smooth sailing and a real time saver.


I use serpapi for my hot RAG and the results are fine.

Brave search API is obscenely overpriced. I hope someone is working on Search because Google has become a singularly garbage company. Propping up DEI is sinful enough but just failing to compete is lame. /shrug


No mention of https://www.qwant.com


Meta: Since the text on the page is so dense, I tried reading it in Chrome's reading mode. Which was fine until the Appendix. All the results are missing, leading to confusion.


I also was overwhelmed by the amount of data. I came back here to find the cliff notes :)


mostly my search is now Wikipedia.

I'm probably in a very small group who have the entirety of English wikipedia (without images) on my Android (via Kiwix), and I just search that. 99% of the time that's all I need.

the only exceptions are super current things like weather (Windy), or travel (Navan work travel system gives me enough to just go direct to airlines, hotels, etc), and local (OSM via Organic Maps).

I've almost completely degoogled (not intentionally, but driven gradually by Google becoming crappy incrementally), but didn't really find a single generic replacement as much as I found far better single purpose tools.

I'm reminded of that Craigslist image showing how many startups were each competing against specific parts of Craigslist https://cbi-blog.s3.amazonaws.com/blog/wp-content/uploads/20... , and this is what it feels like is happening to Google.. they're being beaten in specific areas, but at the same time spam and crap is diluting their core product.


The intro query "youtube downloader" already showed me relevant results (some website where you paste a URL and bam, download). I think there's a big-tech bias in the whole post (how relevant is a Mastodon poll, for real).

Not saying the current landscape doesn't suck with ads everywhere and incentives to not give exactly relevant results at times, but I think google is pretty good still.


Which web site did you use to successfully download a youtube video, and which youtube video did you download?


What's most shocking to me is how much malware there is in all of this. The fact that Google et al aren't constantly in trouble for directly forwarding unwitting users to malware distributors indicates to me just how far our standards have fallen for a "good" search engine. I feel like we'd be happier with search engines that adhered to "first, do no harm" principles.


I have recently started using kagi after seeing a recommendation here.

From what I understand, it aggregates results from multiple sources rather than having their own indexer.

The results aren’t really any better, but the lack of ads and videos in the results makes for a cleaner experience.

I also haven’t yet taken advantage of the extra features to block certain websites from results.

Personally, I pay the $5 mostly in an attempt to support another competitor in the space.


Pretty sure the reason Kagi is better isn't because they use multiple sources, it's just because they can use the presence of ads as a negative ranking signal, something that none of the major public search engines will ever do as it goes against their own business model.
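
As a toy illustration of what "ads as a negative ranking signal" could mean (entirely hypothetical scoring function and weights; this says nothing about what Kagi actually does):

    // Hypothetical: subtract a fixed penalty per detected ad unit.
    function score(relevance: number, adCount: number): number {
      const AD_PENALTY = 0.15; // made-up weight, for illustration only
      return relevance - AD_PENALTY * adCount;
    }
    // A page with relevance 0.9 and 4 ads scores 0.3, so it would rank
    // below an ad-free page with relevance 0.7.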


Kagi is awesome, so much better experience than Google!

Start using bangs, lenses and customized results ASAP, that makes a big difference.


I actually find myself using bangs way more since I switched to Kagi from DDG. I think it's the AI bangs like !chat and !expert that got me in the habit of using bangs besides !g (which I never actually use anymore).


I'm not sure youtube-dl is a good answer unless you're a nerd.

Which is a similar phenomenon to search. If you have sufficient tech skills there's a whole world of freely available software out there to complete your task.

If you're not then you are at the mercy of a range of commercial offerings (some built on the free software) that range from arguably scams to outright scams.


I don't understand the praise of Marginalia.

When I search for "Steve Jobs" on Marginalia, I got blogs about his speech in 2011 and some mailing list from 2007.

When I search for my own name I get nothing. In Google it's just me.

It's cool that one person built all this of course but... that's not a good search result compared to Google?

Maybe I'm missing something, maybe I'm using it wrong


What do you expect when you search for Steve Jobs? Also, which filter did you use?


By the way, please don't take this as if I'm putting you down or something

It's amazing what you built; it's just not a Google killer? Or at least I don't see it


It's really not supposed to be either. Like it's designed to be the search engine you use when you can't find something elsewhere, so it's largely designed to show you different results than the ones you get on Google and Bing.

In general a lot of the complaints seem to be "I'm not getting what I expect from Google". Well... yeah. That's the point. If someone wants the same results as Google, they should arguably use Google.


I don't know if I used any filters? I don't know what the filters are, sorry

I expect the Wikipedia article on Jobs as a baseline.


Ah, I downrank Wikipedia pretty hard :P


If you just want the Wikipedia article, you can go straight there. Why are you searching for it?


> When I tried running the query from the paper, "cellular phone" (no quotes) and, the top result was a Google Store link to buy Google's own Pixel 7, with the rest of the top results being various Android phones sold on Amazon.

Interestingly, if you add "before:2001-01-01" to the query, the paper that Brin and Page referenced shows up as the third result.

That this query now ranks phones you can buy higher than information about phones makes sense, since the web is much bigger these days and cell phones are much more widely accessible than they were back then.

> Although Google doesn't publicly provide the ability to see what was historically returned for queries, many people remember when straightforward queries generally returned good results.

See above. Sort of.

---

I wish Dan spent more time talking about Kagi. I, too, have found it terrible for searching for things to buy and some images but excellent otherwise.


The appendix describing the individual search results is both entertaining and scary e.g.

"Two of the top three hits are how to install the extension and the rest of the top hits are how to remove this badware. Many of the removal links are themselves scams that install other badware."


> However, there's a sizable group of vocal folks who claim that search results are still great.

I think that this very sentence shows the author's bias, because I feel that Google's search results are not just great, but better than what it was 10 years ago.


You must be kidding, Google is becoming worse every day. Still better than useless Bing though.


Consider yourself part of the sizeable group of vocal folk then.


The thoughts about building a better search engine than Google are interesting.

Unlike the author, I think that building a better search engine than Google is possible. But it's going to be rather expensive. And the only proven way to monetize it is selling ads, which will degrade the quality of the search results fast. For potential investors, there are probably many better ways to invest money than by building a search engine.

This leaves us with only one viable alternative: build it in the open like Wikipedia and source donations from people and from Google competitors like Amazon or Apple.


Okay, so all search engines suck. Yeah, that matches my experience


> It's common to criticize ChatGPT for its hallucinations and, while I don't think that's unfair, as we noted in this 2015, pre-LLM post on AI, I find this general class of criticism to be overrated in that humans and traditional computer systems make the exact same mistakes.

Finally someone said it. We are unnecessarily harsh on hallucinations. LLMs don't intentionally 'lie'; to say they do is a wrongful anthropomorphism.


To excuse hallucinations is itself anthropomorphism. Tools shouldn't make stuff up.

I don't second-guess Python's math results. If the result is wrong, that's my fault for coding it wrong, never Python's for hallucinating


It's also a wrongful anthropomorphism to claim that human beings "make the exact same mistakes" as LLMs, because they don't. Humans don't confabulate the way LLMs do unless they have a severe mental illness. A human doctor isn't as likely to simply make up diseases, or symptoms, or medications, whereas an LLM will do so routinely, because they don't understand anything like human anatomy, disease, chemistry or medicine, only the stochastic matching of text tokens.

We're not unnecessarily harsh on hallucinations, it's absolutely necessary because of how effective LLMs are at convincing people that because they can generate language, they are capable of sentient thought, self-awareness and reason. Acting as if humans and LLMs are basically equally trustworthy, or worse, that LLMs are more trustworthy, is dangerous. If we accept this as axiomatic, shit will break and people will die.


I hear what you’re saying. Yes, of course we should aim to make our LLMs as trustworthy as possible. I think my argument was more philosophical than practical. I meant that directing real anger at them seems misguided; after all, humans lie with intent or real negligence, while LLMs are random number generators. And 'don't believe everything you read on the internet' persists regardless of AI-generated content; we shouldn't expect to lower our guard any time soon. But yes, I agree strongly that the danger arises because people DON'T treat LLMs the same and get lulled into a false sense of trust.


Formatted with minimal CSS: https://ddanluu.com/seo-spam


Speaking of bad software, is anyone else getting a huge amount of horizontal scroll on mobile on this blog post? What should I add to my bag of tricks to work around that?


Reader mode might do the job.


I am not (Chrome on iOS).


I got different results for Google on "ad block".

And changing the query to "ad blocker" like Google suggested raised ublock origin way up in the results


"Going back to the debate between folks like Xe, who believe that straightforward search queries are inundated with crap, and our thought leader, who believes that "the rending of garments about how even google search is terrible now is pretty overblown", it appears that Xe is correct."

Also, the article tested Mwmbl as well, not mentioned in the title here.


While I think the article is interesting, I disagree with its results regarding Kagi. I like Kagi and rarely use anything else. Kagi's results are decent and I can blacklist sites like Amazon.com so they never show up in my search results.


Honestly, if you have to search something remotely technical, try HN's search function with comments enabled.

If the topic has ever come up, the discussion and links are likely to be more relevant and better than your average wiki article


There’s something incredibly entertaining to me about even this well-researched article struggling to find a reason why wider tyres have more grip.

As I understand it, this is because tyres are still somewhat of a mystery, and anyone outside of a laboratory really doesn’t know shit. The best explanation I can think of is tyre load sensitivity. The friction coefficient of rubber decreases with normal force (e.g., a heavily loaded tyre has a lower friction coefficient), which is a pretty well-accepted fact; this is one of the methods engineers use to tune the handling of cars. This means a wider tyre has a lower force per unit area of the contact patch, which means it’ll have a higher friction coefficient.

Now that sounds plausible to me, but that’s just my best guess explanation.
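
One way to write that guess down (a toy model, assuming purely for illustration that the friction coefficient falls linearly with contact pressure; real tyre behaviour is far messier):

    \mu(p) = \mu_0 - k p, \qquad p = N / A
    F_{\mathrm{grip}} = \mu(p) \, N = \left( \mu_0 - \frac{k N}{A} \right) N

So for a fixed normal load N, a larger contact patch area A means lower pressure, a higher effective friction coefficient, and therefore more grip.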


https://www.bicyclerollingresistance.com/

gives good tyre advice (obviously not car tyres, but info is there)


I will admit that I can't read between the lines here and will just go ahead and ask: what is "bluesky thought leader" supposed to mean? (1) Any guesses who this may be? Why is he not quoted directly? (btw, the term is used 3 times, presumably to refer to the same person.)

1: my reading is that this is a sarcastic designation for someone who is supposed to be an innovation thought leader but is actually just defending the broken search landscape status quo.


Or it's some vocal guy on Bluesky, which didn't allow public links until recently.


Have you tried perplexity.ai? It's like ChatGPT and Google had a baby. Looks very promising and I'm seeing a lot of tech leaders (example Toby of Shopify) moving to it.


Aren't Bing Chat and Kagi FastGPT the same in effect?


No, FastGPT is GPT-2 based. I actually prefer FastGPT because it's fast (duh!), it gives very concise answers, and the generated response carries footnotes with links to the sources.


Just to correct, FastGPT uses claude-instant.


Honestly, this is depressing. Back in the day, AltaVista and AskJeeves existed but returned terrible results, and Google showed up to disrupt them all. It seems like we should be on the verge of repeating this cycle.

Maybe LLMs will help, but I can’t shake the nagging feeling that the situation will simply get worse with LLMs, not better, due to hallucinations and the apparent “gullibility” of LLMs: I would not be surprised if SEOing an LLM turns out to be easier than SEOing Google.


I’d love to see this extended a little.

Searx and Yandex.

Specifically… if I need something even slightly “gray”, Yandex is the only option anymore. Torrent search on google et al is just awful.


> Continuing with the theme of running simple, naive, queries, we used the free version of ChatGPT for this post, which means the queries were run through ChatGPT 3.5.

why


I have a small page that modifies my GET requests to Google by adding -site:… for a bunch of the most annoying content farms, for stuff I search often (docs)
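
Something like the sketch below, perhaps (the domain list and names here are hypothetical placeholders, not the actual page):

    // Hypothetical blocklist of content farms to exclude.
    const BLOCKED: string[] = [
      "example-content-farm.com",
      "another-seo-mill.net",
    ];

    // Build a Google search URL with -site: exclusions appended.
    function buildSearchUrl(query: string): string {
      const exclusions = BLOCKED.map((d) => `-site:${d}`).join(" ");
      return "https://www.google.com/search?q=" +
        encodeURIComponent(`${query} ${exclusions}`);
    }

    // e.g. wired to a text box: window.location.href = buildSearchUrl(q);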


Have you tried uBlacklist?


For the ad blocker results, the author judges the search engines by how they rank the best result (uBlock Origin), but I think search results that point to Adblock Plus or AdBlock are good enough. Sure, they do not block all ads, and they take money from advertisers to let certain types of ads through, but they still block ads in general, and 'acceptable ads' can be disabled in the settings. So I would consider these 'good results', rather than the 'bad results'/'very bad results' the author assigns.


Without labor to run their circus, 99% of business would disappear overnight.

Without business, spam would disappear.

So if you remove the labor you remove the spam.

So the best spam filter is UBI.


The GitHub link is my top result on Google. Clearly a mix of uBlock and Privacy Badger is more powerful than most appreciate.


I would kinda have liked side-by-side screenshots so I could see for myself, rather than a wall of text


More incorrect usage of "hallucinated" for simply made-up or inaccurate results.


Is this from desktop? What region?

uBlock Origin as the very top result for an iOS device is simply a bad search result page. Maybe fourth position is tolerable, after three different working options. Maybe it should be lower; I doubt myself, wondering if my point of view is too elitist.

Yt-dlp is subject to all sorts of takedown requests in different jurisdictions.


Look at the source for that page. Is it hand-coded? (I think it's great.)


Using Phind most of the time. It would be interesting to add it.


Was GPT-4 used (with a paid subscription)?


I wonder if this aggregate enshittification of computers (be it search, social media, video games) etc. is actually a good thing for humans in general.

I feel like today's digital spaces don't have as strong a grip on the minds of people - I think folks have started rediscovering the value of genuine human interaction and hobbies that do not involve a computer screen.

For example, I haven't seen the equivalent of 2000s-2010s Facebook addicts (or WoW addicts in the gaming space) to such an extent, with parasocial media such as TikTok, YouTube, or Twitch having replaced social media, and social video gaming such as MMOs having lost a lot of popularity.


Pretty biased selection of queries. The article avoids the things that ChatGPT and the others without fresh data can't answer. Look at the trending searches on Google. They are all for fresh info that none of the others can answer. Sports scores. Google probably judges quality weighted by the questions their users actually ask, not this nerd bullshit.


How is a youtube downloader biased to fresh results? Seems to cover a pretty broad test.


It selects a "right answer" that suits a stale index, assuming that there can't have been a right-er answer discovered after ChatGPT's training horizon.


Wouldn't any selection of queries be biased? Even what you're saying is biased: you're arguing Google would be better for the cases it optimizes for, which is even weirder. That is like saying you want to compare highly optimized code using some C libraries vs some native Python code.


I had to stop reading this because I found it too depressing and it triggered a lot of anger about how big tech combined with the incentives of capitalism is basically fucking up the world.


Can someone tell me why Bing, and thus DDG, has switched to prioritizing local results? I'll search the most inane things, like lyrics to a song, and get results for local businesses containing maybe one word in common.

It's most frustrating with phone numbers. I picked up the habit of searching the random numbers that called me, to try and find out if they were possibly important. I used to get a bunch of spam sites that clearly existed to profit off me making those searches.

Both Google and DDG have removed those spam sites, even though they were useful at times. Google will tell me the number is in some random PDF that contains a few of the digits, then no other results. DDG will say the top result is my local police department, something that freaked me out the first few times.


Man, thank you for saying this. Stuffing results with geolocated local junk despite explicitly opting out by choosing “All regions” is so frustrating. This wasn’t happening a year or two ago. I submit negative feedback about it constantly, but I guess not enough people are doing that for anyone to notice or care.

I’ve also noticed a significant increase in attempts to stuff news into regular search results. I really do not appreciate being force-fed mental health poison. I don’t need it ever, but I especially don’t need it when I’m searching for some specific technical thing and then get emotionally sabotaged by some clickbait headline because … why? Some bullshit KPI? Why are tech companies so obsessed with pushing news into every orifice?


Hah, calling the news "mental health poison" is the most accurate thing I've read all day.


Yeah, I've noticed this as well with DDG recently: even with the localised checkbox disabled it still prioritises them, which often is very frustrating as the results are then almost totally useless.

However, more generally, I've personally found that DDG's (and maybe therefore Bing's?) localised results are just really bad, and have been for the multiple years I've been using DDG with this feature: I'm in New Zealand, and enabling localised / region-based search still often returns pages with TLDs like ".co.uk", ".ca" and ".pl" (the latter are really common for generated-content spam in my experience), which I just can't understand...

Unfortunately, I have found that Google's results are usually a lot better in terms of being "location-aware" than DDG, at least when that's what you want...


It's a bit surprising that you're seeing spam sites with .ca; those are illegal here, and all .ca domains must be registered by someone in Canada.

You can report them: https://ised-isde.canada.ca/site/canada-anti-spam-legislatio...


I have the same experience from Germany. There's the slider, but it's not doing much.


DDG is just repackaged Bing. Always has been. I remember looking into them when I was ready to job-hop many years ago, and they asked for dedication to their search engine as their foremost requirement for employment. It's the "drop-shipping" equivalent of search engines.


hope kagi takes ddg's place in terms of adoption. never really liked ddg even though i always care about privacy.


I really don't get that sentiment. Currently Kagi is just as dependent on Google as DuckDuckGo is on Bing. That might only be temporary of course and Kagi does seem to be working on a search engine of their own.

Rather than wanting Kagi to take the place of DuckDuckGo, it would be better if Kagi could take users from Google, and then, when ready, drop Google as a search provider.


DDG used to be the HN darling and you would get downvoted for saying anything negative or even insinuating that they are relying on Bing. Now the spot has been overtaken by Kagi but it looks like it suffers from the same problems. The counterargument that they have their own index as well is the same that was used for DDG, when the reality was that it was only used for widgets and other fluff. Let's see how it plays out for Kagi.


Kagi mixes google, bing, some non-profit small-web SE, and their own index.


I don't think they use Bing, but yes, Google, Marginalia, Yandex, Brave and others. I still fail to see how that's different to DuckDuckGo, who also run their own crawler. It's really weird that people are almost hating on DuckDuckGo for how they run their search engine, while applauding Kagi, for doing the same, but with a different business model.


Only if they changed that (which they might have as part of their cost-optimization). They said they mixed bing and google results back then.


I also assume that Kagi uses some shady residential IP proxies and similar tricks to scrape Google, while DDG has access to the Bing API.


You can buy access to the Google Search API, which is what I assume Kagi does. Building your product on being able to circumvent some Google restrictions seems like a bad business move, if you can buy the same service for a reasonable price.


Where can I buy it?


https://developers.google.com/custom-search

It's been available for ages. We used it to power the company internal search for a large enterprise I worked at 17 or 18 years ago.


Yes, but this isn't an API for making a generic search engine.


Kagi should hire the Marginalia author.


We already include Marginalia results in Kagi [1]

https://help.kagi.com/kagi/search-details/search-sources.htm...


> Bing, and thus DDG, has switched to prioritizing local results

From what I can tell this is an issue with the Bing API that DDG uses that the DDG folks have been unable to resolve. I've tried many identical queries between DDG and Bing and while Bing does occasionally return incorrect local results, the completely irrelevant local results that appear on almost every DDG search do not seem to happen with Bing itself.

From what I understand, DDG is aware of the issue. I don't know why it isn't more of a priority.


Long-time DDG user (>10 years) here, and it's astounding to me that they haven't prioritized building their own independent index to switch off Bing. I would have expected them to do it like 5 years ago, but there's afaik no initiative to do so. It's unfortunate, and I am now trying other engines like Brave Search.


I also occasionally try Brave search when a DDG search fails. Sometimes Brave finds what I want, but I frequently get Captcha (and now proof of work) challenges that are quite annoying. I don't get this with any other search provider (though StartPage would frequently do this a while back). I hope this is just a phase, because I would likely use Brave Search more if not for this issue.


Hi, Brave engineer here,

Are you by any chance using a VPN while using Brave Search? (ProtonVPN?)

Thanks for your help, we're working on ways to reduce the number of captchas shown to VPN users and your feedback is very useful.


Sorry, just saw your response. I indeed use ProtonVPN.


Thanks for confirming, we've just deployed some improvements to better handle VPN traffic and penalize legit users less. I hope this improves your QoL on Brave Search.


> I'll search the most inane things, like lyrics to a song, and get results for local businesses

Query: “I’m coming out of my cage…”

Result (Ad): “You’ll be doing just fine with these amazing year-end closeout prices at Al’s Discount Car Barn. Gotta come down—you’ll want it all!”


Ads would make sense, but there's no way my local city council is paying Bing and they are the most frequently listed result.


It was only a list, how did it end up like this?


Nearly every local search is a leading indicator of buying intent, and is therefore worth more money when answered with local-business results instead of an authoritative answer.


If you’re going to search for phone numbers, you'll want to enable verbatim searching under Tools on Google, and put the number in quotes, perhaps in both "xxx-xxx-xxxx" OR "(xxx) xxx-xxxx" forms. Many of the sites you mention are fake sites with fake contacts just for ad serving, and I've read that in some cases the scammers seeded the spoofed numbers they appear to call from onto the sites they control, to see who googles their phone numbers.


Reverse spoof the numbers of FTC investigators and Google employees?


Maybe it was an attempt to improve their local results?

Searching for results from my country in DDG (picking the country in the drop-down below the search box) still returned results from the USA or other countries, even when searching in the local language. Maybe they tried to fix that because it really sucked, so much that I never used it again for searching local websites.


This is the one area where it still ignores my location. I live in a town named after a UK city, and there are several bigger towns in the US with the same name. I just searched "McDonalds city name." I got results for locations at least half the US away from me, as well as Uber Eats GB.


I’m confused, you are searching for, specifically, a local phone number and you are upset that the machine interprets that as you looking for a local result? That’s what most people expect from a local number search.

Perhaps the incorrect thing is not your internet search results, but actually your phone carrier for lying to you and telling you that a caller has a local number?


The number is local, and occasionally I've searched and found the number was a local clinic or business that had a legitimate reason to call me but not leave a message. In those scenarios, all ten digits of the number are found on the page.

The top result being my local police department because it shares the same area code and has maybe one other number in common is clearly a bad result. It does this even if the phone carrier isn't lying to me and the caller does have a local number, like the increasingly common political spam calls.


If I search for a ten digit number, it is not helpful to return a local business that shares the last four digits.


I suspect it's a failure to distinguish mobile searches (where people are legitimately looking for a business) from desktop searches.


In my country (Colombia) Google still has not removed those spam sites that just generate all possible numbers.


you can use true person search for numbers


I am not sure what the intention of this post is. In my handpicked results Kagi far outperforms Marginalia.

#1 "Gordon ramsey" (misspelled "Gordon Ramsay"). Marginalia shows "The Life I Imagine: are my cheeks red?". Kagi corrects to Gordon Ramsay and shows relevant results.

#2 "Ukraine war". Marginalia shows an article about the Russian Orthodox church and a Substack post about the war. Kagi shows Wikipedia, Al Jazeera, etc up-to-date summaries about the war.

#3 "Dildo". Top post on Marginalia is "Students for Concealed Carry Embraces UT Dildos | Students for Concealed Carry". Top posts on Kagi are Wikipedia (read) and Amazon (buy).

> How is Marginalia, a search engine built by a single person, so good?

Because it's not good?


I don't disagree with your assessment in full, but I don't exactly consider Wikipedia and Amazon good results. They are big enough that if that's the result I want, I can go to them directly. So they aren't bad or wrong, but I can see the case for excluding them. Should something like Webster's dictionary be a top result?


I think for single word queries like that Wikipedia covers more ground than a dictionary. Personal preference, perhaps. If I need a definition I search for "define dildo" (Kagi shows Merriam-Webster, Oxford, etc dictionary entries).


Marginalia supports the old Google syntax, e.g. "define:dildo"


Thanks! If you are that "single person" who built Marginalia... I hope you are not taking my criticism personally. I am more annoyed by this blog post, which uses a few handpicked queries to present generalized, long-winded conclusions that are completely disproven when using a different set of queries.


Yeah, it's me, and to be fair I made a comment to a similar effect myself. Assessing search result quality is very hard, and this is definitely a pretty flattering selection of queries.


On the plus side - in addition to Marginalia's own success, you can take partial credit for how good Kagi search results are (IIRC Marginalia's index is one of the sources for Kagi search results). So... thank you for that!


Marginalia Search isn't trying to be a universal knowledge engine, it's just a website finder.

That's bad if you're looking for a simple answer or basic fact, and good if you're looking for a few hours of reading.


I had a similar experience when testing Kagi after reading this. The top result for the “wider car tires” query on Kagi was a link to Physics StackExchange with some marginally informative answers [0], which would be easy to expand on in future searches. The second result was Reddit. Then a couple of incorrect/irrelevant pages but they don’t look like scams

[0]: https://physics.stackexchange.com/questions/29903/why-do-peo...

Edit: I did just realize that I have StackExchange customized to be up-ranked. So that probably helps. But yeah, I guess this is why I usually get good results, which is something that generally still fails with Google for me.


It seems to me that the name "marginalia" is not just a random set of syllables. It sounds like it's doing what it says on the tin, which is gooder than not doing what it says on the tin. (distinct from whether what it says on the tin is something you want)


Blah blah blah. Could you lay this article out any worse? What are the queries you used to test? I want to try them too. Buried in here somewhere.

Using an adblocker is not expert anything.

That you've defined your own opinion for what some of the results should be blows the thing up.

Searching youtube downloader, many people would be fine with some of the ad covered but totally functional sites that pop up on Google. I use some of them every day for quick conversion tasks. I don't want any youtube-dl result. The average users don't either.

Download firefox? What's that? All the top links are fine? No one's looking at the 7th listing for a simple query to download a program.

Why do wider tires have better grip? .. what, sites like roadandtrack, prioritytire, reddit, some physics and stackexchange sites aren't good enough? they are.

The Vancouver snow report one also. Lots of major news sites. Some weathernetwork and almanacs. All totally acceptable results for a sort of variable question.

blah blah this is just a hate on for Google and a HN/nerd view of the world that the average user is nowhere near living in.


> Download firefox? What's that? All the top links are fine? No one's looking at the 7th listing for a simple query to download a program.

They are if the first six results are SEO bullshit. Which is the de facto state of affairs for Google today: advertising masquerading as search.


heh, they're not. They're all variations of mozilla download pages and site posts.


For whatever it's worth, I think your comment would be a whole lot more convincing without its first and last lines, which had the effect of making you sound (at least to me) like you're shallowly dismissing the article.


Which web site did you use to successfully download a youtube video, and which youtube video did you download?


Completely agree. I personally found searching "Vancouver snow report" extremely strange. Just search zip code or city name and weather. Two words. That's all you need to get results. What the hell is a snow report? Do you even think you can trust weather reports 10+ days out?

Whole article is rambling and silly and assuming.


Well, "weather" is "it's snowing", "snow report" is "amount and quality of snow", I guess. Relevant for skiing and/or driving safely.


Do search engines censor political topics these days? If you search "truthsocial" on ddg, the truthsocial.com website is the first hit. But if you search "trump truthsocial", it doesn't give you trump's truthsocial page, and doesn't even give you truthsocial.com within the first few pages of search results.

Since ddg uses bing, does anyone know what is happening here at bing? It looks like google results are similar.


DuckDuckGo (and by extension perhaps Bing, assuming identical upstream results) has some terrible results when trying to filter by all kinds of domains.

There's a power tools review/news site that returns zero hits for its actual domain when you search its name (which is the same as its .com address). And for some domains, even searching with the `site:` parameter gives far fewer results when paired with a query than just searching the domain name + query sans the TLD (the router firmware site openwrt.org is one such).

It's a mess, and reporting it hasn't made any difference in my experience over the past 3 years. So I'd be reluctant to say irrelevant results are due to censorship unless there was more evidence.


I doubt you're seeing censorship. If you search for "truthsocial trump" on ddg, you'll see his profile, for better or worse.


Oh, interesting. So it depends on the order of the terms:

- "truthsocial trump" works

- "trump truthsocial" doesn't work


I have concluded that Google definitely censors search results relating to the Ukraine war, after vainly searching for articles about documented Ukrainian war crimes (reported in mainstream Western media like NYT/WaPo).


I'm not seeing this. I Googled "war crimes by ukrainian soldiers" and the top link was an Amnesty International Article, "Ukraine: Ukrainian fighting tactics endanger civilians".

https://www.amnesty.org/en/latest/news/2022/08/ukraine-ukrai...

I use Google as little as possible because I don't like surveillance advertising but fair is fair.


You're right: I just checked and there are several hits for events that happened over a year ago that I couldn't find at all with Google back then. Shame on me for not checking before I posted. I have no idea what happened but apparently it's now fixed.


I feel like you could reboot the Yahoo Directory and have more utility than most searches.


The !bang directory for Kagi is honestly pretty good, found some cool sites there: https://duckduckgo.com/bangs


Did you mean to say Kagi or Bing?

Anyway, here’s Kagi’s bangs:

https://help.kagi.com/kagi/features/bangs.html


> Note that Kagi supports all DuckDuckGo-style bangs.

You can also make your own bangs.

That said, my point was that the bang directory has a bunch of the most useful sites in each category.


The return of something like Yahoo Directory would be most welcome. There is great utility in having more than one approach into a data space. That we have been stuck with essentially one way in for over a decade means that there is a great deal out there which would be great to access but which has been rendered invisible.



this is awesome! thanks


Nice. Thanks!


Categorization sounds like a good job for AI. Yahoo execs, are you paying attention? :)


On a side note: would it kill the author of the site to use a stylesheet?


it's the same as my choice to only use lowercase letters, it is designed to make you upset that i am not following conventions. that's as far as i have been able to figure for why i started doing this, and by extension, why tech bois love to drop some vital feature of communication to signal being an 'insider'


ok ee cummings

Omitting readable styling doesn't read as "techboi rebellion", it reads as ineptitude and lack of respect for people whose attention you're seeking.


what about lowercase makes you feel a need to attack me as a person? is lowercase text an assault on you in some way? i have not had someone insult me based on a lack of key pressing, and the fact you can argue with me means you have no issues understanding or responding to me


Search engines are not designed to give you the information you desire. They are designed to sell ads or metadata. "Result quality" is of no consequence.

If you actually wanted accurate results you wouldn't use a tool that is literally attempting to read your mind like a fortune teller. It is impossible to know what you want just by the word "snow". Jesus Christ engineers are so dumb.



