(DuckDuckGo CEO/Founder) Just seeing this and we're looking into this now. This is not intentional.
Update: Still investigating, but have made some progress -- Determined that on desktop there was a link to Techdirt up continuously via our About module (when you search for "Techdirt"). And now the traditional web link is back up as well (for desktop and mobile): https://duckduckgo.com/?q=techdirt&ia=web.
I thought you barely relied on Bing anymore. At least, that's the claim you've repeatedly made here. This indicates that DDG is little more than a Bing mirror.
I don't believe I have ever made that claim and it isn't true either that we are "little more than a Bing mirror". See other comment on this thread (https://news.ycombinator.com/item?id=36898807) but in short it isn't a binary thing and I think the fallacy people keep making is a modern search engine != web index.
Over the past fifteen years, search has become "universal" with dozens of indexes and modules throughout the page. Also it is important to understand people click/engage with things on the page roughly half as much each position down, so by the time you get to the bottom it is like 100 times less than the top. This means that things on top, increasingly non-web links, have become more and more important.
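As a rough illustration of that arithmetic, here is a minimal sketch. It assumes engagement exactly halves with each position down the page, which the comment only states approximately; the function name and the `decay` parameter are hypothetical.

```python
def relative_engagement(position: int, decay: float = 0.5) -> float:
    """Engagement at a 1-indexed page position, relative to position 1.

    Assumes a constant multiplicative drop per position (an idealization).
    """
    if position < 1:
        raise ValueError("position is 1-indexed")
    return decay ** (position - 1)


# Show how quickly the top slot dominates under the halving assumption.
for pos in range(1, 9):
    ratio = round(1 / relative_engagement(pos))
    print(f"position {pos}: 1/{ratio} of the top slot")
```

Under this assumption the "100 times less" figure is reached between the seventh item (1/64) and the eighth (1/128).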
In this context, as mentioned in the other comment, the largest modules on mobile and desktop we power ourselves, that is, local and knowledge graph. AI will be the same. We do use Bing as the primary source for traditional web links, but not for all and even when we do it sometimes looks different in various ways. In something like this, we can re-insert this link if that is what is needed.
> Bing is our largest source of traditional web links.
I think "traditional web links" is the main (and I think should be the only) product of a search engine, and it seems like you rely on Bing quite significantly to serve search results.
From lots of experience, I can tell you if you don't have these things (and others, this is an incomplete list just off the top of my head) showing up in the right places (often above "traditional web links") at the right times then you cannot get long-term search engine adoption from mainstream users:
I understand that you need these to be a full "search engine".
What we really want to know is, for links alone, to what extent do you index? Let's ignore the knowledge graph; let's say we just want a simple word match.
I'm surprised you didn't have techdirt's homepage indexed at all. "techdirt" did not return "techdirt.com". To me this means you aren't indexing the home pages for popular websites at all - let alone their content - and rely effectively exclusively on bing with maybe some "caching".
Talking about links alone - exact word matches with the domain name (or page content other than your knowledge graph), what do you index?
What specifically do you do other than adding sites manually to a list and I assume "caching bings results"? To me this would still be 99.999999% ~= 100% reliant on bing for search results.
This isn't meant to be an attack in any way, I'm just looking to clarify what's going on.
Gabriel, could you please clarify once and for all: for "traditional web links", how much do you rely on bing? The user consensus seems to be 100%. What is the official number from you? Thanks!
> but not for all and even when we do it sometimes looks different in various ways
Personally I'd like to know if that means "we have a manual list we re-added". I think what myself and probably many people are most interested in is "do you rely on bing for < 95% of word match searches (no knowledge graph)"
> when we do it sometimes looks different in various ways
IMO this is also vague - I assume this means they're knowledge graph augmented? I really want to know if they have their own indexer (vs a manual list + caching bing results) for text matching.
> I think "traditional web links" is the main (and I think should be the only) product of a search engine
This is context-dependent. If you're looking for a link, then you're looking for a link; but more often than not, you're actually looking for answers.
Say you're looking up an exchange rate or the value of a stock; converting units; checking the weather; or just want to look up an actor's photo. Why would you click and wait for a (likely, slow & ad-infested) page to load, if the search engine can provide factually correct data without any extra steps? How often do you read past the first paragraph on Wikipedia?
If you want to dive in / verify any of that information, it's still all just a click away. But having to only do one thing instead of two is almost the definition of technological progress.
I use DDG as my main search engine. Many of my searches have hashbangs into other sites. On DDG landing pages, I often go straight to the right hand side of the page, with the info panel. Sometimes, I click on traditional web links, but with less frequency as the sites are so often spam.
> I think "traditional web links" is the main (and I think should be the only) product of a search engine
I want room for both.
There are times I want the search engine to stop "helping" (sic) and just give me a straight search based on my terms.
There are other times where I'm looking for something specific but I don't remember exactly what it is, so I want help correcting/narrowing down my search.
You can spin it any way you like, but the fact is evident: you rely on Bing to the extent that if Bing delists a site, that site will be completely sanitised from every single part of a DDG search, including all those dozens of modules you keep referring to.
Thanks for looking into this. Would love to learn how a specific query like "site: techdirt.com" got wiped like that. Will you share your findings once it's been identified and resolved?
I doubt there will be any findings, because the removal from the index was done by Microsoft rather than DuckDuckGo. The postmortem has to come from Microsoft. Likely the only thing that Yegg can do is poke some internal contact at MS and tell them to fix it.
They can do better, however, it takes time and effort to do that - they can and do augment Bing results with other data sources when needed, but up until recently there was no need to add anything extra with respect to Techdirt; it certainly makes no sense to index all of the internet and try to duplicate all the search results for every search just in case Bing has filtered something out.
I feel like they would love to be able to do that.
One potentially relevant line from the article:
>I love that first one. Microsoft, a company with a $2.5 trillion market cap, “may not have enough resources” to crawl and index Techdirt? Cool.
I have to imagine DDG's valuation wouldn't quite hit 2.5 trillion if it went public. It's not absurd that they aren't able to fully duplicate those efforts (or that they don't need to in order to have a decent product).
They fixed the problem, which shows they are able to. Maybe the issue is “detect what parts we should do ourselves”? How do we detect censorship beyond blog posts like this? Can we diff various indices on a regular basis?
You can diff indices but it doesn't make economic sense to do so.
The whole point of using Bing as the base is so that you don't have to make and keep an index of everything yourself, so DDG wouldn't have another exhaustive index to compare to (and if they would, they wouldn't need Bing). They index some things to add onto Bing, but they don't try to capture everything.
Furthermore, any specific site missing from the index does not necessarily mean a flaw - okay, you would diff some indices, and find out a long list of what Bing has excluded... and then what? For every 'fixable censorship' case (accidental or intentional) there will be thousands of spam or malicious sites which should be excluded, as nowadays the key part of search is not finding everything but throwing out the results which want to be found but shouldn't. Again, DDG wants to piggyback on the effort that Bing is doing to filter the index, and if they want to second-guess all Bing's filtering (as opposed to just making a fix for this specific case) then they have to replicate and improve on all of the (huge and expensive!) Bing's filtering effort; which goes counter to the reason for using Bing which is to avoid all this expense.
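To make the "diff the indices" idea concrete, here is a minimal sketch. It assumes each index can be reduced to a set of domains it returns; the function name and the toy domain lists are hypothetical, and nothing here reflects how DDG or Bing actually store or expose their indexes.

```python
def diff_indices(reference: set[str], candidate: set[str]) -> dict[str, list[str]]:
    """Return the domains each index has that the other lacks, sorted for stable output."""
    return {
        "only_in_reference": sorted(reference - candidate),
        "only_in_candidate": sorted(candidate - reference),
    }


# Toy data: hypothetical snapshots of which domains two indexes return.
independent_index = {"techdirt.com", "example.org", "spam-farm.example"}
upstream_index = {"example.org", "news.example"}

report = diff_indices(independent_index, upstream_index)
print(report["only_in_reference"])  # domains the upstream index lacks
print(report["only_in_candidate"])  # domains the independent index lacks
```

The set difference itself is cheap. As the comment above argues, the expensive parts are maintaining an exhaustive independent index to compare against, and then deciding which of the missing domains (here a legitimate site and a spam site end up in the same bucket) should actually be restored.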
It's turtles all the way down, I mean, it's Bing. If your basic source of information, the thing that you rely on to create any smoke and mirrors on top of it, i.e. the index, is basically Bing and just Bing, then you're just a Bing mirror. More and more I see myself using Yandex out of all things to get decent results. The extremes the web got in 2023 are bizarre, the "best" search engine in the world today may very well come from a dictatorship controlled country.
Either you’re very misinformed about what a mirror is, or you’re being extraordinarily disingenuous here.
Bing is their initial and primary source of data for web links. Not the only one either, but the primary one. They maintain their own index from it.
That’s not “a mirror” by any stretch of the imagination.
What appears to have happened here is some form of replication bug where in a way yet to be determined a removal from the original data source was unintentionally replicated.
Data synchronization is hard when you’re not directly mirroring data. This kind of bug crops up
If they're not just a mirror, it seems rather strange that when a specific website disappears from Bing's index, it also disappears from DDG. In my very common sense based view of the situation, if DDG really builds its own "index" based on a variety of sources, then why did TechDirt completely disappear from its search results? DDG can argue all day about how much their "local and knowledge graph" is built in-house, yada yada yada. At the end of the day, a search engine must retrieve the most relevant websites for a given query. And nothing can be more relevant to "site:xxx.xxx.xxx" than the site itself. If it doesn't appear on DDG results, then it's not indexed. And then it clearly shows that DDG is nothing but a fancy mirror to Bing search.
> Bing is their initial and primary source of data for web links. Not the only one either, but the primary one. They maintain their own index from it.
What is their other data source? And how did it come to suffer from the same problem?
I keep seeing insinuations that DDG has some other secret source than Bing, but it's always hand waved and never explicitly named. Incidents like this seem to strongly imply that only Bing really matters at the end of the day.
Yeah DDG has always been weirdly evasive about how their search engine works and where results are coming from. If it makes you feel any better, bing has been caught copying results from Google (https://www.wired.com/2011/02/bing-copies-google/) so there's a chance you'll get bing results plus results from whatever other search engine bing copies.
Personally, I don't even mind if using DDG is the same thing as searching bing so long as it actually works, but they still can't get a simple search like "office -microsoft" or "headphones -best" right. I even thought they acknowledged the problem and were looking into it at one point.
It’s not an insinuation, it’s a straight fact direct from the founder.
That you don’t understand data replication problems—or know the inner workings of a private company’s codebase & data funnel—doesn’t make it some conspiracy.
Why is Bing doing this? It feels like we’re burying the lede: a $2 trillion company isn’t showing sites that are critical of it on its supposedly unbiased search engine.
Regardless of what their CEO claims here every single time, if Bing cut off their access, DDG would be dead overnight.
All the stuff they claim to do themselves is <1% of whatever value DDG provides. Lucky for them it’s just enough to be able to use terms like “largely” instead of “entirely”, “other sources” instead of “only source”, etc.
Hi Yegg, I'm sorry to bother you about another issue; however, I'm having a similar experience with DuckDuckGo and Bing. I've exchanged dozens of emails with people from Bing's web team without any success.
Long story short - at some point in the past someone proxy-mirrored all content of SaaSHub. You'd open a page like "someshady-proxy-mirror.com/duckduckgo-alternatives", and it would mirror saashub.com/duckduckgo-alternatives. The same happened for all pages. Soon after that, Bing (and DDG respectively) dropped all SaaSHub links from the index and kept the proxy-mirrored content! WTF.
After a week-or-two of "fighting" I managed to block the proxy-mirrors; however, I never got SaaSHub back in Bing's index.
I'm a single founder and feel helpless with this Bing/DDG issue. Any help would be appreciated. Thanks.
I've had a similar experience, an issue I reported was corrected, but at the same time, they have major problems that have persisted for years but haven't been fixed (the "windows -microsoft" problem) which cause me to G! far more than I'd like.
Since you're here, I have a bug to report :)
I'm in Slovakia. When I look up something that has a related Wikipedia page, the widget cites the English Wikipedia. However, the link leads to the Slovak Wikipedia but to an article that has a name from the English one. Usually things are named differently, so if I click the link, I get a 404.
Could you please stop posting unsubstantive comments? You've unfortunately been doing it repeatedly. If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
(We definitely don't want fake/bait stuff here either so I share your feeling on that - this just isn't a good way to express it, at least not on HN.)
Nope I am not really here to satisfy randomers, but feel free to ignore me, or report me to moderators so that they can decide if they want to delete my account? Otherwise just stfu
so he could just delete my account without pinching my balls? What are you trying to say? You can't change others to make what you want them to do, you can ignore it, or act in a way that it doesn't affect you, he decided to break my balls with requests even if he could ignore OR EVEN ban me, as he had the power.. unbelievable
What a benevolent god, he won't take away the opportunity to comment on hacker news from people, how would they live otherwise
I don't really like to be cheeky/sarcastic, but I don't really like people telling me how to behave, I do me, you do you, especially if you are an admin
It’s either people are compatible with a community or they are not, asking to change means people have no personality and you can just shape their behaviour at will, i guess that’s true for tech people, but not the rest of the world
I am definitely not trying to pinch, break, or otherwise contact your balls. I also don't want to ban you (or any other user who I assume is posting in good faith) without first giving you enough information to use HN as intended if you want to.
We're just trying to have an internet forum that doesn't suck—it's nothing personal! The problem we face is that large internet forums end up sucking by default - that's where the arrow of entropy points - so if we want to avoid that outcome, we need to spend a lot of energy to deflect it, and we need users to help with that.
There's nothing intrinsically wrong about posting unsubstantive/flamebait comments on the internet - there are other sites where that is common and expected. We're simply trying to play a different game here, and for that we have different rules*. Just like baseball rules say you can't tackle the pitcher, and football rules don't let you whack the ball with a stick, HN rules say not to be snarky, not to post flamewar comments, and so on (https://news.ycombinator.com/newsguidelines.html).
Btw, what you said about "changing others" is correct in a way - I post like this as a moderator because I'm trying to persuade users who are breaking the rules to change their behavior. (Not make them change - just persuade them that it's in their interest.) But you're wrong that this isn't doable—it does work sometimes. You decide whether or not it works in your case, of course, but HN has a long history of users switching to using the site as intended, once they understood why it was in their own interest to do so.
Here's why it's in your interest: HN is only worth visiting in the first place because we have these rules and most users follow them. Flamewar, for example, can be exciting for a while, but in the end it's just repetitive and boring. The idea of HN is to satisfy curiosity, and for that we have to avoid the snarky/flippant/sensational/indignant/etc. types of internet posting, all of which destroy the site for that purpose. It's in your interest to participate in the intended way, because then you're contributing to the reason why it's worth visiting at all. I'm sure you wouldn't leave fires burning in a campsite that you enjoy, or (less dramatically) drop trash in a pleasant park. The way we see it, asking people to play the intended game here and stick to its rules is sort of like that.
* this is also why I often ask people "Please don't do this here". It's not "please don't do this" or "it's wrong to do this" or "you're bad for doing this" - it's just "please don't do this here."
I understand this message is much more reasonable, thank you, I mean much less ordering, can I have my account and all my comments/contributions deleted please?
I feel like this makes clear the level of reliance DuckDuckGo has on Bing. I know the full extent has been a bit ambiguous, and they claim to have their own indexer, but there's no way this is a coincidence. Surely if you have your own indexer you would have seen techdirt before?
At this moment "techdirt" only returns the wikipedia article, twitter, then mostly unrelated mentions of it. Surely DDG would have seen at least their homepage before?
> Most of our search result pages feature one or more Instant Answers. To deliver Instant Answers on specific topics, DuckDuckGo leverages many sources, including specialized sources like Sportradar and crowd-sourced sites like Wikipedia. We also maintain our own crawler (DuckDuckBot) and many indexes to support our results. Of course, we have more traditional links and images in our search results too, which we largely source from Bing. Our focus is synthesizing all these sources to create a superior search experience.
DDG’s business model is largely about privacy and UI rather than attempting to be objectively better than other search engines. Search is not quite 1:1 with Bing, but frankly I hate Bing’s UI not its results.
They don’t claim to be a privacy wrapper for Bing; they claim to be a standalone search engine. But they aren’t. It’s 99% Bing and there’s nothing they can do about the results getting censored at the origin.
On multiple occasions they have uncensored results that Bing censored. They default to censoring websites when Bing does, but can trivially remove sites from that list.
As to their business proposition, privacy was central from day 1: In 2011, Weinberg, then the company’s sole employee, took out an ad on a billboard in San Francisco that declared, “Google tracks you. We don’t.” That branding—Google, but private—has served the company well in the years since.
“The only way to compete with Google is not to try to compete on search results,” says Brad Burnham, a partner at Union Square Ventures, which gave DuckDuckGo its first and only Series A funding in 2011. https://www.wired.com/story/duckduckgo-quest-prove-online-pr...
In other words, you’re seeing results that Bing wants you to see unless it raises a big enough stink for DDG to manually override. They default to being a wrapper for Bing.
Edit: Since you edited, no one is arguing that their whole spiel isn’t about privacy. What I’m saying is that they refuse to admit that they’re a privacy wrapper around Bing which would make their value proposition quite weak. They puff their chest and pretend to be a big standalone player in the market but they aren’t. You aren’t at the whims of DDG as a user, but at those of Bing/Microsoft. If you’re cool with that, carry on, but I’m not going to let yegg continue to make vague statements uncontested. If they’re so sketchy and dishonest about their role in deciding the links, what else are they dishonest about? Some of you let a lot slide because they’re the underdog.
That’s not a question of being a wrapper, it’s a question of using Bing’s spam list. The difference is about sites Bing doesn’t include but also doesn’t specifically exclude.
PS: As to your edit, they have specifically mentioned using Bing in the past alongside other engines. They are clearly running their own search process and widgets while also using a lot of data from Bing. I can see why you might consider it deceptive, though I really don’t.
The real question is to what extent do they index sites themselves?
If they're manually adding an entry for "insert site here", maybe some "caching" of bing's results for the biggest most common searches then IMO that's still 99.999999% ~= 100%
Because an entire domain with thousands of sites can show up, the simplest explanation is they are copying some Spam or Phishing etc list while also having an index they are searching.
Now, they might be using low level API access to Bing’s index or something else. But I don’t see how it can be a simple wrapper.
A dinky little but long-standing staple of its topic, bitchin100.com, had the same problem for some years, but I guess the community was small enough and traditional enough that not enough people were using anything but Google to notice, except me.
They eventually got it fixed but only after a few other people finally noticed and the owner contacted someone and then waited a week or so. But it was like that for years. You search a perfectly good search term that should pull up one of the articles on that site first, and all you got are all kinds of other 2nd and 3rd hand references like email archive posts and articles on other sites, but scroll down as many pages as you want and the actual site would never come up on ddg.
Overall search quality has been declining on all search engines. Maybe there's too much spam. Saw an entertaining video about it yesterday that echoes how I feel when I google stuff: https://www.youtube.com/watch?v=jrFv1O4dbqY. It's so hard to find content written by humans these days. Seems like only the top sites are being indexed.
I'm in the same boat, I feel like my search-fu is being thwarted with the ever growing list of products that coopt existing words rather than coining new terms. We're in this ever expanding word overloading mode and I think the commercial and marketing spaces are now dominating search to drown out the useful hits that would previously rise to the top.
More and more, our search-fu is being actively thwarted. Tools to tailor our search have been steadily taken away: exclusion/inclusion operators, verbatim search ignored, and on and on.
> exclusion/inclusion operators, verbatim search ignored, and on and on.
I've been assuming that this was just a bug with DDG and not a deliberate attempt to cripple their search engine. I've seen it with Google results too, but I could buy that Google would break them, since making websites harder to find encourages people to pay for prominently placed ads, but what would DDG's motivation be?
I want a search engine that remembers my preferences, where I can click an "ignore entire domain" option and never see that domain in search results ever again. Is there such a thing?
I'mma have to look into this thing, there's an ever growing need for this sort of thing as the incentives drift further and further from the end user https://kagi.com/.
Google's attempt to introduce zero-click results has been nothing short of catastrophic. The sheer volume of bullshit they are spreading rivals anything that ChatGPT could ever hope to generate.
A couple of weeks ago, I was debating with someone about what "LMR" stood for in the context of cable specifications, such as LMR-240, LMR-400 and so on. I thought it meant "Land Mobile Radio" while the other person disagreed that it stood for anything. A Google search on LMR coax cable acronym returned a helpful info blurb stating that LMR stood for "Last Minute Resistance" as a means of fending off sexual assault.
Needless to say there was no way to tell exactly what site Google had copied that definition from, and no useful way to provide feedback to them. Sometimes there's a "Feedback" link, this time there wasn't. Sometimes the feedback link is present but only offers the option of reporting illegal activity. That option wasn't present either.
For whatever reason, Google clearly does not give a flying fuck at a rolling donut about search quality anymore. With the right leadership, Bing could own that entire line of business, in a manner reminiscent of IE's original dominance over Netscape. I'm not holding my breath, but at this point I'm cheering for anyone who can offer Google some competition.
I tried the example you cited. The "blurb" is called a snippet; the snippet comes from the web page itself. One of the pages had that actual text, which is why it appeared. Why it had it on a page that's primarily about coaxial cable isn't clear, but we'll look into how to improve.
As for sending feedback, each link in the results has a little three dot icon next to it that brings up our "About This Results" panel, and you can send feedback that way.
Also, to belatedly introduce myself, I'm the public liaison for search at Google. It's a position we have within the actual search engineering team to help us gather feedback to improve search quality. Feel free for you or anyone comfortable sharing examples of unhelpful results to flag me about it:
> As for sending feedback, each link in the results has a little three dot icon next to it that brings up our "About This Results" panel, and you can send feedback that way.
Surely you don't expect us to believe this ever gets read, much less acted upon? Don't insult the intelligence of your audience.
I asked ChatGPT 4.0 what LMR meant as applied to coaxial cable. It gave me an excellent response, including that LMR is a trademark of Times Microwave Systems.
I rarely use search engines anymore. I'll bet the same is true for many people.
In fact when I used regular Google, the results are good enough for me to deduce that on my own.
So I consider GP's search skills inadequate. I mean it's not exactly wrong to desire a tool that handholds you and feeds you the answer; but if you are willing to do a little bit of deduction Google is fine.
Regular Google itself tells you that. At least for me, when I tried this search right now, the first snippet says "These letters indicate the brand. LMR is a trademark of Times Microwave, whereas RFC is made by Shireen." It's the second snippet that says the last minute resistance thing, and it appears to be from someone's personal blog that blocks IPs from the United States, so without logging into a VPN, I can't open the site to figure out why it thinks this.
It appears to indeed mean nothing, but of course, it took me a whole two minutes or so to find an actually authoritative source. Everyone knows no real user will ever do that and just wants a search engine or chatbot to dictate reality to them.
I don't trust the zero-click results at all any more. I recently had to fill out a form for the DMV that required my county code. To find out, I of course googled it, and wrote down the knowledge box result. This turned out to be the wrong number, and thanks to Google, this happens so frequently the clerk knew where I got it, and knew to check it and fix it.
People are building workarounds in real life due to how bad Google's results have gotten.
Similar experience for me. One that I do a fair amount is searching for specs on various car makes & models. The zero click results are typically quite bad. To be fair, a lot of it just ends up being bad source data, but their primary source for automotive specifications is the same shitty site.
Another problem is that Google seems to ignore a significant part of the words you type into the search bar.
If you type 'word1 word2 word3', where word3 is less common than word1 and word2, a lot of the time, it will act as if word3 simply wasn't in the query.
ddg does this too with annoying frequency. Even quoting it does not necessarily guarantee it will appear in the snippet, much less the result.
And when DDG runs out of "web links" it just fills the rest of result pages with local results that are out-of-place and useless. Like, getting "Visit Paris" sites after searching for rare computer parts.
Well, the last few months DDG's search quality started dropping hard for me, some queries that I clearly remember having good results no longer have them today.
It seems there's stuff going on behind the scenes. DDG got taken over by some vested interest, perhaps. Or Bing doing stuff and DDG never branching out of it and just blindly getting results from it.
Either way, it's starting to rival Google in uselessness and I'm likely to stop using it fairly soon.
The thing with search engines is: you try one, scan results for 30 seconds, don't find a useful result, curse under your breath and hastily try another search engine. It's a normal programmer flow, which sadly almost completely eliminates the possibility to provide actionable feedback to the search engine's maintainers.
I gather you're from the DDG team? If so, I'd say that the writing is on the wall that Bing is no longer a good backend for DDG. You guys should start branching out because from where I'm standing, many programmers are 10-15 annoyances away from switching.
Is there anywhere people can report clearly bad results? I don't have any on hand right now either, but I do know I somewhat regularly encounter DDG results that are just "wtf?!"-level, whereas Google does return good results for the same. I'd be more than happy to report these things when I encounter them if that's helpful.
I'm not the OP, but here's a quick list of search items that did not turn up useful results recently off the top of my head:
number of Americans without a credit card
panic nova git automation
microvision (the video game console)
time in chicago (seems to be fixed now)
mozilla open directory (seems to be fixed now)
Now that you've solicited this feedback, I'll make a point of saving futile searches for the next time you pop up on HN.
free for 100 searches.
$5/month for 300 searches
$10/month for 1,000 searches
I'm not at the point where I feel DDG and Bing are so bad I want to start paying to get better search results. I'd be interested to see how many people are there though.
I am not sure I am doing more than ~33 searches a day lately so probably the $10 plan will suit me fine. I think I do something like anywhere from 5 to 20 a day.
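The back-of-the-envelope check above can be sketched as follows. It assumes a 30-day month and uses only the two paid tiers quoted in this thread; the free tier is left out because the thread does not say whether its 100 searches are per month. The names are hypothetical and the prices are as quoted here, not verified.

```python
# (USD per month, searches included), as quoted in the thread.
PAID_TIERS = [(5, 300), (10, 1000)]


def smallest_paid_tier(searches_per_day: float, days: int = 30):
    """Return the price of the cheapest quoted tier covering the monthly volume, or None."""
    monthly = searches_per_day * days
    for price, cap in PAID_TIERS:
        if monthly <= cap:
            return price
    return None  # volume exceeds every tier quoted in the thread


for per_day in (5, 20, 33):
    print(per_day, "searches/day ->", smallest_paid_tier(per_day))
```

At 33 searches a day the monthly total is 990, just under the 1,000-search cap, which is where the "~33" threshold comes from.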
But I too started getting disgusted by "everything is a subscription" but I might jump the train if I like Kagi enough.
Because apparently the internet companies can't figure out an ad model that's not extremely toxic and does not trample on every single privacy rule the world has (and the 100x more that the world still doesn't have but should not be broken anyway because they should be a moral / ethical no-brainer but alas, go tell that to the "money above everything" types).
But finally, after decades, all the VCs funding internet companies that planned to capture the market with network effects and then start charging, are showing their true colors. I am glad. It makes them more honest and gives the users better information to act on.
Us the nerds just practiced endless bikeshedding and the corporations took over everything in the meantime. Oh, let's not forget the people who kept inventing LISP dialects BECAUSE THAT'S EXACTLY WHAT THE WORLD NEEDED!
That can't be right, DDG's CEO has assured me that DDG isn't just a dumb proxy to Bing. DDG uses several sources as well as its own index so it cannot suffer from what is being claimed [1][2][3].
- We have on the order of a million lines of search code at this point and have a lot of talented people working them. That code does a myriad of things across many indexes.
- As an example, mobile searches are the largest category of searches, and local searches are the largest category of searches within mobile. We don't get any local search module content from Bing.
- Similarly, on desktop, knowledge graph / Wikipedia-type answers come up the most and we don't get any of that module content from Bing either.
- Bing is our largest source of traditional web links, which have become less and less relevant/engaged with over time as more and more modules are in search results and put on top of traditional links (and people interact with things on top of the search results page about two orders of magnitude more than things on the bottom).
- When Bing has dropped things out of the traditional web index, we have put them back, and we've been working with them so this happens less and less. In fact, there have been hardly any reports of this in the past month or so, which is why I've asked for other examples in the comments.
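To put rough numbers on the position effect in the bullets above, here is a minimal sketch. It assumes engagement halves with each position down the page, which is the figure quoted earlier in the thread and not measured data; the function name and the number of positions are illustrative only.

```python
def relative_engagement(position: int, decay: float = 0.5) -> float:
    """Engagement at a 0-indexed position, relative to the top slot.

    Assumes a constant decay factor per position (0.5 = halving),
    which is a simplification of the claim made in the thread.
    """
    return decay ** position


# Print the relative engagement for the first eight positions.
for pos in range(8):
    print(pos, relative_engagement(pos))
```

Under that assumption the eighth slot (position 7) gets 1/128 of the engagement of the top slot, which is roughly where the "two orders of magnitude" figure comes from.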
As a long time DDG user I think it's cool you're personally engaged here. The disappointing part is it seems like TechDirt has had to resort to customer-service-via-social (Reddit/HN/Twitter) which sucks.
Once this particular issue is sorted out it would also be cool to see some sort of post-mortem report. Seeing why this sort of thing happens and a standard process for fixing it would be beneficial to all involved I think. Complex systems are complex and shit happens. But if this happened to a site far smaller than TechDirt (they're not even that big AFAIK) I don't think there would be any avenue to cure the situation for them.
It's great to see you on here answering/responding to all the unfounded rumours on here.
I'd be interested to know because it gets to the heart of the matter and why confusion seems prevalent. If we ignore the modules and local search, what differentiates DDG and Bing when returning bread and butter traditional web links?
I'll note that there is an April 17th update tweet referenced in that thread. That is unrelated and it was about a rumor that DDG was purging certain media sites. Nothing to do with censoring Russian sites. Archive of that: https://archive.ph/I2iUp
It is simply not true that we have intentionally censored anything for political reasons or made ourselves "the arbiters of truth" in general, which people have also accused us/me of doing.
I realized I previously explained how our news rankings work very poorly on Twitter that got grossly misinterpreted, so I subsequently put out a clarification in this help page with a much clearer (and detailed) explanation of how our news rankings actually work: https://duckduckgo.com/duckduckgo-help-pages/results/news-ra...
From that page: "When we apply our own ranking signals we do so in a strictly non-political manner, meaning we don’t evaluate or otherwise take into account any potential political bias or leanings of websites in our search result rankings." That is, we did not/do not have a disinformation/"truth" detector, nor did we go looking for any Russian narrative (or any other narrative for that matter). Instead, we just have essentially a spam detector that had detected some spam from Russian state sites. That's it.
It's looking like 2023 is the year that search dies. Between the removal of exclusion operators (hey, we saw you were adding -pinterest to your searches, but we want to make sure you get your pinterest!) and de-indexing of sites, it's looking like it's time for the search wheel to start making its third spin.
Guess I need to start playing more with things like Algolia or Kagi.
Unfortunately, creating a search engine competitive with Google or even Bing is insanely expensive. Not just from the server cost perspective, but also from marketing it and convincing people to switch.
Can you provide some context about Bing censoring Techdirt? These HN comments are the only thing that shows up in a search. The above article says nothing of censorship, other than screenshots of the Bing AI giving "maybe" speculations about it.
I am now looking into Kagi and I love that it seems like they just picked the era where Google wasn't terrible and are emulating the look and feel of that.
The only problem is the pricing model. I don't do overage fees because they make me anxious. I like knowing there is a concrete ceiling on what something will cost me. 300 a month also feels far too low and $10/mo, whether fair or not, feels far too high for a search engine.
This is why I've been working on a behavioural change: stop using search engines and start going right to the websites I know and trust.
At this point, my searches are tending to start at Wikipedia, enough that I might make it my Firefox search default. If it doesn't directly provide enough information for me, it can at least send me in the right direction.
Kagi is an underrated superpower. It just works. I tried Google again recently, and I was astonished how mediocre the results were. If that's the gold standard for no-cost searching, the internet is in serious trouble.
I am totally not affiliated with kagi but since I started using it, I haven't gotten pissed at my search engine for sucking ass.
OH! Another feature that actually works on Kagi: Date filtering! Never again will you set that date filter to the last two weeks and receive results that were posted in 2005.
They build their own index but don’t let you access it with keyword based queries. Instead they do ‘magic’ and shape the responses from multiple data sources.
I’d love it if instead of ‘magic’ we just had a search engine that let you be the magician with a better query language and ux filters.
They're probably doing something like mapping the embedding of the query to the nearest neighbor in the search given how the front-end of their search infra works
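For anyone unfamiliar with the idea, nearest-neighbor lookup over embeddings can be sketched as below. This only illustrates the general technique being guessed at, not how any particular engine works: the three-dimensional vectors are hand-made stand-ins for what a real embedding model would produce, and the names `cosine`, `nearest`, and `index` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vec, index):
    """Return the document id whose embedding is most similar to query_vec."""
    return max(index, key=lambda doc_id: cosine(query_vec, index[doc_id]))


# Toy index: document id -> embedding (made-up values for illustration).
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}

print(nearest([0.9, 0.1, 0.0], index))
```

A production system would use an approximate nearest-neighbor index instead of scanning every document, but the matching principle is the same.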
Kagi uses multiple search APIs (including their own) and implements their own ranking and mixing of results.
Basically you get the same results with spam removed, with some truly unique results from the in-house engine.
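As a rough picture of what "ranking and mixing" results from several sources with spam removed could look like, here is a sketch using reciprocal rank fusion. This is a stand-in technique chosen for illustration; the thread does not say what Kagi actually does. The function name, the example URLs, the blocked-domain set, and the constant `k=60` are all assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse


def mix_results(result_lists, blocked_domains=frozenset(), k=60):
    """Merge ranked URL lists with reciprocal rank fusion.

    Each URL scores 1 / (k + rank) per list it appears in, where rank is
    its original 1-based position in that list. URLs whose hostname is in
    blocked_domains are dropped. Ties are broken alphabetically.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            if urlparse(url).hostname in blocked_domains:
                continue  # skip domains the user or engine has filtered out
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical result lists from an upstream API and an in-house index.
upstream = ["https://a.example/1", "https://spam.example/x", "https://b.example/2"]
in_house = ["https://b.example/2", "https://c.example/3"]

mixed = mix_results([upstream, in_house], blocked_domains={"spam.example"})
print(mixed)
```

A URL that shows up in more than one source rises to the top, the blocked domain disappears, and results unique to the in-house index still make it into the final list.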
I've been using Kagi for a little under a year and it is absolutely worth it. DDG gives very inconsistent results and G is just as bad and complete dog shit when the adblocker is off.
Comments summary: lots of people who don’t understand how large search indexes work freaking out or pontificating about their pet favorite alternative. Occasionally both.
Meanwhile the DDG founder calmly stating it’s being looked into, is definitely a bug, and explains that this should not generally happen… and being ignored.
People understand how this works. Removal from bing means removal from duckduckgo.
Founder says they get info from many sources. Talks about local and factual information, flights, images, videos, etc as important categories while calling normal websites part of legacy web. Fails to mention they get this content from only bing. Tries to explain people are searching for less legacy while people believe they mostly use a search engine for legacy web.
I can't see a bug explanation being anywhere near the truth when every site delisted from bing is automatically delisted in duckduckgo and requires some action to save it.
I explain it by directing you to take a basic course of data replication, please for the love of god, so you can educate yourself on this topic rather than come up with uninformed hot takes that veer into conspiracy territory.
Because god knows you’re not the only one that’s managed to turn “I don’t know why” into “I’m sure this is what is happening”…
Then how do you explain what we see? Tech Dirt disappears from both Bing and DDG indexes at the same time. Google, meanwhile, has 136 thousand pages indexed on the techdirt.com domain.
He claims when sites get dropped from bing they are automatically dropped from ddg and require manual efforts to get them up.
There is no process either. You need to connect with someone on the inside. No form, phone number or any reasonable way of contacting them. If you don't share a daycare timeslot with him or can get your submission upvoted here good luck
Actually it’s a fairly easy process to contact DDG. There’s the “Share Feedback” button on every search page. It may be a little less visible, but there’s also a contact email listed on the Press page.
Some of my stuff was in a Bing black hole for a while, and I did everything I could fix it, following their guidelines, contacting tech support, etc. It was good content, too--educational, no spam, SEO, ads, or tracking. Eventually I just gave up. Fast forward a year or so, and the search results spontaneously started working.
So it's a black box, and you kind of just hope for the best.
It would be neat to see a black list of all the things Bing and Google block from search results. I don't know how to get such a list other than brute force trial and error.
Then I guarantee your search engine will mostly contain spam, low-effort reposts, content farms, egregiously bad SEOs, and other useless stuff.
There's a reason why mainstream search engines like DuckDuckGo perform this kind of blacklisting. Because 99.99% of the time this blacklisting only renders invisible what is worthless anyways. But that 0.01% of the time when it incorrectly blacklists a site HN is outraged.
>Then I guarantee your search engine will mostly contain spam, low-effort reposts, content farms, egregiously bad SEOs, and other useless stuff.
I can't help but think I've encountered this type of content before...but where.
I also wouldn't exactly call this kind of response "outrage" but something close to befuddled amusement. "Whoops! Looks like we accidentally blocked those sites that criticize us! Yikes." If in error, then what an error! If by intent, then who would believe otherwise?
I suppose some cranky, overly emotional weirdos might get in a tizzy, but this is pretty bog standard stuff. It's just fun to gawk at the sweaty fat man on the tricycle trying to backpedal across the motorway.
It would be really nice if DDG had their own webmaster console (like Google or Bing) to help understand why pages might be indexed or dropped. I love DDG as my primary search engine, but it's a black box from the SEO side of the desk, unless I'm missing something obvious.
techdirt is working fine on Kagi. Ever since its launch, I've been predominantly using Kagi, barely resorting to any other search engine. I estimate that only about 5% of my searches involve Google, and that's mainly when I need to utilize Scholar or Books.
Kagi is definitely worth the $10 fee. First, it allows me to block websites that I find irrelevant or annoying in my areas of interest, such as Pinterest or Quora. It also promises an environment free of advertisements and tracking. I particularly appreciate the excellent customer support, where actual human beings respond to your emails, not automated responses. Additionally, it offers 'custom lenses', which are essentially search templates. For instance, using this feature I can refine my search to target only educational sites, filter for PDFs, limit the time to the "last 48 hours", and search for specific subjects, like machine learning, along with my query.
My only gripe with Kagi is its stringent limit on searches, set at around 10,000 per month. I've bumped against this ceiling a few times. Despite this limitation, the wealth of features and the quality of the search results make Kagi a worthwhile investment for me.
And that's why we need a diverse ecosystem of search engines. As indexing whole internet is pretty much impossible, maybe several startups could take a slice of it? Some overlap would be useful too.
A more sustainable approach would be to use the "Share Feedback" button in the (regrettably positioned) lower right corner of the search results page. The story I've heard is that actual humans look at that feedback.
Title says [fixed] but that's only for DDG - which despite all the flaming it's getting, somehow did fix it while Bing has not (will it?). My unfounded theory is that Techdirt is effectively banned in some country and that leaked to the global Bing index.
I find it simply impossible to believe that Masnick is so incompetent at running a website after so long, that he's finding out from random friends that his site has been de-listed from Bing instead of him or his staff having their search console lighting up like a dashboard and pinging them with emails. I get emails from both bing and google if there are site indexing issues, and while they're not always super clear about why, the major stuff is pretty easy to fix.
I find it even harder to believe that when he confirmed the problem, he then didn't do anything about it for months.
It's ridiculous to talk about DDG being separate from Google. In the same way Brave is Chrome with some minor doodads and settings, DDG is a minor reskin of Bing.
After DDG's earlier announcements delivering the image that DDG then severed their uneasy dependency from Bing, this makes them look really bad. Retorting with mumbo jumbo about secondary stuff like flights, business listings etc that are just related to website retention rather than their core structure as an apparent search engine, just makes it worse.
I still believe DDG has a place, in being a more privacy-focused aggregator to Bing and a few other sources, but this did shatter the previous image of DDG having a large degree of autonomy for me.