The comments here about building search engines that viably compete with Google in terms of quality are hilariously wrong. Hilariously wrong.
The dominating cost is not hiring smart people to work on the problem, it's hiring enough smart people to work on the problem.
Consider. Here at Microsoft, we employ a search relevance staff of -- what, 20% the size of Google's? And Google's engineers are rock solid. MS might be able to make up a small margin, but our search relevance staff -- good though they are -- cannot compete with a brilliant workforce that is 5 times larger than them. If we are going to actually compete with Google, the problem is not hiring smart people, it's hiring 5 times the current number of smart people employed for this task at MS.
Never mind buying such a team. MS can afford that. How do you even find that many people in 2-3 years?
When you consider the rest of the data, the case for building something like Google really begins to look grim. For example, how much does it really cost to build a search engine? We've poured at least tens of millions of dollars into just search relevance (I'm not even counting infrastructure). That's good mileage, considering that the path is littered with the corpses of companies like Cuil, who made investments in this area and failed miserably. Still, while this has been a great deal for us, we're still not there yet, and it's not clear we will be in the very near future. And so it's worth wondering: if MS can't buy something like that, realistically, who can? DDG? lol no.
In the end, this is the true dominating cost of building a search engine: people capital. Other bottlenecks, like engineering debt, politics, etc. pale against the sheer, awe-inspiring investment of Google in people capital.
You know what? I really can't get behind the idea that MS is somehow the sterling example of engineering prowess and discipline. This is the organisation that has failed its way through every other operating system. One of the poster children for missing the boat on the internet, tablets, operating systems, cloud services, and more.
I also reject this idea that Google's got a lock on every smart person ever. That they don't have any politics or wasted effort. That they're the snowflake they think they are. This kind of thinking is pathetically misguided, and a huge part of their marketing. Google is not perfect. They are staggeringly weak in the face of real competition (witness Facebook vs. G+ and Apple vs. Android).
Your argument is a replay of the prevailing attitude in the mid nineties regarding operating systems (from the same company, surprise!). And then some college student named Torvalds pushed out the most influential operating system ever, and did it for free, with the help of the world.
Nobody and nothing is invincible, and people who say that a situation is unassailable are always, categorically wrong, given enough time.
Hi OP. Your points, as I understand them, are the following. Correct me if I'm wrong.
* MS is not a good example of a strong engineering org, because Windows sucks.
* Google doesn't employ every smart person ever.
* Google can't actually compete. (ed note: in any field, or just against iOS and Fb?)
* My argument is like the argument that no one would supplant Windows.
* Given enough time, everything dies.
Points 2 and 3 are about Google. Let's start there. You're right that Google doesn't employ every smart person ever, but then, who said they did? :) But there are a limited number of search relevance engineers, and finding enough to keep up with Google is a monumental, maybe insurmountable, feat. If you want to compete with Google, you will need to have a serious advantage that supersedes this. That is just a fact.
Point 3 is about competition. I'll grant you that G+ is no Facebook, but Android is the most widely adopted mobile OS on the planet, and by a huge margin -- iOS is basically not even competitive, except for the top 5% of the market. Further, a billion smartphones will be bought this year, most of them Internet-enabled Android devices, and most of them bought in developing countries by people coming onto the Internet for the first time. You tell me who's forward thinking there, because when that market comes online, it will be huge. The fact that you mention this as being non-competitive indicates to me that you might not know what you're talking about. :(
Point 1 is about the MS org. What can I say, OP, I work here, so maybe I'm not the best person to have this discussion with. But FWIW I chose this place over some much sexier jobs because the team I work with is arguably the best of its type in the world. There are bad neighborhoods, but the disparity between a good team in MS and a good team at Google is basically negligible. Also I think Windows is one of the great engineering feats of CS, so ... :| (Note that I still use UNIX at home.)
Point 4 is probably the result of confusion. I don't think no one can compete with Google. Bing has 20% market share! Clearly we can. But I do think it will be hard to compete with Google on search quality. I don't see how you can argue that.
And point 5 is obviously true but not relevant.
EDIT: Actually I now see your point 1 as saying "MS couldn't pull this off because they're not a good engineering org, but someone else could". Maybe someone else can build a better search engine, but I think what MS has pulled off with Bing is a monumental feat.
For starters, we built the entire Bing stack from scratch. No OSS. No common platforms like the JVM. Nothing like that. We started from nothing, and invented the server infrastructure, the data pipeline, the runtime that would support the site, the ML tools, everything. The fact that the site runs at all is a small miracle, but the site does not "just" run: the most remarkable thing by far is that the quality of our tooling is quite incredible, generally an order of magnitude better than the OSS equivalents. For example, the largest deployment of an OSS NoSQL datastore seems to be a few thousand nodes. The small NoSQL cluster backing our MapReduce implementation is stably deployed on a cluster an order of magnitude larger than this. This is something you only really see at companies like Amazon, Google, or MS.
I understand that the consumer market is not something MS is strong at, but I am hoping this gives you a taste of the scale and quality of what's happening behind the scenes. Happy to talk more about this if you drop me a line or skype me at `mrclemmer` :)
> For starters, we built the entire Bing stack from scratch. No OSS. No common platforms like the JVM. Nothing like that. We started from nothing, and invented the server infrastructure, the data pipeline, the runtime that would support the site, the ML tools, everything. The fact that the site runs at all is a small miracle, but the site does not "just" run: the most remarkable thing by far is that the quality of our tooling is quite incredible, generally an order of magnitude better than the OSS equivalents. For example, the largest deployment of an OSS NoSQL datastore seems to be a few thousand nodes. The small NoSQL cluster backing our MapReduce implementation is stably deployed on a cluster an order of magnitude larger than this. This is something you only really see at companies like Amazon, Google, or MS.
You lost me here. Not building on OSS seems like setting yourself up for failure from the start, particularly when you are fighting a manpower war, which is where OSS is beating every proprietary entity. OSS already powers Google and Amazon, and OSS DBs will scale to billions of nodes, not a few thousand.
> For starters, we built the entire Bing stack from scratch. No OSS. No common platforms like the JVM. Nothing like that. We started from nothing, and invented the server infrastructure, the data pipeline, the runtime that would support the site, the ML tools, everything.
Wasn't much of the Bing stack built off Powerset, whose core technology was licensed from Xerox PARC?
During the time I was working there, I'm quite sure most of the tools were made internally at MS:
- they had their own map reduce
- they had their own service deployment and management system
- their own db and nosql
- etc...
As for search and relevance, the code is updated quite fast, sometimes with big code replacements. This means that even if Powerset tech was used as a starting point (and I think it was not -- it was more likely adapted and integrated; I believe the base code was MSN/Live), most of its pieces have since been changed.
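(For readers unfamiliar with the pattern those internal map-reduce tools implement: here's a toy, single-process sketch using the classic word-count example. The `map_reduce` helper and all names are mine, purely illustrative -- the real systems distribute the shuffle and reduce phases across thousands of machines.)

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy single-process map-reduce: group mapper output by key (the
    'shuffle'), then apply the reducer to each key's values."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    return {key: reducer(key, values) for key, values in shuffled.items()}

# Classic word count over two tiny "documents"
docs = ["bing search", "web search"]
counts = map_reduce(
    docs,
    mapper=lambda doc: [(word, 1) for word in doc.split()],
    reducer=lambda key, values: sum(values),
)
# counts == {"bing": 1, "search": 2, "web": 1}
```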
>Windows is one of the great engineering feats of CS
Do I have to point out all the lame exploits and bugs which take so long to get fixed, the terrible design, the "things which should have been there years ago but we still don't have", like a decent file manager, task manager, or copy utility? And AFAIK Windows has made few major contributions to the theory of OSes (semaphores, threads, paging, scheduling and so on), so there's really nothing to be amazed at.
I'll spare you my opinions on Bing. Reinventing the wheel is not worth describing, no matter how beautiful that wheel is, although I'm happy for you to be a part of the team making that wheel. If only Bing had more ambition than just being a clone of Google Search, some people under 70 would actually consider migrating. But if you're happy with your default-search-engine-bundled-with-IE market share at 20%, good for you (eh, it does bring a lot of ad money). You may not call the shots at MS, but you can at least admit all the shortcomings.
You could point out all the lame exploits and bugs, but only if you want to spark a long argument that you will end up losing. There are a variety of things I don't like about Windows, and I avoid using it. But the idea that it's somehow less secure than other operating systems is for the most part a Linux advocacy myth. The fundamental security architecture of WinAPI is just not that different from that of Linux or OS X (which also has a legacy compat issue that complicates its security).
Fair enough, you know much more than me in that domain. :)
Are you going to argue about the famous general instability of Windows compared to its large competitors though? It seems like a good indicator of bad design in low level implementations.
What do you mean by famous general instability of Windows? Windows is perhaps my 3rd choice for OS, but almost all my clients use Windows and among the problems they deal with, instability is not one of them.
Hey devcpp, let me see if I have your points right.
* Windows actually sucks.
* Reinventing the wheel is not worth talking about no matter what.
* Bing sucks.
* I should admit the shortcomings of MS.
Regarding the last point, I'm kind of shocked that you think I'm a shill. I feel like I've been pretty honest about my feelings.
re: Windows sucks, that's sort of OT, but if you want to have the discussion, drop me a line. clemmer.alexander@gmail.com
re: Bing sucks, I don't see what your point about market share or unoriginality is, I already conceded that we have a lot of work to do with search relevance. I'm sort of annoyed about the negativity of your post, as from my perspective I've been pretty candid about what I think our strengths and weaknesses are. :(
re: reinventing the wheel, I don't think this is going to be something that we agree on. The way in which engineers here pull together and simply build what needs to be built is nothing short of breathtaking. I don't see how building, e.g., Cosmos shouldn't be considered an accomplishment. Shouldn't Amazon's Dynamo? What about Yahoo's Hadoop? What about anything in OSS, for that matter? I think you're not being fair here.
Not the OP - but I think you miss the relevance of point 5. Tech is littered with the irrelevant remnants and skeletons of "giant unstoppable forces". Look at Sun - in the 90s they were unbeatable, by 2004 they were quickly growing irrelevant. Oracle seems to be on the way out. Cisco too - they haven't done much exciting in a while, in a lot of circles they are viewed as the company holding networking back.
Other companies no one could unseat:
* DEC
* Xerox
* Apple (a couple of times)
* Corel
* Lotus
* IBM (a couple times in a couple fields)
* And on and on.
The point is that given a bit of time, Google will mess up, someone will come up with some new tech, and/or Google will implode under its own crushing weight.
I'm responding to the fact that OP claims MS is not a good engineering firm. I don't think that point is arguable. We are not just good, we are among the very best in the world in terms of engineering achievements.
Of course, whether we survive is another question entirely! I did not speculate on this, nor would I. Who knows what the future holds, we've just this year basically bet the company on some fairly risky things.
That said, I'm all for hating on large corporations, but the idea that Cisco and Oracle -- literally the market leaders in their respective domains -- are "on their way out" because they don't innovate fast enough is not a very convincing argument. :( Precisely who poses them an existential threat at this point? I see no one at all.
The difference between building an operating system and building a web scale search engine is that the former is mostly just work and there is a lot of material to learn from in order to inform that work. Sure, it requires talent, skill and perseverance, but there is a shitload of operating systems you can study. It also doesn't require tons of hardware, nuclear-reactor-scale power and enough bandwidth to rapidly copy, and keep updated, a significant portion of the web.
Sure, just making an OS doesn't mean it takes over the world. But guess what: you have the same problem with search engines.
This. When you do build a competitive search engine and you see how much money Microsoft has put behind theirs to essentially no avail (how many times have angry shareholders tried to get them to kill it off?) you start to realize a key fact.
If you want search engine competition you have to take Google's Ad business away.
It's weird, I know, but Google's search ads pay $80-100 RPMs while other networks' ads pay $30 RPMs. If Microsoft could use Google's ad network, it would be a solid contributor to Microsoft's bottom line (which is why they can't, of course).
If you could magically peel that Ad network/agency into its own entity and require it to give non-discriminatory terms to everyone, I believe we would have a pretty vibrant search space. My reasoning there is that the money associated with search advertising would fall into buckets that were much more closely aligned with market share, as opposed to today where someone like Microsoft can have a large share of the search 'eyeballs' but only a fraction of the revenue because their Ads don't have the RPM numbers.
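To make the arithmetic concrete -- a tiny sketch with made-up query volumes (only the $30 vs. $80-100 RPM figures come from the comment above): revenue scales linearly with RPM, so an engine with the same eyeball share but a third of the RPM earns a third of the revenue.

```python
def search_ad_revenue(queries, rpm):
    """RPM = revenue per 1,000 ad-monetized queries."""
    return queries / 1000 * rpm

# Hypothetical: two engines with the SAME traffic, different ad networks.
monthly_queries = 10_000_000_000                          # 10B/month, made up
google_like = search_ad_revenue(monthly_queries, 90)      # ~$80-100 RPM tier
other_network = search_ad_revenue(monthly_queries, 30)    # ~$30 RPM tier
# google_like == 900_000_000.0, other_network == 300_000_000.0
```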
Both Yahoo & Microsoft did a really bad job for a really long time at simply providing a platform that is comfortable for advertisers to use.
I have run ad campaigns on Adwords continuously for 7+ years. Throughout that time I have run ad campaigns on Yahoo, now Microsoft Adcenter, on and off. The last I checked their ad platform was about where Adwords was in 2005.
Yahoo had some very reprehensible things on their platform. I had to shut off all of my campaigns because someone at their company was changing what my ads said without my permission. Besides being a legal issue for Yahoo at the time, it put me in a position of unlimited liability. Fortunately, I never witnessed that behavior after Microsoft took over. Yet, Microsoft's platform was just too difficult to get working.
That was probably $5 million + in missed advertising revenue from me, just a tiny advertiser. I can only imagine the billions of dollars of revenue Microsoft and Yahoo lost for failing to take seriously the search advertising marketplace.
It is not clear that agency is the issue here. The search ad business is a long-tail business. People who read and comment on HN, who understand how search works, likely are heavy users of search yet contribute little revenue. But hackers are important to Google in other ways because they are the tech trend setters who turned Google into a verb and told their computer-illiterate brethren to just "Google" it. Now just imagine what some of those less literate, who believe in palm reading or tarot cards, would think when faced with a computer that appears to half read one's mind. I mean these are the real gold mines that the ads people are looking for. And once they get comfortable they are not just going to switch based on some technical merits they've never even heard of and the ad market will continue to pay more to those who can deliver more of the "gold mine" type users.
Honestly, the problem is that Microsoft built a search engine just like Google's, which meant it needed to reach feature parity -- and until parity is reached, consumers have no reason to use it. Even once parity is reached, consumers have no reason to switch, because the cost of switching does not outweigh the benefits. Therefore you have an unnecessary product that you have to pump a shitload of advertising into, which creates more cost and no revenue.
The correct way, I believe, is to compete on non-consumption and capture those users: start small, execute well, and scale up as users grow. You avoid the comparison to Google and can sneak in the back door without the crazy upfront costs. Imagine if Microsoft had focused on a knowledge graph before Google did? It would have made for very interesting times, but every Google competitor just competes head on. At the end of the day it's about getting users, and they don't care about the slight change in your algo, only about the way they feel when using your product, i.e. frustration, delight, anger, surprise, etc.!
The advantage of being a startup is precisely that your mission need not cohere with the investments of a large corporation. You can do what you like.
From MS's perspective (again: I work here, but my opinions my own blah blah) the problem is this. People use computers to access the Internet. MS can't just be the OS and the browser used to access the Internet to maintain its lofty position as a field leader -- if MS's job is to supply the Internet as a service to people on MS devices, then it is mission-critical that it also be the landing page of the Internet. If MS gives up Bing, it might as well give up all consumer investments IMHO.
It is (IMHO) more important that Bing exists and is mostly functional than it is that Bing is equal or better to Google in every way.
Of course it is a huge priority to make Bing a viable threat in its own right, but what I'm saying is that this is not the only consideration.
That's an amazing insight, thanks. I hope potential entrepreneurs take note of this and realize the problems corporations face with innovation. It also makes sense of why corporations are so quick to acquire a startup, but also why most times the acquisition is not successful.
I disagree that this is the main reason why MS is behind Google on search.
I think the main problem is that MS is copying Google search and not providing anything substantially different or better. In other words, fighting an incumbent is not about spending more money than the incumbent and doing the same thing. It is about providing service for "niches" where the incumbent is not willing or not able to provide it.
Here is an example:
Google shut down Google Code Search. I wasn't happy. But I was thinking... There is Bing... Maybe they will jump in and provide state-of-the-art code search. But... nothing happened.
Isn't the whole story of technology doing more with fewer humans?
Google could simply stay ahead of the curve, but this assumes all of that human capital is being dedicated specifically to building the most effective search engine.
What I see now is an advertising search engine. The friends who don't believe me are the ones who haven't turned Adblock Plus off in the past 3 years.
What about targeted search engines? Google has gotten progressively worse for technical searches as it has been improving generalized "what you mean" searches for everybody else.
Personally? I think there's lots of room for innovating in search. Probably not in that area, but in a lot of other areas. Consider, for example, that the majority of the world will have an Internet connection only via phone. Mobile search is an entirely different beast. It is not clear Google's solution is the best here, and the pattern of using Google on mobile is shaky enough that it could be supplanted by a convincingly better solution. That's to say nothing of the fact that Google's success in developing countries is not at all a given.
My biggest concern with Google is the phenomenal amount of data they track and record about users. They have no self-restraint when it comes to tracking the online behaviour of users. Now they can track across devices (phone, desktop, tablet) giving them unprecedented knowledge of your online activity.
Companies are collecting more data about online behaviour than ever before. And no-one has the same online reach as Google. From analytics, to apps, to fonts to jquery - there's barely a site that doesn't link in one form or another back to Google in some way. Google's digital fingerprints reach into every corner of the web.
I've said this before, but I was hoping that 2014 would be the year we become more privacy-conscious, but I don't actually think that will be the case. Google get an incredibly easy ride on the subject of privacy and online tracking from the tech community. They're probably salivating at the prospect of capturing even more precise user behaviour through an OS (Chrome) that potentially captures everything you do online. Google aren't capturing this data anonymously either. The tech community's response to this seems largely to be - so what? For anyone who cares about privacy, that's pretty depressing.
I didn't and still don't particularly care. The arguments to why this is bad always seem to be of the slippery slope / what if somebody does something untoward with the data variety.
The first one is an easily dismissed fallacy; the second is not limited to Google or any other company. I have yet to see a convincing argument that Google is misusing this data or doing anything bad with it.
On the other hand, a service that knows you intimately enough can provide some very cool things that are otherwise impossible. The cards on Google Now, for instance, rely completely on the search history on your account and the location data from your phone. I get up in the morning and my usual route to work is plotted out with an ETA. I search for a nearby restaurant on my computer and the directions appear on the phone complete with ratings. Things like that.
My philosophy is to deal with any abuses if/when they occur (and mitigate the foreseeable ones), instead of walling yourself off from the ever-more connected world. I'm starting to think there's a fundamental shift happening in what "privacy" is, why it's necessary, and what it means nowadays. And as usual, the choices are hop on, get out of the way, or get run over. For better or worse.
If I were Larry Page, I would put homomorphic encryption research at a much higher priority than quantum computers, or even researching robots (unless they intend to make most of their money from robots within a few short years, and not rely as much on search money, but that seems pretty unlikely in the short term).
If they plan on continuing to make money from tracking users, then they'd better figure out a way to do it very securely and without invading user privacy, otherwise they're going to feel an increasingly bigger pain in terms of public perception of Google over this, which could lead to them losing money in the long term, too.
I would also forget about tracking "everything possible" until then, and encrypt end-to-end stuff like chatting and video-calls (maybe they can do this one less costly through P2P or a hybrid system). I very much doubt they see a ton of money as a return from tracking and data mining people's chats online. And the downside is quite huge, because those online chats can be abused by the governments. So why not secure them properly? A little cost to them, huge privacy benefit to their users.
The goal should be to only track public, and not private information (at least in the short term, and then they should use homomorphic encryption for public information, too, as that will become increasingly more revealing, too, in the future).
I would also pay a lot of attention to what the Dark Mail Alliance is doing with e-mail encryption, and I would at the very least implement their protocol as an option for people who want to talk securely with others, from inside Gmail. They don't necessarily have to make all of Gmail encrypted by default, although that would obviously be very nice, but probably not very practical until they figure out homomorphic encryption.
There are also other things they could do to make e-mails a little more secure against abusive governments. In the US, ECPA allows the government to take e-mail older than 180 days without a warrant. So how about asking people to give Gmail a password that's stored locally and can automatically encrypt emails older than 179 days? If you need to access your own 6+ month old e-mails, then you're just asked to enter your password to access them. I don't see this as a huge issue for convenience, since the vast majority of 6+ month old e-mail is never accessed again by most people anyway.
The memories of a privacy-focused Google are certainly getting blurry, but I seem to remember there was a time when Google really cared about user privacy, and didn't have the same mentality as NSA for "collecting it all", storing it forever, and using it forever with data-mining.
Google has a lot of very smart people working for them. I'm sure they can come up with many more and much better solutions than even I proposed here. The problem is they have to want it. If it doesn't come as an objective from the top, then it's not going to happen.
From Wired, last year:
> Lloyd made his pitch, proposing a quantum version of Google’s search engine whereby users could make queries and receive results without Google knowing which questions were asked. The men were intrigued. But after conferring with their business manager the next day, Brin and Page informed Lloyd that his scheme went against their business plan. “They want to know everything about everybody who uses their products and services,” he joked.
Google needs to lose that attitude. Adapt or die, Google. And by adapt, I mean having their business incentives once again aligned with those of their users, or it's not going to end well for them.
Is PageRank really the indomitable tech of our generation? Nobody can do better algorithmically, or integrate some kind of crowd sourced feedback, or measure browsing time and habits, or simply hand tune some of the most competitive key phrases? I’m sure I’m oversimplifying, but I wonder if we haven’t all been hypnotised by the complexity, much of which is marketing hype...
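For what it's worth, the core PageRank iteration itself fits on a screen -- here's a toy power-iteration sketch on a three-page link graph (all names are mine, purely illustrative). The hard part of a real engine is everything around this: crawling, indexing, spam fighting, and the many other ranking signals.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over a dict {page: [outbound links]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start uniform
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}   # random-jump share
        for page, outs in links.items():
            if not outs:                          # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[page] / n
            else:                                 # split rank among out-links
                for q in outs:
                    new[q] += damping * rank[page] / len(outs)
        rank = new
    return rank

# Toy web: a links to b and c, b links to c, c links back to a.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
# "c" ends up highest: it is linked to by both "a" and "b".
```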
Actually, the tech is extremely complex, and PageRank is only a tiny part of that. Try building a search engine sometime. To hackers inspired by this article: here be dragons.
DuckDuckGo took the only sane approach and aggregated results from existing search engine apis, then gradually mixed in some secret sauce. Even then, is DDG viable competition for Google? Will it ever be?
All the same, I join you in wishing for some Innovators Dilemma-style disruption here. A "toy" service comes along one day that's a substitute for Google search, but only within a tiny niche. Google doesn't take it seriously enough until it's too late...and we have a real ballgame on our hands again.
Thought experiment: what could that disruptive niche be?
> Thought experiment: what could that disruptive niche be?
Someone needs to write/buy a search engine and then build a really rich API into the internals of it that lets 3rd parties write customized search engines. How about an auto parts search engine, or a search engine for spanish-speaking people living in southern california? What about a Christian search engine, or a meme search engine?
When someone can empower 3rd party developers to make the same kinds of decisions as Google does, but with different tradeoffs, and they really put the full institution behind supporting that, I think:
A) Google will have a hard time competing because they won't be able to give the personal attention to the needs of what is, effectively, a community of modmakers.
B) This new company will capture the long tail of search. Only a sliver of that is covered right now, with a scattering of niche search engines (Google Scholar, etc).
C) The number of users could be VERY large. It could be the cable to Google's broadcast television.
D) Getting started wouldn't require any massive technological achievements. Just find an underserved niche where even a really, really stupid search engine would work better than Google. Write it, figure out how to make money off of it, grow. Start with something that requires only a small index. Slowly expand into additional niches according to what will help keep the company and the tech moving forward.
Google's bet is that any information you can glean from someone coming to a niche search engine can be reasonably approximated with contextual information in the query. That's proved true for many queries, but the key is to find the queries where it's really not.
Essentially, you get the Yahoo results, and decorate, filter, reorganise, improve, combine as you wish. So you get the organic results, which you can then innovate upon.
Which means you have API access to the second most-used search engine in the industry. So what better way to voice your discontent with Google than by supporting a competitor?
Running a successful search engine is expensive, it needs continuous investment into R&D. That's why Yahoo took a step back and partnered with Bing instead. The level of investment needed just to hold status quo with the existing market runs into billions of dollars a year, something Yahoo baulked at. Microsoft, however, were still strongly inclined to invest that every year.
Yahoo screwed up so badly here. Read the research papers coming out of labs.yahoo.com back then and you realize they were onto the knowledge graph before Google and actually had a head start, but they shattered their research division and lost a lot of those search people to Microsoft and Google. They just didn't have the faith to create their own path, and also lost their identity by deciding to become an entertainment company.
> Even then, is DDG viable competition for Google?
I'd say "yes". I've switched to DDG as my primary search engine, and I'd say they return a good result about 80-90% of the time. They're definitely not as good as Google, but you can always add "!g" to redirect to Google results.
I'd say their best selling point is their focus on privacy, but oddly they don't seem to be touting it very loudly or trying to make hay from the recent NSA revelations.
When watching talks like http://www.youtube.com/watch?v=vShMxxqtDDs I'd be surprised if Google's search model couldn't be leap-frogged if a competitor were to have live access to Google's dataset.
That if is actually Google's biggest strategic advantage though: the resources to keep a near-live and nicely deduplicated copy of all relevant data on the internet. [PS1]
That's why by far the biggest strategic threat to Google is actually JavaScript UIs + RESTful APIs and the easily accessible data sources they form.
Google's demise will eventually come in the form of the adoption of protocols which allow you to efficiently maintain a live view of a service's public resources.
[PS1] And as per https://news.ycombinator.com/item?id=7011816 click-throughs of course, although I don't see a way to side-step that with Google having implemented [not provided]
> is DDG viable competition for Google? Will it ever be?
For search? Absolutely, it is today.
However, Google figured out a long time ago that they could lock people in by offering a hundred other services too. DDG has no maps, no mail, no image search, no video search, no drive, no docs, no news, no book search, no academic paper search, no patent search, no stock graphs, no language translation, and so on. For some of those, it can link off to other services, none of which are viable competitors to the Google equivalents.
What do you mean by "locking people in"? I don't have a Gmail or YouTube account, or Picasa or any of that crap; I just use their search because it is orders of magnitude more relevant than DDG for my queries. And I consider Maps, Scholar, and Patents to be part of the search.
An irritation for me with DDG is them formatting result URLs incorrectly, and then ignoring any feedback about it. Two examples are them adding spurious www. prefixes to Google code results, and leaving out slashes when creating Apple developer URLs.
DDG could distinguish themselves by playing bazaar to Google's cathedral, but they don't appear to. A search engine that uses crowd sourcing and feedback would be disruptive IMO.
The examples: https://duckduckgo.com/?q=apsw - note the second link is on Google hosting and shows code.google.com/p/apsw/, but clicking on it gives "page not found" because it goes to WWW.code.google.com/p/apsw/. https://duckduckgo.com/?q=NSString - note the infobox at top, which goes to Apple developer documentation but clicking gives an error. The link should have a slash between Reference and NSString at the end.
Crowd sourcing would be really tricky. It's inherently easier to manipulate: the more traction you get, the more resources will be spent on gaming the crowd-sourcing system. Look at what those "reputation management" firms are doing these days with sophisticated blackhat SEO and Wikipedia astroturfing rings, for example.
Some kind of bitcoin-esque system where you bought sponsored positions by doing search-engine-scoring-related number crunching could be interesting, though. Since, if you ever matter, you'll have people devoting resources to gaming your search engine results, you need some system to deal with that. Of course it's sooo "out of the box" to suggest a bitcoin-inspired solution to things right now ...
I think vertical search engines are part of the answer too. For general search engines Google have such an advantage in data and R&D that only Microsoft can afford to compete.
On the data side I think http://commoncrawl.org/ can help with creating vertical search engines. Their crawl is much smaller than Google's or Bing's, but it is web scale (2 billion pages of 2013 data). Data recency is still a problem, but it can help with finding which sites belong to a niche. Some smaller-scale crawling of those sites would then be a much more achievable task.
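One way to bootstrap a vertical engine from a crawl like that is to filter its URL index for niche keywords and collect the matching domains for deeper crawling. A toy sketch of that idea (the records and field names below mimic the JSON-lines shape of Common Crawl's URL index, but the sample data is made up):

```python
import json
from urllib.parse import urlparse

# Hypothetical sample records, JSON-lines style, trimmed to the fields used here.
SAMPLE_INDEX = """\
{"url": "http://bikeblog.example.org/cyclocross-tips", "mime": "text/html"}
{"url": "http://news.example.com/politics/story", "mime": "text/html"}
{"url": "http://velodrome.example.net/track-racing", "mime": "text/html"}
"""

def niche_sites(index_lines, keywords):
    """Collect domains whose crawled URLs mention any niche keyword."""
    domains = set()
    for line in index_lines.splitlines():
        record = json.loads(line)
        url = record["url"]
        if any(k in url.lower() for k in keywords):
            domains.add(urlparse(url).netloc)
    return domains

sites = niche_sites(SAMPLE_INDEX, ["cyclocross", "track-racing"])
```

The resulting domain set is the seed list; a small focused crawler over just those sites is a far cheaper job than a web-scale crawl.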
> Thought experiment: what could that disruptive niche be?
Internal search engines could get a lot better. I don't know how people who make websites build them, what tech they use, etc., or whether it's mostly incompetence that causes this, but the fact that I'd rather use plain Google than an internal search most of the time tells me that Google is just too good.
The complaint seems to boil down to how Google deals with spammers. The question is, how is competition going to make it better? If we instead had five search engines that each had 20% market share, would people stop trying to game them? Would false positives in efforts to thwart spammers be eliminated? How?
The problem isn't even one that responds to market forces -- the "victims" (sites that should rank highly in organic results) are no one's customers. If you want to flip the script, they're the product. They have no leverage, and having more search engines leaves them with still no leverage. Only the users of the search engine have leverage because they can switch to another search engine -- but they can do that today. The problem is the alternatives are no better.
What we need is a cost-effective accurate way to identify spammers and exclude them from high rankings in search results. Someone who could do that more effectively could challenge Google, because there is a market for spam-free search results. But if that was an easy problem to solve then why hasn't Google solved it? They have the right incentives. It's just not easy, because spammers adapt. Whenever a search engine does something to thwart spammers, the spammers do something different.
Having more search engines doesn't make it easier; it makes it harder, because each search engine has fewer resources to dedicate to it and they have to duplicate each other's work. And the cost to sites of legitimate optimization for a larger number of search engines increases, which creates an even larger advantage for major institutions over small-timers who can't afford the higher cost.
It's silly to say "why doesn't Google do X" and list some abstract thing you think would solve it which they've already considered and declined to do. There is probably a reason. Maybe manually curating every website is too expensive. Maybe arbitration proceedings would be overrun with spammers trying to challenge legitimate removals of their spam. And if they're wrong, don't speculate about it on a blog, prove it by building a better search engine. So far no one has been able to do it.
You falsely assume a new search engine can only be "as good as Google". Google's search has become highly commercialized over the years: it used to be about finding the "best" answer to your search query (the "original" PageRank), but it has increasingly become about the "best consumerist" answer to your query (where to buy something). This makes it more susceptible to spammers, as their basic incentives are similar. New, different search engines might try another approach, perhaps less personalized, less commercialized, and less mass-media oriented, and be less of a "petri dish" for spammers.
Most of his complaints are reasonable (some aren't); what keeps them from turning into actual damage is only that Google is not evil enough. But if you distance yourself from the current context, you'll see that the means Google uses to deal with spammers are dangerous, and we are betting the Internet on Google not being evil (or on somebody replacing them fast if they do become evil). Things would be better if we had five search engines with 20% market share each.
But then, I completely agree that just complaining about the issue is useless. Anybody who (thinks they) can build a competitor to Google will try it or not based on their odds of success, tolerance for risk, etc., not because somebody is complaining.
I don't think anything can compete with Google when it comes to search in English. I almost feel like Google can read my mind. With the adequate query you can find anything you've ever read on the internet, if it's still online. Yesterday night I wanted to re-read a blog post about a guy who made a few bucks with QNX, but I didn't remember much. I tried a few queries and got the expected blog post with this:
I feel the same. I wanted to find a video about a zombie prank in a New York cafe and did a Google search for it. The first result was the actual video that I needed.[1]
I would say that Bing already is a viable alternative. Having used it for several months now, there aren't too many queries that it doesn't handle well, and usually when I check Google's results for those queries, they are equally bad.
Bing has finally gotten to the point where I don't fallback to Google queries anymore (I used to end up doing that about once a day in the past). I just use Bing now, and only use Google when I hear about some interesting Google art on their homepage.
It's certainly true that Google's search product has been getting worse and worse for years. You have to jump through hoops to get the real ("verbatim") search results, rather than "what we thought you might mean".
Personally, I will try grady's suggestion because I need to use several Google products for work. I know Microsoft is just as bad as Google, but at least it's not Google, so they'll each only have partial data on me.
I wish more people would use and/or improve YaCy[0]. It's a decentralized search engine. The client is written in Java. With the power of decentralisation, hardware shouldn't really be an issue. It's just that the search results somewhat sucked the last time I tried it.
But I guess now is a good time to install it again, help scraping the internet, and maybe hack at the code.
I love the idea of this stuff but I'm so scared things like this install a backdoor to our PCs. Is there any way to mitigate this? Run YaCy in a VM on VirtualBox?
>1. Google is making arbitrary rules on how sites should behave, because they have a monopoly.
How are rules against paid-linking scams or procedurally generated content farms considered arbitrary? It's clearly trying to game the system, and the rules explicitly tell you NOT to do it.
>2. Google needs these rules, because Google’s rankings are apparently trivial to game.
If this were true, you could make millions executing your plan on any number of websites. It's not.
> How are rules against paid-linking scams or procedurally generated content farms considered arbitrary? It's clearly trying to game the system, and the rules explicitly tell you NOT to do it.
The whole case of the delisting/penalization and subsequent (and extremely speedy) re-listing of RapGenius is a great example of Google's current arbitrary practices. I covered this in some detail on another thread[0] in a couple of replies[1][2][3][4].
The rule they broke is clearly laid out [1]. The practice of manual penalties is as well [2]. They outline a course of action on how to remove the offending links [3]. How is this arbitrary?
One man's advertising is another man's paid linking scam. Do you know where to draw the line? Google doesn't - or at least they can't create an algorithm that knows the difference, even after significant investment in the problem.
The 'system' they are trying to game is Google's system - tuned to maximize the profitability of Google's ad business. It's not some benevolent public good that is being 'gamed' here.
Perfectly valid expectation with near impossible odds of coming true.
Google has gotten so rich, entrenched and popular that IMHO no competitor can dislodge it. I say this as a thoroughly-disappointed user who's tried nearly all the alternatives to Google in the various segments that it operates in. I've managed to stop using nearly all Google services except Android (running CM) and Search.
As others have pointed out in the thread, DDG is nowhere near as good, especially if you're not in the US (I'm in India). After having forced myself to use DDG for a month, I've now resigned myself to Google searches with a couple of extra steps:
- all searches performed while logged out of all Google services
- browser plugins to rewrite all Google tracking URLs from search results
So sure, we need viable search engine competition. But don't wait up for it either.
"Google has gotten so rich, entrenched and popular that IMHO no competitor can dislodge it."
Isn't this the definition of a monopoly? And if so, isn't that reason enough to consider search as a public good or a publicly regulated means of accessing information?
Judging from the various anti-trust cases that have been brought on against Google around the world it's clear that proving that is nearly impossible too. More so because Google operates in a sector (Internet software) that is theoretically open to infinite competition and zero switching costs.
There's a whole issue of what the definition of 'using google' is to people. Even if I never search with Google, I'm still often using google stuff - or, more precisely, Google services are using/tracking me. And there's no obvious way to turn it off or opt out of it.
If you think Google is impervious to competition, maybe you're thinking about the challenge in the wrong way. If someone beats Google it won't by being exactly the same as Google but better, but by solving what Google solves in a better way. A relevant Quora answer: http://www.quora.com/Search-Engines/What-should-a-new-search...
Part of the problem is that Google's PageRank algorithm is patented and the patent won't expire until 2017.
Google insists that its current search mix is based on a lot more than just PageRank, but it seems that PageRank probably contributes the foundation of their business. I don't see a way of competing with Google's results unless we are allowed to use something like PageRank, which we can't do unless we pay royalties or wait until it expires.
I'm not saying that PageRank is the be-all, end-all of search algorithms, and certainly someone somewhere could come up with a different ranking method superior to Google's. But ranking pages by how many other pages cite them seems like a pretty fundamental insight, and it's where I would start with any new search engine.
PageRank is almost certainly only a signal used in a learned model. An important signal, but probably one feature in hundreds, if not thousands. It was a critical algorithm to helping them overcome Yahoo in the 90's, but I doubt it is as essential these days.
What is the most important signal? Click-throughs. This is why any new search engine is at a massive disadvantage.
As you said the PageRank is just one of the many features used by the ranking algorithm. I highly doubt that there is any patent issue here (it's "just" an eigenvector computation), and there is a ton of literature and practical evidence that you can build very good web-search ranking technologies without PageRank.
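That eigenvector computation can indeed be sketched in a few lines. A minimal power-iteration version of PageRank (the toy link graph and the 0.85 damping factor here are illustrative, not Google's actual parameters):

```python
# Minimal PageRank via power iteration on a toy link graph.
# links[page] = list of pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    # Start with a uniform distribution over all pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the "random surfer" jump...
        new = {p: (1.0 - damping) / n for p in pages}
        # ...plus a damped share of the rank of every page linking to it.
        for p, outs in links.items():
            share = damping * rank[p] / len(outs)
            for q in outs:
                new[q] += share
        rank = new
    return rank

ranks = pagerank(links)
# "c" collects links from a, b, and d, so it ends up ranked highest.
best = max(ranks, key=ranks.get)
```

The ranks form a probability distribution (they sum to 1), which is why a pure static ranking on its own says nothing about relevance to a query; it only says which pages a random surfer lands on most often.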
It would seem very surprising to me that one can patent a feature used in an algorithm for commercial purpose...
a) PageRank is just one of several hundred factors used for determining ranking. Search engine ranking is a lot more complicated than even most information-retrieval programmers tend to think.
b) There may have been a window of a few weeks or months in 99/00 where not every major search engine used some form of link-based ranking, but it was, as noted, a very brief period.
It always amazes me that people for almost 15 years now have believed in the myth of PageRank's uniqueness and power. I would like to challenge people to think about two things:
- how useful do you think a pure static ranking of tens of billions of web pages really is? Think about what it represents. What does it mean to assign one page a higher rank than another?
- do you really believe that other search engines would not have implemented PageRank or something similar? Do you really think that search engine designers do not read papers and apply every trick in the book that they can manage to implement in a scalable manner?
There are lots of hard problems you need to solve if you want to build a web scale search engine. If ranking becomes your biggest problem: that would be a luxury. The biggest hurdle today is money to buy computing power and storage. The time when you could build a competitive web scale search engine for regular startup-money is over. It has been over for close to a decade.
It saddened me greatly when Yahoo threw in the towel because it meant that search in the western world was effectively a two-horse race. And once you get off that horse there is no getting back on it again without some seriously heavy lifting.
It's not really a static ranking, though, is it? The PageRank is constantly recalculated when incoming links change.
I don't know what other search engines have implemented because none of them will let me see their code :-) I do believe if they could implement a link-based ranking system without fear of being sued by Stanford they would. I don't know how different a link-based ranking system would have to be from PageRank to avoid getting sued by Stanford, or whether Stanford litigates this when they suspect unlicensed use. I'm guessing they do sue to defend the patent, because Google's royalties for using the algorithm number in the hundreds of millions, a significant amount towards Stanford's endowment.
The scientific literature shows that PageRank is not as good as it is cracked up to be, particularly with the TREC style of evaluation.
The issue is that PageRank is a general factor which doesn't have much to do with the question of ("is page A relevant for topic B?") If PageRank causes a popular but irrelevant page to rank above an unpopular but relevant page it is part of the problem, not the solution.
I didn't realise this. Is this really enforceable? One of the tools that Moz sells is their PR proxy, opensiteexplorer. I wonder how they're able to replicate PR without running afoul of this?
Also, is PR really the best, only way to do this? Seems like there are all kinds of better (more modern) signals we could use other than links, which were kind of the only game in town 10ish years ago.
Moz uses the original PageRank research paper, published before Google incorporated. It is the original seed formula, which Google later modified and now claims it is hardly using.
Google pays Mozilla hundreds of millions a year to keep its default status in the search bar. Perhaps being able to use PR was part of the deal, or perhaps Mozilla is rich enough to pay royalties to Stanford. Or perhaps Google doesn't mind because Mozilla doesn't seek to directly compete with Google on text search.
This is one of PG's frighteningly ambitious problems and I believe it's an important (and really hard) problem. I think the answer is to focus on an important niche and grow from there. DDG has done a good job by focusing on privacy; they're currently at nearly 4m queries per day.
I'm starting by building the search engine I want to use while writing code, which does parallel searches of different parts of the web dynamically based on the query.
The internet is now littered with badly written content linking to more badly written content, all in an attempt to increase inbound links and ultimately rank higher in Google. And since Google can't tell the difference between badly written and well written content, companies continue to pump this crap out. A lot of problems might be solved with some healthy competition, but I don't think this particular one would be. It would require a new way of ranking that severely devalued inbound links, and that seems like a really, really hard nut to crack.
I agree with the article. I wrote last week (http://markwatson.com/blog/2013-12/practical-internet-securi...) about using two browsers. Chrome with default security settings: used only for gmail, google search, facebook, and twitter; Firefox with ghostery: all normal web browsing, search using duckduckgo and bing.
My setup encourages me to at least use duckduckgo and/or bing when I am using firefox.
FYI: I would keep tabs on Ghostery, there's a for-profit company behind it that works with Ad Networks - They don't do anything bad now but experience shows that a massive user-base that you make zero revenue off can be a tempting fruit when profits start falling.
That's exactly what my startup is trying to address - how do you help users navigate 100, 1000 or even a million search engines? Ultimately, you need a way to browse search engines by topic and bundle them into custom lists for your specific needs. It's early going, but check Nuggety out if you're interested: http://nuggety.com/
There is no need to navigate millions of search engines. There aren't that many to speak of.
Aggregating services succeed only where there are millions of roughly equal service providers (like hotels or restaurants) and one needs a one-stop place (an aggregator) to search them.
With Google dominating (and Bing and others far secondary), the aggregator model couldn't really do much.
Sure. I know the site needs some more clarity and that's what I'm working on now. The basic idea is a Pinterest-type community where you can build a list of sites on any topic and share lists with others. But each list lets you search each website directly from the same search box (results come up in a new tab, like Kayak).
So what is it good for? Any subject where you want depth or breadth over surface results. So collectors and researchers can build a portal for themselves. And there are some interesting possibilities for specific needs, like a search list for many places to find a cached URL: http://nuggety.com/u/nuggety/cached-webpages or a search list to search any of Google's 190 international websites: http://nuggety.com/u/nuggety/international-google
Last year I wrote this: https://medium.com/surveillance-state/32ba2b38c219 - and got a proper dollop of hate from certain segments of the webmaster community. They claimed it was 'sour grapes' - but I have 140k visitors a day and am doing great, better than great. The monopoly is bad for all of us, even though I'm benefiting as things stand.
"Besides emphasizing Office, Elop would be prepared to sell or shut down major businesses to sharpen the company’s focus, the people said. He would consider ending Microsoft’s costly effort to take on Google with its Bing search engine, and would also consider selling healthy businesses such as the Xbox game console if he determined they weren’t critical to the company’s strategy, the people said."
Yes, search is hard. Look how most websites that build a search for their own site still do a worse job than Google's site search. And it's not like people haven't tried to take on Google. Google always faces the threat of companies succeeding in smaller areas of search, but so far it's dealt with those pretty well too. The other threat it faces is a (big) company developing an alternative approach, e.g. IBM Watson.
Here's an idea... how about instead of using just one search engine website, browsers have a feature that lets you search for something and query 3 different search engines at the same time? Users could configure which 3 search engines they want to use. That way, website owners won't be so affected when one search engine screws them over.
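The merging half of that idea is straightforward to sketch. Here the "engines" are stubs standing in for real HTTP queries (the engine names and result lists are made up); the browser would fan the query out in parallel and merge the ranked lists, e.g. by summed reciprocal rank:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub "engines": in a real browser feature these would be HTTP queries;
# here each just returns a ranked list of result URLs for the query.
def engine_a(query):
    return ["http://example.com/1", "http://example.com/2"]

def engine_b(query):
    return ["http://example.com/1", "http://example.com/3"]

def engine_c(query):
    return ["http://example.com/2", "http://example.com/1"]

ENGINES = [engine_a, engine_b, engine_c]

def metasearch(query, engines=ENGINES):
    """Query all engines in parallel, then merge by summed reciprocal rank."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda e: e(query), engines))
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results):
            # A result at position 0 contributes 1.0, position 1 contributes 0.5, etc.
            scores[url] = scores.get(url, 0.0) + 1.0 / (rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = metasearch("test query")
```

A URL that several engines rank highly floats to the top, which is exactly the property that makes a single engine's delisting decision matter less.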
or: We need an inclusionist wikipedia - where every possible search term starts with an empty page/any possible search term would have an entry, even if it's just for redirecting a spelling mistake. Eventually every search term would grow its own little community of experts. Unfortunately, as pg said, "Deletionists rule Wikipedia."
I am sorry to say this, but I have been trying to move to DDG for a while now. It's currently my default search wherever I can make it the default, and frankly, for me, it's pretty much useless. Most results point to US sites, and in most of my search cases, that's no good. Unless my search is incredibly simple, I end up back on google.co.uk. OK, ethics-wise, I don't like Google one little bit, but in the end, Google gives me useful results. DDG rarely does.
I simply do not know how people, and there are a lot of them, come to the conclusion that DDG results are in any way better than Google's, apart from the fact that DDG is not Google. It would not totally surprise me if people were making do with DDG because it's not Google, to make a point, and as a result being over-generous about its utility.
Maybe for Americans DDG does give better results, I can't say, but I'd like to know what criteria that is based on. For me, a non-American, like I say, DDG is regrettably mostly useless.
As a lot of people have said, you cannot beat Google at search. But there is a way: push information that users are looking for before they even think about it (similar to Google Now). Search is still a sophisticated operation for regular users. By showing the right information at the right time, users don't even have to search.
Funny. I use DDG as my default. Frankly, I think its results are way better than Google's. Yes, they are more simple-minded and less "intelligent", but they are much more predictable as well. As such, it took some getting used to, but now I feel like I search more precisely than I ever did with Google.
Way too often, Google will try to be "smart", and search for some "intelligent" interpretation of my search, where no such thing was called for. Especially for technical, precise searches for "strange" strings, this can get really annoying.
Also, those bang-searches are just genius. I regularly search Wikipedia, programming language docs, or maps using those. And yes, sometimes Google as well, mostly for fuzzy "how do I" searches or searches that I don't know precise keywords for.
> "Way too often, Google will try to be "smart", and search for some "intelligent" interpretation of my search, where no such thing was called for. Especially for technical, precise searches for "strange" strings, this can get really annoying."
Wow, so it's not just me then - I was describing my exact same feeling to someone (perhaps less eloquently than you!) just three days ago!
Even when I go and set it to 'Verbatim' search (which is well hidden), it still often gives me useless results for these kinds of technical queries...
We need an open-source, or decentralised (i wish) solution. That is as close as you can get to a pipe-dream at the moment. Doesn't mean we won't get there eventually though.
DDG is good enough most of the time: What's the formula for the volume of a tetrahedron? What's a "Cleveland Steamer"? When it's not, there's a Google link right at the bottom of the page, so you can turn on private browsing and click on that.
I use duck duck go as my default in my search bar, but I rarely actually use it to search the web directly. Instead, I use the !bang features to get where I need to go faster without taking my hands off the keyboard. Some examples:
ctrl-t !w cyclocross
ctrl-t !g site:ohio.gov filetype:pdf economic development