Dorking: the use of search engines to find very specific data (alec.fyi)
560 points by abarrettwilsdon on Aug 9, 2020 | 195 comments



A few corrections:

The + (formerly used to force a term to be present in the result) and ~ (also find synonyms) operators have been deprecated.

Google now advises to wrap the word in quotes instead of using the +. Google will also automatically look for synonyms without the use of ~.

I have seen 'AROUND(n)' mentioned in many other places working as a proximity operator in Google, but I don't believe that is true and haven't found it to work in any logical way.

Also the use of parentheses to nest queries is not necessary in Google. It is actually required for Bing on complicated queries though.


Worth mentioning that even if you put a term in double quotes, Google still tries to be too clever - you are not guaranteed to get results that contain your quoted search term :/


This must be a recent change? It's been driving me nuts lately. I have to resort to adding a lot of negated search terms to compensate but it's still suboptimal.


No, this has been the case for a long time, years anyway. I don't know if it goes back quite as far as when they removed the '+' operator tho.

But bejesus, this drives me nuts! If I know the double quotes function even exists, then Google should know I actually want to use it as intended - it shouldn't decide "yeah, but maybe you'd like these irrelevant results too!"


I think it’s primarily for people copying and pasting something like an error message, which may have user-specific data in quotes. That should be what “not many results for...” is for, but Google is always trying to optimize away those clicks. Maybe they could add a “programmatic” search feature with documented syntax for power users.


A moderate-length string of quoted words (>5?) used to return zero or near zero results and a suggestion to try without quotes- imo ideal


Power users aren't where the money is.


Maybe they should go back to +


Add? They used to have one and actively removed it.


I am always surprised that these systems are not more friendly to engineers, since they were built by them. You'd think for their own sakes they'd stick in a system like xkcd.com/806/


Use 'verbatim'. It's a drop down option, and it isn't as good as the old + or other operators (deprecated because of Google Plus). For example, with verbatim you can't only + one single thing in a search.

Regardless, I basically just always search with verbatim on. Google is mostly useless otherwise.


Thank you. I quickly changed my default Chrome search to https://www.google.com/search?q=%s&tbs=li:1 instead of regular Google
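For anyone scripting this rather than setting it in the browser, here is a minimal sketch that builds the same URL. The `tbs=li:1` parameter is taken from the comment above; the function name `verbatim_search_url` is just an illustrative choice, and nothing here is a documented Google API.

```python
from urllib.parse import urlencode

def verbatim_search_url(query: str) -> str:
    """Build a Google search URL that carries the tbs=li:1 (verbatim) parameter."""
    params = {"q": query, "tbs": "li:1"}
    # safe=":" keeps the colon in "li:1" readable instead of encoding it as %3A.
    return "https://www.google.com/search?" + urlencode(params, safe=":")

url = verbatim_search_url('fish "chips"')
print(url)
```

Spaces and quotes in the query are percent-encoded by `urlencode`, which is roughly what the browser does when it substitutes `%s`.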


It seems to me that after a few negated search terms are included, they are taken less strictly; "minus" seems to mean "probably minus".


And even if that exact term is present on popular websites, like Stack Overflow, Google still seems to have trouble finding those exact results regularly.


As a workaround and under search tools one can enable the “verbatim” option.


I was under the impression "verbatim" is to disable filter-bubble personalisation.

Normal queries are tailored to your personal filter bubble. You can't see what other people see from same search, and if you're doing SEO or just trying to find who tends to come top in results for something you have a lot of history looking at, you can't tell who comes top for other people.


Verbatim doesn't work consistently: often it helps a bit, often it gets ignored it seems.


AFAICT, the verbatim option gives the same results as if I'd quoted my search term?


In my experience it depends on the number of results, and the results are more accurate with verbatim.


It’s like being able to touch your nose in the dark, or even having been blind since birth.

Verbatim searches are for those easy tasks when you know literally, exactly, what you’re looking for.

Something tells me that a lot of folks on HN are being let down by Google in this area. I don’t mean it in a bad way, I want to identify these issues to help everyone. I wish the UI of search wasn’t so textually-bound.

Is there such a thing as an augmented reality search engine? You search like you do at the library, or when you lost your keys. But for data and virtual objects, they lose out - their evanescence creates a perception gap. There is no correspondence with physical objects, etc. We are getting better with haptics, however.

Rainbows End[0] is a future society I’d choose to live in, especially right about now.

Anyone have any good book recommendations?

[0] Vernor Vinge

https://en.wikipedia.org/wiki/Rainbows_End


I think you use 'intext:"search term"' now don't you? Unless that has changed now too


Yes, just put a + in front of the double-quotes and you get what you want.


Google has been gradually becoming useless for anything but the most basic/popular terms, over the last few years.

Just now I had to give up trying to look up "the term for fans that are paid actors" and variations.

Asking on Reddit or Stack Overflow would be faster than Google's search engine for some things now.


The plus operator in the page appears to be binary rather than unary. I've never used it. Is that affected as well? (Though I'm confused why AND is necessary. Isn't it implied normally?)


That is correct. AND is added by default and is never necessary in Google.

It's a little confusing because of how Google implemented some of the operators. The boolean + operator in many cases is used in the same way as AND, but Google originally used it to let users force a specific word to be present in a search result.

So a search for Fish +Chips was a search for both words, but 'Chips' MUST be present. The equivalent search today is Fish "Chips". It's a little annoying because it requires typing another character, and it is still not always respected.
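If you have old habits (or old saved queries) that use the unary +, the rewrite described above can be sketched in a few lines. This is only an illustration: `plus_to_quotes` is a made-up helper, it handles single-word terms only, and it deliberately leaves a + inside a word (as in C++) alone.

```python
import re

def plus_to_quotes(query: str) -> str:
    """Rewrite legacy +term tokens as "term"; other tokens are left untouched."""
    # Only match a + that starts a whitespace-delimited token, so "C++" is unaffected.
    return re.sub(r'(?<!\S)\+(\S+)', r'"\1"', query)

print(plus_to_quotes("Fish +Chips"))
print(plus_to_quotes("C++ +templates"))
```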


I think the default is more like AND/OR. If AND were the default, and I put a list of five words in without any 'AND' or 'OR', then I'd expect to get only pages that have all 5 of those words. Now of course google looks for synonyms, unless you put quotes around each search term (which ostensibly blocks synonyms, although as you and others mention, this doesn't always work). But afaict, returned pages don't even necessarily have synonyms for all five search terms. That's especially true for words that don't have much semantic content, but it seems to be true even for very specific words. And that's more like OR.
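To make the AND versus OR distinction concrete, here is a toy sketch over three invented page texts. It says nothing about how Google actually ranks or matches; `match_all`, `match_any`, and the sample pages are assumptions made up for illustration.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words in a page."""
    return set(text.lower().split())

def match_all(terms, text):
    """Strict AND: every term must appear in the page."""
    words = tokens(text)
    return all(term.lower() in words for term in terms)

def match_any(terms, text):
    """OR: at least one term must appear in the page."""
    words = tokens(text)
    return any(term.lower() in words for term in terms)

pages = ["fish and chips shop", "fish market", "potato chips"]
query = ["fish", "chips"]

and_hits = [p for p in pages if match_all(query, p)]
or_hits = [p for p in pages if match_any(query, p)]
print(and_hits)
print(or_hits)
```

Under strict AND only the first page qualifies; under OR all three do, which is closer to the behaviour the comment describes.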


Updated the article to reflect and credited you for the contribution!


When you say "deprecated", you mean as in "discontinued" right? Not just like, discouraged?


Whoever the first developer was that used "deprecated" got it kind of wrong, the word should have been "depreciated".

Deprecate: "express disapproval of."

Depreciate: "diminish in value over a period of time."

I kind of cringe when other developers say "deprecated".

Edit: Versioning and not removing APIs is kind of the way to go, so you don't break client apps that possibly can't be updated easily or at all. "Depreciated" is a far better word to use with a far better outcome. AWS versions their APIs, they don't remove old ones. "I disapprove of using this API and we're taking it away at some random date" vs "this isn't the latest API, use the current one for new development" seems like a pretty stark difference in thinking to me. YMMV.


Nope.

It is deprecated -- its use is disapproved of, you should stop using it. In the future it will go away but for now it works, so you can use it, but its use is discouraged.

Depreciated doesn't make any sense -- the value of the deprecated API does not diminish over time. It works, until it stops working. It's on or off. It doesn't work less and less every month or anything. It currently still works completely, but is deprecated -- that is, discouraged. At some point in the future, it will stop working, completely.

The rest of us don't just kind of but REALLY cringe when people say "depreciate" when they mean "deprecate". They are different words, "deprecated" is the right one, it is intentional, it is the word.

Sorry, you are the one using the wrong word.


Yeah, nope yourself. It seems like a lot of people aren't really thinking this through very much.

And that is absolutely the wrong way to approach API development. An API that is being sun-setted should never be removed, because older clients could still use it but sometimes can't be upgraded to newer clients. Removing a v1 API breaks those clients and it's a shitty thing to do to users. Yeah, people should be building NEW things with it, but there's no reason to look at the v1 API with "disgust" as "deprecated" implies - It's simply an older version that should remain functional, if your system is worth half a shit. AWS doesn't terminate older API versions, they just create new versions. Or you can be like Facebook and "deprecate" stuff and just shut it down before your official shutdown date, or not give any notice at all - that's REALLY a fun culture to work in, I guess, for them. "deprecated" is a really negative word, and doesn't even really translate to anything good in terms of software development. It's my opinion that "depreciated" is a far better word and far better outcome when used in software development instead of "deprecated". YMMV.


OK, I understand you have an opinion that API design should be done in a certain way (by the way, by "API" I meant like method signatures, not network API, but it could be either).

And I understand you disapprove of the word "deprecated" being used to refer to API that is discouraged, usually because it will be no longer supported/going away in the future.

But that doesn't change the history of the word. The word "deprecated" is what engineers have been using, intentionally, for several decades.

"Depreciated" is a mistaken variation. Even if you think "deprecated" has unfortunate connotations, it still doesn't make "depreciated" right. "Depreciated", as you said, means losing value over time. That is, 10% a year or something. Deprecated API does not "lose value over time".

The word "deprecated" has historically been used to mean that certain API (again, likely a method or function, I don't mean network api specifically) is now discouraged, its use is disapproved of. Usually because it will be going away in the future. Arguments about whether this is the right way to do API change are entirely separate to this historical and current usage, where API change often IS done this way, and it's what the word is used for.

You can have opinions of how you'd like people to handle API change over time, but that doesn't change the fact that "deprecated" is the word engineers have meant to use for decades. If you'd like to advocate for a different word and/or different practice you can -- but all "depreciated" has going for it is it sounds confusingly similar to "deprecated", it is not the word you are looking for.

> Not to be confused with Depreciation.

> In several fields, deprecation is the discouragement of use of some terminology, feature, design, or practice, typically because it has been superseded or is no longer considered efficient or safe, without completely removing it or prohibiting its use.

> It can also imply that a feature, design, or practice will be removed or discontinued entirely in the future

https://en.wikipedia.org/wiki/Deprecation

> In accountancy, depreciation refers to two aspects of the same concept: first, the actual decrease of fair value of an asset, such as the decrease in value of factory equipment each year as it is used and wears, and second, the allocation in accounting statements of the original cost of the assets to periods in which the assets are used (depreciation with the matching principle)

https://en.wikipedia.org/wiki/Depreciation

> In economics, depreciation is the gradual decrease in the economic value of the capital stock of a firm, nation or other entity, either through physical depreciation, obsolescence or changes in the demand for the services of the capital in question.

https://en.wikipedia.org/wiki/Depreciation_(economics)

Depreciation has nothing to do with what we're talking about, it's not the right word. Deprecation is the word that has been used for decades for API whose use is discouraged, often because it will not be supported in the future. You can argue that a new term is needed, but that's your argument not a historical usage, and there's no reason you need to limit yourselves to words that sound confusingly similar to "deprecation".


You got a little repetitive there, but yes, I agree: "deprecated" means stop using this because it's going away.


I sure did!

Technically, it doesn't have to be because it's going away, although that is common. I think it always means there's a better recommended way to do the thing, but sometimes the deprecated way doesn't go away.


Deprecated is a word. No negativity implied, but equating "deprecated" to "depreciated" is insular. It is okay to be wrong, I have been wrong (George Foreman Grill dissenter..."it drips fat in the front? Gross."). I am not sure what your native language is, and I wish I did so we could communicate, but it is like calling a "warning" as "decay" in English (US).

Typo corrected.


Isn't deprecated actually correct here?

It means the feature still works, but will be removed in the future or is no longer supported. There may also be a new implementation of it that the developer would like you to use, hence the warning that it's deprecated.

Depreciation implies a rate of change over time, which isn't the case. Today we deprecate feature X, and in two years we plan to remove it. It never depreciates.


But "express disapproval of" is exactly the meaning intended when we say that a feature is deprecated. It signifies that it is best practice not to use it.


If it’s given as a warning then yes, e.g. the dplyr package in R sometimes outputs “feature xyz is deprecated and will be removed in version x.x”.

Often though it’s used when the feature is already removed, i.e., it’s not only best practice not to use it, but also impossible with that version.


In this case, depreciated is incorrect. Removal has already happened, the "period of time" is already over.


Removing APIs is not a great practice though. Look at AWS, they version their APIs, they don't just remove them, and removing them should be unnecessary if your underlying tech isn't brittle and badly written. "Depreciated" is a far better term to use, with a far better outcome in my opinion. Companies that remove old versions of APIs and break existing client apps (that possibly can't be updated) really suck.


1) Whether you agree with the practice doesn't affect the terminology used. People remove APIs. Before doing that, they deprecate them for a period to advise people to move off of them.

2) If you were to always maintain backward compatibility, how is "depreciated" in any way an accurate term? If the old API continues to work indefinitely, its value stays the same.


I don't think these two are incompatible?

If APIv3 has a `/foo` endpoint that is deprecated, usually I take that to mean that the developers discourage its use, and likely plan to remove it in a future version (say, APIv4 or APIv5). `/foo` will never be removed from APIv3, because that would be a breaking change, and so if I'm willing to stay on v3 forever, that's fine, but in the (likely) event I will want to take advantage of new features at some point in the future, I'm doing myself a disservice by using /foo because it will make the migration harder.

There is at least one case where I think "deprecated" is clearly, inarguably, the right word: when the developer wants to remove a part of an API (say, because it is a large maintenance burden), but is also committed to stability, so they won't remove that API until only an acceptably small number of users are still using it.
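In code, the "still works, but discouraged" state usually looks something like the sketch below, here using Python's standard `warnings` module. The names `foo` and `bar` are placeholders echoing the `/foo` example above, not anyone's real API.

```python
import warnings

def bar():
    """The recommended replacement."""
    return "result"

def foo():
    """Deprecated: behaves exactly as before, but warns callers to migrate."""
    warnings.warn(
        "foo() is deprecated; use bar() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to foo() itself
    )
    return bar()

# Record warnings so the demo shows both the unchanged result and the notice.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = foo()

print(value, [str(w.message) for w in caught])
```

The old entry point keeps returning the same value; the only change is the notice, which matches the "discouraged, not yet removed" reading.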


Right, you version APIs and have a policy of deprecated APIs being removed in the next version. Or you can just copy and paste a comment calling people shitheads for politely disagreeing with you. Your call.


The “public APIs form an immutable, irrevocable contract” argument means that an API layer with these tenets is always going to be a source of technical debt. Get it right the first time or fight an ever-growing compatibility maintenance war - even when your instrumentation is saying that old APIs aren’t being used, just published. Seems like a footgun.


This is a jaw-droppingly arrogant attitude. You're trying to justify your own incorrect usage by asserting that the person who coined the term decades ago "got it kind of wrong"? And you cringe when others get it right?

"Depreciated" is absolutely the wrong term, because it implies that the value is less, when the intent is to communicate "this is still fully functional, but you are warned away from it because it is targeted for future removal." Deprecated.


Feels like I often see it used to retire APIs that are now understood to be unsafe, insecure, or otherwise a bad practice for some reason. It gets replaced with an API that does not inherently have that problem, and the old one is deprecated. It feels like "expressing disapproval of" is the right definition in that case. It's only there for a migration period to happen more gracefully, but its continued use is frowned upon, and not just because it will eventually be removed.


If you deconstruct the original Latin that forms the word, its literal translation is something like "ask to go away"

de == away

prek == ask


In French, we have an equivalent for "depreciate": déprécier. But we don't have a close relative for "deprecate" (which translates to "désapprouver, dénigrer" and would never be used for an API).

We tend to use the terms "déprécié" (~depreciated) or "déprécaté" (~deprecated but not valid French).

On the other hand, "deprecate" seems to also translate to "mark as obsolete" according to https://www.wordreference.com/enfr/deprecate

I guess both terms make sense but I would keep using "deprecated".


The first definition is intended and more fitting for the usages of "deprecated" I've encountered.


I used to always use "depreciated" until I was embarrassingly corrected one day :P


To be frank, grandparent sounds like someone who was corrected one day, and rather than learn something and move on, dug in and developed a detailed justification for why the rest of the world was mistaken so he can cringe about their ignorance.


You could not be more wrong if you practiced every day https://www.etymonline.com/word/deprecate#etymonline_v_29603


I appreciate the etymonline reference but I'm afraid you've been breaking the site guidelines quite badly in repeated comments such as this one and https://news.ycombinator.com/item?id=24101885. We ban accounts that do that because we're trying for a bit better than internet-default outcomes on HN. Would you mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the spirit of this site more to heart? We'd be grateful.


No longer have Google Chrome on any devices, switched over to Chromium Edge.

Same browser, different overlords.

Left the default search engine as Bing, but only because Duck Duck Go is useless for geographically local search.


I don't know what they did but several weeks ago DDG got magically better at geographic local search (I do not live in the US and pound my head on that one too).

One day I actually DID need to find something local and so I dutifully typed my search into the FF url bar and hit enter. My default search is DDG but my brain stroked off for a second because the results that came back were 100% what I was looking for and I thought I was on Google. Give it a shot again, they're getting better.


Useless? "Plumber Timbuktu" returns plumbers in Timbuktu.


I'm in Launceston, Tasmania; DDG returns results for Launceston, Cornwall, England.

I'll try again though as your sibling comment suggests it's improved.


Might be my librarian career bias but I'm always surprised at how few people know about query operators. Ironically as Google search seems to be ignoring vital parts of people's queries, they are becoming more needed now, whereas years ago I would have assumed a constantly improving Google search would get better at determining what I was looking for.


The operators don't work as well as they used to, and even when using them lots of results are still left out or are not an exact match. The combination of the SEO arms race and Google's algorithms to filter "bad" information make it nearly impossible to find some things. Sometimes you are looking for that "bad" piece of info as a counter example rather than a source of truth, and don't need google's patronizing filtering, so would prefer exact string matches. But apparently they know better than you.


Google is quickly becoming nothing more than a souped-up Yellowpages for online business.

I’d love to see an “anti-seo search engine” that eschews all results that are oriented around selling a product, but I don’t think it’s feasible to bring back the joy of finding a new online community/forum every day.


You don't even wanna know how many times specialized searches have saved my ass, after multiple years at uni, and working as a writer, journalist, programmer, and even a musician! You can safely say that my entire life revolves around being good at doing various forms of searches.


No doubt. Enjoying & (feeling like compared to others I was) excelling at finding information was what made me get interested in information science in the first place, but I often felt advances in ML and NLP would allow for anyone to find exactly what they wanted (which would be great) even considering the increasing amount of information to have to search through. Google’s ‘I’m going to ignore half the words in your search query’ seems to be moving away from that for whatever reason.


I have long believed that the art of precision search should be taught at the primary level. It is a necessary skill.


That idea is present in Rainbows End by Vernor Vinge. Basically search engines and the use of discussion forums to find information become a subject in schools etc. Though this is not the main point of the novel at all.

https://en.wikipedia.org/wiki/Rainbows_End


Last week I blocked every *.google.* domain on my network except "youtube-ui.l.google.com".

Google Search: (1) ask a natural language question (since actual search is hobbled) (2) get unrelated garbage and ads back (3) blame yourself for "not being technical enough" to understand why the results aren't actually garbage.

Google Search has deteriorated to the point that so far I haven't missed it at all.


I've been slowly degoogling myself this year. For ~80% of my search, DDG has been entirely adequate.

I do miss some of Big G's cards, and their Maps is vastly superior to DDG's Apple Maps integration, even despite GMap's advertising. DDG's solution is wild, really: they use Apple for static-image-only maps with no real contextual interface, only a sidebar for search results. If you want directions, you must search for your destination by text alone, then in the sidebar choose to get directions from one of four providers (defaulting to Bing).

But when I just want an engine to match the text I give it (i.e. most of the time), DDG performs at least as well as Google's increasingly-fuzzy matching.


use OSM it's much better anyway


I'll miss browsing street view for fun. I doubt I use it more than six times a year, though.

For actual turn-by-turn navigation, I never used a Google or Apple product. I use an offline iPhone app (Navigon).


Google still good for coding related searches


Ever spent three minutes opening useless links from Google's Search results, only to realize they dropped the keyword you searched? That seems quite common now, especially with programming keywords, which are often obscure.

Remember Google Code Search, and Google (Usenet) Groups? Back then, Google cared about this stuff. Now they seem only to want to show you furniture ads, or get you to use their Zoom knockoff, etc.

These days Google substitutes the heck out of searches. Perhaps it's better if you've logged in, but I'd rather hack my leg off with a rusty saw than voluntarily log in to an account just to search the web.


This is also very common with DDG. And the strange thing is that many times, DDG will even ignore it if you are using parentheses. The parentheses seem to add weight but I will still get tons of results that don't contain the required words at all.


Never seen this happen with DDG. Required words should be in quotes, not parentheses.


To be fair, I suspect "coding related searches" are easy for any search engine, given

1. the immense online/open-source nature of the profession: every blog/forum question and answer/documentation since the origin of the profession being in plain-text and mostly publicly accessible by default

2. and it all revolves around a precise, limited vocabulary.


You get SEO crap very often though IME.


Depends on the platform and what you're looking for. Some operating systems and languages/ecosystems are worse than others. Windows stuff is largely incredibly bad (not saying Windows is bad, for this reason anyway, just that search results for anything MS-related tend to be awful). The nerdier the OS and less "corporate" the language, the better the results get.


> The nerdier the OS and less "corporate" the language, the better the results get.

But don't get too obscure. Otherwise, you'll discover that Google has dropped the information you require from its index because it's not new or trendy enough.

If we can get Taylor Swift interested in the old internet, then Google will suddenly snap back into usefulness.


Depends on what coding-related search, I think. Searching for C is useless unless you know to search for clang, for example; but then you get results for the compiler. If you're trying to search for lesser known languages with short names or names that overlap with common words, then forget it! (Arguably that's a fault of the language, but arguably arguably you shouldn't have to choose what to name your creation based on Google.)


i think github code search is sometimes more helpful than google (when searching for specific examples in code)

github search has its own search operators:

https://docs.github.com/en/github/searching-for-information-...


What is it you are searching for that the results are useless?


For example a specific error string which is definitely in a github repo - it won't find it. I was trying to find why the error is printed. In the past, I used google to even find code related to my interest (so I additionally specified language keywords like "void" "define" "float") - not working anymore. Fortunately github search works for that but it's slow. In the past, Google ignored "c++"; amusingly, now it doesn't ignore it but it has a much, much smaller index, it seems. Google is now useful only for "programming" queries like: why is c++ popular. And most results are then Quora / Stack Overflow / YouTube. Google is sadly a tool for the masses and their simple queries about restaurants, actors, etc now. :(


Indeed. This seems like a bit overreacting. Google is lots of things, but a shitty search engine to the point of deserving being blocked is not one of them.


    > a shitty search engine to the point of 
    > deserving being blocked is not one of them.
Google's search quality isn't why I blocked Google. I've wanted to block Google for over half a decade, but the excellence of their search stopped me. That stopped being an issue this year.


What search do you prefer and why?


I prefer Google circa 2005, but DDG and Bing work better for me now.

I've never wanted anything fancy:

- don't show me paid search results
- show me a blank page if there are no results
- make it easy to 'AND' terms (+include +search +terms)
- most importantly: search for my damned search terms! If you want to "did you mean" my spelling, fine. I don't really care. But it's unacceptable to ever drop a search term.

I have plenty of other complaints about Google, but in terms of search quality, those are the relevant ones.


I live two towns over from Dorking.

https://en.wikipedia.org/wiki/Dorking


Also weird for me to see the name here (I’m in the next village over), not one you see popping up often. I occasionally wonder how many other HNers there are scattered about in my local area (I suspect not many).


I grew up in Dorking, but this is the first time (that I can remember...) that I actually read its wikipedia article.

TIL: No one knows why 'Dorking' is called 'Dorking', but there's an English Place Names Society which since the 1920's has researched the origins of town names in England, and is considered [0] to be "the established national body on the subject".

[0] https://epns.nottingham.ac.uk/


The -ing part of a place name is Saxon, from memory, something like "ingas" meaning "people of". So, People of Dork, literally, and whatever became Dork (a person/place/event) is probably lost to historical memory. We have an awful lot of places with "ing" in them around here, probably due to a lot of C7 Saxon settlement.


Didn’t it feature in “War of the Worlds”?

My Dad worked for Mullard, which was renamed to Philips Electronics and relocated to Dorking.


Roots in the UK rocket industry that never was: https://en.wikipedia.org/wiki/Mullard_Space_Science_Laborato...

Another local here; 20 years in Horsham now Haywards Heath


Almost due south of Dorking, down the road in Horsham. The small town was the first thing I thought of on seeing this title. I'd imagine this area actually has a fair few HNers, as it's in the tech catchment area for London, Reading, etc.


I live just a few miles to the North. Nice to see a few other ~Dorking locals here :)


Slightly more than a few miles, but still pretty local (KT4)!


Also in the immediate vicinity. Who knew there were so many of us! (And only those who have seen this on a Sunday & bothered to comment.)

Perhaps a mini meet-up is in order? :)


This is reddit humor, that I sometimes miss here. Thx neilduncan.


This is a pretty common practice among SEOs for a variety of different reasons. They are also known as advanced search operators.

Ahrefs has a pretty comprehensive list here: https://ahrefs.com/blog/google-advanced-search-operators/


I think it would be useful to be able to explicitly search around knowledge graph entities or site topics, e.g. a programming language, a city, a season, without having that single/specific term.

So a search including all sites related to an entity, say Munich or python along with the terms the user is searching because a page might then not specifically include the entity in its keywords or the text on the site or have a different language or use a synonym.

I’m sure search engines consider this somewhat, but explicitly activating such a feature would be a great improvement for the user.

Stackexchange has this feature with tags (using []), with user curated tags. Would be nice to have in DDG or google.


I would just like to create my own groups. As another user said, tagging would probably be gamed by SEO companies, but if people could use their own groupings, that problem wouldn't occur. There could even be curated lists out there of specific sites that fall within a general category. At the least, I'd like to be able to block sites from ever appearing in my results. I've used add-ons for that which work pretty well, but it should be built in, in my opinion.


Businesses don’t game stackexchange.


Syntax for doing things like this with DDG:

https://help.duckduckgo.com/duckduckgo-help-pages/results/sy...


I'd switch to DDG in a half second if they supported the full query syntax of altavista.digital.com (see http://jkorpela.fi/altavista/ if you've forgotten). Disclaimer: I work for, um, Google.


I do wish they supported a larger search syntax. My current workaround is I have a massive bookmark folder of alternative search engines that I try if I'm not having luck narrowing things down enough.


That might make an interesting blog post


I'm still mid-setting up a blog, but I'll keep that in mind once I've got it up and running.

I'm afraid it probably wouldn't be that interesting to HN'ers though, because this is where I found most of them.


I'm pretty new to HN so I'd really find that post interesting – having them listed in one place would be useful.


If you want, you can shoot me an email and I'll send a link to a page where I have them listed already. Not a blog post, and not much info about them, but I have a list.

I just don't have a domain yet so I don't really want to go super public.


Sure thing, I've just emailed you.


Update: there is now a domain, the page can be found at

http://a-shared-404.com/other-stuff



I have a question for anyone reading this thread:

Do you believe you can get consistent results with any search?

For example, if we pick some uncommon search terms will we get the same results on the first search, the second search, the third, etc. Or will the results change?

I did a search with some terms from one of the comments in this thread, in quotes. The first search returned only one result: this thread.

As I searched the same quoted terms repeatedly along with additional terms, more results were returned that contained the exact string of original terms. Surprised by this, I tried a search with only the original terms, in quotes, once again. This time the search returned more than just the one result.


If it's specific enough, the SERP should stay the same until someone else publishes the same thing

e.g. the search of another article "set up Google Sheets APIs (and treat Sheets like a database)"

turns up my site and a couple Twitter threads talking about it (plus a phishing site which has scraped and republished it). I presume that will stay the same b/c it's such a specific title phrase (but not because searches are necessarily deterministic)


I am skeptical that quotes really work like the plus operator used to work.

For example, try searching the following string in quotes: "the SERP should stay the same".

https://www.google.com/search?q=%22the%20SERP%20should%20sta...

Now, the logical presumption, assuming Google works as people say it does, is that each result will contain that exact string. If no results contain the string, then you should receive no results.

However, for me, results are returned. Did each of the results from that search contain this exact string? For me, they did not.


When I run that query, it returns

`No results found for "the SERP should stay the same".`

Then defaults to providing the SERP for the fallback query:

`Results for the SERP should stay the same (without quotes):`

That SERP should change when this HN thread is indexed though


I must have missed the line "No results found" and the disclosure that Google has, by default, gone ahead and performed a "fallback query" without the quotes. Perhaps that is the goal: that I, the user, will not notice. If I wanted the fallback query's results then I would not have used quotes. This appears to be another example of Google second-guessing the user. Perhaps they assume that the user who searches for an exact string with quotes would, in most cases, try the search again without the quotes if there are no results found.


Google removes a shitload of search results for anything related to torrents or porn, forcing me to go to other search engines that won't censor or remove content for legal reasons.

Even that list of search engines is shrinking now.


Dorking is not that easy to do. Google is very quick to assume you are being malicious on certain queries; try one too many and you'll hit their dreaded captcha that is impossible to pass.


That really angers me, and I've tripped it more times than I can count, usually by searching for very specific things. Coworkers have also run into it multiple times (before everyone started working from home, we would exclaim "Fuck you, Google!" and raise a middle finger to the screen, which was a cue to everyone else to help).

The fact that they think you're "not human" when you use a search engine for its intended purpose and show how much you know how to use it is both disturbing and saddening. I wonder if Google's own employees run into it and/or the continuing degradation of results, or if they're somehow given immunity and a much better set of results...


I’m curious about this. Can you give an example of the kind of query you are talking about where Google assumes you are a bot and not a human?


I get them all the dang time

Suppose I want to search for a textbook in an open directory. This query will get flagged almost immediately:

    textbook -inurl:htm -inurl:html -inurl:asp intitle:"index of" +(epub|mobi|pdf|txt)

Suppose someone says to me "no one in this forum believes (dumb conspiracy theory from Obama-times)" or "the lamestream media never reported on X." Naturally I will do a few quick iterations of

    conspiracy related "terms" site:someforum.com before:2017-01-01

which will trigger a humanity check if you iterate too quickly.

I got hit recently when the news about Tucker Carlson's racist writer came out. I set out to find the full threads in which the offending comments were made. Iterating through combinations of text, usernames, and urls like

    "full text of offensive post" username before:date-news-broke -inurl:news

got me checked every 3rd search or so.
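
For what it's worth, queries like the first one are easy to assemble programmatically. A minimal Python sketch that only builds the string and sends nothing to Google; the function name and its parameters are made up for illustration:

```python
def open_directory_query(topic, extensions, excluded_url_parts=("htm", "html", "asp")):
    """Assemble an "index of" style query string like the textbook example.

    Hypothetical helper: it only formats text and does not contact any search engine.
    """
    # Exclude ordinary web pages so that bare directory listings are favored.
    exclusions = " ".join(f"-inurl:{part}" for part in excluded_url_parts)
    # Alternation over the file extensions we are interested in.
    alternatives = "|".join(extensions)
    return f'{topic} {exclusions} intitle:"index of" +({alternatives})'


print(open_directory_query("textbook", ["epub", "mobi", "pdf", "txt"]))
```

Swapping the topic or the extension list gives the next iteration without retyping the operators, though iterating quickly is exactly what seems to trip the check.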


You could always keep Google Chrome open to rerun these specific queries, as the captchas are less irritating to solve then.


I would sooner switch to DDG, Bing, or Yahoo, or even just change my IP, than use Chrome.

The ban seems to expire after a short time too.


Back when I was a teenager, I had a book titled "hacking with Google" by Johnny Long that was basically all specific searching tips and terms (oriented to find open vulnerabilities and the like, but still very useful in general despite the tacky name).

I wonder how much of it is still valid after all this time.


Back when I was a teenager, I had a slide rule. I can guarantee that a slide rule is still valid, so long as you're not interested in more than two or three significant digits, and you don't want to add or subtract.


I've always heard the search terms called "Google dorks", but never heard it called "dorking". Seems Google articles on the subject prove me wrong.


Why doesn't google.com have a comprehensive list of these? I'm constantly seeing new ones that I didn't know about, but google never teaches you about them so you have to find them in obscure blog posts


https://support.google.com/websearch/answer/2466433?hl=en but it's not complete. My favourite is actually the "range" operator. I don't need it often, but when combined with the exact match quotation marks, it's great. For example, here's a search for Sony bluetooth headphones available on Amazon.ca for between CA$100 and $150: https://www.google.com/search?rls=en&q=site%3Aamazon.ca+%22C...

The range operator also works great with years, dates, though the Tools menu with shortcuts for before: and after: operators can help there too.

One I haven't seen mentioned yet but used to be documented is that you can leave out words in a phrase by replacing them with an asterisk. I'm having trouble not italicizing text in this comment box, so pretend \* means a single asterisk: "Stocks rose today by \* percent" as a search matches the phrase "stocks rose today, led by a 4.4 percent". (Which until this post, had only one result on Google.)

Note that it's not 100% exact matching, because for actually exact matches you have to select "Verbatim" under Tools > All Results in the menu below the search box on the results page.

The only downside to using all these operators is that you'll get very familiar and frustrated with the Google reCAPTCHA prompts as your search is "too precise to be human". Even when signed in to Google, especially often in Safari on an iPhone. Sigh.


You can use three asterisks in a row, surrounded by whitespace, to get a single asterisk like: "Stocks rose today by * percent".

Oddly, this results in a non-italicized asterisk in the output, contrary to reports in earlier comments that the resulting asterisk would be in italics. There is, however, a zero-length italicized string right before the asterisk in the HTML:

    "Stocks rose today by <i></i>* percent".


> Oddly, this results in a non-italicized asterisk in the output, contrary to reports in earlier comments that the resulting asterisk would be in italics. There is, however, a zero-length italicized string right before the asterisk in the HTML:

> "Stocks rose today by <i></i>* percent".

Sounds like the matching is something like

    /\<\*.*\*/

or maybe

    /\<\*[^*]*\*/

rather than

    /\<\*.+\*/
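
To make the difference concrete, here is a small Python sketch. It is not HN's actual formatter, just `re.sub` applied with the "zero or more non-asterisks" idea and the "one or more characters" idea. The leading `\<` is dropped (Python's `re` has no such anchor) and a capture group is added so the matched text can be wrapped in `<i>` tags; both changes are my assumptions:

```python
import re

text = '"Stocks rose today by *** percent".'

# Zero or more non-asterisk characters between two asterisks:
# the first two asterisks pair up around an empty string.
zero_or_more = re.sub(r"\*([^*]*)\*", r"<i>\1</i>", text)

# One or more characters between two asterisks:
# the middle asterisk becomes the italicized content.
one_or_more = re.sub(r"\*(.+)\*", r"<i>\1</i>", text)

print(zero_or_more)  # "Stocks rose today by <i></i>* percent".
print(one_or_more)   # "Stocks rose today by <i>*</i> percent".
```

Only the first variant reproduces the empty `<i></i>` seen in the page source. In Python's engine a greedy `.*` would also swallow the middle asterisk, so the `[^*]*` form is the closer match to the observed HTML.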


Is there actually a page that says "too precise to be human" or are you just assuming this is what triggered the reCAPTCHA?

If there is such a page, can you give an example query that would trigger it?


It mostly happens using "site:" queries which I use frequently to limit things to local websites (by domain) or for searching sites that have poor search engines (Amazon, for example). It rarely happens the first query, but often by the third or fourth modification or by the third or fourth page of results you visit, it will show a reCAPTCHA if it doesn't have enough "randomness" or doesn't think you're actually browsing Google and third-party sites the way others commonly do. (Robots are more likely to use search operators, for example, and more likely to pretend to be iPhones so they don't have to move the mouse, etc.)

My earlier query triggered it. Even without a query, I can make the following text show up by going to https://www.google.com/sorry/index; when a relevant query is attached to that URL, it shows a reCAPTCHA for the search query, along with your IP address, etc.

> About this page

> Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen?

If you click the link "Why did this happen?" it says:

> This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the Terms of Service[1]. The block will expire shortly after those requests stop. In the meantime, solving the above CAPTCHA will let you continue to use our services.

> This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests. If you share your network connection, ask your administrator for help — a different computer using the same IP address may be responsible. Learn more[2]

> Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.

[1]: https://www.google.com/policies/terms/ [2]: https://support.google.com/websearch/answer/86640

The annoying part is that my account has never been whitelisted based on good behaviour. Instead, I end up seeing such reCAPTCHAs thousands of times a year, to the point where I stop counting them. Roughly half the time I'll answer the reCAPTCHA and the other half of the time, I'll close the tab and go do something else. Cloudflare site loading captchas are even worse, though. They delay the site by 5 seconds while they "check my browser", and then show an hCAPTCHA to solve, even when I'm already signed in with the first-party site. Very annoying, though the captcha is often easier to solve than Google's. The Cloudflare block often appears on streaming media websites. Ironically, Cloudflare's captchas have never prevented me from using commonly available Python scripts to watch streaming flash videos in VLC; they only block my web browsing...

I can only assume that Safari's excellent ad blocking and tracking prevention is causing my browsing traffic to stand out compared to others', enough that it prompts these CAPTCHAs more frequently.


You could try your site:amazon.ca specific queries on DDG. DDG will allow you to iterate your queries without triggering a reCAPTCHA. When you believe you have the best results you can get on DDG, then try that same query on Google. This way you avoid doing the iteration on Google and triggering the reCAPTCHA. Instead you just do one search on Google. Google will usually give some additional results that were not shown in DDG.


That's a good idea, yes, I'll have to try DDG if I hit a reCAPTCHA. Most of my searches don't start with me using advanced operators (except if I'm buying products in Canada, because I'll generally use a "site:ca" term then); I generally add the advanced operators when a basic search isn't working. And the reCAPTCHA doesn't kick in right away, so I'm not always reminded of the issue. But if I get hit by a reCAPTCHA, I might try DDG now instead. Thanks for the tip!


Google randomly ignores "search term in quotes".

Related:examplesite.com used to work well. Now, it's better to use sites like alternativeto.net.

~phrase is unnecessary because Google searches for synonyms by default

phrase1 + phrase2 - Google randomly ignores it. I use it this way +compulsoryTerm

Although rare, there are things I simply can't find using Google. But Bing would. If Google keeps it up, other search engines would benefit.


I would be interested to see an example of it ignoring quotes silently because I've heard a lot about it. I use search terms in quotes relatively often and have never noticed that, although it does the 'did you mean without quotes' thing all the time.

In the past for very long tail content, I've found Bing and Yandex to be useful. Yandex image search in particular is often better than Google or Bing, particularly if you are searching for people because it does some facial recognition.


Doing some "related:" queries returns some interesting results that look human-curated and out-of-date. related:google.com shows results for Yahoo, Bing, AOL Search, and HotBot (which used to be a search engine, but the brand is now for a VPN provider).


That is great! It's like a search engine blast from the past.

Also interesting, related:bing.com gives me no results.


Ah, I never knew "related:" existed. Also, saashub.com could be used as an alternative to alternativeto.net :)


Having a reliable search syntax would commoditise Google as other search engines could offer the same options. Having just a search box, instead of lots of options was how they moved ahead of e.g altavista in the first place.

Google would rather people are trained to just type human speak into the search box.


> would commoditise Google as other search engines could offer the same options

I highly doubt that's a concern. Google's competitive advantage would not be eroded if they had operators clearly documented. Another search engine could not replicate Google's index, even if they could replicate the operators.

And most people do want to just type human speak and have the machine magically interpret it correctly.


> Why doesn't google.com have a comprehensive list of these?

It is quite obvious that google does not give a s&it whether I find what I think I want to find. Google is much more interested in 1) serving me ads they think are most profitable and 2) giving me results they think I want.


I'm pretty sure your (1) is correct. I think your (2) should be giving you results they think you should want, as in you shouldn't want to search for a conspiracy theory (even if your goal is to see what the other side is thinking).


My guess would be because Google eventually wants users to find everything via natural language queries.


Actually, Google eventually wants users to find everything with predictive AI giving it to them before they search. That's not really a secret, they've announced more than once in the past that that is what they are increasingly working toward.


That would be great for malware researchers. Google can give them malware before they even search for it!

The reality is that all sorts of things are blocked now, including things that are perfectly legal.


Reminds me of a story I once read about feeding to an optimizing compiler a program to print out the digits of pi.


One reason they might not have a comprehensive list is because some might be relatively expensive to execute, but they can't/won't disable them for legacy reasons.


> One reason they might not have a comprehensive list is because some might be relatively expensive to execute, but they can't/won't disable them for legacy reasons.

Ah, Google, always so reluctant to get rid of anything legacy because of their fanatical devotion to their existing user base.


At least now we are somewhat more empowered to find obscure blog posts. Which raises the suggestion that hackers are advantaged towards finding information. Which raises the suggestion that we should take the independent initiative of using SEO to inform more people about how to become search super-users.


Yea, almost all existing Dorking content is around either SEO or offensive/defensive security scanning.

The broader OSINT community has used dorking for a decade plus (e.g. exploit-db.com goes back to 2003).

The goal in writing this was to democratize access beyond those who use it regularly for work


Worth pointing out if you do some of these crafted operator searches quite quickly, you'll end up getting blocked or having to complete a captcha. I haven't done so in a while so I'm not sure what their current behaviour is.

Main reason being there's plenty of data mining, e.g. looking for "powered by wordpress" and vulnerable versions, and generally all kinds of data mining that involve very specific requests for information, likely queries that aren't creating revenue, either.


The - prefix operator is very useful and still works.

Google should reinstate the + prefix operator. It was only taken out because it screwed up the search results for Google+, which is dead now.


I find myself having to use the "-" prefix a lot these days.


I love the “inject JS into the page to find stuff” hack. The author mentions local “site you are on” but this can be applied with headless chrome to crawl many sites.


That's web scraping 101


Fun fact: googling for -273.15 without double quotes produces no results.

You need to quote negative arithmetic values when searching, even if there are no other query parameters. It made me wonder if I was misremembering absolute zero.


Oh, probably because it interprets it as a logical negation; not "negative X", but "remove X from results".
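
A toy parser shows the ambiguity. This is only an illustration of the idea, not how Google actually tokenizes queries; `parse_query` and its rules are made up:

```python
import re

def parse_query(query):
    """Split a query into (included terms, excluded terms).

    Toy rules, assumed for illustration: a double-quoted phrase is taken
    literally, and an unquoted token starting with "-" is an exclusion.
    """
    include, exclude = [], []
    # Each match is either a quoted phrase (group 1) or a bare token (group 2).
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query):
        if quoted:
            include.append(quoted)
        elif bare.startswith("-") and len(bare) > 1:
            exclude.append(bare[1:])
        elif bare:
            include.append(bare)
    return include, exclude


print(parse_query('-273.15'))    # ([], ['273.15'])
print(parse_query('"-273.15"'))  # (['-273.15'], [])
```

Under rules like these the unquoted search has no positive term left at all, which would explain the empty result page.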


Why is this called "dorking"? "Dorking" is a word that just means using search engines to find very specific data? This seems bizarre to me. Why does this need a special word?

Or it actually means using search operators beyond natural language entry? That's what this page seems to be about? I don't know why that would be called "dorking" either?


It all started with a def con talk if I remember correctly.

https://youtu.be/N3dzVl40lQA


A very comprehensive and frequently updated list is here: https://www.exploit-db.com/google-hacking-database


All I want is the ability to search for symbols. Symbolhound.com is the only site I've heard that will support that, but it leaves a lot to be desired.


It’s strange to me that more domain-specific search engines haven’t been created. There must be value in a programmer-specific search engine for instance. Or why aren’t there search engines that specialise in news, social media, Q&A websites or events, to give a few examples.


Wow, I hadn't heard of that. I need that kind of search a couple times a week. It may leave a lot to be desired, but it's like democracy: the worst possible thing of its class, except for all the rest.


Learn to use time. It's a drop down.

The web is slowly atrophying. Going back in time for originals makes a big difference.

Reverse is also true.

After a blow-up, the mass media will repeat the same thing en masse and swamp results.

Often an article in the last hour might have what you want, like the database link they are all talking about.


Don't you just love it when your carefully crafted search finally displays the words or phrases you want in the snippet on the results page, but then when you actually open the link and CTRL+F for it, it's nowhere to be found? Not even in the raw HTML?

I sure do.


There's a related thing you can do. If you have web pages somewhere, create a bunch of blank web pages with just one random word on them (something like "ristordshest") and then create an index page that links to them all.

Then link to that index page somewhere where no one except web crawlers will notice it. Then wait a few weeks.

Now when you

a) sell something on eBay where you are not allowed to link to the product support page or some other stupid restriction like that

b) want to promote something on Instagram where you can't link to it

Ask people to google for the search term. There will be only one result: Yours.
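
A rough Python sketch of the page-generation step. The file names, word length, and HTML are arbitrary assumptions, and nothing is written to disk; the function just returns the pages as strings:

```python
import random
import string

def nonsense_word(rng, length=12):
    """Return a random lowercase word that is unlikely to exist anywhere else."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def build_pages(count, seed=None):
    """Return {filename: html} for `count` one-word pages plus an index linking to them."""
    rng = random.Random(seed)
    words = set()
    while len(words) < count:  # a set guards against the (tiny) chance of duplicates
        words.add(nonsense_word(rng))
    pages = {f"{word}.html": f"<html><body>{word}</body></html>" for word in sorted(words)}
    # Build the index from the word pages only, before the index itself is added.
    links = "\n".join(f'<a href="{name}">{name}</a>' for name in pages)
    pages["index.html"] = f"<html><body>\n{links}\n</body></html>"
    return pages


pages = build_pages(5, seed=42)
print(sorted(pages))
```

It is worth searching for each word before relying on it, since a random string is only probably unique.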


the "link:" operator doesn't work for me--it just seems to include the URL's tokens in the search


Pretty sure that one is deprecated. It was very useful for SEO research, which is probably why it doesn’t work anymore.


Hmm, I'm seeing the same now.

You can more or less replicate the functionality with intext:specific.url/subsite

Will update and credit you.


A few thoughts:

1) Great information!

2) It seems like the world could use a book like Joe Celko's "SQL For Smarties", but for search engines. Yes, there are such books already, most notably O'Reilly's "Google Hacks" by Rael Dornfest, Paul Bausch, Tara Calishain -- but I think the world could still use a book covering more search engines and search techniques. The above web page would be a great starting point to an endeavor like that.

3) "Dorking" (love that term!) -- is going into my 2020 vocabulary lexicon! <g>



I do. Alec's blog entry is outright pathetic in comparison. It does not even scratch the surface of fravia+'s treasure chest.


Is there any way to search the actual page text? I find that often I remember some unique turn of phrase from the page that I'm looking for and it would be extremely helpful to be able to simply search for that.


`intext:phrase` and `allintext:multi part phrase`

generally "phrase" works well too


Thank you!


Does filetype: still work? I'm getting zero hits for example filetype:epub


It still works, but some file types never return anything. I have the same problem with epub; pretty sure it's some Google shenanigan about book piracy.

https://support.google.com/webmasters/answer/35287?hl=en


That explains why essentially no DjVu files show up. I was shocked to test out a .djvu search and find almost nothing.


Maybe Google doesn’t index epub at all? I think I never saw one in search results.


Well, I may have gone crazy, but I have a vivid memory of using it in the past, and some websites even refer to this specific query ( https://ebookfriendly.com/google-search-tips-books/ )


For some file extensions yes, for others no or not anymore (eg .js)


I'm kind of surprised to see Google brought back the + operator. I remember they prominently changed its meaning when they made it the @ of Google+, and I never bothered to check again after it died.


As a teenager, I used to search for "Index Of <movie name>" for movies. 2/3 times, I was able to find and download the movie I wanted to watch.


I think I should rename filechef.com to dorkchef now


The email specific queries don't appear to work. The "@" is ignored by google so you just get results for the domain string.


The first two appear to still work, but the third does not.

The permutation searches are tricky because you don't know if a lack of results means the email does not exist, or just hasn't been posted anywhere indexed

Will update and credit


This reminds me of an article I once read about the neat tricks that used to exist in altavista.com search engine


Effective Google-fu is one of the first things I teach my first year students. Few greater life skills exist.


NLP and, to a lesser extent, SEO have vastly diminished the value of this type of searching.


It would be useful to use “Dorking”, even for non-programmers. Good article!


https://www.google.nl/search?q=site%3A+news.ycombinator.com+...

I used to use these a lot but now it's just useless


You need to remove the blank after the colon.

https://www.google.nl/search?q=site%3Anews.ycombinator.com+l...


Try wrapping lizardmancan in quotation marks - "lizardmancan". That narrows it down to 10 results for me

(also: you'll want to remove the space between site: and news.ycombinator.com)
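
If you build these URLs in a script, both fixes can be applied before encoding. A minimal Python sketch using only the standard library; `build_search_url` and the operator list are my own assumptions, and the operator list is partial:

```python
import re
from urllib.parse import urlencode

# Operators that take their argument immediately after the colon (a partial list).
OPERATORS = ("site", "inurl", "intitle", "intext", "filetype")

def build_search_url(query, base="https://www.google.com/search"):
    """Return a search URL for `query`, closing any gap after an operator's colon."""
    pattern = r"\b(" + "|".join(OPERATORS) + r"):\s+"
    cleaned = re.sub(pattern, r"\1:", query)
    # urlencode percent-encodes the colon and quotes and turns spaces into "+".
    return base + "?" + urlencode({"q": cleaned})


print(build_search_url('site: news.ycombinator.com "lizardmancan"'))
# https://www.google.com/search?q=site%3Anews.ycombinator.com+%22lizardmancan%22
```

The cleanup is naive: it would also close a gap inside a quoted phrase that happens to contain something like `site: `, so treat it as a convenience rather than a parser.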


Prediction: Using the methods of "dorking", this is the only page on the internet among 10 million+ results that is calling this "dorking".


I hope it doesn’t catch on since it makes me die a little inside. It’s a very Reddit-type word though. I can easily imagine it being used by non-technical folk and tech journalists.




