Sherlock: Find usernames across social networks (github.com/sherlock-project)
172 points by Sn0wlizz4rd on Nov 30, 2019 | 83 comments



CODE_OF_CONDUCT.md

Examples of unacceptable behavior by participants include:

Publishing others' private information, such as a physical or electronic address, without explicit permission

How is this tool not a violation of its own CoC?


Interestingly enough, the tool is under the MIT license.

> Permission is hereby granted... to deal in the Software without restriction, including without limitation the rights to use


From what I understand, the code of conduct is supposed to limit the participation in the community around the software. Issue tracking, support groups etc. If you don't adhere to it, you're free to fork and make your own github repo, discourse, IRC channels or whatever.


User profiles on publicly available websites are not private information.


The vast majority of users do not expect that their digital identity could be stitched together so easily. We can either imply that they deserve this for imagining they had any privacy, or we could take a normative stance and make it difficult to reveal someone’s private information. We have a choice. We can pass laws and ban this nonsense.


If they use the same username on a public site they should expect it. This is common sense. Then again, this is the same world where people voluntarily upload their entire lives to social networks, then complain about "privacy."


The scale makes the difference. Similarly, you don’t expect your appearances in public spaces to be private, but you would likely object if someone could arbitrarily pull video recordings of you and trace your movement through the day.


I bet most of them use the same password with that same username too.


What private information? This links to public profiles.

I would never register an account on Twitter, put information in my public profile, then consider that to be private information that people should not know about. And if I did do that, I did that, this tool did not do that. And the "privacy" is already breached regardless of whether this tool exists or not. That definition of "privacy" is incoherent. Something deliberately shared in a public profile is not private information by definition.


Sorry, but what people expect doesn't matter. What is true is that digital identities are intrinsically linked between sites where the same username is used.

You can't hide behind, "but no one expected that" and expect to live a happy, unsurprising life. There is no law you can pass that will fix this, no matter how much you complain.


But people might be deceived into publishing their information. For example, social networks motivate them to do it by saying things like "make it easier for your friends to find you" or "share photos with your friends" and not saying the truth that the information will be accessible to criminals, marketing companies, corrupt governments, and weird mentally ill people from anonymous Internet forums.


1920: "Don't talk to strangers, sonny". 2020: "Don't say too much about you online, sonny".

I thought the last one was common sense in the 2000s, actually. But apparently it was lost together with the netiquette.


"such as an electronic address"


so what is an electronic address?


Yet it's still users' data, and if it falls under the GDPR, Article 14 comes into play (you have to inform them that you are using that data, how, for how long, ...)

https://gdpr.eu/article-14-personal-data-not-obtained-from-d...


....if you are an "organization" engaging in "professional or commercial activity." Doxing people for fun isn't covered by GDPR.


I think it comes down to the definition of "publishing".


This tool doesn’t publish anything at all. It runs locally.


The CoC is for contributing and working on the project not for using the tool. And having made this they probably don't think batch searching for public profiles is bad or doxxing.


It’s okay to use this to stalk someone as long as you keep secret the fact that you’re doing so.


I don't get this? It checks against a limited list of websites if a username is taken. So what? This is hardly doxxing or "smart". Simply a faster method than the manual way, except it doesn't exhaust all avenues of search.
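For illustration, the check described above might look roughly like the sketch below. It is not Sherlock's actual code: the site list, the URL templates, and the "HTTP 200 means the profile exists" rule are all assumptions made for the example, and real sites often need per-site detection rules.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Hypothetical site list; these URL templates are placeholders, not real data.
SITES: Dict[str, str] = {
    "example-forum": "https://forum.example.com/users/{}",
    "example-photos": "https://photos.example.com/{}",
}


def http_status(url: str) -> int:
    """Return the HTTP status code for url, or 0 if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 when the profile page does not exist
    except urllib.error.URLError:
        return 0


def find_profiles(username: str, fetch_status: Callable[[str], int]) -> Dict[str, str]:
    """Return {site: url} for each site whose profile URL answers 200.

    Assumption: a 200 response means the username is taken on that site.
    """
    found = {}
    for site, template in SITES.items():
        url = template.format(username)
        if fetch_status(url) == 200:
            found[site] = url
    return found
```

Calling `find_profiles("alice", http_status)` would do the live lookups; passing a fake `fetch_status` keeps the logic testable without touching the network.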


I'm surprised I had to scroll this far for this comment, which was exactly my thought after viewing the Github page.

From other comments here I assumed it did something sinister/sophisticated (matching photos, avatars or even text analysis) to try and tie a poster at one site to another site.


OP @sn0wlizz4rd, it’s particularly odd that you’re using a different username on HN than on GitHub, given what you’ve made. Could you tell us why you created this project?


In context of the project, that doesn't seem odd at all, does it?


I don’t understand which of the several possible implications you’re leaving unstated here. Could you say more directly what you mean?


avoiding linking usernames on different sites...


Okay, that’s a fine guess. The question I asked - “OP: Why?” - remains unaddressed.


OP: understands how easy it is to find accounts of the same username and writes tool as a result

OP: uses different usernames, for the same reason

I'm not sure what is "particularly odd" about this? Also, you have a very accusatory tone.


The question was for OP for a good reason. I’m not feeling accusatory, just disappointed in y’all. I received lots of “speculating about OP’s thoughts and feelings” replies that have not added anything useful to the discussion beyond the speculation that led me to ask OP to clarify in the first place.


Things like this have existed for over a decade - they are for squatting your brand name across all social media sites. Pretty handy!


There is already a site like this... that I used years ago ... https://namechk.com/

And no installation needed.


Here's another: https://knowem.com/


What use does this tool have outside doxxing?


I just ran it against my own (typical) nickname and found a couple of instances where it was free on websites that I could see myself using at some point in the future so I went ahead and reserved them for myself.


So, it’s useful for both doxxing and namesquatting. What a marvellous invention.

Maybe the twitter mob could make itself useful for once...


This kind of tool for namesquatting has been around for over a decade. I don't see a problem if you are setting up a brand and want to pick one that's free across the board (or make sure someone wouldn't over-squat you).


If your justification consists of 'the other guy might do it first and reap the benefits', maybe you should rethink how you apply ethics in everyday life.


By 'reap the benefits' did you mean 'extort you based on your brand value'? Sure, there are laws now that prevent domain name squatting, and most platforms would give your brand to you anyway (provided you are big enough). But why go through an expensive hassle when you can solve this proactively?

All companies do that by registering their trademarks. I don't see how this is different.


Picking a project name that doesn't clash with something that already exists.


Good for intel if you're on a redteam.


Promoting yourself / your personal brand?


Apologies for this off topic reply but in an old thread you asked me about a particular episode and it's old enough I couldn't reply there. Here it is: https://www.imdb.com/title/tt0647497/


It helps you screen for yahoos—priceless when you’re hiring!


Finding old accounts that one forgot to delete.


It's good for demonstrating the dangers of doxxing.


If someone has no email listed on their github or twitter and doesn’t have open DMs but you need to contact them privately for whatever reason (eg a security issue in a repo they control).

GitHub doesn’t have private messaging.

There are tons of legitimate uses. The point of a username is to identify someone.


If you really want to email a GitHub user, find any of their commits and look at the raw details (hint: add “.patch” to the URL). The email address will be in the author field.
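A rough sketch of reading that author field, assuming the `.patch` output follows the usual `git format-patch` layout with a `From: Name <email>` header. The sample text and the function name are made up for the example, and many users configure a noreply address, so the result may not be a usable mailbox.

```python
from email.utils import parseaddr
from typing import Optional, Tuple

# Made-up sample in the usual format-patch layout (not a real commit).
SAMPLE_PATCH = """\
From 0123456789abcdef0123456789abcdef01234567 Mon Sep 17 00:00:00 2001
From: Jane Doe <jane@example.com>
Date: Sat, 30 Nov 2019 12:00:00 +0000
Subject: [PATCH] Fix typo in README

---
 README.md | 2 +-
"""


def author_from_patch(patch_text: str) -> Optional[Tuple[str, str]]:
    """Return (name, email) from the first 'From:' header, or None if absent."""
    for line in patch_text.splitlines():
        if line.startswith("From: "):
            return parseaddr(line[len("From: "):])
        if line.startswith("---"):
            break  # headers are over once the diffstat separator appears
    return None


print(author_from_patch(SAMPLE_PATCH))
```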


By username? Is this the AI that everyone is talking about?


What’s a good use for this tool?


Researching people you are considering entering an asymmetric power relationship with under limited information situations, e.g., new bosses and landlords.

Would-be dating partners are another good use.

Also, searching for a way to contact someone. Reporters looking to contact sources, or anyone trying to find someone who offered a rare item or service for sale a while back.


Honestly, I’m not a fan of this kind of software or websites like Whitepages.com (which should be illegal). It should be hard to find someone’s personal information if they don’t freely divulge it. If they created a username to speak freely, with no ill intent, then the dev is acting against their will by making it trivially easy to triangulate their identity. Any benefits that the software may have pale in comparison to the doxxing, spying, and voyeurism that it may enable.

If this is truly for public benefit, then the dev should at least include an opt-out mechanism.


> it trivially easy to triangulate their identity

And this is good because it removes the illusion that they can speak freely and saves them from repercussions coming from eventual de-anonymization.


Freedom of speech lies on a spectrum. You can currently speak your mind on the internet, with little fear of the virtually non-existent threat of de-anonymization. That is a good thing for the well-intentioned majority. Any marginal increase in fear of speech reduces freedom of speech, so making de-anonymization easier doesn’t “teach ‘em a lesson.” It just kills something beautiful, something that must be protected. Don’t get me wrong, I don’t think this tool alone would do that, but it’s another step in that direction.

What next? A publicly available ML web crawler that analyzes speech and belief patterns, triangulates them with metadata, and returns someone’s identity with 99% confidence? I’m not naive enough to believe the government won’t build that, but for that to be freely available is just a recipe for chaos. It should be illegal, and it should be illegal to distribute many intermediate tools as well.


Using different usernames across various sites is a requirement if you don't want to be de-anonymized. It's already trivial to do so, even without tools like this.


> What next? A publicly available ML web crawler that analyzes speech and belief patterns, triangulates them with metadata, and returns someone’s identity with 99% confidence?

Close but no cigar.

But seriously, correct me if I am wrong, but if I were to propose a way to undermine this "ML web crawler's" confidence in its results, the first thing I would do is feed the internet invalid data. How do ML-based analyzers deal with this kind of data set?


Train it on content that you know was produced by someone, then unleash it on the rest of the internet to find content that they wrote anonymously.


> Train it on content that you know was produced by someone, then unleash it on the rest of the internet to find content that they wrote anonymously.

I apologize, but that's disappointingly vague.


I do not have direct ML experience at the moment. I’ve only run simpler regressions. I was making an educated guess as to the tech which may emerge, and it doesn’t seem far-fetched since it probably would not require language comprehension or higher order reasoning. I would love someone who’s worked with ML to weigh in.

The way I see such a thing working is that you train it to identify what makes your writing unique. Everyone has a highly esoteric writing style, akin to a thumbprint.

An ML optimizing for uniqueness can identify:

- Relative frequency of certain words

- Diction

- Written tics

- Distance between certain words that the author tends to cluster together

- Mean clauses per sentence, clause variance

- Symbol usage

- Affect

- Interests

and abstract patterns that we haven’t even recognized yet. You can limit the search space at first by pointing the algorithm at certain websites and sub-sites that you’re fairly certain the person uses, but eventually I think even that will not be necessary.
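A toy sketch of a few of the features listed above (relative word frequency, symbol usage, words per sentence), compared with cosine similarity. The word list, symbol list, and scaling are arbitrary choices made for the example; this says nothing about how accurate such a comparison would be in practice.

```python
import math
import re
from collections import Counter
from typing import List

# Arbitrary illustrative feature sets, not taken from any real system.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]
SYMBOLS = [",", ";", ":", "-", "!", "?", "(", '"']


def style_vector(text: str) -> List[float]:
    """Turn a text into a small vector of style features."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)
    vec = [counts[w] / n_words for w in FUNCTION_WORDS]   # relative word frequency
    vec += [text.count(s) / n_chars for s in SYMBOLS]     # symbol usage
    vec.append(len(words) / max(len(sentences), 1) / 100.0)  # mean words per sentence, scaled
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Comparing `cosine(style_vector(known_text), style_vector(unknown_text))` across candidate texts gives a crude ranking; anything serious would need far more features and far more text.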


> I do not have direct ML experience at the moment. I’ve only run simpler regressions. I was making an educated guess as to the tech which may emerge, and it doesn’t seem far-fetched since it probably would not require language comprehension or higher order reasoning. I would love someone who’s worked with ML to weigh in.

Aha. Indeed. I would prefer to have someone actively working in ML to weigh in as well.


It’s called stylometry and it’s a very real thing, and can be done with very high accuracy. There are academic papers on it (just search Google Scholar for stylometry). Do not expect things you post under another username to never be linked to you. There’s just no publicly available tool to run it.


OK: it's fairly trivial for someone like me to do this, and you don't even need "ML" to do it. It's just counting, binary trees and simple models.


> OK: it's fairly trivial for someone like me to do this, and you don't even need "ML" to do it. It's just counting, binary trees and simple models.

Would you be interested in PoC'ing it out for such a trivial project?


Absolutely not. Just because something is easy to do doesn't make it a good idea.


>You can currently speak your mind on the internet, with little fear of the virtually non-existent threat of de-anonymization

I don't think you can, not without special means.


There's no reasonable way to opt out of information that's already out there. Connecting the dots is power that powerful people already have; this is democratizing that power.


Most of the people that might stalk, harass, or threaten you aren’t in traditional seats of power. This tool at the very least makes the ability to connect the dots more salient to bad actors, and allows them to do so with industrial efficiency.


Why should whitepages.com be illegal?

Phone companies used to print hard copies of the white pages and hand deliver it to everyone’s house, and the biggest concern anyone ever had about them was the volume of waste they generated.


Conversely, new bosses and landlords who want to avoid choosing someone who is objectively a bad choice for everyone else involved (they say they're a full-time carpenter but they're really a weed dealer/bassist in a progressive death metal band, etc.).

Not saying this is a net positive for society by any means, but just another example of a potentially legitimate use for doing online research about someone.


Sorry, I don't understand how someone's choice of profession reflects on their ability to be a good tenant. Definitely not a reasonable excuse for snooping on someone's supposed Internet history. This is what leads to first come first serve regulations for rental applications.


Well, I did have specific reasons for those examples (besides comedy value). Weed dealer tends to bring sketchy and dangerous people to the property. (If you don't buy that, try "heroin dealer" instead.) And I also stipulated that they lied about it. These are all things that happen all the time.

And then as far as "bass player," that's just a proxy for low income; I also picked a hyper-niche genre that even most metal listeners don't enjoy. If he can prove income via sending copies of contracts with music venues or something, more power to him.

For good or bad, this is all stuff that could plausibly be used as an accurate source of information about someone's ability to pay and to not be a problem to the other tenants--especially the lying part.


You can do all of this with a Google search.


Reminding people that they need to be more careful about the tracks they leave?


OSINT research.


Does not allow for spaces in usernames, only searches for the first word before a space.


Can you please mention a site that sherlock supports where a username with a space is allowed?


Good point. I hadn't realized that this is not all-inclusive of social media, just the top ranking sites.


I thought it was the opposite (which is something I'd actually be interested in). That is, finding usernames which aren't present in any social network.


Hmmm, how would you do that? Generate random username, try all networks, if it's free on all of them add it to a list, rinse and repeat?
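That loop could be sketched as follows. The `is_taken(site, name)` callback is hypothetical and stands in for whatever per-site lookup is available; the username alphabet, length, and retry limit are also just assumptions for the example.

```python
import random
import string
from typing import Callable, Iterable, List, Optional


def random_username(length: int, rng: random.Random) -> str:
    """Build a random lowercase alphanumeric username."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def find_free_usernames(
    sites: Iterable[str],
    is_taken: Callable[[str, str], bool],  # hypothetical per-site lookup
    wanted: int,
    length: int = 8,
    max_tries: int = 1000,
    seed: Optional[int] = None,
) -> List[str]:
    """Collect up to `wanted` usernames that are free on every site."""
    sites = list(sites)
    rng = random.Random(seed)
    free: List[str] = []
    seen = set()
    for _ in range(max_tries):
        if len(free) >= wanted:
            break
        name = random_username(length, rng)
        if name in seen:
            continue  # do not re-check a name already tried
        seen.add(name)
        if not any(is_taken(site, name) for site in sites):
            free.append(name)
    return free
```

Random strings will almost always be free, so a practical version would probably draw candidates from a word list instead of `random_username`.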


Looks like a good business model now, but one day Apple will replace it with their own free product, and where will you be then?


Sherlocked :-)


> Sherlocked :-)

Hehe


So this is spokeo, pipl, beenverified, etc., but I have to install it on my system?


it has to get its data from somewhere...



