From what I understand, the code of conduct is supposed to limit the participation in the community around the software. Issue tracking, support groups etc. If you don't adhere to it, you're free to fork and make your own github repo, discourse, IRC channels or whatever.
The vast majority of users do not expect that their digital identity could be stitched together so easily. We can either imply that they deserve this for imagining they had any privacy, or we could take a normative stance and make it difficult to reveal someone’s private information. We have a choice. We can pass laws and ban this nonsense.
If they use the same username on a public site they should expect it. This is common sense. Then again, this is the same world where people voluntarily upload their entire lives to social networks, then complain about "privacy."
The scale makes the difference. Similarly, you don’t expect your appearances in public spaces to be private, but you would likely object if someone could arbitrarily pull video recordings of you and trace your movement through the day.
What private information? This links to public profiles.
I would never register an account on Twitter, put information in my public profile, then consider that to be private information that people should not know about. And if I did do that, I did that, this tool did not do that. And the "privacy" is already breached regardless of whether this tool exists or not. That definition of "privacy" is incoherent. Something deliberately shared in a public profile is not private information by definition.
Sorry, but what people expect doesn't matter. What is true is that digital identities are intrinsically linked between sites where the same username is used.
You can't hide behind, "but no one expected that" and expect to live a happy, unsurprising life. There is no law you can pass that will fix this, no matter how much you complain.
But people might be deceived into publishing their information. For example, social networks motivate them to do it by saying things like "make it easier for your friends to find you" or "share photos with your friends" and not saying the truth that the information will be accessible to criminals, marketing companies, corrupt governments, and weird mentally ill people from anonymous Internet forums.
Yet it's still the user's data, and if it falls under the GDPR, Article 14 comes into play (you have to inform them that you are using that data, how, for how long, ...).
The CoC is for contributing to and working on the project, not for using the tool. And having made this, they probably don't think batch searching for public profiles is bad or counts as doxxing.
I don't get this? It checks against a limited list of websites if a username is taken. So what? This is hardly doxxing or "smart". Simply a faster method than the manual way, except it doesn't exhaust all avenues of search.
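To make "checks against a limited list of websites" concrete, here is a minimal sketch of that kind of check. It is not the tool's actual code: the site list, the URL templates, and the "HTTP 200 means taken" rule are all assumptions made for illustration, and real sites often need different detection rules.

```python
from typing import Callable, Dict
import urllib.error
import urllib.request

# Hypothetical URL templates; the real tool's site list is not shown in this thread.
SITES: Dict[str, str] = {
    "example-forum": "https://forum.example.com/users/{username}",
    "example-code": "https://code.example.org/{username}",
}


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        return err.code


def check_username(
    username: str,
    sites: Dict[str, str] = SITES,
    fetch_status: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each site name to True if a profile page appears to exist.

    Assumption: a 200 response means the username is taken on that site.
    """
    results = {}
    for name, template in sites.items():
        url = template.format(username=username)
        results[name] = fetch_status(url) == 200
    return results
```

Passing a stub for `fetch_status` lets you try the logic without touching the network, which is also the polite way to experiment.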
I'm surprised I had to scroll this far for this comment, which was exactly my thought after viewing the GitHub page.
From other comments here I assumed it did something sinister/sophisticated (matching photos, avatars or even text analysis) to try and tie a poster at one site to another site.
OP @sn0wlizz4rd, it’s particularly odd that you’re using a different username on HN than on GitHub, given what you’ve made. Could you tell us why you created this project?
The question was for OP for a good reason. I’m not feeling accusatory, just disappointed in y’all. I received lots of “speculating about OP’s thoughts and feelings” replies that have not added anything useful to the discussion beyond the speculation that led me to ask OP to clarify in the first place.
I just ran it against my own (typical) nickname and found a couple of instances where it was free on websites that I could see myself using at some point in the future so I went ahead and reserved them for myself.
Tools like this for name squatting have been around for over a decade. I don't see a problem if you are setting up a brand and want to pick a name that's free across the board (or make sure no one squats on it before you do).
If your justification consists of 'the other guy might do it first and reap the benefits', maybe you should rethink how you apply ethics in everyday life.
By 'reap the benefits' did you mean 'extort you based on your brand value'? Sure, there are laws now that prevent domain name squatting, and most platforms would give your brand to you anyway (provided you are big enough). But why go through an expensive hassle when you can solve this proactively?
All companies do that by registering their trademarks. I don't see how this is different.
Apologies for this off topic reply but in an old thread you asked me about a particular episode and it's old enough I couldn't reply there. Here it is: https://www.imdb.com/title/tt0647497/
One case: someone has no email listed on their GitHub or Twitter and doesn’t have open DMs, but you need to contact them privately for whatever reason (e.g. a security issue in a repo they control).
GitHub doesn’t have private messaging.
There are tons of legitimate uses. The point of a username is to identify someone.
If you really want to email a GitHub user, find any of their commits and look at the raw details (hint: add “.patch” to the URL). The email address will be in the author field.
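A minimal sketch of reading the author field out of such a patch, assuming the usual `git format-patch` layout where the author appears on a `From: Name <email>` header line. The sample text, the name, and the address are made up, and downloading the `.patch` URL is left out. The address may also be a noreply placeholder if the author configured one.

```python
from email.utils import parseaddr
from typing import Tuple

# Made-up sample in the shape of `git format-patch` output.
SAMPLE_PATCH = """\
From 0123456789abcdef0123456789abcdef01234567 Mon Sep 17 00:00:00 2001
From: Jane Doe <jane@example.com>
Date: Tue, 1 Jan 2019 12:00:00 +0000
Subject: [PATCH] Fix typo in README

---
 README.md | 2 +-
"""


def author_from_patch(patch_text: str) -> Tuple[str, str]:
    """Return (name, email) from the first 'From:' header, or ('', '')."""
    for line in patch_text.splitlines():
        if line.startswith("From: "):
            return parseaddr(line[len("From: "):])
        if not line.strip():
            break  # headers end at the first blank line
    return ("", "")


print(author_from_patch(SAMPLE_PATCH))
```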
Researching people you are considering entering an asymmetric power relationship with under limited information situations, e.g., new bosses and landlords.
Would-be dating partners are another good use.
Also, searching for a way to contact someone. Reporters looking to contact sources, or anyone trying to find someone who offered a rare item or service for sale a while back.
Honestly, I’m not a fan of this kind of software or websites like Whitepages.com (which should be illegal). It should be hard to find someone’s personal information if they don’t freely divulge it. If they created a username to speak freely, with no ill intent, then the dev is acting against their will by making it trivially easy to triangulate their identity. Any benefits that the software may have pale in comparison to the doxxing, spying, and voyeurism that it may enable.
If this is truly for public benefit, then the dev should at least include an opt-out mechanism.
Freedom of speech lies on a spectrum. You can currently speak your mind on the internet, with little fear of the virtually non-existent threat of de-anonymization. That is a good thing for the well-intentioned majority. Any marginal increase in fear of speech reduces freedom of speech, so making de-anonymization easier doesn’t “teach ‘em a lesson.” It just kills something beautiful, something that must be protected. Don’t get me wrong, I don’t think this tool alone would do that, but it’s another step in that direction.
What next? A publicly available ML web crawler that analyzes speech and belief patterns, triangulates them with metadata, and returns someone’s identity with 99% confidence? I’m not naive enough to believe the government won’t build that, but for that to be freely available is just a recipe for chaos. It should be illegal, and it should be illegal to distribute many intermediate tools as well.
Using different usernames across various sites is a requirement if you don't want to be de-anonymized. It's already trivial to do so, even without tools like this.
> What next? A publicly available ML web crawler that analyzes speech and belief patterns, triangulates them with metadata, and returns someone’s identity with 99% confidence?
Close but no cigar.
But seriously, correct me if I am wrong, but if I were to propose a way to undermine the confidence this "ML web crawler" has in its results, the first thing I would do is feed the internet invalid data. How do ML-based analyzers deal with this kind of data set?
I do not have direct ML experience at the moment. I’ve only run simpler regressions. I was making an educated guess as to the tech which may emerge, and it doesn’t seem far-fetched since it probably would not require language comprehension or higher order reasoning. I would love someone who’s worked with ML to weigh in.
The way I see such a thing working is that you train it to identify what makes your writing unique. Everyone has a highly idiosyncratic writing style, akin to a thumbprint.
An ML optimizing for uniqueness can identify:
- Relative frequency of certain words
- Diction
- Written tics
- Distance between certain words that the author tends to cluster together
- Mean clauses per sentence, clause variance
- Symbol usage
- Affect
- Interests
and abstract patterns that we haven’t even recognized yet. You can limit the search space at first by pointing the algorithm at certain websites and sub-sites that you’re fairly certain the person uses, but eventually I think even that will not be necessary.
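As a toy illustration of a few of the features listed above (relative word frequency, symbol usage, sentence structure), here is a sketch that turns a text into a small feature vector and compares two vectors with cosine similarity. Everything in it is an assumption for illustration: the word and symbol lists are arbitrary, mean words per sentence stands in for "clauses per sentence", and nothing here is trained, so it is nowhere near a real authorship-attribution system.

```python
import math
import re
from collections import Counter
from typing import List

# Arbitrary illustrative choices, not a validated feature set.
FUNCTION_WORDS = ("the", "and", "of", "to", "but", "that", "i")
SYMBOLS = (",", ";", "!", "?", "-")


def style_features(text: str) -> List[float]:
    """Build a crude style vector from word, symbol, and sentence statistics."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    total_words = max(len(words), 1)
    # Relative frequency of each chosen function word.
    features = [counts[w] / total_words for w in FUNCTION_WORDS]
    # Symbol usage per character of text.
    features += [text.count(s) / max(len(text), 1) for s in SYMBOLS]
    # Mean words per sentence, scaled down so it does not dominate the vector.
    features.append(len(words) / max(len(sentences), 1) / 100)
    return features


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A higher similarity between two samples only says their crude statistics are close. Treating it as evidence of shared authorship would need far more text, far more features, and a proper evaluation.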
> I do not have direct ML experience at the moment. I’ve only run simpler regressions. I was making an educated guess as to the tech which may emerge, and it doesn’t seem far-fetched since it probably would not require language comprehension or higher order reasoning. I would love someone who’s worked with ML to weigh in.
Aha. Indeed. I would prefer to have someone actively working in ML to weigh in as well.
It’s called stylometry, and it’s a very real thing that can be done with very high accuracy. There are academic papers on it (just search Google Scholar for stylometry). Do not expect things you post under another username never to be linked to you. There’s just no publicly available tool to run it.
There's no reasonable way to opt out of information that's already out there. Connecting the dots is power that powerful people already have; this is democratizing that power.
Most of the people that might stalk, harass, or threaten you aren’t in traditional seats of power. This tool at the very least makes the ability to connect the dots more salient to bad actors, and allows them to do so with industrial efficiency.
Phone companies used to print hard copies of the white pages and hand-deliver them to everyone’s house, and the biggest concern anyone ever had about them was the volume of waste they generated.
Conversely, new bosses and landlords who want to avoid choosing someone who is objectively a bad choice for everyone else involved (they say they're a full-time carpenter but they're really a weed dealer/bassist in a progressive death metal band, etc.).
Not saying this is a net positive for society by any means, but just another example of a potentially legitimate use for doing online research about someone.
Sorry, I don't understand how someone's choice of profession reflects on their ability to be a good tenant. Definitely not a reasonable excuse for snooping on someone's supposed Internet history. This is what leads to first come first serve regulations for rental applications.
Well, I did have specific reasons for those examples (besides comedy value). A weed dealer tends to bring sketchy and dangerous people to the property. (If you don't buy that, try "heroin dealer" instead.) And I also stipulated that they lied about it. These are all things that happen all the time.
And then as far as "bass player," that's just a proxy for low income; I also picked a hyper-niche genre that even most metal listeners don't enjoy. If he can prove income via sending copies of contracts with music venues or something, more power to him.
For good or bad, this is all stuff that could plausibly be used as an accurate source of information about someone's ability to pay and to not be a problem to the other tenants--especially the lying part.
I thought it was the opposite (which is something I'd actually be interested in). That is, finding usernames which aren't present in any social network.
> Examples of unacceptable behavior by participants include:
> Publishing others' private information, such as a physical or electronic address, without explicit permission
How is this tool not a violation of its own CoC?