Researching people with whom you are considering an asymmetric power relationship when you have limited information about them, e.g., new bosses and landlords.
Would-be dating partners are another good use.
Also, searching for a way to contact someone. Reporters looking to contact sources, or anyone trying to find someone who offered a rare item or service for sale a while back.
Honestly, I’m not a fan of this kind of software or websites like Whitepages.com (which should be illegal). It should be hard to find someone’s personal information if they don’t freely divulge it. If they created a username to speak freely, with no ill intent, then the dev is acting against their will by making it trivially easy to triangulate their identity. Any benefits that the software may have pale in comparison to the doxxing, spying, and voyeurism that it may enable.
If this is truly for public benefit, then the dev should at least include an opt-out mechanism.
Freedom of speech lies on a spectrum. You can currently speak your mind on the internet, with little fear of the virtually non-existent threat of de-anonymization. That is a good thing for the well-intentioned majority. Any marginal increase in fear of speech reduces freedom of speech, so making de-anonymization easier doesn’t “teach ‘em a lesson.” It just kills something beautiful, something that must be protected. Don’t get me wrong, I don’t think this tool alone would do that, but it’s another step in that direction.
What next? A publicly available ML web crawler that analyzes speech and belief patterns, triangulates them with metadata, and returns someone’s identity with 99% confidence? I’m not naive enough to believe the government won’t build that, but for that to be freely available is just a recipe for chaos. It should be illegal, and it should be illegal to distribute many intermediate tools as well.
Using different usernames across various sites is a requirement if you don't want to be de-anonymized. It's already trivial to do so, even without tools like this.
> What next? A publicly available ML web crawler that analyzes speech and belief patterns, triangulates them with metadata, and returns someone’s identity with 99% confidence?
Close but no cigar.
But seriously, correct me if I am wrong, but if I were to propose a way to undermine this "ML web crawler's" confidence in its results, the first thing I would do is feed the internet invalid data. How do ML-based analyzers deal with this kind of data set?
I do not have direct ML experience at the moment. I’ve only run simpler regressions. I was making an educated guess as to the tech which may emerge, and it doesn’t seem far-fetched since it probably would not require language comprehension or higher order reasoning. I would love someone who’s worked with ML to weigh in.
The way I see such a thing working is that you train it to identify what makes your writing unique. Everyone has a highly idiosyncratic writing style, akin to a thumbprint.
An ML optimizing for uniqueness can identify:
- Relative frequency of certain words
- Diction
- Written tics
- Distance between certain words that the author tends to cluster together
- Mean clauses per sentence, clause variance
- Symbol usage
- Affect
- Interests
and abstract patterns that we haven’t even recognized yet. You can limit the search space at first by pointing the algorithm at certain websites and sub-sites that you’re fairly certain the person uses, but eventually I think even that will not be necessary.
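To make a few of the features listed above concrete, here is a rough sketch of how they could be turned into numbers and compared. It is only an illustration under loud assumptions: the word list, the symbol list, the scaling constants, and the function names (`style_vector`, `cosine_similarity`, `most_similar`) are all made up for this example, words per sentence stands in for clauses per sentence, and a real system would use far more features and a trained model rather than plain cosine similarity.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny feature lists chosen for illustration only.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "i", "it", "but"]
SYMBOLS = [",", ";", ":", "!", "?", "-", "(", '"']


def style_vector(text):
    """Turn a text into a small vector of style features."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)

    # Relative frequency of certain (function) words.
    vec = [counts[w] / n_words for w in FUNCTION_WORDS]

    # Symbol usage, as occurrences per character.
    vec += [text.count(s) / n_chars for s in SYMBOLS]

    # Mean and variance of words per sentence, a crude stand-in for
    # "mean clauses per sentence, clause variance".
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(re.findall(r"[a-z']+", s.lower())) for s in sentences] or [0]
    mean = sum(lengths) / len(lengths)
    var = sum((x - mean) ** 2 for x in lengths) / len(lengths)

    # Arbitrary scaling (an assumption) so these two features do not
    # dominate the frequency features.
    vec += [mean / 100, var / 10000]
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(unknown_text, candidates):
    """Return the name in `candidates` (name -> sample text) whose
    style vector is closest to that of `unknown_text`."""
    unknown = style_vector(unknown_text)
    return max(
        candidates,
        key=lambda name: cosine_similarity(unknown, style_vector(candidates[name])),
    )
```

You would call something like `most_similar(anonymous_post, {"alice": alice_posts, "bob": bob_posts})`, where each value is a chunk of text known to be written by that person. With samples this small and features this crude the result means very little; the point is only to show that the listed traits are easy to measure mechanically.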
> I do not have direct ML experience at the moment. I’ve only run simpler regressions. I was making an educated guess as to the tech which may emerge, and it doesn’t seem far-fetched since it probably would not require language comprehension or higher order reasoning. I would love someone who’s worked with ML to weigh in.
Aha. Indeed. I would prefer to have someone actively working in ML to weigh in as well.
It’s called stylometry and it’s a very real thing, and it can be done with very high accuracy. There are academic papers on it (just search Google Scholar for stylometry). Do not expect things you post under another username to never be linked to you. There’s just no publicly available tool to run it.
There's no reasonable way to opt out of information that's already out there. Connecting the dots is power that powerful people already have; this is democratizing that power.
Most of the people that might stalk, harass, or threaten you aren’t in traditional seats of power. This tool at the very least makes the ability to connect the dots more salient to bad actors, and allows them to do so with industrial efficiency.
Phone companies used to print hard copies of the white pages and hand-deliver them to everyone’s house, and the biggest concern anyone ever had about them was the volume of waste they generated.
Conversely, new bosses and landlords who want to avoid choosing someone who is objectively a bad choice for everyone else involved (they say they're a full-time carpenter but they're really a weed dealer/bassist in a progressive death metal band, etc.).
Not saying this is a net positive for society by any means, but just another example of a potentially legitimate use for doing online research about someone.
Sorry, I don't understand how someone's choice of profession reflects on their ability to be a good tenant. Definitely not a reasonable excuse for snooping on someone's supposed Internet history. This is what leads to first come first serve regulations for rental applications.
Well, I did have specific reasons for those examples (besides comedy value). A weed dealer tends to bring sketchy and dangerous people to the property. (If you don't buy that, try "heroin dealer" instead.) And I also stipulated that they lied about it. These are all things that happen all the time.
And then as far as "bass player," that's just a proxy for low income; I also picked a hyper-niche genre that even most metal listeners don't enjoy. If he can prove income via sending copies of contracts with music venues or something, more power to him.
For good or bad, this is all stuff that could plausibly be used as an accurate source of information about someone's ability to pay and to not be a problem to the other tenants--especially the lying part.