
> Train it on content that you know was produced by someone, then unleash it on the rest of the internet to find content that they wrote anonymously.

I apologize, but that's disappointingly vague.




I do not have direct ML experience at the moment. I’ve only run simpler regressions. I was making an educated guess as to the tech which may emerge, and it doesn’t seem far-fetched since it probably would not require language comprehension or higher order reasoning. I would love someone who’s worked with ML to weigh in.

The way I see such a thing working is that you train it to identify what makes your writing unique. Everyone has a highly idiosyncratic writing style, akin to a thumbprint.

An ML model optimizing for uniqueness can identify:

- Relative frequency of certain words

- Diction

- Written tics

- Distance between certain words that the author tends to cluster together

- Mean clauses per sentence, clause variance

- Symbol usage

- Affect

- Interests

and abstract patterns that we haven’t even recognized yet. You can limit the search space at first by pointing the algorithm at certain websites and sub-sites that you’re fairly certain the person uses, but eventually I think even that will not be necessary.
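To make a few of the items above concrete, here is a minimal sketch of turning a text into countable features: relative word frequencies, symbol usage, and mean words per sentence. The `FUNCTION_WORDS` list, the symbol list, and the `style_features` helper are all assumptions made for illustration, not something taken from an existing tool, and the sentence splitting is deliberately crude.

```python
import re
from collections import Counter

# Hypothetical short list of common words; a real study would use far more.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "i"]
# Hypothetical set of symbols to track.
SYMBOLS = [",", ";", "!", "-"]

def style_features(text):
    """Return a dict of simple style features for one text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Crude sentence split on ., ! and ? (assumption: good enough for a sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input

    # Relative frequency of each tracked word.
    features = {f"freq_{w}": counts[w] / total for w in FUNCTION_WORDS}
    # Symbol usage, expressed as occurrences per word.
    for sym in SYMBOLS:
        features[f"sym_{sym}"] = text.count(sym) / total
    # Mean sentence length in words, a rough stand-in for clause counts.
    features["mean_words_per_sentence"] = len(words) / (len(sentences) or 1)
    return features

sample = "I think it is fine. It is, in fact, the best of the lot!"
feats = style_features(sample)
```

Each text becomes a fixed-length set of numbers, which is what any downstream model would consume. Features like affect or interests would need considerably more than counting and are not attempted here.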


> I do not have direct ML experience at the moment. I’ve only run simpler regressions. I was making an educated guess as to the tech which may emerge, and it doesn’t seem far-fetched since it probably would not require language comprehension or higher order reasoning. I would love someone who’s worked with ML to weigh in.

Aha. Indeed. I would prefer to have someone actively working in ML to weigh in as well.


It’s called stylometry and it’s a very real thing, and it can be done with very high accuracy. There are academic papers on it (just search Google Scholar for stylometry). Do not expect things you post under another username to never be linked to you. There’s just no publicly available tool to run it.


OK: it's fairly trivial for someone like me to do this, and you don't even need "ML" to do it. It's just counting, binary trees and simple models.
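As a rough illustration of the "just counting" part, the sketch below builds a word-frequency profile for each text and compares profiles with cosine similarity. The `MARKER_WORDS` list, the helper names, and the toy sentences are assumptions invented for the example; this is not the commenter's actual method, and texts this short say nothing about real-world accuracy.

```python
import math
import re
from collections import Counter

# Hypothetical list of marker words to count.
MARKER_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "but"]

def profile(text):
    """Relative frequency of each marker word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in MARKER_WORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy texts, made up for illustration only.
known = "It is the end of the road, but it is the start of a trip."
candidate_1 = "It is the top of the hill, but it is the edge of a cliff."
candidate_2 = "And to that, and to a friend, and to that in a town."

score_1 = cosine(profile(known), profile(candidate_1))
score_2 = cosine(profile(known), profile(candidate_2))
```

Here `candidate_1` uses the marker words in the same proportions as `known`, so it scores much higher than `candidate_2`. A real comparison would need far longer samples and many more features.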


> OK: it's fairly trivial for someone like me to do this, and you don't even need "ML" to do it. It's just counting, binary trees and simple models.

Would you be interested in PoC'ing it out for such a trivial project?


Absolutely not. Just because something is easy to do doesn't make it a good idea.



