Using ChatGPT for a "neutral and expert question" is really broken: it gives the public the impression that ChatGPT output can be trusted, when it's so often wrong...
That's really dangerous (beyond the immediate stupidity of asking such a question of an LLM) because it may give the impression that ChatGPT's answers are trustworthy and neutral... They are neither.
It looks more like retaliation against anonymous whistleblowers:
> EJMR has been used to expose multiple counts of plagiarism, corruption and serious professional misconduct that would likely not have been shared for fear of retaliation by higher-ups or colleagues. Indeed, one of the co-authors of the paper had their own likely plagiarism exposed by anonymous EJMR users, calling into question the motivation for the study.
no, the research was something-something-humanities-economics-whatever.
In doing the research they - random academics, not some nation state or criminal organisation - were able to find that the posts were not anonymous and to extract the locations of posters, so it was not hard. They clearly informed the forum owner, because the forum owner fixed the issue (though notably did not tell anyone that their posts had not been anonymous).
Now new users of ejmr can hopefully rely on actual anonymity, and existing users can know that in principle their posts could be partially de-anonymized (the IP is the external one, and most of the organizations being described put you behind some kind of NAT, so full identification is at best questionable imo).
If you find a security vulnerability, I don't think the right course of action is to spend thousands of dollars of GPU time to determine as many IPs as you possibly can, then write an economics paper about it.
The whole point is they aren't security researchers - they were doing research on the nature of posts on this forum. They worked out that they could do that, and so did - for the paper they wanted to publish, having that information was the goal, and how they got it is essentially the methods section.
Certainly the attack itself is not worth publishing: it's not in any way novel or interesting; the "anonymization" ejmr did was fundamentally broken from, presumably, day 1. Nothing the authors did here was new, novel, or complex - the only change is that the cost of reversing it has dropped from "a large organisation" to "a single PI's budget for a single paper" over 12 years.
We need to be very clear here: there is no part of the ejmr "anonymization" scheme that was correct for what they were trying to do. They did not salt the hash, the hash algorithm they used was considered deprecated a decade prior to ejmr existing, even the hash family they used is inappropriate for this purpose.
The reason for public disclosure of vulnerabilities is that the victims of those vulnerabilities need to know that they have been victims, and they need to know what information has been leaked by ejmr. Based on the actions ejmr took to change their hashing schema, it's fairly clear ejmr found out about the vulnerability (maybe the researchers told them, maybe the researchers were not unique in discovering this). But we also know that ejmr did not inform any of its users that ejmr had been leaking information about them for 12 years.
Which is why it is necessary to publish this information - if this paper did not detail how terrible ejmr's "anonymization" was, it's pretty clear ejmr would not have told its users, and as the comments on HN and elsewhere indicate, plenty of people would believe that breaking ejmr's system was too hard for anyone else to do.
I'm tired of repeating this: ejmr was not anonymous, their attempt at anonymization was trivially broken from day 1, and defeating that anonymization is not remotely challenging - literally the only difficulty is the trade-off between how long you wait and how much money you spend.
> They worked out that they could do that, and so did - for the paper they wanted to publish, having that information was the goal
My claim is that they shouldn't have.
>Which is why it is necessary to publish this information - if this paper did not detail how terrible ejmr's "anonymization" was, it's pretty clear ejmr would not have told its users
I agree it's necessary to disclose the vulnerability to the victims (especially if ejmr wouldn't have), but it wasn't necessary to collect as much data as possible themselves and write a paper about it for their own gain.
Studying the disposition and demographics of forum posters is not new, nor is this a unique example. The only issue here is that the forum posters believed, based on incorrect claims from the forum, that they were anonymous. But their posts were not, and this is the first time it came up publicly, because this is the first time someone looked at this particular forum in the context of "I want to publish a paper about the demographics of this forum".
The forum users have the right to feel angry that their posts were not anonymous, but that anger should be directed at ejmr, not the academic who made it clear their posts were not.
The posts on ejmr were not fully anonymous, and nothing can change that - there are more than 10 years of posts, all of which are public, none of which are [fully] anonymous. It does not matter whether this academic collected any of the information, because in a hypothetical world where they didn't and simply disclosed that none of the last decade+ of forum posts are anonymous, anyone else could do exactly the same thing. That is assuming, of course, that no one has already done this in the past.
> I agree it's necessary to disclose the vulnerability to the victims (especially if ejmr wouldn't have), but it wasn't necessary to collect as much data as possible themselves and write a paper about it for their own gain.
What harm do you think writing a paper on forum demographics did? I am genuinely curious, because this seems like you're still just trying to find ways to blame the gross negligence of the ejmr folk on the authors of this paper.
Someone else is going to do it anyway. And you need a proof of concept: if you write "N thousand users may be exposed", you had better include some proof of that.
I think I wooshed some people with this comment. Gonzo journalism is when the journalist participates in the story. Gonzo research then is when the researcher participates in what they're studying. In this case, the researchers ostensibly researching toxicity are doxing people, which is toxic behavior. This makes them gonzo researchers.
That's a logical fallacy. Breathing must be toxic because dirtbags breathe.
AIUI, doxxing requires either publishing a subject's details or taking action against the subject. The authors of this paper say they have no intention of doing either. It's mild.
My dad has been saying for years, even before the internet took off, never write anything you don’t want the whole world to see. When the internet came along he reiterated the same message saying don’t write anything online you don’t want your friends, your boss, the police, a judge or anyone else to see. Anyone who goes online with the belief that they are anonymous and writes things that can seriously hurt their career or something that might put them in jail is foolish.
Indeed. Don't say, read, write, watch or do anything you wouldn't want your friends, boss, police, or judge to know... and now you are no longer your own person but rather an inoffensive amalgam of the beliefs of others.
Because people keep acting like these researchers have retroactively removed the anonymity of this forum, or as if somehow everything was anonymous before this was published, let's go over the facts:
1. ejmr made a system that includes hashes that could be trivially linked to ip addresses
2. ejmr claimed posts were anonymous
3. this researcher realized that the hashes could be trivially linked to ip addresses
4. the researcher presumably informed ejmr (as ejmr changed their scheme prior to publication)
5. the researcher published the findings
The posts made on the forum could be linked to ip addresses from step 1 onwards; if this series of events had stopped at step 2 or 3, the posts would still not be anonymous, and forum users would still believe that they were.
We know that at step 3 this researcher realized that the forum posts were not anonymous, we have no way of knowing how many other people may have also discovered this.
At step 4, we know ejmr changed their hashing scheme to actually make it [maybe] anonymous, and despite now knowing their existing scheme was not anonymous they did not inform any existing users that their posts were not anonymous.
At step 5 the people using these forums finally discovered that their posts were not actually anonymous, because they were never anonymous. People on that forum, and commenters on HN, act like the researcher was responsible for the technical failure of ejmr, and somehow the act of telling people that their posts were not anonymous is what actually removed anonymity.
Because people continue to struggle with this, let's imagine I made a forum where every post had an id that was computed as the first 10 characters of base64(rot13(ip || iso date)). A decade later someone goes "hang on, this looks like base 64", and then publishes their findings: you can get a post's IP address by decoding the truncated base 64 and reversing rot13.
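For concreteness, a minimal sketch of that hypothetical scheme and its reversal (Python; the function names are mine):

```python
import base64, codecs

def make_id(ip: str, iso_date: str) -> str:
    # first 10 characters of base64(rot13(ip || iso date))
    return base64.b64encode(
        codecs.encode(ip + iso_date, "rot13").encode()
    )[:10].decode()

def recover_ip_prefix(post_id: str) -> str:
    # base64 decodes in 4-character blocks, so 8 of the 10 id characters
    # yield 6 plaintext bytes: most of a dotted-quad IP already.
    raw = base64.b64decode(post_id[:8])
    return codecs.decode(raw.decode(), "rot13")

pid = make_id("203.0.113.7", "2023-08-01")
print(recover_ip_prefix(pid))  # "203.0." - enough to narrow down by hand
```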
Is that person responsible for de-anonymizing the users of my forum, or is it my fault for misrepresenting the anonymity of my forum?
> ejmr made a system that includes hashes that could trivially be linked to ip addresses
"trivially be linked" = searching 3 quadrillion possibilities?
Suppose that in the near future a quantum computer enables the "trivial" piercing of current anonymity assumptions, should those individuals also be fair game for doxxing: "they were never anonymous"?
Your casual appropriation of "triviality" to dismiss moral concerns over this paper and the authors' possible motives rings hollow to me.
> "trivially be linked" = searching 3 quadrillion possibilities?
Which is trivial. Doing the same thing many times is literally what computers were invented for. Whether it's 3 times or 3 quadrillion times, it does not matter.
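To put a number on it, a back-of-envelope sketch (the throughput figures are my assumptions, but they're the right order of magnitude for SHA-1 on modern GPUs):

```python
# Why "3 quadrillion possibilities" is trivial for a fast, unsalted hash.
search_space = 3e15          # candidate inputs to try
rate_per_gpu = 1e10          # assumed SHA-1 hashes/second on one GPU
gpus = 8                     # assumed small rented rig
hours = search_space / (rate_per_gpu * gpus) / 3600
print(f"~{hours:.0f} hours")  # roughly 10 hours of rented GPU time
```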
> Suppose that in the near future a quantum computer enables the "trivial" piercing of current anonymity assumptions, should those individuals also be fair game for doxxing: "they were never anonymous"?
There are myriad ways to have provable anonymity; quantum computers are not magic. Moreover, the best known algorithm for this kind of deanonymization under QC is still Grover's search, which is a square-root improvement, rather than anything catastrophic like Shor's. But that's also irrelevant.
ejmr's "anonymization" was not anonymous under the standard cryptographic assumptions of 20 years ago, let alone 12 years ago when the software originated.
To be clear, when ejmr was first started:
* SHA-1 was mostly cryptographically broken (that is, it was considered that a sufficiently determined adversary with enough money could break it), hence any new use of SHA-1 was definitionally wrong.
* SHA is the wrong family anyway: SHA hashes are designed for message integrity and are therefore intentionally extremely fast to compute. It was well established in the _90s_ that fast integrity hashes are not appropriate for protecting guessable secrets, alongside numerous demonstrations of breaking password hashes, which is essentially what ejmr's ids were.
* ejmr was not salting anything, and literally anyone with actual experience in any field that actually uses hashes knows that salting is mandatory (see the sketch after this list).
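To make the contrast concrete, here is a minimal sketch - my own illustration of the two shapes, not ejmr's actual code:

```python
import hashlib, os

ip = "203.0.113.7"  # example input; the real scheme also mixed in a topic id

# The shape of what ejmr reportedly did: a fast, unsalted digest.
# An attacker can sweep the whole IPv4 space at billions of guesses/second.
weak = hashlib.sha1(ip.encode()).hexdigest()

# The shape of a password-style KDF: salted and deliberately slow
# (the scrypt parameters are illustrative). This still doesn't make a
# 2^32 input space safe, but it multiplies the attacker's per-target cost
# enormously; a keyed construction is better still.
salt = os.urandom(16)
strong = hashlib.scrypt(ip.encode(), salt=salt, n=2**14, r=8, p=1)
```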
This isn't "this was anonymous until computers got faster"; this was not anonymous at the time it was first written, under standard cryptographic assumptions. Let's say it cost this PI $10k to compute those hashes; then 12 years ago, assuming Moore's law (a doubling every 18 months, so roughly a 256x cost factor over 12 years), it would have cost about $2.5 million to break - which I've doubled to $5 million to be conservative.
That. is. broken.
> Your casual appropriation of "triviality" to dismiss moral concerns over this paper and the authors' possible motives rings hollow to me.
No. My claims are purely about the assertion that the authors of this paper are responsible for deanonymizing people on ejmr, when it was ejmr that catastrophically failed and misled its users.
Your immediate response to my statement about triviality was to repeat "it's a big number", which betrays a gross misunderstanding of the field. Anything involving hashing or cryptography is filled with giant numbers. A non-trivial attack is one that involves doing something clever to reduce the search space to make the attack possible. This attack was _literally_ "we just tried every option as fast as possible". That kind of attack on misused hashing operations was identified in the 90s, when people demonstrated the breaking of password hashes.
This attack is not clever. It does not - afaict - do anything that in any way reduces the complexity below "try every option"; it is a dumb solution to the incompetent "anonymization" performed by ejmr. That "try every option" was even an option speaks to how poor the ejmr code was, and how trivial this was.
As for the "morality" of the paper: there are endless "studies" of forum culture and demographics that haven't caused problems.
The only problem I see is that ejmr is refusing to acknowledge that they rolled their own crypto, and predictably got it wrong. That and people like you who seem to believe this mediocre research paper is somehow responsible.
No. Black hat vs white hat is "did you break this and then use it to <do something illegal>".
The responsible vs. irresponsible disclosure question is "do you tell the responsible party ahead of time and give them time to repair it". From articles it certainly appears that ejmr learned how broken their code was prior to this paper being published.
But responsible vs irresponsible disclosure is not a question of "should this be disclosed at all?", to which the security community as a whole seems to have determined that the answer is "yes".
The problem is that ejmr was not anonymous, and if you publish something that is not anonymous, it is forever not anonymous.
The only option would be to not disclose that there was any problem, not notify people that their posts were not anonymous, and not publish this paper (the actual "research" about where posters lived/worked?) either. Because any acknowledgement or indication that you could get from id to IP on any forum would cause people to go "huh, how did they do that?" and Streisand-effect your way to everyone knowing.
This is of course assuming that no one else interested in commenter identities has ever looked at ejmr either, because these researchers did not do anything clever to break the scheme.
> Black hat vs white hat is "did you break this and then use it to <do something illegal>".
That is a very narrow interpretation of "black hat". I think the mainstream take is that black hat includes many legal but ethically dubious actions. Maybe you would call it "grey hat", I don't know. But publishing a vulnerability without responsible disclosure can be considered unethical.
> But responsible vs irresponsible disclosure is not a question of "should this be disclosed at all?", to which the security community as a whole seems to have determined that the answer is "yes".
Yes, I don't know if you misread but by 'responsible disclosure' I meant 'tell ejmr about this before publishing'.
> The only option
No. If they were informed about this issue, after changing the scheme EJMR could take down all preexisting posts made with the old scheme and request public archives to remove them (and reindex new ones). It's not foolproof because many posts may happen to be archived independently, but it would be something. And of course notify users.
> But publishing a vulnerability without responsible disclosure can be considered unethical.
Yes, there is debate on that, and there are arguments on either side. But given ejmr went 12 years without changing their "anonymization" scheme, and then changed it a short time prior to an article being published that demonstrated the scheme was broken, I think it's reasonable to presume ejmr was notified prior to publication, and had time to correct the flaw, which is the canonical example of responsible disclosure.
That ejmr did not tell its users is an example of the behavior that the anti-responsible disclosure folk point to. Organizations that say "you should tell us about vulnerabilities in our products, but you cannot tell our users, and neither will we" are a large part of the reason some people oppose responsible disclosure.
> No. If they were informed about this issue, after changing the scheme EJMR could take down all preexisting posts made with the old scheme and request public archives to remove them (and reindex new ones).
There are multiple existing libraries online to support scraping ejmr specifically, as well as who knows how many archives and search-engine caches.
Every person who posted needs to be made aware that their posts could be tracked at least to the IP (though at any institution you're behind a NAT, so generally IP != person, and the idea of ISPs still having per-hour IP<->user logs from a decade ago seems suspect).
Also, we know that ejmr found out about the gaping hole somehow - we don't know exactly how, we just know they addressed the incompetence, though I assume they're still doing it wrong - and they didn't even pull and re-index their own archive, let alone ask anyone else to do so.
> It's not foolproof because many posts may happen to be archived independently but it would be something.
Either you're anonymous or you're not, so you can't just say "we doubt there are any other archives so you're safe". The user IPs are not secret, as they were never secret.
We also have no way to know if anyone else had already done this, and we likely never will.
You are never ever fully anonymous writing online. It all depends on how difficult it is and how determined the threat is.
Everything you post can be correlated with your other writing and online activity (even if you fake the style), your ISP can be subpoenaed, and Tor nodes can be compromised. But most people are sorta anonymous because no one bothers to go through the trouble.
Snippets of posts with IP addresses at Harvard, Stanford, Yale, University of Chicago, and the National Bureau of Economic Research headquarters include: "Rapefugees Welcome!!!!! - Merkel"; "bietches are fugly"; and "Its about ching chong taking bubba's job and bubba putting on a white pointy hood in response."
Except that isn't what's happened here. What's been published in this case, to use your example, is akin to someone saying "If you use this particular format in john, you can identify 4chan users." Nobody who knows what they're talking about has ever considered anything like that egregious; what's egregious is the idiotic self-rolled crypto that these researchers have exposed by doing so.
> Do you or do you not think that it should be acceptable to use language like "d4mn j3ws" in an academic forum?
Of course it's not acceptable. It's not acceptable for any academic to disagree with me period. All those posters need to be rooted out, fired, and blacklisted.
Sorry if this sounds harsh, but you need to have a little more empathy for those who aren't part of your "white boys club." Don't you think women and minorities need to know if their colleagues are posting their horribly sexist and racist thoughts online? Read the examples in the paper and then tell me that the colleagues of these people don't have a right to know.
Well, if you don't mind being harsh, I'll tell you this: in your woke-scolding, you are assuming both the gender and race of the person you are replying to, probably based on a single comment.
People have always been fired/distanced/exiled for antisocial behavior. Not to mention that employees are terminated for far less justified reasons all the time. This isn't new; it's as old as time. Bigots can go mask-off when they're not on my payroll.
It's at best parroting the most common layman beliefs, which may not agree with the law. That stood out to me too. Now I want to read about what dumb thing the site did to even make this possible. IDs should have been random, not hashes of PII.
The topic_id would be shared by everyone who posted on it. For each topic_id, it is then a matter of hashing 4 billion IPs to match each post to the topic. A different salt applied to each user would mean repeating those 4 billion hashes for each user's post to a topic (topic_id+IP+salt).
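A minimal sketch of that brute force (the exact input format and which digest characters were displayed are my assumptions, not the paper's precise scheme):

```python
import hashlib
from ipaddress import IPv4Address

def candidate_ips(topic_id: str, snippet: str, offset: int = 0) -> list[str]:
    """Return every IPv4 address whose truncated hash matches a post's
    visible snippet. Pure Python is far too slow for 2^32 inputs; the
    real attack would run on GPUs, but the logic is just this."""
    matches = []
    for n in range(2**32):
        ip = str(IPv4Address(n))
        digest = hashlib.sha1(f"{topic_id}{ip}".encode()).hexdigest()
        if digest[offset:offset + len(snippet)] == snippet:
            matches.append(ip)
    return matches
```

With a 4-hex-character snippet this returns on the order of 2^32 / 16^4 ≈ 65k candidate IPs per post, which is where the cross-topic filtering discussed below comes in.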
I would consider the topic a salt - the problem is that the input is so small, just a 32-bit number, which makes the "password" (the user's IP) fast to break.
The sane solution would be to generate large random ids per (ip address, topic) pair, and burn the mapping after some time.
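As a minimal sketch of that idea (the storage and TTL choices are mine):

```python
import secrets, time

TTL = 30 * 24 * 3600  # assumed retention window, in seconds
_mapping: dict[tuple[str, str], tuple[str, float]] = {}

def poster_id(ip: str, topic_id: str) -> str:
    """Large random id per (ip, topic); the mapping is the only link back
    to the IP, and it gets burned once the TTL expires."""
    key = (ip, topic_id)
    entry = _mapping.get(key)
    if entry is None or time.time() - entry[1] > TTL:
        entry = (secrets.token_hex(8), time.time())  # reveals nothing about ip
        _mapping[key] = entry
    return entry[0]
```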
This is a weird use case (deliberately making the hash public), and the usual concept of a salt doesn't quite fit here. Any kind of server-side secret would have effectively stopped this attack, even if it was the same in every hash.
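For instance, keying the digest would have been enough - a sketch, with an assumed server-side key:

```python
import hashlib, hmac

SERVER_KEY = b"load-me-from-server-config"  # assumed: never published

def poster_id(ip: str, topic_id: str) -> str:
    # Same public truncated-digest design, but without SERVER_KEY an
    # attacker can no longer enumerate topic_id+IP candidates offline.
    mac = hmac.new(SERVER_KEY, f"{topic_id}|{ip}".encode(), hashlib.sha256)
    return mac.hexdigest()[:10]
```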
Assuming even distribution over four hex characters: 16^4 = 65k possible snippet values, which still leaves 2^32 / 65k ≈ 65k potential IP collisions per snippet. From my quick skim of the paper, the authors made some assumptions about posting tendencies (frequent posters are likely to comment on multiple topics) and looked for enriched patterns of IP addresses: an IP address assigned to multiple topics within a short timeframe is more likely to be real. As a control set, they took a different four characters of the hash output (e.g. the true set sampled characters [10:14], the false set took [11:15]) and used that as their statistical threshold.
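As I read it, the enrichment step amounts to something like this (a sketch of my reading, not the authors' code; the threshold is illustrative):

```python
from collections import defaultdict

def likely_real_ips(candidates: dict[str, set[str]], min_topics: int = 3) -> set[str]:
    """candidates maps topic_id -> set of IPs surviving the brute force.
    A single 4-hex match is 1-in-65k noise, but an IP that recurs across
    several topics in a short window is unlikely to be a chance collision."""
    counts: dict[str, int] = defaultdict(int)
    for ips in candidates.values():
        for ip in ips:
            counts[ip] += 1
    return {ip for ip, n in counts.items() if n >= min_topics}
```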
The paper explicitly claims that it doesn't identify users, merely links posts to IP addresses which belong to autonomous systems that are owned by universities or other elite institutions.
Good points; however, some universities, at least in the US, that I've bothered to poke around at give clients public IP addresses. It blows my mind - it's unsafe and wasteful - but they do it. This may have changed, or been unique to the ones I found while looking around for open RDP, not believing how dumb people were to expose it to the Internet.
No, it's more than just words. Would you want to participate in a profession where roughly 50% of your colleagues go behind your back to spew outright hatred at your ethnic group? Do you think you have a right to know about such behavior?
Here's a direct quote the paper has from EJMR: "And America lost its war against blks. [...] At least until we resolve to final solution"
Do you have any empathy at all for those on the other side of this?
this is rly awful "research". ederer prob got bullied on there and got mad, cause he has a reputation as a bitch.
more importantly, ejmr has been important in uncovering multiple cases of research fraud (including one of the literal damn authors, this is some vindictive ass bs) and is the best source for actual unfiltered info on opinions of econ departments. includes important info abt people's political biases, potential toxic departmental cultures (or rather, which are more or less toxic, this being econ), and even info on sexual harassment allegations that depts would rather cover up. chilling effect of stripping anonymity is awful.
> this is rly awful "research". ederer prob got bullied on there and got mad, cause he has a reputation as a bitch.
No. Also, name-calling while commenting on something you clearly don't understand is not a good look.
EJMR claimed to be anonymous. It was not, and what they were doing skipped the most absolutely trivial of steps for actual anonymization.
The only difference between this week and last week is that now people know their IP addresses were leaked, rather than believing they were anonymous. I would argue it is beneficial for people to know that anyone could have done exactly what this researcher did, and we would have had no way of knowing.
Blaming the person who found out how terrible EJMR's "anonymization" was, is classic shooting the messenger.
It's also a really good example of why we say "don't roll your own crypto". Any person who specializes in cryptography (or hopefully anyone who has done a basic intro to cryptography course) should have been able to point out the issues.
> more importantly, ejmr has been important in uncovering multiple cases of research fraud and is the best source for actual unfiltered info on opinions of econ departments. chilling effect of stripping anonymity is awful.
This sounds like the kind of thing where people need anonymity, it's a good thing that this research has demonstrated that ejmr was not providing such. Again, this research has not "stripped anonymity", there was none to begin with.
>Blaming the person who found out how terrible EJMR's "anonymization" was, is classic shooting the messenger.
Found out! They had an enemy: a small forum that they did not control. They looked for ways to screw it. This isn't some good-natured happenstance; they targeted someone they didn't like so they could screw them. The result, the point, wasn't "Hey, security is important, kids, let me highlight your errors"; it was "Hey, you goddamn blasphemers, you have trod upon my fickle religious beliefs, so with the institutional and state power vested in me I will screw you."
So you're saying it's good that the obviously vindictive "researcher" targeted them for personal reasons, because he dislikes the political/religious opinions displayed on their casual rumors forum. "It was a public service," he claims! I understand that you probably want to white-knight for your team, but perhaps take a moment to realize how ghoulish your disingenuous equivocation is.
Dude, you're hiding behind an explicitly anonymous account throwing random personal attacks at people.
I literally had not heard about ejmr until this week.
Direct your anger at ejmr, they're the people who made bogus claims about anonymity while using tools they lack the most basic understanding of.
It also does not matter if it was some kind of personal "I hate this forum" or "I hate the creator". ejmr's anonymization was incompetently written, and screwed up the most basic usage of the most basic cryptographic primitives, and was using the wrong primitives in the first place.
The fact that we're hearing about this in a paper by a person you have declared to be on a vendetta is irrelevant - given that this person is explicitly not a cryptography specialist and was still able to find that the ejmr posts were not anonymous, the idea that no one else could have done the same (just without publishing an academic paper) is implausible.
As I have said elsewhere, ejmr's "anonymization" was so broken that even the attack itself was trivial (the article's author is an academic and would have absolutely made a separate publication on the deanonymization process if they could have).
then i think that's why you're so cavalier, you don't understand the area you're discussing. ejmr is an actually important site in econ, and by cracking and holding that info rather than just reporting a bug, ederer is creating and maintaining a chilling effect because "wahh, he doesn't like what people say about him on there"
imagine i find a sql injection vuln in a site. i have 2 options:
1. report it like a good person
2. exploit it and dump the whole damn list of hashes
1 is research. 2 is blackhat shit. i agree the anonymization was bad, i agree rolling your own crypto is dumb; i'm arguing that by addressing it the way he did, ederer is (consciously) attempting to break the valid role of anonymity and introduce a chilling effect.
there is a big difference between reporting a bug and using a rack of a100s to crack and hold info, with the subtle undercurrent that it could be released.
there's an obvious conflict here. ederer et al. don't like ejmr, so instead of looking to actually help, they went after something totally outside their usual just to be dicks about it.