I read this story in the most chilling manner: the same tactics used to perform this analysis will eventually be used to identify people who post anonymously now but who might say something in the future that can be linked to them by the same type of analysis. To phrase it another way, we have come to a point where your very prose is a digital signature.
Actually, in the article the author says he got the impression it might be more than one person (but his sources do not confirm or deny it), which, if true, probably means the NSA found the writings to belong to several people.
Sorry if my original comment seemed to imply otherwise.
I keep forgetting the general public is not aware of these things.
Data is like nuclear waste. Everything you do online leaves a pattern of behavior that is unique to you. Your only saving grace is no one cares about you specifically, until they do.
It was already known that a simple Markov chain had been used to detect that another author had written a chapter inside a book. It was in 2003, I think; unfortunately I cannot find a reference for this. I mention it just to say that Markov chains are a very basic and old ML method that is quite efficient for this kind of task.
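For anyone curious what that looks like in practice, here is a minimal sketch of the general idea, not the method from the study I half remember. It assumes a character-level bigram Markov chain per candidate author, add-one smoothing, and a guessed alphabet size of 64; the function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count character-to-character transitions in a training text."""
    text = text.lower()
    transitions = defaultdict(Counter)
    for prev, curr in zip(text, text[1:]):
        transitions[prev][curr] += 1
    return transitions


def avg_log_likelihood(model, text, alphabet_size=64):
    """Average log-probability per transition under the model.

    Add-one smoothing keeps unseen transitions from scoring zero.
    alphabet_size is an assumed constant, so the values are only
    meaningful for comparing models, not as exact probabilities.
    """
    text = text.lower()
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return float("-inf")
    total = 0.0
    for prev, curr in pairs:
        row = model.get(prev, Counter())
        total += math.log((row[curr] + 1) / (sum(row.values()) + alphabet_size))
    return total / len(pairs)


def most_likely_author(models, passage):
    """Return the name of the model that scores the passage highest."""
    return max(models, key=lambda name: avg_log_likelihood(models[name], passage))
```

To look for a chapter by a different hand, you would train one model per candidate author and score each chapter separately. A chapter that scores noticeably better under someone else's model is the one to look at more closely.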
In 2002 a Dutch newspaper discovered newcomer Marek van der Jagt to be Arnon Grunberg, a famous Dutch writer. They sent some blind samples of various writers to a research centre in Italy, who matched the two.
This sort of analysis is older than Tolkien. There are pretty substantial processing requirements to do it at scale and it's pretty inaccurate. In the future, people who say controversial things will use short, sentence-long statements to render this sort of analysis useless.
There are rephrasing services available, presumably for helping users plagiarise. Some are laughably bad but possibly helpful, while others are quite good.
Eg:
https://quillbot.com/app
Isn't this the reason grep was created? It was used to determine which parts of the Federalist Papers were written by which author.[0]
Considering this occurred in 1974, I can only imagine that techniques for de-anonymizing authors have gotten much better due to how much written text individuals post on social media sites, like HN. Uh oh.
It's already a thing; see how the FBI caught the Silk Road admin. Although not in the automated fashion that you suggest, I am pretty sure the algos are already in use.
But the whole cat and mouse game hasn't really started yet. Once people find out what the algo looks at, they can try to game it. E.g. if you know it looks for the same phrases, like "first of all", you can stop using them. Or if it looks at specific errors, you can sprinkle them in one text but not another.
Wasn't that how they caught the Unabomber? I saw a documentary about the guy who caught him by using this sort of analysis, although his method was quite analog (scanning through written letters and Unabomber's correspondences to the press).
The Unabomber was turned in by his brother, who recognized some unique phrasing in the released manifesto and reported it to authorities.
EDIT:
After looking, it looks like the brother turned him in after having this sort of analysis done to confirm his suspicions. So it does look like there's a tie to the story, but the analysis wasn't done blindly in that case.
Has the idea that Beowulf started as an oral tradition fallen out of favor? There is no mention of that in the article and that would seem to be an obvious flaw in the study if it truly was originally told orally. There is certainly less freedom when transcribing structured verse like Beowulf compared to prose, but the transcriber is still going to have an impact. A single transcriber could therefore provide a stabilizing voice that helps mesh together a story that was originally built piece by piece.
That was more or less the point of Tolkien's lecture "The Monsters and the Critics". He compared it to building a tower out of rocks that came from a historical ruin. Everybody was interested in taking it apart to see where the rocks came from, but he felt it had value in considering it as a single creative work in its own right.
It's also the kind of idea that Tolkien was in a unique position to entertain. If one man can create an entire mythology spanning The LOTR, The Hobbit, and The Silmarillion, there is no reason to suppose that another man cannot achieve a similar feat with the Beowulf.
According to the article, doubts about single authorship began to be raised in the 19th century. This was a time when a lot of people thought that they were living at the pinnacle of history, and that cultures of the distant past must have been strictly inferior to the modern one. Troy could not have possibly existed; no single person in such a barbaric age could possibly have produced great poems like the Homeric epics -- or Beowulf.
Well, we found Troy. We also found evidence of great devastation at Troy right around the time when the Homeric war supposedly took place. It seems that people of the distant past did possess the ability to tell a great story after all, moving freely between history and mythology, filled with allegory and philosophical depth. Just like Tolkien did, but hundreds or even thousands of years earlier.
> Has the idea that Beowulf started as an oral tradition fallen out of favor?
It's hard to imagine that that idea could have fallen out of favor, given the incredibly sparse written record of the Germanic tribes. They didn't produce enough literature to have a non-oral tradition.
Even if originally an oral tradition, the current version could still be the work of a single author. Many of Shakespeare's plays are built on pre-existing stories/histories, but the plays as they exist now are his.
We could say the same about the Homeric epics. One does not end up producing the canonical version of an oral tradition by blindly recording whatever lyrics are circulating at the time. A great poet(s?) rearranges, reinterprets, and infuses the story with a unique perspective that can be identified as his (or theirs), even thousands of years later.
"Across many of the proposed breaks in the poem, we see that these measures are homogeneous," said Krieger. "So as far as the actual text of Beowulf is concerned, it doesn't act as though there is supposed to be a major stylistic change at these breaks. The absence of major stylistic shifts is an argument for unity."
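A rough sketch of what "homogeneous across a break" could mean in code. This is not the researchers' actual procedure: it assumes a single yes/no feature per line (here, punctuation inside the line as a stand-in for an intraline sense-pause, which in practice depends on the editor's punctuation) and compares its rate before and after a proposed break with a two-proportion z-statistic. All names are hypothetical.

```python
import math

PAUSE_MARKS = ",;:.!?"


def has_intraline_pause(line):
    """True if punctuation appears before the end of the line."""
    body = line.rstrip().rstrip(PAUSE_MARKS)
    return any(mark in body for mark in PAUSE_MARKS)


def feature_rate(lines, has_feature):
    """Return (number of lines with the feature, total lines)."""
    hits = sum(1 for line in lines if has_feature(line))
    return hits, len(lines)


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z-statistic for the difference between two proportions."""
    if n_a == 0 or n_b == 0:
        raise ValueError("both sides of the break need at least one line")
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # feature is present everywhere or nowhere
    return (hits_a / n_a - hits_b / n_b) / se


def shift_at_break(lines, break_index, has_feature):
    """Compare the feature rate before and after a proposed break."""
    before, after = lines[:break_index], lines[break_index:]
    return two_proportion_z(*feature_rate(before, has_feature),
                            *feature_rate(after, has_feature))
```

A value near zero at a proposed break says the feature behaves the same on both sides, which is the kind of evidence for unity the quote describes. A large absolute value would point to a stylistic shift, at least for that one feature.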
I'm imagining this methodology applied across many other literary works. So many insights can be generated throughout the ages!
And have been! Careful attention to stylistic features has allowed us to get a pretty good idea of the chronological groupings of Plato's dialogues, for example, which helps us understand things like how his views evolved over time. That's a typical sort of use of stylometry.
"Like Beowulf, the Greek epics Iliad and Odyssey have also generated much debate about their authorship and composition. Conventionally attributed to a single author—Homer—both works nevertheless clearly originate in a long oral tradition and show signs of considerable evolution in the course of their transmission history, including the possible influence of written versions[37,38]. Since the two Homeric epics have numerous features in common, we hypothesized that they might also have a similar pattern of sense-pauses. However, as shown in Fig. 2a, the Odyssey has a higher proportion of intraline sense-pauses relative to the Iliad. This difference suggests a slight change of compositional practice between the two Greek poems, whether due to a single poet’s stylistic evolution or natural variation across the oral tradition."
Homer's a bit of an unusual case. The Iliad was the first written work produced after a long dark age (or close to it; I'm not sure where the consensus is right now on whether Hesiod came earlier), so Homer was drawing on a few centuries of pent-up oral tradition from a culture that had itinerant historian-poets. As such, he probably didn't compose all his own verses but could well have been the first to write them down. I'm not sure how effective the technique referred to in the article (stylometry) would be at teasing apart the distinction between composer-of-verse and author-of-lines.
What about Shakespeare? Isn't his authorship one of the most extensively researched? The article quickly mentions that an older analysis mistakenly attributed someone's poem to Shakespeare. To me, that only adds to the mystery of the authorship question.
Why? There's no point. If the results come back indicating multiple authors, great. If the results come back indicating a single author, you could just make the argument that it is the result of the book having a single translator.
Academic pursuits should have more meaning than trying to dunk on other people's belief systems when those belief systems are fairly harmless.
Seamus Heaney’s translation of Beowulf two decades ago got quite a bit of attention in mainstream magazines and newspapers. Plus, 2007 saw a film adaptation of Beowulf directed by Robert Zemeckis and written by Neil Gaiman and Roger Avary. There is public interest in this poem as a classic of English literature even when Tolkien is not involved.
Reading a chunk of Beowulf in the original was, along with learning Maxwell's equations and reading Gibbon, one of the best things I ever did for myself. It's ... fairly obvious it's one person. The latter half with the dragon could have been an add-on.
Heaney's translation is pretty fun to read; I don't know what you're talking about with this "bloody awful" stuff. It's a poet's translation for sure, and he admittedly takes license, but "bloody awful" feels like a stretch.
Heaney's translation felt like he took Tolkien's "The Monsters and the Critics" to heart with his understanding, and clear love of the source.
I don't care if he loves the source; it was an awful translation and it gives me the heebie jeebies like nails on a chalkboard. Just picking it up gives me the creeps; vandalism.
Rebsamen is closer to something like what was actually written.
Your Rebsamen comment is true - everything else you say makes you sound like those people who go around telling others they can't possibly appreciate or enjoy sushi if they've never been to Japan.
You may want to try dislodging the stick from your wart ridden anus.
Man, you're really butthurt; are you a relative of Heaney's? One of the terrible diseases of our time, along with things like never reading the Anglo-Saxon classics (I learned OE and read Beowulf in the original because both my parents did so in high school, as, apparently, did everyone in the US at one point in time) is not calling out terrible things as terrible, and ascribing importance to people the media has declared as "great." Heaney's translation is bloody awful, and will be remembered as such for as long as anyone remembers what his name is.
>Plus, 2007 saw a film adaptation of Beowulf directed by Robert Zemeckis and written by Neil Gaiman and Roger Avary.
That film would likely not have happened if not for the success of the Lord of the Rings films, and most people likely never heard of Beowulf until then, unless they dimly remembered having to read it in class once.
And no translation of anything, much less any book without a big media tie in, gets anything close to "quite a bit of attention" in the mainstream press. Coverage in literature sections of the newspapers or dedicated literary sites are far from mainstream.
And this is an article in Ars Technica, which to HN may seem mainstream, but which is far from it for the masses. A quick Google of "Tolkein Beowulf single author" brings up little in the way of mainstream coverage, with the Ars article being on top.
Don't get me wrong, I love Beowulf and Seamus Heaney's translation is one of the few books I'll reread regularly, but elfakyn is correct. If Tolkien's name weren't involved, no one would be covering this at all, and really, almost no one is now.
Well, it does become difficult to separate the two, considering that Tolkien lectured about Beowulf, translated it in the 1920s (not published until this decade!), and spent decades working on ancient languages, philology and linguistics.
Nothing to do with LotR films, more to do with an intellectual giant well known in the field (who also wrote LotR). The books were far better anyway.
Neither is it Tolkien's fault Beowulf is considered the most significant work from Old English. Often discussed in the broadsheets I once read, not ever likely to reach those who read the Sun or the Mirror. Still doesn't stop it being a highly significant work (without Tolkien or Jackson).
The Beeb trot it out regularly - not buried in dusty literary sections that no one normal would encounter, which seems to be what you're driving at.
Mind it probably even reached down to tabloid readers from time to time. There was a fun Australian cartoon version, narrated by Peter Ustinov retelling from Grendel's point of view. Managed to become a bit of a cult classic in its day. There's been a couple of TV mini series. Probably a game and festival too for all I know!
> That film would likely not have happened if not for the success of the Lord of the Rings films
That particular film may not have been made, but it’s not hard to imagine an adaptation being made by someone even in the absence of the Lord of the Rings trilogy. Michael Crichton’s Eaters of the Dead, which riffs on the Beowulf story, got a film adaptation (as The 13th Warrior) in 1999. The Beowulf story isn’t The Dream of the Rood or other esoteric Old English literature; it has adventure elements that will attract ordinary audiences from time to time.
> Coverage in literature sections of the newspapers or dedicated literary sites are far from mainstream.
Literature sections of mainstream newspapers are mainstream reporting, even if many readers are going to skip over those columns. And are you seriously arguing that mags like e.g. The New Yorker or The New York Review of Books are not mainstream? Those may be bought by a certain demographic of bookish people, but those mags are sold at ordinary newsagents. They are not specialist journals.
>And are you seriously arguing that mags like e.g. The New Yorker or The New York Review of Books are not mainstream?
Maybe. Most people read neither nowadays. Unless my understanding of the definition of "mainstream" is flawed, that makes them essentially niche publications.
But that wasn't actually my argument. My argument is that most people don't care about literature beyond anything not tied into a popular media franchise, non-literary books or books by famous authors, and Beowulf is none of those things.
Whether "most people ... care about [it as] literature", I'm not sure, but most people in many school districts and universities were at least forced to read Beowulf as part of a standard curriculum, possibly more than once over the years. Isn't the question merely whether they would've cared enough to upvote or comment on the HN posting without seeing "Tolkien" in the headline? Beowulf is part of what one might call literary canon. What constitutes a literary canon is always going to be subject to debate, as it is ultimately subjective at some level. How one is to define "popular" media franchise, "literary" books, or "famous" authors can only pose an even greater challenge in forming any consensus.
Most people I know are familiar with The 13th Warrior, and know that it is a (reimagined) retelling of Beowulf by Michael Crichton. From Wikipedia:
> In an afterword in the novel Crichton gives a few comments on its origin. A good friend of Crichton's was giving a lecture on the "Bores of Literature". Included in his lecture was an argument on Beowulf and why it was simply uninteresting. Crichton stated his views that the story was not a bore and was, in fact, a very interesting work. The argument escalated until Crichton stated that he would prove to him that the story could be interesting if presented in the correct way.
To be fair, chances are you and your friends are not representative of the mainstream. Just being on Hacker News makes that unlikely.
Michael Crichton is a famous enough author that people are more likely than not to see a movie based on his work because it's a "Michael Crichton movie" and neither know nor care about the source material. To most people, the Beowulf movie is just a fantasy movie where Angelina Jolie plays a sexy demon, not the adaptation of Beowulf they've been waiting for years to see, the way people were waiting to see (or dreading to see) the Lord of the Rings.
Beowulf just isn't that significant or relevant in popular culture - it just isn't. I don't even know why this is controversial.
How interesting. In that case am I right in guessing that your coverage of the Medieval part of the canon was limited to Chaucer and didn't include anything else? I'm just curious how much things have changed.
Chaucer was covered in a sense, but in History rather than English. The class did not read him, except for one student who chose that as the focus of a class project.
I did have a high school English class covering (among other, non-medieval works) Sir Gawain and the Green Knight, and the story of Tristan and Iseult. Sir Gawain and the Green Knight was read in translation, but Tristan and Iseult was a fairly modern reimagining (set in the original period), with an author's introduction discussing how she chose to omit the magic that was present in the original because she thought it detracted from the agency of the characters.
That sounds really good, I don't think I ever read Tristan and Iseult.
I'd have slotted Sir Gawain and the Green Knight in with Beowulf in the "medieval" part of the literary canon but I could be off-base there. I remember reading Beowulf in high school but not the other. That might be a function of which one I found more interesting at the time, I'm not sure.
I agree that Sir Gawain and the Green Knight is "medieval". I meant to say that the English class covering it was not focused on a historical period, covering literature that was much more modern in the same year.
Beowulf is from around the 8th century; I guess that's technically "medieval" but I think of it as belonging to some nameless period that's older than "medieval". There's a huge difference between Old English of the 8th century and Middle English of the 14th.
I think the "medieval" terminology is a little dated anyhow. I guess Harold Bloom's categorizations and listings and so on are a lot more authoritative now (they sure pop on a google search) and it doesn't look like he uses the term. I have no real opinion on how much any of that matters.
Memory is unreliable, but I recall my high school class using a pretty good textbook that included Beowulf in both Old English and modern translation, and also the chapter of The Hobbit where Bard shoots the dragon, which stylistically invited some interesting comparisons. It was a pretty good lesson for a high school kid who was also a fan of Tolkien, back before that was something you could be without reading any books.
I don't understand what point you're trying to make. Are you trying to criticize the general public for not caring about Beowulf in "the right way" or are you trying to criticize the media for not caring about Beowulf in "the right way"? Or do you think this story should not have been reported at all?
> Are you trying to criticize the general public for not caring about Beowulf in "the right way" or are you trying to criticize the media for not caring about Beowulf in "the right way"? Or do you think this story should not have been reported at all?
I'm criticizing the premise that Beowulf is as well known as Tolkien's works in popular culture, or even that well known at all outside of niche literary circles, as counter to the claim that Tolkien's attachment to the story has no relevance to the degree of its coverage, which, itself, is limited to begin with.
Beowulf is required reading for many, many students. When I worked in book stores, students came in every new semester from all levels of schooling to buy either the Heaney or Raffel translations, so I think it's probably interesting to more people than you think.
I think this claim is true in its explicit sense. The Tolkien connection makes an interesting story even more interesting, which probably increases media attention by some non-zero value.
But it would be an interesting story for many of us without the Tolkien connection. Beowulf is an important artifact in the history of the language many of us are deeply attached to. And better than a potsherd, this artifact literally speaks to us from the distant past (literal if you consider writing of this sort to be a form of speech, as I do.) If the claim is implying that most of the coverage is due to the Tolkien angle, and it would have little to no coverage without it, I believe that to be incorrect. But I don't know if that is what was meant, and the explicit interpretation of the claim is probably correct.
And IIRC was largely responsible for getting people to read Beowulf in particular as literature. I mean to appreciate the work of art, as opposed to dissecting it as evidence about language etc.
It was my impression (not based on a lot) that Beowulf was sort of forced into the status of "great literature" by the fact that it is the only major work of Anglic literature at all, and English elites wanted something from their own native tradition (which, again, didn't really exist) to compete with the classical epics.
I've noticed the same, unfortunately. Lots more posts on anything that isn't coding or directly tech related that just say something like, "this shouldn't even be on HN."
At least they tend to get downvoted relatively fast.