I wonder if they've actually only replaced letters with X's. It seems like a poor decision to leave spaces in place, as people can attempt to guess the words based on their length. It's probably randomized, but why bother doing that instead of just "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"? Odd.
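To make the word-length concern concrete, here is a minimal Python sketch. It assumes the redaction really does swap each letter for an X and leave spaces alone, which is only a guess about what the document does. The function names, the example phrases, and the `width` parameter are all hypothetical.

```python
import re

def redact_keep_shape(text: str) -> str:
    """Replace every letter or digit with 'X', leaving spaces and punctuation."""
    return re.sub(r"[A-Za-z0-9]", "X", text)

def redact_fixed_block(text: str, width: int = 40) -> str:
    """Replace the whole span with a fixed-width block, hiding word lengths."""
    return "X" * width

def candidates_by_shape(redacted: str, phrases: list[str]) -> list[str]:
    """Return the guesses whose redacted shape matches the observed one."""
    return [p for p in phrases if redact_keep_shape(p) == redacted]

# Hypothetical secret and guesses, for illustration only.
secret = "Operation Blue Falcon"
shape = redact_keep_shape(secret)
guesses = ["Operation Blue Falcon", "Operation Red Sparrow", "Project Night Owl"]

print(shape)                               # word lengths 9, 4, 6 are still visible
print(candidates_by_shape(shape, guesses)) # only guesses with that shape survive
print(redact_fixed_block(secret))          # fixed block reveals nothing about length
```

With spaces preserved, the shape alone is enough to throw out any guess whose word lengths don't match. The fixed-width block gives an attacker nothing to filter on.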
I wonder if GPT3 could be used to infer redacted text, prompted by the surrounding article?
But for something like this, which is a semi-serious report, I kind of have the feeling they're adding some spurious redactions just for laughs.
> I wonder if GPT3 could be used to infer redacted text, prompted by the surrounding article?
Generally, I would assume that the part of the text with the most entropy (redacted nouns and their adjectives) would be especially hard for a system like GPT3 to generate. There might be some clues in the surrounding texts, but perhaps someone would need to train it specifically on similar (un-redacted) documents?
Also, the other way around: marking parts as viable for redaction and then having GPT3 generate text to fill in those gaps would be a really insidious way of redacting content. I'm guessing a GAN-type network could even generate visuals that match the document being redacted.
If the redacted part was somebody's name or a project codename, there's basically no way to infer that unless the system has had source text with that info. But I guess in the case where they are redacting something like a "method or procedure" that has been written about unredacted in other contexts and then its application in this context is classified, I think that would be possible. I'm sure people are already thinking about how to guard against this.
Your second paragraph is interesting. It's basically an automatic disinformation generator. I think there's almost more security value in releasing a document like that, with faked interpolated text standing in for redacted sections and a few blacked-out sections to add intrigue and credibility, than in doing a regular redacted document... but maybe there are benefits of the old way I don't understand. I'm not sure I fully followed what you were saying there, but I think I got the gist.