The work is more about running named entity recognition on the text and correlating the names with other data sources than about deducing the redacted words (which is probably impossible for the most interesting ones). For example, if flight XX1234 is mentioned in the text, he might be able to deduce that the plane is owned by some Russian oligarch.
I already have a self-curated list of tail numbers of the private aircraft owned by oligarchs (and other international parties of interest), and a whole bunch of ADS-B data. Combine that with a timeline of events (and, for some, locations), and there are some…interesting correlations.
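A minimal sketch of that kind of correlation, under loud assumptions: a plain regex stands in for the NER step (a real pipeline would more likely use a spaCy component), and the watchlist, sightings, events, and identifier format are all hypothetical placeholders, not real data or anyone's actual method.

```python
import re
from datetime import datetime, timedelta

# Hypothetical watchlist: aircraft identifier -> owner label (placeholder values).
WATCHLIST = {"XX1234": "example owner A"}

# Hypothetical position reports: (identifier, time seen, place).
SIGHTINGS = [
    ("XX1234", datetime(2020, 1, 2, 9, 0), "Airport A"),
    ("XX1234", datetime(2020, 3, 5, 9, 0), "Airport B"),
]

# Hypothetical timeline of events: (name, time, place).
EVENTS = [("example meeting", datetime(2020, 1, 2, 12, 0), "Airport A")]

# Assumed identifier shape: two capital letters followed by four digits.
ID_PATTERN = re.compile(r"\b[A-Z]{2}\d{4}\b")


def find_identifiers(text):
    """Return every identifier-shaped token in the text (regex stand-in for NER)."""
    return ID_PATTERN.findall(text)


def correlate(text, window=timedelta(hours=24)):
    """Link identifiers in the text to events near a sighting in place and time."""
    hits = []
    for ident in find_identifiers(text):
        if ident not in WATCHLIST:
            continue  # only aircraft on the curated list are of interest
        for tail, seen_at, place in SIGHTINGS:
            if tail != ident:
                continue
            for name, when, where in EVENTS:
                if place == where and abs(seen_at - when) <= window:
                    hits.append((ident, WATCHLIST[ident], name))
    return hits
```

Calling `correlate("... flight XX1234 departed ...")` returns one tuple per matching event; a co-occurrence like this is only a lead to check, not evidence on its own.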
Yeah, that’s bumming me out a little. It’s the NER (which is getting a lot of my time on this one because spaCy is an amazing tool to extend) and the correlation with other data that’s the interesting part.
Especially those flights.
I wish I could delete the tweet about “unredacting” (or at least edit it to point to my later decision) without breaking the thread. It was written in a moment of nerd glee, but the ethical considerations are more important to me.