It's helpful to understand a little bit of how a PDF file works.
A PDF file stores a sequence of instructions for what to draw on a page. The instructions are commands, such as "write this text here" or "draw a circle there." When you draw black redaction boxes, you are just appending an instruction to the end of the list ("draw a box here"), but most of the time the previous instructions are all still there.
Really, the only way to redact a PDF with certainty, short of manually reading the file itself (yes, PDF files are text-readable; you just have to decompress them) is to rasterize the file to a large image. Otherwise, the instructions for writing the text are all still in the file, and there are any number of ways to extract them (for example, simply by deleting the instruction to draw the redaction box).
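To make the point above concrete, here is a minimal sketch in Python. It does not parse a real PDF; it only models a single page content stream as bytes, using standard PDF operators (`Tj` to show text, `re` and `f` to draw a filled rectangle) and zlib compression, which is what the common FlateDecode filter uses. The sample text and the helper names `extract_text` and `strip_boxes` are made up for illustration.

```python
import re
import zlib

# Simplified page content: PDF operators, executed in order.
original = b"BT /F1 12 Tf 72 700 Td (Agent name: J. Doe) Tj ET\n"

# "Redacting" by drawing a filled black rectangle just appends operators.
black_box = b"0 0 0 rg 70 695 160 16 re f\n"
redacted = original + black_box

# Content streams are usually stored compressed (FlateDecode is zlib).
stored = zlib.compress(redacted)


def extract_text(compressed_stream: bytes) -> list[str]:
    """Decompress a content stream and return the strings shown with Tj."""
    data = zlib.decompress(compressed_stream)
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", data)]


def strip_boxes(compressed_stream: bytes) -> bytes:
    """Drop every line that ends in 're f' (a filled rectangle)."""
    data = zlib.decompress(compressed_stream)
    return re.sub(rb"[^\n]*\bre f\n", b"", data)


print(extract_text(stored))             # the "redacted" text is still there
print(strip_boxes(stored) == original)  # removing the box restores the page
```

Real files have more moving parts (fonts with custom encodings, multiple streams, annotations), so treat this as an illustration of the idea rather than an extraction tool.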
A government document can't (shouldn't) be rasterized because it would lose all the tagging information, making it non-accessible in a 508 sense.
It is however trivial for even a non-technically inclined person to remove the text content through Acrobat's content editing pane.
Just as you suggest, things like this seem to be an example of someone in a large organization not quite understanding how the "magic" behind the tools they use works and winding up with results they didn't expect.
From what I have seen in working with the government and the web I think that anigbrowl's comment is spot on. Many people use tools that are designed for the web or the desktop in the same way that they used to use tools designed for print. In this case just slapping a black line over it works in print, and it does not occur to many people that don't fully understand the technology they use that it might not be directly analogous.
A significant portion of my day job involves working with people in government that have a mental model of content distribution that was constructed during the era of print. It's a fun challenge to help people adjust those models to make things more efficient / more informative / less dangerous.
Perhaps they want to keep the formatting, and don't want pages to reflow after they've already been laid out? The Unicode character "█" works well for this purpose.
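A rough sketch of that idea in Python, assuming the text is available as a plain string before layout. The function name `redact` and the sample sentence are hypothetical. Unlike a box drawn on top, the original characters are actually replaced, so they are no longer in the output.

```python
BLOCK = "\u2588"  # FULL BLOCK, the "█" character


def redact(text: str, secrets: list[str]) -> str:
    """Replace each secret with the same number of block characters."""
    for secret in secrets:
        text = text.replace(secret, BLOCK * len(secret))
    return text


line = "Contact John Smith at the gate."
out = redact(line, ["John Smith"])
print(out)
print(len(out) == len(line))  # same character count as the original line
```

Keeping the character count the same is what helps preserve the layout, but it also means the length of the hidden text stays visible to the reader.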
I remember a similar story from a few years ago where redacted names were discovered from the length of the redacted characters. Turned out it wasn't too difficult to brute force names and see which ones matched the exact size of the redaction.
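I don't know how that particular attack was implemented, but the general idea can be sketched like this. The character widths below are invented for the example (a real attempt would read them from the metrics of the actual font), and the candidate names, font size, and tolerance are all assumptions.

```python
# Hypothetical advance widths in 1/1000 em for a proportional font.
DEFAULT_WIDTH = 556
WIDTHS = {"i": 222, "l": 222, "t": 278, " ": 278, "s": 500, "w": 722, "m": 833}


def text_width(text: str, font_size: float = 12.0) -> float:
    """Approximate rendered width of text, in points."""
    units = sum(WIDTHS.get(ch.lower(), DEFAULT_WIDTH) for ch in text)
    return units * font_size / 1000


def candidates_matching(names: list[str], box_width: float,
                        tolerance: float = 0.5) -> list[str]:
    """Keep only the names whose rendered width fits the redaction box."""
    return [n for n in names if abs(text_width(n) - box_width) <= tolerance]


names = ["Ann Lee", "Bob Stone", "Will Smith", "Tim Hill"]
box_width = text_width("Will Smith")  # stand-in for a width measured on the page
print(candidates_matching(names, box_width))
```

With a proportional font, different names rarely come out at exactly the same width, which is why a short candidate list can narrow down quickly.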
Mrs Browl works at an e-discovery company where documents are often redacted depending on which bits are admissible in court; I gather they use TIFF files for exactly the reasons you describe.
I suppose the (extremely limited) redaction features in Acrobat stem from the belief that the document is ultimately destined for printing rather than computer reading.
I also work at an e-discovery company and can confirm the use of TIFF files as the court-admissible format for producing any sort of document (especially if it has redactions).
Here is an example of the process to apply redactions to a PDF file (or any sort of electronic document). The document is opened in a third-party native viewer with redaction capabilities (and not in the original application like Acrobat or Office). The end-user draws a black box over the appropriate text using the third-party native viewer. This redaction is saved as a separate layer that can be rendered on top of the native document when viewed in the software, as well as printed out. Finally, when documents need to be produced for court, they are printed out to a TIFF-printer driver, effectively converting them to TIFF and "burning in" the redaction layer. The TIFFs are delivered to the courts and opposing counsel.
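As a toy model of the "burning in" step (not how any particular product does it), the page can be treated as a grid of grayscale pixels and the redaction layer as a list of rectangles kept separately. The `Box` class, `burn_in` function, and the tiny sample page are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    width: int
    height: int


def burn_in(pixels: list[list[int]], boxes: list[Box]) -> list[list[int]]:
    """Return a new raster with every box painted black (0)."""
    flattened = [row[:] for row in pixels]  # leave the working copy untouched
    for box in boxes:
        for y in range(box.y, min(box.y + box.height, len(flattened))):
            row = flattened[y]
            for x in range(box.x, min(box.x + box.width, len(row))):
                row[x] = 0
    return flattened


# 4x6 grayscale "page": 255 is white paper, 90 is the ink of rendered text.
page = [
    [255, 255, 255, 255, 255, 255],
    [255,  90,  90,  90,  90, 255],
    [255,  90,  90,  90,  90, 255],
    [255, 255, 255, 255, 255, 255],
]
overlay = [Box(x=1, y=1, width=4, height=2)]  # separate layer during review
produced = burn_in(page, overlay)             # flattened copy for delivery

print(produced[1])  # the text pixels are gone from the produced image
print(page[1])      # the internal working copy still has them
```

The point is that the delivered image contains only the final pixel values, so there is no underlying text instruction left to recover.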
I have no personal experience and at best perhaps a vague recollection of having read or heard something, but would document images be or in the past have been used also to make discovery as inefficient as permitted for the opposition? Images require a person to analyze, or pre-processing before electronic search and analysis can be performed against them.
OCR is very good nowadays, so that's not so big a problem - and for big lawsuits (eg patent fights) there are literally hundreds of lawyers hired to comb through the vast array of documents deciding whether they're responsive or not.
The pre-processing is largely automated but there's certainly a portion (maybe 5%?) of documents that need to be hand-classified in the database before they get to the lawyers. It's an interesting field - lots of money to be made, intense competition for it, relatively simple technology requirements but a legal industry which has been resistant to technology for quite a long time. Autonomy seems to be the leading software company in this space.
Cuba, Syria, Sudan, and Iran make up the State Department's list of State Sponsors of Terrorism, which I would assume is the origin of their being on that list.
America has been living under the shadow of imminent Cuban aggression for 50 years. There have been Cuban attempts to invade America and numerous assassination attempts on the US president by Cuban agents.
And their chemical weapons kill 1000s of Americans after being secretly delivered from Canada.
This isn't the first time this has happened. Several years back, another government agency made the same mistake. On a slow computer, the text would render first, and could be seen for several seconds before the black boxes would appear.
Googling around, it seems to be a fairly common mistake, going back to at least 2000.
In the same vein, if not the same format: Distributing a Word document from which you've not removed change tracking (and/or other metadata). (Or whatever Word calls it; it's been a while.) I had to correct my teammates on this one, a while back (and yes, the documents were going to external clients). Nothing novel; the problem's been in the news repeatedly. Nonetheless, people -- even "technical" -- still don't get it right.
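For the newer .docx format, a quick check before sending is possible because the file is a zip archive whose main text lives in `word/document.xml`, and tracked insertions and deletions are stored there as `w:ins` and `w:del` elements. The sketch below only counts those two elements; it ignores comments, moved text, and other metadata, and the function names and sample XML are made up for illustration.

```python
import io
import re
import zipfile

# WordprocessingML marks tracked insertions and deletions with these elements.
TRACKED = re.compile(rb"<w:(?:ins|del)[ >]")


def count_tracked_changes(docx_file) -> int:
    """Count w:ins / w:del elements in a .docx (path or file-like object)."""
    with zipfile.ZipFile(docx_file) as archive:
        body = archive.read("word/document.xml")
    return len(TRACKED.findall(body))


def make_sample(xml: bytes) -> io.BytesIO:
    """Build an in-memory zip with only document.xml (not a complete .docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


dirty = make_sample(
    b'<w:document><w:body><w:p>'
    b'<w:del w:id="1" w:author="A"><w:r><w:delText>old price</w:delText></w:r></w:del>'
    b'<w:ins w:id="2" w:author="A"><w:r><w:t>new price</w:t></w:r></w:ins>'
    b'</w:p></w:body></w:document>'
)
clean = make_sample(
    b"<w:document><w:body><w:p><w:r><w:t>new price</w:t></w:r></w:p></w:body></w:document>"
)

print(count_tracked_changes(dirty))
print(count_tracked_changes(clean))
```

A non-zero count means the deleted text ("old price" in the sample) is still sitting in the file for anyone who unzips it.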
(I encouraged them to go further and switch to PDF format for the distribution, but they wouldn't.)
Nice one. I had to giggle at the ubiquity of the MS Office assistant down in the corner... It looks like you're trying to conceal some information from prying eyes. Would you like help with that?
I downloaded it earlier but it seems to have been taken down now. Very interesting. I bet that will lose some people their jobs - but I think it definitely highlights an underlying issue within the government bureaucracy that shows why people getting paid 30k with degrees from the University of Phoenix shouldn't be entrusted with national security... I wonder how many security people that passed through...
I doubt it. The more likely outcome is they mandate all their employees attend document redaction training developed and administered at great taxpayer cost by a government contracting company. It's also quite likely that they use the incident to justify spending millions to have a government contractor write document redaction software.
Using the link in the article, it appears that it's already been taken down (link leads to a 404 page). However, from the comments in the article, it's been mirrored at http://cryptome.org (at the top of the list, in fact, filename is tsa-screening.zip, in case that makes it easier to find it).
Looking at the dates on the document, it was last revised in May 2008, so it seems like the horse left the stable some 18 months ago as far as actual bad guys and spies (who presumably seek out this information more actively than you or I) are concerned.
My sister is an attorney and asked me about software that can catch things like this, remove metadata, etc. Does anyone have experience with something along these lines?