
Pixelation is one of those hilarious instances where a needlessly complicated approach is used to do something that can be done much quicker, easier, and, as it turns out, way more securely.

Wanna redact text?

Put a solid bar over it. There, done. No takesies-backsies from that one.




> Put a solid bar over it. There, done. No takesies-backsies from that one.

And this is not such good advice if you're using software that supports layers and your solid bar is on a different layer from what you're trying to hide.

Wanna redact text?

Delete it.


Lol, that's really a stretch/nitpick, like saying:

> Delete it.

And this is not such good advice if you're using software that supports history and your deletion is just another action that you can simply revert.


> if you're using software that supports layers

...then really you should know how to flatten? Not understanding the tools you use for sensitive tasks like hiding/redacting data is a recipe for disaster.


Seems like "flatten" ought to be a trivially easy mechanism to implement in software somewhere, and maybe even as a standalone application for the less familiar. Basically just a nicely trimmed screenshot, no?
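A toy sketch of what "flatten" means here, assuming an image is just a grid of pixels and `None` marks a transparent pixel. The `flatten` function and the single-row "layers" are made up for illustration and are not how any particular editor does it:

```python
def flatten(layers):
    """Composite layers bottom-to-top; an opaque pixel replaces whatever was below it."""
    height, width = len(layers[0]), len(layers[0][0])
    flat = [[" "] * width for _ in range(height)]  # blank background
    for layer in layers:
        for y in range(height):
            for x in range(width):
                if layer[y][x] is not None:  # None = transparent, keep what is below
                    flat[y][x] = layer[y][x]
    return flat


# Bottom layer holds the sensitive text, top layer holds the bar (hypothetical data).
text_layer = [list("SECRET 42")]
bar_layer = [["#"] * 6 + [None] * 3]

flattened = flatten([text_layer, bar_layer])
print("".join(flattened[0]))  # prints "###### 42"
```

Once only the flattened grid is saved, there is no second layer left to drag aside. The catch is that the file you share has to contain the flattened result and nothing else.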


> Delete it.

Redaction and Deletion are not equivalent. There are many instances where a visible redaction is required rather than a deletion.


Sure, but you should still delete it. You can always add a bar at the old position afterwards. Then it's still a visible redaction, but you don't end up having a hidden copy of the unredacted text.


Also make sure that you print the document with a solid bar over the confidential text, and then scan it.

Replicating this process digitally is quite hard. Most pdf editors will put a square over the text, which leaves the original easy to recover.


Tell that to Paul Manafort's lawyers.

https://www.alfresco.com/blogs/corporate-news/redacting-pdfs...

>Redacting PDFs – What did the Manafort Lawyers get wrong? Date: May 18, 2020

>With Paul Manafort being released from Jail on May 13th, for those in the document space like Alfresco, it was worth revisiting the PDF redaction issue that surfaced during his trial. Back in 2019, lawyers representing Paul Manafort (a former lobbyist, political consultant, and lawyer, who chaired the Trump Presidential campaign team) filed a response to special counsel Robert Mueller’s claims that he violated his cooperation agreement by repeatedly lying to prosecutors. On pages five, six, seven and nine either the lawyers or the special council staffers attempted to redact sensitive passages.

>Although parts of the public version of this filing appeared to be redacted by black bars at first glance, it quickly became apparent that anyone with Adobe Acrobat, or other PDF viewing tool, or even browser-based viewing tools, could easily copy and paste the text that still existed under the redaction blocks to another document to simply reveal the passages that had been redacted. From the UK, a similar incident happened back in 2011 with the Ministry of Defence.

https://www.bbc.com/news/uk-13107413

>Internet mistake reveals UK nuclear submarine secrets

>The Ministry of Defence has admitted that secret information about the UK's nuclear powered submarines was made available on the internet by mistake.

>A technical error meant blacked-out parts of an online MoD report could be read by pasting into another document.

>Details were reported to include expert opinion on how well the fleet could cope with a catastrophic accident.

>The MoD said a secure version had now been published and it was working to stop such an incident happening again.

>Information also included measures used by the US Navy to protect its nuclear submarines, the Daily Star Sunday reported.


> Most pdf editors will put a square over the text

Redaction is usually a "Pro" feature in commercial PDF editors. People who use a PDF editor's annotation feature to draw a big filled shape over the text/image to be redacted don't know what they're doing and should be prepared for poor outcomes.

Adobe Acrobat and Foxit have "Pro" variants that redact properly; I'd love to know exactly how they do it. The open-source approach I'm familiar with essentially converts the PDF to a flattened image and edits it to apply a colored shape over it. Example[1].

[1] https://github.com/firstlookmedia/pdf-redact-tools


Exactly... Mistakes happen even in court documents...

https://www.reddit.com/r/Wellthatsucks/comments/i1cdpl/the_d...


I saw a variant of that last year. Someone sent a courthouse floor plan to a contractor who needed office layouts, with security information "removed". Turns out it was just an overlay and he could see all the security cameras by dragging the layer aside in an editor.


Would "print to PDF" be a good enough solution for most people putting a square over the text? I admittedly haven't looked hard enough into it, but theoretically it should be flattening it.


No, because “Print to PDF” preserves text (i.e. text is selectable in the resulting .pdf document), so it’s gonna be selectable behind a black square as well.
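To make that concrete, here is a toy model in Python. It is not real PDF parsing: it assumes a page is just an ordered list of drawing operations, and `TextOp`, `RectOp`, `extract_text` and `redact` are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class TextOp:
    x: int
    y: int
    text: str


@dataclass
class RectOp:
    x: int
    y: int
    width: int
    height: int


def extract_text(page):
    """What select-all/copy sees: every text op, whatever is painted on top of it."""
    return " ".join(op.text for op in page if isinstance(op, TextOp))


def redact(page, rect):
    """Drop text ops whose origin lies inside rect, then paint the bar."""
    def inside(op):
        return (rect.x <= op.x < rect.x + rect.width
                and rect.y <= op.y < rect.y + rect.height)

    kept = [op for op in page if not (isinstance(op, TextOp) and inside(op))]
    return kept + [rect]


page = [TextOp(0, 0, "SSN:"), TextOp(40, 0, "123-45-6789")]
bar = RectOp(40, 0, 90, 12)

covered = page + [bar]       # bar drawn over the text, text still in the file
redacted = redact(page, bar)  # text removed first, bar added afterwards

print(extract_text(covered))   # prints "SSN: 123-45-6789"
print(extract_text(redacted))  # prints "SSN:"
```

Both versions would look identical on screen. Only the second one has nothing left to copy and paste.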

Although I just checked, MacOS Preview has a feature “redact” that (is supposed to) actually redact text. Well done!


On MacOS 10.15 Preview, I drew an "annotated" black box over a paragraph that was recognized as text. I exported it in two ways: 1) "Export as PDF" and 2) Print -> Save as PDF. Neither result included the underlying text under the box.

That's not to say it's definitely not there, but it's at least something.


1. Delete the text

2. Set Background Text Color to black

3. Put spaces in place of redacted text

4. Save


For further security, put a random number of spaces (or just one) so that the length of the secret is not leaked.


If you really want to remove the information you just have to [REDACTED] and then nobody knows anything, only that the information isn't present.


> Put a solid bar over it. There, done. No takesies-backsies from that one.

Well, mostly. But say you know the name that's redacted belongs to a small group (e.g. US presidents from 1970 to today) - you could probably rule out (and maybe rule in, i.e. determine) the redacted name, based on font size, kerning etc. in the document.

I wonder how far a machine learning model could go for longer reports - say 1000 pages with ~100 pages redacted - where the style of writing could be approximately inferred from the visible content. How many sentences/paragraphs could the AI fill in with some probability?
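A rough sketch of the bar-width idea from the first paragraph, in Python. The per-character widths are invented (units of 1/1000 em) and kerning is ignored, so this only shows the shape of the attack; a real attempt would need the document's actual font metrics. `text_width` and `candidates_matching` are hypothetical helper names.

```python
# Made-up advance widths in 1/1000 em; a real font has its own table plus kerning.
NARROW = set("ijlrtfI")
WIDE = set("mwMW")


def text_width(text):
    """Sum the assumed per-character widths of text."""
    total = 0
    for ch in text:
        if ch in NARROW:
            total += 300
        elif ch in WIDE:
            total += 900
        elif ch.islower():
            total += 550
        else:
            total += 700
    return total


def candidates_matching(bar_width, names, tolerance=50):
    """Keep only the names whose rendered width is within tolerance of the bar."""
    return [n for n in names if abs(text_width(n) - bar_width) <= tolerance]


NAMES = ["Nixon", "Ford", "Carter", "Reagan", "Bush",
         "Clinton", "Obama", "Trump", "Biden"]

print(candidates_matching(2100, NAMES))  # prints ['Ford']
print(candidates_matching(3250, NAMES))  # prints ['Clinton', 'Obama']
```

With these assumed widths, one bar pins the name down completely and another only narrows it to two candidates, which matches the "rule out, maybe rule in" caveat.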


> But say you know the name that's redacted belongs to a small group

True, but sometimes there is no way around this, because redacting part of the text visually, so that it is clear where and how much was redacted, may be a requirement.

If there is no such requirement, one can always just replace the part of text like so:

   This is the original text that I am going to redact now.

   This is [REDACTED] now.
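The same replacement as a minimal Python sketch, assuming you already know the exact strings to remove. `redact_spans` is a hypothetical helper, not part of any library:

```python
def redact_spans(text, secrets, placeholder="[REDACTED]"):
    """Replace every occurrence of each secret with a fixed-length placeholder."""
    for secret in secrets:
        text = text.replace(secret, placeholder)
    return text


original = "This is the original text that I am going to redact now."
secret = "the original text that I am going to redact"

print(redact_spans(original, [secret]))  # prints "This is [REDACTED] now."
```

Because the placeholder is always the same length, the output says nothing about how long the removed text was.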


> you could probably rule out (and maybe rule-in, determine) the redacted name, based on font-size, kerning etc in the document.

Depending on what you're redacting you can eliminate this variable.

For example, when I want to redact a piece of sensitive text in a video, I'll often make the solid black bar longer than the text being hidden. This way you can't infer the length of the text from the length of the bar. Of course this only works when you can extend the bar without hiding non-sensitive info that's important to see. In practice it works well. For redacting browser history, just make the bar the entire width of the browser URL bar. For API keys or secrets, the key often exists as an env variable on its own line, so extending the black bar is no problem.


I always get a little paranoid even with a solid bar. For example, if I'm trying to black out my SSN on a PDF, I don't trust that the block won't stay a mutable block when saving it. I tend to "print" a new PDF in hopes that it flattens the document.


> I tend to "print" a new PDF in hopes that it flattens the document.

I wouldn't count on that. A well-implemented print-to-PDF feature will try to avoid rasterising whenever possible.

If you want to turn your document (whether in PDF format or something else) into a raster image, there are proper tools for that.



