This is somewhat bad. Trezor's dictionary is only 2048 words, so brute-forcing the 8 remaining words would only take 2048^8 attempts. Since 2048 = 2^11, that's 2^88, i.e. 88 bits of security.
With a redaction as bad as a Gaussian blur at such a high resolution (and a dictionary of only 2048 candidates), you don't need multiple angles to decipher those words.
It'd be great if they replaced the key with bogus words first before blurring to troll people, but somehow I doubt it.
> It'd be great if they replaced the key with bogus words first before blurring to troll people, but somehow I doubt it.
This is what I do 99% of the time I use blur to censor information; I just replace the text with, say, colorful words of the same length before blurring. It would be neat if there were an automated tool that could do something similar.
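Something like that could probably be scripted. A rough sketch with Pillow (the bounding box, filler string and blur radius below are all stand-ins you'd have to supply yourself):

    # Not a polished tool, just the idea: overwrite the region with same-length
    # decoy text first, then blur, so the blur never sees the real characters.
    from PIL import Image, ImageDraw, ImageFilter, ImageFont

    img = Image.open("screenshot.png").convert("RGB")
    box = (100, 200, 420, 230)             # hypothetical bounding box of the secret text
    filler = "hunter2 hunter2 hunter2"     # roughly the same length as the original

    draw = ImageDraw.Draw(img)
    draw.rectangle(box, fill="white")                      # wipe the real text
    draw.text((box[0], box[1]), filler, fill="black",
              font=ImageFont.load_default())               # paint the decoy text

    region = img.crop(box).filter(ImageFilter.GaussianBlur(radius=6))
    img.paste(region, box)                                 # the blur only ever covers the decoy
    img.save("censored.png")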
I'm not sure why PULSE is being called out here. This is extremely common for any super-resolution model trained on a heavily biased dataset like FFHQ. In fact, this discussion is part of what led to the dataset being changed. The authors actually extended their research to discuss the bias[0]. I'll also link to the Reddit discussion[1]. IMO the authors responded to this correctly. It's important to remember that algorithms are only good on in-distribution data. Kudos to the authors for doing more experiments and including work on a less racially and sexually biased dataset.
Indeed, but one has to note that this article is mainly about pixelated text, which is a bit different, especially since it (usually) has a known alphabet and you often even know what the individual font glyphs look like from context.
I basically only saw "Barry O'Bama" at first, and thought "oh good, another rightwingey troll/rant". I managed to pick up on the PULSE GAN before my mouse made it over to the downvote button, tho, so it wasn't me. I think you'd be better off just adding half a sentence of tl;dr like "... Barry O'Bama, where PULSE GAN incorrectly reconstructed a pixelated image of Barack Obama, resulting in a white guy that didn't look anything like the original" or something like that.
I recently made use of my right to access personal information from someone who had sent me unsolicited marketing material.
The person who fulfilled my request sent me a PDF copy of their full customer list, where all entries had been blacked out except mine.
As you may anticipate, that blacking out was simply a black box drawn on top of the actual data. It took me all of 3 seconds to select all, copy, then paste in a word processor to confirm that the file contained personal data from hundreds of other individuals.
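For anyone wondering how little effort that takes: the text stream is still in the file, and the rectangle is just another drawing object layered on top. A minimal sketch, assuming pdfminer.six and a stand-in filename:

    # The black boxes live in the PDF's graphics layer; the text objects underneath
    # are untouched and come out with a plain text extraction.
    from pdfminer.high_level import extract_text

    text = extract_text("redacted.pdf")   # hypothetical file with rectangles drawn over entries
    print(text)                           # the "hidden" entries print along with everything else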
More generally, if you want to redact pixels, don't replace them with information that depends on those pixel values. That doesn't necessarily mean a black rectangle, but a black rectangle is certainly simple and it works.
I'm slightly curious how, in these cases, someone decided that blurring was the right way to do it. It seems unlikely that they never thought of simply blacking/blanking out the text; if they did, blurring must have seemed preferable.
My best guess is that, having seen blurring of faces (which is arguably OK when one merely wants to avoid casual attempts at identification, while retaining a ‘natural’ look), they assumed this was the proper way to do it in all cases.
You could use filler text like lorem ipsum to keep a natural appearance without exposing any information. Of course that is a bit more work than just dropping a blur effect on an existing document and exporting it as PDF.
Have you ever hand-written a word and then wanted to hide it? One cross-out line doesn't do the trick. In fact, a full minute of trying to hide it with an ugly darkened-in box usually doesn't do it either. But if you just write a couple of random letters over each existing one, people have basically no ability to recognize your original word.
That is my one insight. Take it for what it's worth.
AI upscaling? Yes. It seems some fans used AI upscaling for the older Stargate SG-1 seasons, since those were direct-to-VHS and hence shot on low-quality media (by now there are official Blu-ray releases, which are also somehow upscaled, but I was told those are of inferior quality compared to the fan effort). I'm not sure if those efforts worked on a frame-by-frame basis or used information across frames.
The remastered edition of Command & Conquer also used AI upscaling for the cutscenes. They lost the original recordings, and the videos from the PlayStation release were the best they could track down. The result is far from perfect, but probably the best one could hope for: https://www.youtube.com/watch?v=ikJLYYTrIxs&t=689s
I previously replied to the wrong comment by accident.
As university libraries have moved online, one reads many poorly scanned journal articles. I often wonder about taking the time to clean them up. What replaces temporal information here is the same characters appearing over and over.
So of course I read this article hoping to learn about an off-the-shelf tool that would do a great job of scanned text reconstruction. Alas, the best candidates were "no code available."
Math typesetting is too messy for current OCR tools. It would be nice to reverse-engineer the LaTeX source for a math paper, but not likely soon. OCR for the language would help in mind-mapping a web connecting my saved papers, but I wouldn't use it for reading.
I want everything to look like a 600dpi scan mixed down, as I would make, rather than what the libraries thought would be acceptable. For the pure joy of reading.
The easiest approach that might work would be language agnostic, understanding only what clean scans of characters look like. Can we back-solve a clean scan from a lower resolution mess, matching up similar characters in the text without identifying the characters?
Somehow I imagine this is a giant singular value problem. I'm ok if it takes a day to run per paper, I have spare machines.
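In case someone wants to play with the idea, here's a crude sketch of "match similar glyphs without identifying them": segment ink blobs, cluster them as fixed-size patches, and replace each blob with its cluster mean so repeated characters get averaged together. Every threshold and size below is a guess, and a real attempt would need alignment and sub-pixel registration on top:

    import numpy as np
    from scipy import ndimage
    from sklearn.cluster import KMeans
    from skimage.io import imread
    from skimage.transform import resize

    page = imread("scan.png", as_gray=True)     # hypothetical low-quality scan
    ink = page < 0.5                            # crude binarization threshold
    labels, _ = ndimage.label(ink)              # connected components ~ glyph blobs
    boxes = ndimage.find_objects(labels)

    SIZE = 24                                   # normalized glyph patch size
    patches = [resize(page[sl], (SIZE, SIZE), anti_aliasing=True).ravel() for sl in boxes]

    k = max(1, len(patches) // 20)              # assume ~20 occurrences per glyph class
    km = KMeans(n_clusters=k, n_init=10).fit(np.array(patches))

    # Replace every blob with the mean of its cluster: repeated characters get
    # averaged, which is what stands in for temporal information here.
    clean = page.copy()
    for sl, c in zip(boxes, km.labels_):
        h, w = page[sl].shape
        clean[sl] = resize(km.cluster_centers_[c].reshape(SIZE, SIZE), (h, w),
                           anti_aliasing=True)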
You should be able to do better than just aligning and averaging frames. (Edit: looks like MauranKilom knows what they're talking about here, and expresses it in their comment more clearly than I could.)
Imagine you were computing running averages on successive windows of a 1D array: when the average changes, that tells you the difference between the value that entered your window and the one that just left. That's information about a sliver of data much smaller than the overall window. It's weirder with 2D and random-ish movement, but if your averaging (pixelation) filter is moving across text due to camera wobble or such, when the average goes up and down tells you something about where the edges are in the content underneath.
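To make the 1D version concrete, a toy sketch: the change between neighboring window averages depends only on the sample that entered and the one that left, so edges in the hidden signal show up directly in those differences.

    import numpy as np

    x = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)   # hidden 1D "image" with two edges
    w = 4                                                     # averaging window ("mosaic cell")
    avg = np.convolve(x, np.ones(w) / w, mode="valid")        # what the pixelation exposes

    # avg[i+1] - avg[i] == (x[i+w] - x[i]) / w: only the entering/leaving samples matter
    print(np.diff(avg) * w)   # non-zero exactly where an edge slid into or out of the window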
I'm butchering the words because this isn't my thing, but this feels like it might be related to some actual signal-processing task (i.e. undoing some kind of signal-mangling that happens in the wild) which increases the chance that there's some good or at least well-studied solution.
The brute-force-ish approach for text reconstruction would also probably be more effective if it checked against a few shifted-around blurred copies of the text rather than just one.
Funny how the whole article talks about this approach, and then at the end shows the approach failing in the real world. I don't know about you, but I can't conclusively come up with a license plate in that final video.
Sure, but the technique used was also very trivial. Just aligning and averaging all the video frames basically leaves a mosaic-pixel-sized blur on everything (assuming the camera movement is uncorrelated with the mosaic grid).
You can get much further by applying deconvolutions and using more math. I've been meaning to put some time into this myself but never got it off the ground.
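If anyone wants a starting point, here's a minimal sketch of the "more math" direction: model the mosaic as a uniform box blur and run an off-the-shelf deconvolution from scikit-image. The cell size and filenames are placeholders, and the real problem still has the per-frame grid offsets to deal with:

    import numpy as np
    from skimage import io, restoration

    avg = io.imread("averaged_frames.png", as_gray=True)   # stand-in for the aligned-and-averaged frames
    cell = 8                                                # assumed mosaic cell size in pixels
    psf = np.ones((cell, cell)) / cell**2                   # uniform box point spread function

    # Richardson-Lucy deconvolution under the (rough) assumption that the averaged
    # result is the true image convolved with that box PSF.
    deblurred = restoration.richardson_lucy(avg, psf, num_iter=60)
    io.imsave("deblurred.png", (np.clip(deblurred, 0, 1) * 255).astype(np.uint8))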
I wonder if the author would be open to making e.g. the car data available?
First challenge is going to be figuring out the grid alignment in the stabilized frames. But I have a decent idea how to tackle that, which I'll hopefully get to tomorrow!
Yes, I've also always felt there must be ways to extract more data from a moving clip, precisely because of the effect he explains, but then it seems that just superimposing the images doesn't actually extract that information, at least not all of it.
But I wonder how to actually do it; do you have concrete ideas for a simple algorithm?
If you can figure out, for each frame, which sets of (pre-aligned) pixels have been averaged, you can create a large system of equations that captures those relations and solve it to find the unblurred pixel values.
Depending on camera movement (and whether you might get "ground truth" information from pixels entering and leaving the areas near the borders) the system will be more or less well-conditioned. I'm going to try this for the data the author graciously provided and report back!
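A minimal sketch of what that system could look like, assuming the per-frame grid offsets are already known (which is the hard part) and a hypothetical load_aligned_frames() helper that yields (dy, dx, mosaic) tuples on a common pixel grid:

    import numpy as np
    from scipy.sparse import coo_matrix
    from scipy.sparse.linalg import lsqr

    H, W, cell = 64, 256, 8                 # assumed size of the unknown image and of the mosaic cell
    frames = load_aligned_frames()          # hypothetical helper: list of (dy, dx, mosaic) per stabilized frame

    rows, cols, data, b = [], [], [], []
    eq = 0
    for dy, dx, mosaic in frames:
        for y in range(dy, H - cell + 1, cell):
            for x in range(dx, W - cell + 1, cell):
                # one equation: the mean of this cell's source pixels equals the observed value
                for yy in range(y, y + cell):
                    for xx in range(x, x + cell):
                        rows.append(eq)
                        cols.append(yy * W + xx)
                        data.append(1.0 / cell**2)
                b.append(mosaic[y, x])      # the mosaic is constant within a cell, so any sample works
                eq += 1

    A = coo_matrix((data, (rows, cols)), shape=(eq, H * W))
    solution = lsqr(A, np.array(b))[0]      # least-squares solve; conditioning depends on the offsets
    reconstruction = solution.reshape(H, W)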
I seem to recall that there used to be a video showing this approach in action. As input it took a video panning across a shelf full of books where the resolution was so low that the titles were illegible. And as output it produced a video with higher resolution and all the titles easily readable. Unfortunately I can't find that video any longer.
Yes, it all boils down to point spread functions. In the mosaic case, the PSF varies locally (per pixel) and temporally (in different video frames). The paper you link similarly details how they figure out the PSF. You can theoretically also do the entire thing without knowing the PSF, which is called blind deconvolution: https://en.wikipedia.org/wiki/Blind_deconvolution
> I wonder if the author would be open to making e.g. the car data available?
Interesting - this is the same incorrect use of "e.g." that the author made in a couple of places. Contrary to (apparently popular) belief, "i.e." and "e.g." can't simply be used as direct replacements for their English equivalents.
"e.g." is used to introduce one or more examples that satisfy a previously provided general form, for example:
"I prefer fruit, e.g. apples or pears, over vegetables." Apples and pears are examples of fruit; not that an example is needed in this case, but it keeps things simple.
In the quoted comment above, "the car data" is not an example of "making".
"i.e." follows a similar rule. If there are exceptions for either, I'd be interested to know of them.
Interesting, thanks for bringing this to my attention! Do you have any reference that explains this rule? I noticed on Wikipedia that introducing multiple examples used to actually have a different abbreviation (ee.g. or ee.gg.), so clearly something is already lost in translation here...
In any case, what I wrote is really just a shorthand for more cumbersome formulations (like "...open to making your data, e.g. the car [data], available?" - that would hopefully be correct?), and reducing text is the whole point of using an abbreviation in the first place. But I'm open to striving for more consistent usage, so if you can refer me to some kind of authority on how to mix Latin abbreviations with English text, I'd be curious about it!
Hmm. I've been wondering about the motion-deblur approach for a while, for use in cleaning up VHS / youtube quality videos. Might even be able to get a head start given that h264 contains a certain amount of inferred motion information anyway.
It seems that we could increase camera resolution by putting the sensor on a vibrating platform, capturing a stream of frames, and processing them into a single image. The paper mentions Google camera software doing this with hand tremor. Is there any instance of intentionally shaking a camera to increase resolution like this?
I predict that future super high-resolution camera rigs will be whirling contraptions, spinning in 3 dimensions to improve 3D resolution. And the best still camera will be a wand (linear sensor array) on an articulated head that moves like a chicken's head, capturing during movement. The sound of a camera will be whoosh instead of click.
OCR algorithms typically aren’t targeting heavily blurred text. It’s more about handling all the ways letters can look, including ligatures, determining paragraph breaks, detecting tables, ignoring staples and coffee stains, etc. than about correcting for bad scans/photos.
> Side note: The potentially most extensive research on the problem of programmatically unblurring mosaic'ed regions from videos was done by Japanese Adult Video enthusiasts. Javplayer automatically detects blurred regions and performs upscaling via TecoGAN, and another person spent months improving their custom GAN that was trained with leaked videos (search for "De-Mosaic JAV with AI, Deep Learning and Adversarial Networks").
"I hacked a hardware crypto wallet and recovered $2M [video]" https://news.ycombinator.com/item?id=30067340
Showing a blurry 16 out of 24 trezor wallet seed words https://youtu.be/dT9y-KQbqi4?t=1720