I used to roll my eyes at crime television shows whenever they said "Enhance" over a low-quality image.
Now it seems the possibility of that becoming realistic is increasing at a steady clip, based on this paper and other enhancement techniques I've seen posted here.
Except, and this is really the fundamental catch, it's not so much "enhance" as it is "project a believable substitute/interpretation".
You fundamentally can't get back information that has been destroyed or was never captured in the first place.
What you can do is fill in the gaps with plausible values.
I don't know whether this sounds like I'm splitting hairs, but it's really important that the general public not think we're extracting information with these procedures; we're interpolating or projecting information that is not there.
Very useful for artificially generating skins for each shoe on a shoe rack in a computer game or simulation, potentially disastrous if the general public starts to think it's applicable to security camera footage or admissible as evidence...
To give specific examples from their test data, it added stubble to people who didn't have stubble, gave them a different shape of glasses, changed the color of cats, and changed the color and brand of a sports shoe.
And even then, I'm a little suspicious of how close some of the images got to the original without being given color information.
It appears that info was either hidden in the original in a way not apparent to humans or was implicit in their data set in some way that would make it fail on photos of people with different skin tones.
I haven't read the paper in full detail, but reading between the lines I'm guessing that there's a significant portion of manual processing and hand waving involved. From the abstract, emphasis mine:
> the second stage uses a pixel-wise nearest neighbor method to map the smoothed output to multiple high-quality, high-frequency outputs in a controllable manner.
My interpretation is that they select training data by hand and generate a bunch of outputs, repeating the process until they like the final result. From the paper:
> we allow a user to have an arbitrarily-fine level of control through on-the-fly editing of the exemplar set (e.g., "resynthesize an image using the eye from this image and the nose from that one").
There's nothing weak or negative about that; it's exactly what you'd expect. Obviously for a given input there will be multiple plausible outputs. With any such system it would make sense to allow some control in choosing among the outputs.
> Except, and this is really the fundamental catch, it's not so much "enhance" as it is "project a believable substitute/interpretation".
I would argue that this is a form of enhancement though, and in some cases it will be enough to completely reconstruct the original image. For example, if I give you a scanned PDF, and you know for a fact that it was size 12 black Arial text on a white background, this can feasibly let you reconstruct the original image perfectly. The 'prior' that has been encoded by the model from the large amount of other images increases the mutual information between the grainy image and the high-res one. The catch is that uncertainty cannot be removed entirely, and you need to know that the target image comes from roughly the same distribution as the training set. But knowing this gives you information that is not encoded in the pixels themselves, so you can't necessarily argue that some enhancement is impossible. For example with celebrity images, if the model is able to figure out who is in the picture, this massively decreases the set of plausible outputs.
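To make the known-font idea concrete, here is a minimal sketch of how a strong prior plus a known degradation can make the inverse problem nearly unique. Everything in it is invented for illustration: random arrays stand in for rendered glyph templates, and block averaging plus noise stands in for the scan.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a known font: each "glyph" is a 32x32 high-res template.
# In a real setting these would be rendered from the known typeface and size.
alphabet = "ABCDEFGHIJ"
templates = {c: (rng.random((32, 32)) > 0.5).astype(float) for c in alphabet}

def degrade(img, factor=4, noise=0.05):
    """Block-average downsampling plus noise: a crude model of a low-res scan."""
    h, w = img.shape
    small = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return small + rng.normal(0.0, noise, small.shape)

# "Scan" one unknown glyph.
truth = "D"
observed = degrade(templates[truth])

# Knowing the prior (the alphabet) and the degradation, score every candidate
# against the observation and keep the best match.
scores = {c: np.sum((degrade(t, noise=0.0) - observed) ** 2)
          for c, t in templates.items()}
print("best guess:", min(scores, key=scores.get), "| truth:", truth)
```

With a real font renderer in place of the random templates, the same argmax over candidates is essentially maximum-likelihood decoding of the low-res scan; when several candidates score equally well, the ambiguity the thread is talking about remains.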
> The catch is that you need to know that the target image comes from roughly the same distribution as the training set.
When humans think about "enhance", they imagine extracting subtle details that were not obvious from the original, which implies that they know very little about what distribution the original image comes from. If they did, they wouldn't have a need for "enhance" 99% of the time -- the remaining 1% is for artistic purposes, which this is indeed suited for.
It'll be interesting to see how society copes with the removal of the "photographs = evidence" prior.
> when enhancing celebrity images, if the model is able to figure out who is in the picture this massively decreases the set of plausible outputs.
The benefit depends on how predictable the phenomenon is that you are interpolating from. Sometimes it will be quantitatively better than a low-resolution version, sometimes not.
A good example is compression algorithms for media. They work because the sound or image is predictable, and they become ineffective as the input gets more unpredictable. Still, if the compressed output is all you have, then running the decompression will probably be better than just reading the raw compressed data; you just have to be aware of the limitations.
> You fundamentally can't get back information that has been destroyed or was never captured in the first place.
I love this cliché. I've seen it thousands of times, and probably written it myself a few times. We all repeat stuff like that ad nauseam, without ever thinking.
Because it's fundamentally flawed, especially in the context that it has usually been applied to, namely criticising the CSI:XYZ trope of "enhancing images".
The truth is that there is a lot more information in a low-res image than meets the eye.
Even if you can't read the letters on a license plate, they can often be recovered by an algorithm. If the Empire State Building is in the background, it's likely to be a US license plate. Maybe only some letters would result in the photo's low-res pattern. If you only see part of a letter, knowing the font may allow you to rule out many letters or numbers, etc.
It's similar to that guy who used Photoshop's swirl effect to hide his face, not knowing that the effect is deterministic, and can easily be undone.
The error mostly appears to be in assuming that the information has been destroyed, when in reality it's often just obscured. And neural nets are excellent at squeezing all the information out of noisy data.
> It's similar to that guy who used Photoshop's swirl effect to hide his face, not knowing that the effect is deterministic, and can easily be undone.
The effect doesn't only need to be deterministic; it also needs to be invertible.
A low-res image has multiple "inverses" (yikes), supposedly each with an associated probability (if you were to model it that way). So it would be more honest if the algorithm showed them all.
Showing them all seems a bit impossible because the number would blow up really quickly, wouldn't it? Maybe it could categorise them, but that could be misleading, too... I don't know.
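As a quick illustration of the deterministic-and-invertible distinction, the sketch below (assuming scikit-image is available; the test image and parameter values are arbitrary) swirls an image and un-swirls it with the negated strength, then contrasts that with block averaging, which maps many distinct inputs to the same output and therefore has no single inverse.

```python
import numpy as np
from skimage import data
from skimage.transform import swirl

image = data.camera() / 255.0   # 512x512 test image, scaled to [0, 1]

# Deterministic and (numerically) invertible: swirl, then swirl back.
swirled = swirl(image, strength=10, radius=250)
restored = swirl(swirled, strength=-10, radius=250)
print("swirl round-trip error:", np.abs(image - restored).mean())  # small, interpolation only

# Many-to-one and therefore not invertible: 8x8 block averaging.
# Countless distinct 512x512 images collapse onto the same 64x64 output.
small = image.reshape(64, 8, 64, 8).mean(axis=(1, 3))
print("pixels in:", image.size, "pixels out:", small.size)
```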
>> You fundamentally can't get back information that has been destroyed or was never captured in the first place.
> I love this cliché. I've seen it thousands of times, and probably written it myself a few times. We all repeat stuff like that ad nauseam, without ever thinking.
It is not a cliché; it is an absolute truth. Information that is not present cannot be retrieved. There may be more information present than is immediately obvious, though.
> Neural nets are excellent at squeezing all the information out of noisy data
Maybe, but they are also good at overfitting to noisy data (the original article is an example of such overfitting).
It's not a cliché, it's true. You fundamentally can't get back information that has been destroyed or was never captured in the first place.
Yes, a low-res image has lots of information. You can process that information in many ways. Missing data can't just be magically blinked into existence though.
Copy/pasting bits of guessed data is NOT getting back information that has been destroyed or never captured. Obscured data is very different from non-existent data. Could the software recreate a destroyed painting of mine based on a simple sketch? Of course not, because it would have to invent details it knows nothing about.
I think it's almost dangerous to call this line of thinking a cliché. It should be celebrated, not ridiculed.
For anyone put off by the .ps.gz, it's actually just a normal web page that links to the full article in HTML and PDF. Not sure what they were thinking with that URL. I almost didn't bother to look. (Maybe that's what they were thinking?)
I seem to remember from my computer vision class way back when that there's a fundamental theoretical limit to the amount of detail you can get out of a moving sequence. Recovering frequencies a little higher than the pixel sampling is definitely possible, but I recall the theoretical maximum being something like 10x. I also get the feeling, from looking around at available software, that in practice 2-3x is the most you can achieve in ideal conditions, and most video is far from ideal.
> I don't know whether this sounds like I'm splitting hairs
Somewhat no, but somewhat yes. Thing is, while there can be lots of input images that generate the same output, it could be that only one (or a handful) of them would occur in reality. If this happens to sometimes be the case, and if you could somehow guarantee this was the case in some particular scenario, it could very well make sense to admit it as evidence. Of course, the issue is that figuring this out may not be possible...
> we're interpolating or projecting information that is not there
But that's not fully accurate either. Sometimes the result will really be a more accurate representation of reality than the blurred image. Maybe it could be described as an educated guess: sometimes wrong, sometimes invaluable.
It would be interesting to see the results starting with higher-quality images. With camera quality increasing, there should often be more data to start with.
Exactly, this may be possible [0], but only if the NN has seen such images before; the output will match the training data but says nothing about reality.
No, but think of these blurred images as a "hash": in an ideal situation, you only have one value that encodes to a certain hash value, right? So if you are given a hash X, you can technically work out that it was derived from value Y. You're not getting back information that was lost; in a way it was merely encoded into the blurred image, and it should be possible to produce a real image which, when blurred, will match what you have.
Don't get me wrong, I think we're still far, far off from a situation where we can do this reliably, but I can see how you could get the actual face out of a blurred image.
> you only have one value that encodes to a certain hash value, right?
Errr wrong. A perfect hash, yes. But they're never perfect. You have a collision domain and you hope that you don't have enough inputs to trigger a birthday paradox.
Look at the pictures in the article. It's an outline of the shoe. That's your hash. ANY shoe with that general outline resolves to that same hash.
If your input is objects found in the Oxford English Dictionary, you'll have few collisions. An elephant doesn't hash to that outline. But if your input is the Kohl's catalog, you'll have an unacceptable collision rate.
Hashes are attempts at creating a _truncated_ "unique" representation of an input. They throw away data (bits) they hope isn't necessary to distinguish between possible inputs. A perfect hash for all possible 32-bit values is 32 bits. You can't even have a collision-free 31-bit hash.
So back to the blurry security camera footage of a license plate or a face. Sure, that "hash" can reliably tell you that it wasn't a sasquatch that committed the robbery, but it literally doesn't contain the data necessary to _ever_ prove it was the suspect in question, even if the techs _can_ prove that the suspect hashes to the image in the footage.
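A toy illustration of that pigeonhole argument (all numbers invented: tiny 8x8 binary "images", with thresholded 2x2 block averaging playing the role of the blurred outline):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

# The "hash" here is an extremely low-res thumbnail: average 4x4 blocks of an
# 8x8 binary image and threshold back to black/white.
def outline_hash(img):
    blocks = img.reshape(2, 4, 2, 4).mean(axis=(1, 3))
    return tuple((blocks > 0.5).astype(int).ravel())

buckets = Counter()
for _ in range(100_000):
    img = rng.integers(0, 2, (8, 8))       # one of 2**64 possible inputs
    buckets[outline_hash(img)] += 1

# Only 2**4 = 16 possible outputs for 2**64 possible inputs:
# collisions aren't a risk, they're a certainty.
print("distinct outputs:", len(buckets))
print("most crowded bucket:", buckets.most_common(1))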
FYI (not because it’s particularly relevant to the sort of hashing that is being talked about, but because it’s a useful piece of info that might interest people, and corrects what I think is a misunderstanding in the parent comment): perfect hash functions are a thing, and are useful: https://en.wikipedia.org/wiki/Perfect_hash_function. So long as you’re dealing with a known, finite set of values, you can craft a useful perfect hash function. As an example of how this can be useful, there’s a set of crates in Rust that make it easy to generate efficient string lookup tables using the magic of perfect hash functions: https://github.com/sfackler/rust-phf#phf_macros. (A regular hash map for such a thing would be substantially less efficient.)
Crafting a perfect hash function with keys being the set of words from the OED is perfectly reasonable. It’ll take a short while to produce it, but it’ll work just fine. (rust-phf says that it “can generate a 100,000 entry map in roughly .4 seconds when compiling with optimizations”, and the OED word count is in the hundreds of thousands.)
> So back to the blurry security camera footage of a license plate or a face. Sure, that "hash" can reliably tell you that it wasn't a sasquatch that committed the robbery, but it literally doesn't contain the data necessary to _ever_ prove it was the suspect in question, even if the techs _can_ prove that the suspect hashes to the image in the footage.
For a face, sure; but for printed text or license plates there are effective deblurring algorithms that in some cases can rebuild a readable image.
A good piece of software (IMHO) is this one (it was freeware, it's now commercial, and this is the last freeware version):
For the first, choose "Out of Focus Blur" and play with the values; you should get a decent image at roughly Radius 8, Smooth 40%, Correction Strength 0%, Edge Feather 10%.
For the second, choose "Motion Blur" and play with the values; you should get a decent image at roughly Length 14, Angle 34, Smooth 50%.
Fortunately there is a limit: the universe (in a practical sense). You cannot encode all of its states in a hash, since that would require more states than the hash itself has, as you already mentioned (pigeonhole). But representing macroscopic data like text (or basically anything bigger than atomic scale) uniquely can be done with 128+ bits. Double that and you are likely safe from collisions, assuming the method you use is uniform and not biased toward some inputs.
If you want easy collision examples, take a look at people using CRC32 as a hash/digest. It is notoriously prone to collisions (since it's only 32 bits).
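For instance, a minimal birthday-style search with only the standard library (the 8-byte random inputs are an arbitrary choice) typically finds a CRC32 collision after a few tens of thousands of inputs, roughly the square root of the 2**32 output space:

```python
import os
import zlib

# Birthday-style search: CRC32 has only 2**32 possible outputs, so a collision
# among random inputs is expected after roughly sqrt(2**32) ~ 80,000 tries.
seen = {}
tries = 0
while True:
    data = os.urandom(8)               # arbitrary 8-byte random input
    digest = zlib.crc32(data)
    if digest in seen and seen[digest] != data:
        print(f"collision after {tries} inputs:",
              seen[digest].hex(), "and", data.hex(), "->", hex(digest))
        break
    seen[digest] = data
    tries += 1
```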
That won't work. A lot of people have tried to create systems that they claim always compress movies or files or something else. Yet, none of those systems ever come to market. They get backers to give them cash, then they disappear. The reason they don't come to market is that they don't exist. Look up the pigeon-hole principle. It's the very first principle of data compression.
You can't compress a file by repeatedly storing a series of hashes, then hashes of those hashes, down into smaller and smaller representations. The reason you cannot do this is that you cannot create a lossless representation smaller than the original's entropy. If you could, you would get down to ever smaller files, until you had one byte left. But you could never decompress such a file, because there is no single correct interpretation of such a decompression. In other words, your decompression is not the original file.
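You can watch the pigeonhole principle at work with a few lines of standard-library Python (the input size and pass count are arbitrary): high-entropy data does not keep shrinking when you compress the compressed.

```python
import os
import zlib

# High-entropy input: 100 kB of random bytes.
data = os.urandom(100_000)

# "Compress the compressed" a few times -- the size does not keep shrinking;
# it actually grows slightly each pass because of format overhead.
for i in range(5):
    data = zlib.compress(data, 9)
    print(f"pass {i + 1}: {len(data)} bytes")
```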
Without getting too technical because I hate typing on a phone, you're technically right in the sense of a theoretical hash.
But in real life there's collisions.
And real-life image or sound compression, blur, artifacts and low resolution fundamentally destroy information in practice. It is no longer the comparatively difficult but theoretically possible task of reversing a perfect hash, but more like mapping a name to the bucket RXXHXXXX, where X could be anything.
There are lots of plausible values we can replace X with, but without an outside source of information, we can't know what the real values in the original name were.
Out of sheer curiosity I had a go at manually enhancing the Roundhay Garden Scene by dramatically enlarging the frames, stacking them, aligning them, and erasing the most blurred ones and the obvious artifacts.
The funniest part was that the resolution really does go up if you turn 1 px into 40 and align the frames accurately (then adjust opacity to the level of blur).
The crime television thing would be possible if you have enough frames of the gangster.
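For anyone who wants to automate that manual process, here is a rough sketch of the same idea (assuming SciPy and scikit-image; the upscale factor and interpolation settings are arbitrary choices): upscale each frame, register it against the first, and average the stack.

```python
import numpy as np
from scipy.ndimage import shift, zoom
from skimage.registration import phase_cross_correlation

def stack_frames(frames, upscale=4):
    """Upscale each frame, register it against the first, and average the stack.

    frames: list of 2-D grayscale arrays of the same (mostly static) scene.
    """
    big = [zoom(f.astype(float), upscale, order=3) for f in frames]
    reference, aligned = big[0], [big[0]]
    for frame in big[1:]:
        # Sub-pixel shift needed to align this frame with the reference.
        offset, _, _ = phase_cross_correlation(reference, frame, upsample_factor=10)
        aligned.append(shift(frame, offset))
    return np.mean(aligned, axis=0)
```

Averaging mostly buys noise reduction and a bit of sub-pixel detail; it won't conjure a readable face out of a handful of blurry pixels.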
Approaches like these are hallucinating the high-resolution images, though; not something that we'd ever want used for police work. That said, I wonder if it would perform better than eyewitness testimony...
To play devil's advocate though, modern neuroscience and neuropsychology basically tell us that our brains reconstruct and recreate our memories every time we try to remember them. Our memories are highly malleable and prone to false implantation... and yet witness testimony is still the gold standard in courts.
I wouldn't want to see it used as evidence in court (and I doubt it would be allowed anyway, but IANAL), but I could see this being useful in certain circumstances for generating the photo-realistic equivalent of a police sketch, e.g. if you had low-res security footage of a suspect and an eyewitness to guide the output.
It would be useful to reduce the number of suspects: calculate possible combinations, match them against the mugshot database, and investigate/interrogate those people. Or if you're the NSA/KGB, you can match against the social media picture database, and then ask the social media company to tell you where those users were at the time of the crime (since the social media app on the phone tracks its users' location...).
You could, e.g., ostensibly produce a set of plausible license plates, which could then be narrowed down by matching the car color and model to produce a small set of valid records.
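A hypothetical sketch of that narrowing-down step (every plate, character set and registry entry below is invented for illustration): enumerate the plausible plates from the ambiguous characters, then intersect them with a registry already filtered by vehicle description.

```python
from itertools import product

# Invented example: the deblurred plate reads "R?8-5?3", and the two unclear
# positions could each only be one of a few visually confusable characters.
position_1 = "BDR8"
position_2 = "0OQD"
candidates = {f"R{a}8-5{b}3" for a, b in product(position_1, position_2)}

# Invented registry, already filtered down to e.g. blue hatchbacks.
registry = {"RB8-5O3", "RD8-5Q3", "XYZ-123"}

matches = candidates & registry
print(f"{len(candidates)} plausible plates, {len(matches)} in the registry:", matches)
```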
Sure, but if we go by how the police work now, they will take a plate produced by the computer as 100% given and arrest/shoot the owner of that plate because "the computer said so".
This image from the article shows that the original image and the fantasy image are not alike at all. The faces appear to be different ages. The computer even fantasized a beauty mark.
> This image from the article shows that the original image and the fantasy image are not alike at all.
This is another avenue that could be further explored, which I quite like. That is, a non-artist can doodle images and create a completely new photo-realistic image based on the line drawings.
I was modifying a few images (from a link in another comment here: https://affinelayer.com/pixsrv/ ) and the end results were interesting.
The low-resolution-to-high-resolution image synthesis reminds me of the unblur tool that Adobe demoed during Adobe MAX in 2011. Here is the relevant clip if you're interested: https://www.youtube.com/watch?v=xxjiQoTp864
That demo was quite impressive, but the technique is completely different. Adobe uses deconvolution to recover information and details that are actually in the picture but not visible (unintuitively, blurring is a mathematically reversible transformation: if you know the characteristics of the blur, you can reverse it; in fact most of the magic in Adobe's demo comes from knowing the blur kernel and path in advance, and I'm not sure how well it works in practice on real photos). But the neural net demoed in this post just "makes up" the missing info using examples from photos it learned from; there is no information recovery.
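To illustrate the "known kernel" point, here is a minimal Wiener-style deconvolution sketch (this is not Adobe's algorithm; the synthetic image, box kernel and regularization value are all made up): because the blur kernel is known, most of the detail comes back.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "photo" and a known 5x5 box blur (a crude out-of-focus kernel).
image = rng.random((128, 128))
kernel = np.zeros_like(image)
kernel[:5, :5] = 1 / 25.0

K = np.fft.fft2(kernel)
blurred = np.real(np.fft.ifft2(np.fft.fft2(image) * K))

# Wiener-style inverse filter: because the kernel is known, the blur can be
# largely undone -- genuine recovery of detail, no learned prior involved.
eps = 1e-3   # regularization; keeps near-zero frequencies from blowing up
restored = np.real(np.fft.ifft2(
    np.fft.fft2(blurred) * np.conj(K) / (np.abs(K) ** 2 + eps)))
print("mean abs error after deblurring:", np.abs(image - restored).mean())
```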
You'll get something that looks plausible, for sure; maybe not what was originally there, though. In the future, someone will be falsely convicted of a crime because a DNN "enhance" decided to put their picture into some fuzzy context.
You don't specify, but presumably you mean a true confession.
It could also be used to extract a false confession. If the prosecutor says "We have proof you were there at the scene" and shows you some generated image, then you as an innocent person have to weigh the chances of the jury being fooled by the image (and even if it's not admissible in court, it may be enough to convince the investigating team that you are responsible and stop them looking for the real perpetrator) against the expected sentences if you maintain your innocence vs. "admitting" your guilt.
Yup. In a court of law, the value as evidence is going to be weighted fairly low, even with expert testimony. It may be enough to get a warrant, or a piece in the process of deduction during the investigation phase.