The site appears to be overloaded at the moment, so I can't read it.
My question (which may or may not be answered in the post): what made you embark on this project?
I don't mean "because I can" stuff - it's an awesome hack. What I mean is: what made the Gameboy Camera culturally relevant for you to hack on? What ideas started this? Did you own a Gameboy Camera in your youth? Did you have a fantasy of something like this then?
> Back in 1998, the Gameboy Camera earned the Guinness world record as the "smallest digital camera". One accessory you could buy was a small printer for printing your images. When I was 10 years old we had one of these cameras at home and used it a lot. Although we did not have the printer, taking pictures, editing them, and playing minigames was a lot of fun. Unfortunately, I could not find my old camera (so no colorized pictures of young Roland, sadly), but I did buy a new one so I could test my application.
> As you can see, the random noise added on top of the image creates the "gradients" you see in Gameboy Camera images, giving the illusion of more than 4 colors.
The general approach being described is called "dithering", and it usually involves more than adding random noise to the image. One common algorithm is Floyd-Steinberg (https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dither...), which tracks the quantization error at each pixel and diffuses it to neighboring pixels, adjusting their values as the error accumulates.
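For the curious, here's a minimal sketch of Floyd-Steinberg error diffusion, quantizing to the camera's four shades. The function name and the 4-level palette are my own illustrative choices, not code from the post:

```python
import numpy as np

def floyd_steinberg(gray, levels=4):
    """Quantize a grayscale uint8 image to `levels` shades,
    diffusing each pixel's quantization error to its neighbors."""
    img = gray.astype(np.float64)        # work in float, copy the input
    step = 255.0 / (levels - 1)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = np.round(old / step) * step   # snap to the nearest shade
            img[y, x] = new
            err = old - new
            # classic 7/16, 3/16, 5/16, 1/16 error distribution
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return np.clip(img, 0, 255).astype(np.uint8)
```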
> A big problem in trying to create color images of my own face was getting them off the Gameboy Camera. ... What was left was the great method of taking pictures of the screen.
There is another option! You can get a backup device called the "Mega Memory Card" and an EMS 64M Game Boy flash cart. Back up the GB Camera's SRAM with the Mega Memory Card and restore it to the EMS cart. Then you can use the flash cart to transfer the save to a PC and dump the images with software.
I regularly use this method together with a little utility I wrote[1] to get GB Cam images onto my website. The Game Boy Camera is a cool little gadget!
Yeah, I saw the same thing with WGAN. ConvTranspose2d is not great for upscaling because it creates checkerboard artifacts. That said, the post actually recommends a 'subpixel' convolution for upscaling (roughly: convolve out to r² times the target channel count, then use PixelShuffle to rearrange those channels into a 3-channel image at higher resolution), rather than a bilinear/nearest-neighbor resize followed by Conv2d(3, 3).
(A GAN would probably also deliver better colorization results in general.)
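For concreteness, a rough PyTorch sketch of that sub-pixel upscaling idea; the module name, channel counts, and scale factor are my own illustrative choices, not the post's code:

```python
import torch
import torch.nn as nn

class SubpixelUpscale(nn.Module):
    """Upscale by factor r: a plain conv emits C * r^2 channels, and
    PixelShuffle rearranges them into a C-channel image r times larger."""
    def __init__(self, in_channels=64, out_channels=3, scale=2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels * scale ** 2,
                              kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # (B, C*r^2, H, W) -> (B, C, r*H, r*W)

    def forward(self, x):
        return self.shuffle(self.conv(x))

# e.g. a 64-channel 56x56 feature map becomes a 3-channel 112x112 image
x = torch.randn(1, 64, 56, 56)
print(SubpixelUpscale()(x).shape)  # torch.Size([1, 3, 112, 112])
```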
I wonder which is better, actually. On the one hand, a lot of people will swear that what their internal pattern matcher produces is reality (although people with training tend to know that you can't put that much stock in it). So maybe the output of a computer will feel a little less real to laymen.
But I kind of doubt it. I have a feeling that, for example, it's going to be difficult to explain to a jury that the image that the computer spat out from eight pixels on a security feed is not reliable – that any resemblance they see to the defendant is simply not relevant. If an artist took those eight pixels and drew a picture, they'd be laughed out of the room. If a computer does it, people primed by shows like CSI might think that it's actually valid.
Maybe you can appeal to people's common sense, and show them the original input. A crafty defense might show alternative "enhancements" based on non-face training sets to drive the point home. But in the end, we're probably going to need to ban this kind of technology as evidence to avoid confusing jurors.
I wish they had compared against a JPEG with its quality reduced to match the file size of their NN-compressed image. It would be useful to compare what artifacts each introduces and to see just how much better the NN does at the same size.
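One rough way to set up that comparison is to search for the JPEG quality whose output fits the target size. A hedged Pillow sketch; the function name and the walk-down-from-95 search are my own guesses at a sensible approach:

```python
import io
from PIL import Image

def jpeg_matching_size(img: Image.Image, target_bytes: int):
    """Return (quality, jpeg_bytes) for the highest JPEG quality
    whose encoded size does not exceed target_bytes."""
    for quality in range(95, 0, -1):       # walk down from near-max quality
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        if buf.tell() <= target_bytes:
            return quality, buf.getvalue()
    return None  # even quality=1 exceeds the target size
```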
Any and every machine learning algorithm can be used for compression. But most ConvNets cannot run in true real time in any way, shape, or form, even on a GPU: latencies are on the order of 500 ms to 1 s, which is noticeable even if you have more datacenters than God, like Google does.
What would be really interesting to see in these kinds of articles is how well you could do with just a batch Photoshop operation hand-tuned on a handful of photos, and then a comparison of how much better the NN does.
No, he generated "fake" Gameboy Camera images from the full-color shots by applying a crude form of dithering to grayscale versions. I think that's perfectly reasonable.
What I'm saying is that I would like to see a comparison with a fourth image, produced from the first (black-and-white) image by a batch Photoshop operation. I would guess a suitably tuned application of "brown tint, then selective Gaussian blur, then unsharp mask" would get you pretty close.
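For what it's worth, a quick Pillow approximation of that pipeline. The tint colors, blur radius, and unsharp parameters are guesses, and Pillow has no true "selective" (surface) blur, so a plain Gaussian blur stands in:

```python
from PIL import Image, ImageFilter, ImageOps

def faux_colorize(path):
    """Brown tint -> Gaussian blur -> unsharp mask, roughly mimicking
    the suggested Photoshop batch operation. All parameters are guesses."""
    gray = Image.open(path).convert("L")
    tinted = ImageOps.colorize(gray, black="#2b1a0d", white="#f5e6cf")  # brown tint
    blurred = tinted.filter(ImageFilter.GaussianBlur(radius=1.5))
    return blurred.filter(ImageFilter.UnsharpMask(radius=2, percent=130, threshold=3))
```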
That's pretty much what I thought. The concept is neat, and it does a pretty good job of smoothing the gradients while maintaining detail, which is probably the hardest part.
It would be hard for it to get skin color right, since the camera only sees skin in grayscale, and different tones all get leveled to a similar lightness. So I'm not surprised that skin color was hard, but the report still probably should not have included the line "Note that even skincolor is accurate most of the times". I guess "most" could almost be considered accurate, if the majority of the samples had the same pinkish skin to start with. Almost all the results from the celebrity dataset ended up with the exact same tone, despite wildly different input tones: http://imgur.com/a/daJUa
It would be interesting to see what the real people looked like, and maybe even compare that against what's generated to gauge the accuracy of the generated versions.