The most interesting exhibit for me is "People watching a television set for the first time", where everything is colorized except the TV image, which correctly remains B&W. I wonder what kind of a training set provided the neural network with this notion.
Author here- So I'll just be brutally honest on that one- not all renders are doing that. I cherry-picked the one that did because yeah, it's amazing. There's a simple explanation for why it sometimes fails to pick up on the guy on the TV and color him: the source material is fuzzy and small.
I wish I could claim it was something more awesome than that but that's the truth! I'm treating these outputs as an art of selection to a certain extent because it's simply not 100% consistent yet. That's one of the things I'm going to continue to try to improve upon.
Shooting from the hip here, but I'd much rather you optimized for speed and allowed selection from a rotating palette of local maxima. I noticed your sadness about the limitations on the picture of the Indian woman leaning against a tree whose hand came out red, presumably because of vignetting or some chemical inconsistency in the film substrate. But that superposition of possible interpretations on noisy data is something that shouldn't be thrown away - it's the same 'error' that makes optical illusions interesting when they seem to flip back and forth between a vase and two faces, or a duck and a rabbit. The model is doing such a great job that trying to push it too far in any one direction risks overfitting.
What I'd love to see in the future are compound networks where a few nodes like this can be mixed with a few nodes that extract vector data, a few others that infer depth maps from images, modulated by similarity detectors that match objects and individuals.
I'm very impressed by the work you've already done - I have a huge library of images I'd like to run it against for both forensic and aesthetic purposes.
> I noticed your sadness about the limitations on the picture of the indian woman leaning against a tree whose hand came out red,
I think the biggest problem with that picture is not the hand (it's very visible and could easily be fixed in post-processing); it's the blue shade in the clothes that just should not be there. Otherwise, the colors are great (the skin and everything looks very real).
Are we discounting the possibility that the NN's calculations resolved her hand as it really existed? Tonally, it may have been differentiated from the general population in such a way that the algorithm amplified this difference.
If you would like to be brutally honest, you should post a randomly selected set along with the hand-picked set, labeling each set by how it was selected. This is a cancer in current deep learning research. You see papers with such glowingly cool examples, but in reality they are just hiding all the problematic cases while being fully aware of it. If this happened anywhere else, in any other domain, people would say they got ripped off and were outright lied to.
I understand the frustration and in fact share it to a certain extent with science in general. Keep in mind that this wasn't intended to be published as a paper or anything like that. I'm just a software engineer who picked a problem and found a pretty cool solution.
Primarily I thought it was cool because it should be useful in many other image modification domains. And then it blew up in popularity today (didn't expect that). But yeah in the notes in the readme at github I do say this:
>To expand on the above- Getting the best images really boils down to the art of selection.
I added that after getting some feedback similar to yours, because before that, this disclaimer wasn't quite cutting it apparently:
>You'll have to play around with the size of the image a bit to get the best result output.
So yeah, I'm trying to stay honest here. I'm not going as far as picking completely random samples, admittedly, but really what I'm trying to drive at here is that you can produce cool results with this tool. It's not perfect, but it's a tool. And even if you pick at random, they still look pretty damn good. Just sometimes it renders the TV in color and sometimes it doesn't, and I picked the cool option.
Yeah I tried pointing that out in the Known Issues section by alluding to adjusting the size parameter as a means to get the best images. But I think I'll just go ahead and be crystal clear on the "art of selection" part so that this doesn't come across as snake oil.
That seems like it would be quite difficult. If it is colorizing a black and white image, wouldn't it colorize the black and white image on the TV screen too? You would almost have to train it to recognize old TVs that produce black and white images so that it wouldn't colorize the TV screen. Unless you can get a unique signature from a black and white photo of a black and white screen. Fun stuff.
(I'm a moderator on HN.) Your account was being rate limited by our software. I'm sorry! We've marked it legit so this won't happen again. Please participate as much as you like.
There's some haze on the original image. I think it simply ignores the hazy portion of an image or attempts a very light colorization, which seems to be the case here.
I have to admit I have no clue about machine learning, but what I notice is that this seems to have preferred colors for things that can actually have many different colors, most notably clothes. They're almost always this bluish, slightly purple color here, even on the samurai. Don't get me wrong, this is still awesome and I might try this on some old photos from my grandparents. I'm just wondering if and how one can prevent these things from picking this one ideal color for something and instead have it randomize a bit, since obviously you can't really know what color some jacket really was. (Except maybe if the picture is a black and white photo of a PAL TV program.)
> since obviously you can't really know what color some jacket really was.
That’s why colorizing companies employ historians and researchers. You can have a pretty accurate idea of this color with enough research, but it takes time (and thus money).
That might work when the job is colorizing Hollywood productions, but for documentary photos, it's not going to be possible in most cases. You just won't have any leads at all about the origin of the garment, apart from whatever you can observe of its style. An expert can certainly suggest a few colours that don't look anachronistic, but that doesn't make the end result historically "accurate", just plausible/convincing.
The Seneca Native in 1908 example seems the most absurd to me. I know the software has no notion of "fabric" or "clothing", but it's very rare for brown or beige things to fade to blue (or vice versa). In real life, when things transition from brown/beige to another color, that other color tends to be a red, orange, or yellow. I know from the known issues that it likes blue, but it still seems very odd that it chose to fade from brown to blue like that.
To me the Seneca native's skin on the hand seems a bit too reddish. I find these photos to have very high saturation; I think this could be adjusted to get subtler effects. It's still amazing that this is possible with no human intervention, but at the same time, from a different perspective, I find that the originals have a charm of their own and I would leave them as they are.
Is it unthinkable that the Seneca girl actually had her hand painted red, for decoration or as a symbol of something? Perhaps her father/brother etc. was a fighter and this was a way to keep spirits up while he was in the war?
Very interesting... seems to basically learn that:
Faces -> some variety of flesh-colored from light to dark
Fabric/clothing -> blue
Sky -> blue
Vegetation -> green
Wood -> brown
Blank -> turquoise or tan
Small details -> fascinating variety of colors, but often a brilliant red
Which all seems fairly reasonable. For many things (like wood or skin) it seems accurate.
Obviously things like clothes come in such a variety of colors that there's simply no way at all to predict accurately, zero meaningful signal -- so if it settles on whatever the most common color is, it doesn't surprise me that would be blue.
Author here. Yeah you're basically right. GANs vastly improve the situation though, because being safe with "green for grass, blue for sky, brown as default" doesn't work in the generative-adversarial setting. The critic will assign lower scores if the generator keeps outputting brown. Now I'd think the generator would get more creative than going for blue constantly, but that might just be a matter of more/better training (...?)
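To give a rough sense of the mechanic (placeholder names here, not the actual code from the repo), the generator's half of the adversarial game looks something like this:

```python
import torch
import torch.nn.functional as F

# Hypothetical generator/critic modules and training step, for illustration only.
def generator_step(generator, critic, grayscale_batch, optimizer):
    fake_color = generator(grayscale_batch)   # predicted colorization
    critic_score = critic(fake_color)         # logit: how "real" the colors look
    # Non-saturating GAN loss: a generator that always plays it safe
    # (flat brown/blue everywhere) is easy for the critic to flag as fake,
    # so this loss pushes it toward more varied, plausible colors.
    loss = F.binary_cross_entropy_with_logits(
        critic_score, torch.ones_like(critic_score))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```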
Like other colorisers I've seen, it does seem to produce non-uniform colouring of clothing, with bleed from surrounding areas into clothing being fairly common. That seems slightly weird, as I am sure few training images would have that.
I know this is HN and we always hope machines will help us everywhere, but I suspect (and hope) that the human perspective will always be needed. Photography is as subjective as anything can be.
When I saw "restoring" in the title I was expecting higher resolution. For example seeing in modern photo level detail eyelashes, wrinkles, etc. I get that, like the colors, this would require the adding lots of made up information about scene and feature details but IMO it would blur the lines between restoration and reconstruction/storytelling in a really awesome way. Old photos are cool in their own way but their lack of detail makes them seem so alien. Would be exciting to get a hyper real reconstruction.
Are there examples of ML doing something like that? (I also know little about ML.)
Author here: It's early, and currently resolution is limited primarily by model size. Which drives me nuts. It's one of my top priorities to address because that would be a great improvement. Adding super-resolution to the pipeline should also be pretty easy but I want to at least output a reasonable base resolution on the photos first before I go that route.
Oh yeah, to answer your question- super resolution does indeed make up details as you describe, and arguably does blur the line between restoration and storytelling. But so does colorization- not all the colors added by the model are going to be what was actually going on there, of course.
This reminds me of a recent 99% Invisible episode [1] in which they discuss the same topic in the world of dinosaurs. It details how dinosaurs used to be depicted with the goal of only showing the things that we are confident are true (although what we are confident in obviously changes over time). This results in mostly just greenish-brown skin draped over a muscle structure attached to the fossilized skeletons.
In recent decades there has been a push to show the animals more realistically. The fossilized evidence is studied and compared to the skeletal structure of animals that exist today. Inferences and educated guesses are made from there to project a more realistic but more subjective image of the dinosaurs. We now get much more varied and interesting depictions with feathers, bright coloring, fat deposits, and other features that can neither be completely confirmed or ruled out based on the evidence.
Hah yeah, I just listened to that a couple days ago but didn't make the connection. It was probably rattling around my subconscious when I wrote this question, because yeah, it is very similar. That's a particularly interesting comparison too, because the whole point was that just filling in conservatively based on experience misses a ton of real-world crazy and interesting diversity. The best example was how, if we were imagining what elephants looked like just based on their fossilized skeletons, they wouldn't have trunks!
One of the reasons why these photos look so convincingly realistic is the same reason https://en.wikipedia.org/wiki/Chroma_subsampling is done --- the human eye has less sensitivity to colour resolution, and so even relatively vague blobs of colour can evoke the right perception as long as there is sufficient luma detail (provided by the original monochrome image); but if you inspect the photos closely, you'll see there are plenty of unnatural gradients in clothes and such, and the colours of objects blend into each other.
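A quick way to convince yourself of this (a rough sketch; the filenames are made up):

```python
from PIL import Image

# Keep full-resolution luma (Y) but store the chroma (Cb, Cr) at a fraction
# of the resolution -- the same idea as chroma subsampling.
img = Image.open("photo.jpg").convert("YCbCr")   # placeholder filename
y, cb, cr = img.split()

factor = 8  # deliberately aggressive; JPEG's 4:2:0 only halves each axis
small = (img.width // factor, img.height // factor)
cb = cb.resize(small, Image.BILINEAR).resize(img.size, Image.BILINEAR)
cr = cr.resize(small, Image.BILINEAR).resize(img.size, Image.BILINEAR)

# Despite the blurry colour channels, the recombined image usually looks
# nearly identical at normal viewing distance.
Image.merge("YCbCr", (y, cb, cr)).convert("RGB").save("subsampled.jpg")
```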
This was my thought too - it may not matter if the colors are 100% accurate as long as they are enough to trick the human eye and brain into filling in what's missing. Besides, the reality is, these are not color source photos and never will be. A black and white photo does not contain the color information; it was never captured. All we really can do is use historically accurate colors, and afaik that is the same thing professional recolorists do as well.
This seems almost too good to be true. One thing I find very striking is how it gets skin tones very plausible across people of different ethnicities (though the majority of subjects in the picture appear of european descent).
Unless a) my brain is applying more interpretation to these pictures than I realize or b) the author (intentionally or not) picked out pictures that show the best results
> One thing I find very striking is how it gets skin tones very plausible across people of different ethnicities (though the majority of subjects in the picture appear of european descent).
Look at the Chinese Opium Smokers in 1880. They appear slightly too Caucasian-coloured to me.
Yeah, some of the details are absurdly good, especially the picture of the "Texas Woman": how it gets the dog's ears perfect, perfect colors on the apples, and renders the copper pot a perfect copper hue.
His face is arguably too red. But on average it’s fine. (Amusing: is this comment correct, or unconsciously biased by the lack of knowledge of what native Americans actually look like? I admit the latter is possible.)
Humans interpret colors thanks to context. When you strip away context, it’s easy to come up with things that fool you. (Optical illusions are the limit case of this.)
It's interesting to see how the algorithm seems to turn aerial photos into romantic paintings. My guess is that the model was trained on mostly up-close photos and that the colors don't map exactly to aerial photos because color intensity fades over large distances.
On that note it is cool to see how the algorithm does work for both indoor and outdoor photos. Indoor settings tend to have dark backgrounds and outdoor settings have light backgrounds.
Colorizing single images will always be a bespoke task. There is just too much missing data in the image to be able to create high quality colorizations from the photo alone.
However, I think the real application here is colorizing frames of movies. Imagine being able to turn black and white historical footage into color. It won't be as good looking as a single image, but it would be good enough, I bet.
As amazingly plausible as the pictures look, I personally have some dislike towards such applications (nothing against the author of course, just about ML in general), because I always feel a bit as if I'm being duped by the neural net. When I see image restoration, I'm subconsciously expecting historical fidelity, even if I'm just marveling at the nice colorization. But of course such historical accuracy is not the primary goal of the GAN.
Maybe another cool avenue to explore would be combining models like this with some NLP approach that parses a historian's rough description of how the scene should be colored and biases the generator with prior information that way. (Maybe related to visual question answering or something.)
15 years ago I was in the first cohort of a brand new college program in Digital Imaging Technology. I spent the cost of a college diploma and over 10,000 hours learning to do this by hand. Now it's AI on Github for all. The Times They Are A Changin'
This is one of the few colorizing algorithms that I've seen which creates desirable output. The images really do look like old colorized images. I wonder how the authors dealt with the differences in spectral sensitivity of their source material. There's clearly some orthochromatic plates or film being used. The image of the Seneca native 1908 is a good example. Notice how dark the field is on the patch on her skirt. With orthochromatic emulsions, the patch could have been either black or red since the emulsion isn't sensitive to red. It's most sensitive to blue, which is part of the reason skies look so white in old photos.
Author here. Easy to answer that one- altering the training photos with random lighting/contrast changes (yet keeping the color targets the same) really helped to deal with varying qualities of photos. But also, neural networks are just particularly good at picking up on context, so that has a lot to do with why the results are so robust.
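For a rough idea of what I mean (a simplified sketch, not the actual training code):

```python
import random
from PIL import Image, ImageEnhance

# Illustrative only: jitter lighting/contrast on the grayscale input,
# while the color target stays untouched.
def make_training_pair(color_img: Image.Image):
    target = color_img                       # color target is left alone
    gray = color_img.convert("L")            # grayscale input
    gray = ImageEnhance.Brightness(gray).enhance(random.uniform(0.6, 1.4))
    gray = ImageEnhance.Contrast(gray).enhance(random.uniform(0.6, 1.4))
    return gray, target
```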
The colorized photos on https://www.reddit.com/r/Colorization/ are just marvelous. If that could be combined with the AI colorization to colorize old BW movies, that would make them so much more watchable. Other attempts at colorizing them, like what Turner did in the 80's, were a commendable attempt but didn't turn out well.
Well, it has to pick something, right? Am I wrong in thinking the color, except for very specific known items, is simply lost and can't be inferred by any level of intelligence? Maybe the solution is "if I_HAVE_NO_IDEA -> randomColor()", which I realize doesn't jibe with how ML works (does it?)
> And yes, I'm definitely interested in doing video
As someone familiar with the libraries space, I'd actually be very interested in seeing a machine learning model that could deal with "cleaning up" old film (I've actually brought this up w/ several of my ML friends occasionally). One of the biggest challenges in the world of media preservation is migrating analogue content to digital media before physical deterioration kicks in. Oftentimes, libraries aren't able to migrate content quickly enough, and you end up with frames that have been partially eaten away by mold.
As a heads-up, these are some of the problems you might encounter on the film front (which you might not otherwise find with photos due to differences in materials used, etc):
I believe that Peter Jackson's recent endeavour in cleaning up WW1 footage employs significant ML for de-noising, frame interpolation, and colorising. I haven't seen the final film, but some of the clips are staggeringly good: https://www.bbc.com/news/av/entertainment-arts-45884501/pete...
I'm actually not sure much ML was involved here - depends where you draw the line I guess, but denoising and interpolation for restoration typically use more traditional wavelet and optical flow algorithms. The work for this was done by Park Road Post and StereoD, which are established post-production facilities using fairly off-the-shelf image processing software. The colorisation likely leant heavily on manual rotoscoping, in the same way that post-conversion to stereo 3D does.
I'd love to hear otherwise but I'm not aware of any commercial "machine learning" for post-production aside from the Nvidia Optix denoiser and one early beta of an image segmentation plugin.
Huh, I recall seeing an article at one point (can't find the link) where it said or suggested that ML was involved. Of course this could have just been a journalist failing to make the distinction; I've seen everything from linear regression on up naively lumped into the ML bucket.
In any case the results are damned impressive -- can't say I've seen anything like it before.
The pictures were basically perfect to my eyes, until I scrolled down to the "gotchas" section, at which point I started to notice a lot of details that are wrong, mostly fading colors, on clothes or otherwise.
Now, there seems to be a distinct loss of detail in the restored images. The network being resolution-limited, is the black-and-white image displayed at full resolution beside the restored one?
What I would like to see is the output of the network to be treated as chrominance only.
Take the YUV transform of both the input and output images, scale the UV matrix of the restored one back up to match the input, and replace the original channels. I'd be really curious to look at the output (and would do it myself if I were not on a smartphone)!
Nevertheless, that's some awesome work, and I can't wait to see where it goes!
Author here. That's actually what I find quite fascinating myself about the results- that they look almost perfect at first glance, yet you drill down a bit closer and you see another "zombie hand". The resolution issue you mention is definitely something I'm painfully aware of- it just comes down to lack of memory resources to support bigger renderings. That's going to be something I'm going to try to attack next.
However, I feel like you glossed over the proposed workaround, which I feel is appropriate (though more complicated if you want to implement "defade"), and extremely easy to implement.
I took a couple of minutes to write an Octave script that implements the workaround [1]; it would have been even easier if both images had already been distinct files and perfectly aligned.
The basic idea here is the same as the one behind the YUV transform: our brains are much less sensitive to the chroma channels than the luma channel. So I separate those, and keep the original luma channel, while I use the reconstructed chroma, which is lower-resolution.
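Here's roughly the same idea in Python (this isn't the Octave script itself; the filenames are made up and it assumes the two images are already aligned):

```python
from PIL import Image

# Keep the luma of the original B&W photo; take only the chroma from the
# colorized output (which can be lower-resolution without hurting much).
original = Image.open("original_bw.jpg").convert("YCbCr")    # placeholder filenames
colorized = Image.open("colorized.jpg").convert("YCbCr")

y_orig, _, _ = original.split()
_, cb, cr = colorized.resize(original.size, Image.BILINEAR).split()

Image.merge("YCbCr", (y_orig, cb, cr)).convert("RGB").save("blended.jpg")
```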
Judge the results by yourself, but it seems to me that the end results are a whole lot better: https://imgur.com/a/n2sBYCi
And it could still be improved a lot more (by using the original high-resolution image, and by not having to hand-align the images).
Edit: also, ironically, indigo dye (and thus blue clothes) didn't become common before the 1900s [2], so the bias might produce historically inaccurate images!
Although I would have made it a fully-fledged GitHub issue, with a link on your board instead of a text entry, so that supplementary material could be added in the issue thread.
Bonus: if you are only interested in chrominance, you can train your network to use YUV as an input instead, and output only UV. I suspect this might lead to substantial gains in the training time and network complexity.
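As a sketch of what a training example could look like in that setup (a hypothetical helper, not your code):

```python
import numpy as np
from PIL import Image

def yuv_pair(path):
    # Hypothetical: network input is the Y channel only; the target is the
    # two chroma channels (Cb, Cr).
    ycbcr = np.asarray(Image.open(path).convert("YCbCr"), dtype=np.float32) / 255.0
    y = ycbcr[..., :1]     # shape (H, W, 1) -> input
    cbcr = ycbcr[..., 1:]  # shape (H, W, 2) -> target
    return y, cbcr
```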
Update: I got this working, and dude- it's so awesome in every way. This is the most substantial improvement I've seen yet. Most importantly- it massively reduces memory requirements. Thank you so much. I'll commit within a day or so and make sure to mention you on Twitter.
Hey, thank you a lot, that's awesome!
One more thing I recently thought about but didn't get around to mentioning: you can probably reduce the input of your net to the Y (luminance) channel (with UV-only output) to trim it further ;)
But that might already be what you are doing, for all I know. I am just really glad I could be of any help! And this feels like a "free-lunch" improvement.
Yeah the more I churn over this idea in my head the more excited I get about it. This really sounds like a big win.
I'm not sure what I want to do about the Kanban board versus issues tracker yet... I'm used to JIRA mostly. I'll figure it out but do know your contribution is very very much appreciated. I don't think I would have come up with that.
I don't know much about ML, but would it be possible to use some kind of attention model to iteratively construct the final colouring? The memory limit of the GPU would then limit the attention region size, but not the maximum image size. Talkin' outta my rear here, though.
I was actually thinking along the same lines because yeah...if you could break this problem down into smaller pieces, it would probably be the most effective way to reduce memory requirements. But I do think that's easier said than done. This is where I think I'll have to rely on Ian Goodfellow and others to come up with another something brilliant for me to stick in the code lol
>> BEEFY Graphics card. I'd really like to have more memory than the 11 GB in my GeForce 1080TI (11GB). You'll have a tough time with less. The Unet and Critic are ridiculously large but honestly I just kept getting better results the bigger I made them.
This is a cool application of ML. Not to diminish the work, just to point out that humans are more sensitive to luminance than color (hence YUV encoding [1] and others), so it might make inaccuracies less visible.
For example, in "Interior of Miller and Shoemaker Soda Fountain, 1899" the colors from the counter and chairs blend, but the luma help our eyes to separate it.
> The model loves blue clothing. Not quite sure what the answer is yet, but I'll be on the lookout for a solution!
Just throwing a thought out here that you might have considered, but, maybe it's because traditional black-and-white film is over-sensitive to blue? It's why when one uses traditional black and white films one usually uses at least a yellow filter and if you have blue sky in a shot you use a red filter. This may or may not be useful; either way, keep up the awesome work!
I think the issue is that the hand is round-ish and surrounded by wood texture; the model might apply learnings from photos of apples or other fruit on trees.
How stable is the result with respect to augmentation?
If you get an image with a funny artifact, like a super-red hand, can you fix it by running the network on a slightly augmented image? For this kind of work, it seems reasonable that you could keep re-colorising an image until you got one that was acceptable (as in the case with the B+W TV).
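Something like this is what I have in mind (a sketch; `colorize` stands in for whatever wraps the model, and the resize jitter is just one possible augmentation):

```python
import random
from PIL import Image

def reroll_colorizations(bw_img: Image.Image, colorize, n=8):
    # Feed the model slightly different resizes of the same photo and keep
    # every candidate so a human can pick the acceptable one.
    candidates = []
    for _ in range(n):
        scale = random.uniform(0.85, 1.15)
        size = (int(bw_img.width * scale), int(bw_img.height * scale))
        out = colorize(bw_img.resize(size, Image.BILINEAR))
        candidates.append(out.resize(bw_img.size, Image.BILINEAR))
    return candidates
```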
Seems like an easy problem for DL, as you have an enormous amount of data available (just take any color image, convert it to grayscale and you have a pair of training images).
(This is also the case for e.g. the superresolution problem.)
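In code, the "free training data" point looks roughly like this for both problems (a sketch, not anyone's actual pipeline; the downscale factor is arbitrary):

```python
from PIL import Image

def colorization_pair(color_img: Image.Image):
    # Grayscale version is the input, the color original is the target.
    return color_img.convert("L"), color_img

def superres_pair(color_img: Image.Image, factor=4):
    # Downscaled version is the input, the original is the target.
    small = color_img.resize(
        (color_img.width // factor, color_img.height // factor), Image.BILINEAR)
    return small, color_img
```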
You probably need an enormous GPU (24 GB RAM) as well, to make as large a model as possible for the best generalization you can get (there are so many different types of objects/surfaces/fabrics and their compositions).
It's deep learning, so it doesn't have much to do with any analytical model; it's not thinking like a human :-(. Recently even good NLP processing needs 24GB+ for training (it won't fit into 16GB), and good quality colorizing (no spills, natural colors) could be expected to be just as demanding.
From the article:
"BEEFY Graphics card. I'd really like to have more memory than the 11 GB in my GeForce 1080TI (11GB). You'll have a tough time with less. The Unet and Critic are ridiculously large but honestly I just kept getting better results the bigger I made them."
There is something wrong with that woman’s hand. It is either extremely swollen, a glove, or not a real hand (wooden?). Perhaps your model didn’t make a mistake after all
How does it work for modern color photos that have been converted to B&W? It would be interesting to see how the colors compare to the original color photo.