Using Waifu2x to Upscale Japanese Prints (ejohn.org)
245 points by mxfh on May 20, 2015 | 64 comments



The "cleanliness" of the resulting images is undeniable, but once you get past the sheer awe at how crisp and clear the upscaled image is, you'll immediately notice the loss of detail. It completely does away with any and all texturing, which is especially noticeable in the last image ([1] vs [2]) - look at the scales and patterned lines on the snake (?) around his neck and the white strands in his hair, and of course, the letters have been turned into (unrecognizable?) squiggles.

Still, in terms of pure shock and awe - they're jaw-droppingly nice for upscaled versions, to the point where if you didn't have the original, it wouldn't occur to you that this wasn't it.

1: http://ukiyo-e.org/image/mfa/sc165440

2: http://i.imgur.com/541uG5t.png


I find this an unfair criticism.

This is trained to upscale anime images, not woodblock prints - and anime images are typically flat, uniform colors. It may have issues scaling up a still of a background scene from 5cm/second, but it would fare much better with a character still from .hack//sign. You have to keep in mind what it is trying to scale up.

>(Naturally I could train a new CNN to do this, but it may not even be necessary!)

Training it to upscale woodblock prints might make sense if you care about retaining the texture. The anime-trained model only works as well as it does here because the styles are very similar.


One thing I should note is that when looking at prints (at least for when it comes to technical analysis) being able to see accurate representations of the lines is far more important than the uniformity of colored regions. Color is almost always at the whim of the printer on any given day, whereas the black lines (from the keyblock) should always remain the same. Granted you're going to have issues either way (using this tool or doing normal scaling) as the source material is inherently compromised.

Although it's not clear exactly what scenes the upscaler was trained on, I suspect that it's currently best suited to scenes that have lots of large bold lines and not lots of tiny details.


I imagine it was trained on 'typical' anime-style art of black/bold character outlines and mostly flat colors.

A common solution to resizing anime characters is to create a colored vector of an image. The differences between these vectors and the original stills are minimal and usually 'satisfactory'. There is an entire scene of people who create these vectors and another scene of people who use the vectors to create wallpapers and other graphics. [0] Waifu2x can help replace the need to vector these images by increasing the quality of upscaling them.

This is the prevalent 'style' for anime - at least from the past 8-10 years or so. There are a few outliers and I imagine Waifu2x would work poorly on them. For example, I do not see it working well on a still from "The Garden of Words". [1]

[0] http://img04.deviantart.net/96b1/i/2015/105/8/d/oumae_kumiko...

[1] https://24framesps.files.wordpress.com/2014/11/the-garden-of...


One potential point of improvement is training the neural network on prints with Japanese text; that seems to be the weakest point of using the anime-trained one.


cheers for everything, and this comment


To be fair, the demo site provides a configurable level of artifact reduction. This article uses the highest level. Here it is with none and some:

http://imgur.com/a/cVVnC


Yeah, on second thought, after seeing the low noise reduction result again, I suspect that may be an even better result for what I'm looking to achieve. Many of the details in his rope are preserved and the calligraphy appears to be in better shape.


I've updated the post to include additional low-noise-reduction filter examples. Thank you for reminding me to look at them!


Damn, but the "some" looks great.


Maybe? I'm definitely biased in that I have substantial computer vision & image processing experience, but the output looks riddled with obvious filter and vectorization artifacts to me.

Re: Image 2, http://i.imgur.com/541uG5t.png


Yeah - the background looking like crumpled-up wrapping paper is definitely not ideal. I suspect that it's having trouble with mostly-uniform areas of color that have slight variations. It appears to be extrapolating and creating these larger effects.


> mostly-uniform areas of color that have slight variations

That's precisely what it's attempting to smooth over - and it works well for anime images because in them, those variations would be considered noise.

Wondering if it would anime-ise it, I fed Waifu2x the standard Lenna image - twice - and ended up with this:

http://i.imgur.com/iNWRjIS.png


How did you achieve that? I've run it through several times and whatever settings I use, I can't get anything close to yours.


I took the original, scaled it down to 1/4, and then back up to original with Waifu2x at 2x scale, maximum noise reduction.
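
In case anyone wants to repeat the experiment, here's a rough sketch of the downscaling step using Pillow (the filename and the choice of Lanczos for the downscale are my own assumptions; the upscaling itself was done with Waifu2x at 2x scale, maximum noise reduction, applied twice):

    from PIL import Image

    img = Image.open('lenna.png')
    w, h = img.size
    # Downscale to 1/4 of the original dimensions...
    small = img.resize((w // 4, h // 4), Image.LANCZOS)
    # ...then feed this to Waifu2x (2x, max noise reduction) twice to get back
    # to the original size.
    small.save('lenna_quarter.png')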


That's fantastic!


Have you tried running the original through a high pass filter (to get the textures) and applying it over the vectorized version? It might work for the background texture, though it would probably suck for the text.
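
Something along these lines - not the author's pipeline, just a sketch of the high-pass idea, assuming Pillow/numpy and that 'original.jpg' and 'upscaled.png' are the source scan and a 2x Waifu2x output:

    import numpy as np
    from PIL import Image, ImageFilter

    orig = Image.open('original.jpg').convert('RGB')
    up = Image.open('upscaled.png').convert('RGB')

    # High-pass: subtract a blurred copy to keep only the fine texture.
    blurred = orig.filter(ImageFilter.GaussianBlur(radius=2))
    highpass = np.asarray(orig, np.float32) - np.asarray(blurred, np.float32)

    # Resize the texture layer to match the upscaled image and add it back.
    hp_img = Image.fromarray(np.clip(highpass + 128, 0, 255).astype(np.uint8))
    hp_big = np.asarray(hp_img.resize(up.size, Image.BICUBIC), np.float32) - 128
    result = np.clip(np.asarray(up, np.float32) + hp_big, 0, 255).astype(np.uint8)
    Image.fromarray(result).save('with_texture.png')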


It would preserve all the JPEG artifacts, which are the main reason the naively upscaled version looks so crappy in the first place.


There are well-known techniques to clean JPEG artifacts, but they need the tables from inside the JPEG to work best.

Since JPEG uses an 8x8 block transform, you can find artifacts by shifting the image a few pixels over and looking for how the transformed block changes, basically.

https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_...

Also, using a better chroma upscale can help for small images. libjpeg just uses nearest-neighbor (no "real" resizing) and hardly anyone notices, but it helps with lines and edges.
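
For anyone curious, here's a toy version of the shift-and-average idea described above (just a sketch, assuming numpy/scipy/Pillow; a real deblocker like the linked ffmpeg filter also uses the quantization tables from inside the JPEG, which this ignores):

    import numpy as np
    from PIL import Image
    from scipy.fftpack import dct, idct

    def dct2(b):
        return dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

    def idct2(b):
        return idct(idct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

    def block_denoise(img, thresh):
        # Zero out small 8x8 DCT coefficients (a crude stand-in for re-quantization).
        out = np.zeros_like(img)
        h, w = img.shape
        for y in range(0, h - 7, 8):
            for x in range(0, w - 7, 8):
                block = dct2(img[y:y+8, x:x+8])
                block[np.abs(block) < thresh] = 0
                out[y:y+8, x:x+8] = idct2(block)
        return out

    def shift_average_deblock(img, thresh=8.0, shifts=((0, 0), (4, 4), (2, 6), (6, 2))):
        # Denoise at several block offsets and average, so the artifacts no longer
        # line up with the original JPEG's 8x8 grid.
        acc = np.zeros_like(img)
        for dy, dx in shifts:
            shifted = np.roll(img, (dy, dx), axis=(0, 1))
            acc += np.roll(block_denoise(shifted, thresh), (-dy, -dx), axis=(0, 1))
        return acc / len(shifts)

    gray = np.asarray(Image.open('scan.jpg').convert('L'), dtype=np.float64)
    out = np.clip(shift_average_deblock(gray), 0, 255).astype(np.uint8)
    Image.fromarray(out).save('deblocked.png')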


I have not - although that's an interesting idea, thank you! Relative to my other projects this is a very low-priority exploration. I was very interested to see if there could be a "cheap win" for this particular sub-problem that I will be dealing with, should I get around to digitizing these books.


The article is unclear, but I think he was upscaling from the small "source images" in the article, not the full images linked through them. Compare that (http://i0.wp.com/data.ukiyo-e.org/mfa/scaled/sc165440.jpg) to Waifu2x low noise reduction (http://i.imgur.com/pDmgNZS.png), and sharpness and detail definitely improve. And then they get worse again with high noise reduction (http://i.imgur.com/541uG5t.png), so that says something about the best parameters to use.

(Edit: It looks like the low-noise-reduction version was added later, and you were talking about the high-noise-reduction version, in which case, fair enough.)


The version he gives is smaller than the source image, which makes me think the image is upscaled from the grainy preview.

I tried it on the actual source myself (using http://waifu2x.udp.jp/), and there was very little actual loss of this kind.


I feel like you hit a borderline-pathological case for the noise reducer with that image. It hasn't just blurred the cross-hatching (? is that what those patterned lines are called?), it's completely removed it.

I tried it with just upscaling and no noise reduction, and the result is about what you'd expect: a really nice upscale, perfectly preserving all those patterns (as well as the noise, unfortunately). Doing that and filtering in another program might work better.


Waifu2x is not actually the first or only image scaler to use neural networks - NNEDI3[1], an Avisynth[2] filter used for deinterlacing, can also do really nice image upscaling (and it's a lot faster than waifu2x). Here's an example of what it can do to the images in the blog post:

Image 1: http://i.imgur.com/4cXr51v.png

Image 2: http://i.imgur.com/PZAXeM8.png

It doesn't come with any noise reduction, but nothing stops you from doing that separately from the upscaling process itself, and that way you should be able to control it better anyway (I find the reduction options provided by waifu2x really aggressive even at the low setting - they just kill tons of detail).
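
As a concrete example of keeping the two steps separate, here's a minimal sketch using OpenCV (the denoising strengths are placeholders to tune, and the Lanczos resize stands in for whatever upscaler you prefer, NNEDI3 or waifu2x included):

    import cv2

    img = cv2.imread('print.jpg')
    # Denoise first, with strength under your control...
    denoised = cv2.fastNlMeansDenoisingColored(img, None, 5, 5, 7, 21)
    # ...then upscale with a conventional resampler (or hand off to NNEDI3/waifu2x).
    up = cv2.resize(denoised, None, fx=2, fy=2, interpolation=cv2.INTER_LANCZOS4)
    cv2.imwrite('print_2x.png', up)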

As a sidenote, when talking about something like image scaling, it would be a good idea to avoid saying something like "image scaled 2x (normally)" as there are lots of ways to scale images and what's "normal" can vary a lot depending on what you're using.

[1] http://bengal.missouri.edu/~kes25c/

[2] http://avisynth.org


NNEDI3 is fantastic - thank you for providing a link and some samples!

You're absolutely right that I shouldn't have said "normal". I updated the post to clarify that this was using "OSX Preview". I did some hunting but didn't find any obvious pointers as to which algorithm they're using. If anyone knows offhand I'll be happy to include it!


Talk with the imgix.com folks about the CoreImage stuff. They're using the built-in re-sampling in their product.

Also chat with @deepbluecea who's done a lot of image processing stuff, including for Apple.


I looked at what imgix was using a few weeks ago on HN. The resampling they do is really poor. You can do much better with imagemagick.

https://news.ycombinator.com/item?id=9501601


Yeah, why they're going through all that hardware effort, I dunno. Simpler developer workflow I guess. Would be interesting to do a cost/benefit vs. just using a Linux stack.


The 'hardware effort' is to get dramatically improved processing time by using the GPU since they're trying to do it on a much larger scale.

I have used, and continue to use, imagemagick and similar software-based solutions, and they're pretty slow for multi-MB images (but most servers don't have good GPUs, so it's the only option unless you're building custom racks as imgix does).


Yeah, I'm not super sure about the dramatically improved processing time. Especially compared to a SIMD-optimized scaler. You have to spend some time sending the image to the GPU and reading it back too.

Especially if you set imagemagick to use the much worse scaler that imgix uses, I imagine it'd be pretty fast.

On the other hand, if you replaced imgix's stack with the high quality scalers from mpv (written as OpenGL pixel shaders), and then compared to expensive CPU scalers, I would expect a GPU solution to be a win.

Note that imgix also has to recompress the image as PNG or JPEG at the end. This has to be done on the CPU and is probably more resource intensive than any of the scaling.


You can upload 100s of MBs of texture data to a GPU in milliseconds. Sending and receiving from GPU doesn't actually take that long in comparison to the time it takes to process a multi-MB file in software.


At least use GraphicsMagick, it's a lot faster in my (limited) experience. Unfortunately, I couldn't turn up any decent benchmarks.


Thanks for the tip, I had assumed it would be similar in feature-set/speed to imagemagick so hadn't tried it yet.


Well, OSX Preview seems to be doing something interesting as I can't seem to find an exact match with some quick attempts, but whatever method they use it looks rather similar to Lanczos scaling.


After doing some more poking it appears as if Avisynth (and thus NNEDI3) is Windows-only. Do you happen to know if there are ways to run it in Linux or OSX? Or if there's a comparable set of software for those platforms?


Avisynth should run in Wine, but there is also Vapoursynth[1] (which works natively on OSX & Linux) and a NNEDI3 port[2] for it. After getting both of them up and running, a script like this[3] ran with the vspipe program that comes with Vapoursynth should do the trick. It's a bit cumbersome since Avisynth and Vapoursynth are primarily intended for processing video, not images, but it gets the job done in absence of a dedicated NNEDI3 resizing tool. I'm actually using this exact setup at work myself when I need to do any image upscaling.
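
Roughly, such a script looks like the following (an untested sketch - plugin namespaces and parameters vary between nnedi3 builds and Vapoursynth versions, and the imwri plugin is assumed for reading the PNG):

    import vapoursynth as vs
    core = vs.get_core()

    clip = core.imwri.Read('print.png')
    # nnedi3 doubles the height when dh=True, so do it once per axis,
    # transposing in between to cover both dimensions.
    clip = core.nnedi3.nnedi3(clip, field=1, dh=True)
    clip = core.std.Transpose(clip)
    clip = core.nnedi3.nnedi3(clip, field=1, dh=True)
    clip = core.std.Transpose(clip)
    clip.set_output()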

[1] http://www.vapoursynth.com/

[2] https://github.com/dubhater/vapoursynth-nnedi3

[3] http://pastebin.com/3k9TEL7Y


The NN was explicitly trained on artifact-free PNG sources of anime fan art, which it handles quite well according to my own testing.[1]

Its benefits are questionable if used on anything else.

I've also tested it on anime screenshots, and in that case it's pretty much on par with NNEDI3 (which is computationally much cheaper), because real-world encodes actually have compression artifacts: those get scaled up too if you disable noise reduction, or everything is smoothed out too much if you leave it on.

So if you want to use it on anything else you really do have to retrain the NN first; otherwise you get results you could also achieve by other means (e.g. warpsharp, NNEDI or Photoshop Topaz).

Also, waifu2x only scales luma. Its chroma handling is just regular upscaling (whatever imagemagick uses by default, I think), so even that part could be improved.

[1] http://forum.doom9.org/showpost.php?p=1722990&postcount=3



This looks like it could be applied to a real-life "ENHANCE" button. By training similar algorithms with photographs instead of anime prints, would this be a feasible means of approximating detail from enlarged photographs CSI-style (not quite to the extreme one sees on TV, but perhaps enough for a police sketch or something)?


Something to keep in mind is that when upscaling, you are actually inventing (fabricating) detail. Tools like the one presented here are content to invent detail that looks pleasing to the eye, but if you tried to do something like this for photographs you wouldn't get anything that would hold up as evidence. You also wouldn't want to use this to guide a police sketch, because the "enhanced" image actually contains false information compared to the original.


This upscaling can be considered a form of lossy compression. The pixelated images, while ugly, contain more information - the process cannot be reversed due to this loss of information.

> You also wouldn't want to use this to guide a police sketch...

You would, could and can. Take a look at police sketch software, it is a very manual version of this process. There are a very limited number of potential variations of the human face, that is why eigenfaces work for facial recognition. Consider the scenario where you have a photo of a human face, where one half is occluded. In manually reconstructing the image, you wouldn't place the reconstructed eye an inch above the chin - because human faces don't work that way. Neural networks pick up on that. There is software that tailors use in fitting suits, where a few measurements (like weight and arm length) can be used to extrapolate the rest of the measurements (like chest size and torso length). This works because of the limited number of potential human dimensions.
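
A toy illustration of the eigenfaces point (emphatically not evidence-grade): project a half-occluded face onto a PCA "face space" learned from other faces and let the reconstruction fill in the missing half. This sketch assumes scikit-learn and its bundled Olivetti faces dataset:

    import numpy as np
    from sklearn.datasets import fetch_olivetti_faces
    from sklearn.decomposition import PCA

    faces = fetch_olivetti_faces()
    X = faces.data                              # (400, 4096) flattened 64x64 faces
    pca = PCA(n_components=100).fit(X[:-1])     # learn the face space from all but one face

    test = X[-1].copy().reshape(64, 64)
    test[:, 32:] = test[:, :32].mean()          # crudely occlude the right half
    recon = pca.inverse_transform(pca.transform(test.reshape(1, -1)))
    # 'recon' is a plausible full face - invented detail consistent with faces the
    # model has seen, not recovered information about this particular person.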

As far as use of these techniques for evidence... I'd actually prefer it to reliance on eye witness accounts, as the algos are open to exact measurement - unlike most of the other stuff that passes for "criminal science" (humans are still in the loop for fingerprint analysis, wtf?).


Yes, I'm not trying to say that software is useless for assisting in approximating detail for facial recognition, but software like this, where in goes a single image and out pops a single "clean and enhanced" image, with no manual guidance in between, sounds like it would be fantastically misleading. Somehow you have to express to the decision-makers (investigators, prosecutors, jury) that there is error and guesswork involved in this process, lest you end up with techno-magic like polygraph tests that are popularly understood to produce evidence that they really don't.


The "manual guidance in between" is where CSI is so incredibly screwed up though (toolmark and fingerprint examiners, polygraph operators, etc). The only criminal science that is actually reliable has cut humans out of the loop (dna, forensic document analysis, computer forensics). Even with the reliable methods, they are still probabilistic, which is exactly how the software we are discussing would work. As far as misleading decision-makers, well that is a more fundamental problem with the justice system... we really need to cut as much human judgement out of the process as possible. I'm looking forward to the day when speech recognition and language parsing are solved problems, because formal logic will fix this situation pretty quickly.


It's great to cut out humans from the loop where we can, but we cannot do so here. As you say, upscaling is lossy (de)compression, and no amount of math is going to reveal information that fundamentally does not exist in a source image. Furthermore, neural networks are trivially fooled: http://news.cornell.edu/stories/2015/03/images-fool-computer... . I'd actually trust a trained neural network far less than a human, just like I'd trust the upscaling technique in this article far less than a human artist. Speed and automation are their advantages compared to trained humans, not quality.


> As you say, upscaling is lossy (de)compression

As are eye witness accounts, which have been demonstrated to be pretty useless.

As are fingerprints, a tiny sliver of (maybe?[0]) uniquely identifiable information.

As are autopsies, where the state of the corpse is maintained only in whatever the examiner writes down, x-rays, or snaps a polaroid of.

As are bite marks...

So you've got all that, plus your lawyer's sweaty appeals to emotion in a group of 12 people - of whom four will express a belief in haunted houses and two will claim to have actually seen a ghost [1]. You'd prefer that over an application of math that can be challenged and rationally discussed?

> Furthermore, neural networks are trivially fooled...

A neural network was fooled with the equivalent of a hash collision, one guess as to how to fix that :)

> I'd actually trust a trained neural network far less than a human...

I can't think of a single person I'd trust over math, once maybe Bill Cosby - but not anymore.

> Speed and automation are their advantages compared to trained humans, not quality.

Well, in this context I'd say that impartiality and repeatability are pretty important, which are characteristics more likely to describe a math model than an individual whose qualifications amount to a mailing address - and all the training that can be packed into a 20-minute VHS about civic duty played on a wheeled TV.

[0] http://www.academia.edu/447251/The_Current_Position_of_Finge...

[1] http://www.pewresearch.org/fact-tank/2013/10/30/18-of-americ...


Interesting! The effect looks quite similar to warpsharp (http://avisynth.nl/index.php/WarpSharp), a sharpening filter that enjoyed a certain vogue among anime encoders back when video sources were not as crisp as they are today. There's quite a lot of detail loss in Resig's ukiyo-e example, but I imagine for most people the most striking part of it will be how much smoother the result appears.


Great use case, upscaling print thumbnails.

Norman Tasfi made a neural-net upscaler for Flipboard: http://engineering.flipboard.com/2015/05/scaling-convnets/

I expect video upscaling next.


> I expect video upscaling next.

There is a DirectShow filter (madVR[1]) for Windows that already offers a neural network scaler (NNEDI3, a simpler network than waifu2x) in realtime.

[1] http://forum.doom9.org/showthread.php?t=146228


Interesting... If a neural network can be used for upscaling a video, it means you need to send less data over the wire to get the same quality. This means neural networks can be used as a compression algorithm.


From the Flipboard article:

"The final use case that we thought of was saving bandwidth. A smaller image could be sent to the client which would run a client side version of this model to gain a larger image."

This can be applied to gifs and videos, but it really depends on the use case and whether the client would tolerate such a thing.


NNEDI3, which I mentioned in another comment, can be used for video upscaling, and was in fact built for video processing in the first place.


It performs better on larger features.

Anime is almost never drawn with finer detail than the output resolution, so artifacts are not a problem. This is a low-resolution scan of something with very fine detail, which is not what it was trained on.


He forgot to comment on how the filter destroyed the letters.


I'm not sure I'd go so far as to say "destroyed". Compare the text in this cartouche: https://imgur.com/7fGJg4s,iWf4pXG

At worst it seems comparable to the previous result. At least to my eyes.


For example, in the 4th character, two lines were converted into a stain.


Great point. FWIW, I've updated the post to include some of the cartouches, along with a cartouche at the "low noise reduction" level. The two lines in the fourth character appear to still be relatively distinct in this case.


That being said, I don't by any means want to disrespect the work, because it's clearly impressive.


I'd just like to express my appreciation for Waifu2x's informative name. More projects could do with such evocative labels.


My neural network sarcasm detector is confused by this post. I was going to complain about it being the same kind of dim unintentional sexism as the original choice of Lena as reference image.


It was a backhanded compliment. The name is a bit... unprofessional.

On the other hand, it does help you grok its function, and I suspect the 'memorable' name is at least partially responsible for its popularity.


I don't buy "unintentional".


I did a quick comparison between ImageMagick and Waifu2x using a common anime-style image: http://imgur.com/a/teKVY


The scans look to have JPEG artifacts?

If you really are working with the original source, you should rescan to PNG or TIFF, or even just a higher-quality JPEG?


For comparison's sake, can someone share the time taken to upscale some of these images?



