"The proposed algorithm can be quickly described as an iterative algorithm that treats color information as a heightmap and 'pushes' pixels towards probable edges using gradient-ascent. This is very likely what learning-based approaches are already doing under the hood (eg. VDSR[1], waifu2x[2])."
This is interesting to me because it hints at the direction I really want to see ML stuff go.
Some problems may not lend themselves to this concept, but hear me out: we train models, they start giving reliable output, and then we put them into production while really having no idea what the thing is doing inside. Here we have a traditional image processing algorithm that's doing something similar to what the author suspects the ML-based solution is doing... only the author's solution is much more performant. What I think we'd love to see is the ML approach yield a result that not only works, but is transparent in how it works. So plain old human engineers can internalize what the machine learned, and re-implement the solution as a run-of-the-mill algorithm that does the job faster than pretending to be a brain.
Feasible? That seems highly dependent on the task at hand. Worthy? Absolutely!
Perhaps MT (machine teaching?) is the next evolution of ML.
My enthusiasm in this instance is probably tempered by the fact that image resizing is on the simple end of things we're using ML for, I'd think.
It's a two dimensional grid of data points. That's it. I mean, that's certainly not trivial (look at all the algorithms we've come up with just in the last 10-20 years! imagine all the people-hours!) but it pales in complexity to, say, weather models or automated scanning of PET scans for tumors or something.
Also, the output of any given image resizing algorithm can be quickly assessed by eye, so that's a very convenient feedback loop. As opposed to, say, using ML to come up with proposed oil drilling locations, where testing out each proposed drilling spot is a very expensive proposition.
> So plain old human engineers can internalize what the machine learned, and re-implement the solution as a run-of-the-mill algorithm that does the job faster than pretending to be a brain.
Perhaps we can cut out the middleman here. Maybe the answer is not for ML models to come up with human-understandable algorithms. Perhaps the answer is for them to produce optimized code that implements the algorithms they've discovered.
Disclaimer, in case it's not blindingly obvious - I am not versed in ML at all.
Re: Perhaps MT (machine teaching?) is the next evolution of ML. ...
"Any sufficiently complex system acts as a black box when it becomes easier to experiment with than to understand. Hence, black-box optimization has become increasingly important as systems become more complex." - Google Vizier: A Service for Black-Box Optimization, Golovin et al., KDD '17
> Perhaps we can cut out the middleman here. Maybe the answer is not for ML models to come up with human-understandable algorithms. Perhaps the answer is for them to produce optimized code that implements the algorithms they've discovered.
I would rather have a high-level algorithm description as the output -- which could definitely be fed into some sort of compiler that ultimately outputs executable code.
I feel like going straight to executable code isn't solving the problem GP was interested in, which I believe to be the problem of transferring knowledge from machine to engineer in much the way an engineer would transfer it to another engineer.
An algorithm that outputs code without any high level understanding or documentation is about as useful to me in a large project as an intern who can copy-paste from Stack Overflow and produce volumes of code with no documentation, in the long term.
You can throw the algorithm through a logic simplifier and pattern matcher, like the one available in Isabelle. This often helps figure things out, but not if the algorithm is just a bunch of weights.
To be understandable, ML solutions need to cleanly separate the "characteristic finding" parts from the "decision tree" parts; however, the most efficient networks may well have optimised these things together, like a compiler might.
For example, the first impressive ImageNet solvers clearly worked by coming up with a number of characteristics based mainly around various "textures" rather than "shapes", but this wasn't obvious when they were first published. It really seemed like they could "recognise a Panda" etc.
Definitely a breath of fresh air that someone's still trying to do super-resolution without neural networks. This example shows that at the moment, it can still be better and MUCH faster to use classical CV techniques for certain applications.
A similar thing happened with upscaling algorithms for video games. AMD's Contrast Adaptive Sharpening was shown to have superior image quality than Nvidia's Deep Learning Super Sampling[1]. Plus the former algorithm works on every game and doesn't need a training set unlike the deep learning algorithm.
ML implementations can insert detail that was never present in the original image. You can't get that with other methods. That may or may not be a good thing depending on the source material and your desired result.
ML can insert "its best guess based on a training set". A human-tuned algo can insert "its output as defined by the handwritten algo", which presumably is based on the human's own "training set" of personal experience.
But the truth of any lossy encoding is that... information is lost, period. The best you can do is guess at what was there.
This is one of those old talking points people for some reason love...
"Information is lost" is too vague. You're counting bits on disk, but fewer bits does not always mean "less information" when your algorithm gets smarter. Compression is the obvious, classical example. Even for lossy compression, information loss is << change in size.
ML offers the promise to take this to extreme levels: give it a picture of (part of) the NY skyline, and it adds the rest from memory, adjusting weather and time of day to your sample. Is that new information "real"? That's really up to your definition.
The best example of this idea is those CSI-Style "Enhance" effects: It used to be true that people on Slashdot and later HN would outrank each other with the superior smartitude of saying "That's impossible! Information was lost!".
Funny story: that effect now exists. It's quite obvious that, for example, a low-res image of a license plate still contains some data, and that an algorithm can find a license plate number that maximizes the probability of that specific low-res image. With a bit of ML, those algorithms have become better than the human brain in almost zero time flat.
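Roughly, a minimal sketch of that kind of search, where `render_plate` and `downscale` are hypothetical placeholders for drawing a candidate string at full resolution and simulating the camera; neither is a real library function:

```python
import numpy as np

def most_probable_plate(observed_lowres, candidates, render_plate, downscale):
    """Pick the candidate text whose simulated low-res image best matches
    the observation (squared error ~ Gaussian negative log-likelihood)."""
    best_text, best_err = None, np.inf
    for text in candidates:
        synthetic = downscale(render_plate(text))  # simulate the camera pipeline
        err = np.sum((synthetic.astype(np.float64)
                      - observed_lowres.astype(np.float64)) ** 2)
        if err < best_err:
            best_text, best_err = text, err
    return best_text  # the most probable plate -- which may still be wrong
```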
This is quite capable of producing a high-res image of some license plate, yes. But it's only probabilistic: there's no proof that the license plate definitely has the exact same number as the one in the low-res photo. You have to allow for the possibility of the system hallucinating the wrong result and enhancing the certainty of it. While you could use it as input to a police search it would be grossly unjust to show such an enhanced image to a jury.
Maximizing probability naively sometimes works, but of course it can produce misleading garbage.
And then you can get fooled instead of actually correctly believing the image was unreadable.
There is no free lunch, even with robust estimators. They will make mistakes. For image quality, it is ok to make a mistake here or there. For actual recognition? Terrible.
Better than the human brain? Show it.
People are pretty good at reading blurry text when trained, but I'm not aware of a test pitting trained people against a machine.
(No, Mechanical Turk does not count as trained at a specific task.)
The human brain can just as easily predict erroneously; we just seldom happen to have only a single shot at it. For visual recognition we usually look at something for an extended amount of time, waiting with "judgement" until the probability that what we see is indeed what we think it is is sufficiently high. Neural networks also output a probability (when trained on a problem that requires it), which can signal their confidence in their answer.
> a low-res image of a license plate still contains some data, and that an algorithm can find a license plate number that maximizes the probability of that specific low-res image.
That's because there was enough information (data) present to extrapolate.
Let's say you take a photo of someone across the room, and downsize it so it's low res, then use machine learning to upscale it.
It will do its best to reconstruct the face/other features based off its data. It might even get pretty close. But it still has no way of knowing where every single freckle or mole on their skin is - it might try placing some based off what it's learned, but they aren't related to the actual person.
Here's another good example [0]: it doesn't know what color the bridge should be. Maybe it was painted white, and should stay white! We humans know other information, such as which bridge that is, so we know what color it should be, but there's not enough data to extrapolate that from the image alone.
The license plate may as well be printed with the exact pattern you see on screen, and the assumption that it’s a low resolution sampling of some higher density information would be false. Any additional information derived from it is conjecture, however based on reasonable assumptions. By ”enhancing” the image you may gain information, but that information doesn’t inherently relate to the information you lost.
I think it only works for this particular use case though. Anime visuals are much less complicated than real-world pictures. The fact that they are (mostly) created digitally in the first place means an algorithmic solution is likely to be available.
Um, that's not really accurate. There is a huuuuuge library of anime that was created by hand, recorded to film, and then edited. It wasn't until the mid '00s that digital was becoming a thing. It was cheaper to do it by hand than have to render out digitally.
Interesting bit since I always figured waifu2x was the best at upscaling:
>Interesting enough, waifu2x performed very poorly on anime. A plausible explaination is that the network was simply not trained to upscale these types of images. Usually anime style art have sharper lines and contain much more small details/textures compared to anime. The distribution of images used to train waifu2x must have been mostly art images from sites like DevianArt/Danbooru/Pixiv, and not anime.
I'm not sure I understand how the author compares the quality in the preprint.
In the chart, it says to compare "perceptual quality", but the axis is only marked with "blurry" and "less blurry". Sharpness is not the only aspect of quality, perceptual or not. I can tell that Anime4K's result is indeed very sharp, but the edges/lines are quite unnatural even in the examples the author provided. I personally would prefer slightly blurrier lines with less of an "oily" effect.
Also, I didn't see any comparison with ground truth, i.e. having a high-resolution image first, resize it down, use the proposed algorithms (among existing ones) to upscale it back, and then compare the upscaled results with the original image. I understand it may be hard to find enough examples of 4k animes, but we can do so with 1080p -> 480p -> 1080p etc.
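For example, a minimal sketch of that round-trip check (the file name is hypothetical, and plain bicubic stands in for whatever upscaler is under test):

```python
import numpy as np
import cv2

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio between two same-sized images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

original = cv2.imread("frame_1080p.png")                      # ground truth
small = cv2.resize(original, None, fx=0.5, fy=0.5,
                   interpolation=cv2.INTER_AREA)              # downscale
restored = cv2.resize(small, (original.shape[1], original.shape[0]),
                      interpolation=cv2.INTER_CUBIC)          # upscale back
print(f"PSNR vs ground truth: {psnr(original, restored):.2f} dB")
```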
(I am not familiar with this domain; does similar research normally do this in its analysis?)
There is no ground truth because AFAIK there are no native 4k anime produced yet. There are _very_ few produced at 1080p. Most 1080p that are released by studios are just an upscaled 720p master. Fansubbers will sometimes release their own upscaled 720p when the studio upscale was done very poorly.
To my knowledge, not much has changed since 2017 where only a single anime (Clockwork Planet) was produced in 1080p. The only two studios I can name offhand that I know have done 1080p masters are KyoAni and JC Staff.
Excuse my ignorance, but I would have thought anime would be much easier to produce in 4K or even 8K compared to movies, which require a 4K/8K camera.
It is actually harder to produce in 4K/8K because more detail needs to be drawn so the frame doesn't look too empty, and the lines must not be too thick (e.g. by using larger paper). TV series are usually drawn on A4 paper with a 1-2 inch margin, while proper theatrical releases are drawn on B4 paper.
Another factor, I believe, is the know-how. In my opinion, despite anime being broadcast in 16:9 for so long, it is only in recent years that the extra width has been put to good use during layout.
A reasonable substitute for ground truth might be to get a single drawing done in an anime style with the appropriate pixel dimensions. There are many artists out there who can produce such a thing for a modest fee.
Your Name and the 2 Gundam Thunderbolt movies have 4K BDs, but I don't know if they are upscaled or native 4K. There's also a Gundam F91 and Space Adventure Cobra movie 4K remaster, but those were animated with cells, so I don't know how useful those would be for testing an upscaler.
There is a 4k remaster of Akira being produced from the original 35mm film that should be out next year. 35mm to 4k should be a downscaling, so that might do when it becomes available.
Not sure how the myth that anime isn't made at 1080p perpetuates itself, but it's not true. It only holds true for most anime made between ~2000 and ~2007, because those were made already digitally, but with DVD in mind. Anything prior is hand drawn on cells, anything after is done in 1080p or above, with some exceptions. There are a couple of 4K anime already out there.
The list I provided was from 2017 and only a single anime of an entire season was mastered at 1080p. Even in 2018 and 2019 that's still typical. Maybe 1-2 series produced at 1080p with every other series being an upscale. Some seasons even have 0 series produced at 1080p. Instead of calling it a myth, try to find a season where a large number of anime were produced at 1080p as counter-evidence.
Being upscaled and released on DVD or BluRay at 1080p (which most anime have been for most of the past decade) is not the same as being produced at 1080p.
I wasn't aware of the two Gundam movies mentioned by fireattack, but I can't confirm they were mastered at 4k and aren't just upscales. So if you could name some of those 4k releases that would be helpful, especially if you can provide information as to them being mastered at 4k and not just upscaled to 4k.
This is not correct. Many productions are produced at in-between resolutions between 720p and 1080p and then upscaled to 1080p. It's even in the link you provided.
The point is they are upscaled to 1080p and few are produced at 1080p. Yes, I didn't label every in-between resolution that they get produced at. This seems rather needlessly pedantic, since as you cited, the resolutions are in the link I provided.
Would you feel better if I said "they upscale 720p and 837p and 900p and 810p and 806p and 873p and 864p and 957p and 878p and 719p to 1080p"? I excluded non-standard resolutions for simplicity since it doesn't really change my greater point: most 1080p releases are just upscales. 19 of 41 listed are 720p and 720p is the most common resolution listed.
I'm in the same boat. There are different metrics to judge the quality of an upscale (peak signal-to-noise ratio comes to mind), but it's obvious they are limited in that they can't capture perceptual quality very closely. While it's obvious the filter is much sharper than even NGU Sharp, it also seems to come with a weird gradient effect and some artifacting. Another thing I find is that sharper filters like NGU Sharp don't upscale as well as other upscaling options on frames with edges that aren't supposed to be sharp, probably because in some sense they try too hard to ink in parts that aren't supposed to be inked. This can happen either because the source is composited to be blurry for artistic effect, or because the source is low quality. I admittedly have not tried Anime4K, but I imagine it will have a similar effect.
Of course, in the end, it's an entirely subjective thing. Personally I hold off on using NGU Sharp and use NGU Anti-Alias instead for the above reasons.
EDIT: this is addressed in the readme:
> I think the results are worse!
> Surely some people like sharper edges, some like softer ones. Do try it yourself on a few anime before reaching a definite conclusion. People tend to prefer sharper edges. Also, seeing the comparisons on a 1080p screen is not representative of the final results on a 4K screen, the pixel density and sharpness of the final image is simply not comparable.
EDIT: I just tried this filter on a 4K monitor. To be honest, I don't think it is very good. To me it's reminiscent of the bad parts of sharpeners turned up to the max. All the edges turn into a weird, sometimes jagged, smear, and originally blurry but detailed backgrounds just become a weird mess. I really don't think even people who like sharpness will prefer this filter for general viewing, and I find the chart given in the preprint (https://raw.githubusercontent.com/bloc97/Anime4K/master/resu...) extremely dubious.
Edges seem pretty reasonable to me in the 1:1 image. I'd dramatically prefer to watch shows with the sharpness of the post-processed image. This is a taste thing, and different folks will have different strokes.
I've also been wondering if there is something similar to Content Aware Fill that can help process old 4:3 cartoons to 16:9.
A lot of the really old cartoons would use a background art image and pan over it, with the characters doing stuff on top, to create a sense of motion. Sometimes the characters would move over a still background image, but the 'camera' would zoom in.
Something that could extract the full-size background image and apply it to the frames to enlarge the aspect ratio could go a long way toward revitalizing a lot of older cartoons. Especially if it could fill in any gaps using the open-source equivalent of Content Aware Fill (is there a FOSS equivalent?).
I've been trying to get my kids into Space Ghost Coast to Coast, Home Movies, Sealab 2021, the Simpsons, etc. If the video is wide screen they try it and enjoy it. If it's 4:3 they barely give it a chance because it's "too old"
”Adobe Systems acquired a non-exclusive license to seam carving technology from MERL, and implemented it as a feature in Photoshop CS4, where it is called Content Aware Scaling. As the license is non-exclusive, other popular computer graphics applications, among which are GIMP, digiKam, ImageMagick, as well as some stand-alone programs, among which are iResizer, also have implementations of this technique, some of which are released as free and open source software”
Seam carving removes stuff, but the principle is the same. The Gimp plug-in is http://www.logarithmic.net/pfh/resynthesizer, and apparently also can do the filling-in. I haven’t used it, so I don’t know how good it is.
My only experience with Photoshop is through memes; is Content Aware Fill the same as Content Aware Scaling? I thought the former tried to guess what was "behind" something you removed, while the latter just moves the existing pixels around by guessing which ones need to stay together when you resize something.
See the PatchMatch research project and associated papers[1] for more detail. They are different tools in presentation and implementation within Photoshop but are based on similar concepts of randomized correspondence.
”We propose a simple image operator, we term seam-carving, that can change the size of an image by gracefully carving-out OR INSERTING pixels in different parts of the image”
That paper (which I think is the paper introducing the seam carving technique) also has examples of widening pictures.
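For reference, the heart of seam carving is a short dynamic program. A minimal numpy sketch that finds one minimum-energy vertical seam (the seam is removed when shrinking, or duplicated when widening):

```python
import numpy as np

def min_energy_seam(gray):
    """Return, for each row, the column of the cheapest vertical seam."""
    h, w = gray.shape
    gy, gx = np.gradient(gray)
    energy = np.abs(gx) + np.abs(gy)   # simple gradient-magnitude energy

    # cost[i, j]: cheapest energy of a seam ending at pixel (i, j)
    cost = energy.copy()
    for i in range(1, h):
        left  = np.r_[np.inf, cost[i - 1, :-1]]
        mid   = cost[i - 1]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, mid), right)

    # Backtrack from the cheapest bottom pixel.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(0, j - 1), min(w, j + 2)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam
```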
Anime4K looks obviously like a filter (I think Photoshop has an effect that looks like that, but I can't remember the name at the moment), particularly at the 4x setting.
I found the preprint somewhat confusing with its talk of approximate residuals and "pushing" pixels. Let me propose another way to think of this and someone can tell me if I'm off base. Disclaimer, I haven't read the source code.
Consider a grayscale morphological operator such as erosion. For each pixel, you would replace the value with the minimum value found inside a structuring element surrounding the pixel. This is kind of like a weird morphological operator with a 3x3 box structuring element, where instead of choosing values based on a simple criterion such as 'min' or 'max' you use information from an approximation of the image gradient. If the gradient magnitude is above some threshold, you select the neighbor pixel in the 3x3 structuring element in the opposite direction of the gradient.
This generally has the effect of making the edges more pronounced. Intuitively, you're distorting the image by "pinching" along the edges. To prevent weird color artifacts, they're using edges computed on grayscale data so that the identical morphological filter is applied to each color channel.
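A minimal numpy sketch of that reading (this is my interpretation, not the actual Anime4K shader; the luma weights and threshold are assumptions):

```python
import numpy as np

def push_pixels(rgb, threshold=0.05):
    """'Push' each pixel against the luma gradient where an edge is strong.
    rgb: float array of shape (H, W, 3) with values in [0, 1]."""
    h, w, _ = rgb.shape
    luma = rgb @ np.array([0.299, 0.587, 0.114])   # edges decided on luma only
    gy, gx = np.gradient(luma)
    mag = np.hypot(gx, gy)

    # Quantize the opposite-of-gradient direction to a 3x3 neighbour offset.
    with np.errstate(invalid="ignore", divide="ignore"):
        ox = np.rint(np.where(mag > 0, -gx / mag, 0.0)).astype(int)
        oy = np.rint(np.where(mag > 0, -gy / mag, 0.0)).astype(int)

    # Where the gradient is strong, sample that neighbour; elsewhere keep the pixel.
    yy, xx = np.mgrid[0:h, 0:w]
    strong = mag > threshold
    sy = np.clip(yy + np.where(strong, oy, 0), 0, h - 1)
    sx = np.clip(xx + np.where(strong, ox, 0), 0, w - 1)
    return rgb[sy, sx, :]   # identical displacement applied to every channel
```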
> [...] the proposed method [...] is tailored to content that puts importance to well defined lines/edges while tolerates a sacrifice of the finer textures.
and
> [...] a big weakness of our algorithm [...] is texture detail, however since upscaling art was not our main goal, our results are acceptable.
That sounds like a multiobjective optimization problem. If that multiobjective optimization problem were solved (to the extent the nature or structure of the problem permits, of course), then the algorithm would be improved, don't you agree?
Did the authors of this algorithm not have the capability to formulate or recognize the multiobjective optimization problem?
Or, if they did have the formulation capability, did they lack the capability to solve the multiobjective optimization problem? If so, why? Too difficult? Not enough time? Limited by a resource? Or did they simply not intend to, given that they said a specific trade-off was acceptable?
You're welcome to share your speculation or opinion, Hacker News reader.
I believe they recognise that the problem is a multiobjective optimization problem (hence the wording of that sentence), but their algorithm is not parametrizable: it is a single point on the Pareto front, and you would need other algorithms to explore the rest of the front.
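To make "a single point on the Pareto front" concrete, here is a tiny sketch with hypothetical per-algorithm scores on the two competing objectives:

```python
def pareto_front(scores):
    """Keep only non-dominated candidates; both objectives are higher-is-better.
    scores: {name: (edge_sharpness, texture_fidelity)} -- illustrative numbers only."""
    front = {}
    for name, (a, b) in scores.items():
        dominated = any(a2 >= a and b2 >= b and (a2, b2) != (a, b)
                        for a2, b2 in scores.values())
        if not dominated:
            front[name] = (a, b)
    return front

print(pareto_front({
    "Anime4K": (0.9, 0.4),   # very sharp edges, sacrifices texture
    "waifu2x": (0.6, 0.7),
    "bicubic": (0.3, 0.5),   # dominated by waifu2x
}))
```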
I'm not super clear on why speed was a primary goal if the intended application is upscaling anime. If this were intended for, say, sharpening the graphical output from a game console, sure, but why does premade video content like anime need upscaling that only takes 3ms instead of 6ms or even 60ms?
So you can simply have it as an option in a media player (as they indeed have theirs) instead of requiring a cumbersome preprocessing pass which will in addition produce a much larger file size.
Agreed. Being able to do it real time is definitely nice but I don't think it's very important. I'd rather optimize for quality.
FWIW I tried doing the same thing using waifu2x, but it was about one or two orders of magnitude too slow. I don't remember the details but I think it worked out to about 2 weeks of 24/7 operation on a 1070 to upscale a full show (don't remember if it was 1-cour or 2-cour) to 1080p. Results were okay, gave kind of an oily texture to it but the denoising worked quite well. If it took only a day or two to convert a full show I'd consider doing it on some old 480p shows with bad quality, though I probably would just watch the original video myself.
I have a small resolution video of a (static) scene, and a high resolution photograph of the same scene. Does anyone know of an upscaling algorithm that takes an image as auxiliary input?
Maybe some style-transfer related algorithm could be useful in this situation?
Any video examples? If you want a good subject, take the final fight scene from the 1080p latest episode of Kimetsu no Yaiba (Ep 19) and upscale to 4K.
Also does this run on Linux or Mac? Haven't had a Windows machine in years.
If I understand it correctly, the whole project is one shader file. Sure it's portable, just pick the glsl file from the repository and plug it into your favorite video player.
It's explained in some detail in the article, but in essence, imagine a fine pen line which in 540p would be less than one pixel wide but in 2160p would be multiple pixels wide.
The problem solved by the Anime4K algorithm is essentially producing sharp edges for that line when upscaled to 4K, which is a different problem from upscaling a <1 pixel antialiased line.
It's very likely that the results won't be as desired. Anime is nearly always synthetic imagery which is intended to be clean and geometrically based (even if there are gradients and more real-world additions, it's still synthetic).
The application domain for this includes any other sort of abstract logical synthesis, charts and maybe videogames (even ones that look realistic).
Real world content also has sharp boundaries between objects, and whatever part happens to do that work might be shared, but within objects fuzzier is probably better. IIRC someone was making an AI assisted upscaling of DS9 which would probably be closer to a generic algorithm for 'filmed' content.
Going from something like 30fps to 60fps, interpolation works decently well in many cases, because there's already so much information encoded in the 30fps. And some 15fps can work too.
But with 5fps, each frame can be so radically different, I think interpolation is generally just not possible. You can generate something smooth, but it will be so far away from whatever an animator would actually have inserted, that it will seem more strange/surreal than natural, and thus achieve the opposite effect as intended.
E.g. see [1] which shows animation at 15/30/60fps... you can see that even with the 15, it's hard to imagine an algorithm that would port well to 60. (Use the period on your keyboard to advance frame-by-frame.)
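To see why, a naive, motion-unaware in-between is just a blend of neighbouring frames (sketch below): at 30 -> 60 fps adjacent frames are similar enough for this to pass, while at ~5 fps it produces a ghosted double exposure rather than the pose an animator would actually draw.

```python
import numpy as np

def blended_inbetween(frame_a, frame_b, t=0.5):
    """Weighted blend of two frames; no motion estimation whatsoever."""
    a = frame_a.astype(np.float32)
    b = frame_b.astype(np.float32)
    return ((1.0 - t) * a + t * b).astype(frame_a.dtype)
```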
It will essentially look like flash animation of old, with moving unanimated components.
Even high grade interpolation sometimes has this problem - or the comparable "wake of water near moving object" one. Essentially you'd get tons of inpainting kind of artifacts.
Subjectively, temporal interpolation looks rather bizarre. I'm all in favor of using advanced upscaling techniques to recover lost information, but temporal interpolation adds information that the artist never intended. I think the best approach is to have the display refresh rate be an integer multiple of the source content frame rate and duplicate frames.
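The arithmetic of that approach is simple enough to sketch; frames divide evenly only when the rates line up:

```python
def frame_repeats(display_hz, content_fps):
    """How many refreshes each source frame is shown for, when the display
    rate is an integer multiple of the content rate."""
    if display_hz % content_fps != 0:
        raise ValueError("not an integer multiple; even duplication is impossible")
    return display_hz // content_fps

print(frame_repeats(120, 24))  # 5 refreshes per frame
print(frame_repeats(60, 30))   # 2 refreshes per frame
```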
I don't know. I've found a few discussions on Reddit and elsewhere, but everyone seems to credit this particular Imgur post with no context as the source:
They're decent in terms of image quality, terrible in terms of animation.
Aside from the obvious problem of motion interpolation having problems with acceleration/deceleration, there's a lot of nuance in the original animation that gets lost when you try to interpolate from one sprite to the next.
Even if you can avoid obvious artifacts, no interpolation algorithm can create new information, it can only derive from what's already there and guess at what's missing.
EDIT: If you dig through twitter you'll find some tweets from animators explaining why the results are bad. As mere consumers we might be tempted to dismiss that criticism as snobbery but animating is a craft and the interpolated results are objectively worse than the original.
I think in this case, they're attempting to keep the "inked" look where lines start and stop. Pixel doubling would result in aliasing (or, rather, a "pixelated" look) and bilinear filtering results in a "blurred" effect. The intended effect of this goal is to give the appearance that the anime was produced in 4K.
Per the name (Deep Learning Super Sampling), DLSS uses a trained neural network to achieve high-quality upsampling. The neural network is trained on representative output of the game at the internal framebuffer resolution and at the target output resolution (with SSAA and such).
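A toy sketch of that kind of training setup (PyTorch; not NVIDIA's actual network or loss): pairs are made by downscaling the high-resolution target, and the model learns to undo that degradation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySR(nn.Module):
    """Toy residual upsampler in the spirit of VDSR-style models."""
    def __init__(self, scale=2):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, low_res):
        # Naive upsample first, then predict a residual correction.
        up = F.interpolate(low_res, scale_factor=self.scale,
                           mode="bilinear", align_corners=False)
        return up + self.body(up)

def train_step(model, optimizer, high_res):
    # The low-res input is just the high-res target downscaled
    # (dimensions assumed even so the round trip lines up exactly).
    low_res = F.interpolate(high_res, scale_factor=0.5,
                            mode="bilinear", align_corners=False)
    loss = F.l1_loss(model(low_res), high_res)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```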
The upsampling algorithm in the OP is not based on machine learning but is also fairly domain specific and of limited general applicability.
Probably. Everything seems to work better than NVidia DLSS though. AMD apparently managed to beat it using a pretty standard content aware sharpening algorithm.
"The proposed algorithm can be quickly described as an iterative algorithm that treats color information as a heightmap and 'pushes' pixels towards probable edges using gradient-ascent. This is very likely what learning-based approaches are already doing under the hood (eg. VDSR[1], waifu2x[2])."
This is interesting to me because it hints at the direction I really want to see ML stuff go.
Some problems may not lend themselves to this concept, but hear me out: We train models, they start giving reliable output, then we put it in production really having no idea what the thing is doing inside. Here we have a traditional image processing algorithm that's doing something similar to what the author suspects the ML-based solution is doing... only the authors solution is much more performant. What I think we'd love to see is the ML approach yield a result that not only works, but is transparent in how it works. So plain old human engineers can internalize what the machine learned, and re-implement the solution as a run-of-the mill algorithm that does the job faster than pretending to be a brain.
Is this feasible?