Hacker News
High-Resolution Photorealistic Image Translation in Real-Time (louisbouchard.ai)
69 points by jonbaer on June 1, 2021 | 15 comments



I'm honestly surprised that it took this long for someone to use wavelets as a starting point in an image translation architecture. Frequency separation has been used for ages in the pro Photoshop realm for advanced editing / restyling. It makes sense that you'd get excellent performance with it in a GAN.


This isn't really wavelets, though I guess it's a related technique. It's using a Laplacian pyramid, which, as you suggest, is doing frequency separation, but it's achieved in the spatial domain by blurring and up/downsampling. (http://www.liralab.it/teaching/SINA_10/papers/burt-adelson19...)

It's far from the first use of this technique for CNNs, even for similar tasks. For example, there's LapGANs, which combined GANs with Laplacian pyramids about 5 years ago. There's also LapSRN, which used this representation for super-resolution.
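For anyone who hasn't seen it, the construction is tiny. Here's a rough sketch of the textbook Burt–Adelson recipe with OpenCV (my own code, not the paper's; function names are mine):

  import cv2
  import numpy as np

  def build_laplacian_pyramid(img, levels=3):
      """Each level stores the detail lost when the image is blurred and
      downsampled; the last entry is the low-resolution residual."""
      img = img.astype(np.float32)
      pyramid = []
      for _ in range(levels):
          down = cv2.pyrDown(img)  # Gaussian blur + 2x downsample
          up = cv2.pyrUp(down, dstsize=(img.shape[1], img.shape[0]))
          pyramid.append(img - up)  # high-frequency detail at this scale
          img = down
      pyramid.append(img)  # low-frequency residual
      return pyramid

  def reconstruct(pyramid):
      """Upsample the residual and add the detail bands back in."""
      img = pyramid[-1]
      for detail in reversed(pyramid[:-1]):
          img = cv2.pyrUp(img, dstsize=(detail.shape[1], detail.shape[0])) + detail
      return img

Reconstruction is exact up to floating-point error, which is presumably why it's attractive here: the network can work on the small, coarse level and only lightly touch the detail bands.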


This doesn't use wavelets afaics (they are not mentioned in the article); it uses something called a 'Laplacian pyramid'. That can be used to separate a signal by frequency, but it seems to do the whole image at once. Looks to be the good old blur (low-pass) and subtract-from-the-original method (to get the remaining high frequency).

I think wavelets usually have a limited extent in space (depending on which basis shape you choose, I guess).

Maybe they could make use of the FFT for this separation, but it's probably hard to beat blur-and-subtract for simplicity of implementation.


Essentially, you do the same thing as an FFT for separation. A Gaussian blur is a low-pass filter, roughly equivalent to keeping the frequency components within a radius r of the DC component (assuming the DC component is in the center of the image). The radius r depends on the width of the Gaussian kernel.

The main difference would be computational time (the FFT might be faster), and the result will differ if you use a different kind of blurring (e.g. a non-Gaussian filter, such as selecting a square region in the frequency domain).
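Quick sanity check of that equivalence with numpy/scipy (my own snippet, not from the article; with circular boundary handling the two agree to numerical precision):

  import numpy as np
  from scipy import ndimage

  rng = np.random.default_rng(0)
  img = rng.random((256, 256))
  sigma = 4.0

  # Spatial-domain Gaussian blur; 'wrap' matches the FFT's circular boundary,
  # and a wide truncation keeps the kernel essentially un-truncated.
  spatial = ndimage.gaussian_filter(img, sigma, mode='wrap', truncate=8.0)

  # Frequency-domain version: multiply the spectrum by the Gaussian transfer function.
  freq = np.fft.ifft2(ndimage.fourier_gaussian(np.fft.fft2(img), sigma)).real

  print(np.allclose(spatial, freq))  # True: the blur is a (soft) low-pass filter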


Agreed! And in cases where the convolution kernel (the bokeh, viewed in frequency space) is something more complicated than a Gaussian, the FFT can definitely be a very good option (the kernel/bokeh resolution can be near-ish to the resolution of the image to convolve, for example, and full-colour kernels/bokehs work too).
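For example, with scipy (illustrative sizes, not a benchmark), an arbitrary disc "bokeh" convolved directly vs. via FFT gives the same result, and the FFT route pulls ahead as the kernel grows:

  import numpy as np
  from scipy import signal

  img = np.random.default_rng(1).random((256, 256))

  # An arbitrary, non-Gaussian kernel: a large disc-shaped "bokeh"
  r = 32
  y, x = np.mgrid[-r:r + 1, -r:r + 1]
  bokeh = (x**2 + y**2 <= r**2).astype(float)
  bokeh /= bokeh.sum()

  # Direct convolution costs O(pixels * kernel taps); FFT is O(N log N),
  # so the gap widens quickly as the kernel approaches the image size.
  direct = signal.convolve(img, bokeh, mode='same', method='direct')
  viafft = signal.fftconvolve(img, bokeh, mode='same')

  print(np.allclose(direct, viafft, atol=1e-8))  # True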


I don't have any quantitative evidence on the computational performance of the FFT versus the typical Laplacian pyramid technique, but I'd be surprised if the FFT is faster. The classical Laplacian pyramid -- developed in the 80s -- is really quite simple and fast. http://www.liralab.it/teaching/SINA_10/papers/burt-adelson19...


I found a paper from 2019 in the references of this current paper: https://arxiv.org/abs/1903.09760


The video at the end is word-for-word the article, so skip to it if you're going to watch anyway.


Or, alternatively, skip the video if you're going to read the article anyway. In case you're thinking, as I did, "maybe the video will show the thing actually running in real time to transform video", nope: the video isn't from the people who did the research; it's from this Louis Bouchard chap and just summarizes the paper. The video does contain some more examples than are on the webpage, though.

(It's a little surprising to me that the researchers don't seem to have produced any videos demonstrating what happens when you actually apply the process in real time. My best guess is that it turns out that slightly different images can produce substantially different outputs, so that if you apply it independently frame-by-frame to video you get flickery/shimmery/wobbly output. Personally, I'd still like to see it, even with the understanding that further work would be needed to make something that turns video into usable video.)


Alternate point of view made me laugh. Also true!

Would also love to see a video like that. It could be applied creatively in some way.


I'm somewhat left wondering what really happens to the high-frequency bits. Can they move around a lot, like in the style-transfer examples early in the video? It seems a bit like this method assumes the image will not be 'deformed' much overall and will stay basically in the same location (which I think is a bit less interesting than the full style-transfer examples they show at the start for comparison, tbh).


If a lot of scientific papers were explained like this, I think it would bring knowledge to so many people with a non-scientific background :)


The season translations aren't looking great here, but this seems to pave the way for faster and more accurate style transfer.


Ultimately you will be able to put on your AR glasses and see the world around you as an anime. Standing on the corpses of forgotten adventurers, such as Magic Leap and Google Glass.


Can we just take a moment to appreciate Anaconda? CUDA, Python, and tensor-library versioning are a mess in the ML era. Having wrestled these dependencies manually multiple times, I feel I need to take the chance to praise Anaconda whenever possible.



