They reproject the input images and run the low-res network multiple times. Then they use an approach similar to NeRF to merge the knowledge from those reprojected images into a super-resolution result.
So in a way, this is quite similar to how modern Pixel phones can take a burst of frames and merge them into a final image that has a higher resolution than the sensor. Except that they run useful AI processing in between and then do the super-resolution merge on the results.
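To make the burst analogy concrete, here is a toy sketch (not the paper's method, and it assumes the sub-pixel offsets are already known; real burst pipelines have to estimate them): each low-res frame samples the scene on a slightly shifted grid, so its pixels can be slotted into a finer one.

    import numpy as np

    def merge_burst(frames, offsets, factor=2):
        # Toy sub-pixel burst merge: `frames` are low-res (H, W) arrays and
        # `offsets[i]` is frame i's known shift, as integers in units of
        # 1/factor of a low-res pixel. Each frame's samples are dropped into
        # their own slots of a factor-times finer grid.
        H, W = frames[0].shape
        out = np.zeros((H * factor, W * factor))
        hits = np.zeros_like(out)
        for frame, (dy, dx) in zip(frames, offsets):
            out[dy::factor, dx::factor] += frame
            hits[dy::factor, dx::factor] += 1
        return out / np.maximum(hits, 1)  # average where offsets coincide

    # e.g. four half-pixel-shifted frames fill a 2x denser grid:
    # merge_burst([f00, f01, f10, f11], [(0, 0), (0, 1), (1, 0), (1, 1)])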
Perhaps similar in some ways to how big cats' eyes reflect the light back from behind the retina (i.e. back through it for a second pass) to capture more light. I'm sure I heard that on a nature documentary ...
It is interesting indeed.
One wonders whether the researchers behind this particular bit of work made it mandatory to go for walks at lunch and think about how their own vision chunked/filtered the information it was receiving.
Interesting that they "perturb" the image to get some noise involved. I'll need to read it over again.
Nature is such a good source of inspiration, the "perturb" approach reminded me of [fixational eye movement][1] but maybe that's only a clear link in retrospect.
The training technique used here (fitting something similar to a NeRF to different views of the same image) is much like the one in this paper, which uses it to denoise (instead of upscale) output features: https://arxiv.org/abs/2401.02957
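For intuition, here is a rough, hypothetical sketch of that kind of multi-view fitting (not FeatUp's or that paper's actual code): a small coordinate MLP plays the NeRF-like role of an implicit high-res feature field, `backbone` stands for any frozen model returning a (1, C, h, w) feature map, and the "views" are just small pixel shifts of the same image.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureField(nn.Module):
        # Tiny "NeRF-like" implicit field: pixel coordinate -> C-dim feature.
        def __init__(self, channels, hidden=128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, channels))

        def forward(self, size):
            # Render the field on a dense size x size coordinate grid.
            ys, xs = torch.meshgrid(torch.linspace(-1, 1, size),
                                    torch.linspace(-1, 1, size), indexing="ij")
            coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
            return self.net(coords).T.reshape(1, -1, size, size)

    def fit_field(backbone, image, shifts, steps=300):
        # Fit the field so that shifting + downsampling it reproduces the
        # backbone's low-res features for every shifted copy of the image.
        # `image` is (1, 3, H, H); `shifts` are small (dy, dx) pixel offsets.
        H = image.shape[-1]
        with torch.no_grad():
            views = {s: backbone(torch.roll(image, s, dims=(-2, -1)))
                     for s in shifts}
        c, h, w = next(iter(views.values())).shape[1:]
        field = FeatureField(c)
        opt = torch.optim.Adam(field.parameters(), lr=1e-3)
        for _ in range(steps):
            hi = field(H)                                  # (1, C, H, H)
            loss = sum(F.mse_loss(F.adaptive_avg_pool2d(
                           torch.roll(hi, s, dims=(-2, -1)), (h, w)), v)
                       for s, v in views.items()) / len(views)
            opt.zero_grad(); loss.backward(); opt.step()
        return field   # query field(H) afterwards for high-res features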
Correct, s/some/vast majority of/. For example, the ML blur algorithms in major video-conferencing software run at something like 100x100; the weird edges are much more about the input/output resolution than about the ML itself.
Don't think that's applicable in this case. This "FeatUp" technique does not feed its output back into the model in any way.
Rather, it just produces a higher-resolution output by taking multiple passes over the input image (subtly shifting the input before each pass), with each pass producing a slightly different low-resolution feature map.
Each of these low-resolution feature maps represents contributions from slightly different areas of the input image. "FeatUp" can then create a higher-resolution feature map, "simply" by taking, for each location, the value from the pass with the most appropriate input shift.
A very rough sketch:
Input Image: abcdefgh
Create multiple low-resolution feature maps using your model, shifting the input image a few pixels each pass:
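In code, the idea looks roughly like this (a hedged sketch: `model` stands for any frozen backbone that turns a (1, 3, H, W) image into a (1, C, h, w) feature map, and the final average is only a naive stand-in for FeatUp's learned merge):

    import torch
    import torch.nn.functional as F

    def multi_pass_features(model, image, shifts):
        # Run a frozen low-res feature extractor once per small pixel shift
        # ("jitter") of the input, then upsample and un-shift each result so
        # all passes line up in the original image frame.
        _, _, H, W = image.shape
        passes = []
        for dy, dx in shifts:
            jittered = torch.roll(image, (dy, dx), dims=(-2, -1))
            feats = model(jittered)                    # (1, C, h, w), h << H
            up = F.interpolate(feats, size=(H, W), mode="bilinear",
                               align_corners=False)
            passes.append(torch.roll(up, (-dy, -dx), dims=(-2, -1)))
        # Naive merge: average the aligned passes. FeatUp instead learns the
        # high-res map by asking that downsampling it reproduce each pass.
        return torch.stack(passes).mean(dim=0)

    # e.g. multi_pass_features(frozen_backbone, img,
    #                          [(0, 0), (0, 1), (1, 0), (1, 1)])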
This looks like it could be useful.
Remote sensing uses feature extraction tools. Being able to upsample again would make the data a lot easier to view and interpret.
Nice work.
They do this based on semantics, adding information that isn't in the data itself. You can get more information out of pixelated data if you know what the semantics are.
The search space is much, much smaller if you only optimize for blood cells rather than for everything. If this adds a chance of seeing things you couldn't see before, it adds value.
It could mean doing a cheap low-resolution analysis first and a much more expensive high-resolution one only when you detect something, like living in a rural area and traveling to the big city only after you've found something.
Overall, the chances are that more people get help, not fewer.
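A minimal sketch of that kind of two-stage triage; everything here (the model names, the threshold, the 64x64 size) is a hypothetical placeholder, not anything from the paper:

    def triage(image, cheap_model, expensive_model, threshold=0.8):
        # Stage 1: cheap screening on a heavily downscaled copy.
        score = cheap_model(image.resize((64, 64)))   # assumes a PIL-style image
        if score < threshold:
            return {"flagged": False, "score": score}
        # Stage 2: only flagged samples pay for the expensive high-res analysis.
        return {"flagged": True, "score": score,
                "detail": expensive_model(image)}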
The images aren't being upscaled, I don't think. Rather, the features from the low-resolution representations are being upscaled. The images are original, to show how the upscaled features still line up with reality.
https://mhamilton.net/featup.html