They reproject the input images and run the low-res network multiple times. Then they use an approach similar to NeRF to merge the knowledge from those reprojected images into a super-resolution result.
So in a way, this is quite similar to how modern Pixel phones can take a burst of frames and merge them into a final image that has a higher resolution than the sensor. Except that they run useful AI processing in between and then do the super-resolution merge on the results.
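To make the burst analogy concrete, here is a toy sketch (not the paper's method, and it assumes the sub-pixel offsets are already known; real burst pipelines have to estimate them): each low-res frame samples the scene on a slightly shifted grid, so its pixels can be slotted into a finer one.

    import numpy as np

    def merge_burst(frames, offsets, factor=2):
        # Toy sub-pixel burst merge: `frames` are low-res (H, W) arrays and
        # `offsets[i]` is frame i's known shift, as integers in units of
        # 1/factor of a low-res pixel. Each frame's samples are dropped into
        # their own slots of a factor-times finer grid.
        H, W = frames[0].shape
        out = np.zeros((H * factor, W * factor))
        hits = np.zeros_like(out)
        for frame, (dy, dx) in zip(frames, offsets):
            out[dy::factor, dx::factor] += frame
            hits[dy::factor, dx::factor] += 1
        return out / np.maximum(hits, 1)  # average where offsets coincide

    # e.g. four half-pixel-shifted frames fill a 2x denser grid:
    # merge_burst([f00, f01, f10, f11], [(0, 0), (0, 1), (1, 0), (1, 1)])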
Perhaps similar in some ways to how big cats' eyes reflect the light back from behind the retina (i.e. back through it for a second pass) to capture more light. I'm sure I heard that on a nature documentary ...
It is interesting indeed.
One wonders whether the researchers behind this particular bit of work made it mandatory to go for walks at lunch and think about how their own vision chunked/filtered the information it was receiving.
Interesting that they "perturb" the image to get some noise involved. I'll need to read it over again.
Nature is such a good source of inspiration, the "perturb" approach reminded me of [fixational eye movement][1] but maybe that's only a clear link in retrospect.
The training technique used here (fitting something similar to a NeRF to different views of the same image) is much like the one in this paper, which uses it to denoise (instead of upscale) output features: https://arxiv.org/abs/2401.02957
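For intuition, here is a rough, hypothetical sketch of that kind of multi-view fitting (not FeatUp's or that paper's actual code): a small coordinate MLP plays the NeRF-like role of an implicit high-res feature field, `backbone` stands for any frozen model returning a (1, C, h, w) feature map, and the "views" are just small pixel shifts of the same image.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureField(nn.Module):
        # Tiny "NeRF-like" implicit field: pixel coordinate -> C-dim feature.
        def __init__(self, channels, hidden=128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU(),
                                     nn.Linear(hidden, channels))

        def forward(self, size):
            # Render the field on a dense size x size coordinate grid.
            ys, xs = torch.meshgrid(torch.linspace(-1, 1, size),
                                    torch.linspace(-1, 1, size), indexing="ij")
            coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
            return self.net(coords).T.reshape(1, -1, size, size)

    def fit_field(backbone, image, shifts, steps=300):
        # Fit the field so that shifting + downsampling it reproduces the
        # backbone's low-res features for every shifted copy of the image.
        # `image` is (1, 3, H, H); `shifts` are small (dy, dx) pixel offsets.
        H = image.shape[-1]
        with torch.no_grad():
            views = {s: backbone(torch.roll(image, s, dims=(-2, -1)))
                     for s in shifts}
        c, h, w = next(iter(views.values())).shape[1:]
        field = FeatureField(c)
        opt = torch.optim.Adam(field.parameters(), lr=1e-3)
        for _ in range(steps):
            hi = field(H)                                  # (1, C, H, H)
            loss = sum(F.mse_loss(F.adaptive_avg_pool2d(
                           torch.roll(hi, s, dims=(-2, -1)), (h, w)), v)
                       for s, v in views.items()) / len(views)
            opt.zero_grad(); loss.backward(); opt.step()
        return field   # query field(H) afterwards for high-res features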
Correct, s/some/vast majority of/. For example, the ML blur algorithms in major video-conferencing software run at something like 100x100; the weird edges are much more about the input/output resolution than about the ML itself.
Don't think that's applicable in this case. This "FeatUp" technique does not feed its output back into the model in any way.
Rather, it just produces a higher-resolution output by taking multiple passes over the input image (subtly shifting the input before each pass), with each pass producing a slightly different low-resolution feature map.
Each of these low-resolution feature maps represents contributions from slightly different areas of the input image. "FeatUp" can then create a higher-resolution feature map, "simply" by taking, for each location, the value from the pass with the most appropriate input shift.
A very rough sketch:
Input Image: abcdefgh
Create multiple low-resolution feature maps using your model, shifting the input image a few pixels each pass:
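In code, the idea looks roughly like this (a hedged sketch: `model` stands for any frozen backbone that turns a (1, 3, H, W) image into a (1, C, h, w) feature map, and the final average is only a naive stand-in for FeatUp's learned merge):

    import torch
    import torch.nn.functional as F

    def multi_pass_features(model, image, shifts):
        # Run a frozen low-res feature extractor once per small pixel shift
        # ("jitter") of the input, then upsample and un-shift each result so
        # all passes line up in the original image frame.
        _, _, H, W = image.shape
        passes = []
        for dy, dx in shifts:
            jittered = torch.roll(image, (dy, dx), dims=(-2, -1))
            feats = model(jittered)                    # (1, C, h, w), h << H
            up = F.interpolate(feats, size=(H, W), mode="bilinear",
                               align_corners=False)
            passes.append(torch.roll(up, (-dy, -dx), dims=(-2, -1)))
        # Naive merge: average the aligned passes. FeatUp instead learns the
        # high-res map by asking that downsampling it reproduce each pass.
        return torch.stack(passes).mean(dim=0)

    # e.g. multi_pass_features(frozen_backbone, img,
    #                          [(0, 0), (0, 1), (1, 0), (1, 1)])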
This looks like it could be useful.
Remote sensing uses feature extraction tools. Being able to upsample again would make the data a lot easier to view and interpret.
Nice work.
They do this based on semantics, adding information that isn't in the data itself. You can get more information out of pixelated data if you know what the semantics are.
The search space is much, much smaller if you only optimize for blood cells rather than for everything. If this adds a chance of seeing things you couldn't see before, it adds value.
It could mean doing a cheap low-resolution analysis first and a much more expensive high-resolution one only when you detect something, like living in a rural area and traveling to the big city only after you've found something.
Overall, the chances are that more people get help, not fewer.
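A minimal sketch of that kind of two-stage triage; everything here (the model names, the threshold, the 64x64 size) is a hypothetical placeholder, not anything from the paper:

    def triage(image, cheap_model, expensive_model, threshold=0.8):
        # Stage 1: cheap screening on a heavily downscaled copy.
        score = cheap_model(image.resize((64, 64)))   # assumes a PIL-style image
        if score < threshold:
            return {"flagged": False, "score": score}
        # Stage 2: only flagged samples pay for the expensive high-res analysis.
        return {"flagged": True, "score": score,
                "detail": expensive_model(image)}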
The images aren't being upscaled, I don't think. Rather, the features from the low-resolution representations are being upscaled. The images are original, to show how the upscaled features still line up with reality.
https://mhamilton.net/featup.html