As a metrologist (and photographer), I find the difficulty with these techniques is that they can over-represent the information contained within an image; they present an image of what was "probably" there, rather than representing what was. These aren't so different from our own brains, which remember what we thought we saw, rather than the light that reached our retinas.
These methods are already in extensive use (most smartphone images use extensive noise-reduction techniques), but we must be ever-cognizant that image-processing techniques can add yet another layer of nuance and uncertainty when we try to understand an image.
I think this point is worth pushing on a bit harder, which is to say that the "additional details" in the picture are guesses by the software, not actual additional details. The data present in the picture is fixed; the software uses that data to build educated guesses about what was actually there. If the photo doesn't contain enough data to actually determine what a given piece of text in the image says, the software can provide a guess, but it's just that, a guess. Similarly, if the photo doesn't provide enough detail to positively identify a person, the "super resolution" version cannot be used to positively identify them either, as it's a guess made from incomplete data, not genuinely new data.
The point is worth belaboring because people have a tendency to take the output from these systems as Truth, and while they can be interesting and useful, they should not be used for things for which the truth has consequences without understanding their limitations.
You're right to compare this to how our brains reconstruct our own memories, and the implications that has for eyewitness testimony should inform how we consider the outputs from these systems.
This “guessing” is nice for the sake of artistry, but we’ve got to be careful when knowing what actually was there is important—like when photos are submitted as evidence in court cases, or when determining the identity of a person from a photo as part of an investigation.
I hope such photos are submitted as the camera takes them. With or without this new feature, photoshopping a photo before presenting it in court must be illegal.
If you consider photos taken by cell phones, it's hard to really say what "as the camera takes them" means - a lot of ML-driven retouching happens "automagically" with most modern cell phones already and I'd expect more in the future.
It goes even further than that. Image sensors don't capture images. They record electricity that can be interpreted as an image.
This might seem like a quibble, but once you dive a little deeper into it, you realise that there's enormous latitude and subjectivity in the way you do that interpretation.
What's even crazier is that this didn't come with digital photography. Analogue film photography has the same problem. The silver on the film doesn't become an image until it's interpreted by someone in the darkroom.
There is no such thing as an objective photograph. It's always a subjective interpretation of an ambiguous record.
There is a difference in the degree of subjectivity. When interpreting the sensor's electrical signal, the subjectivity is highly localized, and probably doesn't affect the macro structure of the image.
With ML-enhanced photos, you might have a distant face that is "enhanced" by the model into a face that wasn't there. Or a fingerprint, a birthmark, a mole, etc.
With analog photography you could at least use E-6. Processing was tightly controlled and standardized, and once processed, you had an image.
The nice thing about this was that you could hand the E-6 off to a magazine and end up with a photograph printed in the magazine that was very close to the original film. Any color shifts or changes in contrast you could see just with your eyes. You could drop the film in a scanner and visually confirm that the scan looks identical to the original. (You cannot do this with C-41.)
This was not used for forensic photography, though. The point of using E-6 was for the photographer to make artistic decisions and capture them on film, so they can get back to taking photos. My understanding is that crime scene photography was largely C-41, once it was relatively cheap.
1. However good the guess is, it's still just that: a guess. Taking the standard of "evidence in a murder case", the OCR can and probably should be used to point investigators in the right direction so they can go and collect more data, but it should not be considered sufficient as evidence itself.
2. OCR is a relatively constrained solution space - success in those conditions doesn't mean the same level of accuracy can or will be reached outside of that constrained space.
To be clear, though - I'm making a primarily epistemic argument, not one based on utility. There are a lot of areas for which these kinds of machine-guessing systems are of enormous utility, we just shouldn't confuse what they're doing with actual data collection.
I'm not sure about the OCR example, but there are information / sampling theory limits on what can be discerned in an image, based on sampling rate (pixels, basically) and optics. Any extrapolation outside these limits is provably guessing.
Edit - re OCR, do you mean e.g. that from a picture of a blurred license plate we could rule in or out a subset of possible numbers, depending on how blurred it is, like a B could be an 8 but not an L? (And sorry if your example is unrelated.) This is valid, and unrelated to super resolution; you can do this analysis with Nyquist limits and point spread functions.
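To make that concrete, here's a toy sketch of my own (nothing in it comes from the comment above; the "glyphs", kernel width, and downsample factor are all made-up assumptions): two genuinely different fine-detail signals become much harder to tell apart after blurring with a point spread function and sampling well below the Nyquist rate, which is exactly why any "recovered" detail past that point is a guess.

    import numpy as np

    # Two distinct 1-D "glyphs" (think: cross-sections of a B and an 8).
    x = np.linspace(0, 1, 256)
    glyph_a = (np.sin(40 * np.pi * x) > 0).astype(float)  # fine stripes
    glyph_b = (np.sin(44 * np.pi * x) > 0).astype(float)  # slightly different stripes

    def blur_and_downsample(signal, kernel_width=32, factor=16):
        """Convolve with a box point spread function, then sample coarsely."""
        kernel = np.ones(kernel_width) / kernel_width
        blurred = np.convolve(signal, kernel, mode="same")
        return blurred[::factor]

    low_a = blur_and_downsample(glyph_a)
    low_b = blur_and_downsample(glyph_b)

    # The originals differ a lot; the blurred, downsampled versions differ far less.
    print("original mean difference:   ", np.abs(glyph_a - glyph_b).mean())
    print("downsampled mean difference:", np.abs(low_a - low_b).mean())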
> I think this point is worth pushing on a bit harder, which is to say that the "additional details" in the picture are guesses by the software, not actual additional details.
I don't. Everyone knows this already and it seems like a lot of people are just saying it over and over to look clever.
What worries me is that COTS photo equipment increasingly comes with these algorithmic retouches that "over-represent" the data - or, put another way, bake their own interpretation into the image, in a way that cannot be distinguished from source data.
It's nice for a casual Instagrammer, but then a lot of science and engineering also gets done using COTS equipment. I worry that at some point, a lot of money will be burned, a lot of time wasted, or even lives lost, because someone didn't notice they've based the conclusions of their scientific experiment/engineering analysis on such "computer best guesses". As a researcher, you'll see a weird pattern on some of the photos and will be left wondering, is that a real phenomenon, or is it just one of the black box, trade secret neural networks in the camera choking on input data it wasn't trained for?
A compression algorithm knows which data it is discarding and can choose what to throw away for a good ratio of lost data to saved space. Data "lost" by a low-resolution sensor most definitely does not fit this description. Imagine saving a Full HD PNG instead of a 4K JPEG - the former is most likely far worse.
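As a rough sketch of that comparison (my own illustration, not from the comment; the synthetic pattern, image size, and JPEG quality setting are arbitrary assumptions, and the exact numbers will vary with content), here is one way to weigh "throw away resolution, store losslessly" against "keep resolution, store lossy":

    import io
    import numpy as np
    from PIL import Image

    # Build a synthetic grayscale image with fine, regular detail.
    w, h = 1600, 900
    yy, xx = np.mgrid[0:h, 0:w]
    pattern = ((np.sin(xx * 0.9) * np.cos(yy * 0.7) + 1) * 127).astype(np.uint8)
    original = Image.fromarray(pattern)

    def roundtrip(image, fmt, **kwargs):
        """Save to an in-memory buffer, reload, and report the encoded size."""
        buf = io.BytesIO()
        image.save(buf, format=fmt, **kwargs)
        size = buf.getbuffer().nbytes
        buf.seek(0)
        return size, Image.open(buf).convert("L")

    ref = np.asarray(original, dtype=float)

    # Option A: throw away resolution, then store losslessly (downscale + PNG).
    half = original.resize((w // 2, h // 2), Image.BICUBIC)
    png_size, png_img = roundtrip(half, "PNG")
    png_back = np.asarray(png_img.resize((w, h), Image.BICUBIC), dtype=float)
    png_err = np.sqrt(np.mean((ref - png_back) ** 2))

    # Option B: keep full resolution, store lossy (JPEG decides what to discard).
    jpg_size, jpg_img = roundtrip(original, "JPEG", quality=85)
    jpg_err = np.sqrt(np.mean((ref - np.asarray(jpg_img, dtype=float)) ** 2))

    print(f"half-res PNG : {png_size:>8} bytes, RMS error {png_err:5.1f}")
    print(f"full-res JPEG: {jpg_size:>8} bytes, RMS error {jpg_err:5.1f}")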
It's not too dissimilar, I agree, but there are differences.
I did a web search for "cots" and learned that a cot is...
> a small usually collapsible bed often of fabric stretched on a frame
But in this case, COTS is apparently...
> commercial, off-the-shelf
In other words "photo equipment" or "consumer/retail photo equipment."
Upon further reading[0] it seems odd to use the term here, but maybe I'm misunderstanding something. It's often used for software and has a key phrase...
> packaged solutions which are then adapted to satisfy the needs of the purchasing organization
But it's possible the term has been co-opted to mean something else now.
I use it in the way it's used in disciplines that also work with specialty-built, or even custom-built, equipment - such as science, the military, and some types of engineering (e.g. aerospace). The first sentence of the linked article describes it:
"Commercial off-the-shelf or commercially available off-the-shelf[1] (COTS) products are packaged solutions[buzzword] which are then adapted to satisfy the needs of the purchasing organization, rather than the commissioning of custom-made, or bespoke, solutions."
So for example, a research team may decide not to spend money on expensive scientific cameras for monitoring an experiment, and instead opt to buy an expensive - but still much cheaper - DSLR sold to photographers, or strap on a couple of iPhone 15s they found in a drawer (it's the future, everyone's using the iPhone 17, so the 15 is two generations behind the newest one). That's using COTS equipment. COTS is typically sold to less sophisticated users, but is often useful for the less sophisticated needs of more sophisticated users too. But if COTS cameras start to accrue built-in algorithms that literally fake data, it may be a while before such researchers realize they're looking at photos where most of the pixels don't correspond to observable reality, in a complicated way they didn't expect.
In the novel, quantum computers (rather than ML per-se) are tasked with interpolating more and more detailed data from astronomical observations, to the point that tracking individual members of an alien species on a distant world, underground, is possible. Eventually it is noticed that cutting off the astronomical data entirely doesn't interrupt the interpolated data. Then things get weird.
I won't go into further plot details, as that would be spoilery, but it is a pretty good book, reminiscent to me of Greg Egan's oeuvre (the novel is actually by Robert Charles Wilson).
It’s a common acronym in the tech world. I’ve usually used it in the context of a “buy-or-build” conversation about software (e.g. “most businesses are best off using COTS applications rather than doing custom development” - that sort of thing). But the acronym means what it means, so when OP talks about COTS camera gear, it makes sense to me.
As an aside, the term of art is "make-or-buy" if you want to be able to Google it.
The discussion we are having is interesting because COTS products are notorious for their hidden costs and how difficult they are to properly budget. Having to find a way to disable or reverse advanced post-processing in a camera would be a fairly typical example of that. In this specific case it might mean having to commission a custom firmware from the camera manufacturer - something which is very much doable but might end up costing you as much as buying bespoke equipment, for inferior results in the end.
Interesting, in software world I’ve always heard/used build/buy rather than make/buy, and I’m guessing that comes from construction industry as a lot of traditional software PM methodology was inspired by that world. If you Google ‘build vs buy’[0] (no quotes) all your top results are software discussions. If you Google ‘build or buy’, it’s all about housing. [1]
Make-or-buy seems more a term for manufacturing industry/SCM. TIL
I did the same web search (although in all caps) and was immediately pointed to “Commercial off-the-shelf”, and I redid it now in incognito mode over a VPN, and the first answer is still “Commercial-off-the-shelf” (with an added hyphen probably due to the language where the VPN endpoint is located).
>It's nice for a casual Instagrammer, but then a lot of science and engineering also gets done using COTS equipment. I worry that at some point, a lot of money will be burned, a lot of time wasted, or even lives lost, because someone didn't notice they've based the conclusions of their scientific experiment/engineering analysis on such "computer best guesses".
Most research papers are crap anyway - in a much more fundamental way, for much worse reasons/bad incentives, and with far more impact than "computational imaging".
This is probably the last thing I'd worry about when thinking about "millions/time/wasted" for some research.
I think this is probably good for what people use photos for; it lets them show a crop without the image looking pixelated. That means if they just want a photo to draw you in to their blog post, they don't have to take a perfect photograph with the right lens and right composition at the right time. And I think that's fine. No new information is created by ML upscaling, but it will look just good enough to fade into the background.
I personally take a lot of high resolution art photos. One that is deeply in my memory is a picture I took of the Manhattan bridge from the Brooklyn side with a 4x5 camera. I can get out the negative and view it under magnification and read the street signs across the river. (I would link you, but Google downrez'd all my photos, so the negatives are all I have.) ML upscaling probably won't let you do that, but on the other hand, it's probably pointless. It's not something that has a commercial use, it's just neat. If you want to know what the street signs on the FDR say, you can just look at Google Street View.
(OK, maybe it does have some value. I used to work in an office that had pictures blown up to room-size used as wallpaper in conference rooms. It looked great, and satisfied my desire to get close and see every detail. But, you know you're taking that kind of picture in advance, and you use the right tools. You can rent a digital medium format camera. You can use film and get it drum scanned. But, for people that just need a picture for an article, fake upscaling is probably good enough. The picture isn't an art exhibit, or an attempt to collect visual data. It's just something to draw you into the article in the 3 milliseconds before you see a wall of text and bounce.)
> Google downrez'd all my photos, so the negatives are all I have
Wow, Google ate your one digital copy? That's tragic.
What's the approximate resolution you could get out of a scan from these labs?
I was interested in getting into film cameras at one point, and I was disappointed with how low the scanning resolution is from most labs. For example mpix only advertises 18MB, which they say is good enough for a 12in by 18in print. North Coast Photo (what Ken Rockwell recommends) is even worse! What if you want something to put on the wall?
Granted, if the original film you shoot is perfect you can have a print done from the negatives, but that kind of defeats the point of having a high quality scan as a backup.
Yeah, paying people to scan your photos doesn't yield good results. I did that early on and found it expensive and low quality. Honestly, the process they use doesn't scale well, and I think they offer film scans to be nice, rather than because it's a viable business.
With my home setup, I can easily do 50-80 megapixels on a 4x5 negative. I use a flatbed photo scanner (the Epson V800) and wet-mount the film. It is Quite The Process involving a lot of parts (liquids, optical film to place on top of the mount, careful calibration of the focus point, etc.) but the results are excellent and relatively repeatable. But, all in, I'd estimate that it's probably a half hour of labor per photo, so you can see why labs charge so much. (Dry mounting doesn't save that much time, because of the amount of time you spend avoiding dust and optical artifacts intrinsic in using two extra sheets of glass.)
The real professionals use drum scanners. They are quite expensive, but offer incredibly high resolution and decent throughput for the operator. Looking around at prices, for $100 you can get a 320MP scan of a 4x5 negative yielding a 1.7GB file. http://www.drumscanning.com/rates.html For fine grained black and white films, you can certainly extract information that actually exists. As someone who mostly uses T-Max 400, though, that would be overkill. I can't imagine getting much more information out of my photos than I get with a flatbed.
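For a rough sanity check on those numbers, here's my own back-of-the-envelope arithmetic (the dpi figures are assumptions I picked to land near the megapixel counts quoted above, not specs from the comment):

    # Back-of-the-envelope arithmetic for 4x5 inch film scans.
    def scan_megapixels(width_in, height_in, dpi):
        """Pixel count of a scan at a given resolution, in megapixels."""
        return width_in * dpi * height_in * dpi / 1e6

    print(scan_megapixels(4, 5, 2000))  # 80.0  -> roughly the flatbed range above
    print(scan_megapixels(4, 5, 4000))  # 320.0 -> the drum-scan figure

    # 320 MP, 3 color channels, 16 bits (2 bytes) per channel, uncompressed:
    print(320e6 * 3 * 2 / 1e9, "GB")    # 1.92 GB, in the ballpark of the 1.7 GB file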
In summary, you can see why even pixel peepers are content with their Sony A7R. Press button, get 50 megapixels. And no toxic chemicals being absorbed through your skin.
Wow, I didn't realize you did your own scans. That's very interesting and cool, thanks for the information about it.
Looks like you can get those scanners used for pretty reasonable prices. Maybe if I've got a house one day and I think the odds of having to move within a few years are low I'll get into it and try setting up a lab.
> In summary, you can see why even pixel peepers are content with their Sony A7R. Press button, get 50 megapixels. And no toxic chemicals being absorbed through your skin.
Yep, and on top of that we're not limited by the sRGB gamut or bit depth issues of early digital cameras. Recent ones produce raw files that are extremely easy to develop and manipulate into something very nice looking.
The thing is, even on top of the enjoyment some people get out of working with film, if you're after a particular film-like look you might be able to save yourself a significant amount of post-processing time by just going with film. I've seen no one-click filter that can approximate it.
If you're using this to try and enhance super-grainy CCTV footage to get a face or license plate, I'd agree. Purely in the context of this article, though, the author is just upscaling an already high-definition image 2x. There's very little artifice that can really be added at this level that a human could perceive, IMO.
> Never mind memories; there are parts of our eyes that aren’t responsive to light at all. We’re always hallucinating.
Are you referring to the blind spot, or something else?
Interestingly, the blind spot turns out not to be a design requirement, it is a contingent feature that cephalopods like octopuses (whose eyes evolved independently from vertebrates') don't have.
I take a lot of pictures of mountain scenery. I blew up one of these that had an interesting composition consisting of rocks/fields/mountain peaks. On inspection the resulting image had substantially changed the composition by increasing the size of a field relative to all other objects.
It’s much easier for the model to blow up uninteresting pieces of the photograph than interesting pieces.
An example I saw getting traction on Twitter a few months ago was a photo of Melania Trump that was purported to be a body double. Since the original image was blurry, someone used an AI upscaler to "enhance" the photograph and increase the resolution. Then the comments started to roll in: the teeth are different! The tip of her nose doesn't match! It's not her!
Technically, they were correct -- it wasn't her. It was an algorithm's best-guess reconstruction based on training data of other people's faces. Unfortunately, neither the original poster nor anyone else in the thread seemed to grasp this concept.
I have been using neural-enhance (gh:alexj) and Topaz tools to upscale PAL/NTSC artworks for the last three years and would not be so judicious in describing these tools. They are hallucinating what the model assumes an upscaled image should look like, not enhancing in any way as the word is understood. The original image ceases to exist. A more honest term might be “Render As Upscaled” or “Generate Higher Resolution Image” (likewise “ML”, not “AI”).
When playing around, funny things happen too: with recursive upscale/sharpen, analogue artifacts begin resembling topography, molten metal, etc.
Now imagine this being fitted into military drones, which it almost certainly is.
Would it be right to say it is a synthesis on top of an analysis? It isn't what was observed. For some things it might not matter, but “it looks shopped” isn't really a positive in my book. Although the use case in the article - printing stuff a lot larger - is pretty handy.
No - the 000000 is not based on a statistical model of what's most likely to be right of the decimal place. In natural images, the statistical structure allows for this image upscaling, but without revealing any previously hidden detail - it's just using known statistics of the world to show what might be there.
That's not how floating point math works? At least not for standard floats (IEEE 754), and except for very large integers (near 2^m, where m is the number of mantissa bits in the FP type). Floats have an exact representation for integers within their mantissa range -- i.e., '18' is exactly the same as '18.0000'.
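A quick sketch of that point in Python (my own illustration; float64 has a 53-bit significand):

    # Integers within the 53-bit significand of a float64 are represented exactly,
    # so writing trailing zeros after the decimal point changes nothing.
    print(18.0 == 18.000000)                 # True: same value, same bit pattern
    print(float(18) == 18.0000)              # True

    # Beyond 2**53 the spacing between adjacent floats exceeds 1, so distinct
    # integers start collapsing onto the same representable value.
    print(float(2**53) == float(2**53 + 1))  # True: precision loss begins here
    print(float(2**53) == float(2**53 + 2))  # False: 2**53 + 2 is representable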
They're correct when it comes to scientific fields - the number of significant figures is important, so 18.0 and 18.000 really do mean different things.
I don't think you mean for floating point, but for mechanical tolerances. Many times, you don't want to pay an extra $50,000 for the 5 digits of precision... but sometimes you do. Shitty system if it automatically messed up all your part tolerances.
I would say that it's like the pixel's RGB at address 1x1 being 0-0-0 and the pixel at address 1x2 being 0-0-2, and squeezing between them a pixel with color 0-0-1 (averaging the two values near it) - assuming we're doing this on an image that is 1 pixel high and e.g. 2 pixels wide, so that the new image would be 0-0-0, 0-0-1, 0-0-2.
What you're describing is relatively straightforward (bi)linear interpolation. It is worth noting that even at this relatively simple level, going with bicubic interpolation instead will usually give you nicer results, except in cases where the hard edges in the image are only horizontal or vertical.
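Here's a minimal sketch of that interpolation (my own illustration, reusing the tiny two-pixel row from the example above):

    import numpy as np

    # The 1-pixel-high, 2-pixel-wide image from above: blue values 0 and 2.
    row = np.array([0.0, 2.0])

    # Linear interpolation: resample the 2-pixel row onto 3 evenly spaced positions.
    old_x = np.array([0.0, 1.0])
    new_x = np.linspace(0.0, 1.0, 3)
    print(np.interp(new_x, old_x, row))  # [0. 1. 2.] -- the middle pixel is the average

    # Real libraries do the 2-D version; e.g. Pillow's Image.resize with
    # Image.BILINEAR or Image.BICUBIC, where bicubic fits a smoother curve
    # through the neighbouring pixels instead of a straight line.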