As a metrologist (and photographer), I find the difficulty with these techniques is that they can over-represent the information contained within an image; they present an image of what was "probably" there, rather than representing what was. These aren't so different from our own brains, which remember what we thought we saw, rather than the light that reached our retinas.
These methods are already in extensive use (most smartphone images use extensive noise-reduction techniques), but we must be ever-cognizant that image-processing techniques can add yet another layer of nuance and uncertainty when we try to understand an image.
I think this point is worth pushing on a bit harder, which is to say that the "additional details" in the picture are guesses by the software, not actual additional details. The data present in the picture is fixed, the software uses that data to build educated guesses on what was actually there. If the photo doesn't contain enough data to actually determine what a given piece of text in the image says, the software can provide a guess, but it's just that, a guess. Similarly, if the photo doesn't provide enough detail to positively identify a person, the "super resolution" one cannot be used to positively identify them either, as it's a guess made from incomplete data, not genuinely new data.
The point is worth belaboring because people have a tendency to take the output from these systems as Truth, and while they can be interesting and useful, they should not be used for things for which the truth has consequences without understanding their limitations.
You're right to compare this to how our brains reconstruct our own memories, and the implications that has for eyewitness testimony should inform how we consider the outputs from these systems.
This “guessing” is nice for the sake of artistry, but we’ve got to be careful when knowing what actually was there is important—like when photos are submitted as evidence in court cases, or when determining the identity of a person from a photo as part of an investigation.
I hope such photos are submitted as the camera takes them. With or without this new feature, photoshopping a photo before presenting it to court must be illegal.
If you consider photos taken by cell phones, it's hard to really say what "as the camera takes them" means - a lot of ML-driven retouching happens "automagically" with most modern cell phones already and I'd expect more in the future.
It goes even further than that. Image sensors don't capture images. They record electricity that can be interpreted as an image.
This might seem like a quibble, but once you dive a little deeper into it, you realise that there's enormous latitude and subjectivity in the way you do that interpretation.
What's even crazier is that this didn't come with digital photography. Analogue film photography has the same problem. The silver on the film doesn't become an image until it's interpreted by someone in the darkroom.
There is no such thing as an objective photograph. It's always a subjective interpretation of an ambiguous record.
There is a difference in the degree of subjectivity. In interpreting electricity, it's highly localized, and probably doesn't affect the macro structure of the image.
With ML-enhanced photos, you might have a distanced face that is "enhanced" by the model, to become a face that wasn't there. Or a fingerprint, a birthmark, a mole, etc.
With analog photography you could at least use E-6. Processing was tightly controlled and standardized, and once processed, you had an image.
The nice thing about this was that you could hand the E-6 off to a magazine and end up with a photograph printed in the magazine that was very close to the original film. Any color shifts or changes in contrast you could see just with your eyes. You could drop the film in a scanner and visually confirm that the scan looks identical to the original. (You cannot do this with C-41.)
This was not used for forensic photography, though. The point of using E-6 was for the photographer to make artistic decisions and capture them on film, so they can get back to taking photos. My understanding is that crime scene photography was largely C-41, once it was relatively cheap.
1. However good the guess is, it's still just that: a guess. Taking the standard of "evidence in a murder case", the OCR can and probably should be used to point investigators in the right direction so they can go and collect more data, but it should not be considered sufficient as evidence itself.
2. OCR is a relatively constrained solution space - success in those conditions doesn't mean the same level of accuracy can or will be reached outside of that constrained space.
To be clear, though - I'm making a primarily epistemic argument, not one based on utility. There are a lot of areas for which these kinds of machine guessing systems are of enormous utility; we just shouldn't confuse what they're doing with actual data collection.
I'm not sure about the OCR example, but there are information / sampling theory limits on what can be discerned in an image, based on sampling rate (pixels basically) and optics. Any extrapolation outside these limits is provably guessing.
Edit - re OCR do you mean e.g. from a picture of a blurred license plate we could rule in or out a subset of possible numbers, depending on how blurred, like a B could be an 8 but not an L? (And sorry if your example is unrelated). This is valid, and unrelated to super resolution; you can do this analysis with Nyquist and point spread functions.
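To make the sampling-theory point concrete, here is a minimal Python sketch (NumPy assumed; the sample rate and frequencies are arbitrary illustrative values, not anything from the thread). Two different signals produce identical samples once one of them lies above the Nyquist limit, so nothing downstream of the sensor can tell them apart from the samples alone:

```python
import numpy as np

sample_rate = 8            # samples per unit length (hypothetical sensor)
nyquist = sample_rate / 2  # highest frequency the samples can represent

# Positions at which the hypothetical sensor samples the scene.
positions = np.arange(16) / sample_rate

# One signal below the Nyquist limit, one well above it.
low = np.sin(2 * np.pi * 1 * positions)                   # 1 cycle per unit
high = np.sin(2 * np.pi * (1 + sample_rate) * positions)  # 9 cycles per unit

# The recorded samples are the same for both signals.
indistinguishable = np.allclose(low, high)
print(indistinguishable)
```

Any method that claims to recover the 9-cycle detail from these samples is choosing between candidates the data cannot separate, which is the "provably guessing" case described above.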
> I think this point is worth pushing on a bit harder, which is to say that the "additional details" in the picture are guesses by the software, not actual additional details.
I don't. Everyone knows this already and it seems like a lot of people are just saying it over and over to look clever.
What worries me is that COTS photo equipment increasingly comes with these algorithmic retouches that "over-represent" the data - or, put another way, bake their own interpretation into the image, in a way that cannot be distinguished from source data.
It's nice for a casual Instagrammer, but then a lot of science and engineering also gets done using COTS equipment. I worry that at some point, a lot of money will be burned, a lot of time wasted, or even lives lost, because someone didn't notice they've based the conclusions of their scientific experiment/engineering analysis on such "computer best guesses". As a researcher, you'll see a weird pattern on some of the photos and will be left wondering, is that a real phenomenon, or is it just one of the black box, trade secret neural networks in the camera choking on input data it wasn't trained for?
A compression algorithm knows which data was lost and can optimize the discarded data for a good lost data/saved space ratio. Data "lost" by a low resolution sensor most definitely does not fit this description. Imagine saving a FullHD png instead of a 4k jpg - the former is most likely far worse.
It's not too dissimilar, I agree, but there are differences.
I did a web search for "cots" and learned that a cot is...
> a small usually collapsible bed often of fabric stretched on a frame
But in this case, COTS is apparently...
> commercial, off-the-shelf
In other words "photo equipment" or "consumer/retail photo equipment."
Upon further reading[0] it seems odd to use the term here, but maybe I'm misunderstanding something. It's often used for software and has a key phrase...
> packaged solutions which are then adapted to satisfy the needs of the purchasing organization
But it's possible the term has been co-opted to mean something else now.
I use it in a way it's used in disciplines that also work with specialty-built, or even custom-built equipment. Such as science, military and some types of engineering (e.g. aerospace). The first sentence of the linked article describes it:
"Commercial off-the-shelf or commercially available off-the-shelf[1] (COTS) products are packaged solutions[buzzword] which are then adapted to satisfy the needs of the purchasing organization, rather than the commissioning of custom-made, or bespoke, solutions."
So for example, a research team may decide to not spend money on expensive scientific cameras for monitoring an experiment, and instead opt to buy an expensive - but still much cheaper - DSLR sold to photographers, or strap a couple of iPhones 15 they found in the drawer (it's the future, they're all using iPhones 17, which is two generations behind the newest one). That's using COTS equipment. COTS is typically sold to less sophisticated users, but is often useful for less sophisticated needs of more sophisticated users too. But if COTS cameras start to accrue built-in algorithms that literally fake data, it may be a while before such researchers realize they're looking at photos where most of the pixels don't correspond to observable reality, in a complicated way they didn't expect.
In the novel, quantum computers (rather than ML per-se) are tasked with interpolating more and more detailed data from astronomical observations, to the point that tracking individual members of an alien species on a distant world, underground, is possible. Eventually it is noticed that cutting off the astronomical data entirely doesn't interrupt the interpolated data. Then things get weird.
I won't go into further plot details, as that would be spoilery, but it is a pretty good book, reminiscent to me of Greg Egan's oeuvre (the novel is actually by Robert Charles Wilson).
It’s a common acronym in the tech world. I’ve usually used it in the context of a “buy-or-build” conversation about software (e.g. “most businesses are better off using COTS applications than doing custom development” - that sort of thing). But the acronym means what it means, so when OP talks about COTS camera gear, it makes sense to me.
As an aside, the term of art is "make-or-buy" if you want to be able to Google it.
The discussion we are having is interesting because COTS are notorious for their hidden costs and how difficult they are to properly budget. Having to find a way to disable or reverse advanced post-processing in a camera would be a fairly typical example of that. In this specific case it might mean having to commission custom firmware from the camera manufacturer - something which is very much doable but might end up costing you as much as buying bespoke equipment, for inferior results in the end.
Interesting, in software world I’ve always heard/used build/buy rather than make/buy, and I’m guessing that comes from construction industry as a lot of traditional software PM methodology was inspired by that world. If you Google ‘build vs buy’[0] (no quotes) all your top results are software discussions. If you Google ‘build or buy’, it’s all about housing. [1]
Make-or-buy seems more a term for manufacturing industry/SCM. TIL
I did the same web search (although in all caps) and was immediately pointed to “Commercial off-the-shelf”, and I redid it now in incognito mode over a VPN, and the first answer is still “Commercial-off-the-shelf” (with an added hyphen probably due to the language where the VPN endpoint is located).
>It's nice for a casual Instagrammer, but then a lot of science and engineering also gets done using COTS equipment. I worry that at some point, a lot of money will be burned, a lot of time wasted, or even lives lost, because someone didn't notice they've based the conclusions of their scientific experiment/engineering analysis on such "computer best guesses".
Most research papers are crap anyway, in a much more fundamental way and for much worse reasons/bad incentives with far more impact than "computational imaging".
This is probably the last thing I'd worry about when thinking about "millions/time/wasted" for some research.
I think this is probably good for what people use photos for; it lets them show a crop without the image looking pixelated. That means if they just want a photo to draw you in to their blog post, they don't have to take a perfect photograph with the right lens and right composition at the right time. And I think that's fine. No new information is created by ML upscaling, but it will look just good enough to fade into the background.
I personally take a lot of high resolution art photos. One that is deeply in my memory is a picture I took of the Manhattan bridge from the Brooklyn side with a 4x5 camera. I can get out the negative and view it under magnification and read the street signs across the river. (I would link you, but Google downrez'd all my photos, so the negatives are all I have.) ML upscaling probably won't let you do that, but on the other hand, it's probably pointless. It's not something that has a commercial use, it's just neat. If you want to know what the street signs on the FDR say, you can just look at Google Street View.
(OK, maybe it does have some value. I used to work in an office that had pictures blown up to room-size used as wallpaper in conference rooms. It looked great, and satisfied my desire to get close and see every detail. But, you know you're taking that kind of picture in advance, and you use the right tools. You can rent a digital medium format camera. You can use film and get it drum scanned. But, for people that just need a picture for an article, fake upscaling is probably good enough. The picture isn't an art exhibit, or an attempt to collect visual data. It's just something to draw you into the article in the 3 milliseconds before you see a wall of text and bounce.)
> Google downrez'd all my photos, so the negatives are all I have
Wow, Google ate your one digital copy? That's tragic.
What's the approximate resolution you could get out of a scan from these labs?
I was interested in getting into film cameras at one point, and I was disappointed with how low the scanning resolution is from most labs. For example mpix only advertises 18MB, which they say is good enough for a 12in by 18in print. North Coast Photo (what Ken Rockwell recommends) is even worse! What if you want something to put on the wall?
Granted, if the original film you shoot is perfect you can have a print done from the negatives, but that kind of defeats the point of having a high quality scan as a backup.
Yeah, paying people to scan your photos doesn't yield good results. I did that early on and found it expensive and low quality. Honestly, the process they use doesn't scale well, and I think they offer film scans to be nice, rather than because it's a viable business.
With my home setup, I can easily do 50-80 megapixels on a 4x5 negative. I use a flatbed photo scanner (the Epson V800) and wet-mount the film. It is Quite The Process involving a lot of parts (liquids, optical film to place on top of the mount, careful calibration of the focus point, etc.) but the results are excellent and relatively repeatable. But, all in, I'd estimate that it's probably a half hour of labor per photo, so you can see why labs charge so much. (Dry mounting doesn't save that much time, because of the amount of time you spend avoiding dust and optical artifacts intrinsic in using two extra sheets of glass.)
The real professionals use drum scanners. They are quite expensive, but offer incredibly high resolution and decent throughput for the operator. Looking around at prices, for $100 you can get a 320MP scan of a 4x5 negative yielding a 1.7GB file. http://www.drumscanning.com/rates.html For fine grained black and white films, you can certainly extract information that actually exists. As someone who mostly uses T-Max 400, though, that would be overkill. I can't imagine getting much more information out of my photos than I get with a flatbed.
In summary, you can see why even pixel peepers are content with their Sony A7R. Press button, get 50 megapixels. And no toxic chemicals being absorbed through your skin.
Wow, I didn't realize you did your own scans. That's very interesting and cool, thanks for the information about it.
Looks like you can get those scanners used for pretty reasonable prices. Maybe if I've got a house one day and I think the odds of having to move within a few years are low I'll get into it and try setting up a lab.
> In summary, you can see why even pixel peepers are content with their Sony A7R. Press button, get 50 megapixels. And no toxic chemicals being absorbed through your skin.
Yep, and on top of that we're not limited by the sRGB gamut or bit depth issues of early digital cameras. Recent ones produce raw files that are extremely easy to develop and manipulate into something very nice looking.
The thing is, even on top of the enjoyment some people get out of working with film, if you're after a particular film-like look you might be able to save yourself a significant amount of post-processing time by just going with film. I've seen no one-click filter that can approximate it.
If you're using this to try and enhance super grainy CCTV footage to get a face or license plate I'd agree. Purely in the context of this article, the author is just upscaling an already high-definition image 2x. There's very little artifice that can be really added at this level that a human could perceive IMO.
> Never mind memories; there are parts of our eyes that aren’t responsive to light at all. We’re always hallucinating.
Are you referring to the blind spot, or something else?
Interestingly, the blind spot turns out not to be a design requirement, it is a contingent feature that cephalopods like octopuses (whose eyes evolved independently from vertebrates') don't have.
I take a lot of pictures of mountain scenery. I blew up one of these that had an interesting composition consisting of rocks/fields/mountain peaks. On inspection the resulting image had substantially changed the composition by increasing the size of a field relative to all other objects.
It’s much easier for the model to blow up uninteresting pieces of the photograph than interesting pieces.
An example I saw getting traction on Twitter a few months ago was a photo of Melania Trump that was purported to be a body double. Since the original image was blurry, someone used an AI upscaler to "enhance" the photograph and increase the resolution. Then the comments started to roll in: the teeth are different! The tip of her nose doesn't match! It's not her!
Technically, they were correct -- it wasn't her. It was an algorithm's best-guess reconstruction based on training data of other people's faces. Unfortunately, neither the original poster nor anyone else in the thread seemed to grasp this concept.
I have been using neural-enhance (gh:alexj) and Topaz tools to upscale PAL/NTSC artworks the last three years and would not be so judicious in describing these tools. They are hallucinating what the model assumes an upscaled image should look like and not enhancing in any way as the word is understood. The original image ceases to exist. A more honest term might be “Render As Upscaled” or “Generate Higher Resolution Image” (likewise “ML”, not “AI”).
When playing around funny things happen too: recursive upscale/sharpen and analogue artifacts begin resembling topography, molten metal etc.
Now imagine this being fitted into military drones, which it almost certainly is.
Would it be right to say it is a synthesis on top of an analysis? It wasn’t what was observed. For some things it might not matter, but “it looks shopped” isn’t really a positive in my book. Although the use case in the article is pretty handy, to print stuff a lot larger.
No - 000000 is not based on a statistical model of what’s most likely to be right of the decimal place. In natural images - the statistical structure allows for this image upscaling but without revealing any previously hidden detail - just using known statistics of the world to show what might be there.
That's not how floating point math works? At least not for standard floats (IEEE 754), and except for very large integers (beyond 2^m, where m is the number of mantissa bits of the FP type). Floats have an exact representation for integers within their mantissa range -- i.e., '18' is exactly the same as '18.0000'.
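A quick way to check this claim is a short Python sketch. It assumes the interpreter's `float` is an IEEE 754 double, which is true on mainstream platforms; the helper name `is_exact` is made up for illustration:

```python
import sys

# Number of significand (mantissa) bits in this platform's float type.
mantissa_bits = sys.float_info.mant_dig

def is_exact(n: int) -> bool:
    """Return True if the integer n survives a round trip through float."""
    return int(float(n)) == n

limit = 2 ** mantissa_bits

small_ok = is_exact(18)          # small integers are stored exactly
limit_ok = is_exact(limit)       # 2^m itself is still exact
beyond_ok = is_exact(limit + 1)  # the next odd integer is not representable

print(mantissa_bits, small_ok, limit_ok, beyond_ok)
```

So `18` and `18.0000` are the same value to the machine. Whether they mean the same thing to a scientist reading significant figures is a separate question about notation, not about storage.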
They're correct when it comes to scientific fields - the number of significant figures is important, so 18.0 and 18.000 really do mean different things.
I don't think you mean for floating point, but for mechanical tolerances. Many times, you don't want to pay an extra $50,000 for the 5 digits of precision... but sometimes you do. Shitty system if it automatically messed up all your part tolerances.
I would say that it's like the pixel's RGB at address 1x1 being 0-0-0 and the pixel at address 1x2 being 0-0-2, and squeezing between them a pixel with color 0-0-1 (averaging the two values near it), assuming you do this on an image that is 1 pixel high and e.g. 2 pixels wide, so that the new image would be:
What you're describing is relatively straightforward (bi)linear interpolation. It is worth noting that even at this relatively simple level, going with bicubic interpolation instead will usually give you nicer results, except in cases where the hard edges in the image are only horizontal or vertical.
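For anyone who wants to see the linear case spelled out, here is a minimal Python sketch using NumPy's `np.interp`. The function name `upscale_row_linear` is hypothetical, and the two pixel values are the ones from the comment above:

```python
import numpy as np

def upscale_row_linear(row: np.ndarray, new_width: int) -> np.ndarray:
    """Linearly interpolate a single row of RGB pixels to new_width pixels."""
    old_x = np.arange(row.shape[0])
    new_x = np.linspace(0, row.shape[0] - 1, new_width)
    # Interpolate each colour channel independently.
    channels = [np.interp(new_x, old_x, row[:, c]) for c in range(row.shape[1])]
    return np.stack(channels, axis=1)

# A 1-pixel-high, 2-pixel-wide image: RGB 0-0-0 next to RGB 0-0-2.
row = np.array([[0.0, 0.0, 0.0],
                [0.0, 0.0, 2.0]])

result = upscale_row_linear(row, 3)
print(result)
```

The middle pixel comes out as 0-0-1, exactly the average described. Nothing here knows anything about what images usually look like, which is the difference from the learned upscalers being discussed.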
Even better comparisons are in the blog post for a competing product: https://www.pixelmator.com/blog/2019/12/17/all-about-the-new... (likely the same algorithm, but using a different training set, so results will be different from what Adobe's product does).
It has comparisons with nearest neighbor, bilinear and Lanczos filters and uses a slider to make it easier to see the difference.
A few examples in the blog post stood out to me - both refine ambiguous letters to clear ones, which demonstrates concerns other folks were talking about in this thread.
In this one, i's are nearly illegible, but are corrected. It appears to be correct here, but could be wrong.
This one's weirder. In the original, I can't read many of the letters. It looks like G[EL]NE[XR]A[IL]DI[ER][XK][IT]OR - lots of guesses. The enhanced version is GENERALDIREKTOR (I think). At any rate, it's much more confident in the spelling than I am, as a human.
> A few examples in the blog post stood out to me - both refine ambiguous letters to clear ones, which demonstrates concerns other folks were talking about in this thread.
Those images show an effect that looks similar to LCD subpixel rendering, which is an artifact of a scanner working at the limit of what its sensor is capable of producing (typical CCDs have subpixel stripes (or arrays), just like LCD screens).
Scanners typically overcome this by oversampling and then downsampling the raw data to smooth out the effect. In theory you could also do this with less oversampling if you could manage to get the scanner to do subpixel offsets, and oversampling isn't needed at all if the CCD doesn't use striping or a Bayer array but instead layers the RGB detectors on top of each other, like the Foveon X3 CCD.
Anyway, it is pretty clear that the main benefit of the upsampling interpolation in these particular images is in correcting these subpixel color fringes. Downsampling back to the original resolution should still yield an improved image, which is quite intriguing.
>On the scale of things too horrible to contemplate, "document-altering scanner" is right up there with "flesh-eating bacteria". Since 2006, Xerox scancopiers literally are making stuff up. They, for example, replace digits with others in scans.
> It has comparisons with nearest neighbor, bilinear and Lanczos filters and uses a slider to make it easier to see the difference.
Really glad to see they included Lanczos. It's extremely frustrating to those in the know to see comparisons that only use subpar algorithms. The worst only use a B-spline or nearest-neighbor upscale, and end up looking like one of those eyeglass prescription ads for seniors. Something like Lanczos is the minimum acceptable, I think.
Adobe's bicubic produces obvious severe artifacts, I assume it's something like a Catmull-Rom spline cubic.
> Even better comparisons are in the blog post for a competing product: https://www.pixelmator.com/blog/2019/12/17/all-about-the-new... (likely the same algorithm, but using a different training set, so results will be different from what Adobe's product does).
Aren't those comparisons misleading though? The ML sample is 4x the resolution, comparing to something that is supposed to be used to also make something 4x the resolution. They aren't upsampling the comparison, so I don't know what they're actually doing with it. It's so misleading I just assume the company is scummy.
Hm. So I took the example image, upscaled by 200%, applied a sharpen filter (all in Paint.NET) and compared the result to the AI upscaled image.
TBH, I couldn't see a difference.
2x upscaling isn't all that impressive to begin with (e.g. produce 4 pixels from 1) and can be done in fairly high quality using traditional non-learning algorithms.
I'm much more impressed by 4x and 8x super-resolution. I'm really not sure what the big deal is with 2x.
For sure, but the title is quite an overstatement and reads like it's from someone who hasn't really been paying attention to the many existing open source super resolution offerings.
No, it's not just you. An overlay with a slider is pretty much the standard way of comparing two near-identical images these days, but this article not even having a side-by-side is just downright lazy.
I don't even understand why someone would use a headline like "Jaw hit the floor" without even bothering to share the two images. It's not like Adobe Photoshop doesn't have the ability to export images...
I got this one too. Tried it with my old, pixelated JPEG photos taken with friends. The AI decided to enhance the faces in the photos; all seemed good, until I saw some of the faces...
Topaz Gigapixel is quite good though despite the negative comments in the article. I'd rather give my money to Topaz for this one feature than keep paying Adobe subscription fees indefinitely
I don't know about losing data, but if you install the base Creative Cloud application on Mac, a whole bunch of processes run in the background that can't be easily terminated.
Because they're David to Adobe's Goliath, I also feel compelled to mention that I've just recently discovered/purchased this and am incredibly impressed with it.
For those wanting to try this out without paying for Creative Suite, Pixelmator Pro on the Mac ($40) has something similar[1]. The iPad version ($8) also has this feature now[2].
If it's just a matter of "trying it out", Adobe's Creative Cloud has a free trial already, is only $10 a month normally (IIRC), and has the advantage of running on Windows.
Edit: the photography bundle, which includes Photoshop, is $10/month, not Creative Cloud as a whole. Thanks to mastazi.
Just to clarify, the one that costs US $9.99 per month is the Photography bundle (Lightroom and Photoshop). Just adding this as there are many different Creative Cloud subscriptions with monthly prices as high as US $52 https://www.adobe.com/creativecloud/plans.html
Yes, and confusingly, as I just discovered from the link I posted above, Photoshop alone costs more than the Photography bundle! I didn't realise that at first, I was planning to get Photoshop by itself, so it seems that your comment just ended up saving me a few bucks :-D
Last time I signed up they showed the pricing per month but billed for the whole year and there was no way to cancel early. Had to contact support to get my money back once I realized this.
That's an interesting article, and relevant in the sense that the "magic kernel" can be used for purposes of super-resolution, but Adobe is using a fairly different approach. Instead of using analytically-derived functions, Adobe is using a deep learning model trained on a large dataset of low-resolution/high-resolution image pairs. The details are proprietary, obviously, but it's likely similar to various deep learning super-resolution algorithms in the academic literature. (Some more info here https://blog.adobe.com/en/publish/2021/03/10/from-the-acr-te...)
This sounds like how Nvidia has implemented Deep Learning Super Sampling (DLSS) in graphics cards for gaming, allowing people to run at higher resolutions (e.g. 4K); in some cases the image looks better than native.
For the tl;dr the bit at the end (and linked paper) can cover the topic without the backstory if that's not your sort of thing:
"As noted above, in 2021 I analytically derived the Fourier transform of the Magic Kernel in closed form, and found, incredulously, that it is simply the cube of the sinc function. This implies that the Magic Kernel is just the rectangular window function convolved with itself twice—which, in retrospect, is completely obvious. This observation, together with a precise definition of the requirement of the Sharp kernel, allowed me to obtain an analytical expression for the exact Sharp kernel, and hence also for the exact Magic Kernel Sharp kernel, which I recognized is just the third in a sequence of fundamental resizing kernels. These findings allowed me to explicitly show why Magic Kernel Sharp is superior to any of the Lanczos kernels. It also allowed me to derive further members of this fundamental sequence of kernels, in particular the sixth member, which has the same computational efficiency as Lanczos-3, but has far superior properties."
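The "rectangular window convolved with itself twice" statement can be checked numerically. The sketch below is my own, not the article's code: it assumes the Magic Kernel is the standard piecewise-quadratic function written in `magic_kernel` (3/4 - x^2 near the centre, (|x| - 3/2)^2 / 2 in the tails), and compares it with a discretised triple box convolution:

```python
import numpy as np

n = 1000          # samples across the unit-width box
dx = 1.0 / n
box = np.ones(n)  # rectangular window of width 1, sampled at cell midpoints

# Three boxes in total: the box convolved with itself twice.
# Each discrete convolution approximates an integral, hence the dx factors.
numeric = np.convolve(np.convolve(box, box), box) * dx**2

# Coordinates of the output samples (sums of three midpoint coordinates).
x = -1.5 + (np.arange(numeric.size) + 1.5) * dx

def magic_kernel(x: np.ndarray) -> np.ndarray:
    """Piecewise-quadratic kernel assumed here to be the Magic Kernel."""
    ax = np.abs(x)
    centre = 0.75 - ax**2
    tail = 0.5 * (ax - 1.5) ** 2
    return np.where(ax <= 0.5, centre, np.where(ax <= 1.5, tail, 0.0))

max_error = np.max(np.abs(numeric - magic_kernel(x)))
print(max_error)
```

The two curves agree to within discretisation error. Since the Fourier transform of a box is a sinc and convolution becomes multiplication in the frequency domain, three boxes give sinc cubed, which is the result the quote calls obvious in retrospect.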
The other path to SuperResolution is to take multiple images, keeping track of camera orientation and rotation to stabilize and then merge the multiple exposures into a single image with much more real image information than the native resolution of the imager. The only physical requirements are that the camera not be absolutely stable, and the subject not be moving.
You can get on the order of sqrt(N) improvement in resolution from N images with an optimized system. I've done it in the past, with a hand held DSLR and Hugin (the panorama stitching program, in this case used to align the stack of almost identical pictures with subpixel accuracy)
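A toy version of the idea, as a hedged Python sketch: it is one-dimensional, assumes ideal point sampling, and assumes the subpixel offsets are already known exactly. A real pipeline like the Hugin one described has to estimate the alignment and deal with blur and noise. The function name `shift_and_add` and the test signal are made up for illustration:

```python
import numpy as np

def shift_and_add(frames, offsets, factor):
    """Place each low-res frame on a grid `factor` times finer, at its known offset.

    offsets are given in fine-grid steps (0 .. factor - 1).
    """
    n = len(frames[0])
    fine = np.zeros(n * factor)
    hits = np.zeros(n * factor)
    for frame, offset in zip(frames, offsets):
        idx = np.arange(n) * factor + offset
        fine[idx] += frame
        hits[idx] += 1
    # Average where several frames landed on the same fine-grid cell.
    return fine / np.maximum(hits, 1)

factor = 4
scene = np.sin(np.linspace(0, 20, 64 * factor))  # stand-in for the real scene

# Four "exposures", each a coarse sampling of the scene at a different offset.
offsets = [0, 1, 2, 3]
frames = [scene[o::factor] for o in offsets]

merged = shift_and_add(frames, offsets, factor)
print(np.allclose(merged, scene))
```

Each frame alone has a quarter of the samples, but together they tile the fine grid, so the merged result contains real measurements rather than guesses. In two dimensions you need roughly k x k differently shifted frames to refine the grid k-fold along each axis, which is presumably where the sqrt(N) figure comes from.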
This is already the case. I rarely have a need to take out my SLR - it's just too bulky to have a reason for it, unless I'm going on an adventure where photography is the main purpose, or one of them.
I've gone on hiking trips where my "challenge" was to only use my phone camera. It wasn't much of a challenge for landscapes.
Most people are like that. But when I go for a 'photowalk' I cannot imagine not using my (#1) DSLR (or my (#2) super-duper zoom point-and-shoot camera).
Phone (imho) is for quick and dirty, not for a 'it's time to do proper photography'.
stuff like this - assuming it's a GAN under the hood it just tries to guess a 'plausible' possible interpolation, but if you're giving it very little information about what's in the original image, there will be a wide range of plausible images it could have arisen from, so the output can be very far from the truth.
I use it regularly (for my personal photos) and it depends on the photo. My observation is that pictures of natural elements (clouds, water, stars) tend to yield better results than for example a family picture in my house.
In case anyone is interested, vas3k has a very good quality write-up of similar advances made in the field: https://vas3k.com/blog/computational_photography/ (even mentioning the related ML Enhance feature in Pixelmator)
The 'Preserve Details 2.0' upscaler from Photoshop does an amazingly good job. In particular, I started with a 500x500 square image of a Gundam sketch illustration which showed scan lines when printed directly on an 8 inch square, but with 4x4 scaling the image was close to perfect.
I guess professional photos have always been touched up and this is just an automation of this process. But I've long felt a little odd about the use of machine learning in photography this way. How long before Google Photos recreates dark photos by just reassembling all of the items it believes are there from machine learning?
I mean, the only reason things like this aren't commonplace is cost (of the skillset, tools). Basically anything's possible these days with CGI, everything is purely a matter of the amount of effort you want to put in.
And for artistic purposes, why does it matter how the final result was arrived at? If we have powerful and easy techniques for realising an artistic vision, that doesn't seem like a bad thing?
The samples in the article show very good results at preserving details, so that curves do not get blurry when scaled up, but they are not particularly impressive.
Off topic: I remember a few years ago, some students got very impressed by GIMP's Lanczos-3 upscaling that was much better than the Photoshop version they had access to at the time.
It's a very hit and miss feature, like most AI enhance stuff.
Often enough it will look worse than bicubic interpolation. Chroma noise gets crazy most of the time.
What it's remarkable at: patterned cloth, straight lines with large contrast (electric wires against a blue sky, dark glasses mount against white skin, etc.).
What's the point? If display resolution was way higher than capture resolution then I could see it. But the opposite is true. A 4K display is roughly 8 megapixels (3840x2160), while most entry-level cameras are in the region of 20-25 MP.
The model is trained on Adobe servers and run locally on your device. The training of the model is much more processor intensive than actually utilizing the trained model, usually by multiple orders of magnitude.
Is it common practice now to take open source techniques and ship them as proprietary software? I'm seeing a lot of Photoshop tools which I just saw in Two Minute Papers a couple months ago...
I was BLOWN AWAY by this STUNNING new technology! The brilliant minds at Adobe have done it yet again. This changes everything. Adobe (ticker symbol ADBE) keeps innovating and defining the very future of creative imaging software. From the widely loved Cloud platform and easy rental model to the file formats which are so packed with features that each new update takes the competition possibly years to fully reverse engineer for interoperability. Probably because their engineers aren’t nearly as good!
Yes my jaw hit the floor when I saw this headline.
I hate articles where the author shows an option but won't actually tell you where it is located in the application. I spent 10 minutes looking for it in the latest PS and couldn't find it. Then I clicked the link to the related article about "Enhance Details" and it seems like the option could be in Lightroom instead?
I tried to use it myself because the illustrations in the article don't look too impressive, but the author's enthusiasm got me to look for it.
Absolutely not. If there's not enough information available, there's not enough information, full stop.
Plausible (i.e. "good looking" or "believable") results are not the same as actual data, which is why enhance wouldn't work on vehicle licence plates or faces for example.
Sure, the result might be a plausible looking face or text, but it's still not a valid representation of what was originally captured. That's the danger with using such methods for extracting actual information - it looks fine and is suitable for decorative purposes, but nothing else.
No there certainly is a chance for ML to improve here.
Let’s take the classic example of enhancing a blurry photo to get a license plate.
Humans may not be able to see much in the blur, but an AI trained on many different highly down-res’d images could at least give you plausible outcomes using far less data than a human brain would need to say anything with confidence.
You wouldn’t hold it up as the absolute truth, but you’d run the potential plate and see if it matched some other data you have.
So yes, it wouldn’t magically add any more information to the image, but it could be far better at taking low information and giving plausible outcomes that you would then need to verify.
> Let’s take the classic example of enhancing a blurry photo to get a license plate.
That's not the same as fabricating information, though. A blurry image still contains a whole bunch of information and correlation data that just isn't present in a handful of pixels.
This is not super-resolution, but something different entirely. Super-resolution would mean to produce a readable license plate from just a handful of pixels. That is an impossible task, since the pixels alone would necessarily match more than one plate.
The algorithm would therefore have to "guess" and the result will match something that it has been trained on (read: plausible), but by no means the correct one, no matter how many checks you run on a database.
To illustrate the point, I took an image of a random license plate, and scaled it down to 12x6 pixels. 4x super-resolution would bring it to 48x24 pixels and should produce perfectly readable results.
The 48x24 pixel version could easily be upscaled to even make the state perfectly readable. A 4x super-resolution upscale of the 12x6 version, however, would be doomed to fail no matter what.
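The many-to-one point can be shown with a toy example. The sketch below is not a license plate and not any real super-resolution model; it assumes a simple box-average downscale and random pixel values, and all names are made up for illustration. It builds two different 48x24 images that shrink to exactly the same 12x6 image, so nothing in the 12x6 pixels can tell them apart.

```python
import numpy as np

def box_downscale(img, factor):
    """Downscale by averaging each factor x factor block of pixels."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(0)
# Hypothetical 12x6 "observed" image (6 rows, 12 columns), values kept away from 0 and 255.
low = rng.integers(20, 236, size=(6, 12)).astype(float)

# Candidate A: every low-res pixel becomes a flat 4x4 block.
cand_a = np.kron(low, np.ones((4, 4)))

# Candidate B: the same blocks plus a checkerboard that averages to zero in each 4x4 block.
checker = np.tile(np.array([[20.0, -20.0], [-20.0, 20.0]]), (2, 2))
cand_b = cand_a + np.tile(checker, (6, 12))

same_low_res = np.allclose(box_downscale(cand_a, 4), box_downscale(cand_b, 4))
different_high_res = not np.allclose(cand_a, cand_b)
print(same_low_res, different_high_res)
```

Any upscaler handed the 12x6 image has to pick one of the many high-res images consistent with it, and that choice comes from its training data rather than from the photo.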
I was simply pointing out that AI enhancement that finds useful, accurate details humans could not otherwise recover is very possible, and I don’t think you refuted it.
I also never denied that. But there's a difference between finding details that would otherwise go unnoticed (i.e. like a microscope revealing features invisible to the unaided eye) and reproducing data that simply isn't there to begin with (as is the core of the "enhance"-trope).
Actually the opposite. These algorithms are more susceptible to noise, they may generate sharp perfect license plate numbers (that are totally fabricated and completely wrong) from a blurry image. But by no means should you even consider the results to have hints of truth.
A GAN produces totally different results if you slightly change the input.
So, as others are also saying, these "enhances" are great for decoration and absolutely should be ignored as facts or truth (especially when it comes to faces, license plates and other things used by law enforcement).
>These algorithms are more susceptible to noise, they may generate sharp perfect license plate numbers (that are totally fabricated and completely wrong) from a blurry image.
This is not really an issue that is new or limited to things that are called AI.
No. If their training set isn’t too far off what you use it for, it is valid. Just because it’s not guaranteed, doesn’t mean it’s not more accurate than hitting sharpen and squinting.
You’re fighting against “would it be reliable” but that isn’t the claim.
The claim is could it be better than human, and the answer is yes, it just depends on how well trained it is and the dataset.
But this is also entirely testable. I guarantee much like Go, if we set up a “human vs AI guess the blurry image” competition that AI will blow us out of the water. It’s simply a data * training issue, and humans don’t spend hours on end practicing enhancing images like they do playing Chess.
Again - it won’t be perfect, obviously. It will have false positives, of course.
Doesn’t mean it can’t be better than human.
Also GANs are pretty irrelevant, the model structure has nothing to do with the theory.
Well, just because that would be a bad, dangerous idea does not mean that police will not do it. After all, police use lie detectors, fingerprinting and DNA evidence without much care for the error rate.
I hope some day there will be an episode of a crime show where by chance two teams of detectives will independently work on the same case without noticing each other and by using standard police methods they will come to completely different incompatible conclusions and detain two different suspects who of course both confess after going through standard police interrogation. Screenwriters, use your powers for good!
(Actually Czech writer Karel Čapek (the same who invented the word robot) did practically the same thing in one of the short stories in Stories from Another Pocket, everybody should read it together with Stories from a Pocket)
edit: I was joking, but people pointing out that you still can't create something out of nothing etc might not be thinking big enough. I think this technology absolutely has the potential to help. Police are literally still using artists' impressions - photofits - to find perpetrators.
I think the artist impression has a lot more value than a highly realistic generated face. If you see an artistic impression, you will see the facial features that were noticeable. Such as a mole, the shape of the nose, or the thickness of the eyebrows. Then you have a template that your brain uses to match those features with any face that you see.
However, if I show you a highly realistic face, your brain will take a different impression. Your brain is trained on faces for thousands of years. It will try to match the face perfectly.
An artist impression tells the audience that it is inaccurate. A realistic photo tells the audience that this is _exactly_ who we are looking for.
Yep. To be useful for exploring potential "true" values, a system would probably need some way of showing you the distribution of its guesses, so you can get an idea of whether there is any significant information there.
That aside, you'd still probably need a ML PhD to have a chance of correctly interpreting the results, given the myriad potential issues with current systems.
I’m not sure if that’s the case with this tech. I could see in the near future a scenario in which many, many individuals (thousands) are photographing the same things in the same area and you can intelligently superimpose things to “enhance”, tho.
Neat. If only Adobe would do away with their absurd pricing models. I'll never use an adobe product again after trying to end my subscription with them.
Absolutely. And the results shown in this article aren't particularly impressive. I'll be sticking with photopea.com, even though I get free CC through work.
Creative Cloud was the point I noticed a shift in Adobe's priorities. I don't know if they switched CEOs at the time, but I started disliking Adobe more and more from that point forward. I couldn't believe the amount of crud Creative Cloud puts on your system, not to mention all of the tracking and phoning home their software does.
I think what bothers me the most is that it's not just a monthly subscription. When you sign up, you are entering a one-year contract with them. Sure, you can cancel at any time...just pay the remaining amount due and you can walk away.
Exactly. One of the arguments Adobe was making in professional circles about the subscription switch was that people will save money because they will be able to subscribe to each piece of software and for short periods of time when they need it.
Truth is that anything other than the full suite (and maybe the photographer plan) doesn't make sense financially. And then they killed the month by month subscription as you said.
A non-tech muggle's jaw hitting the floor is practically par for the course. I'm so tired of reading these breathless assessments from people who don't know any better.