Image Scaling Attacks (embracethered.com)
431 points by wendythehacker on Oct 29, 2020 | 73 comments



This obviously works when the image is "scaled" by sampling/nearest-neighbor (e.g. downscaling 2x by taking every second pixel and discarding the rest), not actually scaled through some better method (by doing math that involves all pixel values).

What the article doesn't mention, and the paper it links to probably covers somewhere amid so much other material that I haven't found it yet, is whether this also works on some of the better scaling algorithms, and thus whether it's a "duh, OBVIOUSLY" or actually interesting research.

The blog post gives a cv2.resize example which seems to default to "bilinear", but I'm not sure what this means for downscaling, in particular for downscaling by a large factor.

I suspect that the key takeaway is "default downscaling methods are bad".
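
Quick illustration of how small the sampled set is (my own sketch, not the article's code; it assumes OpenCV's documented mapping of output x to source (x + 0.5) * scale - 0.5):

    import cv2
    import numpy as np

    # Mostly-black "cover" image with a faint dot grid planted exactly where an
    # 8x INTER_LINEAR downscale will sample (output x maps to source 8x + 3.5,
    # so each output pixel only reads source rows/cols 3 and 4 mod 8).
    src = np.zeros((512, 512), dtype=np.uint8)
    for r in (3, 4):
        for c in (3, 4):
            src[r::8, c::8] = 255

    small_linear = cv2.resize(src, (64, 64))  # default INTER_LINEAR
    small_area = cv2.resize(src, (64, 64), interpolation=cv2.INTER_AREA)
    print(small_linear.mean(), small_area.mean())  # roughly 255 vs roughly 16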


You have to use AREA interpolation for downscaling. Bilinear will only interpolate among the 4 nearest source image pixels. It still ignores most of the source pixels.

This is in essence a special case of sampling artifacts, i.e. aliasing artifacts. Anyone writing image processing software should already know about aliasing, the Nyquist theorem, etc. Or, well, perhaps not in the current hype, where everyone is a computer vision expert who took one Keras tutorial...

Resizing with nearest neighbor or bilinear (i.e. ignoring aliasing) also hurts ML accuracy, so it's worth fixing regardless of this specific "attack".


Bilinear could mean downscaling with a triangle kernel, but it might well be the standard bilinear interpolation that's native to most GPUs and OSs.

Also area interpolation still has some pretty terrible aliasing, since box kernels are terrible at filtering high frequencies.

And of course with downscaling you could still freely manipulate the downscaled image if you're allowed to use ridiculously high or low pixel values, provided you know the exact kernel used.


Bilinear uses the triangular kernel over the source image (with size corresponding to the input pixel size).

Area interpolation works very well in practice; it's more sophisticated than just a box filter on the input followed by sampling. It calculates the exact intersecting footprint of each source pixel and computes a weighted average. Do you have examples where this causes aliasing, and can you show a better alternative?
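
For the integer-factor case, a minimal sketch of that idea (my own code) is just block averaging, where no source pixel is ignored:

    import numpy as np

    # Integer-factor area downscale: every source pixel in the k x k footprint
    # contributes to the average, so none are ignored. Real implementations
    # (e.g. INTER_AREA) also weight fractional footprints at the edges.
    def area_downscale(img, k):
        h, w = img.shape[0] // k * k, img.shape[1] // k * k
        blocks = img[:h, :w].reshape(h // k, k, w // k, k, *img.shape[2:])
        return blocks.mean(axis=(1, 3))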


You can use any image with a high frequency regular pattern. Wikipedia has the following example: https://en.wikipedia.org/wiki/File:Moire_pattern_of_bricks_s....

Anything softer than area will help with those kinds of issues (which is why the original https://en.wikipedia.org/wiki/Aliasing#/media/File:Moire_pat..., looks fine in most browsers even if you resize it). Bicubic tends to do better in this respect. It's a trade-off though.


Sorry, but this is wrong. Area has no aliasing; all the others introduce aliasing artifacts when DOWNscaling.

https://imgur.com/a/C6utkwr

Now you could use pre-smoothing with a kernel and then resampling, but then we are talking about something else.

It's important to understand that interpolation happens between the source pixels, so it does not help when downscaling. Cubic tends to look nice, yes, but only when UPscaling.


Yeah, if you're going to use interpolation to downscale, it's obviously going to look worse than even the most basic version of downscaling. That's why downscaling uses the transpose of the interpolation kernel; not doing that and then being surprised the result doesn't look good is just silly.
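
A 1-D sketch of what that means in practice (my own code; the "transposed" use amounts to widening the tent kernel by the scale factor s):

    import numpy as np

    # 1-D downscale by integer factor s with a tent kernel widened to radius s,
    # so every source sample under an output pixel's footprint gets weight > 0.
    def tent_downscale_1d(x, s):
        j = np.arange(len(x))
        out = np.empty(len(x) // s)
        for i in range(len(out)):
            center = (i + 0.5) * s - 0.5  # source position of output sample i
            w = np.maximum(0.0, 1.0 - np.abs(j - center) / s)
            out[i] = (w * x).sum() / w.sum()
        return out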


Do you know of any image processing library that has an implementation for that?


ImageMagick should work. It also has quite extensive documentation (https://legacy.imagemagick.org/Usage/resize/), though it's a bit hard to know where to start. I'm fairly certain it tells you somewhere that interpolation and downscaling use their kernels differently, but I couldn't tell you where.


There's another way to hide the image, and that is to exploit the nonlinearity of the response curves (gamma).

I have an image I crafted a long time ago which looks something like gray noise when you open it up, but when you downscale it, you see an image of Lt Cmdr Data from Star Trek. I wonder if I can dig it up.

The technique itself was not novel when I did it, a more sophisticated version involving embedded gamma values (which you can make quite large or small) was routinely used on image boards some ten or fifteen years ago.


It's ridiculous that so few websites actually handle this well. Even my own self-written imgur clone does it just fine:

https://i.k8r.eu/i/F_XCMA

https://i.k8r.eu/F_XCMAm.png

https://i.k8r.eu/F_XCMAt.png

You just have to go into a linear colorspace and use an area filter.
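
For anyone who wants the recipe, here's a minimal sketch (my code; assumes sRGB input and an integer scale factor):

    import numpy as np

    # Decode sRGB to linear light, average there, re-encode.
    def srgb_to_linear(u):
        u = u / 255.0
        return np.where(u <= 0.04045, u / 12.92, ((u + 0.055) / 1.055) ** 2.4)

    def linear_to_srgb(v):
        u = np.where(v <= 0.0031308, 12.92 * v, 1.055 * v ** (1 / 2.4) - 0.055)
        return np.clip(u * 255.0 + 0.5, 0, 255).astype(np.uint8)

    def downscale_gamma_correct(img, k):
        lin = srgb_to_linear(img.astype(np.float64))
        h, w = img.shape[0] // k * k, img.shape[1] // k * k
        blocks = lin[:h, :w].reshape(h // k, k, w // k, k, *img.shape[2:])
        return linear_to_srgb(blocks.mean(axis=(1, 3)))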


Related: you can get an idea of what your browser/display is doing in this shadertoy: https://www.shadertoy.com/view/Wd2yRt


Fwiw, the reason why Wikipedia doesn't do this when rescaling images (or at least didn't years ago when I was working on image resizing code for Wikipedia) is that doing it (with off-the-shelf software) required keeping the entire image in memory, which was a big no-no. I mean, I guess it would be fine for small images, but then you're using two different algorithms depending on image size, which seems bad.



The article links to this browser test page:

http://www.ericbrasseur.org/gamma_dalai_lama.html

On my machine, both Firefox and Chrome display grey rectangles when scaling down. Why do the browsers get this wrong?


Because resizing in a linear colorspace is more costly. JPEG can be resized without shifting colorspaces VERY cheaply, but a change of colorspace (or a gamma shift) requires fully decoding the image into RAM. The hit can be quite significant: on a phone or laptop it would hurt battery, and on an online service (a dynamic resizer) it would add latency.


> on an online service (dynamic resizer service) it would impact latency.

If it's even possible at all. Sometimes users upload things like https://commons.wikimedia.org/wiki/File:“Declaration_of_vict...


It can also depend on the monitor. When I drag this page between monitors I see different effects.


Max pooling could also be targeted extremely easily with this technique, and it is immensely popular as a scale-reduction technique in convolutional neural networks. So, yes, it could very well be a relevant and non-trivial attack in the context of 'dataset poisoning'. (It would also be relatively easy to defend against; just don't use max pooling in the first layer -- but the point is that this is a steganographic attack.)
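
A toy sketch of that (my own illustration):

    import numpy as np

    # One planted pixel per 2x2 window fully determines 2x2 max pooling,
    # so the payload survives the "downscale" while the cover vanishes.
    def max_pool_2x2(x):
        h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
        return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    cover = np.random.randint(0, 64, (8, 8))  # dim noise a reviewer would see
    cover[1::2, 1::2] = 255                   # planted payload pixels
    print(max_pool_2x2(cover))                # all 255: only the payload remains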


One key thing to be aware of is that not all "bilinear" scaling algorithms are created equal. If the "bilinear" in question is GPU-accelerated, it's quite possible that it's the Direct3D/OpenGL bilinear filter, which samples exactly 4 taps of the image from the highest appropriate mip level (which may be the only one, unless the application goes out of its way to generate more). That means if the scaling ratio is less than 50%, it becomes something like a smoothed nearest neighbor filter and is vulnerable to this attack.

The introduction of a mip chain + enabling mip mapping mitigates this, because when the scaling ratio is less than 50% the GPU's texture units will select lower mips to sample from, approximating a "correct" bilinear filter. This does also require generating mips with an appropriate algorithm - there are varying approaches to this, so I suspect it is possible to create attacks against mip chain generation as well.
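
A mip chain is essentially just repeated 2x box downscales; a sketch (mine; repeated averaging is one common approach among several):

    import numpy as np

    # Each mip level is a 2x box downscale of the previous one; sampling from
    # the right level approximates a correctly widened filter.
    def build_mips(img):
        mips = [img.astype(np.float32)]
        while min(mips[-1].shape[:2]) >= 2:
            m = mips[-1]
            h, w = m.shape[0] // 2 * 2, m.shape[1] // 2 * 2
            mips.append(m[:h, :w].reshape(h // 2, 2, w // 2, 2, *m.shape[2:])
                        .mean(axis=(1, 3)))
        return mips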

Thankfully, quality-focused rendering libraries are generally not vulnerable to this, because users demand high-quality filtering. A high-quality bilinear filter will use various measures to ensure that it samples an appropriate number of points in order to provide a smooth result that matches expectations.

One other potential attack against applications relying on the GPU to filter textures is that if you can manually provide mip map data, you can use that to hide alternate texture data or otherwise manipulate the result of downscaling. As far as I know the only common formats that allow providing mip data are DDS and Basis, and DDS support in most software is nonexistent. Basis is an increasingly relevant format though and could potentially be a threat, but as a lossy format it poses unique challenges.


> This does also require generating mips with an appropriate algorithm - there are varying approaches to this

http://number-none.com/product/Mipmapping,%20Part%201/index....

http://number-none.com/product/Mipmapping,%20Part%202/index....


Bilinear and trilinear with mipmaps are still relatively poor. 3D also uses anisotropic filtering, which eliminates a lot of artifacts, even in 2D scenarios.


It is a very common and often overlooked issue in image processing. Bilinear is widely used and not particularly good; for large-factor downscaling it is reminiscent of nearest neighbor.


> It is a very common (...)

Bilinear interpolation is perfectly acceptable for zooming in on an image (making it larger by adding new pixel values). If you want to zoom out, you can still use bilinear interpolation, but of course you have to filter the image data beforehand to avoid aliasing.
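
Something like this sketch (my code; the sigma rule of thumb is a common heuristic, not gospel):

    import cv2

    # Low-pass first (sigma scaled to the factor), then bilinear is safe.
    def downscale_prefiltered(img, factor):
        blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=factor / 2.0)
        h, w = img.shape[:2]
        return cv2.resize(blurred, (w // factor, h // factor),
                          interpolation=cv2.INTER_LINEAR)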


Most often scaling and filtering are an integrated process; when one says bilinear, it is usually implied that it is combined with nothing else.


Indeed. If you filter the image data, you should _not_ do bilinear on top of that, since bilinear is itself a (triangle) filter, so you'd soften the image for no good reason.


You still need some kind of interpolation if the zoom factor is non-integer, and bilinear is a good choice in that case.


Yeah, the default implementation should check the scaling factor and use AREA interpolation when downscaling and bilinear for upscaling.
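
E.g. with OpenCV, something like this sketch (the function name is mine):

    import cv2

    def safe_resize(img, dsize):
        # dsize is (width, height); shrink -> AREA, enlarge -> bilinear
        shrinking = dsize[0] < img.shape[1] or dsize[1] < img.shape[0]
        interp = cv2.INTER_AREA if shrinking else cv2.INTER_LINEAR
        return cv2.resize(img, dsize, interpolation=interp)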


Whether it works or not depends on how many samples are used to downscale. Amusingly, this attack was used for bait-and-switch and "click here to [x]" gimmicks on some websites, especially 4chan, and you can find examples tuned primarily for typical thumbnail generators (which, probably for performance reasons, tend to only sample a small number of pixels).

https://thume.ca/projects/2012/11/14/magic-png-files/


You're looking for section 3.1 in [1], where they analyze the effect of the scaling ratio and kernel size for an arbitrary downscaling kernel.

> Any algorithm is vulnerable to image-scaling attacks if the ratio of pixels with high weight is small enough.

[1] https://www.usenix.org/system/files/sec20-quiring.pdf


Just a quick thought: if you just average the surrounding pixels, you could possibly still add occasional pixels to skew the average and create a different image, though that might be much more noticeable.


If you add occasional pixels to skew the average, it will probably be noticeable in the original image. But an interpolation scheme that uses only the four corners while ignoring the rest can be easily fooled: you can blend an entire lower-resolution image into those four corner pixels.


I remember seeing this technique 8 or 10 years ago on 4chan. The thumbnail was some innocuous picture; when you clicked on it, it expanded to the larger version with a banana. The larger version also had these kinds of dots on it.


This is a different, related trick, which I explored in detail in PoC||GTFO 15:13.

https://archive.org/stream/pocorgtfo15#page/n96/mode/1up

This isn't based on attacking scaling algorithms per se, but rather on the fact that most browsers honor the gAMA gamma chunk in PNG files, while most image processing libraries don't, and strip it when downscaling.

The abuse potential for AI training exists here too, but both attacks are a bit of a stretch.


I'm curious about the use of the word 'attack' here - is that really what this is? If so, what exactly is being attacked? I thought this kind of thing was called steganography.


The attack part seems to be that Husky AI downscales the images it uses to train its model. If it were vulnerable to this attack, its downscaling would expose the hidden image and train on that instead of the user-visible image. I think this could be used to trick manual or even automated reviews of the input.


My guess is that an evil actor could contaminate a training data set with hidden images, resulting in a faulty ML model.

... but yeah, it's a stretch as a real-world application; it seems to require a really specific setup to work.


I guess you can potentially bypass automatic content filters on social media for example.


Steganography usually has the recipient intending to get the hidden message. Since this is about fooling the recipient "attack" seems apt.


There was a very popular yet useless trick in the late '90s/early 2000s where you'd combine two images in a checkerboard pattern: one at regular intensity, the other very bright (so it doesn't stand out that much upon regular viewing).

Internet Explorer had this feature where, if you pressed CTRL+A to select the page contents, it would overlay images with a 1px grid to indicate selection. If you got your pattern right, the hidden image would appear. This is essentially the same effect, but on steroids.
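
A rough reconstruction of the trick (my own sketch; the brightness offset is a guess):

    import numpy as np

    # Interleave a cover image with a brightened hidden image; masking every
    # other pixel (as IE's selection grid effectively did) reveals the hidden one.
    def checkerboard_mix(cover, hidden):
        mask = (np.indices(cover.shape[:2]).sum(axis=0) % 2).astype(bool)
        bright = np.clip(hidden.astype(np.int16) + 128, 0, 255).astype(np.uint8)
        out = cover.copy()
        out[mask] = bright[mask]
        return out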


This reminds me that a few years ago (almost two decades?) there was a lot of concern online, almost "moral panic", about the potential of digital steganography to hide information in public image files.

Even if this method is not feasible as an attack vector, at the very least it looks like a very practical way to share information that otherwise would be censored or restricted–all the more so if the hidden image data can be encrypted, which may make it impossible to detect.

On the other hand, I know nothing about steganography and I'm talking out of my arse, so maybe current steganography methods are much more powerful.


I remember that in the early '00s people would share books and movies by using a simple command that let you zip an archive into a JPEG. For example, they would put a book's PDF file in an image of its cover. Someone else could then download the image, unzip it, and get its contents.
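
The "command" was essentially just concatenation; a sketch (filenames are placeholders):

    # JPEG decoders stop at the end-of-image marker, while unzip locates the
    # central directory at the end of the file, so concatenation yields a file
    # that works as both.
    with open("combined.jpg", "wb") as out:
        with open("cover.jpg", "rb") as img, open("book.zip", "rb") as zf:
            out.write(img.read())
            out.write(zf.read())
    # afterwards: `unzip combined.jpg` recovers the archive's contents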

I can easily imagine how someone could use this for nefarious purposes.


Very recently someone created a method to encode files and data into videos - the video could then be uploaded to YouTube, and distributed/stored permanently there.


How could that be possible, though? YouTube doesn't serve the original video file back to users; it gets processed to create different video streams, so this seems pretty crazy.


According to the creator, /u/T0X1K01 on reddit:

> No, that's what's so cool about it. I explain it in more detail in the video, but basically because the videos are created using 1-bit color images, it makes it easy to retrieve data without having to worry about how YouTube changes the video.

There's a video explanation here: https://www.youtube.com/watch?v=yu_ZIr0q5rU&feature=youtu.be

Source code here: https://github.com/AlfredoSequeida/fvid/

An example here: https://www.youtube.com/watch?v=NzZDFxM5Coo
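
The core idea is simple; a sketch of the encoder side (block size and width are my guesses, not fvid's actual parameters):

    import numpy as np

    # One bit per large black/white block: a thresholded read-back only needs
    # each block's average to stay on the right side of 128, which easily
    # survives lossy re-encoding.
    def bits_to_frame(bits, block=8, width=640):
        cols = width // block
        rows = -(-len(bits) // cols)  # ceil division
        frame = np.zeros((rows * block, width), np.uint8)
        for i, b in enumerate(bits):
            r, c = divmod(i, cols)
            frame[r * block:(r + 1) * block,
                  c * block:(c + 1) * block] = 255 if b else 0
        return frame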


But you won't be able to download it - youtube-dl is not with us anymore.


There was a new version released yesterday. [0]

[0] http://youtube-dl.org/


I suppose a very stupid thumbnail generator could be attacked with something like this. Proper tools for downscaling images already take this (and also gamma correction) into account.

See http://www.ericbrasseur.org/gamma.html


It's one thing to take non-linearity into account. But you also need to take into account the embedded colorspace information of your source image, if it has one. It's not necessarily sRGB.



I was expecting the article to mention another use for this attack: to share porn on regular hosting sites and bypass automated detection systems.


Mmm, I wonder if it would work for videos too.


It would be spectacularly difficult to do for videos:

First there's lossy compression, which means there's no guarantee your injected pixels survive the encoding pass.

Then there's the additional hurdle of motion vectors, which will most likely be misaligned between the original video and the injected one.

This would result in hard to predict artefacts after encoding.

Finally, each decoder handles scaling slightly differently, so even if your embedded video trick works on one software/hardware decoder, it might fail on another (sometimes even depending on just the version or additional settings/filters being enabled).


This is what came to mind for me: breaking major social sites' automated censorship mechanisms, although I feel like that's largely crowdsourced these days?


Imagine combining this technique with the encoding of software like youtube-dl into an image, as in this twitter post:

https://twitter.com/GalacticFurball/status/13197659867911577...

Probably hard to get it working in every environment, but if you know what you are up against, it might be possible ;-)


Typically when you downsample, you want to low-pass filter and then apply whatever downsampling kernel you want with the correct stride. Since the filter is low-pass (think: take the Fourier transform, keep an inner smaller square of the spectrum, and invert), you can embed the poison image entirely within that retained frequency band. Now play with the power: if we downsample by a factor of 4, assume the true image loses much of its power (say it keeps only 1/4) while the poison image, living entirely in the retained band, loses none. So right off the bat, we are scaling up the poison image's relative power by a factor of the downsampling ratio; for example, we might go from the poison image having 1/4 the power of the true image to the two having equivalent power. The other aspect is that if the interpolation kernel and strides are known, we can also make sure the poison image has large values at exactly those sampled pixels and increase the gain further.
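
A grayscale sketch of that embedding (my code; assumes an ideal low-pass downscaler and ignores FFT normalization constants):

    import numpy as np

    # Embed the payload only in the centered low-frequency block that an ideal
    # low-pass downscale keeps: the small output is then (downscaled cover)
    # plus gain * payload, up to normalization.
    def embed_lowfreq(cover, payload, gain=4.0):
        F = np.fft.fftshift(np.fft.fft2(cover.astype(float)))
        P = np.fft.fftshift(np.fft.fft2(payload.astype(float)))
        h, w = payload.shape
        cy, cx = cover.shape[0] // 2, cover.shape[1] // 2
        F[cy - h // 2:cy - h // 2 + h, cx - w // 2:cx - w // 2 + w] += gain * P
        out = np.real(np.fft.ifft2(np.fft.ifftshift(F)))
        return np.clip(out, 0, 255).astype(np.uint8)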


Really impressive and ingenious. This looks scary in some sense...


I thought almost everyone uses some form of interpolation when resizing, which would defeat this attack completely. Or are there use cases for not using interpolation (I know it requires less processing)?


OpenCV does use linear interpolation by default. What you'd need is something that helps against aliasing, for example first blurring the image with a kernel of the appropriate size, or using a scaling method like OpenCV's INTER_AREA.


What actually helps here is to use linear colorspace for downscaling and to correctly detect the source image's colorspace.


Colorspaces are an issue with scaling/averaging, but it's not what's happening here.


Try filling a large square image with thin vertical lines, interpolated for smoothness but still visibly separate from each other; the width of each line should be about 1-3 pixels. Then map the image onto polar coordinates, so the lines meet in the middle. Finally, downscale it a couple of times with a basic avg(2x2) -> 1x1 mapping. Observe an elaborate "shadow shape" in the middle that looks like r = cos(4 pi a), but with a lot more nuanced detail.
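
For the curious, a sketch that generates something like it (the line count is my guess):

    import numpy as np

    # ~600 smooth radial lines meeting in the middle, then repeated naive
    # avg(2x2) downscales to expose the moire "shadow shape".
    n = 1024
    y, x = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
    img = 0.5 + 0.5 * np.cos(np.arctan2(y, x) * 600)

    small = img
    for _ in range(3):
        h, w = small.shape
        small = small.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))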


This is oddly specific. Can you point to a realistic scenario where this makes sense?


I'm just working on a very particular app in the WebGL rendering space and noticed this mysterious glitch in exactly this case. At first I thought I'd discovered something interesting, but it turned out to be just the aliasing bug being discussed here. I'll share a link to that demo on HN a little later: my account is still green and I'm afraid HN would shadowban me and my domain for sharing links now.


For those wondering how it works: it's explained in the article linked in the third paragraph. It takes advantage of aliasing.



So in the same way we build pipelines to sanitise user text input (Little Bobby Tables etc.), we need to treat image data the same way. I guess a pipeline that uses OpenCV to decode the image at full size and at thumbnail size, and flags it for review if the two are wildly different?
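
Something like this sketch (my code; the threshold is arbitrary):

    import cv2
    import numpy as np

    # Downscale with two unrelated algorithms and flag the upload when they
    # disagree a lot; a scaling attack is tuned to one sampling pattern.
    def looks_poisoned(img, dsize=(64, 64), threshold=25.0):
        a = cv2.resize(img, dsize, interpolation=cv2.INTER_LINEAR).astype(np.float32)
        b = cv2.resize(img, dsize, interpolation=cv2.INTER_AREA).astype(np.float32)
        return float(np.abs(a - b).mean()) > threshold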

It's still cool though


It's definitely neat that it works that way, but I don't really see it as a problem.


Hah, this is kind of brilliant. Hiding in plain sight…


In the example "attack image" you can see the husky and the outline of the fence in the sky. "That's amazing!"


Oh good, perhaps things will start defaulting to less aliasy downsampling kernels now.


That's incredible!


I did not see in the article, nor so far here in the comments, one example of this in the wild, which perhaps indicates that such a simple sampling approach isn't common. If someone could successfully execute this against Twitter or Reddit, for example, that would change its newsworthiness completely.



