I worked on a very interesting project aligning point clouds using SVD. For a pair of point clouds of the same scene that are not aligned:
- select 3+ pairs of matching points in each cloud (tops of trees, edges of a building, etc.)
- calculate the vector to the centroid of each cloud
- use SVD to calculate the rotation that gives a minimum distance when applied from the source to the target
- translate and rotate the source cloud to the target
I did this using the Rust nalgebra^1 crate after reading a very helpful paper^2 detailing the process. I had planned to build the Rust lib into WASM so the process could run alongside the browser-based point cloud visualiser we were using, but I had limited time and instead used Neon^3 to build a native binding for our Node.js server.
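A rough sketch of that pipeline with nalgebra, assuming the matched points are already paired up by index (the function and variable names are mine, not from the project):

    use nalgebra::{Matrix3, Point3, Rotation3, Vector3};

    // Rigid alignment of matched point pairs via SVD, roughly following the steps above.
    fn align(source: &[Point3<f64>], target: &[Point3<f64>]) -> (Rotation3<f64>, Vector3<f64>) {
        assert_eq!(source.len(), target.len());
        let n = source.len() as f64;

        // Centroid of each set of matched points.
        let c_src = source.iter().fold(Vector3::zeros(), |acc, p| acc + p.coords) / n;
        let c_tgt = target.iter().fold(Vector3::zeros(), |acc, p| acc + p.coords) / n;

        // Cross-covariance of the centred points.
        let mut h = Matrix3::zeros();
        for (s, t) in source.iter().zip(target) {
            h += (s.coords - c_src) * (t.coords - c_tgt).transpose();
        }

        // SVD gives the rotation R = V * U^T; flip the last axis if we got a reflection.
        let svd = h.svd(true, true);
        let (u, v_t) = (svd.u.unwrap(), svd.v_t.unwrap());
        let mut r = v_t.transpose() * u.transpose();
        if r.determinant() < 0.0 {
            let d = Matrix3::from_diagonal(&Vector3::new(1.0, 1.0, -1.0));
            r = v_t.transpose() * d * u.transpose();
        }

        // Translation that carries the rotated source centroid onto the target centroid.
        let rotation = Rotation3::from_matrix_unchecked(r);
        let translation = c_tgt - rotation * c_src;
        (rotation, translation)
    }

The determinant check guards against the SVD handing back a reflection instead of a proper rotation.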
This is cool, I briefly read through the paper. One thing I’m curious about is how hard it would be to add the ability to do a scale transformation (e.g. zoom in/zoom out) in addition to the translation and rotation. Would it be as simple as adding a scale factor to the optimization objective and reworking a bit of the math?
Very interesting. I will be digging into this one. The first thing that popped out at me:
> We can decompose a given image into the three color channels red, green and blue. Each channel can be represented as a (m × n)‑matrix with values ranging from 0 to 255. We will now compress the matrix A representing one of the channels.
I wonder if the author considered converting to YCbCr colorspace first. The luminance component (Y) is substantially more important to the human visual system than the Cb/Cr components. Some subsampling of the chrominance components would probably work well in these schemes.
This leaped out at me too. Doing lossy compression on RGB is not setting yourself up for success. Of course, then you'd need two sliders in the UI: one for how much to compress the Y and one for the Cb/Cr.
SVD also works on complex matrices. I imagine there's value in compressing the subsampled Cb/Cr channels as real/imaginary components in a complex matrix.
The representation by SVD would work almost identically. SVD is one of those absolutely magical, amazing algorithms that power a ton of things we do every day.
Especially if YCbCr is a linear transform of the RGB. (I'm not sure if it is, but it's likely to be close.) If it is, it's essentially just making eigenvectors from a matrix with a different set of basis vectors.
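For reference, the full-range BT.601 / JFIF version of the conversion is linear in R, G and B apart from a constant +128 offset on the chroma channels. A quick sketch, with everything in 0..255:

    // Full-range BT.601 / JFIF RGB -> YCbCr: a 3x3 linear map plus a +128 chroma offset.
    fn rgb_to_ycbcr(r: f64, g: f64, b: f64) -> (f64, f64, f64) {
        let y  =  0.299    * r + 0.587    * g + 0.114    * b;
        let cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0;
        let cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0;
        (y, cb, cr)
    }

So up to that constant offset it really is just a change of basis on the colour values.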
Funny enough, I did this same project (minus the fancy web interface) for a numerical linear algebra course in college—except I had to do it in Matlab.
It's worse than just about any "real" image compression algorithm, but it works! (Plus you get lossless compression if your image is low-rank.)
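For anyone who wants to reproduce that project, the core of it is only a few lines on top of any SVD routine. A sketch with nalgebra, assuming a recent version where svd() returns the singular values sorted in decreasing order:

    use nalgebra::DMatrix;

    // Rank-k approximation of one image channel: keep only the k largest singular values.
    // Storing U_k, the k singular values, and V_k^T costs k * (m + n + 1) numbers versus
    // m * n for the raw channel, so it only pays off when k is well below m*n / (m + n + 1).
    fn rank_k_approx(channel: &DMatrix<f64>, k: usize) -> DMatrix<f64> {
        let svd = channel.clone().svd(true, true);
        let u = svd.u.unwrap();       // m x min(m, n)
        let v_t = svd.v_t.unwrap();   // min(m, n) x n
        let s = svd.singular_values;  // assumed sorted in decreasing order

        let sigma_k = DMatrix::from_diagonal(&s.rows(0, k).into_owned());
        u.columns(0, k) * sigma_k * v_t.rows(0, k)
    }

The lossless case is just k equal to the true rank: the discarded singular values are exactly zero.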
It is interesting that lossy compression algorithms are better, despite the Eckart–Young–Mirsky theorem. I guess “best low rank approximation under unitarily invariant norms” doesn’t mean much to eyeballs though.
Eckart-Young-Mirsky just relates to approximation by low-rank matrices: it says the truncated SVD gives the smallest error in the operator norm (and, by Mirsky's extension, in any unitarily invariant norm), i.e. the action of the approximation is closest.
When you’re trying to compress an image you’re trying to optimize something quite different, so it is actually not too surprising.
Images have structure that can be exploited, and exploitable structure is everything. A Fibonacci spiral isn't going to have a nice low rank decomposition but it can be specified exactly (and thus reproduced exactly) in a small handful of bytes. EYM is a structure-free guarantee, so we can't expect it to show that SVD would perform better on the natural image manifold than would purpose-built algorithms.
Perceptual loss is different from mathematical RMSE loss. There has been a lot of work on making things that decompress into something mathematically completely different, but which to the Mk1 Human Eyeball looks very similar.
SVD is probably the most important theorem in linear algebra. Basically, you can take any matrix and find out how much it rotates and stretches as a linear transformation.
It is a tough battle between SVD and the concept of eigenvalues/vectors. SVD is only meaningful for linear operators between inner product spaces, whereas eigenvalues/vectors do not even require a norm. On the other hand, eigenvectors are only meaningful for linear operators from a space to itself.
SVD is just an extension of the eigenvector decomposition that allows the two orthogonal matrices to differ. Think of SVD as eigenvectors of your data from both a row-wise and a column-wise perspective, and intuitively it works out pretty well.
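A quick way to see both perspectives numerically, sketched with nalgebra on a made-up 3x2 matrix (the two printed lists may come out in different orders, since the eigendecomposition doesn't sort its eigenvalues):

    use nalgebra::DMatrix;

    fn main() {
        // Any rectangular matrix will do for the illustration.
        let m = DMatrix::<f64>::from_row_slice(3, 2, &[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);

        // Columns of U are eigenvectors of M * M^T (the row-wise view), columns of V are
        // eigenvectors of M^T * M (the column-wise view), and the squared singular values
        // are the shared non-zero eigenvalues.
        let svd = m.clone().svd(true, true);
        let gram = m.transpose() * &m; // 2 x 2
        let eig = gram.symmetric_eigen();

        let sv_sq: Vec<f64> = svd.singular_values.iter().map(|s| s * s).collect();
        let ev: Vec<f64> = eig.eigenvalues.iter().copied().collect();
        println!("singular values squared: {:?}", sv_sq);
        println!("eigenvalues of M^T * M:  {:?}", ev);
    }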
Basically this expresses the image as one where each row is a linear combination of a set of K basis rows, with different coefficients for each row (or equivalently for columns).
In general it doesn't make sense to compress images this way, since the algorithm is not invariant with respect to 2D image rotation, a very relevant operation for realistic images, but is invariant with respect to row/column permutations, which are not a relevant operation for realistic images.
Mmh, the compression ratio doesn't seem super high. I wonder how the values are stored, perhaps as floats? Some thoughts:
Maybe using YUV would be better than RGB?
Maybe values could be represented as 0..1, and the values themselves could be stored as 0.16 fixed point numbers.
Use some generic compression on top.
Is there a smart way to store the basis vectors? Something about angles somehow (analogous to quaternions)? Also, once U is known, isn't the inverse implied? Also, if U is approximated and fixed, perhaps the singular values can be adjusted again to minimize errors.
> Is there a smart way to store the basis vectors?
Consider, instead, compressing small blocks of the image rather than a whole image plane at once. Let's say 8x8.
Over such a small region, pixels are usually very highly correlated. You could model them statistically as an order-1 autoregressive process with a high correlation coefficient (e.g., x[i + 1] = x[i]*rho + noise, with rho >= 0.95 usually, separately in both the horizontal and vertical directions).
Then, computing the eigendecomposition of the 8x8 cross-correlation matrix C_{ij} = rho^|i - j| produces a set of common eigenvectors that you can use for all blocks (recall that the left and right singular vectors U and V of a matrix M are just the eigenvectors of M.M^T and M^T.M, respectively). So now instead of several very large vectors, you only need a single 8x8 matrix of basis vectors (because it's square and orthonormal, the inverse is just the transpose).
But wait. Let's take the limit as rho -> 1. It just so happens that the resulting eigenvectors converge to the Type 2 Discrete Cosine Transform [0]. So, in fact, you don't need to transmit the basis at all.
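That convergence is easy to check numerically. A sketch with nalgebra (rho and the block size are just illustrative choices here):

    use nalgebra::DMatrix;

    fn main() {
        let n = 8;
        let rho: f64 = 0.95;

        // AR(1) cross-correlation matrix C_{ij} = rho^|i - j|.
        let c = DMatrix::from_fn(n, n, |i, j| rho.powi((i as i32 - j as i32).abs()));

        // Its eigendecomposition is the optimal (KLT) basis for this process.
        let eig = c.symmetric_eigen();

        // Sort eigenvector indices by decreasing eigenvalue.
        let mut idx: Vec<usize> = (0..n).collect();
        idx.sort_by(|&a, &b| eig.eigenvalues[b].partial_cmp(&eig.eigenvalues[a]).unwrap());

        // Orthonormal DCT-II basis: row k is scaled cos(pi * (i + 0.5) * k / n).
        let dct = DMatrix::from_fn(n, n, |k, i| {
            let scale = if k == 0 { (1.0 / n as f64).sqrt() } else { (2.0 / n as f64).sqrt() };
            scale * (std::f64::consts::PI * (i as f64 + 0.5) * k as f64 / n as f64).cos()
        });

        // |cosine similarity| between the k-th KLT eigenvector and the k-th DCT basis vector
        // (absolute value because eigenvectors are only defined up to sign).
        for k in 0..n {
            let u = eig.eigenvectors.column(idx[k]);
            let d = dct.row(k).transpose();
            println!("k = {}: |<u_k, dct_k>| = {:.4}", k, u.dot(&d).abs());
        }
    }

As rho gets closer to 1, those dot products approach 1, which is exactly the convergence being described.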
You can store the output of the transform as fixed point numbers with some arbitrary base (the choice controls the amount of compression: anything too small just gets rounded to zero). Maybe use run-length encoding as your generic compression, because you expect to have a lot of zeroes (visit them in a zig-zag order, to get the big values first and maximize the length of the runs of zeroes), with some Huffman coding of the actual symbols. Add a simple predictor for the (0,0) coefficient of each block (it winds up being the average of the pixel values, so it should be pretty similar from block to block).
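The zig-zag visit order itself is simple to generate. A sketch for an n x n block, which for n = 8 should match the familiar JPEG ordering:

    // Zig-zag scan order for an n x n block: walk the anti-diagonals, alternating
    // direction, so the low-frequency coefficients come out first.
    fn zigzag_order(n: usize) -> Vec<(usize, usize)> {
        let mut order = Vec::with_capacity(n * n);
        for s in 0..(2 * n - 1) {
            let lo = s.saturating_sub(n - 1);
            let hi = s.min(n - 1);
            if s % 2 == 1 {
                // Odd anti-diagonals: top-right to bottom-left.
                for i in lo..=hi {
                    order.push((i, s - i));
                }
            } else {
                // Even anti-diagonals: bottom-left to top-right.
                for i in (lo..=hi).rev() {
                    order.push((i, s - i));
                }
            }
        }
        order
    }

Visiting the quantized coefficients in this order is what pushes the long runs of zeroes to the end, where run-length coding handles them cheaply.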
I’m guessing there’s just a lot left on the table—
> Maybe values could be represented as 0..1, and the values themselves could be stored as 0.16 fixed point numbers. Use some generic compression on top.
Codecs like JPEG use a variable amount of quantization. You quantize by scaling the 0..1 float by some scalar, putting it in the range 0..k, and then truncating it to an integer, and encoding the integer. The integer value is encoded using an entropy coder like Huffman. The parameter k must also be encoded somehow, or fixed.
Look up “JPEG coefficient quantization” for how JPEG does it.
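The quantizer itself is tiny. A sketch of the uniform scalar version described above, where k is the quantization parameter (I round rather than truncate, which halves the worst-case error):

    // Uniform scalar quantization of a value in 0..1: scale by k and round to an integer.
    // Larger k keeps more precision; smaller k throws more away (and compresses better).
    fn quantize(x: f64, k: u32) -> u32 {
        (x.clamp(0.0, 1.0) * k as f64).round() as u32
    }

    // Reconstruction divides back; the error per value is at most 1 / (2 * k).
    fn dequantize(q: u32, k: u32) -> f64 {
        q as f64 / k as f64
    }

As noted, k (or a whole table of per-coefficient values, which is what JPEG actually stores) has to travel with the data or be fixed by the format.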
Codecs for audio and images are often made up of understandable parts that fit together: transformations, quantization, entropy coding. If you come up with a new transformation, you can make a whole codec by putting together the remaining pieces—but something like a new transformation is itself interesting, because somebody else can always assemble it into a codec if it shows promise.
> Codecs like JPEG use a variable amount of quantization
The reason quantization works so well in JPEG is because of the DCT step and its energy compaction properties. This gets most of the coefficients near zero. I think without this transform you would be introducing a lot more noise in the final result.
At some point, we are going to end up re-implementing a thing approximating JPEG here. Colorspace conversion, subsampling & DCT + quantization are most of the magic sauce.
The singular value decomposition is doing the same thing as the DCT, sort of. It’s transforming the image into coefficients where the important coefficients are packed in one location. It’s worth applying similar techniques. You can see the demo where it’s truncating the coefficients.
Half the image codecs out there look like JPEG if you squint hard enough. That’s ok, “reimplementing a thing approximating” another codec is how a lot of marginal improvements have been made in the past, and if you add up enough marginal improvements, you end up with a competitive new codec in its own right.
The DCT isn’t magic, it just happens to work really well and requires little computational power. There’s a lot of possibilities for replacing the DCT with something else, especially now that we have more computational power to throw around.
Have you done any PSNR analysis & comparison to prevailing standards like WebP (or even JPEG)? From the look of it, it seems not very efficient, as the compression ratio gets pretty bad for any visually comparable scenario.
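If anyone wants to run that comparison, PSNR is cheap to compute against the original. A sketch for a single 8-bit channel, with 255 as the peak value:

    // Peak signal-to-noise ratio between an original and a reconstructed 8-bit channel.
    // Higher is better; identical inputs give infinity.
    fn psnr(original: &[u8], reconstructed: &[u8]) -> f64 {
        assert_eq!(original.len(), reconstructed.len());
        let mse: f64 = original
            .iter()
            .zip(reconstructed)
            .map(|(&a, &b)| {
                let d = a as f64 - b as f64;
                d * d
            })
            .sum::<f64>()
            / original.len() as f64;
        20.0 * (255.0 / mse.sqrt()).log10()
    }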
Don't get me wrong -- they're great tools. OLS for analysis, especially, has this whole framework for understanding errors that you will not get in models fit using maximum likelihood methods.
But as a final use case for a product there are generally better options out there.
I think you're mainly thinking of machine learning and data science applications, and so your perspective may be a bit limited. But, of course, you didn't actually give any explanation of what you mean other than mentioning ordinary least squares. Would you like to elaborate and back your point up?
In computational science and engineering, there are many applications in which the SVD is a very reasonable and good choice. Some examples: fast direct solvers for integral equations, model order reduction, solving inverse problems, etc.
It's funny how the higher compression ratios make everything look similar to a style Gerhard Richter often used. It could be used as an effect in painting apps.
^1 https://docs.rs/nalgebra/latest/nalgebra/linalg/struct.SVD.h...
^2 https://igl.ethz.ch/projects/ARAP/svd_rot.pdf
^3 https://github.com/neon-bindings/neon