I have mixed feelings about this. It's always good to teach people things they don't know, and compression is a worthy subject.
But the author seems to lament that not much of interest is happening in lossless compression. That's true, but there's a good reason for it: Lossless compression is just about as good as it's ever going to get, and it's easy to prove that. Seeking dramatic improvements in general lossless compression is like seeking faster-than-light travel: Ain't gonna happen. Except for specialized niches, any improvements to lossless compression are going to be tiny and incremental. Tiny and incremental might well matter for petabyte data stores like Google manages, but it won't matter for most people.
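(For anyone who wants the proof sketch: it's a counting argument. A lossless compressor has to map distinct inputs to distinct outputs, and there are 2^n bit strings of length n but only 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 strings that are strictly shorter, so no scheme can shrink every n-bit input; whatever it shortens is paid for by something it lengthens. In practice the floor for a given kind of data is its Shannon entropy, and mature codecs already sit very close to it.)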
Lossy compression is a different story. There's still lots of unexplored territory in finding an optimum bit rate that matches the information processing capabilities of human eyes and ears.
I had a similar feeling a decade ago, but I now think otherwise. We may have reached a good-enough compression ratio, but not good-enough performance, and modern algorithms relentlessly improve on that front. Lossy compression, on the other hand, always tends to exploit whatever computing power is currently available; the fact that it can keep doing so for decades suggests that improvements in lossy compression are more "incremental" than those in lossless compression.
I think the greatest breakthrough in lossless compression would come when the development of compute outpaces storage density. Maximum possible compression (the entropy limit, I think?) isn't impossible per se, but it is computationally infeasible with today's computing resources (rough sketch of that entropy floor below).
Of course, largely hypothetical now, but I envision a time in the distant future where we produce so much data that we must compress it to its principled minimum.
I do agree that current lossless compression is pretty pressed for further innovation given current resource constraints.
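To make the entropy floor concrete: once data already sits at maximum entropy, no amount of compute will shrink it further. A minimal Python sketch (zlib is just standing in for "any general-purpose codec", and random bytes stand in for data with no remaining structure):

    import os
    import zlib

    blob = os.urandom(1 << 20)        # 1 MiB of random bytes: already ~8 bits/byte of entropy
    out = zlib.compress(blob, 9)      # even at the highest level...
    print(len(blob), "->", len(out))  # ...the output comes back slightly *larger*, not smaller

The interesting (and computationally hard) part is getting structured data down toward its entropy, not pushing past it.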
What if compute capability became so powerful that the easiest way to store something was to create it, store it for a millisecond, record the exact place and time since the Big Bang at which it was stored as a comparatively short set of coordinates, and then delete it? Then accessing the file would be as simple as simulating the universe up to that millisecond and reading the file back.
Probably easier to just have every computer connected on a much faster network. Query the internet for a hash, have it sent to you after a 100ms lookup at 1TBps.
Seems impossible considering in my lifetime I’ve basically seen ISP speeds go from 5/1 on my first cable connection to 20/768 20 years later on DSL. I always find it a bit weird knowing how important the internet is to commerce and the economy and my only real way to access it is through the same phone line that has been around forever.
You are right, although we might be able to make it lossless in practical terms with probabilistic approaches like combining multiple algorithms, selecting the hash length, and using heuristics to pick the most likely source.
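As a toy version of that hash-as-address idea (purely illustrative: a dict stands in for "every computer on the network", and SHA-256 is an arbitrary choice):

    import hashlib

    STORE: dict[bytes, bytes] = {}    # the "network": everything anyone has ever published

    def put(data: bytes) -> bytes:
        """Publish data; its 32-byte SHA-256 digest becomes the address."""
        key = hashlib.sha256(data).digest()
        STORE[key] = data
        return key

    def get(key: bytes) -> bytes:
        """Fetch by hash -- only works because the full data still lives somewhere."""
        return STORE[key]

    payload = b"some large file contents " * 100_000   # ~2.5 MB
    addr = put(payload)
    assert get(addr) == payload
    print(f"{len(payload)} bytes reachable from a {len(addr)}-byte key")

The catch is exactly the probabilistic part mentioned above: truncate the key below a full digest and collisions stop being astronomically unlikely, so you would need those heuristics to decide which of the colliding sources was actually meant.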
I somewhat agree. I think that Shannon's theorem about information entropy comes into play, and AFAIK we are already very close, so improvements are bound to be small as we try to get closer and closer to the theoretical limit.
Still, I am reminded of Huffman coding: researchers at the time thought they had reached the optimum. They were proven phenomenally wrong when it was realized that one should compress sequences rather than individual bytes (I believe LZ77 or LZW was the first such algorithm; rough sketch of the difference below).
If we take that into account and follow the same line of thought, improvements may still be possible if we could compress based on a larger body of data. For example, headers of the same file type contain the same common information; in theory, these would only need to be stored once on the drive. This is something deduplicated storage is already trying to achieve, so nothing new here, just pointing to one avenue that could still yield future improvements :)
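A rough sketch of that Huffman-vs-sequences gap, using only the standard library (the order-0 bound below is the best any symbol-by-symbol code can do; zlib's DEFLATE, which layers LZ77 under Huffman coding, is the sequence-aware stand-in):

    import math
    import zlib
    from collections import Counter

    def order0_bound_bytes(data: bytes) -> float:
        """Shannon floor for any per-byte code (e.g. Huffman) that ignores context."""
        n = len(data)
        bits = -sum(c * math.log2(c / n) for c in Counter(data).values())
        return bits / 8

    # Repetitive, header-like data: lots of repeated *sequences*, not just skewed byte counts.
    data = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 10_000
    print(f"input                : {len(data)} bytes")
    print(f"per-byte (Huffman)   : {order0_bound_bytes(data):,.0f} bytes at best")
    print(f"zlib (LZ77 + Huffman): {len(zlib.compress(data, 9))} bytes")

The per-byte floor only captures that some characters are more common than others; the LZ pass gets to say "this whole header appeared before", which is the same intuition behind the deduplication point.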
If you only care about compressing a megabyte a minute, there's not a lot left to do. But in practice speed is usually a major consideration, and better methods let you fit significantly better compression into your time budget.
Also, going one byte at a time with minimal context is pretty good for text but pretty bad for many other kinds of data. There's a lot of room to improve on those fronts.
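A crude illustration of the time-budget point, with synthetic data and only standard-library codecs (a real comparison would include zstd and lz4, which aren't in the standard library; exact numbers vary by machine and don't matter, only the shape of the trade-off does):

    import lzma
    import random
    import time
    import zlib

    # Synthetic "text": random words drawn from a small vocabulary, so there is structure to find.
    random.seed(0)
    vocab = [bytes(random.choices(range(97, 123), k=random.randint(3, 9))) for _ in range(2000)]
    data = b" ".join(random.choices(vocab, k=500_000))

    def bench(name, compress):
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        print(f"{name:8s} ratio {len(data) / len(out):5.2f}   {elapsed * 1000:7.0f} ms")

    bench("zlib -1", lambda d: zlib.compress(d, 1))   # fast, weaker
    bench("zlib -9", lambda d: zlib.compress(d, 9))   # slower, tighter
    bench("lzma",    lambda d: lzma.compress(d))      # slowest, tightest

Every rung on that speed ladder buys ratio, so a faster codec effectively buys better compression inside a fixed time budget.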
How do we know that we've hit a wall with lossless compression? As a non-expert, I've been really impressed with the likes of zstd and lz4, which are relatively new.
I should clarify: There's still room for improvement in the speed of lossless compression/decompression operations, and that's where most improvements happen nowadays. But there's little room for improvement in compression ratio, because most modern algorithms produce outputs that are very close to random already.
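One rough way to check the "close to random" part is to measure the byte-level entropy of a codec's output; order-0 entropy is only a crude proxy for randomness, but decent codecs land near the 8 bits/byte that truly random data would have (Python sketch, standard-library codecs only):

    import lzma
    import math
    import random
    import zlib
    from collections import Counter

    def bits_per_byte(data: bytes) -> float:
        """Empirical order-0 entropy of a byte string."""
        n = len(data)
        return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

    random.seed(0)
    words = b"the quick brown fox jumps over the lazy dog".split()
    text = b" ".join(random.choices(words, k=300_000))   # ~1.4 MB of word salad

    for name, out in [("raw", text), ("zlib -9", zlib.compress(text, 9)), ("lzma", lzma.compress(text))]:
        print(f"{name:8s} {len(out):>9} bytes   {bits_per_byte(out):.2f} bits/byte")

If meaningful structure were still left in those outputs, the codec would have squeezed it out itself.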