The Neuralink compression challenge seems impossible
11 points by thinktechthurs 4 months ago | 9 comments
Having a job means that you have to do two main tasks. The first is doing whatever your job description says. If you're an engineer, you have to build widgets; if you're in sales, you have to sell widgets; if you're a designer, you have to design better widgets. The second is doing whatever your manager says. Unfortunately, it's not rare for your manager to make the first task harder. If you're an engineer, your manager might tell you to build subpar widgets. You could do what your job description says and build good widgets instead, but that won’t get you promoted.

The Neuralink compression challenge (https://content.neuralink.com/compression-challenge/README.html) is a good example of this. The challenge asks members of the public to design a data compression algorithm that compresses data by 200x. In other words, the algorithm should take 200 pieces of data and then compress them down to 1, with the ability to re-extract all original 200 pieces of data.

The challenge is impossible.

Most data compression algorithms can compress data by 2x to 5x. This includes zip, which we all know and love. The state-of-the-art is only slightly better: the current winner of the Hutter Prize (http://prize.hutter1.net/), a nearly two-decade-old prize to encourage the development of better compressors, achieved roughly 9x compression. 200x is a pipe dream.

What’s worse is that the challenge doesn’t merely want entrants to design an amazing compression algorithm. Remember, the challenge is being run by Neuralink, a company that puts implants in brains. The algorithm will be used to compress signals coming out of the brain. Because the brain works fast, the signals have to be compressed extremely fast: about 200,000 bits every millisecond (roughly 200 megabits per second), in real time, with less than a millisecond of latency. The state-of-the-art Hutter Prize winner from up above would take 36 times as long to compress that data.
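For a rough sanity check on that number, here's the back-of-the-envelope arithmetic using the electrode count, sampling rate, and bit depth stated in the challenge README:

    # Back-of-the-envelope throughput from the challenge specs:
    # 1024 electrodes, 20 kHz sampling, 10 bits per sample.
    channels = 1024
    sample_rate_hz = 20_000
    bits_per_sample = 10

    bits_per_second = channels * sample_rate_hz * bits_per_sample
    print(f"{bits_per_second / 1e6:.0f} Mbps")            # ~205 Mbps
    print(f"{bits_per_second / 1_000:.0f} bits per ms")   # ~205,000 bits per ms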

What’s even worse is that, obviously, you can’t put a big battery inside a brain. The risk of a big battery overheating or leaking is too high. So the brain implant runs on very little power, about one-thousandth of what an LED light bulb uses. The compression algorithm has to run on this very small, very weak implant. Keep in mind, the compression algorithm can only use a fraction of the power the implant is running on, because the entire contraption also includes other pieces like a radio for actually transmitting the compressed brain signals.

In other words, the challenge wants entrants to not only do something impossible, but also do it quickly and efficiently.

For now, let's pretend your boss gave you this problem. Don't worry about questions like “Why am I solving this impossibly hard problem for free?” and “If I figure out how to do this, why wouldn't I start my own company and become a billionaire?”

What should you tell your boss? If you're being honest, you would say something like “this won't work; we need to start over.”

A Twitter user working on the problem did exactly that. With somewhat inartful phrasing, he said (https://x.com/lookoutitsbbear/status/1794962035714785570) that he came up with a solution that simplified the brain signal before compressing it. That is, instead of compressing all 200 pieces of data, his algorithm compresses fewer. Does this fulfill the terms of the challenge? No. Hordes of Twitter users descended upon him to tell him this over and over. The challenge doesn't allow simplifying the brain signal.

However, in the real world, this is absolutely the kind of kludge a good engineer would implement. Instead of building a machine that collects 200 pieces of data but can't do anything with them, a good engineer should build a machine that collects less data and transmits it so it can be useful. @lookoutitsbbear’s solution will lose the challenge, but something like it will be essential for getting Neuralink working.

Except, you know, this information won't make the boss (or the Neuralink challenge or its fans) happy.




The task is not to compress general data by a factor of 200; the task is to compress a very domain-specific kind of data by a factor of 200. Presumably the hope is that this data has lower entropy than e.g. the Hutter Prize data.

If I tell you to write an image compression algorithm, you aren't going to be able to do much with a bitmap of uniformly random pixels. However, if I tell you that in the domain I'm working in there are only two colors, white and black, I can immediately reduce storing each pixel from 24 bits to 1 bit, saving a factor of 24. If I tell you further that >99% of pixels are going to be black, more compression tricks become possible, etc.
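To make that concrete, here's a toy Python sketch (a made-up example, nothing Neuralink-specific): pack the two-color pixels into single bits, then run-length encode the long black runs.

    import numpy as np

    def rle_encode(bits):
        """Run-length encode a flat array of 0/1 pixel values."""
        runs = []
        value, length = int(bits[0]), 0
        for b in bits:
            if int(b) == value:
                length += 1
            else:
                runs.append((value, length))
                value, length = int(b), 1
        runs.append((value, length))
        return runs

    # Toy image: 100,000 pixels, ~1% white, the rest black.
    rng = np.random.default_rng(0)
    pixels = (rng.random(100_000) < 0.01).astype(np.uint8)

    runs = rle_encode(pixels)
    raw_bits = pixels.size * 24        # original stored as 24-bit RGB
    rle_bits = len(runs) * 33          # 1-bit value + 32-bit run length
    print(f"~{raw_bits / rle_bits:.0f}x smaller")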

I don't have expertise in this particular problem, but a priori dismissing it by comparing to Hutter is not valid.


You're missing the most important part: It entirely depends on what the data looks like.

For example, if you hand me a gigabyte of pure zeroes I can quickly compress that over a hundred-million-fold by hand. [0]

In contrast, if it's a patternless random distribution, you're gonna have a real bad time and you might as well not even bother.
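Quick illustration with zlib (exact numbers vary by library and level):

    import os
    import zlib

    zeros = bytes(10_000_000)        # 10 MB of zero bytes
    noise = os.urandom(10_000_000)   # 10 MB of random bytes

    for name, data in [("zeros", zeros), ("random", noise)]:
        compressed = zlib.compress(data, 9)
        print(f"{name}: {len(data) / len(compressed):,.0f}x")
    # zeros: roughly 1,000x; random: roughly 1x (no gain at all)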

> The state-of-the-art is only slightly better: the current winner of the Hutter Prize (http://prize.hutter1.net/), a nearly two-decade-old prize to encourage the development of better compressors, achieved roughly 9x compression. 200x is a pipe dream.

While I agree 200x sounds wonky, that 9x is for English text, so that numeric comparison is apples-to--uh, to mystery fruit.

[0] "Fine artisanal compression for the palate and palette of the discerning algo-gourmand." Pitch deck in progress.


Enh. The Hutter Prize is working on textual data, which is already extremely information-dense. The 200:1 goal is likely unreasonable for losslessly compressing their data, but you can't say that for sure by reasoning from the Hutter Prize.

Their page also has a scoreboard. I'm sure they're quite interested in solutions which achieve state-of-the-art compression even if it's far from their goal.

Presumably their ultimate solution will be lossy, in which case 200:1 is possibly reasonable*. But it is much harder to judge lossy compression, which makes lossless compression a more interesting target. Insights that allow state of the art lossless compression may also be helpful for lossy compression.

* 4K 60fps RGB data is about 12 Gbit/s, but we send 4K lossy-compressed video at 50 Mbit/s -- which is more than a 200:1 compression ratio.
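Rough math behind that, if anyone wants to check it:

    width, height, bits_per_pixel, fps = 3840, 2160, 24, 60
    raw_bps = width * height * bits_per_pixel * fps    # ~11.9 Gbit/s raw
    stream_bps = 50e6                                   # typical 4K streaming bitrate
    print(f"raw: {raw_bps / 1e9:.1f} Gbit/s, ratio: {raw_bps / stream_bps:.0f}:1")
    # raw: 11.9 Gbit/s, ratio: 239:1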


The OP covers all the salient points. We might as well throw in antigravity and FTL travel since we have left reality behind. This is classic Musk. Right down to asking the public to solve his problem for free. Nuts.

As far as the dream that maybe THIS data can be compressed that much: this is analog waveform data converted to digital samples (10 bits each). IF the information is actually at a low enough frequency that Nyquist can be satisfied at that compression level, the simplest approach is to just drop the sampling frequency and cut out the middleman. Either the people designing this monster are ignorant, or the information is actually at frequencies too high to satisfy Nyquist, leading us right back to la la land.
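To put numbers on the Nyquist point (assuming the 20 kHz per-channel sampling rate from the challenge data):

    sample_rate_hz = 20_000      # per-channel rate in the challenge data
    target_ratio = 200

    decimated_rate = sample_rate_hz / target_ratio    # 100 Hz
    nyquist_limit = decimated_rate / 2                # 50 Hz
    print(f"decimated rate: {decimated_rate:.0f} Hz, "
          f"highest recoverable frequency: {nyquist_limit:.0f} Hz")
    # If the useful signal content sits above 50 Hz, simple decimation
    # to hit 200x throws it away.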


Neuralink should skip the compression, and replace the radio link with optics. A tiny embedded LED can transmit data optically at the required speed, without compression, and a second transceiver outside the body, with access to more room, power and electronics, can transmit the large data stream wirelessly, if required.

The real question is why such a high data rate is needed when the final outcome is typing speed or game-cursor control, i.e. a very low data rate. Neuralink should reframe the challenge to find a lossy compression scheme that doesn't alter the outcome. Who cares about lossless if the result is the same? Neuralink should provide a portal to dump imperfectly recoded data, and report the fidelity to the original outcome -- in detail if possible. That way coders can seek the most compressive lossy scheme that does not affect the outcome.

It would also be nice if Neuralink released a complete set of data, i.e. 1024 channels, 20 kHz sampling rate, 10 bits per sample, for an hour. That would be the starting place in order to find redundancy patterns, perhaps between channels.
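That would be a hefty dump, for what it's worth:

    channels, rate_hz, bits, seconds = 1024, 20_000, 10, 3600
    total_bytes = channels * rate_hz * bits * seconds / 8
    print(f"~{total_bytes / 1e9:.0f} GB for one hour of raw data")   # ~92 GB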

I'd also like to say that the data currently provided is inconsistently scaled, and that the A/D data indicates poor linearity of the A/D itself. The A/D needs attention.


Apparently, someone solved it and achieved a 1187:1 compression ratio. These are the results:

    All recordings were successfully compressed.
    Original size (bytes): 146,800,526
    Compressed size (bytes): 123,624
    Compression ratio: 1187.47

The eval.sh script was downloaded, and the files were encoded and decoded without loss, as verified using "diff".

What do you think? Is this true?

https://www.linkedin.com/pulse/neuralink-compression-challen... context: https://www.youtube.com/watch?v=X5hsQ6zbKIo


Doesn’t compression ratio depend heavily on the file type / entropy of the file? You can probably zip 10 GB of zeroes to under 1 KB. Without taking the content of the data into account, you can’t just say “more than 10x is impossible because it’s uncommon”. The benchmarks you’re looking at are compressing text, I think. Maybe text is higher entropy than the Neuralink data?


It's an interesting problem, but the "work for us for free" aspect of it is off-putting.


> Most data compression algorithms can compress data by 2x to 5x.

I mean, that’s highly dependent on the nature of the data. I just took an uncompressed TIFF screenshot of a Firefox window and compressed it as PNG for a 20x compression ratio.



