The Neuralink compression challenge seems impossible
11 points by thinktechthurs 4 months ago | 9 comments
Having a job means that you have to do two main tasks. The first is doing whatever your job description says. If you're an engineer, you have to build widgets; if you're in sales, you have to sell widgets; if you're a designer, you have to design better widgets. The second is doing whatever your manager says. Unfortunately, it's not rare for your manager to make the first task harder. If you're an engineer, your manager might tell you to build subpar widgets. You could do what your job description says and build good widgets instead, but that won’t get you promoted.

The Neuralink compression challenge (https://content.neuralink.com/compression-challenge/README.html) is a good example of this. The challenge asks members of the public to design a data compression algorithm that compresses data by 200x. In other words, the algorithm should take 200 pieces of data and then compress them down to 1, with the ability to re-extract all original 200 pieces of data.

The challenge is impossible.

Most data compression algorithms can compress data by 2x to 5x. This includes zip, which we all know and love. The state-of-the-art is only slightly better: the current winner of the Hutter Prize (http://prize.hutter1.net/), a nearly two-decade-old prize to encourage the development of better compressors, achieved roughly 9x compression. 200x is a pipe dream.

What’s worse is that the challenge doesn’t merely want entrants to design an amazing compression algorithm. Remember, the challenge is being run by Neuralink, a company that puts implants in brains. The algorithm will be used to compress signals coming out of the brain. Because the brain works fast, the signals have to be compressed extremely fast: about 200,000 bits every millisecond (roughly 200 megabits per second), in real time, with less than a millisecond of latency. The state-of-the-art Hutter Prize winner from up above would take 36 times as long to compress that data.
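For a rough sanity check on that number, here's the back-of-the-envelope arithmetic using the electrode count, sampling rate, and bit depth stated in the challenge README:

    # Back-of-the-envelope throughput from the challenge specs:
    # 1024 electrodes, 20 kHz sampling, 10 bits per sample.
    channels = 1024
    sample_rate_hz = 20_000
    bits_per_sample = 10

    bits_per_second = channels * sample_rate_hz * bits_per_sample
    print(f"{bits_per_second / 1e6:.0f} Mbps")            # ~205 Mbps
    print(f"{bits_per_second / 1_000:.0f} bits per ms")   # ~205,000 bits per ms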

What’s even worse is that, obviously, you can’t put a big battery inside a brain. The risk of a big battery overheating or leaking is too high. So the brain implant runs on very little power, about one-thousandth of what an LED light bulb uses. The compression algorithm has to run on this very small, very weak implant. Keep in mind, the compression algorithm can only use a fraction of the power the implant is running on, because the entire contraption also includes other pieces like a radio for actually transmitting the compressed brain signals.

In other words, the challenge wants entrants to not only do something impossible, but also do it quickly and efficiently.

For now, let's pretend your boss gave you this problem. Don't worry about questions like “Why am I solving this impossibly hard problem for free?” and “If I figure out how to do this, why wouldn't I start my own company and become a billionaire?”

What should you tell your boss? If you're being honest, you would say something like “this won't work; we need to start over.”

A Twitter user working on the problem did exactly that. With somewhat inartful phrasing, he said (https://x.com/lookoutitsbbear/status/1794962035714785570) that he came up with a solution that simplified the brain signal before compressing it. That is, instead of compressing all 200 pieces of data, his algorithm compresses fewer. Does this fulfill the terms of the challenge? No. Hordes of Twitter users descended upon him to tell him this over and over. The challenge doesn't allow simplifying the brain signal.

However, in the real world, this is absolutely the kind of kludge a good engineer would implement. Instead of building a machine that collects 200 pieces of data but can't do anything with them, a good engineer should build a machine that collects less data and transmits it so it can be useful. @lookoutitsbbear’s solution will lose the challenge, but something like it will be essential for getting Neuralink working.

Except, you know, this information won't make the boss (or the Neuralink challenge or its fans) happy.




The task is not to compress general data by a factor of 200; the task is to compress a very domain-specific kind of data by a factor of 200. Presumably the hope is that this data has lower entropy than e.g. the Hutter Prize data.

If I tell you to write an image compression algorithm, you aren't going to be able to do much with a bitmap of uniformly random pixels. However, if I tell you that in the domain I'm working in there are only two colors, white and black, I can immediately reduce storing each pixel from 24 bits to 1 bit, saving a factor of 24. If I tell you further that >99% of pixels are going to be black, more compression tricks become possible, etc.
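To make that concrete, here's a toy Python sketch (a made-up example, nothing Neuralink-specific): pack the two-color pixels into single bits, then run-length encode the long black runs.

    import numpy as np

    def rle_encode(bits):
        """Run-length encode a flat array of 0/1 pixel values."""
        runs = []
        value, length = int(bits[0]), 0
        for b in bits:
            if int(b) == value:
                length += 1
            else:
                runs.append((value, length))
                value, length = int(b), 1
        runs.append((value, length))
        return runs

    # Toy image: 100,000 pixels, ~1% white, the rest black.
    rng = np.random.default_rng(0)
    pixels = (rng.random(100_000) < 0.01).astype(np.uint8)

    runs = rle_encode(pixels)
    raw_bits = pixels.size * 24        # original stored as 24-bit RGB
    rle_bits = len(runs) * 33          # 1-bit value + 32-bit run length
    print(f"~{raw_bits / rle_bits:.0f}x smaller")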

I don't have expertise in this particular problem, but a priori dismissing it by comparing to Hutter is not valid.


You're missing the most important part: It entirely depends on what the data looks like.

For example, if you hand me a gigabyte of pure zeroes I can quickly compress that over a hundred-million-fold by hand. [0]

In contrast, if it's a patternless random distribution, you're gonna have a real bad time and you might as well not even bother.
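Quick illustration with zlib (exact numbers vary by library and level):

    import os
    import zlib

    zeros = bytes(10_000_000)        # 10 MB of zero bytes
    noise = os.urandom(10_000_000)   # 10 MB of random bytes

    for name, data in [("zeros", zeros), ("random", noise)]:
        compressed = zlib.compress(data, 9)
        print(f"{name}: {len(data) / len(compressed):,.0f}x")
    # zeros: roughly 1,000x; random: roughly 1x (no gain at all)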

> The state-of-the-art is only slightly better: the current winner of the Hutter Prize (http://prize.hutter1.net/), a nearly two-decade-old prize to encourage the development of better compressors, achieved roughly 9x compression. 200x is a pipe dream.

While I agree 200x sounds wonky, that 9x is for English text, so that numeric comparison is apples-to--uh, to mystery fruit.

[0] "Fine artisanal compression for the palate and palette of the discerning algo-gourmand." Pitch deck in progress.


Enh. The Hutter Prize is working on textual data, which is already extremely information-dense. The 200:1 goal is likely unreasonable for losslessly compressing their data, but you can't say that for sure by reasoning from the Hutter Prize.

Their page also has a scoreboard. I'm sure they're quite interested in solutions which achieve state-of-the-art compression even if it's far from their goal.

Presumably their ultimate solution will be lossy, in which case 200:1 is possibly reasonable*. But it is much harder to judge lossy compression, which makes lossless compression a more interesting target. Insights that allow state of the art lossless compression may also be helpful for lossy compression.

* 4K 60fps RGB data is about 12 Gbit/s, but we send 4K lossy-compressed video at 50 Mbit/s -- which is more than a 200:1 compression ratio.
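Rough math behind that, if anyone wants to check it:

    width, height, bits_per_pixel, fps = 3840, 2160, 24, 60
    raw_bps = width * height * bits_per_pixel * fps    # ~11.9 Gbit/s raw
    stream_bps = 50e6                                   # typical 4K streaming bitrate
    print(f"raw: {raw_bps / 1e9:.1f} Gbit/s, ratio: {raw_bps / stream_bps:.0f}:1")
    # raw: 11.9 Gbit/s, ratio: 239:1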


The OP covers all the salient points. We might as well throw in antigravity and FTL travel since we have left reality behind. This is classic Musk. Right down to asking the public to solve his problem for free. Nuts.

As far as the dream that maybe THIS data can be compressed that much: this is analog waveform data converted to digital samples (10 bits each). IF the information is actually at a low enough frequency that Nyquist can be satisfied at that compression level, the simplest approach is to just drop the sampling frequency and cut out the middleman. Either the people designing this monster are ignorant, or the information is actually at frequencies too high to satisfy Nyquist, leading us right back to la la land.
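To put numbers on the Nyquist point (assuming the 20 kHz per-channel sampling rate from the challenge data):

    sample_rate_hz = 20_000      # per-channel rate in the challenge data
    target_ratio = 200

    decimated_rate = sample_rate_hz / target_ratio    # 100 Hz
    nyquist_limit = decimated_rate / 2                # 50 Hz
    print(f"decimated rate: {decimated_rate:.0f} Hz, "
          f"highest recoverable frequency: {nyquist_limit:.0f} Hz")
    # If the useful signal content sits above 50 Hz, simple decimation
    # to hit 200x throws it away.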


Neuralink should skip the compression, and replace the radio link with optics. A tiny embedded LED can transmit data optically at the required speed, without compression, and a second transceiver outside the body, with access to more room, power and electronics, can transmit the large data stream wirelessly, if required.

The real question is why such a high data rate is needed when the final outcome is typing speed or game-cursor control, i.e. a very low data rate. Neuralink should reframe the challenge to find a lossy compression scheme that doesn't alter the outcome. Who cares about lossless if the result is the same? Neuralink should provide a portal to dump imperfectly recoded data, and report the fidelity to the original outcome -- in detail if possible. That way coders can seek the most compressive lossy scheme that does not affect the outcome.

It would also be nice if Neuralink released a complete set of data, i.e. 1024 channels, 20 kHz sampling rate, 10 bits per sample, for an hour. That would be the starting place in order to find redundancy patterns, perhaps between channels.
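That would be a hefty dump, for what it's worth:

    channels, rate_hz, bits, seconds = 1024, 20_000, 10, 3600
    total_bytes = channels * rate_hz * bits * seconds / 8
    print(f"~{total_bytes / 1e9:.0f} GB for one hour of raw data")   # ~92 GB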

I'd also like to say that the data currently provided is inconsistently scaled, and that the A/D data indicates poor linearity of the A/D itself. The A/D needs attention.


Apparently, someone solved it and achieved a 1187:1 compression ratio. These are the results:

    All recordings were successfully compressed.
    Original size (bytes): 146,800,526
    Compressed size (bytes): 123,624
    Compression ratio: 1187.47

The eval.sh script was downloaded, and the files were encoded and decoded without loss, as verified using "diff".

What do you think? Is this true?

https://www.linkedin.com/pulse/neuralink-compression-challen... context: https://www.youtube.com/watch?v=X5hsQ6zbKIo


Doesn’t compression ratio depend heavily on the file type / entropy of the file? You can probably zip 10 GB of zeroes to under 1 KB. Without taking the content of the data into account, you can’t just say “more than 10x is impossible because it’s uncommon”. The benchmarks you’re looking at are compressing text, I think. Maybe text is higher entropy than the Neuralink data?


It's an interesting problem, but the "work for us for free" aspect of it is off-putting.


> Most data compression algorithms can compress data by 2x to 5x.

I mean, that’s highly dependent on the nature of the data. I just took an uncompressed TIFF screenshot of a Firefox window and compressed it as PNG for a 20x compression ratio.



