
We have been experimenting with the paper (https://www.researchgate.net/publication/372834606_ON_NON-IT...).

There is a mathematical proof that a binary representation is enough to capture the latent space, and in fact we don't even need to do "training" to get that representation.

The practical application we tried out for this algorithm was to create an alternate space for mpnet embeddings of Wikipedia paragraphs. Using bit embeddings we are able to represent 36 million passages of Wikipedia in 2 GB (https://gpt3experiments.substack.com/p/building-a-vector-dat...).
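To give a flavour of what a bit embedding buys you, here is a minimal sketch (this is plain sign binarization with made-up vectors, not the exact algorithm from the paper): each float dimension becomes one bit, and search becomes Hamming distance on packed codes.

    import numpy as np

    # Stand-in for mpnet float embeddings: n passages x 768 dims of float32.
    rng = np.random.default_rng(0)
    embeddings = rng.standard_normal((10_000, 768)).astype(np.float32)

    # Sign-binarize: one bit per dimension, packed 8 per byte.
    # 768 bits = 96 bytes per passage instead of 3072 bytes of float32.
    codes = np.packbits((embeddings > 0).astype(np.uint8), axis=1)

    def hamming_search(query_vec, code_db, k=5):
        # Nearest passages by Hamming distance on the bit codes.
        q = np.packbits((query_vec > 0).astype(np.uint8))
        dists = np.unpackbits(code_db ^ q, axis=1).sum(axis=1)
        return np.argsort(dists)[:k]

    print(hamming_search(embeddings[42], codes))  # passage 42 ranks first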




Wow, this works better than I would've thought.

> Who moderates Hacker News?

First result:

> Hacker News

> At the end of March 2014, Graham stepped away from his leadership role at Y Combinator, leaving Hacker News administration in the hands of other staff members. The site is currently moderated by Daniel Gackle who posts under the username "dang".


how did you test this?


First link in the Substack article:

https://speech-kws.ozonetel.com/wiki


You're talking about mapping floating-point vector representations, i.e., embeddings, computed by a pretrained LLM to binary vector representations, right? And you're talking about doing this by first having someone else's pretrained LLM compute the embeddings, right? Sorry, but that seems only minimally, tangentially related to the topic of running LLMs in ternary space. I don't see how your comment is relevant to the discussion here.


Yeah, sorry, I needed a much bigger canvas than a comment to explain. Let me try again. The example I took was to show mapping from one space to another space, and it may have just come across as not learning anything. Yes, you are right, it was someone else's pretrained LLM. But this new space learnt the latent representations of the original embedding space. Now, instead of the original embedding space, it could also have been some image representation or some audio representation. Even neural networks take input in X space and learn a representation in Y space. The paper shows that any layer of a neural network can in fact be replaced with a set of planes, that we can represent a space using those planes, and that those planes can be created in a non-iterative way. Not sure if I am being clear, but I have written a small blog post showing how an NN creates the planes for MNIST (https://gpt3experiments.substack.com/p/understanding-neural-...). I will write more on how, once these planes are drawn, we can use a bit representation instead of floating-point values to get similar accuracy in prediction, and then on how we can draw those planes without the iterative training process.
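To make the "set of planes" idea a bit more concrete, here is a tiny sketch (my own illustration with random planes, not the paper's non-iterative construction) of encoding points by which side of each plane they fall on:

    import numpy as np

    rng = np.random.default_rng(1)

    # A "plane" here is a normal vector w and an offset b (the plane w.x + b = 0).
    n_planes, dim = 16, 64
    W = rng.standard_normal((n_planes, dim))
    b = rng.standard_normal(n_planes)

    def bit_code(x):
        # One bit per plane: which side of that plane the point x lies on.
        return (W @ x + b > 0).astype(np.uint8)

    x = rng.standard_normal(dim)
    near = x + 0.01 * rng.standard_normal(dim)
    far = rng.standard_normal(dim)
    print((bit_code(x) == bit_code(near)).mean())  # close to 1.0
    print((bit_code(x) == bit_code(far)).mean())   # roughly 0.5

Nearby points mostly share the same code, so the bit pattern acts as a compact stand-in for position in the space.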


> how we can draw those planes without the iterative training process.

Sounds interesting, but this is the part I would need more explanation on.

Just started reading your linked blog; I see it goes into some detail there.


Will add a lot more details next week. Have been postponing it for a long time.


I find this extremely interesting. Do you share the source code for the process? Any more references?


Unfortunately, the source code is not currently open-sourced. There are some more details at https://www.researchgate.net/publication/370980395_A_NEURAL_...; the source code is built on top of this.

The approach has been used to solve other problems, and the papers have been published at https://www.researchgate.net/profile/K-Eswaran

We are currently trying to build a full-fledged LLM using just this approach (no LLM training, etc.) and also an ASR system. We should have something to share in a couple of months.


Am I missing something or is this just a linear transformation?

It says here ( https://www.researchgate.net/publication/370980395_A_NEURAL_... ) that each layer can be represented as a matrix multiplication (equation 3): Ax = s

So concatenating multiple layers could just be reduced to a single matrix multiplication?

If there is no non-linearity I don't see how this could replace neural networks, or am I missing something?
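For concreteness, a quick numpy check (matrices made up) that stacking two purely linear layers collapses into a single one:

    import numpy as np

    rng = np.random.default_rng(2)
    A1 = rng.standard_normal((32, 64))   # "layer 1"
    A2 = rng.standard_normal((10, 32))   # "layer 2"
    x = rng.standard_normal(64)

    # With no non-linearity in between, two layers are one combined matrix.
    print(np.allclose(A2 @ (A1 @ x), (A2 @ A1) @ x))  # True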


The attempt is not to replace a particular neural network which has already been trained using sigmoid or ReLU functions. If one does this, then one would necessarily have to use non-linear maps. The whole point is that such a non-linear technique is not necessary for classification. It is not necessary to confine clusters by hyperplanes to solve a classification problem. Our focus is on individual points.

We believe the brain does not do nonlinear maps!


How is this not lossy compression?


LLMs and vector embeddings are always lossy compression, yes?


Almost always. Though you can use them in a lossless compression system, too, with a few tricks.


.. but you don't want to tell us?


Two possible implementations:

(1) Take your data as a stream. Use your machine learning gadget to give you the (predicted) probability for each of the possible next tokens. Then use those probabilities in arithmetic coding to specify which token actually came next.

(2) Take your data D. Apply lossy compression to it. Store the result L := lossy(D). Also compute the residue R := D - uncompress(L). If your lossy compression is good, R will be mostly zeroes (with only a few actual differences), so it will compress well with a lossless compression algorithm.

Approach (1) is a more sophisticated version of (2). None of this is anything I came up with; these approaches are well known.

See e.g. https://arxiv.org/abs/2306.04050 and https://en.wikipedia.org/wiki/Audio_Lossless_Coding or https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049... (Probably not the best links, but something I could find quickly.)
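If it helps, here is a toy sketch of approach (2), with coarse quantization standing in for the lossy codec and zlib for the lossless one (both are stand-ins, not what a real system would use):

    import zlib
    import numpy as np

    # Toy data: a smooth signal D, "compressed" by dropping its low bits.
    D = (128 + 100 * np.sin(np.linspace(0, 20, 100_000))).astype(np.int16)

    L = D & ~0x7          # "lossy(D)": drop the low 3 bits of every sample
    R = D - L             # residue: small values in [0, 7]

    stored = zlib.compress(L.tobytes()) + zlib.compress(R.tobytes())

    # Lossless round trip: decompress L and R and add them back together.
    assert np.array_equal(L + R, D)
    print(len(stored), "bytes stored vs", D.nbytes, "bytes raw")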



It kind of is!



