There is a mathematical proof that a binary representation is enough to capture the latent space. In fact, we don't even need to do "training" to get that representation.
The practical application we tried out for this algorithm was to create an alternate space for mpnet embeddings of Wikipedia paragraphs. Using bit embeddings, we are able to represent 36 million Wikipedia passages in 2 GB (https://gpt3experiments.substack.com/p/building-a-vector-dat...).
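To make "bit embedding" concrete, here is a minimal sketch of one well-known, training-free way to map float vectors to bits: random hyperplane hashing (keep only the sign of a set of projections). This is not necessarily the construction from the post; `float_embeddings`, the 768 dimensions, and `n_bits` are placeholder choices standing in for the mpnet vectors and whatever the paper actually does.

```python
# Minimal sketch (NOT necessarily the author's algorithm): random hyperplane
# hashing, a well-known training-free way to map float vectors to bits.
import numpy as np

rng = np.random.default_rng(0)
float_embeddings = rng.normal(size=(10_000, 768))  # stand-in for mpnet vectors
n_bits = 512                                       # hypothetical code length

planes = rng.normal(size=(768, n_bits))            # fixed, non-learned hyperplanes
bits = float_embeddings @ planes > 0               # which side of each plane?
codes = np.packbits(bits, axis=1)                  # n_bits / 8 = 64 bytes per passage

def hamming_search(query_vec, k=5):
    """Rank stored passages by Hamming distance to the query's bit code."""
    q = np.packbits(query_vec @ planes > 0)
    dists = np.unpackbits(codes ^ q, axis=1).sum(axis=1)  # XOR + popcount
    return np.argsort(dists)[:k]

print(hamming_search(float_embeddings[42]))  # passage 42 should rank first (distance 0)
```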
You're talking about mapping floating-point vector representations (i.e., embeddings) computed by a pretrained LLM to binary vector representations, right? And you're talking about doing this by first having someone else's pretrained LLM compute the embeddings, right? Sorry, but that seems only minimally, tangentially related to the topic of running LLMs in ternary space. I don't see how your comment is relevant to the discussion here.
Yeah, sorry, I needed a much bigger canvas than a comment to explain. Let me try again. The example I gave was meant to show a mapping from one space to another, and it may have come across as not learning anything. Yes, you are right that it was someone else's pretrained LLM, but this new space learnt the latent representations of the original embedding space. Instead of the original embedding space, it could just as well have been some image representation or some audio representation. Neural networks likewise take input in an X space and learn a representation in a Y space. The paper shows that any layer of a neural network can in fact be replaced with a set of planes, that we can represent a space using those planes, and that those planes can be created in a non-iterative way.
Not sure if I am being clear, but I have written a small blog post showing, for MNIST, how an NN creates the planes (https://gpt3experiments.substack.com/p/understanding-neural-...). I will write more on how, once these planes are drawn, we can use a bit representation instead of floating-point values to get similar prediction accuracy, and then on how we can draw those planes without the iterative training process.
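For readers who want the "planes" picture spelled out, here is a minimal sketch (not the blog post's code) using sklearn's small 8x8 digits dataset as a stand-in for MNIST: each first-layer unit of an ordinary MLP defines a hyperplane in input space, and keeping only one bit per plane ("which side is the input on?") still supports a reasonable classifier. The 64 hidden units and the logistic-regression read-out are arbitrary choices for illustration, and the planes here still come from iterative training, which is the part the paper claims can be avoided.

```python
# Minimal sketch (not the blog post's code) of the "planes" view of a layer.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train an ordinary MLP; its first layer is a set of hyperplanes w_i . x + b_i = 0.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
W, b = mlp.coefs_[0], mlp.intercepts_[0]

def bit_code(X):
    # One bit per plane: which side of plane i does the input fall on?
    return (X @ W + b > 0).astype(np.uint8)

readout = LogisticRegression(max_iter=1000).fit(bit_code(X_train), y_train)
print("MLP accuracy on floats:", mlp.score(X_test, y_test))
print("Read-out on bit codes: ", readout.score(bit_code(X_test), y_test))
```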
We are currently trying to build a full-fledged LLM using just this approach (no LLM training etc.) and also an ASR. We should have something to share in a couple of months.
The attempt is not to replace a particular neural network that has already been trained using sigmoid or ReLU functions. If one does that, one necessarily has to use non-linear maps.
The whole point is that such a non-linear technique is not necessary for classification.
It is not necessary to confine clusters by hyperplanes for solving a classification problem.
Our focus is on individual points.
(1) Take your data as a stream. Use your machine-learning gadget to give you the (predicted) probability of each possible next token. Then use those probabilities in arithmetic coding to specify which token actually came next.
(2) Take your data D. Apply lossy compression to it and store the result L := lossy(D). Also compute the residue R := D - uncompress(L). If your lossy compression is good, R will be mostly zeroes (with only a few actual differences), so it will compress well with a lossless compression algorithm.
Approach (1) is a more sophisticated version of (2); a rough sketch of (2) is below. None of this is anything I came up with; these approaches are well known.
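A minimal sketch of approach (2) in Python, with coarse quantization standing in for a real lossy codec (any real codec would do the same job better). Approach (1) would replace the fixed quantizer with a predictive model whose probabilities drive an arithmetic coder, at an ideal cost of about -log2 p bits per token.

```python
# Minimal sketch of approach (2): lossy compress, keep the residue, and
# compress the residue losslessly. The round trip is exact.
import zlib
import numpy as np

rng = np.random.default_rng(0)
data = np.cumsum(rng.integers(-3, 4, size=100_000)).astype(np.int16)  # a smooth-ish signal D

step = 8
lossy = (data // step * step).astype(np.int16)  # L := lossy(D)
residue = (data - lossy).astype(np.int16)       # R := D - uncompress(L), values in [0, step)

packed_plain = zlib.compress(data.tobytes(), 9)
packed_lossy = zlib.compress((lossy // step).astype(np.int16).tobytes(), 9)
packed_residue = zlib.compress(residue.tobytes(), 9)

print("raw bytes:      ", data.nbytes)
print("plain zlib:     ", len(packed_plain))
print("lossy + residue:", len(packed_lossy) + len(packed_residue))

# Lossless round trip: D == uncompress(L) + R
restored = (np.frombuffer(zlib.decompress(packed_lossy), dtype=np.int16) * step
            + np.frombuffer(zlib.decompress(packed_residue), dtype=np.int16))
assert np.array_equal(restored, data)
```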