
After reading the results I skipped back to the comment section to ask if this was real, because it looks a little too good to be true, but then I checked the authors: Microsoft Research and UCAS, so yeah, real. This is going to change a lot of things, obviously the edge computing applications they point out, but it's also going to bottom out the cost of providing high-performance LLMs in the cloud. I don't know what that means for the economics long term; naively, much lower costs might mean new entrants without an entire cloud available can compete more easily? I do wonder if something like this has already been found and implemented by either OpenAI or Google.



After playing with OpenAI's GPT-4 API, I'm quite convinced that LLMs would be in everything and everywhere today if inference cost were as low as loading a website and context size were 100x higher.

In other words, only inference cost is holding it back from completely changing everything.

So if we have a shortcut to getting something like GPT4 to run locally on a small device, watch out.


LLMs will give normal people a firmer standing in technological society. That's a good thing. But will it change everything? Not a chance. Even if LLMs did change everything, that probably would not be a good thing. Dijkstra says Muslim algebra died when it returned to the rhetoric style, and the modern civilized world could only emerge —for better or for worse— when Western Europe could free itself from the fetters of medieval scholasticism —a vain attempt at verbal precision!—thanks to the carefully, or at least consciously designed formal symbolisms that we owe to people like Vieta, Descartes, Leibniz, and (later) Boole. So don't be so proud of these graphics cards you've made, because the ability to understand the human tongue is insignificant compared to the power of math.


> the modern civilized world could only emerge —for better or for worse— when Western Europe could free itself from the fetters of medieval scholasticism

I can propose an alternate view of things. Not that I'm going to argue it is the only true statement in the world, but I think that for thought to progress you need an alternative hypothesis.

So the proposition is: formal symbolisms can deal only with those problems that were already solved in imprecise human language.

To invent calculus and orbital mechanics you first need to talk for several centuries (or thousands of years?) about what position and velocity are, you need to talk your way up to acceleration, and then you need to find a way to measure them and to define them in strict geometric terms. Ah, and infinity: it was a very counter-intuitive idea, and Zeno invented some of his paradoxes specifically to point at that counter-intuitiveness. When Newton came along, all these talks and debates had done most of the work for him.

> the ability to understand the human tongue is insignificant compared to the power of math.

But the funny thing is: you cannot know whether someone understands math if they do not understand human language too. You cannot teach math to those who cannot speak a human language.

Math is a cream on top, with limited applicability. What can math say about love? I do not like to sound like Dumbledore, but really, behind everything we do there are emotions motivating us. Math cannot deal with emotions, because it was built that way and because non-mathematical talk about emotions hasn't produced a good model of emotions that math could express in a formalized language.

> Dijkstra says

I wonder when he said it? Before expert systems based on logic were acknowledged to be a failure, or after?


> So the proposition is: formal symbolisms can deal only with those problems that were already solved in imprecise human language.

> To invent calculus and orbital mechanics you first need to talk for several centuries (or thousands of years?) about what position and velocity are, you need to talk your way up to acceleration, and then you need to find a way to measure them and to define them in strict geometric terms. Ah, and infinity: it was a very counter-intuitive idea, and Zeno invented some of his paradoxes specifically to point at that counter-intuitiveness. When Newton came along, all these talks and debates had done most of the work for him.

For the sake of argument, let's grant your story about what you need to invent calculus.

But once you've invented calculus, you can then use it to solve all kinds of problems that you would never in a thousand years be able to handle with mere talk.


> all kinds of problems that you would never in a thousand years be able to handle with mere talk

Not "all kinds of problems" but very specific kinds of problems which is possible to formalize into a math language. How would you go about inventing thermodynamics if you didn't know words "temperature" and "pressure"? You'd need to start for your senses that can tell you "this is a hot surface", or "this is a cold one", or "this one is colder than that", you need to decide that "coldness" is a "negative heat" (it is not the most obvious idea for an animal, because animals have as receptors for a cold, so receptors for a heat, you could feel hot and cold at the same time, if you managed to stimulate both kinds of receptors at the same time). Then you need to notice that some materials change volume when heated, then you need to come up with an idea to use measurements of a volume to measure a temperature, and only then you can try to invent pV=nRT, which becomes almost tautological at that point, because your operational definition of a temperature makes it equivalent to a volume.

After that you really can use calculus and make all sorts of quantitative statements about thermodynamic systems. But before all that "mere talk" was finished, thermodynamics was not the kind of problem calculus could deal with.


The 'mere talk' doesn't have to finish. You can have pretty nebulous ideas, and still start making progress with the formalism. The formalism can even help you 'finish' your thoughts.

In fact that kind of 'finishing' is very important, because otherwise you can waste a lot of time talking without noticing that you are not going anywhere. See e.g. philosophy or theology or pre-scientific-revolution science (i.e. natural philosophy and natural history).


One possible way of looking at this is that human language is the way most people deal with abstraction and abstract concepts. And there does seem to be some evidence that some of these abstractions in language may be universal to humans (I don't fully buy all of the universal grammar stuff, but still).

I think you could conceive of abstraction from other forms, maybe something like platonic forms as a base instead of language (again probably not in humans, but in others)


I agree with your basic thesis here; in retrospect, LLMs will be seen as a transitional architecture.

However, this paper is evidence that the field is figuring out how to build what's actually needed, which is a good thing.


LLMs can do math as well.


Last time I checked, GPT-4 couldn't reliably add 2 numbers, never mind anything more complex.


Last I checked (and confirmed by repeating it just now) GPT-4 did just fine at adding 2 numbers up, because it knows better now than to do that manually and will express it as Python. It does worse if you try to force it to do it step by step like a child and don't reinforce adherence to the rules every step, because just like humans it gets "sloppy" when you try to get it to repeat the same steps over and over.

If you want to measure its ability to do mindlessly repetitive tasks without diverging from instructions, you should compare it to humans doing the same, not expect it to act like a calculator.

If you want to measure its ability to solve problems that involve many such steps that are simple to express but tedious to carry out, ask it to write and evaluate code to do it instead.
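Concretely, that workflow is just "get the model to emit code, then run the code". A minimal sketch with the openai>=1.0 Python client; the prompt wording, the model name choice, and the eval() shortcut are mine, purely for illustration:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Ask the model to express the arithmetic as code instead of doing it "in its head".
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Write a single Python expression that computes 123456789 + 987654321. "
                       "Reply with only the expression, no prose.",
        }],
    )
    expr = resp.choices[0].message.content.strip()
    print(expr, "=", eval(expr))  # eval of model output is fine for a toy demo, not for production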


The claim was that "LLMs can do math". Below they linked a model from Google that might be capable of that, but as a general rule (and with OpenAI's models specifically) LLMs can't "do math" by any reasonable definition.


I've had it do plenty of math. Some it does badly at, some it does fine. Generally it's not "disciplined" enough to do things that require lots of rote repetition, but neither are most humans, and that has improved drastically as they've adjusted it to instead do what most humans do and use tools. Would it be nice if it also got more willing to "stick to it" when given rote tasks? Sure.

But whether or not it can "do maths" to your definition depends very much on what you want it to do, and how you define "do maths". To me it's irrelevant if it's doing the low-level calculations as long as it knows how to express them as code. If I wanted a calculator I'd use a calculator. And I don't consider a calculator able to "do math" just because it can precisely add numbers.

Meanwhile I've had lengthy discussions with GPT about subjects like orbital mechanics and calculating atmospheric effects where it correctly used maths that I had to double-check, not because I didn't trust GPT (though I also wanted to verify for that reason) but because I didn't know the maths (not that it was anything particularly advanced, but I lost interest in maths during my CS degree and picked the minimum amount of maths I could get away with).

By my definition it can "do maths" just fine. I guess you don't consider my view of that "reasonable". I can live with that, as meanwhile, it will keep doing maths for me when I need it.

Of course this was also a case of moving the goalposts to set up a strawman - in the comment of yours I replied to, you claimed it couldn't reliably add two numbers.


It often fails at basic 3-4 digit arithmetic. If you're stretching that definition far enough to claim that GPT4 can "do math" then I should be able to call myself a commercial pilot because I can land a plane in a sim 20% of the time.

I'm not moving goalposts, the original claim was that LLMs can "do math". Primary school arithmetic is math.

GPT-4 can't do math and that's okay, I don't understand why so many of you are so touchy and defensive about this. It's a limitation that exists, nothing more, nothing less.


GPT-4 is a tiny subset of "LLMs".

If you train a model to do math (and optimize representation for that), it'll do math. GPT-4 just isn't, and, generally speaking, they aren't, because it's much more efficient to train them to "use a calculator". Same as with humans.


You do realize that arithmetic is a very simple symbolic manipulation task? All you have to do is keep track of the carry. I haven't seen an LLM that couldn't get digit by digit addition done, but they always mess up the carry.
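For reference, the carry procedure being described is tiny when written out explicitly; a plain Python sketch of the schoolbook algorithm (nothing LLM-specific):

    def add_digit_by_digit(a: str, b: str) -> str:
        """Add two non-negative decimal strings, tracking the carry explicitly."""
        a, b = a[::-1], b[::-1]              # least-significant digit first
        digits, carry = [], 0
        for i in range(max(len(a), len(b))):
            da = int(a[i]) if i < len(a) else 0
            db = int(b[i]) if i < len(b) else 0
            total = da + db + carry
            digits.append(str(total % 10))   # digit to write down
            carry = total // 10              # carry to propagate
        if carry:
            digits.append(str(carry))
        return "".join(reversed(digits))

    assert add_digit_by_digit("69", "94") == "163"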


Just like humans. Try to get regular people to e.g. add 15-16 digit numbers (which is typically where I'd see GPT-4 start to get "sloppy", unless you prompt it the way you would a child who's learning and is still prone to get annoyed and wonder why the hell you're making them do it manually), and see how many start making mistakes.

I find it really comical that this is what people complain about GPT over. There's zero benefit to getting LLMs good at this over other tasks. To the extent we get it "for free" as a by-product of other learning, sure, but when we make kids practice this over and over again to drill doing it without getting sloppy, it has traditionally been out of some belief that it's important. A computer will always have a "calculator" at its disposal that is far more efficient than the LLM, and it's idiocy to care about whether it does that part well the tedious and hard way or knows how to describe the problem to a more efficient tool.

I also find it comical that people use tasks where LLM behaviour is, if anything, most human-like, in its tendency to lose focus and start taking shortcuts (before GPT-4 started writing Python instead, it would for a while try really hard not to give a step-by-step breakdown and would instead clearly take shortcuts even if you prompted it heavily to reason through it step by step) when presented with stupidly repetitive tasks, as examples of how LLMs are not good enough.


This goes to the heart of what it means to "know".

All human knowledge is "symbolic". That is, knowledge is a set of abstractions (concepts) along with relations between those concepts. As an example, to "know" addition is to understand the algorithm, or operations, involved in adding two numbers. Reasoning is the act of traversing chains of concepts.

LLMs don't yet operate at the symbolic level, and hence it could be argued that they don't know anything. The LLM is a modern sophist, excelling at language but not at reasoning.


Is this rant really necessary? Most models, especially ChatGPT-4, can perform carry-based addition, and there is zero reason for them to fail at it, but the moment you start using quantized models such as the 5-bit Mixtral 8x7B, the quality drops annoyingly. Is it really too much to ask? It's possible and it has been done. Now I'm supposed to whip out a Python interpreter for this stuff, because the LLM is literally pretending to be a stupid human, really?


GPT-x can't add, or subtract, or do anything else of the type... it can APPEAR to do so, because that's what it was built to do.... act like the text it's seen previously and predict what the next text would be.

If you include a large amount of properly solved math in its training text, it gets MUCH better at that kind of math.

It has a very deep set of intelligences that are alien to us, that allow it to predict and ACT LIKE us, when it comes to generating the next word. You're only seeing the output of those intelligences through a very lossy channel.

As a side note, there are structures in human language that apparently encode much more information than you might think at first glance. The fact that Word2Vec had such mathematical properties, despite its relative simplicity, astounds me to this day. Throwing a bunch of sine/cosine values on top of that to represent position in a sentence, to enable LLMs, is also amazing in that it works.
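The sine/cosine trick mentioned is the sinusoidal positional encoding from the original transformer paper; a minimal numpy sketch of it (assuming an even model dimension), just to show how little machinery it takes:

    import numpy as np

    def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
        # One frequency per pair of dimensions: even dims get sin, odd dims get cos.
        pos = np.arange(seq_len)[:, None]                 # shape (seq_len, 1)
        i = np.arange(d_model // 2)[None, :]              # shape (1, d_model/2)
        angles = pos / np.power(10000.0, 2 * i / d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    print(sinusoidal_positions(4, 8).round(2))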


This comment reminded me of that scene in Indiana Jones where the guy is spinning the sword around about to attack Indy, and then Indy just pulls out his pistol and shoots him.


- Hey ChatGPT! What is 69*94?

- The result of 69*94 is 6466.


What makes you think that? Which LLMs?



Most open models do it poorly though. ChatGPT is better at it.


I'll agree with you, and add that inference speed is a big factor too.

SDXL-Lightning/Cascade can generate images in 200ms, which is fast enough to fit in a web request, and paradoxically makes it even cheaper to generate.

And using groq at 500 t/s is wild compared to any of the other platforms.


500 t/s is uncomfortably fast to me. Generating high quality answers at speeds faster than I can read is the point at which I feel like LLMs are magic.

I’m glad people are doing it though, and I’ll happily adapt to accessing inference at that speed.


That's important for new applications to emerge where this happens on lots of data. You can't run LLMs at scale on tasks like Google might (every webpage) when the cost of each document is so high to process. Interactive chatbots are just the tip.


That is the plan. Even if these independent software improvements don't create 10x gains, NVDA and others are making huge hardware improvements.


It's coming in October with the new Apple chip


I'd be very surprised if Apple can put something on the level of GPT-4 on a handheld. Remember, GPT-4 is estimated to be around 1.7 trillion parameters. That's 3.4 TB at 16 bits, and it would still be ~340 GB at 1.58 bits. The best we can hope for is a low-ish few-billion-parameter model. Which would still be cool on a phone, but as of today these models are nowhere near GPT-4.
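Back-of-the-envelope check on those numbers, counting weight storage only (KV cache, activations and runtime overhead come on top); the 1.7T figure is the same estimate quoted above, not a confirmed spec:

    def model_size_gb(params: float, bits_per_weight: float) -> float:
        # parameters x bits per weight, converted to gigabytes
        return params * bits_per_weight / 8 / 1e9

    for bits in (16, 8, 4, 1.58):
        print(f"{bits:>5} bits: {model_size_gb(1.7e12, bits):,.0f} GB")
    # 16 bits -> 3,400 GB; 1.58 bits -> ~336 GB, i.e. the ~340 GB above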


You don't need "GPT4" though. Mixtral 8x7B is robust and can be run in 36 GB, 24 GB if you're willing to compromise. A 1.5-bit quantization should bring it down to 16 GB. That's still a lot compared to the iPhone 15's 6 GB, but it's close enough to imagine it happening soon. With some kind of streaming-from-flash architecture you might be in the realm already.


> With some kind of streaming-from-flash architecture you might be in the realm already.

I thought mmap'ing models to keep only the currently needed pieces in RAM was something that was figured out ~6 months ago? Performance wasn't terribly great IIRC, but with how much faster 1.58-bit is, it should still be okay-ish.


There is a more detailed paper from Apple on this. Basically, you can do a little bit better than only keeping current weights in RAM with mmap.

For an LLM, you are mostly dealing with b = W @ a, where a and b are vectors and only W is a matrix. If a is sparse (i.e. many of its entries are 0), you don't need all the columns of W to do the matrix-vector multiplication. A cleverly arranged W can make sure that during inference only the relevant columns are loaded from flash. Furthermore, if you apply the "One Weird Trick" paper to this matrix-vector multiplication, you can shard W by rows, i.e. `b[i:i+n] = W[i:i+n, :] @ a for i in range(0, N, n)`, so that while the previous b[i:i+n] is still computing, you already have visibility into which columns of the next matrix need to be loaded.
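A toy numpy sketch of the column-skipping idea (the Apple paper, "LLM in a flash", does this against flash storage with careful data layout; here the "load" is just a column slice, so this only illustrates the arithmetic):

    import numpy as np

    def sparse_activation_matvec(W: np.ndarray, a: np.ndarray) -> np.ndarray:
        # Only columns of W whose activation is non-zero are touched;
        # in the flash-offloading setting, untouched columns never need to be read.
        nonzero = np.flatnonzero(a)
        return W[:, nonzero] @ a[nonzero]

    W = np.random.randn(8, 16)
    a = np.maximum(np.random.randn(16), 0)   # ReLU-style activations: roughly half are exactly 0
    assert np.allclose(sparse_activation_matvec(W, a), W @ a)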


You need all of the model in RAM to perform the matmult that gets you the next token from it. There's no shortcut.


I'm not sure what use that is, other than to maintain the KV cache across requests.


They won't have something at that size because, as you pointed out, it is still huge. But depending on how they are used, smaller-parameter models may be better for specific on-phone tasks, which starts to make the size of the model not a problem. GPT-4 is so large because it is very general purpose, with the goal seemingly being to answer anything. You could have a smaller model focused solely on Siri or something, which wouldn't require the parameter count of GPT-4.


The thing about GPT-4 that matters so much is not just raw knowledge retention, but complex, abstract reasoning and even knowing what it doesn't know. We haven't seen that yet in smaller models, and it's unclear if it is even possible. The best we could hope for right now is a better natural-language interface than Siri for calling OS functions.


I wouldn't be surprised if this causes hardware startups to pop up that build accelerator cards tuned for this architecture. It seems stupidly simple to do inference in hardware, and with most of the training being quantized as well you might even be able to provide speedups (and energy savings) for training with reasonable investment and on cheaper processor nodes than what Nvidia is using.

Sure, Nvidia might eat their lunch in a couple of years, but bitcoin ASICs prove that you can have a niche producing specialized processors, and VCs would probably jump at the thought of disrupting Nvidia's high margin business.


There are like a million startups promising analog / bit-level, inference-only, cheap computation.

There's rain.ai, d-matrix, etc.


If this dethrones Nvidia, it would be a wonderful side effect


It's more likely that Nvidia will add INT2 support in the next generation and keep their dominance.


INT2 ternary is equivalent to INT1 plus a binary mask. Nvidia supported INT1 matrix multiply in the RTX 20 and RTX 30 generations; nobody used it, so they removed INT1 support from the RTX 40 generation.
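The equivalence is easy to sanity-check: a ternary weight in {-1, 0, +1} factors into a 1-bit sign and a 1-bit mask. A toy numpy sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.integers(-1, 2, size=(4, 8))    # ternary weights in {-1, 0, +1}
    sign = np.where(W < 0, -1, 1)           # 1 bit per weight: {-1, +1}
    mask = (W != 0).astype(W.dtype)         # 1 bit per weight: {0, 1}
    a = rng.integers(-5, 6, size=8)

    # W == sign * mask, so the ternary matmul is an INT1 matmul plus a mask.
    assert np.array_equal(W @ a, (sign * mask) @ a)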


What I get from your comment is that older RTX generations are going to be in high demand soon.


"next generation" those two words mean a whole lot.

Intel and AMD could also implement support in their "next generation" and that would be huge.


It also means the largest models can be scaled up significantly with the same inference budget.


Depends. The only paper they cite for training (https://arxiv.org/pdf/2310.11453.pdf) doesn't improve training costs much, and most models are already training-constrained. Not everyone has $200m to throw at training another model from scratch.


Is there any scope for indie builders?


Not really. These are slightly better for memory during pre-training and fine-tuning, but not enough to make a 4090 usable even for a 7B model.



