This is pretty much aligned with our findings (I'm the author of this post).
I came away feeling that:
- M1 is a solid baseline
- M2 improves performance by about 60%
- M3 Pro is a marginal improvement on the M2, more like 10%
- M3 Max (for our use case) didn’t seem that much different from the M3 Pro, though we had less data on this than on other models
I suspect Apple saw the M3 Pro as “maintain performance and improve efficiency”, which is consistent with the reduction in P-cores from the M2.
The bit I’m interested in is that you say the M3 Pro is only a bit better than the M2 at LLM work, as I’d assumed there were improvements in the AI processing hardware between the M2 and M3. We didn't test that, but I would’ve guessed it.
Yeah, agreed. I'll say I do use the M3 Max for Baldur's Gate :).
On LLMs, the issue is largely memory bandwidth: M2 Ultra is 800GB/s, M3 Max is 400GB/s. Inference on larger models is simple math on what's in memory, so the M2 Ultra's performance is roughly double. Perf / watt probably suffers a little, but when you're trying to chew through 128GB of RAM and do math on all of it, you're generally maxing out your thermal budget.
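A rough sketch of that "simple math", assuming token generation is purely bandwidth-bound and every weight is read once per token. The 40 GB model size and the function name are made up for illustration; the bandwidths are the advertised figures quoted above, not measurements.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound estimate: each generated token streams all weights once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Hypothetical model footprint in memory (weights only), for illustration.
MODEL_SIZE_GB = 40.0

m2_ultra = decode_tokens_per_second(800.0, MODEL_SIZE_GB)  # advertised 800 GB/s
m3_max = decode_tokens_per_second(400.0, MODEL_SIZE_GB)    # advertised 400 GB/s

print(f"M2 Ultra ceiling: {m2_ultra:.1f} tok/s")
print(f"M3 Max ceiling:   {m3_max:.1f} tok/s")
print(f"Ratio:            {m2_ultra / m3_max:.1f}x")
```

Real throughput will land below these ceilings (compute, cache behaviour, and achievable bandwidth all get in the way), but the ratio between two machines tends to track the bandwidth ratio.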
Also, note that it's absolutely incredible how cheap it is to run a model on an M2 Ultra vs an H100 -- Apple's integrated system memory makes a lot possible at much lower price points.
I've been considering buying a Mac specifically for LLMs, and I've come across a lot of info/misinfo on the topic of bandwidth. I see you are talking about M2 bandwidth issues that you read about on LinkedIn, so I wanted to expand upon that in case there is any confusion on your part or for anyone else following this comment chain.
M2 Ultra at 800 GB/s is for the Mac Studio only, so it's not quite apples to apples when comparing against the M3, which is currently only offered in MacBooks.
M2 Max has a bandwidth of 400 GB/s. This is a better comparison to the current M3 MacBook line. I believe it tops out at 96GB of memory.
M3 Max has a bandwidth of either 300 GB/s or 400 GB/s depending on the CPU/GPU you choose. The lower-tier CPU/GPU has a max memory size of 96GB and a bandwidth of 300 GB/s. The top-of-the-line CPU/GPU has a max memory size of 128GB and the same 400 GB/s bandwidth as the previous M2 Max.
The different bandwidths across M3 Max configurations have led to a lot of confusion on this topic, and to some criticism of the complexity of the trade-offs in the most recent generation of MacBooks (the number of efficiency/performance cores being another source of criticism).
Sorry if this was already clear to you, just thought it might be helpful to you or others reading the thread who have had similar questions :)
Worth noting that when AnandTech did their initial M1 Max review, they were never able to saturate the full 400GB/s of memory bandwidth; the max they saw when engaging all CPU/GPU cores was 243GB/s - https://www.anandtech.com/show/17024/apple-m1-max-performanc....
I have not seen the equivalent comparisons with M[2-3] Max.
Interesting! There are anecdotal reports here and there on local llama about real-world performance, but yeah, I'm just reporting what Apple advertises for those devices on their spec sheet.
If money is no object, and you don't need a laptop, and you want a suggestion, then I'd say the M2 Ultra / Studio is the way to go. If money is still no object and you need a laptop, M3 with maxed RAM.
I have a 300GB/s M3 and a 400 GB/s M1 with more RAM, and generally the LLM difference is minimal; the extra RAM is helpful though.
If you want to try some stuff out, and don't anticipate running an LLM more than 10 hours a week, Lambda Labs or together.ai will save you a lot of money. :)
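A back-of-the-envelope way to check the rent-vs-buy point for your own numbers. The prices below are placeholders, not quotes from any vendor, and the sketch ignores electricity, resale value, and rental price changes.

```python
def break_even_hours(purchase_price: float, hourly_rental: float) -> float:
    """Hours of rented compute that cost as much as buying the machine."""
    if hourly_rental <= 0:
        raise ValueError("hourly_rental must be positive")
    return purchase_price / hourly_rental


def weeks_to_break_even(purchase_price: float, hourly_rental: float,
                        hours_per_week: float) -> float:
    """Weeks of usage before buying becomes cheaper than renting."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    return break_even_hours(purchase_price, hourly_rental) / hours_per_week


# Placeholder numbers for illustration only; substitute real prices.
PURCHASE_PRICE = 5000.0   # hypothetical machine cost
HOURLY_RENTAL = 2.0       # hypothetical cloud GPU rate per hour
HOURS_PER_WEEK = 10.0     # the usage level mentioned above

weeks = weeks_to_break_even(PURCHASE_PRICE, HOURLY_RENTAL, HOURS_PER_WEEK)
print(f"Break-even after about {weeks:.0f} weeks of use")
```

With light usage the break-even point can sit years out, which is the whole argument for renting while the hardware picture is still shifting.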
The tech geek in me really wants to get a Studio with an M2 Ultra just for the cool factor, but yeah, in terms of cost effectiveness I think it makes more sense to rent something in the cloud for now.
Things are moving so quickly with local LLMs that it's hard to say what the ideal hardware setup will be 6 months from now, so locking into a platform might not be the best idea.
This is the most shocking part of the article for me since the difference between M1 and M2 build times has been more marginal in my experience.
Are you sure the people with M1 and M2 machines were really doing similar work (and builds)? Is there a possibility that the non-random assignment of laptops (employees received M1, M2, or M3 based on when they were hired) is showing up in the results as different cohorts aren’t working on identical problems?
The build events track the files that were changed that triggered the build, along with a load of other stats such as free memory, whether Docker was running, etc.
I took a selection of builds that were triggered by the same code module (one that frequently changes to provide enough data) and compared models on just that, finding the same results.
This feels as close as you could get for an apples-to-apples comparison, so I'm quite confident these figures are (within statistical bounds of the dataset) correct!
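For anyone wanting to run the same kind of check on their own telemetry, here is a minimal sketch of the idea: keep only builds triggered by one module, then compare a robust statistic per chip. The record fields (`chip`, `changed_modules`, `duration_s`) and the sample rows are invented for illustration and are not the actual event schema or data from the post.

```python
from collections import defaultdict
from statistics import median


def median_build_seconds_by_chip(builds, module):
    """Median build duration per chip, restricted to builds that touched `module`."""
    durations = defaultdict(list)
    for build in builds:
        if module in build["changed_modules"]:
            durations[build["chip"]].append(build["duration_s"])
    return {chip: median(values) for chip, values in durations.items()}


# Made-up sample events; real data would come from the build telemetry.
builds = [
    {"chip": "M1", "changed_modules": ["api"], "duration_s": 100.0},
    {"chip": "M1", "changed_modules": ["api", "web"], "duration_s": 120.0},
    {"chip": "M2", "changed_modules": ["api"], "duration_s": 60.0},
    {"chip": "M2", "changed_modules": ["web"], "duration_s": 10.0},  # filtered out
    {"chip": "M2", "changed_modules": ["api"], "duration_s": 70.0},
]

result = median_build_seconds_by_chip(builds, "api")
for chip, seconds in sorted(result.items()):
    print(f"{chip}: median {seconds:.1f}s")
```

Holding the triggering module fixed is what controls for different cohorts working on different parts of the codebase; the median keeps a handful of pathological builds from dominating.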