With the integrated ram and cpu and gpu on apple silicon, however it's done it y...

rasz · on April 12, 2023

Its not done for perf results, Xbox doesnt have ram on package and somehow does 560 GB/s

rnk · on April 12, 2023

The perf results I was referring to was the ability to run an llm locally (like llama.cpp) that uses a giant amount of ram in the gpu, like 40gig. Without this uniform memory model, you end up paging endlessly, so it's actually much faster for this application in this scenario. Unlike on a pc with a graphics card, you can use your entire ram for gpu. This isn't possible on the xbox because it doesn't have uniform memory as far as I know. So having incredible throughput still won't match not having to page.

Edit - I found an example from h.n. user anentropic, pointing at https://github.com/remixer-dec/llama-mps . "The goal of this fork is to use GPU acceleration on Apple M1/M2 devices.... After the model is loaded, inference for max_gen_len=20 takes about 3 seconds on a 24-core M1 Max vs 12+ minutes on a CPU (running on a single core). "