Such a pity no one else can compete here presently. Would that others could gain a position where their software made them competitive on the free market.



Compete with llama.cpp? Like transformers LLaMA [0] (loading sketch below), exllama [1] (really fast), or lit-llama [2]?

exllama is really memory efficient and really fast.

[0] https://huggingface.co/docs/transformers/main/model_doc/llam...

[1] https://github.com/turboderp/exllama

[2] https://github.com/Lightning-AI/lit-llama
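
For reference, loading LLaMA through the transformers API [0] looks roughly like this (a sketch, not a recipe; the model path is a placeholder for locally converted HF-format weights):

    # Sketch: LLaMA inference via the transformers API [0].
    # "path/to/llama-hf" is a placeholder for converted HF-format weights.
    from transformers import LlamaForCausalLM, LlamaTokenizer

    tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-hf")
    model = LlamaForCausalLM.from_pretrained("path/to/llama-hf")

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))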

EDIT: Or do you mean CUDA? Because yeah, it's such a shame AMD's ROCm is so bad that even geohot gave up; its examples don't even run without crashing.

https://github.com/RadeonOpenCompute/ROCm/issues/2198#issuec...


Also https://github.com/kayvr/TokenHawk, a WebGPU implementation of LLaMA.

edit: Note that this is my project.


Thanks for the tip about exllama. I've been on the lookout for a readable Python implementation to play with that is also fast and supports quantized models.


There was free competition here, a while ago. OpenCL was formed by Apple, Khronos, et al. to stave off CUDA's dominance. The platform languished from a lack of commitment, though, and Apple eventually gave up on open GPU APIs entirely. Nvidia kept funding CUDA and scaling it for industry applications, and the rest is history. The landscape of stakeholders is too embittered to unseat CUDA for what it's used for; your best shot at democratizing AI inference acceleration is through something like Microsoft's ONNX Runtime [0].

[0] https://onnxruntime.ai/
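
For a sense of what that looks like in practice, here's a minimal onnxruntime sketch (the model file, input name, and shape are placeholders, not from the thread):

    # Minimal ONNX Runtime inference sketch; "model.onnx" is a placeholder.
    import numpy as np
    import onnxruntime as ort

    # The CPU provider keeps the example self-contained; GPU execution
    # providers (CUDA, ROCm, DirectML) can be substituted if installed.
    sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

    # Input name and shape depend on the exported graph; assumed here.
    input_name = sess.get_inputs()[0].name
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)
    outputs = sess.run(None, {input_name: x})
    print(outputs[0].shape)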


CUDA had a lot of inertia, and OpenCL came out of the gate with half-baked docs and half-baked support. If they had focused on simplifying the API to be more user friendly for the 80% use case, it could have been a success. OpenCL always looked nice on the surface, but a few hours in you've exhausted the docs trying to figure out what to do, and there's no good example code around. Of course, if they had really wanted it to succeed they would have built a CUDA-to-OpenCL transpiler for the C API, or at least a comprehensive migration guide. I'm not convinced anyone involved was trying to make it popular.
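
To make the boilerplate point concrete, here's about the smallest useful OpenCL program, via pyopencl rather than the raw C API (which is more verbose still); a sketch, assuming a working OpenCL driver:

    # Vector add in OpenCL via pyopencl: even "hello world" needs a
    # context, a queue, buffers, and a kernel source string.
    import numpy as np
    import pyopencl as cl

    a = np.random.rand(1024).astype(np.float32)
    b = np.random.rand(1024).astype(np.float32)

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    prg = cl.Program(ctx, """
    __kernel void add(__global const float *a,
                      __global const float *b,
                      __global float *out) {
        int i = get_global_id(0);
        out[i] = a[i] + b[i];
    }
    """).build()

    prg.add(queue, a.shape, None, a_buf, b_buf, out_buf)

    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)
    assert np.allclose(out, a + b)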


Note that llama.cpp supports acceleration on both OpenCL and Apple Metal.


There's also geohot's tiny corp betting on AMD GPUs.


Not any more.


https://geohot.github.io/blog/jekyll/update/2023/06/07/a-div...

AMD gave him a binary blob driver and that fixed his problem. Also, tinygrad is the only Python framework I know that has full OpenCL acceleration.
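
For the curious, a minimal tinygrad sketch (the import path and the GPU=1 backend switch reflect tinygrad as of mid-2023; treat both as assumptions):

    # Run with `GPU=1 python demo.py` to select tinygrad's OpenCL backend.
    from tinygrad.tensor import Tensor

    x = Tensor.randn(4, 4)
    w = Tensor.randn(4, 4)
    y = x.matmul(w).relu()
    print(y.numpy())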


What do you mean? At least as of June 7, geohot was still working on AMD driver builds and stability. https://geohot.github.io/blog/jekyll/update/2023/06/07/a-div...

So far it doesn't look like AMD is fully on board with Tiny Corp, but they are talking…


Why not ggml?


Unclear what this is referring to, but if it means CUDA vs. everything else, it's worth noting that:

a) CUDA won in a free market because Nvidia showed they cared about it

b) llama.cpp has support for OpenCL (via CLBlast) and Apple Metal (see the sketch below)

The OpenCL support already has a custom kernel for token generation.
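
As an illustration, here's what using that acceleration looks like through the llama-cpp-python bindings (a separate project, not mentioned above; assumes a build with CLBlast or Metal enabled, and the model path is a placeholder):

    # Sketch: llama.cpp inference with GPU offload via llama-cpp-python.
    from llama_cpp import Llama

    # n_gpu_layers controls how many transformer layers are offloaded
    # to the accelerated backend (OpenCL/CLBlast or Metal).
    llm = Llama(model_path="models/7B/ggml-model-q4_0.bin", n_gpu_layers=32)
    out = llm("The capital of France is", max_tokens=16)
    print(out["choices"][0]["text"])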


There's Fabrice Bellard's TextSynth server: https://bellard.org/ts_server/

It's not open source, though.


This isn't a market.



