gpu.cpp: A lightweight library for portable low-level GPU computation

pavlov · 2024-07-13T12:35:14.000000Z

Lovely! I like how the API is in a single header file that you can read through and understand in one sitting.

I've worked with OpenGL and Direct3D and Metal in the past, but the pure compute side of GPUs is mostly foreign to me. Learning CUDA always felt like a big time investment when I never had an obvious need at hand.

So I'm definitely going to play with the library and try to get up to speed. Thanks for publishing it.

austinvhuang · 2024-07-14T00:05:54.000000Z

Thanks very much!

You're probably better prepared than you think. The funny thing is that after working on getting compute workflows to run on graphics APIs like Vulkan and WebGPU, CUDA feels so user friendly by comparison :)

Feel free to say hi, or ping us in the Discord channel if you run into issues: https://discord.gg/Q9PWDckbnR

0xf00ff00f · 2024-07-13T20:43:27.000000Z

This is cool, but they should have just used Vulkan. Dawn is a massive dependency (and a PITA to build, in my experience) to get what's basically a wrapper around Vulkan. Vulkan has a reputation for being difficult to work with, but if you just want to use a compute queue it's not that horrible. Also, since Vulkan uses SPIR-V, the user would have more choices for shading languages. Additionally, with RenderDoc you get source-level shader debugging.

Shameless plug: in case anyone wants to see what doing just compute with Vulkan looks like, I wrote a similar library to compete in SHAllenge [0], which was posted here on HN a few days ago. My library is here: https://github.com/0xf00ff00f/vulkan-compute-playground/

[0] https://shallenge.quirino.net/

austinvhuang · 2024-07-13T23:38:49.000000Z

Vulkan is definitely a valid angle, and I seriously considered it as well. There are a few things that, in aggregate, led me to explore a different direction:

First, there are already a few teams taking a stab at the Vulkan approach, like Kompute, so it's not like that's uncovered territory. At the time I first looked into this, the Khronos/Apple drama + complaints about MoltenVK didn't seem encouraging, but I'd be happy to hear if the situation is a lot better now.

Second, even though it's not the initial focus, the possibility of browser targets is interesting.

Finally, there's not much in the fairly minimalist gpu.cpp design that couldn't be retargeted to a vulkan backend at some point in the future if it becomes clear that (eg w/ the right combination of vulkan-specific extensions) the performance differential is sufficient to justify the higher implementation complexity and the metal/vulkan tug of war issues are a thing of the past.

Ultimately, there's much less happening with WebGPU, and the things that are happening tend to be in ML inference infra rather than libraries. It seemed to be a point in the design space worth exploring.

Regarding Dawn - I know where you're coming from. A non-trivial amount of effort went into smoothing out the friction. First, if you look at the bottom of the repo README, you'll see others have done a lot to make building easier (FetchContent with Elie's repo worked on the first try), but with gpu.cpp, users shouldn't even have to deal with that if they don't want to. The reason there's a small script that takes a few seconds to fetch a prebuilt shared library on the first build is so that you can avoid the Dawn build by default. After that, linking should be almost instantaneous and compile cycles should take a second or two.

But as I mention elsewhere in these threads, if the Dawn team shipped prebuilt shared libraries themselves, that would be an even better solution (if anyone at Google is reading this)!

rahkiin · 2024-07-13T21:13:33.000000Z

Your suggestion would not work on macOS or iOS

rice7th · 2024-07-13T21:33:43.000000Z

Moltenvk is a great solution

austinvhuang · 2024-07-13T22:34:24.000000Z

Hi, author here! Agh I was intending for the project to fly under the radar for a few more days before making the announcement and blog post (please look/upvote that when you see it haha :)

But since this is starting I'm happy to chat. Nice to see the interest here!

JackYoustra · 2024-07-14T00:14:56.000000Z

Thoughts on this vs wgpu (and the associated projects)?

austinvhuang · 2024-07-14T02:29:00.000000Z

wgpu is an implementation of the WebGPU API, so it's basically an alternative to Dawn.

gpu.cpp is one level up - it's implemented using the WebGPU API, not an implementation of the WebGPU API. In theory it should work with both wgpu and dawn, but in practice there are enough differences that it takes some conditional branching + testing to support both.

Having both wgpu and dawn support would be nice and I think we'll get there in the coming months but for faster early iteration I wanted to keep things simple for now. There's implementation + maintenance + testing overhead that you start to have to carry around so it isn't free.

almostgotcaught · 2024-07-13T16:33:09.000000Z

TIL you can run the WebGPU runtime without a browser.

summarity · 2024-07-13T16:37:59.000000Z

For me that’s its most promising feature. At last a truly cross platform compute library (not this, WebGPU itself). With two complete and mature implementations no less (dawn and wgpu).

binary132 · 2024-07-13T17:00:16.000000Z

I do not think of dawn or wgpu as complete and mature, has something changed?

moffkalast · 2024-07-13T17:17:20.000000Z

Yeah does Firefox support it yet in stable, or are they still a solid year behind Chrome as usual?

rahkiin · 2024-07-13T21:15:02.000000Z

WebGPU is interesting outside the browser: both dawn and wgpu-rs can be used as a cross-platform native GPU layer. That does not depend on Firefox having WebGPU support.

austinvhuang · 2024-07-13T23:46:09.000000Z

You're not alone.

I've had hour-long conversations explaining the project, talking about how WebGPU can be used natively and how Rust and Zig people are using WebGPU as their main GPU API (with wgpu and mach), and at the end there are still clarifying questions about how it differs from WebGL and WASM.

The phrase "native webgpu" might as well be a Stroop Effect prank in technology branding.

jph00 · 2024-07-13T22:53:00.000000Z

We just published an article introducing gpu.cpp, what it's for, and how it works:

https://www.answer.ai/posts/2024-07-11--gpu-cpp.html

ngcc_hk · 2024-07-15T05:24:58.000000Z

I wonder whether one could use Lua, or even love2d, to extend the reach of GPU usage.

soci · 2024-07-14T15:51:09.000000Z

I watched the video mentioned in the post [1], but now I’m more confused than before…

What are the benefits, if any, of using gpu.cpp instead of just webgpu.h (webgpu native) directly? Maybe each is tailored for different use cases?

[1] https://youtu.be/qHrx41aOTUQ?si=CehJnYQWCg3XklHj

austinvhuang · 2024-07-14T17:32:28.000000Z

The raw WebGPU API is geared towards infrastructure-type usage, e.g. ML compilers, game engines, etc., and is pretty verbose for application and research use cases.

Under examples/, for pedagogical purposes and to help contributors understand what happens with WebGPU under the hood, I actually included an example of invoking the same GELU kernel as in the hello world example without gpu.cpp. It looks like this; it's 400+ LoC and also takes several minutes to build Dawn:

https://github.com/AnswerDotAI/gpu.cpp/blob/main/examples/we...

A goal of gpu.cpp is to make the power of WebGPU much less painful to integrate into a project, without having to jump through as many hoops. It also sets up the prebuilt shared library, so builds are instantaneous and painless instead of reams of CMake hassles + 5-10 minutes of waiting for Dawn to build:

https://github.com/AnswerDotAI/gpu.cpp/blob/main/examples/he...
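Both examples compute GELU, and whichever path you take, a small CPU reference is handy for checking the values copied back from the GPU. The sketch below assumes the common tanh approximation of GELU; the hello-world shader may differ in details (for example, clamping large inputs), so check its WGSL before relying on a tight tolerance. `gelu_ref` and `matches_reference` are illustrative names, not part of gpu.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for GELU using the tanh approximation (an assumption here):
//   gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
float gelu_ref(float x) {
  const float kScale = std::sqrt(2.0f / 3.14159265358979f);  // sqrt(2/pi)
  return 0.5f * x * (1.0f + std::tanh(kScale * (x + 0.044715f * x * x * x)));
}

// Returns true if every GPU result is within `tol` of the CPU reference.
bool matches_reference(const std::vector<float>& input,
                       const std::vector<float>& gpu_output,
                       float tol = 1e-4f) {
  assert(tol >= 0.0f);
  if (input.size() != gpu_output.size()) return false;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (std::fabs(gelu_ref(input[i]) - gpu_output[i]) > tol) return false;
  }
  return true;
}
```

In practice you would fill `gpu_output` from the buffer read back after the kernel runs; an absolute tolerance keeps the check robust to small differences between GPU and CPU `tanh`.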

uLogMicheal · 2024-07-13T16:12:39.000000Z

This is awesome! Was looking at creating similar, inspired by the miniaudio approach. Will likely contribute a dart wrapper soon.

austinvhuang · 2024-07-14T00:10:02.000000Z

Thanks! If there are binding projects, feel free to get in touch so we can link it + trade notes.

hpen · 2024-07-13T13:55:16.000000Z

Any performance metrics vs Vulkan, metal, etc?

austinvhuang · 2024-07-14T00:31:29.000000Z

The data that is out there is reasonably promising with WebGPU already in use in some production ML inference engines. TVM of course is way ahead of the curve as usual - https://tvm.apache.org/2020/05/14/compiling-machine-learning... though this post is quite old now.

It's still early days for pushing compute use cases to WebGPU (OctoML being super early notwithstanding). There's a small matmul in the examples directory, but it only has the most basic tiling optimizations. One of my goals for the next few weeks is porting the transformer block kernels from llm.c; I think that will flesh out the picture far better. If there's interest, I'm happy to collaborate and could potentially do a writeup.
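For readers unfamiliar with tiling, a rough CPU analogue of the idea is sketched below. It is not the WGSL kernel from the examples directory: on a GPU, each workgroup would typically own one output tile and stage the matching tiles of A and B in workgroup memory, whereas here the tile loops only improve cache reuse. The function names and the default tile size of 16 are arbitrary choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major C = A (M x K) * B (K x N), plain triple loop for reference.
std::vector<float> matmul_naive(const std::vector<float>& A, const std::vector<float>& B,
                                std::size_t M, std::size_t K, std::size_t N) {
  assert(A.size() == M * K && B.size() == K * N);
  std::vector<float> C(M * N, 0.0f);
  for (std::size_t i = 0; i < M; ++i)
    for (std::size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
      C[i * N + j] = acc;
    }
  return C;
}

// Same product, computed one T x T output tile at a time, walking K in
// T-sized chunks so each chunk of A and B is reused while it is "hot".
std::vector<float> matmul_tiled(const std::vector<float>& A, const std::vector<float>& B,
                                std::size_t M, std::size_t K, std::size_t N,
                                std::size_t T = 16) {
  assert(A.size() == M * K && B.size() == K * N && T > 0);
  std::vector<float> C(M * N, 0.0f);
  for (std::size_t i0 = 0; i0 < M; i0 += T)
    for (std::size_t j0 = 0; j0 < N; j0 += T)
      for (std::size_t k0 = 0; k0 < K; k0 += T) {
        const std::size_t iEnd = std::min(i0 + T, M);
        const std::size_t jEnd = std::min(j0 + T, N);
        const std::size_t kEnd = std::min(k0 + T, K);
        for (std::size_t i = i0; i < iEnd; ++i)
          for (std::size_t k = k0; k < kEnd; ++k) {
            const float a = A[i * K + k];
            for (std::size_t j = j0; j < jEnd; ++j) C[i * N + j] += a * B[k * N + j];
          }
      }
  return C;
}
```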

There are always some tradeoffs that come with portability, but part of my goal with gpu.cpp is to create a scaffold to experiment and see how far we can push portable GPU performance.

koolala · 2024-07-14T13:19:39.000000Z

WebGPU is slower than WebGL2 on the GPU but faster on the CPU.

mpreda · 2024-07-13T14:26:01.000000Z

vs OpenCL, ROCm, CUDA?

zamadatix · 2024-07-13T18:37:05.000000Z

Since this library ends up acting as a layer on top of the listed specifications it'd be more applicable to see benchmarks comparing the performance to building on top of said specifications directly to get an idea of overhead. At that point you could layer existing generic comparisons for the specifications you listed (or anything else for that matter) instead of needing them all to be redone specifically with this in mind.
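A minimal sketch of that kind of overhead measurement, assuming you can express the same workload once through the library and once against the underlying API directly, as two callables. For GPU work, each callable must include waiting for the queue to finish, or you only time command submission. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Median of the samples (taken by value so the caller's order is preserved).
double median(std::vector<double> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  return n % 2 ? samples[n / 2] : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
}

// Runs `fn` `warmup` times untimed, then `reps` timed runs; returns microseconds per run.
std::vector<double> timeRuns(const std::function<void()>& fn, int reps, int warmup = 3) {
  for (int i = 0; i < warmup; ++i) fn();
  std::vector<double> us;
  us.reserve(reps);
  for (int i = 0; i < reps; ++i) {
    const auto t0 = std::chrono::steady_clock::now();
    fn();
    const auto t1 = std::chrono::steady_clock::now();
    us.push_back(std::chrono::duration<double, std::micro>(t1 - t0).count());
  }
  return us;
}

// Relative overhead of the layered path over the direct path, based on medians.
double relativeOverhead(const std::function<void()>& direct,
                        const std::function<void()>& layered, int reps = 50) {
  const double d = median(timeRuns(direct, reps));
  const double l = median(timeRuns(layered, reps));
  assert(d > 0.0);
  return (l - d) / d;
}
```

Medians are used rather than means because GPU timings tend to have occasional large outliers (driver work, clock ramp-up) that would otherwise dominate the comparison.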

captaincrowbar · 2024-07-13T23:43:14.000000Z

This looks useful but I'm worried about portability. Are there any plans for native Windows support?

austinvhuang · 2024-07-13T23:59:39.000000Z

Windows should work since WebGPU can target DirectX or Vulkan and it should be possible to build in WSL.

However, I was planning to announce next week, after I'd had a chance to test with my Windows-using colleagues, and this thread came early, so it's possible we'll run into some hiccups.

Meet us on Discord here if anyone needs help or just wants to say hello - https://discord.gg/Q9PWDckbnR

kookamamie · 2024-07-14T07:41:41.000000Z

I would say most people would not consider WSL to be "Windows".

captaincrowbar · 2024-07-14T08:39:04.000000Z

Put it this way: Can I build an executable using this, that I could confidently give to a Windows user who has never heard of WSL?

austinvhuang · 2024-07-14T10:40:15.000000Z

Fair enough - I don't think there's any hard blockers to doing this, but to get the same QoL we'll want to add a dawn dll to the available prebuilt binaries and adjust the download script.

Will look into this in the coming weeks (or if anyone is up for contributing let us know).

Arech · 2024-07-13T13:39:11.000000Z

Very interesting... I wonder how code performance compares to raw Vulkan?

austinvhuang · 2024-07-14T13:17:48.000000Z

See https://news.ycombinator.com/item?id=40952182#40957959

It's early, but my current sense is that since WGSL -> SPIR-V is a fairly shallow mapping, you should be able to get close, modulo extensions. Extensions can be important though; in particular, I'm tracking this closely:

https://github.com/gpuweb/gpuweb/issues/4195

One subgoal of gpu.cpp is to be able to have a canvas to experiment and see how far we can push the limits.

coffeeaddict1 · 2024-07-14T13:29:54.000000Z

Is this intended to integrate well in an existing WebGPU project?

austinvhuang · 2024-07-14T13:53:48.000000Z

Part of the goal is not to get in the way if there's other aspects of a project that talk to WebGPU directly. If you're already using WebGPU the correspondence should be pretty familiar if you look at the `gpu.h` source. We specifically avoided extra layers of indirection so that you can mix in direct calls against the WebGPU API when needed.

coffeeaddict1 · 2024-07-27T10:49:26.000000Z

Just tried building this on Linux and running your examples. I have to say that your CMake scripts are quite unidiomatic and your build instructions are not really clear (e.g why are you suggesting using "make" instead of "cmake --build build"?).

When I tried to run the matmul example, nothing gets output on the terminal. Is logging disabled by default? The executable runs and just exits.

apatheticonion · 2024-07-14T02:36:02.000000Z

Oh nice! Would love to see a Rust crate wrapping bindings for this

austinvhuang · 2024-07-14T10:36:07.000000Z

Thanks!

If anyone adds bindings let us know so we can link it in the readme.

01HNNWZ0MV43FF · 2024-07-13T15:32:23.000000Z

> The only library dependency of gpu.cpp is a WebGPU implementation.

Noo

austinvhuang · 2024-07-13T23:20:29.000000Z

I understand what you mean. We tried to make it as painless as possible by providing a downloadable prebuilt shared library so users don't need to know the pain of building dawn from scratch. It's just a few seconds to download the first time, and after that linking is instantaneous.

For those that really do want to build end-to-end, there are community efforts (which I've leaned on) that make dawn builds much more palatable which I link at the bottom of the README.

We'll need to kick the tires to see if anyone reports ABI issues (I had more testing to do before announcing the project but this thread came early). I really want the Google Dawn team to ship a shared library though so we in the community don't have to roll our own.

thrtythreeforty · 2024-07-14T13:25:49.000000Z

I know you said elsewhere in this thread that you want to focus on a single WebGPU runtime for the moment, but I just want to plug how easy it is to build wgpu even as a submodule of a C++ project. I had a demo integrated into my project in less than an hour of tinkering with CMake.

austinvhuang · 2024-07-14T15:36:41.000000Z

Yes wgpu is a much lighter build and has a lot going for it.

The situation has gotten a lot better for both dawn and wgpu integration in C++ with:

https://github.com/eliemichel/WebGPU-distribution/

Getting a shared library build was a revelation though, credit to:

https://github.com/jspanchu/webgpu-dawn-binaries

because the FetchContent cache invalidations would still periodically trigger recompiles, which gets quite annoying. When it's just a matter of linking, you get few-second builds consistently. The cost is that we'll need a bit of hardening around potential ABI bugs, but it's ultimately worth it.

We'll work towards wgpu support. There are some sharp edges in the non-overlap with dawn, which seem most pronounced in the async handling (which is pretty critical), but I don't think anything is a hard blocker.
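To make the async point concrete: WebGPU reports completion of operations like buffer mapping through callbacks, and a native host program typically has to poll the implementation for those callbacks to fire. A common way to hide that behind a blocking call is a `std::promise`/`std::future` pair plus a polling loop, sketched below against a fake event source. `FakeInstance`, `fakeMapAsync` and `mapAndWait` are hypothetical stand-ins, not the Dawn, wgpu, or gpu.cpp API; the polling step is the kind of detail where implementations can differ.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical event source: like a native WebGPU instance, it only runs
// pending callbacks when the application polls it. Not a real API.
struct FakeInstance {
  std::vector<std::function<void()>> pending;
  int polls = 0;
  void processEvents() {
    ++polls;
    if (polls < 3) return;  // pretend the GPU work needs a few polls to finish
    auto ready = std::move(pending);
    pending.clear();
    for (auto& cb : ready) cb();
  }
};

// Hypothetical C-style async call: a function-pointer callback plus userdata.
using StatusCallback = void (*)(int status, void* userdata);
void fakeMapAsync(FakeInstance& inst, StatusCallback cb, void* userdata) {
  inst.pending.push_back([cb, userdata] { cb(/*status=*/0, userdata); });
}

// Bridge the callback into a std::future, then poll until it is ready.
int mapAndWait(FakeInstance& inst) {
  std::promise<int> promise;
  std::future<int> future = promise.get_future();
  fakeMapAsync(
      inst,
      [](int status, void* userdata) {
        static_cast<std::promise<int>*>(userdata)->set_value(status);
      },
      &promise);
  while (future.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
    inst.processEvents();  // the callback can only run from inside this call
  }
  return future.get();
}
```

Because the callback fires on the polling thread, the stack-allocated promise is guaranteed to outlive `set_value`; an implementation that fires callbacks on another thread would need different ownership.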

thrtythreeforty · 2024-07-15T15:48:59.000000Z

WebGPU-distribution is news to me; thanks for the link!

sieste · 2024-07-13T20:39:02.000000Z

What's the problem?

kookamamie · 2024-07-14T07:40:41.000000Z

Portable, as in Windows native is not supported?

xaxaxb · 2024-07-13T15:24:42.000000Z

[flagged]

abenga · 2024-07-13T16:10:35.000000Z

Be the change you want to see in the world. Write the rust implementation yourself.

yazzku · 2024-07-13T16:59:03.000000Z

Because C++ is better.

ranger_danger · 2024-07-13T17:36:48.000000Z

I wish rust people spent their time writing software instead of going around telling other people to do the job for them. They only manage to get others annoyed with this attitude.

lukan · 2024-07-13T19:22:03.000000Z

I am >95% certain it was a joke...

(but after looking at the comment history, maybe less)

byefruit · 2024-07-13T14:17:41.000000Z

This looks great. Is there an equivalent project in rust?

LegNeato · 2024-07-13T15:30:48.000000Z

https://github.com/charles-r-earp/krnl, and more broadly https://github.com/EmbarkStudios/rust-gpu.

byefruit · 2024-07-14T08:34:08.000000Z

Thank you!