A great alternative to Numba for accelerated Python is Taichi. It's trivial to convert a regular Python program into a Taichi kernel, which can then target CUDA (among a variety of other backends) with no need to worry about block/grid/thread allocation. At the same time it's surprisingly deep: great support for data classes, custom memory layouts for complexly nested structures, built-in autograd, and more. I'm a huge fan - it makes writing code that runs on the GPU and integrates with your Python libraries an absolute breeze. Super powerful. By far the best tool in the accelerated-Python toolbox, IMO.
>they made a lame excuse that Pytorch didn't support 3.12
How is this a lame excuse?
>but it fails on a bunch of PyTorch-related tests. We then figured out that PyTorch does not have Python 3.12 support
They had a dependency blocking them from upgrading. What would you have them do? Push PyTorch to upgrade faster?
>Later, even when Pytorch added support for 3.12, nothing changed (so far) in Taichi.
My friend, that "Later" was Feb/March of this year, i.e. 2-3 months ago. Exactly how fast would you like this open source project to service your needs? Not to mention there is already a PR up for the version bump.