Might very well be true. I don't blame anyone for not diving deeper into figuring out why this stuff doesn't work.
But this is one of the great strengths of CUDA: I can develop a kernel on my workstation, my boss can demo it on his laptop, and we can deploy it on Jetsons or the multi-GPU cluster with minimal changes, and I can be sure that everything runs everywhere.
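Not from the original comment, but a minimal sketch of what that workflow looks like in practice: the same kernel source built as a fat binary for several GPU generations (the specific compute capabilities and flags below are just illustrative, e.g. a Pascal workstation card, a Turing laptop GPU, and a Jetson Orin).

    // Build one binary for several GPU generations, e.g.:
    //   nvcc -gencode arch=compute_61,code=sm_61 \
    //        -gencode arch=compute_75,code=sm_75 \
    //        -gencode arch=compute_87,code=sm_87 saxpy.cu -o saxpy
    #include <cstdio>
    #include <cuda_runtime.h>

    // Plain SAXPY kernel: y = a*x + y, one element per thread.
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        // Unified memory works on both discrete GPUs and Jetson-class devices.
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();

        printf("y[0] = %f\n", y[0]);  // expect 4.0
        cudaFree(x);
        cudaFree(y);
        return 0;
    }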
There is indeed something excellent about CUDA from a user perspective that is hard to beat. I do high-level DNN work, and it is not clear to me what it is or why that is. Any time I have worked on optimizing for mobile hardware (not Jetson, but actual phones or accelerators), it has just been a world of hurt and incompatibilities. This notion that operators or subgraphs can be accelerated by lower-level closed blobs... I wonder if that is part of the issue. But then why doesn't OpenCL just work? I thought it gave a CUDA-kernel-like general-purpose abstraction.
I just don't understand the details well enough to see why things are problematic without CUDA :(
The API level I could target was at least two or three versions behind the latest they had to offer.