OpenCL works on both AMD and Nvidia GPUs with mostly the same source code. Because it supports compilation at runtime, a lot of the code can be specialized/instantiated before compilation, which reduces the cost of the generated code. In general OpenCL is close enough to the hardware, and the quality of the generated code is improving over time (LLVM).
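To illustrate the runtime-compilation point, here is a minimal sketch (not the actual code of the software discussed) of how constants can be baked into an OpenCL kernel at build time through `-D` options. The kernel `scale`, the macros `WIDTH` and `FACTOR`, and the helpers `build_options` and `compile_specialized` are all hypothetical names chosen for the example; the build step assumes the `pyopencl` package and is only invoked if you call it with a context.

```python
def build_options(defines):
    """Turn a dict of compile-time constants into OpenCL -D build options."""
    return [f"-D{name}={value}" for name, value in defines.items()]


# Hypothetical kernel: WIDTH and FACTOR are not kernel arguments. They are
# macros fixed at build time, so the compiler sees them as constants and
# can unroll the loop and fold the multiplication.
KERNEL_SRC = """
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void scale(__global double *data) {
    size_t base = get_global_id(0) * WIDTH;
    for (int i = 0; i < WIDTH; ++i) {
        data[base + i] *= FACTOR;
    }
}
"""


def compile_specialized(ctx, defines):
    """Build KERNEL_SRC for one specific set of constants (assumes pyopencl)."""
    import pyopencl as cl  # imported here so the rest runs without a GPU
    return cl.Program(ctx, KERNEL_SRC).build(options=build_options(defines))


# The options that would be handed to the OpenCL compiler at runtime.
options = build_options({"WIDTH": 8, "FACTOR": "3.0"})
print(options)
```

With a real device, something like `compile_specialized(cl.create_some_context(), {"WIDTH": 8, "FACTOR": "3.0"})` would produce a program specialized for those values; a different problem size simply triggers a new build with different constants.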
Motivation: a long time ago I had an AMD GPU and no way to run an LL (Lucas-Lehmer) test on it, so I decided to write my own. I then got hooked by the power of the GPU and the quest for ever faster, more efficient implementations.
The hardware used for finding the prime was Nvidia and AMD GPUs with good FP64 performance in the cloud, rented as "spot" instances for a better price. This allowed scaling up quickly to many GPUs, and it did have a significant cost.
My personal setup is 8x Radeon Pro VII, which also provide heating during the cold season. During summer the effort goes into removing the excess heat, and the GPUs run in a reduced-power mode (slower but more efficient).
Indeed, I’m curious why you’ve used OpenCL. And what was the hardware/general setup used for finding the prime?
What was your motivation behind building this software?