I would like to warn people away from taichi if possible. At least back in 1.7.0 there were some bugs in the code that made it very difficult to work with.
Do you have any more specifics about these limitations? I'm considering trying Taichi for a project because it seems to be GPU vendor agnostic (unlike CuPy).
I only dabbled in Taichi, but I find its magic has limitation. I took a provided example, just increased the length of the loop and bam! it crashed the Windows driver. Obviously it ran out of memory but I have no idea how how to adjust except experiment with different values. If it has information about the GPU and its memory, I thought it could automatically adjust the block size but apparently not. There is a config command to fine tune the for loop parallelizing but the docs says we normally do not need to use them.