You could just use LoopVectorization.jl on the CPU side. It's been shown to match well-tuned BLAS implementations, for example in the pure-Julia Gaius.jl (https://github.com/MasonProtter/Gaius.jl), so you can follow that as an example for getting BLAS-speed CPU kernels. On the GPU side there are CUDAnative.jl and KernelAbstractions.jl, and benchmarks from NVIDIA show that CUDAnative at least rivals writing CUDA C directly (https://devblogs.nvidia.com/gpu-computing-julia-programming-...), so you won't be missing anything by learning Julia and sticking with it for researching new kernel implementations.
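To give a sense of what that looks like, here's a minimal sketch of a vectorized matmul kernel in the spirit of Gaius.jl (not its actual implementation; the macro is `@avx` in older LoopVectorization releases and `@turbo` in newer ones):

    using LoopVectorization

    # Sketch of a BLAS-style matmul kernel: LoopVectorization analyzes the
    # loop nest and emits SIMD/unrolled code for it.
    function mymul!(C, A, B)
        @avx for m in axes(A, 1), n in axes(B, 2)
            Cmn = zero(eltype(C))
            for k in axes(A, 2)
                Cmn += A[m, k] * B[k, n]
            end
            C[m, n] = Cmn
        end
        return C
    end

    A = rand(200, 200); B = rand(200, 200)
    mymul!(similar(A), A, B)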
In that benchmark, was Julia tested against CuDNN-accelerated neural network CUDA code? If not, is it possible (and beneficial) to call CuDNN functions from Julia?
That wasn't a benchmark against CuDNN, since it was a benchmark about writing such kernels yourself. However, Julia libraries call into optimized kernels whenever they exist, and things like NNlib.jl (the backbone of Flux.jl) and Knet.jl expose operations like `conv` that dispatch on CuArrays to automatically use CuDNN.
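For example, a minimal sketch assuming the CuArrays.jl-era stack (the same idea carries over to the newer CUDA.jl package):

    using NNlib, CuArrays  # CuArrays supplies the CuDNN-backed methods for NNlib

    # NNlib uses (width, height, channels, batch) layout; the kernel is
    # (width, height, in_channels, out_channels).
    x = cu(rand(Float32, 28, 28, 3, 16))  # batch of 16 three-channel images
    w = cu(rand(Float32, 3, 3, 3, 8))     # 8 filters of size 3x3
    y = NNlib.conv(x, w; pad = 1)         # hits the CuDNN path because x and w are CuArrays

The same `conv` call on plain CPU arrays just runs the CPU implementation, so the code you write doesn't change, only the array type does.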