
Thanks for the tip. I'll see where I can apply it.

Most of the time is spent in the convolution layers. Convolution is not implemented as a matrix multiplication in the current implementation. I guess it could be done as a multiplication in the frequency domain, or as a matrix multiplication by unrolling the input into a Toeplitz matrix.
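For reference, here's a rough, untested C# sketch (not ConvNetSharp's actual code) of how a 2D convolution can be unrolled into a plain matrix/vector multiplication via an im2col-style Toeplitz layout. Single channel, stride 1, no padding, and all the names are illustrative assumptions on my part:

    // Rough sketch only: each output position becomes one row of the unrolled patch matrix.
    static double[,] Im2Col(double[,] input, int k)
    {
        int h = input.GetLength(0), w = input.GetLength(1);
        int outH = h - k + 1, outW = w - k + 1;
        var cols = new double[outH * outW, k * k];
        for (int y = 0; y < outH; y++)
            for (int x = 0; x < outW; x++)
                for (int ky = 0; ky < k; ky++)
                    for (int kx = 0; kx < k; kx++)
                        cols[y * outW + x, ky * k + kx] = input[y + ky, x + kx];
        return cols;
    }

    // The convolution itself then reduces to cols * flattenedKernel,
    // i.e. an ordinary dense multiplication.
    static double[] Convolve(double[,] input, double[,] kernel)
    {
        int k = kernel.GetLength(0);
        var cols = Im2Col(input, k);
        var output = new double[cols.GetLength(0)];
        for (int r = 0; r < output.Length; r++)
            for (int c = 0; c < k * k; c++)
                output[r] += cols[r, c] * kernel[c / k, c % k];
        return output;
    }

Once the patches are unrolled like this, the heavy lifting is a dense multiplication, which is exactly what optimised BLAS-style routines are good at.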

I've implemented a parallel CPU version and had a go at a GPU implementation, but I'm not at all satisfied with the GPU version :)

https://github.com/cbovar/ConvNetSharp/blob/master/src/ConvN...

https://github.com/cbovar/ConvNetSharp/blob/Gpu/src/ConvNetS...

Pull requests more than welcome!




> Convolution is not a matrix multiplication in the current implementation

I figure there's a code re-organisation task here, since propagating node activations through a layer of weights is essentially a matrix multiplication (a fully connected layer gives a fully dense matrix).

The optimised routines make use of vectorised CPU instructions and FMA (fused multiply-add), both of which are perfect fits for [dense] matrix multiplication. They're not as effective for sparse matrices, although they still help; unless the matrix is very sparse, it's usually faster to use a dense format with zeros in place of the missing weights.
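To make that concrete, here's an illustrative C# sketch (not taken from ConvNetSharp) of a fully connected layer's forward pass as a dense matrix-vector product, where "missing" weights are simply stored as 0.0 so the same contiguous, FMA-friendly inner loop covers both the fully connected and the sparsified case:

    // Illustrative only: dense forward pass for a fully connected layer.
    static double[] Forward(double[,] weights, double[] activations, double[] biases)
    {
        int outputs = weights.GetLength(0), inputs = weights.GetLength(1);
        var result = new double[outputs];
        for (int o = 0; o < outputs; o++)
        {
            double sum = biases[o];
            for (int i = 0; i < inputs; i++)
                sum += weights[o, i] * activations[i]; // contiguous multiply-add, FMA candidate
            result[o] = sum;
        }
        return result;
    }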

> Pull requests more than welcome!

Duly noted :)



