
I presume you're already familiar with computing the numerical and analytical Jacobian[1][2] and are just wishing for a better way? :) They're memory intensive as all hell and pretty finicky, but at least it's something. I'll admit that when floating point calculations are involved it can all go to hell anyway.
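
For anyone who hasn't done this before, here's a minimal sketch of the idea in plain NumPy (function and variable names are my own, just for illustration): build the numerical Jacobian with central differences and compare it against the analytical one.

    import numpy as np

    def numerical_jacobian(f, x, eps=1e-6):
        # Central-difference Jacobian of f at x; needs 2*n extra evaluations of f.
        x = np.asarray(x, dtype=np.float64)
        y = np.asarray(f(x))
        J = np.zeros((y.size, x.size))
        for i in range(x.size):
            dx = np.zeros_like(x)
            dx.flat[i] = eps
            J[:, i] = (np.asarray(f(x + dx)) - np.asarray(f(x - dx))).ravel() / (2 * eps)
        return J

    # Example: f(x) = tanh(W @ x), whose analytical Jacobian is diag(1 - tanh^2) @ W.
    W = np.random.randn(3, 4)
    f = lambda x: np.tanh(W @ x)
    x0 = np.random.randn(4)
    J_analytic = np.diag(1 - np.tanh(W @ x0) ** 2) @ W
    J_numeric = numerical_jacobian(f, x0)
    assert np.allclose(J_analytic, J_numeric, atol=1e-5)

The memory pain the parent mentions comes from that J matrix: it's (output size) x (input size), which blows up fast for real layers.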

I recently had to implement gradient calculations by hand (writing custom CUDA code) and had a pretty terrible time. Mixing the complications of CUDA code with my iffy manual differentiation and floating point silliness can drive you a little bonkers. I ended up implementing a slow, automatically differentiated version and comparing the resulting outputs and gradients to help work through my bugs.

Here's hoping that TensorFlow's XLA and other JIT-style CUDA compilers/optimizers will make much of this obsolete in the near future.

For those not familiar, the overhead of launching a CUDA kernel can be insanely high, especially when you're just doing an elementwise operation such as an add. Given that your neural network likely has many of these, fusing several of them into one small piece of custom CUDA can result in substantial speed increases. Unfortunately there's not really any automatic way of doing that yet. We're stuck in the equivalent of either writing manual assembly or being fine with suboptimal compiled C.
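
As a concrete illustration (not from the parent comment), here's one way to hand-fuse a chain of elementwise ops using CuPy's ElementwiseKernel: scale, bias, and ReLU become a single kernel launch instead of three launches plus temporaries. The kernel name and values are made up for the example.

    import cupy as cp

    # y = max(x * scale + bias, 0), written as one small piece of custom CUDA
    # instead of three separate elementwise kernels.
    fused_scale_bias_relu = cp.ElementwiseKernel(
        'float32 x, float32 scale, float32 bias',   # inputs
        'float32 y',                                # output
        'y = fmaxf(x * scale + bias, 0.0f)',        # CUDA body, run once per element
        'fused_scale_bias_relu')

    x = cp.random.randn(1 << 20, dtype=cp.float32)
    y = fused_scale_bias_relu(x, 2.0, 0.5)          # one launch, no temporaries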

[1]: https://www.tensorflow.org/versions/r0.11/api_docs/python/te...

[2]: https://github.com/pytorch/pytorch/blob/master/torch/autogra...




We spent a ton of time thinking about this. We have an "op executioner" in our tensor library that handles special cases like this. We call it "grid execution": we look for opportunities to group ops automatically. We will be combining that with our new computation graph to spot more optimization opportunities like that.

Right now we hand write all of our own gradients as well.

The overhead can come from a ton of different places. This is why we wrote workspaces: http://deeplearning4j.org/workspaces

Allocation reduction and op grouping are only two of the things you can do.
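
(Not DL4J's actual workspace API — see the link above for that — but a minimal NumPy sketch of the allocation-reduction idea: allocate the scratch buffer once and have every iteration write into it instead of creating fresh arrays.)

    import numpy as np

    x = np.random.randn(1024, 1024)
    b = np.random.randn(1024, 1024)
    out = np.empty_like(x)             # "workspace"-style buffer, allocated once

    for _ in range(100):               # e.g. one hundred training iterations
        np.add(x, b, out=out)          # writes into the reused buffer
        np.maximum(out, 0.0, out=out)  # ReLU in place, still no new allocations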


I did not know about gradcheck. Thanks for the pointer! I have some handwritten code that does some of this for me. But essentially, yes! I want better tooling to catch my mistakes.
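
For anyone else landing here later, torch.autograd.gradcheck compares your backward pass against a finite-difference Jacobian. A minimal usage sketch (my_op is just a stand-in; double precision matters because the check uses tiny perturbations):

    import torch
    from torch.autograd import gradcheck

    def my_op(x, w):                   # stand-in for a custom op
        return (x @ w).tanh()

    x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
    w = torch.randn(3, 2, dtype=torch.double, requires_grad=True)

    # Raises an error if analytical and numerical gradients disagree,
    # otherwise returns True.
    print(gradcheck(my_op, (x, w), eps=1e-6, atol=1e-4))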



