Hacker News
Saving memory using gradient-checkpointing (github.com/openai)
52 points by stablemap on Jan 15, 2018 | 3 comments



It takes about 4 GB of memory to train a VGG network w/ batch size 8 on TensorFlow. Could I use a larger batch size, at the expense of computation time, w/ this module?


Yes, potentially batch size 64 with roughly a 25% increase in run time, based on the figures they reported.
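That tradeoff can be sketched in a few lines of numpy. This is not the API of the openai/gradient-checkpointing package (which patches TensorFlow's `tf.gradients`); it is a toy stand-in showing the core idea: store only every k-th activation during the forward pass, then recompute the activations inside each segment during the backward pass, so peak activation memory drops from O(N) to roughly O(N/k + k) at the cost of one extra forward pass per segment.

```python
import numpy as np

# Toy chain of N layers: h_{i+1} = tanh(w_i * h_i).
# (Hypothetical layer; chosen only so the derivative is easy to write.)

def forward(h, w):
    return np.tanh(w * h)

def backward(h_in, w, grad_out):
    # With u = tanh(w * h_in): du/dh_in = w * (1 - u^2), du/dw = h_in * (1 - u^2)
    u = np.tanh(w * h_in)
    d = grad_out * (1.0 - u * u)
    return d * w, d * h_in          # grads w.r.t. layer input and weight

def grad_full(x, ws):
    """Standard backprop: stores all N+1 activations."""
    hs = [x]
    for w in ws:
        hs.append(forward(hs[-1], w))
    g, gws = 1.0, []
    for i in range(len(ws) - 1, -1, -1):
        g, gw = backward(hs[i], ws[i], g)
        gws.append(gw)
    return g, gws[::-1]

def grad_checkpointed(x, ws, k):
    """Backprop storing only every k-th activation; activations inside a
    segment are recomputed from its checkpoint during the backward pass."""
    n = len(ws)
    ckpts, h = {0: x}, x
    for i, w in enumerate(ws):                  # forward: keep checkpoints only
        h = forward(h, w)
        if (i + 1) % k == 0 and i + 1 < n:
            ckpts[i + 1] = h
    g, gws = 1.0, [0.0] * n
    for start in sorted(ckpts, reverse=True):   # backward, segment by segment
        end = min(start + k, n)
        hs = [ckpts[start]]                     # recompute this segment's activations
        for i in range(start, end):
            hs.append(forward(hs[-1], ws[i]))
        for i in range(end - 1, start - 1, -1):
            g, gws[i] = backward(hs[i - start], ws[i], g)
    return g, gws
```

Both routines produce identical gradients; the checkpointed one just holds fewer activations at once. In the real package, the checkpoint spacing (roughly sqrt(N) layers apart) is what yields the reported "sublinear memory, ~25% extra compute" behavior.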


I wonder if this would work when using Keras with the TensorFlow backend. Has anyone tried it?



