Hacker News
Saving memory using gradient-checkpointing (github.com/openai)
52 points by stablemap on Jan 15, 2018 | 3 comments



It takes about 4 GB of memory to train a VGG network w/ batch size 8 on TensorFlow. Could I use a larger batch size, at the expense of computation time, w/ this module?


Yes, potentially batch size 64 with roughly a 25% increase in run time, based on the figures they reported.
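That tradeoff can be sketched in a few lines of numpy. This is not the API of the openai/gradient-checkpointing package (which patches TensorFlow's `tf.gradients`); it is a toy stand-in showing the core idea: store only every k-th activation during the forward pass, then recompute the activations inside each segment during the backward pass, so peak activation memory drops from O(N) to roughly O(N/k + k) at the cost of one extra forward pass per segment.

```python
import numpy as np

# Toy chain of N layers: h_{i+1} = tanh(w_i * h_i).
# (Hypothetical layer; chosen only so the derivative is easy to write.)

def forward(h, w):
    return np.tanh(w * h)

def backward(h_in, w, grad_out):
    # With u = tanh(w * h_in): du/dh_in = w * (1 - u^2), du/dw = h_in * (1 - u^2)
    u = np.tanh(w * h_in)
    d = grad_out * (1.0 - u * u)
    return d * w, d * h_in          # grads w.r.t. layer input and weight

def grad_full(x, ws):
    """Standard backprop: stores all N+1 activations."""
    hs = [x]
    for w in ws:
        hs.append(forward(hs[-1], w))
    g, gws = 1.0, []
    for i in range(len(ws) - 1, -1, -1):
        g, gw = backward(hs[i], ws[i], g)
        gws.append(gw)
    return g, gws[::-1]

def grad_checkpointed(x, ws, k):
    """Backprop storing only every k-th activation; activations inside a
    segment are recomputed from its checkpoint during the backward pass."""
    n = len(ws)
    ckpts, h = {0: x}, x
    for i, w in enumerate(ws):                  # forward: keep checkpoints only
        h = forward(h, w)
        if (i + 1) % k == 0 and i + 1 < n:
            ckpts[i + 1] = h
    g, gws = 1.0, [0.0] * n
    for start in sorted(ckpts, reverse=True):   # backward, segment by segment
        end = min(start + k, n)
        hs = [ckpts[start]]                     # recompute this segment's activations
        for i in range(start, end):
            hs.append(forward(hs[-1], ws[i]))
        for i in range(end - 1, start - 1, -1):
            g, gws[i] = backward(hs[i - start], ws[i], g)
    return g, gws
```

Both routines produce identical gradients; the checkpointed one just holds fewer activations at once. In the real package, the checkpoint spacing (roughly sqrt(N) layers apart) is what yields the reported "sublinear memory, ~25% extra compute" behavior.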


I wonder if this would work when using Keras with the TensorFlow backend. Has anyone tried it?



