
We considered using something like this to cache some Python program state to speed up startup, since startup was quite long for some of our scripts (due to slow NFS, but also to importing lots of libs, like PyTorch or TensorFlow). We wanted to snapshot the program state right after importing the modules and loading some static stuff, before executing the actual script or doing anything else dynamic. That way the script can still be updated while keeping the same preloaded state.

Back then, CRIU turned out not to be an option for us. E.g. one of the problems was that it could not be used as non-root (https://github.com/checkpoint-restore/criu/pull/1930). I see that this PR has been merged now, so maybe this works today? Not sure if there are other issues.

We also considered DMTCP (https://github.com/dmtcp/dmtcp/) as another alternative to CRIU, but that had other issues (I don't remember).

The solution I ended up with was to implement a fork server. A server process starts initially, preloads only the modules (and maybe other things), and then waits. Whenever I want to execute some script, I fork from the server and use the forked process right away. I used logic similar to reptyr (https://github.com/nelhage/reptyr) to redirect the PTY. This worked quite well.
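The core idea can be sketched in a few lines of Python. This is just a minimal illustration of the fork-server pattern (not the actual python-preloaded implementation, which also handles the PTY redirection): heavy imports happen once in the parent, and each forked child inherits the already-loaded modules; here `json` stands in for an expensive import like `torch`, and the pipe-based result passing is an assumption for the sketch.

```python
import os
import json

# Preload phase: heavy imports happen once, in the parent process.
# `json` is a stand-in for something expensive like `import torch`.

def run_task(task_id):
    # Runs inside the forked child; all preloaded modules are inherited,
    # so there is no per-task import cost.
    return {"task": task_id, "pid": os.getpid()}

def serve(tasks):
    """Fork one child per task; collect each child's result via a pipe.
    A real fork server would wait on a socket for requests instead of
    iterating over a fixed task list."""
    results = []
    for task_id in tasks:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:            # child: inherits the preloaded state
            os.close(r)
            with os.fdopen(w, "w") as f:
                json.dump(run_task(task_id), f)
            os._exit(0)
        os.close(w)             # parent: read the child's result
        with os.fdopen(r) as f:
            results.append(json.load(f))
        os.waitpid(pid, 0)      # reap the child
    return results
```

Each child gets a fresh copy-on-write snapshot of the parent's state, which is what makes the startup essentially free.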

https://github.com/albertz/python-preloaded




How were you handling GPU state w/ pytorch? We added some custom code around CRIU to enable GPU checkpointing fwiw: https://docs.cedana.ai/setup/gpu-checkpointing/


Not at all. I forked before using anything with CUDA. I didn't need it, but I guessed it could cause all kinds of weird problems.


This sounds similar to what's been done to speed up FaaS cold starts: snapshot the VM after the startup code runs, then launch functions from the snapshot. E.g., https://www.sysnet.ucsd.edu/~voelker/pubs/faasnap-eurosys22.....



