I work with large-scale neuroscience imaging, and this is exactly how we compress 3D image stacks (i.e. 3D volumes) captured with confocal microscopes. Since adjacent frames are usually quite similar, there's a ton of redundancy that H.265 can exploit, and the compression ratios are amazing. For multi-channel volumetric imaging, we use ffmpeg to encode each channel as a movie and then combine all the channels into a single HDF5 file.
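For anyone who wants to try the per-channel encoding step, here is a rough Python sketch that only builds the ffmpeg command lines. It is not the exact pipeline described above: the 8-bit grayscale input, the 512x512 slice size, the CRF value, the frame rate, and the `channel_N.mp4` file names are all assumptions made for illustration, and it presumes an ffmpeg build that includes libx265.

```python
import shlex

def hevc_encode_cmd(width, height, out_path, crf=20, fps=25):
    """Build an ffmpeg command that reads raw 8-bit grayscale z-slices on
    stdin and writes them as an H.265 movie, one frame per slice.

    All parameter values are illustrative assumptions, not settings taken
    from the pipeline described in the thread.
    """
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo",                      # headerless pixel data on stdin
        "-pixel_format", "gray",               # assumed: 8-bit, single channel
        "-video_size", f"{width}x{height}",    # one frame = one z-slice
        "-framerate", str(fps),                # nominal; only affects timestamps
        "-i", "-",                             # read the frames from stdin
        "-c:v", "libx265",                     # H.265 encoder (lossy by default)
        "-crf", str(crf),                      # quality knob: lower = better
        out_path,
    ]

# One command per channel (hypothetical file names).
cmds = {ch: hevc_encode_cmd(512, 512, f"channel_{ch}.mp4") for ch in range(3)}
print(shlex.join(cmds[0]))
```

Each command could be run with `subprocess.run(cmd, input=raw_bytes)`, where `raw_bytes` is the channel's slices concatenated in z order. How the encoded movies are then laid out inside the HDF5 file isn't stated above; storing each one as an opaque byte array would be one option.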
Oh wow, this is a great idea. How do you deal with the lossy compression? There must be a lossless codec which uses the redundancy better than deflate?
This format is meant for visualization in 3D, and even though it's lossy, it's "visually lossless" for humans. We start with the archived lossless stacks (compressed with bz2) for any reprocessing.
What you're suggesting with a lossless movie codec would be a great addition; we just haven't had the need for it yet.
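To give a feel for why predicting each slice from its neighbour can beat plain deflate without losing anything, here is a toy Python sketch. It is only an illustration on synthetic data: the stack is made up (a random first slice, then each slice equal to the previous one plus noise of at most ±1 per pixel), and slice-to-slice differencing is a much cruder predictor than what a real video codec does.

```python
import random
import zlib

def make_stack(n_slices=16, slice_bytes=4096, seed=0):
    """Synthetic 8-bit stack: each slice is the previous one plus -1/0/+1 noise."""
    rng = random.Random(seed)
    stack = [bytes(rng.randrange(256) for _ in range(slice_bytes))]
    for _ in range(n_slices - 1):
        prev = stack[-1]
        stack.append(bytes((p + rng.choice((-1, 0, 1))) % 256 for p in prev))
    return stack

def delta_encode(stack):
    """Keep the first slice; store every later slice as (current - previous) mod 256."""
    out = [stack[0]]
    for prev, cur in zip(stack, stack[1:]):
        out.append(bytes((c - p) % 256 for p, c in zip(prev, cur)))
    return out

def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences slice by slice."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(bytes((p + x) % 256 for p, x in zip(out[-1], d)))
    return out

stack = make_stack()
deltas = delta_encode(stack)

plain_size = len(zlib.compress(b"".join(stack), 9))   # deflate on the raw stack
delta_size = len(zlib.compress(b"".join(deltas), 9))  # deflate after differencing

print("raw bytes:", sum(len(s) for s in stack))
print("deflate only:", plain_size)
print("delta + deflate:", delta_size)
```

The differenced slices contain only three distinct byte values, so deflate's entropy coding has far more to work with than it does on the raw pixels, and `delta_decode` recovers the original stack exactly.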
I love the idea that you rotate a spatial dimension into a time dimension, and after decompression you get artifacts from the time dimension visualised in a space dimension again.