
I built crik [1] to orchestrate CRIU operations inside a container running in Kubernetes, so that you can migrate containers when a spot node gets a shutdown signal. I presented it at KubeCon Paris 2024 [2] with a deep dive into the technical details for those interested. A rough sketch of the shutdown-to-checkpoint flow is below the links.

[1]: https://github.com/qawolf/crik

[2]: The Party Must Go On - Resume Pods After Spot Instance Shutdown, https://kccnceu2024.sched.com/event/1YeP3
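
Roughly, the core idea looks like this (a simplified sketch, not crik's actual code: the criu flags are real, but the PID handling and the /checkpoint path are illustrative):

    package main

    import (
        "log"
        "os"
        "os/exec"
        "os/signal"
        "syscall"
    )

    func main() {
        // Wait for the node's shutdown signal (spot reclamation
        // typically arrives as SIGTERM via the kubelet).
        sig := make(chan os.Signal, 1)
        signal.Notify(sig, syscall.SIGTERM)
        <-sig

        pid := os.Getenv("APP_PID") // illustrative: PID of the wrapped process

        // Dump the process tree to a directory that outlives the pod,
        // e.g. a persistent volume. --tcp-close drops established TCP
        // connections instead of trying to dump them.
        cmd := exec.Command("criu", "dump",
            "--tree", pid,
            "--images-dir", "/checkpoint",
            "--tcp-close",
            "--shell-job",
        )
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            log.Fatalf("criu dump failed: %v", err)
        }
        // A replacement pod on another node then runs `criu restore`
        // against the same images directory.
    }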




My process connects to, say, Postgres. What's going to happen to that connection upon restore?

Does crik guarantee the order of events? Saving a checkpoint must be followed by killing the old process/pod, which must be followed by restoration; the order of these three events is strict. And given that CRIU can checkpoint and restore socket state correctly, how does that work in Kubernetes? The new pod will have a different IP.


TCP connections are identified by their source IP:port and destination IP:port tuples. When a new pod is created, it gets a new IP, so there is no practical way to restore the old TCP connections. crik therefore drops all TCP connections and lets the application handle the reconnection logic. Some CNIs can give a pod a static IP, but that's rather unorthodox in k8s.
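
The reconnection side is plain retry-with-backoff in the client. A minimal sketch (illustrative, not crik code), assuming Go's database/sql with a Postgres driver:

    package main

    import (
        "database/sql"
        "log"
        "time"

        _ "github.com/lib/pq" // assumed Postgres driver
    )

    func execWithRetry(db *sql.DB, query string) error {
        backoff := 100 * time.Millisecond
        var err error
        for attempt := 1; attempt <= 5; attempt++ {
            if _, err = db.Exec(query); err == nil {
                return nil
            }
            // database/sql re-dials dead pooled connections on the
            // next attempt, so sleep-and-retry is usually enough
            // after a restore drops the old sockets.
            log.Printf("attempt %d failed: %v", attempt, err)
            time.Sleep(backoff)
            backoff *= 2
        }
        return err
    }

    func main() {
        db, err := sql.Open("postgres", "postgres://app@db:5432/app?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        if err := execWithRetry(db, "SELECT 1"); err != nil {
            log.Fatal(err)
        }
    }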


Right, and this shouldn't be a big issue for [competent] cloud-native software: it's a transient fault. If your software can't recover from transient faults then this is the wrong ecosystem to be considering.


> The new pod will have a different IP.

Usually clients connect through a Kubernetes Service so they don't have to deal with changing pod IPs. Even for just a single pod I would do that.


The app in the pod is the client (of a DBMS server), and it's the client's IP that changes. A Service in k8s is a network endpoint with a stable address, but it only helps inbound connections; outbound connections (like from the app to a DBMS server, which may even be outside the k8s cluster) usually don't go through a Service, as it gives no benefit there.


Great talk! I'm curious about an approach like this combined with cuda-checkpoint for GPU workloads: https://github.com/NVIDIA/cuda-checkpoint
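
For context, cuda-checkpoint is designed for exactly that pairing: you toggle the CUDA state out of the GPU into host memory, then let CRIU dump the now CPU-only process. A rough, untested sketch of the sequence (the --toggle/--pid flags come from the cuda-checkpoint README; everything else is illustrative):

    package main

    import (
        "log"
        "os"
        "os/exec"
    )

    func run(name string, args ...string) error {
        cmd := exec.Command(name, args...)
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        return cmd.Run()
    }

    func checkpointGPUProcess(pid string) {
        // 1. Move CUDA state (device memory, contexts, streams) into
        //    host memory and detach the process from the GPU.
        if err := run("cuda-checkpoint", "--toggle", "--pid", pid); err != nil {
            log.Fatalf("cuda-checkpoint failed: %v", err)
        }
        // 2. The process now looks like a plain CPU process, so CRIU
        //    can dump it as usual.
        if err := run("criu", "dump", "--tree", pid,
            "--images-dir", "/checkpoint", "--shell-job"); err != nil {
            log.Fatalf("criu dump failed: %v", err)
        }
        // Restore is the mirror image: criu restore, then toggle the
        // CUDA state back onto a GPU.
    }

    func main() {
        checkpointGPUProcess(os.Args[1])
    }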


This makes sense for checkpointing and restoring long ML training runs.

Doing this on a networked application is going to be iffy. The restored program sees a time jump, and if the restore is from a checkpoint taken before a later crash, the rest of the world sees a replay of actions the restored program already performed once.
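
(A restored process can at least notice the jump. A minimal heartbeat sketch in Go, assuming wall-clock comparison; the thresholds are illustrative:)

    package main

    import (
        "log"
        "time"
    )

    func main() {
        const tick = time.Second
        // Round(0) strips Go's monotonic reading so we compare wall
        // clocks, which is what jumps across a checkpoint/restore.
        last := time.Now().Round(0)
        for range time.Tick(tick) {
            now := time.Now().Round(0)
            if gap := now.Sub(last); gap > 3*tick {
                // We were likely frozen and restored: leases, tokens,
                // and caches from before the gap may be stale.
                log.Printf("wall-clock jump of %v detected", gap)
            }
            last = now
        }
    }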

If you just want to migrate jobs within a cluster, there's Xen live migration.



