Containers are meant to be stateless infrastructure; by downloading something at startup, you're implicitly breaking that contract. On top of that, depending on where you're deploying, downloading from S3 (and then loading into memory) may take a non-negligible amount of time, which can impact the availability of your pods (again, depending on their configuration).
Making everything synchronous may cause request loss if your ML pipeline is not fully reliable, which in most cases it isn't. Relying on a message queuing system also improves observability, because it's easier to expose metrics, log requests, take advantage of time travel for debugging, and so on.
> Containers are meant to be stateless infrastructure; by downloading something at startup, you're implicitly breaking that contract.
I feel that mounting an NFS partition is a similar break of contract, i.e. you could see the same image behave differently depending on what's in the NFS partition. To get data in a "reproducible" way, you need to pull it from a data versioning system, and there are different ways to implement data versioning, each with its own trade-offs. NFS and S3, among others, could be used to implement it (e.g. by pulling a pinned, versioned S3 key at startup, as sketched below).
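For the S3 route, here's a minimal sketch of what pulling a pinned version at startup could look like with boto3 (the bucket name, key layout, and the MODEL_VERSION environment variable are all made up for illustration). The idea is that the same image plus the same version string always yields the same bytes:

```python
# Hypothetical startup hook: fetch an explicitly pinned artifact version from
# S3, so the container's behaviour is determined by its deploy config rather
# than by whatever happens to be "latest" in the bucket.
import os

import boto3

BUCKET = "my-ml-artifacts"                     # assumed bucket name
VERSION = os.environ["MODEL_VERSION"]          # pinned per deploy, e.g. "2024-05-01-rc2"
KEY = f"models/sentiment/{VERSION}/model.bin"  # assumed key layout


def fetch_pinned_model(dest: str = "/tmp/model.bin") -> str:
    """Download one immutable, versioned object; never an unpinned 'latest'."""
    s3 = boto3.client("s3")
    s3.download_file(BUCKET, KEY, dest)
    return dest
```

You still pay the startup download cost mentioned above, but at least the behaviour of a given image is fully determined by its configuration.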
I agree with you that, in theory, an NFS mount is more performant because it allows you to load data lazily.
This is mainly relevant if your data is used for training.
It seems like you'd want to use a log-based system like Kafka to manage versioning and state in this case. I imagine you could:
1. Store incoming training data in a "raw data" topic.
2. A model trainer consumes incoming training data, updates the model's state, and at a pre-determined interval writes the model's state, as of a given offset in the "raw data" topic, to a "model state checkpoint" topic.
3. Then you probably have some "regression testing" workflow that reads from the "model state checkpoint" topic and upon success writes to a "latest best model" topic.
4. Workers that use the model in production read from the "latest best model" topic and update their state upon a change.
I imagine you could add constraints about "model" continuity or gradual release to production that would make the process more complex, but I feel like, fundamentally, Kafka solves a lot of the distributed systems problems here. A rough sketch of steps 2-4 is below.
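Here's what that could look like with kafka-python (the topic names, checkpoint cadence, pickled model state, and the `partial_fit` / `passes_regression_tests` hooks are all invented for illustration):

```python
# Sketch of a Kafka-backed train -> checkpoint -> promote -> serve loop.
import json
import pickle

from kafka import KafkaConsumer, KafkaProducer

BROKERS = "localhost:9092"                   # assumed broker address
RAW_TOPIC = "raw-data"                       # step 1: incoming training data
CHECKPOINT_TOPIC = "model-state-checkpoint"  # step 2 output
BEST_TOPIC = "latest-best-model"             # step 3 output
CHECKPOINT_EVERY = 10_000                    # records between checkpoints


def train_loop(model):
    """Step 2: consume raw data, update the model, checkpoint periodically."""
    consumer = KafkaConsumer(
        RAW_TOPIC,
        bootstrap_servers=BROKERS,
        group_id="model-trainer",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(bootstrap_servers=BROKERS)
    seen = 0
    for record in consumer:
        model.partial_fit(record.value)  # hypothetical incremental update
        seen += 1
        if seen % CHECKPOINT_EVERY == 0:
            # The checkpoint records the raw-data offset it was trained up to,
            # so anyone can tell exactly which data produced this state.
            checkpoint = {
                "partition": record.partition,
                "raw_offset": record.offset,
                "state": pickle.dumps(model).hex(),
            }
            producer.send(CHECKPOINT_TOPIC, json.dumps(checkpoint).encode("utf-8"))
            producer.flush()


def promote_loop(passes_regression_tests):
    """Step 3: read checkpoints, run the test suite, republish the winners."""
    consumer = KafkaConsumer(
        CHECKPOINT_TOPIC,
        bootstrap_servers=BROKERS,
        group_id="regression-tester",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(bootstrap_servers=BROKERS)
    for record in consumer:
        if passes_regression_tests(record.value):  # user-supplied test suite
            producer.send(BEST_TOPIC, json.dumps(record.value).encode("utf-8"))
            producer.flush()


def serve_loop():
    """Step 4: workers tail latest-best-model and hot-swap their local copy."""
    consumer = KafkaConsumer(
        BEST_TOPIC,
        bootstrap_servers=BROKERS,
        auto_offset_reset="earliest",  # replay so a fresh worker finds the newest model
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    current_model = None
    for record in consumer:
        current_model = pickle.loads(bytes.fromhex(record.value["state"]))
        # ... serve requests with current_model ...
```

In practice you'd probably also want keyed messages plus log compaction on the two model topics, so new workers can bootstrap from the most recent state without replaying everything, and a more compact serialization than hex-encoded pickle.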