I've been in data analysis for a few years -- grew from scientific research in S...

duped · on March 4, 2022

I don't think it's incompetence so much as professional ignorance (which some might call incompetence, though I don't think it is). Source control is for your source code, not for your data. Your data belongs in a database. Git is not a database, or at least shouldn't be treated like one.

Sure, it's easy to call it lazy when a data set sits in some local directory and accidentally gets committed. Happens to all of us. The bigger problem is: why is that data sitting in a directory on your file system when it should be in some database, preferably not a local one?

> these tools have great power that should be used with greater training, responsibility, and care.

This screams more and more that the tools are bad. Git is famously hard to use, and even harder with non-plaintext data. Databases are annoying to initialize and get access to without a developer who's done it before. The tools suck; they can be better and require less training. It's not wrong to be lazy - it's wrong to make the lazy path dangerous.

Karellen · on March 4, 2022

> Git is not a database

Um, yeah, it is - by most reasonable definitions of the word "database".

No doubt there are a few unusually-narrow definitions of "database" out there that would exclude Git, but I'm pretty certain they're in the minority.
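
To make that a bit more concrete, here's a rough sketch of the sense in which Git's object store behaves like a content-addressed key-value store. It assumes the default SHA-1 object format (repositories can also be created with SHA-256), and `ToyObjectStore` is a made-up in-memory stand-in for illustration, not anything Git actually ships.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob (SHA-1 object format assumed)."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory store: keys are derived from the content itself."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = git_blob_id(content)
        self._objects[key] = content  # identical content always maps to the same key
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


if __name__ == "__main__":
    store = ToyObjectStore()
    key = store.put(b"hello world\n")
    print(key)
    print(store.get(key))
```

The key printed here should match what `git hash-object` reports for a file with the same bytes. Whether "keyed storage with lookups" is enough to count as a database is, of course, exactly the disagreement in this thread.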

duped · on March 4, 2022

I'm not talking about pedantry, but pragmatism. Git is not designed to be used as a conventional database, and should not be.