I'm having a hard time imagining the positioning here. Can you explain further how Diversion differs from DVC and Git, and when using Diversion makes sense over the alternatives? The GTM is slightly confusing to me. (Also yes, Git is hard. You cannot teach data scientists this; it'll take months.)
Also agreed that Git is terrible right now for version-controlling workflows in AI (I have a fairly large .gitignore file with S3-hosted things even for my NextJS + FastAPI apps - pain in the butt).
The vast majority of version control system uses are not distributed, even if the system itself is (GitHub and BitBucket were born to essentially make Git centralized). An example use case is game studios having repos with very large histories (hundreds of GiBs and more) where the tip is significantly smaller. Having the entire repo history on your local machine might be infeasible, and usually unnecessary. Being able to get just the tip and get the rest via API calls solves this. Having things continuously synced has other benefits, like catching conflicts at the time they happen on files that are hard to merge, such as game scene files, graphics, etc.
AI workflows are definitely a use case we are looking at in the near future. What types of files are you hosting on S3?
> An example use case is game studios having repos with very large histories (hundreds of GiBs and more) where the tip is significantly smaller. Having the entire repo history on your local machine might be infeasible, and usually unnecessary. Being able to get just the tip and get the rest via API calls solves this.
"Being able to get just the tip" is `git clone --depth 1`, isn't it?
And then you lose functionality: for example, `git blame` depends on the history being available locally. If you want a working repository with all the source control features, you need a regular clone. That's where "being able to get the rest via API calls..." kicks in :)
OK so `git clone --filter=blob:none`, then. That downloads the tip and commit history, but no historic blobs. `git blame` then works by downloading missing blobs on demand, which doesn't sound too different to making an API call.
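To make the difference concrete, here is a rough sketch that compares the two clone modes against a throwaway local repo. The repo layout, the `notes.txt` file and the variable names are all made up for illustration, and the two `uploadpack.*` settings are only there because the "server" in this demo is a plain local repo; I'm assuming a reasonably recent Git.

```shell
#!/bin/sh
# Sketch: compare a shallow clone with a blobless partial clone.
# Everything here is a hypothetical demo setup; nothing touches the network.
set -e
work=$(mktemp -d)
cd "$work"

# Build a small "server" repo with three commits to one file.
git init -q origin
cd origin
git config user.email "demo@example.com"
git config user.name "Demo"
# Let clones of this repo request filters and fetch arbitrary objects on demand.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
for i in 1 2 3; do
  echo "line $i" >> notes.txt
  git add notes.txt
  git commit -q -m "commit $i"
done
cd ..

# Shallow clone: only the tip commit; older history is cut off.
# (file:// is used because plain local paths ignore --depth.)
git clone -q --depth 1 "file://$work/origin" shallow
shallow_commits=$(git -C shallow rev-list --count HEAD)

# Blobless partial clone: all commits and trees; historic blobs are fetched lazily.
git clone -q --filter=blob:none "file://$work/origin" partial
partial_commits=$(git -C partial rev-list --count HEAD)

# blame needs historic blobs; the partial clone downloads the missing ones on demand.
blame_lines=$(git -C partial blame notes.txt | wc -l | tr -d ' ')

echo "shallow=$shallow_commits partial=$partial_commits blame_lines=$blame_lines"
```

The shallow clone only sees the tip commit, while the blobless clone sees the full commit history and can still run `git blame`, at the cost of a round trip to the remote whenever an old blob is missing.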