Anyone who needs to interact with the source code needs access to the plaintext version (employees, contractors, CodeClimate/CircleCI/Atlassian/Slack/etc type vendors, etc, all retain access), and people who don't need to interact with the source code should have their access removed in the first place.
This only protects you against a malicious/compromised hosting provider, but usually the hosting provider does more than just hosting: they have their own CI/CD features which need access to the code. If you don't want the hosting provider to have access, then you might be better off self-hosting GitLab rather than dealing with restrictions like remotes only having one branch.
The people who don’t know this (i.e. target audience) will think otherwise. Correcting others that are making a point can be counterproductive. To reach those under the spell of fantasy, speak the language of fantasy.
Perhaps you can give me an example of "speak the language of fantasy". While I can see your point, I can't really see how to put it into practice.
Edit: I didn't realise you were the person I responded to. I see what you mean, but I still think it's an important thing to discuss. It just won't be relevant to those people, which I can live with.
Better to know for certain (up to confidence and security in the encryption/authentication and key management scheme(s)) than to have faith in the unknown processes and systems of a third party. However, as always, it's all about designing, implementing, and operating a good engineering solution to whatever threat model you have and use.
i mean, we shouldn’t kid. a lot of kids, with excellent salaries, have access. i’m sure they are a varied bunch. their disposition and competencies are unknowable.
you pretty much nailed it. this treats aws as an untrusted provider. i version almost everything with git, and there are three buckets something might go in:
- if it's public, it's on github.
- if it's a private company project, it probably already has a home, on github or some other trusted commercial provider with cicd and all the trimmings.
- everything else goes here, as encrypted git bundles on s3.
i previously used git-remote-gcrypt for a long time. recently i've been thinking about how i wish i worked with git, cicd, and infra. this is how i'm going to be working with that third bucket from now on.
the first bucket is fine, aside from lack of sha256 support.
I haven't looked at the project being discussed, but assuming it stores your code encrypted in some hosted git, one use case I can think of is protecting the code from the hosting platform. E.g., Microsoft's ToS for GitHub allows it to scan/read your code for various kinds of analysis, which some may not want given their history of abuse. Encrypted git can prevent such things. And of course, if one of the BigTech companies is going to be your competitor, it makes sense to store your data encrypted on their platforms.
If I may, I made one using restic that seems much easier to use [0]. You only need a dumb storage host and no database. Restic takes care of the indexing and we use snapshots just like commits.
Great work. Is it possible to reconfigure this to use existing tools like rclone, which can save encrypted files across multiple remotes? This way I can use my Google Drive or Dropbox for storage instead of needing to use S3.
Something about the idea of being able to back things up on multiple remote target storage services sounds extra compelling. Simple encryption with robust backup for e.g. text notes.
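For reference, the rclone side of that would be a `crypt` remote layered over any backend. A hypothetical `rclone.conf` fragment (remote names are placeholders, and the drive remote's auth/token fields are omitted):

```ini
# hypothetical rclone.conf: encrypt client-side, store on Google Drive
[gdrive]
type = drive
scope = drive.file

[notes-crypt]
type = crypt
remote = gdrive:notes
filename_encryption = standard
directory_name_encryption = true
password = <output of rclone obscure>
```

`rclone sync ~/notes notes-crypt:` would then upload only ciphertext; a second crypt remote pointing at Dropbox gives the multi-remote backup described above.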
it might be worth making dynamodb optional, so all s3 compatible remotes can be used. i made it non-optional to simplify implementation and documentation. there are no knobs, no conditional semantics.
Well, technically with a bit of work you can use anything self-hosted that is S3 compatible.
Personally, I rather like:
- https://www.zenko.io/
- https://min.io/
Even if I don't really buy into AWS too much (apart from enterprise stuff), it's good that they went ahead and created a standard for blob storage that other implementations could also benefit from, due to the compatibility with various libraries etc.
Of course, there's also reliance on DynamoDB for this project, though that could also probably be swapped out for something else.
If you don't like S3 or DynamoDB then it'd be easy enough to sub them out for other things.
Many cloud providers provide an S3-Compatible object-store protocol.
There are also open source projects for self-hosting your own S3-compatible object storage.
Swapping out DynamoDB for (say) Redis would also be fairly easy.
The source provided is a few hundred lines of pretty readable Golang.
While I'm not generally a fan of "If you don't like it, submit a PR to fix it" type responses - as a short example of how to implement something like this, it's a pretty decent starting point.
This is pretty cool as a new way to unbundle/change how people host git. While it's pretty AWS-heavy now, with some interfaces around the interaction I can easily imagine this working with other providers (e.g. Backblaze) and shimming out the coordination pieces (currently Dynamo).
> thanks! git-remote-gcrypt has rclone support, which provides all possible backends.
Oh that's awesome, I didn't catch the rclone support while skimming the README. rclone is such an excellent piece of software.
> the addition here is a second object store with compare-and-swap semantics making it possible for multiple writers to collaborate safely.
I think I must have misunderstood this point too -- the compare-and-swap semantics that you want are against DynamoDB, right, with raw/"dumb" storage on S3?
If I could request one thing, it would be some extra interface files with "drivers" for the two concerns currently handled by main.go: the locking mechanism and the raw storage mechanism.
Distilling the interface to these components would make it so easy for someone to come along and implement replacements! For example, one might like to see SQLite, FoundationDB, etc as metadata/synchronization drivers.
If there's a clear interface then it's easy to say "sure, send a commit with the implementation and we'll consider including it!", and leave implementation up to people who want the feature.
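A sketch of what that split might look like, with names that are purely illustrative (not the project's actual API):

```go
package main

import "fmt"

// BlobStore is the raw storage concern (S3 in the current code).
type BlobStore interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

// RefStore is the locking/metadata concern (DynamoDB in the current
// code). CompareAndSwap must be atomic: it advances the head pointer
// only if no other writer got there first.
type RefStore interface {
	Head() (string, error)
	CompareAndSwap(old, next string) (bool, error)
}

// memRefStore is a toy in-memory driver, handy for tests and as a
// template for SQLite/FoundationDB/Redis implementations.
type memRefStore struct{ head string }

func (m *memRefStore) Head() (string, error) { return m.head, nil }

func (m *memRefStore) CompareAndSwap(old, next string) (bool, error) {
	if m.head != old {
		return false, nil // a concurrent writer won; caller must refetch and retry
	}
	m.head = next
	return true, nil
}

func main() {
	var refs RefStore = &memRefStore{head: "bundle-001"}
	ok, _ := refs.CompareAndSwap("bundle-001", "bundle-002")
	stale, _ := refs.CompareAndSwap("bundle-001", "bundle-003")
	fmt.Println(ok, stale) // prints "true false"
}
```

With interfaces like this, a new metadata driver only has to satisfy `RefStore`; the atomic compare-and-swap is what keeps concurrent writers safe.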
> many infra providers have object storage with those semantics, my preference is aws.
Yeah, I bet this would work out of the box against other infra providers as well? S3 has basically become the de facto API everyone chases anyway.
Got also does E2E encryption, and it can additionally hide branch names from remote servers.
It also handles syncing large files and directories better thanks to an improved data structure.
I still don't have a clear understanding of when you would need such a thing.
This does not help the people who want to use git for video hosting, since it puts the storage and usage costs back onto a person's credit card again (though if you come into a line of credit it may be useful).
Hosting on AWS like this also does not match the dark-web requirements of hiding in plain sight and not being easily killable. Spreading this over multiple AWS accounts and S3 buckets alone would need some form of STS cross-account linking for the push permissions to be granted.
Can you provide a business use case or scenario where this would be usable? Otherwise it will continue to have the smell of 'resume-driven development'.
Blocking force push would also suggest that branch deletion does not work either. Is that the case?
branch deletion also does not work. the data model is very simple, a single chain of git bundles. s3 policy could enforce a variety of data models, such as inability to delete objects.
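The chain-of-bundles model can be sketched with plain git (local paths here stand in for what the tool stores in S3):

```shell
# one repo, two commits, shipped as a chain of two bundles
git init -q -b main repo
cd repo
git -c user.email=a@example.com -c user.name=a commit -q --allow-empty -m "c1"
git bundle create ../full.bundle main

# the next "push" only carries what happened since the last bundle
git tag last main
git -c user.email=a@example.com -c user.name=a commit -q --allow-empty -m "c2"
git bundle create ../incr.bundle last..main
cd ..

# restore elsewhere by fetching the chain in order
git init -q -b tmp restore
git -C restore fetch -q ../full.bundle main:main
git -C restore fetch -q ../incr.bundle main:main
```

An S3 bucket policy that forbids object deletion then makes the chain effectively append-only, which is what rules out force pushes and branch deletion.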
there is no business use case for this, unless your threat model needs an untrusted git provider and you don't have another way to enable that.
perhaps you could store your dotfiles or other backups with this.
This stores blobs in S3. So this is about not trusting hard drives which you have rented. Whether or not that makes sense depends on your threat model, but it seems reasonable to me that there are people who would find this useful.
The S3 in question may be not on AWS, but on any of the S3-compatible providers, or a non-public S3 store (in your company, university, your friend's NAS which you use as an extra backup, etc).
I feel like once web browsers and hardware devices start to implement authentication, the UX and accessibility of these sorts of tools could make them a default choice for user data.
The hard part is solving the "lost my yubikey" UX issue but I suspect Apple will reach a reasonable solution that finds an OK balance of convenience and user-authenticating security.
untrusted hosts open up a bunch of interesting system designs. i’m mostly thinking about these recently. trusted hosts have good use cases, but shouldn’t be used otherwise. trust is hard.
"lost my yubikey" is somewhat covered by the popularity of crypto. recommend users back up fido2 secrets as bip39 mnemonics.
ios and android will hopefully help popularize fido2.
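As a sketch of that encoding (a toy, not a vetted implementation): BIP39 appends a SHA-256-derived checksum to the entropy and splits the result into 11-bit word indices; a real implementation maps each index into the standard 2048-word English list, which is omitted here.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// wordIndices derives BIP39 word indices from entropy: append the first
// len(entropy)*8/32 bits of SHA-256(entropy) as a checksum, then split
// the whole bit string into 11-bit groups. This sketch stops at the
// indices rather than looking up words.
func wordIndices(entropy []byte) []int {
	sum := sha256.Sum256(entropy)
	entBits := len(entropy) * 8
	nBits := entBits + entBits/32
	bit := func(i int) int {
		if i < entBits {
			return int(entropy[i/8]>>(7-i%8)) & 1
		}
		j := i - entBits
		return int(sum[j/8]>>(7-j%8)) & 1
	}
	idx := make([]int, 0, nBits/11)
	acc, accBits := 0, 0
	for i := 0; i < nBits; i++ {
		acc = acc<<1 | bit(i)
		accBits++
		if accBits == 11 {
			idx = append(idx, acc)
			acc, accBits = 0, 0
		}
	}
	return idx
}

func main() {
	entropy := make([]byte, 16) // 128 bits -> 12 words
	idx := wordIndices(entropy)
	fmt.Println(len(idx), idx[0]) // prints "12 0"
}
```

32 bytes of FIDO2 seed material would come out as 24 such indices; writing the corresponding words down is the paper backup.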
For years, across two different jobs, we just had bare repos on Linux servers: we used git's built-in ssh support and we liked it. Worked great. We eventually wanted a pull-request workflow and moved to GitHub, but self-hosted is fine.
All you really need is a server your developers can SSH into with a shared directory. $2 a month Vultr server and you're golden.
You want to get real spicy? You can do this without a central server at all, the way God and Linus intended: git was designed to be decentralized. You don't even need a shared server; you can pull straight from each other's repos if you give each other SSH access.
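A minimal version of that setup, with local paths standing in for the ssh URLs (over ssh the remote would be something like user@host:/srv/git/project.git):

```shell
# "server": a bare repo in any directory the team can reach
git init -q --bare -b main shared.git

# each developer clones it; pushing goes over plain ssh, no daemon
git clone -q shared.git alice
git -C alice -c user.email=alice@example.com -c user.name=alice \
    commit -q --allow-empty -m "first"
git -C alice push -q origin HEAD:refs/heads/main

# a second clone sees the commit -- or skip the server entirely and
# run `git -C bob pull ../alice main` straight from a colleague's clone
git clone -q shared.git bob
```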
The only thing that's going to remain resident after a push/pull is os level file system cache. There's no daemon when operating git over ssh.
Are container escapes on major hosting services common enough to even worry about? I don't hear much about them.
Or are you concerned about AWS/Vultr/Digital Ocean stealing your code? That seems like paranoia. They have world class developers, best in the industry. They don't want your code. This isn't the movie Antitrust.
> All you really need is a server your developers can SSH into with a shared directory. $2 a month Vultr server and you're golden.
which addresses exactly none of what this project is intended to address, since there’s no end-to-end encryption. (yes, if you don’t need end-to-end encryption on your git repo, this is too complicated for your use case.)