Hacker News
Show HN: Encrypted Git hosting should be easy (github.com/nathants)
99 points by nathants on Aug 31, 2022 | 67 comments



What's the threat model here?

Anyone who needs to interact with the source code needs access to the plaintext version (employees, contractors, CodeClimate/CircleCI/Atlassian/Slack/etc type vendors, etc, all retain access), and people who don't need to interact with the source code should have their access removed in the first place.

This only protects you against a malicious/compromised hosting provider, but usually the hosting provider does more than just hosting, they have their own CI/CD features which need access to code. If you don't want the hosting provider to have access then you might be better off self-hosting Gitlab rather than dealing with restrictions like remotes can only have 1 branch.


> This only protects you against a malicious/compromised hosting provider

In the increasingly large set of countries without absolute freedoms, such a thing is a given for any hosting provider.


> without absolute freedoms

No current country has absolute freedoms, except as a fantasy.


Every country requires pragmatism; in some cases encrypted storage is a good thing.


The people who don’t know this (i.e. the target audience) will think otherwise. Correcting others who are making a point can be counterproductive. To reach those under the spell of fantasy, speak the language of fantasy.


Perhaps you can give me an example of "speak the language of fantasy". While I can see your point, I can't really see how to put it into practice.

Edit: I didn't realise you were the person I responded to. I see what you mean, but I still think it's an important thing to discuss. It just won't be relevant to those people, which I can live with.


even when you can trust your provider, and often you can, not trusting them can be psychologically beneficial.


I mean, you need to be a bit realistic, and not kid yourself about who has access.


Better to know for certain (up to confidence and security in the encryption/authentication and key management scheme(s)) than to have faith in the unknown processes and systems of a third party. However, as always, it's all about designing, implementing, and operating a good engineering solution to whatever threat model you have and use.


i mean, we shouldn’t kid. a lot of kids, with excellent salaries, have access. i’m sure they are a varied bunch. their disposition and competencies are unknowable.


you pretty much nailed it. this treats aws as an untrusted provider. i version almost everything with git, and there are three buckets something might go in:

- if it's public, it's on github.

- if it's a private company project, it probably already has a home, on github or some other trusted commercial provider with cicd and all the trimmings.

- everything else goes here, as encrypted git bundles on s3.

i previously used git-remote-gcrypt for a long time. recently i've been thinking about how i wish i worked with git, cicd, and infra. this is how i'm going to be working with that third bucket from now on.

the first bucket is fine, aside from lock of sha256 support.

the second bucket i'm still thinking about.


lack of sha256


I haven't looked at the project being discussed, but assuming it stores your code encrypted in some hosted git, one use case I can think of is protecting the code from the hosting platform. For example, Microsoft's ToS for GitHub allows it to scan / read your code for various kinds of analysis, which some may not want given their history of abuse. Encrypted git can prevent such things. And of course, if one of the BigTech companies is going to be your competitor, it makes sense to store your data encrypted on their platforms.


this is a good use case.


If I may, I made one using restic that seems much easier to use [0]. You only need a dumb storage host and no database. Restic takes care of the indexing and we use snapshots just like commits.

[0] https://github.com/CGamesPlay/git-remote-restic


this is very cool! i've actually never used restic, and should. i currently backup with git-remote-gcrypt and tar[1].

1. https://github.com/nathants/backup


Much, much better. I don't know why so many devs rely solely on AWS as much as they do.


aws. it’s not that good, but everything else is worse.


The Keybase encrypted Git is just fine [0]. While Keybase still exists at least. Still sad about that one :(

[0] https://book.keybase.io/git


looks like keybase git was implemented as a git remote helper, just like this. it’s up at github.com/keybase/client/kbfsgit.

my takeaway from implementing this is that git remote helpers are easy to write and very flexible.


I've been using this for a while to store personal data, but I was getting increasingly concerned it might be shut down.

I'm happy that there are some alternatives being developed.


I also use it, and although I'm very thankful for the service given that it's free, I wish it were faster (push/pull takes 10+ seconds).


dang. 10 second push/pull is not acceptable.


Great work. Is it possible to reconfigure this to use existing tools like rclone, which can save encrypted files across multiple remotes? That way I can use my Google Drive or Dropbox for storage instead of needing to use S3.


git-remote-gcrypt supports rclone, and all of its backends. i would use that for your use case.

this takes an explicit dependency on aws, though all that is needed is:

- (large) object storage with read-after-write consistency (s3)

- (small) object storage with compare-and-swap (dynamodb)

it would be easy to port this to any provider that provides these two kinds of object storage.

compare-and-swap means that multiple concurrent writers can safely collaborate without risk of force pushing over each other.
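
a rough sketch of how compare-and-swap can look with a dynamodb conditional write. table and attribute names here are invented for illustration, not the actual schema:

  // sketch only: invented table/attribute names, not the real schema.
  package main

  import (
      "context"
      "errors"
      "fmt"

      "github.com/aws/aws-sdk-go-v2/aws"
      "github.com/aws/aws-sdk-go-v2/config"
      "github.com/aws/aws-sdk-go-v2/service/dynamodb"
      "github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
  )

  // swapHead advances the repo head pointer from old to next. the write
  // only succeeds if the stored head is still what this writer last saw,
  // so two simultaneous pushes can never overwrite each other.
  func swapHead(ctx context.Context, db *dynamodb.Client, repo, old, next string) error {
      _, err := db.PutItem(ctx, &dynamodb.PutItemInput{
          TableName: aws.String("git-remote-heads"), // invented name
          Item: map[string]types.AttributeValue{
              "repo": &types.AttributeValueMemberS{Value: repo},
              "head": &types.AttributeValueMemberS{Value: next},
          },
          ConditionExpression: aws.String("attribute_not_exists(repo) OR head = :old"),
          ExpressionAttributeValues: map[string]types.AttributeValue{
              ":old": &types.AttributeValueMemberS{Value: old},
          },
      })
      var conflict *types.ConditionalCheckFailedException
      if errors.As(err, &conflict) {
          return fmt.Errorf("push rejected: head moved, pull and retry")
      }
      return err
  }

  func main() {
      cfg, err := config.LoadDefaultConfig(context.Background())
      if err != nil {
          panic(err)
      }
      err = swapHead(context.Background(), dynamodb.NewFromConfig(cfg),
          "myrepo", "manifest/000041", "manifest/000042")
      fmt.Println(err)
  }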


Something about the idea of being able to back things up on multiple remote target storage services sounds extra compelling. Simple encryption with robust backup for e.g. text notes.


it might be worth making dynamodb optional, so all s3 compatible remotes can be used. i made it non-optional to simplify implementation and documentation. there are no knobs, no conditional semantics.


Oops missed it, in which case will definitely give it a shot.


Why S3? I don't have an aws account and don't ever plan on having one.

If you want to make git hosting easy, make a self-contained executable that requires nothing but a bare unix-like environment.


Well, technically with a bit of work you can use anything self-hosted that is S3 compatible.

Personally, I rather like:

  - https://www.zenko.io/
  - https://min.io/
Even if I don't really buy into AWS too much (apart from enterprise stuff), it's good that they went ahead and created a standard for blob storage that other implementations could also benefit from, due to the compatibility with various libraries etc.

Of course, there's also reliance on DynamoDB for this project, though that could also probably be swapped out for something else.


s3 is sota, but there are others. just run git on a server! not one executable, but you can pretend.


S3 is a cloud service provided by one company. It's not state of the art.


If you don't like S3 or DynamoDB then it'd be easy enough to sub them out for other things.

Many cloud providers provide an S3-Compatible object-store protocol. There are also open source projects for self-hosting your own S3-Compatible object storage.

Swapping out DynamoDB for (say) Redis would also be fairly easy.

The source provided is a few hundred lines of pretty readable Golang.

While I'm not generally a fan of "If you don't like it, submit a PR to fix it" type responses - as a short example of how to implement something like this, it's a pretty decent starting point.


i can confidently say that a series of encrypted git bundles could be stored anywhere. have them be monotonically increasing integers.
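
for example (invented names), zero padded so lexicographic listing matches numeric order:

  package main

  import "fmt"

  // bundleKey zero-pads the sequence number so that lexicographic key
  // ordering in any blob store equals numeric ordering of the chain.
  func bundleKey(n int) string {
      return fmt.Sprintf("bundle/%012d.bundle.enc", n)
  }

  func main() {
      fmt.Println(bundleKey(41)) // bundle/000000000041.bundle.enc
      fmt.Println(bundleKey(42)) // bundle/000000000042.bundle.enc
  }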


> S3 is a cloud service provided by one company

At this point it's basically a standard as many blob storage providers just implement the S3 API.


then who is sota and for what metrics?

/giphy shutup and take my money


The ssd on your laptop


so true.


CIA has entered the chat ...

> Amazon is a world class superior provider, you should definitely trust them.

CCP has entered the chat ...

> git not need the end to end encryption. Especially if you build website in xingjian.


both novels sound like a good read.


This is pretty cool as a new way to unbundle/change how people host git. While it's pretty AWS-heavy now, with some interfaces around the interaction I can easily imagine this working with other providers (e.g. Backblaze) and shimming out the coordination pieces (currently Dynamo).

Reminds me a lot of gitaly[0].

Awesome work!

[0]: https://gitlab.com/gitlab-org/gitaly


thanks! git-remote-gcrypt has rclone support, which provides all possible backends.

the addition here is a second object store with compare-and-swap semantics making it possible for multiple writers to collaborate safely.

many infra providers have object storage with those semantics, my preference is aws.


> thanks! git-remote-gcrypt has rclone support, which provides all possible backends.

Oh that's awesome, I didn't catch the rclone support while skimming the README. rclone is such an excellent piece of software.

> the addition here is a second object store with compare-and-swap semantics making it possible for multiple writers to collaborate safely.

I think I must have misunderstood this point too -- the compare-and-swap semantics that you want are against dynamodb, right, with raw/"dumb" storage on S3?

If I could request one thing it would be some extra interface files with "drivers" for the two concerns which are currently being handled by main.go... The locking mechanism and the raw storage mechanism.

Distilling the interface to these components would make it so easy for someone to come along and implement replacements! For example, one might like to see SQLite, FoundationDB, etc as metadata/synchronization drivers.

If there's a clear interface then it's easy to say "sure, send a commit with the implementation and we'll consider including it!", and leave implementation up to people who want the feature.
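
To make that concrete, I'm imagining shapes roughly like these (names invented, purely illustrative, not the project's actual code):

  package remote

  import "context"

  // BlobStore is the "dumb" bulk storage where encrypted git bundles live.
  // S3 today; any backend with read-after-write consistency would do.
  type BlobStore interface {
      Put(ctx context.Context, key string, data []byte) error
      Get(ctx context.Context, key string) ([]byte, error)
      List(ctx context.Context, prefix string) ([]string, error)
  }

  // RefStore is the small coordination store: a single pointer updated
  // with compare-and-swap. DynamoDB today; SQLite, FoundationDB, Redis,
  // etc could implement the same contract.
  type RefStore interface {
      Get(ctx context.Context, repo string) (string, error)
      // CompareAndSwap sets the pointer to next only if it still equals old,
      // reporting a conflict rather than overwriting a concurrent update.
      CompareAndSwap(ctx context.Context, repo, old, next string) error
  }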

> many infra providers have object storage with those semantics, my preference is aws.

Yeah, I bet this would work out of the box against other infra providers as well? S3 has basically become the de facto API everyone chases anyway.


dynamo holds a pointer to an object in s3 which records an ordered list of bundles.

compare-and-swap ensures that if two simultaneous pushes happen, only one will succeed. the other will have to pull first before retrying push.

there are actually a lot of providers with s3 and dynamo compatible apis. otherwise, fork and implement new provider! git-remote-CLOUD.
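
roughly, the push path looks like this (a sketch with invented helper names standing in for the real s3/dynamo calls):

  // invented sketch of the push path; s3Put / dynamoGet / dynamoSwap
  // stand in for the real aws calls and always succeed here.
  package main

  import (
      "context"
      "errors"
      "fmt"
  )

  var errConflict = errors.New("head moved: pull, then retry the push")

  func s3Put(ctx context.Context, key string, data []byte) error     { return nil }
  func dynamoGet(ctx context.Context, repo string) (string, error)   { return "manifest/000041", nil }
  func dynamoSwap(ctx context.Context, repo, old, next string) error { return nil }

  // push uploads the new encrypted bundle, writes a new manifest listing
  // every bundle in order, then compare-and-swaps the dynamo pointer from
  // the manifest it started from to the new one. if the swap fails,
  // someone else pushed first; pull before retrying. nothing can silently
  // overwrite another push.
  func push(ctx context.Context, repo string, bundle, manifest []byte) error {
      old, err := dynamoGet(ctx, repo)
      if err != nil {
          return err
      }
      if err := s3Put(ctx, "bundle/000000000042.bundle.enc", bundle); err != nil {
          return err
      }
      next := "manifest/000042"
      if err := s3Put(ctx, next, manifest); err != nil {
          return err
      }
      if err := dynamoSwap(ctx, repo, old, next); err != nil {
          return errConflict
      }
      return nil
  }

  func main() {
      fmt.Println(push(context.Background(), "myrepo", []byte("bundle bytes"), []byte("manifest bytes")))
  }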


I like it, a nice simple approach! Fun reading through the code, here's the bit that does the encryption (I think): https://github.com/nathants/git-remote-aws/blob/c8012c5a6b80...


that's it. in libsodium we trust. the key goes in a cryptobox to each recipient, followed by a secretstream of the data.
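
the rough shape, sketched with go's nacl box/secretbox since x/crypto has no secretstream; the real code streams the data with libsodium secretstream, the one-shot secretbox here is only a stand-in:

  package main

  import (
      "crypto/rand"
      "fmt"

      "golang.org/x/crypto/nacl/box"
      "golang.org/x/crypto/nacl/secretbox"
  )

  func main() {
      // recipient's long-term keypair (in reality loaded, not generated here).
      recipientPub, recipientPriv, _ := box.GenerateKey(rand.Reader)
      // ephemeral sender keypair for this push.
      senderPub, senderPriv, _ := box.GenerateKey(rand.Reader)

      // fresh symmetric key for the bundle data.
      var dataKey [32]byte
      rand.Read(dataKey[:])

      // 1. the data key goes in a box to each recipient.
      var keyNonce [24]byte
      rand.Read(keyNonce[:])
      sealedKey := box.Seal(nil, dataKey[:], &keyNonce, recipientPub, senderPriv)

      // 2. the bundle bytes are encrypted under the data key
      //    (streamed with secretstream in the real thing).
      var dataNonce [24]byte
      rand.Read(dataNonce[:])
      sealedData := secretbox.Seal(nil, []byte("git bundle bytes"), &dataNonce, &dataKey)

      // the recipient reverses it: open the box to recover the data key,
      // then decrypt the data.
      keyBytes, _ := box.Open(nil, sealedKey, &keyNonce, senderPub, recipientPriv)
      var recovered [32]byte
      copy(recovered[:], keyBytes)
      plain, _ := secretbox.Open(nil, sealedData, &dataNonce, &recovered)
      fmt.Println(string(plain))
  }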


I work on a project which solves a similar use case.

https://github.com/gotvc/got

Got also does E2E encryption, but it can additionally encrypt branch names from remote servers. It also handles syncing large files and directories better thanks to an improved data structure.


this is cool!


For encrypted git remotes I just create a bare repo on a nextcloud folder that gets synced automatically.

I use it for plain text accounting and personal notes. It's been working great so far, of course I'm the only user and I don't do concurrent writes.

In theory I think the only files that could get conflicts are the ones in the refs directory.


does nextcloud use a local sync daemon like dropbox? is nextcloud encrypted at rest on the remote? this definitely would work for a single writer.


I still don't have a clear understanding on when you would need such a thing.

This does not help the people who want to use git for video hosting since it puts the storage and usage back onto a person's credit card again (tho if you come into a line of credit it may be useful).

Hosting on AWS like this does not match the dark web requirements for hiding in plain sight and not easily killable. To spread this over multiple aws accounts and s3 buckets alone would need some form of sts cross linking for the permissions to allow pushing to be granted.

Can you provide a business use case or scenario where this would be usable? Otherwise it will continue to have the smell of 'resume-driven development'.

Blocking force push would also suggest that branch deletion would not work either. Is that the case?


branch deletion also does not work. the data model is very simple, a single chain of git bundles. s3 policy could enforce a variety of data models, such as inability to delete objects.

there is no business use case for this, unless your threat model needs an untrusted git provider and you don't have another way to enable that.

perhaps you could store your dotfiles or other backups with this.


Ignorance: What's the use-case for this? If you can't rely on the drives not to be tampered with, surely you also cannot rely on the CPU or kernel?

Am I missing something?


This stores blobs in S3. So this is about not trusting hard drives which you have rented. Whether or not that makes sense depends on your threat model, but it seems reasonable to me that there are people who would find this useful.


The S3 in question may be not on AWS, but on any of the S3-compatible providers, or a non-public S3 store (in your company, university, your friend's NAS which you use as an extra backup, etc).


i think the best way to think of this is that it's cheaper than private github/gitlab and easier that aws codecommit. also other stuff.


I feel like once web browsers / hardware devices start to implement authentication, the UX and accessibility of these sorts of tools could make them a default choice for user data.

The hard part is solving the "lost my yubikey" UX issue but I suspect Apple will reach a reasonable solution that finds an OK balance of convenience and user-authenticating security.


untrusted hosts open up a bunch of interesting system designs. i’m mostly thinking about these recently. trusted hosts have good use cases, but shouldn’t be used otherwise. trust hard.

lost my yubikey is somewhat covered by the popularity of crypto. recommend users back up fido2 secrets as bip39 mnemonics.

ios and android will hopefully help popularize fido2.


easier than aws


This is too dang complicated.

For years, across two different jobs we just had bare repos on Linux servers - we used git's built in ssh support and we liked it. Worked great. We wanted a Pull Request workflow and moved to GitHub, but self hosted is fine.

All you really need is a server your developers can SSH into with a shared directory. $2 a month Vultr server and you're golden.

You want to get real spicy, you can just do this without a central server at all, the way God and Linus intended. git was designed to be decentralized. You don't even need to share the server; you can just pull from each other's repos if you give each other SSH access.


git with ssh on a server is fantastic, though sometimes managing and paying for the server can be annoying.

afaik it's not possible to have an untrusted git server, at a minimum ram contents will be plaintext.

before i undertook this, i had never heard of git-bundle:

https://git-scm.com/docs/git-bundle


> at a minimum ram contents will be plaintext

The only thing that's going to remain resident after a push/pull is os level file system cache. There's no daemon when operating git over ssh.

Are container escapes on major hosting services common enough to even worry about? I don't hear much about them.

Or are you concerned about AWS/Vultr/Digital Ocean stealing your code? That seems like paranoia. They have world class developers, best in the industry. They don't want your code. This isn't the movie Antitrust.


yes, the concern is aws employees reading my private poetry collection. my rights!


> All you really need is a server your developers can SSH into with a shared directory. $2 a month Vultr server and you're golden.

which addresses exactly none of what this project is intended to, since there’s no end-to-end encryption. (yes, if you don’t need end-to-end encryption on your git repo, this is too complicated for your use case.)


SSH is a lot of privilege for a coworker to have. You could get by with HTTP GETs and `update-server-info`


I usually use the following when developing with multiple people without an Internet connection:

https://stackoverflow.com/a/377293

   git daemon --reuseaddr --base-path=. --export-all --verbose
Others can just pull from the IP address. Works well for hacking in the field, a PR is just one shout away :)


telnet git on lan! til.



