Don't publicly expose .git (2015) (internetwache.org)
156 points by g4k on June 11, 2017 | 86 comments



I don't like the advice he gives on just denying access to .git. I think the idea of cloning the repo in the htdocs folder is just wrong.

A much better approach (or at least, the one I use) is to set up the repo somewhere private with --bare and add a post-receive hook that checks out HEAD into the htdocs folder. That way htdocs only has the content, and you get the extra feature that you can run extra commands on the checked-out source (such as building/minifying) without changing the original source.
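
A rough sketch of that setup, runnable end to end in a temp directory. All paths, the `master` branch name and the demo identity are placeholders picked for illustration, not anything from the article; on a real server the bare repo and htdocs would be fixed directories.

```shell
#!/bin/sh
# Sketch only: every path and the branch name "master" are illustrative assumptions.
set -e
base=$(mktemp -d)                 # stand-in for the server's filesystem
repo="$base/site.git"             # private bare repository
htdocs="$base/htdocs"             # stand-in for the web server's document root
mkdir -p "$htdocs"
git init --quiet --bare "$repo"
mkdir -p "$repo/hooks"

# post-receive gets one "<old> <new> <ref>" line per updated ref on stdin.
cat > "$repo/hooks/post-receive" <<EOF
#!/bin/sh
while read -r old new ref; do
    if [ "\$ref" = "refs/heads/master" ]; then
        # Check out files only; no .git ends up in the document root.
        git --work-tree="$htdocs" checkout -f master
        # Build or minify steps could run here against the checked-out files.
    fi
done
EOF
chmod +x "$repo/hooks/post-receive"

# Simulate a developer pushing to the server.
work="$base/work"
git init --quiet "$work"
cd "$work"
git symbolic-ref HEAD refs/heads/master
echo '<h1>hello</h1>' > index.html
git add index.html
git -c user.name=demo -c user.email=demo@example.invalid commit --quiet -m 'first page'
git push --quiet "$repo" master
```

After the push, htdocs holds index.html and nothing git-related; the history stays in site.git.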


It's actually kind of mentioned at the end of the article:

> Another approach is to use git’s --git-dir and --work-tree switches to move the git repository out of the document root.

Yet another option is to make the htdocs directory a worktree of the git repository, which doesn't require passing flags around, or setting environment variables. Technically, this still leaves a .git, but it's only a file containing the actual location to the real .git directory.

https://git-scm.com/docs/git-worktree
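
A minimal sketch of the worktree variant, assuming git 2.5 or newer; the directory names and the demo commit are made up for illustration.

```shell
#!/bin/sh
# Sketch: paths and the demo commit are illustrative assumptions.
set -e
base=$(mktemp -d)
git init --quiet "$base/repo"          # the real repository, outside the web root
cd "$base/repo"
echo '<h1>hello</h1>' > index.html
git add index.html
git -c user.name=demo -c user.email=demo@example.invalid commit --quiet -m 'first page'

# Add a second working tree where the web server would look.
git worktree add --detach "$base/htdocs" HEAD >/dev/null 2>&1

# htdocs/.git is a small text file starting with "gitdir: ", not a directory.
cat "$base/htdocs/.git"
```

Note that the file still discloses the repository's filesystem path, so denying access to it is probably still worthwhile.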


Or have a `www` directory and symlink it.


Yeah this sounds like a much better idea. You could also do a shallow clone (for performance and so you don't have the full history there).


This sounds like an awesome approach. Do you have any resources you could share on a few of the details? Using --bare I get, but I've not yet played with hooks and such.


Git hooks are just arbitrary scripts. They need to exist in a specific location, and accept certain arguments depending on when in the lifecycle they are meant to execute. But they can be written to do basically anything. From linting and syntax checking, to complex deploy behaviors.

http://githooks.com/


Or you could just set permissions to 700.


I would use a build script to copy only required files to htdocs directory


I feel like whenever possible, the answer is to stop storing sensitive information in source control. That solves a whole class of problems, including this one.

If your history has sensitive info, see about rewriting the history. If that's not possible, maybe fork the repo, remove the sensitive info, and get the team to switch to the fork. If that's not possible either, make the sensitive info meaningless (reset your DB password, revoke the API tokens, etc).


Source Code is the sensitive information that he's talking about in the article.


If you've got a static site hostable in htdocs directly, then that "sensitive source code" is also accessible in browser dev tools and View Source menus.


You're correct that secrets shouldn't be stored in source control, but it's a problem from management's perspective:

a) how do you search a repository's history, along all branches (including undeployed development branches, which may nonetheless contain production secrets) to find secrets mistakenly stored in version control? b) what happens when somebody stores a secret which isn't easily rotated (because legacy systems have hard-coded the secrets etc.)? How do you deal with trying to rotate secrets which aren't well-managed, because the real problem isn't that your teams are storing secrets in source control but that you don't have proper secret management set up across your organization?
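
For (a), git can at least search every ref in one pass. A minimal sketch, assuming you already know a string or regex to hunt for; the `hunter2` value, the file name and the branch name are invented for the demo. It only covers commits reachable from a ref, and it does nothing for (b).

```shell
#!/bin/sh
set -e
# find_secret_commits REGEX: list commits reachable from any ref whose diff
# adds or removes a line matching REGEX (git log -G), one "<hash> <subject>" per line.
find_secret_commits() {
    git log --all --format='%h %s' -G"$1"
}

# Demo repository: a secret is committed, then "removed" on another branch.
demo=$(mktemp -d)
cd "$demo"
git init --quiet .
commit() { git -c user.name=demo -c user.email=demo@example.invalid commit --quiet -am "$1"; }
echo 'db_password = hunter2' > config.ini
git add config.ini
commit 'add config'
git checkout --quiet -b cleanup
echo 'db_password =' > config.ini
commit 'remove password'

# Both the commit that added the secret and the one that removed it show up.
matches=$(find_secret_commits 'hunter2')
printf '%s\n' "$matches"
```

The point of the demo is that deleting the line in a later commit does not remove it from history; the secret still has to be rotated.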

The simpler (and more correct!) way to deal with this is to stop using version control for deployments and to start using proper package management and deployment tooling. Version control is not designed as a deployment tool; it is a poor replacement for proper deployment tooling; and teams which think they only have hammers to hit what are not nails but screws, need to learn that sometimes they'll have to go out and get a set of screwdrivers too.


Even without passwords, just knowing the infrastructure of the target system is candy for a hacker.


This is really true only if you're subscribing to the perimeter model of security rather than defense-in-depth, though. If your systems are (properly, to my mind) constructed, knowing what your infrastructure looks like doesn't provide significant value. That obscurity becomes a nice-to-have, rather than an essential, aspect of your security.

(When building systems for clients, this is something I stress. "We should be operating as if it is assumed that an attacker has a VPN into your network space and has nmapped all your stuff.")


Counterpoint: Knowledge of the infrastructure can be used in social engineering attacks, e.g. to increase the likelihood of success for password spearphishing.


If people with access to your infrastructure can be spearphished for passwords, I would venture to say that you're probably doing other stuff wrong that needs to be fixed first.


No, you need to delete the .git folder from your server entirely. Ideally, delete it before you deploy. In fact, don't even put a github deploy key on the server. Deploy binaries.

And don't just stop with .git: Delete any folder/file that's not required to operate the app in production.


Yup. I used to host .git on a server but it ended up being more complication than it was really worth. Now I just have a “deploy” script that uses sftp to push all the build artifacts (and nothing else) to the server.


> Now I just have a “deploy” script that uses sftp to push all the build artifacts (and nothing else) to the server.

We do something similar. "Clean and minimal" is a good strategy when you're deploying assets to publicly accessible systems.


> When deploying a web application, some administrators simply clone the repository.

Step one: stop randomly smearing crap around. Prod should only have files that came from a .deb or .rpm signed by the legit build process, because that's how you know your system is reproducible and has everything it should and nothing else.


Yes, this. Cloning a git repo is not a reproducible build and deploy strategy, especially once you scale past one developer. Building a package (RPM/DEB/Docker container/whatever) once, and promoting it from dev through to test and prod, is. You are guaranteeing that the same code you wrote and tested is what is finally being deployed to production.

If the deploy process is "git pull", you're praying that Joe McGee didn't push some untested crap to the relevant branch 5 seconds before you deployed.

I'll argue with "that's how you know your system is reproducible and has everything it should and nothing else", though. To get there, you need to look at immutable infrastructure, where you're building a new container or VM image for every deploy. Otherwise you might have libfoo installed on the app server, despite your app dropping support for foo 2 years ago.


This is yet another problem. Joe McGee shouldn't be able to push to your master branch anyway.

You can use git for deployment, but indeed you must have a clear commit policy to do that (and it's a good idea anyway to push only major curated versions to master).


Woah, that's not very kind! Some of these sites are run on a shoestring by sysadmins who are also juggling other issues, and it's hard for them to justify a brand-new deployment pipeline when the existing 'fput ~/code/the_web_site ftp://hosting/our_great_web_site' deployment costs nothing to maintain and works fine (sales/conversions are coming in).


And then you moved the problem from keeping your files in sync on the web server to keeping your files in sync on the package source.


That's a significantly more straightforward and safer problem to have to handle. You should prefer it across-the-board.


That's exactly the same problem, with exactly the same failure modes and consequences on failure.

It's also solved the exact same ways, by scripting your stuff on one level or another.


Solving the problem in one place (your CI/CD server) is a significantly less complex task than solving it in N places (every server on which your application is running). It removes the concerns around configuration drift (have all your machines been properly brought up to policy?) and enables easier reasoning about the whole thing.


Building debs and rpms from version control is a legitimate step though.


Not necessarily.

You can actually use git in production, properly.

I personally like to mount gitRepo volumes on Kubernetes, and have my CI pipeline automatically update the revision of the deployment whenever it validates tests. Then I have kubernetes roll out the update automatically.


".git" is only one of many checks performed by Nikto (an open source security scanner - https://cirt.net/Nikto2), but there are other checks and many other scanners.

shameless plug: I've developed a service that you run to check against vulnerabilities in your apps/servers and it has a free plan (https://my.gauntlet.io/registration.html) in case you're interested (https://gauntlet.io).


Is this just hosted nikto2? It sounds very cool. Do you distribute against different cloud systems? I know you're a pro from the site address but I'd be worried about quick blacklisting; I'm sure you're effective.


Not only nikto2, but other scanners (https://gauntlet.io/en/product/supported-scanners/) as well. It comes with open source scanners installed, but integrates with commercial ones too. Currently each scan triggers a new virtual machine, and thus a new IP address, and all applications need to be verified prior to executing a scan (e.g., the way Google Analytics requires a meta tag, file upload or DNS record).


The author fails to acknowledge a scenario where you wouldn't care, or where you'd even actively want your source code to be public.

For example, static websites for open source projects, et al.


I think they did...

> It seemed like an accessible git repository was intended on some websites - mostly open source projects where the website’s sourcecode is available online.


Every so often, the Django security address gets an email from someone who wants to claim bug-bounty money because "Dear Django team, I have discovered source-code disclosure vulnerability in your web site..."


Even then though, they should be aware of the meta data that's stored in the repo and make sure it's appropriately sanitized.

> On the other side, we had to hold our breath when we noticed that more than 100 projects used HTTP-Authentication for server-client communication. That means, that the protocol://user:password@host/repository combination is saved in the .git/config file, giving attackers access to the users (companies) GitLab-instance or GitHub/BitBucket account. With a bit of luck an attacker gets access to the CI-Server and then runs malicious code to further compromise your infrastructure.


It is possible to separate the work tree from the git repository files with the "--separate-git-dir" flag. .git is then a file whose contents point to the directory where the repository files reside. Any other command works as usual without specifying the directory, so it is just needed for clone or init.
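
A quick sketch of what that looks like with `git init`; the directory names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: directory names are illustrative assumptions.
set -e
base=$(mktemp -d)
# Work tree in htdocs, repository data in a sibling directory outside it.
git init --quiet --separate-git-dir="$base/site.git" "$base/htdocs"
cd "$base/htdocs"
cat .git                       # one line of the form "gitdir: <path to site.git>"
git rev-parse --git-dir        # ordinary commands find the repository without extra flags
```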


If I remember my Apache config right, the two examples are switched. The 2.4 config should be 'Require all denied', and 'Order deny,allow' is the old 2.2 syntax.


Hi,

I just rechecked with [1] and you seem to be right.

I'll update the blogpost in a second. Thanks for the hint!

[1] https://httpd.apache.org/docs/current/upgrading.html


An interesting thing to note is that .pack files give you some safety against this sort of disclosure. Bare git objects are very easy to access even with directory indexes disabled because their name is their hash (and so if you have access to the index or the current HEAD you can recreate the history). Pack files contain multiple objects, but their name is computed from a hash of the packed objects. This makes it quite difficult to figure out the path to the pack file (you have to brute force the entire history and how it was packed in order to get a single .pack file's worth of data).
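
To make the "their name is their hash" point concrete: a loose object sits at objects/<first two hex characters>/<remaining 38>, relative to the .git directory. A small sketch, assuming a 40-character SHA-1 object ID; the helper name is mine, not a git command.

```shell
#!/bin/sh
# loose_object_path SHA: print where a loose object with this ID would live,
# relative to the repository's .git directory.
loose_object_path() {
    sha=$1
    prefix=$(printf '%s' "$sha" | cut -c1-2)   # directory: first two hex characters
    rest=$(printf '%s' "$sha" | cut -c3-)      # file name: the remaining characters
    printf 'objects/%s/%s\n' "$prefix" "$rest"
}

loose_object_path 00cf74f2066b0c72a4c4b2a24ef116f1fd23df42
```

So anyone who learns one object ID (from HEAD, a ref, or the index) can request that object directly, with no directory listing needed.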

Not that you should have .git exposed on your public webserver anyway. I do remember participating in a CTF that had a problem like this a few years ago, it's possible that it was the same one the author mentioned.


Not safe, just slightly more obscure. The .pack files are listed in .git/objects/info/packs.


According to https://github.com/kost/dvcs-ripper/issues/6#issuecomment-11... , .git/objects/info/packs is not reliable.


Ah, that makes sense. I probably should've thought about it a bit more, because if there weren't a mapping from object -> pack then git wouldn't be able to quickly look up objects in packfiles.


> A tool to discover, one to download and one to extract git repositories.

Hasn't dvcs-ripper [1] been around for longer? It supports other VCSes as well.

Also, the article fails to mention that a simple `git clone` would usually work as well, although that tends to be blocked in similar CTF challenges.

[1] https://github.com/kost/dvcs-ripper


Hi, one of the authors here.

I knew about dvcs-ripper, but thought that implementing another variant might be fun and let me learn about git internals.

Does a simple `git clone` really work? I just tested it and it failed:

```
$> git clone http://x.domain.tld/
fatal: repository 'http://x.domain.tld/' not found

$> git clone http://x.domain.tld/.git/
fatal: repository 'http://x.domain.tld/.git/' not found
```

And yes, the post's background is a CTF challenge that blocked a simple `git clone`.


I remember that it was the 9447 CTF in 2014. The challenges were "bashful" and "tumorous". See: https://github.com/ctfs/write-ups-2014/tree/master/9447-ctf-...


Since about 2012: https://k0st.wordpress.com/2012/10/23/rip-or-pillage-dvcs-st...

Parts are even in metasploit.


The real solution to this problem is to reference passwords, tokens, keys, or any other "private" strings with environment variables or external config files, which are then excluded from the source control system. That way your super-secret stuff can never be extracted from git or svn. This approach also has a whole class of additional benefits relating to being able to run a system in different places (for example, setting up dev->test->prod staging servers).
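
A minimal sketch of that pattern in shell. The names `DB_USER`, `DB_PASSWORD`, `SECRETS_FILE` and `secrets.env` are hypothetical; the only real requirement is that the secrets file is listed in .gitignore (or lives outside the checkout entirely).

```shell
#!/bin/sh
# load_settings: read secrets from the environment, optionally topped up from an
# untracked file. All variable and file names here are illustrative assumptions.
load_settings() {
    secrets_file="${SECRETS_FILE:-./secrets.env}"
    if [ -f "$secrets_file" ]; then
        # Plain KEY=value lines; this file is never committed.
        . "$secrets_file"
    fi
    # ${VAR:?message} aborts the (non-interactive) shell with an error if VAR is unset or empty.
    : "${DB_USER:?DB_USER is not set}"
    : "${DB_PASSWORD:?DB_PASSWORD is not set}"
}
```

Calling `load_settings` at the top of a start-up script makes a missing secret fail loudly at launch rather than somewhere deep in the app, and dev, test and prod can each supply their own values.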


Hello HN,

here's one of the blogpost's authors. Although it has been a while since we published the blogpost, I'll try to answer any questions or listen to any suggestions.



> Bad people can use tools to download/restore the repository to gain access to your website’s sourcecode.

So if I post my website's sourcecode on github, I'm equally vulnerable? I could see problems if said checkout contained a credential cache, but that doesn't seem to be mentioned.


If you post your source code to github you know it will be public, and might remember to remove passwords from debug code etc. However, if you expect it to be private while it's not, you might be lazy and store passwords in plaintext in the code. A horrible thought for a programmer, yes, but human beings are lazy.


git clone https://username:password@github.com/... will end up in git reflog, so yes, it's a problem, and there's no guarantee a future feature of Git that you've never heard of doesn't make the problem worse


> git clone https://username:password@github.com/... will end up in git reflog

So if your deploy process is an anonymous git checkout of Drupal, there's no credential leaking to worry about.

> no guarantee a future feature of Git that you've never heard of doesn't make the problem worse

Okay.


reflog is local though



I don't understand. Do you mean, the remote also has a reflog? It sure does, it's local to the repository!

But entries recorded in my reflog are not pushed to or fetched from any remote I interact with. Instead, their reflog is updated when I push (to record that my commits were accepted & their branches/tags changed), and my reflog is updated when I fetch (to record that their commits were accepted & my branches/tags changed).


And what happens when Ansible checks out the Git repository for you? Or some equivalent shell script or deployment system, or some helpful developer hand-deploying something with 'git clone' on the server?

Or what about the helpful developer that checked in some secrets which are visible in the repository history but not the current checkout?

Or that stupid PHP thing where 'config.php' and its MySQL passwords are world-readable, but rely on the web server interpreting it as a PHP script due to its file extension to prevent secret leaks... not so valid when a copy of the script is available as ".git/objects/00/cf74f2066b0c72a4c4b2a24ef116f1fd23df42".

But of course, even if these weren't problems, the original point still stands: there is no guarantee .git doesn't contain secret data (such as username:password) either now, or into the future, so exposing it is a bad idea.

https://en.wikipedia.org/wiki/Principle_of_least_privilege


That's too much commentary for me to extract from a single fake link to an NXDOMAIN.

> there is no guarantee .git doesn't contain secret data (such as username:password) either now, or into the future, so exposing it is a bad idea.

The same can be said of the HTML and images, so I don't find it a useful heuristic. Note that I was disputing your claim that a username+password used to fetch a repo over http would leak into the remote's reflog.


I'm not "making a claim" or inventing a heuristic, you can test this trivially:

    $ git clone https://....:x-oauth-basic@github.com/dw/csvmonkey.git
    Cloning into 'csvmonkey'...
    remote: Counting objects: 340, done.
    remote: Compressing objects: 100% (27/27), done.
    remote: Total 340 (delta 19), reused 27 (delta 10), pack-reused 303
    Receiving objects: 100% (340/340), 138.93 KiB | 0 bytes/s, done.
    Resolving deltas: 100% (212/212), done.

    $ cat csvmonkey/.git/logs/HEAD
    0000000000000000000000000000000000000000 c9d566bf167dcf3556008df58be37c4a27ff5062 David Wilson <dw@botanicus.net> 1497289486 +0100	clone: from https://....:x-oauth-basic@github.com/dw/csvmonkey.git

If you perform a Git checkout on a web server, e.g. as part of an Ansible script, and you embed secrets in the repo URL (common enough, believe me), then that secret is readable per the above.

FWIW this isn't some unbelievable theory or hypothetical scenario, I've seen plenty of Ansible setups like this and found domains with this exact problem in the process of writing http://pythonsweetness.tumblr.com/post/52587443706/devs-plea... a few years back
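
For anyone wanting to check their own deployments, here is a rough sketch of a scan for credential-bearing URLs in a repository's config and reflogs. The function name and the regex are my own assumptions and will miss other places secrets can hide; treat a clean result as "nothing obvious", not as proof.

```shell
#!/bin/sh
# find_url_credentials GIT_DIR: print "file:line:text" for every line in GIT_DIR/config
# and GIT_DIR/logs that contains a URL of the form scheme://user:secret@host.
find_url_credentials() {
    grep -rnE '[a-z][a-z0-9+.-]*://[^/@[:space:]]+:[^/@[:space:]]+@' \
        "$1/config" "$1/logs" 2>/dev/null
}

# Usage (path is an example): find_url_credentials /var/www/htdocs/.git
```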


Hi,

You might not remember me. I'm the poster you're responding to. How have you been? Me, I'm all right.

I was just thinking of when we first spoke… it seems like so long ago! I remember it as clearly as yesterday: you had made a partially-coherent argument that the auth creds for a git URL could leak into a remote deployment's reflog! Oh, how we laughed, and our amusement doubled in size as you fancied an implausible situation where the read-only deployment credentials could be recovered from the very same repo they allowed access to!

It was much later when we crossed paths again, but your talent for sharing inventive tales had not waned in the slightest. For this next performance, you regaled us with the simple truth that no person can be certain that their commit history will not reveal their darkest secrets, and thus should strictly eschew sharing it in a public place; but that the contents of their index was above suspicion, and could be shouted to the world without a moment's thought! Many of us stumbled to determine what byzantine process made the working directory automatically scrub itself of secrets, before finally the jape dawned on them.

I eagerly anticipate our next encounter; what fresh new hilarity will you share with us?

I hope my restatement of my understanding of your position helps make my position clear,

--falsedan


I guess we're both having a bad day. Let's break down the original statement:

> git clone https://username:password@github.com/...

This command produces a new git repository by cloning the supplied URL

> will end up in git reflog

The newly generated repository's reflog will contain the credentials passed on the command line.

> so yes, it's a problem

Assuming the newly generated repository also happens to be a static HTTP server root, which is the subject of the thread in which you've been posting


Thanks, that makes it clear to me.


> Or that stupid PHP thing where 'config.php' and its MySQL passwords are world-readable

I mean, if you're tracking config.php in git, you're already doing it wrong.


Yes. It's not necessarily a problem, but it's a risk to consider carefully at licensing time.


This is the old security via obscurity. At least though, one should know that one publishes the full source.


Would it not be secure enough to put .git in the directory above the public root?

  /mysite/.git
  /mysite/mysiteroot/index.html
  /mysite/.gitignore
  /mysite/lib/common


For nginx:

        # deny access to HG and Git repositories
        location ~ /\.(hg|git)/ {
            deny all;
        }


Doesn't solve the issue of the data being available on a public server, though. Any piece of code run by the web server would most likely have access to those directories, as well as any other static content in the web root...


Isn't this a non-issue (no need to change any config to block .git) with a properly configured firewall and nginx proxy-passing to localhost, when the code does not live in a publicly visible location? E.g. https://www.digitalocean.com/community/tutorials/how-to-set-...


Are you asking if this is a non-issue if you've... addressed the issue?


You could have worded it a little differently: if a folder is not accessible in the root directory of the web server, there is no need to modify the web server config to deny access to .git.

This type of snarky response discourages newcomers from participating in discussions. I have seen this happen to many people, so please dial back the snark.


I see where you're coming from. From what I understand you're suggesting the same thing as Hamcha, who currently has the top post: make the web root a subfolder in version control, so the version control folder is above the web root. However, when I read it, it sounded like "if you have some uncommon setup with proxying to localhost [and then filtering out requests to .git?]" which indeed sounds like addressing the issue. Your second comment clarifies what you mean.


It's an issue, just not really specific to git and has been around for a long time. The issue is having source files or any sensitive info, under the web root where it could get exposed by an incorrectly configured web server. This is why modern setups keep the source code somewhere else and use some sort of application server behind the web server or similar arrangement.


It's a non issue if you use a proper deployment tool like capistrano.


Rationale for "make install" rediscovered. Film at 11... :-)


If someone being able to download your source code repository opens you up to attacks, you're doing something wrong. Either you are relying on security through obscurity, or you checked keys into git. Both horrible practices.


> you checked keys into git. Both horrible practices.

Hey! That's not very kind to disparage everyone using a text file in a git repo to manage their passwords/keys.

Putting keys into a git repo is fine! But be careful when publishing that repo, as something you thought was private could suddenly become public.


It still is a pretty bad practice though.


I would do it, if I was certain I could avoid accidentally publishing the repo. I'd never describe it as 'bad', as that's extremist & it's easy for someone who does this to misinterpret you as saying, "you are bad for doing this and not following best practices".


Do you give everyone complete access to your production code?


Yes? It’s under GPL, after all.


If the .git folder is exposed, you can download it, then do "git checkout" in that folder and get the full working copy.


Just like anyone can go to Github, Gitlab, Bitbucket, etc. and get a full working copy.

If your code is public then what does it matter? If it's not then you should be protecting it like any other sensitive information.


You make the assumption that code is public and in a git hosted service. Neither of those is necessarily the case.

You can unintentionally expose your repository by deploying .git by mistake.


Who said the code was public?



