Hacker News

I decided to move all my private repositories to my own server.

When you do this, make sure that the server has continuous backups. Also, make sure you still have an offsite backup.

Once you figure out what these things are worth, you may realize that you should probably just keep paying Github.




The backups aren't as important, since each Git repo is a full-blown copy. If your local repo is destroyed, you still have the server copy. If your server blows up, you still have the local copy.
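Because any normal (non-shallow) clone carries the whole history, a destroyed server repository can be rebuilt from a surviving local copy. Here is a minimal sketch, assuming bash and git on the PATH. `restore_server_repo` and the example paths are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: recreate a lost bare "server" repository from a surviving local clone.
# restore_server_repo and the paths used with it are hypothetical; assumes git
# is on the PATH and the clone is a normal (non-shallow) clone.
set -euo pipefail

# restore_server_repo LOCAL_CLONE NEW_BARE_REPO
# Use absolute paths: git -C changes directory before resolving them.
restore_server_repo() {
  local clone="$1" bare="$2"
  git init --quiet --bare "$bare"
  # A non-shallow clone already holds the complete history, so pushing its
  # local branches and tags rebuilds the server copy. Branches that exist only
  # as origin/* remote-tracking refs in the clone are NOT restored by this.
  git -C "$clone" push --quiet --all "$bare"
  git -C "$clone" push --quiet --tags "$bare"
}
```

For example, `restore_server_repo "$HOME/work/project" /srv/git/project.git`, then point the other clones' `origin` at the new path.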

There are many other good reasons for a service like GitHub, such as the excellent collaboration features, the really good repository and history browser, or the good bug tracker.

If you don't need those (small team, working alone) but are concerned about uploading your intellectual property to a third-party server in a potentially foreign country (depending on your location), then quickly setting up gitosis / gitweb / redmine might be enough for you.
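Underneath gitosis, gitweb and Redmine, the core of a private self-hosted setup is just a bare repository reachable over SSH. This minimal sketch is not gitosis itself. The host `git.example.com`, the `git` user and the paths are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: the bare minimum of a self-hosted private Git server, i.e. a bare
# repository reachable over SSH. The host, user and paths are hypothetical;
# gitosis, gitweb and Redmine add access control, browsing and tickets on top.
set -euo pipefail

# Once per repository, on the server (hypothetical host and path):
#   ssh git@git.example.com 'git init --bare repos/project.git'

# publish_repo WORKTREE REMOTE_URL
# Points the working copy's "origin" at the server and pushes the current branch.
publish_repo() {
  local work="$1" url="$2"
  git -C "$work" remote add origin "$url"
  # -u records origin/<branch> as upstream, so later pushes/pulls need no arguments.
  git -C "$work" push --quiet -u origin HEAD
}
```

Usage would look like `publish_repo ~/work/project git@git.example.com:repos/project.git`.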

In my personal case, I would really love to use GitHub even for my small team, but I'm too concerned about the legal issues to go ahead with that (and the local installation is simply too expensive).


What legal or other issues with uploading your code to GitHub are you worried about?

I can't imagine that GitHub would steal your code. They've never heard of you, they have no reason to believe your code is worth anything to them, and one "I have pretty damn good evidence that GitHub stole my code" could ruin their entire business.

You mentioned legal issues. Are you afraid someone's going to ... subpoena your code or something? Because if that happens, you'd have to turn it over anyway.

They've got some pretty intense-looking security[1], and people like Twitter trust them with their code[2]. If they aren't worried, why are you?

1: http://help.github.com/security/

2: I don't know whether that's officially known, but I saw Twitter commenting on the "GitHub now has Organizations" post, complaining about the lack of a cheaper plan, which GitHub added the next day. So they definitely have some private repos on GitHub.


I don't live in the US, and our company isn't based in the US. While I'm somewhat familiar with US legislation from reading HN, I certainly don't feel comfortable uploading my code to US-based servers of a US company, as I simply don't know their laws well enough to trust them with my company's intellectual property.

Of course, I could always trust them for now and remove my stuff immediately at the first sign of trouble. But when I asked them (a year ago) whether deletions are instant and irreversible, they told me the usual thing: repositories are not deleted instantly, so that they can be restored after accidental deletions. In addition, they stay around in backups for an indefinite time.

Legislation not known well enough and no control over the removal of my code from their machines - call me paranoid, but these are good reasons not to upload my code to them.


> Are you afraid someone's going to ... subpoena your code or something? Because if that happens, you'd have to turn it over anyway.

Not if you don't live in the US.



In 18 months, when a client's project gets deleted and you find the server version destroyed... well...

Might sound unlikely but it happens.


It could also happen that GitHub loses data, and it's hard to estimate the exact likelihood of you accidentally deleting two repos versus GitHub losing a file server and its backup.

Also, if github is down or your repo with them is corrupted, you have to go through their support. If your own server has a problem you can fix it instantly.

I'm not convinced that reliability is the correct reason to go GitHub. Features: Yes. Reliability: Not necessarily.


Sure, Github can lose data. And you can lose data. But the advantage is that you and Github are much less correlated; the odds that both of you will lose the data at the same time are fairly low. [1]

Data safety is all about fighting correlation. You don't back up one partition to another on the same spindle, because when the drive dies the whole spindle is lost. Paranoid people back up to two different drives, two different disk controllers, two different machines, two different datacenters, two different continents...
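The correlation point can be put in numbers. Let A be "you lose your copy" and B be "the host loses theirs". The joint probability depends on their covariance. The 1% figures below are hypothetical and were chosen only to show the size of the effect:

```latex
\begin{aligned}
P(A \cap B) &= P(A)\,P(B) + \operatorname{Cov}(\mathbf{1}_A, \mathbf{1}_B) \\
\text{independent failures: } P(A \cap B) &= 0.01 \times 0.01 = 10^{-4} \\
\text{same spindle } (A = B)\text{: } P(A \cap B) &= P(A) = 0.01
\end{aligned}
```

In this example, moving the second copy off the shared spindle to an independent location cuts the chance of losing both by a factor of 100.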

---

[1] But nonzero. It is worth thinking about the scenarios.


Agreed. For that exact reason my main dev machine now has hourly local backups through Time Machine, local HDD clone backups every 4 hours and a separate offsite backup with Mozy.

In addition to the remote repos on Github and my normal local copies...


Agree 100%. Furthermore, it's important not to conflate version control with data backup. Although they share some traits, they have different goals. For example, if I lose my local working copy of a repository before any commits, I've lost valuable work. In the absence of a good backup strategy, the existence of the remote repository is of little consolation.
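This gap can be seen directly. A remote protects only what has been committed and pushed. Here is a minimal sketch that lists everything it does not cover yet, assuming the current branch has an upstream configured. The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: show the work that a remote repository does NOT protect yet, i.e.
# uncommitted or untracked files and local commits not pushed upstream.
# The function name is hypothetical; assumes the current branch has an upstream.
set -euo pipefail

# unprotected_work REPO
unprotected_work() {
  local repo="$1"
  echo "== uncommitted or untracked files =="
  git -C "$repo" status --porcelain
  echo "== commits not yet pushed to the upstream branch =="
  git -C "$repo" log --oneline '@{upstream}..HEAD'
}
```

Anything this prints exists only on the local disk, so only a real backup protects it.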


As soon as bandwidth costs come down enough I have a great startup idea: backup copies stored on a different planet/planetoid. Mars. The Moon. Either one. Half-kidding!


I'm guessing (edit: sure) GitHub does backups.

QED.


Seeing "I'm guessing" paired with "QED" is... strange to say the least.


Only guessing in the sense I haven't actually checked :)


I agree and disagree with you in equal measure. Paying GitHub is no protection against GitHub messing up; in the end you are still responsible for your data, and any loss will be your problem, regardless of its cause.

So github can be a part of a backup strategy but it isn't a strategy by itself.

Likewise, there are plenty of parties that wouldn't dream of storing their data in a third-party repository: it could be compromised, at least 'n' GitHub employees would have access to your data, etc.

So there is a need for both options: one where you outsource your headache to GitHub and keep a couple of local copies just in case, and another where you run your own repository that you control, with the associated backup mechanisms and a number of off-site copies.

Fortunately github makes it easy to do the former and git itself can make it (relatively) easy to do the latter.

For plenty of people the first is enough. For me it wouldn't work, so I'm really happy this got posted.


I wrote this little shellscript to backup all my Git repositories on GitHub: http://github.com/avar/github-backup

It runs in cron and backs up all my data daily.
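The same idea works with plain git for any remote, not just GitHub. This is not avar's script; it is a minimal sketch. The first run makes a mirror clone, and later runs fetch every ref again. The function name, URLs and paths are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of a cron-friendly mirror backup. This is NOT avar's github-backup
# script, just the same idea with plain git: the first run makes a mirror
# clone, later runs fetch every ref again. Function name and paths are hypothetical.
set -euo pipefail

# mirror_backup REPO_URL BACKUP_DIR
mirror_backup() {
  local url="$1" dir="$2" name
  name="$(basename "$url")"
  name="${name%.git}.git"   # always store as <name>.git
  if [ -d "$dir/$name" ]; then
    # Refs deleted upstream are kept (no --prune), which suits a backup.
    git -C "$dir/$name" remote update
  else
    mkdir -p "$dir"
    git clone --quiet --mirror "$url" "$dir/$name"
  fi
}

# Example crontab line (daily at 03:00), assuming this file is saved as
# ~/bin/git-mirror-backup.sh and ends with calls such as
#   mirror_backup git@github.com:someuser/project.git "$HOME/backups/git"
# 0 3 * * * "$HOME/bin/git-mirror-backup.sh"
```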


Daily, weekly and snapshot backups with Linode are $5 on top of what I'm paying them anyway (and just a few clicks to set up). That's less than the cost of GitHub's micro plan and I can have as many private repos as I want.

I love GitHub and I'm sure they'll continue to do well, but running your own Git server is only going to get easier. If you don't need the social side of what they offer, hosting yourself makes sense.


My poor man's offsite backup script involves zipping up the repository, encrypting it, and mailing it to Gmail. The task is scheduled daily and works pretty well.
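One way to script the archive and encrypt steps is shown below. It swaps the zip for `git bundle`, which produces a single file that `git clone` can read back. This is a minimal sketch assuming GnuPG 2.1 or later for symmetric encryption. The function names and paths are hypothetical, and the mailing step is left to whatever MTA you use:

```shell
#!/usr/bin/env bash
# Sketch of the "archive, encrypt, mail" offsite backup, using `git bundle`
# instead of zip (a single file that `git clone` can read back). Function
# names and paths are hypothetical; encryption assumes GnuPG 2.1 or later.
set -euo pipefail

# make_bundle REPO OUT_FILE   (use an absolute OUT_FILE; git -C changes directory)
make_bundle() {
  git -C "$1" bundle create "$2" --all
  git -C "$1" bundle verify "$2" >/dev/null   # fail loudly on a broken bundle
}

# encrypt_file FILE PASSPHRASE_FILE  -> writes FILE.gpg (symmetric AES-256)
encrypt_file() {
  gpg --batch --yes --pinentry-mode loopback --passphrase-file "$2" \
      --symmetric --cipher-algo AES256 --output "$1.gpg" "$1"
}

# Mailing FILE.gpg to your Gmail address is left to whatever MTA you use.
```

To restore, decrypt with `gpg --decrypt` and run `git clone` on the resulting bundle file.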


I did this with my university project source code. The only difficulty is that Gmail does not allow executables, even within a zip file. You have to work pretty hard to keep them out of your repo.
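To find the offending files before mailing, you can filter the tracked paths by extension. In this minimal sketch, the extension list is an illustrative, incomplete assumption, not Gmail's official list of blocked file types, and the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: list tracked files whose extensions mail filters commonly reject,
# so they can be excluded or renamed before mailing an archive.
# BLOCKED_EXT is an illustrative, incomplete list (an assumption), not
# Gmail's official list of blocked file types.
set -euo pipefail

BLOCKED_EXT='exe|bat|cmd|com|dll|msi|scr|vbs|jar'

# blocked_files REPO  -> prints matching paths from the index, one per line
blocked_files() {
  # -i: match extensions case-insensitively; "|| true": no match is not an error
  git -C "$1" ls-files | grep -iE "\.(${BLOCKED_EXT})\$" || true
}
```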


If you encrypt the zip, it works fine. Took me ages to discover that :)


    # rename the archive to .txt before mailing it
    mv "$ZIP_FILE_NAME" "$ZIP_FILE_NAME.txt"


I thought the point of git was that it was decentralised. So even if the server died, he wouldn't lose anything, would he?

You can also stick a git repo on top of Dropbox... There are several articles about how to do this if you do a quick google.


Git makes it trivial to replicate your repos to more than one directory, no matter where those directories are located.

Whether or not this makes you "decentralized" depends on where your directories live. Two machines in the same location? Not decentralized.
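Git can also push to several of those directories in one go, so replication to different locations happens with every push. In this minimal sketch, the remote name `backup` and all paths (for example, one on a server and one in a synced folder) are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: make a single `git push` update several bare repositories at once,
# e.g. one on a server and one in a synced folder such as Dropbox.
# The remote name and all paths are hypothetical.
set -euo pipefail

# add_push_targets WORKTREE REMOTE_NAME URL...
add_push_targets() {
  local work="$1" remote="$2" url
  shift 2
  git -C "$work" remote add "$remote" "$1"   # first URL is also used for fetch
  for url in "$@"; do
    # Once any push URL is set, git pushes ONLY to push URLs,
    # so every target (including the first) is added here.
    git -C "$work" remote set-url --add --push "$remote" "$url"
  done
}
```

After that, `git push backup HEAD` updates every listed location. This only helps if those locations do not share a building.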


Obviously. You could also back up your files to a folder on the same disk, but you won't, will you?


Backups are important, but not that important if it's just a small private Git server. After all, the full history is not only stored on the server, but also on each of the clients.


As the author, I can respond to this:

Given the decentralised nature of Git, having continuous backups becomes less important unless all my computers (including the server) fail at once.

But yes, backups are important, and they are done.


If all your computers are in a single location, it's not decentralized in cases of fire, flood, earthquake, or other natural disasters...


Or even theft. I lost more than a week's work last year because a team member had his laptop, external hard drive, and desktop stolen from his house while he was visiting his parents for Thanksgiving. Whoops. Triple backups don't count if they're all in one apartment.


You have to have backups on the server anyway. Setting up offsite backups to Amazon S3 is about 30 minutes of work and maybe $1 per month in S3 costs, and it will back up not only your Git repositories but the whole server.
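Here is a minimal sketch of the nightly offsite step. It assumes the AWS CLI (`aws s3 cp`) is installed and configured. The bucket URI, function names and directory layout are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of a nightly offsite backup of a directory of bare repos to S3.
# Assumes the AWS CLI is installed and configured; bucket URI, function
# names and paths are hypothetical.
set -euo pipefail

# make_archive SRC_DIR OUT_DIR  -> prints the path of the new dated tarball
make_archive() {
  local src="$1" out="$2" stamp
  stamp="$(date +%Y-%m-%d)"
  mkdir -p "$out"
  tar -czf "$out/repos-$stamp.tar.gz" -C "$(dirname "$src")" "$(basename "$src")"
  echo "$out/repos-$stamp.tar.gz"
}

# upload_archive FILE BUCKET_URI   e.g. s3://my-backup-bucket/git/
upload_archive() {
  aws s3 cp "$1" "$2"
}

# Nightly use, e.g.: upload_archive "$(make_archive /srv/git /var/backups)" s3://my-backup-bucket/git/
```

Tarring bare repositories while someone is pushing can capture a half-written ref, so run this when the server is quiet.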


You should already have those things in place for your server. I don't think that the marginal cost of managing git is much if you're already managing a database, web server, application, mail server, and so on.


The server most likely needs to be backed up anyway. Backups are less important for Git than for SVN, because every clone contains the repository with its full history.



