Python moves to GitHub (python.org)
245 points by rashoodkhan on Jan 1, 2016 | 112 comments



GitHub's reign over public open source programming is a bit terrifying. It's been mostly benevolent so far, but I do find it troubling to trust a private company to keep devs in mind rather than profit.


I'm not sure why it's a problem though. If they decide to go down that path, it's not that hard to move off. They don't have the kind of lock-in that would scare me, like having terabytes of data in an Oracle database.


There are some things you simply can't do on GitHub:

* block trolls from interacting with an organization, short of filing a support ticket

* install pre-receive hooks without using the pull request infrastructure

* set up status checks (the PR equivalent of pre-receive hooks) for pull requests that require force pushes, allowing some external check to replace git's usual fast-forward check

* have status checks apply to individual commits, not trees, so that you have style or copyright or whatever checks on each commit

* disable pull requests

* enforce rules for creation of new branches or tags

I admit that none of these features are dealbreakers, but they all have legitimate use cases, and they'd all be trivial if you were using a regular UNIX-based git server.
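To make the "trivial on a regular git server" point concrete, here is a rough sketch of a pre-receive hook that enforces a rule for newly created branches and tags. It relies on the documented hook input (one `<old-sha> <new-sha> <ref-name>` line per updated ref on stdin, with an all-zero old value for a new ref). The naming policy and the function names are made up for illustration, not anything from this thread.

```python
import re
import sys

# Hypothetical naming policy, invented for this example.
ALLOWED_NEW_REFS = re.compile(
    r"^refs/(heads/(feature|fix)/[\w.-]+|tags/v\d+\.\d+\.\d+)$"
)


def rejected_refs(lines):
    """Return names of newly created refs that violate the naming policy.

    Each line has the form "<old-sha> <new-sha> <ref-name>", which is what
    git writes to a pre-receive hook's standard input.
    """
    bad = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        old, _new, ref = parts
        is_new_ref = set(old) == {"0"}  # all-zero old value means "ref is being created"
        if is_new_ref and not ALLOWED_NEW_REFS.match(ref):
            bad.append(ref)
    return bad


def main():
    bad = rejected_refs(sys.stdin)
    for ref in bad:
        print(f"rejected: {ref} does not match the naming policy", file=sys.stderr)
    return 1 if bad else 0  # a non-zero exit status makes git refuse the push


# In an actual hooks/pre-receive file, finish with: sys.exit(main())
```

Updates to existing refs pass through untouched here; only ref creation is checked.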

On the other hand, the usable interface of GitHub wouldn't be available to you if you were using a regular UNIX-based git server, and that is a dealbreaker. :)


> On the other hand, the usable interface of GitHub wouldn't be available to you if you were using a regular UNIX-based git server, and that is a dealbreaker. :)

That would be one hell of an open source project: a github like UI + ticketing system but federated rather than hosted by a central authority with some kind of single sign-on mechanism.


Fossil [1] repositories include a distributed bug tracker (and also a distributed wiki and a distributed time-based notes system), and provides a browser-based UI.

I've only read about it, not played with it, so don't know how close it comes to what you are envisioning.

[1] http://fossil-scm.org/xfer/doc/trunk/www/index.wiki


It's a good idea! However, the distributed nature seems to make it sufficiently involved as to preclude a good UI (e.g., issues seem to use hashes), and the simple fact of not being git, regardless of whether it's better than git, is a serious problem for adoption. Bitbucket also learned this the hard way, and even made an April Fool's joke about it before realizing they had to primarily support git.

GitHub being centralized is a feature, not a bug. You don't have issues that exist in some people's repositories and not others. You don't have a question of who has the authoritative repository; either GitHub does, or GitHub doesn't and there's a note saying it's not authoritative. You can support things like cross-site forks, but it's actively good for UX to have a single authoritative server for the current state of the project. (And you certainly don't want to require cross-site forks, since that would make it far more burdensome to submit a single patch.)


> and the simple fact of not being git, regardless of whether it's better than git, is a serious problem for adoption

I think I was a bit too terse. I didn't mean to suggest that someone who wanted to do everything (SCM, bug tracking) distributed use Fossil instead of git. The idea was to use git for the code, and Fossil for the bug tracking and wiki.


Fossil is so great... if only they had picked branding that doesn't mean "something from the far past"...


Like GitLab?


GitLab and gogs are both fine for getting 80% of the UI, but there are network effects with GitHub: that you're already signed in and have keys and such set up (although OAuth and API cleverness with keys can get you close), that your contributions across all projects show on your profile, that third-party tools like Travis know how to talk to your GitHub in particular, etc. And all sorts of little things, like that when you mention another project's PR in a commit, a note shows up on that PR. That's a lot of what makes GitHub nice to use, more than just the UI, and I suspect that's what pushed some of the Python core devs in favor of GitHub.

There's potentially a way to build a system that federates all of this, and then toss GitLab on top for the UI. A good standard would be whether a newcomer could submit a PR with only one more click (for the OAuth screen) than they would need on GitHub.


A good federated system would allow a fork from one host to end up on another and the PR would be done automatically after the very first PR was done with that extra click (and that extra click should do a bit of magic behind the scenes to send your key over to the other server together with your first sign-on on that server).

That way future interaction would hide the fact that the systems are in fact separate.


Also, Gogs. Still, I stick with GitHub over either of these two others.


Good point, gitlab is probably 90% there already.


Is GitLab federated?


Not really.


Like Phabricator?


Something else that's IMO more important than any of those items is this:

You can't submit patches. Submitting/accepting a patch is something that straight up cannot be done on GitHub. Unless you use a sidechannel like sending a message containing a link to a patch that you threw up on some host somewhere (in which case, have fun rooting out that information on a case-by-case basis), the only way to ever get any changes upstream is via pull request. Mildly obnoxious, but then you get hit with this zinger:

You can't submit a pull request if your branch isn't already hosted on GitHub.

So if you want to make a one line fix, there need to exist no fewer than 3 copies of the repo: the upstream, your local copy that you made when you cloned it, and then some third copy that you have to spin up and maintain through GitHub. This is nuts.


I highly doubt GitHub maintains two copies of the same object (not counting replication). They likely copy most of the repo metadata for forks, but if two objects have the same SHA, even in completely different repos that were never even forks of each other, they can just store one copy. You can see evidence of this if you look at some of their advice for how to remove all traces of some data (say, you accidentally checked in sensitive info to a public repo and now need to make it totally inaccessible)
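A small sketch of why identical objects can share storage: git names a blob by hashing a short header plus the content, so the same bytes get the same id in any repository. The `ObjectStore` class below is a toy invented for illustration and assumes a SHA-1 repository; how GitHub actually lays out fork storage is not something this thread establishes.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the id git assigns to a blob in a SHA-1 repository."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store: one copy per distinct object id."""

    def __init__(self):
        self.objects = {}

    def add(self, content: bytes) -> str:
        oid = git_blob_id(content)
        self.objects.setdefault(oid, content)  # a second add of the same bytes is a no-op
        return oid


store = ObjectStore()
upstream_id = store.add(b"print('hello')\n")  # file as it exists upstream
fork_id = store.add(b"print('hello')\n")      # same file, pushed via a fork
print(upstream_id == fork_id, len(store.objects))
```

Both adds return the same id, and the store ends up holding a single object.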


That third "copy" is a copy-on-write copy; it shares objects with its source. (In fact if you built a distributed version of GitHub, you'd have to figure out a more clever solution to this.)

It is true that the inability to submit a PR without a local clone is annoying. For the use case you describe, a one-line fix, note that GitHub has a web-based editor (that will create a fork if necessary), so you can skip the local branch entirely.

Someone could probably toss together a web service that does the equivalent of a `git clone --depth 1` behind the scenes and then applies your patch.


Another mild annoyance: that any time this is brought up, people think that it's relevant to clarify whether GitHub is performing a genuine copy or just simulating one.

> For the use case you describe, a one-line fix, note that GitHub has a web-based editor (that will create a fork if necessary), so you can skip the local branch entirely.

And you've just optimized out the wrong thing.

Person A: "The thing I don't like about going to that place is that I have to spin around three times, go outside, scratch the belly of the cook across the street for ten seconds, and then I come back and place my order and my food shows up."

Person B: "When you go there, you can just do the spinning thing and scratch the other cook's belly and then go home. You don't have to go there to eat."

Oh, Thanks!


OK, sorry, can you clarify what you are actually objecting to? I'm not quite understanding what the workflow you desire is -- you have a patch generated locally, and you want to submit that. What happens if it doesn't apply cleanly? Should it be rejected, or should it give you an option to fix it up, or?


I hope it's clear from my example that it's the twirling and belly-rubbing that I could do without.

> I'm not quite understanding what the workflow you desire is

It's easy. Look at how most open source projects work literally anywhere, outside of GitHub and its clones. Look at how Mozilla works. Look at how Chromium works. Look at how the patch-based workflow that LKML uses works (using Git, no less). Look at how virtually every open source project that existed before GitHub showed up worked for decades.

My objection is this:

> third copy that you have to spin up(1) and maintain(2) through GitHub(3)

This is how it should be: I have a file containing my changes (a patch) that I can send to you (the maintainer). That's it. That's all that needs to happen.

To really belabor the point, here's a comparison of the two processes: I clone it. I change it. I create a patch. I send it in. (Replace the first step with `git pull` if this isn't your first contribution and you still have the repo around from last time.)

Here's how it works on GitHub: I clone it. I change it. I dick around in the GitHub UI to fork it and wait. I add that new fork to my remotes. I push it. I file a PR. I dick around in the GitHub UI some more to get rid of the zombie repo, but only after my changes have been merged. (Replace the first step with `git pull` if this isn't your first contribution and you still have the repo around from last time. Skip the last step and step 3 if you think you might ever contribute again and would like to minimize the hoop-jumping you have to go through for your next contribution. Omit step 4 if both of the preceding conditions were true.)

Astute readers will point out that you can completely skip the remotes step if, when you make your original clone, you do so from your personal fork rather than from directly upstream. These astute readers are not as astute as they think. Notice how much of "the GitHub way" requires knowing all the things that you would possibly want to do far before you actually want to do them. At the very least, it requires that you preemptively create a fork for every project whose source you pull down, on the off chance that you might want to contribute to the project some day. (Of course, nobody does this; people end up just messing with their remotes instead, but they're no less happy to show up and point out that it can be done, while generally ignoring its unreasonable requirement of premonition and offering no response.)

> What happens if it doesn't apply cleanly? Should it be rejected, or should it give you an option to fix it up, or?

The question is weird. Just do what Git does. It already supports patches. There's no design decision to make here.
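For anyone who has only ever used pull requests, here is a minimal sketch of what "a file containing my changes" looks like for a one-line fix. It uses Python's `difflib` rather than git itself, and the file name and contents are hypothetical.

```python
import difflib


def make_patch(old_text: str, new_text: str, path: str) -> str:
    """Return a unified diff that turns old_text into new_text for the given path."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


old = "def greet():\n    print('helo')\n"
new = "def greet():\n    print('hello')\n"
patch = make_patch(old, new, "greet.py")  # hypothetical file name
print(patch)
```

With git itself, `git format-patch` writes this kind of file with the commit message and author attached, and the maintainer applies it with `git am`.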

This is why there's a problem with characterizations of the sort you'll find below—you get people who've never even used Git or any other sane VCS outside of GitHub that come along and try to tell you about how things work. They'll say that GitHub is Git with some extra features. No. GitHub is a crippled implementation of Git with some proprietary value-add to halfway work around the stuff that it screws up.

And let's be clear. I know why it screws those things up. It's for the same reason Facebook makes all the moves that they do to aggressively silo themselves. By both deprioritizing support for patches and disallowing off-GitHub pull requests, and by championing a more convoluted workflow, they end up not only with the upstream hosting their repo there, but also with virtually every contributor hosting there, too.

The weird thing is that with Facebook, you run into a vocal minority whose response to that behavior is to say, "fuck that, and fuck them", and the FLOSS crowd generally shares membership with that group. Meanwhile, GitHub—whose userbase is largely made up of only that type of crowd—seems to deal with far less criticism over it.


Is that really any different than GitLab though...?


The number of times that GitLab was mentioned in my comment or those upthread in the parent chain: 0.


My mistake. I was referring to how the original article is about why they're leaving GitLab for GitHub.


That's not what the original article is about, either. They're not leaving GitLab. They were never there.


The problem is with people's mindset that everything has to be on github. Large projects that dare to use self-hosted more fitting infrastructure like Phabricator tend to be punished socially.


What does "punished socially" even mean?

Unable to attract as many contributors? The cold reality is that most open source projects aren't really TRYING to attract contributors. Small-time hobbyists often don't want to play with others in their personal project sandboxes, and big-time projects with high profile usually put up high barriers to contribution.

Unable to attract as many end-users, or as much public mindshare? Now this makes more sense. Many of the high-profile moves to GitHub are really about P.R., and signaling friendliness to the developer community (e.g. Microsoft's recent image re-branding efforts). Not having a GitHub presence DOES make you less visible and more isolated from developers today.

Either way, I would argue that GitHub is really just a sort of Facebook or Twitter-like presence for software organizations. A way to draw attention, publicity, or mindshare from the public... but often merely a hosting mirror, and not even the primary toolset through which the software is actually developed.

This being the case, who cares if GitHub is dominant? Or SourceForge ten years ago, or whatever the next popular thing might be ten years from now? In most cases, it's effectively about as consequential as having to shift your focus from MySpace to Facebook when the audience moves.


>What does "punished socially" even mean?

This is an interesting expression by the grandparent. I've never heard this term before, but I've definitely felt like I've been "punished socially" in some way on the Internet by not having Facebook. The most common example is the large number of apps that I can't use because they require a Facebook account to get started. (I had an FB account in the early days when it was still edu only, but ended it a few years back.)


Because of how Git is a distributed model, hosting your project on Github shouldn't be a big deal. You could host your project privately and then just have Github mirror that private host with hooks.

Git being distributed means that you can host your project on as many different places as you want. And making the workflow to accept pull requests for Github, Gitlab, or other providers really isn't very difficult. They are all just remotes.


> Because of how Git is a distributed model

News flash: the number of people using the "D" in "DVCS" is so tiny that it might as well not be there. Central server with central authoritative repository, where all developers push and pull and synchronize, is how people use git in the real world, and it's time to stop pretending that any distributed features will ever be used.

(which is to say, if github disappeared tomorrow, everyone using it would find a way to transition to yet another central server with central authoritative repo that everyone uses)


Wait, what?

Git doesn't do remote propagation all by itself, so if you're thinking "distributed" like DNS, then yeah of course devs don't do that. Of course we "rely" on a central repository. But if github goes down for good, we change our remotes and the migration is painless.

What else would you want? Propagation would be neat but is completely unnecessary (which is why it hasn't been done). But if the remote loses all my data, well, I and every other dev have a copy locally and we can start working from a fresh remote.

Github repositories are "authoritative" by convention. These things are a feature when you work in a team. In some of my personal projects, it's not the Github repository that's authoritative but my local repository. I really don't understand what you're trying to get at, honestly... git is popular because it covers all those use cases.


I think that's a different reading of what Distributed means for DVCS.

Git is about everyone having a full copy of a repo so they can go through the entire history and branch and commit freely. To actually move a project forward amongst several devs there will inevitably have to be synchronization using push/pull so there's no way to escape that.


Everyone who uses Git is using the distributed features. It is not possible to use Git without the distributed features, because every time you do a commit or a merge or a push, you are doing it on your local machine on your own remote.

That being said, I will agree with you that most remotes exist on personal computers and not servers, and we tend to be focused on more centralized servers. I think that is a remnant of the Subversion mentality and is slowly getting better.

But, the distributed features of git are alive and well. Maybe not as alive and well as we'd all like. But, alive and well.


I think that most open source projects just don't have the resources or contributor levels for true distributed development to happen. You are probably right that using a DVCS as intended is overkill for most projects.

But the whole reason why git exists at all is to support development of the Linux kernel. I don't think you can brush it off so easily.

Look here for an example of what real distributed development looks like with git: http://git.kernel.org/cgit/linux/kernel/git/next/linux-next....


Forking a repo creates an alternative authoritative repository. That's a huge part of the D in DVCS.


Projects that use github merely as a mirror always run into the issue that of all features pull requests are the only one that you cannot disable, and they have to deal with closing pull requests, pointing to the real contribution process. So far github has always rejected the possibility to disable pull requests.

The problem with Sourceforge is that we now have a thousand unmaintained projects only available there as an archive, and we risk losing them if we're not careful. It seems easier to build a github-to-new_thing service as long as github cooperates (web api throttling).

If git is to be the dominant tool, it should support a wiki, tickets, code review and more complete distribution. Kinda like Fossil++. I hope git-appraise or at least the idea gets more popular.


> The cold reality is that most open source projects aren't really TRYING to attract contributors. Small-time hobbyists often don't want to play with others in their personal project sandboxes, and big-time projects with high profile usually put up high barriers to contribution.

I'd say there is a large number of projects between these two extremes, and Github makes it relatively easy to get contributors "accidentally", by making it trivially easy to contribute. And a small percentage of those will probably stick around.

For small contributions (e.g. bugs due to different configuration of my system that are trivial to recognize and catch, errors in the documentation, properly describing a bug or crash I see) GitHub is really quick. With other projects, figuring out how to get post access to the bug tracker and how to check that the bug hasn't been reported yet, or reading up on where and how to submit code changes, probably takes more time than dealing with the issue itself, and I'm not going to bother.

Not saying that other places can't be similarly quick (I'd say Bitbucket and GitLab probably are), but many projects hosted on their own infrastructure are worse about that.


This is incredibly shortsighted, but the rebuttal is simple:

> This being the case, who cares if GitHub is dominant? Or SourceForge, or whatever the next popular thing might be.

For one, I do, and for two - why was sourceforge changing their business model to shady predation a problem? It was not because they were a website on the Internet - there are millions of URLs of clickjackers or other malware you could visit and harm your computer with, but you rarely see them.

The problem was that a ton of open source software had mindshare on sourceforge, that remains a problem to this day, and it took years to move most projects away from that hostile environment. Plenty of very useful free software remains hosted on sourceforge, and there are plenty of reasons - developer inertia, community loss from switching, legacy systems in place that aren't portable, lack of interest in learning new tools, and many more - but the reasons matter less than the result - that we have thousands of projects staying on a website known to infect people with malware.

Most of that software remains as portable as anything on github - often even more so, because sourceforge offered many fewer development ecosystem features than github now does - but has not switched for whatever reason. We can go after the individual projects and heckle them until we can get them off sourceforge, but that is a ton of effort and mental energy we could have better used making good software.

Which is fundamentally why the decision between github and any open source alternative is so important. This is not a question of benevolence, or even time - Github, Inc is a private company hosting a proprietary website that has 11 million accounts and 29 million source repositories today. Any action they take can destroy either trust in the platform (why exactly do you trust a proprietary web service, again?) or its usability for whatever purpose you depend on it for, and since it is proprietary there is no recourse. You just have to repeat the sourceforge hell and somehow move off of it as a hosting platform - except you might have drunk the koolaid, and now have your issues, releases, build service, wiki, and website all bound to the github platform. If moving just the source control, release hosting, and a forum from Sourceforge was bad, trying to get away from Github would be much, much worse.

But all these migrating projects should know better. They were already betrayed once by proprietary software they depended on, but are choosing familiarity and mindshare over the security of never having that happen again. At least with gitlab, when Gitlab Inc jumps the shark, you - or anyone else - could spin up a lifeboat to easily and seamlessly save your community. And that collateral alone means Gitlab Inc. is much less likely to betray you for profit.

It is not a question of if. Unless Github open sources itself, and that is impossibly unlikely considering how huge their code base must be now and how many football fields of lawyers they would need to prune their internal code, a proprietary software project must eventually act against your interests because you are not in control of it, no matter the intention of the creator.

If you are going to have to bite the transition bullet, you might as well only have to do it once. Considering the parity between github and gitlab, I have never met anyone who would literally refuse to contribute to a project because it's not on github; you just miss the casual eyes that are more common there because the platform has captured more of the userbase.

But that userbase control is so dangerous, and we should all care enough to try to correct for it when we can, if it doesn't negatively impact us much. And honestly, a project like Python would have been perfect for it - they won't see a dearth of developer interest just because they are using the second most popular source control web service out there.


> a proprietary software project must eventually act against your interests because you are not in control of it

Being open source doesn't mean that won't happen. A lot of open source projects have acted against their users' best interests. A good example is Firefox with the whole "Pocket integration" controversy. Another example is Wikipedia, with its editorial policies that drove away editors. You can fork Firefox, and you can host your own Wikipedia mirror with exactly the same software setup, and in fact a lot of people have done so, but maintaining your own fork/mirror of a project on this scale is a lot of work. Unless you have unlimited free time (you don't), open source doesn't inherently remedy the issue that other people will do things that you do not agree with, and that may cause problems for you.

Platform choice is ultimately an economic decision. Github has significant incentives not to "go rogue", i.e. they don't want to shoot themselves in the foot. They currently provide very significant value to the community, and are in a very comfortable position, but that could change really fast if they take the SourceForge route.

While the more code that gets open sourced the merrier, I personally don't believe open sourcing Github would, at this point, bring a lot to the table, especially since gitlab is so fully featured already. Their secret sauce isn't really the software, but the service they provide.


> open source doesn't inherently remedy the issue that other people will do things that you do not agree with, and that may cause problems to you.

It's not a remedy. It's just a lifeline. Some of us do take Icecat or Iceweasel. With proprietary software, you have no recourse at all. Wouldn't you prefer some safety net to none at all?


You don't have to maintain your own fork, though. When Firefox implemented Pocket, Iceweasel saw a big uptick in involvement. I'm not aware of any popular, more-free-to-edit clones of Wikipedia, but that is only because I have never had to seek one out. I imagine they exist. Wikimedia has not gone against my interests yet - if they ever put ads on wikipedia, for example, that would be justification for forking. But that would not be your own clone, it would be a community divide. And most projects of scale realize that those kinds of forks are dangerous and act to bring the community back together by resolving a problem that is so bad it schisms the ecosystem.

This has happened before. Gnome 2 and 3 schismed, and Gnome 3 became more user friendly and configurable to bring back its users. Libav and FFmpeg split because FFmpeg was very contributor unfriendly, and then it started merging all the Libav changes, so contributing to Libav got your changes in everywhere. OpenOffice and LibreOffice forked because of Oracle changes, and now LibreOffice is pretty much the only name left in town, and is more successful for it. It happens all the time in branches and private forks and message boards before it even reaches end users.

But that is the difference between MS Office and LibreOffice. When a problem is huge to the OpenOffice community, they fork and start LibreOffice, and when it's substantial enough the community overrides the original developers and takes over when terrible decisions are made. If Microsoft fucks up their office (and the early ribbon was a great example of that) you are stuck, and have no power. When the version you like goes end of life, you either sit on decaying software or forfeit your preferences because Microsoft has complete control.

If Gitlab jumped the shark, you would not be spinning up your own gitlab clone to maintain yourself - the gitlab community would fork away from Gitlab Inc and create Librelab, and mindshare would drive projects to it because it's a seamless transition. And that is less likely to happen with Gitlab in the first place, because the power of the user can keep the company in check and stop it from making the extremely user hostile moves that Google, Microsoft, Apple, Amazon, Facebook, etc can make with impunity because users are trapped and they hold all the keys to the castle.

> Their secret sauce isn't really the software, but the service they provide.

The services are the problem. It's the same problem with Facebook - Google+ is feature equivalent, but it was dead in the water from the start, and its free clones like StatusNet (albeit that's more Twitter) are completely dead because they aren't just personal software, they are platforms. And platforms with mindshare and the users are insurmountable.

It's not about the wiki, or issue tracker, or pages. Those are replaceable - painfully, but doable. But replacing the community of developers is not a solvable problem, because they are already on github. You have already lost. And that gives github overwhelming and dominant power in how you conduct and host free software now, with no recourse.


My question was what it means for an open-source project to be "socially punished" by not hosting on GitHub. My conclusion is that it doesn't mean much of anything in terms of actual software development, and only matters in terms of P.R. and marketing.

Some of these replies are pivoting to talk instead about the fate of abandoned projects on SourceForge. That's a great discussion, but is a DIFFERENT discussion. The parent comment talked about consequences that reduce engagement with active projects. By definition that becomes irrelevant when the projects are inactive and long abandoned.

Anyway, nothing in this lengthy post is really all that compelling. Developers didn't shift from SourceForge to GitHub because SourceForge became shady. The timeline was the other way around: SourceForge became shady because everyone left. That initial shift happened because tastes changed, people legitimately preferred what GitHub was doing interface-wise... and once a critical mass has moved, it pulls everyone else who wants the social advantages of being in the popular destination.

If GitHub turned evil (or worse yet... "uncool"), then I just don't see the migration pain you're talking about. Pushing a git repo to a new remote is trivial. Learning a new issue tracker ticket system takes about 10 minutes (they all work basically the same), and SOMEONE will write a script to automatically migrate GitHub issues to the new thing. Wiki content? Sheesh... it's just Markdown.
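As a rough idea of how small the core of such a migration script could be, here is a sketch that renders one issue as Markdown. It assumes the issue is a dict shaped like the objects GitHub's REST API returns for issues (`number`, `title`, `state`, `user.login`, `labels[].name`, `body`). The output layout and the sample data are made up, and fetching, pagination, and comments are left out.

```python
def issue_to_markdown(issue: dict) -> str:
    """Render one issue dict (GitHub REST API shape) as a Markdown document."""
    labels = ", ".join(label["name"] for label in issue.get("labels", []))
    lines = [
        f"# #{issue['number']}: {issue['title']}",
        "",
        f"- state: {issue['state']}",
        f"- author: {issue['user']['login']}",
        f"- labels: {labels or 'none'}",
        "",
        issue.get("body") or "(no description)",  # body can be null in the API
        "",
    ]
    return "\n".join(lines)


# Invented sample data, only to show the shape of the input.
sample = {
    "number": 42,
    "title": "Crash on empty input",
    "state": "open",
    "user": {"login": "alice"},
    "labels": [{"name": "bug"}],
    "body": "Passing an empty string raises an exception.",
}
print(issue_to_markdown(sample))
```

What a sketch like this does not carry over is cross-references between issues, PRs, and commits, which is where an export tends to lose detail.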

The biggest risk is that you've used your GitHub URL as your exclusive web presence, rather than spending $10/yr on a real domain name. But if you've made that mistake, then you'll probably make the same mistake with ANY solution that isn't self-hosted.

By all means, host your own personal open source projects wherever you like. But telling everyone else to stop doing what everyone else is doing, on the basis of some Richard Stallman-esque principles that don't hold water, seems like a waste of time and energy.


Github.com hosts approximately 50 million pages right now. Many of those pages have inbound links associated with them.

If those pages were to disappear or if the decision was made to become less benign then the damage would be substantial and it would take many years to heal the rift.

The web never really recovered from Yahoo's bad stewardship of Geocities (untold backlinks broke, including many from Wikipedia causing those links to point to nothing which led to articles becoming unsourced, or -worse- being changed by bots to point to linkfarms instead).

If github turned evil the migration pain would make the Geocities demise look like a veritable walk in the park.


I suspect what you're seeing as social punishment is people preferring the UX GitHub offers. That's just user choice in action


Not really, the inconvenience of having to register another account (which often means another set of password) is quite a big deal for entry.


I'd consider "single sign on that connects me to tons of projects' repos" to be great UX. Personally, it (and the centralized API) are big reasons I use GitHub to interact with projects whenever possible.


Centralizing the "hub" pretty much negates the distributed aspect of a dvcs, and github has no incentive to invest in extending git to be even more distributed à la Fossil or BitTorrent, to mention two examples that don't quite fit but convey the point. Maybe git-appraise will grow into something good.


This gets brought up pretty much every time GitHub is mentioned on here, and I've never understood it. Using a hosted service like GitHub as a remote for your git repo doesn't "negate" the distributed aspect, any more than parking your car somewhere negates its ability to drive around.

I can use GitHub as a remote and take advantage of the social and UX aspects they offer, and if GitHub vanishes (or has downtime), I can continue utilizing my git repo with all the distributed features it had before. I can continue (and have continued) to work while GitHub is down, I can export any of the additional data (issues, wiki, etc) that GitHub offers and store them just like I store all my backups, etc.

Centralized hosting solutions can offer value, even to distributed tools.

Where GitHub ends up conflicting with distributed workflows is when build pipelines or other systems are set up to rely on GitHub (Go dependencies, I'm looking at you). In cases like this, the user is making a choice to pin that piece of their infrastructure to a centralized service. And that's not always a bad call! If the value provided outweighs the harm done when the centralized service goes down or has other issues, it can be a good choice. It's up to the person designing their infrastructure to balance the risks and benefits, as with anything else.

Specifically speaking to GitHub's lack of incentive to invest in extending distributed git features: that seems counter to actual events. They recently worked to develop their large file storage, and rather than release a closed source / centralized system, they released an open source extension that ties in with core git (https://git-lfs.github.com/)


I don't know how git-lfs is distributed.

Yes, you can still use the basic git features, and github doesn't break them. The problem is that instead of turning Git into a Fossil++, we're relying solely on Github's tooling around git. Yes, it's easier to have a centralized location that doesn't require synchronization for tickets and reviews, but when you have to move off github, you cannot really export all the history properly without losing some details, so I firmly believe we have to make more of the git story distributed. Those features can be used by github in a friendly web interface, but like the git push and fetch model, it should exist in git itself. That way, it really won't matter where users make contributions, and it will solve the Phabricator vs github dilemma for big projects, plus allow for full archiving and porting of the data. Once your tickets and diff reviews are part of the git repository, you get features that enhance the user experience. I believe git-appraisal is using git's notes feature, and that's a good way to avoid extending the data model.


As I noted, their API lets you export the data they add: issues, wiki, PRs, and the like. And git-lfs lets anybody run their own large file systems, backed however they want, without GitHub being involved.
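
For anyone wondering what that export looks like in practice, here is a minimal sketch in Python. It assumes the public REST endpoint `GET /repos/{owner}/{repo}/issues` with the `state`, `per_page` and `page` query parameters; the function names (`issues_url`, `fetch_all_issues`) and the injectable `fetch` argument are my own, purely for illustration. Note that this endpoint also returns pull requests, and unauthenticated requests are rate limited, so treat it as a starting point rather than a finished backup tool.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def issues_url(owner, repo, page=1, per_page=100):
    """Build the URL for one page of a repository's issues (open and closed)."""
    return (f"{API_ROOT}/repos/{owner}/{repo}/issues"
            f"?state=all&per_page={per_page}&page={page}")


def fetch_all_issues(owner, repo, fetch=None):
    """Collect pages of issues until an empty page comes back.

    `fetch` takes a URL and returns decoded JSON. It is injectable
    (a hypothetical convenience) so the paging loop can be tried
    without touching the network.
    """
    if fetch is None:
        def fetch(url):
            req = urllib.request.Request(
                url, headers={"Accept": "application/vnd.github+json"})
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)

    issues, page = [], 1
    while True:
        batch = fetch(issues_url(owner, repo, page))
        if not batch:
            return issues
        issues.extend(batch)
        page += 1
```

Dumping the result with `json.dump` next to your other backups would keep at least the issue text and comment counts outside GitHub; comments themselves need a further request per issue.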

If you're arguing that open source and self-hosted alternatives should get better at the UX and communal aspects that GitHub has been excelling at, I'm all for it. But let's not sell short that the reason people are flocking to GitHub is because what it offers has value, value which is worth using a hosted centralized service for them.


Something similar to git-lfs:

https://git-annex.branchable.com/


The negation is that we centralize important data like code reviews and tickets in an external place that can be easily lost. Without those, most git repositories fail to tell the full history of a project.


Joey Hess has partially solved that with his github-backup, which "is a simple tool you run in a git repository you cloned from GitHub. It backs up everything GitHub publishes about the repository, including branches, tags, other forks, issues, comments, wikis, milestones, pull requests, watchers, and stars."

https://github.com/joeyh/github-backup


Most useful, thank you! I'll pass that link on immediately to some people that were wondering about just that a few days ago.


Usually you can log in with your github account.


This. Github's UX is nice, but so is BitBucket's and other competitors'.


Maybe, and even despite the missing code review functionality, if you compare it to stuff like Phabricator.


I'm looking at projects like nginx that had to set up Github mirrors basically out of pressure.

It serves no value for the project because they use a different system for version control, code review and ticketing, yet someone had to go set up a Github mirror because.... people expect to see you on Github?


Without competition they don't really have much of an incentive to improve.


Because github can selectively own anyone that uses code downloaded from it, that's why.



It's absurd that so many open-source and / or free software projects are happy to host their public-facing development infrastructure with a private company running proprietary software.


Why?



What's the alternative? Everyone running their own little proprietary stacks with none of the features or community or ease of use?


Why would a free software project leave GitHub just to run a proprietary "stack"?

You could try one of the self-hostable free software alternatives listed here: https://en.wikipedia.org/wiki/Comparison_of_source_code_host...


With self-hosting you don't get the broader community... With node, for example, most modules have source on github and it's easy to fork, fix/enhance something and submit a PR... with everyone self hosting, how many logins do you have to create?

I know, tracking, federation, etc... but it's much nicer only having a couple logins (google, fb, twitter, github) for most sites I access regularly.


SSO is a separately solvable problem, and even so, I'm not sure it's a solution that needs applying here. It's not clear why any form of sign-on is needed—SSO or not. Almost everybody in this space is already using public-key crypto for authentication.


But not for signing into the website, and adding notes for a pull request, or filing a bug/issue/feature request.


I don't even know why I bother trying to have discussions with GitHub users.


How does public key crypto let me use multiple websites, similar to github's, so that I can create a PR, and update/edit notes/comments on issues? It's a demonstrably broken experience in the browser.

Having one key to rule them all, so to speak is far easier than having to sign into every project's own issue tracker. I understand the risks (ie: sourceforge), but have some level of trust in GH's founders.


self-hostable and free doesn't mean they're not proprietary.

and my comment was more about the fact that github is just git (which is completely open) with some extra features. like the other comments have said, it's not really a big risk.


I don't mean free as in price. I mean free as in freedom, and I mean proprietary as in non-free: https://en.wikipedia.org/wiki/Proprietary_software


It's easier than pull requests, to email a patch.


To be honest, I don't find this to be true. To email a patch, you usually have to

* figure out if you can use git-request-pull or git-send-email

* sign up for a mailing list (which, depending on the mailing list software the project uses, can take anywhere between a few seconds and annoyingly large amounts of minutes)

* optionally figure out how to not receive the whole traffic of the ML or set up custom email filters

* figure out if you have to CC the people responsible for the area of the code you're going to send a patch for (which might be written down in a wiki, some CONTRIBUTING file or somewhere else)

* figure out if the project wants an extra cover letter for the patch series or not (and optionally search the man pages for how to write one).

With GitHub, I just use [hub](https://hub.github.com/) or an Emacs package or plugins for other editors to

* fork the repository

* (without additional tools, I now have to add my newly created remote)

* push to my own remote

* create the pull request after reading CONTRIBUTING<tab>

which so far has been the exact same workflow for every project I've seen (ymmv of course).


If I was going to email a patch to a project,

- I would already be following the mailing list and/or the bug tracker,

- I would already have read the relevant guidelines before starting to work on the tree, and

- I would already have figured out to whom should I refer my patch.

And no lock-in to git itself. The project can use git, hg, cvs, svn, darcs, rcs, sccs, whatever. I can, after obtaining the tree, create diffs to orig files and be done with it.

If I was going to contribute to a project on github (which I did a few times), the above-mentioned are still relevant: if one wants to contribute to a project, they should be familiar with it. Also, I would have to know how things work the github way. And because the github way is so mechanical, it becomes hard to enforce project rules.

Then, the github web interface is score oriented: Commit numbers, release numbers, source tree layout in the face, source language statistics, search that can take me to other projects, profiles with contribution numbers, plenty of other irrelevant stuff. It makes one want to "score", and "show off". It distracts from the actual goal of one's contribution: sharing.


I agree. I like GitHub (and I pay for their service). But man if they wake up one day and decide to go the SourceForge route (or demand ransom to not do so) it is going to be a pain to redirect stuff.


The good news is that it is straightforward to make a copy of much of the important public data on GH.


It goes a lot further than that. Github quite literally becomes the hub of the projects it hosts. Without the hub the spokes will have to reconnect and many of the glue bits will be lost in the process and some of those bits are very important. A public home is a very important thing to an open source project.


Even the issue tracking?


Someone will probably point out that data can be exported, but that is frankly irrelevant given that the URLs for your project are now embedded around the entire world, pointing to a domain name you don't control (github.com), and which your goal should be to "outlive". There is tons of historical content sitting on SourceForge and Google Code that links all over the Internet point to, and it is a major issue that you can't migrate and take these links with you; if github.com offered "bring your own domain" as a feature for some moderate cost, it would be much more viable as a platform for people who care about the longevity of their projects.


You can use your own domain for GH pages... which many sites do use for their documentation.


Their API exposes issue data, and there are a handful of 3rd party API-based tools designed to back up / extract it.

Also, for anybody not already aware (I didn't know this for a long time): GitHub repo wikis are themselves git repos that use Gollum
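
In practice that means the wiki can be cloned like any other repo. A small sketch, assuming the usual `<repo>.wiki.git` naming for the wiki remote; the helper name `wiki_clone_command` and the example owner/repo are hypothetical:

```python
def wiki_clone_command(owner, repo):
    """Return the git invocation that clones a GitHub repo's wiki."""
    url = f"https://github.com/{owner}/{repo}.wiki.git"
    return ["git", "clone", url]


# Example (hypothetical names): pass the list to subprocess.run(..., check=True)
print(" ".join(wiki_clone_command("example-owner", "example-repo")))
```

The clone only works if the project has actually created at least one wiki page.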


I've already written a standalone tool to do exactly this: https://github.com/josegonzalez/python-github-backup


The API lets you access all that.


Thankfully, setting up a git mirror on alternative hosting platform and update it automatically is almost trivial.

Of course, you have issues and discussions that are tied to Github's platform, but we don't have any universally accepted format for it except email anyway.
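
A rough sketch of the mirror setup mentioned above, written as a Python wrapper around plain git commands. The URLs and the `workdir` default are placeholders, and the function names are my own; only the git subcommands and flags (`clone --mirror`, `remote set-url --push`, `fetch --prune`, `push --mirror`) are real.

```python
import subprocess


def mirror_commands(source_url, mirror_url, workdir="repo.git"):
    """Return the git invocations for a one-off mirror plus the refresh step."""
    return [
        # Bare clone that copies every ref, not just branches.
        ["git", "clone", "--mirror", source_url, workdir],
        # Keep fetching from the source, but push to the alternative host.
        ["git", "-C", workdir, "remote", "set-url", "--push", "origin", mirror_url],
        # Re-run these two from cron (or similar) to keep the mirror current.
        ["git", "-C", workdir, "fetch", "--prune", "origin"],
        ["git", "-C", workdir, "push", "--mirror", "origin"],
    ]


def run_mirror(source_url, mirror_url, workdir="repo.git"):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in mirror_commands(source_url, mirror_url, workdir):
        subprocess.run(cmd, check=True)
```

This only covers the git data; issues and discussions still need a separate export.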


especially given some of their behavior in the last couple of years...


well I don't know if I agree with that. I haven't seen an instance where their behavior has been amoral.

I have seen instances where they have been cowards, regarding DMCA's and acquiescing to threats, but that is a minor sin. I don't expect every organization to fight every fight worth fighting.


this is pretty significant; so far the only git presence of Python was a "semi-official readonly mirror" on github: https://github.com/python/cpython

now does this mean that Python will completely switch away from Mercurial? this was one of the major projects still using hg

what does this mean for mercurial's global adoption vs git?

also, i don't understand why the free software aspect of gitlab wasn't an important argument in the decision... that seems like a key element of the difference between the two platforms.


Yes. It means Python is no longer using Mercurial. We knew this was a done deal several weeks ago.

Mozilla, Octave, and Hedgewars are a few remaining projects that still use hg. For them, I will continue to improve, promote, and use Mercurial.

Their reasons for using Github have nothing to do with the technical merits of git or hg or with the freedom of the software or platform. They want more contributions and they figure using the popular platform is the way to obtain them.


SDL also uses mercurial I think.


Well, Facebook is probably the single largest corporate user of Hg in terms of repo size and investment into the tool. In terms of influencing global adoption, they probably pull more weight in the community than Python did.


This is true, since most of the hg devs are affiliated with Facebook, including lead dev mpm.


That's what I'm wondering, too. Mercurial (which I prefer) seems to still have pretty decent usage in the Python world. And Bitbucket is still pretty Mercurial-friendly.


If there ever was a company that became 'too big to fail' it is github. A major security breach at github would have earth shaking consequences and if they ever went out of business it would take a long time before the rift was healed.


I think that the failure of Amazon and their AWS would have far worse consequences. Half of the internet would stop working (the half that's not using Google's and Microsoft's services, anyway).


Everyone is moving to GitHub these days. Why doesn't Gitlab.com get more love? Isn't their whole stack open source?


I have three personal reasons for preferring Github to Gitlab. I'm not going to talk about the community edition of Gitlab, since Github Enterprise is a separate topic and frankly I think most individual developers couldn't be bothered to download, run, and maintain their own Gitlab instance (as evidenced by the fact that Github.com is so damn popular).

1. Gitlab themselves admit that Gitlab.com is bad[1]. Making a cloud-based software development platform is hard, and Github has some of the best engineers working on scaling the service. Gitlab doesn't have nearly as many resources.

2. The community is thin[2]. Almost all of the largest projects on Gitlab.com are Gitlab. Gitlab CE is maintained by only a handful of people. Go through the list of top contributors on Gitlab.com and you'll find that they have almost zero "personal projects" hosted there. If the developers of the service don't even use it, why should I?

3. The UX of Gitlab is honestly quite poor. At best, it's a clone of Github's UI. At worst, it has way too much scrolling thanks to inconveniently placed whitespace, poor performance, a confusing information architecture, and oodles of low-contrast text. I personally find getting around Gitlab to be tiring and confusing. Here's an easy example: from the Contributors tab of a project [3], how can I see more about a contributor? You just can't. There's lots of little papercuts like this scattered across Gitlab. Not to say Github is without papercuts, but it certainly has far fewer.

[1] https://about.gitlab.com/gitlab-com/

[2] https://gitlab.com/explore

[3] https://gitlab.com/gitlab-org/gitlab-ce/graphs/master


> At best, it's a clone of Github's UI.

What I don't understand right now is why they aren't trying to copy GitHub as much as possible ... when it makes sense of course. I've been studying both GitHub and GitLab's design in detail, because I want to incorporate my technology into their branches and commits page, and I've found the difference in user experience between the two quite striking.

When I get some time this week, I would like to submit an issue with GitLab to let them know in more detail what I think they can do better, but here is just a synopsis:

- I still can't put my finger on it, but there is something off about the font.

- Avatars with 50% border radius is a bad design choice in my opinion. The problem with creating circular avatars is they hide too much of the image and they create a focal point towards the center of the image. If your avatar doesn't have a natural center focus, it will look bad and create unnecessary eye strain, since your mind will naturally try to fill in cropped areas. Keep it simple and use a simple border radius of 8 or less.

- Avatars on the commits page are too small. I'm not sure if this design decision was the result of the Gitorious acquisition, but Gitorious had this problem as well. Using small avatars has its place, but not on the commits page since this page is designed to help you better understand who did what quickly.

- How the commit message is revealed in the commits page is too jarring. If you click on the ellipses at

https://gitlab.com/gitlab-org/gitlab-ee/commits/master

and

https://github.com/gitlabhq/gitlabhq/commits/master

you'll better understand. The problem with GitLab's implementation is the strong focal point is the avatar and when you click on the ellipsis, your eyes will get dragged down with it when the message is revealed. Just copy GitHub and use a bigger avatar and have the commits message reveal below the strong focal point (avatar)

- The calendar icon on the commits page is visually too strong and should be removed.

- The clipboard and other elements on the commits rows create an unnatural balance/flow. If you look at GitHub's commits page, the clipboard, commit sha and browse tree icon are equally balanced in size and weight, which creates a natural horizontal flow. You can sweep from left to right and it won't create any unnecessary jarring effect.

These are just some of the things that I've thought about when I was looking at the commits page and I really don't understand why they aren't copying GitHub's UI in a lot of places. The amount of money that GitHub is investing in their UI vs GitLab's is quite significant, and it should be obvious that GitHub is way more capable of producing a better user experience, so why not copy it?


GitLab and Bitbucket are more about private companies and their private hosting for doing their actual private work.

GitHub does that too, of course. But they are also basically a social networking website for individuals and organizations in the software industry. Prominent companies put code on GitHub because that's how you maintain a public presence in the developer community today. Even though the GitHub repos are more often than not mirrors, with the code actually being developed with internally-hosted repos and tooling.

So as far as public destinations for open source code go, GitHub has that market pretty sewn up for now. However, there's lots of competition in the "charging money for private services" market. Atlassian is a giant in the enterprise world, and I'm sure GitLab is doing well too. GitHub sells private services too, of course. Lots of people argue that Bitbucket's tooling is better, and even the paid version of GitLab is insanely cheaper... but GitHub does have an advantage in being able to leverage everyone's familiarity with it.


Gitlab is really nice, but the user experience isn't quite as slick as Github's. Understandable, since Github has a lot more engineers behind their infrastructure.

I think Gitlab's current user base is for developers/startups who want VC backups for their private codebases. I'd probably use Gitlab if I were developing a private project that I didn't plan on releasing.

Github, by design, is friendly towards people who want their code to be open to the world and encourage exploration of interesting projects. It's great for sharing research code, and many big companies (Google, Facebook, Apple, Microsoft, Mozilla, Disney, Pixar) already put their stuff there. It's highly unlikely these big fishes will move to Gitlab anytime soon.


Definite +1 on the italicization of 'quite'.. GitHub's UX advantage over Gitlab is marginal at best, at least as far as their web client is concerned. The desktop experience is pretty nice, but we've otherwise got SourceTree for non-developers who need a decent UI.

Gitlab is the first piece of open source I've seen in probably a decade where I haven't a single bad word to say for it, their attention to detail is superb, starting with the installation process. I'm usually by default a critic of pretty much anything I encounter (aka. "typical HN comment author") but I find myself advocating Gitlab constantly


I will cut the BS and give you my thought: simply because few people are interested in maintaining GitLab. If I were in the dev's position, I would have more time to focus on improving build pipeline, like adding some cool bot.


Source code hosting is an area where longevity is probably the single most important factor.


It looks like the vote was not "unanimous" https://mail.python.org/pipermail/core-workflow/2016-January...


If I were them I'd move issue tracking to GH too _in addition_ to the current tracker. Having GH issues that are familiar to developers lowers the barrier for feature requests and bug reports which is a huge deal for a language.


I'm not scared of Github. If they make bad decisions for the community, we will tell them. But I have to agree that the worst part is the lack of competitors.


(3). Guido prefers Github ~deal with it~

Seriously, I like the idea. It will give visibility to those who contribute and might bring some new contributors.


GitHub is popular. More people will come in contact with the development of Python by Python being on GitHub. That's a significant benefit to any open source project; one that I believe outweighs the concerns of a private company valuing business over developer ideals.


On the other hand, Ruby refuses to move to Github, yet is actively trying and hoping to attract more contributors.


So is github the Google of code?



