Funny story: my first task when I joined the original iPhone team was to merge our forked WebKit with master. It was a sort of hazing ritual slash "when else would we do it but when someone new joins?". Anyways, we used a tool called SVK[1] in order to get very primitive "git-like" abilities. It was basically a bunch of scripts that used SVN under the hood. For example, in order to get the "local everything"-style behaviors of git, the very first thing it did was check out every single revision of the repository in question. For WebKit, this meant that the first day was spent leaving the computer alone and letting it download for hours. I made the mistake of having a space somewhere in the path of the target folder, which broke something or other, so I ended up having to do it all over again.
Anyways, I distinctly remember one of the instructions for merging WebKit in our internal wiki being something like "now type `svk merge`, but hit ctrl-c immediately after! You don't want to use the built-in merge, it'll break everything, but this is the only way to get a magic number that you can find stored in [some file] after the merge has started. If it's not there, try `svk merge` again and let it go a little longer than last time." A few hires later (I think possibly a year after) someone set up a git mirror internally to avoid having to do this craziness, which if I remember correctly, was treated with some skepticism. This was 2007, so why would we try some new-fangled git thing when we had svk?
We had a similar rotation on Chrome team for merges from WebKit (pre fork), and it was similarly a lot of work and clunky tooling!
A few times in my career (including this one) I have thought, "We are sure going to a lot of effort to maintain a modified copy of that code while also preserving our changes atop it as we sync, and this is exactly the kind of workflow that Git was designed to enable." Like, the Linux kernel dev workflow is all about different maintainers maintaining different branches and merging between them, and that is where Git comes from.
So in a setting other than Chrome I have tried using Git to manage these sorts of situations. I have found in practice many engineers aren't comfortable enough with Git to have it end up helping them out tooling-wise. This is disappointing but also not too unexpected given Git's UI.
I worked at Rockmelt from 2009 till it was acquired by Yahoo. You've never heard of it, but we had a browser built on top of Chromium that had built-in integration with Facebook, Twitter, Posterous, RSS feeds, everything social of the day.
I was our build/release engineer. One of my jobs was keeping our code rebased on top of Chromium.
It wasn't terribly painful but I could always tell when a new engineer joined Chromium because they'd inevitably rewrite some major component and there'd go my day porting our code onto the rewrite. (I did more than just resolve merge conflicts. I also took a first pass at updating our code before handing it off to the rest of the team.)
We were using git from the beginning. Chromium was using svn then, but Google had an official git-svn mirror and I worked from that.
It's been a while, but I recall that I switched us from working off the tip-of-trunk to working off of release branches. That became feasible when Chromium switched to its more frequent release schedule (2 weeks I think?).
I don't understand any developers that aren't willing to put in the time to learn how to use Git - to me, it's the single greatest tool available to enable productivity and confidence in changing code. There's no shame in using any one of the many GUI interfaces for Git that make the process simple and intuitive, but even with the CLI, there are only a small handful of commands that I regularly need to use to do all the work of managing branches, merges, rebases and resets; and a lot of the time, there's more than one way to do any particular operation.
> I don't understand any developers that aren't willing to put in the time to learn how to use Git
I can effectively use, I don’t know, probably about the 5% of git that I need to do my job. It could maybe benefit me to learn 5% more? The rest feels like a trivia rabbit hole, I have shit to do, and I already have a lot of difficulty with cognitive load and far too many yaks to shave daily. With that said, I have no strong disagreement with you, but I do want to add a bit of nuance: “learn how to use git” is extraordinarily open-ended, and one of the most challenging parts is to know where to even start, or whether continued learning has any practical benefit.
A little less nuanced: I encourage everyone who will listen, even experienced devs, to use a GUI frontend. Not just because GUIs do ~90% of what you’ll normally need in a daily workflow (which is especially good for noobs who learn which things are important to know first). Also because the GUIs generally have really obvious cues for how to unfuck a mistake, which is the most difficult thing for people who aren’t already well adjusted to git. I use a git GUI for almost all of my version control tasks, and I’m much more effective for it.
I’ve learned more how to use the git CLI for an art project than I’ve ever needed to know for normal work processes.
(Self nit: bullshit made up approximately educated guess percentages)
This (and the reply to js2) gives me the impression that "effectively use 5% of git" means something like "effectively use 5% of the git CLI surface", but to me that's not what "learn how to use git" means.
One way or another, I've wound up becoming the git guy at work, and think I'm on the more proficient side of average. But, for most anything outside of commands I use daily (or can easily find in shell history), I'm off to the git docs or a search engine. Just about always, I know what I need git to do, it's just a matter of finding an incantation to do it.
To me, there are two pieces to learning git. The lower level is about git itself: understanding that commits are snapshots and the diffs are calculated on-the-fly, that cherry-pick is an automated version of diff and patch, rebase is like a way to compose cherry-picks. The other layer is about how the organisation uses git - like how to fork a project on github, push some changes to a branch in your fork, make a PR to propose those changes to upstream, why committing directly to main is a bad idea.
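To make that concrete, here is a rough sketch of the same mental model in commands (the commit and branch names are placeholders):

    # cherry-pick is roughly diff + apply + commit, automated:
    git diff <commit>^ <commit> | git apply
    git cherry-pick <commit>                 # same idea, but also carries the message and author
    # rebase is roughly a scripted series of cherry-picks onto a new base:
    git rebase --onto main old-base feature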
> understanding that commits are snapshots and the diffs are calculated on-the-fly
I never understood why that gets mentioned as essential for understanding git. A version control system has to be able to produce both all versions of a given file and diffs between versions of a file, but how it does that is an implementation detail. There are many options there: full first version with diffs to newer versions, full last version with diffs to older versions, variants of those with full versions inserted every now and then, mixes of forward and backward diffs (as is done in some video formats, if you see each frame as a version of a file), etc.
Of course, there may be performance reasons for choosing an implementation. I could understand statements such as “Because of the existence of merge commits, it’s easier to store full files, rather than diffs, as a merge commit would have diffs with each of its parents that must be kept consistent”, but the “understanding that commits are snapshots” claim isn’t about implementation details; it is claimed to be essential to understanding git.
That's a good question, and I don't have a straightforward answer.
I think git sits in a sort of uncanny valley, where it's possible to go a long way treating the tool as a black box (and you're totally right - snapshots vs diffs is an implementation detail), but in practice, to really "get it" it's necessary to understand a bit about what is going on under the hood. The problem space that git addresses is inherently difficult, in contrast git internals are quite straightforward.
So, a person wanting to learn git has two options: they can keep the hood closed, study the manual, and fret over all the complicated looking knobs and switches. Or, they can have a look under the hood, realise that there really isn't much to it. Most of those controls just aren't relevant to do what they need to do at the moment, and when a problem arises they have a much better idea of where to look for solutions.
In my experience, being a well-rounded software engineer (for instance able to collaborate with a team at work, and make the occasional PR to random outside projects) requires a certain level of git proficiency. At that level of proficiency, the black-box approach seems to have a much steeper learning curve than the look-under-the-hood approach.
At a less philosophical level: git is a collaboration tool, and people get hopelessly confused about this stuff when they try to collaborate without sharing a common language. "How do I email a commit?" is the sort of question I get asked occasionally.
> “learn how to use git” is extraordinarily open-ended, and one of the most challenging parts is to know where to even start.
It's not really. It seems like it is because it has a baroque CLI with a ton of commands and those commands all have a ton of switches and the same commands can do lots of different things.
But under that atrocious CLI git is very elegant and simple conceptually. Once you understand the underlying concepts (blobs, trees, commits, refs, index/stage) the rest flows naturally.
Now I've been working with it since its earliest days including contributing to it here and there, but I never really found it very hard to grok.
"Git from the bottom up" is dated but git hasn't changed conceptually since then. "Git for computer scientists" is another good one. But any of the tutorials that start with git's low-level concepts and build from there are where I'd start.
> It's not really. It seems like it is because it has a baroque CLI with a ton of commands and those commands all have a ton of switches and the same commands can do lots of different things.
Do you want to be right or do you want to win? I was trying to remain sympathetic to your concerns while offering an alternate approach to learning git you might not have considered. I don’t have any trouble with git’s CLI but I understand why people find it frustrating. That is all.
I’ve yet to meet anyone using the git command line in their day-to-day activity. Everyone’s just using an IDE. So IMO the git CLI is not the issue in most scenarios. You might want to have a git guru on your team to resolve some particularly hard situations, but that should be an exceptional situation.
Inversely, I don't think there's a single developer in my work center that routinely uses git anywhere but the CLI. I think CLI git usage may be more common than you believe.
To be clear, I think many developers are comfortable enough with the small handful of commands they most regularly use. For the more advanced case of maintaining a fork of a high-velocity codebase like WebKit, it's likely you'll need a deeper understanding of remotes and how to manage complex rebases, especially in the presence of lots of conflicts. And possibly some fancier tools like git-subtree. In particular, my recollection from WebKit is that it was common to patch something locally and eventually take it upstream, but after upstream's requested modifications the patch would eventually come back around and conflict with itself.
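For what it's worth, a minimal sketch of that kind of fork-maintenance loop (the remote and branch names here are made up, not WebKit's actual setup):

    git remote add upstream https://github.com/WebKit/WebKit.git
    git fetch upstream
    git rebase upstream/main our-patches    # replay local patches on top of the new upstream tip
    # or, with git-subtree, to vendor upstream into a subdirectory of a larger tree:
    git subtree pull --prefix=third_party/webkit upstream main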
> I don't understand any developers that aren't willing to put in the time to learn how to use Git
It’s not like git invented version control or even distributed version control. Stuff predated git and git wasn’t even the only solution that was coded together when BitKeeper changed license terms (personally, I wish Mercurial had won because it has a much better interface, but what won won and I’m happy enough with git to not really care).
Putting all that aside, systems and codebases have their own workflows for a reason. I’m sure the reason WebKit was on SVN for so long wasn’t because the committers weren’t willing to use git (we’ve seen in this comment thread revelations by former Apple engineers who admit that they maintained a shadow git-based codebase internally), I’m sure almost everyone involved uses git. But for whatever reason, there were blockers to migrating (and some of those are explained in the WebKit blog).
Now, as an outsider, I might think that waiting this long to migrate to git, a solid decade after it made sense to do so, is odd. But I don’t have the context. I don’t know the reasons why, and for what it’s worth, the git-svn mirrors seemed to be working well for the people working on the project.
WordPress, a much more active open source project, at least in terms of outside contributors, is also still on SVN. Like WebKit, most contributors work on the git mirrors rather than using SVN. As an outsider, I can also think that it’s ridiculous for that project to still be on SVN, but again, I don’t know the context. I don’t know the blockers, I don’t know the workflow considerations.
But I do feel confident that none of these decisions (or lack of decisions) were made because developers weren’t willing to learn git.
The problem is that you spend 95% of the time using it on the golden path; add, commit, push, maybe click a merge button somewhere. At least in my experience
So it can be hard to learn the hard parts by doing, because by the time you need to do them again, you've forgotten what you learned last time
I feel like you do need a bit of rigor, especially because rebases and the like often don't do what you intuitively want, leading to merge conflicts over and over again when trying to clean things up.
Even small stuff like not knowing to enable rerere means that rebases have incidental complexity compared to, for example, outright recreating commits in another spot.
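For anyone who hasn't run into it, enabling rerere is just a couple of config switches (a minimal example):

    git config rerere.enabled true       # record how each conflict was resolved
    git config rerere.autoupdate true    # re-stage recorded resolutions automatically when they reapply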
I disagree with most people about mercurial being "better" (I love my staging area), but doing stuff with git in practice can be super duper fiddly. I've found it's much nicer to do the right thing with stuff like magit though. I bet there is a great UI that is yet to be built for most people that makes it easier to do stuff like "updating old commits" or other things that you end up having to do through rebases.
I use the bare minimum of it and am not happy. I’m sure it’s a great fit for the Linux kernel, but for a sole developer or small team, not so much.
I’d rather just have several independent file system snapshots of a particular folder + logs. I’d like to rewrite history easily. But that ship has long sailed.
In this case, was it really working? It sounds like they had enough friction to make considering alternatives reasonable.
In general, I agree with your point but I think there's a very pragmatic argument that the open source world heavily converging on Git means it's worth knowing even if it's not your primary tool. In the 2000s that was spread out more with CVS, SVN, Mercurial, etc. also being good candidates depending on what communities you worked in but it's been quite a while since I've seen even Mercurial being used in the wild.
I'm totally willing to learn the magic of advanced merging, but most "tell me more about Git" talks/articles rather want to tell me more about its general internal structure, which I find very far removed from actually using it.
So what's the best resource for learning more about using Git?
Personally, I found https://learngitbranching.js.org/ surprisingly helpful for getting a mental model of Git usage, even though I had already been using Git for a couple years. As always with learning resources, YMMV.
The amount of developers who don’t know about “bisect” alone makes me sad. Such a powerful tool especially when you’ve got a good replication of a bug in some kind of test-like thing you can run.
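A minimal example of the automated flavor (the test script name is hypothetical; it just has to exit non-zero when the bug reproduces):

    git bisect start
    git bisect bad                  # current checkout is broken
    git bisect good v1.2.3          # last known good commit/tag
    git bisect run ./repro-test.sh  # git drives the rest
    git bisect reset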
We've been working on a migration from SVN to Git (since 2019, still going strong!) and, since our workflows are all built around SVN assumptions and SVN workflows, it has served only to make everything more complicated and more finicky than it would have been.
I'm looking forward to doing more modernizing of things over time, but for now most of our work is trying to map SVN semantics and structures onto Git, and dealing with weird tooling issues.
Case in point, the Git SCM branch source that the Jenkins multibranch plugin uses; all we need to filter on is branch name, which you can get from `git ls-remote <remote>`, but because of the way it's designed it actually clones the entire repository (between 4 and 12 GB) and then checks the branch list. Nightmareish.
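For reference, the branch names really are obtainable without a clone, e.g. (the remote name is a placeholder):

    git ls-remote --heads <remote>                             # one line per branch
    git ls-remote --heads <remote> | sed 's|.*refs/heads/||'   # just the names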
Still, a lot of stuff works a lot better, and now we have more and more developers and teams testing out or using Gitlab's features, like merge requests, and more and more projects trying to modernize their approach with CI builds and the like. It's a very exciting time.
I worked in the Cocoa group when we migrated from cvs to git. But it wasn't exactly cvs, it was ocvs, which was an ancient forked version of cvs that handled certain directories (nibs) as tar files.
There was no direct importer from cvs to git, so we had to go through Subversion as an intermediate. This was Cocoa so its history was very long, but the earliest history turned out to be mangled, so some poor fellow was tasked with learning how to repair CVS history, only so it could be migrated to git.
Once we got to git our lives were so much better. I'm glad we skipped svn.
I'm curious why it wouldn't make sense to just think, "Before THIS DATE we just don't care about history anymore".
I look at my main project and I'm fairly happy to forget about 90% of the branches I have hanging around and half the lifetime of coding it, and it's just 100k LOC.
But where do you draw the line when migrating to a superior source control system isn't possible because of your current source control system?
If curiosity is what you're wanting to allow, keep the current source control system on ice, and move everything to the superior one at its current state + a few releases back. Save a decade or two.
It seems to me a version (sic!) of technical debt otherwise.
I think it's especially valuable to try and be complete when doing a migration from a centralised VCS to a decentralised one; once you manage to migrate the repository once, you can then avoid any necessity to maintain servers for it.
In extreme, you could just distribute a tarball of the decentralised VCS, hence all you need is the ability to distribute a tarball, which is a lower bar than maintaining servers for it indefinitely.
As a (relatively) young dev, I always find it wild whenever I'm reminded that git is such a young tool, because it feels like it should be contemporaries with vim, perl, or even emacs with how fundamental it is to modern software development. Hell, I remember when python was the fancy new kid on the block, and even python2 is a full five years older than git.
That someone who worked on the iPhone Team responded here - much less with such a detailed reply - is the reason Hacker News is one of my internet GOATs.
Thank you - first for your contribution to the product; and secondly, thank you for opening up a window into the glass wall that is Apple.
If you have that access, you should use svnsync instead. For a large repository, running rsync can take hours just hashing all the individual files on either end to compare, just to eventually update one or two revisions. svnsync is much more 'aware', and so it's much easier to keep up to date if necessary.
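A minimal svnsync setup looks something like this (paths and URLs are placeholders; the mirror also needs a pre-revprop-change hook that exits 0):

    svnadmin create /path/to/mirror
    svnsync initialize file:///path/to/mirror https://svn.example.org/repo
    svnsync synchronize file:///path/to/mirror   # re-run this to pick up new revisions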
True. Or just zip/tar it up and then rsync that. To be clear I was talking about a one time thing, not continued updates -- which would be done with SVK
Describe works its way backwards to find a tag matching the search pattern. If you are checked out on origin/master and the tags come from the same centralized origin, then you will have a predictable global order.
It’s basically the same thing as rev-list that they do, except more readable, with tighter integration to tags and with the result usable as a commitish.
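For example (the tag names here are made up):

    git describe --tags --match 'v*'     # e.g. v2.36.1-42-gdeadbeef: nearest matching tag, commit count, hash
    git rev-list --count v2.36.1..HEAD   # just the count since that tag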
"zero-tolerance performance regression policy"... no patch can land if it regresses benchmarked performance.
I'm guessing the tooling around this used subversion's increasing commit numbers and it was easier to add a shim to git, than to rewrite or rethink the tooling.
> If a patch lands that regresses performance according to our benchmarks, then the person responsible must either back the patch out of the tree or drop everything immediately and fix the regression.
I imagine that a security hotfix would lead almost immediately to the second situation (perhaps as soon as the implementor had gotten some sleep!)
The builtin tooling is insufficient for many purposes, including if your bisect algorithm requires you to run tests across many machines. Many large projects write their own bisect framework because of this.
It is intuitively a bit hard to believe, but bisecting in parallel is actually not much faster than serial. In my experience you just save the final one or two steps in the bisection - regardless of how long the range is!
Imagine bisecting builds with parallelism of two, when one of them completes, there is a 50% chance that the build on the other side of the range is now uninteresting. If you are lucky and it is interesting, you’ve only saved yourself half a step in the bisection, because when running in parallel you sliced the range in 3 rather than 2. Adding even more parallelism just makes this effect even worse.
Someone can probably work out the math better than me but you can quickly see that for 2x build power you instantly waste half the results for very marginal gain.
Just comparing the big O should also tell you this, parallelism only buys you O(N) while bisecting is O(log N).
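A rough way to put numbers on that intuition, assuming k synchronized probes split the remaining range into k+1 pieces each round:

    rounds(k) ≈ log_(k+1)(N)
    speedup over serial = log2(N) / log_(k+1)(N) = log2(k+1)
    k = 2 machines  ->  ≈1.6x fewer rounds
    k = 8 machines  ->  ≈3.2x fewer rounds

Under that assumption the benefit grows only logarithmically in the number of machines, so doubling the build power saves well under half the rounds.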
I wonder if this policy, or the general mentality that produced it, is why webkit is so underdeveloped and widely reviled by devs compared to alternatives.
Sounds like one could push in a security bug, or a performance increase that is fast because it is frankly completely broken, but then wouldn’t be allowed to reverse that decision, at least not without finding some alternative, unrelated performance increase. Features or fixes that may obviously be worth an associated dip in performance can generally not be implemented.
Under these conditions if I knew of ways to improve performance without adverse effect, as a developer I would sit on them instead of applying them because I would need a stockpile of reserves to apply in case of emergency to account for unrelated performance degrading changes that absolutely need to be made. I would also work to make sure the benchmarks were lousy and irrelevant.
Policies like this can’t be written by people with any semblance of sense. Singular mindedness on one metric and extreme policies like this can kill development and subvert goals just as lack of discipline can. Everything is a balance. Weigh performance metrics heavily by all means, but absolutism leads to the absurd.
I guess as long as Apple holds its monopoly on keeping iOS device browsers exclusively on Webkit, it will stick around though.
Sadly having had years of an inside perspective on the competence level or lack thereof of decision makers in the non-revenue generating parts of their software business, can’t say this policy surprises me.
Seems like the other commenters understand this already but it took me a while to figure it out so for anyone else that's confused: IIUC by "natural ordering" they mean you can tell the ordering just by looking at the IDs.
Funnily enough I have the opposite desire - I've worked in a VC system with "natural ordering" and it once led to an incident, where I visually compared two version IDs and said "yep this release has the bug fix". Turns out this is hard to do accurately for big numbers and I was wrong. I put a big warning on our ops documentation saying "never compare version IDs visually" with a link to the postmortem!
I have literally 0 inside knowledge, but from the article it seems to be more of a human, visual thing than a software problem: something like "this was working in 12 and broken in 13" is a more obvious regression than "this was working in aaab131 and broken in ccad53s".
13 being greater than 12 is not a property that's just for human vision. In Subversion a commit on a branch increments the global commit number.
Git doesn't have a concept of one commit being before or after another once you've branched, or any native mechanism for enforcing global state across branches.
Sequential IDs also let you think about ranges: a feature was introduced in 11, broke in 17-23, and worked thereafter.
I used SVN like this in grad school: data files included the SVN $Id$ of the script that generated them. This let you work around bugs and experimental changes. For example, you might hardcode a delay, realize it should be longer, and then eventually decide to let the experimenter adjust it on the fly. This is easy with sequential ids:
    if version < 11:
        delay = 50
    elif 11 <= version < 29:
        delay = 100
    else:
        delay = params.delay
Using git hashes, you'd need to maintain an exhaustive list of every version ever run, which is even trickier because there isn't a sole source of truth like an SVN repo.
Git hashes are not for identifying versions but only for identifying commits (which may not have a significance on their own, e.g. a developer uses multiple commits to implement a new feature!). To identify versions you should use tags, which can be written in whatever format you like. A tag is just an immutable (unlike branches) pointer to a particular commit.
In my company we have the CI server that automatically creates a new tag (sequentially) each time one pushes on the master branch.
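Something in that spirit can be as simple as this (a toy sketch with a hypothetical build-N tag scheme, not a description of any particular CI):

    last=$(git tag -l 'build-*' | sed 's/^build-//' | sort -n | tail -1)
    next=$(( ${last:-0} + 1 ))
    git tag "build-$next"
    git push origin "build-$next"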
The SVN-controlled code generated data by controlling hardware and embedded the $Id$ of the controlling script in its output. I would then refer to the version ID later, when loading that data in for analysis. This accounted for any changes to the data-generating code.
For example, we tracked the orientation and direction of objects moving on a screen. One update redefined 0° to be up/north/12:00 instead of the +x direction used before. The code which loaded these files checked the $Id$ value and rotated the directional data so that the entire dataset used the same definition.
Git also lets you think about ranges. You just tell it the range and it figures out what commits are in the range. You can also get a sequential number from whatever point you choose with tools like `git describe`.
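For instance (the tags here are placeholders):

    git log v1.2.3..v1.4.0              # every commit in that range
    git rev-list --count v1.2.3..HEAD   # a sequential number measured from a point you choose
    git describe --tags                 # e.g. v1.2.3-18-g123456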
The Git dev model is _very_ different from cvs, svn, etc so the trade offs are less obvious.
A lot of the benefit of git to me has always been the local development model, but the git-svn bridge made that largely transparent which I think lowered the pressure to change.
> The Git dev model is _very_ different from cvs, svn, etc so the trade offs are less obvious.
I think that the difference makes the tradeoffs of using anything other than git more obvious. I even held out myself for a very long time with svn vs. git, and once I switched... I kicked myself for not doing so earlier.
But, like they said in the post... they did need a feature which is core to svn (incrementing revision numbers) and needs a workaround in git. Minor in the grand scheme of things.
I remember migrating some codebase from CVS to SVN ... and this was sometime after CVS was adopted instead of "at the end of the day, every dev will copy his change into a floppy disk and give it to the Tech Lead for merging".
This was during the 90s in a software development company in Mexico. Good times!
> This was during the 90s in a software development company in Mexico.
Version 1.0 of svn was released around 2004. According to wikipedia the project was started around 2000.
(Sidenote: there seems to be cognitive dissonance that svn was released much earlier than git ... but svn was released in 2004, and git in late 2005. There's a less than two years gap in between, yet so many projects had been "stuck" with svn...)
IIRC, svn was marketed as "CVS done right" [1], which meant devs who were exploring alternatives to CVS didn't have to worry about svn's learning curve.
So, I think inertia may have been a huge part of why adoption of svn was so good compared to git that has a steep learning curve, even today.
Source control is an inherently distributed system. Each developer has their own copy of the code that are distributed across many computers, and they need to communicate between each other. That's exactly what the primary function of the software is.
That's just as true with SVN (for the local in-progress working copies), but with a single, centralized history. Which is how most devs use git anyway (essentially emulating a centralized versioning system workflow), and in this case the distributed nature of git allowing many alternative histories is unnecessary in the best case, and massively gets in the way in the worst case (because for most projects a single history is actually a feature).
(In a very real sense, the server from Subversion does the same job of the optional server from Operational Transform: it is acting as a sequencer. The next thing we could do is replace that with proof-of-work, and then we could have a fully decentralized system with the properties of Subversion ;P.)
Git is a true mesh source control system. You only need one .git folder to survive and you haven't lost anything. My memory is fuzzy, but with svn, if you lost the central repository you were in a world of hurt.
This is what git does right. What git lacks for me is smoothness in interaction with the user. To use git well you must think like git. Which is kinda annoying when I prefer my tools to think like me.
When I moved from svn to mercurial it was absolutely painless. At my next company, git was a terrible experience. I am sure that git's tools and flexibility are amazing for the linux kernel and other projects of that scale. But they're probably overkill for smaller stuff, where a friendlier user flow would be nice.
My memory is that SVN was very popular long before 1.0, I feel like by 2002 it was "beating" CVS for new projects among hobbyists. It was the exemplar for VC in my SE course at university in early 2003.
As I remember, the initial versions of Git would have been a chore to use; the intention was for it to be only the "plumbing", with a more friendly front-end layered on top. So it took a good few years to change that, and for it to catch on.
If they had floating or dynamic externals and a bunch of permission models, I'm not surprised at all. I'd label this more of a conversion versus a migration.
github has had more than fifty outages this year alone, and has a rocky history of banning users from countries that are sanctioned by the United States, with no recourse. switching to github makes no sense if "The WebKit project is interested in contributions and feedback from developers around the world."
AFAIK Github can provide free access to public repositories even if the users are subject to OFAC sanctions. In some cases they've applied for (and received) exemptions to allow for sales of paid services: https://github.blog/2021-01-05-advancing-developer-freedom-g...
They also banned developers that worked on Tornado Cash (not just the project, any developers that worked on it), a project that had a deployment put on the OFAC list. It's almost universally agreed to be an unnecessary step by Microsoft.
Did they ban anyone other than the core project developers? There were specific people called out by name in the Dutch press release believed to have personally profited from North Korea's money laundering through Tornado Cash. That's pretty different from “any developers”
I note that Matthew Green's mirror and GitHub account do not appear to have been blocked, which would fit with the idea that there's more to this than just committing code:
> github has had more than fifty outages this year alone
I'm a heavy user of github every day and maybe 1 or 2 of these caused me any disruption whatsoever. Most of the time I think they created productivity boosts as people just focused on what they were working on instead of reacting to Github notifications about issues or PRs or failing tests or whatever.
> a rocky history of banning users from countries that are sanctioned by the United States, with no recourse
This is likely a feature for companies, projects, and organizations who have (or want) to adhere to the same strict regulations.
In fact, the very fact that your source control hosting service can be surprisingly[1] unreliable is the best advertisement for git you could imagine.
In fact, if github disappeared from the internet today, all but the largest projects could just set up an ssh-accessible box somewhere and continue work (code review and issue interfaces notwithstanding, of course), probably within 24 hours.
[1] I work in github-cloned repositories almost full time. And sure, I remember a handful of times over the past 4-5 years where it's been down when I wanted to push something. I had no idea it was 50x/year! And that's because "working in a github-cloned repository" doesn't, in fact, require much contact with github itself.
> I think the bigger picture here is the migration to Git
Is it tho? Why wouldn't they just install git on their server? Now, there aren't many mainstream, successful social hosting options for svn. They acknowledge the choice of github is to attract devs. So it's as much about the software as about the type of hosting and web presence.
> Why wouldn't they just install git on their server?
That's easy, and it's about 5% of the functionality which GitHub provides. Even if you're working entirely in private, the tools you'd have to build yourself to do code review, CI/CD, package management, security updates, etc. are a significant amount of work and that's before you get to things like Codespaces.
Code review itself is a big deal in terms of the complexity of the UI for managing reviews but I’d also be surprised if they didn’t use anything else. Linting and other static analysis checks, reporting CI results, etc. are quite powerful and less work than setting the equivalent infrastructure up yourself.
> about 5% of the functionality which GitHub provides
Are you sure? I can’t even use “go to file” on GitHub and stay on a selected ref, I can’t bisect and gob help me if I need to rebase before closing a PR. I made a comment elsewhere in this conversation that I think I might use 5% of git functionality. I like GitHub, but if I can’t use even that on their site I’m having a hard time imagining they provide ~20x value over git as underlying functionality.
There are a handful of deep Git features like rebase or bisect which GitHub doesn’t expose, but those aren’t things most people use frequently. Git has no equivalent for the things people do use all of the time: the issue tracking system, code review with all sorts of rules and approvals, the CI/CD system, package management for many languages, not to mention newer features like Codespaces.
That’s a ton of features which cause people to use services like GitHub or GitLab, and it’s not like you’re giving up any of the CLI functionality to do so. My point wasn’t that these services are perfect but rather that there’s way more to it than setting up a Linux box you can push to.
I don’t disagree with anything you’re saying other than the relative scale of what each provides. Like I said, I like GitHub. I just think it adds less to git than git adds to it. And most of their features are great, but I’d sure rather a nice distributed interface to bisect than an IDE in browser or issue forums (which are useful too!).
It "works offline" in that you can create commits, view project history, and view every branch while offline. But fetching and pushing are such a common part of an engineer's day-to-day workflow that a poorly-timed outage of your remote repo is very disruptive, especially if you use git for deployment.
This is technically true but the number of GitHub outages which have prevented you from doing that for more than a couple of minutes is pretty low. In comparisons like this, the more important question is not “is GitHub perfect?” but rather “what are you comparing it to?” — internal systems are notorious time-sinks and productivity levels from using GitHub normally are high enough that I think it'd be quite fair to conclude that you're still well ahead of where you'd be even if you have an extended coffee break once or twice a year.
Not to mention if you rely on gh actions for ci/cd. I think it makes sense for them to migrate to git and github, but I've been slowly migrating most of my code to sr.ht or self-hosted mirrors. Email patches work pretty well for smaller teams.
FWIW not all of those outages were the core git/web product. A lot of those were GitHub actions or other associated functionality... but yeah it goes down disturbingly often given how much we all depend on it.
I think GP is not talking about git in general, but about choosing a free-tier hosting by an american commercial entity, and not by the project itself or some other umbrella organization.
I don't think WebKit reasonably sees itself as a risk for US sanctions unless they have an open source money laundering feature that no one has told me about.
Awwwww, I remember back in the days of dealing with CVS, where there was so much scripting to try and manage basic stuff we take for granted like creating patches that included new files.
Subversion was so undeniably superior that everyone was super happy and the switch was instant. Git took much much longer as the complexity vs. the win was much more debatable to people, so it's interesting to see this finally happening - I will miss linearly increasing revision numbers though.
Glad to see they're keeping with bugzilla though - for whatever reason I find the GitHub issue tracker super annoying. Presumably at least part of that is familiarity and/or change resistance :D
I remember when CVS was the new hotness. I was on a team of 3 at the time (90s), and one of our members worked remotely, so the fact that it was actually usable over a dial up connection was a killer feature for us. Also pretty much anything was better than SourceSafe, which is what everybody else in the company was using.
Subversion had its quirks initially. It didn't work on NFS for instance, because the default backend was Berkeley DB, or some version of it.
Looking back, Subversion was a bit of a weird project. It didn't re-evaluate what was wrong with CVS. It was more a "Let's try that again, but fix a few obvious problems". Subversion didn't really contribute that much overall. We could do branches more simply, but most of us just replaced the cvs command with svn and continued to work as before.
I was fine with CVS, and I'm fine with Git now. But, for some reason, I could never wrap my head around Subversion. Something about revisions-as-a-directory-tree just messed with my neurons.
> Subversion was so undeniably superior that everyone was super happy and the switch was instant. Git took much much longer as the complexity vs. the win was much more debatable to people, so it's interesting to see this finally happening - I will miss linearly increasing revision numbers though.
p4 and then git were easy sells for large projects. while subversion was faster than cvs with its local hidden copy, many operations were still dog slow as they'd scan the whole repository (this was often worked around by creating lots of small repositories with associated wrapper scripts). p4 and git on the other hand were designed to handle large trees with ease. so for something on the scale of an operating system, browser, or both... the difference in productivity was significant. (tens of minutes vs single digit seconds for basic operations)
Interesting – what makes it particularly handy on Windows?
I've only used it for automating build numbers. The number of commits on the main branch behaves, in practice, close enough to a monotonically increasing counter that it works 99.9% of the time without anyone thinking about it.
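Concretely, that counter is usually just one of these (a sketch, assuming a conventional main branch name):

    git rev-list --count HEAD          # commits reachable from the current checkout
    git rev-list --count origin/main   # or pin it to the main branch so local commits don't skew it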
I need to get better at powershell but it keeps on annoying me massively with basic stuff you can do in unix shell since the 80s, 90s or 2000s that you still can't do in PS. It needs to reach a sort of usability parity for sure.
One of those things I ran into recently was trying to find an equivalent of something I do a lot of in *nix programming: use grep to find files matching a certain pattern, list only the filenames, and then pipe those through something like awk or perl to find/replace in each of the files grep found. That's super easy in *nix. Powershell still doesn't have such an equivalent. It came of age at a time when "everything is text" and the other unix principles weren't a consideration.
I think the philosophy is that as much as possible is done by composing cmdlets within the PS ecosystem, Select-String being kind of like grep and PS itself being the one language for all scripting. There's a lot that 'awk or perl' could map to; here's a (made-up) example of converting dates in certain files from 'August 1, 2022' to '2022-08-01':
PS's advantage: leaning on .NET for doing things the right way.
The above has mostly top-level pipes like the Unix equivalent. You could also start with Get-ChildItem and skip the Select-String, instead having a conditional within a ForEach-Object, which certainly can masquerade as awk:
ls -file|%{gc $_|% -begin{"Counting in $_";$n=0}{if($_-match'\b\d{4}-\d{2}-\d{2}\b'){$n++}}-end{"$n lines"}}
demonstrating that PS one-liners can be just as readable as those of Unix tools!
But, as you've identified, the pain in pipelines with native commands is real. No subshells or named pipes, naturally. All command output is parsed and re-serialized—anything binary, or with linefeeds, or UTF-8 without BOM is, as expected, silently corrupted. At least that's being worked on (https://github.com/PowerShell/PowerShell/issues/1908) -- the hindsight shell manages to have an immense amount of WTF locked in, turned into wontfixes by Microsoft's backwards compatibility. Tour the footgun arsenal: https://github.com/PowerShell/PowerShell/issues/6745
Oh, and the dreadful slowness. But is it worth it, to no longer fear self-pwn by file naming?
It's unbelievably fast and handy when you have multiple directories to recurse through or tons of files. Anyone can remember that after using it like twice. Powershell requires like five lines of code just to get that done, as you demonstrated. That's not feature parity at all. I also strongly feel it's way more readable than your example.
It makes a lot of sense for some people to install Git into their path but not stuff like wc, since adding everything to your path can interfere with standard Windows utilities. Then you can use Git from PowerShell, but you won’t have wc.
This is not an unusual or bizarre configuration, it’s a very reasonable configuration.
Generally Git for Windows includes bash and coreutils, but they don't work from cmd or PowerShell unless they're added to PATH, so that would be useful sometimes.
On the automating build numbers side, `git describe` is really handy. By default it's {last annotated tag name}-{commit count since}-g{hash prefix of most recent commit}. (There's an optional --dirty flag to include a marker when the repo isn't clean status. There's options to control which tags are eligible for the first part.) It's basically a full build version identifier from a single command. Generally the only things to do to make it 'semver compliant' is to swap the second to last dash with plus and the last dash with a dot, depending on how you expect to use it with semver. (v1.2.3-18-g123456 => v1.2.3+18.g123456)
Bonus is that almost every git command that takes a "commitish" descriptor (git switch -c, git log, etc) can work with a `git describe` output as "commitish" name for a branch state.
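Putting those pieces together (the tag and branch names are made up):

    git describe --tags --dirty --match 'v*'   # e.g. v1.2.3-18-g123456, or v1.2.3-18-g123456-dirty
    # the semver-ish rewrite described above: v1.2.3-18-g123456 -> v1.2.3+18.g123456
    git switch -c hotfix v1.2.3-18-g123456     # describe output resolves anywhere a commitish is accepted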
Is there a reason why it seems there is so little documentation/comments in source files of WebKit? Or maybe I'm missing something/opened the wrong files.
There's some policy about comments being bad / code should be self documenting.
From what I hear from my colleagues on the Chrome team, one of the first things they were delighted to do after forking Blink was to finally go around and comment the various confusing parts of the codebase. (And, no longer get requests to remove comments when trying to land a PR.)
Thanks for the reply; what a weird policy. I found it so uncanny at first that I even thought they somehow had an internal system that automatically merged comments and documentation into their source files.
Most WebKit developers are good at documentation, it’s just that they often work on things that their employer would not like being made obvious because it deals with SPI or unreleased products or security vulnerabilities. Commit messages are actually pretty good for the most part except in these situations where a laconic or purposefully misleading message will be used.
Documentation has been a bit of a challenge in my experience. There are some high-level docs at https://trac.webkit.org/wiki though many are 10-15 years old at this point. My approach has been to look at the commit history for the file to see if the changesets shed any light, and sometimes go to the attached bugzilla link to see if there was any discussion about the change there. Then attach a debugger and step through to try to uncover how the classes relate to one another.
Microsoft maintains the entire Windows source tree in Git now and has made some really interesting contributions to git where it comes to very large projects, though I don't know about the penetration throughout other dev groups. They also own Github now obviously.
I was just looking at Microsoft's git VFS (https://github.com/microsoft/VFSForGit), which is deprecated and now points to Scalar (https://github.com/microsoft/scalar), which is also deprecated I think? What's Microsoft's story with git for large repos now? Is there still a virtual file system involved at all?
Microsoft is trying to get away from needing a VFS. Scalar eventually accomplished that by combining some enhancements that had been added to git since they originally designed the Git VFS, and plus some new features that only live in scalar.
The long term plan is for Scalar to either go away completely, or possibly just be a simple front end for setting up a repository to use certain optional git features.
They actually have a version of scalar in official git's contrib folder, and they are very actively working with the core git team to convert the relevant features into core git features in whatever way the core git team is happy with.
But of course the present situation is certainly confusing, and does not seem to be well documented. I've only picked up on some of this from reading the git mailing lists, and have no idea how to actually use scalar.
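From the outside, the core-git features scalar leans on look roughly like this (a sketch, not necessarily Microsoft's exact configuration; the URL and paths are placeholders):

    scalar clone <url>                        # the scalar front end
    # roughly equivalent with plain git:
    git clone --filter=blob:none --sparse <url> big-repo
    cd big-repo
    git sparse-checkout set the/paths/you/work/on
    git maintenance start                     # background commit-graph, prefetch, etc.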
That doesn't really answer my questions though. What exactly does microsoft/git do that makes monorepos better? The readme doesn't answer. They have a branch called vfs but the readme doesn't mention virtual filesystems at all.
was this before or after the github acquisition? I figure if they were going to spend the money for github they probably intend for git to have a bigger role
Microsoft started the Windows transition to git sometime before Github acquisition as a part of a company-wide effort to move to VSTS/Azure DevOps Repos internal dogfooding. The choice there was between the zombie ghost of TFS source control or git, so you can possibly imagine some of the reasons why they chose to put a lot of time/effort into making git work for themselves. The subsequent acquisition of Github seems to have made them even happier at the switch to git.
That’s a hefty repository if all you want [1] is part of it. Subversion offers a way to download just a subdirectory; is there an analogous solution for Git?
> git’s distributed nature makes it easy for not just multiple developers, but multiple organizations to collaborate on a single project.
Git is as "distributed" as Ethereum at this point. You have a central repo on github onto which you push changes. Just because you have a copy of the main branch when you're working on stuff, it's no different from SVN.
Yes it has the capability to be distributed, and individual contributors can certainly host their own git servers and you can have as many remotes as there are contributors, but we aren't doing things this way are we?
You are seeing the tip of the iceberg. The most discoverable online platform is github, and others exist in much smaller numbers. Because these are public they are easy to see and count. Private instances are hidden by default, which makes them hard to count.
- it's more difficult to do "agile", churning out lots of code, working simultaneously on the same pages, or functions between different teams, which then needs to be merged together by some unfortunate (possibly clueless) dev. it puts a restraint on ignorant management who don't appreciate downstream issues.
- merging and branch maintenance is more rigid with svn, so teams tend to have dedicated release teams to handles this, instead of offloading this onto devs - most devs now are expected to handle git merging and so on, not so with svn.
i worked about 10 years with svn, and 5 with git, and just my exp only of course.
I mean, Blink is a fork of WebKit, so yes? But it’s also even less-responsive to upstream contributions so, outside of embedded systems that had previously adopted WebKit, I seriously doubt it'll recapture the traction it had before the fork.
"recapture the traction"? It is used by literally billions of devices of varying form factors every day. It has probably 100x the number of users as Gecko/Mozilla Firefox and one of the wealthiest companies on the planet sponsoring development, and I don't see Apple betting on Gecko or another fork of Chrome anytime soon.
Who said anything about Apple? I would never suggest Apple use another fork or the fork they incubated/created/developed. I was replying to a comment asking if WebKit, which != Safari could be a replacement for the Chrome-driven world we are in and when used by people who aren’t Apple.
And yes, it is absolutely bigger than Gecko. No question. But are you going to tell me with a straight face that WebKit has the same amount of traction with third-parties as Blink? Because sorry, that’s not the case. That was the case for about 5 years, when post iPhone, everyone and their brother decided to use WebKit to power their mobile browser for whatever mobile OS they were building for (Android, BlackBerry, Symbian/Maemo/Meego, webOS, Tizen) or for whatever embedded systems they were designing for in-car systems or whatever, but after the Google fork became demonstrably different, the surviving players in that arena switched to Blink because Google was faster at iterating and easier to work with for upstream commits (easier does not mean easy).
It is a fork, but that was really long time ago, most parts of both engines were completely rewritten so for all intents and purposes these are completely different engines.
They’ve splintered a ton over the years for sure, but there are still similarities. But yes, this isn’t like the first few years when Blink was just WebKit with V8.
But on the whole, Blink is still more similar to WebKit than it is to Gecko.
Brendan Eich IIRC said they looked into building Brave on top of Webkit but it was so hard to compile and embed across all three platforms that they went with Chromium. Same story with Gecko.
So that's another reason why we now have a Blink monoculture: because the alternative engines didn't spend any effort in making them usable by third party applications.
WebKit is used by the WebkitGtk library which all of these include.
But it's been stagnating for years and it's still not a viable replacement for Gecko and Chromium (broken scrolling on scaled screens, WebAuthn is unsupported so I can't login to my email account, etc.)
In any case, if I recall Eich's comment correctly, the biggest difficulty was building webkit for Windows. The documentation was very out of date, which is understandable given Apple's focus on its own OSes. Even this GitHub repo says it's a web engine for "macOS, iOS and Linux".
If Mozilla bites the dust, unfortunately Firefox/Gecko developers will lose their jobs, and then possibly some of them will end up developing WebKit or a WebKit-based browser at Apple or some other corp. I don't want that.
AFAIK, Apple's control over WebKit is not any less than Google's over Blink, which many say is the source of many of the web's problems. I remember the days Apple posted tarballs infrequently. I'm wondering if this move may open development up, and thus it may become a codebase more widely 'owned' than Blink's.
I think "control" here is somewhat unreasonably pejorative.
They each do the vast majority of the dev work in their respective engines, so it's "control" only because if they want a feature they'll just implement it. Similarly neither implements things that aren't of interest to them - I can't speak for blink but webkit isn't going to oppose people contributing support for new features unless the implementation is not good (I assume this is a universal across all projects), or it's considered harmful (features that are inherently insecure - a la SVG raw socket access - or inherently harmful - e.g. new tech to aid tracking).
The problem with Blink is not necessarily that Google controls it, the problem is that Blink owns the lion's share of the browser market. This makes Google's control of Blink a problem, since it effectively means Google controls the lion's share of the browser market.
That's only half of it - the problem is that the duopoly maintain a stranglehold over their respective portions of their markets. Remember that iOS forces browsers to use the Safari engine, which is a step more egregious than MS' forced IE bundling. Both together make the problem far worse, and neither alone would make things better. Google and Apple are a huge problem for the web. So to answer the original question, no.
Microsoft's stranglehold over the browser market was a pretty huge problem, irrespective of their business model. The web is way too important to be so tightly controlled by one single company, no matter who they may be.
Microsoft's business model was 'we need to deal with Netscape before the Web kicks our ass' followed by 'we own the market now, no need to spend resources improving this'.
Of course, Firefox and later Chrome came around, so in the end the Web kicked Microsoft's ass anyway.
> Microsoft's business model was 'we need to deal with Netscape before the Web kicks our ass' followed by 'we own the market now, no need to spend resources improving this'.
I think it was a bit more general: "We want to keep a stranglehold on the Apis used to write general software". The ability to write software once for the web and have it work on web browsers on any platform was viewed as an existential threat.
However, the observation that they only developed IE as long as they were worried about that threat and then left it to rot afterwards is spot on.
Their strategy was aimed against their competitors interests, not their users interests. Microsoft in the bad old days still saw their users as paying customers and not just a source of data to be exploited.
It isn't Microsoft that has entered the extinguish phase for ad blocking browser add-ons. I also don't remember Microsoft doing anything as anti-user as trying to keep pop-up blockers from working back in the bad old days.
Pop-ups were universally reviled, but IE also aggressively pushed lots of random half-features to support their products - the only difference is that they didn't publish a "spec" the way google does to create a veneer of not being anti-user.
MS's anti-user behavior was in the form of ensuring that necessary sites would not work in other browsers (including the Mac IE engine, whose name I have forgotten, and therefore IE mobile). They didn't block ad blockers, etc because they didn't support any kind of extensions :D
It wouldn't, unfortunately. It's merely one stranglehold in a duopoly; Blink and WebKit are controlled by their corporate overlords in their respective spheres of influence. iOS won't allow non-Safarized browsers (think back to the IE days, but worse), and Google decides what goes in Blink.
So in short, no. If Mozilla dies, then there will be trouble and would need someone to carry on their work as the last remaining 'freedom' blessed engine.
right? We already have an IE-esque level of monopoly and associated behavior from chrome/blink. Making it just blink+webkit seems like it would be even worse - even though they have diverged significantly it's also wrong to think they're "different" in the sense of gecko and blink/webkit or even presto.
I think the real problem is that gecko and spidermonkey seem to be falling significantly behind on real user experience. This is ignoring the Firefox application itself, which I find super irksome.
But as their gross built in tracking+advertising shows they are at least somewhat hurting for cash which does not help, and encourages gross stuff like said spam+tracking.
It doesn't help that the google folk keep shoving out half-assed specs for whatever some google team has decided they want/need with specs but little thought of generally of how to make more universal solutions. That just means you've got constant pressure to implement ever increasing numbers of standards just to stay in place - if apple (and technically MS in the past) has difficulty keeping up with the constant "spec" spam it's hard to see Mozilla managing in the longer term.
Apple has iOS browser control on complete lockdown, so even if it performed as bad as Gecko, they’re pretty well off.
> bite the dust
If I were ceo of Mozilla I would have cut off Firefox development like 5 years ago. It doesn’t look pretty, but AOL and Yahoo changed assets, they don’t look as ugly. But I also hate a lot of what they currently stand for, and they don’t really have assets. They’re like some NPR for web standards documentation, and while it is the best, it’s not very valuable. Google seems to have a lot of leading control while Mozilla is angry outside, with a megaphone, and red-orange dyed hair.
They’ve always been open source, they’ll die of natural causes.
None of Mozilla's virtue signalling serves to bring back the Firefox of yore. Firefox has instead followed Chrome's heels at every turn to the point I might as well just use the real deal rather than a third-rate knock off.
I want a lean, effective browser that I can tailor to my specific needs and desires, and Firefox has been not that for at least 20 years.
Mozilla is (supposed to be) a collective of computer programmers, not activists and lobbyists. So fuck their advocacy, more accurately virtue signalling. All of it. The specifics don't matter. Fuck all of that noise. If they go back to making some good software I might be more supportive and respectful of them again, but not a step before.
I guess, I hate to be more pessimistic than I already am, but when I see pointless petitions to “Facebook: Stop Group Recommendations” I don’t see anyone over there truly “fighting the good fight”. I think GNU is a far better example of this type of action.
I think an open source foundation has to stand on the shoulder of a valuable product to get noticed. GNU has all of its things, Mozilla is an acoustic guitar busker playing “bulls on parade by Ratm” outside of a Barnes and Nobles.
GitHub is a company & git is a core technology which is replacing Subversion for version control. FWIW, WebKit could have moved to Gitlab, Gitea or Bitbucket & still used git. Subversion & CVS's days are numbered - the writing was on the wall for many years
They kept the bugzilla tracker and will probably not use any CI on GitHub meaning migrating out will be easy, so not much to worry really. Problem is when repos rely on GH issues, projects, actions, codespaces, etc. because then migrating out becomes an enormous task.
I assumed it to mean, what with GH being owned by Microsoft, it’s now in the extinguish phase of EEE (embrace, extend, extinguish). Though if anything I think it’s in the extend phase.
Why wouldn’t an open source project announce a change to the place their source is hosted on their own blog? Why would they want to sweep it under the rug?
Does Apple not have an internal department that handles this for all their teams? Seems kinda weird for a division of a company to even have to choose their host.
Open-source projects generally do this separately - e.g. llvm - as otherwise you're requiring Apple to set up a new account system (gating contributions on an iCloud account would seem less great) and to build up its own UI and infrastructure for a git interface.
Also, given that GitHub is a somewhat universally understood host that people seem to like, and it has all the UI/development integration that people like, it kind of makes sense to just use that. It also seems that having a GitHub account is increasingly widespread, so contributors would not necessarily have to create yet another account with yet another service.
1. https://wiki.c2.com/?SvkVersionControl