I really like that no single thing stands out. It shows that a lot of people were working on a lot of different things, which is a sign of a healthy project... not that I would otherwise think of Linux as an unhealthy project.
As someone who has nowhere near 1% of the responsibility for ensuring that a code base is clean and performs its intended function: who is reviewing all these pull requests? It can't be Linus, right? How are the folks with PR privileges vetted?
Fully admitting to ignorance here, but to me it seems that this problem is way too big to prevent bad actors from participating. I would love to hear thoughts to the contrary.
There's sorta like a "tree" of maintainers for each part of Linux -- with Linus at the top of the tree. When you make a patch for the kernel, there's a script you can run on the patch:
scripts/get_maintainer.pl
It'll output the appropriate subtree of maintainers who "own" the code in the kernel that the patch applies to. Then, you can email your patch to them. The maintainer will send the patch up the "tree" until (hopefully) it ends up in Linus' repo.
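To make the lookup concrete, here is a minimal sketch of the idea behind that script: match a changed file path against per-subsystem path patterns and collect the owners. The subsystem names, addresses, and patterns below are invented for illustration, and ranking by pattern length is just a crude stand-in for "most specific first". The real script reads the kernel's MAINTAINERS file and does considerably more.

```python
from fnmatch import fnmatchcase

# Hypothetical, heavily simplified stand-in for MAINTAINERS entries.
# Every name, address, and pattern here is made up for illustration.
SUBSYSTEMS = [
    {"name": "EXAMPLE MEDIA DRIVERS",
     "maintainers": ["media-maintainer@example.org"],
     "patterns": ["drivers/media/*"]},
    {"name": "EXAMPLE MEMORY MANAGEMENT",
     "maintainers": ["mm-maintainer@example.org"],
     "patterns": ["mm/*"]},
    {"name": "THE REST",
     "maintainers": ["top-maintainer@example.org"],
     "patterns": ["*"]},
]


def owners_for(path):
    """Return maintainers whose patterns match `path`, longest pattern first."""
    hits = []
    for entry in SUBSYSTEMS:
        for pattern in entry["patterns"]:
            # fnmatchcase's "*" also matches "/" so "mm/*" covers subdirectories.
            if fnmatchcase(path, pattern):
                hits.append((len(pattern), entry["maintainers"]))
                break
    # Crude specificity heuristic (an assumption): longer pattern wins.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    owners = []
    for _, people in hits:
        for person in people:
            if person not in owners:
                owners.append(person)
    return owners
```

Calling `owners_for("drivers/media/usb/foo.c")` lists the media owner first and the top-level owner last, which mirrors the "send it up the tree" flow described above.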
Two years ago Linus took a break and Greg Kroah-Hartman was leading the project during that. Greg normally maintains the LTS branch of the kernel (and possibly many other things also).
There are maintainers for each subsystem. For example, linux/drivers/media is maintained by Mauro Carvalho Chehab. When you have a patch, you submit it to the linux-media mailing list and it's vetted there. If Mauro thinks it's good, then he'll submit it to Linus as part of a large merge package.
For the two patches I submitted, it took two months to get in the kernel.
Here's the deal. For all practical purposes, Linux kernel code is maintained by subsystem maintainers; Linus is just a stamp-head who merges everything provided by these guys and resolves conflicts, if there are any.
These guys are veterans at maintaining the subsystems and make sure the code base is clean enough based on Linus' guidelines.
One could imagine a sophisticated attack involving multiple patches to different subsystems at different times, that eventually converge into a Voltron-like monster once they’re all inside the kernel.
Potentially. However that also leaves a larger attack-surface, in the sense that more patches = even more scrutiny. Conceivably such a scheme would require all parts to be included (I mean, why make it larger than necessary?), so it would be extremely fragile to any single maintainer seeing the potential for abuse and correcting it.
There are far more devices running Linux in the world than Windows & macOS combined. That would seem to be a much more useful comparison than how many people own those devices.
Parent post was updated to say "not counting servers and embedded", but dismissing smartphones -- literally the most popular type of personal computing device in the world -- this way seems to completely invalidate the point.
The parent thread was about possible backdoors; more devices running a piece of software = more surface area for any possible backdoor in that software.
Also worth noting for people that use distribution kernels: a distribution's LTS kernel may or may not match the official one. Just something to be aware of.
RHEL 7 has another year [0] of support left (until August 30, 2021), and is based on kernel 3.10 [1], which was released in 2013 [2].
RHEL 8 is expected to see support through 2029, and is based on kernel 4.18, which was released in 2018.
Red Hat does backport changes, but tends to avoid backporting feature changes.
If you want to be able to use "recent" features of software, you are probably not running Red Hat. But many companies prefer the stability of systems that have been battle tested, and so are perpetually years behind the bleeding edge in terms of features.
I don’t mean to start an off-topic debate on this, but I just have to say that personally, I run into issues caused by using old software on our EL7 boxes all the time. For example, cgroup memory limits are broken on Red Hat 7 kernels as far as I can tell (most versions have a memory leak, but iirc the most recent has a panic instead). They’re missing all sorts of features that are useful for usability or reliability.
I’m just personally not convinced we really win much with the trade off as a society :p
Memory control group code is replete with bugs throughout 4.x kernels, way beyond RHEL 7. This is one of the big problems with getting locked into "LTS" kernels. It takes a long time to discover all the nameplate features that don't actually work.
Are there any "RHEL considered harmful" posts? That project is one of the most damaging things for the Linux ecosystem IMO. It forces companies on to ancient outdated compilers and library versions that are full of bugs.
Well, when I quit the place we both worked, the production kernel was 3 years behind and rolling forward at .5 years/year, so that implies 6 years to get current.
The next company I joined is using RHEL's 2.6.32 in production, more than 10 years after its release.
Have you asked whether you're able to use newer kernels, at least on your subset of hosts?
I've been having pretty good luck at work just asking for kernel updates - though it helps that I'm willing to stare at oopses and kcore dumps and userspace regressions (they happen!) on the canary machines.
Ah, yes, but all software is 10 years old and full of exploitable security holes and weak algorithms if you're constrained to FIPS, so the kernel isn't a special case :)
Only if your organization submits to the stupidity of the FIPS 140-2 consultant scam. Google has this level of certification and they obviously aren't using vendor kernels (or vendor anything). At Google the production kernel was, in many ways, newer than the one you could download from kernel.org
You state that "Google has this level of certification" as if it is a certification which applies to organizations. It does not. Specific hardware and software cryptographic modules are given FIPS 140-2 certification. Google currently has 8 cryptographic modules which have active FIPS 140-2 certifications: 5 hardware and 3 software. The software modules are all versions of BoringSSL. Google can replace their kernel as they wish because they don't use the cryptographic APIs in the kernel for anything which requires the use of a FIPS 140-2 cryptographic module. Not everyone is in that position, or they use a commercial Linux OS in which the certified non-kernel cryptographic module provided is available only on certain releases. If they want support for their OS they must therefore use the kernel provided with that release of the OS.
Thanks for expanding on my simplistic remark. Have you considered offering your services as a FIPS consultant? Because my ex-Google experience with this aspect of the industry is that management hires a consulting firm that comes in and says things like everyone must use this RHEL kernel, this (incredibly broken) openssl and so forth. Those consultants are not offering the nuanced perspective to the management, who aren't technical enough to understand it anyway. So the word comes down from the top that we're married to kernel 4.15 until the end of days.
You can use it right now as long as you're using Linux 5.1 or later. It's going to be a bit before these kernels work their way through enterprise deployment schedules, and longer before io_uring is widely deployed enough that more conservative places are comfortable using io_uring in production.
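If you want to gate on that at runtime, a rough sketch is to parse the kernel release string and compare it with 5.1. The function names here are made up, and a version check is only a heuristic: vendor kernels backport features to older version numbers, and a new enough kernel can still have the feature disabled.

```python
import platform
import re

IO_URING_MIN = (5, 1)  # "Linux 5.1 or later", per the comment above


def parse_release(release):
    """Extract (major, minor) from a release string such as '5.8.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_may_have_io_uring(release):
    """Heuristic only: True if the version number is at least 5.1."""
    return parse_release(release) >= IO_URING_MIN


def running_kernel_may_have_io_uring():
    """Apply the heuristic to the machine this runs on (Linux only)."""
    if platform.system() != "Linux":
        return False
    return kernel_may_have_io_uring(platform.release())
```

An EL7-style release string such as `3.10.0-1160.el7.x86_64` fails the check, which is the enterprise lag being described.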
I wouldn't discount it, but I think many software devs continued working from home instead of getting laid off. Maybe all those kernel contributors found themselves with an extra 2-3 hours in the day without the daily commute? Not to mention not being able to carry out daily social lives and leisure activities.
There are a lot of us developers out there who have had less free time during the lockdown. Whether it’s because of having young kids or doing charitable work in the local community for those less fortunate.
How much of Linux development is actually volunteer work? I assumed the bulk of the work was done by professionals whose employers were paying them to work on the kernel.
In that case, I could see a non-trivial number of those developers being in a situation that their other work commitments were slowed down, so kernel development got a bigger chunk of their time.
Not a lot. You can see it in the development statistics that LWN publishes for every major kernel release; for 5.7, it's only about 13-14%, so most people working on the kernel do indeed get paid for it.
I am wondering if kernel developers had more time to spend on the project because of all the recent virus lock-downs. That could account for the higher-than-usual number of files changed, and overall bulk.
A lot of that is driver code, since so many devices are supported by mainstream. The typical user won't see anywhere near 800k new lines of code running on their systems when upgrading.
This is a condensed summary of merges, not the commit messages. The actual commit messages for those merges were:
* A few little subsystems and a start of a lot of MM patches. Subsystems affected by this patch series: squashfs, ocfs2, parisc, vfs. With mm subsystems: slab-generic, slub, debug, pagecache, gup, swap, memcg, pagemap, memory-failure, vmalloc, kasan
* More mm/ work, plenty more to come. Subsystems affected by this patch series: slub, memcg, gup, kasan, pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs, thp, mmap, kconfig
* More MM work. 100ish more to go. Mike Rapoport's "mm: remove __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue. Various other little subsystems
* Various trees. Mainly those parts of MM whose linux-next dependents are now merged. I'm still sitting on ~160 patches which await merges from -next. Subsystems affected by this patch series: mm/proc, ipc, dynamic-debug, panic, lib, sysctl, mm/gup, mm/pagemap
* a kernel-wide sweep of show_stack(); pagetable cleanups; abstract out accesses to mmap_sem - prep for mmap_sem scalability work; hch's user acess work. Subsystems affected by this patch series: debug, mm/pagemap, mm/maccess, mm/documentation.
* various hotfixes and minor things; hch's use_mm/unuse_mm clearnups. Subsystems affected by this patch series: mm/hugetlb, scripts, kcov, lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc.
* A few fixes and stragglers. Subsystems affected by this patch series: mm/memory-failure, ocfs2, lib/lzo, misc
And of course there are hundreds of commit messages for the patches that describe what is really going on. That said, Andrew Morton works in a different way than every other maintainer so the merge commit messages in his case tend to be much less descriptive than everyone else's.
This is why I hate merge commits. I mean, I have no idea of what constitutes this Morton's work and if he could've done better, but I think messages like that are still a problem: they provide zero value and mess up the log. And this is actually only a little worse than what I usually see in merge commits, especially when they are done with some semi-automated tool like GitLab. People rarely do merge by rebase, and all useful things that could've been said are usually in the commits themselves (and there are some guidelines about them), so merge commit messages are rarely empty and even more rarely are they useful, because the author has no idea what to say. This is just trash.
And even though it never occurred to me to check, yeah, I'd actually expect higher quality standards from the Linux kernel.
You wouldn't be offended, if you possessed the requisite context for your opinions.
Linux kernel commits are very detailed and high quality. The style can be a bit ad hoc, but the face of enforcement is Linus himself.
What you're looking at here is a throwaway piece of unnecessary text. Git offers the opportunity to include it, but it is generally left blank, because it adds no value. All of the interesting bits are included by incorporation.
In this case, Andrew Morton decided to have a bit of ironic fun with the throwaway text. Dismissals ensue!
It is used by Linus to check what's going on and see if there's anything he should check more closely (perhaps cross-checking the message with the diffstat); it is used by developers (now or in the future) and journalists to make themselves an idea of what is being merged; it may explain features that are in development and have preparatory work being merged; and so on.
I'm not really "offended" since I don't really care about the state of the Linux kernel's log. But since you insist on discussing the case so much, let me add that I'm a person with very low tolerance for humour in the working process. I like jokes in the chat, email, or even in bugtracker's comments for that matter, but the git log (or any change log, for that matter) must be very clean, clear and boring.
So even though I said
> I have no idea of what constitutes this Morton's work and if he could've done better
I very much believe that he could have. For 2 reasons:
1. There are multiple people with huge, broad, non-specific merges in that list, and every single one of them did, in fact, do better than Andrew Morton.
2. I believe that consistency in logs is good, so even in the case where somebody really, truly, actually cannot say more than just "updates" about the changes he introduces (which I don't think is ever true, but let's assume it is), I think "updates|updates|updates|updates" would still be a mile better than "updates|more updates|yet more updates|still more updates".
Because the second parent commit of the merge commit gives you the pointer to the list of commits off the main branch where the feature is implemented. Without the merge commit, there would be no easy way to group commits that implement a feature.
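As a rough illustration of that grouping, here is a toy commit graph (all ids invented) and the set computation that a range such as `git log M^1..M^2` expresses: everything reachable from the merge's second parent that is not already reachable from its first. This is a sketch of the idea, not how git implements it.

```python
# Toy commit graph: commit id -> parent ids, first parent listed first.
# All ids are invented for illustration.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "F1": ["A"],        # feature branch forks from A
    "F2": ["F1"],
    "M": ["C", "F2"],   # merge: first parent = mainline, second = feature tip
}


def ancestors(commit, parents=PARENTS):
    """Return `commit` plus every commit reachable from it."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def commits_brought_in(merge, parents=PARENTS):
    """Commits reachable from the second parent but not from the first."""
    first, second = parents[merge][:2]
    return ancestors(second, parents) - ancestors(first, parents)
```

For the graph above, `commits_brought_in("M")` yields the two feature commits `F1` and `F2`, which is exactly the group the merge commit ties together.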
You're focusing on the wrong part of the picture. Yes, you need that data to be stored somewhere. But the complaint is not that a merge commit or equivalent data structure exists somewhere. It's that if you get the equivalent of 'asdflklf' shown in the log, especially when it's displayed in a compact and supposedly information-dense form, something has gone wrong.
If we take it as true that merge commits have no useful information by definition, shouldn't they have a blank commit message? Or a message of "merged branch X"? And why are we spending lines displaying them in the denser log formats?
> It's that if you get the equivalent of 'asdflklf' shown in the log
I think the fundamental problem is that commands like merge and revert generate their own default commit message and don't bring up the editor so that further updates can be made.
At my job, I always will tell people that they need to include why they're reverting a commit in the commit message. I haven't really done anything about enforcing standards on merge commit messages, but perhaps including the cover letter or PR description in the merge commit would be a good start.
All of my merge commits say “merged x with y” and all of my branch commits say “created branch z”
They are useful because it shows very clearly when a branch and merge was done. Dates and times are useful. Knowing the parent of a branch is useful. Knowing when a branch was merged back in to another branch is useful.
At least for me, they definitely are one of the valuable outputs of the development process.
Especially when looking back through older code to see "what did we change since X date?", and/or "what changed between (say) the 3rd and 5th of July?".
That seems to be a fairly standard thing too (from my perspective).
Using git blame and looking at a commit that last updated a line allows one to check the message for the reasoning behind adding or updating that line, and the context that line was added or updated.
This could help in terms of preventing regressions by making further updates for example.
They're valuable for when you're looking for why they broke the code base. This looks like dead code, will I break anything if I remove it? Git blame gets me the ticket number, details on the particular change being made, and (if the author writes their commits well) related files and tests associated with this change. From there, I can see if the other code has since been removed, or if there's an edge case I didn't think of, or if the thing was never plugged in in the first place, or if this is a bug no one has noticed. All within two minutes, and without needing to ask the author (if they're even still available).
I'm not sure what the fuss is all about. I've only been developing for < 2 years, and I used to add detailed MERGE commit messages, but I realized that it's just duplication of information (which can be read from the INDIVIDUAL commit messages). I've left MERGE commit messages empty since then
I still wish people would give meaningful names to their merge commits. Together with `git log --first-parent` and the alligator [1] workflow, it can filter out the less interesting patch series to focus on the bigger picture (the fact that you added a functionality, not that it took you 53 commits changing very specific parts to do so).
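For anyone unsure what `--first-parent` buys you, here is a small sketch on a made-up commit graph: from each commit, follow only its first parent, so the commits that arrived through a merge's second parent are skipped and only the merges and mainline commits remain. The ids are invented and this only models the traversal, not git itself.

```python
# Toy commit graph: commit id -> parent ids, first parent listed first.
# All ids are invented for illustration.
PARENTS = {
    "A": [],
    "B": ["A"],
    "F1": ["B"],          # first patch series
    "F2": ["F1"],
    "M1": ["B", "F2"],    # merge of the first series
    "G1": ["M1"],         # second patch series
    "M2": ["M1", "G1"],   # merge of the second series
}


def first_parent_history(tip, parents=PARENTS):
    """Walk back from `tip`, following only the first parent of each commit."""
    history = []
    current = tip
    while current is not None:
        history.append(current)
        commit_parents = parents[current]
        current = commit_parents[0] if commit_parents else None
    return history
```

Here `first_parent_history("M2")` visits `M2`, `M1`, `B`, `A` and never touches `F1`, `F2`, or `G1`, so the merge messages are all you see. That is why their wording matters in this workflow.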
I agree, I don't think "it's only a merge commit" is any excuse whatsoever. But I personally pretty aggressively squash my commits with "rebase -i" before merge, exactly in order for there to be not much of a "bigger picture". And I urge others to do the same as well, because no matter how many fixes there were during development, later on everybody will ultimately care only about "what was it about in essence?" I.e., added functionality this-and-that, or a fix, or refactoring, or maybe something new implemented (but not "active" as of yet). Small commits are nice when doing git bisect, but given that your code is not complete trash, it is properly tested, and refactoring (which is usually the biggest code change anyway) is separated from the "meaningful" changes, there's not much use in them after a year.
So the best use I've seen for the merge commits are bugtracker task IDs and such.
I try to avoid horrible things like that at all costs. Sometimes it's easy, sometimes not so much. The rule of thumb is that all branches (unless there is a really, really good reason to do otherwise) are created from the master, and are rebased before the push (by the person, who did the development in this branch), and if there are conflicting changes, before the merge as well.
With the absolutely rarest exceptions all patch releases are done in the same manner as the regular ones, i.e. from the HEAD of master. History must be as linear, as possible.
So there are basically 2 general cases when you really need to share a commit. In the first, you find out in your branch there's something to be fixed, which you know I already fixed in mine, in the commit X. In this case it's better if I reorder commits in my branch so that X is the first one, create a new branch with X, which then is merged in a proper manner. You can rebase then.
The second case is when 2 people with different skill-sets or responsibilities must work on something that "from the above" looks like the same piece of functionality. Then, however hard you may try, it all turns out much messier and they will end up needing to share branches (so they cannot rebase them) with multiple commits. In this case I name one of them responsible for changes made within this task as a whole, so when the work is finished the second (3rd, 4th,...) person doesn't touch the branch(es) in question any more, and the first one might rebase and clean it up as needed.
I'm on a project and it's the opposite. Full releases every... 2 months give or take, but every... week or two, there's a round up of 'critical' issues, and you're expected to take just those issues from 'develop' (or somewhere) and get them over in to a 'release' branch of the current released code. I now tend to keep local branches of anything I worked on for months, because 4-6 weeks after it was approved/merged (and deleted) in to 'develop', I may have to also do a 'hot patch' on to a release branch, and cherry picking is the prescribed way of doing it.
I think it possibly did a few years ago. There's a lot of 'tribal knowledge' stuff, and none of these practices were explained when my team (another... 8ish people) were added to the current team months ago.
I think it could be done a lot differently, and better, but any change requires a huge amount of political change. There's probably ... 2 or 3 teams that would need to approve of any process change, and no one wants to be the one to push changes like this ahead. It's "good enough", even though... we often collectively lose a non-trivial amount of time every week or two or three. We log that "wasted time" in a spreadsheet to make a point of it, but... no one is motivated enough to shepherd the required change process. The devops team would have to change a bunch of their scripts, and the process would need to be coordinated among multiple teams of varying sizes and skills. It is kinda disorganized and nightmarish with... I'm trying to count - there's maybe 20+ devs touching this. I suspect this was 'fine' 10 years ago with, say, 5-6 devs who would have dealt with it.
YES! Not everyone has the luxury of evergreen continuous deployments onto their own infra. It is quite common for enterprise software that runs in clients' data centers to splinter/branch in this way, since they usually want to manage the risk by staging specific versions of your software for testing before going live. They may consider some of your new features on the main branch as too risky, but still want you to backport fixes to their release branch.
Git and Subversion even allowing single line “-m” commit messages was a mistake. You would see a lot better messages if they forced you to use the editor. I bet a lot of programmers don’t even know you could do multi line messages and assume there is some short-ish character limit. I know I did for a long time.
> Git and Subversion even allowing single line “-m” commit messages was a mistake.
That's a subjective preference. Maybe for some repos it's a good idea to allow or even require longer messages, but certainly not all.
For example, after seeing many different strategies for commit messages over the years, my own preference in many cases is now to permit only one-line commit messages, with a strict character limit, which include a reference to a suitable source such as a feature or bug ID where more details may be found. With git, this can be checked using a commit-msg hook (the hook that actually receives the message).
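A minimal sketch of such a check, written as a script that could be installed as `.git/hooks/commit-msg` (git passes that hook the path of the message file as its first argument). The 72-character limit and the `PROJ-123` ID style are assumptions picked for illustration, not anything prescribed above.

```python
import re
import sys

MAX_LENGTH = 72                          # assumed limit; pick your own
TICKET = re.compile(r"\b[A-Z]+-\d+\b")   # assumed ID style, e.g. "PROJ-123"


def check_message(text):
    """Return a list of problems with a commit message; empty means OK."""
    # Ignore the "#" comment lines git puts in the editor template,
    # and ignore blank lines.
    lines = [ln.rstrip() for ln in text.splitlines() if not ln.startswith("#")]
    lines = [ln for ln in lines if ln]
    if len(lines) != 1:
        return ["message must be exactly one non-empty line"]
    subject = lines[0]
    problems = []
    if len(subject) > MAX_LENGTH:
        problems.append(
            f"subject is {len(subject)} characters; limit is {MAX_LENGTH}")
    if not TICKET.search(subject):
        problems.append("subject must reference an ID such as PROJ-123")
    return problems


def main(argv):
    """Hook entry point: argv[1] is the path of the commit message file."""
    with open(argv[1], encoding="utf-8") as handle:
        problems = check_message(handle.read())
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0  # non-zero exit makes git abort the commit


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Keeping the rule in `check_message` separate from the file handling makes it easy to try different limits or ID patterns without touching a real repository.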
The big advantage of this is that a `git log` or similar command with a single-line format can never then hide anything important. Assuming we're talking about an established project that does have things like a bug tracker and some sort of structured project documentation set up, I have found that any time I am tempted to write a longer commit message, there is usually a better place I could record that information, and if not then maybe it wasn't that important anyway.
YMMV, of course. As I said, it's a subjective preference, and no doubt there are many different ways a team might choose to use their tools.
I feel a huge pain of resignation when I am forced to use multiple lines to describe a single commit. There is almost no audience for any line of the commit message other than the first line. As far as line length goes, any practical means of displaying the messages, like gitk or git log --oneline, is unfriendly to lengthy lines. Gerrit will display all the lines of the tip commit's commit message, so one can amend that commit to create a summary of the others, but then of course the other commits are not as visible.
The character limits are a real thing in git culture. One could start here, with a question posed by an apparent skeptic:
As someone who has had to dig deep with git blame, I tremendously value a good long multiline commit message. At the very least it can save the trouble of having to dig up some obscure mailing list discussion or bug report to figure out why a change was made, assuming that exists at all. If it doesn't, all you have is the git commits.
> There is almost no audience for any line of the commit message other than the first line.
git log would definitely show the entire commit message, so I don't see why there would be almost no audience. git blame along with git show or git log -n1 would also allow you to see the rest of the commit message for a particular line in the codebase.
> git log would definitely show the entire commit message
git log could show the entire commit message, but I can't remember the last time I saw any developer whose default format wasn't a single-line one. Presumably there are some people who do prefer to see the whole message every time, and maybe that includes you, but IME that's quite rare.
One would actually have to make a change to their global git config to set the pretty setting to display oneline or add a bash alias to run git log --oneline.
By default, it shows the entire message, and, in my experience, most people aren't going to change the default behavior in one specific case.
> One would actually have to make a change to their global git config to set the pretty setting to display oneline or add a bash alias to run git log --oneline.
That is true, but I literally can't think of any developer I've worked with in many years who had not done something like that. The typical display in every GUI for git repos that I've come across in recent times is also geared to single-line display, though some of them are marginally better with at least indicating the presence of additional lines than git's own one-line display formats in the CLI.
YMMV, and apparently it does, based on your second paragraph.
Just a bit, you can have multiple lines with -m, I do it frequently. You can also write single line messages in an editor, which I'm also frequently guilty of. They're orthogonal issues.
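For those who haven't seen it: `git commit` accepts `-m` more than once, and the values are concatenated as separate paragraphs. A small sketch of building such a command follows; the helper names are made up, and `resulting_message` only approximates the message git ends up storing.

```python
def commit_command(subject, *paragraphs):
    """Build a `git commit` argv with one -m per paragraph."""
    argv = ["git", "commit", "-m", subject]
    for paragraph in paragraphs:
        argv += ["-m", paragraph]
    return argv


def resulting_message(subject, *paragraphs):
    """Approximate the stored message: paragraphs separated by blank lines."""
    return "\n\n".join((subject,) + paragraphs)
```

So `commit_command("Fix parser crash", "The lexer returned None on empty input.")` gives a subject line plus a body without ever opening an editor.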
Yes, however, -m does encourage people to make single line messages.
He didn't raise good concerns, he got outraged after reading a HN comment and labeled "pathetic" something he knows nothing about, has no context in, and didn't even spend 5 minutes researching.
This is the correct answer. The difference between the messages I leave for myself in my local merges and my actual commits to the dev branch are night and day.
I'm not sure I'd classify what I read as "outraged" (different people express themselves in different ways, not everyone uses "pathetic" with the same vehemence or vitriol).
That said, that merge log is fairly useless. Whether it needs to be anything other than that, and who generally would see it and whether it's for the person writing it or someone else is something to be discussed, but even in the case it's mainly meant for the author to look back on, is it even succeeding in being useful in that job? I would agree that it appears pathetic, but probably pathetic in a low-cost doesn't really matter way.
I do a ton of that locally (or, sometimes, in a remote branch that I 100% own and no-one else will be touching), but clean it up before pushing or merging.
The number of lines in the amdgpu driver sadly does not mean that it would be usable in any way. The amount of red in dmesg is just sad when you're stuck with USB-C DisplayPort graphics.
Looking at the changelog, even if we ignore the drivers that aren't used in server environments and consider only the security and networking improvements, I am wondering how the BSDs are going to compete.
I don't think a Linux monoculture is good, but from the perspective of an individual or organisation choosing which OS to use, there's less and less reason to choose a BSD.
Yeah, completely. Competition and alternative approaches are what pushes us forward. The fanboi culture of “*BSD is dying”, “use Chrome” etc. which tech communities seem to encourage makes it worse.
Keep doing it the hard way, trying something different and running your code on a different browser and OS. In the long run, Windows and the BSDs have done a lot to make Linux good, in the same way that Android and iOS push each other forward. I’m just hoping folks keep adopting Firefox.