Someone recently replied to me with comments by Dave Cutler, designer of the kernels for Windows NT/XP(...) and (Open)VMS:
“I have a couple of sayings that are pertinent. The first is: ‘Successful people do what unsuccessful people won’t.’ The second is: ‘If you don’t put them [bugs] in, you don’t have to take them out.’ I am a person that wants to do the work. I don’t want to just think about it and let someone else do it. When presented with a programming problem, I formulate an appropriate solution and then proceed to write the code. While writing the code, I continually mentally execute the code in my head in an attempt to flush out any bugs. I am a great believer in incremental implementation, where a piece of the solution is done, verified to work properly, and then move on to the next piece. In my case, this leads to faster implementation with fewer bugs. Quality is my No. 1 constraint – always. I don’t want to produce any code that has bugs – none.
“So my advice is be a thinker doer,” Cutler concluded. “Focus on the problem to solve and give it your undivided attention and effort. Produce high-quality work that is secure.”
Let me give some senior developer feedback on these statements.
Developing kernels is a highly technical job. I can imagine that you have to think hard before writing any code. Once you figure out a proper solution, you can invest some time into writing high-quality code. But note that no code is written bug-free the first time!
Now contrast this with a consumer-facing application. The biggest challenge there is not stability or technical structure. The biggest challenge is to provide a useful, user-friendly application. You cannot just think this up in your head and hope all is fine. The adage "no plan survives contact with the enemy" very much applies here. There is no point in investing in code stability when that code has a high chance of getting thrown away. Get the "useful and user-friendly" part figured out very fast, using an iterative approach. Only then start focusing on stability and all the rest. Plus, sometimes it makes more sense business-wise to shift focus toward features instead of stability.
Anyway, each project is different. So when you see advice, make sure you first check if it applies in your case or not.
> ... when you see advice, make sure you first check if it applies in your case or not.
I wish everyone understood this. There are four things you need to do to successfully apply advice:
1. Discover it (lots of ways to do this, even more now in the age of the internet).
2. Determine if it is good or not (you can rely on the wisdom of the crowds, other people's judgement based on qualifications, books, references, etc).
3. Determine if it applies to you (you have to make this call, requires thinking about the context of the advice giver and your situation).
4. Actually apply the advice. This requires making the changes suggested and is intrinsically individual.
Step 1 happens pretty naturally, and we're well set up for step 2 (though of course people get fleeced by snake oil salesmen all the time). Step 4 is intuitive. It's step 3 that people routinely skip.
Correctness and pleasantness are entirely distinct. In most cases incorrectness has little or no effect on commercial success whereas unpleasantness will sink a product instantly.
There are some areas where incorrectness is also highly unpleasant, kernels and compilers being two examples.
This is correct, yet dangerous to say without mentioning a timeline. I would say do anything (that works) to get customers to pay. Then do everything (good enough) to get them to stay.
Hmm. There are definitely things to be learned picking up things that other people won't touch, but I would also qualify it more:
"Successful people choose their battles"
You can spend your entire career fixing other people's crappy bugs because nobody else can be bothered. That's not really going to make you successful, it's just going to make you a good bug fixer.
Choose problems that other people don't want to touch because they're hard or challenging, or because they're a core, valuable part of something.
That's what's brought me success ;) (and headaches)
The problem with advice like this is that the "modern" Scrum Master would demolish this approach in favor of a predictable pace with shallow depth. DC could only become successful in environments without any Agile/PMP-certified PM.
I find that at Staff+ level you do get the ability to do these things again. Generally you are not beholden to PMs and sprints any more. Most of the work you do cannot be categorized as feature-factory stuff.
So, coming from a mechanic/blue-collar perspective, the idea that everything has to be your job is an insanely dangerous practice. We absolutely have a "not my job" mentality because there are countless instances where poorly trained/inexperienced diagnostic technicians and day laborers have caused catastrophic loss of life and property damage. Is there no room in coding for inexperience? What is the price of a job half done poorly?
"Successful people do what unsuccessful people won’t." is, in my industry, a neat way to start the story of how you lost your hand.
In tech, the consulting industry is full of "job half done poorly". Consultants overpromise and underdeliver. They take the money and run. Yes, it somewhat works. If you have a bug, no one can help you because their code is a giant ball of spaghetti that requires a Rosetta Stone to decipher. Which they will not give you without you paying more money. Within five years, you are likely to ditch that train wreck and hire a bunch of new consultants to rewrite what will probably be another slow burning dumpster fire.
Unfortunately, in many places, “that’s not my job” has been weaponized by people to avoid pulling their weight or filling a critical gap. Often in very low-stakes situations.
“That’s not my job” for industrial/manufacturing is absolutely critical and should not be downplayed. But refusing to teach the newbie next to you how to use the Jira ticket system because it’s “not your job” is counterproductive.
In software, this is where "catastrophic" data leaks and security vulnerabilities come from: an engineer or team deciding to roll their own version of a security or cryptography library without having the proper experience to even know what they don't know.
The NT kernel is incredibly high-quality work. It shipped in 1993 and 28 years later is still powering billions of devices. Windows did have a lot of security problems, granted, but many (if not all) were in the upper layers - of which there were many.
He did the very best that could be expected in the early '90s.
It was exemplary coding, and very little of its fundamentals have changed. I have some support for this assertion from an interview in the late aughts - most of the problems were in layers above the kernel.
That being said, even VMS is not without flaws (we still run it for our manufacturing floor):
No, the best that could be expected in the early 90’s is formal proofs of security enforcing multiple independent levels of security (MILS) allowing the execution of arbitrary, malicious programs on the same systems handling TOP SECRET data in complete safety as achieved by TCSEC Class A1 certified systems contemporaneously.
In contrast, OpenVMS only ever achieved C2, security-enhanced VMS B1 [1], and the NT kernel C2 [2]. NT-kernel-based systems subsequently maximally achieved EAL4+, indicating independent verification that it is adequate to protect against “casual, and inadvertent attacks”, but failed to demonstrate resistance against attackers with a “moderate” attack potential.
So, no, it is not the very best that could be expected in the early 90’s.
If you are not asking that rhetorically, then the Gemini Trusted Network Processor (GTNP) using GEMSOS achieved a Class A1 certification in 1995 [1] and included Ethernet device support in its specification. This is adequate to implement a TCP/IP stack in an unprivileged context. My quick read of the report does not allow me to confidently assert that they do simultaneous Ethernet device multiplexing, which would allow a trivial implementation of multiple independent data streams over the same link, but they do appear to provide at least a generic time-partitioned device multiplexing solution which would allow multiple programs to transmit and receive serially rather than just being bound to a single program at boot time. This is adequate for a large number of use cases and removes the TCP/IP stack from the TCB, and thus requires a lower degree of scrutiny as its failure modes do not cause whole system failure.
If you are asking rhetorically, then what NT kernel, NT kernel derived, NT kernel component, or even any code associated with the NT kernel in any context has been evaluated to Class A1, Class B3, EAL6, EAL7, or equivalent demonstrating proofs of security adequate for usage in high assurance systems? Arguing the NT kernel has achieved less security as a tradeoff to allow it to solve a broader use case is only valid if they have demonstrated the ability to actually make a tradeoff by achieving high security in a narrower use case nominally or at least qualitatively similar. If they can not actually demonstrate the ability to achieve the alternative, then they are not making a tradeoff, they are choosing the only option they can do.
And this is what they have demonstrated. At no point have they ever demonstrated an equivalent, or even qualitatively similar, level of security at any scale within orders of magnitude of the scale demonstrated by the Class A1 systems. And it is not like they have not tried. They have attempted numerous times to certify, demonstrating a desire to succeed and at least meaningful effort to do so, and have failed to certify at anything more rigorous even in highly constrained configurations to the extent that they have given up.
To use an analogy, this is like having two energy companies and one of them says, "We have chosen a fusion energy solution that does not generate net energy because we believe that only fusion will be adequate when we have an interstellar civilization that needs to operate in the interstellar void." while the other has a working, economically efficient, solar energy solution that generates energy on the Earth today. Asking, "Can you point to a solar energy system that could generate adequate energy in the interstellar void?" as a counterargument is just plain silly because the existing fusion energy solution also does not work there and in fact does not work anywhere in any context.
Not only was I not asking rhetorically, but I saw the Gemini TNP and am aware that it never had an evaluated TCP/IP stack, only, as you said, the potential for someone to write one.
For about 4 months I was in charge of a Harris NightHawk running a B1 version of SVR3, which was two and a half pains in a half-pain glass.
Saying he doesn't want to produce any code that has bugs is not the same as doing it. Given how accomplished he is I'm sure he is well aware that sometimes bugs slip through.
A lot of the NT hardening that happened over the last ~20 years was effectively enabling access controls that were previously set too lax or even disabled, all due to performance issues (GDI being moved in-kernel, with all its security issues, was also due to the performance problems of NT 3.x).
The NT kernel is quite nice and solid, the problem is that not everything built on it followed all of the design rules.
> I don’t want to produce any code that has bugs – none.
And there's the reverse thinking: let me intentionally introduce some bugs, so when I'm tasked with solving them I can fix them in 5 minutes and take it easy for the rest of the sprint.
True to form, Cutler had a break with DEC, and they gave him a carte-blanche VAX "skunkworks" with Prism/Mica.
DEC eventually shut this down, which prompted his departure for Microsoft. This was unfortunate for DEC, as they eventually poured the company into their Alpha RISC processor, which did not live as long as DEC hoped. Prism might have been a superior design.
At this time, Microsoft was maintaining a UNIX kernel in their Xenix product, so they knew a good kernel engineer when they met one. Microsoft was the leading UNIX vendor in the early '80s.
Cutler famously disparaged the UNIX kernel (his notable saying was "Get a byte, get a byte, get a byte byte byte" to the tune of the finale of Rossini's William Tell Overture).
Microsoft dumped their Xenix onto SCO about this time.
What is more interesting to me was Cutler's involvement with Azure. He must have had some sway over CBL-Mariner, Microsoft's RPM-based Linux distribution.
Much of Cutler's earlier work is documented in the "Showstoppers" book:
> What is more interesting to me was Cutler's involvement with Azure. He must have had some sway over CBL-Mariner, Microsoft's RPM-based Linux distribution.
He was involved with Red Dog (the modified Windows host that powers Azure).
He's not involved with the CBL-Mariner team at all to my awareness. Mariner is mostly about solving a supply-chain problem at Microsoft... we have a ton of internal teams all using different flavors of Linux, and packages have historically come from all over the place. With CBL-Mariner we are basically trying to unify on that and own the package build and distribution portion as well. There isn't much reason for a kernel designer to be involved in that, as it's a well-understood problem (and an entirely different domain) and we already have internal upstream Linux kernel contributors (which is how we produce the -azure supported kernels).
Apple didn’t end up on a BSD kernel. They started on Mach (from NeXT) and then made it more performant with XNU by not being so pedantic about microkernels.
"XNU was a hybrid kernel derived from version 2.5 of the Mach kernel developed at Carnegie Mellon University, which incorporated the bulk of the 4.3BSD kernel modified to run atop Mach primitives..."
This is from the link I provided: “The BSD portion of the OS X kernel is derived primarily from FreeBSD, a version of 4.4BSD that offers advanced networking, performance, security, and compatibility features. BSD variants in general are derived (sometimes indirectly) from 4.4BSD-Lite Release 2 from the Computer Systems Research Group (CSRG) at the University of California at Berkeley.”
After NeXTStep was adopted by Apple, they hired a bunch of the FreeBSD core developers and updated the BSD service and userland using FreeBSD. Apple was actually already messing around with Mach prior to that with MkLinux so there was some initial speculation that they might port to MkLinux rather than update NeXT's Mach+BSD hybrid.
The code updates are very limited, though. To the point that for one's own sanity it's better to assume userland is 4.3BSD unless marked otherwise (I've been burned by this myself in code that assumed every BSD had changes from ~1994 NetBSD).
Mac OS X essentially started by updating the NeXT core to the latest OSFMK distribution, which was a hybrid system (BSD server integrated into kernel space) powering the BSD alternative to System V - OSF/1 (the most famous of them being DEC OSF/1, aka Digital Unix, aka Compaq Tru64). They applied bits of FreeBSD to the BSD server code and over time improved its concurrency, but a considerable portion of XNU is code that is called by the BSD server but not part of it (IOKit, among other things).
> Cutler famously disparaged the UNIX kernel (his notable saying was "Get a byte, get a byte, get a byte byte byte" to the tune of the finale of Rossini's William Tell Overture).
I'm pretty sure that had to be in reference to STREAMS, because the original UNIX I/O model certainly is not byte-oriented. So basically, entirely irrelevant.
Berkeley was paid to port TCP/IP to Unix and, as far as I know, part of the deal was that the code should be available to quickly port TCP/IP to more systems, including non-BSD ones.
Thus a considerable portion of operating systems at one point or another used a 4.3BSD-derived network stack (sometimes updated with later code), to the detriment of network API evolution, with some developing their own implementation over time.
“Bias for action” is a nice catch phrase and not bad to get people out of paralysis.
The other successful thing to add onto it, is to tell people what you’re going to do (ex, a high level 3 step plan, not more than 5), then do it. If there are objections, ask [what can be done] to mitigate the issue instead of arguing the points.
Don’t “bias for action” and do something w/o setting expectations first.
I have another lost interview with Dave Cutler that Microsoft took down from their website about ten years ago. I can find and post it, if there is interest.
I found this as an MHT file, edited as best I can. Maybe I should get this to archive.org?
Date: Sat, 23 Oct 2004 09:40:05 -0700
From: Windows Contact Us <wincu@css.one.microsoft.com>
We apologize for the delay in our response.
I have attached "The Architects: First, Get the Spec Right" an interview
with Cutler and Mark Lucovsky.
Goldie, Microsoft.com Customer Support
___
The Architects: First, Get the Spec Right
Once upon a time ... there was NT OS/2.
Every month, Nadine Kano prowls the halls of Redmond to profile the real folks behind Windows 2000 development. This month: David Cutler and Mark Lucovsky, who helped guide the operating system from its infancy.
"Well, we were just about to leave," David Cutler says from behind his desk.
I'm five minutes late to my interview with Cutler and Mark Lucovsky, two of the original architects of the Microsoft Windows NT operating system.
As I wilt into the carpet, I realize Lucovsky must have mentioned to his colleague how nervous I was about approaching them. After all, who the hell am I to be talking with these guys? They are developers' developers, two of the visionaries behind the operating system that began as NT OS/2 and has evolved into Windows 2000. Cutler refuses virtually all interviews with the press, but he and Lucovsky are willing to talk to me, a program manager from down the hall. They probably find my nervousness amusing.
Build it, ground up
I was a college senior when Bill Gates personally recruited Cutler, Lucovsky, and others from Digital to begin the NT OS/2 project at Microsoft. Their quest was to build, from the ground up, an advanced PC operating system, something far beyond MS-DOS. NT signified "New Technology."
One of my engineering classmates, David Treadwell, joined Microsoft to work on NT. I remember how excited he was to be a part of this obscure little development effort that none of us really understood. Remember that 1989 was before Windows 3.0. In 1989, Windows was still a nonentity.
"For example, we went with a client-server design, though at first things like context-switching rates and cache misses were too high," Lucovsky recalls. "Things were slow. But we didn't let ourselves get concerned with the size of memory. Not everything can be at the top of the list. We consciously put performance lower on the list than extensibility and didn't pay close attention to memory footprint until (version) 3.5."
We talk about how the operating system evolved with each release, from memory optimizations and PowerPC support in versions 3.5 and 3.51 to kernel-mode GDI and User, plus a new user interface in version 4.0. "The basic, internal architecture has not changed, except for Plug and Play," Cutler says.
"We wanted a good design from the beginning so that, ideally, people could put anything on top of the system once it was there. We focused on the underpinnings. We wanted to minimize the need to add to the base or to tear up the base, because doing those sorts of things creates a lot of bugs. If moving forward you don't have to touch the basic code except to make small tweaks, then you know you got it right."
Some things must wait
At the same time, Cutler admits, "Nothing is ever architecturally correct." Needs evolve, and it takes time to build an operating system. Although support for distributed computing and clustering were part of the original vision, features such as the Active Directory haven't come to fruition until Windows 2000. "If we tried to give customers everything with the first release, we would never have finished it," Lucovsky says.
Cutler elaborates on this philosophy. "If what we desire is to have a mature operating system, then we need to achieve revolution through evolution, through incremental improvements. Within five iterations of an operating system like Windows NT, you see a big difference."
Both Cutler and Lucovsky see taking advantage of every opportunity to increase quality as the top priority for Windows.
Reliable is cool
"I'd much rather see the most reliable and usable operating system than the most whizzy-bang operating system," Cutler says. "To increase reliability we have to make choices. For every 10 bugs we fix, we may introduce three more. But do you want to ship with 10 bugs, or do you want to ship with three?"
"Do you want one more new feature," Lucovsky concurs, "or do you want to fix more bugs?
"When the Internet was first catching on, it was OK if your browser crashed once in a while. But these days, if you go to an e-commerce site and you hit the Buy' button, things had better work. When you're dealing with a leading-edge piece of technology, you can play fast and loose with it. But as the technology matures, playing fast and loose isn't acceptable anymore. This is characteristic of the maturing process for a product like Windows. People will put up with more from the bleeding edge."
"What I think is cool," Cutler interjects, "is that the system doesn't crash, and it doesn't lose my work, and it has functionality. I could care less that the visuals are flashy if my 32-gig hard drive goes away."
Communicating quality
"And if you're a consumer," Lucovsky responds, "you want even better reliability." He concludes: "Quality is the most important vision that everyone working on this product needs to share. It isn't always easy to communicate how we're going to do this, particularly as the team gets bigger."
The growth in the number of people working on the project over the last 10 years has other downsides, Lucovsky notes. "When you have a bigger group, quality problems become especially detrimental to productivity. Say it takes 10 developers and testers to fix one bug. Whoever put that bug in there just caused 10 people to lose time. We're working to make sure our development tools keep up with the growth in our system and our team. We're streamlining the process in ways that will make a dramatic difference in the way we build the code."
"If we want to stay competitive," agrees Cutler, "we have to invest money in tools and mechanics as well as features. We need to put guidelines on paper so that people stay good at planning. Simple things like writing a good spec are basic to software engineering."
Museum quality
It all comes back to the spec.
As I leave Cutler's office, I wrap the NT OS/2 spec in my jacket and head back to my building. I have three computers with lots of whiz-bangy, newfangled things running on them, but for a few hours it's the spec that holds my fascination. As a Microsoft geek, I feel like I'm holding a piece of history. And it turns out I am. This fall, the spec I borrowed for a time will join the Information Technology collection at the Smithsonian Institution's National Museum of American History in Washington, D.C. It's another good reason to write a spec.
I sputter a bit as I look at the clock on Cutler's desk, smile, and take a seat.
To begin the interview, I want to set the context. What was the team's initial vision for an operating system?
"We had five or six major goals," says Cutler. He pulls a copy of Helen Custer's book Inside Windows NT from his bookshelf and flips through the pages. Portability, reliability, extensibility, compatibility, performance. I think that's right. Let me see."
He goes back to the bookshelf to retrieve a thick black binder. The label on its spine says "NT OS/2 Design Workbook." He flips through some pages.
"Here," says Cutler, casually handing me the volume. "Why don't you borrow this? As long as I get it back," he continues. "I think it's one of the only copies left."
Four inches of spec
Inside the binder, separated by a dozen neatly arranged tabs, are 4 inches of documents that make up the original specification. Dated 1989 and 1990, they bear names like Kernel Chapter, Virtual Memory, I/O Management, File System, and Subsystem Security. Page 1 of the introduction, written by Lou Perazzoli, reads: "The NT OS/2 system is a portable implementation of OS/2 developed in a high-level language. The initial release of NT OS/2 is targeted for Intel 860-based hardware, which includes both personal computers and servers..."
I try to keep my expression casual as I set the volume on the table next to me. I know these guys find reverence irritating. I hope it's not raining. How will I get the spec back to my office? Doesn't this binder belong in a museum? God, I hope it's not raining.
"Do you think you achieved these goals?" I ask.
"We certainly achieved extensibility and portability," Lucovsky says. "We tested ourselves by not doing the x86 version first. We did the RISC (Reduced Instruction Set Computing) stuff first. It would have been so easy to drop the RISC support; everyone in the company wanted to. But the only way to achieve portability is to develop for more than one platform at a time. It cost us a lot to keep portability alive, but we did, and that has made it easy for us to respond to things like Merced," he says, referring to the 64-bit chip from Intel.
No embedded semantics in the kernel
At every step, Cutler and Lucovsky explain, the team prioritized design. They knew that the code they were building had to last for years. This meant thinking ahead, understanding, for example, that hardware would evolve perhaps drastically.
"We tried to create a system that had a good, solid design, as opposed to one that would run optimally on hardware of the time," Cutler explains. "When we started, we were working on 386/20's. At the time that was a big, honking machine. Since our design had to be portable, we didn't allow people to optimize code in assembly language, which is hardware specific. This was hard for the Microsoft mentality at the time. Everyone wanted to optimize code in assembler."
The original vision kept the operating system nimble. "We didn't embed operating-system semantics into the kernel," Cutler explains. "So when we switched from OS/2 to Windows, we didn't take a major hit. If we had built OS/2 threading or signals into the kernel, we would have been in trouble. Instead we built the OS in layers and created subsystems to handle OS/2, Windows, and POSIX."
Not everything can top the list
But the original vision also required tradeoffs. The team's engineering philosophy was to focus on one major area at a time. "That's why we wrote a spec," says Lucovsky. "The way we see it, write down what you're going to do, and then execute on it. Don't stand around dreaming, telling yourself, 'Wouldn't it be nice if...' We spelled out what we planned to do right there," he says, pointing to the spec sitting next to me, "and we stuck by what we said we would do."
Delegate the prioritization by using a template similar to this one.
> "Hey, I noticed [this thing that I think might be a problem]. Would you like me to [do a specific action that would probably fix it] or move onto something else?"
Yes but quality, naturally enough, takes more time. And while this might be great for organizations with plenty of buffer, for any start-up it has to be that "good today beats perfect tomorrow".
There is a difference between quality and overengineered features nobody asked for.
Nobody is suggesting we "do everything in the ideal way". We're suggesting taking 5 minutes to think about the thing you're writing in the next 30. It produces less buggy code and likely lets you complete the task with less iteration, because you made the mistakes in your head and changed plans before you wrote them out.
I close out 30 tickets a week; other teams at our company are lucky if they close out that many as a team. I'm not trading quantity for quality at all, I'm simply choosing to do the things that matter and ignoring everything else.
Buggy code can be a real timesink for startups... the failures often can't be ignored for long; they just sit and mature and compound some interest before they have to be fixed anyway.
And writing bug-free / low-bug code is about experience level and development methodology -- not about spending more wall-clock time. (I.e. for the quality you may spend more on salaries per hour, but not more hours -- probably a very good idea for a startup IMO)
I believe that in many situations high quality code is a LOWER net investment than low quality code.
We can't and shouldn't all aim to produce the Mona Lisa or Starry Night. Such expectations and perfectionism are unhealthy and often counterproductive. Sometimes you just need to showcase the product and fix the issues in beta. Going slow with low bug counts is nice sometimes and in theory, but certainly not every exception or edge case will be foreseeable or preventable even with an incremental approach.
I was sold this same line since I started my career in software as an intern ~15 years ago. Every year I've become less convinced that fully applying myself is worth it for my career. Any time that I've stepped up to fill a gap when it didn't directly benefit the short-term perception of my manager/skip-level, the effort was undersold or outright ignored.
Edit: ok - I'll admit, I didn't read the whole article at first. I still stand by my point, though - even when I'm simply identifying those gaps and not fixing them, I still don't see engineers being rewarded proportionally for the value they're bringing their employers.
Agree with this. Pain points are cleavage. Let them be felt, then leverage them into making your improvements known.
I so often see junior engineers making work for themselves by "fixing" things that weren't broken. This is often the case when they want to refactor something to make it better. If you can't measure the improvement, it's hard to make the case you actually improved things. Legacy code is like settled law, touched only when absolutely necessary.
> I still don't see engineers being rewarded proportionally for the value they're bringing their employers.
My theory is that's because the relationship between employers and engineers has an off-by-one error.
Non-software folks see 'managers managing software developers creating software' in a similar dynamic as 'managers managing drivers driving trucks'. Whereas I believe it is better to frame the function of software engineers as 'managers managing code handling data'.
From the business perspective, software engineers should be seen as managers. It's just that what they manage doesn't require HR or sick days. And just like a re-organization of the business, software updates might take more than a sprint or two.
Also, early in my career I had a boss who told me something like, "I know X is a problem. It's easy to point out X is a problem. It's hard to point out X is a problem, figure out a solution for it, and find the time/manpower/money to fix it."
Pointing out issues is easy. Most of the time you're not going to point out something that no one has noticed before. Many issues are known. They just haven't been solved because they are too hard compared to the payoff or they are not that big of an issue.
I'd take some of this advice one step further. In the examples of when not to step up to fix a problem, the author lists under-resourced teams. If you do step up to help out such teams, not only do you risk burning yourself out... you are also masking the problem. Helping those teams hit their targets sets up a long-term situation where leadership says, "They get their work done, so they must have enough resources."
It does feel off to not help in such situations. It feels like you are letting a team down. But in the long run, showing evidence that the team is under-staffed will force leadership to make a choice: either increase staffing or decrease scope. Either way, the team is adjusted to have a healthy match between their staffing and their workload, the people have a healthier work environment, and the business has a more predictable and reliable work output from that team. Everyone wins.
I'm not saying to never help a team. I'm saying to look deeply at why the team needs help before jumping in. Maybe they could be fixed with mentoring or process improvements. Maybe their PM needs guidance in a better way to roadmap the work. There are many cases where you can jump in and help. But if the team really is running well, but simply overworked and under-staffed, that needs a different fix.
Being able to parse out the differences of where direct help is a benefit vs. a hindrance is exactly the type of perspective that takes people beyond the "senior" level role.
Your take presumes that management is incapable of understanding these issues, or perhaps that managers are incapable of communicating when their teams are losing time fixing bugs of others.
While obviously there are bad companies and managers out there who operate like this, it’s far too cynical to assume every company is this broken.
If the only way you can think to communicate a problem to management is by letting a team fail and hoping management figures it out, that signals a massive communication/management failure. The proper response is to do what’s necessary to unblock your work, even if that means helping another team, and then also communicate very clearly what had to be done and how much of a setback it was. If management is paying any attention at all, they’ll collect these anecdotes and realize that the failing team isn’t getting their work done.
From the management side of things, when I see teams playing too much of the “not my job, not my problem” game to the point that they’re letting their objectives fall behind just to make a point about not helping with an issue, I also lose confidence in that team. Why? It’s a political move to let something fail just to make a point.
This doesn’t mean that teams should be quietly dedicating resources to compensate for other teams, though. It needs to be communicated clearly through the management chain to make it understood like professionals working together to accomplish shared goals. Making moves to let goals fail just to make a point is, like it or not, a petty political play. Communicate, don’t play politics.
> Making moves to let goals fail just to make a point is, like it or not, a petty political play. Communicate, don’t play politics.
We're not talking about your own team. We're talking about someone outside the team looking at how they can help. Letting your own team's goals fail would be petty and political. Looking at a team from outside and helping communicate the problem to management is exactly the communication you are asking for.
Some years ago I was one of the first 5 engineers at what became a semi-unicorn; in 5 years we spent close to 1bn before a sale to Tencent.
In the first 12 months I built a dozen microservices, each of which eventually had a team supporting them. As the team grew, I and one other guy were the 'glue' that kept much of the platform working; we knew how things went together, and by the time the Eng team hit 200+ staff we were indispensable.
After year 2, however, as several of our peers from the beginning moved into team management roles (something I prefer not to do), we noticed we were 'too important' to promote, or to allow to move, or to take off 2nd-level on-call (at all).
What started with us being the architects of the system turned into us being the 'glue' that kept a massive multi-country eng team operating, which eventually turned into being boxed into a shitty support role rather than promoted, watching people vastly less qualified get moved ahead of us.
Eventually I just quit and moved into a tech lead role at a startup for something different. I feel like this is a trap for IC roles, don't be so helpful as glue that you 'set' into an indispensable position.
I just started a new gig and I'm working on critical glue stuff that was nobody's job, so I can totally relate to this article. Don't stop at the title, which is deliberately provocative.
So far, identifying coworkers' ownership, planning, and validating glue to make things work has taken most of my time, but everyone appreciates it. Some veterans are astonished and say things like "wait, so you got Mr Grumpy to do that?" or "How on earth could you get a validation from Mr Nitpicker with a single meeting?"...
Well, acknowledging that "it's not my job" to do what is Mr Grumpy's and Mr Nitpicker's area of expertise is exactly what made things work. They feel recognized, and they also understand that the burden is on them if we collectively fuck up, so the glue project can move on fast.
One of the tricks up my sleeve is figuring out that sometimes the new guy is the only person who can get Mr Grumpy to change his mind.
There's one phenomenon where some people don't re-evaluate their position until they've encountered the same issue with multiple people. Sending new people to ask often works better than going yourself for the third time. You're just a repeat, not a new data point.
And then there's a form of the Curse of Knowledge, where the resistant person has heard all of the arguments and (circular?) reasoning from their coworkers and doesn't buy any of it. And then someone asks the question in plain English, with no conceptual or relationship biases. Either the way they ask it makes the person realize they're being unreasonable, or they dissect the problem along a new axis that suggests a way to escape the deadlock. Words and questions can push us toward some connections in our brains, and away from others. Rephrasing is sometimes all it takes to reframe a problem, and the more inured you are to the project jargon the harder it is to rephrase things.
I had to have this talk with my partner recently, who works at a small-ish company that hires people competent with data science, but absolutely useless when it comes to architecture, git or even setting up a python development environment.
She does all of those things really well and ended up putting a lot of effort into fighting chaos (spaghetti code, spaghetti git flow, a lot of copy pasted code that doesn't have to be, a complete lack of dependency management, no automation for anything) at that place. I stepped in when I suggested some shared vacation days and she refused because she wouldn't be there to be her team's guard rails. That screams leadership failure to me.
Keeping in mind she barely gets paid better than a junior dev.
The biggest issue with making everything your job is that it hides the problems growing in the company. If there's a gap that needs to be filled, if you fill that gap, you're increasing your workload, and the work is being covered, so management may not even know they need to hire to cover it. At some point you'll be doing so much gap work that you'll burn out, leaving all those gaps uncovered.
Obviously you shouldn't be saying "That's not my job", but that's not the point of the post. The point of the post is that you should point that gap out to your management, so they can close it. They may ask you to help cover it, until it's hired for. They may find another team to cover it. They may do nothing, in which case either the work isn't considered high enough priority to cover, or they're not good managers.
Don't be a hero for the sake of being a hero. Be the glue that binds everyone together.
That first paragraph, not a great start when I have to do mental gymnastics just to figure out what the blog post is even about.
If you can't write about a subject directly, use creative license and make the situation up. Nobody knows / is going to find out it's different, and it will help get your point across. I bailed on this article after the first paragraph.
>"If you can't write about a subject directly, use creative license and make the situation up. Nobody knows / is going to find it different and it will help get your point across."
Making up a situation—without disclosing that it is fiction—is risky in a practical sense and also morally questionable.
Readers typically don't like to be deceived when reading non-fiction. If a specific detail is fabricated and a reader notices, a reader can assume that the writer is sloppy/deceptive, and become far more skeptical with the rest of the author's works. Undisclosed fabrication can also lead to unsound conclusions (for example, an article could rely on the existence of a counter-example to disprove a rule, but a fabricated counter-example would make the argument fail to be valid).
Furthermore, suppose a reader cites an article with a fabricated story, relying on that story to make a point. If the story is false, the reader would take a reputational hit as well. In a moral sense, it's better for the reader to avoid fabricating situations.
It is typically good practice to make an article introduction easy-to-understand with minimal jargon, but creative license should always be disclosed when writing about a hypothetical situation. It leads to higher-quality reasoning (by avoiding arguments that rely on false anecdotal evidence) and is simple to disclose (e.g. "Consider a hypothetical: ...").
They’re called business fables, Five Dysfunctions of a Team being a classic, well-known, best-selling one. It’s a common way of conveying information that might come from several related experiences, built on to drive home meaning.
People don’t like to be deceived, to the point they’ll go out of their way to believe a story is true until presented with irrefutable evidence that it’s false. Even then they’ll prefer to believe the original lie. This is called the backfire effect.
You wrote four paragraphs assuming your parent comment urged authors to pass fiction for facts. There are tons of stylistic choices (using names like Bob and Alice, etc.) that make it clear when something is meant as an illustrative example. Let's not cancel the metaphor.
That aside, I'd be very concerned if someone took everything they read as truth unless declared otherwise. I think the overwhelmingly common behavior is to explicitly mark a real story, not the opposite.
I would amend that to "nobody cares"; it's totally fine to discuss a hypothetical if it lets you skip irrelevant details that bog down your point. Readers presumably are interested in the point you are making rather than the details.
This article is a good example where it would have helped. I did read it (just forget about the start; it gets readable a couple of paragraphs in), but I was also inclined to close the site after the initial paragraph. Starting like that doesn't invite you to read on.
I think the author—rightly in my case—assumed they have an audience who already knew the context based on the title, and the tweet which prompted it, pictured a bit further down. Not all of us spend a lot of time on Twitter; it’s okay to feel the intro lacked context.
It’s also okay to wade into something unfamiliar and patiently find out if you’ll get something out of it. Probably 90% of the best things I’ve read from a HN link have been completely context-bare for me when I clicked through to read.
You probably spent more time writing this than it would have taken to read far enough to gain the context you were lacking. Not that you’re obligated of course. But maybe it’d be more productive for you and everyone if you quickly dismiss by moving on to the next thing.
This could be said about any criticism. It’s also perfectly valid to point out flaws in a style of writing.
It’s so common, a lot of places have comment sections where people can leave their opinions.
As for making assumptions, that’s fine, I read heavy technical stuff on hn a lot and often find myself out of depth and unable to gain benefit that would apply to me from an article. So often I’ll just stop reading and go about my day.
But really, having an obtuse opener on a blog post serves no one. There’s just no flow to it.
I think this is part of the reason so many people focus on the comments over the article. I've found that whether there are a lot of comments and good healthy discussion is an excellent marker for article quality. The first paragraph(s) frequently are not.
I agree with the 90% experience, but I have to say it is very hit and miss: let's say like 50% of posts that start badly are bad throughout. (And I do consider starting with an assumed context to be starting badly. It takes just a sentence to describe the fact _that there is context_ not present in the article if you don't want to include it outright)
This article seems to be advocating sticking your head in the ground when something goes wrong
There’s a world of difference between telling others “that’s not my job” for tasks that have nothing to do with you, and owning the success of your own projects
And it feels like this author is conflating the two concepts
If there’s a dependency of yours that’s not working the way you need it to, it doesn’t mean you can simply go home free. At the very least you have to fill in the communication gap of informing management of the blockers.
If need be, then you have to own driving the call to either add more resources to your own project (to compensate for the bad dependency) or push forward the call to nix your project since it apparently has a high chance of failure
And if none of those attempts pan out, _then_ you can try to move on to a different place where you can actually be effective. But know that you’ll face this same type of challenges in your new role as well!
There’s no easy escape: if you took a position of leadership, then it’s definitely your job to lead
I read the article as saying you can't solve everything yourself. That's different than saying you should ignore problems. Instead you need to communicate when you see a problem that you're not positioned to solve, because you don't have the bandwidth or you're not in a position of authority for that domain
> you should be working on properly communicating the gap and its risk to the business (and risk to which part of the business) and NOT attempting to solve everything.
> There’s no easy escape: if you took a position of leadership, then it’s definitely your job to lead
Leading != fixing every problem that has no one assigned to it or filling every gap. It's about helping others so that overall the org can achieve its goal.
Leaders should be miles away, not in the trenches. You can ask Russia what happens when you send your general to the front lines. A similar effect happens in software when more senior devs take on all the work themselves.
The original tweet is saying the higher up you go, the more you have to do everything. That's wrong and extremely unhealthy.
I agree the original tweet is wrong. Imo as you become more senior, your role should become more and more tightly focused, narrowing down into things that are more important for the company. Examples might help here:
As a junior dev you can be expected to fix bugs in the product, work on features, maybe help with testing. Basically anything the company wants, to get you more experience. This is the role discussed in the tweet: filling in any small gaps.
As an intermediate dev I would expect you to start working on your own features (with help from a senior+). Only when you've finished these can you tackle anything else, like random bugs and suchlike.
Senior dev is more about helping others so you mentor juniors, help intermediate devs with their features and are given the slightly harder features to work on.
Principal devs shouldn't really be mentoring juniors. Certainly you can ask for advice, but I would have all my principals working on the hardest parts of the codebase: anything that requires research or things that the company hasn't done before. Examples of this would be adopting new technology, complex algorithms, etc. Seniors go to you for advice, not mentoring or step-by-step guidance on how to approach X or Y, that sort of thing. You shouldn't have a principal working on a low-level bug (say a spelling mistake or poor grammar); that's for people further down. There's an element of people management here as well.
Staff roles are mostly hands-off and are there to take the strategic direction that comes from the exec team, i.e. the CTO / CEO, and actually put it into action. So think working with product, coming up with reasonable plans that are actually doable. You'll work with the principals to execute your actions. If it's a big system and a big team, part of this will be working out how to split the work amongst the sub-teams.
This is what I mean by responsibilities changing as you become more senior: from fixing spelling mistakes to helping set and execute the roadmap of the product / service.
There is a class of these problems that's easy enough to talk about: "Glue work vs gap filling" is relatively easy topic. Broaching "doer/talker" ratios can be stickier.
Some not-my-jobs are evolved/strategic... they have meta reasons. Redoing the company's Google Sheets inventory-management horror might be high value, but is status-detrimental work for a senior dev or assistant PM to do. "Excel dude" is low status. Within excel-dude land, the highest (personal) value work might be indulging executive pedantry over the formatting of dashboards. UI is low status at some shops. Talking to clients and tracing back requirements is thankless work at other shops. Etc. Incentives can be wacky in orgs, after all.
People naturally fill in gaps when they're motivated and work independently. You can see this with hobbyists, personal projects, startups, Community OS, small teams, criminal gangs, children playing and whole lot of other human endeavors.
They row in the same direction when they want to go in that direction, perhaps to reach a destination. In 80% of orgs, 80% of 80% of people's concerns are related to happenings on the boat itself. The vector of the boat isn't their job. They don't care where the boat goes.
You need to analyze the organizational landscape to know what your role as a staff engineer really is.
If your manager has sole discretion on your compensation, then you need to be strongly aligned with her goals in order to succeed. If she sees her goals as solely delivery of her projects, then that's what you should work on.
If you work in an organization (like I do) where your compensation is determined by a group of leaders, then you need to look at what the group wants from you and deliver on that. In my group, a big part of that for staff-level engineers is making sure that the overall organization is successful.
Story for me this year: I noticed another team struggling. Their deliverable is really important to my boss 3 levels up, and I have the background to help them get unblocked and be successful. This team works on something completely disconnected from my manager's goals.
I started working with them, and after a month or two told my manager I was working on it. She agreed that it was a good use of my time, and my involvement didn't interfere with other commitments we'd made. I've probably spent about 3 weeks in total on it. At review time, there was widespread agreement that it was the most important thing I've done for the year.
To close - As a staff engineer, you'll see lots of gaps (if you don't see them, you're not a staff engineer). Most of them should be handed off to others. But some of the gaps are things you're uniquely suited to fill, even if it's not precisely in your area of responsibility. They're "not your job" but exactly the right thing for you to be doing. If your role is too narrowly defined, go negotiate with your manager (or better, your manager's manager) to broaden your role.
Yes. And sometimes, when the standard is so low, it's impossible to convince anyone that it is broken. Especially when you indicate the effort needed to fix things.
Sometimes 'struggling by' is the modus operandi. What comes with this is the toxic culture, high staff turnover and disrespect of anything new by the people who have suffered through it all for a long time.
Screw that. If I'm not getting paid for it and the company isn't invested in my well-being or I'm seen as a replaceable cog, then I'm not taking on extra responsibilities for no pay or reason. I'll let the executives worry about business shortcomings and only do my job. Not two others as well. If I'm being paid to own my work and its attendant liabilities, I expect to be paid for that risk.
On top of that, agreement isn't always necessary. You can disagree with coworkers and still execute just fine.
Some things have never had an owner, others can’t keep one.
In the latter case, at least in my experience, it’s because the ‘thing’ is actually composed of several things that clearly have different owners but the link was lost in the composition.
The furthest I’d go with pat advice is own finding the owner. If it’s truly homeless then feel free to take it in as a rescue. If it’s a rat king, grab your side cutters and get to work.
Don't bring up buying a new boat/whatever until she does. "Honey, your boat really looks bad and has so many problems, maybe you should think about getting a new one"
If I owned a boat and tried that with my wife it'd be more like, "Honey, your boat really looks bad and has so many problems, it's time you got rid of it."
> This article seems to be advocating sticking your head in the ground when something goes wrong
Yeah, sometimes all you can do is raise awareness. And if the ongoing issue is going to affect your deliverables, then you better be raising awareness!
Saying “that’s not my job” isn’t for anyone in a leadership position
The phrase "It's not my job" is never good to say,
Often, focusing on the things that are explicitly your job and intentionally ignoring bigger problems is a good move so that the people who are responsible for those issues will allow you to fix them.
I had this theory a while ago that accumulation of this sort of thinking is why big corps without regulatory capture stop innovating and then wither and die - “if it was so important someone would surely tell me to do it”
Adolescents act out because they quickly realize that the adults in their life give a lot more attention to problems than to things going well. I think this makes sense, and there's a psychological basis for this. I'm not surprised then that it works the same way in the workplace. No one is going to be willing to change things unless they feel the pain associated with not changing it. If someone keeps putting out fires at the expense of their work-life balance or health, then their superiors will simply direct their finite attention at the fires that weren't put out - just as the honor roll student with a dropout sibling probably doesn't get a ton of attention from their parents.
My experience working for a big company is that most things are not your job. It's not that you get to say "it's not my job", it's that you are not allowed to make it your job.
I like working for a smaller company because I can make things my job. In any company there are things that need doing that either aren't being done or are being done suboptimally. Almost every day I spot something and think "I could do that". So I make it my job.
This is a topic close to me, and one I have problems expressing my thoughts about without coming across as cynical. My bottom line is "not my job" is a way to avoid companies taking advantage of you.
I'll give you a real, specific example. My wife joined a company as a junior fresh grad. As she grew into the role, she started expanding her responsibilities to get to the next level.
The first time she got passed over for a promotion, she decided to take on even more responsibilities to strengthen her case. She was performing at the next level while getting paid the absolute lowest salary for her role. By the end of her tenure she was literally mentoring a newly hired Senior, and still got passed over for a promo for the third time because "she wasn't ready yet".
Stretch assignments are a great way to grow, but there is a clear asymmetry of power here. Companies will easily extract as much value as they can from you and give nothing in return. I think it is perfectly reasonable to set boundaries and say not my job. The difficult part is choosing where that boundary should be.
If I do stuff that is not my job it will become my job to do that stuff. I'm not sure if I want the additional responsibilities or increased workload, especially when there's no guarantee that it will go hand in hand with pay increase. Why bother?
I can't be the only one exhausted from all these pointless bloggers trying to point to how their promotion to hierarchical title N is justified because it's arbitrarily different than N - 1 at their org. Let's just acknowledge these things don't actually map across different organizations and therefore have no meaning. Hierarchical titles are for engineers that hide their mediocrity behind HR-appointed titles.
Like a lot of things in life, judgement is required. To dismiss levels as "having no meaning" is to misunderstand their purpose and limitations. Someone being promoted to Staff, Senior Staff, Principal, Distinguished at top tech companies says a lot. On the other hand, getting hired into Staff means less. In all cases though, the specifics depend on a lot of (usually) nameless, faceless individuals who also are subject to their own biases.
I'll tell you this though, having interviewed hundreds and hired dozens over the last 25 years: someone unilaterally declaring levels to be bullshit for the mediocre is a pretty strong negative signal about their ability to succeed in a large organization outside of specialist positions in research or other isolated roles that don't require cross-team or cross-functional influence.
> Hierarchical titles are for engineers that hide their mediocrity behind HR appointed titles.
Disagree. The motivation is more around the financial benefits that come from it. Which, honestly, I wouldn’t blame anyone for wanting, considering the CoL in most major US metros.
I do think it's worth attempting to create a technical career path for engineers though. The current iteration may not be perfect, but it's an improvement over the previous thing, which was that you had to move into managerial positions to make “career progress”. In fact one of the criticisms I have of the upper levels of this rat race is that it devolves into more managerial work anyways, reducing the amount of time engineers spend in the trenches (so to speak) and contributing to the decline of their skills.
“I have a couple of sayings that are pertinent. The first is: ‘Successful people do what unsuccessful people won’t.’ The second is: ‘If you don’t put them [bugs] in, you don’t have to take them out.’ I am a person that wants to do the work. I don’t want to just think about it and let someone else do it. When presented with a programming problem, I formulate an appropriate solution and then proceed to write the code. While writing the code, I continually mentally execute the code in my head in an attempt to flush out any bugs. I am a great believer in incremental implementation, where a piece of the solution is done, verified to work properly, and then move on to the next piece. In my case, this leads to faster implementation with fewer bugs. Quality is my No. 1 constraint – always. I don’t want to produce any code that has bugs – none.
“So my advice is be a thinker doer,” Cutler concluded. “Focus on the problem to solve and give it your undivided attention and effort. Produce high-quality work that is secure.”
https://news.microsoft.com/features/the-engineers-engineer-c...
Unfortunately, I do not have this level of talent, but I do what I can.