Author here. For what it's worth, I (mostly) think you're right! IMO heroics were a huge problem at that company, and are a huge problem at the company I work at now too. The gruntwork is really really important.
I tried to get at this in the last section, that it's not about going out trying to be a hero, and it's unfortunate it didn't get across. That was my point that you shouldn't go look for company priorities to fix, to be a hero. Just look around you, see what's going on, and do good shit. Sometimes yeah that does involve heroics, sometimes it's just gruntwork.
I don't want to go back and edit the essay now but I think it could be much clearer indeed!
> Imagine the same story but somebody two months earlier had voiced a concern that Jabberwocky and ChaChing wouldn't play well together. Pushing for the APIs to be harmonized so that they could play together and integrate. [...]
Disagree here, though it's maybe not obvious from the bits of the story I told why. ChaChing was one of... maybe three or four dozen experiments like it. All of the others failed. There was no reason to believe ChaChing would be any different, and if it did work, even the best-case expectations were well below what it ended up doing. It really was a lightning strike. So it would have been a huge waste of time to do all of that work, delaying Jabberwocky, for a bunch of things which never ended up shipping.
To give a different point of view, the message I got out of this was exactly what you were intending. I didn't come away with "go find opportunities to be a hero". I found the opposite in fact, that you're saying ~don't~ go looking specifically for those heroic opportunities. I feel that you reinforced this point in your closing statements quite well:
>It isn’t about creating bigger and bigger opportunities for yourself — it’s not about selfishly inventing self-serving projects. Rather, it’s about getting better at recognizing and taking advantage of bigger opportunities which are already there and just making things happen.
> you shouldn't go look for company priorities to fix, to be a hero
The "to be a hero" is the problem, you can always feel good about solving hard problems and finding your boundaries, but "avoiding a crisis" is a thankless job & we should thank those people more than the "only I can fix it" people.
The core problem for me is that the constructive "avoid crisis" people are baked out of these sorts of events, rather than born risk averse in the first place.
> Just look around you, see what's going on, and do good shit.
One of the biggest side-effects of WFH was that this whole "who's around" mechanic has faded away without a proper replacement.
Most of my satisfying work came from random interactions like this (not the most impactful or the most purposeful).
I was a principal engineer and most of my 10 AM to 2 PM was taken up by meetings which mostly involved preventing work duplication or stopping someone from driving down a road where I knew there was a dead end (like, "I see you're using alpine docker images, why?" or more management boundary stuff like "the difference between these two ideas is ~$140k over a year for all customers - can we agree that the expensive one is better for customers, because they don't see a $12/month bill and it costs more than $140k to build the cheap one in NRE?").
Most of that work is tiring and always unproductive - the actual work is done by smart people who answer my questions carefully, the dumber the questions, usually the better the results.
But sometime around 3 pm, I'd be in line to get a coffee and I would run into a PM or someone from support or dev who would have a question for me which sounds nonsensical at first - but having an hour and a half to go down that rabbit hole looking over someone's shoulder usually ended up with mostly "hey, I know what's happening, I've seen this before" or the classic "hmm, that's strange, how could that even happen?".
Six to ten weeks later, I would be in a meeting where I would get some sense of "deja vu" when someone describes their approach and reach back to that day & that problem, to ask "how would you detect X" (something like "machine reboots, comes back with same hostname, but different IP" - would you re-establish connections to IP or hostname, what about krb5 ... yada yada).
That part is the real value as an engineer, but entirely unappreciated from the management's point of view.
With WFH, I would go get a coffee, go for a run/walk and as good as it was for my state of well being, I would eventually get pulled into these crisis situations Thursday at 9 pm instead of Tuesday at 3.
I like WFH for the work I actually have to do, but this sort of temporal serendipity is not easy to recreate on Zoom.
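For anyone wondering what the "same hostname, different IP" question looks like in practice, here is a rough sketch of one possible answer: re-resolve the hostname on every reconnect and flag when the address set changed, instead of reconnecting to a cached IP. The `Peer` class, `reconnect_targets` and the injectable `resolver` are made-up names for illustration, not anything from the system described above, and this ignores the krb5 side entirely.

```python
import socket
from typing import Callable, List, Tuple

Resolver = Callable[[str], List[str]]


def system_resolver(hostname: str) -> List[str]:
    """Ask the OS resolver for the addresses a hostname currently maps to."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    ips: List[str] = []
    for info in infos:
        ip = info[4][0]  # sockaddr is the 5th field; its first item is the IP
        if ip not in ips:
            ips.append(ip)
    return ips


class Peer:
    """Hypothetical peer handle that remembers the hostname, not just the IP."""

    def __init__(self, hostname: str, resolver: Resolver = system_resolver):
        self.hostname = hostname
        self.resolver = resolver
        self.last_ips = resolver(hostname)

    def reconnect_targets(self) -> Tuple[List[str], bool]:
        """Re-resolve before reconnecting; report whether the peer moved."""
        current = self.resolver(self.hostname)
        moved = set(current) != set(self.last_ips)
        self.last_ips = current
        return current, moved
```

The resolver is passed in so the "machine came back with a new IP" case can be exercised with a fake lookup table rather than real DNS. Whether hostname or IP is the right identity is exactly the kind of question that depends on the system, so treat this as one option and not the answer.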
It comes up so often because it is a measurably good idea when you look at container sizes or startup times (gzip is a terrible, terrible way to package container images).
An engineer tries it out and it works great, because alpine is lean, fast and generally good.
I work on performance, so any decision to make things slower gets routed through my desk - the perf buck stops at my desk. Now it's my thankless job to squash her initiative, and as gently as possible (there are lots of smart people in the industry, but being kind has a better ROI), so that it is not a "this is policy" form-letter appeal to authority, but a walk through the entire list of tickets in my notes.
Because we hit a bunch of SEGVs with the JVM and the JDK team basically closes them as WONTFIX when reported.
And this sort of decision has a shelf life, I can be right for 18 months and wrong the week after - so this is not a hard line.
This specific thing is definitely not a permanent thing, but a temporary headache - the JDK team is working on Portola and that'll fix any issues they had.
If you're going to install glibc as a workaround then you're basically giving up the space savings anyway.
We still break things in production, but counterintuitively I'm happier when the problem is bang in the middle of code I can commit to, rather than deep inside musl -> jdk interactions on what happens with a longjmp on a SIGSEGV.
Also the turnaround on reported CVEs was much faster across teams if everyone picked the exact same base image - standardizing on Red Hat Universal Base Image worked out (& also AquaSec scans etc. are easier if 6 different products have the same image down there).
Amazon Corretto does have alpine images, so this is sort of 2020 advice with rapidly declining value (I'm funemployed for 2022, so my current "weakly held" opinions are entirely about rust syntax).