Thanks, but this isn’t my first rodeo. In the future, please exercise a bit more discretion when whipping out the snark.
> software engineers can't estimate how long something will take with any kind of accuracy.
This is both irrelevant and wrong.
It’s irrelevant because t-shirt sizes, story points, and other abstract measures are - intentionally - not measures of time. They’re measures of effort, benchmarked against other units of work. Yes, this can sometimes give a vague indication of time. It’s also useful for other reasons, like surfacing whether everyone is on the same page about what needs to be done in the first place. All of this is explained in literally any primer on the subject.
You’re wrong in saying that software engineers aren’t capable of estimating effort (or even time) with any degree of accuracy. They can. I can tell you that my Python hello world script will take less time and effort than rewriting the Linux kernel. None of the “research” and “experience” that you so confidently refer to says what you think it does. It says that there are big limitations to the degree to which timelines can be estimated. This is entirely true. But there’s nuance to it. You’re so desperate to find a shortcut to being smart on the internet that you’re spreading blatant disinformation in the process.
They're wrong about 60% of the time by overestimating, and when they underestimate, they're so vastly wrong it's terrifying. I remember one time I merely had to update a component in prod. Everything went fine in staging, then when I pressed the "button" in prod ... all hell broke loose. We spent the next 4 days fixing it.
I never wrote that software engineers can't estimate effort; I said they can't estimate time. But you're accusing me of the former.
I think you've linked to a study of "expert project managers", and we might see similar results in a study of whether "expert project managers" can succeed in tying their own shoelaces.
If you're working with a system where your staging environment is not sufficiently close to your prod environment to be entirely predictive of behaviour, that's a "known unknown" and should be in the estimate.
The reason it failed in prod was entirely unrelated to it being prod. The same could have happened in staging. IIRC, the error was entirely due to a RST packet from some external system during the upgrade. It was a bug in the upgrading system that should have been accounted for, had anyone known it existed. Identifying the root cause of the failure was what took the most time. Had deployments been idempotent, it probably could have been resolved in moments as well ... but here we are, 15 years later, with lots of lessons learned.
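To make the "idempotent" point concrete, here's a toy Python sketch (all names hypothetical, nothing like the actual system): every step checks the current state before acting, so the whole deploy can simply be re-run after a transient failure like that stray RST.

    # Toy sketch of idempotent deploy steps (hypothetical host APIs).
    def ensure_component_version(host, component, wanted_version):
        """Upgrade only if the host isn't already at the wanted version."""
        if host.installed_version(component) == wanted_version:  # assumed query API
            return "already-done"  # re-runs become no-ops
        host.install(component, wanted_version)  # assumed install API
        return "upgraded"

    def deploy(hosts, component, wanted_version, retries=3):
        # Each step converges on a goal state, so after a partial failure
        # (dropped connection, stray RST) you just run the whole thing again.
        for host in hosts:
            for attempt in range(retries):
                try:
                    ensure_component_version(host, component, wanted_version)
                    break
                except ConnectionError:
                    if attempt == retries - 1:
                        raise

With a non-idempotent upgrade you can't just safely re-run it, which is the difference between a quick retry and days of firefighting.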
Sounds annoying, but seems like you found a bug in the upgrading system that could have struck anyone during any change?
The time/work to investigate and fix it probably wasn't considered (or shouldn't have been, at least) part of the work on the component you were changing - that was just delayed, same as it would be in scenarios like "Dave got hit by a bus and he's the only one with the prod password" and "Our CI service suddenly went out of business and we need to migrate everything".
My point is that you can't estimate time with any accuracy. At the end of the day, even this fix, shenanigans and all, was still "easy" once we knew what was going on. The effort never changed, and on that measure we would have been dead on. The issue is trying to say, "It will take me two weeks to do this," and have it actually take two weeks -- there are simply too many unknowns for ANY task in our industry for us to be confident in that assessment.
Not to the day, but you can estimate a range based on experience. After that deployment issue you may add "release could be delayed by up to a week" to future estimates until you're sure it's fixed.
I've written TV apps and in that world I've often given estimates that are 5 days of actual work but, because Samsung's QA process can take 6 weeks and spurious rejections are common, "deployment" will often take literally months.
Time to release and time for development can be totally different things, and it's arguable whether "waiting" time should be included in any individual estimate at all. (If you're adding 4 separate features and doing 2 bugfixes in one release, which one gets the +2 months? In reality, "submit/release" becomes a separate ticket/task.)