The most productive teams I've run have been the ones using OKRs or OKR-like schemes.
They're hard to get right, though. It's easy to add too much process, make planning too much of a burden, or fail to check in regularly.
One concept that's helped lubricate things is "task-relevant maturity", which I first heard about in Andy Grove's _High Output Management_. It's a gross phrase, but it essentially means that people who could accomplish a goal in their sleep need less help than someone facing something new. Accordingly, I've cut more slack to the former when laying out OKRs.
It's hard to overstate how valuable that is. Senior engineers who've been in the same space for eons chafe at having to do the same pedantic things as a junior engineer, and rightly so: they've seen a million managers and fads come and go.
To the point of the article, I _really_ like the concept, but I'm wary of demanding another step for fear that it won't take. Usually I try something out by myself but keep it optional for a year or two, or forever.
Situational Leadership might resonate with you. The basic premise is almost exactly what you say above about senior engineers. SL breaks it down into four stages:
1. Beginner: you tell them exactly what to do and how to do it.
2. Novice: you mostly tell them what to do and encourage them to learn and experiment.
3. Experienced: you consult with them on what to do, and they mostly manage how to do it.
4. Expert: you give them the goal and mostly just ask them how they'll approach it and how long it will take.
The problem with OKRs is that they are a religion. Once a metric is established it becomes a god to worship at the expense of all else, until the harmful effects of such single-mindedness become painful enough that the old god is deposed and a new god put in its place, with the same myopic thought process. True leaders call this blindness "focusing on what's important".
Simplifying something as complex as an engineering team or a company down to a couple of variables cannot possibly model the real world. You can't say this out loud though, because this goes against established leadership culture. You can't be one of them philosophers. We've got to have achievable goals here!
Example: once upon a time a goal was instituted where tasks were supposed to be assigned to the engineer with the most experience in a particular area. Focus! Faster time to close a ticket! Happier engineers - they work on what they are good at!
See the problem yet? Meeting the goal actually made it so that there were a ton of bottlenecks. If an engineer was out sick the team lost an expert in a particular area. Engineers got bored working on the same thing day in and day out. And because the OKR killed knowledge transfer it became impossible to tackle a problem as a group.
And so the old metric was declared a problem and a new ridiculously limited model of the world was put in its place.
Focusing your efforts on one thing seems like a good idea, provided proper research and discovery are done and the metric to improve is measurable and meaningful.
I think the criticism you're voicing comes down to management style rather than OKRs. I would never isolate my team members into separate areas or let knowledge concentrate in one person. Always let the group tackle the problem and let them decide who is going to work on it. Encourage pair programming, and when tackling a large, new problem, involve the entire team, especially during problem discovery.
Smaller stuff can be picked up by the same engineer. A bug fix here, a small adjustment there. But as soon as you introduce a big chunk of business logic, it's important to bring the team along. I feel like that's not what happened in your case, but I don't see it as a fault of OKRs.
> Progress is more like a hill than a straight line. Uphill: figuring things out (uncertainty, unknowns, problem-solving). Downhill: making things happen (certainty, confidence, execution)
This is the first time I've seen a "hill chart" and I found it a little confusing to look at - maybe because the shape looks like a normal distribution, where "the sides" are more uncertain and not less uncertain. If one turned a hill chart inside out, you'd get a chart like a dartboard, where as things are more certain you are closer to a bullseye in the middle, and less certain you are further away. I could see looking at OKRs that way.
I kinda like the hill chart, but TBH it shouldn't look like a normal distribution; more like a half-circle. In my experience, most (SW) projects (at all levels of granularity) are S-curves (not straight lines like in a burndown chart). You're slow at the beginning because you're learning the domain and finding good abstractions. You're slow at the end because you're working out the kinks and fixing little details. The fastest progress is in the middle. So if the hill chart is supposed to show the difficulty of progress, it should look like the derivative of an inverted S-curve.
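To put a little math behind that (a sketch, assuming logistic progress): if cumulative progress is

$$p(t) = \frac{1}{1 + e^{-k(t - t_0)}},$$

then the rate of progress is

$$p'(t) = k\,p(t)\bigl(1 - p(t)\bigr),$$

which is bell-shaped and peaks mid-project at $t_0$: the slow-fast-slow pattern above. Remaining work $1 - p(t)$ is the inverted S-curve, and its derivative is just $-p'(t)$.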
Going up is difficult but going down takes a lot longer than expected from the top, usually; so yeah, the hill should be skewed to the left and have a fat right tail.
The entire Shape Up philosophy from Basecamp is honestly the best project management style I've been a part of in software engineering. The inversion of thinking around time (how long do I want this to take vs. how long will this take?) is the most critical aspect of it.
Yup, that is so obvious in hindsight but I hadn't thought of it that way before. "We're willing to spend this much time, will you be able to do it?" is a much easier question both to ask and answer than "how much time will it take?"
It also encourages collaboration over adversarial negotiation.
I think that completing a training or sharing the results are, technically speaking, deliverables, not results; that's why they don't work as an OKR. An OKR would ask for the measurable impact that completing those things has on business behavior (user-facing KPIs, or internal productivity / value measures).
My only encounter with OKRs sucked badly, and mismatches like that were one of the reasons.
This is the problem with both SMART goals and OKRs. They purport to be meaningful by measuring results, but results are outside of one's control and subject to the inputs of Lady Fortune!
What we want to incentivise is not success, but the kind of behaviour that leads to success. But that, on the other hand, suffers even worse from Goodhart's law.
Cedric Chin dug deeply into this dilemma[1] about a year ago. His suggestion is to frequently follow up on both behaviour metrics and result metrics to build a tacit understanding of how one is affected by another. This then allows you to focus improvements on behaviour metrics, which are mostly in your control.
I have yet to try putting it into practice, but I like the idea.
Sure, OKRs can help keep multiple teams aligned on business and product goals, and that's an advantage too big to ignore. But too often they're handed down by well-meaning people multiple levels removed from the day-to-day software development work, so there can be a lack of ground truth: they miss the bottom-up intelligence, creativity, and engineering needs that come from the people who work on the product every day. It's also important not to load your product roadmap into OKRs and call it a day, or else you'll end up with teams of people who check boxes and software that barely makes it out of the driveway. Personally, I would encourage a reflection period after each series of OKRs. Not just to discuss outcomes amongst the senior leaders, but to collect intelligence on the ground: talk to devs, talk to users, talk to support reps, and decide what really should be the priority. But hey, what do I know, I'm just a person whose comment you're reading.
The biggest problem I've seen with OKRs as an engineer is that the objectives are business goals the engineering team has little or no ability to influence. Like, an objective of "onboard 5 new large customers" makes sense for the business, and the engineering team could definitely screw it up by, say, building a system that can't scale that high. But when the sales team only closes 3 new deals, the engineering team fails its OKRs no matter how good a job everyone did. I've seen this be quite demoralizing, especially when meeting OKRs is tied to bonuses.
The second-biggest problem is that the time horizons of OKRs are often longer than the interval between pivots that cause the old OKRs to no longer make sense.
Or building things that help retain current customers vs acquire new customers so sales has something to sell.
The bigger problem, as I've seen it, is engineers who don't see themselves as part of the business, whether because of the culture or a personal choice not to invest in soft skills.
All too often engineers spend zero time understanding the market and customers, and scaled agile, as it's typically implemented and managed, definitely doesn't help.
> All too often engineers spend zero time understanding the market and customers,
From my perspective the problem is that companies invest zero time in training engineers about the market and customers, and instead just treat engineers like assembly-line cogs that produce business value on demand.
At a certain scale you don't need to make OKRs fun and playful - there are enough VPs that all the OKRs read like an episode of Silicon Valley.
If you're big enough to have company-level OKRs written solely by folks who haven't opened a text editor in 10 years, simply get out the popcorn and a can of beer. There are going to be some hilarious ones in there.
This concept is something of a breakthrough for me, personally. I have fairly severe ADHD, and it took me about fifteen years in this industry to really find my place in it. My "hill chart" is more of a "valley chart".
In dealing with my own mental abilities, I've developed a toolset whose strengths lie in aggressively removing uncertainty and finding the optimal path to implementation, because implementation is the part that I struggle with.
Now I'm wondering if the very effective teams I've been on were composed of people whose "charts" overlap in a way where someone is always in their "downhill" portion while others face the "uphill".
I can't help but feel infantilized by these games. It doesn't make unwanted things easier to swallow; quite the opposite.
"Here comes the airplane!" is a little game that I like to play with PMs so they get to enjoy me punching them in the face. Aren't we all having fun now, how wonderful!
Execs can't do this themselves because they don't have the information they need, and are trying to be predictable and compete in a market. So think about it as an ELI5 exercise where you're infantilizing up, or as it's more commonly and professionally referred to, managing up, or more plainly, helping the business make better decisions.
Helping the boss convince the market he made better decisions.
I often feel that frustration when I'm writing some data-crunching report-extraction thing: the knowledge that the output will be some administrator squinting shrewdly at it and thinking himself better informed. Then he'll do pretty much whatever he was going to do anyway.
And that's the better case. I suspect in many cases they don't look at the information they ask for at all.
Maybe with a shitty boss and a complacent environment. Help them not squint at it and make a gut decision. Why not instead ensure the objective of the report is clear, that there's a meaningful target on which to hinge a decision, and present the data in a way that helps make it? And if you lack the time or resources to do that, work on those problems?
OKRs are a goal-setting framework for non-trivial 3-12 month goals.
Some people are self-motivated and well organized, are great at proactively communicating progress to other stakeholders, and understand the idea of cross-departmental alignment. OKRs will not help them.
For everyone else, OKRs are a tool that can help accomplish those things.
PS: I actually like OKRs, and after a lot of effort I learned how to make them useful. I did not get it at first either.
The problem for me is the “key results” part of OKRs. It means that if it's not measurable, it's not an OKR, and in too many organizations, if it's not an OKR it's not worth doing.
Cleaning up your code base to accommodate the gradual accumulation of small fixes/hacks is not something you can put a metric on, at least not without pulling numbers out of your ass. But everyone agrees that you can't really have quality software without doing it. Yet OKRs would say that making nothing but tactical changes to ship features and never revisiting architecture is perfectly great. The incentives always seem to push you toward tech debt.
OKR proponents would say that revisiting architecture and paying down tech debt should be implicit and part of the process of achieving your results, but I’ve never seen it done. Or rather the only time I have seen it done is when someone tries to shoehorn the refactoring work into an OKR in order to justify it, making up bogus metrics, getting the OKR dropped because it’s not meaningful enough, and then just working on it anyway.
"Cleaning up your code base to accommodate all the gradual accumulation of small fixes/hacks is not something you can put a metric on" - why not? generally the need to refactor is driven by something - getting too hard to make changes? productivity down? introducing more errors then we used to? In those cases improving productivity or reducing errors is the result we're targeting, and cleaning the code is the activity we do to achieve the result.
> In those cases improving productivity or reducing errors is the result we're targeting, and cleaning the code is the activity we do to achieve the result.
Right, but please read the first sentence of my post. It's the “key results” part that's hard, because you need to quantify all the benefits you're targeting, giving them a number, so that you can show whether you completed your goal or not. If you say “productivity will go up”, you have to put a number on current productivity, then give regular reports on what happens to that number after the refactoring. What do you pick? Number of PRs merged per day? That's probably going to go down, because most of today's PRs are small tactical band-aid fixes. So do you say the number of PRs merged should go down after the refactoring? That could just as easily mean the refactoring made things worse and everything's so broken that nobody can make changes. So PRs merged is a shitty metric. What else? Bugs filed? In a product where you're growing users you'd probably expect those to go up due to increased usage, so any benefit from refactoring is likely to be lost in the noise. Lines-of-code count? Please.
More often than not people just pull whatever metric they want out of their ass to make the case for what they're trying to accomplish, and cherry-pick things so it looks better after the effort. But it's against the spirit of OKRs to do this, which is why OKRs are bad for anything “fuzzy” like refactoring. You have to shoehorn work that everyone agrees is worth doing into a framework that isn't designed for it, just to make the case.
You are hitting on an important and difficult aspect of OKRs: getting alignment between what you can affect (the leading indicator) and the outcome (business value).
It's not an exact science. You can make pro and con arguments against different things that could conceivably be measured. This is where experience and strategic thinking help.
You can always come up with a risk, or a reason why a particular measurement won't affect the desired outcome. You will be more right on some and less right on others. However, throwing out the entire OKR approach because you cannot be sure is not correct either.
If refactoring code doesn't lead to fewer defects down the road, or to faster feature implementation with fewer errors, or faster employee onboarding, or any other visible result, then maybe management doesn't want you to do it, and maybe it's reasonable to consider why.
But it's really, really, really hard to quantify in a measurable way. Which is what OKRs force you to think about: what is the metric, what is its current value, and what is your goal for it, so you know whether you achieved it?
Can you quantify “faster employee onboarding”, reliably? Can you graph it over time? Can you quantify “faster feature implementation” in a way you can actually measure that isn’t sensitive to the fact that all features are different?
The key results are not a problem just for you! You are hitting the nail on the head. Check out section 6 of this study ( https://arxiv.org/pdf/2311.00236.pdf ) - defining good OKRs is problem number 1, and data issues are the 2nd most cited concern!
I wrote a piece about the common issues that people face creating OKRs. There are a few common mistakes that people make which makes key results unmeasurable: https://koliber.com/articles/top-okr-mistakes
> the gradual accumulation of small fixes/hacks is not something you can put a metric on
I've done it before. On one team, we had a goal to reduce the number of linting errors and warnings from 18,000+ by 50% (while not growing the number of IGNOREs). The team was reluctant at first, because "it's only linting and it does not matter." But they relented and eventually started fixing things here and there. And the number started going down, albeit slowly. Over time we got the number of linting errors down to 18 (or something close), because people found time here and there to improve things. The team learned how to use OKRs. They put in place a style guide and an auto-linter, and started using it so the errors did not come back. And there were plans to put in more sophisticated style analysis and run another OKR against that.
They literally matured in their code development practices way beyond just linting, because of the relentless drive on one seemingly insignificant OKR.
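A minimal sketch of the kind of auto-linter ratchet I mean (hypothetical, not our exact setup; it assumes ESLint, and the baseline file name is made up):

```ts
// ratchet.ts -- fail CI if the lint problem count ever goes up,
// and tighten the baseline whenever it goes down.
import { execSync } from "node:child_process";
import { existsSync, readFileSync, writeFileSync } from "node:fs";

const BASELINE_FILE = ".lint-baseline.json"; // hypothetical file name

function runEslintJson(): string {
  try {
    return execSync("npx eslint . -f json", {
      encoding: "utf8",
      maxBuffer: 64 * 1024 * 1024,
    });
  } catch (e: any) {
    // ESLint exits non-zero when it finds problems; the JSON report is still on stdout.
    return e.stdout;
  }
}

// ESLint's JSON formatter emits one { errorCount, warningCount, ... } per file.
const results: Array<{ errorCount: number; warningCount: number }> =
  JSON.parse(runEslintJson());
const current = results.reduce((n, r) => n + r.errorCount + r.warningCount, 0);

const baseline: number = existsSync(BASELINE_FILE)
  ? JSON.parse(readFileSync(BASELINE_FILE, "utf8")).count
  : Number.MAX_SAFE_INTEGER;

if (current > baseline) {
  console.error(`Lint count went up: ${current} > baseline ${baseline}`);
  process.exit(1); // the ratchet only turns one way
} else if (current < baseline) {
  writeFileSync(BASELINE_FILE, JSON.stringify({ count: current }));
  console.log(`Ratchet tightened: ${baseline} -> ${current}`);
}
```

Checking the baseline file into the repo makes the ratchet visible in review, and the weekly OKR number is just the count sitting in that file.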
This is just one example. You can use OKRs with engineering metrics to improve lots of things:
- fix the top 10 Jira tickets tagged with #techdebt
- reduce linting errors by 20%
- reduce the number of functions with a cyclomatic complexity of 10+ by 50%
- research 5 static code analysis tools
- increase unit test code coverage from 56% to 62%
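For the test-coverage item, one way to hold each gain in place is a ratcheted threshold in the test runner's config. A sketch assuming Jest, with numbers mirroring the example above:

```ts
// jest.config.ts -- fail the test run if coverage drops below the current floor
import type { Config } from "jest";

const config: Config = {
  collectCoverage: true,
  coverageThreshold: {
    // Start at today's baseline (56%) and bump these toward the 62% target
    // each time the team lands a real improvement.
    global: { lines: 56, statements: 56 },
  },
};

export default config;
```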
You can go many different ways. I've helped engineering teams do this well, starting with deciding what makes sense to improve and getting buy-in, through defining the OKRs, building the system of measuring it, and most importantly, driving the OKR every week.
In the case you cited, with a bunch of hacks, I'd approach it like this:
- Create an OKR like "Reduce tech debt".
- One key result would be "Identify 50 hacky places in the code and create Jira tickets for them by Jan 31st", or something similar.
- A second key result would be "Refactor XX of the 50 hacky places identified in Jira tickets, by March 31st".
- Take whatever it is you want to do and break it down into N Jira tickets
- Make an OKR saying “solve these N Jira tickets by date X”, with the result indicator being “number of those particular Jira tickets solved”
- At the end, your OKR percentage is some fraction of N
This works regardless of what the thing you're trying to do is. But it goes against the spirit of OKRs, which is to use metrics that matter to the business (number of users onboarded, page load time, conversion percentage, etc.) to justify work. That's what the "results" in OKRs are supposed to mean.
Correct. Breaking something into N tickets is one way of approaching OKRs.
It does not go against the spirit of OKRs. Making a metric out of the number of Jira tickets is a workable approach if there is business value in reducing tech debt. If you can align it to "reduce page load time", why would you not use it? Don't conflate business value with how you measure things. OKRs should align to business value. OKRs should be measurable. You can have things aligned to business value that are harder to measure, and you can have measurable things which provide little business value.
There is no rule that says you cannot measure the number of tasks that get completed as part of an OKR. It's true that the smaller N gets, the less sense it makes, and that N=1 is a binary goal. OKRs are better for larger N, as those show progress better. Going from memory, "Measure What Matters", the OKR bible, has examples of OKRs where the goal is exactly this kind of count.
Nothing is stopping you from using OKRs for small N. But I have seen people come up with all sorts of excuses why "it won't work", so your mileage may vary. My suggestion is always "try it wholeheartedly before you knock it."
The generalization won't work once N is large, or is continuous, or does not make sense as separate Jira tickets. Luckily, it does not need to and you can track such metrics without the help of a ticketing system.
Examples that won't work as Jira tickets but can be good OKRs, if they align to a business goal:
- improve the Core Web Vitals cumulative layout shift (CLS) by 0.3 points. (can align to "reduce bounce rate" as CLS affects the perceived load time and quality)
- increase test coverage by 15% (can align to "reduce churn", if churn is caused by poor product quality, and test coverage can improve quality)
Key results can be more or less granular. Think of it as a continuum. Some are continuous:
- Improve the time to first byte for the homepage by 500ms.
This one is continuous, because time is continuous. Realistically it will be quantized into milliseconds, but that's nitpicking.
You can get less granular:
- solve 500 linting errors
This one is kind of continuous, but there are 500 distinct steps. Each week it is feasible that you can solve a handful, and can see movement and improvement.
- add 5 unit tests to XYZ module
Now we are getting much less granular. It is a checklist of 5 TODOs, but you can track the progress. It's unlikely you will make an improvement every week, but on some weeks you need to if you want to hit "5" by the end of the quarter.
- hire a new DevOps engineer
This is a binary, checkbox, hit-or-miss key result. Sometimes they make sense, but it's not great if they make up the majority of the key results on an OKR. The good news is that you can make it more granular: create a plan for hiring an engineer, break out the steps, assign a percentage to each step, and track it as a 0-100% key result. This way, as you write a job description, post it, create a pipeline, review resumes, and hold interviews, you can track and share the progress.
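For the continuous examples above, the first chunk of work is often building the measurement itself. A browser-side sketch for TTFB and CLS (hedged: the `/metrics` endpoint is made up, and real tooling does this more carefully):

```ts
// vitals.ts -- collect TTFB (Navigation Timing) and CLS (Layout Instability API)
// so the key result has a real number to track week over week.

// TTFB: milliseconds from navigation start to the first response byte.
const [nav] = performance.getEntriesByType(
  "navigation",
) as PerformanceNavigationTiming[];
const ttfb = nav.responseStart;

// CLS: sum of layout-shift scores not caused by recent user input.
let cls = 0;
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const shift = entry as any; // LayoutShift is not in TypeScript's DOM typings yet
    if (!shift.hadRecentInput) cls += shift.value;
  }
}).observe({ type: "layout-shift", buffered: true });

// Report once the page is hidden, to a hypothetical collection endpoint.
document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden") {
    navigator.sendBeacon("/metrics", JSON.stringify({ ttfb, cls }));
  }
});
```

In practice the web-vitals library does this with more care (session windows for CLS, for instance), but even a crude number makes the key result trackable.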
When my previous company introduced them, I pointed out that despite several studies, not one could conclude that OKRs had any positive effect, and that the only famous "success" case, Google, didn't prove at all that Google was successful because of them; arguably, it succeeded despite them.
Let's just say that I didn't win any popularity awards there for being the rational one.
Ever hear someone complain about big company bureaucracy? The bigger an organization gets, the more impossible it gets to manage it and keep people aligned towards effective and compounding goals. OKRs promise to help improve that impossible process. I'm not surprised at all that people are interested.
A lot of people are mistaking my explanation of why people are interested in OKRs for an endorsement of OKRs. I think having goals is good, and understanding how they tie into the broader company mission is good, but being able to adapt is also good. OKRs:goal-setting::Scrum:agile
Kinda disagree. Every time I have seen OKRs implemented, it has been as part of a "transition to agile" with several other measures, and often they get slapped on top of, or replace, other measures without doing anything real to fix company bureaucracy. You need much more fundamental changes for that, and those would involve pissing off a lot of people with a lot of power.
OKRs can make sense in theory, but they create a new kind of tension, because they compete with the BAU work: the niggly things that come up mid-quarter that you need to get done. Individuals have to resolve the dilemma: do what has to be done, or do what makes me look good.
I think a new thing is needed, at least for smaller teams. Something like adaptive goals: roadmap a year out, vaguely, and plan the next six weeks. Measure stuff where it makes sense, but not everything needs a measure (and things that don't might be 0-or-1). Plan based on a velocity that takes into account that you won't have planned everything.
Measures are useful but also bullshit. There is no correlation between the measure and business success without intelligence. Even revenue is not a measure of success (I could sell half price bitcoins and have a lot of revenue!)
The existence of bad measures doesn't preclude the possibility of good ones, even though they won't be perfect. BAU activities, e.g. devops, can be measured in useful ways[0] as well.
Oh yes. But intelligence needs to be applied. OKRs turn measures into something like a sport, like soccer, where those measures become the goal (beat the KR and get a bonus, or even just pressure on the KR and nothing else).
No, I don't think that's true. People might do that, but they also do that to things that aren't OKRs. It's great to critique OKRs, but not by comparing them to a hypothetical perfect world.
BAU work should be covered in resource planning, not OKRs. The amount of time available to work on KRs should never be 100% -- it should be in the range of 60-70% to account for overhead, trainings, vacations, and unexpected issues that must be addressed.
To me it's not OKRs specifically but organisations in general: what is an organisation, and how does it actually work (as opposed to what we tell ourselves)? That includes the frameworks and processes we use.
OKRs are interesting, then, because they are a specific framework that may or may not work. But people getting obsessed with OKRs in particular and making them "fun" are completely missing the bigger picture and the purpose of metrics as tools of control and so on...
Feels like people start by wanting to do good work, then they need to invent an objective retroactively. Add external accountability and meetings, et voilà!