OKRs are a goal-setting framework for non-trivial 3-12 month goals.
Some people are self-motivated and well organized, are great at communicating progress proactively to other stakeholders, and understand the idea of cross-departmental alignment. OKRs will not help them.
For everyone else, OKRs are a tool that can help accomplish those things.
PS: I actually like OKRs and, after a lot of effort, learned how to make them useful. I did not get it at first either.
The problem for me is the “key results” part of OKRs. It means that if it’s not measurable, it’s not an OKR, and in too many organizations, if it’s not an OKR, it’s not worth doing.
Cleaning up your code base to accommodate the gradual accumulation of small fixes/hacks is not something you can put a metric on, at least not without pulling numbers out of your ass. Yet everyone agrees that you can’t really have quality software without doing this. And OKRs would say that making nothing but tactical changes to ship features and never revisiting architecture is perfectly great. The incentives always seem to push you towards tech debt.
OKR proponents would say that revisiting architecture and paying down tech debt should be implicit and part of the process of achieving your results, but I’ve never seen it done. Or rather, the only time I have seen it done is when someone tries to shoehorn the refactoring work into an OKR in order to justify it, making up bogus metrics, getting the OKR dropped because it’s not meaningful enough, and then just working on it anyway.
"Cleaning up your code base to accommodate all the gradual accumulation of small fixes/hacks is not something you can put a metric on" - why not? generally the need to refactor is driven by something - getting too hard to make changes? productivity down? introducing more errors then we used to? In those cases improving productivity or reducing errors is the result we're targeting, and cleaning the code is the activity we do to achieve the result.
> In those cases improving productivity or reducing errors is the result we're targeting, and cleaning the code is the activity we do to achieve the result.
Right, but please read the first sentence of my post. It’s the “key results” part that’s hard. Because you need to quantify all the benefits you’re targeting, giving each a number, so that you can show whether you completed your goal or not. If you say “productivity will go up”, you have to put a number on the current productivity, then give regular reports on what happens to that number after the refactoring. What do you pick? Number of PRs merged per day? That’s probably going to go down, because most of the PRs today are small tactical band-aid fixes. So do you say the number of PRs merged should go down after the refactoring? That could just as easily be because the refactoring made things worse and everything’s so broken that nobody can make changes. So PRs merged is a shitty metric. What else? Bugs filed? In a product where you’re growing users you’d probably expect them to go up due to increased usage, so any benefit from refactoring is likely to be lost in the noise. Lines of code? Please.
More often than not people just pull whatever metric they want out of their ass to make the case for what they’re trying to accomplish, and cherry-pick things so that it looks better after the effort. But it’s against the spirit of OKRs to do this, which is why OKRs are bad for anything “fuzzy” like refactoring. You have to shoehorn work that everyone agrees is worth doing into a framework that isn’t designed for refactoring work, just to make the case.
You are hitting on an important and difficult aspect of OKRs: getting the alignment between what you can affect (the leading indicator) and what the outcome is (business value).
It's not an exact science. You can make pro and con arguments against different things that could conceivably be measured. This is where experience and strategic thinking help.
You can always come up with a risk or a reason why a particular measurement won't affect the desired outcome. You will be more right on some and less right on others. However, throwing out the entire OKR approach because you cannot be sure is not correct either.
If refactoring code doesn't lead to fewer defects down the road, or to faster feature implementation with fewer errors, or faster employee onboarding, or any other visible result, then maybe management doesn't want you to do it, and maybe it's reasonable to consider why.
But it’s really, really, really hard to quantify it in a measurable way. Which is what OKRs force you to think about: what is the metric, what is its current value, and what is your goal for the metric, so you know whether you achieved it?
Can you quantify “faster employee onboarding”, reliably? Can you graph it over time? Can you quantify “faster feature implementation” in a way you can actually measure that isn’t sensitive to the fact that all features are different?
The key results are not a problem just for you! You are hitting the nail on the head. Check out section 6 of this study ( https://arxiv.org/pdf/2311.00236.pdf ) - defining good OKRs is problem number 1, and data issues are the 2nd most cited concern!
I wrote a piece about the common issues people face creating OKRs. There are a few common mistakes that make key results unmeasurable: https://koliber.com/articles/top-okr-mistakes
> the gradual accumulation of small fixes/hacks is not something you can put a metric on
I've done it before. On one team, we had a goal to reduce the number of linting errors and warnings from 18,000+ by 50% (while not growing the number of IGNORES). The team was reluctant at first, because "it's only linting and it does not matter." But they relented and eventually started fixing things here and there. And the number started going down, albeit slowly. And over time we got the number of linting errors down to 18 (or something close), because people found time here and there to improve things. And the team learned how to use OKRs. And they put in place a style guide and an auto-linter. And they started using it so that the errors did not come back. And there were plans in place to put in more sophisticated style analysis and run another OKR against that.
They literally matured in their code development practices way beyond just linting, just because of the relentless drive on one seemingly insignificant OKR.
This is just one example. You can use OKRs with engineering metrics to improve lots of things:
- fix the top 10 Jira tickets tagged with #techdebt
- reduce linting errors by 20%
- reduce the number of functions with a cyclomatic complexity of 10+ by 50%
- research 5 static code analysis tools
- increase unit test code coverage from 56% to 62%
You can go many different ways. I've helped engineering teams do this well, starting with deciding what makes sense to improve and getting buy-in, through defining the OKRs, building the system for measuring them, and most importantly, driving the OKR every week.
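To make "building the system for measuring them" concrete, here is a minimal sketch of the kind of script that can back a linting or complexity key result. It assumes flake8 and radon are installed and that the sources parse; the baseline file name and the threshold are invented for illustration:

```python
# Sketch of a measurement script behind "reduce linting errors" and
# "reduce functions with cyclomatic complexity of 10+" key results.
# Assumes flake8 and radon are installed; the baseline file name and
# the threshold are invented for illustration.
import json
import pathlib
import subprocess
import sys

from radon.complexity import cc_visit

BASELINE = pathlib.Path("okr_baseline.json")  # e.g. {"lint_errors": 18000}

def count_lint_errors(path: str = ".") -> int:
    # flake8 reports one violation per line of stdout.
    out = subprocess.run(["flake8", path], capture_output=True, text=True).stdout
    return len(out.strip().splitlines()) if out.strip() else 0

def count_complex_blocks(path: str = ".", threshold: int = 10) -> int:
    # Count functions/classes at or above the cyclomatic-complexity threshold.
    total = 0
    for file in pathlib.Path(path).rglob("*.py"):
        total += sum(1 for block in cc_visit(file.read_text())
                     if block.complexity >= threshold)
    return total

if __name__ == "__main__":
    current = count_lint_errors()
    baseline = json.loads(BASELINE.read_text())["lint_errors"]
    print(f"lint errors: {current} (baseline {baseline})")
    print(f"complex blocks: {count_complex_blocks()}")
    # Ratchet in CI: fail if the count ever grows, so fixed errors stay fixed.
    sys.exit(1 if current > baseline else 0)
```

Run weekly, or as a CI gate, this produces the single number the OKR check-in needs, and the ratchet is what keeps the errors from coming back.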
In the case you cited, with a bunch of hacks, I'd approach it like this:
- Create an OKR like "Reduce tech debt".
- One key result would be "Identify 50 hacky places in the code and create Jira tickets for them by Jan 31st", or something similar.
- A second key result would be "Refactor XX out of the 50 hacky places identified in the Jira tickets, by March 31st"
- Take whatever it is you want to do and break it down into N Jira tickets
- Make an OKR saying “solve these N Jira tickets by date X”, with the result indicator being “number of those particular Jira tickets solved”
- At the end, your OKR completion percentage is the fraction of the N tickets you closed
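To make the arithmetic concrete, here is a minimal sketch of that scoring; the counts are made up:

```python
# Scoring a "solve these N Jira tickets" key result; numbers are invented.
tickets_total = 50   # N
tickets_closed = 37
print(f"KR progress: {tickets_closed / tickets_total:.0%}")  # 74%
```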
This works regardless of what the thing you’re trying to do is. It goes against the spirit of OKRs, which is to use metrics that matter to the business (number of users onboarded, page load time, conversion percentage, etc.) to justify work. That’s what the “results” in OKRs are supposed to mean.
Correct. Breaking something into N tickets is one way of approaching OKRs.
It does not go against the spirit of OKRs. Reducing tech debt and making a metric out of the number of Jira tickets can work, and is a workable approach if there is business value in reducing tech debt. If you can align it to "reduce page load time", why would you not use it? Don't conflate business value with how you measure things. OKRs should align to business value. OKRs should be measurable. You can have things aligned to business value that are harder to measure. You can have measurable things which provide little business value.
There is no rule that says you cannot measure the number of tasks completed as part of an OKR. It's true that the smaller N gets, the less sense it makes, and that N=1 is a binary goal. OKRs are better for larger N, as those show progress better. Going from memory, "Measure What Matters", the OKR bible, has examples of OKRs where the goal is binary.
Nothing is stopping you from using OKRs for small N. But I have seen people come up with all sorts of excuses why "it won't work", so your mileage may vary. My suggestion is always "try it wholeheartedly before you knock it."
The generalization won't work once N is large, or is continuous, or does not make sense as separate Jira tickets. Luckily, it does not need to, and you can track such metrics without the help of a ticketing system.
Examples that won't work as Jira tickets but can be good OKRs, if they align to a business goal:
- improve the Core Web Vitals cumulative layout shift (CLS) by 0.3 points. (can align to "reduce bounce rate" as CLS affects the perceived load time and quality)
- increase test coverage by 15% (can align to "reduce churn", if churn is caused by poor product quality, and test coverage can improve quality)
Key results can be more granular and less granular. Think of it as a continuum. Some are continuous:
- Improve the time to first byte for the homepage by 500ms.
This one is continuous, because time is continuous. Realistically it will be quantized into milliseconds, but that's nitpicking.
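If you want to watch a metric like that move week over week, even a crude probe will do. A minimal sketch of measuring TTFB from Python; the host is a placeholder, and a real setup would sample repeatedly and track percentiles rather than a single request:

```python
# Crude time-to-first-byte probe; the host is a placeholder.
import http.client
import time

def ttfb_ms(host: str, path: str = "/") -> float:
    # Time from sending the request until the response status line and
    # headers arrive -- roughly the time to first byte.
    conn = http.client.HTTPSConnection(host, timeout=10)
    start = time.perf_counter()
    conn.request("GET", path)
    conn.getresponse()
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed * 1000

print(f"TTFB: {ttfb_ms('example.com'):.0f} ms")
```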
You can get less granular:
- solve 500 linting errors
This one is kind of continuous, but there are 500 distinct steps. Each week you can feasibly solve a handful, and see movement and improvement.
- add 5 unit tests to XYZ module
Now we are getting much less granular. It is a checklist of 5 TODOs. But you can track the progress. It's unlikely you will make an improvement every week, but on some weeks you need to if you want to hit "5" by the end of the quarter.
- hire a new DevOps engineer
This is a binary, checkbox, or hit-or-miss key result. Sometimes they make sense. It's not great if they make up the majority of the key results on an OKR. The good news is that you can make it more granular. Create a plan for hiring an engineer. Break out the steps, assign a percentage to each step, and track it as a 0-100% key result. This way, as you write a job description, post it, create a pipeline, review resumes, and hold interviews, you can track and share the progress.
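A minimal sketch of that 0-100% breakdown; the steps, weights, and completion states below are invented for illustration:

```python
# Turning a binary "hire a DevOps engineer" KR into a 0-100% key result.
# Steps, weights, and completion states are invented; weights sum to 100.
hiring_plan = [
    ("write job description",    10, True),
    ("post the role",             5, True),
    ("build candidate pipeline", 25, True),
    ("review resumes",           20, False),
    ("hold interviews",          30, False),
    ("extend offer",             10, False),
]

progress = sum(weight for _, weight, done in hiring_plan if done)
print(f"KR progress: {progress}%")  # 40%
```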
When my previous company introduced them, I pointed out that despite several studies, not one could conclude that OKRs had any positive effect, and that the only famous "success" case, Google, doesn't prove at all that Google was successful because of them; arguably, it succeeded despite them.
Let's just say that I didn't win any popularity awards there for being the rational one.
Ever hear someone complain about big-company bureaucracy? The bigger an organization gets, the more impossible it becomes to manage it and keep people aligned towards effective and compounding goals. OKRs promise to help improve that impossible process. I'm not surprised at all that people are interested.
A lot of people are mistaking my explanation of why people are interested in OKRs for an endorsement of OKRs. I think having goals is good, understanding how they tie into the broader company mission is good, but being able to adapt is also good. OKRs:goal-setting::Scrum:agile
Kinda disagree. Every time I have seen OKRs implemented, it has been as part of a "transition to agile" with several other measures, and often what happens is they get slapped on top of, or replace, other measures without doing anything real to fix company bureaucracy. You need much more fundamental changes to fix that, and that would involve pissing off a lot of people with a lot of power.
OKRs can in theory make sense, but they create a new kind of tension, because they compete with the BAU work: the niggly things that come up mid-quarter that you need to get done. Individuals need to resolve the dilemma: do what has to be done, or do what makes me look good.
I think a new thing is needed, at least for smaller teams. Something like adaptive goals. Roadmap a year vaguely, and plan the next 6 weeks. Measure stuff where it makes sense, but not everything needs a measure (or the things that don’t might just be 0 or 1). Plan based on velocity, taking into account that you won’t have planned everything.
Measures are useful but also bullshit. There is no correlation between the measure and business success without intelligence. Even revenue is not a measure of success (I could sell half-price bitcoins and have a lot of revenue!)
The existence of bad measures doesn't preclude the possibility of good ones, even though they won't be perfect. BAU activities, e.g. devops, can be measured in useful ways[0] as well.
Oh yes. But intelligence needs to be applied. OKRs turn measures into something like a sport, like soccer, where the measures become the goal (beat the KR and get a bonus, or even just pressure on the KR and nothing else).
No, I don't think that's true. People might do that, but they also do that to things that aren't OKRs. It's great to critique OKRs, but not by comparing them to a hypothetical perfect world.
BAU work should be covered in resource planning, not OKRs. The amount of time available to work on KRs should never be 100% -- it should be in the range of 60-70% to account for overhead, trainings, vacations, and unexpected issues that must be addressed.
To me it's not OKRs specifically but organisations in general - what an organisation is and how it actually works (as opposed to what we tell ourselves), and that includes the frameworks and processes we use.
OKRs are interesting, then, because they are a specific framework that may or may not work. But people getting obsessed about OKRs in particular and making them "fun" are completely missing the bigger picture and the purpose of metrics as tools of control, and so on...
Feels like people start by wanting to do good work, then they need to invent an objective retroactively. Add external accountability and meetings, et voilà!