The article cites awesome stats about the effectiveness of showing the results of static analysis at "diff time" (in code review) vs. outside the dev workflow.
> For classes of bugs intended for all or a wide variety of engineers on a given platform, we have gravitated toward a "diff time" deployment, where analyzers participate as bots in code review, making automatic comments when an engineer submits a code modification. Later, we recount a striking situation where the diff time deployment saw a 70% fix rate, where a more traditional "offline" or "batch" deployment (where bug lists are presented to engineers, outside their workflow) saw a 0% fix rate.
> ...
> The response was stunning: we were greeted by near silence. We assigned 20-30 issues to developers, and almost none of them were acted on. We had worked hard to get the false positive rate down to what we thought was less than 20%, and yet the fix rate—the proportion of reported issues that developers resolved—was near zero.
> Next, we switched Infer on at diff time. The response of engineers was just as stunning: the fix rate rocketed to over 70%. The same program analysis, with same false positive rate, had much greater impact when deployed at diff time.
I am the Sourcegraph CEO, and we've long been aware of tools like this at Facebook, and Google's similar tool (described in another ACM article, https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...). We are working on bringing static analysis into the dev workflow, and letting you run campaigns to apply static analysis fixes across your entire codebase. Check out https://about.sourcegraph.com/product/automation/ for info and screencasts to see what this looks like.
I've read something very similar before (also facebook) and one thing disturbed me, and it's given in your quote, the difference in bugfix uptakes by devs depending on how it's presented. I have to say that shocked me as it appears so unprofessional (yes, I read the reasons but still).
The root of that shock was the human factor. Us devs aren't machine parts, but our essential job is to do stuff and fix the inevitable bugs. The fact that presentation of bug reports could change things so radically was an eye opener, and one I'll generalise and remember.
Then again I've never worked on huge codebases, and I assume the devs there were top engineers so I'm not attributing this to laziness or indifference - but still, amazing and not in a positive way! I'd just assumed static analysis = bugs found = bugs fixed = better code; end of story.
if someone comes to you with a bazillion issues that you cannot possibly fix in the X hours you have during the day what are you going to do? you're going to prioritize and you're going to fix the most important bugs, right?
if you're developing a new feature and things are pointed out during the code review (bots or no bots) what are you going to do? fix it, right? you want to ship whatever you're doing with high quality.
so the problem outlined below is a false one. you can never fix all the possible bugs if the codebase is anything but trivial.
> you're going to prioritize and you're going to fix the most important bugs, right?
That doesn't make sense. Presented one way, the bugs were fixed to a 70% rate. Presented another way, 0%.
Presumably the bugs in each case were about equivalent in their severity (else the conclusion is meaningless) so severity was not the driving factor in what got fixed.
> you can never fix all the possible bugs if the codebase is anything but trivial.
That's a non-sequitur. No-one suggested they could fix all, or should try. It was about known bugs presented to them.
It makes perfect sense. If you file a bug assigned to me that doesn't seem to actually affect users or isn't aligned with the work I am already committed to, it goes in the backlog and I may or may not ever get to it. If it's a bot that generated the bug, it comes after user reports, and I may never get to all of those.
If it's a comment on a code review for code I am already working on, then of course I want my new or changed code to be up to date with standards and avoid error prone patterns, etc.
@mirceal, @wffurr, I see what you're saying and you're both right.
However, that does leave a major question for me: if devs are triaging their own bugs to this extent (70% vs 0% fix), is this safe? Those bugs ultimately relate to business requirements around uptime and correctness.
For the most part at Facebook, ongoing maintenance and business outcomes are the responsibility of the team that originally wrote and deployed the feature. The idea is to give the team (and ideally the individual developer) as much agency as possible as they are the closest to the actual problems.
developers don’t triage their own bugs (mostly) but the difference is that one issue is implicit at PR time while the other is explicit (actual bug filing - this means that apart from fixing they incur the overhead of actually having Jira/whatever items attached to them, being discussed and prioritized in backlog grooming, being preempted by higher priority feature work or bugs, you name it. what was a 5 minute fix now eats hours, with 0 recognition, and is perceived as cleanup that might not be needed at all)
We are seeing similar with SonarQube[1]. Putting it in the CI pipeline encourages iterative progress. It's much less overwhelming than a periodic batch scan that outputs a ton of mixed positive/false-positive issues.
It's not perfect, but the smaller scale reporting is easier to deal with, fix, add suppression rules, etc. Sounds obvious, I suppose.
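To make the "smaller scale reporting" idea concrete, here is a minimal sketch of diff-time filtering: take the full list of analyzer findings and keep only those on lines the change actually touched, minus any suppressed rules. This is not SonarQube's or Infer's API; `Finding`, `findings_for_diff`, the rule names, and the file paths are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One issue reported by a (hypothetical) static analyzer."""
    path: str
    line: int
    rule: str
    message: str


def findings_for_diff(findings, changed_lines, suppressed_rules=frozenset()):
    """Keep findings that sit on changed lines and are not suppressed.

    changed_lines maps a file path to the set of line numbers the
    change touched; files missing from the map count as untouched.
    """
    return [
        f
        for f in findings
        if f.rule not in suppressed_rules
        and f.line in changed_lines.get(f.path, set())
    ]


# Hypothetical output of a whole-codebase batch scan.
all_findings = [
    Finding("app/db.py", 12, "null-deref", "value may be None here"),
    Finding("app/db.py", 340, "resource-leak", "cursor is never closed"),
    Finding("app/api.py", 7, "unused-import", "json imported but unused"),
    Finding("app/legacy.py", 88, "null-deref", "value may be None here"),
]

# Lines touched by the change under review (assumed to come from the diff).
changed = {"app/db.py": {10, 11, 12, 13}, "app/api.py": {7}}

review_comments = findings_for_diff(
    all_findings, changed, suppressed_rules={"unused-import"}
)

for f in review_comments:
    print(f"{f.path}:{f.line}: [{f.rule}] {f.message}")
```

With these made-up inputs the four batch findings shrink to a single review comment on `app/db.py` line 12, which is the sort of list an author can deal with while the code is still fresh.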
> The response was stunning: we were greeted by near silence. We assigned 20-30 issues to developers, and almost none of them were acted on. We had worked hard to get the false positive rate down to what we thought was less than 20%, and yet the fix rate—the proportion of reported issues that developers resolved—was near zero.
Duh? If I am assigned a low priority bug, I will put it on the bottom of my todo list. My manager has the ability to reprioritize items on my todo list, but it doesn’t seem like that was done in this case. If you want people to fix your low priority bugs, just filing them and praying isn’t going to work.
I'm not familiar with Infer or Zoncolan, but I wonder what percentage of issues these tools detect is already covered by IntelliJ IDEA "inspections". Inspections consist of a configurable series of static analyzers each targeting a specific case. IJ runs inspections concurrently in the editor, flagging problems as they occur. What's more, many inspections provide "quick fixes" -- they can fix the issue at hand for you with a simple keystroke. Inspections can also run in batch and can bulk-apply quick fixes.
The obvious advantage inspections have over diff-time tooling is that detectable issues are often fixed immediately when a developer is most familiar with the code in context (edit-time), which translates to improved productivity. Additionally since most inspections provide quick fixes, they not only save time, but also provide code consistency regarding how issues are resolved.
I realize FB uses languages other than Java, so I wonder if their static analysis investments would be better served focused more on building/buying better IDE tooling?
I highly doubt FB is trying to save money on their IDE budget. Zoncolan and Infer can detect many more bugs and security issues outside the capabilities of the IDE. Also, even if one dev is using Intellij, the global configuration of the system in the article can enforce consistent standards across the organization.
You misunderstand. It's not about the cost of the IDE but rather the cost of maintaining software and preventing defects over time. The takeaway from my comment: as a general policy, if static analysis were built into the editor, as it is with IntelliJ, it would significantly improve productivity and guarantee a higher rate of defect prevention.