Show HN: FixCache – keep track of bug-prone files from Git commit history (github.com/aavshr)
40 points by aavshr on Sept 4, 2020 | 11 comments



Looks really cool, although it's a shame the description isn't a bit more "plain language" - a simple summary of what it is intended for, before going into jargon would help people realise why it's valuable. The fact it uses a cache isn't immediately important: I doubt anyone is saying "I hope I can find a solution which uses a cache", they'll be saying eg "I'm struggling with QA, what can streamline it?".

The paper's introduction section is more useful, but I wonder if people will get that far. Likewise the repo screenshot is useful but it's way down the README page.

Anyway, these are just well intentioned comments. It's easy to lose track that potential users won't get what you've been building until you pitch it in their terms.


I agree, I will update the README to showcase its intended use and value more.

Thanks for the input!


Hello HN, this is a GitHub app implementation of FixCache (https://people.csail.mit.edu/hunkim/images/3/37/Papers_kim_2...) that I built as a side project and that might prove useful for pull request reviewers.

The app maintains a fixed-size cache of bug-prone files from fix-commits and updates a pull request with information about the cache if these bug-prone files have been updated in the pull request.


I read the whole README but have no idea what this does. Is it letting you know when you change files that have had a large number of bug fixes recently? With the assumption that a code location that was recently fixed is likely to have more bugs?

Wouldn't the fact that a location was fixed recently imply that it now has fewer bugs? And wouldn't a location that hasn't been touched recently be likely to be problematic?


The core assumptions of the algorithm:

- if a file introduced a bug recently, it will tend to introduce bugs again
- new files added with the bug-introducing file will tend to introduce bugs
- other files changed with the bug-introducing file will tend to introduce bugs
- files often changed together with the bug-introducing file will tend to introduce bugs soon

FixCache maintains a fixed-size cache of these bug-prone files based on bug fix-commits. This helps in prioritizing verification and testing resources (right now it only updates a pull request with a comment and a label). If a file no longer introduces bugs, it will eventually be evicted from the cache.
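A minimal sketch of the fixed-size cache idea described above. The eviction policy is an assumption: the comment only says stale files are eventually replaced, so this uses least-recently-fixed eviction for illustration. The class and method names are hypothetical and are not taken from the app.

```python
from collections import OrderedDict


class BugProneFileCache:
    """Fixed-size cache of file paths touched by fix commits.

    Assumption: the least recently fixed file is evicted first (LRU-style).
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._files = OrderedDict()  # path -> number of fix commits seen

    def record_fix(self, paths):
        """Add the files modified by one fix commit, evicting old entries."""
        for path in paths:
            hits = self._files.pop(path, 0) + 1
            self._files[path] = hits  # re-insert at the most recent end
            if len(self._files) > self.capacity:
                self._files.popitem(last=False)  # drop the oldest entry

    def flagged(self, changed_paths):
        """Return the changed files that are currently in the cache."""
        return [p for p in changed_paths if p in self._files]


cache = BugProneFileCache(capacity=2)
cache.record_fix(["a.go", "b.go"])
cache.record_fix(["a.go"])   # a.go becomes the most recent entry
cache.record_fix(["c.go"])   # cache is full, so b.go is evicted
print(cache.flagged(["a.go", "b.go", "c.go"]))
```

A pull request that touches any file returned by `flagged` would be the one to comment on and label.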


Are you sure it's not discovering bad developers?

Maybe critical areas, e.g. that have the same amount of bugs as the rest but are complained about more? (since the algorithm can only consider bugs that have been reported, so biased to areas important to users) Or maybe that are prioritized by management? (since it considers fixed bugs, so bias towards bugs that were fixed first)

Hopefully increased scrutiny of new patches to those areas leads to fewer bugs getting in, which breaks the feedback loop. But if bugs are fixed in separate commits, this sounds like it could have negative effects (specific developers/areas getting all the attention, leading to the discovery of more bugs/nitpicks there, reinforcing the bias...).


Based on the existence of the config option "FIX_KEY_WORDS", it seems like this detects bugs just using keyword matching? Github also tracks when issues are fixed by commits (and pull requests, I think). Is it possible to use that metadata in addition to (or instead of) keyword matching? (Does Github even make that information available through an API?)


One way to support this would be to make FixCache support an optional hook responsible for deciding whether a given commit was a bug fix. Naively, it could have an interface like "IsBugfix(commit SHA1Hash) bool". The hook could be implemented as a function call to a user-defined plugin or something, or perhaps by executing an external process (e.g. shell exec "my_custom_isbugfix.sh abcd1234").

Then users could write a custom hook that looked up the given commit using github's APIs or whatever other crazy scheme your team uses for bug tracking (e.g. cross reference JIRA ticket number baked into commit message with JIRA and look at the type of the JIRA ticket), but FixCache itself could be kept clean and pure from these integrations, which most users wouldn't want.
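The hook idea could be sketched roughly as follows. Everything here is hypothetical: FixCache does not expose such a hook as far as this thread says, and the exit-status convention (0 means "bug fix") is an assumption made for the example.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Hypothetical hook type: takes a commit SHA, returns True for a bug fix.
IsBugfix = Callable[[str], bool]


def external_hook(command: Sequence[str]) -> IsBugfix:
    """Wrap an external program as a hook.

    Assumption: the program receives the SHA as its last argument and
    exits with status 0 when the commit is a bug fix.
    """
    def hook(sha: str) -> bool:
        result = subprocess.run([*command, sha], check=False)
        return result.returncode == 0
    return hook


def fix_commits(shas, is_bugfix):
    """Keep only the commits that the hook classifies as bug fixes."""
    return [sha for sha in shas if is_bugfix(sha)]


# Demo: a stand-in "script" that treats SHAs starting with "ab" as fixes.
demo_script = "import sys; sys.exit(0 if sys.argv[1].startswith('ab') else 1)"
hook = external_hook([sys.executable, "-c", demo_script])
print(fix_commits(["abcd1234", "ffff0000"], hook))
```

A real hook would replace the stand-in script with something that queries an issue tracker, while the core tool only ever sees a yes/no answer.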

Github has lots of APIs, I'd bet it is possible to do this with github provided the data defining the relationship between the commit and the bug is encoded somehow -- either in commit message or in github issue or PR metadata or comment text.


Bugs are detected only by keyword matching on commit messages. "FIX_KEY_WORDS" lists the keywords that mark a commit as a fix-commit.

You set up a branch to track commits on ("TRACKED_BRANCH"; it's 'master' by default). A history of commits, from the last 30 days by default, is extracted from that branch on installation, and bug fixes are detected from commit messages.

After installation, the app subscribes to push events (pull request merges are also push events) on the "tracked_branch". On every push to the tracked branch, all the commits in the push are checked for bug-fix commits, and the file cache is updated if any are found.
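A rough sketch of the keyword matching step, under stated assumptions: the matching rule (whole word, case-insensitive) is a guess, not the app's documented behavior, and the commit dictionaries only loosely mirror the `message` and `modified` fields of a GitHub push event. The function names are hypothetical.

```python
import re


def is_fix_commit(message, keywords):
    """True if any keyword appears as a whole word (case-insensitive)."""
    if not keywords:
        return False
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, message, flags=re.IGNORECASE) is not None


def files_from_fix_commits(commits, keywords):
    """Collect files modified by fix commits in a push, in first-seen order."""
    seen = []
    for commit in commits:
        if is_fix_commit(commit["message"], keywords):
            for path in commit["modified"]:
                if path not in seen:
                    seen.append(path)
    return seen


push = [
    {"message": "Fix nil pointer in parser", "modified": ["parser.go"]},
    {"message": "Add prefix option", "modified": ["cli.go"]},
]
print(files_from_fix_commits(push, ["fix", "fixes", "bug"]))
```

Because matching is on whole words, "prefix" does not count as "fix", but variants such as "fixes" or "fixed" have to be listed explicitly.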


Neato. How is "spatial locality" defined? Just distance in the filesystem tree?


With logical coupling, if two entities are changed together many times, they get a shorter distance.

distance(e1, e2) = 1/c(e1, e2) if c(e1, e2) > 0 otherwise infinity. c(e1, e2) is the count of times e1 and e2 have been changed together.
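The distance formula above can be sketched directly. The input format (one list of changed files per commit) and the function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each unordered pair of files changed in the same commit."""
    counts = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts


def distance(counts, e1, e2):
    """1 / c(e1, e2) if the pair ever changed together, otherwise infinity."""
    c = counts[tuple(sorted((e1, e2)))]
    return 1 / c if c > 0 else math.inf


history = [["a", "b"], ["a", "b", "c"], ["c"]]
counts = co_change_counts(history)
print(distance(counts, "a", "b"))  # changed together twice
print(distance(counts, "a", "c"))  # changed together once
print(distance(counts, "a", "d"))  # never changed together
```

Pairs are stored in sorted order so that the distance is symmetric in its two arguments.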

I have not implemented spatial locality in v0, as it is a bit tricky to identify the exact file that introduced a bug from a fix commit. For now, only the files modified in a bug-fix commit are put in the cache.



