The best bug tracking system. Don’t raise bugs, write an automated test (makinggoodsoftware.com)
37 points by fogus on Feb 23, 2010 | 41 comments



Here are 5 real-world bug reports, in their entirety. Anybody care to rewrite them as automated test cases for me?

  * "Try it Now" button sometimes drops below the "Learn More" button in FireFox

  * Capitalization inconsistent in top nav

  * Popup blockers occasionally stop the App window from opening properly

  * Weird timeout errors clumped around 1:00 am every few nights

  * Accounting needs to be able to call up weekly income reports
In my experience, it's actually pretty rare to find a bug that can be wrapped in a test case without actually discovering and fixing it in the process. Most things that end up in the bug tracker are either cosmetic, weird "solar activity" things, or disguised feature requests.


Whenever unit testing comes up, people always cite GUI issues as untestable. But it should be pointed out that in Fred Brooks's usage of the terms, this is an accidental problem, not an essential one. GUIs are hard to test not because GUIs are inherently hard to test, but because your GUI doesn't contain provisions for being tested.

Nothing stops a GUI from being very queryable. Nothing stops you from being able to query the locations of two widgets and making assertions based on those locations. Nothing stops you from getting text back out, or verifying fonts, or verifying the lack of overlap, or lack of popups... except that GUIs are written to be opaque monoliths, graveyards of data. This is correctable, but not by the end programmer.

So it is true that GUIs are untestable, but this doesn't have to be true. But it's going to take a conscious effort by QT or GTK or someone to make their GUIs testable before anything will change.

(I acknowledge your other issues; don't agree with all of them but the GUI issue is what I'm passionate about.)


As a Rails developer doing a fair amount of TDD and some BDD, I have a first-hand view of the state of the art of view testing for the web. Tools like webrat, selenium and cucumber are compelling because they hold the promise of end to end testing.

I'm really glad that people are working on solving these hard problems, but I'm also a designer and usability guy, and I can tell you that these tools are still very, very crude. You know the sayings "when all you have is a hammer, everything looks like a nail" and "use the right tool for the right job"? Well, to developers, code is the hammer. Developers (especially consultants) would like to believe that if they write a cuke test and it passes, then their job is done. But the fact of the matter is that test code is not free to write or even to run, and what a QA person actually covers when they click through an application is several orders of magnitude more than what a state-of-the-art acceptance test gives you (this may change eventually, but it's an AI-level problem).

I see a lot of value for the initial acceptance tests that verify basic flows, but they are just one tool in your belt. Usability testing and QA provide orthogonal human perspectives that give more bang for your buck than increasingly fine-grained automated tests. Think of it like this: automated tests take your product from complete failure to mediocre, but design, usability testing and QA can take it from mediocre to great.

I've seen brilliant programmers get obsessed with "all green" to the point that they do all their work without firing up a browser, and miss dozens of terrible usability flaws that would be immediately obvious the first time you click through.

None of this is in any way disparaging of GUI testing per se, but just that it has limitations which I've seen ignored for the sake of code fetishism.


Not sure if it's any different or better, but Project Sikuli (http://groups.csail.mit.edu/uid/sikuli/), an API for scripting visual actions in GUIs, could be useful for testing layout and interaction.

Here's an example of using it for unit testing a gui: http://sikuli.org/documentation.shtml#examples/TestJEdit.sik...


The way I see it is that the code to test if a GUI element is properly formed is basically the same code that would just properly form the element in the first place. So by automatically testing the GUI you are basically just repeating yourself over and over.

Not to mention that when you want to improve the usability by rearranging things, it will be twice as hard to do if you need to change both the code and the tests.

Unless you have to render something pixel-perfect, it is probably not worth it to write automated tests. A human can look at your screens and verify that they aren't (or are) completely crap fairly quickly.

Ultimately humans will be the ones finding bugs in your software, and when they find UI layout bugs it should be immediately obvious by looking at the code what is wrong. If it isn't then you might need to separate UI from the rest of the gunk better.


You are right of course, but nobody said that GUIs weren't testable. Just that some bugs cannot be expressed as tests.


To do basic GUI testing, you don't actually need to test aesthetics, just some basic morphological/topological properties like "is this button touching that button?", "is this window visible?", etc.

But this is not likely to happen either.

Part of the problem with making a windowing system query-able is that you're talking about a number of distinct components that each separately paint themselves onto the screen. The screen's properties vary by the system's configuration, the graphics driver, and the size of the overall window being painted. Sure, all of those components could be rewritten so at least relationships like "above" and "beside" and "touching" could be queried. But this stack involves hardware vendors, OS creators and software tool vendors. Moreover, hardware vendors and OS creators wouldn't want to add anything that got in the way of performance, since that's their main claim to fame.


> So it is true that GUIs are untestable, but this doesn't have to be true.

I respectfully disagree. You can't codify aesthetics or usability, and even if you could, the return on investment would be abysmal.


"Aesthetics" or "usability" as a whole? Sure, of course not. No more than you can codify "works correctly" as a whole.

Codify huge swathes of it, though? Absolutely. Assert that correct fonts and font sizes are used. Assert that color schemes are correctly followed, or that certain widgets always have attached desirable behaviors, assert that all widgets have keyboard shortcuts, assert that those keyboard shortcuts are coherent (CTRL-F is not assigned to three actions), assert that using the keyboard shortcut has the same behavior as whatever it is the shortcut to, assert that automated validation of appropriate accessibility guidelines hold (ie, for web interfaces assert that a Bobby run over your interface holds no surprises), and so on.
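To make that concrete, here is a rough Python/Selenium sketch of just two of those assertions (approved fonts, no duplicate keyboard shortcuts). The URL, selectors and font list are made-up placeholders, not anything from the article:

  from selenium import webdriver

  EXPECTED_FONTS = {"Helvetica", "Arial", "sans-serif"}  # hypothetical style guide

  driver = webdriver.Firefox()
  driver.get("https://example.com/")  # placeholder URL

  # Assert every heading uses a font from the approved scheme.
  for heading in driver.find_elements("css selector", "h1, h2, h3"):
      family = heading.value_of_css_property("font-family")
      assert any(f.lower() in family.lower() for f in EXPECTED_FONTS), \
          "unexpected font %r on heading %r" % (family, heading.text)

  # Assert no keyboard shortcut (accesskey) is assigned twice.
  keys = [e.get_attribute("accesskey")
          for e in driver.find_elements("css selector", "[accesskey]")]
  assert len(keys) == len(set(keys)), "duplicate accesskeys: %s" % keys

  driver.quit()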

Codify huge swathes of it, though? Absolutely. Assert that correct fonts and font sizes are used. Assert that color schemes are correctly followed, or that certain widgets always have attached desirable behaviors. Assert that all widgets have keyboard shortcuts, that those keyboard shortcuts are coherent (CTRL-F is not assigned to three actions), that using the keyboard shortcut has the same behavior as whatever it is the shortcut to, that automated validation of appropriate accessibility guidelines holds (i.e., for web interfaces, assert that a Bobby run over your interface holds no surprises), and so on.

Sure, you can't "automate usability", but if you just give up at that point you'll miss out on a lot of things you can automate... or could automate if GUIs let you, getting back to my original point.


All of the examples you give are functionality, not how the thing actually looks when it renders on the screen.

Is there some way to write a test that determines whether my page renders correctly on Firefox, Chrome, Safari, IE 6, IE 7, IE 8, and Opera? If so, please show me the way.

In web development, these sorts of things end up being a large percentage of bugs filed in FogBugz, and there's no way to test for them but to have someone look at the page and make sure.


We had a manual test case for the "page looks correct" test. It consisted of more than 10 checks, e.g. "no JavaScript errors", "no misspelled words", "consistent look across pages", "no broken links", etc.

More than half of that list is already automated.
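For instance, the "no broken links" check can be a short script. Here's a rough Python sketch of one way to do it; the starting URL is a placeholder and a real version would crawl more than one page:

  import requests
  from bs4 import BeautifulSoup
  from urllib.parse import urljoin

  START_URL = "https://example.com/"  # placeholder

  page = requests.get(START_URL, timeout=10)
  soup = BeautifulSoup(page.text, "html.parser")

  broken = []
  for a in soup.find_all("a", href=True):
      url = urljoin(START_URL, a["href"])
      if not url.startswith("http"):
          continue  # skip mailto:, javascript:, etc.
      if requests.head(url, allow_redirects=True, timeout=10).status_code >= 400:
          broken.append(url)

  assert not broken, "broken links: %s" % broken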


Yes, bugs still need to be reported, and not all of them yield to automated tests. Nevertheless I do concur with the spirit of the article. I write very demanding test cases, often before anything goes wrong but certainly after it does, even for non-deterministic code (e.g. I'll fork a thousand processes in an automated test and invoke random busy loops here and there to encourage contention.)

I don't do a lot of GUI though, other than HTML interfaces, which are fairly amenable to some automated testing. One time I broke a sign-up form due to a misnamed field, so now I have an automated test for that.

For things like "Capitalization inconsistent in top nav", that's a prime example of why you still need to keep track of bugs. Even if you're working alone, just keep a "TODO" list. You can keep the TODOs in a note-taking program such as Tomboy, but better yet, put them right in the code. Then don't release until "grep -r TODO" comes up clean. (If you want to postpone a TODO change it to LATER.)

It might be difficult to create a regression test for that one, and it probably wouldn't be cost effective, but I suppose you could have a snippet of code that iterated over the list of navigation items and tested the case. Again, probably not worth it. Better just to stomp that TODO and move on.
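If you did want that snippet, it could be as dumb as this Python/Selenium sketch; the URL, the "#topnav a" selector and the Title Case convention are all assumptions:

  from selenium import webdriver

  driver = webdriver.Firefox()
  driver.get("https://example.com/")  # placeholder URL

  # Hypothetical selector for the top nav links.
  labels = [e.text.strip()
            for e in driver.find_elements("css selector", "#topnav a")]

  # Assume the style guide says Title Case; adjust to whatever yours says.
  offenders = [t for t in labels if t and t != t.title()]
  assert not offenders, "inconsistent capitalization in top nav: %s" % offenders

  driver.quit()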

I can't say I have the silver bullet for the examples you list, but the important thing is to keep manufacturing better bullets with higher silver content. I figure I gotta try at least.


Different programming domains have different problems...

With regard to this list:

  * If GUIs were easier to write tests for, then you wouldn't have problems writing tests for the first 3 on your list.

  * Timing issues and edge cases are one of the reasons I value tests, assertions and strong design so much: debugging is hard.

  * The last one sounds like a feature that hasn't been written, not a bug.


Timing issues and edge cases are one of the reasons I value tests, assertions and strong design so much: debugging is hard.

It always seems like when you scratch the TDD surface, "writing tests first lets you write bug-free software" becomes "tests are an important tool along with good design, assertions, etc." I agree with the latter, but I don't see how it's different from earlier methodologies - and earlier methodologies certainly didn't let you write bug-free code.


Check out Selenium.


> * "Try it Now" button sometimes drops below the "Learn More" button in FireFox

Use JavaScript or Selenium for that test case. Get the coordinates of the buttons (getBoundingClientRect()) and compare them. Execute this test case on tons of different configurations.
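Something like this Python/Selenium sketch, say; the URL and link texts are placeholders, and you'd run it across your browser matrix:

  from selenium import webdriver

  driver = webdriver.Firefox()
  driver.get("https://example.com/")  # placeholder URL

  try_it = driver.find_element("link text", "Try it Now")
  learn_more = driver.find_element("link text", "Learn More")

  # The bug was "Try it Now" dropping below "Learn More", so assert the two
  # buttons still sit on (roughly) the same row.
  assert abs(try_it.location["y"] - learn_more.location["y"]) < 5, (
      "buttons no longer aligned: %s vs %s"
      % (try_it.location, learn_more.location))

  driver.quit()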

> * Capitalization inconsistent in top nav

Easy to test. Also add spell checking along the way.

> Popup blockers occasionally stop the App window from opening properly

Selenium?

> * Weird timeout errors clumped around 1:00 am every few nights

You can always shift the clock automatically (on Linux).

> Accounting needs to be able to call up weekly income reports

If it is a missing feature, then no test case is needed until you start working on it.

PS. If you want to automate your manual testing, you will look for ways to do that. If not, you will look for problems which prevent you from doing that.

I wrote an example of how to execute HTMLUnit tests at the build stage because I wanted that. Almost everybody else said that this is not possible. ( http://vlisivka.pp.ua/en/integration_testing_with_maven_test... )


> Weird timeout errors clumped around 1:00 am every few nights

You can always shift the clock automatically (on Linux).

Obviously a problem of this sort is not due to the mere numerical value of the time but rather some daily scheduled process that runs around 1am (perhaps a database backup) which interferes with the web app in some way. This sort of thing can be simulated but it's a non-trivial task and it's probably better to apply those efforts in other ways (by diagnosing the problem, putting in place a performance monitoring system, buying more hardware, etc.)


Typical QA cycle:

  * find the problem;
  * identify the root cause of the problem;
  * write an automated, semi-automated or manual test case which will highlight the problem;
  * fix the problem.
If you don't use test cases at the development and staging stages, then you are just using your production environment for testing. If your downtime costs you nothing, don't bother with test cases.

In our case, we had a similar problem, so I created a very simple test case in shell in less than 2 hours. We spent more than a year until it was fully fixed.


And of course, the testing for any bug can be easily automated. Not!

Look, some things are very easy to test in unit tests: Does sin(180) return the right value? If I do operation X on supposedly const object O, does O have the same value before and after?

And some things are incredibly hard: Does this MP3 sound right after compression? Does the typography for this combination of letters look right? Is the user interface responsive enough? What about the crash that only occurs in just the right hard-to-duplicate circumstances?

Why is it that these hard core test advocates always seem to assume that all bugs are of the former sort?


> Why is it that these hard core test advocates always seem to assume that all bugs are of the former sort?

It's probably for the same reason that evangelists for certain programming styles will dogmatically insist that you shouldn't have more than X lines of code or Y levels of nesting in your functions, without reference to things like the nature of the programming language or the essential complexity of the algorithm to be implemented. Inexperienced people tend to overgeneralise from their own experience, and those who have only ever worked in one small part of the programming world are the most disadvantaged of all.


There are lots of benefits to assuming you can do automated testing on code you write.

Testing makes you think about the specification: what it is you are actually writing and what makes it work.

It makes you think about writing testable code, which means you worry about things like side effects, isolation and APIs.

And sometimes, it helps you end up with automated tests :)

Testing doesn't just apply to mathematical notation; lots of real-world applications suddenly become easily testable when you look at them with the eyes of a tester.


> Does this MP3 sound right after compression?

A quick check for this would be to have a bunch of WAV files that exercise different properties the encoder is supposed to handle, run them through the encoder, read them back out through a standard decoder, and then verify that the waveforms match within a tolerance. With any release, you'll want to do some manual testing, but you can gain higher confidence on small changes between milestones.
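A crude sample-level version of that check might look like this in Python; encode_decode() is a hypothetical stand-in for "run through the encoder, then a standard decoder", and the tolerance is a guess:

  import numpy as np
  from scipy.io import wavfile

  def rms_error(a, b):
      n = min(len(a), len(b))                   # codecs may pad or trim a bit
      a, b = a[:n].astype(float), b[:n].astype(float)
      return np.sqrt(np.mean((a - b) ** 2)) / 32768.0   # normalize 16-bit PCM

  rate, original = wavfile.read("reference.wav")   # placeholder fixture
  decoded = encode_decode(original, rate)          # hypothetical helper

  assert rms_error(original, decoded) < 0.02, "round-trip error above tolerance"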

> Does the typography for this combination of letters look right?

Same sort of thing: generate a bitmap from the output and compare it against a pre-approved sample, using a heatmap to determine differences.
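A rough Pillow sketch of that, with a saved diff image standing in for the heatmap; file names and thresholds are placeholders:

  from PIL import Image, ImageChops

  rendered = Image.open("rendered_sample.png").convert("L")
  approved = Image.open("approved_sample.png").convert("L")

  diff = ImageChops.difference(rendered, approved)

  if diff.getbbox() is not None:                 # None means the images are identical
      # Count pixels that differ by more than a little anti-aliasing wobble.
      changed = sum(count for value, count in enumerate(diff.histogram())
                    if value > 16)
      diff.save("typography_diff.png")           # crude "heatmap" for a human to review
      assert changed < 50, "%d pixels differ noticeably" % changed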

> Is the user interface responsive enough?

Define what responsive means in quantitative terms. For most interactions, you're going to want sub 200 ms. For stuff where you want the user to wait, you'll have to define wait times ahead of time. Maybe you make changes that cause the latter to break because of a significant change in what you're doing, but it's good for people to know about that ahead of time, right?

> What about the crash that only occurs in just the right hard-to-duplicate circumstances?

Hey, nothing's perfect. But this objection is tautological. It's only hard to duplicate because you don't understand it yet.


"verify that the waveforms match within a tolerance" The problem is, that is not easy. Determining how close two pieces of audio sound is the majority of the lossy audio compression problem. You can of course check if you're output is bitwise identical, but when it isn't, you need ABX testing. With people.


Yes, it looks like a manual test.

BUT, when the QA team finds a problem, you can create an automated test case which will look for that particular problem only. Right?

For example, QA noticed "clicks" every few seconds in the resulting sound. Is it hard to create an automated test case for that?


I'm not sure how hard it would be to create a reliable test for clicks. Depends on the type of click, I suppose. Some would be fairly easy to detect (e.g., "for .1s, all samples output are 0, with loud samples on both sides"). Though I'm guessing that would actually result in plenty of false-positives, and would be a fairly carefully tuned (and thus fragile) test case.

A better approach might be to detect it in the frequency domain, after performing an FFT (that instant drop to 0 will generate a lot of energy on both sides). I suspect you'll still need the careful tuning; after all, a sudden burst of energy on your FFT could be a click, or it could be a cymbal.

Not sure how well this would work, I've never tried it, though it sounds like some fun code to write.
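For what it's worth, the naive time-domain version might start out like this in Python/numpy; the thresholds are guesses and would need exactly the careful tuning described above, and the file name is a placeholder:

  import numpy as np
  from scipy.io import wavfile

  rate, samples = wavfile.read("decoded_output.wav")   # placeholder fixture
  samples = samples.astype(float) / 32768.0            # normalize 16-bit PCM

  window = int(rate * 0.1)                             # 100 ms windows
  suspects = []
  for start in range(window, len(samples) - window, window):
      chunk = samples[start:start + window]
      before = samples[start - window:start]
      after = samples[start + window:start + 2 * window]
      # A near-silent chunk sandwiched between two loud neighbours looks like
      # the ".1s of zeros with loud samples on both sides" case above.
      if (np.max(np.abs(chunk)) < 0.001
              and np.max(np.abs(before)) > 0.1
              and np.max(np.abs(after)) > 0.1):
          suspects.append(start / rate)

  assert not suspects, "possible dropouts/clicks at t = %s seconds" % suspects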


Why is it that these hard core test advocates always seem to assume that all bugs are of the former sort?

As someone who's more and more becoming a "hard core test advocate": not all bugs are the former sort! But why would that stop you from testing the ones that are?


For the record, I've written 18,000+ test cases in the last year. When you can write reasonable tests, they are an awesome tool for programming. I just object strongly to the notion that any possible bug can be reduced to an automated test. That's not my experience at all.


This wouldn't work for:

* Public bug trackers

* GUI applications, where writing automated tests is difficult or infeasible. (how would you write a test case for "icon is upside down"?)

* Abstract bugs, such as performance-related bugs or feature requests

* Nondeterministic or otherwise unusual bugs

I'd love to live in a world where all bugs were easily reproducible and even a non-programmer could write a test case. A quick glance over my issue tracker reveals this is not the case.


Raising the cost of reporting bugs might make people want to avoid reporting bugs.

I found I hardly ever used a bug tracker I had installed on a slow server, because it was so slow. So I don't think making bug submission even slower would be a good thing.


It also raises the literal dollar cost for QA people. If you have a large QA operation, you're probably paying semi-skilled non-programmers. If you have to get QA people who can code enough to write a test, they'd cost a lot more.


Yea, because all testers are capable of writing software tests and because all bugs warrant the effort to write a test...

Not sure what this is doing on HN, really.


Well, while I certainly agree that this is perhaps a way off in terms of technical capability, it won't always be the case. I'm not talking about everyone learning to code; that's dead in the water.

But systems which allow business people to define their own acceptance and test criteria exist now - tools like Fitnesse, etc. I'm not saying this is feasible today, but a future can be envisaged where writing a repeatable test which determines whether a function works as desired is within the domain of less technical users.


If you let people without a development background write tests, it's easy for them to say something to the effect of (X > 7) and (X < 4). More generally, ensuring that there are no conflicts in your requirements is a very hard problem.


"But systems which allow business people to define their own acceptance and test criteria exist now - tools like Fitness."

Out of the mouth of a business person, wonderful. Out of the mouth of a developer, worthless.


Communication problems: Sometimes the description of the bug is poor, or the developer misunderstands the problem.

-> Maybe more effort should be placed on improving communication skills for your company? This will greatly improve many other aspects of your business - not just bug logging.

Regression testing. This process requires the handover of code back and forth between QA and development, which causes versioning problems and duplicated testing effort.

-> why? Maybe more effort should be placed on improving communication skills...

Low robustness. The process doesn’t guarantee that the error will appear again in the future.

-> Neither would buggy automated tests - or whom do you need to hire to test the tester?

Bureaucracy. Traditional bug tracking systems switch the team's focus from quality to bureaucracy.

-> If you care about your product's quality and your other team members' development goals, it's not bureaucracy, it is quality. If you can't see the record of past mistakes, you won't see the system as meaningful. Rather, the focus should be to take the current system seriously and work on improving it.


Reporting bugs is already tedious. Sometimes more (e.g. when requiring an account), sometimes less. Asking for a test case would surely reduce the number of "bug reports" to abysmal figures.


Most synchronization issues in multi-threaded applications can't be tested automatically. Also, there are bugs triggered by a specific state of the application that is complex to encode in a test.


[deleted]


...and how do you ensure that the unit tests are correct? Another layer of automated tests? Accept you will make mistakes, but code in a way that makes finding and fixing them easy.

The first way is to make the test logic self-evident so it's easy to audit. Another approach is to use something like Ruby's Heckle tool, which mutates code. If the code can be modified and the tests still pass, then the tests are incomplete or incorrect. Perhaps other languages/platforms have something similar.

...I'd like to see the unit test that checks things look OK in Internet Explorer

While testing aesthetics is pretty much impossible, Selenium can test the functionality in multiple browsers.

or that the system generally does what the client pays for...

If you haven't reduced the client's requests to a usable specification you can write tests from, then that's your fault. Computers can't do your job for you.


My company is developing web applications that are used by companies to optimize internal processes. I recently found out about Selenium and this tool is just great. I wasn't aware that it is possible to write tests for even complex and heavily AJAX-driven web interfaces. Roxxx. Don't get me wrong, it is indeed a lot of work to write those tests, but once written they'll save your time and, more importantly, your life. Give it a try.


Bug Trackers also play an important prioritisation role - a central location to decide what work is most important right now.

If we wrote only unit tests, pretty soon someone would hack up a system to take the names of the tests, and put them in a list. It would be called testzilla.


If you are a developer doing TDD or some sort of automated testing (which you should be!) and you are not writing automated tests for bugs, you are doing it wrong anyway!

If the bug is a GUI inconsistency, a "looks crap on my mom's computer" sort of bug, then you can't automate it anyway. Assuming your testers' moms haven't got serial ports.



