First off, I've revised the title of the series. I'm all for automating work that can be described and /precisely/ evaluated.
For example, let's say you have a PowerOf function. To test it, you could write a harness that takes input from the keyboard and prints the results, or you could write something like this:
is(PowerOf(2,1),2, "Two to the first is two");
is(PowerOf(2,2),4, "Two to the second is four");
is(PowerOf(3,3),27, "Three to the third is twenty-seven");
is(PowerOf(2,-1),undef,"PowerOf doesn't handle negative exponents yet");
is(PowerOf(2,2.5),undef,"PowerOf doesn't handle fractional exponents yet");
And so on.
When you add fractional or negative exponents, you can add new tests and re-run all the old tests, in order.
That is to say, this test can now run unattended, and it will be very similar to what you would do manually. Not completely - because if the PowerOf function takes 30 seconds to calculate the answer, which is unacceptable, it will still eventually "Green Bar" - but hopefully, when you run it by hand, you will notice the problem. (And if you are concerned about speed, you could wrap the tests in timer-based tests.)
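The timer-wrapping idea might look something like this. The original snippets read like Perl's Test::More, but here is a Python sketch; the pow_of stand-in, the timed_is helper, and the one-second budget are all my assumptions for illustration, not part of any particular framework.

```python
import time

def pow_of(base, exponent):
    # Stand-in for the PowerOf function under test (an assumption).
    return base ** exponent

def timed_is(actual_fn, expected, budget_seconds, description):
    # Check both the answer and the wall-clock time it took to compute.
    start = time.time()
    actual = actual_fn()
    elapsed = time.time() - start
    assert actual == expected, description
    assert elapsed < budget_seconds, f"{description} (took {elapsed:.2f}s)"
    return actual

timed_is(lambda: pow_of(2, 2), 4, 1.0, "Two to the second is four")
```

With a check like this, the 30-second PowerOf would fail the run instead of quietly Green-Barring.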
Enter The GUI
As soon as we start talking about interactive screens, the number of things the human eye evaluates goes up. Wayyy up. Which brings us back to the keyword or screen capture problem - either the software will only look for problems I specify, or it will look for everything.
Let's talk about real bugs from the field
The environment: a software-as-a-service web-based application that supports IE6, IE7, Firefox 2, Firefox 3, and Safari. To find examples, I searched Bugzilla for "IE6 transparent"; we've had a few of those recently. (I do not mean to pick on IE; I could have searched for Safari or FF and gotten a similar list.) That does bring up an interesting problem: most of the bugs below looked just fine in other browsers.
Here are some snippets from actual bug reports.
1) To reproduce, just go to IE6 and resize your browser window to take up about half your screen. Then log into dashboard, and see "(redacted element name)" appear too low and extra whitespace in some widget frames.
2) Page includes shows missing image in place of "Edit" button in IE6 and IE7
3) In IE6 only, upload light box shows up partly hidden when browser is not maximized.
4) In IE6 and IE7, comment's editor has long vertical and horizontal scroll bar.
5) In IE6, at the editor UI, there are thick blue spaces between the buttons and the rest of the editor tools
6) To reproduce, in IE6, create some (redacted), then check out the left-most tab of (redacted 2). The icons for type of event are not the same background color as the widget itself. (see attachment)
All of these bugs were caught by actual testers prior to ship. I do not think it is reasonable to expect automated tests to catch them unless you were doing record/playback testing. And if you were doing record/playback testing, you'd have to run the tests manually first, in every browser combination; they'd fail, so you'd have to run them again and again until the entire sub-section of the application passed. Then you'd have a very brittle test that worked under one browser and one operating system.
That leaves writing the test after the fact, and, again, you'll get no help from keyword-driven frameworks like Selenium - "Whitespace is less than half an inch between elements X and Y" simply isn't built into the tool, and the effort to add it would be prohibitive. If you wanted to write automated tests after the bugs were found, you'd have to use a traditional record/playback tool, and you would then have two sets of tests.
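To make the point concrete, here is a sketch of the kind of custom check a keyword framework doesn't give you: measuring the whitespace between two elements' bounding boxes. The element names and coordinates are hard-coded assumptions for illustration; in real life you would have to pull the geometry out of the browser for every element pair you care about, which is exactly the effort I'm calling prohibitive.

```python
# Hypothetical layout check: is the vertical whitespace between two
# elements within bounds? Geometry is hard-coded for illustration;
# a real test would have to query the browser for it.

def vertical_gap(box_above, box_below):
    # Boxes are (top, left, width, height) tuples, in pixels.
    bottom_of_upper = box_above[0] + box_above[3]
    return box_below[0] - bottom_of_upper

save_button = (100, 20, 80, 30)    # assumed coordinates
cancel_button = (150, 20, 80, 30)  # assumed coordinates

gap = vertical_gap(save_button, cancel_button)
# 48px is roughly half an inch at a typical 96 DPI.
assert gap <= 48, f"Too much whitespace: {gap}px"
```

And that only covers one pair of elements, in one layout, at one window size - the IE6 bugs above involved resized windows, missing images, and background colors, each of which would need its own bespoke check.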
That brings up a third option - slideshow tests that are watched by a human being, or that record periodic screen captures that a human can compare, side-by-side, with yesterday's run. We do this every iteration at Socialtext to good effect, but those tests aren't run /unattended/. Thus I change the name of this series.
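A crude version of that side-by-side comparison might automatically flag runs whose screenshots drifted too far from yesterday's, then hand the flagged pairs to a human rather than auto-failing. This is a sketch under assumptions: the pixel data is fabricated, and the 5% threshold is arbitrary.

```python
def percent_changed(yesterday, today):
    # Compare two same-sized screenshots, given as flat pixel sequences.
    if len(yesterday) != len(today):
        raise ValueError("screenshots are different sizes")
    changed = sum(1 for a, b in zip(yesterday, today) if a != b)
    return 100.0 * changed / len(yesterday)

# Fabricated 8-pixel "screenshots" for illustration.
old_run = [0, 0, 0, 255, 255, 255, 128, 128]
new_run = [0, 0, 0, 255, 255, 255, 128, 64]

drift = percent_changed(old_run, new_run)
# Flag the pair for human review rather than failing outright:
needs_review = drift > 5.0
```

The key design point is that the machine only narrows the pile; a person still makes the call, which is why these runs are not /unattended/.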
I should also add that problems like "too much whitespace" or "a button is missing, but there is a big red X you can push" are fundamentally different from a crash or a timeout. So if you have a big application to test, it might be a perfectly reasonable strategy to create hundreds or thousands of keyword-driven tests that make sure the basic happy path of the application returns correct results (of the results you can think of when you write the tests).
We have discussed unit and developer-facing test automation along with three different GUI-test driving strategies. We found that the GUI-driving, unattended strategies are really only good for regression - making sure what worked yesterday still works today. I've covered some pros and cons for each, and found a half-dozen real bugs from the field that we wouldn't reasonably expect these tests to cover.
This brings up a question: what percentage of bugs fall into this category, how bad are they, and how often do we have regressions, anyway?
More to come.
Schedule and Events
March 26-29, 2012, Software Test Professionals Conference, New Orleans
July, 14-15, 2012 - Test Coach Camp, San Jose, California
July, 16-18, 2012 - Conference for the Association for Software Testing (CAST 2012), San Jose, California
August 2012+ - At Liberty; available. Contact me by email: Matt.Heusser@gmail.com