Creative Chaos: May 2009

Thursday, May 28, 2009

Brian Marick on Acceptance Test Driven Development

I just saw this interview of Brian Marick on infoq.com.

The comments I found most interesting begin 16:00 minutes in. I've tried to summarize them a couple of different ways, with direct quotes or paraphrases, but I just can't seem to capture the whole idea in context. Please invest the six minutes to watch the video from at least 16:00 in - it's worth it.

Also note Brian's follow-up statements at 20:00 in about good testers and what I would call the oblivious school - that many of the initial agile proponents had simply never been exposed to really good, value-adding, professional testers. Also that, as good testers start to come out and get involved in agile, the conventional agile wisdom of software testing is slowly changing.

As to the value of acceptance-test automation, which Brian is challenging 16:00 in, note that I'm somewhere in the middle. My team uses a balanced breakfast approach that includes:

- Story tests executed by a person combined with exploratory testing
- Exploratory testing of high-risk areas when they are changed (loose "charters")
- A regression test suite written in selenese that has roughly 10,000 test steps
- A staging server where we "eat our own dog food" for a week before placing new code in production
- "Slideshow" tests that drive the GUI while a human being watches
- Developers doing TDD and pairing to improve the quality when it moves out of development in the first place

Our test suite /does/ find a number of bugs. What I struggle with is - How much time do we spend writing new test steps? How much time do we spend exploring false error reports and maintaining the tests as the system changes? Could we be doing better things with our time?

"What to do with our time", as they say, is the whole ballgame of software testing. We'd best spend it wisely. If that's true, I hope you'll agree, as I do, that things like the comment above, by Brian Marick, are ... fascinating.

Wednesday, May 27, 2009

I just heard a great quote

I was listening to "How To Build a Lean Startup" when one statement really struck me.

The speaker said that when you change a process or adopt something new, unless the people understand the benefit of change, they will subvert and compromise it until the change goes away - or, at least, becomes superficial and minor.

I see this all the time. Most recently and noticeably when organizations go to "Agile", especially larger organizations.

Oh, yes, sure, we can be Agile, but we want to know when we will be done.

And we have this whole team of analysts, and they aren't going to go away, so we need to find a role for them. Same thing with our PM's. No, we aren't going to go to HR and tell them we want a role called "Coach." C'mon. This is a professional organization.

Oh, yeah, co-located. Riiight. That's going to be a problem with facilities. I'll put a ticket in. Cross your fingers.

And we still need traceability from requirements to test cases, of course.

And requirements? Well yes, we still need them. They need to be written and comprehensive.

... the list goes on.

It's not just me; there's even a C2 wiki page called Big Agile Up Front.

What is the result of this? Organizations aren't getting the "sea change" results Agile Software Development promises. Instead, they may see a 10-20% improvement. And, while 10-20% of 10 million dollars might be just fine for a CIO, to the workers, we have, to paraphrase Brian Marick, "Gone from 'this is the best project I've ever worked on' to 'thanks for making my job suck less.' And it's hard to get excited about that."

So that quote at the beginning really struck me: Of course those organizations compromised. They adapted the "new" method to the existing thinking.

More and more, I'm thinking of explaining Software Development process in terms of principles, values, and tradeoffs. Maybe it's time for another article on methodology design to supplement the first.

Development Process at Socialtext

Software Development Process at Socialtext, in a very, very small nutshell, taken from my latest post to the agile-testing list:

--- In agile-testing@yahoogroups.com, "adam_peter.knight" wrote:
>
>I was recently reading Lisa and Janet's book. In it
>it mentions that Janet's team "release every few iterations
>and might even have an entire iteration's worth of endgame
>activities to verify release readiness."
>

At Socialtext, we've released to production after /every/ iteration for something like 34 of the past 37 iterations. We use 2 week iterations. Let's join our process, already in progress:

Second Tuesday, Wednesday of iteration 1: Product Management works on iteration 2 stories, Dev develops iteration 1 stories, QA tests iteration 1 stories done by developers and "in QA" via story-tests, writing selenium RC test cases, and exploratory methods.

Second Thursday, Friday of iteration 1: devs estimating stories for iteration 2, PM assembles an iteration 2 story pool. Code is 'closed' to new stories, master branch is cut to iteration-2009-MM-DD branch, devs finish existing stories on the branch, move to QA. QA is testing existing stories via story-tests and exploratory methods on master, then that branch.

Last Weekend of iteration 1: Release candidate testing begins (selenium/automated test suite) on branch iteration-2009-MM-DD.

First Monday of iteration 2: Devs/QA 'Sign up' for stories for iteration 2, story kickoffs with whole team, QA continues candidate testing via exploratory coverage of features not automated, slideshows, and exploratory testing in general.

First Tuesday of iteration 2: Devs develop stories on master for iteration 2, QA performs upgrade tests for iteration 1.

First Wednesday of Iteration 2: (hopefully) upgrade goes to staging server, devs develop stories. First few stories may be in QA. QA begins story-testing, working on selenium automated test execution/evaluation backlog, or may perform upgrade testing if the upgrade will go to appliances.

First Thursday of Iteration 2 ... Second Tuesday: Developers develop on master, QA tests on master.

Iteration 1 branch will probably go to prod around the second Wednesday of iteration 2.

Etc.

Essentially, because we release to prod /every/ iteration, we pay a moderate pain in regression testing /every/ iteration, but the pain does not "batch up" for releases. This is a pretty quick summary. For more details, you can consult my chapter of "Beautiful Testing" from O'Reilly. It will be available in October of 2009, but you can pre-order it from Amazon TODAY:

http://www.amazon.com/Beautiful-Testing-Leading-Programmers-Reveal/dp/0596159811/ref=sr_1_2?ie=UTF8&s=books&qid=1243385588&sr=8-2

There are some other people you may have heard of who have contributed to the book, like that "Crispin" person. Something about a donkey, I can't remember exactly.

:-)

regards,

--heusser
twitter: mheusser
blog: xndev.blogspot.com

(PS: Now that I've beaten up on metrics, I have more actual /good/ example metrics, still to come!)

Tuesday, May 26, 2009

Metrics, Schmetrics - III

I just left this as a comment to a post on Software Testing Club. My newest comment, on Page 2 of the thread:

If you are talking about a transactional system - something like amazon.com - where you can measure the books that are successfully delivered and measure the failures, well, sure. Those transactions are roughly fungible. If we can assign a root cause to each failure, then aggregate the root causes and sort in excel, we can start doing problem-solving on the biggest problem first. That's just basic six sigma, and I'd support six sigma for transactional systems.

The problem is we try to take systems that are /not/ transactional and make them such. A lot. We also use metrics to create smoke and mirrors.

The example above is a potential good use of metrics - and I'm remiss for not mentioning it earlier. In my experience, only a small percentage of software projects have metrics that can fit this paradigm.

Then again, there are a few thousand people who read this blog in a month, and none of them mentioned it either. So I suspect the percentage is relatively small, indeed.
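
For the curious, the "aggregate the root causes and sort" step from that comment doesn't strictly need Excel. Here is a minimal sketch in Python, assuming each failed transaction has already been tagged with a single root-cause label. The labels and the function name rank_root_causes are made up for illustration; they are not from any real system.

```python
from collections import Counter

# Hypothetical failure log: one root-cause label per failed delivery.
failures = [
    "address invalid", "out of stock", "address invalid",
    "payment declined", "address invalid", "out of stock",
]

def rank_root_causes(causes):
    """Return (cause, count) pairs, most frequent first."""
    return Counter(causes).most_common()

# Print the biggest problem first, so problem-solving can start there.
for cause, count in rank_root_causes(failures):
    print(f"{count:3d}  {cause}")
```

This only works because the failures are roughly fungible. If one "failure" is a lost book and another is a week-long outage, counting them in the same column hides more than it shows.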

Do you stackoverflow?

Stackoverflow is a relatively new community for software developers. Its goal is to connect people with questions to people with answers. Joel Spolsky, of JoelOnSoftware fame, is one of the co-founders of the venture. I'm a longtime JoS reader, so I decided to check it out.

Questions are searchable, and lo and behold, there's a bunch of testing questions. Yayyy. I left a particularly long comment on one this morning - a user asking questions about a potential master's thesis on black-box record/playback tools.

I dare to say the site will attract people of slightly higher intellectual bent than some ... other forums, and I'd encourage you to check it out.

Yes, it's free, and yes, I have no commercial relationship with Joel or any of his ventures. It's just good stuff. Sheesh. :-)

Thursday, May 21, 2009

Second-Class Citizens III

I'm reading Weapons of Mass Instruction: A Schoolteacher's Journey through the Dark World of Compulsory Schooling by John Taylor Gatto. It's a great little book. One thing I saw struck me enough to write about. Gatto quotes the German philosopher Immanuel Kant, who had four questions he believed were key to education:

"What can I know?
What may I hope?
What ought I to do?
What is Man?"

Gatto went on to write this:

"All of Kant's questions must be grappled with before a useful curricula can be set up to reach the ends you wish. But if you duck this work, or are tricked into ceding it to an official establishment of specialists (or coerced into doing the same thing), it shouldn't surprise you to find yourself and your children broken on the wheel of somebody else's convenience, someone else's priorities."

Well, let's look at those questions again and reframe them:

"What is our organization's larger goal - or mission?
How does the testing group fit in, support, and help accomplish that mission?
What are the problems? What are our risks? What risks matter?
What are our values? What tradeoffs can we make to get something we value and abandon something we do not?
How, then, should we test?"

These are key questions for any testing group. But I can't tell you how many times I have seen those questions abandoned, because they are "philosophy" or "not practical", or simply ducked, or, perhaps worst of all - some "expert", generally in a suit, answers the questions for us.

Some expert that has never even been in our office, never examined our software, never met our team members, is going to prescribe a solution.

Without examining the patient.

Let's look again at how Gatto feels about that, shall we?

"All of Kant's questions must be grappled with before a useful curricula can be set up to reach the ends you wish. But if you duck this work, or are tricked into ceding it to an official establishment of specialists (or coerced into doing the same thing), it shouldn't surprise you to find yourself and your children broken on the wheel of somebody else's convenience, someone else's priorities."

Among others, three common arguments for a standardized testing curriculum are that it allows us to stop re-inventing the wheel, it eliminates the bottom layer of ignorance from the craft, and it allows us all to use the same words for the same thing, so we can stop spending time arguing about definitions and get on with the work.

I believe the first argument, that a standard curriculum would help us stop re-inventing the wheel, is nothing less than offering to help us duck the tough questions. That's bad, my friends. And even if it could be done, are the testing challenges at Yahoo in 1997 the same ones that a big bank, telecom, or government institution will have? Somehow, that seems unlikely.

As for the second argument, it is appealing at first, I admit. But the real bottom layer of the craft - people so ignorant they don't realize they are ignorant - is like any other craft - they believe the work is trivial, that anyone can do it, that it takes little or no training. Ask yourselves: Does a two-day course with completed certificate remove this cruft, or does it threaten to trivialize testing still more? Of the dozen+ test certifications currently in place, how many trivialize the work?

Perhaps it might be fair to say how many DON'T trivialize the work. I can think of a small minority: The Black-Box Software Testing Training offered by the Association for Software Testing is the first to come to mind, but it doesn't promise to help you duck the tough work, or do it for you.

The third argument is the decreased communications friction from not having to say "when I say system test, I mean this." This argument is strongest in my mind. It is a real benefit. That doesn't mean you're actually good at testing, but it does mean there will be a slight decrease in the operational cost of the work.

It's just that, well, I know where I stand. And I value that benefit less than some of the other unintended consequences. I believe I listed tradeoffs in my questions above, right?

Ultimately, whatever you do, whatever you support, I'm not going to condemn you. I'm just trying to explain the dangers of letting someone else decide for you.

As an industry, we seem determined to complain that we are treated as second class citizens.

What do I think are the keys to getting first class?

Well, one place to start is the questions above.

Tuesday, May 19, 2009

Metrics, Schmetrics - II

I've been putting off writing this. It is, to be honest, a little painful.

Some metrics, like expenses, income, and cash flow, for example, are really really important. You need to track them.

And, in a vehicle, you certainly want to know how fast you are going and if your tank is empty or full.

All of those examples are fungible - a gallon of gas can be traded for any other gallon of gas. A penny saved is a penny earned. They are all the same.

Yet test cases, lines of code, these things are not the same. You can have a test case that takes two hours to set up, or a half-dozen similar ones you can run in thirty seconds. If you are measured by test cases executed per day, which do you think you are going to focus on?

I've mentioned that point before, but I thought it was worth mentioning twice.

But wait, there's MORE!

This is the part I didn't want to write. What's the purpose of your metrics program? Well, to be terribly honest, these are reasons I have seen for corporate metrics programs:

1) The people on the helpdesk, in operations and finance have them. Without them, we look kinda dumb.
2) Metrics seem to /prove/ things. Without metrics we are down to our stories; why should senior management believe those stories?
3) Some auditor told us we had to have them to be mature.
4) Because we /desire/ easy control over software projects.

Hopefully the previous post about metrics convinced you that, like the beer commercial where the Swedish bikini team suddenly appears, the promise and the result rarely align.

There are, however, other reasons to gather metrics than formal programs designed to create 'control.' Do-ers, and even managers, can gather metrics every day in order to understand what is going on in the system.

Example: Querying the bug tracker to figure out which browser is the most problematic - and should get the most testing time - is a reasonable thing. Do it once and you're likely to get unbiased results - no one was manipulating the system when you took the sample.

Now, on the other hand, if you set a corporate goal to decrease the % of released bugs in Internet Explorer and measure it every week, you are almost certain to introduce dysfunction.

So metrics as a tool by individuals to improve performance - with no intent of evaluating people or "controlling" the process? Sure. I'm all for it.
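
That kind of one-off query can be a few lines of code. Here is a minimal sketch in Python, assuming you have exported the bugs as a list of records with a "browser" field. The record layout, the sample data, and the function name bugs_by_browser are all hypothetical; a real tracker will have its own export format.

```python
from collections import Counter

# Hypothetical export from a bug tracker; the field names are assumptions.
bugs = [
    {"id": 101, "browser": "IE6"},
    {"id": 102, "browser": "Firefox 3"},
    {"id": 103, "browser": "IE6"},
    {"id": 104, "browser": "Safari 3"},
    {"id": 105, "browser": "IE6"},
]

def bugs_by_browser(records):
    """Count bugs per browser, most problematic first."""
    return Counter(r["browser"] for r in records).most_common()

for browser, count in bugs_by_browser(bugs):
    print(f"{browser}: {count}")
```

Run it once, decide where to spend testing time, and throw the script away. The value is in the one-time look, not in a weekly chart.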

So how can we respond when asked for metrics for some of those ... less noble reasons above?

The pyramid of information

As a do-er, Joe has to manage himself. His boss, the manager, has to manage ten people. His boss, the director of software engineering, has to coordinate ten projects - and his boss, the VP of Information Systems, has ten big projects, three corporate initiatives, and fifty small projects going at one time.

The information received from each person - then each project - has to get smaller as you go up the chain. Middle management metrics seem to focus on process, while senior management cares about outcome.

This causes a disconnect when middle management presents those metrics and senior management asks, awkwardly "so ... we have 300 open bugs. What does that mean to me, exactly?"

If you're struggling with metrics, one solution is to give middle management better ones - metrics that will actually address the concerns of the big boss.

In my experience, what metrics does the big boss want?

For each project
- Is it on time?
- Is it on budget?
- Is it on features?
- Is it at risk for some other reason?
- How's the ROI looking?
- How do you feel about the quality?

What's the best way to do this, in my experience?

Make a spreadsheet. For each project, have the projected go-live as a column. The next column is either Green (good), Yellow (hmm), Orange (lookout) or Red (oh dear). In the next column, explain the color. If you've got one, the next column is a link to a wiki page with the detailed status.
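
If you'd rather generate that spreadsheet than hand-edit it, here is a minimal sketch in Python that writes the columns described above as CSV, which any spreadsheet program can open. The column titles, project names, dates, and wiki paths are hypothetical placeholders, and rejecting unknown colors is my own addition.

```python
import csv
import io

ALLOWED = ("Green", "Yellow", "Orange", "Red")
COLUMNS = ["Project", "Projected go-live", "Status", "Why", "Details"]

def status_sheet(rows):
    """Render project status rows as CSV text, rejecting unknown colors."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        if row[2] not in ALLOWED:
            raise ValueError(f"unknown status color: {row[2]}")
        writer.writerow(row)
    return out.getvalue()

# Hypothetical projects, dates, and wiki links, for illustration only.
projects = [
    ("Billing rewrite", "2009-07-01", "Green", "On track", "wiki/BillingStatus"),
    ("Search upgrade", "2009-06-15", "Orange", "Vendor drop is late", "wiki/SearchStatus"),
]

print(status_sheet(projects))
```

The color check is there only to keep the sheet to the four colors above; the "Why" column is still written by a person, which is the point.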

The big boss can scan and drill into any project he has concerns about in a very traditional way - by talking to people. Your spreadsheet gives him the tools to know which projects need drilling.

A "metrics expert" would point out that the spreadsheet above is a qualitative metric, which does not enable a quantitatively managed process.

For a response, I'd send him a link to "Metrics, Schmetrics Part I".

You have reached the end of this brief article. And if each reader of Creative Chaos has one year of experience, then combined, that's a few thousand years. What metrics have you had success with? I'd like to know - and, likely, so would a thousand other people reading this.

Thursday, May 14, 2009

Second Class Citizens, ReDux

I just posted this to the Software-Testing Yahoo Group:

I've seen the second-class citizen issue in many disciplines. I first read it explained well in one of John Bruce's Essays. Bruce puts it this way:

"There's one way I've found -- though certainly not the only way -- that you can tell a job has gone south. It's when the work suddenly focuses on its most rudimentary component. I'd been writing, coordinating, and publishing all the documentation at the USC computer center, a responsible position. Within a very short period I'd become nothing but a typist. Later, in other jobs, I'd find the work had suddenly changed from being a system engineer to being the kid who pushes a cart around, replacing coffee-sodden keyboards and used-up printer cartridges. That's a sign that it's time to go, or if you don't go of your own accord, someone will make the decision for you. When this happens, the window of opportunity isn't wide."

Now my words:

By its "most rudimentary component", I suspect Bruce means the things that someone would see visibly if they did not understand the role. Without understanding, technical writing is transcription, helpdesk is call forwarding, project management is baby sitting, status reporting, and Gantt charts. Programming is "just" translating from the language of humans to the language of computers, and testing is, say, "just" /verifying/ the translation was correct.

The steps that happen next are predictable: Management decides the work needs to be done, but shouldn't interfere with the busy business of production. They create two policies: First, to save money by hiring people who aren't very bright, and second, to create a /process/ to make sure those people who aren't very bright don't screw things up.

This is a self-fulfilling prophecy. You pay peanuts, create a heavy prescriptive process, lock people into roles you define ... and then you are consistently disappointed at the results those people produce. (To paraphrase Saint Thomas More: What can we say, but that we first create criminals, and then punish them?)

So, when you're told what to do by someone who's never done your job themselves, it might be time to go.

As I said before, this doesn't just happen to testers. This story is, to my knowledge, best documented in the role of the Quality Engineer in the North American Automotive Industry. It doesn't seem to be working out too well for them.

When I move into a shop, I do my best to impress my peers and clients so much that they will leave me alone to 'do my thing.' When pushed to follow a process, I'll likely say something like "I appreciate your input, and I'll keep that in mind when I make my decision on how to proceed." (Keep in mind, I've never worked in avionics or life critical systems.)

So far, it seems to be workin' for me.

Later, my friend, former co-worker, and current writing partner Chris McMahon added this:

I have worked in life-critical systems, and our team implementing unique processes that contradicted the state of the practice in the late 90s (well, we thought they were unique until the Agile Manifesto was published) saved lives. Literally, saved lives.

If you see a better way, take the better way.

Thanks, Chris, I appreciate that.

More Metrics Schmetrics, still to come.

Monday, May 11, 2009

Amazon Review

Did you know that James Bach refers to Jerry Weinberg as the "Prince of Testers"?

Did you know that Jerry led the first documented independent test team, in the 1950's?

Did you know that Jerry came out with a book on testing last year, called Perfect Software: And Other Illusions about Testing?

I've just put up my review on Amazon. If you've ever felt "stuck" explaining the impossibility of complete testing or framing expectations - or answering a question that begins "Couldn't you just ..." - you might want to buy a copy and find a subtle way to get your boss to read it.

It makes a great Birthday, Christmas, or "Anniversary with the company" gift. Or just buy a copy, leave it on your desk, and when some snoop starts leafing through it, let 'em borrow it. At twenty bucks, it is one of the cheapest investments you can possibly make in your career and work environment.

(I have no financial relationship to Jerry or Dorset House. Really. It's just a good book.)

Thursday, May 07, 2009

Metrics, Schmetrics

Long-time readers will know that I am very wary of metrics for software engineering. Oh, there are the usual problems:

1) Generally, software engineering metrics are proxy metrics. You really want to measure productivity, but you can't, so instead you measure lines of code. Or you really want to measure quality, but you can't, so you measure defects.

2) If you measure something, you're likely to get it - but people will take short-cuts in order to get that thing. Generally, this will involve exploiting the difference between what you want and what is actually measured - for example, a developer will argue "that's not a bug" if measured by bug count, or a tester, if measured by bugs found, may search in the documentation and file a bug on every single typo. DeMarco and Lister refer to this as "dysfunction" in their book Peopleware.

3) Likewise, software engineering metrics (often) measure things that are different and put them all in one box. Instead of measuring dollars or widgets, all of which are interchangeable, we measure tests, bugs, or lines of code. These are /not/ interchangeable - some could take much more time to find/create than others - yet putting them in the same box means they are all treated the same. The result? You'll get a /lot/ of very small things. Try this with projects - watch the size of your portfolio shoot up ... but each project is smaller. Isn't it strange how that happens?

4) Even if you can measure well, there are probably some things you have not measured - and to achieve the metrics, the team will usually trade those "intangibles" off. The classic example of this is hitting a date by allowing quality - which is hard to measure - to suffer. In the 21st century, with more advanced techniques, we are getting better at assessing product quality, so the next thing to take on is usually technical debt.

5) The classic answer to this is to have a balanced scorecard - to measure several things, such that a tradeoff to increase one thing will cause a visible decrease in another. But consider how hard it is to measure technical debt - or strength of relationships - and consider how expensive it is to try to create and maintain an exhaustive metrics system. By the time the metrics system is in place, you could have shipped a whole new product. Can we really call that improvement?

Getting Metrics right is /hard/. Consider McDonald's, a multi-billion dollar corporation, that measures the price of its food, sales, and repeat customers. What do they not measure? I suspect McDonald's does not measure the waistlines of its best customers, treatment of animals in its food pipeline, and, until lately, effect on the environment of its waste.

When I explain the challenges with Software Engineering Metrics to folks, I usually get one of two reactions: Either strong agreement "I always felt that way but didn't have the words", or no response at all. It's not that people who don't respond at all don't care - they are usually strong proponents of metrics in a software group. They simply have an opposing viewpoint and yet have no answer to the dysfunctional issues caused by metrics.

To which I will add one more thought experiment:

I belong to twitter, which counts the number of people who follow me. This is a simple, concrete, hard measure of my popularity. I can use my twitter score as an objective number to argue my case before a book company, a magazine publisher, or a conference - in some cases, this could directly result in more bookings and higher revenue for the still-exists-but-tiny Excelon Development.

Yet if /all/ I cared about was that one metric on twitter, I would adjust what I write to appeal to all people working in testing. Then to anyone doing any kind of software work. Then I'd generalize to knowledge work. Then I'd go mass-market and try to talk about technology in the 21st century. And the message would get weaker and weaker and weaker and ...

So, to be true to myself, I need to ignore my twitter ranking, ignore my technorati ranking, and try to generate real relationships and create content that's actually worth reading.

It's funny how that works, eh?

Wednesday, May 06, 2009

In Defense of Testers

I just ran into this short article that explains some of the long-term benefits of applied critical thinking and the tester's perspective. Stick with it through page two; it's worth it.

In the meantime, I'm working on a one-page article that provides a brief explanation of topics in Agile-Testing. If you're interested in peer review, drop me a line, or leave a comment with your email address. (You may want to hide it, e.g. matt dot heusser at gmail dot com - etc, to prevent spam.)

Monday, May 04, 2009

May Issue of Software Test and Performance

Chris McMahon and I talk about Unit Testing Tools in our column in Software Test&Performance this month, on page nine. As always, it's a free download.

Friday, May 01, 2009

Three Kinds of Improvement

Malcolm Gladwell, the author of Outliers: The Story of Success, is credited with saying that "Talent is the desire to practice", and I tend to agree.

In his article in The New Yorker on the subject, Gladwell claims that instead of promoting raw talent, regardless of qualification, we should instead create /systems/. While I like the general theory of systems, too often in practice it turns into scripts and rules that, in the words of Mr. Barry Schwartz "prevent disaster, but guarantee mediocrity."

In other words, I agree with the data gathered by Mr. Gladwell, but not his conclusion. There must be a third way.

What if, instead of raw talent or systems, we focused on building skills in our people through conscious practice?

Four thousand years ago, the author of Proverbs wrote that Iron Sharpens Iron, and, as such, man can sharpen man.

Some people won't improve, or even meet minimum standards. I would not want to "inflict help" on them; I would want to avoid hiring them. A small number of rules can create some visibility and control for a very small price. And, to those that are amenable to it, conscious practice can bring real improvement.

I'm inclined to think it's a little of all three.
