(More Test Estimation to come, but first: An Interlude)
By now many of us know the standard 'barbs' - for example, that when Winston Royce designed the waterfall model, he said it was "risky" and "invites failure".
Or that Tom DeMarco, author of the oft-quoted "Controlling Software Projects: Management, Measurement, and Estimation" has essentially recanted his claim that "You can't control what you can't measure."
(To be more accurate, DeMarco actually argues that the quote may hold. It just turns out that control isn't all that important on software projects these days.)
Then I saw this video by Glenn Vanderburg at the Lone Star Ruby Conference:
Once again someone has taken some of those ideas, deconstructed them, and re-packaged them in a way greater than the sum of their parts. In the video, Glenn goes way back, explaining not only what Winston Royce actually wrote, but also how and why it was perverted into the "waterfall," "standards-based," and "software engineering" approaches of the 1980s and 1990s -- and what we should do about it.
I think you'll enjoy it.
Next time: Still more test estimation to come.
We're getting there. Really.
Schedule and Events
March 26-29, 2012, Software Test Professionals Conference, New Orleans
July 14-15, 2012 - Test Coach Camp, San Jose, California
July 16-18, 2012 - Conference of the Association for Software Testing (CAST 2012), San Jose, California
August 2012+ - At Liberty; available. Contact me by email: Matt.Heusser@gmail.com
Tuesday, September 21, 2010
Test Estimation - VI
So far, we have two ways to predict project outcome:
First, by comparing the test effort to other projects and suggesting it is "somewhere between that of project X and project Y," thus giving us a range.
Second, by calculating test cost as a percentage of the development (or total project) effort, looking at the official schedule, and projecting our expected project length. If we're smart, we also take schedule slips into account when working out that percentage.
A third approach I can suggest is to predict the cycle time - time to run through all the testing once. I find that teams are often good at predicting cycle time. The problem is they predict that everything will go right.
It turns out that things don't go right.
Team members find defects. That means they have to stop, reproduce the issue, document the issue, and start over -- that takes time. More than that, it takes mental energy; the tester has to "switch gears." Plus, each defect found means a defect that needs to be verified at some later point. A large percentage of defects require conversation, triage, and additional mental effort.
Then there is the inevitable waiting for the build, waiting for the environment, the "one more thing I forgot."
So each cycle time should be larger than ideal - perhaps by 30 to 40%.
Then we need to predict the number of cycles based on previous projects. Four is usually a reasonable number to start with -- of course, it depends on whether "code complete" means the code is actually complete or not. If "code complete" means "the first chunks of code big enough to hand to test are done," you'll need more cycles.
If you start to hear rhetoric about "making it up later" or "the specs took longer than we expected, but now that they are solid, development should go faster," you'll need more cycles.
(Hint: When folks plan to make it up later, that means the software is more complex, probably buggier, than the team expected. That means it'll take more time to test than you'd hoped, not less.)
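If it helps to see the arithmetic, here's a minimal sketch in Python. The numbers (a five-day cycle, 35% padding, four cycles) are made up for illustration, not recommendations:

    # Rough cycle-based test estimate -- illustrative numbers, not recommendations.

    ideal_cycle_days = 5         # one pass through all the testing, if nothing goes wrong
    interruption_padding = 0.35  # 30-40% for bugs, re-verification, waiting on builds
    expected_cycles = 4          # from past projects; more if "code complete" isn't really complete

    padded_cycle_days = ideal_cycle_days * (1 + interruption_padding)
    estimated_test_days = padded_cycle_days * expected_cycles

    print(f"Padded cycle: {padded_cycle_days:.1f} days")
    print(f"Estimated test effort: {estimated_test_days:.0f} days over {expected_cycles} cycles")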
So now we have three different methods to come up with estimates. With these three measures we can do something called triangulation - where we average the three. (Or average the ranges, if you came up with ranges.)
When we do that, it's human nature to throw out the outliers - the weird numbers that are too big or too small.
I don't recommend that. Instead, ask why the outliers are big or small. "What's up with that?"
Only throw out the outlier if you can easily figure out why it is conceptually invalid. Otherwise, listen to the outlier.
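Here's a rough sketch of that triangulation step, again in Python with invented numbers; the point is to look at the spread before throwing anything away:

    # Triangulating three estimates (in weeks) -- illustrative numbers only.
    estimates_weeks = {
        "comparison to similar projects": 4.0,
        "percentage of dev effort": 5.0,
        "cycle time x number of cycles": 8.0,   # the outlier -- "what's up with that?"
    }

    average = sum(estimates_weeks.values()) / len(estimates_weeks)
    low, high = min(estimates_weeks.values()), max(estimates_weeks.values())

    print(f"Triangulated average: {average:.1f} weeks (spread: {low:.1f} to {high:.1f})")
    # Don't silently drop the 8 -- first figure out why that model disagrees.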
Which brings up a problem -- all the estimating techniques I've listed so far have a couple of major conceptual flaws. And I haven't talked about iterative or incremental models yet.
They are just a start.
Still more to come.
Tuesday, September 14, 2010
Test Estimation - V
So one way to estimate the testing phase (if you have such a thing), or at least testing activities, is to compare the test effort to the development effort or overall effort on other projects.
Examples:
"We spent about ten solid months on MaxStudio, and only spent two months testing. So I think testing should be about 20% of the overall dev budget."
"We spent a year on StudioViz from kick-off meeting to release, and about a month testing. So I think testing should be less than 10% of overall budget."
Both of these examples are real.
The thing is, after release, we spent the next six months fixing MaxStudio, and took a serious hit in the marketplace and to our reputation.
Likewise, we developed StudioViz incrementally, with many stops along the way to bring the work-in-progress up to production quality. StudioViz was also a browser-based application - well, sort of. It ran in a browser control inside a Windows application, so we were able to 'lock down' the browser to modern versions of Internet Explorer.
What all this means is that if you pick history to do a percentage-of-effort measurement, make sure the development model - the "way you are working" - is relatively similar. Big changes in teams, technology, technique, or method can render these sorts of projections obsolete pretty easily.
But now we have two methods: comparing test effort to test effort on similar-sized projects, and using test effort as a percentage of dev effort. (That is: figure out what percentage of dev effort testing took on previous projects, look at the dev effort for this project, multiply by that percentage, and you get the test effort.)
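As a minimal sketch of that second method - hypothetical numbers, nothing more:

    # Test effort as a percentage of dev effort -- hypothetical history.
    previous_dev_months = 10
    previous_test_months = 2
    test_ratio = previous_test_months / previous_dev_months   # 20% on the last project

    this_project_dev_months = 14    # from the current schedule, plus expected slip
    estimated_test_months = this_project_dev_months * test_ratio

    print(f"Historical test/dev ratio: {test_ratio:.0%}")
    print(f"Estimated test effort: {estimated_test_months:.1f} months")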
Of course, both of those measurements assume that you have roughly the same proportion of developers to testers - but like I said, changing things makes projections based on past experience less and less accurate.
Oh, and successful organizations tend to change. A lot.
Another method, that I've hinted at, is percentage of the overall project. Now for this to work you have to be careful, because it's very hard to measure the effort if you go back to when the idea was a gleam in someone's eye. When I've done this, I've tried to go back to when the initial kick-off happened - at which point the project probably had a full-time project manager, 'assigned' technical staff, maybe a business analyst.
Here's another quick exercise for the folks that complain "that sounds great, but we don't have the data":
Start tracking it.
Seriously. Get a pen, a pencil, maybe your email inbox, and start tracking just the projects you are on, or the 'huge ones' swirling around you. This is easy enough to do in Excel. If you want to get really fancy, start recording when the project was created and its predicted due date, along with when due-date slips occur and how much they slip by.
It turns out that this trivial-to-gather data can be extremely valuable when used to predict the performance of future projects.
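If it helps, here's about as much "tooling" as that tracking needs - a handful of dates per project. The project names and dates below are invented for illustration:

    # A bare-bones project log -- column choices and dates are invented for illustration.
    from datetime import date

    projects = [
        # name,        kickoff,           original due date,  actual ship date
        ("Project A",  date(2009, 1, 5),  date(2009, 9, 1),   date(2009, 11, 15)),
        ("Project B",  date(2009, 6, 1),  date(2010, 5, 1),    date(2010, 5, 20)),
    ]

    for name, kickoff, due, shipped in projects:
        slip_days = (shipped - due).days        # how far past the original date we shipped
        total_days = (shipped - kickoff).days   # overall project length from kick-off
        print(f"{name}: {total_days} days total, slipped {slip_days} days")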
More on that next time.
Thursday, September 09, 2010
Test Estimation - IV
So far, I've pointed out various issues in test estimation, including the fallacies involved in simple, 'naive' test estimation. I also pointed out that it is possible to do some kind of estimate, even if all you say is "um, yeah boss -- two weeks."
A couple of people expressed surprise at this. Laurent Bossavit, a board member of the Agile Alliance, was concerned that people wouldn't take me seriously.
But if you've been around the block a few times, you might just remember a test project or two where a wild guess made without context was the test strategy. Someone with no context walked in, said "testing -- hmm. Four weeks." and walked away. They were the boss, and it is what it is, and the team made the best of it.
Hopefully that isn't what actually happened.
It might have looked like that to an outsider, but there was likely something more going on. Perhaps the manager needed to ship the product in four weeks in order to hit some deadline. Perhaps he needed the staff to stop billing on project X because in four weeks, you'd run out of budget.
Or, perhaps, it's possible that some sort of rational process was going on in his head that we could not see.
I'm going to start with the rational approaches. Don't worry, we'll cover the social approaches, and even agile and iterative models -- but I think the next step is to have some logical reason to believe it will take X, instead of wishful thinking.
It might have looked like "Four weeks" was made up, but it's very possible that some complex process was going on in that manager's head. For example, he might have thought:
Hey, this project is about as big as project A. I've got the exact same technical team as project A. We haven't had much turnover in the dev ranks, either. We've actually learned something since project A, and I believe the team will make fewer mistakes. How long did project A take to test? Oh yeah, four weeks. Okay. I'll use that as the estimate.
Check it out -- our theoretical manager might actually have had a reason for believing in the four-week number.
It's also possible that the manager was considering a number of projects and averaging the lengths to come up with four weeks. It's unlikely the data was available, but the human mind is surprisingly capable of doing those averages, even subconsciously. I prefer to do it with a pen and paper and a little math, to have something to explain to someone who asks "where did the four-week number come from?", but many a tester, lead, or manager can do this sort of comparison in their head without even realizing it.
This happens in other disciplines too. Consider the expert butcher who has spent his entire adult life in the field. You ask him for two pounds of ham and he goes slice, slice, slice - weighs it, and it's exactly 2.002 pounds. Ask him how he did it, and he'll likely say, "I don't know. I suspect cutting meat five days a week for thirty years had something to do with it."
But we can do one better than that. Write down a half-dozen projects. It's unlikely that any of them are specifically like project X. Project A had a different dev team, project B had more testers, project C was riskier with a more complex infrastructure, project D was the first time we had ever used the new webserver, and so on.
So you end up with a range. "Given our previous projects, we expect this test project to take three to five weeks."
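A tiny sketch of that, with invented numbers, just to show how little math is involved:

    # Turning past test phases into a range -- the numbers are invented for illustration.
    past_test_weeks = [3, 5, 4, 3.5, 4.5, 5]   # half a dozen prior projects

    low, high = min(past_test_weeks), max(past_test_weeks)
    typical = sum(past_test_weeks) / len(past_test_weeks)

    print(f"Given our previous projects, expect roughly {low}-{high} weeks "
          f"(typical: about {typical:.1f}).")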
Of course, there's all kinds of reasons that's a bad estimate. But it's better than the first strategy - to make something up - right? And it's likely better than functional decomposition, for reasons we already discussed.
This idea - past history - is a lever, a way to come up with an estimate. In other words, it's a model from which an estimate will "pop out".
When this series is done, you should have several levers. You'll be able to apply several models to your project, get several estimates, and consider which one is correct.
More to come.
Tuesday, September 07, 2010
Test Estimation - III
So previously, I posted that factors outside our own control make accurately estimating the total test schedule impossible. In addition, I posted that even estimating simple, known, static things by breaking them into tasks, then adding up the tasks, is much less accurate than you might think.
Example Three:
Imagine your typical workday that ends at 5:00, and think about when you'll get home. Sure, I'll likely be home from my drive at 5:30, but I might hit traffic, my boss might ask me a question at 4:55 PM, someone in the parking lot might need a jump start. So if you want a commitment from me - to predict when to invite the neighbors over, or when to pull the turkey out of the oven - I should likely say 6:00 PM.
That's 100% time padding.
One hundred per cent.
On a task I've done hundreds, if not thousands of times with known physical 'resources' and very few dependencies.
Which reinforces the conclusion that accurate test estimation is impossible. At least in the sense of estimating a single number for total time without context.
Yet we are tasked with doing it anyway -- or, better yet, with creating that context. So where do we start?
About a year ago my good friend Ben Simo pointed out to me that even if you know nothing at all about a project, you can always estimate it. Here's an example:
Q: "How long will testing take?"
A: "Two Weeks."
See that? I know absolutely nothing at all about the software. Nothing at all. But I made up a number.
For any given project, we likely know more than nothing. So we can certainly do a better estimation than "two weeks."
Next: I'll start talking about how I do that.
Thursday, September 02, 2010
On Test Estimation - II
Another post I made to the Agile-Testing Group recently:
Here's a simple estimation exercise. My honest advice is: don't just read it; actually try it. It takes about two minutes.
To start, think about the space between the bottom of your feet and your knees. Write it down. Then think about the space between your knees and your middle. Write that down.
Then estimate and write down the size of your torso, then your neck, then your head.
Next, add up all five numbers.
Now compare that to how tall you /actually/ are.
That difference - between how you imagine things and how they actually are - is a picture of the difference between task estimates and how long things will actually take.
Except of course, you can see and touch your body and it's been about the same height for decades, whereas code is new and fresh and symbolic and 'tests' are an even-more abstract concept.
When you think about it, tests are a first-order derivative of the code itself. Also, most testing is exploratory in nature - that is, early predictions are not the best predictions.
So would I be reluctant to make task estimates on a testing task, given the typical American shorthand that estimate==commitment? Certainly.
I like to think of this as the test estimation rabbit hole. First, we have to have the bad news that test estimation is conceptually impossible.
Then we figure out how to do it anyway.
More to come.
Wednesday, September 01, 2010
On Test Estimation - I
I posted this yesterday to the Agile-Testing List, thought I would share it here as well:
--- In agile-testing@yahoogroups.com, "daswartz@..." wrote:
>
>
> Can you help us understand why the QA people care whether
> you estimate in hours or points? I'm sure they have a reason, which
> should help us better answer the context for your question.
>
I'm not the Original Poster, but consider you are testing feature X. You break it down into tasks and say it will take 5 hours to "test."
The first build is total garbage. You can't even click the submit button.
The next day, you get a new build. You find five bugs. You get a new build
late in the day - four of the five bugs are fixed, and you find three new ones.
You get the fixes in the morning on day three. You find another bug. At noon,
your boss comes up: "You said this would take five hours to test and you are on
DAY THREE of testing? Wassup with that?"
---> Bottom line, there are elements in how long it takes to do testing beyond
the tester's control. It's generally possible to estimate a test /cycle/ with
some accuracy, but estimating the entire test /process/(*) is rarely possible
unless you know the devs very well and have had some stability in delivered
software quality for some time.
Estimating in terms of points 'smooths' those gaps.
That's one possible explanation, anyway ...
--heusser
(*) - Yes, this pre-supposes that testing is a separate and distinct activity, I
know, we should be involved up front, whole team, etc. I'm with you. But
you gotta walk before you can run. Let's have that discussion on a separate
thread, ok?