Schedule and Events



March 26-29, 2012, Software Test Professionals Conference, New Orleans
July 14-15, 2012 - Test Coach Camp, San Jose, California
July 16-18, 2012 - Conference of the Association for Software Testing (CAST 2012), San Jose, California
August 2012 and beyond - At liberty; available. Contact me by email: Matt.Heusser@gmail.com

Thursday, May 07, 2009

Metrics, Schmetrics

Long-time readers will know that I am very wary of metrics for software engineering. Oh, there are the usual problems:

1) Generally, software engineering metrics are proxy metrics. You really want to measure productivity, but you can't, so instead you measure lines of code. Or you really want to measure quality, but you can't, so you measure defects.

2) If you measure something, you're likely to get it - but people will take short-cuts in order to get that thing. Generally, this will involve exploiting the difference between what you want and what is actually measured - for example, a developer will argue "that's not a bug" if measured by bug count, or a tester, if measured by bugs found, may search the documentation and file a bug on every single typo. DeMarco and Lister refer to this as "dysfunction" in their book Peopleware.

3) Likewise, software engineering metrics (often) measure things that are different and put them all in one box. Instead of measuring dollars or widgets, all of which are interchangeable, we measure tests, bugs, or lines of code. These are /not/ interchangeable - some could take much more time to find or create than others - yet putting them in the same box means they are all treated the same. The result? You'll get a /lot/ of very small things (there's a small sketch of this after the list). Try this with projects - watch the size of your portfolio shoot up ... but each project is smaller. Isn't it strange how that happens?

4) Even if you can measure well, there are probably some things you have not measured - and to achieve the metrics, the team will usually trade those "intangibles" away. The classic example is hitting a date by allowing quality - which is hard to measure - to suffer. In the 21st century, with more advanced techniques, we are getting better at assessing product quality, so the next trade-off is usually to take on technical debt.

5) The classic answer to this is to have a balanced scorecard - to measure several things, such that a tradeoff to increase one thing will cause a visible decrease in another. But consider how hard it is to measure technical debt - or strength of relationships - and consider how expensive it is to try to create and maintain an exhaustive metrics system. By the time the metrics system is in place, you could have shipped a whole new product. Can we really call that improvement?
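To make point 3 concrete, here is a minimal, hypothetical sketch in Python (the testers, bugs, and numbers are invented for illustration, not taken from any real project). If the only metric is raw bug count, the tester who files dozens of documentation typos outscores the tester who finds a handful of hard, expensive defects:

# Hypothetical illustration of the "same box" problem: counting
# non-interchangeable things (trivial typos vs. hard-to-find defects) as equal.
bugs_filed = {
    "tester_a": [
        {"summary": "crash on save", "hours_to_find": 6},
        {"summary": "data loss on sync", "hours_to_find": 9},
        {"summary": "wrong total on invoice", "hours_to_find": 4},
    ],
    "tester_b": [
        {"summary": "typo in docs, page %d" % page, "hours_to_find": 0.1}
        for page in range(40)
    ],
}

# The metric treats every bug as interchangeable ...
for tester, bugs in bugs_filed.items():
    print(tester, "bug count:", len(bugs))

# ... so tester_b "wins" 40 to 3, even though tester_a spent hours per find
# and arguably delivered far more value. Reward the count and you will get
# more of the count - and a lot of very small things.

Nothing in that toy example is specific to testing; swap in projects or lines of code and the distortion is the same.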

Getting metrics right is /hard/. Consider McDonald's, a multi-billion dollar corporation, which measures the price of its food, its sales, and its repeat customers. What does it not measure? I suspect McDonald's does not measure the waistlines of its best customers, the treatment of animals in its food pipeline, and, until lately, the effect of its waste on the environment.

When I explain the challenges with software engineering metrics to folks, I usually get one of two reactions: either strong agreement ("I always felt that way but didn't have the words"), or no response at all. It's not that the people who don't respond don't care - they are usually strong proponents of metrics in a software group. They simply have an opposing viewpoint and yet have no answer to the dysfunctional issues caused by metrics.

To which I will add one more thought experiment:

I belong to Twitter, which counts the number of people who follow me. This is a simple, concrete, hard measure of my popularity. I can use my Twitter follower count as an objective number to argue my case before a book company, a magazine publisher, or a conference - in some cases, this could directly result in more bookings and higher revenue for the still-exists-but-tiny Excelon Development.

Yet if /all/ I cared about was that one metric on Twitter, I would adjust what I write to appeal to all people working in testing. Then to anyone doing any kind of software work. Then I'd generalize to knowledge work. Then I'd go mass-market and try to talk about technology in the 21st century. And the message would get weaker and weaker and weaker and ...

So, to be true to myself, I need to ignore my Twitter ranking, ignore my Technorati ranking, and try to build real relationships and create content that's actually worth reading.

It's funny how that works, eh?

12 comments:

David Starr said...

Yet making a choice to deliberately measure something in pursuit of improving it can't really be considered bad, can it? Your points just underline how important it is to choose wisely.

AbbotOfUnreason said...

Good points all. In addition, I think too many metrics done poorly leads to waste. Metrics for metrics' sake.

Twitter is a great example. Even if you try to use those metrics when talking to publishers, the numbers are skewed: there are so many robot followers out there that, for many people, having a bunch of followers is simply an artifact of posting frequently.

Michael Bolton http://www.developsense.com said...

"Yet making a choice to deliberately measure something in pursuit of improving it can't really be considered bad, can it?"

It can be considered dangerous, though, and Measuring and Managing Performance in Organizations (Robert D. Austin) makes the danger explicit. It is this: when you announce that you are measuring a self-aware agency (typically a person, but also a group) on some dimension, that agency will respond by optimizing its behaviour towards improving that dimension. But value for services (such as software development or testing) is highly multi-dimensional, highly context-sensitive, and subject to lag times between the performance of a task, the measurement of it, the control action, and the next round of performance.

Austin shows the ways in which measurement leads to distortion and eventual dysfunction. That is, unless you can arrive at complete observation and measurement, groups either learn to lie about the measurement but still provide value, or adhere to the measurement and introduce dysfunction. You can get something close to complete measurement in manufacturing, but it's impossible for human, non-repetitive activities such as design and development. So what's the solution?

Austin notes that if you're a manager and you make people responsible for the value that they provide, and explicitly avoid measuring them yourself (but encourage them to measure themselves and keep the measurements to themselves), they will tend towards optimal delivery of value. Essentially, tell them what you like and don't like about the product of their labour, but let them figure out how best to get it to you. He calls it "delegatory management", and it's more or less like the XP principle of the self-organizing team. It's also consistent with the approaches in the Positive Deviance movement.

It's a fascinating book; I guarantee that those who read it will find something of value in it. Those who haven't read it should, in my opinion, take a break and come back when they're done reading.

---Michael B.

PlugNPlay said...

Seems to be working, Matt.

Geordie

Anonymous said...

The most sensible piece of wisdom I've heard about metrics in agile is that they are best used to assess and improve on specific problems. You identify an area that needs improvement; if it is sanely measurable, measure and improve. Once it's not a problem any more, find something else to improve.

With this way of looking at things, metrics are not a universal set of permanent benchmarks but tools to be used in specific situations and set down when the situation doesn't pertain anymore.

Janet Gregory said...

Before I consider what metric I might need, I always ask myself what problem I am trying to solve. That way, like Alevin said, I can stop measuring when the problem has been solved and move on to the next most important problem.

Shrini Kulkarni said...

In the recent past, I have seen some "reasonable" arguments along the lines of "OK, metrics are bad ... shall we reject and stop using metrics TOTALLY?" ...

Jason Gorman writes about it here (the example of Max Planck and quantum mechanics is a good one - that is a metric). He paints an extreme picture of the metrics debunkers ...

http://parlezuml.com/blog/?postid=798

Note that the use of metrics for all sorts of code coverage numbers is what Jason appears to be talking about. What about the IT world ... things like defect density, defect leakage, cost of quality and so on ... it is pretty messy there.

Alan Page makes his point here - he must have finished his talk on the same topic at STAR EAST by now ...

http://blogs.msdn.com/alanpa/archive/2009/04/11/metrics-mayhem.aspx

What are your views...?

Shrini

Shrini Kulkarni said...

Do not forget to add two things --
(text from Jerry's books - paraphrasing him)

1. First-order measurements and second-order measurements. First-order metrics help us to understand how things work, while second-order metrics help us to optimise and improve what we understand.
Software metrics are first-order measurements --- we mistake them for second-order ones.

That brings us to the second point:

2. Inquiry metrics and control metrics... Inquiry metrics prompt us to ask questions about what is being presented, whereas control metrics tempt (or tease) us to take actions based on the information (one of several possible streams) projected by the producer of the metrics.

Metrics are multidimensional, rich information squeezed into a single-dimension number... it is much like putting a ship into a bottle.

A metric can have potentially thousands of stories connected with it ...

Shrini

Matthew said...

From a link Shrini put up in a comment on my blog:

"But dismiss all software metrics out of hand at your peril. Because I know that software teams who don't
measure quality tend not to deliver very good software. Successful development teams use metrics. Of course, there are plenty of teams who don't measure anything and who think they're successful. There are plenty of people who think they have psychic powers, too."

How could he possibly /know/ that? Shouldn't he have some metrics to back that up? Why, if he doesn't have metrics, he must have /psychic powers/.

In other words, his entire argument boils down to "... but I know a lot of successful companies that use metrics, so just trust me."

Yes, I grant that metrics can be used to understand and model system and team behavior. Shrini did a /great/ job summarizing Weinberg there.

More to come.

Anonymous said...

Here's an idea -
1) Make your measurements mean something (e.g. align them with goals, or at least put a little thought into what they are telling you).

2) Put some thought into what might change if you measure that. See if there's a better way. If not, note that you'll need to monitor that measurement more closely.

3) Monitor and make sure the measurement is telling you what you thought it would. If not, fix it or dump it.

Alternate plan:
Give up on measurement.

Both options sound pretty straightforward to me.

Joe Beck said...

I think you're probably both misinterpreting what Jason Gorman is trying to say in his blog post.

He's not saying that people who don't use metrics must be psychics. He's drawing a parallel between people who claim their code is good quality but have no hard evidence to back that up (metrics) and people who claim to be psychic but won't submit to scrutiny of their claims under scientific conditions.

And he's definitely not saying "I know some successful companies that use metrics, therefore just trust me". He's saying that, in his experience, teams that don't use metrics tend to deliver poor-quality software, and that's probably because they don't have testable (measurable) goals for quality.

And he's definitely not talking about just code coverage metrics. Or canned metrics in general.
Check out his Agile Metrics Design workshop to see what he really thinks about metrics:

http://www.parlezuml.com/metrics/doyougetwhatyoumeasure.htm

Matthew said...

Thanks, Joe. That link really helps.