Wednesday, August 20, 2008

Matt's cool developer tool

At the Tech Debt workshop last week, Ron and Chet reiterated Kent Beck's four rules of good software:

1) Runs all tests - and they all pass - with every check-in
2) Contains no duplication
3) Expresses all business intents
4) Minimum Amount of Code

These rules are listed in descending order of importance. For example, if you could have less code by losing intent - don't do it.

Getting support for number one automatically is pretty common, and as simple as hooking up a continuous integration server.

So how can we do the rest?

To explain how to get number two, I need to talk about how compression algorithms work. The simplest compression algorithm works like this:

1) Find all the duplicate text (for example, the overuse of the term "inevitable" in a bad novel)
2) Replace that duplicate text with a symbol that does not otherwise appear in the text, such as "|$"
3) Create a "header" in the file that lists all of the symbols, followed by a unique terminator
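To make those three steps concrete, here is a minimal sketch in Python. It is a toy, not a real compression format: the function names, the "|N$" symbol scheme, the "%%" terminator, and the word-length and repeat-count thresholds are all my own assumptions for illustration.

```python
import re
from collections import Counter

TERMINATOR = "%%"  # assumed terminator; header lines never equal it


def compress(text, min_count=2, min_len=6):
    """Replace repeated long words with symbols and prepend a header."""
    # Step 1: find the duplicate text (here: whole words seen more than once).
    counts = Counter(re.findall(r"\w+", text))
    repeated = [w for w, n in counts.most_common()
                if n >= min_count and len(w) >= min_len]

    # Step 2: give each duplicate a symbol that does not appear in the text.
    table = {}
    for i, word in enumerate(repeated):
        symbol = f"|{i}$"
        if symbol in text:
            raise ValueError(f"symbol {symbol!r} already appears in the text")
        table[word] = symbol
    body = re.sub(r"\w+", lambda m: table.get(m.group(0), m.group(0)), text)

    # Step 3: write a header listing the symbols, then the terminator.
    header = "\n".join(f"{symbol}={word}" for word, symbol in table.items())
    return f"{header}\n{TERMINATOR}\n{body}"


def decompress(data):
    """Read the header back and expand every symbol in the body."""
    header, _, body = data.partition(f"\n{TERMINATOR}\n")
    for line in header.splitlines():
        symbol, _, word = line.partition("=")
        body = body.replace(symbol, word)
    return body


novel = "It was inevitable. The inevitable ending was inevitable."
packed = compress(novel)
restored = decompress(packed)
```

The part that matters for the idea that follows is step one: the compressor already has to know where the duplication is.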

My idea is simple:

Leverage an object-oriented compression algorithm to identify all the duplication in the code. Use some kind of threshold for the duplication - such as multiple lines - or else every time the same variable is referenced it will show up as a duplicate.

Create a results file that lists all of the duplicates, and what line of code they appear on.

The developer uses this to eliminate duplication in the code.
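Here is a rough sketch of what that report could look like, in Python. To keep it short I swapped the compression library for a plain sliding window over lines; `find_duplicates`, `report`, and the three-line default threshold are hypothetical names and values of my own, not taken from any existing tool.

```python
from collections import defaultdict


def find_duplicates(lines, min_lines=3):
    """Map each repeated block of `min_lines` lines to its 1-based start lines."""
    # Ignore leading/trailing whitespace so re-indented copies still match.
    normalized = [line.strip() for line in lines]
    seen = defaultdict(list)
    for start in range(len(normalized) - min_lines + 1):
        window = tuple(normalized[start:start + min_lines])
        if not any(window):
            continue  # skip windows made up entirely of blank lines
        seen[window].append(start + 1)
    # Keep only the blocks that occur more than once.
    return {block: starts for block, starts in seen.items() if len(starts) > 1}


def report(duplicates):
    """Format the duplicates as one results line per repeated block."""
    rows = []
    for block, starts in duplicates.items():
        where = ", ".join(str(n) for n in starts)
        rows.append(f"{len(block)}-line block at lines {where}: {block[0]!r} ...")
    return "\n".join(rows)


source = [
    "a = 1",
    "b = 2",
    "c = a + b",
    "print(c)",
    "",
    "a = 1",
    "b = 2",
    "c = a + b",
    "print(d)",
]
dups = find_duplicates(source)
results = report(dups)
```

One known rough edge: a duplicate longer than the threshold shows up as several overlapping windows, so a real tool would want to merge adjacent hits into a single entry.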

Step two is to integrate it into an IDE to make the refactorings easier (like a good diff tool that also lets you select 'fixes') - or to have some settings that do the refactorings for you. The problem with the second option is that the tool would have to be language-aware, but it wouldn't be too hard to do this for Java or .NET languages.

There you have it. I'm off to find a good OO compression parser in Perl! :-)

UPDATE: I looked into these tools about a year ago, and all I found were tools for a specific language, not generic text analyzers. Simian seems to be able to work on any ASCII text file; I guess I am a day late and a dollar short.

Oh well. It would still make a neat open-source project as part of a portfolio.

1 comment:

Anonymous said...

You may want to check out Simian. I have not looked into it very deeply, but it sounds like it does something similar to what you are describing.