Give me 100% Code Coverage or Give Me Death!

I once knew a manager who, having just bought a license for a code coverage tool, suddenly mandated a 75% coverage minimum for all unit tests. Not an unreasonable act in itself, since it would surely result in more tests and thus higher-quality code. Wouldn't it?

However, the unintended consequence was that the harried developers ingeniously solved their problem in ways reminiscent of a Soviet farming collective eager to meet an unreasonable grain quota after a bad year:

1) They removed untested but essential error checking, handling and reporting from their code. Who needs a seatbelt and air-bags anyway?

2) They added "tests" that exercised additional code but didn't actually make any meaningful assertions about its behaviour (see the sketch after this list). Hey, I don't know what it did, but it didn't crash!

3) They used features of the coverage tool to exclude difficult-to-test code from the analysis process (technically this is called "lying").
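
To make the second trick concrete, here is a minimal sketch in Java with JUnit 4 (the PriceCalculator class and its 20% tax rule are hypothetical, invented purely for illustration). Both tests produce exactly the same coverage figure, but only one of them proves anything:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class PriceCalculatorTest {

        // A hypothetical class under test, included only to keep the
        // example self-contained.
        static class PriceCalculator {
            double totalWithTax(double net) {
                return net * 1.20; // assume a flat 20% tax for illustration
            }
        }

        // The coverage-inflating "test": it executes totalWithTax, so its
        // lines count as covered, but it asserts nothing about the result.
        @Test
        public void exercisesTheCalculatorWithoutCheckingAnything() {
            new PriceCalculator().totalWithTax(100.00);
            // It didn't crash. Good enough?
        }

        // A meaningful test of the same code: the coverage figure is
        // identical, but this one pins down the expected behaviour.
        @Test
        public void addsTwentyPercentTax() {
            assertEquals(120.00, new PriceCalculator().totalWithTax(100.00), 0.001);
        }
    }

The coverage tool cannot tell these two tests apart; only a human reading them can.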

The coverage statistics went up and everyone was awarded their tin medal from the Supreme Commander (it may have been a pen or company T-shirt, I really can't remember).

Of course we can all laugh now, because we do Test-Driven Development (TDD): not a line of production code is ever written before we have a test for it, so our coverage is always 100%. Yeah, right... Even with the best of intentions it just doesn't work out that way, and sometimes justifiably so, since there are unavoidable sources of untestable code. For example:

1) Language or library features that force us to write code to handle exceptions or conditions that "just can't happen" in the context in which we're using them (see the sketch after this list). Testing the "impossible" condition or unused method might be achievable, but doing so may be time-consuming and ultimately a contrived and worthless process.

2) Oceans of auto-generated code regurgitated by trusted (if not trustworthy) third-party tools, often as untestable as it is unreadable.

3) User interface code whose tests cannot easily be automated, or perhaps cannot run at all on the headless continuous integration servers that execute the tests and collect the metrics.

4) Code that can only be executed during integration testing, for example when it must run inside complex, esoteric container environments (OSGi, J2EE, JAIN SLEE, etc.) where the coverage tools cannot easily be used.

5) Finally (my favourite), the quirks of the coverage tool itself, the language, and the compiler, which together conspire to frustrate by, for example, generating hidden extra code or methods unbidden and then penalising you for not covering them with your tests. No two coverage tools ever give the same results.
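
The first of these is easy to demonstrate in Java. In the sketch below (the MessageDecoder class is hypothetical), the String constructor declares a checked UnsupportedEncodingException, yet UTF-8 is a charset every conforming JVM is required to support, so the catch block can never legitimately execute:

    import java.io.UnsupportedEncodingException;

    // A hypothetical decoder, used only to illustrate the "can't happen" branch.
    public class MessageDecoder {

        public static String decode(byte[] payload) {
            try {
                // Every conforming JVM is required to support UTF-8...
                return new String(payload, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                // ...yet the checked exception forces this branch to exist.
                // Covering it would mean contriving an impossible condition.
                throw new AssertionError("UTF-8 is guaranteed to be supported", e);
            }
        }
    }

The coverage report will dutifully flag the catch block as uncovered, and no reasonable amount of testing will change that.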

So where does this leave us? Is code coverage simply a useless measure that succeeds only in damaging our fragile self esteem and depriving us of medals?

Well, yes: if it is used in a vacuum of knowledge about the code under test, and as an arbitrarily imposed hurdle of a certain height.

When used in a different spirit, however, it can be invaluable.

What if we accept that there are categories of code that must exist but that genuinely need never (or can never) be executed under the glassy eye of the coverage tool? If we eliminate that code from our consideration and from the metrics, what are we left with?

We are left with code for which 100% coverage should be attainable and a natural consequence of TDD. Anything less requires attention, since it indicates one of the following:

1) A genuine hole in the tests: some feature is not being exercised by them.

2) Some code has become redundant and can be deleted, for example after refactoring.

3) Some unreached code has been newly written and might be eliminated through refactoring.

4) Poor tests have resulted in non-determinism (e.g. varying thread interactions, timeouts, retries) that doesn't necessarily lead to test failure.

5) Some new, genuinely necessary but untestable code has been written and, as a last resort, must be excluded from the coverage metrics by telling the tool to ignore it (as sketched below). By doing so we know there are no other lingering issues, as detailed above.
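
One way to make that last resort explicit, assuming JaCoCo as the coverage tool (other tools offer their own exclusion mechanisms), is a project-local marker annotation. Recent versions of JaCoCo omit from the report any class or method carrying an annotation whose simple name contains "Generated" and whose retention is CLASS or RUNTIME; the annotation name below is our own invention and simply relies on that rule:

    import java.lang.annotation.Documented;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;

    // A hypothetical project-local marker. JaCoCo filters out code annotated
    // with any CLASS- or RUNTIME-retained annotation whose simple name
    // contains "Generated", which this name deliberately does.
    @Documented
    @Retention(RetentionPolicy.CLASS)
    public @interface ExcludeFromGeneratedCoverageReport {
    }

Annotating a genuinely untestable method with this keeps the exclusion visible in the code and reviewable, rather than buried silently in the build configuration.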

In conclusion, then, a 99% coverage figure alone sounds fantastic but tells you nothing about the quality of the tests or the reliability of the code. That missing 1% might be some vestigial oversight, or it might be the code that deploys the aircraft undercarriage in severe crosswinds - who knows?

If, on the other hand, you are confident that the coverage tool has been used intelligently, as an aid to the development of a meaningful test suite, then 99% coverage shows that something is wrong.
