Measure Don't Guess

From Programmer 97-things

Revision as of 18:20, 31 October 2009 by Kevlin (Talk | contribs)

JoGoSlo is very proud of the talented developers they have been able to attract to their organization. They are particularly pleased because their developers have just delivered a complex project on time and on budget. All seems well, except that there is a nagging issue with performance. The developers continue to poke and peek at the code, but they just can't seem to get a handle on the problem. Every time they think they have it nailed, management runs off to the project sponsors to proclaim: "Don't worry, a fix is in the works." At least they do at first.

As each successive "fix" fails to solve the problem, the developers start picking at other parts of the application. One month goes by, still no fix. And then two months, three months, and more. Meanwhile, management is becoming more and more reluctant to face the sponsors, who have now become quite skeptical. So skeptical, in fact, that they've decided to cancel the project because they've spent way more than they budgeted and they still don't have a usable application. Did I mention that the developers delivered on time and on budget?

If all of this seems quite cynical, I would counter that JoGoSlo is actually a composite of many of the different organizations I've had the pleasure of working with. The only mistake that these developers have made is that they've let themselves be distracted by things other than the real cause of the problem. More often than not, the distraction is ugly code. They have guessed that code that ugly must most certainly be the root cause of the problem, only to find out later that it most certainly wasn't.

The mistake that these teams make is that when they encounter a problem, they immediately turn to the code. But like many other bugs, performance problems are often not apparent in the code and show themselves only in the running system. This is because a performance bug is a result of the dynamics of the system, whereas code is just a static element of that system. To find the real cause, you need to take measurements before you dive deep into the code.

The most common tool that a developer will grab is an execution profiler. Unfortunately, this is often the wrong tool, or it is being used way too early. Looking at data produced by any profiler is like drinking from a fire hose. A better approach is to build a time budget by recording a breakdown of end-user response times on a component-by-component, system-by-system basis. With this time budget in hand, you can see which component or system accounts for most of the response time. Once you know where the problem is, you'll know what type of profiling is needed and how to aim that profiler effectively so you aren't overwhelmed with data.
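As a rough illustration of what a time budget can look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `TimeBudget` class, its `measure` and `breakdown` methods, and the component names are hypothetical, and a real system would more likely gather these timings from logs or instrumentation at each tier. The sketch only shows the idea of attributing end-user response time to named components and ranking them.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class TimeBudget:
    """Accumulates wall-clock time per named component (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the budget can be exercised deterministically.
        self._clock = clock
        self._totals = defaultdict(float)

    @contextmanager
    def measure(self, component):
        """Charge the time spent inside the `with` block to `component`."""
        start = self._clock()
        try:
            yield
        finally:
            self._totals[component] += self._clock() - start

    def breakdown(self):
        """Return (component, seconds, share of total) rows, slowest first."""
        total = sum(self._totals.values())
        rows = sorted(self._totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, secs, secs / total if total else 0.0) for name, secs in rows]


if __name__ == "__main__":
    budget = TimeBudget()
    # Stand-ins for real work in each tier; the names are illustrative only.
    with budget.measure("web tier"):
        time.sleep(0.01)
    with budget.measure("database"):
        time.sleep(0.05)
    for name, secs, share in budget.breakdown():
        print(f"{name:<10} {secs * 1000:8.1f} ms {share:6.1%}")
```

The component at the top of the breakdown is where a profiler is worth aiming. Note that this sketch assumes the measured blocks do not nest; nested blocks would be counted twice in the total.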

Obtaining the right measure is always a game-changing event. Armed with this measurement, JoGoSlo was able to fix the problem in hours. More importantly, management was able to face the sponsor with confidence. The alternative? Guess, don't measure, then call me!

By Kirk Pepperdine

This work is licensed under a Creative Commons Attribution 3
