Collection of Collections Is a Code Smell

From Programmer 97-things

Jump to: navigation, search

Some time ago, while writing a scheme to marshal objects to and from a network, I ran into a problem where I was receiving attributes off the stream that had yet to be defined. At first I tried to keep the values in a collection that was embedded in another collection. No matter what I did, the code stitching the objects back together just wasn't gelling. At the root of the problem was how these unresolved references were being (mis)managed. Because order was important, the reconstituted objects where being placed in a stream. The objects that had yet to be defined, however, were being placed in a collection in the stream. This collection of collections was creating edge cases that needed to be handled until the definitions arrived.

It is easy to see from this short description that the missing abstraction was a forward reference. Once a class representing that abstraction was introduced into the model, all of the code needed to manage special cases melted away. Not only was there less code, it was more readable and more robust. Since that experience (in Smalltalk), I've run into a number of other cases where a collection of collections has been replaced by an abstraction. This has resulted in my belief that a collection of collections is a code smell.

Kent Beck's term code smell is used to describe code that is awkward or otherwise doesn't, well, smell quite right. Awkwardness in code, more often than not, points to some underlying problem with the design. Let's explore this through an example where we have a catalog being represented as a list of sections, where each section contains a list of items:

public class Catalog {
    private HashMap<String, ArrayList<Item>> sections = new HashMap<String, ArrayList<Item>>();
    public void addSection(String name) {
        sections.put(name, new ArrayList<item>());
    }
    ...
}

The first thing to notice is that it's not immediately obvious what the ArrayList<Item> actually represents. While this might not be so obvious in this trivial example, imagine burying this in several thousands of lines of code. Without good commenting it would be easy to miss intent and either miss opportunities for reuse or, worse, return the collection to a caller and have that code re-implement logic to manage the collection. While the former case is not uncommon, it is the latter case that is responsible for a significant amount of code bloat. To improve the situation all we needed to add is the missing abstraction for a section. Let's put this into a class called Section and examine the effect it has on the code:

public class Catalog {
    private ArrayList<Section> sections = new ArrayList<Section>();
    public void addSection(Section section) {
        sections.add(section);
    }
    ...
}

public class Section {
    private ArrayList<Item> items = new ArrayList<Item>();
    public void addItem(Item item) {
        items.add(item);
    }
}

If you find the second listing to be more readable than the first, then you've already experienced the greatest benefit: the effect on readability. The biggest argument against encapsulating the inner collection in a class is that it appears that you have more code. This effect can be attributed to the fact that what might be true in the small is often not true in the large. So while it is true that there will be more code in the micro view, the fixed cost incurred by including a separate class is rarely left unamortized in the macro world. More often than not, we do end up with less code, the code left behind is more readable.

The next time you come across a collection or collections, think code smell. And then ask "What is the missing abstraction?" If you pay attention to these code smells, you'll quickly start to notice an improvement in your codebase.

By Kirk Pepperdine

This work is licensed under a Creative Commons Attribution 3

Personal tools