Picking for scraping

One of our developers wrote what is essentially a data scraper. The implementation is very brittle and has to become more robust. Short of rewriting it, one technique is to use "pickers" to extract valid values from poor-quality input. With a picker, you declare where to get the data, what type it is, and what default value to return when the data is missing or malformed. A separate picker is written for each way a value is accessed. There tends to be only a handful of data access patterns, so this does not add much code.

For example, the following code threw a NullPointerException (NPE) this morning because result had no "data" child:

Element data = result.getChild("data");
int count = new Integer(Strings.noNull(data.getAttributeValue("count"), "0")).intValue();

Had a picker been used, the code would have returned zero, the default value:

int count = picker(result,"data","count",0);

The key aspect of the picker is that it checks every intermediate value before using it. For the example above, the picker could be implemented as follows:

int picker(Element tree, String elementName, String attributeName, int defaultValue) {
    if (tree == null) {
        return defaultValue;
    }
    Element element = tree.getChild(elementName);
    if (element == null) {
        return defaultValue;
    }
    String attributeValue = element.getAttributeValue(attributeName);
    if (attributeValue == null) {
        return defaultValue;
    }
    try {
        return Integer.parseInt(attributeValue);
    }
    catch (NumberFormatException e) {
        return defaultValue;
    }
}
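
To illustrate the "one picker per access pattern" idea, here is a minimal, runnable sketch with two pickers side by side. It rests on some assumptions: it uses the JDK's built-in org.w3c.dom.Element rather than the Element type in the code above, so that it runs without extra libraries, and the names Pickers, child, pickInt, pickText and parse are hypothetical, chosen only for this example. The second pattern (reading the text of a child element) is likewise an assumed example of another access pattern, not something taken from the scraper itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class; uses the JDK DOM API, not the Element type from the post.
class Pickers {

    // Returns the first direct child element with the given name, or null if absent.
    static Element child(Element tree, String name) {
        if (tree == null || name == null) {
            return null;
        }
        for (Node n = tree.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Access pattern 1: integer attribute of a named child.
    static int pickInt(Element tree, String elementName, String attributeName, int defaultValue) {
        Element element = child(tree, elementName);
        if (element == null || attributeName == null || !element.hasAttribute(attributeName)) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(element.getAttribute(attributeName));
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    // Access pattern 2 (assumed for illustration): trimmed text of a named child.
    static String pickText(Element tree, String elementName, String defaultValue) {
        Element element = child(tree, elementName);
        if (element == null) {
            return defaultValue;
        }
        String text = element.getTextContent();
        if (text == null || text.trim().isEmpty()) {
            return defaultValue;
        }
        return text.trim();
    }

    // Parses an XML string and returns its root element (test scaffolding only).
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element good = parse("<result><data count=\"3\"/><title> Hello </title></result>");
        Element bad = parse("<result><data count=\"n/a\"/></result>");

        System.out.println(pickInt(good, "data", "count", 0));   // prints 3
        System.out.println(pickInt(bad, "data", "count", 0));    // prints 0 (not a number)
        System.out.println(pickInt(bad, "stats", "count", 0));   // prints 0 (no such child)
        System.out.println(pickText(good, "title", "untitled")); // prints Hello
        System.out.println(pickText(bad, "title", "untitled"));  // prints untitled
    }
}
```

Both pickers share the same shape: locate the value, bail out to the default at the first missing piece, and convert only at the end. That shared shape is why a handful of them tends to cover a whole scraper.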