A method we have in our common library is a check that a string is null, empty, or contains only whitespace. I suspect everyone has this method somewhere in their code. Ours was written a decade ago and we have not had need to replace it -- it works and has no obvious downside to normal operations.
However, every now and then I am reminded of something I learned at Tazz Networks when one of the teams was reviewing performance bottlenecks. The code that lowercased Unicode strings was contributing to well over 1% of the total running time. The only reason for this high overhead was that at no time did anyone -- senior or junior -- state what the standard letter case was for hash table keys. And so, for example, the key butter was found in the code as "Butter", "butter", "BUTTER", and my favorite, k.upper(), only to be lowered soon after! None of these uses came from outside of the system (that is, user data) and so the development organization had complete control over their representation.
So what about the empty string check? The code is
return s == null || s.trim().length() == 0;
The use of trim always bothered me because, whenever there is leading or trailing whitespace to remove, it allocates a new string containing the original string without that whitespace. I would have been less bothered, perhaps, if we used this check infrequently, but some of the core code needs to check for empty because the development organization never set the standard for an empty string. If all empty strings could be guaranteed to be of zero length or, better yet, a sentinel value, the check would then be
return s == EMPTY_STRING;
which is going to be very fast.
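A minimal sketch of that sentinel idea, assuming blank strings are normalized once at the point where they enter the system. The class name, the normalize helper, and the definition of EMPTY_STRING are hypothetical; our library has nothing like this today.

```java
/** Hypothetical sketch: normalize blank input once so later checks are a reference comparison. */
final class EmptyStrings {
    // Assumed definition of the sentinel; only its name appears in the text above.
    static final String EMPTY_STRING = "";

    /** Hypothetical helper, called once where a string enters the system. */
    static String normalize(String s) {
        if (s == null || s.trim().isEmpty()) {
            return EMPTY_STRING; // every null or blank input collapses to the same object
        }
        return s;
    }

    /** Only meaningful for strings that have already passed through normalize(). */
    static boolean isEmpty(String s) {
        return s == EMPTY_STRING; // reference comparison, never looks at the content
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(normalize("   ")));    // normalized blank input
        System.out.println(isEmpty(normalize("butter"))); // normalized non-blank input
    }
}
```

The catch is the same one as with the hash table keys: the reference comparison is only correct if every string really does go through the normalization step, which is an organizational guarantee rather than something the code can enforce.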
So, for now, the empty check has to look at the content of the string. I replaced the original check code with
if (s != null) {
    for (int len = s.length(), i = 0; i < len; i++) {
        if (!Character.isWhitespace(s.charAt(i))) {
            return false;
        }
    }
}
return true;
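For reference, here are the two checks side by side in a self-contained class. The class and method names are made up for illustration. One thing worth knowing before swapping one for the other: trim() strips characters at or below U+0020, while Character.isWhitespace follows the Unicode definition, so the two can disagree on unusual input such as an em space (U+2003) or a NUL character.

```java
/** Illustrative wrapper; the class and method names are hypothetical. */
final class BlankCheck {
    /** The original check: trim() may allocate a trimmed copy of the string. */
    static boolean isBlankWithTrim(String s) {
        return s == null || s.trim().length() == 0;
    }

    /** The replacement check: scans the characters and allocates nothing. */
    static boolean isBlankWithScan(String s) {
        if (s != null) {
            for (int len = s.length(), i = 0; i < len; i++) {
                if (!Character.isWhitespace(s.charAt(i))) {
                    return false; // first non-whitespace character ends the scan
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { null, "", "   ", "\t\n", "butter", "  butter", "butter  " };
        for (String s : samples) {
            System.out.println("[" + s + "] trim=" + isBlankWithTrim(s)
                    + " scan=" + isBlankWithScan(s));
        }
        // An em space is Unicode whitespace but is above U+0020, so the checks differ here.
        String emSpace = "\u2003";
        System.out.println("em space: trim=" + isBlankWithTrim(emSpace)
                + " scan=" + isBlankWithScan(emSpace));
    }
}
```

For ordinary ASCII spaces, tabs, and newlines the two agree, which covers everything our code passes in as far as I know. The scanning version is the one I kept.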
In the case where the candidate string is the empty string or a string without leading whitespace, the new implementation is about 18% faster than the trim version (100M iterations). When the candidate string contains just one leading whitespace character, it is 481% faster. The candidate string needs 34 leading whitespace characters before the two methods have similar run times.
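If you want to try a comparison like this yourself, a rough timing loop along the following lines is enough to see the trend. This is not the harness behind the numbers above; the class, the helper names, and the iteration count are assumptions for the sketch, and a naive System.nanoTime loop is sensitive to JIT warm-up, so treat its output as indicative only. A proper benchmark tool such as JMH would give more trustworthy figures.

```java
import java.util.function.Predicate;

/** Rough, hypothetical timing sketch; not the harness used for the figures in the text. */
final class BlankCheckTiming {
    static boolean withTrim(String s) {
        return s == null || s.trim().length() == 0;
    }

    static boolean withScan(String s) {
        if (s != null) {
            for (int len = s.length(), i = 0; i < len; i++) {
                if (!Character.isWhitespace(s.charAt(i))) {
                    return false;
                }
            }
        }
        return true;
    }

    // Results are accumulated here so the calls cannot be optimized away as unused.
    static int sink;

    /** Runs the check repeatedly on one candidate and returns the elapsed nanoseconds. */
    static long timeNanos(Predicate<String> check, String candidate, int iterations) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (check.test(candidate)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        sink += hits;
        return elapsed;
    }

    /** Builds a string of n space characters. */
    static String spaces(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int iterations = 10_000_000; // assumed value; the text above used 100M
        String[] candidates = { "", "butter", " butter", spaces(34) + "butter" };
        for (String c : candidates) {
            // Warm-up pass so both checks are compiled before they are timed.
            timeNanos(BlankCheckTiming::withTrim, c, iterations);
            timeNanos(BlankCheckTiming::withScan, c, iterations);
            long trimNs = timeNanos(BlankCheckTiming::withTrim, c, iterations);
            long scanNs = timeNanos(BlankCheckTiming::withScan, c, iterations);
            System.out.println("length " + c.length() + ": trim=" + trimNs / 1_000_000
                    + " ms, scan=" + scanNs / 1_000_000 + " ms");
        }
    }
}
```

The four candidates mirror the cases described above: empty, no leading whitespace, one leading space, and 34 leading spaces.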
I am going to keep the new code. Small things matter, especially when they are used frequently.