Does Java really lack a sum() function?
I’m writing a lot of code that needs to sum over reasonably long collections. Somewhere on the things-that-you-just-shouldn’t-have-to-write checklist, right next to “a date-parsing library”, is “a function to sum over collections.” The naïve way to do it is simply
public double sum(List<Number> numbers) {
double retSum = 0;
for(Number i : numbers) {
retSum += i.doubleValue();
}
return retSum;
}
At one level, this code is annoying because it’s not generic enough. You’d like to be able to add complex numbers, for instance. Or matrices. Anywhere that ‘+’ is sufficiently “number-like,” you want this function to work. You wouldn’t really want it to work with, say, strings, even though Java overloads ‘+’ for them. Or maybe you would? Essentially, maybe you want to duck-type: if ‘+’ makes any kind of sense for the objects, allow them to be summed. In any case, I want sum() to be a function that takes a Collection of objects of some type T and returns another T.
What’s ickier about this code is that there are many more elegant ways to write it, at least in other languages. A language which supports first-class functions (see Joel Spolsky’s excellent Can Your Programming Language Do This?, whence the remarkable Java-as-kingdom-of-nouns parable) would allow you to define the wonderful map() and reduce() abstractions, whence (to use Python as an example) addition becomes just one use for reduce():
def sum(inList):
"""
Return the sum of the items in inList.
"""
return reduce( lambda x,y: x+y, inList )
One great virtue of map() and reduce() is that they’re highly parallelizable: the sauce that makes Google tick involves breaking many common tasks into a map() step and a reduce() step, then farming those separate steps out across thousands of machines.
My friend Ken Shan, by the way, pointed me to an absolutely wonderful talk by Guy Steele on the need for more parallel style in modern programming, now that we’ve entered the age of multicore machines. Whenever I see any sequential programming, like our sum() function above, I look for some way to turn it into a balanced tree: divide the list up into two sublists, for instance, and then sum the sublists; repeat this recursively until you’re down to the sum of zero- or one-item sublists, which is trivial to compute. Yes, I could write this now in Java, but I’d like a more general construct that would allow a great many functions — apart from sum() — to be likewise parallelized.
First-class functions make those sorts of abstractions easy, but I wonder whether they could be accomplished in Java with a bit more work. Perhaps it can be done with a functor? I know not about functors. This is perhaps something to research.
The broader point is just that it’s very silly not to already have a sum() function written for me — thoroughly generalized, well-tested, and highly optimized.