Saturday, December 1, 2007

Memoization with Java Closures

I've been doing some catch-up reading recently, and one idea I've read about on some C# blogs is that we can borrow something from the functional programming world called 'memoization' and implement it neatly using closures to provide an alternative mechanism for some common coding problems.

Basically, memoization allows us to store a result of some kind against the input parameters from which the result was derived, and to retrieve that result again given the same parameters without having to calculate it again from scratch, which may be costly or contrary to the fundamental design. This should sound pretty familiar - if we're thinking about performance we might use an approach like this and call it a cache, and it can also be used as a way to implement the Singleton or 'Multiton' patterns for controlling object creation.

Using the BGGA Closures for Java prototype, we can implement a very simple memoize method like this:

public class Memoization {
static <A, R> {=> R} memoize({=> R} fn) {
boolean cached = false;
R cachedValue = null;
return {=>
synchronized(fn) {
if (!cached) {
cachedValue = fn.invoke();
cached = true;
}
}
cachedValue
};
}
}


This is an extremely basic form - it's only applicable when there are no parameters used to calculate the result - but we can use it to implement a singleton class Foo:

import static Memoization.*;
...
public class Foo {

private static {=> Foo} fooMemoizer =
memoize({=> new Foo()});

public static Foo getInstance() {
return fooMemoizer.invoke();
}
...
}

This works because the call to memoize() creates two local variables, cached and cachedValue, and returns a closure literal which has access to those variables and the fn parameter which knows how to get the value we're after. When we invoke() singletonMemoizer for the first time, cached is false, so fn is invoked, its result is stored in cachedValue, and cached is set to true. When we invoke singletonMemoizer again, cached is now true so we just return cachedValue.

There are neater ways to implement the singleton pattern in Java, but what if we return a value based on some key, where providing the same key multiple times should result in same value being returned each time? In Java 6 we might write:

public class Foo {

private static Map<Key, Foo> instanceMap =
new HashMap<Key, Foo>();

public static Foo getInstance(Key key) {
Foo foo = null;
synchronized(instanceMap) {
foo = instanceMap.get(key);
if (foo == null) {
foo = createFoo(key);
instanceMap.put(key, foo);
}
}
return foo;
}
...
}
This is also very common, and sometimes seen as an example of the 'multiton' pattern. Now let's overload memoize() to cope with this:

static <A, R> {A => R} memoize({A => R} fn) {
Map<A, R> cache = new HashMap<A, R>();
return {A a =>
R value = null;
synchronized(cache) {
value = cache.get(a);
if (value == null) {
value = fn.invoke(a);
cache.put(a, value);
}
}
value
};
}

There's a real advantage to using the memoize() approach here - we've abstracted the caching and thread synchronization logic into a general purpose method, and all we have to provide each time we use it is a simple closure literal which knows how to create a Foo given a particular Key:

import static Memoization.*;
...
public class Foo {

private static {Key => Foo} fooMemoizer =
memoize({Key key => createFoo(key)});

public static Foo getInstance(Key key) {
return fooMemoizer.invoke(key);
}
...
}

This advantage shouldn't be underestimated - when we re-code the same logic over and over, variations inevitably turn up in the form of bugs, which is especially nasty when we're dealing with thread-safety concerns (and you don't need to be explicitly using Threads or java.util.concurrent for thread-safety to be relevant!).

The next steps are to look at how we might write a version of memoize() which will work with multiple keys, but I'll save that for the next post.


UPDATE: Neal Gafter kindly pointed out a school-boy error in the synchronization I used in the first version of memoize() - I've updated the method accordingly. This just emphasises the point though: finding and fixing the bug in one place is far easier and better than finding and fixing all variations across multiple implementations of the same logic.

No comments: