What is a Memoizer and why should you care about it?
A Memoizer is a class created by Doug Lea/Brian Goetz. It's basically a create-and-store cache. That is, it has one very useful property:
- A Memoizer is a Generic class that encapsulates the process of creating a value, and remembering the value (memoizing it) so that subsequent calls to get the value will return the memoized value, and not call the Factory method to create it.
Why do you care?
- A Memoizer encapsulates the process of "getOrCreate()" which is highly useful when you are getting a resource that does not change by a key (think cache) and which is generally considered to be expensive to create.
But, you ask, I already can do that, why do I need a Memoizer? Glad you should ask. Memoizer is unique in its solution, because it caches a value by key, but its implementation is very efficient. It has the following additional properties:
- Calls to get items are concurrent
- Calls to get items for a particular key are guaranteed to call the Factory method for that key once and only once
It's important to understand that what the Memoizer does is both efficient, and correct. It's relatively easy to roll your own "
getOrCreate()
" method that has only one of those properties (and most people do). It's not so easy - or obvious - to do both and that's why you should use a Memoizer - and not roll your own.To review, let's see the strategy most people would use for calling the Factory method once and only once. Here's how to call the Factory method once and only once:
/*** SUB OPTIMAL IMPLEMENTATION THAT IS CORRECT, BUT NOT CONCURRENT ***/
private final Factory<A,V> factory = ...;
private final Map<A, V> map = new HashMap<A, V>();
public V getOrCreate(A key) {
synchronized (map) {
if (map.contains(key)) { return map.get(key); }
// create
V value = factory.create(key);
map.put(key, value);
return value;
}
}
But of course, the line
synchronized (map)
should be a tip-off that this implementation is not concurrent since there is only a single lock protecting access to our map. Can we do better? Sure. Let's use a ConcurrentHashMap to make our
getOrCreate()
method concurrent (but as we'll see, will not have the once-and-only-once property):
/*** SUB OPTIMAL IMPLEMENTATION THAT IS CONCURRENT, BUT NOT CORRECT ***/
private final Factory<A,V> factory = ...;
private final ConcurrentMap<A, V> map = new ConcurrentHashMap<A, V>();
public V getOrCreate(A key) {
if map.contains(key) { return map.get(key); }
// map doesn't contain key, create one -- note that first writer wins,
// all others just throw away their value
map.putIfAbsent(key, Factory.create(key));
// return the value that won
return map.get(key);
}
So, those are the two implementations, each implements either correctness (once and only once) or performance (concurrent puts and gets) but neither accomplishes both simultaneously. Of course, that's the whole point of this article. To introduce you to Memoizer. So how does Memoizer ensure correctness (once and only once) while being simultaneously concurrent?
The trick is to delay the call to the Factory create method into the future. We already know how to make the operation concurrent, but making the operation concurrent means that the first writer will win, and all the other writers will lose. So we delay the Factory create call by wrapping that call in a FutureTask. In other words, instead of putting the actual value into the map, we put a Future (which is a wrapper for getting the value sometime in the future) into the Map.
(Note: If you have not yet become familiar with the
Future
and FutureTask
classes introduced in the JDK 1.5, you should)By putting a Future into the map - and not our actual value - we can move the work of calling the Factory create method into the future - specifically we can move it until after the winner of the race to put the value into the map has been determined. That enables us to call get - which calls create on the Factory - on the one and only Future instance that is contained within the Map.
The full code for Memoizer illustrates this trick. Note that the
Computable
interface specifies a generic "Factory". Also note that I have not been able to find a library or canonical reference for Memoizer. The best I have is here: Memoizer.java.Here it is re-created for your convenience (note that I have adjusted it slightly, for example allowing the user to specify the number of segments created in the underlying ConcurrentHashMap):
public interface Computable<A, V>
{
V compute(A arg) throws InterruptedException;
}
public class Memoizer<A, V> implements Computable<A, V>
{
private final ConcurrentMap<A, Future<V>> cache;
private final Computable<A, V> c;
public Memoizer(Computable<A, V> c)
{
this(c, 16);
}
public Memoizer(Computable<A, V> c, int segments)
{
this.cache = new ConcurrentHashMap<A, Future<V>>();
this.c = c;
}
public V compute(final A arg) throws InterruptedException
{
while (true) {
Future<V> f = cache.get(arg);
if (f == null) {
Callable<V> eval = new Callable<V>() {
public V call() throws InterruptedException {
return c.compute(arg);
}
};
FutureTask<V> ft = new FutureTask<V>(eval);
f = cache.putIfAbsent(arg, ft);
if (f == null) {
f = ft;
ft.run();
}
}
try {
return f.get();
} catch (CancellationException e) {
cache.remove(arg, f);
} catch (ExecutionException e) {
LaunderThrowable.launderThrowable(e);
}
}
}
}
You will of course need the
LaunderThrowable
implementation to compile this example. For a full code listing, hop on over to Terracotta, where I show not only a full working example, but demonstrate how it works with Terracotta across two nodes:Full Memoizer Example with Terracotta