Sunday, September 16, 2007

Algorithms 101 - How to eliminate redundant cache misses in a distributed cache

I'm going to jump right in to a fairly complex subject, for background check out my recent Distributed Cache Webcast here: Online Training.

In the Webcast, I wrote a simple caching service that fronts a simple "GetService" method. I define "GetService" as some service that, for a given key, can retrieve a given value. Pretty basic stuff, if you have ever implemented a SQL query, a Web Service client, or something similar, you can probably envision the implementation. The interface looks like this:

public interface GetService<K, V>
{
public V get(K k);
}

The Typical Solution


Here's where it gets interesting. The common way to implement a cached version of a GetService method would be to perform the following operations:

  1. Check if K exists in the cache

  2. If not, load V from the Service (a database for example) using K

  3. Put (K,V) into the cache

I kid you not, this is the pattern that seems to be popular, and accepted, as best practice. For example, memcached describes this operation thusly in perl:
sub get_foo_object {
my $foo_id = int(shift);
my $obj = $::MemCache->get("foo:$foo_id");
return $obj if $obj;

$obj = $::db->selectrow_hashref("SELECT .... FROM foo f, bar b ".
"WHERE ... AND f.fooid=$foo_id");
$::MemCache->set("foo:$foo_id", $obj);
return $obj;
}

The problem with this approach is that it simply ignores the race conditions that happens when you have a cache miss. So what, big deal, the gap is really small, right? Ok, fine, if you like races in your code. I don't like races in my code, you shouldn't either. Worse, in the context of a distributed cache, races don't just cause corrupted data, they cause performance degradation.

To solve the race, we have to fix a couple of issues...

Step 1 - Change to First Writer Wins


First, we have to change from a Last Writer Wins to a First Writer Wins strategy.

In the the Last Writer Wins strategy, all of the racing writers put their own version of the value retrieved from the underlying get service into the cache. The last one to do so will end up with their version of the value in the cache. So if there are 10 nodes racing through this code, the underlying service - let's say a database -- will be hit 10 times, and each of the 10 writers will put 10 unique values into the cache (note that the exact value depending on the implementation specifics and timing of changes to the database).

We want to change to a First Writer Wins strategy, so that the cache will return a canonical value regardless of the number of races. To fix to a First Writer Wins, we make one small change to the algorithm. Let's look at the implementation in code:

public final class CachingService<K, V> implements GetService<K, V>
{
...
public V get(K k)
{
V v = cacheStore.getValue(k);
if (v != null) { return v; }

V v = getService.get(k);
return cacheStore.putIfNotExists(k, v);
}
}

In place of Put there is a method called putIfNotExists. This method basically relies on our underlying cache implementation to implement the First Writer Wins strategy. This may be easy or hard, depending on your cache implementation. Lets assume it can be implemented for now.

So, have we solved the problem?

No.

Step 2 - Eliminate redundant reads


The next thing we have to do is eliminate the multiple reads that happen due to multiple racing writers. This happens in step 2 after each racing writer has discovered that there is a cache miss. What we have to do is coordinate the actions of all of these cache writers so that only one of them goes to the underlying cache service and retrieves the value, puts the value into the cache, and then notifies the other pending writers (now readers) that the value is available.

Sounds easy right? How would you do this? I would think it would be next to impossible in memcached. Maybe your underlying caching technology gives you some locking and messaging APIs - maybe it doesn't. If not there's always RMI, JMS, JGroups, SOAP, ESB, EJB, etc. Sound like fun? Ugggh, not to me.

And how do any of these deal with failures of the node requesting the data? You'll need to let one of other pending readers in after the failure and let it do the read. Correctly handling the entire bevy of problems that happen in a distributed environment is the reason memcached suggests taking the easy way out. It's simply too hard to manage all of the possible failure scenarios.

Don't Give up Just Yet


But wait. That's why Terracotta is such a powerful solution. It doesn't just give you a durable memory store to write to - which of course it does.

In fact, Terracotta extends the entire Java Memory model across the cluster - which means writing code to synchronize and coordinate the actions of many writers is a simple matter of writing natural Java code.

I hope writing natural Java sounds like more fun than using JMS, RMI, proprietary API, and the like. It does to me. If you like writing Java, you'll like writing clustered Java with Terracotta.

So the problem we need to solve is to write some code that can ensure that the value v for some key k is retrieved once and only once. Ensuring mutual exclusion to ensure only one writer reads the value is really trivial if we know we can just rely on Java synchronization - on a single node or many nodes across the cluster which is exactly what we get with Terracotta.

Here's the code:

public class DynamicCacheElement<K, V> implements CacheElement<V>, Serializable 
{
...
public synchronized V get()
{
if (v != null) { return v; }
return v = getService.get(k);
}
... (factory methods to follow)
}

That's it - I swear. Note though that in the real code I replace synchronized with a ReentrantReadWriteLock instead of synchronization to optimize the case where the value is not-null allowing more than one reader to enter. Afterall, we want high concurrency in a distributed environment.

All that is left now is to change our CachingService to store CacheElements, not type V, and we are done:

public final CachingService<K, V> implements GetService<K, V>
...
public V get(K k)
{
CacheElement<V> element = cacheStore.getValue(k);
if (element != null) { return element.get(); }

element = elementFactory.createElement(k, getService);
return cacheStore.putIfNotExists(k, element).get();
}
...
}

Summing it up


The new and improved caching service now reads once and only once for a given k, across the cluster using this algorithm:

  1. Reads the cache for Key k. If the cache contains the value, return the value

  2. Instantiate a CacheElement which delays the get using the Key k and the underlying GetService until the method get() is called.

  3. Put the new CacheElement into the cache using a First Writer Wins strategy. Ask the resultant CacheElement to get() the value.

Resources

8 comments:

Lord Blagger said...

Personally, I find on thing still ugly.

Why not have an abstract service class, and have a concrete service inherit from it, and a cached service, which has a reference to the concrete service.

That way, you can easily test non-cached against cached, and its also pretty clear where the factory ends up, on the concrete service side.

Testing the caching is then isolated.

Taylor said...

It is open source if you feel like changing it I would be more than happy to review for suggestions. Just send me a patch diff.

David Biesack said...

Another solution is the Memoizer class which the authors of Java Concurrency in Practice demonstrate on p. 108 of their book. It uses a FutureTask to compute the value once, in a thread safe manner, and readers only block if trying to read while the value is being generated.

Taylor said...

@david

Well I'll be! That sure looks like the code I wrote, but a little more baked.

That certainly backs up my first (second) post to this blog http://javathink.blogspot.com/2006/09/research-it-first.html

Thanks!

Anonymous said...

(法新社倫敦四日電) 英國情色大亨芮孟的a片下載公司昨天AV片說,芮孟日成人影片前去成人網站世,sex享壽八十二歲;色情這位身av價上億的房地產開發情色電影商,曾經在倫敦推成人網站出第一場脫衣舞表av演。

色情影片
芮孟的財產成人影片估計成人達六億五千萬英鎊(台幣將a片近四百億),由於他名下事業大多分布在倫敦夜生活區蘇活區色情成人因此擁有「蘇活情色視訊之王」日本av的稱號。
部落格

他的成人電影公司「保羅芮孟集團」旗成人網站下發行多a片種情色雜av誌,包括「Razavzav女優leavdvd」、「男性世界」以及「Mayfai情色電影r」。色情a片
a片下載
色情
芮孟情色本名傑福瑞.安東尼.奎恩,父av女優親為搬運承a片包商。芮孟十五歲離開學校,矢言要在表演事部落格業留名,起先表演讀心術,後來成為巡迴歌舞雜耍表演av女優的製作情色人。


許多評論a片成人電影認為,他把情色表演帶進主流社會,一九五九部落格年主持破天荒的脫衣舞表演,後來成人影片更靠著在蘇活區與成人光碟倫敦西區開發房地產賺得大筆財富。


有人形容芮孟是英國的海夫納,地位等同美國的「花花公子」創辦人海夫納。

Anonymous said...

handmade jewelry
jewelry wholesale
discount jewelry
handcrafted jewelry
cheap jewelry
crystal jewelry
wholesale discount jewelry
wholesale fashion jewelry
wholesale china jewelry
wholesale discount jewellry
china jewelry manufacturer
wholesale pearl
wholesale jewelry
costume jewelry
handmade jewelry
fashion jewelry
pearl jewelry

商標註冊/專利申請達人 said...

當舖
專利
專利
關鍵字
專利
當舖
存證信函
存證信函
關鍵字
存證信函
seo
當舖
商標
商標設計
自創品牌
商標
自然排序
商標
當舖
網站排名

Anonymous said...

jklhj I like your blog. Thank you. They are really great . Ermunterung ++ .
Some new style Puma Speed is in fashion this year.
chaussure puma is Puma shoes in french . Many Franzose like seach “chaussure sport” by the internet when they need buy the Puma Shoes Or nike max shoes. The information age is really convenient .
By the way ,the nike max ltd is really good NIKE air shoes ,don’t forget buy the puma mens shoes and nike air max ltd by the internet when you need them . Do you know Nike Air Shoes is a best Air Shoes . another kinds of Nike shoes is better . For example , Nike Air Rift is good and Cheap Nike Shoes .the nike shox shoes is fitting to running.
Spring is coming, Do you think this season is not for Ugg Boots? maybe yes .but this season is best time that can buy the cheap ugg boots. Many sellers are selling discounted. Do not miss . Please view my fc2 blog and hair straighteners blog.
.thank you .

I like orange converse shoes ,I like to buy the cheap converse shoes by the internet shop . the puma shoes and the adidas shoes (or addidas shoes) are more on internet shop .i can buy the cheap nike shoes and cheap puma shoes online. It’s really convenient.
Many persons more like Puma basket shoes than nike air rift shoes . the Puma Cat shoes is a kind of Cheap Puma Shoes .
If you want to buy the Cheap Nike Air shoes ,you can buy them online. They are same as the Nike Air shoes authorized shop. Very high-caliber Air shoes and puma cat shoes . the cheap puma shoes as same as other.