Chronicles of a Terracotta Integration: Compass
Last week, I met up with Shay Banon, author of Compass, at The Server Side: Vegas conference. We thought it would be great to see if we could crank out an integration between Terracotta and Compass. You can read more about our integration from Shay himself.
I wanted to write a log of our efforts, because I thought it might provide some insight for anyone considering integrating Terracotta into their own project. I was particularly happy with our effort, because it outlines what I feel is the best approach for developing with Terracotta. The approach is actually quite simple. Because Terracotta adds clustering concerns to your application using configuration, you don't write code directly to Terracotta. Instead, you just write a simple POJO application without Terracotta, and then add the clustering later.
So the approach I recommend is the following:
- Figure out how to implement the solution using a single JVM. NO TERRACOTTA. Use just simple POJOs and threads.
- Implement and test your solution.
- It helps to have envisioned, beforehand, what part of your implementation will become a Terracotta root. But it's not necessary. If your application is stateful, it will have a root.
- Using the root, start with a basic Terracotta config file, and build up the appropriate config file to cover all the instrumentation and locking.
- Test your application again, with a single JVM, but this time with Terracotta.
- Tune your implementation.
- Move to 2 or more JVMs.
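To make the first few steps concrete, here is a minimal sketch of what "just POJOs and threads" might look like. It is not the code Shay and I wrote; the class names (InMemoryFiles, PojoFirstDemo) and the map-of-byte-arrays design are assumptions for illustration only. The point is that nothing in it references Terracotta, and that the one field holding all the shared state is the natural candidate for a root later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical POJO store: plain Java only, no Terracotta imports or API calls.
class InMemoryFiles {
    // Candidate root: all shared state is reachable from this one field.
    private final Map<String, byte[]> files = new HashMap<>();

    // Ordinary synchronization keeps the single-JVM version thread-safe.
    public synchronized void put(String name, byte[] contents) {
        files.put(name, contents.clone());
    }

    public synchronized byte[] get(String name) {
        byte[] data = files.get(name);
        return data == null ? null : data.clone();
    }

    public synchronized int count() {
        return files.size();
    }
}

class PojoFirstDemo {
    // Starts `threads` writers that each store `filesPerThread` entries,
    // waits for them all, and returns how many entries the store holds.
    static int runWriters(int threads, int filesPerThread) {
        InMemoryFiles store = new InMemoryFiles();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < filesPerThread; i++) {
                    store.put("file-" + id + "-" + i, new byte[] {(byte) i});
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return store.count();
    }

    public static void main(String[] args) {
        System.out.println("entries stored: " + runWriters(4, 100));
    }
}
```

If a multi-threaded test like runWriters passes in one JVM, the synchronization is already in place, and the config file only has to say which field is the root, which classes to instrument, and which synchronized methods to lock on.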
10:00 am - Shook Hands - Shay and I met up at the conference.
10:05 am - Started coding - First we chatted a bit about our strategy. It seemed easiest to start with the existing Lucene RAMDirectory implementation and tune it up a bit.
10:30 am - Strategy decided - Based on my knowledge of Terracotta, and Shay's knowledge of Lucene/Compass, we decided on the following:
- Start with the Lucene RAMDirectory implementation, but rewrite it as necessary to fit a simple POJO model
- Since RAMDirectory is mostly unmaintained, we knew we had to just go through the implementation and clean it up. It comprises about 4 classes total, about 100 lines long, so the task was feasible.
- Because Terracotta can just "plug" in to a well-written application, and Shay has a comprehensive unit test suite (over 1,000 tests), a load test, and a concurrency test, we'd write the implementation first in POJOs
- After verifying that the implementation works as expected in pure POJOs, then we would work on the configuration to inject Terracotta clustering
- After running the solution with Terracotta, we would tune it.
- And finally, we would wrap up various bits and pieces into a Terracotta Integration Module (TIM)
Just a quick note - it was a real joy coding with Shay. He is a super smart guy, and it's great to work with someone like that. Of note, he really understands synchronization, which is essential for writing applications correctly. Even better, he really got the principle of writing better code by writing less code. We went through the RAMDirectory implementation with a weed whacker, and what was left was about 1/2 the code: more readable, more maintainable, and better performing. That was fun.
12:00 pm - Unit Tests pass - With some minor corrections, we had unit tests passing. We were both running out of power, and hungry, so we took a break to eat lunch, and agreed to resume in the afternoon.
1:30 pm - Write the Terracotta config file - While writing the POJO implementation, we already knew the key concepts we were going to need for writing up our config file. We added the appropriate instrumentation. We added the locking. A few config statements later, we had a working Terracotta configuration.
2:00 pm - We had Compass running on Terracotta! - Approx. time elapsed? 2 1/2 hours (most of which was spent rewriting the RAMDirectory implementation)
2:30 pm - Tuning Time - At first Shay threw me - he said, "Oh man, it looks like it's running really fast." Except it turns out he wasn't testing the right thing. And then he told me, "Oh man, it's really slow!"
Now don't misunderstand this. I know Terracotta can go really fast. But I wasn't in the least bit surprised. And you shouldn't be either. How many pieces of code have you ever written that compiled and ran correctly - on the first try? Right. One, if you are lucky.
Terracotta is kind of like that. The first step is to get it right. And that means synchronization, and locking, and once you have all that, your application runs correctly, but slowly.
Fortunately, it's easy to fix.
And so I taught Shay how to tune up his Terracotta integration. Or rather, I showed him the tools he needed, and he went to town. I just sort of stood by watching, giving the occasional comment or two.
This was the fun part. It was time to take out the Admin console. The Terracotta Admin console gives you a wealth of information about your application. Of note:
- You can browse your clustered data in realtime
- You can monitor realtime statistics - including Terracotta txns/sec, Java Heap Memory, and CPU
- You can access lock statistics using the lock profiler
- You can snapshot over 30 metrics using the Statistics Recorder and visualize them using the Snapshot Visualizer
On our first run, we measured the Terracotta txns/sec. I was actually pretty impressed to see that our server on his MacBook Pro was cranking out 10k/sec. But I knew we wanted this number to be lower. A lot lower.
So here comes the first rule for tuning Terracotta: adjust your locking to match your workload. It turns out that we had enabled an autolock for every single byte being written to the Lucene "files" - and this was hurting us pretty badly. Because we already held a lock on the byte array we were writing to, we simply deleted both the synchronization and the lock config from the method that wrote bytes into the "file" - and we observed a big drop in the Terracotta txns/sec. We went from the aforementioned 10k/sec to about 1750/sec.
Now what this means is that the Terracotta server was handling nearly 6x fewer transactions for the same workload (10k/sec down to about 1750/sec). And that means we were doing more work per transaction, and so our performance improved accordingly. You get the same effect with Hibernate - it batches up a bunch of little POJO updates into a single SQL statement - and that means you can do more real work because each SQL statement has more data in it. Lots of little SQL statements means lots of overhead, and maybe more SQL queries executed/sec, but far fewer application txns/sec. Same concept here with locking.
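The locking change can be sketched in plain Java. This is not the actual RAMDirectory code; CountingBuffer, LockGranularityDemo, and the method names are hypothetical, and the lockAcquires counter is only a stand-in for what the txns/sec number was telling us (an assumption for illustration, not a Terracotta API). The "before" method takes a lock for every byte; the "after" method takes one lock per chunk and lets the inner byte write run unsynchronized under it.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an in-memory "file" buffer.
class CountingBuffer {
    private final byte[] data;
    private int position = 0;
    // Counts lock acquisitions; a proxy for transactions, not a real Terracotta metric.
    private final AtomicLong lockAcquires = new AtomicLong();

    CountingBuffer(int capacity) {
        data = new byte[capacity];
    }

    // Before: every single byte written acquires the lock.
    synchronized void writeByte(byte b) {
        lockAcquires.incrementAndGet();
        data[position++] = b;
    }

    // After: one lock acquisition covers the whole chunk.
    synchronized void writeBytes(byte[] src) {
        lockAcquires.incrementAndGet();
        for (byte b : src) {
            writeByteUnsynchronized(b);
        }
    }

    // Safe only because the caller already holds the lock on this buffer.
    private void writeByteUnsynchronized(byte b) {
        data[position++] = b;
    }

    long getLockAcquires() {
        return lockAcquires.get();
    }
}

class LockGranularityDemo {
    // Writes totalBytes one byte at a time and reports the lock acquisitions.
    static long perByte(int totalBytes) {
        CountingBuffer buffer = new CountingBuffer(totalBytes);
        for (int i = 0; i < totalBytes; i++) {
            buffer.writeByte((byte) i);
        }
        return buffer.getLockAcquires();
    }

    // Writes totalBytes in chunks of at most chunkSize and reports the lock acquisitions.
    static long perChunk(int totalBytes, int chunkSize) {
        CountingBuffer buffer = new CountingBuffer(totalBytes);
        for (int written = 0; written < totalBytes; written += chunkSize) {
            int n = Math.min(chunkSize, totalBytes - written);
            buffer.writeBytes(new byte[n]);
        }
        return buffer.getLockAcquires();
    }

    public static void main(String[] args) {
        System.out.println("per-byte lock acquires:  " + perByte(4096));
        System.out.println("per-chunk lock acquires: " + perChunk(4096, 1024));
    }
}
```

The same number of bytes gets written either way; only the number of lock acquisitions changes. The chunk size of 1024 is arbitrary here, so the ratio in this toy is not meant to match the numbers we measured.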
How did we identify what lock(s) to target?
That's the second rule of tuning with Terracotta: USE THE ADMIN CONSOLE
We used the lock profiler feature included in the Admin Console to determine the exact stack trace that generated these locks. The process is simple:
- Enable lock profiling with stack traces in the Admin Console,
- run your application,
- then refresh the view to get a count of the lock acquires/releases/held times etc.
- sort on # of lock acquires, and now you know what lock is being requested the most, what stack trace caused that lock, and what Terracotta config was responsible for making that lock.
At the end we got down to about 750 Terracotta txns/sec, which improved the application performance quite a bit.
Still not satisfied, we moved on to the Terracotta Statistics Recorder. This is a new feature in Terracotta 2.6.
Turning on this feature records just about everything you ever wanted to know about your application, Terracotta, the JVM, and your system (including CPU, disk stats, and network stats). You can export these stats as a CSV file, and import them into our Snapshot Visualizer Tool (SVT), which lets you visualize the recorded statistics.
4:30 pm - TIM time - We were pretty satisfied with the performance. Even though we wanted more, Shay felt it was best to focus on turning Compass into a TIM (Terracotta Integration Module).
5:30 pm - Time to call it quits - We had hacked up the Ant build.xml file to get ourselves a TIM in no time - except that it wouldn't quite load correctly. (Later we learned we had just specified the filename wrong - easy fix).
Overall, I thought we had a pretty good day. We wrote and tuned a Terracotta integration in about 6 hours flat. With a few more hours of work, Shay was able to complete the integration.
I was really happy to use some of the recent tools we have been building, like the Lock Profiler and the Statistics Recorder. Seeing the real-world use of those was invaluable, and confirmed that our commitment to enabling the developer to self-tune by providing enhanced visibility is spot on.
I am looking forward to people downloading 2.6, trying out these awesome tools for themselves and providing feedback!