Sometimes I feel as though my little, simple apps' source will end up being a terabyte after packaging all their dependencies. It's a little ridiculous.
Regardless, this seems to have some pretty powerful implications for big-data processing. The potential of integrating this with Clojure somehow, and parallelizing the computation across those ten (at least) servers with 1TB of memory each, is pretty astonishing to think about. (Though you don't need Clojure to make it parallel, of course.)
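To make the "no Clojure needed" point concrete, here's a minimal single-node sketch in plain Java (the class name and dataset are made up, and actually spreading work across multiple servers would need extra plumbing not shown here): once the data fits in heap, the standard library will already fan the work out across cores.

    import java.util.stream.LongStream;

    public class ParallelSum {
        public static void main(String[] args) {
            // Pretend this is a big in-memory dataset (kept small here for the example).
            long[] data = LongStream.rangeClosed(1, 10000000).toArray();

            // A parallel stream splits the array across the common fork/join pool.
            long total = LongStream.of(data).parallel().sum();
            System.out.println("total = " + total);
        }
    }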
And they're listening to people and continuing to work on the latter; e.g., three days ago they updated the Linux source code releases: a complete SRPM (specifically for Fedora Core 12) and a kernel patch, suitable for auditing and applying to the newer 2.6.34, containing the memory management half, with the remaining scheduling part to follow: http://lists.managedruntime.org/pipermail/dev/2010-July/0000....
Key point: terabyte-sized memory pools. Which is quite awesome.
I'd initially thought it referred to terabyte-sized executables. Followed by an "oh great. Now someone's going to make one, and some government's going to want one."
An organization could put their entire database into memory, which would reduce the latency of the application by "a couple of orders of magnitude," he said.
That works well until the power goes out (and it does) or the OS (or the JVM) crashes. Keeping the hot portion of the data cached in memory (with a smarter cache than simple LRU heuristics) without sacrificing durability is still a must for data you care about.
You can checkpoint your data to disk and assume your data will always fit in memory, but that starts to become very expensive when you factor in obsolete versions, replication (to make your system immune to machine failures), and logs for recovery.
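A minimal sketch of where the cost comes from (names are invented, and checkpoint rotation, replication, and recovery replay are omitted): every write is appended and synced to a redo log before it touches the in-memory map, so a crash loses nothing, but each write pays for a disk sync.

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.util.concurrent.ConcurrentHashMap;

    public class LoggedMap {
        private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        private final FileOutputStream log;

        public LoggedMap(Path logFile) throws IOException {
            log = new FileOutputStream(logFile.toFile(), true); // append mode; replayed on restart
        }

        public synchronized void put(String key, String value) throws IOException {
            log.write((key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8));
            log.getFD().sync();      // the expensive part: durability per write
            map.put(key, value);
        }

        public String get(String key) {
            return map.get(key);     // reads are pure memory
        }
    }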
Ultimately there's a lot to be said about the redundancy of putting a cache in front of a database. The right thing to do, however, is to build storage systems (which may or may not resemble conventional databases) that integrate caching. I highly suggest reading about LSM trees as used by BigTable (a way to reduce write latency without significantly sacrificing durability), as well as the BigTable paper itself (for the "keep the hot set in memory, maintain disk persistence" model). ehCache is a useful product, but it's simplistic to say it can replace databases and file systems.
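Rough sketch of the LSM write path mentioned above (the names and threshold are invented, and compaction, bloom filters, and the per-write log are omitted; real systems log each write first, as in the sketch above): writes land in a sorted in-memory table and are periodically flushed as an immutable sorted run on disk, which is what keeps write latency low without giving up persistence.

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;
    import java.util.TreeMap;

    public class TinyLsm {
        private final TreeMap<String, String> memtable = new TreeMap<>();
        private final Path dir;                         // assumed to exist
        private int flushCount = 0;
        private static final int FLUSH_THRESHOLD = 1000;

        public TinyLsm(Path dir) { this.dir = dir; }

        public synchronized void put(String key, String value) throws IOException {
            memtable.put(key, value);
            if (memtable.size() >= FLUSH_THRESHOLD) {
                flush();
            }
        }

        // Write the memtable out as one sorted run (an "SSTable"), then start fresh.
        private void flush() throws IOException {
            Path run = dir.resolve("run-" + (flushCount++) + ".txt");
            try (BufferedWriter out = Files.newBufferedWriter(run)) {
                for (Map.Entry<String, String> e : memtable.entrySet()) {
                    out.write(e.getKey() + "\t" + e.getValue() + "\n");
                }
            }
            memtable.clear();
        }
    }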
http://www.terracotta.org/ehcache-2.2?src=/index.html for the product itself.
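For reference, basic usage of the Ehcache 2.x API looks roughly like this (a sketch from memory, so treat the constructor arguments and cache name as assumptions to check against the docs):

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;

    public class EhcacheExample {
        public static void main(String[] args) {
            CacheManager manager = CacheManager.create();
            // name, max in-memory entries, overflowToDisk, eternal, TTL secs, TTI secs
            Cache cache = new Cache("users", 10000, false, false, 300, 60);
            manager.addCache(cache);

            cache.put(new Element("user:42", "Ada"));
            Element hit = cache.get("user:42");
            System.out.println(hit != null ? hit.getObjectValue() : "miss");

            manager.shutdown();
        }
    }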