Hacker News
Terabyte-sized Java Apps Now Possible (www.pcworld.com)
25 points by vkalladath | 2010-07-23 15:56:17+00:00 | 10 comments




Sometimes I feel as though my little, simple apps' source will end up being a terabyte after packaging all their dependencies. It's a little ridiculous.

Regardless, this seems like it has some pretty powerful implications for big-data processing. The potential of integrating this with Clojure somehow, and parallelizing the computation across those 10 (at least) servers with 1TB of memory each, is pretty astonishing to think about. (Though you don't need Clojure to make it parallel, yes.)

http://www.terracotta.org/ehcache-2.2?src=/index.html for the product itself.
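For context, the heart of a product like Ehcache is a bounded in-memory map with an eviction policy (Ehcache's BigMemory tier just moves that map off the garbage-collected heap). A toy JDK-only sketch of the basic idea, assuming simple LRU eviction — not Ehcache's actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: a LinkedHashMap in access order evicts the
// least-recently-used entry once capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);  // accessOrder = true: gets reorder entries
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;  // evict eldest after each insert
    }
}
```

The real products add off-heap storage, concurrency control, TTLs, and disk overflow on top of this core, but the put/get/evict contract is the same.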


What Azul is doing gets you about three-quarters of the way there with true SMP. They've got their Vega custom hardware, a new x86-64 software-only version named Zing (http://www.azulsystems.com/products/zing), and they're pushing an open-source version of the foundation (or more) of Zing through the Managed Runtime Initiative (http://news.ycombinator.com/item?id=1491653 and http://www.managedruntime.org/).

And they're listening to people and continuing to work on the latter; e.g., 3 days ago they updated the Linux source code releases: a complete SRPM, particularly for Fedora Core 12, and a kernel patch suitable for auditing and applying to the newer 2.6.34, containing the memory management half, with the remaining scheduling part to follow: http://lists.managedruntime.org/pipermail/dev/2010-July/0000....


Key point: terabyte-sized memory pools. Which is quite awesome.

I'd initially thought it referred to terabyte-sized executables. Followed by an "oh great. Now someone's going to make one, and some government's going to want one."


Don't worry, Adobe's slowly getting there.

And that Terabyte-sized app? "Hello, World!"

An organization could put their entire database into memory, which would reduce the latency of the application by "a couple of orders of magnitude," he said.

That works well until the power goes out (and it does) or the OS (or the JVM) crashes. Keeping the hot portion of the data cached in memory (with a smarter cache than simple LRU heuristics) without sacrificing durability is still a must for data you care about.

You can checkpoint your data to disk and assume you'll never have more data than fits in memory, but that starts to become very expensive when you factor in obsolete versions, replication (to make your system immune to machine failures), and logs for recovery.
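One common way to keep the cache fast without risking committed data is write-through: every put hits the durable store before the in-memory copy, so losing the cache (power, OS, or JVM crash) loses nothing. A minimal sketch, with a plain `Map` standing in for the durable tier — hypothetical names, not any particular product's API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Write-through cache sketch: puts reach the durable store first,
// then the volatile cache; reads fall back to the store on a miss.
class WriteThroughCache<K, V> {
    private final Map<K, V> store;                             // durable tier (stand-in)
    private final Map<K, V> cache = new ConcurrentHashMap<>(); // volatile tier

    WriteThroughCache(Map<K, V> store) {
        this.store = store;
    }

    void put(K key, V value) {
        store.put(key, value);  // durable write happens first
        cache.put(key, value);  // then populate the cache
    }

    V get(K key) {
        V v = cache.get(key);
        if (v == null) {        // cache miss: read through to the store
            v = store.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    // Simulate a crash: the cache vanishes, the store survives.
    void dropCache() {
        cache.clear();
    }
}
```

The trade-off is that write latency is bounded below by the durable write, which is exactly what log-structured designs (below) try to cheapen.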

Ultimately there's a lot to be said about the redundancy of putting a cache in front of a database. The right thing to do, however, is to build storage systems (that may or may not resemble conventional databases) that integrate caching. I highly suggest reading about LSM trees as used by BigTable (a way to reduce write latency without significantly sacrificing durability), as well as the BigTable paper (for the "keep the hot set in memory, maintain disk persistence" model). Ehcache is a useful product, but it's simplistic to say it can replace databases and file systems.
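The LSM idea mentioned above can be sketched in a few lines: writes land in a sorted in-memory memtable (a real system also appends each write to a sequential log for durability, omitted here); when the memtable fills, it is frozen into an immutable sorted run, standing in for an on-disk SSTable. Reads check the memtable first, then runs newest-first. A toy version, not BigTable's actual design:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

// Toy LSM tree: sorted memtable, flushed to immutable runs when full.
class TinyLsm {
    private final int memtableLimit;
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final Deque<TreeMap<String, String>> runs = new ArrayDeque<>(); // newest first

    TinyLsm(int memtableLimit) {
        this.memtableLimit = memtableLimit;
    }

    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= memtableLimit) {
            runs.addFirst(memtable);   // freeze as an immutable sorted run
            memtable = new TreeMap<>();
        }
    }

    String get(String key) {
        String v = memtable.get(key);
        if (v != null) return v;
        for (Map<String, String> run : runs) { // newest run wins
            v = run.get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```

Real implementations add the write-ahead log, tombstones for deletes, Bloom filters to skip runs, and background compaction to merge runs back together; the sketch only shows why writes stay cheap (always an in-memory insert plus a sequential append).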


I'm guessing 10 servers for redundancy? A server with 192GB of memory can be had for about $15k nowadays.
