Hacker News
Terabyte-sized Java Apps Now Possible (www.pcworld.com)
25 points by vkalladath | 2010-07-23 15:56:17+00:00 | 10 comments




Sometimes I feel as though my little, simple apps' source will end up being a terabyte after packaging all their dependencies. It's a little ridiculous.

Regardless, this seems like it has some pretty powerful implications for big-data processing. The potential of integrating this with Clojure somehow, and parallelizing the computation across those 10 (at least) servers with 1TB of memory each, is pretty astonishing to think about. (Though you don't need Clojure to make it parallel, yes.)

http://www.terracotta.org/ehcache-2.2?src=/index.html for the product itself.
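For context, the heart of a product like Ehcache is a bounded in-memory map with an eviction policy (Ehcache's BigMemory tier just moves that map off the garbage-collected heap). A toy JDK-only sketch of the basic idea, assuming simple LRU eviction — not Ehcache's actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: a LinkedHashMap in access order evicts the
// least-recently-used entry once capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);  // accessOrder = true: gets reorder entries
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;  // evict eldest after each insert
    }
}
```

The real products add off-heap storage, concurrency control, TTLs, and disk overflow on top of this core, but the put/get/evict contract is the same.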


What Azul is doing gets you about three-quarters of the way there with true SMP. They've got their Vega custom hardware, a new x86-64 software-only version named Zing (http://www.azulsystems.com/products/zing), and they're pushing an open-source version of the foundation (or more) of Zing through the Managed Runtime Initiative (http://news.ycombinator.com/item?id=1491653 and http://www.managedruntime.org/).

And they're listening to people and continuing to work on the latter; e.g., 3 days ago they updated the Linux source code releases: a complete SRPM, particularly for Fedora Core 12, and a kernel patch suitable for auditing and applying to the newer 2.6.34, containing the memory management half, with the remaining scheduling part to follow: http://lists.managedruntime.org/pipermail/dev/2010-July/0000....


Key point: terabyte-sized memory pools. Which is quite awesome.

I'd initially thought it referred to terabyte-sized executables. Followed by an "oh great. Now someone's going to make one, and some government's going to want one."


Don't worry, Adobe's slowly getting there.

And that Terabyte-sized app? "Hello, World!"

An organization could put their entire database into memory, which would reduce the latency of the application by "a couple of orders of magnitude," he said.

That works well until the power goes out (and it does) or the OS (or the JVM) crashes. Keeping the hot portion of the data cached in memory (with a smarter cache than simple LRU heuristics) without sacrificing durability is still a must for data you care about.

You can checkpoint your data to disk and assume you'll never have more data than fits in memory, but that starts to become very expensive when you factor in obsolete versions, replication (to make your system immune to machine failures), and logs for recovery.
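One common way to keep the cache fast without risking committed data is write-through: every put hits the durable store before the in-memory copy, so losing the cache (power, OS, or JVM crash) loses nothing. A minimal sketch, with a plain `Map` standing in for the durable tier — hypothetical names, not any particular product's API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Write-through cache sketch: puts reach the durable store first,
// then the volatile cache; reads fall back to the store on a miss.
class WriteThroughCache<K, V> {
    private final Map<K, V> store;                             // durable tier (stand-in)
    private final Map<K, V> cache = new ConcurrentHashMap<>(); // volatile tier

    WriteThroughCache(Map<K, V> store) {
        this.store = store;
    }

    void put(K key, V value) {
        store.put(key, value);  // durable write happens first
        cache.put(key, value);  // then populate the cache
    }

    V get(K key) {
        V v = cache.get(key);
        if (v == null) {        // cache miss: read through to the store
            v = store.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    // Simulate a crash: the cache vanishes, the store survives.
    void dropCache() {
        cache.clear();
    }
}
```

The trade-off is that write latency is bounded below by the durable write, which is exactly what log-structured designs (below) try to cheapen.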

Ultimately there's a lot to be said about the redundancy of putting a cache in front of a database. The right thing to do, however, is to build storage systems (that may or may not resemble conventional databases) that integrate caching. I highly suggest reading about LSM trees as used by BigTable (a way to reduce write latency without significantly sacrificing durability), as well as the BigTable paper (for the "keep the hot set in memory, maintain disk persistence" model). Ehcache is a useful product, but it's simplistic to say it can replace databases and file systems.
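The LSM idea mentioned above can be sketched in a few lines: writes land in a sorted in-memory memtable (a real system also appends each write to a sequential log for durability, omitted here); when the memtable fills, it is frozen into an immutable sorted run, standing in for an on-disk SSTable. Reads check the memtable first, then runs newest-first. A toy version, not BigTable's actual design:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

// Toy LSM tree: sorted memtable, flushed to immutable runs when full.
class TinyLsm {
    private final int memtableLimit;
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final Deque<TreeMap<String, String>> runs = new ArrayDeque<>(); // newest first

    TinyLsm(int memtableLimit) {
        this.memtableLimit = memtableLimit;
    }

    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= memtableLimit) {
            runs.addFirst(memtable);   // freeze as an immutable sorted run
            memtable = new TreeMap<>();
        }
    }

    String get(String key) {
        String v = memtable.get(key);
        if (v != null) return v;
        for (Map<String, String> run : runs) { // newest run wins
            v = run.get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```

Real implementations add the write-ahead log, tombstones for deletes, Bloom filters to skip runs, and background compaction to merge runs back together; the sketch only shows why writes stay cheap (always an in-memory insert plus a sequential append).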


I'm guessing 10 servers for redundancy? A server with 192GB of memory can be had for about $15k nowadays.
