Ask HN: On scalability and memory footprint

ezmobius | karma 1049 | avg karma 8.01 · 2008-10-11 06:08:34+00:00

Yes there is a big advantage to keeping your memory consumption low with ruby apps. Ruby's garbage collector is not the best and has to walk all the objects in the process when it GC's. The more memory your process uses the longer and more often GC will happen. This will degrade performance the more memory you use. In fact I've seen really leaky apps spend most of their wall clock time in the garbage collector.

Sure, throwing more ec2 instances at the problem is one solution, but if you care about your apps performance you will try to optimize for smaller memory footprint.

reply

ashleyw | karma 1067 | avg karma 2.1 · 2008-10-11 06:24:35+00:00

1) Build your app how you would normally build it (obviously dont expect to put massive objects into memory for every request — just dont worry about the small things)

2) Go back and refactor the things that really stick out as bulky

3) Deploy

4) Continue your release cycles and refactor stuff as you come across them

.

As the saying goes — hardware is cheap, your time isn't.

reply

jamesbritt | karma 28062 | avg karma 4.6 · 2008-10-11 11:19:10

"As the saying goes — hardware is cheap, your time isn't."

Don' t many small companies find that they have more time than money?

reply

ashleyw | karma 1067 | avg karma 2.1 · 2008-10-11 20:02:20+00:00

I guess, but that "extra hardware" wouldn't be that much. I was talking about like 1 second extra CPU time per 50 page loads or something — I wouldn't worry about that kind of thing, even if I had time to make my code slightly faster, I'm sure I could spend my time better. Maybe I'd refactor it later on when I'm adding extra functionality to that area of code.

I think if your a small company you shouldn't be worrying about small issues like this, even if you have more time than money. It may be making your app more efficient, but your doing it the same way people make apps bulky and never get them out the door! Spend the time on things which truly matter, and fix slightly slower code when its beginning to cause a problem. :)

reply

furiouslol | karma 545 | avg karma 2.77 · 2008-10-11 07:34:20+00:00

Here is my experience. I used Rails for one project of mine that requires processing of millions of data rows. Because the Rails ORM create an object for each data row, we end up using a lot of memory. We had to get a 2GB memory server to hold up the project. Even after we avoided the ORM (which removes the pleasure of coding in Rails), the memory usage was still high.

Sure, we could process the data rows outside of Rails in C but because the processing of data rows is an integral part of the project, that would mean coding 80% in C and 20% in rails. Not exactly an enjoyable experience.

So we rewrote it in PHP and avoid objects and use just functions and hashes/arrays. And it worked very well for us. The site render time drops from 0.8s to 0.03s. Memory usage rarely exceeds 100MB.

reply

davidw | karma 63893 | avg karma 3.82 · 2008-10-11 07:56:55+00:00

You do know that you can connect to databases and do stuff in Ruby without AR, right?

furiouslol | karma 545 | avg karma 2.77 · 2008-10-11 08:24:31+00:00

Yes. That's what we did. But the memory usage was still higher than the PHP option.

mixmax | karma 18160 | avg karma 6.08 · 2008-10-11 09:25:34+00:00

Interesting to see yet another company starting out with ruby on rails and then rewriting the whole thing in PHP when it has to scale. There seems to be a pattern to it...

jgalvez | karma 1503 | avg karma 3.5 · 2008-10-11 13:31:41+00:00

AR usage is one of my main concerns. I myself am a Python guy with good proficiency and understanding of Rails, but we employ mostly Rails people. Our biggest project right now uses Rails for the API and website and several background daemons (work queues) in Python.

I'm already rewriting the API code in a minimal Merb app, which uses a lot fewer memory and can-haz C-based ORM code with DataMapper. My plan in the future is to try and push possible migration of the Rails-based site code to Merb as well.

reply

alecco | karma 8121 | avg karma 3.01 · 2008-10-11 13:50:19+00:00

Had a similar situation in a Python project. Got it working by adding a few Python callbacks to the SQL as functions. The code stayed in the same language and memory consumption flatlined. Don't know if that is easy to do from RnR, though.

wheels | karma 19055 | avg karma 6.57 · 2008-10-11 08:11:49+00:00

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -Donald Knuth

That said, watch your 'n's. If you scale linearly, you can keep throwing servers at the problem. If you end up with some critical piece of your code that's, say, quadratic in run-time or memory consumption, you can't. And when you get there: profile, don't guess. Monte-Carlo code optimization is the beginning of the end for an otherwise clean code-base.

reply

alecco | karma 8121 | avg karma 3.01 · 2008-10-11 08:44:40

Only consider that quote if you've been through all or most of Knuth's TAOCP.

Good design lets your site stay lean and mean.

reply

ericb | karma 7630 | avg karma 3.36 · 2008-10-11 15:00:52+00:00

While going through TAOCP is good advice, I think the quote stands on its own.

iamelgringo | karma 20107 | avg karma 7.36 · 2008-10-11 06:31:08

Your bottleneck in a web app is almost always the database. In some cases, it's not. So, it depends.

If, like furiouslol, you're trying to access a 2-3 million row table and Active Record is topping out at 2GB of RAM and taking close to 1 second on a production machine just to render a page, then yes, you might need to worry about memory usage.

But, if you're prototyping a site, and just getting it off the ground, I would worry about memory usage later.

If you're really freaked out about performance, you can always try PHP or Django. They tend to be a bit more sparing on system resources.

reply

tdavis | karma 6243 | avg karma 5.09 · 2008-10-11 09:14:06

"Memory usage" is pretty vague. If you're just talking about writing poor code that naturally requires more memory to store objects and such, it's not a big issue if you can scale linearly. What you really need to watch out for is memory leaking. This can cause processes to slow down dramatically and there's nothing you can do to scale it when that happens; your app server will take requests and probably even have RAM to spare, but the actual processing of the request will take forever and your site will be sloooow. This is especially relevant for long-running processes. I have accidently written memory leaks into programs before that, despite having more than enough RAM to grow, eventually slowed down to an absolute crawl. I'm talking 1 loop per second to begin with and by morning it's doing 1 loop every 4 hours.

Premature optimization? Avoid it. Writing good code? Don't avoid it because bad code is easier and memory is cheap. There are some times when it makes no logical sense to allow something to use tons of memory, despite how much you have available. For instance, if you request the same page from your site over and over again and the memory usage continues to increase, that's probably a bad thing. What is that process storing in local memory and not garbage-collecting after a request? It's HTTP so there should be no persistence at the web framework level (i.e. in Rails). It should manufacture a request, send it, and move on to the next one, shedding all the local variables made in the process. If your memory footprint is increasing while your real load is remaining the same, that's bad. If you're storing the same object in memory in 5 different places, that's bad -- but not that big a deal so long as they're all gone when the request has been served.

(Flamebait P.S. -- save memory, kill Rails.)

reply

ericb | karma 7630 | avg karma 3.36 · 2008-10-11 14:59:49+00:00

If you're building a fairly standard web app you probably shouldn't be worrying at this stage. Keeping memory usage low does not mean a significant speedup necessarily. It means your memory usage will be low. Sometimes you trade one thing for another depending on what you're doing.

When the app is written, if you run a load test (or you can let your users be your load test, like Twitter), there will be a bottleneck for some n of users. Remove this bottleneck, which could be cpu, database, memory, connections, bandwidth, etc., and there will be a new bottleneck at some > n users. Rinse, lather, repeat.

If you optimize now, you're most certainly optimizing the wrong thing. When I load tested my first production rails app, I found cpu was the first bottleneck I hit, not memory.

reply

Hoff | karma 2699 | avg karma 4.72 · 2008-10-11 13:40:11

Arguing performance and scalability and footprint can be less than fruitful; a distraction.

If you're serious about this, build your test cases, and benchmark.

But before you invest here, ensure you have nothing better to do with your time, and ensure that the probable payback can be justified against the aggregate investment; against the costs of the testing and of the migration. And nothing better to debate, for that matter.

reply

anotherjesse | karma 556 | avg karma 3.61 · 2008-10-11 17:09:45

I run userscripts.org, a rails site which runs on a single serverbeach box.

To keep memory usage low, I find recycling mongrel processes via monit can help a bunch.

Before monit, my mongrels would hit 2GB quickly, after monit with a rule to restart when they hit over 100MB, my memory usage is around 800MB for 15 mongrels. (I'm assuming you are using HAProxy or a similar balancer that can deal with changes of availability).

http://userscripts.org/articles/2-scaling-a-rails-site

Another useful tip is make sure your queries aren't doing stupid stuff. A long time ago I had integrated Beast (a rails forum project) into my site. Unfortunately it was loading EVERY topic on the forum index page, which was resulting in slowness as well as large jumps in memory usage. (The culprit was a .last method that caused all the records to load and then grab the last value)

So - I tend to agree with "don't worry" because you can "patch" over memory usage pretty easily.

My site is currently getting 17req/s on a single box - it is never what you think will be the issue that you have to fix.

reply

jgalvez | karma 1503 | avg karma 3.5 · 2008-10-11 20:42:58

That's the thing that making me nervous. When you have got to a point where you need to periodically restart your processes because they somehow uncontrollably start consuming more memory than they should, something must be failing badly. Where there's smoke, there's fire.