Our Rails customers often run into memory issues. The most frequent cause these days is what we in Support dub ‘bloated mongrels.’
To be fair, bloat has absolutely nothing to do with mongrel itself, which is a solid and fine piece of work. You can run into this problem just as easily with thin, passenger, etc. Changing to a different server will not save you, as the root cause is not the server, but the code the server is running for you.
A real true-blooded memory leak is rare in comparison to the occurrence of bloating Rails instances. If your mongrels (or thins, or passenger instances) are suddenly sporting 100MB or more of extra weight, look no further: we’ve got the diet plan for you!
What Is Bloat?
In short: you are loading in too much. Too much what, you ask? Why, it’s too much ActiveRecord!
Bloat is easily identifiable. Last week, your mongrels were at 110MB, but after a new feature or two and a bit of ‘optimization’… well, let’s just say that you’d have trouble fitting one on a CD. It’s not always that dramatic (the average bloated mongrel is probably 200-300MB), but basically the mongrels are 2-5x larger than they should be, or spike in size suddenly after a certain subset of requests.
Detecting Bloat
The easiest way to detect bloat is to watch the Application Server process size. New Relic, for example, will show you combined memory usage. You could watch it live with “top” on your slice/server. In both cases, you are looking for quick jumps in process size. If you’re using mongrel, you should be using monit to watch it precisely for this reason. Monit will log to syslog, and assuming that you’ve set up memory limits, you could run something like:
grep resource /var/log/syslog
This would print out lines like so:
Aug 29 03:35:05 myserver monit[5194]: 'mongrel_myapp_5000' total mem amount of 133256kB matches resource limit [total mem amount>130360kB]
This is saying the mongrel was caught at 133MB, which is over 130MB. Not too bad. The problem is when you have bloat, you start seeing them skyrocket past the memory limits, sometimes multiple times an hour:
Aug 29 03:35:05 myserver monit[5194]: 'mongrel_myapp_5000' total mem amount of 210256kB matches resource limit [total mem amount>130360kB]
This is bad. Basically, the mongrels were fine one minute (under 130MB) and the next minute they weren’t (~210MB). That’s a pretty big jump, yet when it comes to bloat, this is fairly mild.
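If you want more than a grep, a few lines of Ruby can turn those syslog entries into numbers. This is only a sketch: the regular expression is modeled on the two sample lines above (other monit versions may word the message differently), and the names MONIT_LINE, SAMPLE_LINES and parse_monit_lines are made up for illustration.

```ruby
# Pattern modeled on the two sample syslog lines above; this is an
# assumption, since other monit versions may phrase the message differently.
MONIT_LINE = /monit\[\d+\]: '([^']+)' total mem amount of (\d+)kB matches resource limit \[total mem amount>(\d+)kB\]/

SAMPLE_LINES = [
  "Aug 29 03:35:05 myserver monit[5194]: 'mongrel_myapp_5000' total mem amount of 133256kB matches resource limit [total mem amount>130360kB]",
  "Aug 29 03:35:05 myserver monit[5194]: 'mongrel_myapp_5000' total mem amount of 210256kB matches resource limit [total mem amount>130360kB]"
]

# Returns one hash per matching line: service name, observed kB, limit kB,
# and how far over the limit the process was (rounded percentage).
# Lines that do not match the pattern are skipped.
def parse_monit_lines(lines)
  lines.map do |line|
    match = MONIT_LINE.match(line)
    next unless match
    used  = match[2].to_i
    limit = match[3].to_i
    { :service  => match[1],
      :used_kb  => used,
      :limit_kb => limit,
      :over_pct => ((used - limit) * 100.0 / limit).round }
  end.compact
end

parse_monit_lines(SAMPLE_LINES).each do |event|
  puts "#{event[:service]}: #{event[:used_kb]}kB (#{event[:over_pct]}% over limit)"
end
```

For the two sample lines, the first comes out about 2% over the limit and the second about 61% over. A process that keeps landing in the second category is the one to chase.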
OverActiveRecord
Misuse of ActiveRecord is probably the largest and most common threat to a Ruby process size that we see. Instantiation of ActiveRecord models is expensive, and it is very, very easy to accidentally instantiate 100k records, especially in earlier versions of Rails. Though these records are not cached for the next request, the Ruby process still needs to request the required amount of memory from the system and allocate it. On top of that, Ruby is greedy with memory: it doesn’t hand the memory back to the OS after the request. So, one action with memory bloat will mean that your process is just cruising at a bloated 400MB.
I’m Too Smart to Have Bloat!
The ActiveRecord tips below may seem obvious to seasoned Rails programmers, but even the most experienced programmers run into these issues. Don’t worry, I won’t name names! Ok, I’ll name one: after I wrote the first draft of this very article, I was running a migration on one of our internal sites, and I noticed that the process size was growing… and growing… and growing. Luckily I had enough memory overhead on-slice, but I had a good laugh.
Also keep in mind that code written for a shiny new application with a few hundred users and code written for an application with 100k users have very different needs. Growing and scaling a healthy application requires regular tending and pruning, just like growing a healthy garden. Assume that queries written a year ago will come back to bite you as you scale up.
Development and Production Behavior Will Differ
Ok, so your mongrels may reach 500MB in production, yet they stay a cool 80MB on localhost. It doesn’t matter, it’s probably irrelevant. Unless you’re running the app locally in production mode, with cache_classes=true, are using the exact same data set as production, and are simulating production traffic (with the same params as real-life traffic), the differences are not worth investigation. This is a distraction from the fact that you have a production issue that you should be dealing with. Let’s instead go identify the action(s) that are causing the bloat, and work from there.
Use Tools Like Rack::Bug, MemoryLogic and Oink
Unless you have a very small and manageable code-base, it can be relatively difficult and time consuming to blindly cruise through your application looking for problematic areas.
Luckily, the Ruby community has some awesome open source tools for the job. We often recommend people try Rack::Bug, MemoryLogic or Oink. These are amazing time savers and will allow you to inspect how many ActiveRecord instances are being loaded up on any given request.
Run these tools in production mode on production data; they are built to be non-obtrusive. They should point you pretty immediately to the actions that have issues, and you can begin to explore in script/console and check out the size of the data sets the action and view are loading in. Be sure to use the exact same parameters that the troubled actions are receiving.
Nailing Down the Root Cause
After you’ve found a couple of troublesome actions, here are some more detailed tips on what to look for in your code. A “memory leak” from leaky dependencies or Rails itself would be 25th on the list of things to check. Enjoy!
1. Model.find(:all)
In versions of Rails before 2.3, this is a memory killer. The most common form in the wild is:
Comment.find(:all).each{ |record| do_something_with_each(record) }
If you have 100,000 Comments, this will load and instantiate all 100k records in memory, then go through each one. Rails 2.3 adds find_each and find_in_batches, which load records in small batches, but they won’t save you from the following variations:
@records = Comment.all
@records = Comment.find(:all)
@record_ids = Comment.find(:all).collect{|record| record.id }
Each of these will load up all Comment records into an instance variable, regardless of whether you have 100 or 100,000 and regardless of whether you are on Rails 2.1 or 2.3.
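The usual way out is to walk the table in batches, which is the idea behind will_paginate's paginated_each and the batch finders in Rails 2.3. Here is a minimal sketch of that idea in plain Ruby, with no ActiveRecord involved: FAKE_COMMENTS is an in-memory stand-in for the comments table, and each_in_batches and batch_stats are hypothetical helpers, not Rails APIs.

```ruby
# Stand-in for the comments table: 10,000 rows ordered by id.
# In a real app these rows would sit in the database, not in memory.
FAKE_COMMENTS = (1..10_000).map { |i| { :id => i, :body => "comment #{i}" } }

# Hypothetical helper: yields one batch at a time, keyed on the last id
# seen, so only batch_size rows are handed to the caller at once.
def each_in_batches(table, batch_size)
  last_id = 0
  loop do
    # Stands in for: WHERE id > last_id ORDER BY id LIMIT batch_size
    # (the database would do this filtering, not Ruby).
    batch = table.select { |row| row[:id] > last_id }.first(batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last[:id]
  end
end

# Returns [rows processed, largest batch held at any one time].
def batch_stats(table, batch_size)
  processed = 0
  peak = 0
  each_in_batches(table, batch_size) do |batch|
    peak = [peak, batch.size].max
    processed += batch.size
  end
  [processed, peak]
end

processed, peak = batch_stats(FAKE_COMMENTS, 1_000)
puts "processed #{processed} rows, never more than #{peak} at once"
```

The point is the shape, not the helper: every row still gets visited, but the number of objects alive at any moment is capped by the batch size instead of by the size of the table.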
2. :includes are Including Too Much
Article.find(:all, :include => [:user => [:posts => :comments]])
This is a variant of the above, intensified by the one or multiple joins on other tables. If you only have 1000 articles, you may have thought loading them in is not a big deal. But when you multiply that 1000 by the number of users, the posts they have and the comments that they have… it adds up.
3. :includes on a has_many
@article.users.find(:all, :include => [:posts => :comments])
Variation on the above, but through a has_many.
4. @model_instance.relationship
Referring to a has_many relationship directly like so:
@author.comments
is a shortcut to the potentially bloated:
@author.comments.find(:all)
Be sure that you don’t have thousands of related records, because you will be loading them all up.
5. Filtering Records with Ruby Instead of SQL
This is also fairly common, especially as requirements change or when folks are in a hurry to just get the results they want:
Model.find(:all).detect{ |record| record.attribute == "some_value" }
ActiveRecord almost always has the ability to efficiently give you what you need:
Model.find(:all, :conditions => {:attribute => "some_value"})
This is a simple example to make the point clear, but I’ve seen more convoluted chunks of code where detect or reject is using some non-attribute model method to determine inclusion. Almost always, these queries can be written with ActiveRecord, and if not, with SQL.
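To make the cost difference concrete, here is a small plain-Ruby sketch that counts how many objects get built in each approach. It does not touch ActiveRecord: FakeRecord, ROWS, filter_in_ruby and filter_at_source are all hypothetical names, and the array of hashes merely stands in for rows sitting in the database.

```ruby
# Hypothetical record class that counts how many instances get built,
# to make the instantiation cost visible.
class FakeRecord
  @built = 0
  class << self
    attr_accessor :built
  end

  attr_reader :attribute

  def initialize(row)
    self.class.built += 1
    @attribute = row[:attribute]
  end
end

# Stand-in for a table of 5,000 rows, of which 50 match.
ROWS = (1..5_000).map do |i|
  { :attribute => (i % 100 == 0 ? "some_value" : "other") }
end

# Filtering in Ruby: every row becomes an object before the test runs.
def filter_in_ruby(rows)
  FakeRecord.built = 0
  rows.map { |row| FakeRecord.new(row) }.select { |r| r.attribute == "some_value" }
  FakeRecord.built
end

# Filtering at the source (the job :conditions hands to the database):
# only matching rows are ever turned into objects.
def filter_at_source(rows)
  FakeRecord.built = 0
  rows.select { |row| row[:attribute] == "some_value" }.map { |row| FakeRecord.new(row) }
  FakeRecord.built
end

puts "filtered in Ruby:   #{filter_in_ruby(ROWS)} objects built"
puts "filtered at source: #{filter_at_source(ROWS)} objects built"
```

Both approaches end up with the same 50 matches, but the first builds all 5,000 objects to get there.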
6. Evil Callbacks in the Model
I’ve helped a couple of customers track down memory issues where their controller action looked perfectly reasonable:
def update
  @model = Model.find_by_id(params[:id])
end
However, a look at the filters on the model showed something like this:
after_save :update_something_on_related_model
.
.
def update_something_on_related_model
  self.relationship.each do |instance|
    instance.update_attribute(:status, self.status)
  end
end
7. Named scopes, default scopes, and has_many relationships that specify :include Where Inappropriate
Remember the first time you set up your model’s relationships? Maybe you were thinking smartly and did something like this:
class User < ActiveRecord::Base
  has_many :posts, :include => :comments
end
So, by default, posts includes :comments. Which is great for when you are displaying posts and comments on the same page together. But let’s say you are doing something in a migration which has something to do with all posts and nothing to do with comments:
@posts = User.find(:all, :conditions => {:activated => true}).posts
This could feel ‘safe’ to you, because you only have 50 users and maybe a total of 1000 posts, but the include specified on the has_many will load in all related comments – something you probably weren’t expecting.
8. Use :select When You Must Instantiate Large Quantities of Records
Sometimes, in the reality of running a real production site, you need to have a query return a large data set, and no, you can’t paginate. In that case, the first question you should ask is “Do I need to instantiate all of the attributes?”
Maybe you need all the comment_ids in an Array for some reason.
@comment_ids = Comment.find(:all).collect{|comment| comment.id }
In this case, you are looking for an array of ids. Maybe you will be delivering them via JSON, maybe you need to cache them in memcached, maybe they are the first step of some calculation you need. Whatever the need, this is a much more efficient query:
@comment_ids = Comment.find(:all, :select => 'comments.id').collect{|comment| comment.id }
9. Overfed Feeds
Check all the places you are making XML sandwiches. Often these controllers are written early on and don’t scale well. Maybe you have a sitemap XML feed that delivers every record under the sun to Google, or are rendering some large amount of data for an API.
10. Monster Migrations
Finally, watch out for your Migrations, as this is a common place where you need to do things like iterate over every record of a Model, or instantiate and save a ton of records. Watch the process size on the server with top or with “watch ‘ps aux | grep migrate’”.
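You can also have the migration report its own size as it goes. The sketch below assumes a Linux box where /proc/self/status exists; rss_kb_from_status, current_rss_kb and SAMPLE_STATUS are hypothetical names, and the sample text is made up to show the shape of the VmRSS line.

```ruby
# Extracts the resident set size in kB from the text of /proc/<pid>/status.
# Returns nil if the text has no VmRSS line.
def rss_kb_from_status(status_text)
  match = /^VmRSS:\s+(\d+)\s+kB/.match(status_text)
  match && match[1].to_i
end

# Linux-only assumption: /proc/self/status exists. Returns nil elsewhere.
def current_rss_kb
  path = "/proc/self/status"
  File.exist?(path) ? rss_kb_from_status(File.read(path)) : nil
end

# Made-up sample in the format of a /proc status file.
SAMPLE_STATUS = "Name:\truby\nVmPeak:\t   60000 kB\nVmRSS:\t   44504 kB\nThreads:\t1\n"

puts "sample RSS: #{rss_kb_from_status(SAMPLE_STATUS)} kB"
puts "this process: #{current_rss_kb.inspect} kB"
```

Calling current_rss_kb every few thousand records inside a long migration gives you a running log of process size, so you can see which step is the hungry one.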
Summary For the Twitter Lovers
Big honking Rails instances on production? You’re loading too much! Look at AR usage in production and adjust; stop looking for leaks!
Keep an eye out for more posts like this in the future, and feel free to share your very own horror stories in the comments :)

It is definitely a good idea to reduce as much bloat as possible. However, I think this article is a bit misleading. It's not really that a few extra libraries create a 2-5x increase in memory; I believe the root cause of the really huge increases in memory lies somewhere else.
What apparently happens is that the way MRI handles memory allocation is to grow exponentially as more memory is required. So just going slightly over the current allocated heap slots causes memory used by the ruby process to almost double, but this is just preallocated, not really used (if I understand correctly myself).
I only know this because of the following article though (which has a fix for the exponential memory increase by using REE and changing some basic ruby configurations): http://www.mikeperham.com/2009/05/25/memory-hungr...
Thanks for the input!
The issue we're frequently seeing (and that the article addresses) isn't a problem with extra libraries creating bloat. It's a problem of the application loading in 200MB or 400MB of Active Record objects during a request. This is usually done by accident and takes developers by surprise, as it was not intended. It also can be tough in a complex app to track down *where* it's happening.
I'm with you on the greedy memory behavior of MRI, though. That makes the issue multiply and persist.
Good post and great points. I learned early in my Rails development years that AR will give you just what you ask for. I kept seeing large spikes in memory consumption. Having come to Rails via ColdFusion, Java, and PHP development the first place I looked was at the SQL. Watching the SQL whiz by on the terminal I noticed lots of SELECT statements; the proverbial N+1 SELECT issue. So I learned early on to optimize and cache as much SQL as possible.
Databases are always the biggest resource hogs in any app, regardless of what language you use. Tune your SQL, tune your app. Thanks for the read!
Avoiding N+1 query problems is a lot easier with DataMapper.
Rails 3.0's library agnosticism is much anticipated. :-)
One other cause is your :dependent callbacks.
class Blog < AR::Base
has_many :posts, :dependent => :destroy
end
This will iterate through every post and call #destroy on it. Use :delete_all if you want to just issue a single delete query. HOWEVER, this won't hit your destroy callbacks on Post.
class Blog < AR::Base
  has_many :posts
  after_destroy :purge_posts
  private
  def purge_posts
    # one step better, it only loads a page of posts and hits every post destroy callback
    posts.paginated_each { |p| p.destroy }
    # even better, but not very dry
    # cheating a bit w/ a denormalized Comment table,
    # but this is a blog comment and i'm pressed for time :)
    Comment.delete_all :blog_id => id
    Post.delete_all :blog_id => id
  end
end
The solution I've been using for this is WillPaginate.
# instantiate only 30 users at a time
User.paginated_each do |user|
user.do_some_stuff!
end
Keep in mind that using Sequel or DM won't make you immune to all these (though DM has a nice identity map that helps in a lot of cases).
http://gist.github.com/180517 (better format of the above post)
Awesome, totally overlooked destroy callbacks in the article. Thanks Rick. This was the exact thing that bit me on a migration on that internal app I mentioned.
Thanks for mentioning paginated_each, too. It's a developer's best friend when using Rails < 2.3.
You can also use some of the new "batch processing" stuff in Rails 2.3 to do paginated loading/processing:
http://guides.rubyonrails.org/2_3_release_notes.h...
Hi,
"On top of that, Ruby is greedy with memory—it doesn’t hand the memory back to the OS after the request."
Just to elaborate on the above statement :
Basically the GC allocates an initial heap space of 10k slots. Each slot is large enough to hold an object. Whenever the first heap is maxed, a GC run triggers, attempts to free more space and eventually allocates another heap area if the number of free slots is less than 4096.
The second time this happens, another 10k slots are made available. Additional heaps from this point onwards are allocated in slabs of ( last heap size allocated * 1.8 ). This is why jumps from 70MB RSS to 120MB RSS etc. are so common.
Ruby uses the libc memory allocation functions malloc, realloc, calloc etc. Stock malloc implementations don't return allocated memory back to the OS until the running process exits. This is an optimization to save on syscalls (brk) and to ensure relatively fast alloc/dealloc during the process lifetime.
The JVM for example allocates a memory region of 168MB with mmap on startup, which pretty much assigns all memory management tasks to the various GC implementations for the runtime. This platform, and JRuby for that matter, doesn't have this 'greedy' behavior and it'll just be able to return memory allocated during an intense spike back to the OS.
- Lourens
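The growth pattern in the comment above is easy to tabulate. This is a sketch under one reading of that comment (a first heap of 10k slots, a second of 10k, then each new heap 1.8x the previous one); the real numbers depend on the Ruby build and version, and INITIAL_SLOTS and heap_sizes are names made up for illustration.

```ruby
# Figure taken from the comment above; an assumption about MRI 1.8-era
# builds, not a measured value.
INITIAL_SLOTS = 10_000

# Returns the size in slots of each heap allocated, for the given number
# of heaps: 10k, another 10k, then 1.8x the previous heap each time.
# Integer math (x * 18 / 10) is used, so fractions are truncated.
def heap_sizes(count)
  sizes = []
  count.times do |i|
    sizes << (i < 2 ? INITIAL_SLOTS : sizes.last * 18 / 10)
  end
  sizes
end

total = 0
heap_sizes(8).each_with_index do |slots, i|
  total += slots
  puts "heap #{i + 1}: #{slots} slots (#{total} total)"
end
```

Each new heap is nearly as large as everything allocated before it, so a request that needs only slightly more slots than are available can push the process size up by a large step.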
Regarding loading a bunch of ids (8) I'd probably rather use something like this:
connection.select_values(construct_finder_sql({ :select => 'comments.id' }))
and cast it to integer if needed. This even saves you from creating Comment instances, which may still consume a large amount of memory depending on the table size.
60179 0.0 8.6 50576 44504 ?? S 24Aug09 1:42.02 /usr/local/bin/ruby18 /usr/local/bin/mongrel_rails start
good article, but for other intermediate railists out there, how do you solve it? is the answer using :limit? find_by_sql? a lot of the rails folks have been taught this very way in various other blogs and tutorials. can you limit a call when you use :include on the associated models?
any insight would be golden.
Hi Pjammer
I would say that it depends on your specific needs and circumstance. The only generic answer is to be aware of what your Active Record calls are actually doing and how much data is being loaded into rails. If you find yourself with bloat, the first thing you should do is figure out where it's coming from. Next, evaluate: Do I *need* 10,000 records to be instantiated for this controller action? Rarely will the answer be "Yes."
In the majority of cases I've seen, the situation has been that the loading in of an excessive number of records was not a business need or a conscious decision by the developer, but more of a "Whups!" that is fairly easily solved.
Using :limit, paginated_each (will_paginate plugin), find_each (rails 2.3), :select (to only instantiate the attributes you need), or :include => nil on a find call (which guarantees that nothing is being included) are all pro-active ways to make sure you aren't building killer queries. Poking around in script/console in production can give you a great idea of how many records a query might try to instantiate. Also, watching the queries run in development mode can be useful. What associations are included? Are those associations needed? Is that dataset large in production?
Again, it's going to depend on your situation and the quantity of data in your database. Queries that might have worked great when you had 2,000 users simply won't scale up to when you have 200,000 users.
Regarding loading a bunch of ids: I created a plugin called find-ids which adds User.find(:all_ids) and User.find_all_ids_by_… style methods to ActiveRecord objects. See http://github.com/pkmiec/find-ids/tree/master for more details.
Sudara, this is an excellent post!
One thing I'm trying to work out, and I was wondering if you already knew the answer, is whether the following will instantiate every instance of the associations before calling the find:
@some_record.things.find(:all, :conditions => { :is_active => true})
I suspect that all 'things' would get instantiated first, based on what's above and … well, logic, but maybe method_missing and find conspire in such situations.
If this were the case, it would be a real bummer, because it's really handy to find within the scope of a relationship, but if it's that expensive, it's definitely not worth doing so. Curious to see what you (and others here) think.
Just a quick follow-up in the vein of look-before-one-asks — I'm watching the query log for find on a relationship and it looks like only the find query itself is run.
@posts = User.find(:all, :conditions => {:activated => true}).posts
Does not work as the User.find would return an array.
Actually User.find returns an association proxy, which isn't quite the same as an array. it can respond to association methods such as "posts" here.
You would want to do a find on Posts that does a join to Users to do your condition. For example:
@posts = Post.find(:all, :joins => :user, :conditions => "users.activated = true")
My horror story: user.comments.size instead of user.comments.count. The first one instantiates all the comments and calls the size method on the array, whereas the other one uses the AR method count, which is all I want.
btw, great article !
I don't think that's the case anymore… and maybe only of late, but:
---
p = Property.last
p.units.count
#=> SQL (0.5ms) SELECT count(*) AS count_all FROM `units` WHERE (`units`.property_id = 1210)
p.units.size
#=> SQL (0.3ms) SELECT count(*) AS count_all FROM `units` WHERE (`units`.property_id = 1210)
p.units.length
#=> SQL (0.5ms) SELECT count(*) AS count_all FROM `units` WHERE (`units`.property_id = 1210)
---
Thanks Sudara for this superb post!
Although can you explain more about this statement:
"Ruby is greedy with memory—it doesn’t hand the memory back to the OS after the request"
Do you mean that if the request allocates some memory for a variable, that memory is not released after the request? Does that mean the process size will just keep growing?
Thx!
I discovered that another way to fight bloat is to fork (or thread, or queue) the bloat-inducing code. I've written up a blog post about my experiences: http://www.fozworks.com/admin/articles/6009
Whoops that link should have been:
http://www.fozworks.com/2009/11/25/a-really-simpl...
For what it's worth regarding profiling tools, the statement "Run these tools in production mode on production data " is not particularly good advice. While these tools are "… built to be non-obtrusive." they aren't as non-obtrusive as they might seem by looking at the code.
We were running with MemoryLogic/memory-usage-logger in production for a long while, having performance problems under even moderate load, and struggling to tune everything. Eventually, on a lark, we removed the memory-usage-logger and our page render time averages plummeted from 600ms down to 180ms. When we added the memory-usage-logger back in, our page render times went right back up. While this is a great tool, it adds way too much overhead to leave in a production environment.
@paul
why not has_finder ;)
named_scope :ids, :select => 'id'
works fine for me with a geodata-set ~650k