<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Engine Yard Blog &#187; Sudara Williams</title>
	<atom:link href="http://www.engineyard.com/blog/author/sudarawilliams/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.engineyard.com/blog</link>
	<description></description>
	<lastBuildDate>Tue, 07 Feb 2012 19:36:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>That&#8217;s Not a Memory Leak, It&#8217;s Bloat</title>
		<link>http://www.engineyard.com/blog/2009/thats-not-a-memory-leak-its-bloat/</link>
		<comments>http://www.engineyard.com/blog/2009/thats-not-a-memory-leak-its-bloat/#comments</comments>
		<pubDate>Thu, 03 Sep 2009 17:00:35 +0000</pubDate>
		<dc:creator>Sudara Williams</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Tips & Tricks]]></category>
		<category><![CDATA[ActiveRecord]]></category>
		<category><![CDATA[Mongrel]]></category>
		<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Passenger]]></category>
		<category><![CDATA[Ruby on Rails]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1973</guid>
		<description><![CDATA[<p>Our Rails customers often run into memory issues. The most frequent cause these days is what we in Support dub 'bloated mongrels.'</p>
<p>To be fair, bloat has absolutely nothing to do with mongrel itself, which is a solid and fine piece of work. You can run into this problem just as easily with thin, passenger, etc. Changing to a different server will not save you, as the root cause is not the server, but the code the server is running for you.</p>
<p>A real true-blooded memory leak is rare in comparison to the occurrence of bloating Rails instances. If your mongrels (or thins, or passenger instances) are suddenly sporting 100MB or more of extra weight, look no further: we've got the diet plan for you!</p>
<h2>What Is Bloat?</h2>
<p>In short: you are loading in too much. Too much what, you ask? Why, it's too much ActiveRecord!</p>
<p>Bloat is <em>easily</em> identifiable. Last week, your mongrels were at 110MB, but after a new feature or two and a bit of 'optimization'... well, let's just say that you'd have trouble fitting one on a CD. It's not always <em>that</em> dramatic (a bloated mongrel probably averages 200-300MB), but basically the mongrels are 2-5x larger than they should be, or spike in size suddenly after a certain subset of requests.</p>
<h2>Detecting Bloat</h2>
<p>The easiest way to detect bloat is to watch the application server's process size. <a href="http://www.newrelic.com">New Relic</a>, for example, will show you combined memory usage. You could also watch it live with "top" on your slice/server. In both cases, you are looking for quick jumps in process size. If you're using mongrel, you should be using monit to watch it for precisely this reason. Monit will log to syslog, and assuming that you've set up memory limits, you could run something like:</p>
<pre escaped="true">grep resource /var/log/syslog</pre>
<p>This would print out lines like so:</p>
<pre escaped="true">Aug 29 03:35:05 myserver monit[5194]: 'mongrel_myapp_5000' total mem amount of 133256kB matches resource limit [total mem amount&gt;130360kB]</pre>
<p>This says the mongrel was caught at 133MB, which is over the 130MB limit. Not <em>too</em> bad. The problem is that when you have bloat, you start seeing processes skyrocket past the memory limits, sometimes multiple times an hour:</p>
<div>
<pre escaped="true">Aug 29 03:35:05 myserver monit[5194]: 'mongrel_myapp_5000' total mem amount of 210256kB matches resource limit [total mem amount&gt;130360kB]</pre>
<p>This is bad. Basically, the mongrels were fine one minute (under 130MB) and the next minute they weren't (~210MB). That's a pretty big jump, yet when it comes to bloat, this is fairly mild.</p>
</div>
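<p>If you'd rather not eyeball the grep output, a few lines of Ruby can summarize it. This is only a sketch: <code>memory_limit_hits</code> and <code>MONIT_MEM_LINE</code> are hypothetical names, and the pattern assumes monit's log lines look exactly like the two samples above.</p>

```ruby
# Hypothetical helper: pull the service name, measured size and limit (in kB)
# out of monit "resource limit" lines like the two samples shown above.
MONIT_MEM_LINE = /'([^']+)' total mem amount of (\d+)kB matches resource limit \[total mem amount>(\d+)kB\]/

def memory_limit_hits(lines)
  lines.map { |line|
    match = MONIT_MEM_LINE.match(line)
    next unless match # skip lines that are not memory limit matches
    actual_kb = match[2].to_i
    limit_kb  = match[3].to_i
    {
      :service => match[1],
      :actual_kb => actual_kb,
      :limit_kb => limit_kb,
      # How far past the limit the process was, as a percentage of the limit.
      :over_by_percent => ((actual_kb - limit_kb) * 100.0 / limit_kb).round
    }
  }.compact
end

sample = [
  "Aug 29 03:35:05 myserver monit[5194]: 'mongrel_myapp_5000' total mem amount of 133256kB matches resource limit [total mem amount>130360kB]",
  "Aug 29 03:35:05 myserver monit[5194]: 'mongrel_myapp_5000' total mem amount of 210256kB matches resource limit [total mem amount>130360kB]"
]

memory_limit_hits(sample).each do |hit|
  puts "#{hit[:service]}: #{hit[:actual_kb]}kB (#{hit[:over_by_percent]}% over the limit)"
end
```

<p>On the two sample lines this reports the first mongrel as roughly 2% over its limit and the second as roughly 61% over. Overshoots like the second one are the jumps worth chasing.</p>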
<h2><em>Over</em>ActiveRecord</h2>
<p>Misuse of ActiveRecord is probably the largest and most common threat to a Ruby process size that we see. Instantiation of ActiveRecord models is expensive, and it is very, very easy to accidentally instantiate 100k records, especially in earlier versions of Rails. Though these records are not cached for the next request, the Ruby process still needs to request the required amount of memory from the system and allocate it. On top of that, Ruby is greedy with memory: it doesn't hand the memory back to the OS after the request. So, one action with memory bloat will mean that your process is just cruising at a bloated 400MB.</p>
<h2>I'm Too Smart to Have Bloat!</h2>
<p>The ActiveRecord tips below may seem obvious to seasoned Rails programmers, but even the most experienced programmers run into these issues. Don't worry, I won't name names! Ok, I'll name one: after I wrote the first draft of this very article, I was running a migration on one of our internal sites, and I noticed that the process size was growing... and growing... and growing. Luckily I had enough memory overhead on-slice, but I had a good laugh.</p>
<p>Also keep in mind that code written for a shiny new application with a few hundred users and code written for an application with 100k users have very different needs. Growing and scaling a healthy application requires regular tending and pruning, just like growing a healthy garden. <strong>Assume that queries written a year ago will come back to bite you as you scale up.</strong></p>
<h2>Development and Production Behavior <em>Will</em> Differ</h2>
<p>Ok, so your mongrels may reach 500MB in production, yet they stay a cool 80MB on localhost. That difference is probably irrelevant. Unless you're running the app locally in production mode with <code>cache_classes=true</code>, using the exact same data set as production, and simulating production traffic (with the same params as real-life traffic), the differences are not worth investigating. Chasing them is a distraction from the fact that you have a production issue that you should be dealing with. Let's instead go identify the action(s) that are causing the bloat, and work from there.</p>
<h2>Use Tools Like Rack::Bug, MemoryLogic and Oink</h2>
<p>Unless you have a very small and manageable code-base, it can be relatively difficult and time consuming to blindly cruise through your application looking for problematic areas.</p>
<p>Luckily, the Ruby community has some awesome open source tools for the job. We often recommend people try <a href="http://github.com/brynary/rack-bug/">Rack::Bug</a>, <a href="http://wiki.github.com/binarylogic/memorylogic">MemoryLogic</a> or <a href="http://github.com/noahd1/oink/">Oink</a>. These are <em>amazing</em> time savers and will allow you to inspect how many ActiveRecord instances are being loaded up on any given request.</p>
<p>Run these tools in production mode on production data; they are built to be unobtrusive. They should point you pretty quickly to the actions that have issues, and you can begin to explore in script/console and check out the size of the data sets the action and view are loading in. Be sure to use the exact same parameters that the troubled actions are receiving.</p>
<h2>Nailing Down the Root Cause</h2>
<p>After you've found a couple of troublesome actions, here are some more detailed tips on what to look for in your code. A "memory leak" from leaky dependencies or Rails itself would be 25th on the list of things to check. Enjoy!</p>
<h3>1. Model.find(:all)</h3>
<p>This is a memory killer, and in versions of Rails <strong>before</strong> 2.3 there is no built-in alternative. The most common form in the wild is:</p>
<pre escaped="true">Comment.find(:all).each{ |record| do_something_with_each(record) }</pre>
<p>If you have 100,000 Comments, this will load and instantiate all 100k records in memory, then go through each one. Rails 2.3 added <code>find_each</code> and <code>find_in_batches</code>, which load records in small batches (1000 at a time by default), but you have to call them explicitly, and they won't save you from the following variations:</p>
<pre escaped="true">@records = Comment.all
@records = Comment.find(:all)
@record_ids = Comment.find(:all).collect{|record| record.id }</pre>
<p>Each of these will load up all Comment records into an instance variable, regardless of whether you have 100 or 100,000, and regardless of whether you are on Rails 2.1 or 2.3.</p>
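<p>The way out is to walk the table in batches keyed on the primary key. The sketch below is not ActiveRecord: it is a plain-Ruby illustration of the idea, where the <code>COMMENTS</code> array is a hypothetical stand-in for the table and <code>fetch_batch</code> and <code>each_in_batches</code> are made-up helper names.</p>

```ruby
# Hypothetical stand-in for a comments table: 2,500 rows as plain hashes,
# already ordered by id.
COMMENTS = (1..2500).map { |id| { :id => id, :body => "comment #{id}" } }

# Stand-in for "SELECT * FROM comments WHERE id > ? ORDER BY id LIMIT ?".
def fetch_batch(after_id, batch_size)
  COMMENTS.select { |row| row[:id] > after_id }.first(batch_size)
end

# Yield every row while fetching only one batch from the "table" at a time.
def each_in_batches(batch_size = 1000)
  last_id = 0
  loop do
    batch = fetch_batch(last_id, batch_size)
    break if batch.empty?
    batch.each { |row| yield row }
    # Remember where this batch ended so the next fetch starts after it.
    last_id = batch.last[:id]
  end
end

seen = 0
each_in_batches(1000) { |row| seen += 1 }
puts "processed #{seen} rows in batches of at most 1000"
```

<p>With ActiveRecord itself, the equivalent in Rails 2.3 is <code>Comment.find_each(:batch_size =&gt; 1000) { |comment| do_something_with_each(comment) }</code>.</p>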
<h3>2. :includes are Including Too Much</h3>
<pre escaped="true">Article.find(:all, :include =&gt; [:user =&gt; [:posts =&gt; :comments]])</pre>
<p>This is a variant of the above, intensified by one or more joins on other tables. If you only have 1000 articles, you may have thought loading them in was not a big deal. But when you multiply that 1000 by the number of users, the posts they have and the comments on those posts... it adds up.</p>
<h3>3. :includes on a has_many</h3>
<pre escaped="true">@article.users.find(:all, :include =&gt; [:posts =&gt; :comments])</pre>
<p>Variation on the above, but through a has_many.</p>
<h3>4. @model_instance.relationship</h3>
<p>Referring to a has_many relationship directly like so:</p>
<pre escaped="true">@author.comments</pre>
<p>is a shortcut to the potentially bloated:</p>
<pre escaped="true">@author.comments.find(:all)</pre>
<p>Be sure that you don't have thousands of related records, because you will be loading them all up.</p>
<h3>5. Filtering Records with Ruby Instead of SQL</h3>
<p>This is also fairly common, especially as requirements change or when folks are in a hurry to just get the results they want:</p>
<pre escaped="true">Model.find(:all).detect{ |record| record.attribute == "some_value" }</pre>
<p>ActiveRecord almost always has the ability to efficiently give you what you need:</p>
<pre escaped="true">Model.find(:all, :conditions =&gt; {:attribute =&gt; "some_value"})</pre>
<p>This is a simple example to make the point clear, but I've seen more convoluted chunks of code where detect or reject is using some non-attribute model method to determine inclusion. Almost always, these queries can be written with ActiveRecord, and if not, with SQL.</p>
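<p>To make the cost concrete, here is a plain-Ruby sketch that counts how many objects each approach builds. Everything in it is hypothetical: <code>FakeRecord</code> stands in for a model, <code>ROWS</code> for the table, and the "source-side" filter imitates what <code>:conditions</code> asks the database to do.</p>

```ruby
# Hypothetical stand-in for a model: counts how many objects get built.
class FakeRecord
  attr_reader :attribute

  @built = 0

  def self.built
    @built
  end

  def self.reset_built
    @built = 0
  end

  def self.note_built
    @built += 1
  end

  def initialize(row)
    @attribute = row[:attribute]
    self.class.note_built
  end
end

# Hypothetical raw rows, as the database would hold them; only one matches.
ROWS = (1..10_000).map { |i| { :id => i, :attribute => (i == 42 ? "some_value" : "other") } }

# Filtering in Ruby: every row becomes an object before detect looks at it.
FakeRecord.reset_built
ROWS.map { |row| FakeRecord.new(row) }.detect { |record| record.attribute == "some_value" }
built_when_filtering_in_ruby = FakeRecord.built

# Filtering at the source: only matching rows are turned into objects.
FakeRecord.reset_built
ROWS.select { |row| row[:attribute] == "some_value" }.map { |row| FakeRecord.new(row) }
built_when_filtering_at_source = FakeRecord.built

puts "Ruby-side filter built #{built_when_filtering_in_ruby} objects"
puts "source-side filter built #{built_when_filtering_at_source} objects"
```

<p>The Ruby-side version builds one object per row just to throw nearly all of them away. With real ActiveRecord models each of those objects is far heavier than this toy class.</p>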
<h3>6. Evil Callbacks in the Model</h3>
<p>I've helped a couple of customers track down memory issues where their controller action looked perfectly reasonable:</p>
<pre escaped="true">def update
  @model = Model.find_by_id(params[:id])
  @model.update_attributes(params[:model])
end</pre>
<p>However, a look at the filters on the model showed something like this:</p>
<pre escaped="true">after_save :update_something_on_related_model
.
.
def update_something_on_related_model
  self.relationship.each do |instance|
    instance.update_attribute(:status, self.status)
  end
end</pre>
<p>Every save of the model now loads and instantiates every related record and issues a separate UPDATE for each one, so a harmless-looking update action can pull thousands of objects into memory.</p>
<h3>7. Named Scopes, Default Scopes, and has_many Relationships That Specify :include Where Inappropriate</h3>
<p>Remember the first time you set up your model's relationships? Maybe you were thinking smartly and did something like this:</p>
<pre escaped="true">class User &lt; ActiveRecord::Base
  has_many :posts, :include =&gt; :comments
end</pre>
<p>So, by default, posts includes :comments. Which is great for when you are displaying posts and comments on the same page together. But let's say you are doing something in a migration which has something to do with all posts and nothing to do with comments:</p>
<pre escaped="true">@posts = User.find(:all, :conditions =&gt; {:activated =&gt; true}).collect{ |user| user.posts }.flatten</pre>
<p>This could feel 'safe' to you, because you only have 50 users and maybe a total of 1000 posts, but the include specified on the has_many will load in all related comments - something you probably weren't expecting.</p>
<h3>8. Use :select When You Must Instantiate Large Quantities of Records</h3>
<p>Sometimes, in the reality of running a real production site, you need to have a query return a large data set, and no, you can't paginate. In that case, the first question you should ask is "Do I need to instantiate all of the attributes?"</p>
<p>Maybe you need all the comment_ids in an Array for some reason.</p>
<pre escaped="true">@comment_ids = Comment.find(:all).collect{|comment| comment.id }</pre>
<p>In this case, you are looking for an array of ids. Maybe you will be delivering them via JSON, maybe you need to cache them in memcached, maybe they are the first step of some calculation you need. Whatever the need, this is a much more efficient query:</p>
<pre escaped="true">@comment_ids = Comment.find(:all, :select =&gt; 'comments.id').collect{|comment| comment.id }</pre>
<h3>9. Overfed Feeds</h3>
<p>Check all the places you are making XML sandwiches. Often these controllers are written early on and don't scale well. Maybe you have a sitemap XML feed that delivers every record under the sun to Google, or are rendering some large amount of data for an API.</p>
<h3>10. Monster Migrations</h3>
<p>Finally, watch out for your Migrations, as this is a common place where you need to do things like iterate over every record of a Model, or instantiate and save a ton of records. Watch the process size on the server with top or with "watch 'ps aux | grep migrate'".</p>
<h2>Summary For the Twitter Lovers</h2>
<p>Big honking Rails instances on production? You're loading too much! Look at AR usage in production and adjust; stop looking for leaks!</p>
<p>Keep an eye out for more posts like this in the future, and feel free to share your very own horror stories in the comments :)</p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/thats-not-a-memory-leak-its-bloat/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
		</item>
	</channel>
</rss>

