<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Engine Yard Blog &#187; Avrohom Katz</title>
	<atom:link href="http://www.engineyard.com/blog/author/avrohomkatz/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.engineyard.com/blog</link>
	<description></description>
	<lastBuildDate>Tue, 07 Feb 2012 19:36:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Bundler Pro Tips</title>
		<link>http://www.engineyard.com/blog/2011/bundler-pro-tip/</link>
		<comments>http://www.engineyard.com/blog/2011/bundler-pro-tip/#comments</comments>
		<pubDate>Thu, 03 Feb 2011 22:45:21 +0000</pubDate>
		<dc:creator>Avrohom Katz</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Tips & Tricks]]></category>
		<category><![CDATA[Bundler]]></category>
		<category><![CDATA[gembundler]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=6473</guid>
		<description><![CDATA[<p><a id="internal-source-marker_0.527404292486608" href="http://gembundler.com/">Bundler</a> has been out for over a year now. Many people have adopted Bundler as their development tool of choice to handle dependencies. At Engine Yard we sometimes encounter customers who haven’t yet made the switch to Bundler. While some of these folks are unable to because of development priorities, others actively reject Bundler. These rejections surprise me, since Bundler is such a functional tool.</p>
<p>I'm going to walk you through some key benefits of Bundler. Then, I'll answer some Bundler questions and concerns to leave you with a better understanding of Bundler so you can use it better day to day.</p>
<p><img class="alignright" src="http://gembundler.com/images/gembundler.png" alt="" width="400" height="155" /></p>
<h3>Key Benefits</h3>
<ol>
<li><strong>Version Consistency</strong><br />
Ensuring that all your developers work with the same gem versions can be challenging. Bundler lets you ensure that all machines run the same versions of your gems and their dependencies, while still giving you a simple path to update one or two gems.</li>
<li><strong>Dependency Resolution</strong><br />
If you've ever come across an error like this:<br />
<code>can't activate rack (~&gt; 1.0.1, runtime) for [], already activated rack-1.0.0 for []</code>, then you know how hard it can be to debug. You need to figure out which gem requires the older version, which gem requires the newer version, and then (usually through trial and error) determine which rack version satisfies the dependency of both. Bundler solves this challenge.</li>
<li><strong>Development Freedom</strong><br />
Does your application depend on a gem that you are developing? You used to stick it in vendor/gems or maybe even used git submodules. Now it’s as simple as forking the gem and passing the <code>:git</code> option to that gem's entry in your Gemfile. No more committing large swaths of files to your repo just because your app relies on them.</li>
</ol>
<p><span id="more-6473"></span>Now that I've given a brief overview of some of Bundler’s benefits, I hope it’s apparent how useful it is. It's as simple as adding a gem to your Gemfile and running <code>bundle install</code>. If you're using Rails 3 you're already using Bundler, so go right ahead.</p>
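<p>As a rough sketch of what that looks like, here is a small Gemfile that pins a version range and points one gem at a fork. The gem names, version numbers, and repository URL are hypothetical placeholders, not recommendations:</p>
<pre lang="ruby" escaped="true"># Gemfile (illustrative only; names, versions and the URL are placeholders)
source "http://rubygems.org"

# Pessimistic constraint: any 1.0.x release at or above 1.0.1
gem "rack", "~&gt; 1.0.1"

# A gem you are developing, pulled straight from your fork
gem "my_gem", :git =&gt; "git://github.com/your_name/my_gem.git"</pre>
<p>Running <code>bundle install</code> then resolves the full dependency graph and records the exact versions in Gemfile.lock, which you commit so that every machine installs the same set.</p>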
<h3>Your Questions</h3>
<p>A few weeks ago we asked you to suggest challenges encountered using Bundler, or common hesitations to using Bundler. Here’s what you came up with.</p>
<p><strong>How do I run code in one bundle from code in a different bundle?</strong> (via <a href="http://twitter.com/#!/joshsusser">@joshsusser</a>)</p>
<p>Good question. This isn’t heavily documented.</p>
<p>It all comes down to this method: <code>Bundler.with_clean_env { do_stuff }</code>.</p>
<p>This resets your environment to what it was before <code>bundler/setup</code> was required. It clears <code>ENV['BUNDLE_GEMFILE']</code>, which lets you use a different Gemfile. Here <a href="https://gist.github.com/b8d47d4ca590e7b00539">Andre Arko shows how this works</a>, and what to expect from running code within a <code>Bundler.with_clean_env</code> block.</p>
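<p>A minimal sketch of that pattern, assuming a second application with its own Gemfile lives at a hypothetical path and exposes a hypothetical rake task:</p>
<pre lang="ruby" escaped="true">require "bundler"

# Hypothetical location of the other app; adjust to your setup.
other_app = "/path/to/other_app"

Bundler.with_clean_env do
  # Inside the block, Bundler's environment variables (such as
  # BUNDLE_GEMFILE) are restored to their pre-Bundler values, so the
  # child process picks up the other app's Gemfile.
  Dir.chdir(other_app) do
    system("bundle exec rake some:task") # hypothetical task name
  end
end</pre>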
<p><strong>How do I avoid fetching source index every time?</strong> (via <a href="http://twitter.com/#!/pistos">@Pistos</a>)</p>
<p>There is a flag you can pass to bundle install that does this: <code>bundle install --local</code>.<br />
According to <a href="http://gembundler.com/man/bundle-install.1.html">the docs</a>, the local flag does this:</p>
<blockquote><p>Do not attempt to connect to <a href="http://rubygems.org/">rubygems.org</a>, instead using just the gems located in vendor/cache. Note that if a more appropriate platform-specific gem exists on <a href="http://rubygems.org/">rubygems.org</a>, this will bypass the normal lookup.</p></blockquote>
<p><strong>When should I run bundle install vs. bundle update?</strong> (via <a href="http://twitter.com/#!/jamiecobbett">@jamiecobbett</a>)</p>
<p>Always run bundle install. Run bundle update only if you actually wish to update a gem, or if bundle install warns you that you need to update a gem. You can also conservatively update a single gem with <code>bundle update GEM_NAME</code>, which keeps everything else at the same version but re-resolves dependencies against the newer version of that gem. bundle install changes only what you changed in the Gemfile, plus any dependencies that are not shared with other gems. If a dependency of a gem you changed is also a dependency of another gem, bundle install will fail and warn you to update.</p>
<p><strong>Is Bundler compatible with Ruby 1.9.2?</strong> (via <a href="http://twitter.com/#!/vandrijevik">@vandrijevik</a>)</p>
<p>Yes. Bundler is fully compatible with Ruby 1.9.2.</p>
<p><strong>How can I use Capistrano with Bundler?</strong> (via <a href="http://twitter.com/#!/smathy">@smathy</a>)</p>
<p>You can add <code>require 'bundler/capistrano'</code> to the top of your deploy.rb. This adds the bundle install task to run after every deployment. By default this runs a bundle install with the --deployment and --quiet flags as well as without the development and test groups. It installs the bundle to shared/bundle. You can override these defaults by setting any of these in your deploy.rb.</p>
<pre lang="ruby" escaped="true">set :bundle_gemfile,  "Gemfile"
set :bundle_dir,      File.join(fetch(:shared_path), 'bundle')
set :bundle_flags,    "--deployment --quiet"
set :bundle_without,  [:development, :test]
set :bundle_cmd,      "bundle" # e.g. "/opt/ruby/bin/bundle"
set :bundle_roles,    {:except =&gt; {:no_release =&gt; true}} # e.g. [:app, :batch]</pre>
<p><strong>What can I do about the indirection Bundler adds to the command line? For example, `bundle exec foo` instead of simply `foo`</strong> (via <a href="http://twitter.com/#!/peteraronoff">@peteraronoff</a>)</p>
<p>Unfortunately you need at least a little indirection or there wouldn’t be a way to keep track of which gems you need to activate. Here are some tricks you can use to minimize the indirection as much as possible.</p>
<ol>
<li><code>alias b="bundle exec"</code><br />
This makes it easy to run most bundled commands, for example "b rails c".</li>
<li><code>bundle exec bash</code><br />
You can bundle exec into bash (or zsh or whatever your favorite shell is) and have the shell run within the bundle environment. This allows you to run gem executables from the command line without having to type bundle exec each time.</li>
<li><code>bundle install --binstubs</code><br />
This will install bin files into the bin/ directory of your app. Now you can run bin/rspec or bin/executable and it will be run within the bundle environment.</li>
</ol>
<p><strong>Why is there a platform option for Windows, but not Darwin?</strong> (via <a href="http://twitter.com/#!/soederpop">@soederpop</a>)</p>
<p>Windows is a Ruby platform. Darwin is an operating system that the Ruby interpreter runs on. For now, use groups to separate Darwin-only gems.</p>
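<p>One way to read "use groups" is sketched below. The group name is an arbitrary choice, and the gem is only an example of something you might want on Macs alone:</p>
<pre lang="ruby" escaped="true"># Gemfile (sketch; the :darwin group name is an assumption, not a built-in)
group :darwin do
  gem "rb-fsevent" # example of a Mac-only gem
end</pre>
<p>On non-Mac machines you would then install with <code>bundle install --without darwin</code>.</p>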
<p><strong>I'm interested in some examples for 'bundle pack.' I find this command useful to deploy to machines off the network.</strong> (via <a href="http://twitter.com/#!/hoxworth">@hoxworth</a>)</p>
<p>Simple Example:<br />
<code>bundle pack &amp;&amp; git add vendor/cache &amp;&amp; git commit -am "Packed gems" &amp;&amp; git push &amp;&amp; cap deploy</code></p>
<p>You can use this as a drop-in replacement for vendor/gems, with the added benefit of knowing that even indirect dependencies will be consistent across platforms. Keep in mind this doesn’t work with gems from git yet, though.</p>
<p><strong>Any performance issues with switching from Bundler 0.9.26 to 1.0.7? I’m seeing an odd behavior with higher CPU usage on prod server.</strong> (via <a href="http://twitter.com/#!/ttdonovan">@ttdonovan</a>)</p>
<p>Currently there are no other reports of performance issues. Please <a href="https://github.com/carlhuda/bundler/issues">file an issue</a> with the steps to reproduce and the developers can look into it for you. The simpler the issue is to reproduce, the easier it will be to track down.</p>
<p><strong>Does Bundler work on Windows?</strong> (via <a href="http://twitter.com/#!/drnic">@drnic</a>)</p>
<p>Yes, Bundler does work on Windows. There is a known issue regarding lockfiles and multi-platform use of Bundler. If you use Bundler on Windows and Mac or Linux systems, and run bundle install on the Windows machine the lockfile gets generated specifically for Windows. For now, the recommendation is to run bundle install on a machine that closely resembles your deployment setup and commit that lockfile into the repository.</p>
<h3>Other Questions?</h3>
<p>What other hesitations or frequently encountered issues have we missed? Leave them in the comments section and we’ll do our best to tackle them too.</p>
<p>To learn more about Bundler visit the <a href="http://gembundler.com/">Bundler website</a>. You can also <a href="https://github.com/carlhuda/bundler/">fork Bundler on GitHub</a>.</p>
<p><a href="http://www.engineyard.com/blog"><img height="98" width="61" title="logo-engineyard" alt="" class="attachment-post-thumbnail wp-post-image" src="http://www.engineyard.com/blog/wp-content/uploads/logo-engineyard.png"/></a></p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2011/bundler-pro-tip/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>5 Tips for Sphinx Indexing</title>
		<link>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/</link>
		<comments>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 17:00:53 +0000</pubDate>
		<dc:creator>Avrohom Katz</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[BackgroundJob]]></category>
		<category><![CDATA[Capistrano]]></category>
		<category><![CDATA[Sphinx]]></category>
		<category><![CDATA[ThinkingSphinx]]></category>
		<category><![CDATA[UltraSphinx]]></category>

		<guid isPermaLink="false">http://www.engineyard.com/blog/?p=1965</guid>
		<description><![CDATA[<p>Web applications often need to do text searches against data stored in a database. While the built-in MySQL and PostgreSQL functions for full-text searching <em>work</em>, they are often not the best solutions for fast and complex full-text searching.</p>
<p>This is where dedicated search engines come into play, and Sphinx is our favorite tool for the job. Written in C++ by Andrew Aksyonoff, and originally released to open source in 2001, Sphinx is a blazing fast search engine. Considering that fast and complex full-text searching is a somewhat frequent need, I've put together this post with my top five tips for implementing Sphinx.</p>
<h2>1) Use <code>thinking_sphinx</code></h2>
<p>There are several plugins out there for Sphinx integration into Rails. UltraSphinx, ThinkingSphinx, and <code>acts_as_sphinx</code> (no longer under active development) are the most commonly used plugins.</p>
<p>We recommend ThinkingSphinx over UltraSphinx for several reasons. UltraSphinx, unfortunately, can throw cryptic errors because of the way it preloads your indexed model. If you have patches in your lib directory, they must be explicitly required. And while the UltraSphinx way of defining blocks is simpler for simple cases, in more advanced cases it can become far less readable, and you'll hit those advanced cases before you know it.</p>
<h2>2) Know When to Index, and How Often</h2>
<p>This next tip is really three tips all bundled into one (because who wants to read <em>seven</em> tips for anything? ;) )</p>
<ol>
<li><strong>Know your requirements</strong><br />
What's an acceptable lag between the data being updated and becoming available in search results? Does it need to be instantaneous, or is it acceptable to wait 5, 10, 15 or even 30 minutes? The longer you wait between index updates, the fewer resources (and the less CPU time) your search engine consumes.</li>
<li><strong>Know how long your indexing takes</strong><br />
If it takes three minutes to complete an index run, but you kick off a full re-index every 60 seconds, that's not going to work out well. If you absolutely need an updated index every 60 seconds, then you need to consider alternatives like a bigger instance for your search engine or other strategies like delta indexing (below).</li>
<li><strong>Test your new indexes or new data in staging</strong><br />
When making changes to data being indexed or adding new indexes, do a test run of your indexing in a staging environment with a snapshot of production data. Sometimes small changes to data or indexing result in an enormous increase in index size. If your changes create gigantic indexes, it's best to learn that on staging, instead of running out of space in your production environment.</li>
</ol>
<h2>3) Use Delta Indexes (When You Need To)</h2>
<p>Without a reindex, your search won't be up to date; the question is <em>when</em> to reindex.</p>
<p>When indexing small data-sets, a full reindex can be done frequently. But as size grows, so does the index, and with it the time it takes to index. This is when delta indexing comes into play. A delta index is nothing more than a second index containing indexes for only the documents that changed since your last index. There are three main methods of delta indexing built into ThinkingSphinx: the default behavior, timestamped deltas, and <code>delayed_job</code> integration.</p>
<p>The first method—the default behavior of <code>thinking_sphinx</code>—is big on convenience. On every save it fires off the delta indexer and you get near instantaneous index updating. However, while this works great on development environments, and most staging environments, in production this can be problematic.</p>
<p>One problem is that the indexer is now part of the request cycle, which means that with each save comes a reindex. This method will cause scaling problems—with increased traffic, the indexer will fire more frequently. This puts increased load on the database as well as the filesystem.</p>
<p>Another problem is that in a production environment with many instances, the delta index is only created on the instance that handles the request. This results in instances with out of date information until the next full index. We deal with this by adding a cron entry to run the delta index cron task on all machines that run the search daemon. This has the effect of keeping your indexes in sync to the interval that the cron job runs at.</p>
<p>The second method—the timestamped version—works by adding a time threshold to the define_index block e.g.</p>
<pre>set_property :delta =&gt; :datetime, :threshold =&gt; 1.hour</pre>
<p>The threshold should match the frequency with which you run your rake task to reindex the delta, so in this example your deltas are updated every hour. While a nice improvement over the built-in default, this means that your indexes are out of date until the next rake task runs, so you need to set the frequency according to user expectations (or reset expectations).</p>
<p>The third method uses the <code>delayed_job</code> gem and pushes a job onto the <code>delayed_job</code> queue that tells the indexer to run as needed. This is more immediate than the threshold option, while still running outside of the request cycle. This is the most promising setup, although it lends itself to a single searchd server setup. Specifically, a single machine running the indexer and search daemon with each instance sending reindex tasks to the queue when needed.</p>
<p>The drawbacks to this third approach are:</p>
<ol>
<li>You lose availability. If the <code>searchd</code> server goes down, your search goes down. Allowing each instance to have its own instance of <code>searchd</code> builds redundancy into the setup.</li>
<li>As mentioned in the official documentation, "because the delta indexing requests are queued, they will not be processed immediately—and so your search results will not be accurate straight after a change. Delayed_Job is pretty fast at getting through the queue though, so it shouldn’t take too long."</li>
</ol>
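<p>For reference, each of the three delta strategies discussed above is selected inside the <code>define_index</code> block. The sketch below assumes a hypothetical <code>Article</code> model. Only one <code>set_property</code> line would be active at a time, and the <code>:delayed</code> option assumes the <code>delayed_job</code> integration is installed:</p>
<pre>class Article &lt; ActiveRecord::Base
  define_index do
    indexes title, body

    # 1. Default: reindex the delta on every save
    #    (expects a boolean "delta" column on the table)
    set_property :delta =&gt; true

    # 2. Timestamped: pick up rows changed within the threshold
    # set_property :delta =&gt; :datetime, :threshold =&gt; 1.hour

    # 3. delayed_job: queue the delta reindex outside the request cycle
    # set_property :delta =&gt; :delayed
  end
end</pre>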
<h2>4) Know Your Bottleneck: Database or Filesystem</h2>
<p>When maintaining your indexes, you have a choice of merging delta indexes into your main index or doing a full reindex. Merging can save you a database hit, but it requires twice the I/O of the two indexes being merged and hits the filesystem hard. On the other hand, reindexing hits the database hard. So you have to know your bottleneck. Most Rails developers are acutely aware of their database load. We optimize queries, we index tables, and we even use methods to read exclusively from the replica and write only to the master. So, instinctively, most developers choose to merge their delta index into the main index, rather than perform a full reindex, in order to take load off the database. But this isn't always right.</p>
<p>If your application processes a lot of uploads or has poor cacheability (and you're serving directly from the filesystem a lot), then you probably want to avoid putting more load on the filesystem. In these cases, reindexing will make more sense than merging delta indexes.</p>
<h2>5) If you're on <code>ultrasphinx</code>, switch to <code>thinking_sphinx</code></h2>
<p>Rein Henrichs wrote a great <a title="A Thinking Man's Sphinx" href="http://reinh.com/blog/2008/07/14/a-thinking-mans-sphinx.html">blog post</a> which included the steps to make the switch. I'll expand on those here, and include some real world code samples.</p>
<p>Switching is relatively simple: these four steps convert an UltraSphinx application to a ThinkingSphinx one.</p>
<p><strong>1. Uninstall UltraSphinx and install ThinkingSphinx: </strong></p>
<p>Run:</p>
<p><code>script/plugin remove ultrasphinx</code></p>
<p>and add this line to your environment.rb:</p>
<p><code>config.gem('freelancing-god-thinking-sphinx', :lib =&gt; 'thinking_sphinx')</code></p>
<p><strong>2. Translate your <code>is_indexed</code> declaration into a <code>define_index</code> block and change your search actions to use the ThinkingSphinx API:</strong></p>
<pre># Before: UltraSphinx
class Post &lt; ActiveRecord::Base
  belongs_to :blog
  belongs_to :category

  is_indexed :conditions =&gt; "posts.state = 'published'",
    :fields  =&gt; [
      {:field =&gt; 'title', :sortable =&gt; true},
      {:field =&gt; 'body'},
      {:field =&gt; 'cached_tag_list'}
    ],
    :include =&gt; [
      {:association_name =&gt; "blog",
       :field            =&gt; "title",
       :as               =&gt; "blog",
       :sortable         =&gt; true},
      {:association_name =&gt; "blog",
       :field            =&gt; "description",
       :as               =&gt; "blog_description"},
      {:association_name =&gt; "category",
       :field            =&gt; "title",
       :as               =&gt; "category",
       :sortable         =&gt; true}
    ]
end

# After: ThinkingSphinx
class Post &lt; ActiveRecord::Base
  belongs_to :blog
  belongs_to :category

  define_index do
    indexes title, :sortable =&gt; true
    indexes body, cached_tag_list

    indexes blog.description, :as =&gt; :blog_description
    indexes blog.title,       :as =&gt; :blog,     :sortable =&gt; true
    indexes category.title,   :as =&gt; :category, :sortable =&gt; true

    where "posts.state = 'published'"
  end
end</pre>
<p>Your old search call might look like:</p>
<p><code>Ultrasphinx::Search.new(:query =&gt; params[:query])</code></p>
<p>Where your new one would look like (assuming you've indexed the model Post):</p>
<p><code>Post.search(params[:query])</code></p>
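<p>In a controller, that call might sit in an action like the sketch below. The controller name and pagination values are assumptions for illustration; <code>:page</code> and <code>:per_page</code> are passed through to ThinkingSphinx's paginated search:</p>
<pre>class PostsController &lt; ApplicationController
  # Hypothetical search action; adjust names to your application.
  def search
    @posts = Post.search(params[:query],
                         :page     =&gt; params[:page],
                         :per_page =&gt; 20)
  end
end</pre>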
<p><strong>3. Rewrite your deployment tasks to run the ThinkingSphinx rake tasks:</strong></p>
<pre>namespace :sphinx do
  desc "Stop the sphinx server"
  task :stop, :roles =&gt; [:app], :only =&gt; {:sphinx =&gt; true} do
    run "cd #{latest_release} &amp;&amp; RAILS_ENV=#{rails_env} rake thinking_sphinx:stop"
  end

  desc "Reindex the sphinx server"
  task :index, :roles =&gt; [:app], :only =&gt; {:sphinx =&gt; true} do
    run "cd #{latest_release} &amp;&amp; RAILS_ENV=#{rails_env} rake thinking_sphinx:index"
  end

  desc "Configure the sphinx server"
  task :configure, :roles =&gt; [:app], :only =&gt; {:sphinx =&gt; true} do
    run "cd #{latest_release} &amp;&amp; RAILS_ENV=#{rails_env} rake thinking_sphinx:configure"
  end

  desc "Start the sphinx server"
  task :start, :roles =&gt; [:app], :only =&gt; {:sphinx =&gt; true} do
    run "cd #{latest_release} &amp;&amp; RAILS_ENV=#{rails_env} rake thinking_sphinx:start"
  end

  desc "Restart the sphinx server"
  task :restart, :roles =&gt; [:app], :only =&gt; {:sphinx =&gt; true} do
    run "cd #{latest_release} &amp;&amp; RAILS_ENV=#{rails_env} rake thinking_sphinx:running_start"
  end
end</pre>
<p>and you'll probably want to add these as well to automate the reindexing and starting on deploy:</p>
<pre>after "deploy:symlink_configs", "sphinx:configure"
after "sphinx:configure", "sphinx:index"
after "sphinx:index", "sphinx:restart"</pre>
<p>If you're <em>not</em> running on Engine Yard Slices, you can still get the benefit of prewritten Capistrano tasks by adding:</p>
<pre>require "vendor/plugins/thinking-sphinx/lib/thinking_sphinx/deploy/capistrano"</pre>
<p>to the top of your deploy.rb file. If you're running on the latest release from GitHub, as a plugin, the tasks should be included automatically. This is of course assuming that you are running Capistrano from a working repository and by itself (a non-developer deploying code for example—thanks for the tip, commenter <a href="http://blog.tty.nl/">Josh</a>!)</p>
<p>Because of a custom script that exists on Engine Yard Slices, if you are using our eycap gem, these tasks are included for you as:</p>
<pre>sphinx:configure
sphinx:reindex
sphinx:restart
sphinx:start
sphinx:stop
thinking_sphinx:configure
thinking_sphinx:reindex</pre>
<p><strong>4. Stop <code>searchd</code> and then run your new <code>configure</code>, <code>index</code> and <code>start</code> tasks:</strong></p>
<pre>cap sphinx:stop &amp;&amp; cap sphinx:configure &amp;&amp; cap sphinx:index &amp;&amp; cap sphinx:start</pre>
<p>Solid searching is key in numerous applications; Sphinx is a great tool for many cases, and I hope this post helped convince you!</p>
<p><a href="http://www.engineyard.com/blog"><img height="98" width="61" title="logo-engineyard" alt="" class="attachment-post-thumbnail wp-post-image" src="http://www.engineyard.com/blog/wp-content/uploads/logo-engineyard.png"/></a></p>
]]></description>
		<wfw:commentRss>http://www.engineyard.com/blog/2009/5-tips-for-sphinx-indexing/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
	</channel>
</rss>

