Background job processing is all the rage lately, with numerous folks speaking and blogging about it, and rightly so. Since response time is a critical factor when scaling a web application, it makes sense to keep response times low even when the app has heavy work to do. Moving that heavy lifting out of the request/response cycle is key to scaling a web application while keeping it fast.
There’s been a fair bit of good coverage of available background job frameworks recently. But I’m not going to do a technology review here; instead, I’ll walk through some of the deployment best practices we’ve come to agree on at Engine Yard.
1. Know Your Limits
Most background jobs do heavy number crunching, content fetching, video transcoding and the like, so it’s reasonable for the user to wait for the result. That said, we want to keep that wait within bounds. If you need to transcode an average of 15 videos every hour with a single worker, each job can take at most four minutes on average (60 minutes / 15 jobs). Knowing how many jobs you have to process at peak will help you provision enough resources to make sure your jobs complete in time.
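The four-minute figure is just an hour divided by the job rate. Here’s that arithmetic as a small Ruby sketch, assuming jobs arrive at a steady rate and every worker runs at the same speed (real peaks are burstier, so leave slack). The helper names are hypothetical.

```ruby
# Hypothetical helpers for back-of-the-envelope capacity planning.
# Both assume a steady arrival rate and equally fast workers.

# Longest average run time (seconds) a job may take if `workers` workers
# must keep up with `jobs_per_hour` jobs arriving each hour.
def max_average_job_seconds(jobs_per_hour, workers = 1)
  3600.0 * workers / jobs_per_hour
end

# Minimum number of workers needed to keep up, given a measured average
# run time per job (seconds). Rounds up, since a fraction of a worker
# is not available.
def workers_needed(jobs_per_hour, avg_job_seconds)
  (jobs_per_hour * avg_job_seconds / 3600.0).ceil
end

puts max_average_job_seconds(15) / 60  # 15 videos/hour, 1 worker: prints 4.0 (minutes)
puts workers_needed(40, 240)           # 40 videos/hour at 4 minutes each: prints 3
```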
Be sure you’re using technology you understand. Just because something is getting lots of attention on Hacker News doesn’t mean it’s right for you. If you don’t understand the tool you’re using, you will likely have difficulty installing and operating it correctly.
2. You Did Benchmark Your Jobs… Right?
Far too often, in the rush to scale, folks leave out load testing. Because most background jobs are heavy tasks, understanding their resource utilization is extremely important. First, find out how much memory and CPU your most frequently run job consumes. You can do this by submitting a single job and watching its resource utilization in top. Alternatively, you can benchmark the code that runs in the background (please use “bmbm” benchmarks!).
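A minimal bmbm run might look like the following. `sum_of_squares_loop` and `sum_of_squares_formula` are hypothetical stand-ins for two versions of your job’s inner work; `Benchmark.bmbm` itself is from Ruby’s standard library. Note that it measures time only, so keep an eye on memory in top as well.

```ruby
require 'benchmark'

# Hypothetical stand-ins for two versions of a job's inner work.
def sum_of_squares_loop(n)
  (1..n).reduce(0) { |sum, i| sum + i * i }
end

def sum_of_squares_formula(n)
  n * (n + 1) * (2 * n + 1) / 6
end

# bmbm runs each report twice: a rehearsal pass to warm things up, then the
# measured pass, with a GC run before each measured item. It returns the
# measured Benchmark::Tms results.
results = Benchmark.bmbm do |x|
  x.report("loop")    { 200.times { sum_of_squares_loop(10_000) } }
  x.report("formula") { 200.times { sum_of_squares_formula(10_000) } }
end

results.each { |r| puts "#{r.label}: #{r.real.round(4)}s wall clock" }
```

The rehearsal pass is the reason to prefer bmbm over plain bm: whichever report runs first otherwise tends to absorb warm-up and garbage collection costs.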
When benchmarking your jobs, you must use production data and libraries that match the production environment, whether 32-bit or 64-bit. If your benchmarks are slow or memory usage is high, review your code for places that create large objects (for example, process records with Model.find_in_batches rather than loading them all at once, and iterate with Array.each).
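In Rails, `Model.find_in_batches` yields records in arrays (1,000 by default) instead of loading the whole table at once. The plain-Ruby sketch below shows the same idea without a database: fetch rows in primary-key order, a fixed number at a time, resuming after the last id seen. `FakeTable` is a hypothetical stand-in for a table, and the sketch assumes positive integer ids.

```ruby
# Hypothetical stand-in for a database table.
class FakeTable
  def initialize(rows)
    @rows = rows.sort_by { |r| r[:id] }
  end

  # Roughly: SELECT * FROM t WHERE id > last_id ORDER BY id LIMIT limit
  def fetch_after(last_id, limit)
    @rows.select { |r| r[:id] > last_id }.first(limit)
  end
end

# Yields one batch at a time, so only `batch_size` rows are held at once.
def each_batch(table, batch_size = 1000)
  last_id = 0 # assumes ids are positive integers
  loop do
    batch = table.fetch_after(last_id, batch_size)
    break if batch.empty?
    yield batch
    last_id = batch.last[:id]
  end
end

table = FakeTable.new((1..2500).map { |i| { id: i, title: "video #{i}" } })
each_batch(table) { |batch| puts batch.size }  # prints 1000, 1000, 500 (one per line)
```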
3. Track Job Failures and Queue Length
Knowing that your jobs are performant and that you’ve deployed adequate resources is a start, but you’re not quite there yet. At some point your jobs will fail due to resource starvation, bugs in your code or other unexpected monkey wrenches. The last thing you want is an inbox full of emails from your hard-earned customers informing you that ‘the site is broken.’
Use an exception notifier (such as Hoptoad) to instrument your jobs. If your background job implementation deletes failed jobs by default, disable this feature so you can inspect and retry them. Make sure you can easily pause processing of specific jobs when you receive a flood of failure alerts. Implement a “failure” job to test that your notification system works. Finally, graph your queue length over time so backlogs are easy to spot.
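Here’s a minimal in-memory sketch of those practices, assuming a single process and jobs that respond to `perform` (the convention delayed_job uses for custom jobs). `JobQueue` and `FailureTestJob` are hypothetical names, and the notifier block is where a call to your exception notifier would go; none of this is Hoptoad’s or any queue library’s actual API.

```ruby
# Hypothetical queue illustrating: keep failed jobs, notify on every
# failure, pause specific job classes, and expose queue length.
class JobQueue
  attr_reader :pending, :failed

  def initialize(&notifier)
    @pending  = []
    @failed   = []        # failed jobs are retained, never deleted
    @paused   = {}        # job class => true while paused
    @notifier = notifier  # e.g. a call into your exception notifier
  end

  def enqueue(job)
    @pending << job
  end

  def pause(job_class)
    @paused[job_class] = true
  end

  def resume(job_class)
    @paused.delete(job_class)
  end

  # Current queue length, suitable for feeding a graphing tool.
  def length
    @pending.size
  end

  # Runs every pending job whose class is not paused; paused jobs stay queued.
  def work_off
    runnable, @pending = @pending.partition { |job| !@paused[job.class] }
    runnable.each do |job|
      begin
        job.perform
      rescue StandardError => e
        @failed << [job, e]
        @notifier.call(job, e) if @notifier
      end
    end
  end
end

# A job that always fails, to confirm alerts actually reach you.
class FailureTestJob
  def perform
    raise "failure test: if you got this alert, notifications work"
  end
end
```

In use, you’d enqueue a `FailureTestJob` after each deploy and check that the alert arrives; if a real job class starts failing en masse, `pause` it while you fix the bug, then `resume` to drain the backlog.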
4. Know the Hidden Pitfalls
- Make sure your Monit or God config starts one or two fewer workers than you expect your instance to be able to handle. This leaves headroom, so if a job or two hangs (or gets backed up) your system will stay out of swap and the jobs will complete.
- Delayed Job uses UTC; if your workers use local time and your database uses UTC (or vice versa), jobs may not run when you expect them to, or at all.
- Loading the full Rails environment increases resource costs and job processing time; only load what you need when you need it. If you have a job that is particularly memory or resource intensive, split it into multiple jobs so the work can be divided over additional servers.
- Make sure you set timeouts for your jobs; you don’t want to pull an RSS feed and wait forever just for the HTTP timeout.
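To turn the “one or two fewer workers” rule into a number, you can combine the per-worker memory figure from your benchmarks with what the instance has available. The sketch below assumes memory is the binding constraint (check CPU the same way); the helper name and the example figures are hypothetical.

```ruby
# Hypothetical sizing helper for a Monit/God worker count: fit as many
# workers as memory allows, then hold back `headroom` of them so a hung
# job or two doesn't push the instance into swap.
def worker_count(instance_mb, reserved_mb, per_worker_mb, headroom = 1)
  fit = ((instance_mb - reserved_mb) / per_worker_mb).floor
  [fit - headroom, 1].max  # always run at least one worker
end

# Illustrative figures: 1,700 MB instance, 500 MB reserved for the OS and
# app, 180 MB per worker as observed in top while benchmarking.
puts worker_count(1700, 500, 180, 2)  # prints 4
```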
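For the timeout advice, here’s a sketch using only Ruby’s standard library: explicit `open_timeout`/`read_timeout` settings on `Net::HTTP`, plus a `Timeout.timeout` backstop around the whole job. `fetch_feed`, `run_with_deadline` and `JobTimeout` are hypothetical names, and the limits shown are placeholders to tune against your own benchmarks.

```ruby
require 'net/http'
require 'uri'
require 'timeout'

# Hypothetical feed fetcher with explicit limits on connecting and reading.
def fetch_feed(url, open_timeout: 5, read_timeout: 10)
  uri  = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl      = (uri.scheme == "https")
  http.open_timeout = open_timeout  # seconds allowed to open the connection
  http.read_timeout = read_timeout  # seconds allowed for each read
  response = http.get(uri.request_uri)
  response.value                    # raises unless the response is 2xx
  response.body
end

class JobTimeout < StandardError; end

# Backstop for a whole job: raise JobTimeout if the block runs too long.
def run_with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  raise JobTimeout, "job exceeded #{seconds}s"
end

# Usage inside a job (not run here, since it needs the network):
#   run_with_deadline(30) { fetch_feed("http://example.com/feed.rss") }
```

Raising rather than returning nil means a timed-out job shows up in your failure tracking instead of silently disappearing. `Timeout.timeout` interrupts the block from another thread, so treat it as a last resort rather than a substitute for per-call timeouts.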
5. Don’t Use BackgrounDRb
Friends don’t let friends use BackgrounDRb. BackgrounDRb served its purpose well when it was created (at a time when there were no alternatives). Now there are many alternatives that get the job done, don’t leak memory and work reliably (like our recommended backgroundjob and delayed_job).
So what are the takeaways?
Measure the scale of your work in both time and resources. Benchmark your jobs with live data in a cloned production environment. Don’t wait for your jobs to fail; expect them to. Make sure you’re notified before your users see trouble. Finally, don’t forget to periodically review your job performance. You are planning on doing some heavy lifting, aren’t you?

Is there anyone currently maintaining delayed_job? Because there's this huge proliferation of forks, and I don't see much merging.
I'm not sure whether Tobias is still maintaining it. I think with all the forks the "masses" are choosing at will. Collectiveidea and Weplay have pretty active forks on GH with some good patches (http://github.com/collectiveidea/delayed_job and http://github.com/weplay/delayed_job). Looks like there is a good opportunity for someone to step in and help here :)
How can you discern the features of each fork? It's a conundrum of willy-nilly forkin'.
I just review the commit logs and look at the code itself. You can also always hop on IRC (irc.freenode.net) and talk to the author(s).
I believe the wiki now has a good overview of each fork. I personally use collectiveidea's.
Backgroundjob on Phusion Passenger starts the same job for each thread – does anyone know a fix?
Strange, because it does put a lock on the record when fetching the job from the database; job_fu does the same thing.
Ah, I remember that when I set up Bj, it complained about my sucky database ;-)
I am on MySQL 5, so maybe locking isn't possible. I'll check into that. Thank you!
I ran bj setup again, but the complaints were not about locking, but rather:
WARNING: your database is sucking and does not support unique indexes on text fields!?
This might be true for MySQL, but could it be the reason for the multiple job executions?
Are you using MySQL InnoDB engine?
Any thoughts on what's wrong with BackgrounDRb as you see it now? I used it a few years ago when it relied on threads (for example, it was ridiculously easy to exhaust database connections).
Personally, I didn't like backgroundjob when I last saw it (the documentation looked a bit rough), and delayed_job looks like an excellent solution from a coding point of view, but there is a proliferation of forks. It would be nice to see a table of pros and cons for each, because picking which gems and plugins to use is a pitfall in the Rails world.
Hey Sav,
I'd suggest the main issues with BackgrounDRb are around memory leaks and process control. Re a table of pros and cons: sounds like a good idea for a future blog post :)
The one thing I still like about BackgrounDRb is the built-in scheduler. AFAIK neither backgroundjob nor delayed_job has this?
I'm using job_fu and you can queue tasks to run at specific times, very handy.
Looked at job_fu, but didn't see the ability to schedule tasks. It would be cool if you could point to a post on how you did that (I don't see anything in job_fu's readme on GitHub about how to specify a schedule). I also looked at daemon-kit, but it seems aimed at plain Ruby (it isn't integrated with Rails). I was pointed to Rails Cron/RailsCron, but that project looks dead and was buggy. Then there is Craken, but that is Rake-specific. Whenever sounds neat, but I would rather not require cron (since I want the Rails app to be deployable to an environment that doesn't have cron). I'd basically just like a Ruby-based, cron-like scheduler kicked off when my Rails app is deployed, without having to tell the user to set up cron or some other scheduler (since the tasks are an integral part of the application), and without requiring cron on the system it's deployed to. Any ideas?
Here's something that might work for task scheduling in Rails: scheduler_daemon (http://github.com/ssoroka/scheduler_daemon/tree/m...
Oops. Try this link instead for scheduler_daemon: http://github.com/ssoroka/scheduler_daemon/tree/m...
Thanks for the review and the link! :)
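For the cron-free scheduling question above, the core idea can be sketched as a thread that wakes up periodically and runs whatever is due. This is a hypothetical `IntervalScheduler`, not scheduler_daemon's API, and it assumes fixed intervals only (no cron expressions) and that losing schedule state on restart is acceptable.

```ruby
# Hypothetical in-process interval scheduler: no cron required.
class IntervalScheduler
  def initialize(tick = 1.0)
    @tick  = tick   # how often (seconds) the thread checks for due tasks
    @tasks = []
    @mutex = Mutex.new
  end

  # Run the block every `seconds`, first run `seconds` from now.
  def every(seconds, &task)
    @mutex.synchronize do
      @tasks << { every: seconds, next_at: now + seconds, task: task }
    end
    self
  end

  def start
    @thread = Thread.new do
      loop do
        due = @mutex.synchronize { @tasks.select { |t| t[:next_at] <= now } }
        due.each do |t|
          begin
            t[:task].call
          rescue StandardError => e
            warn "scheduled task failed: #{e.message}" # keep the loop alive
          end
          # Next run is measured from when this run finished.
          @mutex.synchronize { t[:next_at] = now + t[:every] }
        end
        sleep @tick
      end
    end
    self
  end

  def stop
    @thread.kill if @thread
    @thread = nil
  end

  private

  # Monotonic clock, so wall-clock changes don't skew the schedule.
  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

One caveat: if you start this inside a Rails app served by several processes (Passenger, a Mongrel cluster), each process runs its own scheduler and every task fires once per process, so you'd typically run it in exactly one dedicated process.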
We had an odd issue when using Thinking Sphinx and vendored Rails. Using Rails 2.3.3 and Thinking Sphinx with delta indexes, we'd sporadically get segfaults, with the exception raised in a few different places in Rails, when running a periodic reindex from a cron job. Moving Rails back to a system-installed gem fixed it. This happened on two separate servers. Has anyone else seen this?
I use BJ but the jobs do not run. They always have the state "pending".
I'm having this issue as well
Actually, some knee-deep googling turned up a solution, applied here: http://github.com/ambethia/bj
Ambethia, you just made my day! I had the same problem and your fork solved it! Now I'm going to follow you on GitHub… I had to build the gem myself, since yours wasn't working. I'd suggest removing the whole rails folder from your spec so the gem is clean and small. That worked for me (using your code).
Any good tutorials on running multiple backgroundjob instances across multiple servers to handle huge backend needs? Thanks!
Hey Todd,
There shouldn't be anything special required. Most of the libraries include logic to handle multiple hosts processing jobs. The main thing to remember is that all hosts need to run the same code base and use the same database (since that is where the jobs live and are locked). For instance, we have a number of customers using BJ configured to run under cron on multiple slices, and others running DJ workers on multiple slices. The setup is the same each time.
Any good solutions for getting a dedicated delayed_job job_runner instance to execute only certain locally submitted jobs? We have an upload task that reads from a local file and, since we have no shared disk on EY, it has to run on the app instance the user originally uploaded to.
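The "jobs live and are locked" point is what makes multiple hosts safe. Here's a sketch of the usual claim-before-run pattern, with an in-memory table and a mutex standing in for the database's atomic conditional UPDATE. `SharedJobTable` and `worker_name` are hypothetical, and real libraries differ in the details (lock expiry, ordering, retries).

```ruby
require 'socket'

# In-memory stand-in for a shared jobs table; the mutex plays the role of
# the database's atomicity for a single conditional UPDATE.
class SharedJobTable
  Job = Struct.new(:id, :locked_by)

  def initialize(ids)
    @jobs  = ids.map { |id| Job.new(id, nil) }
    @mutex = Mutex.new
  end

  # Like: UPDATE jobs SET locked_by = :me WHERE id = :id AND locked_by IS NULL
  # Returns true only for the one worker whose update "affected a row".
  def claim(id, worker)
    @mutex.synchronize do
      job = @jobs.find { |j| j.id == id }
      return false unless job && job.locked_by.nil?
      job.locked_by = worker
      true
    end
  end

  def ids
    @jobs.map(&:id)
  end
end

# Host and pid in the name keep workers on different machines distinct.
def worker_name(n)
  "#{Socket.gethostname}:#{Process.pid}:#{n}"
end

# Four workers race for the same 100 jobs; each job is claimed once.
table   = SharedJobTable.new((1..100).to_a)
claimed = Queue.new
workers = (1..4).map do |n|
  Thread.new do
    table.ids.each { |id| claimed << id if table.claim(id, worker_name(n)) }
  end
end
workers.each(&:join)
puts claimed.size  # prints 100
```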
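One way to approach this, sketched below under the assumption that you can store an extra host field on each job, is to record the hostname when a job is enqueued and have each host's dedicated worker select only jobs carrying its own hostname. `PinnedJob`, `enqueue_local` and `jobs_for_host` are hypothetical helpers, not delayed_job features.

```ruby
require 'socket'

# Hypothetical job record carrying the host that enqueued it, for work
# that needs a file only present on that machine's local disk.
PinnedJob = Struct.new(:id, :host, :path)

# Enqueue a job pinned to this host (or an explicit one).
def enqueue_local(queue, id, path, host = Socket.gethostname)
  queue << PinnedJob.new(id, host, path)
end

# Jobs this host's worker should run; other hosts' jobs are left alone.
def jobs_for_host(queue, host = Socket.gethostname)
  queue.select { |job| job.host == host }
end

queue = []
enqueue_local(queue, 1, "/data/uploads/a.mov", "app1")
enqueue_local(queue, 2, "/data/uploads/b.mov", "app2")
p jobs_for_host(queue, "app1").map(&:id)  # prints [1]
```

The catch is that a pinned job waits until that particular host's worker is running, so those workers need to be monitored per host.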
Has anyone here gotten Bj set up on a Windows XP box? I've tried both the gem and the plugin, and neither is working. I can share details, for sure, but don't want to take up bandwidth unless there's someone out there who's interested or experienced.
Thanks, Bill
One more thought: monitor the processes that manage your background tasks using something like Webmin. That's how we monitor our Delayed Job rake tasks at Active Interview, to make sure everything is actually running smoothly:
http://blog.activeinterview.com/post/2010/01/28/u...