Faster Rails tests with Hydra

May 18, 2011

When working on a large Rails app, the feedback loop between writing code, running tests and finding out about failures can become too long. Hydra is a distributed testing framework that helps speed things up.

Configuring your local machine

Configuring Hydra for use in a typical Rails project is pretty straightforward. We need to install the gem, add some config and finally create a rake task to kick off our tests.

Gemfile

First off we need to add the Hydra gem to our project. If you are using Bundler with Rails, this is as simple as:

group :test do
  gem 'hydra', :require => false
end

Note we are not immediately requiring Hydra. We only want to require Hydra when we are running tests. We’ll get to that in a little bit. A quick bundle install and you should be all set to start telling Hydra what to do.

hydra.yml

The configuration of Hydra takes place in a hydra.yml file. In a typical Rails application, this will live in the config directory. Initially, the hydra.yml should look something like this:

--- 
workers: 
  - type: local 
    runners: 2

Here, we are telling Hydra that we want one local worker (our local machine) with two runners on it. The number of runners should match the number of CPU cores the worker machine has. In my case, this is two. You may have four or even eight, you lucky thing.
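If you would rather generate this file than write it by hand, here is a minimal sketch of how that might look. The helper name local_hydra_config is my own invention, not part of Hydra, and the runner count is passed in rather than detected.

```ruby
require 'yaml'

# Hypothetical helper (not part of Hydra): builds the hydra.yml structure
# for a single local worker with the given number of runners.
def local_hydra_config(runners)
  unless runners.is_a?(Integer) && runners > 0
    raise ArgumentError, 'runners must be a positive integer'
  end
  { 'workers' => [{ 'type' => 'local', 'runners' => runners }] }
end

# One runner per CPU core: pass the core count of your own machine.
config = local_hydra_config(2)
puts config.to_yaml
```

The printed YAML could then be saved as config/hydra.yml.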

hydra.rake

To utilise Hydra to run our tests, we can create a rake task to sort out all the files we want to run, then run them in parallel using Hydra’s provided TestTask. In a regular Rails app, this will live in lib/tasks. Initially, our rake task will look like this:

# Require Hydra, and skip the task definitions if it is not available
# (e.g. in the production environment).
begin
  require 'hydra'
  require 'hydra/tasks'
rescue LoadError
  # Hydra is not installed here, so no Hydra tasks are defined.
else
  # Put all tests into one array
  all_tests = Dir.glob("test/unit/**/*_test.rb") +
              Dir.glob("test/functional/**/*_test.rb") +
              Dir.glob("test/integration/**/*_test.rb")

  # Specify tests that don't play well with others
  dangerous_tests = []

  # Separate the good from the bad
  safe_tests = all_tests - dangerous_tests

  Hydra::TestTask.new('hydra:safe') do |t|
    t.files = safe_tests
    t.verbose = true
  end

  Hydra::TestTask.new('hydra:dangerous') do |t|
    t.files = dangerous_tests

    # Dangerous tests are run in serial
    t.serial = true
    t.verbose = true
  end

  # Run all tasks together
  task :hydra => ['hydra:safe', 'hydra:dangerous']
end

You can organise your test files any way you see fit, create multiple rake tasks and string them together, or otherwise customise how you want Hydra to run your tests.
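As one way of organising things, here is a rough sketch that splits a list of test files into safe and dangerous groups by pattern, instead of listing the dangerous files by hand. The helper partition_tests and the example file names are assumptions of mine for illustration; nothing here comes from Hydra itself.

```ruby
# Hypothetical helper: split a list of test files into "safe" and
# "dangerous" groups using a list of glob-style patterns.
def partition_tests(all_tests, dangerous_patterns)
  dangerous, safe = all_tests.partition do |path|
    dangerous_patterns.any? { |pattern| File.fnmatch?(pattern, path) }
  end
  { :safe => safe, :dangerous => dangerous }
end

# Example file names, made up for illustration.
all_tests = [
  'test/unit/user_test.rb',
  'test/functional/users_controller_test.rb',
  'test/integration/signup_flow_test.rb'
]

# Treat every integration test as one that doesn't play well with others.
groups = partition_tests(all_tests, ['test/integration/*'])
puts groups[:safe]
puts groups[:dangerous]
```

The two arrays could then be handed to the files setting of two separate test tasks.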

We can now try this out with rake hydra. Congratulations! You are now running your tests in parallel!

Configuring Remote Machines

In order to distribute our tests and massively reduce the time they take to run, we need to tell Hydra about the machines we have access to. This could be a machine on your local network (I’ve had great results utilizing a Mac Mini), or potentially an Amazon EC2 instance (something I am yet to try, but pretty excited about).

Adding heads to our mythical beast is easy enough. hydra.yml is where we define our workers and runners.

To add information about our remote machine, we simply add another worker of the type ‘ssh’. Our hydra.yml now looks like this:

--- 
workers: 
  - type: local 
    runners: 2 
  - type: ssh 
    connect: remote_machine 
    directory: /Absolute/path/to/project 
    runners: 2

The value passed to connect: can either be in the form user@ip_address_or_hostname or the name of a host defined in ~/.ssh/config. Either way, you must be able to ssh to this machine.

directory: is the absolute path to where the project will live on the remote machine (you’ll have to ssh in and create this directory if it doesn’t already exist).

Depending on how many cores you have on your remote machine, runners: will vary (remember, one runner per core, one worker per machine).
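It is easy to forget one of these settings, so a quick sanity check can help. The following is only a sketch: worker_problems is a hypothetical helper of mine, not something Hydra provides, and it checks only the two rules described above (an ssh worker needs a connect value and an absolute directory).

```ruby
require 'yaml'

# Hypothetical sanity check (not part of Hydra): returns a list of problems
# found in the workers section of a hydra.yml document.
def worker_problems(yaml_text)
  config = YAML.load(yaml_text) || {}
  workers = config['workers'] || []
  problems = []
  workers.each_with_index do |worker, index|
    next unless worker['type'] == 'ssh'
    problems << "worker #{index}: missing connect" if worker['connect'].to_s.empty?
    unless worker['directory'].to_s.start_with?('/')
      problems << "worker #{index}: directory must be an absolute path"
    end
  end
  problems
end

sample = <<CONFIG
---
workers:
  - type: local
    runners: 2
  - type: ssh
    connect: remote_machine
    directory: /Absolute/path/to/project
    runners: 2
CONFIG

# Prints an empty array when nothing looks wrong.
p worker_problems(sample)
```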

Synchronising files

As you work on your project, files will change. Hydra itself does not provide a way to keep all your machines up to date with the latest version of your local code. It farms that task out to rsync. If configured, every time you run a Hydra::TestTask, rsync will jump into action first and send all your remote machines any changes to your code that might have happened. To configure rsync, we simply provide some more info in hydra.yml:

--- 
sync: 
  directory: /Absolute/path/to/local/project 
  exclude: 
    - tmp 
    - log 
    - doc 
workers: 
  - type: local 
    runners: 2 
  - type: ssh 
    connect: mini 
    directory: /Absolute/path/to/remote/project 
    runners: 2

The sync: option allows us to tell rsync where our project lives (again, be sure to pass it an absolute path), and a list of directories we’re not interested in. Now when we run our tests with hydra, rsync knows if anything needs to be sent out to the remotes first, making sure you’re testing your latest code. At this point Hydra is almost ready. We just need to configure our remotes.
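To make the relationship between these settings and rsync a little more concrete, here is an illustrative sketch of the kind of command they describe. This is not Hydra's own code: rsync_command is a hypothetical helper, and the -az flags are my assumption. Hydra's real invocation may well differ.

```ruby
# Hypothetical illustration only: builds an rsync argument list from the
# sync settings and one ssh worker. Hydra's real invocation may differ.
def rsync_command(sync, worker)
  excludes = (sync['exclude'] || []).map { |path| "--exclude=#{path}" }
  # A trailing slash tells rsync to copy the directory's contents.
  source = sync['directory'].sub(%r{/*\z}, '/')
  target = "#{worker['connect']}:#{worker['directory']}"
  ['rsync', '-az'] + excludes + [source, target]
end

sync = {
  'directory' => '/Absolute/path/to/local/project',
  'exclude'   => ['tmp', 'log', 'doc']
}
worker = {
  'type'      => 'ssh',
  'connect'   => 'mini',
  'directory' => '/Absolute/path/to/remote/project'
}

puts rsync_command(sync, worker).join(' ')
```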

To make life a little easier we’ll set up some convenience tasks in our hydra.rake file.

Sync Tasks

Sometimes we’ll want to sync up the project before running tests. If you create a migration or update the Gemfile, the tests won’t have a hope of passing until the remotes have those changes. Luckily, we can define Hydra::SyncTask.new('hydra:sync') in our hydra.rake. This allows us to run rake hydra:sync locally and push out any changes.

Remote and Global Tasks

Hydra provides a collection of remote tasks that can be used to run your existing rake tasks on your remote machines. Hydra::GlobalTask.new('some:rake:task') will create the rake task hydra:some:rake:task. You can then run this on your local machine and Hydra will dutifully carry out some:rake:task locally, then on any remote machines you have configured.

I’ve found the following useful:

Hydra::GlobalTask.new('db:migrate') 
# => rake hydra:db:migrate 
Hydra::GlobalTask.new('db:create:all') 
# => rake hydra:db:create:all 
Hydra::GlobalTask.new('db:test:prepare') 
# => rake hydra:db:test:prepare

I also created a ‘bundle’ task to facilitate bundler:

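task :bundle do 
  # Install gems only if the bundle is not already satisfied.
  # sh echoes the command's output and fails the task if it fails.
  sh 'bundle check || bundle install' 
end 
Hydra::GlobalTask.new('bundle')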

These tasks assume that the remote is pre-configured to a certain extent: Ruby must be installed, the target directory must exist, and the bundler and hydra gems must already be installed. Hydra needs to be installed on the box as well as being in the Gemfile, because we need to load Hydra before we load the Rails environment. You can use some of these, all of these or some entirely different tasks. I think this strikes a nice balance between convention and configuration.

hydra_worker_init.rb

The final piece of the puzzle is the hydra_worker_init file. This file lives in the root of the project and gets loaded by each worker if it exists. This is the only way I have found to add the test directory to the load path on every machine. Because it lives in the root dir, and because nothing is really loaded when the worker is spawned, all it contains is this:

$: << 'test'
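If you would rather not depend on the worker's current working directory, a slightly more defensive variant might look like the sketch below. This is my own assumption about what would be more robust; the one-liner above is what I actually use.

```ruby
# hydra_worker_init.rb (sketch)
# Resolve the test directory relative to this file rather than the current
# working directory, and avoid adding it to the load path twice.
test_dir = File.expand_path('test', File.dirname(__FILE__))
$LOAD_PATH << test_dir unless $LOAD_PATH.include?(test_dir)
```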

Now we have all our tasks in place, we can sync our project out to our remote machines, create databases, migrate and prepare them and finally run the tests lightning fast.
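Spelled out as a checklist, that sequence might look something like this. It is a sketch that assumes the sync, global and bundle tasks were all defined in hydra.rake with the names used in this post; bootstrap_tasks and bootstrap_commands are hypothetical helpers that only print the commands, they do not run them.

```ruby
# Hypothetical checklist: the rake tasks from this post, in the order
# described above. Task names assume the hydra.rake definitions shown earlier.
def bootstrap_tasks
  [
    'hydra:sync',            # push the latest code to the remotes
    'hydra:bundle',          # install gems everywhere
    'hydra:db:create:all',   # create the databases
    'hydra:db:migrate',      # migrate them
    'hydra:db:test:prepare', # prepare the test databases
    'hydra'                  # finally, run the tests
  ]
end

def bootstrap_commands(tasks = bootstrap_tasks)
  tasks.map { |task| "rake #{task}" }
end

puts bootstrap_commands
```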

Example

I have created a sample Rails app for you to refer to. This is configured pretty much as outlined here. All the files are in the right place and, whilst the tests are rather contrived, I think it demonstrates how to get set up.

Gotchas

Hydra only runs on Ruby 1.8 because of the discrepancies between Test::Unit and MiniTest. It seems like implementing the test runner for 1.9 would break backwards compatibility.
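If your project is sometimes run under a newer Ruby, you may want to skip the Hydra tasks there. A minimal sketch of such a check follows; hydra_supported_ruby? is a hypothetical helper of mine, not something Hydra ships.

```ruby
# Hypothetical guard: true only for the Ruby 1.8 series, which is the only
# series this post describes Hydra as running on.
def hydra_supported_ruby?(version = RUBY_VERSION)
  version =~ /\A1\.8(\.|\z)/ ? true : false
end

puts hydra_supported_ruby?('1.8.7')
puts hydra_supported_ruby?('1.9.2')
```

A check like this could wrap the require lines in hydra.rake so the tasks are simply not defined on an unsupported Ruby.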

I hope this helps you get your tests distributed and running blisteringly fast. Let me know how you get on!

§ 6 Responses to Faster Rails tests with Hydra

says:

August 12, 2011 at 9:31 pm

What kind of performance results are you seeing?

- says:
  
  August 14, 2011 at 2:07 pm
  
  Hey Daniel,
  
  I documented my results on the Hydra project wiki (I’m Adam Rogers): https://github.com/ngauthier/hydra/wiki/Success-Stories
  
  The best I got was a 400% increase, but that was kind of cheating. I normally run Hydra locally and it provides a 25% speedup running on a dual-core Mac Book Pro.
  
  Hope that helps.
  
Millisami says:

October 17, 2011 at 6:28 am

Doesn’t work in Ruby 1.9.2 and there ain’t any devs in any GH forks as well. Just a note for others using Ruby 1.9.2

- Rodreegez says:
  
  October 17, 2011 at 12:21 pm
  
  Yep, as I mentioned in the last paragraph, issue 29 on Hydra highlights the incompatibility with Ruby 1.9. This is down to 1.9 using MiniTest rather than Test::Unit under the hood. 1.9 compatibility would require re-architecting the whole thing to work with MiniTest.
  
  https://github.com/ngauthier/hydra/issues/29
  
BvD says:

May 23, 2012 at 2:26 pm

This looks great, but I can’t get the rake task to run; copied your exact code in a file called Hydra.rb in lib/tasks but then when i do ‘rake hydra’ i get ‘Don’t know how to build task ‘hydra”
Is there any particular title for this file?? Not sure why this doesn’t work…
thnx for any pointers

- Phil Nash says:
  
  May 23, 2012 at 4:51 pm
  
  Try calling the file hydra.rake, I believe the Rails Rakefile loads in files with .rake extensions.
  

Logical Friday