Faster Rails tests with Hydra
May 18, 2011
When working on a large Rails app, the feedback loop between writing code, running tests and finding out about failures can become too long. Hydra is a distributed testing framework that helps speed things up.
Configuring Your Local Machine
Configuring Hydra for use in a typical Rails project is pretty straightforward. We need to install the gem, add some config and finally create a rake task to kick off our tests.
First off we need to add the Hydra gem to our project. If you are using Bundler with Rails, this is as simple as:
group :test do
  gem 'hydra', :require => false
end
Note we are not immediately requiring Hydra. We only want to require Hydra when we are running tests. We’ll get to that in a little bit. A quick bundle install and you should be all set to start telling Hydra what to do.
The configuration of Hydra takes place in a hydra.yml file. In a typical Rails application, this will live in the config directory. Initially, the hydra.yml should look something like this:
---
workers:
  - type: local
    runners: 2
Here, we are telling Hydra that we want one local worker (our local machine) and two runners on that worker (representing the number of CPU cores our worker machine has. In my case, this is two. You may have four or even eight, you lucky thing).
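If you would rather not hard-code the runner count, the same file can be generated from the machine's core count. This is only a minimal sketch and not part of Hydra: the helper name local_hydra_config is made up for illustration, and Etc.nprocessors needs a newer Ruby (2.2 or later) than the one Hydra itself runs on, so treat it as a one-off helper script.

```ruby
require 'etc'
require 'yaml'

# Hypothetical helper (not part of Hydra): build the hydra.yml structure
# for a single local worker with one runner per CPU core.
def local_hydra_config(runners = Etc.nprocessors)
  { 'workers' => [{ 'type' => 'local', 'runners' => runners }] }
end

# Print the YAML; redirect it into config/hydra.yml if it looks right.
puts local_hydra_config.to_yaml
```

Passing an explicit number, for example local_hydra_config(2), reproduces the two-runner setup shown above.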
To utilise Hydra to run our tests, we can create a rake task to sort out all the files we want to run, then run them in parallel using Hydra’s provided TestTask. In a regular Rails app, this will live in lib/tasks, in a file such as hydra.rake. Initially, our rake task will look like this:
# Require hydra and rescue any load errors if it's not available for some
# reason (e.g. in the production environment).
begin
  require 'hydra'
  require 'hydra/tasks'
rescue LoadError
else
  # Put all tests into one array
  all_tests = Dir.glob("test/unit/**/*_test.rb") +
              Dir.glob("test/functional/**/*_test.rb") +
              Dir.glob("test/integration/**/*_test.rb")

  # Specify tests that don't play well with others (empty to start with)
  dangerous_tests = []

  # Separate the good from the bad
  safe_tests = all_tests - dangerous_tests

  Hydra::TestTask.new('hydra:safe') do |t|
    t.files = safe_tests
    t.verbose = true
  end

  Hydra::TestTask.new('hydra:dangerous') do |t|
    t.files = dangerous_tests
    # Dangerous tests are run in serial
    t.serial = true
    t.verbose = true
  end

  # Run all tasks together
  task :hydra => ['hydra:safe', 'hydra:dangerous']
end
You can organise your test files any way you see fit, create multiple rake tasks and string them together, or otherwise customise how you want Hydra to run your tests.
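As one way of organising the files, the three globs in the rake task can be collapsed into a helper that groups test files by suite. This is a sketch in plain Ruby: tests_by_suite is a hypothetical name rather than something Hydra provides, and it assumes the usual test/unit, test/functional and test/integration layout.

```ruby
# Hypothetical helper (not part of Hydra): group test files by suite so
# each group can be handed to its own task.
def tests_by_suite(root = '.', suites = %w[unit functional integration])
  groups = {}
  suites.each do |suite|
    # "**" also matches zero directories, so top-level tests are included.
    pattern = File.join(root, 'test', suite, '**', '*_test.rb')
    groups[suite] = Dir.glob(pattern).sort
  end
  groups
end
```

Each group could then be assigned to t.files inside a separate Hydra::TestTask, for instance one task per suite, with a final task depending on all of them.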
We can now try this out with rake hydra. Congratulations! You are now running your tests in parallel!
Configuring Remote Machines
In order to distribute our tests and massively reduce the time they take to run, we need to tell Hydra about the machines we have access to. This could be a machine on your local network (I’ve had great results utilizing a Mac Mini), or potentially an Amazon EC2 instance (something I am yet to try, but pretty excited about).
Adding heads to our mythical beast is easy enough. hydra.yml is where we define our workers and runners.
To add information about our remote machine, we simply add another worker of the type ‘ssh’. Our hydra.yml now looks like this:
---
workers:
  - type: local
    runners: 2
  - type: ssh
    connect: remote_machine
    directory: /Absolute/path/to/project
    runners: 2
The value passed to connect: can either be in the form user@ip_addy_or_url or the name of a host defined in ~/.ssh/config. Either way, you must be able to ssh to this machine without using a password.
directory: is the absolute path to where the project will live on the remote machine (you’ll have to ssh in and create this directory if it doesn’t already exist).
Depending on how many cores you have on your remote machine, runners: will vary (remember, one runner per core, one worker per machine).
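A typo in hydra.yml can be annoying to track down once remote machines are involved, so it may be worth sanity-checking the file before running anything. The following is a minimal sketch in plain Ruby, not a Hydra feature: worker_problems is a hypothetical helper, and it only knows about the two worker types and the keys used in this post.

```ruby
require 'yaml'

# Hypothetical helper (not part of Hydra): return a list of human-readable
# problems found in the workers section of a parsed hydra.yml.
def worker_problems(config)
  problems = []
  Array(config['workers']).each_with_index do |worker, i|
    case worker['type']
    when 'local'
      # A local worker only needs a runner count, checked below.
    when 'ssh'
      problems << "worker #{i}: missing connect" if worker['connect'].to_s.empty?
      unless worker['directory'].to_s.start_with?('/')
        problems << "worker #{i}: directory must be an absolute path"
      end
    else
      # Assumption: only the two types used in this post are expected.
      problems << "worker #{i}: unexpected type #{worker['type'].inspect}"
    end
    runners = worker['runners']
    unless runners.is_a?(Integer) && runners > 0
      problems << "worker #{i}: runners must be a positive integer"
    end
  end
  problems
end

# Check the project's own config if it is present.
if File.exist?('config/hydra.yml')
  worker_problems(YAML.load_file('config/hydra.yml')).each { |problem| warn problem }
end
```

An empty list means the file has the shape described above. It says nothing about whether you can actually ssh to the machine without a password.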
As you work on your project, files will change. Hydra itself does not provide a way to keep all your machines up to date with the latest version of your local code. It farms that task out to rsync. If configured, every time you run a Hydra::TestTask, rsync will jump into action first and send all your remote machines any changes to your code that might have happened. To configure rsync, we simply provide some more info in hydra.yml:
---
sync:
  directory: /Absolute/path/to/local/project
  exclude:
    - tmp
    - log
    - doc
workers:
  - type: local
    runners: 2
  - type: ssh
    connect: mini
    directory: /Absolute/path/to/remote/project
    runners: 2
The sync: option allows us to tell rsync where our project lives (again, be sure to pass it an absolute path) and which directories we’re not interested in. Now when we run our tests with Hydra, rsync sends any changes out to the remotes first, making sure you’re testing your latest code. At this point Hydra is almost ready. We just need to configure our remotes.
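To get a feel for what the sync: block amounts to, here is a sketch that turns it into a comparable rsync command line for one ssh worker. It only builds the command and does not run it. The helper name rsync_command and the choice of flags (-az and --delete) are assumptions for illustration, and the flags Hydra passes to rsync internally may well differ.

```ruby
# Hypothetical helper (not part of Hydra): build an rsync command that
# mirrors the local project to one ssh worker, skipping excluded paths.
def rsync_command(sync, worker)
  excludes = Array(sync['exclude']).map { |path| "--exclude=#{path}" }
  # A trailing slash tells rsync to copy the directory's contents.
  source = File.join(sync['directory'], '')
  target = "#{worker['connect']}:#{worker['directory']}"
  ['rsync', '-az', '--delete'] + excludes + [source, target]
end

sync   = { 'directory' => '/Absolute/path/to/local/project',
           'exclude'   => %w[tmp log doc] }
worker = { 'connect'   => 'mini',
           'directory' => '/Absolute/path/to/remote/project' }

puts rsync_command(sync, worker).join(' ')
```

Printing the command is a handy way to try a sync by hand when a remote does not seem to be picking up changes.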
To make life a little easier we’ll set up some convenience tasks in our hydra.rake.
Sometimes we’ll want to sync up the project before running tests. If you create a migration or update the Gemfile, the tests won’t have a hope of passing on the remotes until those changes have been pushed out. Luckily, we can define Hydra::SyncTask.new('hydra:sync') in our hydra.rake. This allows us to run rake hydra:sync locally and push out any changes.
Remote and Global Tasks
Hydra provides a collection of remote tasks that can be used to run your existing rake tasks on your remote machines. Hydra::GlobalTask.new('some:rake:task') will create the rake task hydra:some:rake:task. You can then run this on your local machine and Hydra will dutifully carry out some:rake:task locally, then on any remote machines you have configured.
I’ve found the following useful:
Hydra::GlobalTask.new('db:migrate')      # => rake hydra:db:migrate
Hydra::GlobalTask.new('db:create:all')   # => rake hydra:db:create:all
Hydra::GlobalTask.new('db:test:prepare') # => rake hydra:db:test:prepare
I also created a ‘bundle’ task to facilitate bundler:
task :bundle do
  %x"(bundle check || bundle install)"
end
Hydra::GlobalTask.new('bundle')
These tasks assume that the remote is pre-configured to a certain extent: Ruby must be installed, the target directory must exist, and the bundler and hydra gems must already be installed before we start. (Hydra needs to be installed on the box as well as be in the Gemfile because we need to load Hydra before we load the Rails environment.) You can use some of these, all of these or some entirely different tasks. I think this strikes a nice balance between convention and configuration.
The final piece of the puzzle is the hydra_worker_init file. This file lives in the root of the project and gets loaded by each worker if it exists. This is the only way I have found to add the test directory to the load path on every machine. Because it lives in the root dir, and because nothing is really loaded when the worker is spawned, all it contains is this:
$: << 'test'
Now we have all our tasks in place, we can sync our project out to our remote machines, create databases, migrate and prepare them and finally run the tests lightning fast.
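That sequence can be wrapped in a small driver script so a single command takes a remote from stale to tested. This is a sketch under a few assumptions: the task names are the ones defined in this post (hydra:sync, the hydra:bundle global task and the three database tasks), rake is on the PATH, and hydra_steps and run_hydra_steps are hypothetical helper names.

```ruby
# The order described above, using the task names defined in this post.
def hydra_steps
  %w[
    hydra:sync
    hydra:bundle
    hydra:db:create:all
    hydra:db:migrate
    hydra:db:test:prepare
    hydra
  ]
end

# Hypothetical helper: run each rake task in order and stop at the first
# failure. The runner is injectable so the sequence can be exercised
# without actually shelling out.
def run_hydra_steps(steps = hydra_steps, runner = lambda { |task| system('rake', task) })
  steps.each do |task|
    return task unless runner.call(task) # name of the step that failed
  end
  nil # nil means every step succeeded
end
```

Calling run_hydra_steps with no arguments shells out to rake for each step in turn. It returns nil on success, or the name of the first task that failed.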
I have created a sample Rails app for you to refer to. This is configured pretty much as outlined here. All the files are in the right place and, whilst the tests are rather contrived, I think it demonstrates how to get set up.
Hydra only runs on Ruby 1.8 because of the discrepancies between Test::Unit and Minitest. It seems like implementing the test runner for 1.9 would break backwards compatibility.
I hope this helps you get your tests distributed and running blisteringly fast. Let me know how you get on!