If you only ever interact with your database through your ORM’s interface and syntax, you may be slowing your application down.
Now and then, I see developers do validations, constraints and hooks using their ORM, and think to myself: “Why don’t these people use their database to do that?”
I’ve been comfortable with SQL for both queries and data definition for a long time, so I always write my all schemas and most queries in SQL, by hand. But I began to wonder: Is there really any difference between doing these things natively in the database as opposed to in Ruby? So I wrote up a modest benchmark script. Following are my findings.
I am far from being an expert in benchmarking or statistics. I would be appreciative of anyone that could point out errors in my method. I also acknowledge that the processes covered are not at all exhaustive of the many possible validations and hooks one might use.
The Test Code
- schema.sql: the data definition SQL
- models.rb: the modelling code
- better-benchmark.rb: some code to help with the statistics
- test1.rb: the actual test code
- results.txt: the full result output
In all test output, method 1 uses Ruby, whereas method 2 uses the database.
Test 1: Record creation hook
In this test I compare a before_create Ruby hook to a functional column default.
# Ruby before_create do set( :time_created => Time.now ) end
-- SQL time_created TIMESTAMP NOT NULL DEFAULT NOW()
Results
Set 1 mean: 7.756 s Set 1 std dev: 2.373 Set 2 mean: 2.861 s Set 2 std dev: 0.052 p.value: 1.45088891038497e-11 W: 400.0 The difference (-63.1%) IS statistically significant.
By using a database function to fill in default values on creation, I chopped my record creation time to nearly 1/3 of its original cost.
Test 2: Insertion validation
In this test, I compare validation of the length of a text field; ORM validation versus database validation.
# Ruby validates_length_of :name, :minimum => 8
-- SQL name VARCHAR( 255 ) NOT NULL CHECK( char_length( name ) > 7 ),
Results
Set 1 mean: 3.772 s Set 1 std dev: 0.993 Set 2 mean: 4.157 s Set 2 std dev: 0.607 p.value: 0.13441884182804 W: 144.0 The difference (+10.2%) IS NOT statistically significant.
Although the difference is substantively significant, we are informed by the statistical test that it is not statistically significant. Therefore, we cannot conclude that there is any difference between these two approaches to validation.
Test 3: Deletion hook - cascade deletion
In this test, I compare cascading a delete using your ORM versus the database.
# Ruby before_destroy do Employee1.where( :company_id => pk ).destroy end
-- SQL company_id INTEGER NOT NULL REFERENCES companies2( id ) ON DELETE CASCADE,
Results
Set 1 mean: 3.581 s Set 1 std dev: 0.836 Set 2 mean: 0.087 s Set 2 std dev: 0.028 p.value: 1.75847735938658e-08 W: 383.0 The difference (-97.6%) IS statistically significant.
Here, the database method crushes in-Ruby cascading, having a microscopic 2.4% of the time cost of the Ruby method.
Test 4: Deletion hook - update other table
In this test, I compare the in-Ruby and in-database means of updating a table after deleting a record from another one.
# Ruby class Post1 < Sequel::Model( :posts1 ) after_destroy do author.update_post_count end end class Author1 < Sequel::Model( :authors1 ) def update_post_count post_count = posts.size save end end
-- SQL CREATE RULE update_post_count AS ON DELETE TO posts2 DO ALSO ( UPDATE authors2 SET post_count = ( SELECT COUNT(*) FROM posts2 WHERE author_id = OLD.author_id ) - 1 WHERE id = OLD.author_id ; );
Results
Set 1 mean: 1.801 s Set 1 std dev: 0.230 Set 2 mean: 1.103 s Set 2 std dev: 0.234 p.value: 2.90177782076994e-11 W: 399.0 The difference (-38.7%) IS statistically significant.
Again, the database version wins out. This time, the improvement is over 1/3 the time cost of the Ruby version.
Conclusion
The in-database method beat the in-Ruby method significantly in 3 of the 4 tests I ran. In the remaining test, the results were inconclusive (being statistically insignificant), so we are unable to reject the null hypothesis that there is no difference between the two methods in that test.
If you are after a more natural syntax, and want to stay in your programming language as much as possible, using your ORM for hooks, validations and constraints is the way to go.
But if speed is a priority, sharpen your SQL Fu so you can open up your options and harness the full potential of your RDBMS. As these test results demonstrate, most hooks and checks are better off done in your database, and not with your ORM.
Footnotes
Substantive significance is the importance, meaningfulness, or magnitude of an effect or difference. For example, a speed improvement of 100% is substantively significant, whereas a speed improvement of 0.1% is not.Share This
Comment by Pistos — March 6, 2008 @ 13:25
See also this post by Ara Howard about how ActiveRecord validations (and those of most ORMs too, probably) are fatally flawed.
Comment by Pistos — June 5, 2008 @ 07:49
I’ve more formally created/released better-benchmark here.