How to not get into trouble with Active Record

How to not get into trouble with Active Record

Good practices with Active Record

Active Record is a layer, in the Rails application, that makes database manipulation easier but can also harm your application’s performance and reliability if not used properly. This article is a list of good practices that you should follow, and harmful practices that you should avoid.

Improve the performance, produce fewer bugs, and make the code responsible for database communication readable and testable by selecting the right Active Record features for the given scenario.

All my notes are based on years of hands-on experience at iRonin.IT - a top software development company, where we provide custom software development and IT staff augmentation services for a wide array of technologies.

Messing with one million

All examples included in this article are tested against one million of records in the database. I did this so you can see what is the difference in memory usage and execution time for the given scenario.

Generating test data

You can fill your database with the test data by generating the User model with the following definition:

rails g model User first_name:string last_name:string age:integer email:string

And creating the records with FFaker gem:

1.upto(1_000_000) do
  User.create!(
    first_name: FFaker::Name.first_name,
    last_name: FFaker::Name.last_name,
    age: (1..99).to_a.sample,
    email: FFaker::Internet.email
  )
end

Collecting the information about the performance

I collected execution time with the Benchmark library:

Benchmark.measure { … }

I measured allocated memory with the memory_profiler gem:

# bundle add memory_profiler
report = MemoryProfiler.report do
  # execute code
end

report.pretty_print(scale_bytes: true)

The test data that you will create will be different from the one I created so the results can be different on your machine.

Performance

Fetching large collections of records

Is your database filled with thousands or millions of records? Loading them at once into the memory is one of the easiest ways to crash your application in the production environment.

Avoid: using each and all with Active Record queries that are designed to return many records:

User.all.each do |user|
  puts user.id
end

Use: find_each to pull records in batches. It won’t load all records into the database and you can use it like an ordinary each method to iterate through all items in the collection:

User.find_each do |user|
  puts user.id
end

Selecting values for one column

It often happens that you have a collection of records but you want to receive the array of values for the given column. Doing it wrong you can load too many records into memory instead of doing the heavy-lifting on the database level.

Avoid: using map or collect directly on the collection produced by the Active Record:

User.where(age: 30).map(&:email)

Use: pluck to narrow the query to the database and reduce the time needed to pull the records:

User.where(age: 30).pluck(:email)

Messing with one million

My test database contains exactly 10 186 records where a user has an age equal to 30.

MethodTime of executionMemory allocated
#map0.17s11.67 MB
#pluck0.04s1.75 MB

Counting records

It seems that there is no simpler task than counting the records for a given query. Nothing is more wrong, especially when it comes to databases with thousands or millions of records.

In Rails, we have three methods for counting items in the Active Record collection: size, count, and length. If you will use the wrong one, you can harm the performance of your application and make way more queries to the database than necessary.

It’s enough for you to memorize the following:

  • When you use length, it always loads all records into memory and then counts them

  • Use count if you want to perform SQL count query as this method always does it even when you pulled the collection earlier

  • size is the most flexible one as it performs SQL count query if records are not loaded in the memory and it counts the elements from memory when they are already loaded

An additional advantage of using size is that you can benefit from using cache counters as Active Record handles them automatically.

Messing with one million

My test database contains exactly 9 960 records where a user has an age equal to 27. If I want to count all users with the age of 27 the results are the following:

MethodTime of executionMemory allocated
#count0.036s (same if records are loaded)8.34kb
#size0.035s (0.000007 if records are loaded)8.34kb
#length0.19s (0.000007 if records are loaded)9.45mb

Bulk insert and update

You can harm your application’s performance not only by pulling the information from the database but also by inserting it (especially when it comes to inserting a lot of data).

When you need to insert or update multiple records it is always good to consider doing it in bulk mode.

Avoid: creating records in loop:

users_data.each do |attrs|
  User.create!(attrs)
end

Use: create records in bulk:

User.insert_all(users_data)

Of course, inserting records in bulk is not always as easy as it seems to be. You have to deal with duplicate records or other cases but the insert_all method can handle this also with some additional parameters.

Messing with one million

If we would like to create 20 records the difference between creating in loop and using the bulk mode is the following:

MethodTime of executionMemory allocated
.create0.035s590.54kb
.insert_all0.0076s65.06kb

Let the database do the job

It sounds obvious but the practice sometimes looks different especially when we are in hurry or when we work with the old legacy codebases. By saying about letting the database do the job I mean stuff like sorting, filtering, etc.

Avoid: doing with Ruby operations that can be performed on SQL level

User.where(age: 51).select { |user| user.first_name.starts_with?('A') }.take(10)

Use: SQL to perform actions for better performance

User.where(age: 51).where("first_name LIKE 'A%'").limit(10)

Messing with one million

MethodTime of executionMemory allocated
Ruby0.19s12.61mb
SQL0.000158s3.15kb

Checking the record presence

Another simple yet tricky thing. Rails developers got used to use .present? or .blank? for checking the presence of the data given variable or collection. However, calling these methods on Active Record’s collection can introduce performance problems.

If you don’t want to mess up the performance, remember:

  • Calling .exists? will trigger an SQL query even if records are already loaded into the memory

  • Calling .present? or .blank? will load records into memory so use them only if you are doing something else with these records later on

  • If you want to load the collection and check for its existence, use @collection.load.any?, @collection.load.none? or @collection.load.none?

Data integrity

Finding records by…

I bet that in the Ruby world, the “undefined method for NilClass” is one of the most common errors that developers experience, especially when working with big and old legacy codebases.

Each time you use the find_by method, always take into account that it can return a nil value. You have to handle the nil value or use find_by! method instead to raise an error when the record is not found.

Avoid: using find_by method without explicitly handling possible nil value:

user = User.find_by(email: 'john@doe.com')
user.first_name

Use: find_by! method if you would like to raise an exception if the record is not present:

user = User.find_by!(...)
user.first_name

Use: find_by and handle the nil value explicitly if you don’t want to raise an exception:

user = User.find_by(...)
user&.first_name

# or

if (user = User.find_by(...))
  user.first_name
end

The bang methods

Continuing the previous example, the bang method (find_by! in the previous example) is a special method that will raise an exception in case something went wrong. When saving, creating, or updating the record we can also take advantage of the special version of methods.

Transactions

Using bang methods is extremely important when dealing with database transactions.

Avoid: using non-bang methods when dealing with transactions as transaction won’t be rollbacked when the record won’t be saved:

User.transaction do
  user = User.create(...)
  SomeService.call(user)
end

Use: bang methods to rollback the whole transaction in case of a problem. The whole idea behind the transactions is to achieve “all or nothing” state where all steps of the process are completed or none of them is performed:

User.transaction do
  user = User.create!(...)
  SomeService.call(user)
end

Difference between delete and destroy

The most important difference is that when you call destroy, all related callbacks from the model are called, and when you call delete you just simply delete the record from the database. Because of that, the delete option is faster.

If you just want to remove the record without triggering any workflows, validations, and other callbacks, simply go for delete. Otherwise use destroy which is a standard way of removing records from the database in a typical Rails application.

Readability and testability

You can bring trouble to your application not only by calling the database in an ineffective manner but also by writing code that is not readable and hard to maintain.

Use scopes for common use cases

If you use the same queries multiple times in your application, consider moving them to the scope for better maintainability but also for better readability.

Avoid: using the same query multiple times in different places

User.where('age > ?', 17).where(...)

Use: scope to have more readable code and be able to introduce changes faster

scope :adults, -> { where('age > ?', 17) }

User.adults.where(...)

Query objects to isolate the query layer

Scopes are great for smaller and repeatable queries but when it comes to complex queries that are used once or more, the query object pattern might be helpful as well.

Avoid: mixing the logic for more complex queries with other layers of logic

results = User.where('first_name = ? or ?', 'John', 'Angie').where('age > ?', 17).order('age DESC')

results.each do |user|
 # do something
end

Use: query object pattern to isolate the logic and test the code easier

AdultJohnOrAngieQuery.call.each do |user|
  # do something
end

Didn't get enough of Ruby?

Check out our free books about Ruby to level up your skills and become a better software developer.