Why does ActiveRecord often have performance issues?

Published on 2024/05/26

What is this

This article explains why ActiveRecord(※) tends to have performance issues.

※ In this article, "ActiveRecord" refers to the ActiveRecord library of Ruby (on Rails).

What is often said about feature-rich ORMs like ActiveRecord

| Aspect | Feature-rich ORM (e.g. ActiveRecord) | Go's database/sql |
|---|---|---|
| Productivity | High | Low |
| Performance | Some overhead | High |
| Flexibility | Limited by abstraction | High |
| Consistency | Consistent API | Hard to maintain consistency |
| Validation and callbacks | Built-in | Must be implemented manually |
| Dependencies | Many | Few |
| Learning cost | Low at first, but grows later because of the many features | High at first because deep SQL knowledge is required, but low later |

Introduction to ActiveRecord

Introduction to ActiveRecord of Ruby on Rails

What is ActiveRecord?

ActiveRecord is the Object Relational Mapping (ORM) layer supplied with Ruby on Rails. It provides an interface and binding between the tables in a relational database and the Ruby program code that manipulates database records.

Key concepts

1. Models

ActiveRecord models are Ruby classes that are mapped to database tables.
Each model inherits from ApplicationRecord, which in turn inherits from ActiveRecord::Base.

2. Migrations

Migrations are Ruby classes that are used to make changes to the database schema over time in a consistent and easy way.

3. CRUD Operations

ActiveRecord makes it easy to create, read, update, and delete (CRUD) records in the database.

Example Application

erDiagram
    POST {
        int id PK
        string title
        text body
        datetime created_at
        datetime updated_at
    }
    COMMENT {
        int id PK
        int post_id FK
        text body
        datetime created_at
        datetime updated_at
    }
    POST ||--o{ COMMENT : "has many"

Step 1: Creating a New Rails Application

First, let's create a new Rails application:

$ rails new blog
$ cd blog

Step 2: Generating a Model

Let's generate a Post model with a title and body:

$ rails generate model Post title:string body:text

This command generates a migration file, the model file, and the test files for the Post model.

Step 3: Running Migrations

Run the migrations to create the posts table in the database:

$ rails db:migrate

Step 4: Using the Model

Now that we have our model and table, let's use ActiveRecord to interact with the database.

Creating a Record

post = Post.new(title: 'First Post', body: 'This is the body of the first post')
post.save

Or in a single step:

Post.create(title: 'First Post', body: 'This is the body of the first post')

Reading Records

Find a post by its ID:

post = Post.find(1)

Find all posts:

posts = Post.all

Find posts with specific conditions:

posts = Post.where(title: 'First Post')

Updating Records

post = Post.find(1)
post.update(title: 'Updated Post Title')

Deleting Records

post = Post.find(1)
post.destroy

Step 5: Validations

ActiveRecord allows you to add validations to your models to ensure that only valid data is saved to the database. For example:

class Post < ApplicationRecord
  validates :title, presence: true
  validates :body, presence: true
end

With these validations, ActiveRecord will prevent saving a Post without a title or body:

post = Post.new(title: '')
post.save # returns false
post.errors.full_messages # returns ["Title can't be blank", "Body can't be blank"]
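To make the mechanism a little more concrete, here is a minimal plain-Ruby sketch of what a presence validation does conceptually: save runs the checks first and returns false, with messages collected in errors, instead of writing anything. MiniRecord, MiniPost and validates_presence_of are hypothetical names for illustration only; this is not how ActiveRecord is implemented internally, and no database is involved.

```ruby
# Hypothetical sketch of how presence validations can gate saving.
# NOT ActiveRecord's implementation; all names here are illustrative.
class MiniRecord
  # Class-level macro, loosely mirroring `validates :attr, presence: true`.
  def self.validates_presence_of(*attrs)
    presence_validations.concat(attrs)
  end

  # Each subclass keeps its own list of validated attributes.
  def self.presence_validations
    @presence_validations ||= []
  end

  attr_reader :errors

  def initialize(attributes = {})
    @attributes = attributes
    @errors = []
  end

  # Collects one message per blank (nil, empty or whitespace-only) attribute.
  def valid?
    @errors = self.class.presence_validations.filter_map do |attr|
      "#{attr.to_s.capitalize} can't be blank" if @attributes[attr].to_s.strip.empty?
    end
    @errors.empty?
  end

  # Validations run first; only a valid record would reach the database.
  def save
    return false unless valid?

    true # a real ORM would issue an INSERT or UPDATE here
  end
end

class MiniPost < MiniRecord
  validates_presence_of :title, :body
end

post = MiniPost.new(title: '')
puts post.save           # prints "false"
puts post.errors.inspect # prints ["Title can't be blank", "Body can't be blank"]
```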

Step 6: Associations

ActiveRecord makes it easy to manage associations between models. For example, let's say we have a Comment model that belongs to a Post:

Generate the Comment model:

$ rails generate model Comment post:references body:text
$ rails db:migrate

Set up the association in the models:

class Post < ApplicationRecord
  has_many :comments, dependent: :destroy
end

class Comment < ApplicationRecord
  belongs_to :post
end

Now, you can create comments for a post and access them easily:

post = Post.create(title: 'First Post', body: 'This is the body of the first post')
post.comments.create(body: 'This is a comment')
post.comments # returns all comments for this post

Conclusion

This introduction covers the basics of ActiveRecord in Ruby on Rails. With these fundamentals, you can start building and interacting with your database models effectively. For more advanced topics, refer to the Rails Guides.

Classifying the causes of performance issues

Ruby

  • Dynamically typed language
  • Requires a runtime interpreter
Advantages of Go (for comparison)
  • Statically typed language
  • Concurrent GC
  • Compiled ahead of time to native code
  • Compiler optimizations such as inlining and dead-code elimination
  • Lightweight runtime with small overhead

ORM

  • Requires ORM-specific techniques to achieve good performance, because many things are abstracted away (e.g. queries, transactions, connections)

ActiveRecord library

To summarize, some of the library's default behaviors and conventions can cause performance issues in certain circumstances, and avoiding them requires specific knowledge and techniques.

Understanding N+1 Problem and Lazy Loading in ActiveRecord

What is the N+1 Problem?

The N+1 problem is a performance issue that occurs when fetching data from a database. It happens when one query is executed to fetch the main records (the "1"), and then N additional queries are executed to fetch related records for each of the main records.

Example of N+1 Problem

Consider the following code that retrieves posts and their comments:

posts = Post.all
posts.each do |post|
  puts post.title
  post.comments.each do |comment|
    puts comment.body
  end
end

In this example:

  • SELECT * FROM posts is executed once.
  • SELECT * FROM comments WHERE post_id = ? is executed once per post, i.e. N times.

This results in a total of 1 + N queries, which can significantly degrade performance when N is large.
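The query count can be reproduced without Rails. The following sketch assumes a hypothetical in-memory FakeDB that only records the SQL it would have run; the loop has the same shape as the example above, so with 3 posts it logs 1 + 3 = 4 queries.

```ruby
# Hypothetical in-memory "database" that logs every query it receives,
# so round trips can be counted. No real SQL is executed.
class FakeDB
  attr_reader :log

  def initialize(posts, comments)
    @posts = posts
    @comments = comments
    @log = []
  end

  def all_posts
    @log << "SELECT * FROM posts"
    @posts
  end

  def comments_for(post_id)
    @log << "SELECT * FROM comments WHERE post_id = #{post_id}"
    @comments.select { |c| c[:post_id] == post_id }
  end
end

posts = (1..3).map { |i| { id: i, title: "Post #{i}" } }
comments = [
  { post_id: 1, body: 'a' }, { post_id: 1, body: 'b' },
  { post_id: 2, body: 'c' }, { post_id: 3, body: 'd' }
]
db = FakeDB.new(posts, comments)

# Same shape as the loop above: one query for the posts,
# then one more query per post for its comments.
db.all_posts.each do |post|
  puts post[:title]
  db.comments_for(post[:id]).each { |c| puts "  #{c[:body]}" }
end

puts db.log.size # prints 4 (1 + N, with N = 3 posts)
```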

What is Lazy Loading?

Lazy loading means that data from a related table is fetched only when it is actually accessed. This is the default behavior for ActiveRecord associations: the first time an association is read on a record, a separate query is executed for that record.

Lazy Loading Example

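In the previous example, post.comments is lazily loaded: the first time it is read on each post, a new query is executed to fetch that post's comments. Because this happens once for every post in the loop, it leads directly to the N+1 problem. Once loaded, the comments are cached on that post object, so reading post.comments again on the same post does not issue another query.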
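Below is a minimal plain-Ruby sketch of a lazily loaded association, assuming a hypothetical LazyPost class and an in-memory COMMENTS_TABLE in place of a real database. It mimics the behavior described above: building the posts fetches no comments, the first read of comments on each post logs one query, and a second read on the same post reuses the cached result.

```ruby
# Hypothetical sketch of a lazily loaded association, loosely modelled on
# how an association reader behaves: nothing is fetched until it is read,
# and the loaded result is then cached on that object.
QUERY_LOG = []

COMMENTS_TABLE = [
  { post_id: 1, body: 'a' },
  { post_id: 2, body: 'b' },
  { post_id: 2, body: 'c' }
].freeze

class LazyPost
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def comments
    # `@comments ||=` runs the "query" only on the first call for this post.
    @comments ||= begin
      QUERY_LOG << "SELECT * FROM comments WHERE post_id = #{id}"
      COMMENTS_TABLE.select { |c| c[:post_id] == id }
    end
  end
end

posts = [LazyPost.new(1), LazyPost.new(2)]
puts QUERY_LOG.size # prints 0: building the posts fetched no comments

posts.each { |post| post.comments.each { |c| puts c[:body] } }
puts QUERY_LOG.size # prints 2: one query per post

posts.each { |post| post.comments.size }
puts QUERY_LOG.size # prints 2: already-loaded comments are reused
```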

Eager Loading: One Way to Solve the N+1 Problem

Eager loading is a strategy that fetches the related data up front, with a fixed number of queries instead of one query per record. In ActiveRecord, this can be achieved with the includes method, which by default loads each included association with one additional query, so the total number of queries no longer depends on the number of records.

Eager Loading Example

-posts = Post.all
+posts = Post.includes(:comments).all
posts.each do |post|
  puts post.title
  post.comments.each do |comment|
    puts comment.body
  end
end

In this example:

  • SELECT * FROM posts is executed once.
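  • SELECT * FROM comments WHERE post_id IN (...) is executed once, with the ids of all the loaded posts.

Both queries run when posts.each first loads the relation. A relation is itself lazy, so the line posts = Post.includes(:comments).all does not execute anything on its own; but once the loop starts, the comments of every post are already in memory before the first title is printed.

This results in a total of 2 queries, regardless of the number of posts, thus solving the N+1 problem.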
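The same idea can be sketched in plain Ruby. This is a simplified model of preloading, not ActiveRecord's actual code; load_posts_with_comments, POSTS, COMMENTS and QUERIES are hypothetical names, and the "queries" are only logged strings. The key step is collecting the post ids, fetching all comments with one IN query, and grouping them by post_id in memory.

```ruby
# Hypothetical sketch of preloading (the separate-query strategy that
# `includes` uses by default): load the parents, collect their ids, fetch
# all children with ONE `IN` query, then group the children in Ruby.
QUERIES = []

POSTS = [{ id: 1, title: 'P1' }, { id: 2, title: 'P2' }, { id: 3, title: 'P3' }].freeze
COMMENTS = [
  { post_id: 1, body: 'a' }, { post_id: 1, body: 'b' }, { post_id: 3, body: 'c' }
].freeze

def load_posts_with_comments
  QUERIES << "SELECT * FROM posts"
  posts = POSTS

  ids = posts.map { |p| p[:id] }
  QUERIES << "SELECT * FROM comments WHERE post_id IN (#{ids.join(', ')})"
  by_post = COMMENTS.select { |c| ids.include?(c[:post_id]) }.group_by { |c| c[:post_id] }

  # Attach each post's comments; posts without comments get an empty array.
  posts.map { |p| p.merge(comments: by_post.fetch(p[:id], [])) }
end

posts = load_posts_with_comments
posts.each do |post|
  puts post[:title]
  post[:comments].each { |c| puts "  #{c[:body]}" }
end

puts QUERIES.size # prints 2, however many posts there are
```

Because the comments are grouped in Ruby after a single fetch, the number of queries stays at 2 whether there are 3 posts or 3,000; the trade-off is that all matching comments are held in memory at once.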

(Appendix) When Lazy Loading is Not Relevant

If there are no related tables, lazy loading and the N+1 problem are irrelevant.
For example, fetching records from a single table does not involve related records, so there is no concern about lazy loading or multiple queries.

Example with No Related Tables

users = User.all
users.each do |user|
  puts user.name
end

In this case:

  • Only SELECT * FROM users is executed.

There are no related records to fetch, so lazy loading and the N+1 problem do not apply.

Summary

  • N+1 Problem:
    • Occurs when related records are fetched with one query per parent record, resulting in 1 + N queries.
  • Lazy Loading:
    • Default behavior where related data is fetched only when it is first accessed, which leads to the N+1 problem when associations are read inside a loop.
  • Eager Loading:
    • Strategy that fetches all related data up front using includes, with a fixed number of queries (2 in the example above), solving the N+1 problem.
  • (Single Table:)
    • Lazy loading and the N+1 problem are not relevant when there are no related tables.

ActiveRecord Pattern

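The ActiveRecord pattern itself is just an object design pattern:

An object wraps and corresponds to a single database row, encapsulates database access, includes domain logic, and thus holds both data and behavior.

Of course, the pattern itself cannot cause performance issues, because it is only a design pattern. However, it pairs naturally with the lazy loading of the ActiveRecord library, and therefore with the N+1 problem: since each object loads its own associated records, iterating over a collection of such objects easily turns into one query per object.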
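The following is a minimal sketch of the pattern in plain Ruby, independent of the Rails library. Table, PostRecord and headline are hypothetical names, and Table is an in-memory stand-in that only counts queries. Each PostRecord wraps one row, knows how to create and load itself, carries domain logic, and loads its own comments on demand, which is why iterating over many such objects tends to produce 1 + N queries.

```ruby
# In-memory stand-in for a database table that counts every query it serves.
class Table
  attr_reader :queries

  def initialize
    @rows = {}
    @queries = 0
    @next_id = 1
  end

  # Stores a row and returns its generated id (counts as one query).
  def insert(attrs)
    @queries += 1
    id = @next_id
    @next_id += 1
    @rows[id] = attrs.merge(id: id)
    id
  end

  # Returns rows matching all given column/value pairs (counts as one query).
  def where(**conditions)
    @queries += 1
    @rows.values.select { |row| conditions.all? { |col, val| row[col] == val } }
  end
end

POSTS_TABLE = Table.new
COMMENTS_TABLE = Table.new

# Hypothetical ActiveRecord-pattern class: one object per posts row.
class PostRecord
  attr_reader :id, :title

  def self.create(title:)
    id = POSTS_TABLE.insert(title: title)
    new(id: id, title: title)
  end

  def self.all
    POSTS_TABLE.where.map { |row| new(row) }
  end

  def initialize(row)
    @id = row[:id]
    @title = row[:title]
  end

  # Domain logic lives on the same object as the data.
  def headline
    title.upcase
  end

  # The object loads its own related rows on first access (lazy loading).
  def comments
    @comments ||= COMMENTS_TABLE.where(post_id: id)
  end
end

3.times { |i| PostRecord.create(title: "post #{i + 1}") }
COMMENTS_TABLE.insert(post_id: 1, body: 'hi')

before = POSTS_TABLE.queries + COMMENTS_TABLE.queries
PostRecord.all.each do |post|
  puts "#{post.headline}: #{post.comments.size} comment(s)"
end
after = POSTS_TABLE.queries + COMMENTS_TABLE.queries
puts after - before # prints 4: 1 query for the posts + 1 per post
```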
