Arel, Part III: Set Subtraction - Rails

This post is the third entry in our series on Arel. For Part I, see here. For Part II, see here.

In modern life science research, it's relatively common practice to work with large datasets. As time goes on and new information is gleaned from experiments or analytics, elements in that original set may be deemed irrelevant or uninteresting and are removed from the active dataset.

As developers, the notion of subtracting one set from another is a common one. In Ruby, for example, we have methods like Array#- and Set#subtract. Similar constructs exist in SQL, like the Postgres' EXCEPT clauses.

Unfortunately, ActiveRecord doesn't provide an interface for database set subtraction. Luckily for us, Arel does.

Let's say that you have the following models in your system:

class Post < ActiveRecord::Base
  has_many :posts_authors
  has_many :authors, through: :posts_authors
end

class PostsAuthor < ActiveRecord::Base
  belongs_to :post
  belongs_to :author
end

class Author < ActiveRecord::Base
  has_many :posts_authors
  has_many :posts, through: :posts_authors
end

Let's further assume that your client wants to be able to search posts for those that do not have one or more specific authors tied to them. The following query is trivial to write using ActiveRecord and might appear to be the correct solution.

SELECT posts.*
FROM posts
INNER JOIN posts_authors ON posts_authors.post_id = posts.id
INNER JOIN authors ON authors.id = posts_authors.author_id
WHERE authors.id NOT IN <given_author_ids>

However, imagine the situation where the post Rails is the best! has two authors: Emerson Huitt and Mike Simpson. If you search for posts not written by Emerson with the above query, you'll actually see Rails is the best! in your search results. Why? The row in the posts_authors join table that's tied to the post "Rails is the best!" and the author "Mike Simpson" satisfies the condition where the user is not Emerson Huitt, giving us a false positive in our results.

Luckily, there are a few different ways we can approach this problem. In this post, we'll be using set subtraction.

If we were writing raw SQL for, say, an Oracle database, we could do the following:

SELECT *
FROM posts

MINUS

SELECT *
FROM posts
INNER JOIN authors ON authors.post_id = posts.id
WHERE authors.id IN <given_author_ids>

Looking over the Arel documentation, it may not be immediately apparent how to accomplish this. Luckily, though, instances of Arel::SelectManager have the except method, which we can utilize to get our desired query.

First, we want to store the Arel tables of our two models as variables for convenience and reusability.

posts = Post.arel_table
authors = Author.arel_table

Next, we need to figure out how to get the ids of the posts that are tied to the author ids that we want to exclude.

unwanted_post_ids = Post
                      .joins(:authors)
                      .select(posts[:id])
                      .where(authors[:id].in(unwanted_author_ids))

Here, we're doing an INNER JOIN between the posts and authors tables and selecting posts.id for all posts tied to our unwanted authors. Now, we just need to subtract our unwanted posts from the rest of the rows in the posts table using except.

target_post_ids = posts
                    .project(posts[:id])
                    .except(unwanted_post_ids)

Now that we have our Arel set up, we just need to pass it along to ActiveRecord to get back our ActiveRecord::Relation of posts.

Post.where(
  posts[:id].in(target_post_ids)
)

VoilĂ ! We now have all of the posts in our database that have not been tied to our given authors packaged up into a neat ActiveRecord::Relation.

Learning More About Arel


This brings our series on Arel to a close. Hopefully, these last few posts have shown that Arel can be a hugely powerful asset in the Rails toolbox. If you'd like to learn more about the features of Arel not covered in these posts or the Arel README, I'd encourage you to dive into the source code itself. The tests give some example use cases while offering a relatively easy way to see some of the different methods that might be of use to you, such as lteq_any or does_not_match_all. If you haven't already been exposed to the visitor pattern, you may want to familiarize yourself with it first to have a better understanding of Arel's underlying structure.

Happy hacking!


Andy Andrea

Andy is a developer at SciMed Solutions.