Order of multi column index in rails polymorphic association

Asked 13/2, 2017 at 20:38 Answered 15/2, 2017 at 6:18

ruby-on-rails ruby activerecord polymorphic-associations

I have a table with a polymorphic reference that I generated via the following migration:

def change                  
  add_reference :table_name, :thing, polymorphic: true, index: true
end

When I ran the migration it generated the following:

add_index "workflow_engine_task_bases", ["thing_type", "thing_id"], name: "index_workflow_engine_task_bases_on_thing_type_and_thing_id", using: :btree

Why is the left-most column the thing_type? To me it seems that this is suboptimal as it is less specific.

Marquardt answered 13/2, 2017 at 20:38 Comment(0)

This is the commit in Rails by Derek Prior that updates add_reference to use type before id when generating an index for a polymorphic association. The justification for the change is reproduced below:

Use type column first in multi-column indexes

add_reference can very helpfully add a multi-column index when you use it to add a polymorphic reference. However, the first column in the index is the id column, which is less than ideal.

The [PostgreSQL docs][1] say:

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.

The [MySQL docs][2] say:

MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.

In a polymorphic relationship, the type column is much more likely to be useful as the first column in an index than the id column. That is, I'm more likely to query on type without an id than I am to query on id without a type.

[1]: http://www.postgresql.org/docs/9.3/static/indexes-multicolumn.html

[2]: http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html

I think in most scenarios, this order makes sense. It allows you to have a single index that performs well on queries including type and id or just type.

Having said that, your use case may vary depending on the database you use, your dataset, and what queries you plan to run. Your best bet is to profile your most common use cases on a production dump and choose your indexing strategy accordingly.

Birnbaum answered 15/2, 2017 at 6:18 Comment(1)

I'm not clear how to reconcile this with the advice given in Percona’s article, “Choosing Column Order in Indexes,” which is to prioritize the most selective columns (the ones that will most reduce the number of matches) -- in technical terms, the column with highest cardinality. Since the cardinality of thing_type is so low (both absolutely and relative to thing_id), then if we query exclusively by type the index wouldn’t be useful, AFAICT. Better to optimize querying by both. – Lymphoid 30/6, 2021 at 14:26

From http://apidock.com/rails/v4.2.7/ActiveRecord/ConnectionAdapters/SchemaStatements/add_reference.

In the source code for add_reference:

add_index(table_name, polymorphic ? [type id].map{ |t| "#{ref_name}_#{t}" } : "#{ref_name}_id", index_options.is_a?(Hash) ? index_options : {}) if index_options

It's just mapping from an array where type is ahead of id. Not sure why it was designed this way, but it might be better to just use add_index instead.

add_index(:table, [:thing_id, :thing_type])

Terrazas answered 14/2, 2017 at 0:50 Comment(0)

Recommended topics

Hot tags