Changing table name at query run time in a Rails application
Asked Answered
N

2

6

I have a fat multi-tenant Rails app on Apache + mod_passenger that outputs product prices from a PostgreSQL table as follows:

Table "public.products"
Column | Type
id     | bigint
name   | character varying(100)
price  | numeric(8,2)

Then inside products.rb I have...

class Product < PostgresDatabase
     self.table_name = "products"

     # ... yadda yadda

end

What I want is to partition the "products" table in a very specific way so that I end up with something like products_TENANT-ID for each tenant (basically views of the main products table but that's another story) and be able to query like this:

Products.for_tenant(TENANT-ID).where(:name => "My product")......

I figure I can just create a method:

class Product < PostgresDatabase
     self.table_name = "products"

     # ... yadda yadda
     def for_tenant(tid)
          self.table_name = "products_" + tid.to_s
          self
     end
end

But what kind of impact could this have on the application considering there's lots of traffic (thousands of requests per second)? Is there something I am missing? Should I try a different strategy?

Thank you very much for any feedback/thoughts!

Nursemaid answered 27/9, 2018 at 13:18 Comment(0)
S
12

The method

def self.for_tenant(tid)
  self.table_name = "products_" + tid.to_s
  self
end

makes sense, however, it has a side effect: it changes table name for Product class. When this class is used later in the same request, for example, in this way:

Product.where(name: "My other product") ...

the table name won't be products as you may expect; it will stay the same as changed by for_tenant method previously.

To avoid this ambiguity and keep code clean you can use another strategy:

1) Define a module which holds all logic of work with tenant partitions:

# app/models/concerns/partitionable.rb

module Partitionable
  def self.included(base)
    base.class_eval do
      def self.tenant_model(tid)
        partition_suffix = "_#{tid}"

        table = "#{table_name}#{partition_suffix}"

        exists = connection.select_one("SELECT EXISTS (SELECT 1 FROM pg_tables WHERE schemaname = 'public' AND tablename = '#{table}')")
        unless exists['exists'] == 't' || exists['exists'] == true  # different versions of pg gem give different answers
          return self # returning original model class
        end

        class_name = "#{name}#{partition_suffix}"

        model_class = Class.new(self)

        model_class.define_singleton_method(:table_name) do
          table
        end

        model_class.define_singleton_method(:name) do
          class_name
        end

        model_class
      end
    end
  end
end

2) Include this module in your model class:

class Product < PostgresDatabase
  include Partitionable

  ...
end

3) Use it the same way as you intended:

Product.tenant_model(TENANT_ID).where(name: "My product")...

What's happened there:

Method tenant_model(TENANT_ID) creates another model class for the tenant with ID TENANT_ID. This class has name Product_<TENANT_ID>, works with table products_<TENANT_ID> and inherits all methods of Product class. So it could be used like a regular model. And class Product itself remains untouched: its table_name is still products.

Socialistic answered 27/9, 2018 at 14:52 Comment(12)
Hi Ilya. Very nice answer, thank you for that. I do not mind if the table name stays changed within the same request since this same request comes from an user of a specific tenant (website). It will only become a problem if table_name gets set in one request and stays the same for other requests(users). I plan on refactoring all controllers to use Products.for_tenant(TENANT-ID) anyway. Should I still implement your module in this particular scenario, or can I just go with my original idea? Basically this reduces the question to determining the exact way self.table_name behaves vs requests.Nursemaid
You decide :) My work experience tells me that methods with side effects usually are not a good idea. If you don't like to use this module, and pretty sure that user request should work with one tenant only, there is another strategy: define a filter for you base controller, which will set table name for Product class, and use this class later without any additional methods. Thus all controllers inherited from that base controller will have Product class tuned for that particular tenant.Socialistic
Thanks Ilya, you are awesome! Too bad SO doesn't allow sending people beer!Nursemaid
Hope one day it will be implemented :) Thanks!Socialistic
Very doubtful, they're all too stiff. Someone should start a clone with more elaborate user incentive features. I even had this on a to-do list at some point.Nursemaid
Yup. Actually I use similar module in my project, but in my case I create a table in case it doesn't exist. Just a sec, I'll correct my answer.Socialistic
This answer should be in a hall of fame or something.Nursemaid
I did a "return self if !PostgresDatabase.connection.table_exists? table". In R5 it would be like data_source_exists I think.Nursemaid
Yes, it works too. I wrote that check in a bit more complex way since my code should work with different versions of Rails.Socialistic
Yours does an extra query per hit?Nursemaid
You mean 'select_one' which check if the table exists? Yes, I do. Otherwise there is no way to know if there is a table or not. To reduce these checks I call tenant_model method only once, store model class to a variable and use it later where required, with all where queries.Socialistic
Hey @Ilya :) I hope you are doing well! Just dropping another quick thank you for this answer, I've been using this in prod for almost 8 years and it works great. Thanks again!Nursemaid
E
1

This is an elegant solution, and I've been using it in some places for a while now, but it may be the root cause of some production issues I faced Friday. (This sucks...because I like it for DB performance improvements when using partitioned tables! It is faster in postgres in my experience to use partition_name.table_name.thing than to use table_name.thing where partition_key=X.)

For models that are referenced a lot, creating a new class is a relatively expensive operation. Still quick -- averaged 0.0016659964 per creation and an un-sent-to-do .find_by() versus the non-subclassed version which was 0.000468873 seconds. My theory, though, is that doing 100s of thousands of these led to significantly higher CPU load than normal.

Earthly answered 29/7 at 18:3 Comment(1)
You are right, it can get a bit on the expensive side. I haven't had any issues yet but I'm known for throwing money at infra rather than time at code :)Nursemaid

© 2022 - 2024 — McMap. All rights reserved.