Postgres Common Table Expression query with Ruby on Rails
Asked Answered
K

3

8

I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord doesn't support CTEs.

I have a table called user_activity_transitions which contains a series of records of a user activity being started and stopped (each row refers to a change of state: e.g started or stopped).

One user_activity_id might have a lot of couples started-stopped, which are in 2 different rows. It's also possible that there is only "started" if the activity is currently going on and hasn't been stopped. The sort_key starts at 0 with the first ever state and increments by 10 for each state change.

id      to_state     sort_key     user_activity_id    created_at
1       started      0            18                  2014-11-15 16:56:00
2       stopped      10           18                  2014-11-15 16:57:00
3       started      20           18                  2014-11-15 16:58:00
4       stopped      30           18                  2014-11-15 16:59:00
5       started      40           18                  2014-11-15 17:00:00

What I want is the following output, grouping couples of started-stopped together to be able to calculate duration etc.

user_activity_id     started_created_at      stopped_created_at
18                   2014-11-15 16:56:00     2014-11-15 16:57:00
18                   2014-11-15 16:58:00     2014-11-15 16:59:00
18                   2014-11-15 17:00:00     null

The way the table is implemented makes it much harder to run that query but much more flexible for future changes (e.g new intermediary states), so that's not going to be revised.

My Postgres query (and the associated code in Rails):

query = <<-SQL
    with started as (
    select 
        id,
        sort_key,
        user_activity_id,
        created_at as started_created_at
    from
        user_activity_transitions
    where  
        sort_key % 4 = 0
    ), stopped as (
    select 
        id,
        sort_key-10 as sort_key2,
        user_activity_id,
        created_at as stopped_created_at
    from
    user_activity_transitions
    where
        sort_key % 4 = 2
    )
    select
        started.user_activity_id AS user_activity_id,
        started.started_created_at AS started_created_at,
        stopped.stopped_created_at AS stopped_created_at
    FROM
        started
    left join stopped on stopped.sort_key2 = started.sort_key
    and stopped.user_activity_id = started.user_activity_id
SQL

results = ActiveRecord::Base.connection.execute(query)

What it does is "trick" SQL into joining 2 consecutive rows based on a modulus check on the sort key.

The query works fine. But using this raw AR call annoys me, especially since what connection.execute returns is quite messy. I basically need to loop through the results and put it in the right hash.

2 questions:

  1. Is there a way to get rid of the CTE and run the same query using Rails magic?
  2. If not, is there a better way to get the results I want in a nice-looking hash?

Bear in mind that I'm quite new to Rails and not a query expert so there might be an obvious improvement...

Thanks a lot!

Kabyle answered 15/12, 2014 at 19:46 Comment(2)
You can use postgres_ext to write CTEs with ActiveRecord see: github.com/dockyard/postgres_ext/blob/master/docs/…Froehlich
> If not, is there a better way to get the results I want in a nice-looking hash As it correctly Pospiszil wrote, you should rewrite the sql with arel.Atalya
G
3

I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord does support CTEs.

As far as I know ActiveRecord doesn't support CTE. Arel, which is used by AR under the hood, supports them, but they're not exposed to AR's interface.

Is there a way to get rid of the CTE and run the same query using Rails magic?

Not really. You could write it in AR's APIs but you'd just write the same SQL split into a few method calls.

If not, is there a better way to get the results I want in a nice-looking hash?

I tried to run the query and I'm getting the following which seems nice enough to me. Are you getting a different result?

[
  {"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:56:00", "stopped_created_at"=>"2014-11-15 16:57:00"},
  {"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:58:00", "stopped_created_at"=>"2014-11-15 16:59:00"},
  {"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 17:00:00", "stopped_created_at"=>nil}
]

I assume you have a model called UserActivityTransition you use for manipulating the data. You can use the model to get the results as well.

results = UserActivityTransition.find_by_sql(query)
results.size # => 3
results.first.started_created_at # => 2014-11-15 16:56:00 UTC

Note that these "virtual" attributes will not be visible when inspecting the result but they're there.

Geo answered 15/12, 2014 at 20:31 Comment(3)
Thanks - I misspelt "does support CTEs", I meant doesn't. Do you mean that I could use Arel in any way to rewrite the query? My array is slightly different from yours: I get duplicate results e.g {"user_activity_id"=>18, "started_created_at"=>"2014-11-15 16:56:50.136922", "stopped_created_at"=>"2014-11-15 16:57:02.552191", 0=>18, 1=>"2014-11-15 16:56:50.136922", 2=>"2014-11-15 16:57:02.552191"}. I can live with it but it looked strange to have twice the results.Kabyle
@Kabyle You could use Arel but honestly I don't think it would be much of an improvement as Arel can be quite verbose too and I generally find raw SQL more readable for longer queries. You can of course try, this should help. Did you try to use the model directly? It seems to me like the best way so far.Nightrider
Thanks a lot! I'll check your link but probably going to do as you advise and stick with the raw SQL.Kabyle
G
5

While Rails does not directly support CTEs, you can emulate a single CTE and still take advantage of ActiveRecord. Instead of a CTE, use a from subquery.

Thing
  .from(
    # Using a subquery in place of a single CTE
    Thing
      .select(
        '*',
        %{row_number() over(
            partition by
              this, that
            order by
              created_at desc
          ) as rank
        }
      )
    :things
  )
  .where(rank: 1)

This is not exactly the same as, but equivalent to...

with ranked_things as (
  select
    *,
    row_number() over(
      partition by
        this, that
      order by
        created_at desc
    ) as rank
)
select *
from ranked_things
where rank = 1
Gefell answered 31/3, 2020 at 23:8 Comment(0)
G
3

I'm trying to find the best way to do a Postgres query with Common Table Expressions in a Rails app, knowing that apparently ActiveRecord does support CTEs.

As far as I know ActiveRecord doesn't support CTE. Arel, which is used by AR under the hood, supports them, but they're not exposed to AR's interface.

Is there a way to get rid of the CTE and run the same query using Rails magic?

Not really. You could write it in AR's APIs but you'd just write the same SQL split into a few method calls.

If not, is there a better way to get the results I want in a nice-looking hash?

I tried to run the query and I'm getting the following which seems nice enough to me. Are you getting a different result?

[
  {"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:56:00", "stopped_created_at"=>"2014-11-15 16:57:00"},
  {"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 16:58:00", "stopped_created_at"=>"2014-11-15 16:59:00"},
  {"user_activity_id"=>"18", "started_created_at"=>"2014-11-15 17:00:00", "stopped_created_at"=>nil}
]

I assume you have a model called UserActivityTransition you use for manipulating the data. You can use the model to get the results as well.

results = UserActivityTransition.find_by_sql(query)
results.size # => 3
results.first.started_created_at # => 2014-11-15 16:56:00 UTC

Note that these "virtual" attributes will not be visible when inspecting the result but they're there.

Geo answered 15/12, 2014 at 20:31 Comment(3)
Thanks - I misspelt "does support CTEs", I meant doesn't. Do you mean that I could use Arel in any way to rewrite the query? My array is slightly different from yours: I get duplicate results e.g {"user_activity_id"=>18, "started_created_at"=>"2014-11-15 16:56:50.136922", "stopped_created_at"=>"2014-11-15 16:57:02.552191", 0=>18, 1=>"2014-11-15 16:56:50.136922", 2=>"2014-11-15 16:57:02.552191"}. I can live with it but it looked strange to have twice the results.Kabyle
@Kabyle You could use Arel but honestly I don't think it would be much of an improvement as Arel can be quite verbose too and I generally find raw SQL more readable for longer queries. You can of course try, this should help. Did you try to use the model directly? It seems to me like the best way so far.Nightrider
Thanks a lot! I'll check your link but probably going to do as you advise and stick with the raw SQL.Kabyle
L
0

Now it's possible to use with method. AR query looks like this:

results =
  UserActivityTransition
    .with(started: UserActivityTransition.select('id, sort_key, user_activity_id, created_at AS started_created_at').where('sort_key % 4 = 0'))
    .with(stopped: UserActivityTransition.select('id, sort_key-10 as sort_key2, user_activity_id, created_at AS stopped_created_at').where('sort_key % 4 = 2'))
    .select('started.user_activity_id AS user_activity_id, started.started_created_at AS started_created_at, stopped.stopped_created_at AS stopped_created_at')
    .from('started')
    .joins('LEFT JOIN stopped ON stopped.sort_key2 = started.sort_key AND stopped.user_activity_id = started.user_activity_id')

Let's verify that SQL is correct

puts results.to_sql

Output:

WITH "started" AS (SELECT id, sort_key, user_activity_id, created_at AS started_created_at FROM "user_activity_transitions" WHERE (sort_key % 4 = 0)), "stopped" AS (SELECT id, sort_key-10 as sort_key2, user_activity_id, created_at AS stopped_created_at FROM "user_activity_transitions" WHERE (sort_key % 4 = 2)) SELECT started.user_activity_id AS user_activity_id, started.started_created_at AS started_created_at, stopped.stopped_created_at AS stopped_created_at FROM started LEFT JOIN stopped ON stopped.sort_key2 = started.sort_key AND stopped.user_activity_id = started.user_activity_id

Now it's possible somehow to process data

results.each do |r|
  puts "#{r.user_activity_id} | #{r.started_created_at} | #{r.stopped_created_at}"
end

Output

18 | 2014-11-15 16:56:00 | 2014-11-15 16:57:00
18 | 2014-11-15 16:58:00 | 2014-11-15 16:59:00
18 | 2014-11-15 17:00:00 | 

This query from the question is large enough

More readable example of usage:

Book
  .with(books_with_reviews: Book.where("reviews_count > ?", 0))
  .with(books_with_ratings: Book.where("ratings_count > ?", 0))
  .joins("JOIN books_with_reviews ON books_with_reviews.id = books.id")

# WITH books_with_reviews AS (
#   SELECT * FROM books WHERE (reviews_count > 0)
# ), books_with_ratings AS (
#   SELECT * FROM books WHERE (ratings_count > 0)
# )
# SELECT * FROM books JOIN books_with_reviews ON books_with_reviews.id = books.id
Lustful answered 5/10, 2023 at 22:9 Comment(1)
with is only available in Rails 7.1 and laterPennebaker

© 2022 - 2025 — McMap. All rights reserved.