Generate CROSS JOIN queries with dbplyr
Asked Answered
B

1

6

Given 2 remote tables (simulated with tbl_lazy for this example)

library("dplyr")
library("dbplyr")

t1 <- tbl_lazy(df = iris, src = dbplyr::simulate_mysql())
t2 <- tbl_lazy(df = mtcars, src = dbplyr::simulate_mysql())

How can I perform an actual* cross join between t1 and t2 using R and dbplyr?

* i.e. using CROSS JOIN in the translated SQL query

Note that I know how to perform all the other types of joins, this is precisely about CROSS joins.

I am aware of the following trick:

joined <- t1 %>%
  mutate(tmp = 1) %>%
  full_join(mutate(t2, tmp = 1), by = "tmp") %>%
  select(-tmp)

However

  1. This is ugly (even if it could be hidden in a function)
  2. I would like to take advantage of the highly optimised join capabilities of the DB, so I'd like to pass a real SQL CROSS JOIN. Using show_query(joined) shows that the generated SQL query uses LEFT JOIN.

Sadly, there is no cross_join operator in dplyr and sql_join(t1, t2, type = "cross") does not work either (not implemented for tbls, works only on DB connections).

How can I generate an SQL CROSS JOIN with dbplyr?

Bushbuck answered 16/7, 2019 at 16:40 Comment(0)
O
5

According to the dbplyr NEWS file, since version 1.10, if you use a full_join(..., by = character()), it will "promote" the join to a cross join. This doesn't seem to be documented anywhere else yet, but searching the dbplyr Github repo for "cross" turned it up in both code and the NEWS file.

This syntax does not seem to work for local data frames, only via SQL.

Ortegal answered 16/7, 2019 at 16:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.