Function that composes functions with existing sql translations in dbplyr
Asked Answered
N

2

6

This question arises because I wish to make a function for my convenience:

as.numeric_psql <- function(x) {

   return(as.numeric(as.integer(x)))
}

to convert boolean values in a remote postgres table into numeric. The step to convert to integer is needed as:

There is no direct cast defined between numeric and boolean. You can use integer as middle-ground. (https://mcmap.net/q/758270/-cannot-cast-type-numeric-to-boolean)

Of course this function works as expected locally:

copy_to(con_psql, cars, 'tmp_cars')

tmp_cars_sdf <-
    tbl(con_psql, 'tmp_cars')


tmp_cars_sdf %>%
    mutate(low_dist = dist < 5) %>%
    mutate(low_dist = as.numeric(as.integer(low_dist)))

# # Source:   lazy query [?? x 3]
# # Database: postgres 9.5.3
#     speed  dist low_dist
#     <dbl> <dbl>    <dbl>
# 1     4     2        1
# 2     4    10        0
# 3     7     4        1
# 4     7    22        0
# 5     8    16        0

cars %>%
    mutate(low_dist = dist < 5) %>%
    mutate(low_dist = as.numeric_psql(low_dist)) %>%
    head(5)

#   speed dist low_dist
# 1     4    2        1
# 2     4   10        0
# 3     7    4        1
# 4     7   22        0
# 5     8   16        0

However, it doesn't work when used on the remote data frame, since as.numeric_psql is not in the list of sql translations, so is passed to the query verbatim:

> tmp_cars_sdf %>%
+     mutate(low_dist = dist < 5) %>%
+     mutate(low_dist = as.numeric_psql(low_dist))
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  syntax error at or near "as"
LINE 1: SELECT "speed", "dist", as.numeric_psql("low_dist") AS "low_...
                                ^
)

My question is whether there exist a easy way (i.e. not defining a custom sql translation) of getting dplyr to understand that the function as.numeric_psql is a composition of functions that have existing sql translations, and to use those translations instead.

Nadda answered 3/10, 2019 at 1:5 Comment(0)
V
2

As long as all functions you are composing have existing translations, the following should work:

  1. Wrap the function calls using dbplyr::translate_sql
  2. Embrace the variable in your function using {{ }}
  3. Unquote your new function call inside mutate with !! to force evaluation
library(DBI)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
con <- dbConnect(RSQLite::SQLite(), ":memory:")

cars <- tibble::tribble( ~ speed, ~ dist, ~ low_dist,
                         4,    2,        1,
                         4,   10,        0,
                         7,    4,        1,
                         7,   22,        0,
                         8,   16,        0)

dbWriteTable(con, "tmp_cars", cars)

tmp_cars_sdf <- tbl(con, "tmp_cars")

as.numeric_psql <- function(x) {
  return(as.numeric(as.integer(x)))
}

as.numeric_psql2 <- function(x) {
  dbplyr::translate_sql(
    as.numeric(
      as.integer({{x}})
      )
    )
}

tmp_cars_sdf %>%
  mutate(low_dist = dist < 5) %>%
  mutate(
    low_dist_old = as.numeric_psql(low_dist),
    low_dist_new = !!as.numeric_psql2(low_dist)
  ) %>% show_query()
#> <SQL>
#> SELECT `speed`, `dist`, `low_dist`, as.numeric_psql(`low_dist`) AS `low_dist_old`, CAST(CAST(`low_dist` AS INTEGER) AS NUMERIC) AS `low_dist_new`
#> FROM (SELECT `speed`, `dist`, `dist` < 5.0 AS `low_dist`
#> FROM `tmp_cars`)

Created on 2021-12-09 by the reprex package (v2.0.1)

Varela answered 10/12, 2021 at 0:53 Comment(0)
B
1

One way to avoid the error is to set up the function to operate on a data frame, rather than inside mutate. For example:

copy_to(con_psql, cars, 'tmp_cars')

tmp_cars_sdf <- tbl(con_psql, 'tmp_cars')

as.numeric_psql <- function(data, x) {
  return(data %>% mutate({{x}} := as.numeric(as.integer({{x}}))))
}

tmp_cars_sdf %>%
  mutate(low_dist = dist < 5) %>%
  as.numeric_psql(low_dist)

#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.30.1 [:memory:]
#>    speed  dist low_dist
#>    <dbl> <dbl>    <dbl>
#>  1     4     2        1
#>  2     4    10        0
#>  3     7     4        1
#>  4     7    22        0
#>  5     8    16        0
#>  6     9    10        0
#>  7    10    18        0
#>  8    10    26        0
#>  9    10    34        0
#> 10    11    17        0
#> # … with more rows

Note that in your example, in the database version low_dist already gets coded as integer when it's created, rather than as logical as it would be in a standard R data frame:

tmp_cars_sdf %>%
  mutate(low_dist = dist < 5) 
#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.30.1 [:memory:]
#>    speed  dist low_dist
#>    <dbl> <dbl>    <int>
#>  1     4     2        1
#>  2     4    10        0
#>  3     7     4        1
#>  4     7    22        0
#>  5     8    16        0
#>  6     9    10        0
#>  7    10    18        0
#>  8    10    26        0
#>  9    10    34        0
#> 10    11    17        0
#> # … with more rows

cars %>%
  mutate(low_dist = dist < 5) %>% head
#>   speed dist low_dist
#> 1     4    2     TRUE
#> 2     4   10    FALSE
#> 3     7    4     TRUE
#> 4     7   22    FALSE
#> 5     8   16    FALSE
#> 6     9   10    FALSE
Bunting answered 20/5, 2020 at 20:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.