Database calculations with dbplyr
Asked Answered
L

1

5

I have very simple problem that produces error. Example will clear this one.

library(odbc)
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(odbc(), "myDSN")

tbl_test <- tibble(ID = c("A", "A", "A", "B", "B", "B"),
                   val = c(1, 2, 3, 4, 5, 6),
                   cond = c("H", "H", "A", "A", "A", "H"))

dbWriteTable(con, "tbl_test", tbl_test, overwrite = TRUE)

After writing simple table to DB I add link to table in db and try to use simple conditional sums that work normally. But will face an error.

db_tbl <- tbl(con, in_schema("dbo", "tbl_test"))

db_tbl %>% 
  group_by(ID) %>% 
  summarise(sum = sum(val, na.rm = TRUE),
            count_cond = sum(cond == "H", na.rm=TRUE),
            sum_cond = sum(val == "H", na.rm=TRUE))

Error: <SQL> 'SELECT  TOP 10 "ID", SUM("val") AS "sum", SUM(CONVERT(BIT, IIF("cond" = 'H', 1.0, 0.0))) AS "count_cond", SUM(CONVERT(BIT, IIF("val" = 'H', 1.0, 0.0))) AS "sum_cond"
FROM dbo.tbl_test
GROUP BY "ID"'
  nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Operand data type bit is invalid for sum operator.

I'm no expert, but feels like SQL can't understand TRUE as 1 and for that reason can't calculate sums. Is there away around this, as alot of times I face some kind of conditions. Below is just code for normal tibble to show that they should work.

tbl_test %>% 
  group_by(ID) %>% 
  summarise(sum = sum(val),
            count_cond = sum(cond == "H"),
            sum_cond = sum(val[cond == "H"]))

# A tibble: 2 x 4
  ID      sum count_cond sum_cond
  <chr> <dbl>      <int>    <dbl>
1 A        6.          2       3.
2 B       15.          1       6.

I Understand that this might not be reproducible example, as not everyone have DB connection available.

Lutist answered 1/4, 2018 at 11:35 Comment(8)
you could use mutate and ifelse before summarize to create a column of 1s and 0s to sum on afterwardsClubman
here are the translations you want to get to https://stackoverflow.com/questions/582637/sql-server-equivalent-of-a-countif-aggregate-function, so use ifelse or casewhen, it seem you were right and you can't sum booleans with SQL ServerClubman
I can't test now but what does this return ? sum_cond = sum(ifelse(cond == "H",1,0),na.rm=TRUE)Clubman
That will give me number of conditions each group, so it works. but conditional sums with other columns still won't. I think it could be solved with filter and then join, but feels like there might be easier way as well. Ooh, just use ifelse to refer to val column. Thanks.Lutist
Do you mean that your example case would be solved by this but your real case is more complex that the one posted ?Clubman
db_tbl %>% group_by(ID) %>% summarise(sum = sum(val, na.rm=TRUE), count_cond = sum(ifelse(cond == "H",1,0),na.rm=TRUE), sum_cond = sum(ifelse(cond == "H",val,0),na.rm=TRUE)), I think this will do for now, using ifelse. Post this as answer and will approve it.Lutist
Normally you can replace any condition by ifelse(condition,1,0) and i think it should workClubman
what I posted few comment ago works and gives expected result.Lutist
C
7

SQL server can't sum booleans (it doesn't coerce TRUE to 1).

So you have to manually convert them, and one way is to use ifelse, your code becomes:

db_tbl %>%
  group_by(ID) %>% 
  summarise(sum = sum(val, na.rm=TRUE), 
            count_cond = sum(ifelse(cond == "H",1,0),na.rm=TRUE),
            sum_cond = sum(ifelse(cond == "H",val,0),na.rm=TRUE))
Clubman answered 1/4, 2018 at 12:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.