Unable to use rank() over functions in R using sqldf
Asked Answered
S

1

6
arm<-as.data.frame(matrix(c(1,1,1,2,2,6,7,4,9,10),ncol=2))

colnames(arm)<-c("a","b")

This is a dataset I created in R.

Now I want to rank the column b and group by column a.

The following piece of code is throwing this error, no matter what changes I make to the syntax(like adding [], "", etc...)

Error in sqliteSendQuery(con, statement, bind.data) : error in statement: near "(": syntax error

I was using "sqldf" package.

arm2<-sqldf("select a,
         b,
         rank() over (partition by a order by b) as rank1 
         from arm")

Then I installed the RH2 package and it started to throw the following error:

Error in .verify.JDBC.result(s, "Unable to execute JDBC statement ", statement) : Unable to execute JDBC statement select a, b, rank() over (partition by a order by b) as rank1 from arm (Function "rank" not found; SQL statement: select a, b, rank() over (partition by a order by b) as rank1 from arm [90022-175])

How to use rank() over function of sql in sqldf package of R?

Syllogize answered 2/9, 2015 at 22:44 Comment(7)
Downvote seems a bit harsh. The question outlines the package used, the data used, the code used, and the error messages. Considering some of the crap that gets upvoted here, it shows effort to explain the issue.Zahavi
rsqldf uses sqlite by default, which doesn't have a rank() function. I'm not familiar with RH2, but it seems like ti has the same problem. You can try postgresql following the examples from hereIatrogenic
I don't understand this question. Is this a bug report? If so, please report on GH. Or are you looking for a general solution? In that case, this error should be only relevant in order to show your initial effort, and thus you can use some other alternatives such as arm$rank1 <- unlist(with(arm, tapply(b, a, rank))), for example.Secrecy
@DavidArenburg - unlist(tapply(...)) would be better replaced by ave, so it is isn't dependent on the order of the dataset.Zahavi
@Zahavi yeah, you have a point thereSecrecy
Hi David, I do know how to do with other methods. But the thing I wanted here was "how to do it with sql". Thanks a lot jeremycyg and thelatemail. @DavidArenburgSyllogize
Thanks a lot @IatrogenicSyllogize
I
3

sqldf uses SQLite which does not support the rank() function - see here. From the error message you got from H2, it does not either, though it is currently planned.

sqldf has capability to use PostgreSQL rather than SQLite, which does support rank(): see here for an example. Your code as posted should then work.

If you don;t want to use PostgreSQL, you can get the data out in the right order with SQLite and sqldf using:

sqldf("select a, b from arm 
          order by a, b", drv = "SQLite")

but the ranking column is more difficult - see some related answers: 1, 2, 3

Since you are already in R, you could use dplyr, a native R package:

library(dplyr)
arm %>% group_by(a) %>%
        mutate(rank = rank(b))

Or data.table, a faster alternative:

library(data.table)
setDT(arm)[ , rank := rank(b), by = a]
Iatrogenic answered 3/9, 2015 at 0:3 Comment(1)
Note that if you really want to use SQLite then: a1 <- sqldf("select a, b from arm order by a, b"); min_rowid <- sqldf("select x.*, min(y.rowid) min_rowid from a1 x left join a1 y using(a) group by x.a"); sqldf("select a1.*, a1.rowid - min_rowid + 1 rank from a1 join min_rowid using(a)")Roemer

© 2022 - 2024 — McMap. All rights reserved.