NA values using sqldf
Asked Answered
C

3

5

If I try to get an average of c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10) using AVG from SQL, I get a value of 5.2, instead of the expected 6.5.

# prepare data and write to file
write.table(data.frame(col1 = c(NA, NA, 3:10)),
        "my.na.txt", row.names = FALSE)

mean(c(NA, NA, 3:10), na.rm = TRUE) # 6.5

my.na <- read.csv.sql("my.na.txt", sep = " ",
        sql = "SELECT AVG(col1) FROM file") # 5.2

# this is identical to
sum(3:10)/10

unlink("my.na.txt") # remove file

Which leads me to believe that sql(df) treats NA values as zero. Is it possible to ignore (exclude) NA values in an SQL call as it can be done using na.rm argument (in R)?

Colettecoleus answered 14/1, 2012 at 0:42 Comment(1)
?sqldf says: "The dbWriteTable/sqliteImportFile routines that sqldf uses to transfer files to the data base are intended for speed and they are not as flexible as read.table." Although perhaps not the ideal solution, the filter argument of read.csv.sql can be used to filter out the NA lines before they are read into SQLite. There are some examples of the filter argument in ?sqldf and on the sqldf home page: sqldf.googlecode.com .Gama
C
7

Modify your query to ignore the NA values:

SELECT AVG(col1)
FROM file
WHERE col1 IS NOT \"NA\"
Classicist answered 14/1, 2012 at 0:45 Comment(3)
Hum, I'm using SELECT AVG(col1) FROM file WHERE col1 IS NOT NULL and still get 5.2.Caudal
are the values in the DB N/A or NULL? (select col1 from file where col1 is not null) [do the NA's show up?] or (Select Col1 from file where col1 <> 'NA' [do the NA's show up?]Depredation
Using SELECT AVG(col1) FROM file WHERE col1 IS NOT \"NA\" works fine and dandy. The trick is that NAs are quoted, and you need to escape them. Thank you both for the tip.Caudal
P
2

The problem is that the read.csv.sql function does not recognize the missing values, and converts them to zero, instead of NULL. This does not happen if you first load the data into a data.frame, and only then call sqldf.

d <- read.csv("my.na.txt")
sqldf("SELECT AVG(col1) FROM d") # 6.5
Pardon answered 14/1, 2012 at 1:0 Comment(1)
Loading the data into a data.frame is out of the question in my case.Caudal
O
0

For future reference. I found this thread when researching the issue, a solution can be found in the definition of NA. When transferred to SQLite, NAs are converted to NULL, and then you can use IS NULL or IS NOT NULL to exclude them.

Obeng answered 21/4, 2021 at 12:17 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.