If I try to get an average of c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10)
using AVG
from SQL, I get a value of 5.2, instead of the expected 6.5.
# prepare data and write to file
write.table(data.frame(col1 = c(NA, NA, 3:10)),
"my.na.txt", row.names = FALSE)
mean(c(NA, NA, 3:10), na.rm = TRUE) # 6.5
my.na <- read.csv.sql("my.na.txt", sep = " ",
sql = "SELECT AVG(col1) FROM file") # 5.2
# this is identical to
sum(3:10)/10
unlink("my.na.txt") # remove file
Which leads me to believe that sql(df) treats NA values as zero. Is it possible to ignore (exclude) NA values in an SQL call as it can be done using na.rm
argument (in R)?
?sqldf
says: "ThedbWriteTable/sqliteImportFile
routines thatsqldf
uses to transfer files to the data base are intended for speed and they are not as flexible asread.table
." Although perhaps not the ideal solution, thefilter
argument ofread.csv.sql
can be used to filter out theNA
lines before they are read into SQLite. There are some examples of thefilter
argument in?sqldf
and on the sqldf home page: sqldf.googlecode.com . – Gama