There is a solution to read SPSS data file in R by ODBC driver.
1) There is a IBM SPSS Statistics Data File Driver. I could not find the download link. I got it from my SPSS provider. The Standalone Driver is all you need. You do not need SPSS to install or use the driver.
2) Create a DSN for the SPSS data driver.
3) Using RODBC
package you can read in R any SPSS data file. It will be possible to get value labels for each variable as separate tables. Then it is possible to use the labels in R in any way as you wish.
Here is a working example on Windows (I do not have SPSS on my computer now) to read in R your example data file. I have not testted this on Linux. It probably works also on Linux, because there is a SPSS data driver also for Linux.
require(RODBC)
# Create connection
# Change the DSN name and CP_CONNECT_STRING according to your setting
con <- odbcDriverConnect("DSN=spss_ehsis;SDSN=SAVDB;HST=C:\\Program Files\\IBM\\SPSS\\StatisticsDataFileDriver\\20\\Standalone\\cfg\\oadm.ini;PRT=StatisticsSAVDriverStandalone;CP_CONNECT_STRING=C:\\temp\\data_expt.sav")
# List of tables
Tables <- sqlTables(con)
Tables
# List of table names to extract
table.names <- Tables$TABLE_NAME[Tables$TABLE_SCHEM != "SYSTEM"]
# Function to query a table by name
sqlQuery.tab.name <- function(table) {
sqlQuery(con, paste0("SELECT * FROM [", table, "]"))
}
# Retrieve all tables
Data <- lapply(table.names, sqlQuery.tab.name)
# See the data
lapply(Data, head)
# Close connection
close(con)
For example we can that value labels are defined for two variables:
[[5]]
VAR00002 VAR00002_label
1 1 Male
2 2 Female
[[6]]
VAR00003 VAR00003_label
1 2 Student
2 3 Employed
3 4 Unemployed
Additional information
Here is a function that allows to read SPSS data after the connection has been made to the SPSS data file. The function allows to specify the list of variables to be selected. If value.labels=T
the selected variables with value labels in SPSS data file are converted to the R factors with labels attached.
I have to say I am not satisfied with the performance of this solution. It work good for small data files. The RAM limit is reached quite often for large SPSS data files (even the subset of variables is selected).
get.spss <- function(channel, variables = NULL, value.labels = F) {
VarNames <- sqlQuery(channel = channel,
query = "SELECT VarName FROM [Variables]", as.is = T)$VarName
if (is.null(variables)) variables <- VarNames else {
if (any(!variables %in% VarNames)) stop("Wrong variable names")
}
if (value.labels) {
ValueLabelTableName <- sqlQuery(channel = channel,
query = "SELECT VarName FROM [Variables]
WHERE ValueLabelTableName is not null",
as.is = T)$VarName
ValueLabelTableName <- intersect(variables, ValueLabelTableName)
}
variables <- paste(variables, collapse = ", ")
data <- sqlQuery(channel = channel,
query = paste("SELECT", variables, "FROM [Cases]"),
as.is = T)
if (value.labels) {
for (var in ValueLabelTableName) {
VL <- sqlQuery(channel = channel,
query = paste0("SELECT * FROM [VLVAR", var,"]"),
as.is = T)
data[, var] <- factor(data[, var], levels = VL[, 1], labels = VL[, 2])
}
}
return(data)
}
read.spss
, "There may be attributes "label.table" and "variable.labels". Attribute "label.table" is a named list of value labels with one element per variable, either NULL or a names character vector. Attribute "variable.labels" is a named character vector with names the short variable names and elements the long names." – Necolenecrospss.get()
in theHmisc
package – Interinsurancespss.get()
fromHmisc
package internally usesread.spss()
, so not useful. – Soembaforeign::read.spss("C:/temp/data_expt.sav")
converts the variables with value labels in SPSS data to R factors. I did not know that. – Bullaas.numeric()
andas.character()
on a factor). And this becomes a wrong statement "gives me option to retain either values OR value labels". – Bullaas.numeric(factor(4:6)) == 1:3
. – Bulla