Read SPSS file into R

I am trying to learn R and want to bring in an SPSS file, which I can open in SPSS.

I have tried using read.spss from foreign and spss.get from Hmisc. Both give the same error message.

Here is my code:

## install.packages("Hmisc")
library(foreign)

## change the working directory
getwd()
setwd('C:/Documents and Settings/BTIBERT/Desktop/')

## load in the file
## ?read.spss
asq <- read.spss('ASQ2010.sav', to.data.frame=T)

And the resulting error:

Error in read.spss("ASQ2010.sav", to.data.frame = T) : error reading system-file header
In addition: Warning message:
In read.spss("ASQ2010.sav", to.data.frame = T) :
  ASQ2010.sav: position 0: character `\000' (

I also tried saving the file as an SPSS 7 .sav file (I was previously using SPSS 18), which gave these warnings:

Warning messages:
1: In read.spss("ASQ2010_test.sav", to.data.frame = T) :
  ASQ2010_test.sav: Unrecognized record type 7, subtype 14 encountered in system file
2: In read.spss("ASQ2010_test.sav", to.data.frame = T) :
  ASQ2010_test.sav: Unrecognized record type 7, subtype 18 encountered in system file

Villanelle answered 28/6, 2010 at 21:30
Regarding the last effort: it was only a warning, not an error, so you should have gotten useful results. – Piliform

I had a similar issue and solved it by following a hint in the read.spss help page. Using the memisc package instead, you can import a portable SPSS file like this:

library(memisc)
data <- as.data.set(spss.portable.file("filename.por"))

Similarly, for .sav files:

data <- as.data.set(spss.system.file('filename.sav'))

although in this case I seem to lose some string values, whereas the portable import works seamlessly. The help page for spss.portable.file claims:

The importer mechanism is more flexible and extensible than read.spss and read.dta of package "foreign", as most of the parsing of the file headers is done in R. They are also adapted to load efficiently large data sets. Most importantly, importer objects support the labels, missing.values, and descriptions, provided by this package.

Murmansk answered 14/9, 2012 at 15:10
Is there something similar to convert.factors = FALSE or use.value.labels = FALSE available for this? – Askari
"I had a similar issue and solved it following a hint in read.spss help": What was the hint in read.spss that helped you solve the issue? – Charliecharline
@Charliecharline Simply this, in the See Also section: "A different interface also based on the PSPP codebase is available in package memisc: see its help for spss.system.file." – Murmansk

read.spss seems to be a little outdated, so I used a package called memisc.

To get this to work:

install.packages("memisc")
library(memisc)
data <- as.data.set(spss.system.file('yourfile.sav'))
Uranography answered 29/9, 2013 at 7:11
I am more familiar with data.frames, so I wrapped it: data <- data.frame(as.data.set(spss.system.file('yourfile.sav'))) – Corky

You may also try this:

setwd("C:/Users/rest of your path")

library(haven)
data <- read_sav("data.sav")

And if you want to read all .sav files from one folder:

temp <- list.files(pattern = "\\.sav$")
read.all <- sapply(temp, read_sav, simplify = FALSE)
Quench answered 19/1, 2016 at 13:12
What if you want to read all files from a folder whose names start with a specific prefix? – Veratridine
You can try this, for example, to keep only the files that start with "Session1", "Session2" or "Session3": temp <- temp[grepl("^Session(1|2|3)", temp)]. Put it just before read.all. – Quench
Can I do something like temp <- list.files(pattern = "Session(1|2|3)*.sav") followed by read.all <- sapply(temp, read_sav)? – Veratridine
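The pattern in the last comment may match more than intended: list.files() takes a regular expression, not a shell wildcard, so `*` means "zero or more of the previous item" and `.` matches any character. A minimal sketch of the prefix filtering, assuming haven is installed; read_session_files is a hypothetical helper name, and the Session1/2/3 prefixes come from the comments above. If you prefer thinking in wildcards, glob2rx() from base R's utils package converts a pattern such as Session*.sav into a regular expression.

```r
# Hypothetical helper: read every .sav file in `dir` whose name starts with
# "Session1", "Session2" or "Session3". `reader` defaults to haven::read_sav,
# but any function that takes a file path works.
read_session_files <- function(dir, reader = haven::read_sav) {
  # ^ and $ anchor the whole file name, [123] is one of those digits,
  # and \\. is a literal dot
  files <- list.files(dir, pattern = "^Session[123].*\\.sav$", full.names = TRUE)
  # simplify = FALSE returns a list of data frames named by file path
  sapply(files, reader, simplify = FALSE)
}

# glob2rx() turns a wildcard pattern into the equivalent regular expression
rx <- glob2rx("Session*.sav")
```

Using `sapply(..., simplify = FALSE)` rather than plain `sapply()` matters here: when every file has the same number of columns, plain `sapply()` tries to simplify the results into a matrix instead of returning a list of data frames.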

I know this post is old, but I also had problems loading a Qualtrics SPSS file into R. R's read.spss code came from PSPP a long time ago, and hasn't been updated in a while. (And Hmisc's code uses read.spss(), too, so no luck there.)

The good news is that PSPP 0.6.1 should read the files fine, as long as you specify a "String Width" of "Short - 255 (SPSS 12.0 and earlier)" on the "Download Data" page in Qualtrics. Read it into PSPP, save a new copy, and you should be in business. Awkward, but free.

Vacuva answered 11/12, 2010 at 3:35
Regarding the Qualtrics import issue (which is why I ended up here): I wrote the following functions to import CSV output from Qualtrics into R: gist.github.com/jeromyanglim/8b3e55cb06628fee9776ae897fe987e9 – Chiller

You can read the SPSS file in R using the solutions above or the one you are currently using. Just make sure the command is given a file it can actually read. I had the same error, and the problem was that SPSS could not access the file. Make sure the file path is correct, the file is accessible, and it is in the correct format.
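A minimal sketch of those checks in base R, assuming the standard .sav layout in which a system file begins with the 4-byte tag $FL2 ($FL3 for zlib-compressed .zsav files). check_sav_file is a hypothetical helper, not part of foreign or haven. A file whose first byte is \000, as in the question's error message, would fail the header check.

```r
# Hypothetical helper: sanity checks to run before read.spss().
# Assumption: SPSS system files begin with the bytes "$FL2"
# ("$FL3" for zlib-compressed .zsav files).
check_sav_file <- function(path) {
  if (!file.exists(path)) stop("file not found: ", path)
  # file.access() returns 0 when the permission (mode 4 = read) is granted
  if (file.access(path, mode = 4) != 0) stop("file is not readable: ", path)

  con <- file(path, open = "rb")
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 4)

  # Compare raw bytes rather than strings: a leading \000 byte cannot be
  # converted to a character string
  known <- list(charToRaw("$FL2"), charToRaw("$FL3"))
  if (!any(vapply(known, identical, logical(1), magic))) {
    stop("no SPSS system-file header; first bytes are: ",
         paste(sprintf("%02x", as.integer(magic)), collapse = " "))
  }
  invisible(TRUE)
}

# Usage, with the file name from the question:
# check_sav_file("ASQ2010.sav")
# asq <- foreign::read.spss("ASQ2010.sav", to.data.frame = TRUE)
```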

library(foreign)
asq <- read.spss('ASQ2010.sav', to.data.frame=TRUE)

As for the warning messages, they do not affect the data. Record type 7 holds extension records that newer versions of SPSS write to store additional features while keeping the file readable by older software; a reader that does not recognize a subtype can skip it. I have done this numerous times without losing any data.
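If the record type 7 warnings are just noise in your output, one option is to silence only those and let every other warning through. A minimal sketch using base R's withCallingHandlers(); muffle_record7 is a hypothetical name, and matching on the message text assumes the warning wording shown in the question.

```r
# Hypothetical wrapper: evaluate `expr` (e.g. a read.spss() call) and muffle only
# the "Unrecognized record type 7" warnings; any other warning still propagates.
muffle_record7 <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("Unrecognized record type 7", conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Usage (assumes the foreign package and the file from the question):
# asq <- muffle_record7(foreign::read.spss("ASQ2010_test.sav", to.data.frame = TRUE))
```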

You can also read about this at http://r.789695.n4.nabble.com/read-spss-warning-message-Unrecognized-record-type-7-subtype-18-encountered-in-system-file-td3000775.html#a3007945

Charliecharline answered 21/9, 2014 at 20:6

It looks like the R read.spss implementation is incomplete or broken, although R 2.10.1 does better than R 2.8.1. It appears that R gets upset about custom attributes in a .sav file, even with 2.10.1 (the latest I have). R also may not understand the character encoding field in the file, and in particular it probably does not work with SPSS Unicode files.

You might try opening the file in SPSS, deleting any custom attributes, and resaving the file. You can see whether there are custom attributes with the SPSS command

display attributes.

If so, delete them (see VARIABLE ATTRIBUTE and DATAFILE ATTRIBUTE commands), and try again.

HTH, Jon Peck

Fleetwood answered 28/6, 2010 at 23:2
I ran display attributes. in a syntax file and got a warning telling me there are no attributes to display. Some of the string fields have a length of 2000; should that matter? – Villanelle
Hi Jon - I tried deleting the attributes with the following syntax. There were a few; I removed them, but still no luck:
*//// Display the custom attributes.
DISPLAY ATTRIBUTES.
*//// DELETE all for every variable with custom attributes.
VARIABLE ATTRIBUTE VARIABLES=GameID TO OUTCOME_DIFF DELETE=$ODBC.Name.
VARIABLE ATTRIBUTE VARIABLES=GameID TO OUTCOME_DIFF DELETE=$ODBC.Table.
VARIABLE ATTRIBUTE VARIABLES=GameID TO OUTCOME_DIFF DELETE=$ODBC.Size.
VARIABLE ATTRIBUTE VARIABLES=GameID TO OUTCOME_DIFF DELETE=$ODBC.Type.
*//// Confirm that they are removed and manually save.
DISPLAY ATTRIBUTES.
– Villanelle

If you have access to SPSS, save the file as .csv and then import it with read.csv or read.table. I can't recall any problems importing .sav files; so far both read.spss and spss.get have worked like a charm for me. I doubt spss.get will give different results, since it depends on foreign::read.spss.
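A sketch of the import side of the CSV route, assuming the file was exported from SPSS as comma-separated values with variable names in the first row and missing values written as empty cells; read_spss_csv is a hypothetical helper name. A CSV file carries no SPSS metadata, so value labels, variable labels and user-missing definitions do not survive this route.

```r
# Hypothetical helper for a CSV file exported from SPSS.
# Assumptions: comma separator, variable names in the first row,
# missing values written as empty cells.
read_spss_csv <- function(path) {
  read.csv(path,
           header = TRUE,
           na.strings = "",           # empty cells become NA
           stringsAsFactors = FALSE,  # keep text columns as character vectors
           check.names = FALSE)       # keep the SPSS variable names unchanged
}

# Usage (file name is hypothetical):
# asq <- read_spss_csv("ASQ2010.csv")
```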

Can you provide some info on your SPSS/R/Hmisc/foreign versions?

Lavonda answered 28/6, 2010 at 23:38
2.10.1 - updated all packages. I am getting the SPSS file from other software. – Villanelle
Which one, if I may ask? I'm not interested in the particular SPSS version per se; I just need to know whether it is an SPSS or a PASW file... Maybe it has something to do with changes made in the latest releases... Dunno... =( – Lavonda
PASW - created by the survey software Qualtrics and modified/saved in PASW 18. – Villanelle
Try saving the file in an SPSS-compatible version, if applicable, then import it with read.spss. BTW, you can always export it as a plain text (ASCII) file. – Lavonda
It seems that there's no support for PASW files in R whatsoever... You're not alone: mail-archive.com/[email protected]/msg04648.html – Lavonda

Another solution not mentioned here is to read SPSS data in R via ODBC. You need:

  1. The IBM SPSS Statistics Data File Driver (the standalone driver is enough).
  2. The RODBC package in R, to import the SPSS data.

See the example here. However, I have to admit that there could be problems with very large data files.
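A sketch of the RODBC side, assuming the driver is installed and an ODBC data source (DSN) pointing at the .sav file has been configured. The DSN name, the table name the driver exposes, and the helper name read_spss_odbc are all assumptions that depend on that setup. RODBC::odbcConnect() returns -1 rather than raising an error when the connection fails, hence the explicit check.

```r
# Sketch only: assumes the IBM SPSS Statistics Data File Driver is installed and
# an ODBC data source (DSN) for the .sav file has been configured.
read_spss_odbc <- function(dsn, table = NULL) {
  ch <- RODBC::odbcConnect(dsn)
  # odbcConnect() returns -1 (with a warning) instead of an "RODBC" handle on failure
  if (!inherits(ch, "RODBC")) stop("could not connect to ODBC data source: ", dsn)
  on.exit(RODBC::odbcClose(ch))
  # If no table name is given, use the first user table the driver reports
  if (is.null(table)) table <- RODBC::sqlTables(ch, tableType = "TABLE")$TABLE_NAME[1]
  RODBC::sqlFetch(ch, table)
}

# Usage (the DSN name "ASQ2010" is hypothetical):
# asq <- read_spss_odbc("ASQ2010")
```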

Attractive answered 3/3, 2013 at 8:42

For me it works well using memisc!

install.packages("memisc")
library(memisc)
Daten.Februar <- as.data.set(spss.system.file("NPS_Februar_15_Daten.sav"))
names(Daten.Februar)
Treadle answered 22/9, 2015 at 10:10

I agree with @SDahm that the haven package is the way to go. I struggled a bit with string values myself when I started using it, so I thought I'd share my approach to that here too.

The "semantics" vignette has some useful information on this topic.

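library(tidyverse)
library(haven)

# Some interesting information in here
vignette('semantics', package = 'haven')

# Get data from spss file
df <- read_sav(path_to_file)

# get variable labels first (columns without one keep their name)
var_labels <- map_chr(.x = df, .f = function(x) {
  lab <- attr(x, 'label')
  if (is.null(lab)) NA_character_ else lab})
var_labels <- ifelse(is.na(var_labels), names(df), var_labels)

# get value labels: every labelled column becomes a factor
df <- as_factor(df, only_labelled = TRUE)

# get column names
colnames(df) <- var_labels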
Carbohydrate answered 3/4, 2018 at 17:0
I really like your method @Carbohydrate of getting the labels for the levels of the categorical variables, but to get mine to work I had to change "labelled" to "haven_labelled" and as_factor to labelled::to_factor. Also, for assigning the column names, what does spss_file refer to in your code? – Medea

There is no such problem with the packages you are using. The only requirement for reading an SPSS file is to save it in PORTABLE format. SPSS system files have the *.sav extension; you need to convert your SPSS file into a portable file with the *.por extension.

There is more info in http://www.statmethods.net/input/importingdata.html

Hardigg answered 28/12, 2011 at 1:53

In my case this warning came with a new variable appearing before the first column of my data (with values -100, 2, 2, 2, ...), a shift in the correspondence between labels and values, and the deletion of the last variable. A solution that worked was to use SPSS to create a new dummy variable in the last column of the file, fill it with random values, and then run the following code (filename is the path to the .sav file; in my case the original SPSS file had 62 columns, thus 63 with the additional dummy variable):

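library(memisc)
data <- as.data.set(spss.system.file(filename))

# Shift every variable name one column to the right (column i gets the
# name of column i-1), then drop the spurious first column
copyofdata = data
for(i in 2:63){
  names(data)[i] <- names(copyofdata)[i-1]
}
data[[1]] <- NULL

# Shift the value labels the same way and clear those of the first column
newcopyofdata = data
for(i in 2:62){
  labels(data[[i]]) <- labels(newcopyofdata[[i-1]])
}
labels(data[[1]]) <- NULL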

Hope the above code will help someone else.
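The name realignment in that code does not depend on memisc. A minimal base-R sketch of the same step for a plain data.frame, assuming exactly one spurious leading column and one trailing dummy column as described above; realign_shifted is a hypothetical name, and unlike the memisc version it does not touch value labels. Working from ncol() instead of the hard-coded 62/63 lets it handle files of any width.

```r
# Hypothetical helper: drop the spurious first column and shift every
# variable name one position to the right, so the name of the trailing
# dummy variable is the one that gets discarded.
realign_shifted <- function(df) {
  fixed <- df[-1]                        # columns 2..n hold the real data
  names(fixed) <- names(df)[-ncol(df)]   # reuse names 1..n-1
  fixed
}
```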

Claud answered 19/1, 2015 at 6:58

Turn off Unicode mode in SPSS.

Open SPSS without any data loaded and run the following in the syntax editor:

SET UNICODE OFF.

Then open the data set and save it again, so that it is no longer stored as Unicode.

After that, read.spss('yourdata.sav', to.data.frame = TRUE) works correctly.

Topliffe answered 15/8, 2016 at 10:59

I just came across an SPSS file that I couldn't open using haven, foreign, or memisc, but readspss::read.por did the trick for me:

download.file("http://www.tcd.ie/Political_Science/elections/IMSgeneral92.zip",
              "IMSgeneral92.zip")

unzip("IMSgeneral92.zip", exdir = "IMSgeneral92")

# rio, haven, foreign, memisc pkgs don't work on this file! But readspss does:
if(!require(readspss)) remotes::install_git("https://github.com/JanMarvin/readspss.git")
ims92 <- readspss::read.por("IMSgeneral92/IMS_Nov7 92.por", convert.factors = FALSE)

Nice! Thanks, @JanMarvin!

Shandy answered 25/3, 2021 at 21:1

1)

I've found the program Stat/Transfer useful for importing SPSS and Stata files into R.

It resolves the issue you mention by converting the SPSS file into an R dataset. It is also very useful for splitting very large datasets into smaller portions that R can handle. It is not free, but it is a very useful tool for working with datasets from different programs, especially if you don't have access to those programs.

2)

The memisc package also has an SPSS import function worth trying.

Cranwell answered 29/6, 2010 at 0:1

© 2022 - 2024 — McMap. All rights reserved.