How to convert sas7bdat file to csv?
Asked Answered
W

6

12

I want to convert a .sas7bdat file to a .csv/txt format so that I can upload it into a hive table. I'm receiving the .sas7bdat file from an outside server and do not have SAS on my machine.

Winslow answered 23/10, 2014 at 16:17 Comment(3)
What have you done so far?Easily
It's very difficult to retrieve the data from a sas7bdat file without having SAS installed on your machine. Can you get the data in a different format, or transfer it to a computer or server that does have SAS installed?Deathful
This isn't possible without a tool of some sort. SAS7BDAT is a closed format, and only a few people have reverse engineered it.Alansen
T
15

Use one of the R foreign packages to read the file and then convert to CSV with that tool.

http://cran.r-project.org/doc/manuals/R-data.pdf Pg 12

Using the SAS7BDAT package instead. It appears to ignore custom formatted, reading the underlying data.

In SAS:

proc format;
value agegrp
   low - 12 = 'Pre Teen'
   13 -15 = 'Teen'
   16 - high = 'Driver';
run;

libname test 'Z:\Consulting\SAS Programs';

data test.class;
set sashelp.class;
age2=age;
format age2 agegrp.;
run;

In R:

 install.packages(sas7bdat)
 library(sas7bdat)
 x<-read.sas7bdat("class.sas7bdat", debug=TRUE)
 x  
Thatcher answered 23/10, 2014 at 18:49 Comment(3)
What happens to custom-formatted variables in the imported SAS dataset when using this approach? Does R just see the underlying values?Orvieto
Yes, modified the answer above to demonstrate that.Thatcher
github.com/hadley/haven is now a much faster alternative to sas7bdat packageDiphyllous
S
9

The python package sas7bdat, available here, includes a library for reading sas7bdat files:

from sas7bdat import SAS7BDAT
with SAS7BDAT('foo.sas7bdat') as f:
    for row in f:
        print row

and a command-line program requiring no programming

$ sas7bdat_to_csv in.sas7bdat out.csv
Sinasinai answered 11/9, 2017 at 21:11 Comment(1)
The dtype information is lost with this (the metadata/header like type, length, label, etc.). All numbers show up as floats.Theatrics
D
4

I recently wrote this package that allows you convert sas7bdat to csv using Hadoop/Spark. It's able to split giant sas7bdat file thus achieving high parallelism. The parsing also uses parso as suggested by @Ashpreet

https://github.com/saurfang/spark-sas7bdat

Diphyllous answered 14/5, 2015 at 21:50 Comment(0)
O
2

If this is a one-off, you can download the SAS system viewer for free from here (after registering for an account, which is also free):

http://support.sas.com/downloads/package.htm?pid=176

You can then open the sas dataset using the viewer and save it as a csv file. There is no CLI as far as I can tell, but if you really wanted to you could probably write an autohotkey script or similar to convert SAS datasets to csv.

It is also possible to use the SAS provider for OLE DB to read SAS datasets without actually having SAS installed, and that's available here:

http://support.sas.com/downloads/browse.htm?fil=0&cat=64

However, this is rather complicated - some documentation is available here if you want to get an idea:

http://support.sas.com/documentation/cdl/en/oledbpr/59558/PDF/default/oledbpr.pdf

Orvieto answered 23/10, 2014 at 21:7 Comment(2)
Here is a description on how to view data using powershell, so I would think it is possible to use the same approach to export to CSV: blogs.sas.com/content/sasdummy/2012/04/12/…Rheo
Thanks, this helped me test the data, as I could not originally view the sas fileWinslow
W
2

Thanks for your help. I ended us using the parso utility in java and it worked like a charm. The utility returns the rows as object arrays which i wrote into a text file.

I referred to the utility from: http://lifescience.opensource.epam.com/parso.html

Winslow answered 9/11, 2014 at 8:43 Comment(0)
P
0

For the sake of completion and complementing Andrew's Python answer, pandas also allows you to read SAS files using the function read_sas. You just need to pass the proper parameters (including the encoding), for example:

import pandas as pd
import pathlib

path_file = pathlib.Path('..', 'data', 'data_file.sas7bdat')

df = pd.read_sas(path_file, format='sas7bdat', encoding='utf-8')

Afterwards, you can save the DataFrame to pretty much any format you wish, including .csv.

I have never had issues with this approach, but be aware that some users have reported some bugs with it.

Pagoda answered 16/4 at 13:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.