I want to convert a .sas7bdat file to a .csv/txt format so that I can upload it into a hive table. I'm receiving the .sas7bdat file from an outside server and do not have SAS on my machine.
Use one of the R foreign packages to read the file and then convert to CSV with that tool.
http://cran.r-project.org/doc/manuals/R-data.pdf Pg 12
Using the SAS7BDAT package instead. It appears to ignore custom formatted, reading the underlying data.
In SAS:
proc format;
value agegrp
low - 12 = 'Pre Teen'
13 -15 = 'Teen'
16 - high = 'Driver';
run;
libname test 'Z:\Consulting\SAS Programs';
data test.class;
set sashelp.class;
age2=age;
format age2 agegrp.;
run;
In R:
install.packages(sas7bdat)
library(sas7bdat)
x<-read.sas7bdat("class.sas7bdat", debug=TRUE)
x
The python package sas7bdat
, available here, includes a library for reading sas7bdat files:
from sas7bdat import SAS7BDAT
with SAS7BDAT('foo.sas7bdat') as f:
for row in f:
print row
and a command-line program requiring no programming
$ sas7bdat_to_csv in.sas7bdat out.csv
I recently wrote this package that allows you convert sas7bdat to csv using Hadoop/Spark. It's able to split giant sas7bdat file thus achieving high parallelism. The parsing also uses parso as suggested by @Ashpreet
If this is a one-off, you can download the SAS system viewer for free from here (after registering for an account, which is also free):
http://support.sas.com/downloads/package.htm?pid=176
You can then open the sas dataset using the viewer and save it as a csv file. There is no CLI as far as I can tell, but if you really wanted to you could probably write an autohotkey script or similar to convert SAS datasets to csv.
It is also possible to use the SAS provider for OLE DB to read SAS datasets without actually having SAS installed, and that's available here:
http://support.sas.com/downloads/browse.htm?fil=0&cat=64
However, this is rather complicated - some documentation is available here if you want to get an idea:
http://support.sas.com/documentation/cdl/en/oledbpr/59558/PDF/default/oledbpr.pdf
Thanks for your help. I ended us using the parso utility in java and it worked like a charm. The utility returns the rows as object arrays which i wrote into a text file.
I referred to the utility from: http://lifescience.opensource.epam.com/parso.html
For the sake of completion and complementing Andrew's Python answer, pandas
also allows you to read SAS files using the function read_sas
. You just need to pass the proper parameters (including the encoding), for example:
import pandas as pd
import pathlib
path_file = pathlib.Path('..', 'data', 'data_file.sas7bdat')
df = pd.read_sas(path_file, format='sas7bdat', encoding='utf-8')
Afterwards, you can save the DataFrame to pretty much any format you wish, including .csv
.
I have never had issues with this approach, but be aware that some users have reported some bugs with it.
© 2022 - 2024 — McMap. All rights reserved.