most efficient I/O setup between Stata and Python (Pandas)

I am using Stata to process some data, export the data in a csv file and load it in Python using the pandas read_csv function.

The problem is that everything is so slow. Exporting from Stata to a csv file takes ages (exporting in the dta Stata format is much faster), and loading the data via read_csv is also very slow. Using the read_stata pandas function is even worse.

I wonder if there are any other options, such as exporting to a format other than csv? My csv dataset is approximately 6-7 GB.

Any help appreciated

Thanks

Pastiness answered 30/4, 2015 at 16:22
Comments (1)
read_stata() is much faster starting with pandas 0.15.0, so make sure you are up to date. - Mikkanen

pd.read_stata() and DataFrame.to_stata() are pretty efficient, see here
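As a rough illustration of skipping the csv step, here is a minimal sketch of a dta round trip done entirely from pandas. The tiny DataFrame, the column names, and the file name `example.dta` are made up for the example; in the setup from the question the file would come from Stata's own `save` rather than from `to_stata()`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical small frame standing in for the real dataset.
df = pd.DataFrame({"id": [1, 2, 3], "value": [0.5, 1.5, 2.5]})

# Hypothetical file location inside a fresh temporary directory.
dta_path = os.path.join(tempfile.mkdtemp(), "example.dta")

# Write a .dta file directly; write_index=False keeps the
# DataFrame index out of the file.
df.to_stata(dta_path, write_index=False)

# Read the .dta file back into a DataFrame.
roundtrip = pd.read_stata(dta_path)
print(roundtrip)
```

Note that integer columns may come back with a narrower integer dtype than they went in with, since the dta format has no 64-bit integer type.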

Burden answered 30/4, 2015 at 17:22
Comments (4)
Thanks Jeff, but it appears that loading large Stata datasets in pandas is even slower than using csv... - Mireille
@Noobie make sure you are using pandas 0.15.0 or higher, which is much faster at reading DTA files than version 0.14 and earlier. That said, I have had some problems with larger Stata datasets, e.g. #28748588 - Mikkanen
You can use the chunksize=.. option as of 0.16.0. It should be quite efficient. - Burden
Good point, I did still have that same problem with version 0.16 but didn't try the chunksize option. - Mikkanen
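The chunksize option mentioned in the comments can be sketched as follows. This is only an illustration under assumptions: the ten-row file, the column names, and the chunk size of 4 are invented so the example runs quickly, and a file of several GB would call for a much larger chunk size. With `chunksize` set, `pd.read_stata()` returns a reader that yields DataFrames piece by piece instead of loading the whole file at once.

```python
import os
import tempfile

import pandas as pd

# Build a small hypothetical .dta file so the example is self-contained.
df = pd.DataFrame({"id": list(range(10)),
                   "value": [i * 0.5 for i in range(10)]})
dta_path = os.path.join(tempfile.mkdtemp(), "example.dta")
df.to_stata(dta_path, write_index=False)

total_rows = 0
value_sum = 0.0

# With chunksize set, read_stata returns an iterator over DataFrames
# of at most that many rows; only one chunk is held in memory at a time.
with pd.read_stata(dta_path, chunksize=4) as reader:
    for chunk in reader:
        total_rows += len(chunk)
        value_sum += chunk["value"].sum()

print(total_rows, value_sum)
```

Aggregating or filtering inside the loop, rather than concatenating every chunk, is what keeps memory use bounded.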
