I'm wondering if anyone knows a Python package that allows you to save numpy arrays/recarrays in the .dta
format of the statistical data analysis software Stata. This would really speed up a few steps in a system I have.
pandas DataFrame objects now have a "to_stata" method. So you can do for instance
import pandas as pd
df = pd.read_stata('my_data_in.dta')
df.to_stata('my_data_out.dta')
DISCLAIMER: the first step is quite slow (in my test, around 1 minute for reading a 51 MB dta - also see this question), and the second produces a file which can be way larger than the original one (in my test, the size goes from 51 MB to 111MB). This answer may look less elegant, but it is probably more efficient.
The scikits.statsmodels package includes a reader for Stata data files, which relies in part on PyDTA as pointed out by @Sven. In particular, genfromdta()
will return an ndarray
, e.g.
from Python 2.7/statsmodels 0.3.1:
>>> import scikits.statsmodels.api as sm
>>> arr = sm.iolib.genfromdta('/Applications/Stata12/auto.dta')
>>> type(arr)
<type 'numpy.ndarray'>
The savetxt()
function can be used in turn to save an array as a text file, which can be imported in Stata. For example, we can export the above as
>>> sm.iolib.savetxt('auto.txt', arr, fmt='%2s', delimiter=",")
and read it in Stata without a dictionary file as follows:
. insheet using auto.txt, clear
I believe a *.dta
reader should be added in the near future.
The only Python library for STATA interoperability I could find merely provides read-only access to .dta
files. The R foreign
library however provides a function write.dta
, and RPy provides a Python interface to R. Maybe the combination of these tools can help you.
pandas DataFrame objects now have a "to_stata" method. So you can do for instance
import pandas as pd
df = pd.read_stata('my_data_in.dta')
df.to_stata('my_data_out.dta')
DISCLAIMER: the first step is quite slow (in my test, around 1 minute for reading a 51 MB dta - also see this question), and the second produces a file which can be way larger than the original one (in my test, the size goes from 51 MB to 111MB). This answer may look less elegant, but it is probably more efficient.
© 2022 - 2024 — McMap. All rights reserved.
.dta
have a common format. This is not true. The file format you are interested in is specific to STATA and doesn't seem to be used in any other software. Here is the documentation of the format, and I very much doubt there exists a library being able to write this format. – Polishodbc
command. – Laverne