Save .dta files in python
M

3

13

I'm wondering if anyone knows a Python package that allows you to save numpy arrays/recarrays in the .dta format of the statistical data analysis software Stata. This would really speed up a few steps in a system I have.

Morrismorrison answered 21/9, 2011 at 16:42 Comment(6)
What exactly is a .dta file supposed to be? - Polish
A .dta file is a data file format used primarily by the statistical computing program Stata. I don't know enough about the file type to elaborate, but there might be more detail here: filext.com/file-extension/DTA - Morrismorrison
You seem to have the misconception that all files with the extension .dta share a common format. This is not true. The file format you are interested in is specific to Stata and doesn't seem to be used by any other software. Here is the documentation of the format, and I very much doubt there is a library that can write this format. - Polish
You can probably use Stata's infile command to import a CSV file generated with Python. - Polish
I am able to use the infile/insheet commands to bring .csv files into Stata, but .dta files can be appended (i.e., stacked) many times faster than bringing in a .csv, saving it, and then bringing in the next .csv (it's a rather inefficient program, but it is necessary for my team's research). - Morrismorrison
If you are concerned about efficiency and speed, you could work with a relational database. Write the Python arrays into a database and access it with Stata's odbc command. - Laverne
I
3

pandas DataFrame objects now have a to_stata method, so you can do, for instance:

import pandas as pd
df = pd.read_stata('my_data_in.dta')
df.to_stata('my_data_out.dta')

DISCLAIMER: the first step is quite slow (in my test, around 1 minute to read a 51 MB .dta file; also see this question), and the second step produces a file that can be much larger than the original (in my test, the size went from 51 MB to 111 MB). This answer (linked in the original post) may look less elegant, but it is probably more efficient.
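Since the question is about numpy arrays/recarrays rather than existing .dta files, here is a minimal sketch of going from a recarray to a .dta file through pandas. The helper name save_recarray_as_dta, the field names, and the sample values are hypothetical and only for illustration; it assumes a reasonably recent pandas where DataFrame.to_stata accepts write_index.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_recarray_as_dta(rec, path):
    """Write a NumPy recarray to a Stata .dta file via pandas (illustrative helper)."""
    # The recarray field names become the Stata variable names.
    df = pd.DataFrame.from_records(rec)
    # write_index=False keeps the pandas row index out of the .dta file.
    df.to_stata(path, write_index=False)


# Hypothetical sample data standing in for the asker's arrays.
rec = np.rec.fromarrays(
    [np.array([1, 2, 3], dtype=np.int32), np.array([0.5, 1.5, 2.5])],
    names="id,value",
)

out_path = os.path.join(tempfile.mkdtemp(), "example.dta")
save_recarray_as_dta(rec, out_path)

# Read the file back to check that the values survived the round trip.
round_trip = pd.read_stata(out_path)
```

Field names have to be valid Stata variable names for this to work cleanly, and the speed and file-size caveats above still apply.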

Incur answered 15/4, 2014 at 8:57 Comment(0)
V
8

The scikits.statsmodels package includes a reader for Stata data files, which relies in part on PyDTA as pointed out by @Sven. In particular, genfromdta() will return an ndarray, e.g. from Python 2.7/statsmodels 0.3.1:

>>> import scikits.statsmodels.api as sm
>>> arr = sm.iolib.genfromdta('/Applications/Stata12/auto.dta')
>>> type(arr)
<type 'numpy.ndarray'>

The savetxt() function can in turn be used to save an array as a text file, which can then be imported into Stata. For example, we can export the array above with

>>> sm.iolib.savetxt('auto.txt', arr, fmt='%2s', delimiter=",")

and read it in Stata without a dictionary file as follows:

. insheet using auto.txt, clear
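For reference, the same text-file route can be sketched with plain NumPy, without statsmodels. This is only an illustration under assumptions: the array contents, the column names, and the file name example.csv are made up, and the header row is there so that Stata can pick up variable names from the first line.

```python
import os
import tempfile

import numpy as np

# Hypothetical numeric data and column names, for illustration only.
data = np.array([[1.0, 0.5], [2.0, 1.5], [3.0, 2.5]])
column_names = ["id", "value"]

csv_path = os.path.join(tempfile.mkdtemp(), "example.csv")

# comments="" stops NumPy from prefixing the header line with "# ".
np.savetxt(
    csv_path,
    data,
    fmt="%g",
    delimiter=",",
    header=",".join(column_names),
    comments="",
)

# Read the file back as text to inspect what Stata would see.
with open(csv_path) as handle:
    lines = handle.read().splitlines()
```

The resulting file can then be read with insheet as above (or with import delimited in newer Stata versions).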

I believe a *.dta writer should be added in the near future.

Votary answered 29/1, 2012 at 19:24 Comment(0)
P
7

The only Python library for Stata interoperability I could find provides read-only access to .dta files. However, the R foreign library provides a write.dta function, and RPy provides a Python interface to R. Maybe a combination of these tools can help you.

Polish answered 21/9, 2011 at 18:45 Comment(0)
