I apologize in advance if this has a simple answer somewhere. It seems like the kind of thing that would, but I can't seem to locate it in the help files, by searching SO, or by Googling.
I'm working with some datasets that are several GB right now. It's enough to fit in memory on one of the cluster nodes I have access to, but takes quite a bit of time to load. For many debugging/programming activities with this data, I don't need the entire file loaded, just the first few thousand observations to have a dataset on which to test code. I can of course just read the whole file in and subset, but I was wondering if there's a way to tell read.dta()
to only read in the first N rows? This would of course be much faster.
I could also use a proper format like .csv and then use read.csv()
's nrows argument, but then I'd lose the factor labels in the Stata dataset (and have to recreate quite a few GB of data from someone else's code that's feeding in to this project. So a direct solution on .dta files is preferred.
outsheet
function for exporting to CSV. A little late for this project perhaps, but it might make it easier next time you work together. ats.ucla.edu/stat/stata/faq/outsheet.htm – Hysteric