Is there a way to select a subset from objects (data frames, matrices, vectors) without making a copy of selected data?
I work with quite large data sets but never change them. However, for convenience I often select subsets of the data to operate on. Making a copy of a large subset each time is very memory-inefficient, but both normal indexing and subset() (and thus the xapply() family of functions) create copies of the selected data. So I'm looking for functions or data structures that can overcome this issue.
Some possible approaches that may fit my needs and hopefully are implemented in some R packages:
- copy-on-write mechanism, i.e. data structures that are copied only when you add or rewrite existing elements;
- immutable data structures that only require recreating the indexing information for the data structure, but not its content (like making a substring of a string by creating only a small object that holds a length and a pointer into the same char array);
- xapply() analogues that do not create subsets.
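To make the second and third ideas in the list concrete, here is a minimal sketch in base R of an index-only "view". The names make_view, dims, get, col_apply and materialize are hypothetical and do not come from any package. The closure keeps a reference to the original object (R does not copy an object merely because it is passed to a function or captured in a closure) plus the row and column indices. Note the limitation: each column slice is still copied temporarily inside col_apply, so this only avoids materializing the whole subset at once.

```r
# Hypothetical helper: an index-only "view" over a matrix or data frame.
# It stores the indices and a reference to `data`, not a copy of the subset.
make_view <- function(data, rows, cols) {
  force(data); force(rows); force(cols)
  list(
    # Dimensions of the view, computed from the indices alone
    dims = function() c(length(rows), length(cols)),
    # Single element lookup, translated to the original coordinates
    get = function(i, j) data[rows[i], cols[j]],
    # Apply f to each selected column; only one column slice exists at a time
    col_apply = function(f) {
      vapply(cols, function(j) f(data[rows, j]), numeric(1))
    },
    # Build the real subset only when it is explicitly requested
    materialize = function() data[rows, cols, drop = FALSE]
  )
}

big <- matrix(as.numeric(1:20), nrow = 5)
v <- make_view(big, rows = 1:3, cols = 2:4)
v$dims()
v$get(1, 1)
v$col_apply(sum)
```

Whether this helps in practice depends on how much of the work can be expressed column by column; anything that needs the subset as a whole still has to call materialize().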
data.table package (someone will presumably show up here shortly to give you more details ...) – Fidelia
data.table seems to be a nice package, but unfortunately it doesn't fit my needs in most cases. In particular, data.table has a different indexing model, which makes it much harder (and slower) to perform selections like data[1:50, 1:10] (i.e. selection by both row and column) and many linear algebra operations. I was thinking of using matrices instead of my data frames to save both space and time, but matrices have their limitations too, so I'm looking for alternative options as well. – Osmo
DF <- data[1:10000, ] takes about 30 seconds, which is much longer than is needed to create a promise object. Also, this means that data structures have to be permanent in order not to break language semantics, but they are not. Can you explain it, please? I'm definitely missing something. (Let me know if it's worth posting it as a separate question.) – Osmo
delayedAssign and the force functions if you want some control over this process. Most of us do not think much about it (until it bites us during function evaluations). – Tran
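As an illustration of the two functions named in the last comment, here is a small base R sketch. The variable names (x, evaluated, lazy_subset, make_getter) are made up for the example. It shows that an ordinary assignment evaluates the subset right away, while delayedAssign() binds a promise that is evaluated only on first access, and that force() evaluates an argument's promise at a chosen point.

```r
x <- 1:10
evaluated <- FALSE

# Ordinary assignment: the subset is computed (and copied) immediately.
eager_subset <- x[1:3]

# delayedAssign() binds a promise; the expression runs only on first access.
delayedAssign("lazy_subset", {
  evaluated <- TRUE   # side effect so we can observe when evaluation happens
  x[1:3]
})
stopifnot(!evaluated)        # nothing has been computed yet

first <- lazy_subset         # first access forces the promise
stopifnot(evaluated, identical(first, eager_subset))

# force() evaluates an argument's promise now instead of at first use,
# which matters when a function returns a closure over that argument.
make_getter <- function(value) {
  force(value)
  function() value
}
get_subset <- make_getter(x[4:6])
get_subset()
```

Delaying evaluation only postpones the work: once the promise is forced, the subset is still created as a regular copy.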