What is the best format to persist simple data frames to disk in R for storage while limiting semantic loss?
I ask because I'm archiving a data set. In an ideal world, my data format would have the following characteristics:
- Stability - the storage format will be compatible with future versions of R
- Semantic compatibility - the storage format will understand the semantics of R's primitive data types. For example, it will be able to store ordered factors with labels in a sensible manner.
- Open standard - ideally, the format will be an open standard so other statistics packages (now or in the future) will be able to understand it
My first thought was to use CSV, which is very stable but lacks the semantic richness required. On the other hand, R's built-in RData format completely captures R's semantics, but seems likely to change between releases (correct me if I'm wrong).
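To make the CSV concern concrete, here is a minimal sketch of a round trip through CSV. The data frame and the column names (id, severity) are made up for illustration; the point is only that the ordering of an ordered factor does not survive.

```r
# Toy data frame with an ordered factor (hypothetical example data)
df <- data.frame(
  id = 1:3,
  severity = factor(c("low", "high", "medium"),
                    levels = c("low", "medium", "high"),
                    ordered = TRUE)
)

# Write to CSV and read it back, asking read.csv to rebuild factors
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path, stringsAsFactors = TRUE)

is.ordered(df$severity)      # TRUE
is.ordered(df_csv$severity)  # FALSE: the ordering is gone
levels(df_csv$severity)      # "high" "low" "medium" (alphabetical, not the original order)
```

The labels come back, but the level order would have to be recorded somewhere else (for example in a separate codebook) to be restored.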
Is there another format that finds a balance between these three imperatives?
?save mentions that any recent version of R can read compressed save files, so I doubt that the .Rdata format can change between releases. – Uroyaml
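For comparison, a small sketch of the save()/load() round trip this comment refers to. The data frame is made up for illustration, and the example only shows that the semantics survive within one R installation; it says nothing by itself about compatibility across releases.

```r
# Toy data frame with an ordered factor (hypothetical example data)
df <- data.frame(
  id = 1:3,
  severity = factor(c("low", "high", "medium"),
                    levels = c("low", "medium", "high"),
                    ordered = TRUE)
)

# Save in R's native format (compressed, which is also the default)
rdata_path <- tempfile(fileext = ".RData")
save(df, file = rdata_path, compress = TRUE)

# Load into a separate environment so the original df is not overwritten
restored <- new.env()
load(rdata_path, envir = restored)

is.ordered(restored$df$severity)  # TRUE
levels(restored$df$severity)      # "low" "medium" "high"
identical(df, restored$df)        # TRUE
```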
[...] It can handle R's basic data types (e.g. named lists, vectors, ...) and is human-readable (in a better way than XML in my opinion). – Brosine