This is one of the issues addressed by rio (full disclosure: I wrote this package). Basically, it provides several ways of importing variable labels, including haven's way of doing things and foreign's. Here's a trivial example:
Start by making a reproducible example:
> library("rio")
> export(iris, "iris.dta")
Import using foreign::read.dta() (via rio::import()):
> str(import("iris.dta", haven = FALSE))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "datalabel")= chr ""
- attr(*, "time.stamp")= chr "15 Jan 2016 20:05"
- attr(*, "formats")= chr "" "" "" "" ...
- attr(*, "types")= int 255 255 255 255 253
- attr(*, "val.labels")= chr "" "" "" "" ...
- attr(*, "var.labels")= chr "" "" "" "" ...
- attr(*, "version")= int -7
- attr(*, "label.table")=List of 1
..$ Species: Named int 1 2 3
.. ..- attr(*, "names")= chr "setosa" "versicolor" "virginica"
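With foreign, value labels live at the data.frame level: "val.labels" appears to name the label set used by each column, and "label.table" holds those sets as named integer vectors. The following is a minimal base-R sketch of decoding one column through that pair. It assumes a small hand-built data frame that mimics the attribute layout shown above (as if read with convert.factors = FALSE), and decode_foreign() is a hypothetical helper, not part of foreign or rio:

```r
# Hand-built stand-in for a read.dta(..., convert.factors = FALSE) result:
# the attribute layout follows the str() output above; the data are made up.
df <- data.frame(id = 1:4, Species = c(1L, 2L, 3L, 1L))
attr(df, "val.labels") <- c("", "Species")  # label set used by each column
attr(df, "label.table") <- list(
  Species = c(setosa = 1L, versicolor = 2L, virginica = 3L)
)

# Hypothetical helper (not part of foreign or rio): find the label set named
# in "val.labels" for `var` and use it to turn integer codes into a factor.
decode_foreign <- function(data, var) {
  set_name <- attr(data, "val.labels")[match(var, names(data))]
  if (is.na(set_name) || !nzchar(set_name)) return(data[[var]])
  labs <- attr(data, "label.table")[[set_name]]
  if (is.null(labs)) return(data[[var]])
  factor(data[[var]], levels = unname(labs), labels = names(labs))
}

decode_foreign(df, "Species")  # factor: setosa versicolor virginica setosa
decode_foreign(df, "id")       # no value labels, returned unchanged
```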
Read in using haven::read_dta() with its native variable attributes, which are stored at the variable level rather than at the data.frame level:
> str(import("iris.dta", haven = TRUE, column.labels = TRUE))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species :Class 'labelled' atomic [1:150] 1 1 1 1 1 1 1 1 1 1 ...
.. ..- attr(*, "labels")= Named int [1:3] 1 2 3
.. .. ..- attr(*, "names")= chr [1:3] "setosa" "versicolor" "virginica"
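Here the value labels travel with the column itself, as a "labels" attribute. haven has its own as_factor() for this, but as a dependency-free sketch the same decoding can be done in base R. The vector below is hand-built to mimic the str() output above, and labels_to_factor() is a hypothetical helper:

```r
# Hand-built stand-in for a haven-style labelled column (values made up).
species <- structure(c(1, 1, 2, 3),
                     labels = c(setosa = 1L, versicolor = 2L, virginica = 3L),
                     class = "labelled")

# Hypothetical helper: decode a column using its own "labels" attribute.
labels_to_factor <- function(x) {
  labs <- attr(x, "labels", exact = TRUE)
  if (is.null(labs)) return(x)  # unlabelled: return unchanged
  # unclass() so the codes are matched as plain numbers
  factor(unclass(x), levels = unname(labs), labels = names(labs))
}

labels_to_factor(species)  # factor: setosa setosa versicolor virginica
```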
Read in using haven::read_dta() with an alternative that we (the rio developers) have found more convenient:
> str(import("iris.dta", haven = TRUE))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "var.labels")=List of 5
..$ Sepal.Length: NULL
..$ Sepal.Width : NULL
..$ Petal.Length: NULL
..$ Petal.Width : NULL
..$ Species : NULL
- attr(*, "label.table")=List of 5
..$ Sepal.Length: NULL
..$ Sepal.Width : NULL
..$ Petal.Length: NULL
..$ Petal.Width : NULL
..$ Species : Named int 1 2 3
.. ..- attr(*, "names")= chr "setosa" "versicolor" "virginica"
By moving the attributes to the level of the data.frame, they're much easier to access, using attr(data, "var.labels"), attr(data, "label.table"), etc., rather than digging through each variable's attributes.
Note: most of the attribute values are NULL here because I'm just writing a native R dataset to a local file in order to make this reproducible.
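Moving attributes up to the data.frame can be sketched in base R. This is not rio's actual code, only an illustration of the transformation on made-up data: lift_attr() is a hypothetical helper that gathers one attribute from every column into a named list on the data.frame, with NULL for columns that lack it, matching the shape of the "var.labels" output above.

```r
# Made-up data with haven-style, per-column "label" attributes;
# column y deliberately has no label.
df <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
attr(df$x, "label") <- "An integer"

# Hypothetical helper (not rio's implementation): gather attribute `which`
# from every column into one named list on the data frame, stored under
# `to`, and strip it from the columns.
lift_attr <- function(data, which, to = which) {
  vals <- lapply(data, attr, which = which, exact = TRUE)
  for (nm in names(data)) attr(data[[nm]], which) <- NULL
  attr(data, to) <- vals
  data
}

lifted <- lift_attr(df, "label", to = "var.labels")
attr(lifted, "var.labels")  # list: x = "An integer", y = NULL
```

Because R copies on modification, the original df keeps its column-level labels; only the returned data frame has them lifted.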
lbs <- setNames(_labels_, names(df)); then accessing the label can be done via, e.g., lbs["var"] – Turbellarian
… read.dta function in pkg:foreign. haven is a relatively recent package and at the moment it doesn't seem to have documented plans for labels. – Tarter
read_dta in haven does have labels. In contrast, foreign::read.dta actually doesn't. Also, the foreign package does not work with Stata 13, let alone 14. – Choreographer
The read.dta help page says the value will be: "A data frame with attributes. These will include "datalabel", "time.stamp", "formats", "types", "val.labels", "var.labels" and "version" and may include "label.table" and "expansion.table"." – Tarter
… foreign. Glancing through the foreign docs, they suggest the readstata13 package for later versions of Stata. Presumably it also conforms to whatever idiom/norm is found in foreign. – Manumission
foreign does have a var.labels attribute that is attached to the data frame. This is different from haven, but it shows your point that there are different implementations. – Choreographer
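The setNames() suggestion from the comments, applied to foreign's data.frame-level "var.labels" (an unnamed character vector in column order, per the str() output in the answer), might look like this minimal sketch. The data frame and its labels are made up, standing in for what read.dta() would return:

```r
# Stand-in for a foreign::read.dta() result: "var.labels" is an unnamed
# character vector in column order (labels made up for illustration).
df <- data.frame(age = c(34, 51), income = c(42000, 58000))
attr(df, "var.labels") <- c("Age in years", "Annual income")

# Name the labels after the columns so they can be looked up by variable.
lbs <- setNames(attr(df, "var.labels"), names(df))
lbs["income"]  # named chr: income = "Annual income"
```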