Hadley Wickham's haven package, applied to a Stata file, returns a tibble with many columns of class "labelled". You can see these with str(), e.g.:
$ MSACMSZ :Class 'labelled' atomic [1:8491861] NA NA NA NA NA NA NA NA NA NA ...
.. ..- attr(*, "label")= chr "metropolitan area size (cmsa/msa)"
.. ..- attr(*, "labels")= Named int [1:7] 0 1 2 3 4 5 6
.. .. ..- attr(*, "names")= chr [1:7] "not identified or nonmetropolitan" "100,000 - 249,999" "250,000 - 499,999" "500,000 - 999,999" ...
It would be nice if I could simply convert all these labelled vectors to factors, but I have compared the length of the labels attribute to the number of unique values in each vector, and it is sometimes longer and sometimes shorter. So I think I need to look at all of them and decide how to handle each one individually.
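The comparison described above can be sketched in base R as follows. The stand-in data frame `tiny` is built by hand from two variables of this dataset, and the helper names `n_labels`, `n_unique` and `comparison` are made up for illustration; this is a minimal sketch, not necessarily how the check was originally done.

```r
# Hand-built stand-in: two "labelled" columns with a named integer
# "labels" attribute, assembled without any packages.
hhtenure <- structure(
  c(2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L),
  labels = c("niu" = 0L, "owned or being bought" = 1L,
             "rented for cash" = 2L,
             "occupied without payment of cash rent" = 3L,
             "refused" = 6L, "don't know" = 7L),
  class = "labelled"
)
hhintype <- structure(
  rep(1L, 10),
  labels = c("interview" = 1L, "type a non-interview" = 2L,
             "type b/c non-interview" = 3L),
  class = "labelled"
)
tiny <- structure(
  list(HHTENURE = hhtenure, HHINTYPE = hhintype),
  row.names = c(NA, -10L), class = "data.frame"
)

# Number of declared labels per column.
n_labels <- sapply(tiny, function(x) length(attr(x, "labels", exact = TRUE)))

# Number of distinct non-missing values actually present per column.
n_unique <- sapply(tiny, function(x) length(unique(as.vector(x[!is.na(x)]))))

# Side-by-side table, one row per variable.
comparison <- data.frame(n_labels = n_labels, n_unique = n_unique)
print(comparison)
```

In this small sample both columns have more labels than observed values; on the full data the difference can go either way.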
So I would like to extract the values of the labels attribute into a list. However, this call:
labels93 <- lapply(cps_00093.df, function(x){attr(X, which="labels", exact=TRUE)})
returns NULL for all variables.
Is this a tibble vs data frame problem? How do I extract these attributes from the tibble columns into a list?
Note that the labels vector is named, and I need both the labels and the names.
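One thing worth checking in the call above: R is case-sensitive, and the anonymous function declares its argument as `x` while its body reads `X`. A minimal sketch with the two names matched, run on a hand-built stand-in data frame (`tiny`, `labels_list` and `tenure_table` are illustrative names, not part of haven):

```r
# Hand-built stand-in with the same attribute layout as the haven columns.
tiny <- structure(
  list(
    HHTENURE = structure(
      c(2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L),
      labels = c("niu" = 0L, "owned or being bought" = 1L,
                 "rented for cash" = 2L,
                 "occupied without payment of cash rent" = 3L,
                 "refused" = 6L, "don't know" = 7L),
      class = "labelled"
    ),
    HHINTYPE = structure(
      rep(1L, 10),
      labels = c("interview" = 1L, "type a non-interview" = 2L,
                 "type b/c non-interview" = 3L),
      class = "labelled"
    )
  ),
  row.names = c(NA, -10L), class = "data.frame"
)

# Argument and body both use lowercase x.
labels_list <- lapply(tiny, function(x) attr(x, which = "labels", exact = TRUE))

# Each element is a named integer vector, so values and names both survive.
# They can be laid out as a small lookup table if that is easier to inspect:
tenure_table <- data.frame(
  value = unname(labels_list$HHTENURE),
  label = names(labels_list$HHTENURE)
)
print(tenure_table)
```

Because `lapply()` iterates over a data frame (or tibble) column by column, the result is a list with one named labels vector per variable.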
As per @Hack-R's request here is a tiny snippet of my data as converted by dput (which I had never used before). I applied this code:
filter(cps_00093.df, YEAR==2015) %>%
sample_n(10) %>%
select(HHTENURE, HHINTYPE) -> tiny
dput(tiny, file = "tiny")
to produce the file tiny. Hey! That was easy! I thought it would be hard to break off a piece this small.
Opening tiny with Notepad++, this is what I found:
structure(list(HHTENURE = structure(c(2L, 1L, 1L, 2L, 1L, 1L,
1L, 2L, 1L, 1L), labels = structure(c(0L, 1L, 2L, 3L, 6L, 7L), .Names = c("niu",
"owned or being bought", "rented for cash", "occupied without payment of cash rent",
"refused", "don't know")), class = "labelled"), HHINTYPE = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), labels = structure(1:3, .Names = c("interview",
"type a non-interview", "type b/c non-interview")), class = "labelled")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("HHTENURE",
"HHINTYPE"))
I suspect this could be made more readable with a little spacing, but I did not want to muck with it for fear of accidentally destroying relevant information.
dput() the minimum amount of data necessary for a reproducible example that encapsulates the problem? – Codfish
?haven::labelled; they have their own as_factor method. – Aide
tiny %>% mutate_all(haven::as_factor) looks pretty good to me... – Aide
mutate_if instead of mutate_all if any of your columns are of other types. – Aide
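For columns that need individual handling, the value-to-label mapping can also be applied by hand in base R. The sketch below is not haven's implementation of as_factor; `to_factor` is a hypothetical helper that assumes only the attribute layout shown in the dput() output.

```r
# Hypothetical helper: turn a labelled integer vector into a factor
# using its "labels" attribute (named vector: names = text, values = codes).
to_factor <- function(x) {
  labs <- attr(x, "labels", exact = TRUE)
  # as.vector() drops the class and attributes before matching codes to levels.
  factor(as.vector(x), levels = unname(labs), labels = names(labs))
}

hhtenure <- structure(
  c(2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L),
  labels = c("niu" = 0L, "owned or being bought" = 1L,
             "rented for cash" = 2L,
             "occupied without payment of cash rent" = 3L,
             "refused" = 6L, "don't know" = 7L),
  class = "labelled"
)

tenure_factor <- to_factor(hhtenure)
print(table(tenure_factor))
```

With this approach, labels that never occur stay as empty factor levels, and any value without a label becomes NA, so it is worth checking for unlabelled values before relying on it.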