R percentage of counts from a data.frame

Asked 28/9, 2016 at 21:58 Answered 22/4, 2021 at 8:9

I need to calculate the percentages of counts of variables and put them in a vector.

I have a data frame as follows:

group <- c('A','A','A','B','B','B')
hight <- c('tall','tall','short','tall','short','short')
df <- data.frame(group, hight)

group    hight
A        tall
A        tall
A        short
B        tall
B        short
B        short

If I run table(df) I get:

     hight
group short tall
    A     1    2
    B     2    1

To calculate the percentages:

t <- table(df)
percentages <- data.frame(group=c('A','B'), percent = c(t[1]/(t[1]+t[2]),t[3]/(t[3]+t[4])))
percentages
percent.vector <- c(t[1]/(t[1]+t[2]),t[3]/(t[3]+t[4]))
percent.vector

I get what I want:

    group   percent
1     A 0.3333333
2     B 0.6666667

[1] 0.3333333 0.6666667

... but I guess there is a better way to do it. Indexing the table cell by cell like this would not scale to a larger number of groups.

How can I simplify the calculation of the percentages?

Thanks

Vegetable asked 28/9, 2016 at 21:58 Comment(2)

Anything wrong with this? round(prop.table(table(df)), 2) – Rodrick 28/9, 2016 at 22:2

Chriss, thanks for your comment. The results you get from your code don't seem to be right (short <- A=0.17, B=0.33). It should be short <- A=0.33, B=0.67. The other problem is that I need to extract the output to use it in other calculations. I don't see how to extract the values from the table. – Vegetable 30/9, 2016 at 7:21
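
The numbers in that comment are shares of the grand total, because prop.table() without a margin divides every cell by the sum of the whole table. As a minimal sketch, assuming the df from the question and assuming that per-group shares are what is wanted, margin = 1 normalizes within each row (group), and indexing by column name returns a plain named vector. The name pt is only illustrative; percent.vector is the name used in the question.

```r
group <- c('A','A','A','B','B','B')
hight <- c('tall','tall','short','tall','short','short')
df <- data.frame(group, hight)

# Proportions within each group: every row of pt sums to 1
pt <- prop.table(table(df), margin = 1)

# Named numeric vector with the share of 'short' in each group
percent.vector <- pt[, "short"]
percent.vector
```

The vector can then be used directly in further calculations, and the same two lines work unchanged for any number of groups.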

If we are using dplyr/tidyr, one way to get the expected output is

library(dplyr)
library(tidyr)
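df %>%
    count(group, hight) %>%
    group_by(group) %>%   # group explicitly: count() returns an ungrouped result in current dplyr
    mutate(percent = n/sum(n)) %>%
    ungroup() %>%
    select(-n) %>%
    spread(hight, percent)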
#     group     short      tall
#    <fctr>     <dbl>     <dbl>
#1      A 0.3333333 0.6666667
#2      B 0.6666667 0.3333333

Or, as @JoeRoe mentioned in the comments, we could use pivot_wider in newer versions of tidyr as a replacement for spread:

 ...
 pivot_wider(names_from = hight, values_from = percent)

data

df <- data.frame(group, hight)
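
For completeness, here is a sketch of the whole pipeline written with pivot_wider() and ending in a plain vector. It assumes current dplyr and tidyr, groups explicitly so that the sum is taken within each group, and names id_cols explicitly; the names wide and percent.vector are only illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  group = c('A','A','A','B','B','B'),
  hight = c('tall','tall','short','tall','short','short')
)

wide <- df %>%
  count(group, hight) %>%
  group_by(group) %>%                  # sum(n) is now computed per group
  mutate(percent = n / sum(n)) %>%
  ungroup() %>%
  # id_cols = group keeps only 'group' as the row identifier, so n is dropped
  pivot_wider(id_cols = group, names_from = hight, values_from = percent)

# Extract one column as an ordinary numeric vector
percent.vector <- pull(wide, short)
percent.vector
```

If some group has no rows for a given hight, pivot_wider() leaves NA in that cell; its values_fill argument (for example values_fill = 0) can be used to fill those in.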

Hershey answered 29/9, 2016 at 3:21 Comment(5)

Thanks, that definitely does it as a table result. I'm surprised that it is so complicated. Also, how can I extract the results so that I can use them or process them further? For example, with percent.vector <- c(t[1]/(t[1]+t[2]),t[3]/(t[3]+t[4])), I can afterwards use percent.vector for calculations, etc. – Vegetable 29/9, 2016 at 7:26

@Vegetable There are three columns in the output. Which one do you want? – Hershey 29/9, 2016 at 7:31

@Vegetable If you need to extract the 'short' just append %>% .$short to the end of the code – Hershey 29/9, 2016 at 7:33

I've got it, thanks: percent.vector <- df %>% [... rest of the code ...] %>% .$short. Complicated, but great!! – Vegetable 29/9, 2016 at 7:40

spread() is superseded as of tidyr 1.0.0, so you could replace that line with pivot_wider(group, names_from = hight, values_from = percent). This will also implicitly drop n, so you no longer need to select(-n). – Objective 22/4, 2021 at 8:29

A solution using good old base R:

x = data.frame(group = c('A','A','A','B','B','B'),
               hight = c('tall','tall','short','tall','short','short'))

prop.table(table(x)[,1])
#        A         B 
#0.3333333 0.6666667 

prop.table(table(x)[,2])
#        A         B 
#0.6666667 0.3333333

To extract the numbers, just use indexing, as in table(x)[,1].
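
One thing to be aware of: prop.table(table(x)[,1]) normalizes within the 'short' column, so it gives each group's share of all short rows. In the question's data that coincides with the share of each group that is short only because both groups have three rows. A small sketch with made-up, unequal group sizes (the data frame y below is hypothetical, not from the question) shows the two quantities coming apart:

```r
# Hypothetical data: group A has 4 rows, group B has 2 rows
y <- data.frame(group = c('A','A','A','A','B','B'),
                hight = c('tall','tall','tall','short','short','short'))
tab <- table(y)

# Each group's share of all 'short' rows (normalizes the 'short' column)
share_of_short <- prop.table(tab[, "short"])

# Share of each group that is 'short' (normalizes each row of the table)
short_within_group <- prop.table(tab, margin = 1)[, "short"]

share_of_short
short_within_group
```

Here the first vector is 1/3 for A and 2/3 for B, while the second is 1/4 for A and 1 for B, so which form to use depends on which denominator is intended.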

Overtrade answered 22/4, 2021 at 8:9 Comment(1)

Fantastic answer! I love base-R :) :) :) – Gastroenterology 2/12, 2022 at 4:10
