Count occurrences of value in a set of variables in R (per row)
Asked Answered
K

6

14

Let's say I have a data frame with 10 numeric variables V1-V10 (columns) and multiple rows (cases).

What I would like R to do is: For each case, give me the number of occurrences of a certain value in a set of variables.

For example the number of occurrences of the numeric value 99 in that single row for V2, V3, V6, which obviously has a minimum of 0 (none of the three have the value 99) and a maximum of 3 (all of the three have the value 99).

I am really looking for an equivalent to the SPSS function COUNT: "COUNT creates a numeric variable that, for each case, counts the occurrences of the same value (or list of values) across a list of variables."

I thought about table() and library plyr's count(), but I cannot really figure it out. Vectorized computation preferred. Thanks a lot!

Keneth answered 3/6, 2014 at 12:45 Comment(1)
If you want to count the number of occurrences column vice. You can find the answer here: #69036776Gurdwara
P
7

Try

apply(df,MARGIN=1,table)

Where df is your data.frame. This will return a list of the same length of the amount of rows in your data.frame. Each item of the list corresponds to a row of the data.frame (in the same order), and it is a table where the content is the number of occurrences and the names are the corresponding values.

For instance:

df=data.frame(V1=c(10,20,10,20),V2=c(20,30,20,30),V3=c(20,10,20,10))
#create a data.frame containing some data
df #show the data.frame
  V1 V2 V3
1 10 20 20
2 20 30 10
3 10 20 20
4 20 30 10
apply(df,MARGIN=1,table) #apply the function table on each row (MARGIN=1)
[[1]]

10 20 
 1  2 

[[2]]

10 20 30 
 1  1  1 

[[3]]

10 20 
 1  2 

[[4]]

10 20 30 
 1  1  1 

#desired result
Pitcher answered 3/6, 2014 at 12:53 Comment(0)
B
11

If you need to count any particular word/letter in the row.

#Let df be a data frame with four variables (V1-V4)
df <- data.frame(
  V1=c(1,1,2,1,"L"),
  V2=c(1,"L",2,2,"L"), 
  V3=c(1,2,2,1,"L"), 
  V4=c("L","L",1,2,"L"))

For counting number of L in each row just use

#This is how to compute a new variable counting occurences of "L" in V1-V4.      
df$count.L <- apply(df, 1, function(x) length(which(x=="L")))

The result will appear like this

> df
  V1 V2 V3 V4 count.L
1  1  1  1 L       1
2  1  L  2 L       2
3  2  2  2  1      0
4  1  2  1  2      0
Binder answered 9/8, 2018 at 11:42 Comment(0)
P
7

Try

apply(df,MARGIN=1,table)

Where df is your data.frame. This will return a list of the same length of the amount of rows in your data.frame. Each item of the list corresponds to a row of the data.frame (in the same order), and it is a table where the content is the number of occurrences and the names are the corresponding values.

For instance:

df=data.frame(V1=c(10,20,10,20),V2=c(20,30,20,30),V3=c(20,10,20,10))
#create a data.frame containing some data
df #show the data.frame
  V1 V2 V3
1 10 20 20
2 20 30 10
3 10 20 20
4 20 30 10
apply(df,MARGIN=1,table) #apply the function table on each row (MARGIN=1)
[[1]]

10 20 
 1  2 

[[2]]

10 20 30 
 1  1  1 

[[3]]

10 20 
 1  2 

[[4]]

10 20 30 
 1  1  1 

#desired result
Pitcher answered 3/6, 2014 at 12:53 Comment(0)
A
6

I think that there ought to be a simpler way to do this, but the best way that I can think of to get a table of counts is to loop (implicitly using sapply) over the unique values in the dataframe.

#Some example data
df <- data.frame(a=c(1,1,2,2,3,9),b=c(1,2,3,2,3,1))
df
#  a b
#1 1 1
#2 1 2
#3 2 3
#4 2 2
#5 3 3
#6 9 1

levels=unique(do.call(c,df)) #all unique values in df
out <- sapply(levels,function(x)rowSums(df==x)) #count occurrences of x in each row
colnames(out) <- levels
out
#     1 2 3 9
#[1,] 2 0 0 0
#[2,] 1 1 0 0
#[3,] 0 1 1 0
#[4,] 0 2 0 0
#[5,] 0 0 2 0
#[6,] 1 0 0 1
Arnaud answered 3/6, 2014 at 13:23 Comment(0)
U
5

Here is another straightforward solution that comes closest to what the COUNT command in SPSS does — creating a new variable that, for each case (i.e., row) counts the occurrences of a given value or list of values across a list of variables.

#Let df be a data frame with four variables (V1-V4)
df <- data.frame(V1=c(1,1,2,1,NA),V2=c(1,NA,2,2,NA),
       V3=c(1,2,2,1,NA), V4=c(NA, NA, 1,2, NA))

 #This is how to compute a new variable counting occurences of value "1" in V1-V4.      
    df$count.1 <- apply(df, 1, function(x) length(which(x==1)))

The updated data frame contains the new variable count.1 exactly as the SPSS COUNT command would do.

 > df
      V1 V2 V3 V4 count.1
    1  1  1  1 NA       3
    2  1 NA  2 NA       1
    3  2  2  2  1       1
    4  1  2  1  2       2
    5 NA NA NA NA       0

You can do the same to count how many time the value "2" occurs per row in V1-V4. Note that you need to select the columns (variables) in df to which the function is applied.

df$count.2 <- apply(df[1:4], 1, function(x) length(which(x==2)))

You can also apply a similar logic to count the number of missing values in V1-V4.

df$count.na <- apply(df[1:4], 1, function(x) sum(is.na(x)))

The final result should be exactly what you wanted:

 > df
      V1 V2 V3 V4 count.1 count.2 count.na
    1  1  1  1 NA       3       0        1
    2  1 NA  2 NA       1       1        2
    3  2  2  2  1       1       3        0
    4  1  2  1  2       2       2        0
    5 NA NA NA NA       0       0        4

This solution can easily be generalized to a range of values. Suppose we want to count how many times a value of 1 or 2 occurs in V1-V4 per row:

df$count.1or2 <- apply(df[1:4], 1, function(x) sum(x %in% c(1,2)))
Underplot answered 6/6, 2016 at 11:44 Comment(0)
R
3

A solution with functions from the dplyr package would be the following:

Using the example data set from LechAttacks answer:

df <- data.frame(V1=c(1,1,2,1,NA),V2=c(1,NA,2,2,NA),
       V3=c(1,2,2,1,NA), V4=c(NA, NA, 1,2, NA))

Count the appearances of "1" and "2" each and both combined:

df %>%
  rowwise() %>%
  mutate(count_1 = sum(c_across(V1:V4) == 1, na.rm = TRUE),
         count_2 = sum(c_across(V1:V4) == 2, na.rm = TRUE),
         count_12 = sum(c_across(V1:V4) %in% 1:2, na.rm = TRUE)) %>%
  ungroup()

which gives the table:

     V1    V2    V3    V4 count_1 count_2 count_12
1     1     1     1    NA       3       0        3
2     1    NA     2    NA       1       1        2
3     2     2     2     1       1       3        4
4     1     2     1     2       2       2        4
5    NA    NA    NA    NA       0       0        0
Rossy answered 31/5, 2022 at 9:23 Comment(0)
B
-1

In my effort to find something similar to Count from SPSS in R is as follows:

`df <- data.frame(a=c(1,1,NA,2,3,9),b=c(1,2,3,2,NA,1))` #Dummy data with NAs 

`df %>% 
  dplyr::mutate(count = rowSums( #this allows calculate sum across rows
    dplyr::select(., #Slicing on .  
                  dplyr::one_of( #within select use one_of by clarifying which columns your want
                    c('a','b'))), na.rm = T)) #once the columns are specified, that's all you need, na.rm is cherry on top

That's how the output looks like

>df
   a  b count
1  1  1     2
2  1  2     3
3 NA  3     3
4  2  2     4
5  3 NA     3
6  9  1    10

Hope it helps :-)

Brooke answered 24/10, 2019 at 20:6 Comment(1)
This is not what the count function from SPSS does. It does not sum up the Values from multiple variables but counts the occurance of specific value, like "1" for instance. In that case the number of counted "1" would be: 2,1,0,0,0,1Rossy

© 2022 - 2024 — McMap. All rights reserved.