Counting the number of elements with the values of x in a vector

Asked 17/12, 2009 at 17:21 Answered 23/11, 2023 at 22:38

489

I have a vector of numbers:

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
         453,435,324,34,456,56,567,65,34,435)

How can I have R count the number of times a value x appears in the vector?

Strachey answered 17/12, 2009 at 17:21 Comment(0)

623

You can just use table():

> a <- table(numbers)
> a
numbers
  4   5  23  34  43  54  56  65  67 324 435 453 456 567 657 
  2   1   2   2   1   1   2   1   2   1   3   1   1   1   1

Then you can subset it:

> a[names(a)==435]
435 
  3

Or convert it into a data.frame if you're more comfortable working with that:

> as.data.frame(table(numbers))
   numbers Freq
1        4    2
2        5    1
3       23    2
4       34    2
...

Bilberry answered 17/12, 2009 at 17:25 Comment(1)

Don't forget about potential floating point issues, especially with table, which coerces numbers to strings. – Menace 17/12, 2009 at 18:10

317

The most direct way is sum(numbers == x).

numbers == x creates a logical vector which is TRUE at every location that x occurs, and when suming, the logical vector is coerced to numeric which converts TRUE to 1 and FALSE to 0.

However, note that for floating point numbers it's better to use something like: sum(abs(numbers - x) < 1e-6).

Menace answered 17/12, 2009 at 18:9 Comment(0)

I would probably do something like this

length(which(numbers==x))

But really, a better way is

table(numbers)

Predecessor answered 17/12, 2009 at 17:55 Comment(2)

table(numbers) is going to do a lot more work than the easiest solution, sum(numbers==x), because it's going to figure out the counts of all the other numbers in the list too. – Manipulate 18/12, 2009 at 19:41

the problem with table is that it's more difficult to include it inside more complex calculus, for example using apply() on dataframes – Ila 2/12, 2015 at 12:16

There is also count(numbers) from plyr package. Much more convenient than table in my opinion.

Mixer answered 6/6, 2013 at 14:49 Comment(0)

My preferred solution uses rle, which will return a value (the label, x in your example) and a length, which represents how many times that value appeared in sequence.

By combining rle with sort, you have an extremely fast way to count the number of times any value appeared. This can be helpful with more complex problems.

Example:

> numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
> a <- rle(sort(numbers))
> a
  Run Length Encoding
    lengths: int [1:15] 2 1 2 2 1 1 2 1 2 1 ...
    values : num [1:15] 4 5 23 34 43 54 56 65 67 324 ...

If the value you want doesn't show up, or you need to store that value for later, make a a data.frame.

> b <- data.frame(number=a$values, n=a$lengths)
> b
    values n
 1       4 2
 2       5 1
 3      23 2
 4      34 2
 5      43 1
 6      54 1
 7      56 2
 8      65 1
 9      67 2
 10    324 1
 11    435 3
 12    453 1
 13    456 1
 14    567 1
 15    657 1

I find it is rare that I want to know the frequency of one value and not all of the values, and rle seems to be the quickest way to get count and store them all.

Standifer answered 13/12, 2012 at 21:43 Comment(0)

There is a standard function in R for that

tabulate(numbers)

Contrastive answered 19/4, 2012 at 13:13 Comment(0)

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435 453,435,324,34,456,56,567,65,34,435)

> length(grep(435, numbers))
[1] 3


> length(which(435 == numbers))
[1] 3


> require(plyr)
> df = count(numbers)
> df[df$x == 435, ] 
     x freq
11 435    3


> sum(435 == numbers)
[1] 3


> sum(grepl(435, numbers))
[1] 3


> sum(435 == numbers)
[1] 3


> tabulate(numbers)[435]
[1] 3


> table(numbers)['435']
435 
  3 


> length(subset(numbers, numbers=='435')) 
[1] 3

Tilburg answered 7/6, 2017 at 13:14 Comment(0)

If you want to count the number of appearances subsequently, you can make use of the sapply function:

index<-sapply(1:length(numbers),function(x)sum(numbers[1:x]==numbers[x]))
cbind(numbers, index)

Output:

        numbers index
 [1,]       4     1
 [2,]      23     1
 [3,]       4     2
 [4,]      23     2
 [5,]       5     1
 [6,]      43     1
 [7,]      54     1
 [8,]      56     1
 [9,]     657     1
[10,]      67     1
[11,]      67     2
[12,]     435     1
[13,]     453     1
[14,]     435     2
[15,]     324     1
[16,]      34     1
[17,]     456     1
[18,]      56     2
[19,]     567     1
[20,]      65     1
[21,]      34     2
[22,]     435     3

Kernite answered 15/5, 2015 at 12:35 Comment(0)

here's one fast and dirty way:

x <- 23
length(subset(numbers, numbers==x))

Enjoin answered 17/12, 2009 at 17:27 Comment(0)

You can change the number to whatever you wish in following line

length(which(numbers == 4))

Danit answered 18/2, 2016 at 9:34 Comment(1)

sum(numbers == 4) will also do. – Commutate 2/8, 2022 at 13:4

One option could be to use vec_count() function from the vctrs library:

vec_count(numbers)

   key count
1  435     3
2   67     2
3    4     2
4   34     2
5   56     2
6   23     2
7  456     1
8   43     1
9  453     1
10   5     1
11 657     1
12 324     1
13  54     1
14 567     1
15  65     1

The default ordering puts the most frequent values at top. If looking for sorting according keys (a table()-like output):

vec_count(numbers, sort = "key")

   key count
1    4     2
2    5     1
3   23     2
4   34     2
5   43     1
6   54     1
7   56     2
8   65     1
9   67     2
10 324     1
11 435     3
12 453     1
13 456     1
14 567     1
15 657     1

Conviction answered 26/6, 2020 at 12:15 Comment(0)

One more way i find convenient is:

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,324,34,456,56,567,65,34,435)
(s<-summary (as.factor(numbers)))

This converts the dataset to factor, and then summary() gives us the control totals (counts of the unique values).

Output is:

4   5  23  34  43  54  56  65  67 324 435 453 456 567 657 
2   1   2   2   1   1   2   1   2   1   3   1   1   1   1

This can be stored as dataframe if preferred.

as.data.frame(cbind(Number = names(s),Freq = s), stringsAsFactors=F, row.names = 1:length(s))

here row.names has been used to rename row names. without using row.names, column names in s are used as row names in new dataframe

Output is:

     Number Freq
1       4    2
2       5    1
3      23    2
4      34    2
5      43    1
6      54    1
7      56    2
8      65    1
9      67    2
10    324    1
11    435    3
12    453    1
13    456    1
14    567    1
15    657    1

Bramlett answered 26/12, 2014 at 7:11 Comment(0)

Using table but without comparing with names:

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435)
x <- 67
numbertable <- table(numbers)
numbertable[as.character(x)]
#67 
# 2

table is useful when you are using the counts of different elements several times. If you need only one count, use sum(numbers == x)

Festive answered 26/12, 2014 at 17:6 Comment(0)

Base r solution in 2021

aggregate(numbers, list(num=numbers), length)

       num x
1        4 2
2        5 1
3       23 2
4       34 2
5       43 1
6       54 1
7       56 2
8       65 1
9       67 2
10     324 1
11     435 3
12     453 1
13     456 1
14     567 1
15     657 1

tapply(numbers, numbers, length)
  4   5  23  34  43  54  56  65  67 324 435 453 456 567 657 
  2   1   2   2   1   1   2   1   2   1   3   1   1   1   1 

by(numbers, list(num=numbers), length)
num: 4
[1] 2
-------------------------------------- 
num: 5
[1] 1
-------------------------------------- 
num: 23
[1] 2
-------------------------------------- 
num: 34
[1] 2
-------------------------------------- 
num: 43
[1] 1
-------------------------------------- 
num: 54
[1] 1
-------------------------------------- 
num: 56
[1] 2
-------------------------------------- 
num: 65
[1] 1
-------------------------------------- 
num: 67
[1] 2
-------------------------------------- 
num: 324
[1] 1
-------------------------------------- 
num: 435
[1] 3
-------------------------------------- 
num: 453
[1] 1
-------------------------------------- 
num: 456
[1] 1
-------------------------------------- 
num: 567
[1] 1
-------------------------------------- 
num: 657
[1] 1

Benumb answered 31/8, 2021 at 15:17 Comment(0)

There are different ways of counting a specific elements

library(plyr)
numbers =c(4,23,4,23,5,43,54,56,657,67,67,435,453,435,7,65,34,435)

print(length(which(numbers==435)))

#Sum counts number of TRUE's in a vector 
print(sum(numbers==435))
print(sum(c(TRUE, FALSE, TRUE)))

#count is present in plyr library 
#o/p of count is a DataFrame, freq is 1 of the columns of data frame
print(count(numbers[numbers==435]))
print(count(numbers[numbers==435])[['freq']])

Vyse answered 16/11, 2018 at 16:56 Comment(0)

This is a very fast solution for one-dimensional atomic vectors. It relies on match(), so it is compatible with NA:

x <- c("a", NA, "a", "c", "a", "b", NA, "c")

fn <- function(x) {
  u <- unique.default(x)
  out <- list(x = u, freq = .Internal(tabulate(match(x, u), length(u))))
  class(out) <- "data.frame"
  attr(out, "row.names") <- seq_along(u)
  out
}

fn(x)

#>      x freq
#> 1    a    3
#> 2 <NA>    2
#> 3    c    2
#> 4    b    1

You could also tweak the algorithm so that it doesn't run unique().

fn2 <- function(x) {
  y <- match(x, x)
  out <- list(x = x, freq = .Internal(tabulate(y, length(x)))[y])
  class(out) <- "data.frame"
  attr(out, "row.names") <- seq_along(x)
  out
}

fn2(x)

#>      x freq
#> 1    a    3
#> 2 <NA>    2
#> 3    a    3
#> 4    c    2
#> 5    a    3
#> 6    b    1
#> 7 <NA>    2
#> 8    c    2

In cases where that output is desirable, you probably don't even need it to re-return the original vector, and the second column is probably all you need. You can get that in one line with the pipe:

match(x, x) %>% `[`(tabulate(.), .)

#> [1] 3 2 3 2 3 1 2 2

Kodok answered 12/3, 2020 at 16:40 Comment(1)

Really great solution! Thats also the fastest one I could come up with. It can be a little bit improved for performance for factor input using u <- if(is.factor(x)) x[!duplicated(x)] else unique(x). – Ciborium 25/5, 2020 at 14:0

This can be done with outer to get a metrix of equalities followed by rowSums, with an obvious meaning.
In order to have the counts and numbers in the same dataset, a data.frame is first created. This step is not needed if you want separate input and output.

df <- data.frame(No = numbers)
df$count <- rowSums(outer(df$No, df$No, FUN = `==`))

Retribution answered 17/12, 2018 at 15:52 Comment(0)

A method that is relatively fast on long vectors and gives a convenient output is to use lengths(split(numbers, numbers)) (note the S at the end of lengths):

# Make some integer vectors of different sizes
set.seed(123)
x <- sample.int(1e3, 1e4, replace = TRUE)
xl <- sample.int(1e3, 1e6, replace = TRUE)
xxl <-sample.int(1e3, 1e7, replace = TRUE)

# Number of times each value appears in x:
a <- lengths(split(x,x))

# Number of times the value 64 appears:
a["64"]
#~ 64
#~ 15

# Occurences of the first 10 values
a[1:10]
#~ 1  2  3  4  5  6  7  8  9 10 
#~ 13 12  6 14 12  5 13 14 11 14

The output is simply a named vector.
The speed appears comparable to rle proposed by JBecker and even a bit faster on very long vectors. Here is a microbenchmark in R 3.6.2 with some of the functions proposed:

library(microbenchmark)

f1 <- function(vec) lengths(split(vec,vec))
f2 <- function(vec) table(vec)
f3 <- function(vec) rle(sort(vec))
f4 <- function(vec) plyr::count(vec)

microbenchmark(split = f1(x),
               table = f2(x),
               rle = f3(x),
               plyr = f4(x))
#~ Unit: microseconds
#~   expr      min        lq      mean    median        uq      max neval  cld
#~  split  402.024  423.2445  492.3400  446.7695  484.3560 2970.107   100  b  
#~  table 1234.888 1290.0150 1378.8902 1333.2445 1382.2005 3203.332   100    d
#~    rle  227.685  238.3845  264.2269  245.7935  279.5435  378.514   100 a   
#~   plyr  758.866  793.0020  866.9325  843.2290  894.5620 2346.407   100   c 

microbenchmark(split = f1(xl),
               table = f2(xl),
               rle = f3(xl),
               plyr = f4(xl))
#~ Unit: milliseconds
#~   expr       min        lq      mean    median        uq       max neval cld
#~  split  21.96075  22.42355  26.39247  23.24847  24.60674  82.88853   100 ab 
#~  table 100.30543 104.05397 111.62963 105.54308 110.28732 168.27695   100   c
#~    rle  19.07365  20.64686  23.71367  21.30467  23.22815  78.67523   100 a  
#~   plyr  24.33968  25.21049  29.71205  26.50363  27.75960  92.02273   100  b 

microbenchmark(split = f1(xxl),
               table = f2(xxl),
               rle = f3(xxl),
               plyr = f4(xxl))
#~ Unit: milliseconds
#~   expr       min        lq      mean    median        uq       max neval  cld
#~  split  296.4496  310.9702  342.6766  332.5098  374.6485  421.1348   100 a   
#~  table 1151.4551 1239.9688 1283.8998 1288.0994 1323.1833 1385.3040   100    d
#~    rle  399.9442  430.8396  464.2605  471.4376  483.2439  555.9278   100   c 
#~   plyr  350.0607  373.1603  414.3596  425.1436  437.8395  506.0169   100  b

Importantly, the only function that also counts the number of missing values NA is plyr::count. These can also be obtained separately using sum(is.na(vec))

Drewdrewett answered 21/2, 2020 at 15:9 Comment(0)

Here is a way you could do it with dplyr:

library(tidyverse)

numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
             453,435,324,34,456,56,567,65,34,435)
ord <- seq(1:(length(numbers)))

df <- data.frame(ord,numbers)

df <- df %>%
  count(numbers)

numbers     n
     <dbl> <int>
 1       4     2
 2       5     1
 3      23     2
 4      34     2
 5      43     1
 6      54     1
 7      56     2
 8      65     1
 9      67     2
10     324     1
11     435     3
12     453     1
13     456     1
14     567     1
15     657     1

Bluma answered 11/7, 2020 at 13:25 Comment(0)

You can make a function to give you results.

# your list
numbers <- c(4,23,4,23,5,43,54,56,657,67,67,435,
         453,435,324,34,456,56,567,65,34,435)

function1<-function(x){
    if(x==value){return(1)}else{ return(0) }
}

# set your value here
value<-4

# make a vector which return 1 if it equal to your value, 0 else
vector<-sapply(numbers,function(x) function1(x))
sum(vector)

result: 2

Elasmobranch answered 13/11, 2020 at 14:28 Comment(0)

Recent (fast) answers to an old question. Two collapse options and one using tabulate (a variation of this answer).

collapse::qtable and ftabulate (custom function) returns a named vector (à la table), while collapse::fcount returns a data.frame (à la dplyr::count or vctrs::vec_count). The three functions are optimized to be much faster, and both collapse functions work with groups and weights.

collapse::qtab(numbers) #or collapse::qtable(numbers)
# numbers
#   4   5  23  34  43  54  56  65  67 324 435 453 456 567 657 
#   2   1   2   2   1   1   2   1   2   1   3   1   1   1   1 

ftabulate <- function(x){
  u <- unique.default(x)
  setNames(tabulate(match(x, u), length(u)), u)
}

ftabulate(numbers)
#  4  23   5  43  54  56 657  67 435 453 324  34 456 567  65 
#  2   2   1   1   1   2   1   2   3   1   1   2   1   1   1 

collapse::fcount(numbers)
#      x N
# 1    4 2
# 2   23 2
#      ...

Existing solutions and benchmark on a 1e6 length vector with 100 unique values.

Here's a panorama of existing solutions, grouped by whether they return a named vector or a data.frame, and a speed comparison over a vector of size 1,000,000 with 100 different values.

collapse options are the fastest, whether one needs to return a named vector (qtab) or a data.frame (fcount). tabulate solutions are also the fastest among base R options.

Code:

library(microbenchmark)
library(ggplot2)

set.seed(1)
numbers <- sample(sample(1000, 100), size = 1e6, replace = TRUE)

fn <- function(x) {
  u <- unique.default(x)
  out <- list(x = u, freq = .Internal(tabulate(match(x, u), length(u))))
  class(out) <- "data.frame"
  attr(out, "row.names") <- seq_along(u)
  out
}

mb <- 
  microbenchmark(
    #Named vectors
    "collapse::qtab" = qtab(numbers),
    table = table(numbers),
    tapply = tapply(numbers, numbers, length),
    lengths = lengths(split(numbers,numbers)),
    ftabulate = ftabulate(numbers),
    
    #Data.frame
    "collapse::fcount" = fcount(numbers),
    "vctrs::vec_count" = vctrs::vec_count(numbers),
    tabulate_df = fn(numbers),
    aggregate = aggregate(numbers, list(num=numbers), length),
    rle = with(rle(sort(numbers)), data.frame(number = values, n = lengths)),
    
    times = 50L
  )

mb$ntime <- microbenchmark:::convert_to_unit(mb$time, "t")
type <- setNames(levels(mb$expr), c(rep("Named vector", 5), rep("Data frame", 5)))
mb$type <- names(type)[match(mb$expr, levels(mb$expr))]
ggplot(mb, aes(x = expr, y = ntime)) +
  geom_violin() +
  scale_x_discrete(name = "", limits = rev) +
  scale_y_log10(name = sprintf("Time [%s]", attr(mb$ntime, "unit"))) +
  coord_flip() +
  ggforce::facet_col(facets = vars(type), scales = "free_y", space = "free")

Unit: milliseconds
             expr        min         lq      mean    median       uq      max neval
   collapse::qtab   7.992801  12.122301  18.08057  15.03340  17.1635  71.8245    50
            table 259.816201 370.759301 453.11712 447.25705 533.3613 613.8360    50
           tapply  51.133300  69.950101  95.82087  91.15720 101.9496 248.4416    50
          lengths  32.467401  51.007100  64.84489  60.97170  72.2331 179.6724    50
        ftabulate  24.455901  37.864101  46.76249  43.06755  56.3162  92.3557    50
 collapse::fcount   5.770500   7.499401  10.52078  10.53265  11.1268  32.4896    50
 vctrs::vec_count  15.830001  27.466501  32.60882  29.31685  37.8101  87.4450    50
      tabulate_df  24.235500  36.730401  45.35056  43.59385  51.3074 143.7773    50
        aggregate 404.713901 510.090201 592.82286 606.85290 644.4701 922.6984    50
              rle  52.259502  71.437701  92.71214  87.64515  99.4625 245.9526    50

Ess answered 23/11, 2023 at 22:38 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Existing solutions and benchmark on a 1e6 length vector with 100 unique values.

Recommended topics

Hot tags