How to find the statistical mode?
P

36

501

In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the object, not the value that occurs the most in its argument. But is there a standard library function that implements the statistical mode for a vector (or list)?
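
To make the distinction concrete, here is a small sketch of what base R's mode() reports. The variable names are only for illustration:

```r
# base::mode() describes how a value is stored, not which element is most frequent.
x <- c(1, 2, 2, 3)
storage <- mode(x)
print(storage)            # "numeric", even though 2 is the most frequent value
print(mode(c("a", "b")))  # "character"
print(mode(TRUE))         # "logical"
```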

Prepotent answered 30/3, 2010 at 17:55 Comment(2)
You need to clarify whether your data is integer, numeric, factor...? Mode estimation for numerics will be different, and uses intervals. See modeestForcemeat
Why does R not have a built-in function for mode? Why does R consider mode to be the same as the function class ?Corpsman
C
509

One more solution, which works for both numeric & character/factor data:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

On my dinky little machine, that can generate & find the mode of a 10M-integer vector in about half a second.

If your data set might have multiple modes, the above solution takes the same approach as which.max, and returns the first-appearing value of the set of modes. To return all modes, use this variant (from @digEmAll in the comments):

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}
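
As a rough illustration of how the two variants differ on tied data, the sketch below repeats both definitions so it runs on its own. The sample vector is made up, and the last line is the count variant suggested in the comments:

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# "b" and "a" both occur twice; "b" appears first in the data.
x <- c("b", "a", "a", "b", "c")
first_mode <- Mode(x)    # "b": the first-appearing of the tied values
all_modes  <- Modes(x)   # "b" "a": every value with the top frequency
top_count  <- max(tabulate(match(x, unique(x))))  # 2: how often a mode occurs
```

Both functions return values of the same type as the input, which is why the character vector comes back as character.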
Catechumen answered 18/11, 2011 at 21:33 Comment(10)
Also works for logicals! Preserves data type for all types of vectors (unlike some implementations in other answers).Purpure
This does not return all the modes in case of multi-modal dataset (e.g. c(1,1,2,2)). You should change your last line with : tab <- tabulate(match(x, ux)); ux[tab == max(tab)]Psychomotor
How would I modify this to return the number of times the modal value occurs? Eg for c(1,1,1,2,2) it would return 3.Hypnology
@Hypnology For that, you would replace ux[which.max(tabulate(match(x, ux)))] with just max(tabulate(match(x, ux))).Catechumen
You note that Mode(1:3) gives 1 and Mode(3:1) gives 3, so Mode returns the most frequent element or the first one if all of them are unique.Hybris
It doesn't seem to work in this example: a <- rnorm( 5000, 30, 2 ) b <- rnorm( 1000, 35, 2 ) c <- rnorm( 200, 37, 2 ) temperatureºC <- c( a, b, c ) hist(temperatureºC) #mean abline(v=mean(temperatureºC),col="red",lwd=2) #median abline(v=median(temperatureºC),col="black",lwd=2) #mode Mode <- function(x) { ux <- unique(x) ux[which.max(tabulate(match(x, ux)))] } abline(v=Mode(temperatureºC),col="orange",lwd=2)Slattern
Great way. But that function completely ignores missing values! So if you have missing values, scroll down for @jprockbelly's answer.Tomfool
@KenWilliams really useful, I just used your function in response to this SO question: https://mcmap.net/q/75289/-selecting-unique-rows-in-r/…Shlomo
As Enrique said: This fails when there is no mode, and instead gives you the impression that the first value is the mode. Would have been far better if it returned 0 or NA in those cases.Spodumene
This code here breaks the ties with NA: #56553209Murther
S
76

Found this on the R mailing list; hope it's helpful. It is also what I was thinking anyway. You'll want to table() the data, sort and then pick the first name. It's hackish but should work.

names(sort(-table(x)))[1]
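
A short sketch of what the one-liner returns, using a made-up sample vector. Note that the result is a character string (the table name), so a conversion is needed to get a number back:

```r
x <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)

# Negating the counts makes the largest count sort first.
m_chr <- names(sort(-table(x)))[1]
print(m_chr)              # "19" (character, because it comes from names())

# For numeric input, convert back explicitly.
m_num <- as.numeric(m_chr)
print(m_num)              # 19
```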
Sezen answered 30/3, 2010 at 18:19 Comment(4)
That's a clever workaround as well. It has a few drawbacks: the sort algorithm can be more space and time consuming than max() based approaches (=> to be avoided for bigger sample lists). Also the output is of mode (pardon the pun/ambiguity) "character" not "numeric". And, of course, the need to test for multi-modal distribution would typically require the storing of the sorted table to avoid crunching it anew.Henry
I measured running time with a factor of 1e6 elements and this solution was faster than the accepted answer by almost factor 3!Chinkiang
I just converted it into number using as.numeric(). Works perfectly fine. Thank you!Khalsa
The problem with this solution is that it is not correct in cases where there is more than one mode.Chinkiang
R
74

There is a package, modeest, which provides estimators of the mode of univariate unimodal (and sometimes multimodal) data and values of the modes of usual probability distributions.

mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)

library(modeest)
mlv(mySamples, method = "mfv")

Mode (most likely value): 19 
Bickel's modal skewness: -0.1 
Call: mlv.default(x = mySamples, method = "mfv")

For more information see this page

You may also look for "mode estimation" in CRAN Task View: Probability Distributions. Two new packages have been proposed.

Rudman answered 30/3, 2010 at 19:5 Comment(5)
So to just get the mode value, mfv(mySamples)[1]. The 1 being important as it actually returns the most frequent values.Cockle
it does not seem to work in this example: library(modeest) a <- rnorm( 50, 30, 2 ) b <- rnorm( 100, 35, 2 ) c <- rnorm( 20, 37, 2 ) temperatureºC <- c( a, b, c ) hist(temperatureºC) #mean abline(v=mean(temperatureºC),col="red",lwd=2) #median abline(v=median(temperatureºC),col="black",lwd=2) #mode abline(v=mlv(temperatureºC, method = "mfv")[1],col="orange",lwd=2)Slattern
@atomicules: with [1] you get only the first mode. For bimodal or general n-modal distribution you would need just mfv(mySamples)Blockhead
For R version 3.6.0, it says function 'could not find function "mlv"' and the same error when I tried mfv(mysamples). Is it deprecated?Uralaltaic
@DrNishaArora: Did you download the 'modeest' package?Blockhead
C
63

I found Ken Williams' post above to be great; I added a few lines to account for NA values and made it a function for ease.

Mode <- function(x, na.rm = FALSE) {
  if(na.rm){
    x = x[!is.na(x)]
  }

  ux <- unique(x)
  return(ux[which.max(tabulate(match(x, ux)))])
}
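
To show what the na.rm switch changes, here is a small sketch with the function repeated so it runs on its own. The test vector is invented for illustration:

```r
Mode <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# NA is the most frequent entry here (three times).
x <- c(NA, NA, NA, 3, 3, 7)
with_na    <- Mode(x)                # NA: missing values are counted like any other value
without_na <- Mode(x, na.rm = TRUE)  # 3: missing values are dropped first
```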
Chirr answered 3/9, 2014 at 3:21 Comment(1)
I've found a couple of speed ups to this, see answer below.Stoneblind
C
45

A quick and dirty way of estimating the mode of a vector of numbers you believe come from a continuous univariate distribution (e.g. a normal distribution) is defining and using the following function:

estimate_mode <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}

Then to get the mode estimate:

x <- c(5.8, 5.6, 6.2, 4.1, 4.9, 2.4, 3.9, 1.8, 5.7, 3.2)
estimate_mode(x)
## 5.439788
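
As a sanity check, one can try the estimator on simulated data whose true mode is known. This is only a sketch: the seed, sample size and distribution parameters below are arbitrary choices, and the estimate depends on the default bandwidth.

```r
estimate_mode <- function(x) {
  d <- density(x)         # kernel density estimate on a grid
  d$x[which.max(d$y)]     # grid point where the estimated density peaks
}

set.seed(42)
sim <- rnorm(10000, mean = 5, sd = 1)  # true mode of this distribution is 5
est <- estimate_mode(sim)
print(est)                             # should land near 5, not exactly on it
```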
Candicecandid answered 14/12, 2012 at 8:0 Comment(5)
Just a note on this one: you can get a "mode" of any group of continuous numbers this way. The data don't need to come from a normal distribution to work. Here is an example taking numbers from a uniform distribution. set.seed(1); a<-runif(100); mode<-density(a)$x[which.max(density(a)$y)]; abline(v=mode)Halflife
error in density.default(x, from = from, to = to) : need at least 2 points to select a bandwidth automaticallyAddiel
@xhie That error message tells you everything you need to know. If you just have one point you need to set the bandwidth manually when calling density. However, if you just have one datapoint then the value of that datapoint will probably be your best guess for the mode anyway...Maryannmaryanna
You are right, but I added just one tweak: estimate_mode <- function(x) { if (length(x)>1){ d <- density(x); d$x[which.max(d$y)] }else{ x } } I'm testing the method to estimate the predominant wind direction, instead of the mean direction using a vectorial average with the circular package. I'm working with points over a polygon grade, so sometimes there is only one point with a direction. Thanks!Addiel
@xhie Sounds reasonable :)Maryannmaryanna
P
14

The following function comes in three forms:

method = "mode" [default]: calculates the mode for a unimodal vector, else returns an NA
method = "nmodes": calculates the number of modes in the vector
method = "modes": lists all the modes for a unimodal or polymodal vector

modeav <- function (x, method = "mode", na.rm = FALSE)
{
  x <- unlist(x)
  if (na.rm)
    x <- x[!is.na(x)]
  u <- unique(x)
  n <- length(u)
  #get frequencies of each of the unique values in the vector
  frequencies <- rep(0, n)
  for (i in seq_len(n)) {
    if (is.na(u[i])) {
      frequencies[i] <- sum(is.na(x))
    }
    else {
      frequencies[i] <- sum(x == u[i], na.rm = TRUE)
    }
  }
  #mode if a unimodal vector, else NA
  if (method == "mode" | is.na(method) | method == "")
  {return(ifelse(length(frequencies[frequencies==max(frequencies)])>1,NA,u[which.max(frequencies)]))}
  #number of modes
  if(method == "nmode" | method == "nmodes")
  {return(length(frequencies[frequencies==max(frequencies)]))}
  #list of all modes
  if (method == "modes" | method == "modevalues")
  {return(u[which(frequencies==max(frequencies), arr.ind = FALSE, useNames = FALSE)])}  
  #error trap the method
  warning("Warning: method not recognised.  Valid methods are 'mode' [default], 'nmodes' and 'modes'")
  return()
}
Persaud answered 25/3, 2013 at 17:21 Comment(5)
In your description of this function you swapped "modes" and "nmodes". See the code. Actually, "nmodes" returns a vector of values and "modes" returns the number of modes. Nevertheless your function is the very best solution to find modes I've seen so far.Middlebrooks
Many thanks for the comment. "nmode" and "modes" should now behave as expected.Persaud
Your function works almost, except when each value occurs equally often using method = 'modes'. Then the function returns all unique values, however actually there is no mode so it should return NA instead. I'll add another answer containing a slightly optimised version of your function, thanks for the inspiration!Reamer
The only time a non-empty numeric vector should normally generate an NA with this function is when using the default method on a polymodal vector. The mode of a simple sequence of numbers such as 1,2,3,4 is actually all of those numbers in the sequence, so for similar sequences "modes" is behaving as expected, e.g. modeav(c(1,2,3,4), method = "modes") returns [1] 1 2 3 4. Regardless of this, I'd be very interested to see the function optimised as it's fairly resource intensive in its current state.Persaud
For a more efficient version of this function, see @hugovdberg's post above :)Persaud
J
12

Here, another solution:

freq <- tapply(mySamples,mySamples,length)
#or freq <- table(mySamples)
as.numeric(names(freq)[which.max(freq)])
Jazminejazz answered 30/3, 2010 at 20:21 Comment(2)
You can replace the first line with table.Stage
I was thinking that 'tapply' is more efficient than 'table', but they both use a for loop. I think the solution with table is equivalent. I update the answer.Jazminejazz
R
11

Based on @Chris's function to calculate the mode or related metrics, however using Ken Williams's method to calculate frequencies. This one provides a fix for the case of no modes at all (all elements equally frequent), and some more readable method names.

Mode <- function(x, method = "one", na.rm = FALSE) {
  x <- unlist(x)
  if (na.rm) {
    x <- x[!is.na(x)]
  }

  # Get unique values
  ux <- unique(x)
  n <- length(ux)

  # Get frequencies of all unique values
  frequencies <- tabulate(match(x, ux))
  modes <- frequencies == max(frequencies)

  # Determine number of modes
  nmodes <- sum(modes)
  nmodes <- ifelse(nmodes==n, 0L, nmodes)

  if (method %in% c("one", "mode", "") | is.na(method)) {
    # Return NA if not exactly one mode, else return the mode
    if (nmodes != 1) {
      return(NA)
    } else {
      return(ux[which(modes)])
    }
  } else if (method %in% c("n", "nmodes")) {
    # Return the number of modes
    return(nmodes)
  } else if (method %in% c("all", "modes")) {
    # Return NA if no modes exist, else return all modes
    if (nmodes > 0) {
      return(ux[which(modes)])
    } else {
      return(NA)
    }
  }
  warning("Warning: method not recognised.  Valid methods are 'one'/'mode' [default], 'n'/'nmodes' and 'all'/'modes'")
}
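
The "no mode when everything is equally frequent" rule can be looked at in isolation. The helper below is a hypothetical sketch (count_modes is not part of the function above); it only mirrors the nmodes logic:

```r
# Hypothetical helper: counts modes, treating "every unique value is
# equally frequent" as zero modes, like the nmodes step above.
count_modes <- function(x) {
  ux <- unique(x)
  freq <- tabulate(match(x, ux))
  n_top <- sum(freq == max(freq))        # how many values share the top frequency
  if (n_top == length(ux)) 0L else n_top
}

one  <- count_modes(c(1, 1, 2, 3))       # 1: a single mode
two  <- count_modes(c(1, 1, 2, 2, 3))    # 2: two tied modes
none <- count_modes(c(1, 2, 3))          # 0: all values equally frequent
flat <- count_modes(c(5, 5, 5))          # 0: a constant vector also counts as "no mode"
```

The last case is worth knowing about: under this rule a vector with a single repeated value is reported as having no mode, which may or may not be what you want.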

Since it uses Ken's method to calculate frequencies, the performance is also optimised. Using AkselA's post, I benchmarked some of the previous answers to show that my function is close to Ken's in performance, with the conditionals for the various output options causing only minor overhead: [Figure: Comparison of Mode functions]

Reamer answered 29/6, 2016 at 11:5 Comment(8)
The code you present appears to be a more or less straight copy of the Mode function found in the pracma package. Care to explain?Javelin
Really? Apparently I'm not the only one to think this is a good way to calculate the Mode, but I honestly didn't know that (never knew that package before just now). I cleaned up Chris's function and improved on it by leveraging Ken's version, and if it resembles someone else's code that is purely coincidental.Reamer
I looked into it just now, but which version of the pracma package do you refer to? Version 1.9.3 has a completely different implementation as far as I can see.Reamer
Damn, I've been a giant booby. When I type pracma::Mode instead of just Mode I do indeed get a completely different code to yours. Apparently I haven't loaded a new workspace since I tested your function. :) Terribly sorry.Javelin
Nice amendment to the function. After some further reading, I'm led to the conclusion that there is no consensus on whether uniform or monofrequency distributions have modes, some sources saying that the list of modes is the distribution itself, others that there is no mode. The only agreement is that producing a list of modes for such distributions is neither very informative nor particularly meaningful. If you wish the above function to produce modes in such cases then remove the line: nmodes <- ifelse(nmodes==n, 0L, nmodes)Persaud
@greendiod sorry, I missed your comment. It is available through this gist: gist.github.com/Hugovdberg/0f00444d46efd99ed27bbe227bdc4d37Reamer
This is probably the most robust answer!Spodumene
I'd suggest using stop at the end not warning as the function doesn't have anything sensible to return.Decrypt
K
11

The generic function fmode in the collapse package now available on CRAN implements a C++ based mode based on index hashing. It is significantly faster than any of the above approaches. It comes with methods for vectors, matrices, data.frames and dplyr grouped tibbles. Syntax:

library(collapse)
fmode(x, g = NULL, w = NULL, ...)

where x can be one of the above objects, g supplies an optional grouping vector or list of grouping vectors (for grouped mode calculations, also performed in C++), and w (optionally) supplies a numeric weight vector. In the grouped tibble method, there is no g argument; you can do data %>% group_by(idvar) %>% fmode.

Kearse answered 19/3, 2020 at 21:45 Comment(0)
L
10

I can't vote yet but Rasmus Bååth's answer is what I was looking for. However, I would modify it a bit to allow constraining the distribution, for example to values only between 0 and 1.

estimate_mode <- function(x,from=min(x), to=max(x)) {
  d <- density(x, from=from, to=to)
  d$x[which.max(d$y)]
}
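
A minimal usage sketch, assuming data that live on the interval from 0 to 1. The Beta-distributed sample and the seed are arbitrary choices for illustration:

```r
estimate_mode <- function(x, from = min(x), to = max(x)) {
  d <- density(x, from = from, to = to)  # evaluate the density only on [from, to]
  d$x[which.max(d$y)]
}

set.seed(1)
p <- rbeta(1000, 2, 5)                   # values strictly between 0 and 1
est <- estimate_mode(p, from = 0, to = 1)
print(est)                               # always falls inside [0, 1]
```

As far as I can tell, from and to only restrict the grid on which the density is evaluated; they do not apply any boundary correction to the estimate itself.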

Be aware that you may not want to constrain your distribution at all; in that case, set from = -(a big number) and to = (a big number).

Leonialeonid answered 12/9, 2013 at 11:50 Comment(2)
error in density.default(x, from = from, to = to) : need at least 2 points to select a bandwidth automaticallyAddiel
x should be a vectorLeonialeonid
B
10

A small modification to Ken Williams' answer, adding optional params na.rm and return_multiple.

Unlike the answers relying on names(), this answer maintains the data type of x in the returned value(s).

stat_mode <- function(x, return_multiple = TRUE, na.rm = FALSE) {
  if(na.rm){
    x <- na.omit(x)
  }
  ux <- unique(x)
  freq <- tabulate(match(x, ux))
  mode_loc <- if(return_multiple) which(freq==max(freq)) else which.max(freq)
  return(ux[mode_loc])
}

To show it works with the optional params and maintains data type:

foo <- c(2L, 2L, 3L, 4L, 4L, 5L, NA, NA)
bar <- c('mouse','mouse','dog','cat','cat','bird',NA,NA)

str(stat_mode(foo)) # int [1:3] 2 4 NA
str(stat_mode(bar)) # chr [1:3] "mouse" "cat" NA
str(stat_mode(bar, na.rm=T)) # chr [1:2] "mouse" "cat"
str(stat_mode(bar, return_mult=F, na.rm=T)) # chr "mouse"

Thanks to @Frank for simplification.

Bezel answered 20/7, 2017 at 13:43 Comment(0)
P
7

I've written the following code in order to generate the mode.

MODE <- function(dataframe){
    DF <- as.data.frame(dataframe)

    MODE2 <- function(x){      
        if (is.numeric(x) == FALSE){
            df <- as.data.frame(table(x))  
            df <- df[order(df$Freq), ]         
            m <- max(df$Freq)        
            MODE1 <- as.vector(as.character(subset(df, Freq == m)[, 1]))

            if (sum(df$Freq)/length(df$Freq)==1){
                warning("No Mode: Frequency of all values is 1", call. = FALSE)
            }else{
                return(MODE1)
            }

        }else{ 
            df <- as.data.frame(table(x))  
            df <- df[order(df$Freq), ]         
            m <- max(df$Freq)        
            MODE1 <- as.vector(as.numeric(as.character(subset(df, Freq == m)[, 1])))

            if (sum(df$Freq)/length(df$Freq)==1){
                warning("No Mode: Frequency of all values is 1", call. = FALSE)
            }else{
                return(MODE1)
            }
        }
    }

    return(as.vector(lapply(DF, MODE2)))
}

Let's try it:

MODE(mtcars)
MODE(CO2)
MODE(ToothGrowth)
MODE(InsectSprays)
Politic answered 18/11, 2011 at 4:41 Comment(0)
D
6

This hack should work fine. Gives you the value as well as the count of mode:

Mode <- function(x){
  a = table(x) # x is a vector
  return(a[which.max(a)])
}
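
Since the result is a one-element table rather than a plain number, here is a sketch of how the value and the count might be pulled apart. The sample data and the variable names are only for illustration:

```r
Mode <- function(x) {
  a <- table(x)          # x is a vector
  a[which.max(a)]        # one-element table: name = value, entry = count
}

res <- Mode(c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19))
mode_value <- as.numeric(names(res))  # 19: the name carries the value (as text)
mode_count <- as.vector(res)          # 3: the entry itself is the count
```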
Damnatory answered 13/9, 2016 at 7:1 Comment(0)
S
5

This builds on jprockbelly's answer, by adding a speed up for very short vectors. This is useful when applying mode to a data.frame or datatable with lots of small groups:

Mode <- function(x) {
   if ( length(x) <= 2 ) return(x[1])
   if ( anyNA(x) ) x = x[!is.na(x)]
   ux <- unique(x)
   ux[which.max(tabulate(match(x, ux)))]
}
Stoneblind answered 13/11, 2018 at 22:56 Comment(0)
M
4

This works pretty well:

> a<-c(1,1,2,2,3,3,4,4,5)
> names(table(a))[table(a)==max(table(a))]
Minuteman answered 7/2, 2014 at 4:16 Comment(0)
H
3

R has so many add-on packages that some of them may well provide the [statistical] mode of a numeric list/series/vector.

However the standard library of R itself doesn't seem to have such a built-in method! One way to work around this is to use some construct like the following (and to turn this to a function if you use often...):

mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
tabSmpl<-tabulate(mySamples)
SmplMode<-which(tabSmpl== max(tabSmpl))
if(sum(tabSmpl == max(tabSmpl))>1) SmplMode<-NA
> SmplMode
[1] 19

For bigger sample list, one should consider using a temporary variable for the max(tabSmpl) value (I don't know that R would automatically optimize this)
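
Turned into a function with the temporary variable, the construct above might look like the sketch below. The name sample_mode is made up, and the approach only makes sense for positive integer values, because tabulate() counts into bins 1, 2, 3, ...:

```r
# Hypothetical wrapper around the tabulate() construct.
# Assumes x holds positive integers (tabulate() uses the values as bin indices).
sample_mode <- function(x) {
  tab <- tabulate(x)
  top <- max(tab)                 # computed once and reused
  idx <- which(tab == top)        # bin index equals the value itself
  if (length(idx) > 1) NA_integer_ else idx
}

single <- sample_mode(c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19))  # 19
tied   <- sample_mode(c(1, 1, 2, 2))                           # NA: more than one mode
```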

Reference: see "How about median and mode?" in this KickStarting R lesson
This seems to confirm that (at least as of the writing of this lesson) there isn't a mode function in R (well... mode() as you found out is used for asserting the type of variables).

Henry answered 30/3, 2010 at 18:25 Comment(0)
B
3

Here is a function to find the mode:

mode <- function(x) {
  unique_val <- unique(x)
  counts <- vector()
  for (i in 1:length(unique_val)) {
    counts[i] <- length(which(x==unique_val[i]))
  }
  position <- c(which(counts==max(counts)))
  if (mean(counts)==max(counts)) 
    mode_x <- 'Mode does not exist'
  else 
    mode_x <- unique_val[position]
  return(mode_x)
}
Briannebriano answered 6/9, 2015 at 9:9 Comment(0)
L
3

Below is the code which can be used to find the mode of a vector variable in R.

a <- table([vector])

names(a[a==max(a)])
Lolalolande answered 21/2, 2017 at 10:58 Comment(0)
H
3

There are multiple solutions provided for this one. I checked the first one and after that wrote my own. Posting it here if it helps anyone:

Mode <- function(x){
  y <- data.frame(table(x))
  y[y$Freq == max(y$Freq),1]
}

Let's test it with a few examples. I am taking the iris data set. Let's test with numeric data:

> Mode(iris$Sepal.Length)
[1] 5

which you can verify is correct.

Now the only non-numeric field in the iris dataset (Species) does not have a mode. Let's test with our own example:

> test <- c("red","red","green","blue","red")
> Mode(test)
[1] red

EDIT

As mentioned in the comments, user might want to preserve the input type. In which case the mode function can be modified to:

Mode <- function(x){
  y <- data.frame(table(x))
  z <- y[y$Freq == max(y$Freq),1]
  as(as.character(z),class(x))
}

The last line of the function simply coerces the final mode value to the type of the original input.

Haunch answered 24/4, 2018 at 12:43 Comment(1)
This returns a factor, while the user probably wants to preserve the type of the input. Maybe add a middle step y[,1] <- sort(unique(x))Mesolithic
S
2

Another simple option that gives all values ordered by frequency is to use rle:

df = as.data.frame(unclass(rle(sort(mySamples))))
df = df[order(-df$lengths),]
head(df)
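
A self-contained sketch of the same idea, with a made-up mySamples vector, showing how the top row gives the mode and its count:

```r
mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)

# rle() on the sorted data yields run lengths (counts) and the matching values.
df <- as.data.frame(unclass(rle(sort(mySamples))))
df <- df[order(-df$lengths), ]   # most frequent value first

top <- df[1, ]
print(top$values)                # 19
print(top$lengths)               # 3
```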
Seay answered 4/12, 2012 at 14:29 Comment(0)
C
2

I would use the density() function to identify a smoothed maximum of a (possibly continuous) distribution :

function(x) density(x, 2)$x[density(x, 2)$y == max(density(x, 2)$y)]

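where x is the data collection. Pay attention to the smoothing parameters of the density function: the second positional argument, set to 2 here, is the bandwidth bw, and the adjust parameter scales it; both regulate the smoothing.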
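
In density(), the second positional argument is bw (the bandwidth), while adjust is a separate multiplier, so it may be clearer to pass them by name. The sketch below uses a made-up wrapper, smoothed_mode, that computes the density once; the data are simulated for illustration:

```r
# Hypothetical wrapper: extra arguments are passed straight to density().
smoothed_mode <- function(x, ...) {
  d <- density(x, ...)
  d$x[which.max(d$y)]
}

set.seed(7)
x <- rnorm(500, mean = 10)
by_bw     <- smoothed_mode(x, bw = 2)      # fixed bandwidth of 2
by_adjust <- smoothed_mode(x, adjust = 2)  # default bandwidth, doubled
```

The two calls generally give different results, because a fixed bandwidth of 2 and a doubled default bandwidth are not the same amount of smoothing.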

Communard answered 2/5, 2014 at 10:3 Comment(0)
A
2

While I like Ken Williams' simple function, I would like to retrieve the multiple modes if they exist. With that in mind, I use the following function, which returns a list of the modes if there are several, or the single mode otherwise.

rmode <- function(x) {
  x <- sort(x)  
  u <- unique(x)
  y <- lapply(u, function(y) length(x[x==y]))
  u[which( unlist(y) == max(unlist(y)) )]
} 
Aggi answered 24/12, 2014 at 16:8 Comment(3)
It would be more consistent for programmatic use if it always returned a list -- of length 1 if there is only one modeGilbertegilbertian
That's a valid point @antoine-sac. What I like about this solution is the vector that is returned leaves the answers easily addressable. Simply address the output of the function: r <- mode( c(2, 2, 3, 3)) with the modes available at r[1] and r[2]. Still, you do make a good point!!Aggi
Precisely, this is where your solution falls short. If mode returns a list with several values, then r[1] is not the first value ; it is instead a list of length 1 containing the first value and you have to do r[[1]] to get the first mode as a numeric and not a list. Now when there is a single mode, your r is not a list so r[1] works, which is why I thought it was inconsistent. But since r[[1]] also works when r is a simple vector, there is actually a consistency i hadn't realised in that you can always use [[ to access elements.Gilbertegilbertian
J
2

I was looking through all these options and started to wonder about their relative features and performances, so I did some tests. In case anyone else is curious about the same, I'm sharing my results here.

Not wanting to bother with all the functions posted here, I chose to focus on a sample based on a few criteria: the function should work on character, factor, logical and numeric vectors, it should deal with NAs and other problematic values appropriately, and output should be 'sensible', i.e. no numerics as character or other such silliness.

I also added a function of my own, which is based on the same rle idea as chrispy's, except adapted for more general use:

library(magrittr)

Aksel <- function(x, freq=FALSE) {
    z <- 2
    if (freq) z <- 1:2
    run <- x %>% as.vector %>% sort %>% rle %>% unclass %>% data.frame
    colnames(run) <- c("freq", "value")
    run[which(run$freq==max(run$freq)), z] %>% as.vector   
}

set.seed(2)

F <- sample(c("yes", "no", "maybe", NA), 10, replace=TRUE) %>% factor
Aksel(F)

# [1] maybe yes  

C <- sample(c("Steve", "Jane", "Jonas", "Petra"), 20, replace=TRUE)
Aksel(C, freq=TRUE)

# freq value
#    7 Steve

I ended up running five functions, on two sets of test data, through microbenchmark. The function names refer to their respective authors:

[Figure: microbenchmark timings of the five functions on the two sets of test data]

Chris' function was set to method="modes" and na.rm=TRUE by default to make it more comparable, but other than that the functions were used as presented here by their authors.

In terms of speed alone, Ken's version wins handily, but it is also the only one of these that will only report one mode, no matter how many there really are. As is often the case, there's a trade-off between speed and versatility. In method="mode", Chris' version will return a value iff there is one mode, else NA. I think that's a nice touch. I also think it's interesting how some of the functions are affected by an increased number of unique values, while others aren't nearly as much. I haven't studied the code in detail to figure out why that is, apart from eliminating logical/numeric as the cause.

Javelin answered 27/5, 2016 at 2:49 Comment(1)
I like that you included code for the benchmarking, but benchmarking on 20 values is pretty pointless. I'd suggest running on at least a few hundred thousand records.Decrypt
P
2

The mode can't be useful in every situation, so the function should address this. Try the following function.

Mode <- function(v) {
  # checking unique numbers in the input
  uniqv <- unique(v)
  # frequency of the most frequent value in the input data
  m1 <- max(tabulate(match(v, uniqv)))
  n <- length(tabulate(match(v, uniqv)))
  # if all elements are same
  same_val_check <- all(diff(v) == 0)
  if(same_val_check == F){
    # frequency of the second most frequent value in the input data
    m2 <- sort(tabulate(match(v, uniqv)),partial=n-1)[n-1]
    if (m1 != m2) {
      # Returning the most repeated value
      mode <- uniqv[which.max(tabulate(match(v, uniqv)))]
    } else{
      mode <- "Two or more values have same frequency. So mode can't be calculated."
    }
  } else {
    # if all elements are same
    mode <- unique(v)
  }
  return(mode)
}

Output,

x1 <- c(1,2,3,3,3,4,5)
Mode(x1)
# [1] 3

x2 <- c(1,2,3,4,5)
Mode(x2)
# [1] "Two or more varibles have same frequency. So mode can't be calculated."

x3 <- c(1,1,2,3,3,4,5)
Mode(x3)
# [1] "Two or more values have same frequency. So mode can't be calculated."
Phosphoresce answered 5/9, 2018 at 10:9 Comment(2)
Sorry, I just don't see how this adds anything new to what has already been posted. In addition your output seem inconsistent with your function above.Spodumene
Returning strings with messages is not useful programmatically. Use stop() for an error with no result or use warning()/message() with an NA result if the inputs are not appropriate.Decrypt
J
2

If you are asking for a built-in function in R, maybe you can find it in the package pracma. Inside that package, there is a function called Mode.

Javanese answered 29/7, 2020 at 20:26 Comment(0)
A
1

Another possible solution:

Mode <- function(x) {
    if (is.numeric(x)) {
        x_table <- table(x)
        return(as.numeric(names(x_table)[which.max(x_table)]))
    }
}

Usage:

set.seed(100)
v <- sample(x = 1:100, size = 1000000, replace = TRUE)
system.time(Mode(v))

Output:

   user  system elapsed 
   0.32    0.00    0.31 
Animalist answered 16/12, 2015 at 2:45 Comment(0)
D
1

In case your observations are classes of real numbers and you expect the mode to be 2.5 when your observations are 2, 2, 3, and 3, then you could estimate the mode with mode = l1 + i * (f1 - f0) / (2*f1 - f0 - f2), where l1 is the lower limit of the most frequent class, f1 is the frequency of the most frequent class, f0 is the frequency of the class before the most frequent class, f2 is the frequency of the class after the most frequent class, and i is the class interval, as given e.g. in 1, 2, 3:

#Small Example
x <- c(2,2,3,3) #Observations
i <- 1          #Class interval

z <- hist(x, breaks = seq(min(x)-1.5*i, max(x)+1.5*i, i), plot=F) #Calculate frequency of classes
mf <- which.max(z$counts)   #index of most frequent class
zc <- z$counts
z$breaks[mf] + i * (zc[mf] - zc[mf-1]) / (2*zc[mf] - zc[mf-1] - zc[mf+1])  #gives you the mode of 2.5


#Larger Example
set.seed(0)
i <- 5          #Class interval
x <- round(rnorm(100,mean=100,sd=10)/i)*i #Observations

z <- hist(x, breaks = seq(min(x)-1.5*i, max(x)+1.5*i, i), plot=F)
mf <- which.max(z$counts)
zc <- z$counts
z$breaks[mf] + i * (zc[mf] - zc[mf-1]) / (2*zc[mf] - zc[mf-1] - zc[mf+1])  #gives you the mode of 99.5
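
The formula can also be checked by hand on the small example. The helper below is a hypothetical sketch (grouped_mode is not part of the code above) that takes the class limit and frequencies directly:

```r
# Hypothetical helper: mode of grouped data from the class frequencies.
# l1 = lower limit of the most frequent class, i = class interval,
# f0 / f1 / f2 = frequency of the class before / at / after the most frequent class.
grouped_mode <- function(l1, i, f0, f1, f2) {
  l1 + i * (f1 - f0) / (2 * f1 - f0 - f2)
}

# For the observations 2, 2, 3, 3 with i = 1, the classes (1.5, 2.5] and
# (2.5, 3.5] each hold two values; which.max() picks the first of them.
est <- grouped_mode(l1 = 1.5, i = 1, f0 = 0, f1 = 2, f2 = 2)
print(est)  # 2.5
```

Note that the denominator is zero when the most frequent class and both of its neighbours have the same frequency, so the formula gives no usable answer in that case.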

In case you want the most frequent level and you have more than one most frequent level you can get all of them e.g. with:

x <- c(2,2,3,5,5)
names(which(max(table(x))==table(x)))
#"2" "5"
Dactylology answered 26/3, 2019 at 11:46 Comment(0)
U
0

Could try the following function:

  1. transform the numeric values into a factor
  2. use summary() to get the frequency table
  3. return as mode the level(s) whose frequency is the largest
  4. transform the factor levels back to numeric

Even if there is more than one mode, this function works well!

mode <- function(x){
  y <- as.factor(x)
  freq <- summary(y)
  mode <- names(freq)[freq[names(freq)] == max(freq)]
  as.numeric(mode)
}
Unpleasantness answered 5/4, 2014 at 7:36 Comment(0)
W
0

The mode is mostly calculated for factor variables, in which case we can use

names(which.max(table(HouseVotes84$V1)))

HouseVotes84 is dataset available in 'mlbench' package.

It will give the label with the highest count. It is easier to use built-in functions directly than to write a function.

Wolfsbane answered 21/9, 2016 at 19:15 Comment(0)
D
0

Adding in raster::modal() as an option, although note that raster is a hefty package and may not be worth installing if you don't do geospatial work.

The source code could be pulled out of https://github.com/rspatial/raster/blob/master/src/modal.cpp and https://github.com/rspatial/raster/blob/master/R/modal.R into a personal R package, for those who are particularly keen.

Diuresis answered 15/11, 2019 at 6:58 Comment(0)
M
0

Here is my data.table solution that returns row-wise modes for a complete table. I use it to infer row class. It takes advantage of the new-ish set() function in data.table and should be pretty fast. It does not handle NA, though, but that could be added by looking at the numerous other solutions on this page.

majorityVote <- function(mat_classes) {
  #mat_classes = dt.pour.centroids_num
  dt.modes <- data.table(mode = integer(nrow(mat_classes)))
  for (i in 1:nrow(mat_classes)) {
    cur.row <- mat_classes[i]
    cur.mode <- which.max(table(t(cur.row)))
    set(dt.modes, i=i, j="mode", value = cur.mode)
  }

  return(dt.modes)
}

Possible usage:

newClass <- majorityVote(my.dt)  # just a new vector with all the modes
Morganne answered 8/2, 2021 at 14:22 Comment(0)
T
0

One quick way would be to use DescTools::Mode.

Twicetold answered 11/4 at 11:21 Comment(0)
M
-1

Sorry, I might be oversimplifying, but doesn't this do the job? (in 1.3 secs for 1E6 values on my machine):

t0 <- Sys.time()
summary(as.factor(round(rnorm(1e6), 2)))[1]
Sys.time()-t0

You just have to replace the "round(rnorm(1e6),2)" with your vector.

Martelle answered 10/4, 2013 at 14:33 Comment(1)
just look at summary.factor -- all this does is wrap the sort(table(...)) approach in other answers.Bezel
A
-1

You could also calculate the number of times an instance has happened in your set and find the max number. e.g.

> temp <- table(as.vector(x))
> names (temp)[temp==max(temp)]
[1] "1"
> as.data.frame(table(x))
r5050 Freq
1     0   13
2     1   15
3     2    6
> 
Alejandroalejo answered 3/12, 2013 at 19:16 Comment(0)
H
-1

It seems to me that if a collection has a mode, then its elements can be mapped one-to-one with the natural numbers. So, the problem of finding the mode reduces to producing such a mapping, finding the mode of the mapped values, then mapping back to some of the items in the collection. (Dealing with NA occurs at the mapping phase).

I have a histogram function that operates on a similar principle. (The special functions and operators used in the code presented herein should be defined in Shapiro and/or the neatOveRse. The portions of Shapiro and neatOveRse duplicated herein are so duplicated with permission; the duplicated snippets may be used under the terms of this site.) R pseudocode for histogram is

.histogram <- function (i)
        if (i %|% is.empty) integer() else
        vapply2(i %|% max %|% seqN, `==` %<=% i %O% sum)

histogram <- function(i) i %|% rmna %|% .histogram

(The special binary operators accomplish piping, currying, and composition) I also have a maxloc function, which is similar to which.max, but returns all the absolute maxima of a vector. R pseudocode for maxloc is

FUNloc <- function (FUN, x, na.rm=F)
        which(x == list(identity, rmna)[[na.rm %|% index.b]](x) %|% FUN)

maxloc <- FUNloc %<=% max

minloc <- FUNloc %<=% min # I'M THROWING IN minloc TO EXPLAIN WHY I MADE FUNloc

Then

imode <- histogram %O% maxloc

and

x %|% map %|% imode %|% unmap

will compute the mode of any collection, provided appropriate map-ping and unmap-ping functions are defined.

Hutchinson answered 30/10, 2019 at 23:47 Comment(0)
H
-3

An easy way to calculate MODE of a vector 'v' containing discrete values is:

names(sort(table(v)))[length(sort(table(v)))]
Hebner answered 27/8, 2016 at 7:54 Comment(0)
