Combining tapply and 'not in' logic, using R

How do I combine the tapply command with 'not in' logic?

Objective: Obtain the median sepal length for each species.

tapply(iris$Sepal.Length, iris$Species, median)

Constraint: Exclude entries whose petal width is 1.3 or 1.5.

!iris$Petal.Width %in% c('1.3', '1.5')
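A small sketch of the 'not in' test on its own. The %notin% operator below is a hypothetical helper defined here with base R's Negate(); it is not part of base R. Because Petal.Width is numeric, comparing against the numbers c(1.3, 1.5) is the more direct choice. The quoted version above still works because %in% (via match()) coerces both sides to a common type, character in this case.

```r
# Hypothetical helper: the negation of %in%, built with base R's Negate()
`%notin%` <- Negate(`%in%`)

widths <- c(0.2, 1.3, 1.4, 1.5, 2.0)

# Both forms give the same logical vector: TRUE where the width is NOT 1.3 or 1.5.
# Note that ! binds more loosely than %in%, so this is !(widths %in% ...)
keep1 <- !widths %in% c(1.3, 1.5)
keep2 <- widths %notin% c(1.3, 1.5)

print(keep1)  # TRUE FALSE TRUE FALSE TRUE
identical(keep1, keep2)

# The same filter applied to the iris data: one logical value per row
keep <- iris$Petal.Width %notin% c(1.3, 1.5)
sum(keep)  # number of rows that survive the filter
```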

Attempt:

tapply(iris$Sepal.Length, iris$Species, median[!iris$Petal.Width %in% c('1.3', '1.5')])

Result: the error message "object of type 'closure' is not subsettable".

---

My attempt here with the iris dataset is a stand-in demo for my own dataset. I have attempted the same approach with my own dataset and received the same error message. I imagine something is wrong with my syntax. What is it?

Unbodied answered 11/5, 2015 at 21:31
Comment (Uniformize): In median[!iris$Petal.Width %in% c('1.3', '1.5')] you are subsetting a function, which yields an error. You can't use [ ] on functions.
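The comment's point can be reproduced directly. In R, median is a function (a closure), and [ is not defined for functions, so indexing it fails before tapply() ever runs. A minimal sketch that captures the message with tryCatch(); the vectors x and keep are made-up illustration data:

```r
# median is an ordinary R function; typeof() reports it as a closure
typeof(median)

# Indexing a function with [ ] raises the error seen in the question
msg <- tryCatch(
  median[c(TRUE, FALSE)],
  error = function(e) conditionMessage(e)
)
print(msg)  # "object of type 'closure' is not subsettable"

# The fix is to subset the data, and pass the function itself unmodified
x <- c(5.1, 4.9, 6.3, 5.8)
keep <- c(TRUE, FALSE, TRUE, TRUE)
median(x[keep])  # 5.8
```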

Try

with(iris[!iris$Petal.Width %in% c('1.3', '1.5'),], tapply(Sepal.Length, Species, median))
# setosa versicolor  virginica 
#    5.0        5.8        6.5 

The idea here is to operate on the subsetted data from the start.
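The same "filter first" idea can also be written without building a subsetted data frame: compute one logical vector and use it to index both X and INDEX, so the two stay aligned row by row. This is a sketch in base R; keep is just a local name chosen here.

```r
# One logical filter, reused for X and INDEX so the rows stay aligned
keep <- !iris$Petal.Width %in% c(1.3, 1.5)

res_index <- tapply(iris$Sepal.Length[keep], iris$Species[keep], median)

# The with() + row-subset approach from the answer, for comparison
res_with <- with(iris[keep, ], tapply(Sepal.Length, Species, median))

print(res_index)
identical(res_index, res_with)
```

Subsetting a factor keeps all of its levels, so every species still appears in the result even if a filter happened to remove all of its rows (its entry would then be NA).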

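Your line didn't work because median[...] tries to index the function median itself. More fundamentally, FUN is applied to each group's slice of X (Sepal.Length in your case), not to the whole data set, so it has no access to Petal.Width to filter on.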
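To see what FUN actually receives, pass a function that reports on its argument. tapply() splits X by INDEX and calls FUN once per group on that group's values of X only, roughly like sapply(split(X, INDEX), FUN). A sketch comparing the two formulations:

```r
# FUN is called once per species, each time with that species' Sepal.Length values
sizes <- tapply(iris$Sepal.Length, iris$Species, length)
print(sizes)  # 50 values per species in iris

# A roughly equivalent formulation: split X by the grouping factor, then apply FUN
via_split <- sapply(split(iris$Sepal.Length, iris$Species), median)
via_tapply <- tapply(iris$Sepal.Length, iris$Species, median)

# tapply() returns a 1-d array, sapply() a named vector; compare the values
all.equal(as.vector(via_tapply), unname(via_split))
```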

Volatilize answered 11/5, 2015 at 21:33

This is a workaround you should not use:

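# Works, but is needlessly convoluted: X here is just the row indices
tapply(
  1:nrow(iris),
  iris$Species,
  function(i) median(iris$Sepal.Length[
    (1:nrow(iris) %in% i) &                     # rows belonging to this species
      !(iris$Petal.Width %in% c('1.3', '1.5'))  # rows passing the width filter
  ])
)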

Things get ugly if you subset after the vector has been split this way. You effectively have to

  • split it again, i.e. work out which rows belong to the current group (the 1:nrow(iris) %in% i part), and
  • recompute the whole filter once for each value of iris$Species (the !(iris$Petal.Width %in% c('1.3', '1.5')) part).

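If the filter really does have to happen inside each group (for example, because it depends on several columns), a tidier pattern than the workaround is to split the whole data frame by species and filter each piece. This is an alternative technique, not the one used in the answers; median_filtered is a hypothetical helper name.

```r
# Hypothetical helper: median Sepal.Length of one species' rows,
# ignoring rows whose Petal.Width is 1.3 or 1.5
median_filtered <- function(d) {
  keep <- !d$Petal.Width %in% c(1.3, 1.5)
  median(d$Sepal.Length[keep])
}

# split() on a data frame returns a named list of per-species data frames,
# so each row's group membership is computed only once
per_species <- split(iris, iris$Species)
res_split <- sapply(per_species, median_filtered)
print(res_split)

# Cross-check against filtering before grouping
keep_all <- !iris$Petal.Width %in% c(1.3, 1.5)
res_first <- tapply(iris$Sepal.Length[keep_all], iris$Species[keep_all], median)
all.equal(unname(res_split), as.vector(res_first))
```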
Enthalpy answered 11/5, 2015 at 21:46

© 2022 - 2024 — McMap. All rights reserved.