I just read the profile of @David Arenburg and found a bunch of useful tips on how to develop good R programming skills and habits. One especially struck me. I have always thought that the apply functions in R were the cornerstone of working with data frames, but he writes:
If you are working with data.frames, forget there is a function called apply - whatever you do - don't use it. Especially with a margin of 1 (the only good use case for this function is to operate over matrix columns - margin of 2).
Some good alternatives: ?do.call, ?pmax/pmin, ?max.col, ?rowSums/rowMeans/etc, the awesome matrixStats packages (for matrices), ?rowsum and many more
Could anybody explain this to me? Why are apply functions frowned upon?
`apply` - not the whole `*apply` family. The main issue with `apply` is that it converts the whole data to a matrix, which messes up the data (because a `matrix`, unlike a data frame, can't store different classes) and hence yields unexpected results. So when operating over columns, it is better to use the rest of the `*apply` family, such as `lapply` or `sapply`. On the other hand, because R is a vectorized language, `apply` with a margin of 1 will be very slow (regardless of the `matrix` issue), so I'm suggesting vectorized alternatives instead. – Edmead
`*apply` family. – Edmead
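To make the two points in the comment concrete, here is a minimal sketch. The data frames `df` and `num` are made-up toy data, not anything from the original post, and the snippet only illustrates the behaviour described above; it is not a benchmark.

```r
# Hypothetical toy data: one numeric and one character column.
df <- data.frame(x = c(1.5, 10, 2),
                 y = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# apply() coerces the data frame with as.matrix() first. A matrix holds a
# single type, so the numeric column is turned into character as well.
col_classes_apply  <- apply(df, 2, class)

# sapply() walks the data frame column by column (it is a list of columns),
# so each column keeps its own class.
col_classes_sapply <- sapply(df, class)

print(col_classes_apply)
print(col_classes_sapply)

# Hypothetical all-numeric data for the row-wise case.
num <- data.frame(a = c(1, 4, 7),
                  b = c(2, 5, 8),
                  c = c(3, 6, 9))

# Row-wise apply(): one R-level function call per row.
row_sums_apply <- apply(num, 1, sum)
row_max_apply  <- apply(num, 1, max)

# Vectorized alternatives mentioned in the quoted profile.
row_sums_vec <- rowSums(num)        # row sums in a single call
row_max_vec  <- do.call(pmax, num)  # element-wise max across the columns

print(row_sums_vec)
print(row_max_vec)
```

On data this small the speed difference is not noticeable. The point is only that `rowSums` and `do.call(pmax, ...)` produce the same values as the row-wise `apply` calls without looping over rows at the R level.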