When a factor has more levels than can sensibly be colored in a graph, I want to replace every level that is not in the 'top 10' with 'other'.
Alternate question: how do I reduce my factor levels to the number RColorBrewer can plot as separate colors?
For example, suppose I want to plot the number of home runs per decade from the baseball data:
library(ggplot2)
library(plyr)  # the baseball data set ships with plyr
qplot(data = baseball, 10 * year %/% 10, hr,
      stat = "identity", geom = "bar")
Perhaps I'd like to see which teams contributed to this:
qplot(data = baseball, 10 * year %/% 10, hr,
      fill = team,
      stat = "identity", geom = "bar")
This creates too many color levels! The colors are so similar you can't distinguish them, and there are so many they won't fit on the screen.
I'd really like to see the top X teams (say, 7) by total home run count, with all the rest lumped together in a single category/color called 'other'.
Let's imagine we have a function called hotfactor which knows how to do this:
hotfactor <- function(afactor, orderby, n) { ??? }
qplot(data = baseball, 10 * year %/% 10, hr,
      fill = hotfactor(factor(team), hr, n = 7),
      stat = "identity", geom = "bar") +
  scale_fill_brewer("team", palette = "Dark2")
So what can I use for 'hotfactor'?
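To make the request concrete, here is a rough base-R sketch of the behavior I have in mind. It is only an illustration of the spec, not a known solution: the body, the default n = 10, and the extra `other` argument are my own assumptions.

```r
# Hypothetical sketch of hotfactor(): keep the n levels of `afactor` with the
# largest total of `orderby`, and lump every other level into `other`.
hotfactor <- function(afactor, orderby, n = 10, other = "other") {
  afactor <- as.character(afactor)
  # Total of `orderby` within each level, ignoring missing values.
  totals <- tapply(orderby, afactor, sum, na.rm = TRUE)
  # Names of the n levels with the largest totals, largest first.
  top <- head(names(totals)[order(totals, decreasing = TRUE)], n)
  # Anything outside the top n is relabelled as `other`.
  lumped <- ifelse(afactor %in% top, afactor, other)
  # Order the levels by total, with `other` last, so the legend reads sensibly.
  factor(lumped, levels = c(top, other))
}
```

With n = 7 this would give eight levels (seven teams plus 'other'), which matches the eight colors available in the Dark2 palette. Note that the 'other' level is kept even when nothing ends up in it.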