I have a data.frame with several factors such as:
df<-data.frame(Var1=as.factor(sample(c("AB", "BC", "CD", "DE", "EF"), 1000, replace=TRUE)))
with
summary(df$Var1)
AB BC CD DE EF
209 195 178 221 197
I want to plot the frequency of the levels of each factor in the data.frame as follows:
library(ggplot2)
ggplot(df, aes(x=factor(1), fill=factor(Var1)))+
geom_bar(width=1, colour="black")+
coord_polar(theta="y")+
theme_void()
However, the levels are ordered alphabetically rather than by frequency. Using count from library(plyr), I can create a new data.frame that gives me the frequency of each level:
df_count <-count(df, "Var1")
Var1 freq
1 AB 209
2 BC 195
3 CD 178
4 DE 221
5 EF 197
Which I can then reorder using
df_count$Var1<-factor(df_count$Var1, levels=df_count$Var1[order(df_count$freq, decreasing=TRUE)])
which, when plotted, gives me what I want: the levels sorted by frequency.
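For reference, this is roughly how I plot the reordered counts. It is a minimal sketch: the set.seed call is only there to make the sample reproducible (so the counts will differ from the ones printed above), and I am assuming geom_bar(stat="identity") with y=freq because the frequencies are already computed in df_count.

```r
library(ggplot2)
library(plyr)

set.seed(1)  # assumption: added only for reproducibility
df <- data.frame(Var1 = as.factor(sample(c("AB", "BC", "CD", "DE", "EF"),
                                         1000, replace = TRUE)))

# One row per level with its frequency (columns: Var1, freq)
df_count <- count(df, "Var1")

# Reorder the factor levels by decreasing frequency
df_count$Var1 <- factor(df_count$Var1,
                        levels = df_count$Var1[order(df_count$freq,
                                                     decreasing = TRUE)])

# Counts are precomputed, so map them to y and use stat = "identity"
p <- ggplot(df_count, aes(x = factor(1), y = freq, fill = Var1)) +
  geom_bar(stat = "identity", width = 1, colour = "black") +
  coord_polar(theta = "y") +
  theme_void()
print(p)
```

The legend then lists the levels in the new order, most frequent first.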
1.) Is this the most elegant solution? It gives me an extra data.frame for each factor/column in my original data.frame, and I feel there must be a simpler way.
2.) When plotting, how can I rename the legend labels and ensure they are assigned to the right factor level? If I use
scale_fill_manual(labels=c("Name of AB", "Name of BC", "Name of CD", "Name of DE","Name of EF"))
the labels do not match the right levels. Here the first entry in the legend is the level "DE", because it has the highest frequency, but its label says "Name of AB", the first label given to scale_fill_manual. I could check the order of the labels manually each time, but is there an automatic way?