Customizing aesthetics of faceted barplot

Asked 9/6, 2011 at 18:44 Answered 9/6, 2011 at 19:43

I am trying to do some analysis of the recent MLB draft with some ggplots in R:

selection <- draft[c("Team","Division","Position")]
head(selection)

  Team   Division Position
1  pit NL Central        P
2  sea AL West           P
3  ari NL West           P
4  bal AL East           P
5  kc  AL Central        O
6  was NL East           I

where P = Pitcher, O = Outfield, etc.

I want to show the number of players selected by each team, by position, within each division:

p <- ggplot(data=selection, aes(x=Team, fill= Position))  + geom_bar(position="stack")
p <-  p + coord_flip()
p <- p+ ylab("Players Selected")
p <- p + facet_wrap(~Division)
p

This gets me part of the way there, but the result is very unattractive:

a) The groupings work, but all teams are shown in each division panel, even though only the 5 or 6 teams in each division actually (and correctly) show data.

b) With the coord_flip, the teams are listed in reverse alphabetical order down the page. Can I re-sort them? It would also be nice to have the labels left-justified.

c) How do I set the legend to show Pitching, Outfield, etc. rather than P and O? Is this a vector I somehow need to set and include?

d) It would also be interesting to see the proportion of each team's selections committed to each type of player. This is accomplished by setting position="fill". Can I set the axis to show % rather than 0 to 1? I also tried setting geom_vline(aes(xintercept=0.5)) (and yintercept, in case the flip factored in), but the line did not appear at the halfway mark along the x axis.

Help much appreciated

Cuttler answered 9/6, 2011 at 18:44 Comment(1)

If your goal is to just flip the factors, you can use reorder(Team, -as.numeric(Team)) in your ggplot2 aes call to x=. – Occidentalize 5/8, 2011 at 21:49
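The reorder() suggestion in the comment above can be checked on a toy factor before using it inside aes(). This is only a sketch: the team codes below are illustrative stand-ins for the Team column, not the real data.

```r
# Toy factor standing in for the Team column (values are illustrative only)
team <- factor(c("ari", "bal", "kc", "pit", "sea", "was"))

# reorder() sorts the levels by the supplied score; negating the integer
# codes therefore reverses the existing level order
flipped <- reorder(team, -as.numeric(team))

levels(team)     # original (alphabetical) order
levels(flipped)  # same levels, reversed
```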

edit: complete revamping, including info from other answer, after grabbing the data (and storing it in a text file called mlbtmp.txt) and some more experimentation:

selection <- read.table("mlbtmp.txt",skip=1,stringsAsFactors=TRUE)  ## factors needed for levels() below (no longer the default as of R 4.0)
names(selection) <- c("row","League","Division","Position","Team")
## arrange order/recode factors
selection <- transform(selection,
                       Team=factor(Team,levels=rev(levels(Team))),
                       Position=factor(Position,levels=c("P","I","C","O"),
                                       labels=c("Pitching","Infield",
                                                "Center","Outfield")))
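To see what the recoding does, here is a minimal sketch on a made-up data frame (the `toy` rows are hypothetical, not the draft data). It assumes the columns may arrive as plain character vectors, which is the read.table default in R 4.0 and later, so the Team levels are derived with sort(unique(...)) rather than levels(). The "C" code is labelled "Catcher" here, following the asker's correction in the comments further down.

```r
# Made-up rows standing in for the draft data
toy <- data.frame(Team = c("pit", "sea", "ari", "pit"),
                  Position = c("P", "O", "P", "I"),
                  stringsAsFactors = FALSE)

# Reverse-alphabetical levels, so that coord_flip() lists teams A-to-Z downwards
toy$Team <- factor(toy$Team, levels = rev(sort(unique(toy$Team))))

# levels= fixes the order of the codes, labels= supplies the legend text
toy$Position <- factor(toy$Position,
                       levels = c("P", "I", "C", "O"),
                       labels = c("Pitching", "Infield", "Catcher", "Outfield"))

levels(toy$Team)
table(toy$Position)
```

Because the legend text comes from the factor labels, no extra scale call is needed to answer part (c) of the question.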

I played around with various permutations of facet_grid, facet_wrap, scales, coord_flip, etc. Some worked as expected, some didn't:

library(ggplot2)
p <- ggplot(data=selection, aes(x=Team, fill= Position))  +
  geom_bar(position="stack")
p + facet_grid(.~Division,scales="free_x") + coord_flip()  ## OK

## seems to fail with either "free_x" or "free_y"
p + facet_grid(Division~.,scales="free") + coord_flip()

## works but does not preserve 'count' axis:
p + facet_wrap(~Division,scales="free")

I ended up with facet_wrap(...,scales="free") and used ylim to constrain the axes.

p + facet_wrap(~Division,scales="free") + coord_flip() +
  ylim(0,60) + opts(axis.text.y=theme_text(hjust=0))

[image: mlb1]
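For anyone running this on a current ggplot2: opts() and theme_text() have since been removed, and theme() with element_text() are their replacements. The following is a rough sketch of the same counts plot in the newer syntax, using a small made-up data frame (`toy` and `p_counts` are names introduced here, and the rows are not the real draft data).

```r
library(ggplot2)

# Made-up stand-in for the draft data
toy <- data.frame(
  Team     = c("pit", "pit", "sea", "sea", "ari", "bal"),
  Division = c("NL Central", "NL Central", "AL West", "AL West",
               "NL West", "AL East"),
  Position = c("P", "O", "P", "P", "I", "O")
)

p_counts <- ggplot(toy, aes(x = Team, fill = Position)) +
  geom_bar(position = "stack") +
  facet_wrap(~Division, scales = "free") +      # each panel keeps only its own teams
  coord_flip() +
  ylim(0, 60) +                                 # common count axis across panels
  ylab("Players Selected") +
  theme(axis.text.y = element_text(hjust = 0))  # replaces opts()/theme_text()

p_counts
```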

In principle there might be a way to use ..density.., ..ncount.., ..ndensity.., or one of the other statistics computed by stat_bin instead of the default ..count.., but I couldn't find a combination that worked.

Instead (as is often the best solution when stuck with ggplot's on-the-fly transformations) I reshaped the data myself:

library(plyr)     ## ddply()
library(reshape)  ## melt(), rename()
## pull out Team identification within Division and League
stab <- unique(subset(selection,select=c(Team,Division,League)))
## compute proportions by team
s2 <- melt(ddply(selection,"Team",function(x) with(x,table(Position)/nrow(x))))
## fix names
s2 <- rename(s2,c(variable="Position",value="proportion"))
## merge Division/League info back to summarized data
s3 <- merge(s2,stab)
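If plyr and reshape are not available, the per-team proportions can also be computed in base R. This is a sketch of the same idea rather than a drop-in replacement: the data frame is made up, and `s2_toy` is a hypothetical name mirroring s2 above.

```r
# Made-up picks: four for one team, two for another
toy <- data.frame(
  Team     = c("pit", "pit", "pit", "pit", "sea", "sea"),
  Position = c("P", "P", "P", "O", "P", "I")
)

# Cross-tabulate, then divide each row (team) by its row total
counts <- table(toy$Team, toy$Position, dnn = c("Team", "Position"))
props  <- prop.table(counts, margin = 1)

# Long format: one row per Team/Position pair, like s2 above
s2_toy <- as.data.frame(props, responseName = "proportion")
s2_toy
```

The Division and League columns can then be merged back on Team exactly as in the merge(s2, stab) step.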

p2 <- ggplot(data=s3, aes(x=Team, fill= Position,y=proportion))  +
  geom_bar(position="stack")+scale_y_continuous(formatter="percent")+
  geom_hline(yintercept=0.5,linetype=3)+ facet_wrap(~Division,scales="free") +
  opts(axis.text.y=theme_text(hjust=0))+coord_flip()

[image: mlb2]
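A note for current ggplot2 versions, offered as a sketch on made-up numbers: the formatter= argument has been replaced by labels=, and bars with a precomputed y value are drawn with geom_col() (or geom_bar(stat="identity")). The names `toy_props` and `p_share` are introduced here for illustration. It also shows why geom_hline() is the right call for part (d): after coord_flip() the proportion is still the y aesthetic, so the 50% reference line is set with yintercept.

```r
library(ggplot2)

# Made-up proportions, already summarised per team and position
toy_props <- data.frame(
  Team       = c("pit", "pit", "sea", "sea"),
  Division   = c("NL Central", "NL Central", "AL West", "AL West"),
  Position   = c("Pitching", "Outfield", "Pitching", "Infield"),
  proportion = c(0.75, 0.25, 0.5, 0.5)
)

p_share <- ggplot(toy_props, aes(x = Team, y = proportion, fill = Position)) +
  geom_col(position = "stack") +                  # y is supplied, so no counting
  scale_y_continuous(labels = scales::percent) +  # axis shows % instead of 0 to 1
  geom_hline(yintercept = 0.5, linetype = 3) +    # halfway mark on the flipped axis
  facet_wrap(~Division, scales = "free") +
  coord_flip()

p_share
```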

There's obviously a little more prettying-up that could be done here, but this should get you most of the way there ...

Elasmobranch answered 9/6, 2011 at 18:58 Comment(9)

a) Shows correct data but the Team axis just shows the five teams for the first Division – Cuttler 9/6, 2011 at 19:57

If you include a larger portion of your data using dput() it will be easier for us to help you. – Kyanite 9/6, 2011 at 19:59

Hmm. Can you post a possibly-small-but-sufficiently-complete subset of data (i.e., a reproducible example) either by using dput or by putting the data on the web somewhere and posting a URL? (I missed joran's comment which is basically identical) – Elasmobranch 9/6, 2011 at 20:1

Not familiar with dput() but will try with assistance. Otherwise I can probably post some data on web – Cuttler 10/6, 2011 at 0:19

try this link for the data – Cuttler 10/6, 2011 at 0:39

@pssguy, dput makes things nice and easy for us. dput(mydata) will make the console spit out something like structure(list(...etc...)). It spells out the structure of your data as well as the contents. If you copy and paste the output of dput(selection) into your question, we can then paste it into our console to get your dataframe. If the output is long, then stackexchange will put it in a scrollable text box. – Yetac 10/6, 2011 at 23:29

(cont...) If your output is really long, consider taking a random sample of rows like selection[sample(nrow(selection), 10), ], or a subset like subset(selection, Team == 'sea') – Yetac 10/6, 2011 at 23:36
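As a rough illustration of the dput() workflow described in these comments, the sketch below uses the first four rows shown in the question; `small` and `rebuilt` are names made up for the example. Note that the sampling is done on row indices, since calling sample() directly on a data frame picks columns rather than rows.

```r
# First four rows from the head(selection) output in the question
toy <- data.frame(
  Team     = c("pit", "sea", "ari", "bal"),
  Division = c("NL Central", "AL West", "NL West", "AL East"),
  Position = c("P", "P", "P", "P")
)

# Take a random sample of rows by sampling row indices
set.seed(1)
small <- toy[sample(nrow(toy), 2), ]

# dput() prints a structure(list(...)) call that can be pasted into a post
dput(small)

# Evaluating that printed text rebuilds the data frame on the reader's side
rebuilt <- eval(parse(text = capture.output(dput(small))))
```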

Thanks very much, Ben, fantastic work. I really appreciate the time and effort you put into this, providing both total and proportion options. Just one point in case anyone is interested in the actual data: "Center" should actually be catchers selected. Center fielders are included in the outfielders category. – Cuttler 13/6, 2011 at 15:11

Oops. Oh well. (Don't know if I can be bothered to go back and fix it all.) – Elasmobranch 13/6, 2011 at 20:36

Filling in some gaps from @Ben Bolker's answer...

To order the teams differently, you'll need to store that column as a factor. There probably won't be a short, quick way to specify the order you want, since you most likely want to order the teams in each division separately. This means you'll need to order all teams such that each division subset remains properly ordered. Something like (this is schematic, not syntactically correct):

selection$Team <- factor(selection$Team,
    levels=c( (AL East teams in desired order),
              (AL Central teams in desired order), etc))

Depending on what other stuff you have calculated there may be a quick way to specify that, or you might have to write them out by hand.
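One possible shortcut, sketched on a made-up data frame: if alphabetical order within each division is acceptable, the levels can be built by sorting on Division and then Team instead of typing them out. The team codes and the `team_levels` name are illustrative assumptions, not taken from the real data.

```r
# Made-up team/division pairs
toy <- data.frame(
  Team     = c("sea", "pit", "oak", "chc"),
  Division = c("AL West", "NL Central", "AL West", "NL Central")
)

# Sort by division first, then alphabetically by team within each division
ord <- order(toy$Division, toy$Team)
team_levels <- unique(as.character(toy$Team[ord]))

# Teams of the same division are now adjacent in the level order
toy$Team <- factor(toy$Team, levels = team_levels)
levels(toy$Team)
```

With coord_flip() the first level is drawn at the bottom, so wrapping `team_levels` in rev() may be needed to read top to bottom.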

Axis text justification can be modified via

opts(axis.text.x=theme_text(hjust=1))

Stepping back a bit, notice that with ggplot2 the solution is often found by modifying your data that is used to build the plot, not the plot itself. It's a different way of thinking about things, but handy once you get used to it.

Kyanite answered 9/6, 2011 at 19:43 Comment(2)

looks good, but I don't think you actually need an ordered factor -- ggplot plots factors in the order of their levels, whether they are ordered or not ... (I'm not 100% sure of this, but a test would be fairly simple) – Elasmobranch 9/6, 2011 at 19:59

Thanks for your insight Joran, particularly about modifying data – Cuttler 13/6, 2011 at 15:12
