Plotting summary statistics

For the following data set,

Genre   Amount
Comedy  10
Drama   30
Comedy  20
Action  20
Comedy  20
Drama   20

I want to construct a ggplot2 line graph, where the x-axis is Genre and the y-axis is the sum of all amounts (conditional on the Genre).

I have tried the following:

p = ggplot(test, aes(factor(Genre), Amount)) + geom_point()
p = ggplot(test, aes(factor(Genre), Amount)) + geom_line()
p = ggplot(test, aes(factor(Genre), sum(Amount))) + geom_line()

but to no avail.

Mackmackay answered 7/3, 2011 at 8:53 Comment(0)

If you don't want to compute a new data frame before plotting, you can use stat_summary in ggplot2. For example, if your data set looks like this:

R> df <- data.frame(Genre=c("Comedy","Drama","Action","Comedy","Drama"),
R+                  Amount=c(10,30,40,10,20))
R> df
   Genre Amount
1 Comedy     10
2  Drama     30
3 Action     40
4 Comedy     10
5  Drama     20

You can either use qplot with a stat="summary" argument:

R> qplot(Genre, Amount, data=df, stat="summary", fun.y="sum")

Or add a stat_summary layer to a base ggplot graphic:

R> ggplot(df, aes(x=Genre, y=Amount)) + stat_summary(fun.y="sum", geom="point")
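
Since the question asks for a line graph, here is a minimal sketch of the same stat_summary idea drawn as a line. It assumes ggplot2 3.3.0 or later, where the `fun` argument replaces `fun.y`; on older versions keep `fun.y`. The `group = 1` mapping is an addition for illustration and is not part of the code above.

```r
library(ggplot2)

# Same toy data as above
df <- data.frame(Genre  = c("Comedy", "Drama", "Action", "Comedy", "Drama"),
                 Amount = c(10, 30, 40, 10, 20))

# fun = sum computes the per-Genre total (assumes ggplot2 >= 3.3.0).
# group = 1 puts all genres in one group so the summarised points can be
# joined by a single line; without it each discrete Genre is its own group.
p <- ggplot(df, aes(x = Genre, y = Amount, group = 1)) +
  stat_summary(fun = sum, geom = "line") +
  stat_summary(fun = sum, geom = "point")

print(p)
```

Genre is discrete, so the line connects the genres in factor-level order, which is alphabetical by default.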
Avernus answered 7/3, 2011 at 9:22 Comment(9)
Neat one-liner... though you can easily omit factor, since stringsAsFactors is the default behaviour. – Jijib
I think I'll leave the factor() call in because it is used in the question, but you're right, it is not useful here. Thanks for pointing it out. – Avernus
Thanks so much. The reason I was using factor was that I was trying to get the sums ordered from lower to higher, but it does not do that. – Mackmackay
Ok, so I finally removed it :) – Avernus
@juba, is there any way to order the bars according to the y value, which in this case is the sum? – Mackmackay
What does "from lower to higher" mean? Maybe you were referring to ordered factors? – Jijib
@Julio Diaz on the bar ordering: yes, see this SO question: https://mcmap.net/q/86749/-order-bars-in-ggplot2-bar-graph/429846 – Correlate
It seems you can use a reorder call in your aes definition, something like `aes(x=reorder(Genre, Amount, sum), y=Amount)`. But there may be a better and cleaner way to do it. – Avernus
Where is the full documentation for the stat_ methods? The ggplot book hardly touches on them, yet they are clearly powerful and useful. – Zwolle
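
The reorder() suggestion from the comments above can be sketched as follows. This is a hedged illustration rather than a definitive recipe: it reorders the factor levels before plotting instead of inside aes(), and it assumes ggplot2 3.3.0 or later for the `fun` argument.

```r
library(ggplot2)

df <- data.frame(Genre  = c("Comedy", "Drama", "Action", "Comedy", "Drama"),
                 Amount = c(10, 30, 40, 10, 20))

# reorder() sorts the Genre levels by the per-genre sum of Amount, ascending
df$Genre <- reorder(df$Genre, df$Amount, FUN = sum)
levels(df$Genre)
# "Comedy" "Action" "Drama"   (sums 20, 40, 50)

# Bars now appear from the lowest to the highest total
p <- ggplot(df, aes(x = Genre, y = Amount)) +
  stat_summary(fun = sum, geom = "bar")

print(p)
```

Passing `-df$Amount` as the second argument to reorder() would give the descending order instead.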

Try something like this:

dtf <- structure(list(Genre = structure(c(2L, 3L, 2L, 1L, 2L, 3L), .Label = c("Action", 
"Comedy", "Drama"), class = "factor"), Amount = c(10, 30, 20, 
20, 20, 20)), .Names = c("Genre", "Amount"), row.names = c(NA, 
-6L), class = "data.frame")

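library(reshape)
library(ggplot2)
# Melt to long format: Genre is the id variable, Amount the measured value
mdtf <- melt(dtf, id.vars = "Genre")
# Sum the values within each Genre; the result column is named `(all)`
cdtf <- cast(mdtf, Genre ~ . , sum)
# stat = "identity" plots the precomputed sums instead of counting rows
ggplot(cdtf, aes(Genre, `(all)`)) + geom_bar(stat = "identity")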
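
The same pre-aggregation can also be done without the reshape package. The sketch below swaps in base R's aggregate(), which is not the method used above, and draws the totals as a line, as the question asks. The `totals` name is only illustrative.

```r
library(ggplot2)

# The data from the question
dtf <- data.frame(Genre  = c("Comedy", "Drama", "Comedy", "Action", "Comedy", "Drama"),
                  Amount = c(10, 30, 20, 20, 20, 20))

# One row per Genre, with Amount summed within each Genre
totals <- aggregate(Amount ~ Genre, data = dtf, FUN = sum)
totals

# group = 1 lets geom_line() connect the discrete Genre values
p <- ggplot(totals, aes(x = Genre, y = Amount, group = 1)) +
  geom_line() +
  geom_point()

print(p)
```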
Jijib answered 7/3, 2011 at 9:18 Comment(6)
Did you automatically generate your structure() call from the example provided in the question? If so, I'd be very happy to know how :-) – Avernus
No, I entered it by hand, then applied dput to it. – Jijib
But you can use the read.clipboard function from the psych package. It works like a charm: dtf <- read.clipboard(). Thanks for reminding me 'bout that. – Jijib
Ah yes, great. For me, selecting the data and running read.table("clipboard",header=TRUE) does the trick. Thanks! – Avernus
You can also use ?textConnection in conjunction with read.table. There have been some examples of that here on SO, e.g. https://mcmap.net/q/2032347/-r-list-row-name – Knotted
See text_to_table, here: #3936785 – Lucite
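
The textConnection approach mentioned in the comments can be sketched like this, using the data from the question pasted in as a string. The variable names `txt`, `con` and `test` are only illustrative.

```r
# The question's data as a literal string
txt <- "Genre   Amount
Comedy  10
Drama   30
Comedy  20
Action  20
Comedy  20
Drama   20"

# textConnection() makes the string readable like a file
con  <- textConnection(txt)
test <- read.table(con, header = TRUE)
close(con)

str(test)
```

In recent versions of R, `read.table(text = txt, header = TRUE)` does the same thing without managing the connection by hand.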
