Binning data, finding results by group, and plotting using R

The pre-installed quakes dataset has 5 variables and 1000 observations.

The simple graph I'm trying to create should show the average earthquake magnitude by earthquake depth category (i.e. Y-axis = magnitude, X-axis = depth category).

In this dataset, the earthquake depth variables range from 40 to 680. I would like to turn the 1000 observations of earthquake depth into 8 categories, e.g. 40 - 120, 121 - 200, ... 600 - 680. Then, I'd like to take the average earthquake magnitude by depth category and plot it on a line chart.
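A minimal base-R sketch of those eight categories, assuming equal 80-wide classes and labels in the "low-high" style used above (depth.cat and labs are just illustrative names). Because cut() closes intervals on the right, the second class is (120, 200], which for whole-number depths matches 121 - 200:

```r
# Assumed: 8 equal-width classes from 40 to 680
breaks <- seq(40, 680, by = 80)   # 9 break points -> 8 intervals
labs <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")  # "40-120", ..., "600-680"
# include.lowest = TRUE keeps the minimum depth (40) in the first class
depth.cat <- cut(quakes$depth, breaks = breaks, labels = labs,
                 include.lowest = TRUE)
table(depth.cat)  # number of earthquakes in each category
```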

I appreciate any help with this. Thanks!

Literalminded asked 1/3, 2011 at 22:13
I might suggest you modify the question title to be more general in nature. This question is really about binning data, finding results by group, and plotting. The quakes dataset just happens to be useful to illustrate those concepts. Others with similar questions will appreciate being able to find the question as well. (Urn)
This title is much better; it was unclear whether the question was really about "quakes" or just about the kinds of summaries and plots you can do. As others have pointed out, a line plot is really not appropriate here (though the actual kind of data may not be important). (Arlyne)

First, classify the depths into classes with cut():

depth.class <- cut(quakes$depth, seq(40, 680, by = 80), include.lowest = TRUE)  # 8 classes of width 80

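(Adjust the breaks to suit exactly what you are after, and note cut()'s behaviour: by default its intervals are closed on the right, (a, b], so include.lowest = TRUE is needed to keep the minimum depth of 40 in the first class.)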
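To see why include.lowest matters here, a small check on quakes, assuming the 80-wide breaks from the question (no.lowest and has.lowest are illustrative names). Without it, every quake at exactly the minimum depth of 40 falls outside all classes and becomes NA:

```r
breaks <- seq(40, 680, by = 80)
# Default intervals are (a, b], so a depth of exactly 40 is in no class
no.lowest <- cut(quakes$depth, breaks)
# include.lowest = TRUE closes the first interval: [40,120]
has.lowest <- cut(quakes$depth, breaks, include.lowest = TRUE)
sum(is.na(no.lowest))    # rows lost to NA
sum(quakes$depth == 40)  # the same rows: those at depth 40
sum(is.na(has.lowest))   # none lost
levels(has.lowest)[1]
```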

Find the mean magnitude within each depth.class (assumes no NAs):

mean.mag <- tapply(quakes$mag, depth.class, mean)

(For data sets with missing values, pass na.rm through tapply() to mean() where appropriate, e.g. mean.mag <- tapply(quakes$mag, depth.class, mean, na.rm = TRUE).)
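quakes itself has no missing values, so this sketch blanks out two magnitudes on a copy (a hypothetical setup, purely for illustration) to show what na.rm changes. Any extra arguments after FUN in tapply() are handed on to that function:

```r
depth.class <- cut(quakes$depth, seq(40, 680, by = 80), include.lowest = TRUE)
mag <- quakes$mag
mag[1:2] <- NA  # hypothetical: simulate two missing values

# Without na.rm, any class containing an NA gets an NA mean
m.plain <- tapply(mag, depth.class, mean)
# na.rm = TRUE is passed through tapply() to mean()
m.narm <- tapply(mag, depth.class, mean, na.rm = TRUE)
m.plain
m.narm
```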

Plot as a line:

plot(mean.mag, type = "l", xlab = "depth class", ylab = "mean magnitude")

It's a little extra work to put the class labels on the X-axis, but at that point you might question whether a line plot is really appropriate here.

A quick stab: turn off the axes, then label the X-axis directly with the levels of the cut() factor:

plot(mean.mag, type = "l", xlab = "depth class", ylab = "mean magnitude", axes = FALSE)
axis(1, 1:nlevels(depth.class), levels(depth.class))
axis(2)
box()
Arlyne answered 1/3, 2011 at 22:29
Thanks, mdsumner. What if there were NAs? (Literalminded)
I added an edit for that a few minutes ago: basically, you pass extra arguments through tapply() to FUN, e.g. tapply(x, fac, mean, na.rm = TRUE). (Arlyne)
I am using your solution in some code and trying to make a slight modification to the way the data is graphed. This is causing me headaches. Care to take a look? https://mcmap.net/q/271349/-r-graphing-binned-data (Offhand)

A line plot is not useful here: what relationship in the data would the lines between the points represent? A dot chart might be more useful instead:

cats <- with(quakes, cut(depth, breaks = seq(40L, max(depth), by = 80), 
                         include.lowest = TRUE))
dat <- aggregate(mag ~ cats, data = quakes, FUN = mean)
with(dat, dotchart(mag, groups = cats, xlab = "Mean Magnitude"))
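As a sanity check, this sketch confirms that the aggregate() formula call gives the same per-class means as the tapply() approach from the first answer (via.tapply is an illustrative name). aggregate() returns only non-empty classes, so any empty tapply() cells are dropped before comparing:

```r
cats <- with(quakes, cut(depth, breaks = seq(40L, max(depth), by = 80),
                         include.lowest = TRUE))
# Formula interface: one row per non-empty category, in level order
dat <- aggregate(mag ~ cats, data = quakes, FUN = mean)
# Same means via tapply(); empty categories come back as NA, so drop them
via.tapply <- tapply(quakes$mag, cats, mean)
via.tapply <- via.tapply[!is.na(via.tapply)]
all.equal(dat$mag, as.vector(via.tapply))
```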

This produces a dot chart of the mean magnitude in each depth class.

Olga answered 1/3, 2011 at 22:40

Are you sure you want a line plot? I'm not sure it is the most appropriate plot for this data. Regardless, the trick is to use cut() to bin the data, then use one of the many aggregation tools to find the average magnitude in each bin, and finally plot those aggregated values. I like the tools in ggplot2 and plyr for tasks like this:

library(ggplot2)
library(plyr)  # ddply() and summarise() come from plyr
df <- quakes
df$bins <- with(df, cut(depth, breaks = c(0, 40, 120, 200, 280, 360, 440, 520, 600, 680)))
df.plot <- ddply(df, .(bins), summarise, avg.mag = mean(mag))
ggplot(df.plot, aes(bins, avg.mag)) + geom_point()

# If you want a line plot, group = 1 joins the points across the discrete bins:
ggplot(df.plot, aes(bins, avg.mag, group = 1)) + geom_line()
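If plyr isn't available, aggregate() in base R can stand in for the ddply() step. This is a sketch that produces a data frame with the same bins and avg.mag columns; like ddply() with its default .drop = TRUE, aggregate() leaves out empty bins:

```r
df <- quakes
df$bins <- with(df, cut(depth, breaks = c(0, 40, 120, 200, 280, 360, 440, 520, 600, 680)))
# One row per non-empty bin with the mean magnitude
df.plot <- aggregate(mag ~ bins, data = df, FUN = mean)
# Rename to match the column created by summarise() above
names(df.plot)[names(df.plot) == "mag"] <- "avg.mag"
df.plot
```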
Urn answered 1/3, 2011 at 22:46
This is an interesting example of creating bins and averaging the values. Can we also break each bin down by categories of mag? For example, if mag is further divided into three categories (low, average and high), every bin would have three values (the count of low, average and high magnitudes). I have a large data set with thousands of values in bins and want to categorize them and see them over time. Thanks! (Sideburns)
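One way to get those per-bin counts is a two-way table of two cut() factors. This sketch assumes hypothetical magnitude thresholds (low below 4.5, average from 4.5 up to 5, high from 5 up); pick cut points that make sense for your data:

```r
depth.class <- cut(quakes$depth, seq(40, 680, by = 80), include.lowest = TRUE)
# Hypothetical thresholds: [-Inf, 4.5) low, [4.5, 5) average, [5, Inf) high
mag.class <- cut(quakes$mag, breaks = c(-Inf, 4.5, 5, Inf), right = FALSE,
                 labels = c("low", "average", "high"))
# One row per depth bin, one column per magnitude category
counts <- table(depth.class, mag.class)
counts
# Mean magnitude in each depth bin x magnitude category cell (NA if empty)
tapply(quakes$mag, list(depth.class, mag.class), mean)
```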

I agree that you likely don't want a line plot but rather a dotplot() or a box chart of some kind.

You can easily do this using shingles from the lattice package:

library(lattice)
x <- runif(100)
y <- runif(100)
bwplot(~x|equal.count(y))

Substituting shingle() for equal.count() will let you specify the intervals instead of allowing R to choose them for you.
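A sketch of the shingle() version on the quakes data itself, assuming the question's eight 80-wide depth intervals (lower, upper and depth.sh are illustrative names). The intervals go in as a two-column matrix of lower and upper bounds:

```r
library(lattice)
# Assumed intervals: [40,120], [120,200], ..., [600,680]
lower <- seq(40, 600, by = 80)
upper <- lower + 80
depth.sh <- shingle(quakes$depth, intervals = cbind(lower, upper))
# Shingle intervals are treated as closed, so a depth exactly on a shared
# boundary (e.g. 120) shows up in both neighbouring panels
print(bwplot(~ mag | depth.sh, data = quakes, xlab = "Magnitude"))
```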

[Image: box plots with shingles]

Roeder answered 2/3, 2011 at 0:48
I think it is more difficult to visualise the means as a function of depth if you shingle in this case; having the box plots all in a single panel would aid that. Shingling is more often used to condition on a third variable. Nice use of lattice for a change, though: ggplot2 is quite often the graphics package of choice here after base graphics. (Olga)
I agree. In retrospect, I should have made the plots vertical rather than horizontal and coerced the lattice into a single row. The real advantage of lattice here would be if he/she cares about relationships within a group, in which case it is easy to switch to xyplot(). (Roeder)
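Following Olga's suggestion to keep all the box plots in a single panel, a base-R sketch using the formula interface of boxplot(), assuming the question's 80-wide depth classes; las = 2 turns the class labels vertical so they fit:

```r
depth.class <- cut(quakes$depth, seq(40, 680, by = 80), include.lowest = TRUE)
# All eight depth classes side by side in one panel
b <- boxplot(quakes$mag ~ depth.class, ylab = "Magnitude", las = 2)
b$n  # number of observations behind each box
```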
