R stacked bar graph with colors representing values
I'm looking to make a stacked barchart with colors representing values from a separate data column as well as add an accurate color bar using just the base graphics in R. There is one other post about this but it is pretty disorganized and in the end doesn't help me answer my question.

# create reproducible data
d <- read.csv(text='Day,Location,Length,Amount
            1,4,3,1.1
            1,3,1,.32
            1,2,3,2.3
            1,1,3,1.1
            2,0,0,0
            3,3,3,1.8
            3,2,1,3.54
            3,1,3,1.1',header=T)

# colors will be based on values in the Amount column
v1 <- d$Amount
# make some colors based on Amount - normalized
z <- v1/max(v1)*1000
colrs <- colorRampPalette(c('lightblue','blue','black'))(1000)[z]

# create a 2d table of the data needed for plotting
tab <- xtabs(Length ~ Location + Day, d)
# create a stacked bar plot
barplot(tab,col=colrs,space=0)

# create a color bar
plotr::color.bar

This for sure produces a color coded stacked bar graph, but the colors do not represent the data accurately.

For Day 1, Locations 4 and 1 should be identical in color. As another example, the first and last entries in the Amount column are identical, but the color at the top of the left column doesn't match the color at the bottom of the right column.

Also, I found how to make a color bar on a different post and it uses the plotr::color.bar code, but plotr apparently isn't a package and I'm not sure how to carry on.

How can I get the colors to match the appropriate section and add an accurate color bar?

Jaclin answered 27/5, 2015 at 15:18 Comment(0)
M
0

Based on the comments below:

library(ggplot2)
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = Location), stat = "identity") 
Milkandwater answered 27/5, 2015 at 15:29 Comment(5)
Hi Chris - I want to try to stick with base graphics, but just for clarification, the plot that we can generate from the above is accurate apart from the fill. x=Day, y=Length, and the fill would be the Amount. I have the Location column to make sure that the rows are stacked correctly in the bar chart: the row with Location=4 at the top, 0 at the bottom. - Jaclin
ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount), stat = "identity") And after exploring your ggplot solution: the Locations are reversed (4 at the bottom, 0 at the top). It seems to have gotten the color scheme and color bar correct though. - Jaclin
You can use ggplot(d, aes(x = Day, y = Amount)) + geom_bar(aes(fill = rev(Location)), stat = "identity") to reverse the order of Location (or define the column as a factor to set the order manually), but it seems you want to fill by Amount anyway. I can't see why you want to avoid ggplot, but the base solution below also works (although, as with all non-ggplot solutions, I find the entire syntax and manipulation convoluted). - Milkandwater
Hi Chris. I am open to other solutions, for sure, but wanted to direct feedback a certain way by requesting a base graphics solution. I'm still exploring your solution as well. Yes, I need to fill the colors by Amount, but have the bar sections stacked from lowest to highest Location. The y-axis is still Length and the x-axis should still be Day. The second line of code you've provided changes the color scheme, but doesn't rearrange the order of the stacking. The visual provided by Pafnucy demonstrates the correct arrangement for the stacking. - Jaclin
You need to use the order argument, with either ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = Location), stat = "identity") or ggplot(d, aes(x = Day, y = Length)) + geom_bar(aes(fill = Amount, order = rev(Location)), stat = "identity"), or use factors to preset the order of Location. - Milkandwater
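
A note for anyone trying this on a current ggplot2: as far as I know, the order aesthetic was deprecated in ggplot2 2.0.0 and later versions ignore it. A minimal sketch of one alternative, assuming ggplot2 2.2.0 or later (for geom_col() and the reverse argument of position_stack()), is to set the stacking order through the group aesthetic instead. The object name p is just for illustration.

```r
library(ggplot2)

# sample data from the question
d <- read.csv(text = 'Day,Location,Length,Amount
1,4,3,1.1
1,3,1,.32
1,2,3,2.3
1,1,3,1.1
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1', header = TRUE)

# group = Location fixes the order of the segments within each bar;
# reverse = TRUE should put the lowest Location at the bottom of the stack
p <- ggplot(d, aes(x = Day, y = Length, group = Location, fill = Amount)) +
  geom_col(position = position_stack(reverse = TRUE))
print(p)
```

Dropping reverse = TRUE should give the opposite stacking order, which matches the "4 at the bottom" behaviour described in the comments.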
S
1

I hope the "pretty disorganized" post is not my answer to How to create a time series plot in the style of a horizontal stacked bar plot in r! That's fine, no offense taken.

The solution can be adapted to your data as follows:

## store data
df <- read.csv(text='Day,Location,Length,Amount\n1,4,3,1.1\n1,3,1,.32\n1,2,3,2.3\n1,1,3,1.1\n2,0,0,0\n3,3,3,1.8\n3,2,1,3.54\n3,1,3,1.1',header=T);

## extract bar segment lengths from Length and bar segment colors from a function of Amount, both stored in a logical matrix form
lengths <- xtabs(Length~Location+Day,df);
amounts <- xtabs(Amount~Location+Day,df);
colors <- matrix(colorRampPalette(c('lightblue','blue','black'))(1001)[amounts/max(amounts)*1000+1],nrow(amounts));

## transform lengths into an offset matrix to appease design limitation of barplot(). Note that colors will be flattened perfectly to accord with this offset matrix
lengthsOffset <- as.matrix(setNames(reshape(cbind(id=1:length(lengths),stack(as.data.frame(unclass(lengths)))),dir='w',timevar='ind')[-1],colnames(lengths)));
lengthsOffset[is.na(lengthsOffset)] <- 0;

## draw plot
barplot(lengthsOffset,col=colors,space=0,xlab='Day',ylab='Length');

[Image: offset-stacked-barplot]


Notes

  • In your question, you tried to build a color vector using colrs <- colorRampPalette(c('lightblue','blue','black'))(1000)[z] with z being the 8 original Amount values converted to "per mille" form. This had a slight flaw: one of the z elements was zero, and R silently drops a zero index instead of returning an element for it. That's why you got 7 colors when there should have been 8. I fixed this in my code by adding 1 to the per mille values and generating 1001 colors.
  • Also related to generating colors, instead of just generating 8 colors (i.e. one per original Amount value), I generated a complete matrix of colors to parallel the lengths matrix (which you called tab in your code). This color matrix can actually be used directly as the color vector passed to barplot()'s col argument, because internally it is flattened to a vector (at least conceptually) and will correspond with the offset bar segment lengths that we'll pass to barplot() for the height argument (see next note).
  • The linchpin of this solution, as I describe in more detail in my aforementioned post, is creating an "offset matrix" of the bar segment lengths with zeroes in adjacent columns, such that a different color can be assigned to every segment. I create this as lengthsOffset from the lengths matrix.
  • Note that, perhaps somewhat counter-intuitively, lower index values in the height argument are drawn by barplot() as lower segments, and vice-versa, meaning the textual display when you print that data in your terminal is vertically reversed from how it appears in the bar plot. You can vertically reverse the lengthsOffset matrix and the colors vector if you want the opposite order, but I haven't done this in my code.
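
The offset matrix from the third note can also be built with plain matrix indexing, which may be easier to follow than the reshape() call. This is only a sketch of the same idea, not a replacement for the code above; n and k are names introduced here for illustration. Every cell of lengths gets its own row, and its value goes into the column of the Day it belongs to.

```r
df <- read.csv(text = 'Day,Location,Length,Amount
1,4,3,1.1
1,3,1,.32
1,2,3,2.3
1,1,3,1.1
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1', header = TRUE)

lengths <- xtabs(Length ~ Location + Day, df)
n <- nrow(lengths)  # number of locations (5 in the sample data)
k <- ncol(lengths)  # number of days (3 in the sample data)

# one row per cell of lengths, one column per Day, zero everywhere else
lengthsOffset <- matrix(0, nrow = n * k, ncol = k,
                        dimnames = list(NULL, colnames(lengths)))

# cell i (in column-major order) goes to row i, in the column of its own Day
lengthsOffset[cbind(seq_len(n * k), rep(seq_len(k), each = n))] <- as.vector(lengths)
```

Because the rows follow the column-major order of lengths, a color matrix with the same shape as lengths still lines up with the rows when it is flattened by barplot().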

For reference, here are all the data structures:

df;
##   Day Location Length Amount
## 1   1        4      3   1.10
## 2   1        3      1   0.32
## 3   1        2      3   2.30
## 4   1        1      3   1.10
## 5   2        0      0   0.00
## 6   3        3      3   1.80
## 7   3        2      1   3.54
## 8   3        1      3   1.10
lengths;
##         Day
## Location 1 2 3
##        0 0 0 0
##        1 3 0 3
##        2 3 0 1
##        3 1 0 3
##        4 3 0 0
amounts;
##         Day
## Location    1    2    3
##        0 0.00 0.00 0.00
##        1 1.10 0.00 1.10
##        2 2.30 0.00 3.54
##        3 0.32 0.00 1.80
##        4 1.10 0.00 0.00
colors;
##      [,1]      [,2]      [,3]
## [1,] "#ADD8E6" "#ADD8E6" "#ADD8E6"
## [2,] "#4152F5" "#ADD8E6" "#4152F5"
## [3,] "#0000B3" "#ADD8E6" "#000000"
## [4,] "#8DB1EA" "#ADD8E6" "#0000FA"
## [5,] "#4152F5" "#ADD8E6" "#ADD8E6"
lengthsOffset;
##    1 2 3
## 1  0 0 0
## 2  3 0 0
## 3  3 0 0
## 4  1 0 0
## 5  3 0 0
## 6  0 0 0
## 7  0 0 0
## 8  0 0 0
## 9  0 0 0
## 10 0 0 0
## 11 0 0 0
## 12 0 0 3
## 13 0 0 1
## 14 0 0 3
## 15 0 0 0
Stealing answered 27/5, 2015 at 17:4 Comment(1)
Hi bgoldst, thanks for your input. Your solution does seem to work for the example (as do the others), but I ran into problems when applying it to a more specific data set. I appreciate the in-depth answer, the other references, and the explanations for generating colors. For whatever reason, your solution to this question requires a lot more computing time than the ggplot2 solution offered by Chris. We are also missing the color bar with this solution. Thanks again! - Jaclin
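
On the missing color bar: one way to get it with base graphics alone is to draw the palette as a thin image() strip in a second layout() panel. The sketch below is an assumption about what is wanted, not part of the answer above. draw_color_bar() is a hypothetical helper defined here (it does not come from any package), and the offset matrix is built with compact indexing rather than reshape().

```r
df <- read.csv(text = 'Day,Location,Length,Amount
1,4,3,1.1
1,3,1,.32
1,2,3,2.3
1,1,3,1.1
2,0,0,0
3,3,3,1.8
3,2,1,3.54
3,1,3,1.1', header = TRUE)

lengths <- xtabs(Length ~ Location + Day, df)
amounts <- xtabs(Amount ~ Location + Day, df)

# one color per cell, taken from a 1001-step palette (index 1 = Amount 0)
pal <- colorRampPalette(c('lightblue', 'blue', 'black'))(1001)
colors <- pal[floor(as.vector(amounts) / max(amounts) * 1000) + 1]

# offset matrix: one row per cell so that every segment gets its own color
n <- nrow(lengths); k <- ncol(lengths)
offset <- matrix(0, n * k, k, dimnames = list(NULL, colnames(lengths)))
offset[cbind(seq_len(n * k), rep(seq_len(k), each = n))] <- as.vector(lengths)

# hypothetical helper: draws pal as a vertical strip labelled from zlim[1] to zlim[2]
draw_color_bar <- function(pal, zlim, label = '') {
  n_col <- length(pal)
  image(x = c(0, 1), y = seq(zlim[1], zlim[2], length.out = n_col),
        z = matrix(seq_len(n_col), nrow = 1), col = pal,
        axes = FALSE, xlab = '', ylab = '')
  axis(4, las = 1)                    # value axis on the right-hand side
  mtext(label, side = 4, line = 2.5)
  box()
  invisible(NULL)
}

op <- par(no.readonly = TRUE)
layout(matrix(1:2, nrow = 1), widths = c(4, 1))  # wide plot panel, narrow bar panel
par(mar = c(5, 4, 2, 1))
barplot(offset, col = colors, space = 0, xlab = 'Day', ylab = 'Length')
par(mar = c(5, 1, 2, 4))
draw_color_bar(pal, c(0, max(df$Amount)), label = 'Amount')
par(op)
```

The bar uses the same palette and the same 0-to-max(Amount) range as the segment colors, so a segment's color can be read off directly against the axis.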
T
0

I think this was a mistake in defining the colors. The bar chart needs only 5 colors, as there are 5 locations, and one of the colors won't be used because Location 0 has zero length on every day.

Fix:

colrs <- colorRampPalette(c('yellow', 'lightblue','blue','black', 'lightblue'))(5)

[Image: output after fixing colrs vector]

Notice that 'yellow' isn't drawn, as there are 0 observations in its group (in the sample data from the OP).

Timothy answered 27/5, 2015 at 15:45 Comment(1)
Hi Pafnucy - I think I'm confused about the color definition mistake. I want to assign a color to each Amount based on where it falls from 0 to 1000, then in turn assign those colors to each Day by Length bar. In the first post, there are 6 different values for Amount, and the colorRampPalette call produces 7 colors, and they are not assigned correctly. The way you've detailed seems to limit the color scheme. - Jaclin
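
To make the count of 7 concrete, here is a small sketch using the Amount values from the question. colrs_bad, colrs_ok and idx are names introduced here for illustration. The original indices run from 0 to 1000, and R silently drops a zero index, so one of the 8 values gets no color. Shifting the indices to 1..1001 gives every value a color. Note also that barplot() applies a plain col vector by position within each stack, so even 8 correct colors will not follow the individual cells of the table on their own.

```r
v1 <- c(1.1, 0.32, 2.3, 1.1, 0, 1.8, 3.54, 1.1)  # Amount column from the question

# original approach: fractional indices are truncated, and index 0 is dropped
z <- v1 / max(v1) * 1000
colrs_bad <- colorRampPalette(c('lightblue', 'blue', 'black'))(1000)[z]
length(colrs_bad)   # 7, one fewer than the 8 Amount values

# shifted approach: indices run from 1 to 1001, so no value is lost
idx <- floor(v1 / max(v1) * 1000) + 1
colrs_ok <- colorRampPalette(c('lightblue', 'blue', 'black'))(1001)[idx]
length(colrs_ok)    # 8
colrs_ok[1] == colrs_ok[4]   # TRUE: both entries come from Amount 1.1
```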
