How to add custom labels from a dataset on top of bars using ggplot/geom_bar in R?
I have the attached datasets and use this R code to plot the data:

library(ggplot2)   # ggplot(), geom_bar()
library(reshape2)  # melt()

plotData <- read.csv("plotdata.csv")
ix <- 1:nrow(plotData)
long <- melt(transform(plotData, id = ix), id = "id") # add id col; melt to long form
ggp2 <- ggplot(long, aes(id, value, fill = variable))+geom_bar(stat = "identity", position = "dodge")+
    scale_x_continuous(breaks = ix) +
    labs(y='Throughput (Mbps)',x='Nodes') +
    scale_fill_discrete(name="Legend",
                        labels=c("Inside Firewall (Dest)",
                                 "Inside Firewall (Source)",
                                 "Outside Firewall (Dest)",
                                 "Outside Firewall (Source)")) +
    theme(legend.position="right") +  # The position of the legend
    theme(legend.title = element_text(colour="blue", size=14, face="bold")) + # Title appearance
    theme(legend.text = element_text(colour="blue", size = 12, face = "bold")) # Label appearance
plot(ggp2)

The resulting plot is attached as well.

Now I need to add numbers from different datasets on top of each bar. For example:

  1. on top of "Inside Firewall (Dest)" should be the numbers from sampleNumIFdest.csv
  2. on top of "Inside Firewall (Source)" should be the numbers from sampleNumIFsource.csv
  3. on top of "Outside Firewall (Dest)" should be the numbers from sampleNumOFdest.csv
  4. on top of "Outside Firewall (Source)" should be the numbers from sampleNumOFsource.csv

I have tried to use geom_text(), but I do not know how to read the numbers from the different datasets. Please note that the datasets have different numbers of rows, which causes additional problems for me. Any suggestions are highly appreciated.

The attached files are here.

Sorry, I had to zip all my files, as I am not allowed to add more than 2 URLs to my post.

Luis answered 1/4, 2014 at 5:5 Comment(0)

I think the best solution is to combine all the datasets into one:

library(ggplot2)
library(reshape2)  # provides melt()

# load the different datasets
plotData <- read.csv("plotData.csv")
IFdest <- read.table("sampleNumIFdest.csv", sep="\t", header=TRUE, strip.white=TRUE)
IFsource <- read.table("sampleNumIFsource.csv", sep="\t", header=TRUE, strip.white=TRUE)
OFdest <- read.table("sampleNumOFdest.csv", sep="\t", header=TRUE, strip.white=TRUE)
OFsource <- read.table("sampleNumOFsource.csv", sep="\t", header=TRUE, strip.white=TRUE)

# add an id column and move it to the front
ix <- seq_len(nrow(plotData))
plotData$id <- ix
plotData <- plotData[, c(5, 1, 2, 3, 4)]

# combine the data frames; the shorter count vectors are padded with NA
# so that each has exactly nrow(plotData) entries
plotData$IFdest <- c(IFdest$Freq, NA)
plotData$IFsource <- c(IFsource$Freq, NA, NA)
plotData$OFdest <- OFdest$Freq
plotData$OFsource <- c(OFsource$Freq, NA, NA)

# reshape to long form: one melt for the bar heights, one for the labels
long <- cbind(
  melt(plotData, id.vars = "id", measure.vars = 2:5,
       variable.name = "type", value.name = "value"),
  melt(plotData, id.vars = "id", measure.vars = 6:9,
       variable.name = "name", value.name = "numbers")
)
long <- long[, -c(4, 5)] # drop the duplicated id column and the unneeded name column
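
The NA padding above is hard-coded for these particular files. As a rough sketch of a more general approach, a small helper can pad each count vector to the number of rows in plotData. The helper name pad_to and the toy data below are made up for illustration. Like the code above, it assumes that the counts belong to the first rows of plotData, in order.

```r
# Hypothetical helper: set a vector to length n.
# In base R, extending the length fills the new positions with NA.
pad_to <- function(x, n) {
  length(x) <- n
  x
}

# Toy stand-ins for the real files (values are made up for illustration)
plotData <- data.frame(id = 1:4, a = c(10, 20, 30, 40))
counts   <- data.frame(Freq = c(5L, 7L))  # fewer rows than plotData

# Pad the shorter vector so it fits as a new column
plotData$a_n <- pad_to(counts$Freq, nrow(plotData))
print(plotData)
```

With the real data, this would presumably replace lines such as c(IFsource$Freq, NA, NA) with pad_to(IFsource$Freq, nrow(plotData)).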

When you have done that, you can use geom_text to plot the numbers on top of the bars:

# create your plot
ggplot(long, aes(x = id, y = value, fill = type)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = numbers), vjust=-1, position = position_dodge(0.9), size = 3) +
  scale_x_continuous(breaks = ix) +
  labs(x = "Nodes", y = "Throughput (Mbps)") +
  scale_fill_discrete(name="Legend",
                      labels=c("Inside Firewall (Dest)",
                               "Inside Firewall (Source)",
                               "Outside Firewall (Dest)",
                               "Outside Firewall (Source)")) +
  theme_bw() +
  theme(legend.position="right") +
  theme(legend.title = element_text(colour="blue", size=14, face="bold")) + 
  theme(legend.text = element_text(colour="blue", size=12, face="bold"))

The result: [image: dodged bar plot with the numbers above the bars]

As you can see, the text labels sometimes overlap. You can reduce the overlap by decreasing the text size, but then the labels may become hard to read. You might therefore consider using facets by adding facet_grid(type ~ .) (or facet_wrap(~ type)) to the plotting code:

ggplot(long, aes(x = id, y = value, fill = type)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = numbers), vjust=-0.5, position = position_dodge(0.9), size = 3) +
  scale_x_continuous("Nodes", breaks = ix) +
  scale_y_continuous("Throughput (Mbps)", limits = c(0,1000)) +
  scale_fill_discrete(name="Legend",
                      labels=c("Inside Firewall (Dest)",
                               "Inside Firewall (Source)",
                               "Outside Firewall (Dest)",
                               "Outside Firewall (Source)")) +
  theme_bw() +
  theme(legend.position="right") +
  theme(legend.title = element_text(colour="blue", size=14, face="bold")) + 
  theme(legend.text = element_text(colour="blue", size=12, face="bold")) +
  facet_grid(type ~ .)

which results in the following plot:

[image: faceted bar plot, one panel per type]

Grig answered 1/4, 2014 at 8:41 Comment(3)
Thank you for your answer. I personally like the second option as it is much easier to see the presented data. I tried to group them (red+green bars plotted next to each other and blue+purple next to each other) some time ago without any success so I gave up. Do you know if this is possible? – Luis
I didn't try it, but I think this should work: create a new variable type2 which has two possible values, one for the 'Inside Firewall' types & one for the 'Outside Firewall' types. Then use facet_grid(type2 ~.) with the last ggplot-example and you should be fine. – Grig
Does the new type2 variable have to have just 2 variables? I tried it with 2 variables but it did not work. Maybe the length of type2 should be equal to the length of long? – Luis
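
The type2 suggestion from the comments could look roughly like the sketch below. It is untested against the original files: the toy data and the level names (IFdest, IFsource, OFdest, OFsource) are assumptions, since the real column names of plotData.csv are not shown. The point is that type2 is a column of long with one entry per row but only two distinct values.

```r
library(ggplot2)

# Toy long-format data standing in for `long` (all values are made up)
long <- data.frame(
  id      = rep(1:3, times = 4),
  type    = factor(rep(c("IFdest", "IFsource", "OFdest", "OFsource"), each = 3)),
  value   = c(100, 200, 150, 120, 180, 160, 300, 250, 280, 310, 240, 290),
  numbers = c(5, 7, NA, 4, 6, 8, 9, 3, 2, NA, 1, 6)
)

# One entry per row of `long`, two distinct values overall.
# The "^IF" prefix test relies on the assumed level names above.
long$type2 <- ifelse(grepl("^IF", as.character(long$type)),
                     "Inside Firewall", "Outside Firewall")

# Same plot as before, but faceted by the two groups instead of by type
p <- ggplot(long, aes(x = id, y = value, fill = type)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = numbers), vjust = -0.5,
            position = position_dodge(0.9), size = 3, na.rm = TRUE) +
  facet_grid(type2 ~ .)
print(p)
```

Each panel then holds the two "Inside Firewall" or the two "Outside Firewall" bars side by side for every node.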
