Drawing a contour line around connected cells in a heatmap in R
Asked Answered
P

2

6

I have data with two time axes and measurements for each cell. From this I create a heatmap. I also know for each cell whether the measurement is significant.

My problem is to draw a contour line around all cells that are significant. If cells form clusters with the same significance value, I need to draw the contour around the cluster and not around each individual cell.

The data are in this format:

   x_time y_time    metric signif
1       1      1 0.3422285  FALSE
2       2      1 0.6114085  FALSE
3       3      1 0.5381621  FALSE
4       4      1 0.5175120  FALSE
5       1      2 0.6997991  FALSE
6       2      2 0.3054885  FALSE
7       3      2 0.8353888   TRUE
8       4      2 0.3991566   TRUE
9       1      3 0.7522728   TRUE
10      2      3 0.5311418   TRUE
11      3      3 0.4972816   TRUE
12      4      3 0.4330033   TRUE
13      1      4 0.5157972   TRUE
14      2      4 0.6324151   TRUE
15      3      4 0.4734126   TRUE
16      4      4 0.4315119   TRUE

The code below generates this data, where the measurements are random (dt$metrics) and the significance is logical (dt$signif).

# data example
dt <- data.frame(x_time=rep(seq(1, 4), 4), 
                 y_time=rep(seq(1, 4), each=4),
                 metric=(rnorm(16, 0.5, 0.2)),
                 signif=c(rep(FALSE, 6), rep(TRUE, 10)))

The heatmap alone can be generated using ggplot2's geom_tile

# Generate heatmap using ggplot2's geom_tile
library(ggplot2)
p <- ggplot(data = dt, aes(x = x_time, y = y_time))
p <- p + geom_tile(aes(fill = metric))

Based on this question, I managed to draw contours with different colors around each cell according to the significance value.

# Heatmap with lines around each significant cell
p <- ggplot(data = dt, aes(x = x_time, y = y_time))
p <- p + geom_tile(aes(fill = metric, color = signif), size = 2)
p <- p + scale_color_manual(values = c("black", "white"))

This Figure displays the result of this approach.

However, this approach does not group adjacent significant cells together by drawing a contour around the entire group (as is also discussed in the question I linked to).

As this question shows, it is possible to draw boxes around specified areas, but I do not think this can be extended to all possible clusters of cells.

Profound answered 20/3, 2019 at 10:51 Comment(1)
It's rather straightforward with raster::clump. See e.g. How to get contour lines around the grids in R-raster?Anceline
A
4

This answer is based on How to get contour lines around the grids in R-raster?.

library(data.table)
library(raster)

Also note that clump requires that igraph package is installed, and dissolve = TRUE in rasterToPolygons requires rgeos.

# convert data.frame to data.table
# not strictly necessary, but enables use of convenient functions: dcast and rbindlist.
setDT(d)

# reshape to wide 
d2 <- dcast(d, y ~ x, value.var = "sig")

# reverse order of rows to match raster order
# remove first column
# convert to matrix and then to raster
r <- raster(as.matrix(d2[ , .SD[.N:1, -1]]),
            xmn = 0, xmx = ncol(d2) - 1, ymn = 0, ymx = ncol(d2) - 1)

# detect clumps of connected cells of the value TRUE
# convert raster to polygons
# dissolve polygons into multi-polygons
polys <- rasterToPolygons(clump(r), dissolve = TRUE)

# grab coordinates of individual polygons and convert to a data.table
# use idcol = TRUE to enable grouping of paths when plotting
d_poly <- rbindlist(lapply(polys@polygons,
                           function(x) as.data.table(x@Polygons[[1]]@coords)),
                    idcol = TRUE)

# plot an outline around each 'patch of significant values' using geom_path 
ggplot(d, aes(x = x, y = y)) +
  geom_tile(aes(fill = z)) +
  geom_path(data = d_poly, aes(x = x + 0.5, y = y + 0.5, group = .id),
            size = 2, color = "red")

enter image description here


Data:

d <- structure(list(x = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
                          3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L),
                    y = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
                          1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
                    sig = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
                            TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
                    z = c(0.96, 0.76, 0.14, 0.93, 0.39, 0.06, 0.99, 0.77,
                          0.7, 0.72, 0.08, 0.94, 0.98,  0.83, 0.12, 0.42)),
               row.names = c(NA, -16L), class = "data.frame")
Anceline answered 22/3, 2019 at 19:25 Comment(2)
This is the solution that worked for me and that I believe to scale up to large matrices best. I want to note that it is possible that the x passed to the function in the lapply statement can be another list of polygons. I rewrote this as a for-loop and checked each element in order to collect the complete set of coordinates.Profound
Thanks for your feedback. Great to hear that it worked. Yes, I think the structure of the SpatialPolygonsDataFrame et al. object could be a bit messy to dig around in. Originally I was hoping to make an sf solution, where the objects are a bit cleaner, but failed. Cheers.Anceline
S
2

Surely this would be a bit tedious if you were to create lots of heatmaps (even though it's probably possible to create data frames with the necessary values from your data), but otherwise you can play with geom_segments:

p + geom_segment(aes(x = .5, xend = 4.5, y = 4.5, yend = 4.5), colour = "white", size = 2) +
  geom_segment(aes(x = .5, xend = 2.5, y = 2.5, yend = 2.5), colour = "white", size = 2) +
  geom_segment(aes(x = 2.5, xend = 4.5, y = 1.5, yend = 1.5), colour = "white", size = 2) +
  geom_segment(aes(x = .5, xend = .5, y = 2.5, yend = 4.5), colour = "white", size = 2) +
  geom_segment(aes(x = 2.5, xend = 2.5, y = 1.5, yend = 2.5), colour = "white", size = 2) +
  geom_segment(aes(x = 4.5, xend = 4.5, y = 1.5, yend = 4.5), colour = "white", size = 2)

enter image description here

Subsequent answered 20/3, 2019 at 12:35 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.