Riverplot package in R - output plot covered in gridlines or outlines

I've made a Sankey diagram with the R package riverplot (v0.5). The output looks OK at a small size in RStudio, but when it is exported or zoomed in, the colours have dark outlines or gridlines.

The Riverplot image linked here shows the problem

I think it may be because the outlines of the shapes are not matching the transparency I want to use for the fill?

I possibly need to find a way to get rid of outlines altogether (rather than make them semi-transparent), as I think they're also the reason why flows with a value of zero still show up as thin lines.

My code is here:

#loading packages
library(readr)
library("riverplot", lib.loc="C:/Program Files/R/R-3.3.2/library")
library(RColorBrewer)

#loading data
Cambs_flows <- read_csv("~/RProjects/Cambs_flows4.csv")

#defining the edges
edges    <- data.frame(Cambs_flows)   # columns N1, N2, Value
edges$ID <- seq_len(nrow(edges))

#defining the nodes
nodes <- data.frame(ID = c("Cambridge","S Cambs","Rest of E","Rest of UK","Abroad","to Cambridge","to S Cambs","to Rest of E","to Rest of UK","to Abroad"))
nodes$x = c(1,1,1,1,1,2,2,2,2,2)
nodes$y = c(1,2,3,4,5,1,2,3,4,5)

#picking colours
palette = paste0(brewer.pal(5, "Set1"), "90")

#plot styles
styles = lapply(nodes$y, function(n) {
  list(col = palette[n], lty = 0, textcol = "black")
})

#matching nodes to names
names(styles) = nodes$ID

#defining the river
r <- makeRiver( nodes, edges,
                node_labels = c("Cambridge","S Cambs","Rest of E","Rest of UK","Abroad","to Cambridge","to S Cambs","to Rest of E","to Rest of UK","to Abroad"),
                node_styles = styles)

#Plotting
plot( r, plot_area = 0.9)

And my data is here

dput(Cambs_flows)
structure(list(N1 = c("Cambridge", "Cambridge", "Cambridge", 
"Cambridge", "Cambridge", "S Cambs", "S Cambs", "S Cambs", "S Cambs", 
"S Cambs", "Rest of E", "Rest of E", "Rest of E", "Rest of E", 
"Rest of E", "Rest of UK", "Rest of UK", "Rest of UK", "Rest of UK", 
"Rest of UK", "Abroad", "Abroad", "Abroad", "Abroad", "Abroad"
), N2 = c("to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK", 
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK", 
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK", 
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK", 
"to Abroad", "to Cambridge", "to S Cambs", "to Rest of E", "to Rest of UK", 
"to Abroad"), Value = c(0L, 1616L, 2779L, 13500L, 5670L, 2593L, 
0L, 2975L, 4742L, 1641L, 2555L, 3433L, 0L, 0L, 0L, 6981L, 3802L, 
0L, 0L, 0L, 5670L, 1641L, 0L, 0L, 0L)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -25L), .Names = c("N1", "N2", 
"Value"), spec = structure(list(cols = structure(list(N1 = structure(list(), class = c("collector_character", 
"collector")), N2 = structure(list(), class = c("collector_character", 
"collector")), Value = structure(list(), class = c("collector_integer", 
"collector"))), .Names = c("N1", "N2", "Value")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Seymore answered 11/12, 2016 at 17:7 Comment(3)
Have you been able to solve that? - Sunwise
No. Oddly, if you run it twice, the second time the plot looks OK, although if you export it the lines come back. I've just been doing a screen grab instead, which isn't ideal. - Seymore
OK. Perhaps a small bounty will help to get some attention? - Sunwise

The culprit is a line in riverplot::curveseg. We can hack this function to fix it, or use a very simple workaround that does not require hacking the function. In fact, the simple solution is probably preferable in many cases, but first I explain how to hack the function, so we understand why the workaround also works. Scroll to the end of this answer if you only want the simple solution.

UPDATE: The change suggested below has now been implemented in riverplot version 0.6

To edit the function, you can use

trace(curveseg, edit = TRUE)

Then find the line near the end of the function that reads

polygon(c(xx[i], xx[i + 1], xx[i + 1], xx[i]), c(yy[i], 
      yy[i + 1], yy[i + 1] + w, yy[i] + w), col = grad[i], 
      border = grad[i])

We can see here that the package author chose not to pass the lty parameter to polygon (UPDATE: see this answer for an explanation of why the package author did it this way). Change this line by adding lty = 0 (or, if you prefer, by setting border = NA) and it works as intended for the OP's case. (But note that this may not work well if you wish to render a PDF - see here.)

polygon(c(xx[i], xx[i + 1], xx[i + 1], xx[i]), c(yy[i], 
      yy[i + 1], yy[i + 1] + w, yy[i] + w), col = grad[i], 
      border = grad[i], lty=0)

As a side note, this also explains the somewhat odd behaviour reported in the comments that "if you run it twice, the second time the plot looks OK, although export it and the lines come back". When lty is not specified in a call to polygon, the default value it uses is lty = par("lty"). Initially, the default par("lty") is a solid line, but after running the riverplot function once, par("lty") gets set to 0 during a call to riverplot:::draw.nodes, thus suppressing the lines when riverplot is run a second time. But if you then try to export the image, opening a new device resets par("lty") to its default value.
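
The reset described above can be checked without riverplot at all. The following is a minimal sketch using only base R. It assumes a throwaway `pdf(NULL)` device (chosen for illustration, any device would do) and sets `par(lty = 0)` by hand as a stand-in for what happens inside the package:

```r
# Open a throwaway device that writes no file and record its default line type.
grDevices::pdf(NULL)
lty_default <- par("lty")

# Stand-in for the side effect described above: lty is set to 0 (blank)
# and stays that way on the current device.
par(lty = 0)
lty_after_first_plot <- par("lty")
invisible(dev.off())

# Opening a new device (as an export does) starts from the defaults again.
grDevices::pdf(NULL)
lty_new_device <- par("lty")
invisible(dev.off())

print(c(default          = lty_default,
        after_first_plot = lty_after_first_plot,
        new_device       = lty_new_device))
```

The first and last values should match, while the middle one differs, which is consistent with the lines disappearing on a second run and coming back on export.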

An alternative way to update the function with this edit is to use assignInNamespace to overwrite the package function with your own version. Like this:

curveseg.new = function (x0, x1, y0, y1, width = 1, nsteps = 50, col = "#ffcc0066", 
          grad = NULL, lty = 1, form = c("sin", "line")) 
{
  w <- width
  if (!is.null(grad)) {
    grad <- colorRampPaletteAlpha(grad)(nsteps)
  }
  else {
    grad <- rep(col, nsteps)
  }
  form <- match.arg(form, c("sin", "line"))
  if (form == "sin") {
    xx <- seq(-pi/2, pi/2, length.out = nsteps)
    yy <- y0 + (y1 - y0) * (sin(xx) + 1)/2
    xx <- seq(x0, x1, length.out = nsteps)
  }
  if (form == "line") {
    xx <- seq(x0, x1, length.out = nsteps)
    yy <- seq(y0, y1, length.out = nsteps)
  }
  for (i in 1:(nsteps - 1)) {
    polygon(c(xx[i], xx[i + 1], xx[i + 1], xx[i]), 
            c(yy[i], yy[i + 1], yy[i + 1] + w, yy[i] + w), 
            col = grad[i], border = grad[i], lty=0)
    lines(c(xx[i], xx[i + 1]), c(yy[i], yy[i + 1]), lty = lty)
    lines(c(xx[i], xx[i + 1]), c(yy[i] + w, yy[i + 1] + w), lty = lty)
  }
}

# Overwrite the package's curveseg with the patched version
assignInNamespace("curveseg", curveseg.new, ns = "riverplot")

Now for the simple solution, which does not require changes to the function:

Just add the line par(lty=0) before you plot!!!
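
If you would rather not leave `par("lty")` changed for later plots, the same workaround can be wrapped so the old setting is restored afterwards. This is only a sketch: `with_blank_lty` is a hypothetical helper name, not part of riverplot, and the two polygons merely stand in for riverplot segments so the example runs with base R alone.

```r
# Hypothetical helper (not part of riverplot): evaluate a plotting expression
# with par(lty = 0) and restore the previous setting afterwards.
with_blank_lty <- function(expr) {
  old <- par(lty = 0)              # par() returns the previous value
  on.exit(par(old), add = TRUE)    # restore it even if plotting fails
  force(expr)                      # the plotting call is evaluated here
  invisible(NULL)
}

# Open the device first: a new device starts with the default line type.
grDevices::pdf(NULL)
lty_before <- par("lty")
with_blank_lty({
  plot.new()
  # Two adjacent semi-transparent polygons, standing in for riverplot segments.
  polygon(c(0.2, 0.4, 0.4, 0.2), c(0.2, 0.2, 0.4, 0.4),
          col = "#E41A1C90", border = "#E41A1C90")
  polygon(c(0.4, 0.6, 0.6, 0.4), c(0.2, 0.2, 0.4, 0.4),
          col = "#E41A1C90", border = "#E41A1C90")
})
lty_restored <- par("lty")
invisible(dev.off())
```

With the objects from the question this would be called as `with_blank_lty(plot(r, plot_area = 0.9))`, after opening whichever export device you use, since setting `par(lty = 0)` before opening the device has no effect on it.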

Subgenus answered 16/2, 2017 at 20:11 Comment(4)
Unfortunately, that was actually the original code in riverplot (you can still see the commented-out line in the package sources). However, the generated PDFs then have thin white lines between the polygons. No idea why, but it makes the graphics look awful. I have described this problem in a separate answer. - Coinage
Also: the lty parameter should not be passed on directly to "polygon", because it is used to draw the border around the whole curve, not just a segment. That is why the original code used lty=0. Therefore the alternative code above is incorrect. - Coinage
@Coinage Thanks for the details (I've edited to include a link to your answer, which provides the explanation). Nonetheless, the code above is not incorrect: it is a specific solution to a specific request, which it does solve. I didn't mean to suggest that it's a general solution for all cases. In fact I agree that par(lty=0) is preferable. But including the function hack first is useful to show why this workaround is needed. - Subgenus
OP here. par(lty=0) works well for the plot, thank you! I am still getting white lines when exporting, but it's now OK when using Zoom (in RStudio) to view a bigger plot. So I can now do a much better screen grab of a correct plot, which is a workable interim solution for me. Well done. - Seymore

I am the author of the package. I am currently struggling to find a satisfactory solution to include in the next version of the package.

The problem is with how R renders PDFs as compared to bitmaps. In the original version of the package, I did indeed pass lty=0 to polygon() (you can still see it in the commented source code). However, polygons without borders look good only in PNG graphics. In the PDF output, thin white lines appear between the polygons. Take a look:

cc <- "#E41A1C90"
plot.new()
rect(0.2, 0.2, 0.4, 0.4, col=cc, border=NA)
rect(0.4, 0.2, 0.6, 0.4, col=cc, border=NA)
dev.copy2pdf(file="riverplot.pdf")

In X or on PNG, the output is correct. However, if rendered as PDF, you will see a thin white line between the rectangles.

When you render a riverplot graphic as a PDF in the same way, this looks really bad.

I therefore forced borders to be drawn, but forgot to check the effect on transparency. When no transparency is used, this looks OK: the borders overlap with the polygons as well as with each other, but you cannot see it. The PDF is now acceptable. However, it messes up the figure if you use transparency.
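
To see why overlap is invisible for opaque colours but not for transparent ones, here is a small arithmetic sketch. It assumes standard "over" compositing of identical layers, and `overlap_alpha` is a name made up for this illustration:

```r
# Effective opacity of n overlapping layers that each have opacity alpha,
# assuming standard "over" compositing of identical layers.
overlap_alpha <- function(alpha, n) 1 - (1 - alpha)^n

# The "90" hex suffix from the question's palette, as a fraction of 255.
alpha_fill <- strtoi("90", base = 16L) / 255

single  <- overlap_alpha(alpha_fill, 1)   # fill only
doubled <- overlap_alpha(alpha_fill, 2)   # fill plus one overlapping border
opaque  <- overlap_alpha(1, 2)            # fully opaque colours: overlap changes nothing

print(round(c(single = single, doubled = doubled, opaque = opaque), 3))
```

Where a border overlaps a fill of the same colour, the opacity rises from roughly 0.56 to roughly 0.81, which shows up as a darker line. With fully opaque colours the result stays at 1.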

EDIT:

I have now uploaded version 0.6 of riverplot to CRAN. Besides some new stuff (you can now add riverplot to any part of an existing drawing), by default it uses lty=0 again. However, there is now an option called "fix.pdf" which you can set to TRUE in order to draw the borders around the segments again.

Bottom line, and solutions for now:

  1. Use riverplot 0.6
  2. If you want to render a PDF, don't use transparency and use fix.pdf=TRUE
  3. If you want to use both transparency and PDF, help me solve the issue.
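
As a rough sketch of point 2, assuming riverplot 0.6 or later is installed and that fix.pdf is passed as an argument to the plot call (the output file name is a placeholder, and the block does nothing if the package is missing or too old):

```r
# Sketch only: needs riverplot >= 0.6; skipped otherwise.
out_file <- file.path(tempdir(), "riverplot_fixpdf.pdf")

has_riverplot <- requireNamespace("riverplot", quietly = TRUE) &&
  utils::packageVersion("riverplot") >= "0.6"

if (has_riverplot) {
  x <- riverplot::riverplot.example()   # example object shipped with the package
  pdf(out_file)
  plot(x, fix.pdf = TRUE)               # draw segment borders again for PDF output
  invisible(dev.off())
}
```

For PNG output, or for plots with transparency, leave fix.pdf at its default so that no borders are drawn.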
Coinage answered 17/2, 2017 at 9:36 Comment(2)
Many thanks for the effort, January, in updating the package! Appreciated with bounty. Although I'd love to split it in half with dww for his initial hack! - Sunwise
Thanks! I have invested the bounty + 50 additional points to get an answer to the actual problem described here: #42387566 - Coinage
