Basic operations in R giving different results on Windows and Linux
Asked Answered
W

4

10

I have been running some code in R and while testing realized the results were different on Windows and Linux. I have tried to understand why this happens, but couldn't find an answer. Let's illustrate it with an example:

These are some hard-coded values for reproducibility, always starting from a clean environment. I have checked that the bit representation of these values is exactly the same in both the Windows and the Linux machines:

data <- structure(list(x = c(0.1, 0.1, 0.1, 5, 5, 5, 10, 10, 10, 20, 20, 20), 
                       y = c(0.013624804, 0.014023006, 0.013169554, 0.70540352, 
                             0.68711807, 0.69233506, 1.4235181, 1.348244, 1.4141854, 2.779813, 
                             2.7567347, 2.7436437)), class = c("data.frame"), row.names = c(NA, 12L))
val <- c(43.3065849160736, 0.00134925463859564, 1.03218302435548, 270.328323775978)
theta <- 1.60812569803848
init <- c(b0 = 2.76836653333333, b1 = 0.0134350095, b2 = 2.15105945932773, 
          b3 = 6.85922519794374)

Now I define a new variable W which is again exactly the same in bit representation in Windows and Linux:

f <- function(X, b0, b1, b2, b3) {
  b0 + (b1 - b0) / (1 + exp(b2*(log(X) - log(b3))))
}

W <- 1 / f(data$x, val[1], val[2], val[3], val[4])^theta

And finally I apply an optim function:

SSw <- function(Y, X, b0, b1, b2, b3, w) {
  sum(w * (Y - f(X, b0, b1, b2, b3))^2)
}

SSw.vec <- function(par) SSw(data$y, data$x, par[1], par[2], par[3], par[4], W)

mod <- optim(init, SSw.vec, method = "L-BFGS-B", lower = c(-Inf,-Inf,-Inf,0))

print(mod$par)
# In Windows it returns:
#          b0           b1           b2           b3 
# 3.097283e+01 1.831543e-03 1.047613e+00 1.842448e+02
# In Linux it returns: 
#           b0           b1           b2           b3 
# 3.459241e+01 1.530134e-03 1.040363e+00 2.101996e+02 

As you can see the differences are quite significative, but even if they weren't... just why are there any differences?

Any help will be really appreciated!

Edit

Here I add the sessionInfo() on both Windows and Linux.

On Windows:

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3        plyr_1.8.6          cellranger_1.1.0    compiler_3.6.3      pillar_1.7.0        nloptr_1.2.2.2      tools_3.6.3        
 [8] bit_4.0.4           boot_1.3-24         lme4_1.1-29         lifecycle_1.0.0     tibble_3.1.7        nlme_3.1-144        gtable_0.3.0       
[15] lattice_0.20-38     pkgconfig_2.0.3     rlang_1.0.2         Matrix_1.2-18       cli_3.4.1           rstudioapi_0.11     dplyr_1.0.6        
[22] generics_0.1.0      vctrs_0.3.8         lmerTest_3.1-3      grid_3.6.3          tidyselect_1.1.1    glue_1.4.2          R6_2.4.1           
[29] fansi_0.4.1         readxl_1.3.1        minqa_1.2.4         ggplot2_3.3.6       purrr_0.3.5         magrittr_1.5        scales_1.1.1       
[36] ellipsis_0.3.2      MASS_7.3-51.5       splines_3.6.3       colorspace_1.4-1    numDeriv_2016.8-1.1 renv_0.13.2         utf8_1.1.4         
[43] munsell_0.5.0       crayon_1.3.4       

On Linux:

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /opt/r/lib/R/lib/libRblas.so
LAPACK: /opt/r/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C           LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.3 tools_3.6.3    renv_0.13.2   

Wallasey answered 13/2, 2023 at 0:52 Comment(4)
I cannot reproduce the difference, I get your windows results in linux (R-4.2.2) and win11 (R-4.2.2).Fishmonger
Probably the two systems use different BLAS libraries. Unfortunately Windows doesn't report BLAS version in sixth line as windows does. Anyway I would give it a shot to update/re-install BLAS on your Debian system, since I think your Linux results are flawed (get same results as @r2evans). But first be sure to update R and probably also RStudio as well on both systems, you're using pretty outdated versions (on Linux I recommend to use a RStudio daily).Rabblerousing
I have tried with different BLAS libraries in Debian as indicated here, but I do always get the same results, so I am not that completely sure it is due to the differences in the BLAS libraries... Also I cannot update R due to version controlling (and don't use RStudio since run things from the terminal),Wallasey
Please update sessionInfo() in your question (also maybe that of Windows from a fresh session w/o extra namespaces).Rabblerousing
M
16

I ran your code on my local machine with Fedora 37 and R 4.2.2. As the other commenters, I also got the result you got on Windows. Then I pulled the rocker image for R version 3.6.3:

docker run -ti rocker/r-ver:3.6.3 R

This image is also Debian-based. Here I was able to recreate the result you got on your system.

Then I moved to the rocker versioned release 4.0.0:

docker run -ti rocker/r-ver:4.0.0 R

Here the result was the same as you got on Windows and everyone else got on their machine. It must be noted that with R 4.0.0 the rocker project moved from Debian-based images to Ubuntu LTS.

Fedora comes with the ability to easily switch the BLAS/LAPACK backend via the flexiblas package. Thanks to that, I was able to test your code with the eight different backends available on my system. As you can see below, they do yield different results. In particular, the ATLAS backend comes somewhat close to the result you got. In contrast, OPENBLAS-OPENMP (the default on Fedora), other OPENBLAS variants, and NETLIB all produce the same result as you received on Windows. A third family BLIS produces yet another set of possible results.

Is one of the results better than the others? Yes! optim() looks for a result that minimizes the supplied function. In its returned list, it reports not just the minimizing parameters, but also the value for them. I've included that in the table below. So the ATLAS backend wins the prize here. It must be said that optim() does NOT minimize analytically. So it always gives approximate results. That is why initial values and the method matter for what results we get. And apparently, with your function the backend also matters. And if you look at the parameters you got on Buster the function goes to 0.002800321. So it is actually a better result than what we all get on our more modern systems, except for the result I got with ATLAS. That also happens to be much slower than the other backends. So it seems, the newer backends might have traded speed for accuracy.

If your aim is consistency across platforms, you can upgrade your system to Debian 11 Bullseye, since that appears to have a backend producing the same results as other modern platforms, as the answer by @jay.sf indicates. You could also investigate if you can find the same BLAS backend version used on Buster for Windows.

Furthermore, you can try to change to another blas library on your current system. Here is a guide how to do that. Though it is for Ubuntu, as both use apt, it should work for your system as well. (Edit: I tried that in a VM for Buster. None of the available BLAS backends produced the same result as on the more modern systems)

Finally, if you feel you must have a newer BLAS library on your older system, then you could try to backport it yourself. I have no experience with this. I don't know how advisable it is or how likely to succeed. I am just mentioning it for completeness.

library(flexiblas)
library(tidyverse)

test_fun <- function(i) {
  flexiblas_switch(i)
  data <- structure(list(x = c(0.1, 0.1, 0.1, 5, 5, 5, 10, 10, 10, 20, 20, 20),
                         y = c(0.013624804, 0.014023006, 0.013169554, 0.70540352,
                               0.68711807, 0.69233506, 1.4235181, 1.348244, 1.4141854, 2.779813,
                               2.7567347, 2.7436437)), class = c("data.frame"), row.names = c(NA, 12L))
  val <- c(43.3065849160736, 0.00134925463859564, 1.03218302435548, 270.328323775978)
  theta <- 1.60812569803848
  init <- c(b0 = 2.76836653333333, b1 = 0.0134350095, b2 = 2.15105945932773,
            b3 = 6.85922519794374)
  f <- function(X, b0, b1, b2, b3) {
    b0 + (b1 - b0) / (1 + exp(b2*(log(X) - log(b3))))
  }
  W <- 1 / f(data$x, val[1], val[2], val[3], val[4])^theta
  SSw <- function(Y, X, b0, b1, b2, b3, w) {
    sum(w * (Y - f(X, b0, b1, b2, b3))^2)
  }
  SSw.vec <- function(par) SSw(data$y, data$x, par[1], par[2], par[3], par[4], W)
  mod <- optim(init, SSw.vec, method = "L-BFGS-B", lower = c(-Inf,-Inf,-Inf,0))
  return(c(mod$par, value = mod$value))
}
flexiblas_list() |>
  setdiff("__FALLBACK__") |>
  tibble(backend = _) |>
  mutate(
    idx = flexiblas_load_backend(backend),
    res = map(idx, test_fun)
  ) |>
  unnest_wider(res)
#> # A tibble: 8 × 7
#>   backend            idx    b0      b1    b2    b3   value
#>   <chr>            <int> <dbl>   <dbl> <dbl> <dbl>   <dbl>
#> 1 NETLIB               2  31.0 0.00183  1.05  184. 0.00282
#> 2 OPENBLAS-OPENMP      1  31.0 0.00183  1.05  184. 0.00282
#> 3 ATLAS                3  34.7 0.00168  1.04  209. 0.00280
#> 4 BLIS-SERIAL          4  27.1 0.00225  1.06  158. 0.00285
#> 5 BLIS-OPENMP          5  27.1 0.00225  1.06  158. 0.00285
#> 6 BLIS-THREADS         6  27.1 0.00225  1.06  158. 0.00285
#> 7 OPENBLAS-SERIAL      7  31.0 0.00183  1.05  184. 0.00282
#> 8 OPENBLAS-THREADS     8  31.0 0.00183  1.05  184. 0.00282
Mainstream answered 15/2, 2023 at 11:14 Comment(0)
R
6

I now can confirm your issue. I installed Debian Buster on a VM, did apt install r-base and got R3.5.2, ran your code, and it showed the same (probably) "flawed" Linux results from OP.

However, then I updated to R.4.2.2 but the "flawed" results didn't change! It used libblas3.8. On my real machine I'm running Ubuntu, R4.2.2, libblas3.10 and get the (probably) "right" Windows results.

Unfortunately I was not able to install libblas3.10 on Debian Buster and I am not sure if that's possible at all (see it's not listed under Buster in the link). Notice, that Debian Bullseye is now the actual version and your system is actually outdated.

The outdated BLAS/LAPACK may ― as noted in my comment — still be considered to produce the erroneous results, since these are the actual algebraic engines. You may be able to install an updated BLAS/LAPACK on Debian Buster, but I tend to recommend you upgrade to Debian Bullseye.

Rabblerousing answered 15/2, 2023 at 11:39 Comment(0)
O
5

This is a little bit tangential, but probably the easiest thing you can do to alleviate the between-platform differences is to use

control = list(parscale = abs(init))

in your optim() call.

The reason for this is that unless a gradient function is specified, L-BFGS-B automatically uses finite differences with a fixed stepsize (ndeps, defaulting to 1e-3 for all parameters) to approximate the gradient. This is usually good enough but can cause problems for hard/unstable optimization problems, or when the parameters are on very different scales. parscale as specified above tells optim how to scale the parameters internally, generally improving the results.

It might be even better to pass an analytic (or auto-differentiated) gradient, but that's more work ...

Obscurant answered 17/2, 2023 at 14:52 Comment(3)
With control = list(parscale = abs(init)) I get different results than before, but still substantially different across backends on Fedora and different on BusterMainstream
Hmmm, interesting. The next things I would do would be (1) define an auto-diff gradient and (2) inspect the goodness-of-fit surface for flatness/multiple minima ...Obscurant
I mean what we ultimately have here is data with an extremely linear relationship (R^2 = 0.9996). So what we are seeing is that if you stuff a function full of parameters, there are multiple parameter combinations that approximate a straight line similarly wellMainstream
R
4

This is also a little bit tangential but when I run your code on my system I get in the output of optim convergence=1, what indicates that the iteration limit ‘maxit’ had been reached. 0 indicates successful completion so maxit should be inceased.

mod <- optim(init, SSw.vec, method = "L-BFGS-B", lower = c(-Inf,-Inf,-Inf,0))
mod$convergence
#[1] 1
mod$message
#[1] "NEW_X"

mod <- optim(init, SSw.vec, method = "L-BFGS-B", lower = c(-Inf,-Inf,-Inf,0),
             control=list(maxit=1e5))
mod$convergence
#[1] 0
mod$message
#[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

mod$par
#          b0           b1           b2           b3 
#4.336406e+01 1.154007e-03 1.031650e+00 2.710800e+02 

mod$value
#[1] 0.002779518

Maybe this helps to get similar results on different system constellations.

Recapture answered 20/2, 2023 at 11:16 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.