Benchmarking a simple hydrology module in R, Python, and Julia

I recently started working with a team whose members variously prefer Python, R, and Julia (in order of popularity). We are translating a set of hydrology modules built by various team members into a common library, and I want to do some basic benchmarking before we come to a consensus on which language to use. The problem is that I'm pretty bad at Python and new to Julia, and I suspect I may be biasing the results through poor coding practices.

The results for these modules (code below) are: Python (93 ms), R (55 ms), and Julia (0.5 ms). The loop can't be vectorized (to my understanding, since each time step depends on the previous soil-moisture state), so I assumed Julia would be fastest, but not by 100x; I also assumed that Python would be faster than R.

Can anyone point out some basic errors in efficiency?

Python:

import numpy as np
def SM_routine_f(infiltration=None, PET=None, rcScl=0.1, rcExp=1.3, PETexp=2, Zr=1000, n=0.5, smhp=0.00, smwp=0.10, smfc=0.25, s0=0.5):
    nt = len(infiltration)
    SM_store = np.empty(nt+1)
    SM_store[0] = s0
    smhp_stor = Zr * smhp
    smwp_stor = Zr * smwp
    smfc_stor = Zr * smfc
    max_stor = Zr * n
    for i in range(nt):
        thisSMstor = SM_store[i]
        AET = np.where(thisSMstor > smhp_stor, (thisSMstor - smhp_stor) * PET[i] * (thisSMstor / max_stor) ** PETexp / max_stor, 0)
        thisSMstor -= AET
        deepDrainage = np.where(thisSMstor > smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor) ** rcExp - (thisSMstor - smfc_stor), 0)
        SM_store[i+1] = min(max_stor, thisSMstor + infiltration[i] - deepDrainage)
    return SM_store / n
exInfil = np.random.normal(loc=1, scale=1, size=365*20)
exPET = np.random.normal(loc=2, scale=1, size=365*20)
import timeit
timeit.timeit(lambda: SM_routine_f(infiltration=exInfil, PET=exPET), number=1)

R:

SM_routine_f = function(infiltration = NA, PET = NA, 
    rcScl = 0.1,    # water retention curve scalar
    rcExp = 1.3,    # water retention curve exponent
    PETexp = 2,     # exponent of PET decay
    Zr = 1000,      # root depth
    n = 0.5,        # soil porosity
    smhp = 0.00,    # soil moisture at hydroscopic point
    smwp = 0.10,    # soil moisture at wilting point
    smfc = 0.25,    # soil moisture field capacity
    s0 = .5)    # initial soil moisture 
    {
    nt = length(infiltration)
    SM_store = c(s0, rep(NA, nt))
    smhp_stor = Zr * smhp
    smwp_stor = Zr * smwp
    smfc_stor = Zr * smfc
    max_stor = Zr * n
    
    for(i in 1:length(infiltration))    {
        thisSMstor = SM_store[i]
        AET = ifelse(thisSMstor > smhp_stor, 
            min(thisSMstor - smhp_stor, PET[i] * (thisSMstor / max_stor) ^ PETexp),
            0)
        thisSMstor = thisSMstor - AET
        
        deepDrainage = ifelse(thisSMstor > smfc_stor,
            min(thisSMstor - smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor)^rcExp),
            0)
        
        SM_store[i+1] = min(max_stor, thisSMstor - min(c(thisSMstor, deepDrainage)) + infiltration[i]) 
        }
    return(SM_store / n)
    }

    # generating dummy data for testing, e.g. 20 years of climate data
exInfil = rnorm(365*20, mean=1, sd=1)
exPET = rnorm(365*20, mean=2, sd=1)

    # benchmarking
microbenchmark::microbenchmark(SM_routine_f(infiltration=exInfil, PET = exPET))

Julia:

function SM_routine_f(infiltration::Vector{Float64}, PET::Vector{Float64})

rcScl = 0.1
rcExp = 1.3
PETexp = 2
Zr = 1000.0
n = 0.5
smhp = 0.00
smwp = 0.10
smfc = 0.25
s0 = 0.5
nt = length(infiltration)
SM_store = [s0; fill(missing,nt)]
smhp_stor = Zr * smhp
smwp_stor = Zr * smwp
smfc_stor = Zr * smfc
max_stor = Zr * n

for i in 1:length(infiltration)
    thisSMstor = SM_store[i]
    AET = (thisSMstor > smhp_stor) ? min(thisSMstor - smhp_stor, PET[i] * (thisSMstor / max_stor) ^ PETexp) : 0
    thisSMstor = thisSMstor - AET
    deepDrainage = (thisSMstor > smfc_stor) ?  min(thisSMstor - smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor)^rcExp) : 0
    SM_store[i+1] = min(max_stor, thisSMstor - min(thisSMstor, deepDrainage) + infiltration[i])
end
return(SM_store / n)
end

using Distributions, BenchmarkTools
exInfiltration = rand(Normal(1, 1), 365*20);
exPET = rand(Normal(1, 2), 365*20);
@benchmark SM_routine_f($exInfiltration, $exPET)
Beliabelial asked on 16/10/2023 at 12:27. Comments (14):
If you want to continue with R, I recommend implementing this simple for loop with Rcpp. That should be pretty straightforward and easy. – Boxhaul
In the Python lines that use np.where, aren't all the variables scalars? If so, what happens if you use an if..else structure without NumPy? – Ephesus
Hello and welcome to Stack Overflow! Why are you using np.where(condition, a, b) instead of a if condition else b? (A short sketch of the difference follows this comment thread.) – Cockney
Please read stackoverflow.com/help/minimal-reproducible-example to see how to write minimal examples that help to boil down your core problem. With regards to your actual problem: Julia is a JIT-compiled language, Python and R are not (in their standard form). As you are looping over a long for loop, I'd always expect Julia to be significantly faster. – Coumarone
@maxxilian R has had JIT enabled by default since version 3.4.0 (released April 2017). You can see it mentioned in the release notes. – Wyck
Thanks slothrop and Dima Chubarov, this does indeed improve performance to 70 ms. Still, Python is the slowest on my machine. – Beliabelial
Looks like you could make the Python code 20x faster by not doing the wrong thing. – Malatya
Thanks @KellyBundy, I would love to see what that would be. A couple of folks above gave advice regarding np.where() --> a if condition else b, and that generated a substantial improvement (~25%), but my Python is still 140x slower than my Julia (and I'm not very good with Julia). I would love to see a 20x improvement, but I'm not sure what could be implemented to generate that effect. – Beliabelial
Thanks @Boxhaul, I've not used Rcpp directly before, but I'll give that a try and post results. – Beliabelial
Similar to the Python comments, you can speed up your R code about 3x by replacing ifelse() (which is intended for vectors) with if() {} else {}, which only does length-one comparisons like the ones you have here. – Wyck
Thanks Gregor Thomas, good call on ifelse() not being optimal in this situation. It does indeed result in a 3x improvement. Including all the Python and R updates gets me to: Python (6 ms), R (16 ms), Julia (0.5 ms). Much appreciated! – Beliabelial
Just curious: "in order of popularity" increasing or decreasing? I.e., which one is most popular? – Malatya
@KellyBundy Python is by far the most popular, R is a distant second (mostly myself and a couple of other people), and Julia is something that a few of us are playing with but none of us count as our "preferred" language. – Beliabelial
@GregorThomas Python is also JIT compiled in that sense (and has been for a long time), but that's not the sense in which Julia is JIT compiled: Julia compiles code to native machine code that is executed directly by the CPU. By contrast, both Python and R JIT compilation yields byte code that's interpreted by a VM at runtime. – Nonpros
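
To make the np.where suggestion from these comments concrete, here is a small self-contained sketch of the change for a single loop iteration; the numeric values are illustrative placeholders, not taken from the question's data (the full rewrite appears in one of the answers below):

import numpy as np

# Illustrative scalar values for one loop iteration (placeholders, not the question's data):
thisSMstor, smhp_stor, max_stor = 250.0, 0.0, 500.0
PET_i, PETexp = 2.0, 2

# np.where() evaluates both branches and wraps the scalar result in a 0-d NumPy
# array, which adds a NumPy call on every iteration of the loop:
AET_where = np.where(thisSMstor > smhp_stor,
                     (thisSMstor - smhp_stor) * PET_i * (thisSMstor / max_stor) ** PETexp / max_stor,
                     0)

# A plain conditional expression computes the same number with no NumPy overhead:
AET_plain = ((thisSMstor - smhp_stor) * PET_i * (thisSMstor / max_stor) ** PETexp / max_stor
             if thisSMstor > smhp_stor else 0)

print(float(AET_where) == AET_plain)  # True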
Answer (score 7)

An "R" version using RCPP.

Rcpp::sourceCpp(code=r"(
#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector rcpp(NumericVector infiltration,
                                      NumericVector PET) {
  double rcScl = 0.1;
  double rcExp = 1.3;
  double PETexp = 2;
  double Zr = 1000.0;
  double n = 0.5;
  double smhp = 0.00;
  double smwp = 0.10;
  double smfc = 0.25;
  double s0 = 0.5;
  size_t nt = infiltration.size();
  NumericVector SM_store(no_init(nt+1));
  SM_store[0] = s0;
  double smhp_stor = Zr * smhp;
  double smwp_stor = Zr * smwp;
  double smfc_stor = Zr * smfc;
  double max_stor = Zr * n;
  for(size_t i=0; i<nt; ++i) {
    double thisSMstor = SM_store[i];
    if(thisSMstor > smhp_stor) thisSMstor -= std::min(thisSMstor - smhp_stor,
                    PET[i] * pow(thisSMstor / max_stor, PETexp));
    double deepDrainage = (thisSMstor > smfc_stor) ?  std::min(thisSMstor - smfc_stor, rcScl * thisSMstor * pow(thisSMstor / smfc_stor, rcExp)) : 0;
    SM_store[i+1] = std::min(max_stor, thisSMstor - std::min(thisSMstor, deepDrainage) + infiltration[i]);
  }
  return SM_store / n;  // divide by porosity n so the output matches the R/Python/Julia versions
}
)")

RNGkind('Mersenne-Twister', 'Inversion', 'Rejection')
set.seed(42)
exInfil = rnorm(365*20, mean=1, sd=1)
exPET = rnorm(365*20, mean=2, sd=1)
bench::mark(rcpp(exInfil, exPET))
#  expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#  <bch:expr>    <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#1 rcpp(exInfil… 284µs  305µs     3318.    59.6KB     2.01  1649     1      497ms

On my PC this takes about 0.305 ms, though of course the work is then effectively being done in C++.

A slightly changed Julia version:

function SM_routine_fB(infiltration::Vector{Float64}, PET::Vector{Float64})
  rcScl = 0.1
  rcExp = 1.3
  PETexp = 2
  Zr = 1000.0
  n = 0.5
  smhp = 0.00
  smwp = 0.10
  smfc = 0.25
  s0 = 0.5
  nt = length(infiltration)
  SM_store = Vector{Float64}(undef, nt+1)
  SM_store[1] = s0
  smhp_stor = Zr * smhp
  smwp_stor = Zr * smwp
  smfc_stor = Zr * smfc
  max_stor = Zr * n
  @fastmath for i in 1:nt
    thisSMstor = SM_store[i]
    if thisSMstor > smhp_stor
      thisSMstor -= min(thisSMstor - smhp_stor, PET[i] * (thisSMstor / max_stor) ^ PETexp)
    end
    deepDrainage = (thisSMstor > smfc_stor) ?  min(thisSMstor - smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor)^rcExp) : 0
    SM_store[i+1] = min(max_stor, thisSMstor - min(thisSMstor, deepDrainage) + infiltration[i])
  end
  return(SM_store / n)
end

using Distributions, BenchmarkTools
exInfiltration = rand(Normal(1, 1), 365*20);
exPET = rand(Normal(1, 2), 365*20);
@benchmark SM_routine_fB($exInfiltration, $exPET)
#BenchmarkTools.Trial: 10000 samples with 1 evaluation.
# Range (min … max):  348.894 μs …  1.257 ms  ┊ GC (min … max): 0.00% … 60.58%
# Time  (median):     369.202 μs              ┊ GC (median):    0.00%
# Time  (mean ± σ):   379.450 μs ± 41.742 μs  ┊ GC (mean ± σ):  0.39% ±  2.93%
#
#  █   ▆▇▂▁▂█▇▃▃▃▃▆▃▃▄▃▄▃▃▂▂▂▂▁▁▁▂▂▁▁▁▁▁▂▁▁▂▂▁▁▁▁               ▂
#  █▇▇▅███████████████████████████████████████████████▇█▇▆▆▆▆▆▆ █
#  349 μs        Histogram: log(frequency) by time       470 μs <
#
# Memory estimate: 114.22 KiB, allocs estimate: 4.

That is 0.369 ms in Julia.

Pye answered on 17/10/2023 at 13:31. Comments (3):
Note that Julia matches the C++ if you add @fastmath to the loop (i.e. @fastmath for i in 1:nt). – Bobbie
Thanks @GKi! I see a ~10% improvement from changing 'SM_store = [s0; fill(missing,nt)]' to 'SM_store = Vector{Float64}(undef, nt+1)' plus the if statement for updating thisSMstor, and then an additional 20% improvement from @fastmath. Is there any reason ever not to use @fastmath? In any case, I currently have on my machine: Python Improved (3.6 ms), Python Optimized (3.05 ms), R (12.1 ms), Julia Original (0.4 ms), Julia Improved (0.31 ms). I'm having an issue with dependencies when trying to run Rcpp, but will update that when I get it fixed. – Beliabelial
docs.julialang.org/en/v1/manual/performance-tips/… gives: "Use @fastmath to allow floating point optimizations that are correct for real numbers, but lead to differences for IEEE numbers." – Pye
Answer (score 6)

It's bad (slow) to work with NumPy data types one element at a time like that. If I convert the input arrays to Python lists of Python floats, do the work without NumPy, and convert back to a NumPy array at the end, it's about 20 times faster. Three attempts:

original  94.9 ms
improved   4.4 ms
original  95.1 ms
improved   4.4 ms
original  93.2 ms
improved   4.5 ms

Further improvement by avoiding Python's slow min function makes it about 30 times faster:

original  91.4 ms
improved   3.3 ms
original  99.4 ms
improved   3.3 ms
original  96.7 ms
improved   3.2 ms

Some micro-optimizations can speed it up some more (a sketch of the kind of changes involved appears after the code below):

  3.11 ± 0.03 ms  improved
  2.58 ± 0.02 ms  optimized

Code for original, improved, comparison and benchmark:

import numpy as np

def original(infiltration=None, PET=None, rcScl=0.1, rcExp=1.3, PETexp=2, Zr=1000, n=0.5, smhp=0.00, smwp=0.10, smfc=0.25, s0=0.5):
    nt = len(infiltration)
    SM_store = np.empty(nt+1)
    SM_store[0] = s0
    smhp_stor = Zr * smhp
    smwp_stor = Zr * smwp
    smfc_stor = Zr * smfc
    max_stor = Zr * n
    for i in range(nt):
        thisSMstor = SM_store[i]
        AET = np.where(thisSMstor > smhp_stor, (thisSMstor - smhp_stor) * PET[i] * (thisSMstor / max_stor) ** PETexp / max_stor, 0)
        thisSMstor -= AET
        deepDrainage = np.where(thisSMstor > smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor) ** rcExp - (thisSMstor - smfc_stor), 0)
        SM_store[i+1] = min(max_stor, thisSMstor + infiltration[i] - deepDrainage)
    return SM_store / n
exInfil = np.random.normal(loc=1, scale=1, size=365*20)
exPET = np.random.normal(loc=2, scale=1, size=365*20)

def improved(infiltration=None, PET=None, rcScl=0.1, rcExp=1.3, PETexp=2, Zr=1000, n=0.5, smhp=0.00, smwp=0.10, smfc=0.25, s0=0.5):
    infiltration = infiltration.tolist()
    PET = PET.tolist()
    nt = len(infiltration)
    SM_store = [None] * (nt+1)
    SM_store[0] = s0
    smhp_stor = Zr * smhp
    smwp_stor = Zr * smwp
    smfc_stor = Zr * smfc
    max_stor = Zr * n
    for i in range(nt):
        thisSMstor = SM_store[i]
        AET = (thisSMstor - smhp_stor) * PET[i] * (thisSMstor / max_stor) ** PETexp / max_stor if thisSMstor > smhp_stor else 0
        thisSMstor -= AET
        deepDrainage = rcScl * thisSMstor * (thisSMstor / smfc_stor) ** rcExp - (thisSMstor - smfc_stor) if thisSMstor > smfc_stor else 0
        tmp = thisSMstor + infiltration[i] - deepDrainage
        SM_store[i+1] = max_stor if max_stor < tmp else tmp
    SM_store = np.array(SM_store)
    return SM_store / n

exInfil = np.random.normal(loc=1, scale=1, size=365*20)
exPET = np.random.normal(loc=2, scale=1, size=365*20)

expect = original(infiltration=exInfil, PET=exPET)
result = improved(infiltration=exInfil, PET=exPET)
print((result == expect).all())

import timeit
for f in [original, improved] * 3:
    t = min(timeit.repeat(lambda: f(infiltration=exInfil, PET=exPET), number=1))
    print(f.__name__, f'{t * 1e3: 5.1f} ms')

Attempt This Online!
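
The ~2.6 ms "optimized" variant mentioned above isn't listed, so here is a minimal sketch of the kind of micro-optimizations it likely refers to: carrying the state in a local variable, iterating with zip() instead of indexing, hoisting the divisions out of the loop, and caching the list append method. This is a guess at the shape of such code rather than the exact script that produced the 2.58 ms figure, and the reciprocal-multiplication trick can change results in the last few bits:

import numpy as np

def optimized_sketch(infiltration=None, PET=None, rcScl=0.1, rcExp=1.3, PETexp=2, Zr=1000, n=0.5, smhp=0.00, smwp=0.10, smfc=0.25, s0=0.5):
    infiltration = infiltration.tolist()
    PET = PET.tolist()
    smhp_stor = Zr * smhp
    smfc_stor = Zr * smfc
    max_stor = Zr * n
    inv_max_stor = 1.0 / max_stor      # hoisted: multiply inside the loop instead of dividing
    inv_smfc_stor = 1.0 / smfc_stor
    stor = s0                          # carry the state in a local instead of re-reading the list
    SM_store = [stor]
    append = SM_store.append           # cache the bound method to skip the attribute lookup
    for infil, pet in zip(infiltration, PET):
        if stor > smhp_stor:
            stor -= (stor - smhp_stor) * pet * (stor * inv_max_stor) ** PETexp * inv_max_stor
        if stor > smfc_stor:
            stor += infil - (rcScl * stor * (stor * inv_smfc_stor) ** rcExp - (stor - smfc_stor))
        else:
            stor += infil
        if stor > max_stor:
            stor = max_stor
        append(stor)
    return np.array(SM_store) / n

exInfil = np.random.normal(loc=1, scale=1, size=365*20)
exPET = np.random.normal(loc=2, scale=1, size=365*20)
print(optimized_sketch(infiltration=exInfil, PET=exPET).shape)  # (7301,)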

Malatya answered on 17/10/2023 at 7:59. Comments (6):
Thanks @kellybundy! This is very interesting, I never would have guessed that replacing min() with an if/else statement would improve performance so drastically. Running all the code on the same machine, including all the Python and R updates, I'm now getting the following: Python (6 ms), R (16 ms), Julia (0.5 ms). This is much more in line with what I was expecting. – Beliabelial
@ArikTashie The 6 ms is with which version? And what Python version are you using? (I used Python 3.11.4) – Malatya
I'm using 3.11.5 for Python, 4.3.1 for R, and 1.9.3 for Julia. 6 ms is the mean of 1000 runs using the improved version, and the min is 3.6 ms (for my original bad code, the min is 70.0 ms and the mean is 93 ms). I recognize now that using the min is better than the mean, so the current minimum values are: Python (3.6 ms), R (13.8 ms), Julia (0.4 ms). I'm running everything in a Jupyter notebook in AWS SageMaker Studio on the same instance type (ml.t3.medium). Thanks again, @KellyBundy, I'm learning a lot from your insights! – Beliabelial
@ArikTashie Yeah, min of the times is recommended, as explained here. My script measuring the optimized solution does use mean+stdev, but only of the fastest 5 of 100 runs. That has been working well for me; it's almost the min, but seeing a small stdev still gives me confidence that it's not a complete single outlier and that I got a stable run of the script (particularly valuable since I'm usually running it on a shared web service). A sketch of this measurement style follows these comments. – Malatya
Oh, sorry, and for the optimized code I'm getting 3.05 ms. So: Python Improved (3.6 ms), Python Optimized (3.05 ms), R (13.8 ms), and Julia (0.4 ms). And thanks for the extra info + link, that's useful. – Beliabelial
And incorporating GKi's recommendations for Julia (in a comment thread under another answer), the current results are: Python Improved (3.6 ms), Python Optimized (3.05 ms), R (12.1 ms), Julia Original (0.4 ms), Julia Improved (0.31 ms), Julia Optimized (0.23 ms). – Beliabelial
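
As a reference for the measurement style described in the comment above (mean and standard deviation of only the fastest few of many single-call repeats), here is a minimal sketch; the commenter's actual script isn't shown, so the helper name and the repeat counts are assumptions:

import timeit
from statistics import mean, stdev

def bench_fastest(func, *args, repeats=100, keep=5, **kwargs):
    # Time `repeats` single calls, keep only the `keep` fastest, and report their
    # mean and standard deviation: close to the minimum, but a small stdev signals
    # the result isn't a single lucky outlier.
    times = timeit.repeat(lambda: func(*args, **kwargs), number=1, repeat=repeats)
    fastest = sorted(times)[:keep]
    return mean(fastest), stdev(fastest)

# Example usage with the improved() function and data defined in the answer above:
# m, s = bench_fastest(improved, infiltration=exInfil, PET=exPET)
# print(f'{m * 1e3:.2f} ± {s * 1e3:.2f} ms  improved')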
Answer (score 4)

In general, 10x-100x is about what to expect when comparing Julia (or other fast languages) to Python/R. Typically R is closer to the 100x side of this and Python is often in the ~40x range, although recent improvements in Python 3.12 bring modest (25% or so) speedups to Python.

Bobbie answered on 16/10/2023 at 19:56. Comments (6):
Unless you can use NumPy properly, so that most of your CPU time is spent inside pre-compiled NumPy functions rather than in CPython interpreter overhead from looping one element at a time. If the interpreter is doing the per-element work, then yes, about 100x is the right order of magnitude for the slow CPython interpreter vs. things that JIT or ahead-of-time compile to native code with a decent optimizer. – Rori
After solely replacing the bad NumPy usage with plain Python (see my improved version), Julia seems only about 10x faster. And with further optimizations only about 5x faster (estimated; I can't actually run Julia myself). Can the Julia code be much improved as well, so that it's about 100x faster than my Python code? Also @PeterCordes – Malatya
@KellyBundy: Unfortunately Julia is a language I haven't played around with, and I haven't really looked at what computation the OP is doing. :/ – Rori
You are correct. Throwing a @fastmath on the loop brings the time down to ~350 μs (back to ~10x over optimized Python). – Bobbie
I'd suggest deleting this. (a) It's not really an answer; it feels like it should be a comment, and (b) the other answers show Julia at about 10x faster, not 100x, so it's off by an order of magnitude. – Wyck
@KellyBundy Also worth noting that the same trick you use in the optimized version brings the Julia time down to 250 μs. – Bobbie
