I started working recently with a team whose members variably prefer Python, R, and Julia (in order of popularity). We are translating a set of hydrology modules built by various team members into a common library and I want to do some basic benchmarking before coming to a consensus on which language we should choose for this purpose. The problem is that I'm pretty bad at Python and new to Julia, and I believe I may be biasing results due to poor coding practices.
The results for these modules (code below) are: Python (93 ms), R (55 ms), and Julia (0.5 ms). The code can't be vectorized (to my understanding), so I assumed Julia would be fastest but not by 100x, and I also assumed that Python would be faster than R.
Can anyone point out some basic errors in efficiency?
Python:
import numpy as np
def SM_routine_f(infiltration=None, PET=None, rcScl=0.1, rcExp=1.3, PETexp=2, Zr=1000, n=0.5, smhp=0.00, smwp=0.10, smfc=0.25, s0=0.5):
nt = len(infiltration)
SM_store = np.empty(nt+1)
SM_store[0] = s0
smhp_stor = Zr * smhp
smwp_stor = Zr * smwp
smfc_stor = Zr * smfc
max_stor = Zr * n
for i in range(nt):
thisSMstor = SM_store[i]
AET = np.where(thisSMstor > smhp_stor, (thisSMstor - smhp_stor) * PET[i] * (thisSMstor / max_stor) ** PETexp / max_stor, 0)
thisSMstor -= AET
deepDrainage = np.where(thisSMstor > smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor) ** rcExp - (thisSMstor - smfc_stor), 0)
SM_store[i+1] = min(max_stor, thisSMstor + infiltration[i] - deepDrainage)
return SM_store / n
exInfil = np.random.normal(loc=1, scale=1, size=365*20)
exPET = np.random.normal(loc=2, scale=1, size=365*20)
import timeit
timeit.timeit(lambda: SM_routine_f(infiltration=exInfil, PET=exPET), number=1)
R:
SM_routine_f = function(infiltration = NA, PET = NA,
rcScl = 0.1, # water retention curve scalar
rcExp = 1.3, # water retention curve exponent
PETexp = 2, # exponent of PET decay
Zr = 1000, # root depth
n = 0.5, # soil porosity
smhp = 0.00, # soil moisture at hydroscopic point
smwp = 0.10, # soil moisture at wilting point
smfc = 0.25, # soil moisture field capacity
s0 = .5) # initial soil moisture
{
nt = length(infiltration)
SM_store = c(s0, rep(NA, nt))
smhp_stor = Zr * smhp
smwp_stor = Zr * smwp
smfc_stor = Zr * smfc
max_stor = Zr * n
for(i in 1:length(infiltration)) {
thisSMstor = SM_store[i]
AET = ifelse(thisSMstor > smhp_stor,
min(thisSMstor - smhp_stor, PET[i] * (thisSMstor / max_stor) ^ PETexp),
0)
thisSMstor = thisSMstor - AET
deepDrainage = ifelse(thisSMstor > smfc_stor,
min(thisSMstor - smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor)^rcExp),
0)
SM_store[i+1] = min(max_stor, thisSMstor - min(c(thisSMstor, deepDrainage)) + infiltration[i])
}
return(SM_store / n)
}
# generating dummy data for testing, e.g. 20 years of climate data
exInfil = rnorm(365*20, mean=1, sd=1)
exPET = rnorm(365*20, mean=2, sd=1)
# benchmarking
microbenchmark::microbenchmark(SM_routine_f(infiltration=exInfil, PET = exPET))
Julia:
function SM_routine_f(infiltration::Vector{Float64}, PET::Vector{Float64})
rcScl = 0.1
rcExp = 1.3
PETexp = 2
Zr = 1000.0
n = 0.5
smhp = 0.00
smwp = 0.10
smfc = 0.25
s0 = 0.5
nt = length(infiltration)
SM_store = [s0; fill(missing,nt)]
smhp_stor = Zr * smhp
smwp_stor = Zr * smwp
smfc_stor = Zr * smfc
max_stor = Zr * n
for i in 1:length(infiltration)
thisSMstor = SM_store[i]
AET = (thisSMstor > smhp_stor) ? min(thisSMstor - smhp_stor, PET[i] * (thisSMstor / max_stor) ^ PETexp) : 0
thisSMstor = thisSMstor - AET
deepDrainage = (thisSMstor > smfc_stor) ? min(thisSMstor - smfc_stor, rcScl * thisSMstor * (thisSMstor / smfc_stor)^rcExp) : 0
SM_store[i+1] = min(max_stor, thisSMstor - min(thisSMstor, deepDrainage) + infiltration[i])
end
return(SM_store / n)
end
using Distributions, BenchmarkTools
exInfiltration = rand(Normal(1, 1), 365*20);
exPET = rand(Normal(1, 2), 365*20);
@benchmark SM_routine_f($exInfiltration, $exPET)
np.where
, aren't all the variables scalars? If so, what happens if you use anif..else
structure without numpy? – Ephesusnp.where(condition, a, b)
instead ofa if condition else b
? – Cockneyifelse()
(which is intended for vectors) withif() {} else{}
, which only does length-one comparisons like you have here. – Wyck