If you can avoid lists, it is usually recommended to do so.
Example
from numba import njit
import numpy as np

#with lists
@njit()
def cumli(x, lim):
    total = 0.
    result = []
    for i, y in enumerate(x):
        total += y
        if total < lim:
            total = 0.
        result.append(total)
    return result
#without lists
@njit()
def cumli_2(x, lim):
    total = 0.
    result = np.empty_like(x)
    for i, y in enumerate(x):
        total += y
        if total < lim:
            total = 0.
        result[i] = total
    return result
Timings
Without Numba (comment out @njit()):
x=(np.random.rand(1_000)-0.5)*5
%timeit a=cumli(x, 0.)
220 µs ± 2.25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit a=cumli_2(x, 0.)
227 µs ± 1.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
There is no measurable difference between using lists and arrays here. But that is not the case once you JIT-compile the function.
With Numba:
%timeit a=cumli(x, 0.)
27.4 µs ± 210 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit a=cumli_2(x, 0.)
2.96 µs ± 32.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Even in somewhat more complicated cases (final array size unknown, or only the maximum array size known), it often makes sense to allocate an array and shrink it at the end, or in simple cases even to run the algorithm once just to determine the final array size and then do the real calculation.
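As a sketch of the allocate-then-shrink approach: the hypothetical function filter_positive below keeps only the positive entries of an array, so the output size is unknown in advance. It allocates for the worst case (every element kept) and slices the result down at the end. The try/except fallback is only there so the example also runs without Numba installed; it is not part of the technique.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to a no-op decorator if Numba is unavailable
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit
def filter_positive(x):
    # worst case: every element is kept, so allocate the full size up front
    result = np.empty_like(x)
    count = 0
    for i in range(x.shape[0]):
        if x[i] > 0.:
            result[count] = x[i]
            count += 1
    # shrink to the number of elements actually written
    return result[:count]
```

This keeps the hot loop free of list appends while still handling an output size that is only known at the end.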