what's the overhead of passing python callback functions to Fortran subroutines?
Asked Answered
P

1

9

I just wrapped a Fortran 90 subroutine to python using F2PY. The subtlety here is that the Fortran subroutine aslo takes a python call-back function as one of its arguments:

SUBROUTINE f90foo(pyfunc, a)
real(kind=8),intent(in) :: a
!f2py intent(callback) pyfunc
external pyfunc
!f2py real*8 y,x
!f2py y = pyfunc(x)

!*** debug begins***
print *, 'Start Loop'
do i=1,1000
  p = pyfunc(a)
end do
total = etime(elapsed)
print *, 'End: total=', total, ' user=', elapsed(1), ' system=', elapsed(2)
stop
!*** debug ends  ***

The pyfunc is a python function defined elsewhere in my python code. The wrapper works fine, but running the wrapped version above, I got an elapsed time about factor of 5 times longer than what I can get using pure python as follows,

def pythonfoo(k):
    """ k: scalar 
        returns: scalar
    """
    print('Pure Python: Start Loop')
    start = time.time()
    for i in xrange(1000):
        p = pyfunc(k)
    elapsed = (time.time() - start)
    print('End: total=%20f'% elapsed)

So, the question is, what is the overhead coming from? I really want to leave pyfunc as is because it is extremely time-consuming to re-code it into pure fortran function, so is there any way to improve the speed of the wrapper module?

Preconscious answered 23/9, 2011 at 6:10 Comment(0)
S
9

In the code you posted, a is double precision float. Passing it from Fortran to Python means wrapping the Fortran double to a PyFloat object, which does have a cost. In the pure Python version, k is a PyFloat and you don't pay the price for wrapping it 1000 times.

Another issue is the function call itself. Calling Python functions from C is already bad performance-wise, but calling them from Fortran is worse, because there is an additional layer of code to transform the Fortran function call conventions (regarding the stack etc.) to C function call conventions. When calling a Python function from C, you need to prepare the arguments as Python objects, generally create a PyTuple object to serve as the *args argument of the Python function, make a lookup in the table of the module to get the function pointer...

Last but not least: you need to take care of the array orders when passing 2D arrays between Fortran and Numpy. F2py and numpy can be smart in that regard, but you'll get performance hits if your Python code is not written to manipulate the arrays in Fortran order.

I don't know what pyfunc is meant to do, but if it is close to what you posted, writing the loop in Python, and calling the function only once will save you time. And if you need the intermediate values (p), let your Python function return a Numpy array with all the intermediate values.

Sanjuana answered 23/9, 2011 at 6:27 Comment(10)
Thanks for the comment! Currently pyfunc is mostly taking scalar arguments inside Fortran although it is designed in python to be able to take either nparrays or scalars.Preconscious
so basically calling python functions in to-be-wrapped fortran codes will cost 5 times more running time than pure python ones and there is no room for improvement? The problem here is that the pyfunc is extremely difficulty to code in pure fortran while speed is also a great concern...Preconscious
if speed is a great concern, why don't you try to optimize pyfunc by writing pieces of it in fortran using f2py?Sleeper
@Sleeper Well, pyfunc itself in python is not slow at all, thus only re-coding pieces of it in fortran won't buy me much, and I guess the calling-pyfunc-back-n-forth strategy would even make things worse, as it is the overhead between the two layers that's killing me.Preconscious
then you will either have to move it entirely to fortran if you want to use it there, or if you just want faster looping, try mapping in python instead of looping, see: wiki.python.org/moin/PythonSpeed/PerformanceTips#LoopsSleeper
I'm thinking about passing an array of tabulated outputs from pyfunc to the fortran subroutine then doing interpolation inside Fortran. pyfunc is a relatively smooth function but with a very large dynamic range in both x and y, so now the problem seems to get down to "how to pass large arrays to fortran efficiently" and "how to interpolate arrays in fortran efficiently"...I'll try to code them up to see how it goes.Preconscious
create the numpy array in Fortran order. I leave the interpolation to you, there are probably already questions on this topic in StackOverflow.Sanjuana
@gurneyalex I know how to initialize ndarrays in Fortran order using numpy.ones, and numpy.zeros etc by specifying the order="F" option, but the output from pyfunc is created on the fly without F-order, how can I convert the array to Fortran order? and is this conversion a gain at all considering the extra memory/time spent?Preconscious
@nye17: if it is a 1D array, the order is the same. Otherwise, the array.asfortranarray() method returns a new array in column major order.Sanjuana
I do not think if the callback performance would differ between Fortran and C. The Fortran interface can be made identical to the C code as if pure C code is calling Python.Hackberry

© 2022 - 2024 — McMap. All rights reserved.