Cython Memoryview as return value
Asked Answered
F

1

15

Consider this dummy Cython code:

#!python
#cython: boundscheck=False
#cython: wraparound=False
#cython: initializedcheck=False
#cython: cdivision=True
#cython: nonecheck=False

import numpy as np

# iterator function
cdef double[:] f(double[:] data):
    data[0] *= 1.01
    data[1] *= 1.02
    return data

# looping function
cdef double[:] _call_me(int bignumber, double[:] data):
    cdef int ii
    for ii in range(bignumber):
        data = f(data)
    return data

# helper function to allow calls from Python
def call_me(bignumber):
    cdef double[:] data = np.ones(2)
    return _call_me(bignumber, data)

Now, if I do a cython -a on this, it shows the return statements in yellow. I'm doing something similar in a very performance-critical program, and according to profiling this is really slowing my code down. So, why does cython need python for these return statements? The annotated file gives a hint:

PyErr_SetString(PyExc_TypeError,"Memoryview return value is not initialized");

Amazingly, a google search for cython "Memoryview return value is not initialized" gives zero results.

Frantz answered 7/1, 2014 at 14:20 Comment(4)
Cython version 0.19.2Frantz
In your real code, do you need to return the memoryview or can you modify it in place like here? Doing that changes gives me a 40x speedup. I'm not sure if there's a way to switch that check off...Bronchi
The real code iteratively solves ordinary differential equations, so yes, I do need to return it.Frantz
Mmm let's see if a cython wizard knows about a way to return small memoryviews fast. As a workaround, f can be rewritten to accept data_in and data_out buffers instead of returning it.Bronchi
S
8

The slow part isn't what you think it is. The slow part is (well... primarily)

data = f(data)

Not the f(data). The data =.

This assigns a struct, which is defined as so

typedef struct {
  struct __pyx_memoryview_obj *memview;
  char *data;
  Py_ssize_t shape[8];
  Py_ssize_t strides[8];
  Py_ssize_t suboffsets[8];
} __Pyx_memviewslice;

and the assignment mentioned does

__pyx_t_3 = __pyx_f_3cyt_f(__pyx_v_data);

where __pyx_t_3 is of that type. If this is done heavily in a loop as it is, it takes far longer to copy the structs than to do the trivial body of the function. I've done a timing in pure C and it gives similar numbers.

(Edit note: The assigning is actually primarily a problem because it also causes generation of structs and other copies to not be optimised out.)

However, the whole thing seems silly. The only reason to copy the struct is for if something has changed, but nothing has. The memory points at the same place, the data points at the same place and the shape, strides and offsets are the same.

The only way I see to avoid the struct copy is to not change any of what it references (aka. always return the memoryview given in). That's only possible in circumstances where returning is pointless anyway, like here. Or you can hack at the C, I guess, like I was. Just don't cry if you break something.


Also note that you can make your function nogil, so it can't have anything to do with harking back to Python.


EDIT

C's optimising compiler was throwing me slightly off. Basically, I removed some assigning and it removed loads of other things. Basically the slow path is this:

#include<stdio.h>


struct __pyx_memoryview_obj;


typedef struct {
  struct __pyx_memoryview_obj *memview;
  char *data;
  ssize_t shape[8];
  ssize_t strides[8];
  ssize_t suboffsets[8];
} __Pyx_memviewslice;


static __Pyx_memviewslice __pyx_f_3cyt_f(__Pyx_memviewslice __pyx_v_data) {
  __Pyx_memviewslice __pyx_r = { 0, 0, { 0 }, { 0 }, { 0 } };
  __pyx_r = __pyx_v_data;
  return __pyx_r;
}

main() {
    int i;
    __Pyx_memviewslice __pyx_v_data = {0, 0, { 0 }, { 0 }, { 0 }};

    for (i=0; i<10000000; i++) {
        __pyx_v_data = __pyx_f_3cyt_f(__pyx_v_data);
    }
}

(compile with no optimisations). I'm no C programmer, so apologies if what I've done sucks in some way not directly linked to the fact I've copied computer-generated code.

I know this doesn't help, but I did my best, OK?

Sulfonmethane answered 10/1, 2014 at 15:41 Comment(1)
+1 for showing that this is more complicated than I thought, and for the nogil tip.Frantz

© 2022 - 2024 — McMap. All rights reserved.