Is there a way I can efficiently make the bools in cond decide whether a or b should be returned?
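Yes, you could do

cond * a + (1 - cond) * b

where cond will be broadcast to the (N, M) shape. (If cond holds one value per row, it has to be an (N, 1) column for this to broadcast correctly, e.g. cond.dimshuffle(0, 'x') in Theano.)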
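A NumPy sketch of the same arithmetic select follows. NumPy stands in for Theano here, so the calls are NumPy's, and blend_rows is a hypothetical helper name. It assumes cond holds one boolean per row, so the mask is cast to the data dtype and reshaped to an (N, 1) column before broadcasting:

```python
import numpy as np


def blend_rows(cond, a, b):
    """Return row a[i] where cond[i] is True, else row b[i], without branching."""
    # Cast the per-row mask to the data dtype and reshape it to an (N, 1) column
    # so that it broadcasts across the M columns of a and b.
    c = cond.astype(a.dtype)[:, None]
    return c * a + (1 - c) * b


rng = np.random.default_rng(0)
N, M = 4, 3  # illustrative sizes
a = rng.uniform(size=(N, M)).astype(np.float32)
b = rng.uniform(size=(N, M)).astype(np.float32)
cond = np.array([True, False, True, True])

out = blend_rows(cond, a, b)
print(out.dtype)                                           # float32
print(np.array_equal(out, np.where(cond[:, None], a, b)))  # True
```

Because each mask entry is exactly 0 or 1, the blend reproduces np.where bit for bit on finite inputs. If the unselected array contains inf or NaN, 0 * inf gives NaN and the two results would differ.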
This should be close to the theoretical limit, which is set by memory bandwidth: a select needs to read about N*M elements and write N*M. The blend instead reads 2*N*M, but removes the conditional logic.
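The estimate above can be checked with a quick calculation. The sizes are illustrative, traffic_bytes is a hypothetical helper, and, like the text, it ignores the N-element cond vector:

```python
def traffic_bytes(N, M, itemsize):
    """Back-of-envelope memory traffic in bytes for selecting between two (N, M) arrays."""
    select = (N * M + N * M) * itemsize     # read one of a/b per element, write the result
    blend = (2 * N * M + N * M) * itemsize  # read both a and b, write the result
    return select, blend


select, blend = traffic_bytes(N=1000, M=1000, itemsize=4)  # float32
print(select, blend)   # 8000000 12000000
print(blend / select)  # 1.5
```

The ratio is 1.5 for any N and M, so under this estimate the blend moves about 50% more data than an ideal select.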
(I don't have Theano in front of me, so I am not sure if it's faster than T.switch, but it should be about as good as it gets. Also, I'd try casting cond to the same dtype as a and b.)
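To see why the cast can matter, here is NumPy's type promotion at work (Theano's upcasting rules may differ in details; this sketch only demonstrates NumPy's). An int64 mask drags float32 data up to float64, which doubles the bytes written, while a mask cast to a's dtype keeps the result float32:

```python
import numpy as np

a = np.ones((2, 3), dtype=np.float32)
b = np.zeros((2, 3), dtype=np.float32)
cond = np.array([True, False])

# An integer mask promotes the float32 data, so the result is float64.
c_int = cond.astype(np.int64)[:, None]
out_int = c_int * a + (1 - c_int) * b
print(out_int.dtype)  # float64

# Casting the mask to the data dtype first keeps everything in float32.
c_f32 = cond.astype(a.dtype)[:, None]
out_f32 = c_f32 * a + (1 - c_f32) * b
print(out_f32.dtype)  # float32
```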
If you want to update a in-place, you can do it using T.set_subtensor:
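import numpy as np
import theano
import theano.tensor as T

# N, M and the boolean row mask cond come from the question
a = np.random.uniform(size=(N, M)).astype(np.float32)
b = np.random.uniform(size=(N, M)).astype(np.float32)
a = theano.shared(a)
b = theano.shared(b)
c = T.vector()  # one flag per row, nonzero where b should be used, i.e. 1 - cond
nz = T.nonzero(c)  # tuple holding the indices of those rows
s = T.set_subtensor(a[nz], b[nz])  # a with those rows replaced by b's rows
fn = theano.function([c], [], updates=[(a, s)])  # stores s back into shared a
...
fn((1 - cond).astype(theano.config.floatX))  # match the dtype of T.vector()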
It may or may not be faster than the first approach, depending on N, M and other factors.
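For comparison, a NumPy analogue of the set_subtensor update might look like the sketch below. It assumes cond is a boolean per-row mask where True means "keep a"; overwrite_rows is a hypothetical helper name. Only the rows selected by np.nonzero are read from b and written into a:

```python
import numpy as np


def overwrite_rows(a, b, cond):
    """In place: replace the rows of a where cond is False with the rows of b."""
    rows = np.nonzero(~cond)[0]  # indices of rows that should come from b
    a[rows] = b[rows]            # fancy-index assignment writes into a itself
    return a


rng = np.random.default_rng(1)
N, M = 5, 4  # illustrative sizes
a = rng.uniform(size=(N, M)).astype(np.float32)
b = rng.uniform(size=(N, M)).astype(np.float32)
cond = np.array([True, True, False, True, False])

expected = np.where(cond[:, None], a, b)  # computed before a is modified
result = overwrite_rows(a, b, cond)
print(result is a, np.array_equal(a, expected))  # True True
```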
a would be the right value to return and it's fine for the method to modify a directly. Suppose only 5% of the time b should be returned for a given row, couldn't one obtain better performance by modifying a directly only on the rows needing modification? – Pamela
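Whether the sparse row update wins depends on N, M and the fraction of rows involved, so it is worth measuring rather than guessing. The harness below is a sketch under stated assumptions: it times NumPy on the CPU rather than Theano, and the sizes and the 5% fraction are illustrative. It first checks that both approaches agree; no timings are claimed here:

```python
import timeit

import numpy as np

rng = np.random.default_rng(2)
N, M = 2000, 500  # illustrative sizes
a0 = rng.uniform(size=(N, M)).astype(np.float32)
b = rng.uniform(size=(N, M)).astype(np.float32)
# Assumed workload: roughly 5% of rows should take their values from b.
cond = rng.uniform(size=N) >= 0.05


def blend(a):
    # Full pass over every element: reads a and b, allocates a new result.
    c = cond.astype(a.dtype)[:, None]
    return c * a + (1 - c) * b


def sparse_update(a):
    # Touches only the rows where cond is False, writing into a in place.
    rows = np.nonzero(~cond)[0]
    a[rows] = b[rows]
    return a


# Both approaches must produce the same array before timing them.
assert np.array_equal(blend(a0), sparse_update(a0.copy()))

a_work = a0.copy()  # the in-place update is idempotent, so repeated calls do equal work
t_blend = timeit.timeit(lambda: blend(a0), number=50)
t_sparse = timeit.timeit(lambda: sparse_update(a_work), number=50)
print(f"blend: {t_blend:.4f}s  sparse row update: {t_sparse:.4f}s")
```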