What would be the most efficient way to multiply (element-wise) a 2D tensor (matrix):
x11 x12 ... x1N
...
xM1 xM2 ... xMN
by a vertical vector:
w1
...
wN
to obtain a new matrix:
x11*w1 x12*w2 ... x1N*wN
...
xM1*w1 xM2*w2 ... xMN*wN
To give some context, we have M data samples in a batch that can be processed in parallel, and each N-element sample must be multiplied by the weights w, stored in a variable, to eventually pick the largest xij*wj for each row i.
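
To pin down the result being asked for, here is a minimal plain-Python sketch of the computation. It is a reference for the expected output only, not the efficient version the question is after. The helper names `scale_columns` and `row_max` and the sample values are hypothetical, chosen just for illustration.

```python
def scale_columns(x, w):
    """Multiply column j of the M x N matrix x by w[j] (hypothetical helper)."""
    return [[x_ij * w_j for x_ij, w_j in zip(row, w)] for row in x]


def row_max(scaled):
    """Return the largest entry of each row (hypothetical helper)."""
    return [max(row) for row in scaled]


# Illustrative data: M = 2 samples, N = 3 elements per sample.
x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
w = [10.0, 1.0, 0.5]

scaled = scale_columns(x, w)   # element x[i][j] becomes x[i][j] * w[j]
largest = row_max(scaled)      # one value per row i
```

With these values, `scaled` is `[[10.0, 2.0, 1.5], [40.0, 5.0, 3.0]]` and `largest` is `[10.0, 40.0]`. An efficient solution should produce the same numbers without explicit Python loops.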