I am tasked with calculating hamming distances between 1D binary arrays in two groups - a group of 3000 arrays and a group of 10000 arrays, and every array is 100 items(bits) long. So thats 3000x10000 HD calculations on 100 bit long objects.And all that must be done in at most a dozen minutes
Here's the best of what I came up with
#X - 3000 by 100 bool np.array
#Y - 10000 by 100 bool np.array
hd = []
i=1
for x in X:
print("object nr " + str(i) + "/" + str(len(X)))
arr = np.array([x] * len(Y))
C = Y^arr # just xor this array by all the arrays in the other group simultainously
hd.append([sum(c) for c in C]) #add up all the bits to get the hamming distance
i+=1
return np.array(hd)
And it's still going to take 1-1.5 hours for it to finish. How do I go about making this faster?