Assume I have a numpy array of size (n, m) where n is very large but contains a lot of duplication, i.e. rows 0:n1 are identical, rows n1:n2 are identical, etc. (with n2 % n1 != 0, i.e. not at regular intervals). Is there a way to store only one set of values for each group of duplicates while still having a view of the entire array?
example:
unique_values = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])  # these are the values I want to store in memory
index_mapping = np.array([0, 0, 1, 1, 1, 2, 2])  # index_mapping[i] is the row of unique_values that row i of the view comes from
unique_values_view = np.array([[1, 1, 1], [1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [3, 3, 3], [3, 3, 3]])  # this is how I want the view to look, for broadcasting reasons
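For reference, fancy indexing builds exactly this expanded array, but it materialises a copy rather than a view; NumPy strides can only describe regular step patterns, so an irregular grouping like this cannot be expressed as a true view. A minimal sketch with the arrays above:

```python
import numpy as np

unique_values = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
index_mapping = np.array([0, 0, 1, 1, 1, 2, 2])

# Fancy indexing reproduces the desired (n, m) array,
# but the result is a fresh copy, not a view:
expanded = unique_values[index_mapping]
print(np.shares_memory(expanded, unique_values))  # False: it is a copy
```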
I plan to multiply the array (view) by some other array of size (1, m), then take the dot product of this product with an array of size (m, 1):

other_array1 = np.arange(unique_values.shape[1]).reshape(1, -1)  # (1, m)
other_array2 = 2 * np.ones((unique_values.shape[1], 1))  # (m, 1)
output = np.dot(unique_values_view * other_array1, other_array2).squeeze()

The output is a 1D array of length n.
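Since both the broadcasted multiply and the dot product act row-wise, the whole computation can be done on the unique rows first and the small result expanded afterwards, so the full (n, m) array is never materialised. A sketch, assuming the same toy arrays as above:

```python
import numpy as np

unique_values = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]], dtype=float)
index_mapping = np.array([0, 0, 1, 1, 1, 2, 2])
m = unique_values.shape[1]
other_array1 = np.arange(m).reshape(1, -1)   # (1, m)
other_array2 = 2 * np.ones((m, 1))           # (m, 1)

# Do the work on the (k, m) unique rows only, then expand the
# small per-row result with the index mapping.
per_unique = np.dot(unique_values * other_array1, other_array2).squeeze()  # (k,)
output = per_unique[index_mapping]                                         # (n,)
print(output)  # [ 6.  6. 12. 12. 12. 18. 18.]
```

Only the (n,) output and the tiny (k,) intermediate are allocated, which is what matters when fitting things into memory.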
Comments:

"…(1, m), then storing the dot product with another array. The main consideration is fitting the array into memory. I could do the last step in chunks if copying is enforced." – Disproportion

"…2 doesn't seem like a generic operation. Could you explain its significance?" – Syringomyelia

"…index_mapping with a bigger range of numbers and unique_values with random numbers?" – Syringomyelia

"…unique_values_view: the zeroth and first inner lists are both from unique_values[0], the second, third and fourth are from unique_values[1], etc. The values will typically be floats, but I don't see how the type would matter. I want to duplicate array subsets into one large array which doesn't store copies of all the duplicates." – Disproportion

"…unique_values using some kind of memory profiling, I will accept it right away." – Disproportion
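On the memory-profiling point in the comments: the .nbytes attribute reports exact array sizes, so the saving of the compact representation can be checked without a profiler. A sketch with made-up dimensions (the values of n, k and m here are assumptions for illustration):

```python
import numpy as np

n, k, m = 1_000_000, 3, 100          # hypothetical sizes
unique_values = np.random.rand(k, m)
index_mapping = np.random.randint(0, k, size=n)

# Exact memory cost of the compact representation:
compact = unique_values.nbytes + index_mapping.nbytes
# Cost the fully expanded (n, m) float64 array would have:
expanded = n * m * unique_values.itemsize
print(f"compact:  {compact:,} bytes")    # dominated by index_mapping
print(f"expanded: {expanded:,} bytes")
```

Note that the index mapping itself costs one integer per row, so the saving grows with m.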