How to generate a Rank 5 matrix with entries Uniform?
Asked Answered
E

2

8

I want to generate a rank 5 100x600 matrix in numpy with all the entries sampled from np.random.uniform(0, 20), so that all the entries will be uniformly distributed between [0, 20). What will be the best way to do so in python?

I see there is an SVD-inspired way to do so here (https://math.stackexchange.com/questions/3567510/how-to-generate-a-rank-r-matrix-with-entries-uniform), but I am not sure how to code it up. I am looking for a working example of this SVD-inspired way to get uniformly distributed entries.

I have actually managed to code up a rank 5 100x100 matrix by vertically stacking five 20x100 rank 1 matrices, then shuffling the vertical indices. However, the resulting 100x100 matrix does not have uniformly distributed entries [0, 20).

Here is my code (my best attempt):

import numpy as np
def randomMatrix(m, n, p, q):
    # creates an m x n matrix with lower bound p and upper bound q, randomly.
    count = np.random.uniform(p, q, size=(m, n))
    return count

Qs = []
my_rank = 5
for i in range(my_rank):
  L = randomMatrix(20, 1, 0, np.sqrt(20))
  # L is tall
  R = randomMatrix(1, 100, 0, np.sqrt(20)) 
  # R is long
  Q = np.outer(L, R)
  Qs.append(Q)

Q = np.vstack(Qs)
#shuffle (preserves rank 5 [confirmed])
np.random.shuffle(Q)

Eley answered 18/1, 2022 at 4:47 Comment(11)
Just a note: I tried to apply the method of the pdf but all my attempts resulted in a matrix with a Gaussian distribution and not a uniform one... I am wondering if this method should actually produce a uniform matrix or if I missed an important point.Standin
I'm sorry, what do you mean by the PDF? The only link I put involved the SVD.Eley
I was talking about the link in the answer of the post you linked in your question (this one).Standin
@JérômeRichard what this method is doing is a fancy way of implementing the reweighting of the base vectors of the space spanned by the vectors in the matrix OP wants to create. and if you read my answer you'll see why you get smth close to normally distributed values.Rinaldo
@yannziselman The thing is the answer of the link said that the resulting matrix have uniform entries which appear not to be the case, so I the linked answer wrong? The orthogonal matrices supposed to be uniform (even the one of Scipy) does not contains uniform entries in the first place. So I guess such generated matrix are uniformly picked from the space of all possible matrices but that does not mean each entries are uniformly distributed (which would also mean that the linked answer is wrong unless the way U(N) matrices have somehow "special" properties and I failed to reproduce it.Standin
@GalenBlueTalon How exactly do you define "uniformly distributed entries"? Can you provide some kind of metric that maps a given matrix to a scalar which determines how close that matrix is to what you expect?Archicarp
@Archicarp Well, based on the OP question you can find a simple metric: computing the mean and the standard deviation. If the means is close to 10 and the standard deviation is about (20**2/12)**0.5 ~= 5.77 and there is not values outside the range [0;20) and np.linalg.matrix_rank(mat) is 5, then this means you succeed to find a quite good solution. A better metric should be to compute the histogram and end then apply basic statistical tests in order to compute a p-value (I guess an Anova should do the job pretty well). using np.random.rand is almost perfect except the rank is not 5...Standin
@JérômeRichard As you wrote, different metrics are feasible, so that's why I asked the OP to clarify which one they are considering and what value of that metric is considered "good enough".Archicarp
@JérômeRichard according to your comment a normal distribution with the expected value and variance you computed will be considered "uniform". I could also create a random varibale that can be one of two values centered around 10 with a distance that achives the same variance and it would be considered "uniform". I would argue otherwise. Something that should be considered about ANOVA is that the test assumes the RVs are normally distributed. which means that it's shouldn't exactly be suited to this purpose.Rinaldo
@yannziselman the normal distribution N(10, 5.77) works could fit the mean and the SD criteria but not the range. If you truncate the normal distribution, it changes the SD. If you try to tune the parameters of the truncated normal distribution to find one that fit the std-dev of 5.77, you should see that sigma=+inf is a good fit... and the resulting distribution is... U(0, 20) ;) . Thus, no, the normal distribution will not be considered uniform unless it is nearly so. Very good catch for the Anova! I miss this point. I guess the non-parametric Kruskal-Wallis test should do the job.Standin
@jeromerichards yes, KW or chi squared goodness of fitRinaldo
R
1

I just couldn't take the fact the my previous solution (the "selection" method) did not really produce strictly uniformly distributed entries, but only close enough to fool a statistical test sometimes. The asymptotical case however, will almost surely not be distributed uniformly. But I did dream up another crazy idea that's just as bad, but in another manner - it's not really random.
In this solution, I do smth similar to OP's method of forming R matrices with rank 1 and then concatenating them but a little differently. I create each matrix by stacking a base vector on top of itself multiplied by 0.5 and then I stack those on the same base vector shifted by half the dynamic range of the uniform distribution. This process continues with multiplication by a third, two thirds and 1 and then shifting and so on until i have the number of required vectors in that part of the matrix.
I know it sounds incomprehensible. But, unfortunately, I couldn't find a way to explain it better. Hopefully, reading the code would shed some more light.
I hope this "staircase" method will be more reliable and useful.

import numpy as np 
from matplotlib import pyplot as plt

'''
params:
    N    - base dimention
    M    - matrix length
    R    - matrix rank
    high - max value of matrix
    low  - min value of the matrix
'''
N    = 100
M    = 600
R    = 5
high = 20
low  = 0

# base vectors of the matrix
base = low+np.random.rand(R-1, N)*(high-low)

def build_staircase(base, num_stairs, low, high):
    '''
    create a uniformly distributed matrix with rank 2 'num_stairs' different 
    vectors whose elements are all uniformly distributed like the values of 
    'base'.
    '''
    l = levels(num_stairs)
    vectors = []
    for l_i in l:
        for i in range(l_i):
            vector_dynamic = (base-low)/l_i
            vector_bias    = low+np.ones_like(base)*i*((high-low)/l_i)
            vectors.append(vector_dynamic+vector_bias)
    return np.array(vectors)


def levels(total):
    '''
    create a sequence of stritcly increasing numbers summing up to the total.
    '''
    l = []
    sum_l = 0
    i = 1
    while sum_l < total:
        l.append(i)
        i +=1
        sum_l = sum(l)
    i = 0
    while sum_l > total:
        l[i] -= 1
        if l[i] == 0:
            l.pop(i)
        else:
            i += 1
        if i == len(l):
            i = 0
        sum_l = sum(l)
    return l
        
n_rm = R-1 # number of matrix subsections
m_rm = M//n_rm
len_rms = [ M//n_rm for i in range(n_rm)]
len_rms[-1] += M%n_rm
rm_list = []
for len_rm in len_rms:
    # create a matrix with uniform entries with rank 2
    # out of the vector 'base[i]' and a ones vector.
    rm_list.append(build_staircase(
        base = base[i], 
        num_stairs = len_rms[i], 
        low = low,
        high = high,
    ))

rm = np.concatenate(rm_list)
plt.hist(rm.flatten(), bins = 100)

A few examples:
enter image description here enter image description here enter image description here

and now with N = 1000, M = 6000 to empirically demonstrate the nearly asymptotic behavior: enter image description here enter image description here enter image description here

Rinaldo answered 2/2, 2022 at 23:10 Comment(2)
Wow, thank you so much. I really didn't expect you to come up with an additional answer but I really appreciate it. Thank you so much, again! This really helps :)Eley
the code returns an error; what is "i" supposed to be in the last loop ?Bukovina
R
2

Not a perfect solution, I must admit. But it's simple and comes pretty close.
I create 5 vectors that are gonna span the space of the matrix and create random linear combinations to fill the rest of the matrix. My initial thought was that a trivial solution will be to copy those vectors 20 times.
To improve that, I created linear combinations of them with weights drawn from a uniform distribution, but then the distribution of the entries in the matrix becomes normal because the weighted mean basically causes the central limit theorm to take effect.
A middle point between the trivial approach and the second approach that doesn't work is to use sets of weights that favor one of the vectors over the others. And you can generate these sorts of weight vectors by passing any vector through the softmax function with an appropriately high temperature parameter.
The distribution is almost uniform, but the vectors are still very close to the base vectors. You can play with the temperature parameter to find a sweet spot that suits your purpose.

from scipy.stats import ortho_group
from scipy.special import softmax
import numpy as np
from matplotlib import pyplot as plt
N    = 100
R    = 5
low  = 0
high = 20
sm_temperature = 100

p       = np.random.uniform(low, high, (1, R, N))
weights = np.random.uniform(0, 1, (N-R, R, 1))
weights = softmax(weights*sm_temperature, axis = 1)
p_lc    = (weights*p).sum(1)

rand_mat = np.concatenate([p[0], p_lc])

plt.hist(rand_mat.flatten())

enter image description here

Rinaldo answered 18/1, 2022 at 9:58 Comment(12)
May I ask what the intuition behind the N-R and the softmax is?Eley
I'm not sure i fully understand your question. N is the size to the random matrix. R is the required rank. And as you can read in my answer, the intuition behind the softmax is to use sets of weights that favor one of the vectors over the others so that the resulting vectors will be close enough to the spanning vectors to make sure the distribution doesn't change too much but still make them fairly different.Rinaldo
Here N-R = 100-5 = 95, and I'm not sure what the 95 (e.g. N-R) is doing?Eley
If you mean the N-R in the twelfth line in my code, I generate 95 additional vectors to create a 100x100 matrix.Rinaldo
@yannziselman the sentence "the weighted mean basically causes the law of large numbers to take effect" is unclear to me. Did you mean that the central limit theorem applies on the sum of uniformly distributed entries resulting in a uniform distribution? It is also unclear to me how the softmax (and the temperature parameter) formally correct the distribution by favouring some vectors. Is the idea to not sum too many vectors (with equal weights) so the central limit theorem does not apply?Standin
Yea, sorry. I mixed up the LLN and thr CLT. And no, it's not a formal solution that guaranties the entries in the matrix will be uniformly distributed. When generation an NxR matrix you can guarantee the distribution of the entries. It will be uniform. To ensure it stays uniform, i don't want to change them to much. So i change each one just a little. Think of it as smoothing a collection of delta functions with randomly generated kernel with a small space support. The sm functions ensures that. Provided that you use the right temperature.Rinaldo
This was only a complicated way to randomly select one of the five vectors. Try plt.imshow(weights[:20, :, 0]) after running this code and you will see that all but one element are very close to zero.Any
@Bob, ty for the comment. You're not wrong. That's why I'm suggesting experimenting with different hyperparameters to find a case that works for OP's application. For example, you can decide you want the empirical distribution of the random matrix you create to pass the chi-squared goodness of fit test to U(0, 20) with a significance of 0.05 and reduce the parameter I introduced until you find its lowest acceptable value.Rinaldo
This is a difficult problem, I am curious to know how closely we can get to uniform, one interesting experiment I did was to take the projection of a matrix with uniformly distributed elements onto the space of rank 5 matrices, and the result is a matrix with nearly gaussian distribution.Any
@Bob, that's basically the original idea OP presented. I guess that you found the projection by using SVD decomposition, replacing all but the 5 highest singular values with zeros, and synthesizing the Frobenius-norm-wise closest matrix to the original. which is, again, a set of linear combinations of the same 5 base vectors that might be uniformly distributed, but the fact the new vectors are the means of random vectors means they will start approaching a normal distribution as stated by the CLT.Rinaldo
It turns out nobody succeed to find a better solution yet and the problem is particularly hard to solve so your solution appear to be quite good in the end ;) .Standin
Thanks for the additional attention to my question; yes, I agree it's difficult, and your answer solved the issue. I have awarded the bounty to you - thanks for your help!Eley
R
1

I just couldn't take the fact the my previous solution (the "selection" method) did not really produce strictly uniformly distributed entries, but only close enough to fool a statistical test sometimes. The asymptotical case however, will almost surely not be distributed uniformly. But I did dream up another crazy idea that's just as bad, but in another manner - it's not really random.
In this solution, I do smth similar to OP's method of forming R matrices with rank 1 and then concatenating them but a little differently. I create each matrix by stacking a base vector on top of itself multiplied by 0.5 and then I stack those on the same base vector shifted by half the dynamic range of the uniform distribution. This process continues with multiplication by a third, two thirds and 1 and then shifting and so on until i have the number of required vectors in that part of the matrix.
I know it sounds incomprehensible. But, unfortunately, I couldn't find a way to explain it better. Hopefully, reading the code would shed some more light.
I hope this "staircase" method will be more reliable and useful.

import numpy as np 
from matplotlib import pyplot as plt

'''
params:
    N    - base dimention
    M    - matrix length
    R    - matrix rank
    high - max value of matrix
    low  - min value of the matrix
'''
N    = 100
M    = 600
R    = 5
high = 20
low  = 0

# base vectors of the matrix
base = low+np.random.rand(R-1, N)*(high-low)

def build_staircase(base, num_stairs, low, high):
    '''
    create a uniformly distributed matrix with rank 2 'num_stairs' different 
    vectors whose elements are all uniformly distributed like the values of 
    'base'.
    '''
    l = levels(num_stairs)
    vectors = []
    for l_i in l:
        for i in range(l_i):
            vector_dynamic = (base-low)/l_i
            vector_bias    = low+np.ones_like(base)*i*((high-low)/l_i)
            vectors.append(vector_dynamic+vector_bias)
    return np.array(vectors)


def levels(total):
    '''
    create a sequence of stritcly increasing numbers summing up to the total.
    '''
    l = []
    sum_l = 0
    i = 1
    while sum_l < total:
        l.append(i)
        i +=1
        sum_l = sum(l)
    i = 0
    while sum_l > total:
        l[i] -= 1
        if l[i] == 0:
            l.pop(i)
        else:
            i += 1
        if i == len(l):
            i = 0
        sum_l = sum(l)
    return l
        
n_rm = R-1 # number of matrix subsections
m_rm = M//n_rm
len_rms = [ M//n_rm for i in range(n_rm)]
len_rms[-1] += M%n_rm
rm_list = []
for len_rm in len_rms:
    # create a matrix with uniform entries with rank 2
    # out of the vector 'base[i]' and a ones vector.
    rm_list.append(build_staircase(
        base = base[i], 
        num_stairs = len_rms[i], 
        low = low,
        high = high,
    ))

rm = np.concatenate(rm_list)
plt.hist(rm.flatten(), bins = 100)

A few examples:
enter image description here enter image description here enter image description here

and now with N = 1000, M = 6000 to empirically demonstrate the nearly asymptotic behavior: enter image description here enter image description here enter image description here

Rinaldo answered 2/2, 2022 at 23:10 Comment(2)
Wow, thank you so much. I really didn't expect you to come up with an additional answer but I really appreciate it. Thank you so much, again! This really helps :)Eley
the code returns an error; what is "i" supposed to be in the last loop ?Bukovina

© 2022 - 2024 — McMap. All rights reserved.