random unit vector in multi-dimensional space
I'm working on a data mining algorithm where I want to pick a random direction from a particular point in the feature space.

If I pick a random number for each of the n dimensions from [-1, 1] and then normalize the vector to a length of 1, will I get an even distribution across all possible directions?

I'm speaking only theoretically here, since computer-generated random numbers are not actually random.

Atc answered 8/6, 2011 at 17:53

One simple trick is to select each dimension from a Gaussian distribution, then normalize:

from random import gauss

def make_rand_vector(dims):
    # Draw each coordinate from a standard normal, then scale to unit length.
    vec = [gauss(0, 1) for _ in range(dims)]
    mag = sum(x ** 2 for x in vec) ** 0.5
    return [x / mag for x in vec]

For example, if you want a 7-dimensional random vector, select 7 random values (from a Gaussian distribution with mean 0 and standard deviation 1). Then, compute the magnitude of the resulting vector using the Pythagorean formula (square each value, add the squares, and take the square root of the result). Finally, divide each value by the magnitude to obtain a normalized random vector.
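The same recipe is often written in vectorized form with NumPy; a sketch (my own, assuming NumPy is available — the function name is mine):

```python
import numpy as np

def make_rand_vector_np(dims, rng=None):
    """Draw dims standard normals and scale the vector to unit length."""
    rng = np.random.default_rng() if rng is None else rng
    vec = rng.standard_normal(dims)
    return vec / np.linalg.norm(vec)
```
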

If the number of dimensions is large, this Gaussian approach has the strong benefit of always working immediately. By contrast, generating random vectors until one happens to have magnitude less than one will effectively hang your computer beyond a dozen or so dimensions, because the probability of any sample qualifying becomes vanishingly small.
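That collapse is easy to quantify: the chance that a uniform sample from the cube [-1, 1]^n lands inside the unit ball is the ratio of their volumes. A sketch of the computation (my own, using the closed-form n-ball volume):

```python
import math

def acceptance_probability(n):
    # Volume of the unit n-ball: pi^(n/2) / Gamma(n/2 + 1),
    # divided by the volume of the cube [-1, 1]^n, which is 2^n.
    ball = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return ball / 2 ** n

for n in (2, 6, 10, 20):
    print(n, acceptance_probability(n))
```

This reproduces the figures quoted in a comment below the next answer: about 8% acceptance in six dimensions, about 0.25% in ten.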

Graven answered 10/12, 2011 at 0:56
Nice! Thank you for the additional suggestion. – Atc
By the way, this is how Boost (boost.org/doc/libs/1_47_0/boost/random/uniform_on_sphere.hpp) implements it. ;) – Rhythmics
I think this is the best answer! However, there is a small chance that you will get the zero vector (and thus a divide-by-zero error) every once in a while. – Successful
Here's a reference on why this method is right: mathworld.wolfram.com/HyperspherePointPicking.html – Rodeo
A quick explanation of why this works: the probability P of a point being at a given <x,y> is P(x)*P(y). The Gaussian distribution has roughly the form e^(-x^2), so e^(-x^2)*e^(-y^2) is e^(-(x^2+y^2)). That is a function only of the distance of the point from the origin, so the resulting distribution is radially symmetric. This generalizes easily to higher dimensions. – Callosity
Additional note: the Box–Muller transform may be used to generate independent pairs of normally distributed variables from independent pairs of uniformly distributed ones (with no 'waste'). – Sissel
With numpy this would be vec = numpy.random.randn(dims); return vec / numpy.linalg.norm(vec) – Schizogenesis
You can get a division by 0. What about a safer way? – Blanchblancha
@Blanchblancha just use mag = sqrt(dims), which is the expectation of mag. The result may have a slightly different norm than 1 (though the error moves towards zero with higher dims). – Velocity
@Blanchblancha Another thing is that each x**2 is an estimate of the normal's variance (1), and the sum is very unlikely to be zero. – Velocity

You will not get a uniformly distributed ensemble of angles with the algorithm you described. The angles will be biased toward the corners of your n-dimensional hypercube.

This can be fixed by eliminating any points with distance greater than 1 from the origin. Then you're dealing with a spherical rather than a cubical (n-dimensional) volume, and your set of angles should then be uniformly distributed over the sample space.

Pseudocode:

Let n be the number of dimensions, K the desired number of vectors:

vec_count = 0
while vec_count < K
   generate n uniformly distributed values a[0..n-1] over [-1, 1]
   r_squared = sum over i=0..n-1 of a[i]^2
   if 0 < r_squared <= 1.0
      for i in 0..n-1: b[i] = a[i]/sqrt(r_squared)   ; normalize to length 1
      add vector b[0..n-1] to the output list
      vec_count = vec_count + 1
   else
      reject this sample
end while
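The pseudocode above can be transcribed into Python roughly as follows (a sketch; the function name is mine):

```python
import math
import random

def rejection_sample_directions(n, k, rng=None):
    """Generate k unit vectors in n dimensions by rejection sampling."""
    rng = rng or random.Random()
    out = []
    while len(out) < k:
        a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        r_squared = sum(x * x for x in a)
        if 0.0 < r_squared <= 1.0:          # keep only points inside the unit ball
            r = math.sqrt(r_squared)
            out.append([x / r for x in a])  # normalize to length 1
    return out
```

As noted in the comments, the expected run time of this loop grows rapidly with n, so it is only practical in low dimensions.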
Harim answered 8/6, 2011 at 17:59
That's what I was worried about. I just wasn't able to formalize it in my head the way you described. Intuitively I know that I want my possible random vectors to describe a circle; I'm just not seeing how to implement it in code. – Atc
@Matt: I expanded my answer a bit, hope that helps. – Harim
Why would you use an algorithm with non-deterministic run time AND a branch if you could solve this with a closed-form expression? – Underbody
In high dimensions, this is extremely inefficient. In six dimensions, for example, only 8% of samples will be accepted; in ten dimensions, this falls to 0.25%. – Gwyn

There is a Boost implementation of the algorithm that samples from normal distributions: random::uniform_on_sphere

Rodarte answered 7/6, 2012 at 13:20

I had the exact same question when developing an ML algorithm.
I came to the same conclusion as Jim Lewis after drawing samples for the 2-D case and plotting the resulting distribution of angles.

Furthermore, if you derive the density distribution of the direction in 2-D when drawing x and y at random from [-1, 1], you will see that:

f_X(x) = 1/(4*cos²(x)) if 0 < x < 45°
and
f_X(x) = 1/(4*sin²(x)) if x > 45°

where x is the angle, and f_X is the probability density distribution.

I have written about this here: https://aerodatablog.wordpress.com/2018/01/14/random-hyperplanes/
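These densities imply that directions near the diagonals of the square are about twice as likely as directions near the axes (f_X(45°)/f_X(0°) = 1/cos²(45°) = 2). A quick Monte-Carlo sketch (my own code, not from the post) that checks this in 2-D:

```python
import math
import random

def angle_bias_ratio(samples=200_000, seed=1):
    """For points uniform in the square [-1,1]^2, return the ratio of samples
    whose direction lies within 5 degrees of a diagonal to those within
    5 degrees of an axis. Uniform directions would give a ratio near 1."""
    rng = random.Random(seed)
    near_diag = near_axis = 0
    for _ in range(samples):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        theta = math.degrees(math.atan2(y, x)) % 90.0
        psi = min(theta, 90.0 - theta)   # angular distance from the nearest axis
        if psi >= 40.0:
            near_diag += 1
        elif psi <= 5.0:
            near_axis += 1
    return near_diag / near_axis
```
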

Neddie answered 21/1, 2018 at 15:25
// 3-D only: a Marsaglia-style construction.
// unitrand() returns a uniform value in [-1, 1].
#define SCL1 (M_SQRT2/2)  /* confines (u, v) to the square inscribed in the unit disk */

double u = SCL1 * unitrand();
double v = SCL1 * unitrand();
double w = 2.0 * sqrt(1.0 - u*u - v*v);  /* so that x*x + y*y + z*z == 1 */

double x = w * u;
double y = w * v;
double z = 1.0 - 2.0 * (u*u + v*v);
// Caveat: because (u, v) come from a square rather than the full unit disk,
// the resulting directions are not uniformly distributed on the sphere.
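The standard form of this construction (Marsaglia, 1972) keeps a rejection step that restricts (u, v) to the full unit disk; without it the directions are biased. A Python sketch of the standard method, for comparison:

```python
import math
import random

def marsaglia_sphere_point(rng=None):
    """Uniform point on the unit sphere in 3-D via Marsaglia's (1972) method."""
    rng = rng or random.Random()
    while True:
        u = rng.uniform(-1.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        s = u * u + v * v
        if 0.0 < s < 1.0:                  # reject points outside the unit disk
            w = 2.0 * math.sqrt(1.0 - s)
            return (w * u, w * v, 1.0 - 2.0 * s)
```

Unlike the Gaussian trick in the accepted answer, this works only in three dimensions, but it needs just two uniform draws per accepted point.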
Guy answered 23/4, 2012 at 15:5
Uncommented code is hard to read for non-experts like me. Any comments on what it does, why it's better than the accepted answer, or something? – Brahma

© 2022 - 2024 — McMap. All rights reserved.