Generating a gaussian distribution with only positive numbers

7

22

Is there any way to randomly generate a set of positive numbers such that they have a desired mean and standard deviation?

I have an algorithm to generate numbers with a Gaussian distribution, but I don't know how to deal with negative numbers in a way that preserves the mean and standard deviation.
It looks like a Poisson distribution might be a good approximation, but it takes only a mean.

EDIT: There's been some confusion in the responses so I'll try to clarify.

I have a set of numbers that give me a mean and a standard deviation. I would like to generate an equally sized set of numbers with an equivalent mean and standard deviation. Normally, I would use a gaussian distribution to do this, however in this case I have an additional constraint that all values must be greater than zero.

The algorithm I'm looking for doesn't need to be gaussian-based (judging by the comments so far, it probably shouldn't be) and doesn't need to be perfect. It doesn't matter if the resulting number set has a slightly different mean/standard deviation -- I just want something that will usually be in the ballpark.

Vitalism answered 5/11, 2009 at 20:44 Comment(8)
It would perhaps help if you could post the algorithm you have.Kershner
doesn't a gaussian distribution by definition also include negative numbers, i.e. no matter how big (positive) your mean is, the left tail will always span to negative infinity?Humperdinck
@netzwerg: You are correct. That's why I'm trying to find another method.Vitalism
@schnaader: The algorithms I've tried have involved generating numbers based on a gaussian distribution with additional steps for getting rid of negatives. For example, taking the absolute value of negative numbers.Vitalism
Besides mean and std dev, do you have more info about your distribution? For instance, is there a minimum, a maximum? From your description it sounds like you are not looking for integers, but for a continuous distribution - is that correct?Hiphuggers
@Mathias: The max is infinity, the minimum is just above zero (that is, I'm trying to generate numbers from zero to infinity, excluding the lower bound). Integers or reals, it doesn't matter in this case. I'm looking for a general process rather than an exact implementation.Vitalism
If your standard deviation is 0 and your mean is positive, that would work.Umbrageous
Bounded on both sides: stats.stackexchange.com/questions/87054/…Crumpton
L
11

You may be looking for a log-normal distribution, as David Norman suggested, or maybe an exponential, binomial, or some other distribution. An algorithm that generates one distribution is probably not good for generating numbers that conform to another. But only you know how your numbers are really distributed.

With a normal distribution, the random variable ranges from negative infinity to positive infinity, so if you're looking for positive numbers only, then it is not Gaussian.

Different distributions also have unique properties. For example, with a Poisson distribution, the variance is always equal to the mean, so the standard deviation is fixed at the square root of the mean. (That's why your library function doesn't ask for a standard deviation parameter, only the mean.)

In the worst case, you could generate a uniform random real number between 0 and 1 and map it through the inverse of the target distribution's cumulative distribution function yourself. (Depending on the distribution, this may be much easier said than done.)
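
As a rough illustration of turning a uniform random number into a draw from another distribution, here is a minimal inverse transform sampling sketch. The exponential distribution is used only because its inverse CDF has a closed form; the function name `sample_exponential` and the `rng` parameter are made up for this example.

```python
import math
import random

def sample_exponential(mean, rng=random):
    """Draw one exponential variate with the given mean by inverse transform sampling."""
    u = rng.random()  # uniform in [0, 1)
    # CDF: F(x) = 1 - exp(-x / mean). Solving F(x) = u gives x = -mean * ln(1 - u).
    # 1 - u lies in (0, 1], so the logarithm is finite and the result is >= 0.
    return -mean * math.log(1.0 - u)
```

Note that the exponential has only one parameter, so its standard deviation always equals its mean; the same pattern works for any distribution whose inverse CDF you can evaluate.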

Lantana answered 5/11, 2009 at 21:22 Comment(2)
++ Simplest way to do this is 1) take the log of each original data point, 2) get the mean and sigma of that, 3) generate gaussian normal random numbers with that mean and sigma, and 4) take exp of each number. The results should be similar to what you started with. (To generate a gaussian random number, a simple way is to add up 12 uniform random numbers in the range +/- 0.5.)Delicatessen
I've seen this done before. The PPC Rom for the HP41C calculator has a program that would generate random numbers with a Gaussian distribution, but I don't have the manual anymore, so I can't look up the formula. But it was a formula that converted a set of evenly distributed random numbers from zero to one, and converted them to numbers with a gaussian distribution.Mayfield
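
The four steps in the first comment above could be sketched like this, assuming every original data point is strictly positive and there are at least two of them. The function names are hypothetical, and the 12-uniform sum is only the approximate Gaussian generator mentioned in the comment, not an exact one.

```python
import math
import random
import statistics

def approx_standard_normal(rng=random):
    """Approximate N(0, 1): the sum of 12 uniforms on [-0.5, 0.5) has mean 0 and variance 1."""
    return sum(rng.random() - 0.5 for _ in range(12))

def lognormal_like(data, n, rng=random):
    """Generate n positive numbers whose logs mimic the mean and sigma of log(data)."""
    logs = [math.log(x) for x in data]   # step 1: requires every x > 0
    mu = statistics.mean(logs)           # step 2: mean of the logs
    sigma = statistics.stdev(logs)       #         and their standard deviation
    # steps 3 and 4: draw a Gaussian with that mean and sigma, then exponentiate
    return [math.exp(mu + sigma * approx_standard_normal(rng)) for _ in range(n)]
```

Because `exp` is always positive, every generated value is greater than zero.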
P
9

First, you can't generate only positive values from a Gaussian distribution.

Second, am I understanding correctly that you are trying to generate a random distribution with given mean and standard deviation? Will any distribution do? If so, let mean = m and standard deviation = s. I am assuming that m - s > 0.

let n = random integer modulo 2;
if n equals 0 return m - s
else return m + s

The values returned by this process will have mean m and standard deviation s.
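
A runnable version of the pseudocode above might look like the following sketch. The function name `two_point` and the optional `rng` argument are illustrative, and the caller is assumed to pass m > s so that both outcomes are positive.

```python
import random

def two_point(m, s, rng=random):
    """Return m - s or m + s with equal probability (assumes m - s > 0)."""
    # The mean is m, and each outcome lies exactly s away from it,
    # so the standard deviation is s.
    return m - s if rng.randrange(2) == 0 else m + s
```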

Physoclistous answered 5/11, 2009 at 20:51 Comment(4)
I doubt your proposition will satisfy his needs, but I have to give it +1 for an interesting answer to the question. That being said, your answer has a flaw: if m < s, your distribution will not be positive.Hiphuggers
@Mathias: I made the statement "I am assuming that m - s > 0."Physoclistous
That is an interesting answer. Unfortunately, in my case it's not always true that m > s. I'd also like a little more variation to the generated values, though I didn't mention that in the question. +1 for a novel solution, though.Vitalism
@Jason: I tried to keep the spirit of your solution (the simplest distribution satisfying the requirements) and worked out a general solution for any m and s below...Hiphuggers
H
8

You could use a log-normal distribution.
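
For reference, a log-normal with a target mean m and standard deviation s can be obtained by moment matching: sigma^2 = ln(1 + s^2/m^2) and mu = ln(m) - sigma^2/2, where mu and sigma describe the underlying normal. Below is a minimal sketch using Python's standard `random.lognormvariate`; the helper names are made up for this example.

```python
import math
import random

def lognormal_params(m, s):
    """Return (mu, sigma) of the underlying normal for a log-normal with mean m and std dev s."""
    sigma2 = math.log(1.0 + (s / m) ** 2)
    mu = math.log(m) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def positive_samples(m, s, n, rng=random):
    """Draw n strictly positive values with mean close to m and std dev close to s."""
    mu, sigma = lognormal_params(m, s)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]
```

This works for any m > 0 and s > 0, including the case where s is larger than m.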

Haematopoiesis answered 5/11, 2009 at 20:49 Comment(0)
F
5

Why not use a resampling method? If you have n numbers in your sample, just take n random draws from the sample, with replacement. The resulting set will have expected mean and variance about the same as your original sample, but it will usually be slightly different.
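
A minimal sketch of that resampling idea, assuming the original sample is available as a list (the function name `resample` is just for illustration):

```python
import random

def resample(sample, rng=random):
    """Draw len(sample) values from sample, with replacement."""
    return rng.choices(sample, k=len(sample))
```

Since every value comes from the original sample, the result is positive whenever the original data is.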

This said, without knowing why you need more random numbers, it's impossible to say what the right answer is. One wonders if you're trying to solve the wrong problem...

Flub answered 7/11, 2009 at 21:54 Comment(2)
Resampling is an interesting suggestion. In his initial statement, Whatsit didn't say that he had a sample, he only mentioned he had a mean + variance. Polling from the sample will not only replicate the mean and variance, it will also by definition match the shape of the distribution... It would be a good idea if Whatsit wants to run simulations.Hiphuggers
I have a similar problem. Would the resampling (I'm assuming that every time we get a negative value, we just ignore it and take another sample in its place) change the meaning of the distribution? Would it lead to a different mean and variance?Railway
H
4

I couldn't resist - I really like Jason's angle but wasn't happy that his answer only covers cases where m > s, so I worked out a general solution following his idea.
The simplest distribution with a given m and s and no negative values is

with probability p, return 0
with probability (1-p), return m / (1-p)
where (1-p) = m^2 / (m^2 + s^2)

Proof: for a distribution X with two outcomes, lowX with probability p and highX with probability (1-p),
m = E[X] = p * lowX + (1-p) * highX
s^2 = Variance(X) = E[X^2] - E[X]^2 = p * lowX^2 + (1-p) * highX^2 - m^2

Set lowX to 0 and solve for highX and p.
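
In code, this two-outcome distribution might look like the sketch below. The function name `zero_or_high` is illustrative, and `q` stands for (1-p).

```python
import random

def zero_or_high(m, s, rng=random):
    """Return 0 with probability p, or m / (1 - p) with probability 1 - p."""
    q = m * m / (m * m + s * s)  # q = 1 - p = m^2 / (m^2 + s^2)
    return m / q if rng.random() < q else 0.0
```

For example, with m = 2 and s = 5 the high outcome is 14.5 and it is returned with probability 4/29.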

Hiphuggers answered 6/11, 2009 at 20:48 Comment(1)
Thank you - given the spirit of your answer, I thought you would appreciate :)Hiphuggers
S
3

You could use any distribution which has positive support AND can be specified by mean and variance. For example,

  • One-parameter distributions won't work in general. For example, the chi-square won't work unless your variance is always double its mean. Similarly, the exponential won't work unless your variance equals your mean squared.
  • Some two-parameter distributions won't work in some cases. The binomial distribution won't work unless your variance is less than your mean. Similarly, the non-central chi-square won't work unless your variance is greater than 2 times your mean and less than 4 times your mean!
  • However log-normal and gamma will work in all cases.
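
As a sketch of the gamma option: matching moments gives shape k = m^2/s^2 and scale theta = s^2/m, since a gamma distribution has mean k*theta and variance k*theta^2. The example uses Python's standard `random.gammavariate(alpha, beta)`, whose two arguments are the shape and the scale; the wrapper name is made up for this example.

```python
import random

def gamma_from_moments(m, s, rng=random):
    """Draw one gamma variate with mean m and standard deviation s (both > 0)."""
    shape = (m / s) ** 2   # k = m^2 / s^2
    scale = s * s / m      # theta = s^2 / m
    return rng.gammavariate(shape, scale)
```

Any positive m and s give valid parameters, which is why the gamma works in all cases.
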
Soiree answered 13/1, 2010 at 9:44 Comment(0)
P
1

If I understand you correctly, you want to generate random numbers from a distribution with positive support. There are many possible choices. The simplest is the

chi-square: http://en.wikipedia.org/wiki/Chi-square_distribution (which is just the sum of k squared standard Gaussians, where k is the number of degrees of freedom)

All the asymmetric distributions (exponential, Weibull, Pareto, inverse Gaussian, log-normal, gamma)

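The distributions from the skew family (skew-normal, skew-student, ...) are asymmetric too, but they are supported on the whole real line, so they can still produce negative values.

Any random number drawn from the chi-square, exponential, Weibull, Pareto, inverse Gaussian, log-normal or gamma distributions will never be negative.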

Prorogue answered 6/11, 2009 at 20:18 Comment(0)
