Generate a random number with max, min and mean (average) in Java
R

5

20

I need to generate random numbers with the following properties.

Min should be 200

Max should be 20000

Average(mean) is 500.

Optional: 75th percentile to be 5000

It is definitely not a uniform distribution, nor a Gaussian one. It needs to be skewed, with most values concentrated toward the low (left) end of the range.

Reitareiter answered 15/3, 2011 at 15:51 Comment(15)
Hmmm. I'm not sure there's enough info here to define a distribution? - Gigolo
This is actually a delightful math problem. I think it has something to do with identifying a function whose integral over 0-300 matches its integral over 300-19800, but I don't know if I can get any further than that myself! - Copyreader
@Richard: even better: there's enough info to define any number of distributions! ;-) - Brochure
I don't know about you guys, but the way you explain the problem makes it seem like homework to me. If that's the case, you could at least mention it as such. - Curet
@Chuck: I can think of many uses of this that would not imply homework. It might be homework, but it can just as well not be. - Brochure
@Joachim Is that so? Can you tell me one? I want to know what you can use it for. - Curet
@Chuck: a Monte Carlo simulation for some behaviour that has been observed to show these properties when measured. - Brochure
@Joachim Monte Carlo? I'll check it out, thanks for the observation. - Curet
No, this is not homework. I am working on a prototype that requires modeling such a distribution. For more info, see: wiki.mozilla.org/Socorro:ClientAPI - Reitareiter
Hmmm... Whatever you are making, don't cheat. :D - Humane
It's funny. I was doing this the other day. I'm pretty sure you need an inverse function. I never did solve the problem... I didn't want to make the tables for the inverses. - Scaphoid
@Fuad Malikov I have the code needed to generate this, but I haven't tested it and it's bound to not work. Do you still want it? You'll be able to fix it yourself, or I can fix it when I have the time. - Scaphoid
@Ryan Amos I am done with this problem, but it will be interesting for others if you can post the code. - Reitareiter
@Fuad I never finished the reader for the inverse file, but I have the writer. - Scaphoid
@Fuad Malikov do you still want it? - Scaphoid
Z
12

java.util.Random on its own probably won't work, because it only gives you uniform variates (nextInt, nextDouble) and standard normal (Gaussian) variates (nextGaussian).

What you're probably looking for is an F distribution (see below). You can probably use the distlib library and choose the F distribution, then use its random method to get your random number.

[Image: plot of the F distribution]

Zippel answered 15/3, 2011 at 16:0 Comment(12)
An F distribution is not bounded, so you'd have to truncate it to fit the requirements, and that would complicate the computation of the parameters. - Crabwise
@Crabwise true, it is infinite and not bounded, but you could make the probability so small in the right tail that even if you did happen to get a number past the max value, you can just call it the max value (if > max then max). If you have a better solution, I think you should put it in an answer rather than pointing out flaws in this solution, which I believe is valid with a simple check. I'd like to hear what other people have done or would do for the question asked. - Zippel
`I think you should put it in an answer rather than pointing out flaws in this solution` Uh? Of course one MUST point out flaws or shortcomings in any answer here on SO, so that anyone (the OP or anyone else) who is going to use it is aware! The goal here is not to compete, but to have a good repository of answers. It doesn't matter whether one has posted an answer of one's own (which I have, BTW). - Crabwise
I see you posted an answer, which is good, and it is a good answer. I just don't think a down vote for a simple truncation of an infinitely small upper value is warranted. There are probably better solutions, but this one is not terrible with the caveat of upper bound truncation. - Zippel
@Zippel For the random function, though, don't you need to use an inverse function? As in, the inverse of the F distribution. From what little work I've done with random functions, that is what I had to find. In that case, you would need to look up the mathematics of the F distribution yourself and decide if it is easily invertible. If it is, easy. If it's not, you'll need to create large inverse tables from which you can determine the inverse function. Obviously, you can't tabulate it for every x in 0<x<1, so you'll need to pick a step, say .001, and use interpolation to fill in the holes. - Scaphoid
@Ryan I'm not sure about needing an inverse function, and I have not delved into the implementation of the library I referenced. However, the random function should be pretty easy. You 'throw a dart' at the area under the curve. The x value in the graph is then your result. The implementation I cited has this function available to use. As leonboy mentioned, you do need to do a check to make sure you're not outside of the bounds of the distribution, since the F distribution is not bounded. - Zippel
@Zippel You're saying exactly that you need an inverse. When you throw a dart at a certain Y point and see where it corresponds to the X, you're getting the inverse, which is Y yields X. Do you think I should post what I have written so far? I only have half of the program written (I have the ability to create an inverse table, but only half of what is needed to read it). - Scaphoid
I'd probably need to dig in a bit more to give a better answer. If you look at the source here you can see how they do the random point in the F distribution. Taking an inverse of a distribution has been done many times in code; you're sure to find some code out there already that does exactly what you need. - Zippel
@Zippel Inverses are not always readily available. In some cases, they're impossible. In fact, in any non one-to-one function, there is no inverse (well, there are multiple conditional inverses, defined for certain ranges). Any distribution like this will have no direct inverse. - Scaphoid
@Ryan, my point is that there are plenty of libraries out there (like the one I mentioned) that will find you a random point. I'm not saying that you should write the code yourself. Whether they use a lookup table or not is something you can consult the source code for. My point is only that there are plenty of libraries out there, so writing your own code is in most cases not necessary. - Zippel
@Zippel While that may be true, what fun is that? :P - Scaphoid
For what it's worth, for many distributions there are much more efficient algorithms than inverting the CDF. One of the most popular is envelope rejection, which, judging by @Zippel's dart-throwing analogy, is what happens in distlib. This method requires evaluating the PDF and generating random numbers from an envelope distribution, which is typically much more efficient. See wikipedia and this series of 4 blog posts (of mine). - Skinnydip
C
9

Say X is your target variable; let's normalize the range by setting Y=(X-200)/(20000-200). So now you want some random variable Y that takes values in [0,1] with mean (500-200)/(20000-200)=1/66.

You have many options; the most natural one seems to me to be a Beta distribution, Y ~ Beta(a,b) with a/(a+b) = 1/66. That leaves one extra degree of freedom, which you can choose to approximate the last-quartile requirement.

After that, you simply return X as Y*(20000-200)+200.

To generate a Beta random variable, you can use Apache Commons or see here.
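
As a rough illustration of the recipe above, here is a minimal Java sketch. It assumes the free parameter is fixed by taking a = 1, which gives b = 65 (so a/(a+b) = 1/66); that choice, the class name ScaledBetaSketch, and the method names are illustrative and not from the answer. With a = 1 the Beta CDF is 1 - (1-y)^b, which inverts in closed form, so no library is needed. Note that a 75th percentile of 5000 cannot hold together with a mean of 500: a quarter of the mass at or above 5000 and the rest at or above 200 already forces the mean to at least 1400, so this sketch only targets min, max and mean.

```java
import java.util.Random;

/** Sketch: X = 200 + 19800 * Y with Y ~ Beta(1, 65), so that E[X] = 500. */
final class ScaledBetaSketch {
    static final double MIN = 200.0;
    static final double MAX = 20000.0;
    static final double MEAN = 500.0;

    // Assumption: a = 1 is an arbitrary choice for the free parameter.
    static final double A = 1.0;
    // b follows from a / (a + b) = (MEAN - MIN) / (MAX - MIN) = 1/66, giving b = 65.
    static final double B = A * (MAX - MEAN) / (MEAN - MIN);

    /** Draws Y ~ Beta(1, B) by inverting its CDF F(y) = 1 - (1 - y)^B. */
    static double nextUnitBeta(Random rng) {
        double u = rng.nextDouble();              // uniform in [0, 1)
        return 1.0 - Math.pow(1.0 - u, 1.0 / B);
    }

    /** Maps Y back to the original range: X = Y * (MAX - MIN) + MIN. */
    static double next(Random rng) {
        return MIN + (MAX - MIN) * nextUnitBeta(rng);
    }
}
```

Calling ScaledBetaSketch.next(new Random()) repeatedly should give values between 200 and 20000 that average close to 500. For other (a, b) pairs the closed-form inverse no longer applies, and a library Beta sampler such as the Apache Commons one mentioned above would be the more natural route.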

Crabwise answered 9/6, 2011 at 16:46 Comment(0)
I
4

This may not be the answer you're looking for, but here is the specific case built from 3 uniform distributions:

[Image: Uniform distributions] (Ignore the numbers on the left, but it is to scale!)

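import java.util.concurrent.ThreadLocalRandom;

public class SkewedGenerator {

  // Stand-in for the 'inclusive random' helper credited below (an assumption):
  // returns a uniform integer in [min, max], both ends included.
  static int random(int min, int max) {
    return ThreadLocalRandom.current().nextInt(min, max + 1);
  }

  public int generate() {
    if (random(0, 65) == 0) {
      // Upper range 500-20000, chosen with probability 1/66
      // (labelled "50-100 percentile" in the explanation).
      if (random(1, 13) > 3) {
        // 500-5000 band ("50-75 percentile"), probability 10/13 within the upper range
        return random(500, 5000);
      } else {
        // 5000-20000 band ("75-100 percentile"), probability 3/13 within the upper range
        return random(5000, 20000);
      }
    } else {
      // Lower range 200-500 ("0-50 percentile"), chosen with probability 65/66
      return random(200, 500);
    }
  }
}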

How I got the numbers

First, the area under the curve is equal between 200-500 and 500-20000. This means that the height relationship is 300 * leftHeight == 19500 * rightHeight, making leftHeight == 65 * rightHeight.

This gives us a 1/66 chance to choose right, and a 65/66 chance to choose left.

I then made the same calculation for the 75th percentile, except that here the 500-5000 band is 4500 wide and the 5000-20000 band is 15000 wide, so the ratio was 500-5000 chance == 5000-20000 chance * 10 / 3. Again, this means we have a 10/13 chance to be in the 50-75 percentile band, and a 3/13 chance to be in 75-100.

Kudos to @Stas - I am using his 'inclusive random' function.

And yes, I realise my numbers are wrong as this method works with discrete numbers, and my calculations were continuous. It would be good if someone could correct my border cases.
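
One way to sanity-check the weights is to compute the mean of the mixture directly, since a uniform pick on [lo, hi] averages (lo + hi) / 2. The sketch below does that; the class and method names are made up for illustration. By that arithmetic, the three-band generate() method averages about 420 rather than 500, while a plain two-band mix (200-500 with chance 65/66, 500-20000 with chance 1/66) averages exactly 500.

```java
/** Sketch: mean of a mixture of uniform ranges, given each range's probability. */
final class UniformMixtureMean {

    /** weights[i] is the chance of picking range [lo[i], hi[i]]; weights should sum to 1. */
    static double mean(double[] weights, double[] lo, double[] hi) {
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * (lo[i] + hi[i]) / 2.0;  // midpoint of a uniform range
        }
        return total;
    }

    /** The three bands used by generate(): 65/66 left, then 1/66 split 10/13 and 3/13. */
    static double threeBandMean() {
        double right = 1.0 / 66.0;
        return mean(
            new double[] {65.0 / 66.0, right * 10.0 / 13.0, right * 3.0 / 13.0},
            new double[] {200, 500, 5000},
            new double[] {500, 5000, 20000});
    }

    /** Two bands only: 200-500 with chance 65/66, 500-20000 with chance 1/66. */
    static double twoBandMean() {
        return mean(
            new double[] {65.0 / 66.0, 1.0 / 66.0},
            new double[] {200, 500},
            new double[] {500, 20000});
    }
}
```

The drop comes from the 10/13 versus 3/13 split, which puts most of the upper-range picks in the narrower 500-5000 band. Splitting the upper range in proportion to band width (3/13 and 10/13 instead) would make it a single uniform range again and bring the mean back to 500.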

Inbound answered 10/6, 2011 at 8:55 Comment(0)
P
3

You can use a function f defined on [0;1] such that

Integral(f(x)dx) on [0;1] = 500
f(0) = 200
f(0.75) = 5000
f(1) = 20000

I guess a function of the form

f(x) = a*exp(x) + b*x + c

could be a solution; you just have to solve the related system.

Then you compute f(uniform_random(0,1)), and there you are!
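
To make the f(uniform_random(0,1)) idea concrete, here is a hedged Java sketch. It does not use the exponential form above: solving that form for f(0) = 200, f(1) = 20000 and an integral of 500 gives a + b < 0, so f would start out decreasing and dip below 200. Instead the sketch swaps in a simpler power form, f(x) = 200 + 19800 * x^k, which is an assumption of this sketch and not part of the answer. Its integral over [0;1] is 200 + 19800 / (k + 1), so k = 65 gives 500. The f(0.75) = 5000 condition is left out, because an increasing f with f(0) = 200 and f(0.75) = 5000 has an integral of at least 1400.

```java
import java.util.Random;

/** Sketch: transform a uniform variate with f(x) = MIN + (MAX - MIN) * x^K. */
final class PowerQuantileSketch {
    static final double MIN = 200.0;
    static final double MAX = 20000.0;
    static final double MEAN = 500.0;

    // Integral of f over [0, 1] is MIN + (MAX - MIN) / (K + 1); setting it to MEAN gives K = 65.
    static final double K = (MAX - MIN) / (MEAN - MIN) - 1.0;

    /** Increasing on [0, 1], with f(0) = MIN and f(1) = MAX. */
    static double f(double x) {
        return MIN + (MAX - MIN) * Math.pow(x, K);
    }

    /** One random value: f applied to a uniform draw from [0, 1). */
    static double next(Random rng) {
        return f(rng.nextDouble());
    }
}
```

Because f is increasing, it plays the role of the inverse CDF of the generated values, so the integral of f over [0;1] is their mean.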

Petra answered 16/3, 2011 at 8:48 Comment(0)
I
0

Your question is vague as there are numerous random distributions with a given minimum, maximum, and mean.

Indeed, one solution among many is to choose max with probability (mean-min)/(max-min) and min otherwise. That is, this solution generates one of only two numbers — the minimum and the maximum.
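
A minimal Java sketch of this two-point solution follows; the class and method names are illustrative only. For min 200, max 20000 and mean 500, the probability of returning the maximum is 300/19800 = 1/66.

```java
import java.util.Random;

/** Sketch: returns max with probability (mean - min) / (max - min), and min otherwise. */
final class TwoPointSketch {
    static double next(Random rng, double min, double max, double mean) {
        double pMax = (mean - min) / (max - min);   // 1/66 for 200, 20000, 500
        return rng.nextDouble() < pMax ? max : min;
    }
}
```

The expected value is min + pMax * (max - min), which equals the requested mean by construction.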

The following is another solution.

The PERT distribution (or beta-PERT distribution) is designed to take a minimum and maximum and estimated mode. It's a "smoothed-out" version of the triangular distribution, and generating a random variate from that distribution can be implemented as follows:

startpt + (endpt - startpt) * 
     BetaDist(1.0 + (midpt - startpt) * shape / (endpt - startpt), 
          1.0 + (endpt - midpt) * shape / (endpt - startpt))

where—

  • startpt is the minimum,
  • midpt is the mode (not necessarily average or mean),
  • endpt is the maximum,
  • shape is a number 0 or greater, but usually 4, and
  • BetaDist(X, Y) returns a random variate from the beta distribution with parameters X and Y.

Given a known mean (mean), midpt can be calculated by:

3 * mean / 2 - (startpt + endpt) / 4
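
The formulas above can be sketched in Java as follows. BetaDist is not a standard Java function, so this sketch builds beta variates from two gamma variates (Marsaglia and Tsang's method, with the usual boost for shapes below 1); that sampler and all class and method names here are assumptions of this sketch, not part of the answer. A library sampler such as BetaDistribution from Apache Commons Math could stand in for it. For min 200, max 20000, mean 500 and shape 4, midpt works out to -4300, which lies below the minimum, so it no longer acts as a mode; both beta parameters still stay positive and the mean still comes out at 500.

```java
import java.util.Random;

/** Sketch of PERT sampling: startpt + (endpt - startpt) * BetaDist(a, b). */
final class PertSketch {

    /** Gamma(shape, 1) variate via Marsaglia-Tsang; shape must be > 0. */
    static double gamma(Random rng, double shape) {
        if (shape < 1.0) {
            // Boost: Gamma(shape) is distributed as Gamma(shape + 1) * U^(1 / shape).
            return gamma(rng, shape + 1.0) * Math.pow(rng.nextDouble(), 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double x, v;
            do {
                x = rng.nextGaussian();
                v = 1.0 + c * x;
            } while (v <= 0.0);
            v = v * v * v;
            double u = rng.nextDouble();
            if (u < 1.0 - 0.0331 * x * x * x * x) return d * v;          // quick accept
            if (Math.log(u) < 0.5 * x * x + d * (1.0 - v + Math.log(v))) return d * v;
        }
    }

    /** Beta(a, b) variate as Ga / (Ga + Gb); plays the role of BetaDist(X, Y). */
    static double beta(Random rng, double a, double b) {
        double ga = gamma(rng, a);
        double gb = gamma(rng, b);
        return ga / (ga + gb);
    }

    /** midpt from a known mean; this formula corresponds to shape = 4. */
    static double midpointFromMean(double startpt, double endpt, double mean) {
        return 3.0 * mean / 2.0 - (startpt + endpt) / 4.0;
    }

    /** One PERT variate; both beta parameters must come out positive. */
    static double next(Random rng, double startpt, double midpt, double endpt, double shape) {
        double range = endpt - startpt;
        double a = 1.0 + (midpt - startpt) * shape / range;
        double b = 1.0 + (endpt - midpt) * shape / range;
        return startpt + range * beta(rng, a, b);
    }
}
```

A call such as PertSketch.next(rng, 200, PertSketch.midpointFromMean(200, 20000, 500), 20000, 4) draws one value; with these inputs the beta parameters are about 0.091 and 5.909.
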
Intaglio answered 23/7, 2017 at 9:26 Comment(0)
