Enter an If Statement Using Probabilities

I have the function mutateSequence that takes in three parameters. The parameter p is a value between 0 and 1, inclusive. I need two if statements, one that is entered with probability 4p/5 and another that is entered with probability p/5. How do I write the logic to make this happen?

Code:

void mutateSequence(vector<pair<string, string>> v, int k, double p)
{
    for (int i = 0; i < k - 1; i++)
    {
        string subjectSequence = v[i].second;
        for (int j = 0; j < subjectSequence.length(); j++)
        {
            // with probability 4p/5 replace the nucleotide randomly
            if (/* enter with probability 4p/5 */)
            {
                // do something
            }
            if (/* enter with probability p/5 */)
            {
                // do something
            }
        }
    }
}

I expect the first if statement to be entered with probability 4p/5 and the second if statement to be entered with probability p/5.

Jeffrey asked 7/11, 2022 at 12:51 Comments (10)
rand()%(5*p) < 4*p should do the trick - Dionnadionne
Have a look at std::rand(). There might be higher-quality RNGs if needed, but this should do for a simple simulation. Edit: Ah, ninja'd. - Airflow
rand() is biased. Don't use it if you need any quality of randomness at all. It seems that you are looking for std::bernoulli_distribution. - Marcos
Are you sure about your logic here? If you don't enter the first case, only in 20% of cases do you then enter the second case. So in 100% * (1 - 0.8) * (1 - 0.2) = 16% of cases nothing happens. - Airflow
Are you sure the probabilities should be 4p/5 and p/5, not 4/5 and 1/5? Is p a parameter for how big the probability should be, or is it some random sample you are given to use for making the mutate/do-not-mutate selection? - Amplification
Careful! void mutateSequence(vector<pair<string, string>> v, ... passes v by value! Since you want to mutate v, you probably want to pass it by reference instead, i.e. void mutateSequence(vector<pair<string, string>> &v, ...). - Characteristically
@EricPostpischil The probabilities should be 4p/5 and p/5. The parameter p is passed in as a command-line argument by the user. I know that it is odd, but it is necessary for the goal of this simulation. - Jeffrey
Is the requirement that with probability p exactly one of these two branches occurs, and, given that something occurs, with probability 4/5 it is the first? - Zed
You have three cases to select from: 4p/5 enter the first if, 1p/5 enter the second if, 1−p enter neither. The prototypical way to do this is to divide the line segment from 0 to 1 into three intervals of lengths 4p/5, 1p/5, and 1−p. Then draw a random number in [0, 1). If it is less than 4*p/5, it is in the first interval. Otherwise, if it is less than p, it is in the second interval (the interval from 4p/5 to p has length p/5). Otherwise, it is in the third interval. This can be done with a simple draw from a uniform distribution. I will let others answer on C++ features for this. - Amplification
Btw, in your code it is not "enter the first if, or enter the second, or neither": both can be entered. Maybe you actually want else if rather than if, not sure. - Consequential
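
The interval approach from the comment above (split [0, 1) into lengths 4p/5, p/5 and 1 − p, then draw once) can be sketched with std::uniform_real_distribution. This is a minimal sketch under a few assumptions: the two branches are meant to be mutually exclusive, v is taken by reference as one comment suggests, and the engine is passed in as a fourth parameter so it is seeded once and reused. Branch and chooseBranch are hypothetical names, the mutation bodies are left as placeholders, and the loop bounds are kept exactly as in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical label for which branch (if any) a draw selects.
enum class Branch { First, Second, Neither };

// Split [0, 1) into consecutive intervals of length 4p/5, p/5 and 1 - p,
// draw one uniform number, and report which interval it landed in.
Branch chooseBranch(std::mt19937& gen, double p) {
    assert(p >= 0.0 && p <= 1.0);                           // p is a probability
    std::uniform_real_distribution<double> unit(0.0, 1.0);  // u in [0, 1)
    const double u = unit(gen);
    if (u < 4.0 * p / 5.0) return Branch::First;  // probability 4p/5
    if (u < p) return Branch::Second;             // probability p - 4p/5 = p/5
    return Branch::Neither;                       // probability 1 - p
}

// v is taken by reference so changes to the sequences reach the caller.
// The extra gen parameter is an assumption, not part of the original signature.
void mutateSequence(std::vector<std::pair<std::string, std::string>>& v, int k,
                    double p, std::mt19937& gen) {
    for (int i = 0; i < k - 1; i++) {  // loop bound kept from the question
        std::string& subjectSequence = v[i].second;
        for (std::size_t j = 0; j < subjectSequence.length(); j++) {
            switch (chooseBranch(gen, p)) {
                case Branch::First:
                    // with probability 4p/5: replace the nucleotide randomly
                    break;
                case Branch::Second:
                    // with probability p/5: the other mutation
                    break;
                case Branch::Neither:
                    // with probability 1 - p: leave the nucleotide alone
                    break;
            }
        }
    }
}
```

Because each nucleotide uses a single uniform draw, at most one branch runs per position; two independent Bernoulli draws would not give that guarantee.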

There's a very straightforward way to do this in modern C++. First we set it up:

#include <random>
std::random_device rd;
std::mt19937 gen(rd());
// p entered by user elsewhere
// give "true" 4p/5 of the time
std::bernoulli_distribution d1(4.0*p/5.0);
// give "true" 1p/5 of the time
std::bernoulli_distribution d2(1.0*p/5.0);

Then when we want to use it:

if (d1(gen)) {
    // replace nucleotide with 4p/5 probability
} else {
    // something else with 1 - 4p/5 probability
}

If instead you want to do one thing with probability 4p/5 and then, independently, another thing with probability 1p/5, that's also easily done:

if (d1(gen)) {
    // replace nucleotide with 4p/5 probability
} 
if (d2(gen)) {
    // something else with 1p/5 probability
}

See bernoulli_distribution for more detail.
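
To see how the independent version behaves, here is a small sketch that tallies how often each branch fires over many trials. The names Tally and countOutcomes are hypothetical, and the fixed seed is an assumption made only so the counts are repeatable. With independent draws, both branches fire in the same trial with probability (4p/5)(p/5) = 4p²/25, which is the overlap the comments on the question point out.

```cpp
#include <cassert>
#include <random>

// Hypothetical tally of outcomes for two independent Bernoulli draws.
struct Tally {
    int first = 0;   // trials where the 4p/5 branch fired
    int second = 0;  // trials where the p/5 branch fired
    int both = 0;    // trials where both branches fired
};

Tally countOutcomes(double p, int trials, unsigned seed) {
    assert(p >= 0.0 && p <= 1.0);  // bernoulli_distribution needs 0 <= prob <= 1
    std::mt19937 gen(seed);        // fixed seed: repeatable counts
    std::bernoulli_distribution d1(4.0 * p / 5.0);
    std::bernoulli_distribution d2(1.0 * p / 5.0);
    Tally t;
    for (int i = 0; i < trials; ++i) {
        const bool a = d1(gen);
        const bool b = d2(gen);
        if (a) ++t.first;
        if (b) ++t.second;
        if (a && b) ++t.both;
    }
    return t;
}
```

With p = 1 you would expect roughly 80%, 20% and 16% of trials in first, second and both respectively; the nonzero both count is why a mutually exclusive selection needs else or a single interval draw instead.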

Zaccaria answered 7/11, 2022 at 13:05 Comments (7)
The question asks for probabilities 4p/5 and 1p/5, not 4/5 and 1/5. That looks odd to me, and I entered a comment requesting clarification, but giving an answer that is not for the asked question while the question is unclear risks downvotes. - Amplification
And now the OP confirms they do mean 4p/5 and 1p/5 (and 1 − 4p/5 − 1p/5 = 1 − p for doing nothing), making this answer wrong. So there are three options: 4p/5 enter the first if, 1p/5 enter the second if, 1 − p enter neither. - Amplification
If d1 is set up so that d1(gen) is true with probability 4p/5, then !d1(gen) is true with probability 1 − 4p/5, not 1p/5. - Amplification
You say “very straightforward” but then your code seeds the random generator incorrectly. - Graaf
Seeding the generator is a whole other question. For a treatment of that topic, I'd recommend pcg-random.org/posts/cpp-seeding-surprises.html - Zaccaria
Yes, that’s my point. There’s very little “straightforward” about the <random> standard header if you want to use it correctly, unfortunately. - Graaf
It depends on the needs of the simulation. One could reasonably seed the generator with a single, fixed integer if the goal is to have a repeatable simulation. - Zaccaria
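
As a rough illustration of the two seeding choices discussed above: a fixed integer seed makes a run repeatable, while std::seed_seq can mix several std::random_device outputs when runs should differ. This is a sketch only; the function names are hypothetical, the choice of 8 words is arbitrary, and 8 words is still far smaller than the std::mt19937 state, which is the kind of trade-off the linked article covers.

```cpp
#include <array>
#include <random>

// Repeatable runs: the same seed always produces the same sequence,
// which helps when debugging or comparing simulation settings.
std::mt19937 makeRepeatableEngine(unsigned seed) {
    return std::mt19937(seed);
}

// Varying runs: collect several words from std::random_device and mix
// them with std::seed_seq instead of seeding from a single 32-bit value.
std::mt19937 makeVaryingEngine() {
    std::random_device rd;
    std::array<unsigned, 8> words{};  // 8 words is an arbitrary choice
    for (auto& w : words) w = rd();
    std::seed_seq seq(words.begin(), words.end());
    return std::mt19937(seq);
}
```

Either engine can then be passed to the distributions in the answer above in place of gen.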
