Rosalind: Mendel's first law

Asked 15/10, 2014 at 15:10 Answered 7/4, 2022 at 13:46

python probability bioinformatics rosalind

I'm trying to solve the problem at http://rosalind.info/problems/iprb/

Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive.

Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.

My solution works for the sample, but not for any problems generated. After further research it seems that I should find the probability of choosing any one organism at random, find the probability of choosing the second organism, and then the probability of that pairing producing offspring with a dominant allele.

My question is then: what does my code below find the probability of? Does it find the percentage of offspring with a dominant allele for all possible matings -- so rather than the probability of one random mating, my code is solving for the percentage of offspring with dominant alleles if all pairs were tested?

f = open('rosalind_iprb.txt', 'r')
r = f.read()
s = r.split()
############# k = # homozygotes dominant, m = #heterozygotes, n = # homozygotes recessive
k = float(s[0])
m = float(s[1])
n = float(s[2])
############# Counts for pairing between each group and within groups
k_k = 0
k_m = 0
k_n = 0

m_m = 0
m_n = 0

n_n = 0


##############
if k > 1:
    k_k = 1.0 + (k-2) * 2.0

k_m = k * m

k_n = k * n

if m > 1:
    m_m = 1.0 + (m-2) * 2.0

m_n = m * n

if n> 1:
    n_n = 1.0 + (n-2) * 2.0
#################
dom = k_k + k_m + k_n + 0.75*m_m + 0.5*m_n
total = k_k + k_m + k_n + m_m + m_n + n_n

chance = dom/total
print chance

Smoothshaven answered 15/10, 2014 at 15:10 Comment(5)

Comments on what your various variables mean would be super helpful, but yes, at a quick glance, that does seem to be what you calculated. – Cammack 15/10, 2014 at 15:14

@jonrsharpe You should be able to pretty easily do this as a product of probabilities rather than a Monte Carlo. MC's are really useful when calculating the probability of any particular probability in the chain is impractical and it's easier to just simulate the outcome directly. In this case it should be possible to find the actual, exact answer as a function of p(k,m,n) directly. – Cammack 15/10, 2014 at 15:16

Thanks for doing it right, and posting your actual code with the relevant question, not just some waffle (which is unfortunately quite common). – Battlefield 15/10, 2014 at 15:25

The problem, mathematically, is similar to drawing cards from a deck. That'll let you calculate the odds of getting any particular pairing of mates, and from there you should be able to expand an extra parameter to give you the odds of producing the dominant alleles. – Cammack 15/10, 2014 at 20:32

possible duplicate of Rosalind "Mendel's First Law" IPRB – Calash 19/11, 2014 at 18:10

Looking at your code, I'm having a hard time figuring out what it's supposed to do. I'll work through the problem here.

Let's simplify the wording. There are n1 type 1, n2 type 2, and n3 type 3 items.

How many ways are there to choose a set of size 2 out of all the items? (n1 + n2 + n3) choose 2.

Every pair of items will have item types corresponding to one of the six following unordered multisets: {1,1}, {2,2}, {3,3}, {1,2}, {1,3}, {2,3}

How many multisets of the form {i,i} are there? ni choose 2.

How many multisets of the form {i,j} are there, where i != j? ni * nj.

The probabilities of the six multisets are thus the following:

P({1,1}) = [n1 choose 2] / [(n1 + n2 + n3) choose 2]
P({2,2}) = [n2 choose 2] / [(n1 + n2 + n3) choose 2]
P({3,3}) = [n3 choose 2] / [(n1 + n2 + n3) choose 2]
P({1,2}) = [n1 * n2] / [(n1 + n2 + n3) choose 2]
P({1,3}) = [n1 * n3] / [(n1 + n2 + n3) choose 2]
P({2,3}) = [n2 * n3] / [(n1 + n2 + n3) choose 2]

These sum to 1. Note that [X choose 2] is just [X * (X - 1) / 2] for X > 1 and 0 for X = 0 or 1.

Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype).

To answer this question, you simply need to identify which of the six multisets correspond to this event. Lacking the genetics knowledge to answer that question, I'll leave that to you.

For example, suppose that a dominant allele results if either of the two parents was type 1. Then the events of interest are {1,1}, {1,2}, {1,3} and the probability of the event is P({1,1}) + P({1,2}) + P({1,3}).

Denationalize answered 16/10, 2014 at 17:10 Comment(0)

I spend some time in this question, so, to clarify in python:

lst = ['2', '2', '2']
k, m, n = map(float, lst)
t = sum(map(float, lst))
# organize a list with allele one * allele two (possibles) * dominant probability
# multiplications by one were ignored
# remember to substract the haplotype from the total when they're the same for the second haplotype choosed
couples = [
            k*(k-1),  # AA x AA
            k*m,  # AA x Aa
            k*n,  # AA x aa
            m*k,  # Aa x AA
            m*(m-1)*0.75,  # Aa x Aa
            m*n*0.5,  # Aa x aa
            n*k,  # aa x AA
            n*m*0.5,  # aa x Aa
            n*(n-1)*0  # aa x aa
]
# (t-1) indicate that the first haplotype was select
print(round(sum(couples)/t/(t-1), 5))

Crescendo answered 26/6, 2017 at 2:3 Comment(0)

If you are interested, I just found a solution and put it in C#.

    public double mendel(double k, double m, double n)
    {
        double prob;

        prob = ((k*k - k) + 2*(k*m) + 2*(k*n) + (.75*(m*m - m)) + 2*(.5*m*n))/((k + m + n)*(k + m + n -1)); 

        return prob;
    }

Our parameters are k (dominant), m (hetero), & n (recessive). First I found the probability for each possible breeding pair selection in terms of percentage of the population. So, a first round choice for k would look like k/(k+m+n), and a second round choice of k after a first round choice of k would look like (k-1)/(k+m+n). Then multiply these two to get the outcome. Since there were three identified populations, there were nine possible outcomes.

Then I multiplied each outcome by it's dominance probability - 100% for anything with k, 75% for m&m, 50% for m&n, n&m, and 0% for n&n. Now add the outcomes together, and you have your solution.

http://rosalind.info/problems/iprb/

Ecliptic answered 8/3, 2015 at 1:37 Comment(2)

Thanks, Ben. This is the first equation and explanation that I've seen that has helped me with respect to this problem. However, I am still a bit confused as to where the '-k' and '-m' came from in your numerator. – Preuss 3/12, 2015 at 4:13

Would you be able to refactor the code a bit so it's readable? I am having trouble understanding how you came up with those magic numbers such as 2, .75, and .5 – Copepod 30/5, 2016 at 19:25

Here is the code I did in python: We don't want the offspring to be completely recessive, so we should make the probability tree and look at the cases and the probabilities of the cases that event might happen. Then the probability that we want is 1 - p_reccesive. More explanation is provided in the comment section of the following code.

"""
Let d: dominant, h: hetero, r: recessive
Let a = k+m+n
Let X = the r.v. associated with the first person randomly selected
Let Y = the r.v. associated with the second person randomly selected without replacement
Then:
k = f_d => p(X=d) = k/a => p(Y=d| X=d) = (k-1)/(a-1) ,
                           p(Y=h| X=d) = (m)/(a-1) ,
                           p(Y=r| X=d) = (n)/(a-1)
                           
m = f_h => p(X=h) = m/a => p(Y=d| X=h) = (k)/(a-1) ,
                           p(Y=h| X=h) = (m-1)/(a-1)
                           p(Y=r| X=h) = (n)/(a-1)
                           
n = f_r => p(X=r) = n/a => p(Y=d| X=r) = (k)/(a-1) ,
                           p(Y=h| X=r) = (m)/(a-1) ,
                           p(Y=r| X=r) = (n-1)/(a-1)
Now the joint would be:
                            |    offspring possibilites given X and Y choice
-------------------------------------------------------------------------
X Y |  P(X,Y)               |   d(dominant)     h(hetero)   r(recessive)
-------------------------------------------------------------------------
d d     k/a*(k-1)/(a-1)     |    1               0           0
d h     k/a*(m)/(a-1)       |    1/2            1/2          0
d r     k/a*(n)/(a-1)       |    0               1           0
                            |
h d     m/a*(k)/(a-1)       |    1/2            1/2          0
h h     m/a*(m-1)/(a-1)     |    1/4            1/2         1/4
h r     m/a*(n)/(a-1)       |    0              1/2         1/2
                            |
r d     n/a*(k)/(a-1)       |    0               0           0
r h     n/a*(m)/(a-1)       |    0               1/2        1/2
r r     n/a*(n-1)/(a-1)     |    0               0           1

Here what we don't want is the element in the very last column where the offspring is completely recessive.
so P = 1 - those situations as follow
"""

path = 'rosalind_iprb.txt'

with open(path, 'r') as file:
    lines = file.readlines()
k, m, n = [int(i) for i in lines[0].split(' ')]
a = k + m + n
p_recessive = (1/4*m*(m-1) + 1/2*m*n + 1/2*m*n + n*(n-1))/(a*(a-1))
p_wanted = 1 - p_recessive
p_wanted = round(p_wanted, 5)
print(p_wanted)

Tabb answered 7/4, 2022 at 13:46 Comment(0)

Recommended topics

Hot tags