How to convert log probabilities into plain probabilities (values between 0 and 1) in Python

Asked 26/1, 2018 at 16:42 Answered 15/5, 2020 at 18:21

python gaussian logarithm gmm probability-distribution

I am using Gaussian mixture model for speaker identification. I use this code to predict the speaker for each voice clip.

for path in file_paths:   
    path = path.strip()   
    print (path)
    sr,audio = read(source + path)
    vector   = extract_features(audio,sr)
    #print(vector)
    log_likelihood = np.zeros(len(models))
    #print(len(log_likelihood))

    for i in range(len(models)):
        gmm1   = models[i]  #checking with each model one by one
        #print(gmm1)
        scores = np.array(gmm1.score(vector)) 
        #print(scores)
        #print(len(scores))
        log_likelihood[i] = scores.sum()
        print(log_likelihood)
        winner = np.argmax(log_likelihood)
        #print(winner)
    print ("\tdetected as - ", speakers[winner])

and it gives me the output like this:

[ 311.79769716    0.            0.            0.            0.        ]
[  311.79769716 -5692.56559902     0.             0.             0.        ]
[  311.79769716 -5692.56559902 -6170.21460788     0.             0.        ]
[  311.79769716 -5692.56559902 -6170.21460788 -6736.73192695     0.        ]
[  311.79769716 -5692.56559902 -6170.21460788 -6736.73192695 -6753.00196447]
    detected as -  bart

Here the score function gives me the log probability for each speaker. Now I want to choose a threshold value, and for that I need to convert these log probabilities into plain probabilities (between 0 and 1). How can I do that? I am using Python.

Path answered 26/1, 2018 at 16:42 Comment(2)

en.wikipedia.org/wiki/Logarithm – Human 26/1, 2018 at 21:5

Although I can't think of a good reason you would need to convert log probabilities back. Log probabilities are easier to work with in general. – Human 26/1, 2018 at 21:6

You have to take the exponential (np.exp()) of the log probabilities to get the actual probabilities back. This works because exponentiation is the inverse of the natural logarithm: e^log(p) = p, where p is a probability.

Below is an example:

# some input array
In [9]: a
Out[9]: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# converting to probabilities using "softmax"
In [10]: probs = np.exp(a) / (np.exp(a)).sum()

# sanity check
In [11]: probs.sum()
Out[11]: 1.0

# obtaining log probabilities
In [12]: log_probs = np.log(probs)

In [13]: log_probs
Out[13]: 
array([-8.45855173, -7.45855173, -6.45855173, -5.45855173, -4.45855173,
       -3.45855173, -2.45855173, -1.45855173, -0.45855173])

# the log probabilities themselves do not sum to 1.0
In [14]: log_probs.sum()
Out[14]: -40.126965551706405

# get the probabilities back
In [15]: probabilities = np.exp(log_probs)

In [16]: probabilities.sum()   # check passed
Out[16]: 1.0

In [17]: probabilities
Out[17]: 
array([  2.12078996e-04,   5.76490482e-04,   1.56706360e-03,
         4.25972051e-03,   1.15791209e-02,   3.14753138e-02,
         8.55587737e-02,   2.32572860e-01,   6.32198578e-01])
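
One caveat for the scores in the question: they are summed log-likelihoods (log densities), not log probabilities, so np.exp() on them can exceed 1 or underflow to 0. A minimal sketch of one way to get values between 0 and 1 is to normalize across the speaker models with a numerically stable softmax. This assumes every speaker is equally likely a priori, which the question does not state, and the helper name normalize_log_scores is hypothetical.

```python
import numpy as np

# Summed log-likelihoods copied from the last output row in the question
log_likelihood = np.array([311.79769716, -5692.56559902, -6170.21460788,
                           -6736.73192695, -6753.00196447])

def normalize_log_scores(log_scores):
    """Softmax over log scores (hypothetical helper, assumes equal priors)."""
    # Subtracting the maximum keeps np.exp() from overflowing
    shifted = log_scores - np.max(log_scores)
    weights = np.exp(shifted)
    return weights / weights.sum()

posterior = normalize_log_scores(log_likelihood)
print(posterior)        # one value per speaker model, each in [0, 1]
print(posterior.sum())  # the values sum to 1
```

With gaps of thousands of log units between the models, the normalized values are effectively one-hot, so a threshold on the gap between the best and second-best log-likelihood may be more informative than a threshold on these values.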

Stentorian answered 26/1, 2018 at 16:48 Comment(4)

I also tried using the np.exp() function, but it does not give me an accurate result. It gives me an output array in scientific notation (including values greater than 1). How is that possible? A probability is never greater than 1. – Path 27/1, 2018 at 5:55

@Path without knowing the contents of arrays, it's tricky to reproduce your setting. – Stentorian 27/1, 2018 at 13:56

I included my array contents (the 5*5 output) in my question. Please look at that output and suggest how I can convert these array values to values between 0 and 1. I want to choose a threshold value, which is why I need values between 0 and 1. – Path 27/1, 2018 at 17:0

Works great! @Path you must be reading the output incorrectly. Numpy prints in scientific notation. Maybe try np.exp().tolist() for python list – Aggrieved 16/4, 2020 at 13:58

The GMM module's score_samples from sklearn gives the log of the probability density. Densities won't sum to 1; rather, they integrate to 1.

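import numpy as np
from sklearn.mixture import GaussianMixture  # mixture.GMM was removed in scikit-learn 0.20

data = 10 * np.random.rand(100)
model = GaussianMixture(n_components=1).fit(data[:, None])
xfit = np.linspace(-5, 15, 5000)
logprob = model.score_samples(xfit[:, None])  # log density at each grid point
dx = xfit[1] - xfit[0]
print(dx * np.sum(np.exp(logprob)))  # Riemann sum approximating the integral
# close to 1, e.g. 0.999773872653 (the exact value depends on the random data)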

You can also calculate the probability of a data point belonging to a multivariate normal distribution.

Source: https://github.com/scikit-learn/scikit-learn/issues/4202
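
As a rough illustration of the multivariate normal case, here is a minimal NumPy sketch that evaluates the log density of a point and exponentiates it. The helper mvn_logpdf and the example mean and covariance values are hypothetical, chosen only for demonstration. The result is a density, not a probability, so it can exceed 1 when the covariance is small.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Log density of a multivariate normal at point x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.shape[0]
    diff = x - mean
    # Solve cov @ z = diff rather than inverting cov explicitly
    maha = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

mean = np.zeros(2)

# Unit covariance: the density at the mean is 1 / (2 * pi)
density = np.exp(mvn_logpdf([0.0, 0.0], mean, np.eye(2)))

# Narrow covariance: the density at the mean is larger than 1
narrow_density = np.exp(mvn_logpdf([0.0, 0.0], mean, 0.01 * np.eye(2)))

print(density, narrow_density)
```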

Jarv answered 15/5, 2020 at 18:21 Comment(0)
