Equivalent python command for quantile in matlab
K

3

5

I'm trying to replicate some Matlab code in Python. I could not find an exact equivalent of the Matlab function quantile. The closest I found is SciPy's mquantiles.

Matlab example:

 quantile( [ 8.60789925e-05, 1.98989354e-05 , 1.68308882e-04, 1.69379370e-04],  0.8)

...gives: 0.00016958

Same example in python:

scipy.stats.mstats.mquantiles( [8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04], 0.8)

...gives 0.00016912

Does anyone know how to exactly replicate Matlab's quantile function?

Kerch answered 5/12, 2012 at 21:44 Comment(0)
E
5

The documentation for quantile (under the More About => Algorithms section) gives the exact algorithm used. Here's some python code that does it for a single quantile for a flat array, using bottleneck to do partial sorting:

import numpy as np
import bottleneck as bn

def quantile(a, prob):
    """
    Estimates the prob'th quantile of the values in a data array.

    Uses the algorithm of matlab's quantile(), namely:
        - Remove any nan values
        - Take the sorted data as the (.5/n), (1.5/n), ..., (1-.5/n) quantiles.
        - Use linear interpolation for values between (.5/n) and (1 - .5/n).
        - Use the minimum or maximum for quantiles outside that range.

    See also: scipy.stats.mstats.mquantiles
    """
    a = np.asanyarray(a)
    a = a[np.logical_not(np.isnan(a))].ravel()
    n = a.size

    if prob >= 1 - .5/n:
        return a.max()
    elif prob <= .5 / n:
        return a.min()

    # find the two bounds we're interpolating between:
    # that is, find i such that (i+.5) / n <= prob <= (i+1.5)/n
    t = n * prob - .5
    i = int(np.floor(t))  # cast to int so it can be used as an index

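    # partial sort so that the ith element is at position i, with bigger ones
    # to the right and smaller to the left
    # (bn.partition follows np.partition's convention; old bottleneck
    # releases named this function partsort)
    a = bn.partition(a, i)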

    if i == t: # did we luck out and get an integer index?
        return a[i]
    else:
        # we'll linearly interpolate between this and the next index
        smaller = a[i]
        larger = a[i+1:].min()
        if np.isinf(smaller):
            return smaller # avoid inf - inf
        return smaller + (larger - smaller) * (t - i)

I only did the single-quantile, 1d case because that's all I needed. If you want several quantiles, it's probably worth just doing the full sort. If you want to do it per-axis and know you don't have any NaNs, all you should need to do is add an axis argument to the sort and vectorize the linear interpolation bit. Doing it per-axis with NaNs would be a little trickier.
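The full-sort variant described above can be sketched with plain NumPy. This is only an illustration under stated assumptions, not part of the original answer: the names matlab_quantiles and matlab_quantiles_axis are made up here, the per-axis version assumes there are no NaNs, and neither function special-cases infinite values the way the single-quantile code does.

```python
import numpy as np

def matlab_quantiles(a, probs):
    """Several quantiles of a flat array (hypothetical helper).

    Sorted values are treated as the (i + 0.5)/n quantiles; np.interp
    interpolates linearly in between and clamps to min/max outside.
    """
    a = np.asarray(a, dtype=float).ravel()
    a = np.sort(a[~np.isnan(a)])          # drop NaNs, then do a full sort
    n = a.size
    if n == 0:
        return np.full(np.shape(probs), np.nan)
    positions = (np.arange(n) + 0.5) / n  # .5/n, 1.5/n, ..., 1 - .5/n
    return np.interp(probs, positions, a)

def matlab_quantiles_axis(a, probs, axis=0):
    """Per-axis version (hypothetical helper); assumes no NaNs in `a`.

    Returns an array whose last dimension indexes the requested quantiles.
    """
    a = np.sort(np.asarray(a, dtype=float), axis=axis)
    a = np.moveaxis(a, axis, -1)          # put the reduced axis last
    n = a.shape[-1]
    # fractional index into the sorted data, clamped to the valid range
    t = np.clip(n * np.atleast_1d(probs) - 0.5, 0, n - 1)
    lo = np.floor(t).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = t - lo
    return a[..., lo] * (1 - frac) + a[..., hi] * frac

data = [8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04]
print(matlab_quantiles(data, [0.25, 0.5, 0.8]))
```

For the 1d case, np.interp does the interpolation and the clamping to the minimum and maximum in one call, so the explicit range checks from the single-quantile function are not needed.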

This code gives:

>>> quantile([ 8.60789925e-05, 1.98989354e-05 , 1.68308882e-04, 1.69379370e-04], 0.8)
0.00016905822360000001

and the Matlab code gave 0.00016905822359999999; the difference is about 3e-20, which is one unit in the last place for a double of this size, so the two results agree to within floating-point rounding.

Espionage answered 30/12, 2012 at 22:13 Comment(0)
H
4

Your input vector only has 4 values, which is far too few to get a good approximation of the quantiles of the underlying distribution. The discrepancy is probably the result of Matlab and SciPy using different heuristics to compute quantiles on undersampled distributions.

Hollyhock answered 16/12, 2012 at 17:47 Comment(1)
Why the downvote? If there is a problem with my answer I'd love to know what it is. - Hollyhock
U
3

A bit late, but:

mquantiles is very flexible. You just need to provide the alphap and betap parameters. Here, since MATLAB does a linear interpolation, you need to set them to alphap=0.5, betap=0.5.

In [9]: scipy.stats.mstats.mquantiles( [8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04], 0.8, alphap=0.5, betap=0.5)

EDIT: MATLAB says that it does linear interpolation; however, it appears to compute the quantile by piecewise-linear interpolation between the sorted data points, which is equivalent to the Type 5 quantile in R and to (0.5, 0.5) in scipy.
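To see why the two parameter choices give different answers, here is a minimal NumPy-only sketch of the plotting-position rule from the mquantiles documentation, p(k) = (k - alphap) / (n + 1 - alphap - betap). The helper name plotting_position_quantile is made up for this illustration and it is not how scipy implements mquantiles internally. It only mirrors the formula, assuming the documented defaults alphap=0.4, betap=0.4.

```python
import numpy as np

def plotting_position_quantile(data, prob, alphap, betap):
    """Quantile by linear interpolation over plotting positions.

    Hypothetical helper: the k-th sorted value (k = 1..n) is assigned
    the probability (k - alphap) / (n + 1 - alphap - betap).
    """
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    k = np.arange(1, n + 1)
    positions = (k - alphap) / (n + 1 - alphap - betap)
    # np.interp clamps to the smallest/largest value outside the positions
    return np.interp(prob, positions, x)

data = [8.60789925e-05, 1.98989354e-05, 1.68308882e-04, 1.69379370e-04]

# mquantiles defaults: reproduces the 0.00016912 from the question
default_style = plotting_position_quantile(data, 0.8, 0.4, 0.4)

# alphap = betap = 0.5: positions become (k - 0.5)/n, as in MATLAB
matlab_style = plotting_position_quantile(data, 0.8, 0.5, 0.5)

print(default_style, matlab_style)
```

With alphap = betap = 0.5 the denominator collapses to n, so the positions are exactly the (.5/n), (1.5/n), ..., (1 - .5/n) values that MATLAB's documentation describes.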

Unexacting answered 20/4, 2015 at 12:45 Comment(0)