statistics - 3

4

Solved

How can I make the output from tapply() into a data.frame

I have a data.frame in R that looks like this: score rms template aln_id description 1 -261.410 4.951 2f22A.pdb 2F22A_1 S_00001_0000002_0 2 -231.987 21.813 1wb9A.pdb 1WB9A_4 S_00002_0000002_0 3 -...

r statistics

Nuss asked 11/4, 2010 at 18:48

18

Solved

What is a good solution for calculating an average where the sum of all values exceeds a double's limits?

I have a requirement to calculate the average of a very large set of doubles (10^9 values). The sum of the values exceeds the upper bound of a double, so does anyone know any neat little tricks for...

java algorithm statistics

Iridize asked 18/12, 2009 at 20:18
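
One commonly suggested approach to the question above is to update the mean incrementally, so the full sum is never formed. The following is a minimal sketch in Python rather than the asker's Java; `running_mean` is a hypothetical name, not something taken from the thread. It assumes that the difference `x - mean` itself stays within double range, which holds when all values share a sign.

```python
def running_mean(values):
    """Return the mean of an iterable without accumulating the full sum.

    Each step nudges the current mean towards the new value by 1/count,
    so the intermediate result stays on the scale of the data.
    Returns 0.0 for an empty iterable (an arbitrary choice for this sketch).
    """
    mean = 0.0
    for count, x in enumerate(values, start=1):
        mean += (x - mean) / count
    return mean


# Summing these three values directly would overflow to infinity.
big = [1e308, 1e308, 1e308]
print(running_mean(big))
```

Because the function takes any iterable, the values can be streamed from a file or generator instead of being held in memory.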

4

Solved

Using QUARTILE in an Excel pivot table to summarise data by sub-populations

I've got a large table of data in an Excel spreadsheet that, essentially, can be considered to be a collection of values for individuals identified as belonging to various subpopulations: IndivID...

excel statistics excel-2007 pivot-table data-analysis

Penguin asked 18/11, 2012 at 14:32

4

Solved

numpy.digitize returns values out of range?

I am using the following code to digitize an array into 16 bins: numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1]) I expect that the output is in the range [1, 16], since there are ...

python statistics numpy binning

Mcwhorter asked 4/12, 2010 at 18:37

5

Solved

multivariate student t-distribution with python

To generate samples with multivariate t-distribution I use this function: def multivariatet(mu,Sigma,N,M): ''' Output: Produce M samples of d-dimensional multivariate t distribution Input: mu...

python statistics scipy probability-density

Swale asked 22/4, 2015 at 13:16

6

Solved

Calculating weighted mean and standard deviation

I have a time series x_0 ... x_t. I would like to compute the exponentially weighted variance of the data. That is: V = SUM{w_i*(x_i - x_bar)^2, i=1 to T} where SUM{w_i} = 1 and x_bar=SUM{w_i*x_i}...

r statistics mean weighted

Iover asked 6/4, 2012 at 21:20
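
The formula in the question above can be written out directly. This is a minimal sketch in Python rather than R; the function names and the `decay` parameter are assumptions made for illustration, and the weights are taken to decay geometrically so that the most recent observation counts most.

```python
def exp_weights(n, decay):
    """Return n weights proportional to decay**age, normalised to sum to 1.

    The last observation has age 0, so it receives the largest weight
    when 0 < decay < 1.
    """
    raw = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mean_var(xs, ws):
    """Compute x_bar = SUM(w_i * x_i) and V = SUM(w_i * (x_i - x_bar)^2).

    Assumes the weights ws already sum to 1, as in the question.
    """
    x_bar = sum(w * x for w, x in zip(ws, xs))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    return x_bar, var


series = [1.0, 2.0, 4.0, 3.0, 5.0]
weights = exp_weights(len(series), decay=0.9)
print(weighted_mean_var(series, weights))
```

Note that this is the plain weighted variance from the question, with no bias correction applied.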

18

How to plot ROC curve in Python

I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive rate as well as the false ...

python matplotlib plot statistics roc

Udella asked 29/7, 2014 at 6:20

5

Solved

How do I run a ldap query using R?

I want to query an LDAP directory to find out how employees are distributed across departments and groups... Something like: "Give me the department name of all the members of a group" and then u...

r ldap statistics rcurl ldap-query

Braswell asked 1/4, 2014 at 18:23

5

Solved

Advanced Describe Pandas

Is there a more advanced function like the describe that pandas has? Normally I will go on like: r = pd.DataFrame(np.random.randn(1000), columns = ['A']) r.describe() and I will get a nice ...

python pandas statistics

Judaica asked 30/5, 2014 at 16:25

8

Solved

How to calculate a partial Area Under the Curve (AUC)

In scikit learn you can compute the area under the curve for a binary classifier with roc_auc_score( Y, clf.predict_proba(X)[:,1] ) I am only interested in the part of the curve where the false ...

python machine-learning statistics scikit-learn

Bolivar asked 16/9, 2016 at 17:51

5

Solved

plotting a histogram on a Log scale with Matplotlib

I have a Pandas DataFrame that has the following values in a Series x = [2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27, 22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1]...

python pandas numpy matplotlib statistics

Freeliving asked 16/12, 2017 at 21:41

50

Solved

Simple way to calculate median with MySQL

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the me...

sql mysql statistics median

Electrostatics asked 18/8, 2009 at 0:13

8

What kind of algorithm is behind the Akinator game?

It always amazed me how the Akinator app could guess a character by asking just several questions. So I wonder what kind of algorithm or method let it do that? Is there a name for that class of alg...

algorithm statistics machine-learning artificial-intelligence

Bookcraft asked 30/11, 2012 at 16:59

2

Solved

How to generate random normal distribution without numpy? (Google interview)

So I have a data science interview at Google, and I'm trying to prepare. One of the questions I see a lot (on Glassdoor) from people who have interviewed there before has been: "Write code to ...

python statistics

Polyhydroxy asked 20/1, 2022 at 4:19
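
One standard way to draw normal samples using only the standard library is the Box-Muller transform. The sketch below is one possible answer, not necessarily what the interviewers expect; `box_muller` is a hypothetical name.

```python
import math
import random


def box_muller(rng=random):
    """Return two independent standard normal samples.

    Uses the Box-Muller transform on two uniform samples.
    u1 is shifted into (0, 1] so that log(u1) is always defined.
    """
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


# Scale and shift to obtain a sample with mean 10 and standard deviation 2.
z, _ = box_muller()
print(10.0 + 2.0 * z)
```

The standard library also provides `random.gauss(mu, sigma)`, which may or may not be allowed depending on how the interview question is framed.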

4

MCAR Little's test in Python

How can I run Little's test for MCAR in Python? I have looked at the R package for the same test, but I want to do it in Python. Is there an alternative approach to testing for MCAR?

python-3.x statistics missing-data imputation hypothesis-test

Brigand asked 28/9, 2019 at 8:44

6

Using pandas, calculate Cramér's coefficient matrix

I have a dataframe in Pandas which contains metrics calculated on Wikipedia articles. Two categorical variables: nation, which nation the article is about, and lang, which language Wikipedia this was ...

python pandas statistics

Plea asked 2/1, 2014 at 22:06

3

Solved

Python equivalent of Excel's PERCENTILE.EXC

I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio changes in value, and th...

python pandas statistics quantile

Lune asked 26/7, 2016 at 17:13
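
As a rough illustration of the exclusive percentile definition, the sketch below interpolates at position p * (n + 1) in the sorted data, which is the rule Excel's PERCENTILE.EXC is generally described as using. It is written in plain Python rather than Pandas; `percentile_exc` is a hypothetical helper, and agreement with Excel on every edge case is an assumption, not something verified here.

```python
import math


def percentile_exc(data, p):
    """Exclusive percentile: linear interpolation at rank p * (n + 1).

    Ranks are 1-based. A rank outside [1, n] has no defined value under
    this rule, so a ValueError is raised in that case.
    """
    xs = sorted(data)
    n = len(xs)
    pos = p * (n + 1)
    if pos < 1 or pos > n:
        raise ValueError("p is outside the range supported for this sample size")
    lower = int(math.floor(pos))
    frac = pos - lower
    if lower == n:
        return xs[-1]
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


print(percentile_exc([1, 2, 3, 4, 5, 6, 7, 8, 9], 0.25))  # 2.5
```

The standard library's `statistics.quantiles(data, n=..., method='exclusive')` follows the same p * (n + 1) rule for evenly spaced cut points.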

11

Solved

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?

I have a dataframe df and I use several columns from it to groupby: df['col1','col2','col3','col4'].groupby(['col1','col2']).mean() In the above way, I almost get the table (dataframe) that I need...

python pandas dataframe group-by statistics

Blowfish asked 15/10, 2013 at 15:00

11

Solved

how to calculate the Euclidean norm of a vector in R?

I tried norm, but I think it gives the wrong result (the norm of c(1, 2, 3) is sqrt(1*1+2*2+3*3), but it returns 6). x1 <- 1:3 norm(x1) # Error in norm(x1) : 'A' must be a numeric matrix norm(...

r vector statistics

Raguelragweed asked 7/6, 2012 at 14:34

3

Solved

Gini Coefficient in Julia: Efficient and Accurate Code

I'm trying to implement the following formula in Julia for calculating the Gini coefficient of a wage distribution: where Here's a simplified version of the code I'm using for this: # Takes a...

statistics distribution julia inequality

Lemos asked 9/7, 2015 at 15:24

2

Solved

R t.test group by

I have stumbled across a bit of an annoying problem. I am trying to perform multiple independent sample t-tests at once, grouped by a value. To put it into an example: In 5 cities we have measured...

r statistics t-test

Twin asked 10/3, 2020 at 11:14

3

Solved

Anova test for GLM in python

I am trying to get the F-statistic and p-value for each of the covariates in a GLM. In Python I am using statsmodels.formula.api to fit the GLM. formula = 'PropNo_Pred ~ Geography + log10BMI ...

python statistics glm statsmodels

Sundaysundberg asked 6/12, 2014 at 5:28

5

Solved

Calculating adjusted p-values in Python

So, I've been spending some time looking for a way to get adjusted p-values (aka corrected p-values, q-values, FDR) in Python, but I haven't really found anything. There's the R function p.adjust, ...

python statistics p-value q-value

Holotype asked 7/8, 2014 at 14:31
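
For reference, the Benjamini-Hochberg FDR adjustment is short enough to write by hand. This is a minimal pure-Python sketch of that one method only (not the other corrections the question mentions); `bh_adjust` is a hypothetical name.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Walks from the largest p-value to the smallest, scaling each by
    n / rank and taking a running minimum so the adjusted values stay
    monotone and never exceed 1.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, idx in enumerate(order):
        rank = n - steps_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

For the four p-values in the example, the adjusted values work out to approximately 0.02, 0.04, 0.04 and 0.02.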

9

Difference between standardscaler and Normalizer in sklearn.preprocessing

What is the difference between StandardScaler and Normalizer in the sklearn.preprocessing module? Don't both do the same thing, i.e. remove the mean and scale using the deviation?

machine-learning statistics scikit-learn

Legitimatize asked 24/8, 2016 at 10:36

3

Solved

Two-sample Kolmogorov-Smirnov Test in Python Scipy

I can't figure out how to do a Two-sample KS test in Scipy. After reading the documentation of scipy kstest, I can see how to test whether a distribution is identical to standard normal distributio...

python numpy scipy statistics distribution

Irina asked 4/6, 2012 at 16:25
