statistics Questions

4

Solved

I have a data.frame in R that looks like this: score rms template aln_id description 1 -261.410 4.951 2f22A.pdb 2F22A_1 S_00001_0000002_0 2 -231.987 21.813 1wb9A.pdb 1WB9A_4 S_00002_0000002_0 3 -...
Nuss asked 11/4, 2010 at 18:48

18

Solved

I have a requirement to calculate the average of a very large set of doubles (10^9 values). The sum of the values exceeds the upper bound of a double, so does anyone know any neat little tricks for...
Iridize asked 18/12, 2009 at 20:18
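
One common way to sidestep the overflow described in this excerpt is to never form the full sum and instead update a running mean one value at a time. The sketch below is only an illustration under stated assumptions, not the accepted answer: the name `running_mean` is hypothetical, and every input is assumed to be a finite double.

```python
def running_mean(values):
    """Return the mean of an iterable of floats without forming their sum.

    Returns 0.0 for an empty iterable (an arbitrary choice for this sketch).
    """
    mean = 0.0
    for k, x in enumerate(values, start=1):
        # Divide before subtracting: this avoids computing x - mean directly,
        # which could overflow when the two are large with opposite signs.
        mean += x / k - mean / k
    return mean


# Three values whose plain sum would overflow a double.
big_values = [1e308, 1e308, 1e308]
big_mean = running_mean(big_values)
```

The trade-off is one extra division per element. Rounding error can still accumulate over 10^9 values, so a compensated or pairwise scheme may be preferable when accuracy matters more than simplicity.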

4

Solved

I've got a large table of data in an Excel spreadsheet that, essentially, can be considered to be a collection of values for individuals identified as belonging to various subpopulations: IndivID...
Penguin asked 18/11, 2012 at 14:32

4

Solved

I am using the following code to digitize an array into 16 bins: numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1]) I expect that the output is in the range [1, 16], since there are ...
Mcwhorter asked 4/12, 2010 at 18:37
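
A likely reason for an index outside [1, 16] here is that `numpy.histogram` returns 17 edges for 16 bins, and `numpy.digitize` treats the last edge as the start of a further bin, so the maximum value is assigned index 17. The sketch below uses made-up data to show the effect and one possible workaround (dropping the last edge). It is an illustration, not necessarily the fix given in the accepted answer.

```python
import numpy as np

array = np.arange(100.0)                 # illustrative data, not from the question
edges = np.histogram(array, bins=16)[1]  # 17 edges delimit 16 bins

# With all 17 edges, the maximum equals the last edge and gets index 17.
idx_all_edges = np.digitize(array, edges)

# Dropping the last edge keeps every index in 1..16.
idx_trimmed = np.digitize(array, edges[:-1])
```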

5

Solved

To generate samples with multivariate t-distribution I use this function: def multivariatet(mu,Sigma,N,M): ''' Output: Produce M samples of d-dimensional multivariate t distribution Input: mu...
Swale asked 22/4, 2015 at 13:16

6

Solved

I have a time series x_0 ... x_t. I would like to compute the exponentially weighted variance of the data. That is: V = SUM{w_i*(x_i - x_bar)^2, i=1 to T} where SUM{w_i} = 1 and x_bar=SUM{w_i*x_i}...
Iover asked 6/4, 2012 at 21:20
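
The formula in this excerpt can be evaluated directly once the weights are fixed. The sketch below assumes one particular weighting that the excerpt does not specify: the newest point has raw weight 1, each step back multiplies the weight by a hypothetical `decay` factor, and the weights are then normalised to sum to 1. The function name `ew_variance` is also hypothetical, and the series is assumed to be non-empty.

```python
def ew_variance(xs, decay):
    """Exponentially weighted variance with weights normalised to sum to 1."""
    T = len(xs)
    # Assumed weighting: newest point has raw weight 1, and each step back
    # in time multiplies the weight by `decay` (0 < decay <= 1).
    raw = [decay ** (T - 1 - i) for i in range(T)]
    total = sum(raw)
    w = [r / total for r in raw]
    # x_bar = SUM{w_i * x_i}
    x_bar = sum(wi * xi for wi, xi in zip(w, xs))
    # V = SUM{w_i * (x_i - x_bar)^2}
    return sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))


v = ew_variance([0.0, 1.0], decay=0.5)  # weights are 1/3 and 2/3
```

This two-pass version favours readability. A streaming update that avoids storing the whole series is possible but is not shown here.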

18

I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive rate as well as the false ...
Udella asked 29/7, 2014 at 6:20

5

Solved

I want to make a query against a LDAP directory of how employees are distributed in departments and groups... Something like: "Give me the department name of all the members of a group" and then u...
Braswell asked 1/4, 2014 at 18:23

5

Solved

Is there a more advanced function like the describe that pandas has? Normally I will go on like: r = pd.DataFrame(np.random.randn(1000), columns = ['A']) r.describe() and I will get a nice ...
Judaica asked 30/5, 2014 at 16:25

8

Solved

In scikit learn you can compute the area under the curve for a binary classifier with roc_auc_score( Y, clf.predict_proba(X)[:,1] ) I am only interested in the part of the curve where the false ...
Bolivar asked 16/9, 2016 at 17:51

5

Solved

I have a Pandas DataFrame that has the following values in a Series x = [2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27, 22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1]...
Freeliving asked 16/12, 2017 at 21:41

50

Solved

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the me...
Electrostatics asked 18/8, 2009 at 0:13

8

It always amazed me how the Akinator app could guess a character by asking just several questions. So I wonder what kind of algorithm or method let it do that? Is there a name for that class of alg...
Bookcraft asked 30/11, 2012 at 16:59

2

Solved

So I have a data science interview at Google, and I'm trying to prepare. One of the questions I see a lot (on Glassdoor) from people who have interviewed there before has been: "Write code to ...
Polyhydroxy asked 20/1, 2022 at 4:19

4

How can I execute Little's Test, to find MCAR in Python? I have looked at the R package for the same test, but I want to do it in Python. Is there an alternate approach to test MCAR?

6

I have a dataframe in Pandas which contains metrics calculated on Wikipedia articles. Two categorical variables: nation, which nation the article is about, and lang, which language Wikipedia this was ...
Plea asked 2/1, 2014 at 22:06

3

Solved

I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio changes in value, and th...
Lune asked 26/7, 2016 at 17:13

11

Solved

I have a dataframe df and I use several columns from it to groupby: df['col1','col2','col3','col4'].groupby(['col1','col2']).mean() In the above way, I almost get the table (dataframe) that I need...
Blowfish asked 15/10, 2013 at 15:00

11

Solved

I tried norm, but I think it gives the wrong result. (The norm of c(1, 2, 3) is sqrt(1*1+2*2+3*3), but it returns 6.) x1 <- 1:3 norm(x1) # Error in norm(x1) : 'A' must be a numeric matrix norm(...
Raguelragweed asked 7/6, 2012 at 14:34
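
The value 6 reported in this excerpt is what the sum of absolute values (the one norm) gives for this vector, whereas the Euclidean norm the asker expects is sqrt(14). R's `norm` defaults to the one norm for matrices, which would explain the mismatch. The sketch below only illustrates the difference between the two quantities, in Python rather than R, with hypothetical variable names.

```python
import math

x = [1, 2, 3]

# One norm: sum of absolute values. This is the 6 seen in the excerpt.
l1_norm = sum(abs(v) for v in x)

# Euclidean (two) norm: square root of the sum of squares, sqrt(14).
l2_norm = math.sqrt(sum(v * v for v in x))
```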

3

Solved

I'm trying to implement the following formula in Julia for calculating the Gini coefficient of a wage distribution: [formula not shown] Here's a simplified version of the code I'm using for this: # Takes a...
Lemos asked 9/7, 2015 at 15:24

2

Solved

I have stumbled across a bit of an annoying problem. I am trying to perform multiple independent sample t-tests at once, grouped by a value. To put it into an example: In 5 cities we have measured...
Twin asked 10/3, 2020 at 11:14

3

Solved

I am trying to get the F-statistic and p-value for each of the covariates in GLM. In Python I am using statsmodels.formula.api to conduct the GLM. formula = 'PropNo_Pred ~ Geography + log10BMI ...
Sundaysundberg asked 6/12, 2014 at 5:28

5

Solved

So, I've been spending some time looking for a way to get adjusted p-values (aka corrected p-values, q-values, FDR) in Python, but I haven't really found anything. There's the R function p.adjust, ...
Holotype asked 7/8, 2014 at 14:31
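
For readers who want to see what such an adjustment involves, the sketch below implements the Benjamini-Hochberg step-up procedure in plain Python. It is intended to mirror what R's `p.adjust` does with `method = "BH"`, but it has not been checked against R here. The function name `bh_adjust` is hypothetical, and the input is assumed to be a non-empty list of p-values in [0, 1].

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


q = bh_adjust([0.01, 0.04, 0.03, 0.5])
```

For the four p-values above, the adjusted values work out to 0.04, 0.16/3, 0.16/3 and 0.5: the second and third share a value because the running minimum carries the smaller ratio downwards.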

9

What is the difference between StandardScaler and Normalizer in the sklearn.preprocessing module? Don't both do the same thing, i.e. remove the mean and scale using the deviation?
Legitimatize asked 24/8, 2016 at 10:36
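
The two transformers act along different axes: StandardScaler works per feature (column), removing the mean and dividing by the standard deviation, while Normalizer works per sample (row), rescaling each row to unit norm. The sketch below reimplements both ideas in plain Python to make the contrast visible. The helper names are hypothetical, it uses the population standard deviation and the L2 norm, and it assumes no column has zero variance and no row is all zeros.

```python
import math

X = [[1.0, 10.0],
     [3.0, 30.0]]


def standardize_columns(rows):
    """Per column: subtract the mean and divide by the standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]


def normalize_rows(rows):
    """Per row: divide by the row's Euclidean length."""
    out = []
    for row in rows:
        length = math.sqrt(sum(v * v for v in row))
        out.append([v / length for v in row])
    return out


standardized = standardize_columns(X)  # each column has mean 0, std 1
normalized = normalize_rows(X)         # each row has length 1
```

In this example the two rows point in the same direction, so row normalisation maps them to the same vector, while column standardisation keeps them apart at -1 and +1.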

3

Solved

I can't figure out how to do a Two-sample KS test in Scipy. After reading the documentation of scipy kstest, I can see how to test whether a distribution is identical to standard normal distributio...
Irina asked 4/6, 2012 at 16:25
