statistics Questions
4
Solved
I have a data.frame in R that looks like this:
score rms template aln_id description
1 -261.410 4.951 2f22A.pdb 2F22A_1 S_00001_0000002_0
2 -231.987 21.813 1wb9A.pdb 1WB9A_4 S_00002_0000002_0
3 -...
Nuss asked 11/4, 2010 at 18:48
18
Solved
I have a requirement to calculate the average of a very large set of doubles (10^9 values). The sum of the values exceeds the upper bound of a double, so does anyone know any neat little tricks for...
Iridize asked 18/12, 2009 at 20:18
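One common way to approach the large-average question above is to update a running mean instead of accumulating a sum, so the intermediate value never grows beyond the largest input. This is only a minimal sketch, not the accepted answer: `running_mean` is a hypothetical helper name, and it assumes each value fits in a double and that huge values of opposite sign are not mixed (in which case `x - mean` could itself overflow).

```python
import math

def running_mean(values):
    """Return the arithmetic mean without ever forming the full sum.

    Returns 0.0 for an empty input (an arbitrary choice for this sketch).
    """
    mean = 0.0
    for n, x in enumerate(values, start=1):
        # Incremental update: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n
        mean += (x - mean) / n
    return mean

big = [1e308] * 4               # illustrative data, not from the question
print(math.isinf(sum(big)))     # the naive sum overflows to infinity
print(running_mean(big))        # the running mean stays finite
```

Because the update consumes one value at a time, it also works on a generator or a file stream, which matters when 10^9 values do not fit in memory.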
4
Solved
I've got a large table of data in an Excel spreadsheet that, essentially, can be considered to be a collection of values for individuals identified as belonging to various subpopulations:
IndivID...
Penguin asked 18/11, 2012 at 14:32
4
Solved
I am using the following code to digitize an array into 16 bins:
numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1])
I expect that the output is in the range [1, 16], since there are ...
Mcwhorter asked 4/12, 2010 at 18:37
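The behaviour described in the excerpt above can be reproduced with a small example. `numpy.histogram` with `bins=16` returns 17 edges, and `numpy.digitize` treats them as half-open intervals, so a value equal to the last edge falls into a 17th bin. The sketch below uses made-up data and shows one possible workaround (dropping the last edge); it is not necessarily the answer that was accepted.

```python
import numpy as np

array = np.arange(100.0)                   # illustrative data, not from the question
edges = np.histogram(array, bins=16)[1]    # 17 edges for 16 bins

# digitize uses half-open intervals edges[i-1] <= x < edges[i], so the
# maximum value (equal to the last edge) lands in an extra 17th bin.
raw = np.digitize(array, bins=edges)

# Dropping the last edge folds the maximum back into bin 16.
fixed = np.digitize(array, bins=edges[:-1])

print(raw.min(), raw.max())
print(fixed.min(), fixed.max())
```

Note that with the last edge dropped, any value above the original maximum would also be assigned to bin 16.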
5
Solved
To generate samples with multivariate t-distribution I use this function:
def multivariatet(mu,Sigma,N,M):
'''
Output:
Produce M samples of d-dimensional multivariate t distribution
Input:
mu...
Swale asked 22/4, 2015 at 13:16
6
Solved
I have a time series x_0 ... x_t. I would like to compute the exponentially weighted variance of the data. That is:
V = SUM{w_i*(x_i - x_bar)^2, i=1 to T} where SUM{w_i} = 1 and x_bar=SUM{w_i*x_i}...
Iover asked 6/4, 2012 at 21:20
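The formula quoted in the excerpt above can be written out directly. The sketch below assumes geometrically decaying weights, normalised so that they sum to 1; the function name `ew_variance` and the `decay` parameter are hypothetical, and the input is assumed to be non-empty. It is a plain two-pass version, not an online update.

```python
def ew_variance(xs, decay):
    """Exponentially weighted mean and variance, weights normalised to sum to 1.

    `decay` in (0, 1] is a hypothetical parameter: the newest point gets a raw
    weight of 1, the one before it `decay`, then `decay**2`, and so on.
    """
    t = len(xs)
    raw = [decay ** (t - 1 - i) for i in range(t)]
    total = sum(raw)
    w = [r / total for r in raw]                        # SUM{w_i} = 1
    x_bar = sum(wi * xi for wi, xi in zip(w, xs))       # x_bar = SUM{w_i * x_i}
    var = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))
    return x_bar, var

mean, var = ew_variance([1.0, 2.0, 3.0, 4.0], decay=0.9)
print(mean, var)
```

With `decay=1.0` all weights are equal and the result reduces to the ordinary population variance, which is a convenient sanity check.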
18
I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive rate as well as the false ...
Udella asked 29/7, 2014 at 6:20
5
Solved
I want to query an LDAP directory to see how employees are distributed across departments and groups...
Something like: "Give me the department name of all the members of a group" and then u...
Braswell asked 1/4, 2014 at 18:23
5
Solved
Is there a more advanced function like the describe that pandas has?
Normally I would do something like:
r = pd.DataFrame(np.random.randn(1000), columns = ['A'])
r.describe()
and I will get a nice ...
Judaica asked 30/5, 2014 at 16:25
8
Solved
In scikit learn you can compute the area under the curve for a binary classifier with
roc_auc_score( Y, clf.predict_proba(X)[:,1] )
I am only interested in the part of the curve where the false ...
Bolivar asked 16/9, 2016 at 17:51
5
Solved
I have a Pandas DataFrame that has the following values in a Series
x = [2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27, 22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1]...
Freeliving asked 16/12, 2017 at 21:41
50
Solved
What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used AVG(x) for finding the mean, but I'm having a hard time finding a simple way of calculating the me...
Electrostatics asked 18/8, 2009 at 0:13
8
It has always amazed me how the Akinator app can guess a character by asking just a few questions. So I wonder what kind of algorithm or method lets it do that? Is there a name for that class of alg...
Bookcraft asked 30/11, 2012 at 16:59
2
Solved
So I have a data science interview at Google, and I'm trying to prepare. One of the questions I see a lot (on Glassdoor) from people who have interviewed there before has been: "Write code to ...
Polyhydroxy asked 20/1, 2022 at 4:19
4
How can I run Little's test for MCAR (missing completely at random) in Python? I have looked at the R package for the same test, but I want to do it in Python. Is there an alternative approach to testing for MCAR?
Brigand asked 28/9, 2019 at 8:44
6
I have a dataframe in Pandas which contains metrics calculated on Wikipedia articles. There are two categorical variables: nation, which nation the article is about, and lang, which language Wikipedia this was ...
Plea asked 2/1, 2014 at 22:06
3
Solved
I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio changes in value, and th...
Lune asked 26/7, 2016 at 17:13
11
Solved
I have a dataframe df and I use several columns from it to groupby:
df[['col1','col2','col3','col4']].groupby(['col1','col2']).mean()
In the above way, I almost get the table (dataframe) that I need...
Blowfish asked 15/10, 2013 at 15:00
11
Solved
I tried norm, but I think it gives the wrong result (the norm of c(1, 2, 3) is sqrt(1*1+2*2+3*3), but it returns 6).
x1 <- 1:3
norm(x1)
# Error in norm(x1) : 'A' must be a numeric matrix
norm(...
Raguelragweed asked 7/6, 2012 at 14:34
3
Solved
I'm trying to implement the following formula in Julia for calculating the Gini coefficient of a wage distribution:
where
Here's a simplified version of the code I'm using for this:
# Takes a...
Lemos asked 9/7, 2015 at 15:24
2
Solved
I have stumbled across a bit of an annoying problem. I am trying to perform multiple independent sample t-tests at once, grouped by a value.
To put it into an example: In 5 cities we have measured...
Twin asked 10/3, 2020 at 11:14
3
Solved
I am trying to get the F-statistic and p-value for each of the covariates in a GLM. In Python I am using statsmodels.formula.api to fit the GLM.
formula = 'PropNo_Pred ~ Geography + log10BMI ...
Sundaysundberg asked 6/12, 2014 at 5:28
5
Solved
So, I've been spending some time looking for a way to get adjusted p-values (aka corrected p-values, q-values, FDR) in Python, but I haven't really found anything. There's the R function p.adjust, ...
Holotype asked 7/8, 2014 at 14:31
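For the adjusted p-value question above, the Benjamini-Hochberg procedure (the method R's p.adjust calls "BH") is short enough to write by hand. The following is a minimal pure-Python sketch under that assumption; `bh_adjust` is a hypothetical name, and a maintained library routine such as statsmodels' `multipletests` is likely the better choice in real analyses.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0                                # caps every adjusted value at 1
    for position, idx in enumerate(order):
        rank = m - position                          # rank in ascending order
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)    # keep the result monotone
        adjusted[idx] = running_min
    return adjusted

print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Walking from the largest p-value down while tracking a running minimum is what keeps the adjusted values monotone in the original p-values.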
9
What is the difference between StandardScaler and Normalizer in the sklearn.preprocessing module?
Don't both do the same thing, i.e. remove the mean and scale using the deviation?
Legitimatize asked 24/8, 2016 at 10:36
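The two classes in the question above operate along different axes: with default settings, StandardScaler centres and scales each feature (column) to zero mean and unit variance, while Normalizer rescales each sample (row) to unit L2 norm. The sketch below mimics both behaviours in plain Python to make the difference visible. The function names are hypothetical, it does not call scikit-learn, and it assumes no column has zero spread and no row is all zeros.

```python
import math

def standardize_columns(rows):
    """Per feature (column): subtract the mean, divide by the population std."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def normalize_rows(rows):
    """Per sample (row): divide by the row's Euclidean (L2) norm."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out

data = [[3.0, 4.0], [1.0, 0.0]]      # illustrative data: 2 samples, 2 features
print(standardize_columns(data))     # columns end up with mean 0 and std 1
print(normalize_rows(data))          # rows end up with length 1
```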
3
Solved
I can't figure out how to do a two-sample KS test in SciPy.
After reading the documentation of scipy kstest, I can see how to test whether a distribution is identical to standard normal distributio...
Irina asked 4/6, 2012 at 16:25