SQL - STDEVP or STDEV and how to use it?
Asked Answered
S

3

41

I have a table:

LocationId OriginalValue Mean
1          0.45         3.99  
2          0.33         3.99
3          16.74        3.99
4          3.31         3.99

and so forth...

How would I work out the Standard Deviation using this table and also what would you recommend - STDEVP or STDEV?

Semitropical answered 15/2, 2013 at 11:39 Comment(4)
Are these the original data points and if so what is mean?Timmytimocracy
Mean is the overall averageSemitropical
Hmm they add up to more than 20 - so mean should be above 5Timmytimocracy
Sorry this is just a sample of the table not the entire table, shouldve mentioned thatSemitropical
G
50

To use it, simply:

SELECT STDEVP(OriginalValue)
FROM yourTable

From below, you probably want STDEVP.

From here:

STDEV is used when the group of numbers being evaluated are only a partial sampling of the whole population. The denominator for dividing the sum of squared deviations is N-1, where N is the number of observations ( a count of items in the data set ). Technically, subtracting the 1 is referred to as "non-biased."

STDEVP is used when the group of numbers being evaluated is complete - it's the entire population of values. In this case, the 1 is NOT subtracted and the denominator for dividing the sum of squared deviations is simply N itself, the number of observations ( a count of items in the data set ). Technically, this is referred to as "biased." Remembering that the P in STDEVP stands for "population" may be helpful. Since the data set is not a mere sample, but constituted of ALL the actual values, this standard deviation function can return a more precise result.

Glynis answered 15/2, 2013 at 11:51 Comment(3)
Explanation at en.wikipedia.org/wiki/Bessel%27s_correction#Source_of_biasDistance
what is the default standard deviation 1, 2 or 3 in STDEV function?. can we change it? 68% of values are within 1 standard deviation of the mean OR 95% of values are within 2 standard deviations of the mean OR 99.7% of values are within 3 standard deviations of the meanValine
@AbubakarRiaz If you're calculating the standard deviation, that is 1 standard deviation by definition. This should be true for basically all functions that calculate standard deviation. If you want 2 or 3 standard deviations, you could just multiply the result by 2 or 3. Note that the 68/95/99.7 rule only applies for data with an approximately normal distribution.Glynis
S
13

Generally, you should use STDEV when you have to estimate standard deviation based on a sample. But if you have entire column-data given as arguments, then use STDEVP.

In general, if your data represents the entire population, use STDEVP; otherwise, use STDEV.

Note that for large samples, the functions return nearly the same value, so better use STDEV in this case.

Scalariform answered 15/2, 2013 at 11:56 Comment(3)
My table contains about 240 rows, what option should I use? And what does it mean 'sample'?Semitropical
as I had stated, it doesnt matter what you use, if u have large data (which is in your case). Sample here means what type of data you have, {like 2.0, 3.0 or 2.4,3.4 like that! }Scalariform
In your last paragraph, you suggest using STDEV for large samples (even when the data represents the complete population and according to your answer, STDEVP would be more appropriate). Why?Wilberwilberforce
J
8

In statistics, there are two types of standard deviations: one for a sample and one for a population.

The sample standard deviation, generally notated by the letter s, is used as an estimate of the population standard deviation.

The population standard deviation, generally notated by the Greek letter lower case sigma, is used when the data constitutes the complete population.

It is difficult to answer your question directly -- sample or population -- because it is difficult to tell what you are working with: a sample or a population. It often depends on context.

Consider the following example. If I want to know the standard deviation of the age of students in my class, then I u=would use STDEVP because the class is my population. But if I want the use my class as a sample of the population of all students in the school (this would be what is known as a convenience sample, and would likely be biased, but I digress), then I would use STDEV because my class is a sample. The resulting value would be my best estimate of STDEVP.

As mentioned above (1) for large sample sizes (say, more than thirty), the difference between the two becomes trivial, and (2) generally you should use STDEV, not STDEVP, because in practice we usually don't have access to the population. Indeed, one could argue that if we always had access to populations, then we wouldn't need statistics. The entire point of inferential statistics is to be able to make inferences about a population based on the sample.

Junco answered 11/5, 2017 at 17:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.