Calculating p-values of an F-statistic with R

I'm trying to calculate the p-value of an F-statistic with R. The calculation R uses for the lm() output is equivalent to (e.g. assume x=100, df1=2, df2=40):

pf(100, 2, 40, lower.tail=F)
[1] 2.735111e-16

which should be equal to

1-pf(100, 2, 40)
[1] 2.220446e-16

It is not the same! The difference isn't big, but where does it come from? If I calculate (x=5, df1=2, df2=40):

pf(5, 2, 40, lower.tail=F)
[1] 0.01152922

1-pf(5, 2, 40)
[1] 0.01152922

the printed results are exactly the same. So what is happening here? Have I missed something?

Honey answered 29/1, 2014 at 14:10 Comment(2)
.Machine$double.eps is 2.220446e-16 (as printed). - Unfasten
What precision are you using? 2.2e-16 is the machine epsilon for double-precision floating point numbers. - Mickens
S
3

As the comments note, this is a floating point precision issue. In fact, neither pair in your examples is precisely equal as evaluated:

> pf(5, 2, 40, lower.tail=F) - (1-pf(5, 2, 40))
[1] 6.245005e-17

> pf(100, 2, 40, lower.tail=F) - (1-pf(100, 2, 40))
[1] 5.146652e-17

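It's just that the difference only shows up in your printed output when the p-value itself is tiny: around 1e-16 the two results already differ in the first digit, whereas for 0.0115 the discrepancy lies far beyond the 7 significant digits R prints by default.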
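
To see where the gap comes from, here is a minimal sketch (not necessarily how lm() does it internally). It compares the two calls against the closed form of the upper tail, which exists only for the special case df1 = 2, and checks that the result of 1 - pf() is a whole multiple of the spacing between doubles just below 1. The variable names upper, naive and exact are only for illustration.

```r
x <- 100; df1 <- 2; df2 <- 40

upper <- pf(x, df1, df2, lower.tail = FALSE)  # upper tail computed directly
naive <- 1 - pf(x, df1, df2)                  # subtracts a value extremely close to 1

# Closed form of the upper tail; valid only when df1 = 2
exact <- (1 + df1 * x / df2)^(-df2 / 2)

print(c(upper = upper, naive = naive, exact = exact), digits = 10)

# Doubles just below 1 are spaced .Machine$double.eps / 2 apart,
# so 'naive' can only be a whole multiple of that spacing
print(naive / (.Machine$double.eps / 2))

# Printing more digits reveals the same kind of discrepancy for x = 5
print(pf(5, 2, 40, lower.tail = FALSE), digits = 17)
print(1 - pf(5, 2, 40), digits = 17)
```

The practical takeaway is to prefer lower.tail = FALSE whenever the p-value may be very small, because 1 - pf() cannot resolve anything finer than that spacing.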

Snailfish answered 29/1, 2014 at 14:26 Comment(1)
Okay, I think I got it, or at least the idea. Thank you all! Since I am not good at programming, let me rephrase the answers: it is a numerical problem in how the value is computed, not a statistical problem. Theoretically the output should be the same in both calculations. - Honey
I
7
> all.equal(pf(100, 2, 40, lower.tail=F),1-pf(100, 2, 40))
[1] TRUE
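
One caveat about this check, as far as I understand all.equal(): with the default settings it switches from a relative to an absolute comparison when the target's magnitude is below the tolerance (about 1.5e-8), so TRUE here mostly says that both values are tiny. A small sketch of a relative comparison follows; rel_diff is a hypothetical helper, not part of base R.

```r
a <- pf(100, 2, 40, lower.tail = FALSE)
b <- 1 - pf(100, 2, 40)

# Default all.equal(): values this small are compared on an absolute scale
print(isTRUE(all.equal(a, b)))
# This also passes, although the two values differ by four orders of magnitude
print(isTRUE(all.equal(1e-16, 1e-20)))

# Hypothetical helper: difference relative to the larger magnitude
rel_diff <- function(u, v) abs(u - v) / max(abs(u), abs(v))
print(rel_diff(a, b))
```

Given the values printed in the question, the relative difference comes out at roughly 0.19, so the two p-values agree only to about one significant digit even though all.equal() reports TRUE.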
Insomnia answered 29/1, 2014 at 14:23 Comment(0)