Calculate probability from density function

I've built a density function and now I want to calculate the probability that a new data point falls into a selected interval (say, a=3, b=7). So, I'm looking for:

P(a<x<=b)

Some sample data:

df<- data.frame(x=c(sample(6:9, 50, replace=TRUE), sample(18:23, 25, replace=TRUE)))

dens<- density(df$x)

I'll be happy to hear of any solution, but preferably in base R.

Thank you in advance

Pinball answered 23/4, 2017 at 11:6 Comment(0)

You need to get the density as a function (using approxfun) and then integrate that function over the desired limits.

integrate(approxfun(dens), lower=3, upper=7)
0.258064 with absolute error < 3.7e-05

## Consistency check
integrate(approxfun(dens), lower=0, upper=30)
0.9996092 with absolute error < 1.8e-05
Manichaeism answered 23/4, 2017 at 11:35 Comment(2)
Thank you very much. One more question: I'm trying to limit my density with dens <- density(df$x, from=0, to=24). But when I calculate integrate(approxfun(dens), lower=0, upper=24), I don't get the "full" probability (1) I expected. Is there a way to limit my density function so that I get what I expect? Pinball
The standard bandwidth may be too big for you. Try dens <- density(df$x, from=0, to=24, adjust=0.5) Manichaeism

This is an R question, but the same calculation can also be done in Java using a Riemann sum approximation. You need to define a Riemann interface:

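import java.util.function.*;       // BiFunction, Function, BinaryOperator
import java.util.stream.IntStream; // used by the implementation
public interface Riemann extends
        BiFunction<Function<Double, Double>, Integer, BinaryOperator<Double>> {}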

Then you can use a lambda expression to implement the interface as a left Riemann sum with n equal-width slices:

int N=100000;
Riemann s = (f, n) -> (a, b) -> 
IntStream.range(0, n).mapToDouble(i->f.apply(a + i*((b-a)/n))*((b-a)/n)).sum();
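
Before using such a routine on a density, it can help to check it against an integral with a known value. The following is a minimal, self-contained sketch of that check; the class name RiemannCheck, the constant LEFT_SUM and the helper integrate are hypothetical names chosen for illustration, and the test integrand x^2 on [0, 1] (exact value 1/3) is just a convenient choice.

```java
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.IntStream;

public class RiemannCheck {
    // Same shape as the interface in the answer: (f, n) -> (a, b) -> approximate integral.
    public interface Riemann extends
            BiFunction<Function<Double, Double>, Integer, BinaryOperator<Double>> {}

    // Left Riemann sum: evaluate f at the left edge of each of n equal-width slices.
    public static final Riemann LEFT_SUM = (f, n) -> (a, b) ->
            IntStream.range(0, n)
                     .mapToDouble(i -> f.apply(a + i * ((b - a) / n)) * ((b - a) / n))
                     .sum();

    // Hypothetical convenience wrapper around the curried form.
    public static double integrate(Function<Double, Double> f, double a, double b, int n) {
        return LEFT_SUM.apply(f, n).apply(a, b);
    }

    public static void main(String[] args) {
        // The exact integral of x^2 over [0, 1] is 1/3.
        System.out.println(integrate(x -> x * x, 0.0, 1.0, 100_000));
    }
}
```

For a smooth integrand the error of a left Riemann sum shrinks roughly in proportion to 1/n, so increasing n is the simplest way to tighten the approximation.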

As an example, we will calculate the probability that a Weibull random variable (shape parameter k=1.5, scale parameter 1) falls between 1/4 and 3/4.

double k = 1.5;
// Weibull density with scale 1: f(x) = k * x^(k-1) * exp(-x^k)
double weib = s.apply(x -> k * Math.pow(x, k - 1) * Math.exp(-Math.pow(x, k)), N).apply(0.25, 0.75);
System.out.println(weib);
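
The numerical value can be cross-checked without any integration, because a Weibull variable with shape k and scale 1 has the closed-form CDF F(x) = 1 - exp(-x^k), so the probability is F(0.75) - F(0.25). A minimal sketch, assuming scale 1 as in the density above; the class and method names are hypothetical.

```java
public class WeibullCheck {
    // CDF of a Weibull distribution with shape k and scale 1 (an assumption
    // matching the density k * x^(k-1) * exp(-x^k)).
    public static double cdf(double x, double k) {
        return x <= 0 ? 0.0 : 1.0 - Math.exp(-Math.pow(x, k));
    }

    // P(a < X <= b) as a difference of CDF values.
    public static double probability(double a, double b, double k) {
        return cdf(b, k) - cdf(a, k);
    }

    public static void main(String[] args) {
        System.out.println(probability(0.25, 0.75, 1.5)); // about 0.36
    }
}
```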

The result should be about 0.36, or 36%. The advantage of writing your own integration routine instead of using a library is that it helps you understand what is going on in the background.
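
To tie this back to the original question, the same kind of Riemann sum can be applied to a kernel density estimate. The sketch below assumes a Gaussian kernel with a fixed, hand-picked bandwidth and a small made-up sample; R's density() chooses its bandwidth from the data by default, so the numbers will not match R's output. All names here (KdeProbability, kde, integrate) are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

public class KdeProbability {
    // Gaussian kernel density estimate: the average of normal densities
    // centred on each observation, all with standard deviation h.
    public static DoubleUnaryOperator kde(double[] data, double h) {
        double norm = 1.0 / (data.length * h * Math.sqrt(2 * Math.PI));
        return x -> norm * DoubleStream.of(data)
                .map(xi -> Math.exp(-0.5 * Math.pow((x - xi) / h, 2)))
                .sum();
    }

    // Midpoint Riemann sum of f over [a, b] with n equal-width slices.
    public static double integrate(DoubleUnaryOperator f, double a, double b, int n) {
        double w = (b - a) / n;
        return IntStream.range(0, n)
                .mapToDouble(i -> f.applyAsDouble(a + (i + 0.5) * w) * w)
                .sum();
    }

    public static void main(String[] args) {
        double[] sample = {6, 7, 7, 8, 9, 9, 18, 20, 21, 23}; // made-up data
        DoubleUnaryOperator dens = kde(sample, 1.0);           // assumed bandwidth
        System.out.println(integrate(dens, 3, 7, 10_000));     // estimate of P(3 < x <= 7)
        System.out.println(integrate(dens, -20, 50, 10_000));  // consistency check, close to 1
    }
}
```

As in the R answer, integrating over a range wide enough to cover all of the kernel tails serves as a consistency check: the total should come out close to 1.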

Ita answered 12/10, 2018 at 21:23 Comment(0)
