Difference between R.loess and org.apache.commons.math LoessInterpolator
Asked Answered
L

3

12

I'm trying to compute the convert a R script to java using the apache.commons.math library. Can I use org.apache.commons.math.analysis.interpolation.LoessInterpolator in place of R loess ? I cannot get the same result.

EDIT.

here is a java program that creates a random array(x,y) and compute the loess with LoessInterpolator or by calling R. At the end, the results are printed.

import java.io.*;
import java.util.Random;

import org.apache.commons.math.analysis.interpolation.LoessInterpolator;


public class TestLoess
    {
    private String RScript="/usr/local/bin/Rscript";
    private static class ConsummeInputStream
        extends Thread
        {
        private InputStream in;
        ConsummeInputStream(InputStream in)
            {
            this.in=in;
            }
        @Override
        public void run()
            {
            try
                {
                int c;
                while((c=this.in.read())!=-1) 
                    System.err.print((char)c);
                }
            catch(IOException err)
                {
                err.printStackTrace();
                }
            }
        }
    TestLoess()
        {

        }
    private void run() throws Exception
        {
        int num=100;
        Random rand=new Random(0L);
        double x[]=new double[num];
        double y[]=new double[x.length];
        for(int i=0;i< x.length;++i)
            {
            x[i]=rand.nextDouble()+(i>0?x[i-1]:0);
            y[i]=Math.sin(i)*100;
            }
        LoessInterpolator loessInterpolator=new LoessInterpolator(
            0.75,//bandwidth,
            2//robustnessIters

            );
        double y2[]=loessInterpolator.smooth(x, y);

        Process proc=Runtime.getRuntime().exec(
            new String[]{RScript,"-"}
            );
        ConsummeInputStream errIn=new ConsummeInputStream(proc.getErrorStream());
        BufferedReader stdin=new BufferedReader(new InputStreamReader(proc.getInputStream()));
        PrintStream out=new PrintStream(proc.getOutputStream());
        errIn.start();
        out.print("T<-as.data.frame(matrix(c(");
        for(int i=0;i< x.length;++i)
            {
            if(i>0) out.print(',');
            out.print(x[i]+","+y[i]);
            }
        out.println("),ncol=2,byrow=TRUE))");
        out.println("colnames(T)<-c('x','y')");
        out.println("T2<-loess(y ~ x, T)");
        out.println("write.table(residuals(T2),'',col.names= F,row.names=F,sep='\\t')");
        out.flush();
        out.close();
        double y3[]=new double[x.length];
        for(int i=0;i< y3.length;++i)
            {
            y3[i]=Double.parseDouble(stdin.readLine());
            }
        System.out.println("X\tY\tY.java\tY.R");
        for(int i=0;i< y3.length;++i)
            {
            System.out.println(""+x[i]+"\t"+y[i]+"\t"+y2[i]+"\t"+y3[i]);
            }
        }

    public static void main(String[] args)
        throws Exception
        {
        new TestLoess().run();
        }
    }

compilation & exec:

javac -cp commons-math-2.2.jar TestLoess.java && java -cp commons-math-2.2.jar:. TestLoess

output:

X   Y   Y.java  Y.R
0.730967787376657   0.0 6.624884763714674   -12.5936186703287
0.9715042030481429  84.14709848078965   6.5263049649584 71.9725380029913
1.6089216283982513  90.92974268256818   6.269100654071115   79.839773167581
2.159358633515885   14.112000805986721  6.051308261720918   3.9270340708818
2.756903911313087   -75.68024953079282  5.818424835586378   -84.9176311089431
3.090122310789737   -95.89242746631385  5.689740879461759   -104.617807889069
3.4753114955304554  -27.941549819892586 5.541837854229562   -36.0902352062634
4.460153035730264   65.6986598718789    5.168028655980764   58.9472823439219
5.339335553602744   98.93582466233818   4.840314399516663   93.3329030534449
6.280584733084859   41.21184852417566   4.49531113985498    36.7282165788057
6.555538699120343   -54.40211108893698  4.395343460231256   -58.5812856445538
6.68443584999412    -99.99902065507035  4.348559404444451   -104.039069260889
6.831037507640638   -53.657291800043495 4.295400167908642   -57.5419313320511
6.854275630124528   42.016703682664094  4.286978656933373   38.1564179414478
7.401015387322993   99.06073556948704   4.089252482141094   95.7504087842369
8.365502247999844   65.02878401571168   3.7422883733498726  62.5865641279576
8.469992934250815   -28.790331666506532 3.704793544880599   -31.145867173504
9.095139297716374   -96.13974918795569  3.4805388562453574  -98.0047896609079
9.505935493207435   -75.09872467716761  3.3330472034239405  -76.6664588290508

the output values for y are clearly not the same between R and Java; TheY.R column looks good (it's close to the original Y column). How should I change this in order to get Y.java ~ Y.R ?

Leader answered 3/10, 2012 at 8:32 Comment(5)
perhaps add span=2/3 to the loess call? I don't know if span in loess is identical to the 'bandwidth' parameter in 'LoessInterpolator', but the default value for span for loess is 0.75 and you set bandwidth to 2/3.Desiderata
Thank you all for your answers. I'm away from my work. I will explore your suggestions tomorrow.Leader
Quick comment. I just checked that lo(w)ess implementation in R and Java are identical. Both lowess() in R and LoessInterpolator() in Apache Commons Math refer to the same Cleveland (1979) paper and have the same parameters. I get the same fits from both implementations.Asparagus
so, how should I modify my program to get the same result ?Leader
Same problem here with Apache Commons 3-3.2: Commons.bandwidth = 0.1 is smoother than R.span = 0.1 everything else being the default. I also do wonder why they do not apparently have the same parameter interface for the same algorithm! Does anyone have any insight on how to get them to match?Interregnum
A
5

You need to change the default values of three input parameters to make the Java and R versions identical:

  1. The Java LoessInterpolator only does linear local polynomial regression, but R supports linear (degree=1), quadratic (degree=2), and a strange degree=0 option. So you need to specify degree=1 in R to be identical to Java.

  2. LoessInterpolator defaults number of iterations DEFAULT_ROBUSTNESS_ITERS=2, but R defaults iterations=4. So you need to set control = loess.control(iterations=X) in R (X is the number of iterations).

  3. LoessInterpolator defaults DEFAULT_BANDWIDTH=0.3 but R defaults span=0.75.

Aqueous answered 21/7, 2014 at 22:8 Comment(0)
R
4

I can't speak for the java implementation, but lowess has a number of parameters which control the bandwidth of the fit. Unless you're fitting with the same control parameters you should expect the results to differ. My recommendation whenever people are smoothing data is to plot the original data as well as the fit, and decide for yourself what control parameters yield your desired tradeoff between fidelity to the data and smoothing (aka noise removal).

Rattlebrained answered 3/10, 2012 at 11:19 Comment(0)
D
1

There are two problems here. First if you plot the data you are generating it looks almost random and the fit generated by loess in R is very poor e.g.

plot(T$x, T$y)
lines(T$s, T2$fitted, col="blue", lwd=3)

plot of the data generated by the Java code above with a loess fit generated by R

Then in your R script you are writing the residuals not the predictions so in this line

out.println("write.table(residuals(T2),'',
   col.names= F,row.names=F,sep='\\t')");

you need to change residuals(T2) to predict(T2) e.g.

out.println("write.table(predict(T2),'',
   col.names= F,row.names=F,sep='\\t')");

So it was pure chance in your code example that the first couple of lines of residuals generated by R looked a good fit.

For me if I try fitting with some more appropriate data then Java and R do return similar but not identical results. Also I found the results were closer if I did not adjust the default robustnessIter settings.

Daisydaitzman answered 1/8, 2014 at 12:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.