I have roughly 7,500 subsidence values. Each subsidence value (V9) corresponds to a station (V2) and a year (V3). I want a line of best fit (V9~V3) for each station.
I created a function using lm that works when I manually subset the data. However, when I try to use aggregate to run a linear model on each station, I get the same value for every station.
Here's an example of what some of the data looks like:
V2 V3 V9
CRMS0002 2008 -28.4990000
CRMS0002 2009 -28.8080808
CRMS0002 2012 -31.9871795
CRMS0006 2008 -56.8998413
CRMS0006 2013 40.8611111
CRMS0006 2015 32.8555555
CRMS0033 2007 -16.8044444
This is the code:
sub_rate = function(x) {lm(CRMSsub$V9~CRMSsub$V3)}
agg <- aggregate(CRMSsub$V9, by = list(CRMSsub$V2), FUN = sub_rate)
I also tried:
agg <- lapply(split(CRMSsub, CRMSsub$V3), FUN = sub_rate)
The grouping step works in both attempts: I get 354 elements organized by station. However, the linear model results (intercept and slope) are identical for every station, which means my function is not being applied per station. Here's an example of the result:
Group.1 x
CRMS0002 c(`(Intercept)` = -2333.06378840009, `CRMSsub$V3` = 1.1541441797906)
CRMS0006 c(`(Intercept)` = -2333.06378840009, `CRMSsub$V3` = 1.1541441797906)
CRMS0033 c(`(Intercept)` = -2333.06378840009, `CRMSsub$V3` = 1.1541441797906)
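For comparison, here is a minimal sketch of what I expect a per-station fit to look like. It assumes the cause is that sub_rate ignores its argument x and always refers to the full CRMSsub, which I have not confirmed. The data frame is rebuilt from the sample rows above, and the argument name d and the object names fits and rates are only illustrative.

```r
# Sample rows copied from the data shown above
CRMSsub <- data.frame(
  V2 = c("CRMS0002", "CRMS0002", "CRMS0002",
         "CRMS0006", "CRMS0006", "CRMS0006", "CRMS0033"),
  V3 = c(2008, 2009, 2012, 2008, 2013, 2015, 2007),
  V9 = c(-28.4990000, -28.8080808, -31.9871795,
         -56.8998413, 40.8611111, 32.8555555, -16.8044444)
)

# Assumption: the model should be fit on the subset passed in (d),
# not on the full CRMSsub data frame
sub_rate <- function(d) coef(lm(V9 ~ V3, data = d))

# split() yields one data frame per station (V2); sapply() fits each one
fits <- sapply(split(CRMSsub, CRMSsub$V2), sub_rate)

# Transpose so each row is a station with columns (Intercept) and V3 (slope)
rates <- t(fits)
print(rates)
```

With this version the slope differs between CRMS0002 and CRMS0006. A station with a single observation, such as CRMS0033 in the sample, gets NA for the slope because one point cannot determine a line.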