I'm trying to create a more parsimonious version of this solution, which entails specifying the RHS of a formula in the form `d1 + d1:d2`.
Given that `*` in the context of a formula is a pithy stand-in for full interaction (i.e. `d1 * d2` gives `d1 + d2 + d1:d2`), my approach has been to try to define an alternative operator, say `%+:%`, using the infix approach I've grown accustomed to in other applications, à la:
```r
"%+:%" <- function(d1, d2) d1 + d1:d2
```
However, this predictably fails because I haven't been careful about evaluation; let's introduce an example to illustrate my progress:
```r
set.seed(1029)
v1 <- runif(1000)
v2 <- runif(1000)
y <- .8*(v1 < .3) + .2 * (v2 > .25 & v2 < .8) -
  .4 * (v2 > .8) + .1 * (v1 > .3 & v2 > .8)
```
With this example, hopefully it's clear why simply writing out the two terms might be undesirable:
```r
y ~ cut(v2, breaks = c(0, .25, .8, 1)) +
  cut(v2, breaks = c(0, .25, .8, 1)):I(v1 < .3)
```
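One possible way to approximate the `%+:%` idea without changing how R parses formulas is to rewrite the formula before `lm()` ever sees it. The sketch below walks the formula's call tree and replaces every `a %+:% b` call with `a + a:b`. The helper name `expand_plus_colon` is hypothetical, and note that `model.frame()` itself still knows nothing about `%+:%`: the operator is only ever read as syntax, never called.

```r
# Hypothetical helper (not part of base R or stats): recursively rewrite every
# `a %+:% b` call inside a formula into `a + a:b`, leaving all else untouched.
expand_plus_colon <- function(term) {
  if (is.call(term)) {
    # Expand the arguments first so nested uses are handled; assigning with
    # `[[<-` keeps the "formula" class and environment of the top-level object.
    for (j in seq_along(term)[-1]) {
      term[[j]] <- expand_plus_colon(term[[j]])
    }
    if (identical(term[[1]], as.name("%+:%"))) {
      d1 <- term[[2]]
      d2 <- term[[3]]
      # Splice the captured expressions into `d1 + d1:d2`
      term <- bquote(.(d1) + .(d1):.(d2))
    }
  }
  term
}

# The short form from above; %+:% is never evaluated, so it needs no definition
f <- y ~ cut(v2, breaks = c(0, .25, .8, 1)) %+:% I(v1 < .3)
expand_plus_colon(f)
```

Since the expanded formula is the same expression as the hand-written one, `lm(expand_plus_colon(f))` with the simulated data should give the same coefficients and coefficient names as writing out both terms explicitly.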
One workaround which is close to my desired output is to define the whole formula as a function:
```r
plus.times <- function(outvar, d1, d2){
  as.formula(paste0(quote(outvar), "~", quote(d1),
                    "+", quote(d1), ":", quote(d2)))
}
```
This gives the expected coefficients when passed to `lm`, but with names that are harder to interpret directly (especially in the real data, where we take care to give `d1` and `d2` descriptive names, in contrast to this generic example):
```r
out1 <- lm(y ~ cut(v2, breaks = c(0, .25, .8, 1)) +
             cut(v2, breaks = c(0, .25, .8, 1)):I(v1 < .3))
out2 <- lm(plus.times(y, cut(v2, breaks = c(0, .25, .8, 1)), I(v1 < .3)))
any(out1$coefficients != out2$coefficients)
# [1] FALSE
names(out2$coefficients)
# [1] "(Intercept)" "d1(0.25,0.8]" "d1(0.8,1]" "d1(0,0.25]:d2TRUE"
# [5] "d1(0.25,0.8]:d2TRUE" "d1(0.8,1]:d2TRUE"
```
So this is less than optimal.
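The renaming seems to follow directly from how `plus.times` builds the formula: `quote(d1)` is just the symbol `d1`, so the string handed to `as.formula()` is literally `"outvar~d1+d1:d2"`, and `as.formula()` attaches the function's own evaluation frame as the formula's environment. `model.frame()` then finds `d1` and `d2` in that frame, bound to the evaluated arguments, and builds the coefficient names from those symbols. A quick check (repeating `plus.times` so the snippet runs on its own):

```r
# plus.times as defined in the question
plus.times <- function(outvar, d1, d2){
  as.formula(paste0(quote(outvar), "~", quote(d1),
                    "+", quote(d1), ":", quote(d2)))
}

# Building the formula does not evaluate the arguments, so no data is needed here
f <- plus.times(y, cut(v2, breaks = c(0, .25, .8, 1)), I(v1 < .3))
f[[3]]              # the RHS is the generic call d1 + d1:d2
ls(environment(f))  # the formula's environment is plus.times' own frame
```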
Is there any way to adjust the code so that the infix operator I mentioned above works as expected? Alternatively, how could I alter the form of `plus.times` so that the variables are not renamed?
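For the second question, one possible variant (a sketch; the name `plus_times2` is hypothetical) captures the unevaluated argument expressions with `substitute()`, splices them into the formula with `bquote()`, and gives the formula the caller's environment. `model.frame()` then evaluates the original expressions, so the coefficient names should match those of `out1`.

```r
# Hypothetical variant of plus.times: build `outvar ~ d1 + d1:d2` from the
# expressions the caller actually wrote, not from the local argument names.
plus_times2 <- function(outvar, d1, d2) {
  # substitute() on an argument returns the caller's unevaluated expression,
  # e.g. cut(v2, breaks = c(0, .25, .8, 1)) rather than the symbol d1
  lhs <- substitute(outvar)
  a <- substitute(d1)
  b <- substitute(d2)
  f <- eval(bquote(.(lhs) ~ .(a) + .(a):.(b)))
  # Look variables up where the caller would have, not in this function's frame
  environment(f) <- parent.frame()
  f
}
```

Used as `lm(plus_times2(y, cut(v2, breaks = c(0, .25, .8, 1)), I(v1 < .3)))`, this should reproduce both the coefficients and the names of `out1`. Because `substitute()` captures the literal expressions, this works when `plus_times2` is called directly like that; calling it from inside another wrapper function would capture that wrapper's argument names instead.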
I've been poking around (`?formula`, `?"~"`, `?":"`, `getAnywhere(formula.default)`, this answer, etc.) but haven't found how exactly R interprets `*` when it's encountered in a formula, which I'd need to know in order to make my desired minor adjustments.
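One way to see how R expands `*` (and other formula operators) without reading the C source is to ask `terms()` for the expanded term labels. The sketch below uses the generic names `d1` and `d2`; it also includes R's built-in nesting operator `/`, which produces the same two terms as `d1 + d1:d2`.

```r
# Compare how R's formula machinery expands a few operators (generic names).
# terms() never evaluates d1 or d2, so no data is needed.
for (f in list(~ d1 * d2, ~ d1 + d1:d2, ~ d1 / d2)) {
  cat(deparse(f), "->", attr(terms(f), "term.labels"), "\n")
}
```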
... `formula`, but they only seem to use `tildeSymbol`. Does this mean that I won't be able to get my own infix operator without going down to C and defining, say, `plusColonSymbol`, like is done here? – Colquitt

... `terms` component of the results of `model.frame()` and (2) looking at the code here ... – Pinpoint