removing offset terms from a formula

R has a handy tool for manipulating formulas, update.formula(). This works nicely when you want to get something like "formula containing all terms in previous formula except x", e.g.

f1 <- z ~ a + b + c
(f2 <- update.formula(f1, . ~ . - c))
## z ~ a + b

However, this doesn't seem to work with offset terms:

f3 <- z ~ a + offset(b) 
update(f3, . ~ . - offset(b))
## z ~ a + offset(b)

I've dug down as far as terms.formula, which ?update.formula references:

[after substituting, ...] The result is then simplified via ‘terms.formula(simplify = TRUE)’.

terms.formula(z ~ a + offset(b) - offset(b), simplify=TRUE)
## z ~ a + offset(b)

(i.e., this doesn't seem to remove offset(b) ...)
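One way to see why the subtraction has no effect is to look at the terms object itself: an offset is not stored among the ordinary term labels but in a separate "offset" attribute. A minimal sketch, using the same f3 as above:

```r
f3 <- z ~ a + offset(b)
tt <- terms(f3)

labels(tt)             # ordinary term labels; the offset is not among them
attr(tt, "offset")     # position of the offset within attr(tt, "variables")
attr(tt, "variables")  # the response, a, and offset(b)
```

Since offset(b) is not one of the term labels, there is no ordinary term for "- offset(b)" to cancel.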

I know I can hack up a solution either by using deparse() and text-processing, or by processing the formula recursively to remove the term I don't want, but these solutions are ugly and/or annoying to implement. Either enlightenment as to why this doesn't work, or a reasonably compact solution, would be great ...

Drift answered 28/10, 2016 at 16:1 Comment(3)
A little more digging inside the code of terms.formula suggests that it explicitly preserves the offset term, although this doesn't seem to be documented anywhere as yet ... - Drift
Looking at ?offset, the documentation says "There can be more than one offset in a model formula, but - is not supported for offset terms (and is equivalent to +).". Could this be the reason your offset() terms aren't simplifying? - Equality
Not the most glamorous, but could you try adding an offset(-b) instead? Your formula won't look simplified, but I think the effect should be the same. If you try lm(mpg~cyl,data=mtcars); lm(mpg~cyl+offset(disp),data=mtcars); lm(mpg~cyl+offset(disp) + offset(-disp),data=mtcars) you see that the 1st and 3rd lm()s are the same. - Equality
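The cancellation suggested in the last comment can be checked directly. This is only a sketch using the built-in mtcars data; the object names m1 and m3 are illustrative:

```r
# model without any offset
m1 <- lm(mpg ~ cyl, data = mtcars)
# model with an offset and its negation; the two offsets sum to zero
m3 <- lm(mpg ~ cyl + offset(disp) + offset(-disp), data = mtcars)

# compare the estimated coefficients up to numerical tolerance
all.equal(coef(m1), coef(m3))
```

The formula of m3 still contains both offset terms, so this neutralises the offset in the fit rather than removing it from the formula.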

1) Recursion. Recursively descend through the formula, replacing each offset(...) call with the bare symbol offset, and then remove offset using update. No string manipulation is involved, and although it takes a few lines of code it is still fairly short and handles both single and multiple offset terms.

If there are multiple offsets, some of them can be preserved by setting preserve; for example, with preserve = 2 the second offset is kept and all others are removed. The default is to preserve none, i.e. to remove them all.

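no.offset <- function(x, preserve = NULL) {
  k <- 0  # running count of the offset() calls seen so far
  proc <- function(x) {
    # symbols and constants are returned unchanged
    if (length(x) == 1) return(x)
    # replace offset(...) by the bare symbol offset unless its index is in preserve
    if (x[[1]] == as.name("offset") && !((k<<-k+1) %in% preserve)) return(x[[1]])
    # otherwise recurse into the arguments of the call
    replace(x, -1, lapply(x[-1], proc))
  }
  # the bare symbol offset is an ordinary term, so update() can remove it
  update(proc(x), . ~ . - offset)
}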

# tests

no.offset(z ~ a + offset(b))
## z ~ a

no.offset(z ~ a + offset(b) + offset(c))
## z ~ a
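The preserve argument can be exercised in the same way. The sketch below repeats the definition of no.offset from above so that it runs on its own; the object name kept is illustrative.

```r
no.offset <- function(x, preserve = NULL) {
  k <- 0  # counts the offset() calls encountered so far
  proc <- function(x) {
    if (length(x) == 1) return(x)
    if (x[[1]] == as.name("offset") && !((k <<- k + 1) %in% preserve)) return(x[[1]])
    replace(x, -1, lapply(x[-1], proc))
  }
  update(proc(x), . ~ . - offset)
}

# keep the second offset, offset(c), and drop the first, offset(b)
kept <- no.offset(z ~ a + offset(b) + offset(c), preserve = 2)
kept
all.vars(kept)  # variables that remain in the formula
```

Offsets are counted in the order in which the recursion meets them, which for a formula written as a simple sum is left to right.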

Note that if you don't need the preserve argument then the line initializing k can be omitted and the if simplified to:

if (x[[1]] == as.name("offset")) return(x[[1]])

2) terms. This uses neither string manipulation (directly) nor recursion. First get the terms object, zap its offset attribute, and then rebuild the formula using fixFormulaObject, which we extract from the guts of terms.formula. This could be made a bit less brittle by copying the source code of fixFormulaObject into your own source and removing the eval line below. preserve acts as in (1).

no.offset2 <- function(x, preserve = NULL) {
  tt <- terms(x)
  # keep only the preserved offsets; with preserve = NULL the attribute is removed
  attr(tt, "offset") <- if (length(preserve)) attr(tt, "offset")[preserve]
  eval(body(terms.formula)[[2]]) # extract fixFormulaObject (relies on the internals of terms.formula)
  f <- fixFormulaObject(tt)
  environment(f) <- environment(x)
  f
}

# tests

no.offset2(z ~ a + offset(b))
## z ~ a

no.offset2(z ~ a + offset(b) + offset(c))
## z ~ a

Note that if you don't need the preserve argument then the line that zaps the offset attribute can be simplified to:

attr(tt, "offset") <- NULL
Mariner answered 28/10, 2016 at 21:58 Comment(2)
It's not clear to me whether this is exactly the behaviour OP was looking for. There can be more than one offset term in a formula, and this method will remove all of them. I got the impression that OP wanted to remove only specified terms in the formula such as offset(b), which would imply leaving offset(c) in place. Perhaps @BenBolker can comment on which behaviour is required? - Unjaundiced
Not sure that this is important, but I have added the feature to (1) and (2). - Mariner

This seems to be by design, but a simple workaround is to alias offset under another name:

offset2 = offset
f3 <- z ~ a + offset2(b) 
update(f3, . ~ . - offset2(b))
# z ~ a

If you need the flexibility to accept formulae that do include offset(), for example if the formula is provided by a package user who may be unaware of the need to use offset2 in place of offset, then we should also add a line to change any instances of offset() in the incoming formula:

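f3 <- z ~ a + offset(b) 

# deparse() may split a long formula over several strings, so collapse them first
f4 <- as.formula(gsub("offset\\(", "offset2(", paste(deparse(f3), collapse = " ")), env = environment(f3))
f4 <- update(f4, . ~ . - offset2(b))

# finally, just in case there are any references to offset2 remaining, we should revert them back to offset
f4 <- as.formula(gsub("offset2\\(", "offset(", paste(deparse(f4), collapse = " ")), env = environment(f3))
f4
# z ~ a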
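The rename-and-update steps above could be wrapped in a small helper. This is only a sketch: drop_offset_term and its arguments are hypothetical names, not part of base R, and it assumes that the offset to drop is given as a character string.

```r
# Hypothetical helper: drop the single term offset(<var>) from formula f
# by temporarily renaming offset( to offset2( so that update() can remove it.
drop_offset_term <- function(f, var) {
  env <- environment(f)
  # collapse in case deparse() splits a long formula over several strings
  txt <- gsub("offset\\(", "offset2(", paste(deparse(f), collapse = " "))
  g <- update(as.formula(txt, env = env),
              as.formula(paste0(". ~ . - offset2(", var, ")")))
  # rename any remaining offset2( back to offset(
  txt <- gsub("offset2\\(", "offset(", paste(deparse(g), collapse = " "))
  as.formula(txt, env = env)
}

res <- drop_offset_term(z ~ a + offset(b) + offset(c), "b")
res
```

Because the pattern is a plain text match, any other function whose name ends in "offset" would be renamed as well, so this is best treated as a stopgap.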
Unjaundiced answered 28/10, 2016 at 16:31 Comment(2)
This is fine for Ben, but if a user is giving the formula to his package, say, then they would have to know about this caveat beforehand, right? - Soporific
@Soporific - yes, if the intended use is a package where other users supply the formula, then this would be a problem. It would then be necessary to deparse the formula and replace any instance of offset with offset2 within Ben's package. Starts to get ugly then. - Unjaundiced

Hmm - I think that you can use the [ method for class terms here:

f1 <-   ~ x1 + offset(a) + offset(b) + h(x2)
f2 <- y ~ x1 + offset(a) + offset(b) + h(x2)

t1 <- terms(f1)
t2 <- terms(f2)

t1. <- t1[seq_along(labels(t1))]
t2. <- t2[seq_along(labels(t2))]

stopifnot(identical(formula(t1.),   ~ x1 + h(x2)),
          identical(formula(t2.), y ~ x1 + h(x2)))

The method is mentioned and aliased in help("drop.terms"), but the source code, stats:::`[.terms`, is more transparent about what is going on ...

I considered another possibility:

tt <- terms(<formula>)
drop.terms(tt, dropx = integer(0L), keep.response = as.logical(attr(tt, "response")))

Unfortunately, drop.terms does not behave sensibly with zero-length dropx. Probably worth a bug report ...

Casual answered 15/7, 2023 at 18:55 Comment(2)
On the other hand, maybe [.terms and drop.terms should preserve offsets. Offsets are not terms, so there is no reason to expect them to be discarded ... - Casual
Hence: bugs.r-project.org/show_bug.cgi?id=18565 - Casual
