There’s no built-in way, unfortunately. That said, doing this manually is fairly straightforward.
Given:
x = 'ABCDEFG'
len = 3L
start = seq_len(nchar(x) - len + 1L)
result = vapply(start, \(s) substr(x, s, s + len - 1L), character(1L))
Or, wrapped in a function (as mentioned, these overlapping substrings are called “ngrams”):
ngrams = function (x, len) {
  start = seq_len(nchar(x) - len + 1L)
  vapply(start, \(s) substr(x, s, s + len - 1L), character(1L))
}
Alternatively you can use substring() instead of substr() + vapply(), because substring() is vectorised:
ngrams = function (x, len) {
  start = seq_len(nchar(x) - len + 1L)
  substring(x, start, start + len - 1L)
}
However, since it uses cyclic expansion of its argument lengths, substring() is somewhat error-prone when the input isn’t what was expected.
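To make that caveat concrete, here is a minimal sketch of the pitfall, followed by one possible way to guard against it. The name ngrams_checked and the specific checks are my own assumptions for illustration, not something base R provides. Returning an empty vector for strings shorter than len is also just one reasonable choice.

```r
# substring() silently recycles its arguments to a common length.
# Here the two strings are cycled to match the three start/end positions:
recycled = substring(c('ABCDEFG', 'xyz'), 1 : 3, 3 : 5)
# recycled is c("ABC", "yz", "CDE"), with no warning

# Hypothetical guarded variant: accept exactly one string and one length.
ngrams_checked = function (x, len) {
  stopifnot(is.character(x), length(x) == 1L, ! is.na(x))
  stopifnot(is.numeric(len), length(len) == 1L, len >= 1L)
  n = nchar(x) - len + 1L
  # seq_len() fails on negative input, so handle short strings explicitly.
  if (n < 1L) return(character(0L))
  start = seq_len(n)
  substring(x, start, start + len - 1L)
}

ngrams_checked('ABCDEFG', 3L)
```

With these checks in place, passing a vector of several strings fails early with an error instead of producing a silently recycled result.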