Can anyone help me make this R code more efficient?
I'm trying to write a function that converts a list of strings to a vector of strings, a list of numbers to a vector of numbers, or, in general, a list of typed elements to a vector of a given type.
I want to be able to convert lists to a particular type of vector if they have the following properties:
They are homogeneously typed: every element of the list is of type 'character', or 'complex', and so on.
Each element of the list is length-one.
as_atomic <- local({

    assert_is_valid_elem <- function (elem, mode) {
        if (length(elem) != 1 || !is(elem, mode)) {
            stop("")
        }
        TRUE
    }

    function (coll, mode) {
        if (length(coll) == 0) {
            vector(mode)
        } else {
            # check that the generic vector is composed only
            # of length-one values, and each value has the correct type.
            # uses more memory than 'for', but is presumably faster.
            vapply(coll, assert_is_valid_elem, logical(1), mode = mode)
            as.vector(coll, mode = mode)
        }
    }
})
For example,
as_atomic(list(1, 2, 3), 'numeric')
as.numeric(c(1,2,3))
# this fails (mixed types)
as_atomic( list(1, 'a', 2), 'character' )
# ERROR.
# this fails (non-length one element)
as_atomic( list(1, c(2,3,4), 5), 'numeric' )
# ERROR.
# this fails (cannot convert numbers to strings)
as_atomic( list(1, 2, 3), 'character' )
# ERROR.
The above code works fine, but it is very slow and I can't see any way to optimise it without changing the behaviour of the function. It's important the function 'as_atomic' behaves as it does; I can't switch to a base function that I'm familiar with (unlist, for example), since I need to throw an error for bad lists.
require(microbenchmark)
microbenchmark(
as_atomic( as.list(1:1000), 'numeric'),
vapply(1:1000, identity, integer(1)),
unit = 'ns'
)
On my (fairly fast) machine the as_atomic benchmark runs at about 40 Hz, so this function is almost always rate-limiting in my code. The vapply control benchmark runs at about 1650 Hz, which is still quite slow.
Is there any way to dramatically improve the efficiency of this operation? Any advice is appreciated.
If any clarification or edits are needed, please leave a comment below.
Edit:
Hello all,
Sorry for the very belated reply; I had exams I needed to get through before I could try to re-implement this.
Thank you all for the performance tips. I got the performance up from a terrible 40 Hz to a more acceptable 600 Hz using plain R code.
The largest speedup came from using typeof or mode instead of is; this really sped up the tight inner checking loop.
I'll probably have to bite the bullet and rewrite this in Rcpp to get it really performant, though.
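For anyone landing here later, this is a minimal sketch of the kind of change described above, not the exact code I ended up with. It assumes that comparing base::mode(elem) against the requested mode is an acceptable stand-in for is(elem, mode), which holds for the basic modes ('numeric', 'character', 'logical', 'complex') but not for arbitrary S4 classes. The name as_atomic_fast and the error messages are made up for illustration. lengths() needs R 3.2.0 or later.

```r
# Hypothetical faster variant: the length check is vectorised with lengths(),
# and the per-element type check uses base::mode() instead of methods::is().
as_atomic_fast <- function (coll, mode) {
    if (length(coll) == 0) {
        return(vector(mode))
    }

    # every element must be length-one.
    if (any(lengths(coll) != 1L)) {
        stop("every element must have length one")
    }

    # every element must have the requested mode
    # (note: integers and doubles both have mode 'numeric').
    elem_modes <- vapply(coll, base::mode, character(1), USE.NAMES = FALSE)
    if (any(elem_modes != mode)) {
        stop("every element must have mode '", mode, "'")
    }

    as.vector(coll, mode = mode)
}

as_atomic_fast(list(1, 2, 3), 'numeric')
```

The error cases from the question (mixed types, a non-length-one element, numbers requested as 'character') should still fail with this version, but I'd benchmark it on real data before relying on any particular speedup.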
as.numeric(list(1,2,3))? or as.character... – Lectern
list(1, 'a', 2)? – Lectern