Sometimes we have to use loops, for example, when we don't know how many iterations we need to get the result. Take while loops as an example. Below are methods you absolutely should avoid:
a=numeric(0)
b=1
system.time(
{
while(b<=1e5){
b=b+1
a<-c(a,pi)
}
}
)
# user system elapsed
# 13.2 0.0 13.2
a=numeric(0)
b=1
system.time(
{
while(b<=1e5){
b=b+1
a<-append(a,pi)
}
}
)
# user system elapsed
# 11.06 5.72 16.84
These are very inefficient because R copies the vector every time it appends.
The most efficient way to append is to use index. Note that this time I let it iterate 1e7 times, but it's still much faster than c
.
a=numeric(0)
system.time(
{
while(length(a)<1e7){
a[length(a)+1]=pi
}
}
)
# user system elapsed
# 5.71 0.39 6.12
This is acceptable. And we can make it a bit faster by replacing [
with [[
.
a=numeric(0)
system.time(
{
while(length(a)<1e7){
a[[length(a)+1]]=pi
}
}
)
# user system elapsed
# 5.29 0.38 5.69
Maybe you already noticed that length
can be time consuming. If we replace length
with a counter:
a=numeric(0)
b=1
system.time(
{
while(b<=1e7){
a[[b]]=pi
b=b+1
}
}
)
# user system elapsed
# 3.35 0.41 3.76
As other users mentioned, pre-allocating the vector is very helpful. But this is a trade-off between speed and memory usage if you don't know how many loops you need to get the result.
a=rep(NaN,2*1e7)
b=1
system.time(
{
while(b<=1e7){
a[[b]]=pi
b=b+1
}
a=a[!is.na(a)]
}
)
# user system elapsed
# 1.57 0.06 1.63
An intermediate method is to gradually add blocks of results.
a=numeric(0)
b=0
step_count=0
step=1e6
system.time(
{
repeat{
a_step=rep(NaN,step)
for(i in seq_len(step)){
b=b+1
a_step[[i]]=pi
if(b>=1e7){
a_step=a_step[1:i]
break
}
}
a[(step_count*step+1):b]=a_step
if(b>=1e7) break
step_count=step_count+1
}
}
)
#user system elapsed
#1.71 0.17 1.89
vector = values
; or you could do vector = vector + values. But I might be misunderstanding your use case – Anchoveta