I'm maintaining a Julia library that contains a function to insert a new line after every 80 characters in a long string.
This function becomes extremely slow (seconds or more) when the string becomes longer than 1 million characters. Time seems to increase more than linearly, maybe quadratic. I don't understand why. Can someone explain?
This is some reproducible code:
function chop(s; nc=80)
nr = ceil(Int64, length(s)/nc)
l(i) = 1+(nc*(i-1))
r(i) = min(nc*i, length(s))
rows = [String(s[l(i):r(i)]) for i in 1:nr]
return join(rows,'\n')
end
s = "A"^500000
chop(s)
It seems that this row is where most of the time is spent: rows = [String(s[l(i):r(i)]) for i in 1:nr]
Does that mean it takes long to initialize a new String
? That wouldn't really explain the super-linear run time.
I know the canonical fast way to build strings is to use IOBuffer
or the higher-level StringBuilders
package: https://github.com/davidanthoff/StringBuilders.jl
Can someone help me understand why this code above is so slow nonetheless?
Weirdly, the below is much faster, just by adding s = collect(s)
:
function chop(s; nc=80)
s = collect(s) #this line is new
nr = ceil(Int64, length(s)/nc)
l(i) = 1+(nc*(i-1))
r(i) = min(nc*i, length(s))
rows = [String(s[l(i):r(i)]) for i in 1:nr]
return join(rows,'\n')
end
nr
to grow in terms of the length of the string? How long do you expects[l(i):r(i)]
to take - what's the time complexity of the functions, and how long do you expect the slice to be in big-O terms? Can you see how to test those assumptions? Profiling is how you find the bottleneck, but more analysis is required to understand it. – Eleaseeleaticl(i) = 1+(nc*(i-1))
is some kind of lambda-like syntax - rather elegant, if it works like I imagine it does. – Eleaseeleatic