What's the fastest/most efficient way to count lines in Rebol?
Asked Answered
E

9

9

Given a string string, what is the fastest/most-efficient way to count lines therein? Will accept best answers for any flavour of Rebol. I've been working under the assumption that the parse [some [thru]] combination was the fastest way to traverse a string, but then I don't know that for certain, hence turning to SO:

count-lines: func [string [string!] /local count][
    parse/all string [
        (count: 1) some [thru newline (count: count + 1)]
    ]
    count
]

Or:

count-lines: func [string [string!] /local count][
    count: 0
    until [
        count: count + 1
        not string: find/tail string newline
    ]
    count
]

And how about counters? How efficient is repeat?

count-lines: func [string [string!]][
    repeat count length? string [
        unless string: find/tail string newline [
            break/return count
        ]
    ]
]

Update: line count goes by the Text Editor principle:

Empty TextMate Document

An empty document still has a line count of one. So:

>> count-lines ""
== 1
>> count-lines "^/"
== 2
Eames answered 8/2, 2013 at 5:2 Comment(3)
Also, any assistance testing the veracity of speed/efficiency claims would be welcome.Eames
Rebol programmers would think that an empty string has 1 line. What's with this fear of zero? :-PSlater
Another thing in any performance consideration is the nature of the input. I would study any technique you try with varied input...some examples: empty string, "long string of all newlines", "long string of zero newlines"...Slater
C
4
count-lines: func [
    str
    /local sort-str ][
sort-str: sort join str "^/"
1 + subtract index? find/last sort-str "^/" index? find sort-str "^/"
]
Century answered 13/2, 2013 at 12:12 Comment(3)
Interesting approach—any thoughts (anyone) how this performs?Eames
Interesting, Darius! BTW, we have an SO chat room for Rebol if you'd like to join us...Slater
Any modifying approaches are likely going to be slow for long strings because of the overhead of shifting the series over during the modifications. But the profiler will tell us which approach will be better.Schwab
T
4

Enhanced PARSE version, as suggested by BrianH:

i: 1 ; add one as TextMate
parse text [any [thru newline (++ i)]]
print i
Temper answered 19/2, 2013 at 22:44 Comment(11)
Should that be ++ i? (didn't notice there was a ++ :)Eames
Could also start i: 0 and say [newline | end] (++ i).Eames
Is there a benefit/penalty using skip instead of thru? With skip, the parse rule is repeated as many times as the length of the string.Eames
The parse approach is likely to be fastest, so I upvoted it. Still, it could use a little work. You might consider using to or thru, for instance.Schwab
Chris, initializing to 0 and then doing that block will not only be slower, but would result in an endless loop in R2: any [end] will never stop at the end of the string in R2. But it would be slower because parse has to recurse into the nested block, and then backtrack through the alternate (the part after the |). So, it's better to initialize to 1 instead. It's faster to use thru than skip, and makes the rule simpler too (exercise for rebolek, but as a hint: you don't need end, or to backtrack).Schwab
Of course, d'oh! My inclination is to have the count fully contained within the rule, but inclinations aren't always rational.Eames
If you want the count contained, move the initialization into the rule, like this: (i: 1)Schwab
Well, I used skip instead of thru because I wrote this reply just to gain enough points to join chat... But I like parse and parse answer was missing, so I wrote something :)Temper
If I may one-lineify it: count-lines: func [text [string!] /i][parse text [(i: 1) any [thru newline (++ i)]] i]Eames
@Eames You mean /local i I take it. :-)Slater
@HostileFork /i also creates a local. /local is a convention that has no formal meaning. Though I would use /local in a formal setting.Eames
S
3

Here's the best simple non-parse version I can think of:

count-lines: function [text [string!]] [
    i: 1
    find-all text newline [++ i]
    i
]

It uses function and ++ from more recent versions of Rebol, and find-all from either R3 or R2/Forward. You could look at the source of find-all and inline what you find and optimize, but situations like this are exactly what we wrote find-all for, so why not use it?

Schwab answered 20/2, 2013 at 18:37 Comment(3)
Rebolek's parse solution is likely faster, but only a profiler can say for sure.Schwab
Another interesting new function, thks!Eames
The find-all function is a good candidate to change to a native.Schwab
I
2

Here is the best for me:

temp: read/lines %mytext.txt
length? temp
Isotron answered 8/2, 2013 at 8:49 Comment(3)
The string source is not a file, do you mean to write the string to a file in order to read/lines?Eames
OK, then we'll have to add the overhead of writing out the value to the file, then reading it back. Or you can try deline/lines, which does the same thing in memory without needing a file. Watch out though: In R2, deline/lines is mezzanine, so it won't be as fast as some of the other solutions. In R3 it's native. Regardless, read/lines and deline/lines make a copy of the source string, so we'll have to see if that overhead overwhelms the advantage of being a single native call. I can confirm this approach works though.Schwab
BriaH is a real programmer, I just propose solution. If you need time optimization or other hight technical solution, contact him. :-)Isotron
W
2

remove-each can be fast as it is native

s: "1^/2^/3"
a: length? s
print a - length? remove-each v s [v = #"^/"]
; >> 2

or as a function

>> f: func [s] [print [(length? s) - (length? remove-each v s [v = #"^/"])]]
>> f "1^/2^/3"
== 2
Wolfsbane answered 18/2, 2013 at 8:57 Comment(3)
I'd need to benchmark this against the 'parse function above to see which is faster. I'd also worry that 's would need to be copied so as not to modify the original string thereby hampering efficiency.Eames
Style Notes: function returns the number of newlines, not the number of lines—should return s + 1 per the 'Text Editor Principle' (see question).Eames
Note that in Rebol 3, REMOVE-EACH returns the removal count, not the modified series. So your suggestion can just be remove-each v s [v = #"^/"] in Rebol 3. (And that is actually the first use case I ever came across where the rather weird return value of REMOVE-EACH in R3 is useful.)Hendrix
W
2

Why no one came with the simplest solution I wonder :)

t: "abc^/de^/f^/ghi"
i: 0 until [i: i + 1 not t: find/tail t newline] i
== 4

Not sure about the performance but I think it's quite fast, as UNTIL and FIND are natives. WHILE could be used as well.

i: 1 while [t: find/tail t newline] [i: i + 1] i
== 4

Just need to check for empty string. And if it would be a function, argument series needs to be HEADed.

Wolfsbane answered 21/2, 2013 at 12:15 Comment(1)
The while solution here is what I meant by my "inline find-all and optimize" comment. The until solution might be faster, since until is a simpler native. i: i + 1 is faster than ++ i in R2, but slower in R3.Schwab
X
2

Not the most efficient, but probably one of the fastest solution (anyway if a benchmark is run, I would like to see how this solution performs):

>> s: "1^/2^/ ^/^/3"
>> (length? s) - length? trim/with copy s newline
== 4
Xray answered 26/9, 2013 at 15:4 Comment(0)
T
1

Do not know about performance, and the last line rule (r3).

>> length? parse "1^/2^/3" "^/"
== 3
Traject answered 11/2, 2013 at 1:39 Comment(3)
My concern here is that it creates as many new strings as there is lines—it's a very concise one-liner, but I fear it does more than the parse example in the question. Would like to hear other opinions...Eames
The other downside to this approach is a quirk in using parse as split—it breaks under this condition: parse {one^/"two^/three"} "^/"Eames
Good points. i doubt the extra memory would be a concern. i cheat and do not benchmark the extra gc :) But the quote-handling is a bad surprise.Traject
A
0

hehehe the read/lines length? temp is a great thing I though about read/lines -> foreach lines temps [ count: count + 1]

another way to do it would be to do

temp: "line 1 ^M line2 ^M  line3 ^M "
length? parse temp newline ; that cuts the strings into a block 
;of multiple strings that represent each a line [ "line 1" "line2" "line3" ] 
:then you count how much  strings you have in the block with length? 

I like to code in rebol it is so funny

Edit I didnt read the whole post so my solution already waas proposed in a different way...

ok to amend for my sin of posting a already posted solution I will bring insight comment of a unexpected behavior of that solution. Multiple chained carriage returns are not counted (using rebol3 linux ...)

>> a: "line1 ^M line2 ^M line3 ^M^M"
== "line1 ^M line2 ^M line3 ^M^M"

>> length? parse a newline 
== 3
Anfractuous answered 16/5, 2013 at 18:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.