What's the fastest/most efficient way to count lines in Rebol?

Asked 8/2, 2013 at 5:2 Answered 26/9, 2013 at 15:4

Given a string string, what is the fastest/most-efficient way to count lines therein? Will accept best answers for any flavour of Rebol. I've been working under the assumption that the parse [some [thru]] combination was the fastest way to traverse a string, but then I don't know that for certain, hence turning to SO:

count-lines: func [string [string!] /local count][
    parse/all string [
        (count: 1) some [thru newline (count: count + 1)]
    ]
    count
]

Or:

count-lines: func [string [string!] /local count][
    count: 0
    until [
        count: count + 1
        not string: find/tail string newline
    ]
    count
]

And how about counters? How efficient is repeat?

count-lines: func [string [string!]][
    repeat count length? string [
        unless string: find/tail string newline [
            break/return count
        ]
    ]
]

Update: line count goes by the Text Editor principle:

Empty TextMate Document

An empty document still has a line count of one. So:

>> count-lines ""
== 1
>> count-lines "^/"
== 2

Eames answered 8/2, 2013 at 5:2 Comment(3)

Also, any assistance testing the veracity of speed/efficiency claims would be welcome. – Eames 8/2, 2013 at 5:8

Rebol programmers would think that an empty string has 1 line. What's with this fear of zero? :-P – Slater 8/2, 2013 at 22:37

Another thing in any performance consideration is the nature of the input. I would study any technique you try with varied input...some examples: empty string, "long string of all newlines", "long string of zero newlines"... – Slater 10/2, 2013 at 0:2

count-lines: func [
    str
    /local sort-str ][
sort-str: sort join str "^/"
1 + subtract index? find/last sort-str "^/" index? find sort-str "^/"
]

Century answered 13/2, 2013 at 12:12 Comment(3)

Interesting approach—any thoughts (anyone) how this performs? – Eames 16/2, 2013 at 18:29

Interesting, Darius! BTW, we have an SO chat room for Rebol if you'd like to join us... – Slater 16/2, 2013 at 19:4

Any modifying approaches are likely going to be slow for long strings because of the overhead of shifting the series over during the modifications. But the profiler will tell us which approach will be better. – Schwab 19/2, 2013 at 23:26

Enhanced PARSE version, as suggested by BrianH:

i: 1 ; add one as TextMate
parse text [any [thru newline (++ i)]]
print i

Temper answered 19/2, 2013 at 22:44 Comment(11)

Should that be ++ i? (didn't notice there was a ++ :) – Eames 19/2, 2013 at 23:12

Could also start i: 0 and say [newline | end] (++ i). – Eames 19/2, 2013 at 23:14

Is there a benefit/penalty using skip instead of thru? With skip, the parse rule is repeated as many times as the length of the string. – Eames 19/2, 2013 at 23:15

The parse approach is likely to be fastest, so I upvoted it. Still, it could use a little work. You might consider using to or thru, for instance. – Schwab 19/2, 2013 at 23:20

Chris, initializing to 0 and then doing that block will not only be slower, but would result in an endless loop in R2: any [end] will never stop at the end of the string in R2. But it would be slower because parse has to recurse into the nested block, and then backtrack through the alternate (the part after the |). So, it's better to initialize to 1 instead. It's faster to use thru than skip, and makes the rule simpler too (exercise for rebolek, but as a hint: you don't need end, or to backtrack). – Schwab 19/2, 2013 at 23:38

Of course, d'oh! My inclination is to have the count fully contained within the rule, but inclinations aren't always rational. – Eames 19/2, 2013 at 23:43

If you want the count contained, move the initialization into the rule, like this: (i: 1) – Schwab 19/2, 2013 at 23:44

Well, I used skip instead of thru because I wrote this reply just to gain enough points to join chat... But I like parse and parse answer was missing, so I wrote something :) – Temper 20/2, 2013 at 14:38

If I may one-lineify it: count-lines: func [text [string!] /i][parse text [(i: 1) any [thru newline (++ i)]] i] – Eames 20/2, 2013 at 18:54

@Eames You mean /local i I take it. :-) – Slater 21/2, 2013 at 14:25

@HostileFork /i also creates a local. /local is a convention that has no formal meaning. Though I would use /local in a formal setting. – Eames 27/9, 2013 at 16:49

Here's the best simple non-parse version I can think of:

count-lines: function [text [string!]] [
    i: 1
    find-all text newline [++ i]
    i
]

It uses function and ++ from more recent versions of Rebol, and find-all from either R3 or R2/Forward. You could look at the source of find-all and inline what you find and optimize, but situations like this are exactly what we wrote find-all for, so why not use it?

Schwab answered 20/2, 2013 at 18:37 Comment(3)

Rebolek's parse solution is likely faster, but only a profiler can say for sure. – Schwab 20/2, 2013 at 18:38

Another interesting new function, thks! – Eames 20/2, 2013 at 18:49

The find-all function is a good candidate to change to a native. – Schwab 22/2, 2013 at 1:22

Here is the best for me:

temp: read/lines %mytext.txt
length? temp

Isotron answered 8/2, 2013 at 8:49 Comment(3)

The string source is not a file, do you mean to write the string to a file in order to read/lines? – Eames 8/2, 2013 at 17:15

OK, then we'll have to add the overhead of writing out the value to the file, then reading it back. Or you can try deline/lines, which does the same thing in memory without needing a file. Watch out though: In R2, deline/lines is mezzanine, so it won't be as fast as some of the other solutions. In R3 it's native. Regardless, read/lines and deline/lines make a copy of the source string, so we'll have to see if that overhead overwhelms the advantage of being a single native call. I can confirm this approach works though. – Schwab 20/2, 2013 at 18:50

BriaH is a real programmer, I just propose solution. If you need time optimization or other hight technical solution, contact him. :-) – Isotron 25/2, 2013 at 11:49

remove-each can be fast as it is native

s: "1^/2^/3"
a: length? s
print a - length? remove-each v s [v = #"^/"]
; >> 2

or as a function

>> f: func [s] [print [(length? s) - (length? remove-each v s [v = #"^/"])]]
>> f "1^/2^/3"
== 2

Wolfsbane answered 18/2, 2013 at 8:57 Comment(3)

I'd need to benchmark this against the 'parse function above to see which is faster. I'd also worry that 's would need to be copied so as not to modify the original string thereby hampering efficiency. – Eames 18/2, 2013 at 22:14

Style Notes: function returns the number of newlines, not the number of lines—should return s + 1 per the 'Text Editor Principle' (see question). – Eames 18/2, 2013 at 22:17

Note that in Rebol 3, REMOVE-EACH returns the removal count, not the modified series. So your suggestion can just be remove-each v s [v = #"^/"] in Rebol 3. (And that is actually the first use case I ever came across where the rather weird return value of REMOVE-EACH in R3 is useful.) – Hendrix 27/9, 2013 at 18:12

Why no one came with the simplest solution I wonder :)

t: "abc^/de^/f^/ghi"
i: 0 until [i: i + 1 not t: find/tail t newline] i
== 4

Not sure about the performance but I think it's quite fast, as UNTIL and FIND are natives. WHILE could be used as well.

i: 1 while [t: find/tail t newline] [i: i + 1] i
== 4

Just need to check for empty string. And if it would be a function, argument series needs to be HEADed.

Wolfsbane answered 21/2, 2013 at 12:15 Comment(1)

The while solution here is what I meant by my "inline find-all and optimize" comment. The until solution might be faster, since until is a simpler native. i: i + 1 is faster than ++ i in R2, but slower in R3. – Schwab 22/2, 2013 at 1:19

Not the most efficient, but probably one of the fastest solution (anyway if a benchmark is run, I would like to see how this solution performs):

>> s: "1^/2^/ ^/^/3"
>> (length? s) - length? trim/with copy s newline
== 4

Xray answered 26/9, 2013 at 15:4 Comment(0)

Do not know about performance, and the last line rule (r3).

>> length? parse "1^/2^/3" "^/"
== 3

Traject answered 11/2, 2013 at 1:39 Comment(3)

My concern here is that it creates as many new strings as there is lines—it's a very concise one-liner, but I fear it does more than the parse example in the question. Would like to hear other opinions... – Eames 11/2, 2013 at 19:37

The other downside to this approach is a quirk in using parse as split—it breaks under this condition: parse {one^/"two^/three"} "^/" – Eames 11/2, 2013 at 19:40

Good points. i doubt the extra memory would be a concern. i cheat and do not benchmark the extra gc :) But the quote-handling is a bad surprise. – Traject 22/2, 2013 at 20:9

hehehe the read/lines length? temp is a great thing I though about read/lines -> foreach lines temps [ count: count + 1]

another way to do it would be to do

temp: "line 1 ^M line2 ^M  line3 ^M "
length? parse temp newline ; that cuts the strings into a block 
;of multiple strings that represent each a line [ "line 1" "line2" "line3" ] 
:then you count how much  strings you have in the block with length?

I like to code in rebol it is so funny

Edit I didnt read the whole post so my solution already waas proposed in a different way...

ok to amend for my sin of posting a already posted solution I will bring insight comment of a unexpected behavior of that solution. Multiple chained carriage returns are not counted (using rebol3 linux ...)

>> a: "line1 ^M line2 ^M line3 ^M^M"
== "line1 ^M line2 ^M line3 ^M^M"

>> length? parse a newline 
== 3

Anfractuous answered 16/5, 2013 at 18:28 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags