Getting the last element of a lazy Seq in Raku
Asked Answered
P

2

11

I want to get the last element of a lazy but finite Seq in Raku, e.g.:

my $s = lazy gather for ^10 { take $_ };

The following don't work:

say $s[* - 1];
say $s.tail;

These ones work but don't seem too idiomatic:

say (for $s<> { $_ }).tail;
say (for $s<> { $_ })[* - 1];

What is the most idiomatic way of doing this while keeping the original Seq lazy?

Pneumonic answered 4/12, 2021 at 17:50 Comment(3)
You don't need the lazy: my $s = gather for ^10 { take $_; }; say $s.tailUnsatisfactory
I am a bit confused. The docs say that “Binding to a scalar or sigilless container will also force laziness”. Why does the behaviour differ if I omit the lazy keyword?Pneumonic
@NikolaBenes If a gather is asked if it's lazy, it returns False, whereas if a lazy is asked if it's lazy, it will return True. See my answer to About Laziness.Carmelocarmen
C
10

Drop the lazy

Lazy sequences in Raku are designed to work well as is. You don't need to emphasize they're lazy by adding an explicit lazy.

If you add an explicit lazy, Raku interprets that as a request to block operations such as .tail because they will almost certainly immediately render laziness moot, and, if called on an infinite sequence, or even just a sufficiently large one, hang or OOM the program.

So, either drop the lazy, or don't invoke operations like .tail that will be blocked if you do.

Expanded version of my original answer

As noted by @ugexe, the idiomatic solution is to drop the lazy.

Quoting my answer to the SO About Laziness:

if a gather is asked if it's lazy, it returns False.

Aiui, something like the following applies:

  • Some lazy sequence producers may be actually or effectively infinite. If so, calling .tail etc on them will hang the calling program. Conversely, other lazy sequences perform fine when all their values are consumed in one go. How should Raku distinguish between these two scenarios?

  • A decision was made in 2015 to let value producing datatypes emphasize or deemphasize their laziness via their response to an .is-lazy call.

  • Returning True signals that a sequence is not only lazy but wants to be known to be lazy by consuming code that calls .is-lazy. (Not so much end-user code but instead built in consuming features such as @ sigilled variables handling an assignment trying to determine whether or not to assign eagerly.) Built in consuming features take a True as a signal they ought block calls like .tail. If a dev knows this is overly conservative, they can add an eager (or remove an unneeded lazy).

  • Conversely, a datatype, or even a particular object instance, may return False to signal that it does not want to be considered lazy. This may be because the actual behaviour of a particular datatype or instance is eager, but it might instead be that it is lazy technically, but doesn't want a consumer to block operations such as .tail because it knows they will not be harmful, or at least prefers to have that be the default presumption. If a dev knows better (because, say, it hangs the program), or at least does not want to block potentially problematic operations, they can add a lazy (or remove an unneeded eager).

I think this approach works well, but it doc and error messages mentioning "lazy" may not have caught up with the shift made in 2015. So:

Further discussion

the docs ... say “If you want to force lazy evaluation use the lazy subroutine or method. Binding to a scalar or sigilless container will also force laziness.”

Yes. Aiui this is correct.

[which] sounds like it implies “my $x := lazy gather { ... } is the same as my $x := gather { ... }”.

No.

An explicit lazy statement prefix or method adds emphasis to laziness, and Raku interprets that to mean it ought block operations like .tail in case they hang the program.

In contrast, binding to a variable alters neither emphasis nor deemphasis of laziness, merely relaying onward whatever the bound producer datatype/instance has chosen to convey via .is-lazy.

not only in connection with gather but elsewhere as well

Yes. It's about the result of .is-lazy:

my $x =      (1, { .say; $_ + 1 } ... 1000);
my $y = lazy (1, { .say; $_ + 1 } ... 1000);

both act lazily ... but $x.tail is possible while $y.tail is not.

Yes.

An explicit lazy statement prefix or method forces the answer to .is-lazy to be True. This signals to a consumer that cares about the dangers of laziness that it should become cautious (eg rejecting .tail etc.).

(Conversely, an eager statement prefix or method can be used to force the answer to .is-lazy to be False, making timid consumers accept .tail etc calls.)

I take from this that there are two kinds of laziness in Raku, and one has to be careful to see which one is being used where.

It's two kinds of what I'll call consumption guidance:

  • Don't-tail-me If an object returns True from an .is-lazy call then it is treated as if it might be infinite. Thus operations like .tail are blocked.

  • You-can-tail-me If an object returns False from an .is-lazy call then operations like .tail are accepted.

It's not so much that there's a need to be careful about which of these two kinds is in play, but if one wants to call operations like tail, then one may need to enable that by inserting an eager or removing a lazy, and one must take responsibility for the consequences:

  • If the program hangs due to use of .tail, well, DIHWIDT.

  • If you suddenly consume all of a lazy sequence and haven't cached it, well, maybe you should cache it.

  • Etc.

What I would say is that the error messages and/or doc may well need to be improved.

Carmelocarmen answered 5/12, 2021 at 2:25 Comment(3)
Also, and I don't know the full history here, as part of the 2015 Great List Refactor, the is-infinite method was replaced with is-lazy (see, e.g., commit). That adds evidence for what you said ^^^^: Raku doesn't really know if a list is infinite or not, and is (now) not going to try to tell you. It'll just give info about laziness, which is much easier to report.Mezereum
The problem with the docs is that they say “If you want to force lazy evaluation use the lazy subroutine or method. Binding to a scalar or sigilless container will also force laziness.” (emphasis mine) which to me still sounds like it implies “my $x := lazy gather { ... } is the same as my $x := gather { ... }”.Pneumonic
Also, it seems to me that the difference between really lazy and somewhat lazy Seqs can be found not only in connection with gather/take, but elsewhere as well: my $x = (1, { .say; $_ + 1 } ... 1000) and my $y = lazy (1, { .say; $_ + 1 } ... 1000) both act lazily (in the sense that the values are computed on the fly), but $x.tail is possible while $y.tail is not. I take from this that there are two kinds of laziness in Raku, and one has to be careful to see which one is being used where.Pneumonic
M
17

What you're asking about ("get[ing] the last element of a lazy but finite Seq … while keeping the original Seq lazy") isn't possible. I don't mean that it's not possible with Raku – I mean that, in principle, it's not possible for any language that defines "laziness" the way Raku does with, for example, the is-lazy method.

If particular, when a Seq is lazy in Raku, that "means that [the Seq's] values are computed on demand and stored for later use." Additionally, one of the defining features of a lazy iterable is that it cannot know its own length while remaining lazy – that's why calling .elems on a lazy iterable throws an error:

my $s = lazy gather for ^10 { take $_ };
say $s.is-lazy; # OUTPUT: «True»
$s.elems;       # THROWS: «Cannot .elems a lazy list onto a Seq»

Now, at this point, you might reasonably be thinking "well, maybe Raku doesn't know how long $s is, but I can tell that it has exactly 10 elements in it." And you're not wrong – with that code, $s is indeed guaranteed to have 10 elements. This means that, if you want to get the tenth (last) element of $s, you can do so with $s[9]. And accessing $s's tenth element like that won't change the fact that $s.is-lazy.

But, importantly, you can only do so because you know something "extra" about $s, and that extra info undoes a good chunk of the reason you might want a list to be lazy in practice.

To see what I mean, consider a very similar Seq

my $s2 = lazy gather for ^10 { last if rand > .95; take $_ };
say $s2.is-lazy; # OUTPUT: «True»

Now, $s2probably has 10 elements, but it might not – the only way to know is to iterate through it and find out. In turn, this means $s2[9] does not jump to the tenth element the way $s[9] did; it iterates through $s2 just like you'd need to. And, as a result, if you run $s2[9], then $s2 will no longer be lazy (i.e., $s2.is-lazy will return False).

And this is, in effect, what you did in the code in your question:

my $s = lazy gather for ^10 { take $_ };
say $s.is-lazy;              # OUTPUT: «True»
say (for $s<> { $_ }).tail;  # OUTPUT: «9»
say $s.is-lazy;              # OUTPUT: «False»

Because Raku cannot ever know that it has reached the tail of a lazy Seq, the only way it could tell you the .tail is to fully iterate $s. And that necessarily means that $s is no longer lazy.

Two complications

It's worth mentioning two adjacent topics that aren't actually related but that are close enough that they trip some people up.

First, nothing I've said about lazy iterables not knowing their length precludes some non-lazy iterables from knowing their length. Indeed, a decent number of Raku types do both the Iterator role and the PredictiveIterator role – and the main point of a PredictiveIterator is that it does know how many elements it can produce without needing to produce/iterate them. But PredictiveIterators cannot be lazy.

The second potentially confusing topic is closely related to the first: while no PredictiveIterator can be lazy (that is, none will ever have an .is-lazy method that returns True), some PredictiveIterators have behavior that is very similar to laziness – and, in fact, may even be colloquially referred to as "lazy".

I can't do a great job explaining this distinction because, quite honestly, I don't fully understand it myself. But I can give you an example: the .lines method on an IO::Handle. It's certainly the case that reading the lines of a huge file behaves a lot like it's dealing with a lazy iterable. most obviously, you can process each line without ever having the whole file in memory. And the docs even say that "lines are read lazily" with the .lines method.

On the other hand:

my $l = 'some-file-with-100_000-lines.txt'.IO.lines;
say $l.is-lazy;                         # OUTPUT: «False»
say $l.iterator ~~ PredictiveIterator;  # OUTPUT: «True»
say $l.elems;                           # OUTPUT: «100000»

So I'm not quite sure whether it's fair to say that $l "is a lazy iterable", but if it is, it's "lazy" in a different way than $s was.

I realize that was a lot, but I hope it is helpful. If you have a more specific use case in mind for laziness (I bet it wasn't gathering the numbers from zero to nine!), I'd be happy to address that more specifically. And if anyone else can fill in some of the details with .lines and other lazy-not-lazy PredictiveIterators, I'd really appreciate it!

Mezereum answered 4/12, 2021 at 19:49 Comment(4)
I may have phrased the question wrong. I don't actually care if the sequence gets exhausted; I want to get the last element (and then forget about the sequence completely). I can do something similar with Python iterables using something like for x in iterable: pass (after exhausting the iterable, x contains the last element).Pneumonic
The use case is today's Advent of Code assignment (I am using this year's AoC to learn Raku). In my solution, I created a gather block that produced all winning bingo board scores, and I was interested in producing the first solution on the fly. The idea was to do my $s = lazy gather { ... } followed by say $s[0]; say $s.tail. Anyway, removing the lazy keyword works (the first value is still produced as soon as the first winning board is found), although the difference between lazy gather and (still somewhat lazy) gather is something I still have not fully grasped yet.Pneumonic
@NikolaBenes Oh, if you don't need/want to keep the list lazy, then you can just make it eager! So, yeah, leaving off the lazy works, but if you do have a lazy list, you can also use $s.eager.tail. Or, a shorter way to make it eager: $s[*].tail. (Oh, and if you're doing Advent of Code in Raku, you might be interested in adding your solutions to the community solution repo)Mezereum
Regarding the laziness of lines... Once, in the deep past, it was explained to me that vi scans the input file, recording file offsets for each of the lines, then uses the offsets for the visible portion of the file to fetch those bytes into RAM (and only those bytes). (Full disclosure: I never followed-up to see if any of that was true. It may not have been. It may have been at one time and is no longer true. YMMV.) So vi's .lines equivalent internally scanned its implicit iterable to know its length and gather some metadata about each item, but lazily evaluated each item.Buchan
C
10

Drop the lazy

Lazy sequences in Raku are designed to work well as is. You don't need to emphasize they're lazy by adding an explicit lazy.

If you add an explicit lazy, Raku interprets that as a request to block operations such as .tail because they will almost certainly immediately render laziness moot, and, if called on an infinite sequence, or even just a sufficiently large one, hang or OOM the program.

So, either drop the lazy, or don't invoke operations like .tail that will be blocked if you do.

Expanded version of my original answer

As noted by @ugexe, the idiomatic solution is to drop the lazy.

Quoting my answer to the SO About Laziness:

if a gather is asked if it's lazy, it returns False.

Aiui, something like the following applies:

  • Some lazy sequence producers may be actually or effectively infinite. If so, calling .tail etc on them will hang the calling program. Conversely, other lazy sequences perform fine when all their values are consumed in one go. How should Raku distinguish between these two scenarios?

  • A decision was made in 2015 to let value producing datatypes emphasize or deemphasize their laziness via their response to an .is-lazy call.

  • Returning True signals that a sequence is not only lazy but wants to be known to be lazy by consuming code that calls .is-lazy. (Not so much end-user code but instead built in consuming features such as @ sigilled variables handling an assignment trying to determine whether or not to assign eagerly.) Built in consuming features take a True as a signal they ought block calls like .tail. If a dev knows this is overly conservative, they can add an eager (or remove an unneeded lazy).

  • Conversely, a datatype, or even a particular object instance, may return False to signal that it does not want to be considered lazy. This may be because the actual behaviour of a particular datatype or instance is eager, but it might instead be that it is lazy technically, but doesn't want a consumer to block operations such as .tail because it knows they will not be harmful, or at least prefers to have that be the default presumption. If a dev knows better (because, say, it hangs the program), or at least does not want to block potentially problematic operations, they can add a lazy (or remove an unneeded eager).

I think this approach works well, but it doc and error messages mentioning "lazy" may not have caught up with the shift made in 2015. So:

Further discussion

the docs ... say “If you want to force lazy evaluation use the lazy subroutine or method. Binding to a scalar or sigilless container will also force laziness.”

Yes. Aiui this is correct.

[which] sounds like it implies “my $x := lazy gather { ... } is the same as my $x := gather { ... }”.

No.

An explicit lazy statement prefix or method adds emphasis to laziness, and Raku interprets that to mean it ought block operations like .tail in case they hang the program.

In contrast, binding to a variable alters neither emphasis nor deemphasis of laziness, merely relaying onward whatever the bound producer datatype/instance has chosen to convey via .is-lazy.

not only in connection with gather but elsewhere as well

Yes. It's about the result of .is-lazy:

my $x =      (1, { .say; $_ + 1 } ... 1000);
my $y = lazy (1, { .say; $_ + 1 } ... 1000);

both act lazily ... but $x.tail is possible while $y.tail is not.

Yes.

An explicit lazy statement prefix or method forces the answer to .is-lazy to be True. This signals to a consumer that cares about the dangers of laziness that it should become cautious (eg rejecting .tail etc.).

(Conversely, an eager statement prefix or method can be used to force the answer to .is-lazy to be False, making timid consumers accept .tail etc calls.)

I take from this that there are two kinds of laziness in Raku, and one has to be careful to see which one is being used where.

It's two kinds of what I'll call consumption guidance:

  • Don't-tail-me If an object returns True from an .is-lazy call then it is treated as if it might be infinite. Thus operations like .tail are blocked.

  • You-can-tail-me If an object returns False from an .is-lazy call then operations like .tail are accepted.

It's not so much that there's a need to be careful about which of these two kinds is in play, but if one wants to call operations like tail, then one may need to enable that by inserting an eager or removing a lazy, and one must take responsibility for the consequences:

  • If the program hangs due to use of .tail, well, DIHWIDT.

  • If you suddenly consume all of a lazy sequence and haven't cached it, well, maybe you should cache it.

  • Etc.

What I would say is that the error messages and/or doc may well need to be improved.

Carmelocarmen answered 5/12, 2021 at 2:25 Comment(3)
Also, and I don't know the full history here, as part of the 2015 Great List Refactor, the is-infinite method was replaced with is-lazy (see, e.g., commit). That adds evidence for what you said ^^^^: Raku doesn't really know if a list is infinite or not, and is (now) not going to try to tell you. It'll just give info about laziness, which is much easier to report.Mezereum
The problem with the docs is that they say “If you want to force lazy evaluation use the lazy subroutine or method. Binding to a scalar or sigilless container will also force laziness.” (emphasis mine) which to me still sounds like it implies “my $x := lazy gather { ... } is the same as my $x := gather { ... }”.Pneumonic
Also, it seems to me that the difference between really lazy and somewhat lazy Seqs can be found not only in connection with gather/take, but elsewhere as well: my $x = (1, { .say; $_ + 1 } ... 1000) and my $y = lazy (1, { .say; $_ + 1 } ... 1000) both act lazily (in the sense that the values are computed on the fly), but $x.tail is possible while $y.tail is not. I take from this that there are two kinds of laziness in Raku, and one has to be careful to see which one is being used where.Pneumonic

© 2022 - 2024 — McMap. All rights reserved.