Strings and Strands in MoarVM

When running Raku code on Rakudo with the MoarVM backend, is there any way to print information about how a given Str is stored in memory from inside the running program? In particular, I am curious whether there's a way to see how many Strands currently make up the Str, whether via Raku introspection, NQP, or something that accesses the MoarVM level (does such a thing even exist at runtime?).

If there isn't any way to access this info at runtime, is there a way to get at it through output from one of Rakudo's command-line flags, such as --target, or --tracing? Or through a debugger?

Finally, does MoarVM manage the number of Strands in a given Str? I often hear (or say) that one of Raku's super powers is that it can index into Unicode strings in O(1) time, but I've been thinking about the pathological case, and it feels like it would be O(n). For example,

(^$n).map({~rand}).join

seems like it would create a Str with a length proportional to $n that consists of $n Strands – and, if I'm understanding the data structure correctly, that means that indexing into this Str would require checking the length of each Strand, for a time complexity of O(n). But I know that it's possible to flatten a Strand-ed Str; would MoarVM do something like that in this case? Or have I misunderstood something more basic?

Padre answered 2/3, 2021 at 20:41 Comment(1)
I periodically review questions en masse to see if there's something that could be done that might tie up loose ends. For example, checking questions that haven't gotten an accepted answer. This is one such. One reason might be that I asked you to not accept my answer because I wanted to encourage a core dev to answer, to get something more authoritative than my answer. But now Liz has answered, and she's a core dev, and time has passed, and so I think this Q has likely gotten as good answers as it's going to get. I don't mean to be pushy, but I'm hoping you'll accept one of our answers. – Nephralgia
N
4

When running Raku code on Rakudo with the MoarVM backend, is there any way to print information about how a given Str is stored in memory from inside the running program?

My educated guess is yes, as described below for App::MoarVM modules. That said, my education came from a degree I started at the Unseen University, and a wizard had me expelled for guessing too much, so...

In particular, I am curious whether there's a way to see how many Strands currently make up the Str, whether via Raku introspection, NQP, or something that accesses the MoarVM level (does such a thing even exist at runtime?).

I'm 99.99% sure strands are purely an implementation detail of the backend, and there'll be no Raku or NQP access to that information without MoarVM-specific tricks. That said, read on.

If there isn't any way to access this info at runtime

I can see there is access at runtime via MoarVM.

is there a way to get at it through output from one of Rakudo's command-line flags, such as --target, or --tracing? Or through a debugger?

I'm 99.99% sure there are multiple ways.

For example, there's a bunch of strand debugging code in MoarVM's ops.c file starting with #define MVM_DEBUG_STRANDS ....

Perhaps more interesting is what appears to be a veritable goldmine of sophisticated debugging and profiling features built into MoarVM, plus what appear to be Rakudo-specific modules that drive those features, presumably via Raku code. For a dozen or so articles discussing some aspects of those features, I suggest reading timotimo's blog. Browsing GitHub, I see ongoing commits related to MoarVM's debugging features for years and on into 2021.

Finally, does MoarVM manage the number of Strands in a given Str?

Yes. I can see that the string handling code (some links are below), which was written by samcv (extremely smart and careful) and, I believe, reviewed by jnthn, has logic limiting the number of strands.

I often hear (or say) that one of Raku's super powers is that it can index into Unicode strings in O(1) time, but I've been thinking about the pathological case, and it feels like it would be O(n).

Yes, if a backend that supported strands did not manage the number of strands.

But for MoarVM I think the intent is to set an absolute upper bound with #define MVM_STRING_MAX_STRANDS 64 in MoarVM's MVMString.h file, and logic that checks against that (and other characteristics of strings; see this else if statement as an exemplar). But the logic is sufficiently complex, and my C chops sufficiently meagre, that I am nowhere near being able to express confidence in that, even if I can say that that appears to be the intent.
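To make the idea of an upper bound concrete, here is a minimal toy sketch in C of a string that flattens itself whenever another append would push it past a fixed strand cap. Everything in it (`TOY_MAX_STRANDS`, `CappedStr`, `cs_append`, `cs_flatten`, `cs_char_at`) is a hypothetical name invented for illustration, and it works on plain bytes rather than graphemes. It is not MoarVM's code and I make no claim that MoarVM's logic has this shape; it only shows why a cap keeps the strand walk during indexing bounded by a constant.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_STRANDS 4   /* hypothetical cap; the answer cites 64 for MoarVM */

typedef struct {
    const char *data;       /* borrowed pointer to this strand's text */
    size_t      len;        /* number of bytes in this strand */
} Strand;

typedef struct {
    Strand  strands[TOY_MAX_STRANDS];
    size_t  num_strands;
    size_t  len;            /* total length across all strands */
    char   *owned;          /* buffer from the last flatten, or NULL */
} CappedStr;

CappedStr cs_empty(void) {
    CappedStr s = { .num_strands = 0, .len = 0, .owned = NULL };
    return s;
}

/* Copy all strands into one fresh buffer, leaving a single strand. */
void cs_flatten(CappedStr *s) {
    char *buf = malloc(s->len + 1);
    assert(buf != NULL);    /* sketch only: abort on allocation failure */
    size_t off = 0;
    for (size_t k = 0; k < s->num_strands; k++) {
        memcpy(buf + off, s->strands[k].data, s->strands[k].len);
        off += s->strands[k].len;
    }
    buf[off] = '\0';
    free(s->owned);         /* old buffer has already been copied */
    s->owned = buf;
    s->strands[0].data = buf;
    s->strands[0].len = s->len;
    s->num_strands = 1;
}

/* Append a piece as a new strand, flattening first if the cap is reached. */
void cs_append(CappedStr *s, const char *piece) {
    if (s->num_strands == TOY_MAX_STRANDS)
        cs_flatten(s);
    size_t n = strlen(piece);
    s->strands[s->num_strands].data = piece;
    s->strands[s->num_strands].len = n;
    s->num_strands++;
    s->len += n;
}

/* Index by walking strands; the walk is at most TOY_MAX_STRANDS steps. */
char cs_char_at(const CappedStr *s, size_t i) {
    for (size_t k = 0; k < s->num_strands; k++) {
        if (i < s->strands[k].len)
            return s->strands[k].data[i];
        i -= s->strands[k].len;
    }
    return '\0';            /* out of range */
}
```

With a cap like this, indexing costs at most a constant number of strand checks, at the price of an occasional O(length) copy when a flatten is triggered.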

For example, (^$n).map({~rand}).join seems like it would create a Str with a length proportional to $n that consists of $n Strands

I'm 95% confident that indexing into strings constructed by simple joins like that will be O(1).

This is based on me thinking that a Raku/NQP level string join operation is handled by MVM_string_join, and my attempts to understand what that code does.
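For what it's worth, one general way a join can avoid producing one strand per piece is to measure everything first and then copy all the pieces into a single flat buffer. The sketch below shows that technique on plain C strings. `flat_join` is a hypothetical helper I made up for illustration; I have not verified that MVM_string_join works this way, so treat it as an assumption about the general approach and not as a description of MoarVM.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not MVM_string_join): join n C strings with a
 * separator into one newly allocated flat buffer. Caller frees it. */
char *flat_join(const char **pieces, size_t n, const char *sep) {
    size_t sep_len = strlen(sep);
    size_t total = 0;

    /* First pass: work out the final length so we allocate once. */
    for (size_t i = 0; i < n; i++)
        total += strlen(pieces[i]);
    if (n > 1)
        total += sep_len * (n - 1);

    char *out = malloc(total + 1);
    assert(out != NULL);    /* sketch only: abort on allocation failure */

    /* Second pass: copy separator and pieces into the flat buffer. */
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            memcpy(out + off, sep, sep_len);
            off += sep_len;
        }
        size_t len = strlen(pieces[i]);
        memcpy(out + off, pieces[i], len);
        off += len;
    }
    out[off] = '\0';
    return out;
}
```

A result built like this is a single contiguous buffer, so indexing into it is a plain array access no matter how many pieces went into the join.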

But I know that it's possible to flatten a Strand-ed Str; would MoarVM do something like that in this case?

If you read the code you will find it's doing very sophisticated handling.

Or have I misunderstood something more basic?

I'm pretty sure I will have misunderstood something basic so I sure ain't gonna comment on whether you have. :)

Nephralgia answered 2/3, 2021 at 23:13 Comment(1)
Thanks very much for the (n)answer; I definitely found reading it both informative and enjoyable. (Also – and I've thought about making this comment on a few of your other posts – I really enjoy the Pratchett references in your writing (Unseen University, nested footnotes, lies-to-children, etc)) – Padre
E
4

As far as I understand it, the fact that MoarVM implements strands (i.e., concatenating two strings only creates a new string made of strands that hold "references" to the original strings) is really just that: an implementation detail.
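As a rough illustration of that idea, here is a toy sketch in C in which concatenation copies only small strand descriptors and shares the underlying text. The types and functions (`Strand`, `StrandStr`, `ss_from_flat`, `ss_concat`, `ss_char_at`, `ss_free`) are hypothetical names made up for this example, operate on plain bytes, and are not MoarVM's actual data layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types; not MoarVM's MVMString layout. */
typedef struct {
    const char *data;       /* borrowed pointer into an existing buffer */
    size_t      len;        /* number of bytes covered by this strand */
} Strand;

typedef struct {
    Strand *strands;        /* owned array of strand descriptors */
    size_t  num_strands;
    size_t  len;            /* total length across all strands */
} StrandStr;

/* Wrap a flat C string as a one-strand string (the text is not copied). */
StrandStr ss_from_flat(const char *s) {
    StrandStr r;
    r.strands = malloc(sizeof(Strand));
    assert(r.strands != NULL);  /* sketch only: abort on allocation failure */
    r.strands[0].data = s;
    r.strands[0].len = strlen(s);
    r.num_strands = 1;
    r.len = r.strands[0].len;
    return r;
}

/* Concatenate by copying strand descriptors only; the text stays shared. */
StrandStr ss_concat(const StrandStr *a, const StrandStr *b) {
    StrandStr r;
    r.num_strands = a->num_strands + b->num_strands;
    r.strands = malloc(r.num_strands * sizeof(Strand));
    assert(r.strands != NULL);
    memcpy(r.strands, a->strands, a->num_strands * sizeof(Strand));
    memcpy(r.strands + a->num_strands, b->strands,
           b->num_strands * sizeof(Strand));
    r.len = a->len + b->len;
    return r;
}

/* Index by walking strands: the cost grows with the number of strands. */
char ss_char_at(const StrandStr *s, size_t i) {
    for (size_t k = 0; k < s->num_strands; k++) {
        if (i < s->strands[k].len)
            return s->strands[k].data[i];
        i -= s->strands[k].len;
    }
    return '\0';            /* out of range */
}

/* Release the descriptor array; the shared text is owned elsewhere. */
void ss_free(StrandStr *s) {
    free(s->strands);
    s->strands = NULL;
    s->num_strands = 0;
    s->len = 0;
}
```

Concatenation here costs time proportional to the number of strands, not the length of the text, which is the appeal of the design; the trade-off is that indexing has to walk the strand list.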

You can implement the Raku Programming Language without needing to implement strands. Therefore there is no way to introspect this, at least to my knowledge.

There has been a PR to expose the nqp:: op that would actually concatenate strands into a single string, but that has been refused / closed: https://github.com/rakudo/rakudo/pull/3975

Embryotomy answered 2/3, 2021 at 22:25 Comment(0)