Syntax error in hashref lookup, cannot see why

perl -E 'say for map s/(æ|ø|å)/   {qw(æ ae ø oe å aa)}->{$1}/ger, qw(rød gul blå)'
perl -E 'say for map s/(æ|ø|å)/"".{qw(æ ae ø oe å aa)}->{$1}/ger, qw(rød gul blå)'

The first line above gives me syntax error at -e line 1, near "}->", but the second prints roed, gul and blaa as expected. Is this a weakness of the compiler, or is there some reason for it that I can't see? I tested and got this behaviour in versions 5.10, 5.22 and 5.26.

Maladminister answered 4/9, 2019 at 8:14

The {...} is interpreted as a BLOCK, not a hashref. We can see this by adding a +:

perl -E'say for map s/(æ|ø|å)/+{qw(æ ae ø oe å aa)}->{$1}/ger, qw(rød gul blå)'

and now it works, since what follows the unary + must be an expression; so + disambiguates the code. Then the interpreter goes on to identify the construct as an anonymous hash constructor.

Without such a hint it has to guess at what { means, since it can't parse ahead before deciding whether it is looking at a block or an expression. It could analyze more of the context to determine what {...} is, but I'd find it reasonable if that was simply deemed much too complex to be worth the trade-off.

In the other example it is the concatenation operator (.) that does it.


For another example of the unary + forcing treatment of the following code as an expression, and for details about related documentation, see this post.

Supercargo answered 4/9, 2019 at 8:29 Comment(8)
Thx for the + tip. But couldn't or shouldn't the -> have prevented it from being interpreted as a block? - Maladminister
@Kjetil S., No, because it can only reach the -> after deciding whether to parse what starts with { as a block or a hash constructor. Remember, a block contains statements, whereas the hash constructor contains a (possibly null) expression, so they use different parsing rules. Since they're different, the parser can't simply parse the contents of the curlies and then make a decision by looking at what follows the }. It does look ahead at the token after the { to guess at what the { means, but looking further ahead is simply not productive. - Tremolo
@Tremolo I don't know the Perl parser/compiler, but I know enough about parsing to think that my simple example code is well within the reach of a traditional backtracking recursive parser. I suspect the Perl compiler already has many rules which use such backtracking (and not only look-ahead to try to resolve ambiguous tokens straight away) where it's not immediately clear what it's looking at and it has to rewind to try something else. I'm guessing that this case was deemed too rare, is on a todo list or was overlooked. - Maladminister
@Kjetil S., First, Perl uses an LR parser, not an LL parser, so backtracking isn't a thing. That's a good thing. When writing an LL parser, you try to eliminate all backtracking. It's slow, and it leads to poor error handling. Aside from the poor error handling, it would lead to other major problems: Perl's syntax often depends on whether a statement, an expression or an operator is expected in order to remove ambiguities, so your solution for removing ambiguities would actually add some. - Tremolo
@Kjetil S., Also, you're wrong that ->{$1} would disambiguate because - is perfectly valid after a block. So you'd only discover the error after another level of recursion, which means you'd mishandle that much more bad code. Attempting what you suggest would not be productive. It just changes the kinds of guesses the parser has to make (what is this vs where did the error actually happen?), which simply shifts when you get weird errors. At least the way it is now is predictable, and avoidable once a programmer is made aware. It's a lot harder to ask programmers not to make syntax errors. - Tremolo
I understand that the second part of the s///e is interpreted as a BLOCK when it starts with {. My question is why. $a={1..8}->{5} is ok (6), then again print({1..8}->{5}) gives the same error while print "",{1..8}->{5} doesn't. Might it be about scalar context? But @a=({1..8}->{5}) works. Wouldn't it be more natural to require a do, as in s//do{...}/e, if we wanted the second part to be read as a block, and to have a missing do default to the { being interpreted as the start of a hashref? - Maladminister
@KjetilS. "question is why" --- again, the best I can think of is that "it has to guess at { ..." (from the answer). As for these other examples, I don't know enough about how parsing works to discern in detail how it guesses. In most of them the {} seems to be taken as a block, with "uninitialized" warnings since 1..8 doesn't return a reference. Perhaps it is the context that guides the guess? Or some heuristics? On the practical side, all these examples clearly push their luck and should be written more properly so that nobody has to guess. - Supercargo
@KjetilS. Btw, print({1..8}->{5}) for me simply fails with a syntax error. All these examples could just be rejected by the compiler as "ambiguous." But that would then probably cripple some useful legitimate (I mean clear) uses, so it tolerates them ... and guesses. - Supercargo
