Hidden bugs with given-when and for-match. Is Perl truly cross-platform?
I have been trying to get a relatively large Perl program, which has worked perfectly on CentOS for many years, to run on Ubuntu, and it has become a nightmare. CentOS uses Perl built for x86_64-linux-thread-multi, and Ubuntu uses the x86_64-linux-gnu-thread-multi build. As far as I know, the interpreter behavior should be the same in both environments when the program invokes the same previous version, v5.10.1. However, I have been getting very different behavior. This includes warnings that given/when and smartmatch are experimental and, most importantly, a set of nasty bugs that are hard to trace and resolve.

A particular problem occurs when the given statement shown below (form 1) matches and calls a function. Suddenly the value of the switch variable ($ailtype), which is never otherwise touched, gets erased from memory! If I simply call that function on its own, nothing bad happens. So I replaced the given/when with a for statement (form 2). My question is: why does the problem still occur? The only form that truly avoids the problem is a simple chain of if/elsif statements (form 3). To me this clearly shows that the problem lies with forms 1 and 2 and with a Perl interpreter that is inconsistent and bug-ridden: it does not even produce an "experimental" warning for form 2.
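For reference, as the comments below point out, `use v5.10.1` does not choose which perl runs the script. It only requires at least that version and turns on the 5.10 feature bundle (given/when, say, state). The sketch below is a minimal way to check which interpreter is actually running. The experimental warnings for given/when and smartmatch were added in Perl 5.18. That is consistent with seeing them only under the newer perl (5.30.3 on Ubuntu versus 5.16.3 on CentOS, per the comments).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use v5.10.1;    # requires at least perl 5.10.1 and enables the :5.10 feature bundle

# $^V is the version of the perl binary actually running this script.
my $running = sprintf '%vd', $^V;
print "running perl $running\n";

# $] holds the same version as a plain decimal number (e.g. 5.030003),
# which is convenient for numeric comparisons.
if ($] >= 5.018) {
    print "5.18 or newer: given/when and smartmatch emit warnings\n";
}
else {
    print "older than 5.18: no experimental warnings for given/when\n";
}
```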

Here is form 1 (original):

print "ailtype is $ailtype \n"; # prints "ailtype is 8"

given ($ailtype) {
    when (4) { &parse_mascot}
    when (5) { &parse_sequest}
    when (8) { &parse_spectrast($ms2_results, $rttemp)}
    when (9) { &parse_cnstab($ms2_results, $rttemp)}
    default { STDOUT->autoflush(1); die "ailtype=|$ailtype| unknown.\n"; }
}

print "ailtype is $ailtype \n"; # prints "ailtype is ". $ailtype got destroyed!

The prints are for debugging. I can move them inside the when block and confirm that $ailtype is destroyed right after the &parse_spectrast call. However, that function does not read or touch $ailtype at all! (Interestingly, if I go inside the function and print $ailtype to find exactly where it gets clobbered, I see that it happens within a while loop that parses the lines of an input file. Printing $ailtype there somehow returns the lines of that file!)

The program consists of several large Perl files with many given/when statements. Rewriting all of them by hand would be tedious, so I have to make sure that the alternative form works. I tried form 2 (suggested here):

print "ailtype is $ailtype \n"; # prints "ailtype is 8"

for ($ailtype) {
    /4/ and do { &parse_mascot; last};
    /5/ and do { &parse_sequest; last};
    /8/ and do { &parse_spectrast($ms2_results, $rttemp); last};
    /9/ and do { &parse_cnstab($ms2_results, $rttemp); last};
    do { STDOUT->autoflush(1); die "ailtype=|$ailtype| unknown.\n"; }
}

print "ailtype is $ailtype \n"; # prints "ailtype is ". $ailtype still gets destroyed!

The problem still occurs in this form, and I really don't understand why. The interpreter no longer warns about experimental given/when here. (given/when is still used elsewhere in the code, though. I made sure none of it ran before the problematic block, and that didn't help.)

Surprisingly or not, a chain of if/elsif statements (form 3) works fine:

print "ailtype is $ailtype \n"; # prints "ailtype is 8"

if    (4 == $ailtype) { 
    &parse_mascot;
}
elsif (5 == $ailtype) { 
    &parse_sequest;
}
elsif (8 == $ailtype) { 
    &parse_spectrast($ms2_results, $rttemp); 
}
elsif (9 == $ailtype) { 
    &parse_cnstab($ms2_results, $rttemp);
}
else { 
    STDOUT->autoflush(1);
    die "ailtype=|$ailtype| unknown.\n";
}

print "ailtype is $ailtype \n"; # prints "ailtype is 8". It was never changed.

But I thought Perl's power was there to save us from having to code all of this. I would be willing to do so if I were sure I had an otherwise reliable interpreter. Changing the version to v5.16.3 (the latest installed on both platforms) only produces new errors about declarations, "bareword STDOUT not allowed", etc. Having fought with this bug alone for 10 hours, I am seriously in doubt.
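One common Perl alternative to both given/when and a long if/elsif chain, which the original program does not use, is a hash that maps each exact value to a code reference. The sketch below makes several assumptions. The parse_* subs are hypothetical stubs that only record that they ran, and dispatch_ailtype is a made-up helper. Lookup is by exact string key, so "08" would not match 8 the way `==` does. Because nothing aliases $ailtype, code called through the table cannot clobber it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stubs standing in for the real parsers: they only
# record which one ran and with which arguments.
my @called;
sub parse_mascot    { push @called, 'mascot' }
sub parse_sequest   { push @called, 'sequest' }
sub parse_spectrast { push @called, "spectrast(@_)" }
sub parse_cnstab    { push @called, "cnstab(@_)" }

# Map each exact ailtype value to the code that handles it.
my %dispatch = (
    4 => sub { parse_mascot() },
    5 => sub { parse_sequest() },
    8 => sub { parse_spectrast(@_) },
    9 => sub { parse_cnstab(@_) },
);

# Hypothetical helper: look up the handler, or die for unknown values.
# $ailtype is copied into a lexical, never aliased to $_.
sub dispatch_ailtype {
    my ($ailtype, @args) = @_;
    my $handler = $dispatch{$ailtype}
        or die "ailtype=|$ailtype| unknown.\n";
    return $handler->(@args);
}

my $ailtype = 8;
dispatch_ailtype($ailtype, 'results.txt', 42);
print "ailtype is $ailtype\n";    # $ailtype is unchanged
```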

Candracandy asked 6/2, 2022 at 19:1. Comments (12):
What do the subs you call do? How can you ask a question like this and not include that information? You should include complete, minimal code that can be run to demonstrate your problem. - Thessa
@Thessa They do a lot of things and would be too large to post here. But as I said, they do not read or change $ailtype at all; they don't even mention it once. The fact that simply calling the problematic function, or using if, avoids the problem should make it clear that it doesn't matter what the functions do. - Candracandy
@Thessa I'll try to make a minimal stand-alone program that reproduces the problem. - Candracandy
"the interpreter behavior should be the same in both environments when the program invokes the same previous version v5.10.1". It's not clear what you mean by this. If you think that use v5.10.1 will make it use that version of Perl, it doesn't. It only makes sure that you are running at least 5.10.1, and it switches on some features introduced with 5.10. - Catercorner
@SteffenUllrich Ah, I didn't know that. Thanks for clarifying. So this could just be a version issue and not a platform issue? In that case, the version on CentOS is v5.16.3, and on Ubuntu it is v5.30.3. Do you know which version first introduced the behavior change seen in this problem? - Candracandy
I can't help but say this: how many times do we need to say that given/when and smartmatch (in particular!) have serious problems and that they will be changed, perhaps beyond recognition (or worse)? Tedious or not, just replace those code sections, and be careful to use the simplest and clearest code possible. (Part of the problem with those overloaded features is that they quietly pull in all kinds of assumptions.) Sorry. It happens. - Lindsylindy
@zdim, I get that, but form 2 doesn't use given/when or smartmatch, or does it implicitly? - Candracandy
@FNia: Lots of internals changed between 5.16.3 and 5.30.3. If your code accidentally relied on some unspecified behavior (timing, scoping, bugs ...), it might run into problems with later versions. This is doubly true for features that are marked as experimental. It is also recommended to look at the perldelta documentation, since there are cases where behavior was changed knowingly (to be more consistent, or for other reasons). - Catercorner
Even without knowing the status of these features: they are experimental, so why use them in a large (production, I presume) program that needs to work on different systems and Perl versions? Not a very cautious decision? - Lindsylindy
"but form 2 doesn't use given-when or smartmatch": true, and that's where another problem got introduced (as choroba shows), essentially because of trying to re-implement them and stick to their logic. (Note yet another problem: /4/ matches 41 as well as 4 or 14. Maybe not an issue, but did you think of that? You have to be really careful when rewriting code.) - Lindsylindy
@SteffenUllrich Thank you for the information and for suggesting perldelta. Very helpful! - Candracandy
@Lindsylindy Very good point about /4/ matching other numbers! I didn't think of that. Thank you very much! You are exactly right: this is important software that our whole research lab and our collaborators rely on, and it needs to be stable and cross-platform. - Candracandy
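Lindsylindy's point about /4/ is easy to check. The minimal sketch below uses made-up sample values. The unanchored pattern accepts any value that merely contains a 4. An anchored pattern accepts only 4: `\A4\z`, which unlike `^4$` also rejects a trailing newline. A numeric `==` comparison likewise accepts only 4.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample values, not taken from the original program.
my @candidates = (4, 14, 41, 8);

my @unanchored = grep { /4/ }     @candidates;    # anything containing "4"
my @anchored   = grep { /\A4\z/ } @candidates;    # exactly "4"
my @numeric    = grep { $_ == 4 } @candidates;    # numerically equal to 4

print "unanchored: @unanchored\n";
print "anchored:   @anchored\n";
print "numeric:    @numeric\n";
```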

for aliases $_ to each value in its list. If a function called inside the loop changes the value of $_, the original variable is changed as well. A while (<FH>) loop assigns each line it reads to the global $_ without localizing it. At end of file the final read returns undef, which is left in $_.

#!/usr/bin/perl
use warnings;
use strict;

sub f {
    while (<DATA>) {
        warn "read:\t$_";
    }
}

my $x = 8;

print "Before:<<$x>>\n";

for ($x) {
    &f();
}

print "After:<<$x>>\n";

__DATA__
A
B

Solution: insert the following line before the while (<>) { line:

    local $_;

This restores the original value of $_ when execution leaves the scope that contains the local statement.
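Here is a sketch of that fix applied to a self-contained variant of the example above. An in-memory filehandle stands in for the input file, and the sub names read_localized and read_lexical are made up for illustration. It also shows a second common option: reading into a lexical variable, so that $_ is never assigned at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory filehandle contents standing in for the real input file.
my $input = "A\nB\n";

# Fix 1: localize $_ so the caller's $_ (and anything aliased to it)
# is restored when the sub returns.
sub read_localized {
    open my $fh, '<', \$input or die "open: $!";
    my @lines;
    local $_;
    while (<$fh>) {
        push @lines, $_;
    }
    return @lines;
}

# Fix 2: read into a lexical variable; $_ is never touched.
sub read_lexical {
    open my $fh, '<', \$input or die "open: $!";
    my @lines;
    while (my $line = <$fh>) {
        push @lines, $line;
    }
    return @lines;
}

my $x = 8;
for ($x) {    # $_ is aliased to $x inside this loop
    read_localized();
    read_lexical();
}
print "After:<<$x>>\n";    # $x keeps its value
```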

Bayless answered 6/2, 2022 at 19:17. Comments (22):
You are exactly right! Thank you very much! Now, why is this behavior platform-dependent? (Or is it a version issue, given Steffen Ullrich's comment?) - Candracandy
@Candracandy It's not platform-dependent. I suspect this might have to do with the lexical $_ that given used to use. Like given/when, that was a failed experimental feature, and it has actually been removed from Perl (as they want to do with given/when). - Mindimindless
@Mindimindless Oh, you mean the whole lexical $_ has been removed?! Or just the global form? Could you link the docs, please? This will break lots of code. I also wonder when the change was introduced regarding what happens to the global $_ in a while (<>) loop, and whether for () uses the global or a local $_. What is the general safe practice? - Candracandy
@Candracandy my $_ is gone. That was used by given (implicitly). for uses a (localized) global $_, as it always has. - Mindimindless
@Candracandy This is actually not about $_; it is about for and similar constructs aliasing the loop variable to the named variable in the list. for my $foo ($x) { $foo = 0 } will overwrite $x in the same way that for ($x) { $_ = 0 } will. - Thessa
@Thessa I agree! That is very dangerous! It's also true that aliasing becomes necessary because there is only one global $_ everywhere. Many operators and functions use the same global $_, and that can easily change the values being iterated over, unintentionally, during iteration. I assumed they would prioritize local/lexical scoping, at least for built-in functions, and am very surprised to learn the opposite has happened. - Candracandy
@ikegami, thank you for the information. It all makes it very clear now. It's also very surprising to me, as I explained in the comment above. - Candracandy
@FNia, lexical scoping wasn't a thing originally. And like I said, they tried to introduce it for $_, but that caused a lot of problems, so we're stuck with localization instead. - Mindimindless
@ikegami, "Lexical scoping wasn't a thing originally." That sounds like the opposite of most programming languages (unless you are referring to $_ only). - Candracandy
@FNia, I am not referring to $_. Lexical (my) variables were introduced in 5.6, more than a decade after Perl's first release. - Mindimindless
@Candracandy No, $_ is not easily changed. Usually you don't have to worry about it, as each built-in uses its own copy of $_. The issue with your code is the aliasing, not the $_ variable. Test it out and see. - Thessa
@TLP: Well, while (<>) is one of the few constructs that don't use their own copy of $_. - Bayless
@choroba, @TLP, perlvar simply says "$_ is a global variable." That's it. All the previous lexical scoping was 'experimental', confirming what @Mindimindless says. It also lists a whole bunch of functions and operators that use $_ by default. (Unless there is something between the lines that I don't understand. perldoc.perl.org/perlvar) - Candracandy
@TLP, "The issue with your code is the aliasing, not the $_ variable." I'm not sure what you mean by this, or how to test it. @choroba's example is the perfect minimal code you were asking for, and it's precisely what happens in my code. How would you test or explain it differently? - Candracandy
@Candracandy Consider the code for ('a' .. 'b') { for (1..3) { print "a: $_" } print "b: $_" }. You can do whatever you like to $_ in the inner loop; it does not affect $_ in the outer loop. Now consider for ($x) { $_ = 0 } print $x. $x is now 0. It is not about the $_ variable; it's about the aliasing, and the fact that you used $x in the for list. - Thessa
@TLP, interesting. A couple of questions: in your first example, are these localized copies, and what exactly is the difference from the lexical my copies, if the latter is deprecated? When is this local copy made, as opposed to using the global one (as in aliasing or in while (<>))? perlvar puts everything (for, given, while, etc.) on the same list when it comes to $_. Where is this difference in behavior documented? - Candracandy
@Thessa In your second example, the aliasing itself seems to be limited to the scope of the loop, i.e. changing $_ after the loop won't change $x anymore. But then how can an outside function overwrite $x when it is called within the loop? Again, I'm having difficulty getting this from the official docs. - Candracandy
The aliasing of $_ has dynamic scope, i.e. it propagates to subroutines called from within the loop. - Bayless
@Candracandy Perhaps that is something you should ask a new question about. It seems complicated to answer in comments. Also, I don't know how your subs managed to overwrite the variable, since you didn't show any code. It is really just an educated guess that that is what happened. - Thessa
@Thessa Sorry, I failed to clarify. As mentioned in my post and seen in @choroba's answer, the particular sub in question uses a while (<>) loop to read from a file, and that's how it managed to overwrite the variable. The minimal code you asked to see would simply be open SPTXT, "<", "spec_file.txt"; while (<SPTXT>) { }. That problem is already resolved. Good idea about posting a new question. Thanks! - Candracandy
@TLP: It would, as my code shows. - Bayless
@Mindimindless You wrote: "Lexical (my) variables were introduced in 5.6, more than a decade after Perl's first release." That is absolutely untrue, and you're at least four years too late. I can absolutely guarantee that lexical variables using my were a fundamental part of the perl5 rewrite from its very inception in 1993, in its initial alpha stage at the latest. See perl5.000/Changes, which reads: Lexical scoping available via "my". eval can see the current lexical variables. Kindly download the tarball and you'll see, or read the git logs. - Impose
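Several claims from this comment thread can be checked in one short script: aliasing through a named loop variable, the dynamic scope of the $_ alias, and nested for loops restoring the outer $_. In this sketch, clobber_underscore is a hypothetical sub that assigns to $_ without localizing it, as a bare while (<FH>) loop does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub that writes to the global $_ without localizing it.
sub clobber_underscore { $_ = 'clobbered' }

# 1. for ($x): $_ is an alias to $x, and the alias is visible to
#    subs called from inside the loop (dynamic scope).
my $x = 8;
for ($x) { clobber_underscore() }
print "x: $x\n";

# 2. A named loop variable is also an alias: assigning to it changes $y.
my $y = 8;
for my $foo ($y) { $foo = 0 }
print "y: $y\n";

# 3. With a named loop variable, $_ is not aliased to $z, so the same
#    sub call writes to the global $_ instead and $z keeps its value.
my $z = 8;
for my $item ($z) { clobber_underscore() }
print "z: $z\n";

# 4. A nested for loop localizes $_, so the outer loop's $_ comes back.
my @outer;
for ('a' .. 'b') {
    for (1 .. 3) { }    # inner loop aliases $_ to 1, 2, 3 in turn
    push @outer, $_;    # outer $_ is restored: 'a', then 'b'
}
print "outer: @outer\n";
```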
